{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SuaraKami\n",
"\n",
"Heavily exported from https://github.com/redapesolutions/suara-kami-community, Malay Speech-to-Text developed by https://github.com/khursani8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install necessary requirements\n",
"\n",
"```bash\n",
"pip3 install onnxruntime librosa\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from malaysia_ai_projects import suarakami"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List available models"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Size (MB) | \n",
" WER | \n",
" WER-LM | \n",
" CER | \n",
" CER-LM | \n",
" Entropy | \n",
" Language | \n",
"
\n",
" \n",
" \n",
" \n",
" small-conformer | \n",
" 60.3 | \n",
" 0.239 | \n",
" 0.14 | \n",
" 0.11 | \n",
" 0.03 | \n",
" 0.6 | \n",
" [malay] | \n",
"
\n",
" \n",
" tiny-conformer | \n",
" 17.9 | \n",
" 0.4 | \n",
" None | \n",
" 0.11 | \n",
" None | \n",
" 0.5 | \n",
" [malay] | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Size (MB) WER WER-LM CER CER-LM Entropy Language\n",
"small-conformer 60.3 0.239 0.14 0.11 0.03 0.6 [malay]\n",
"tiny-conformer 17.9 0.4 None 0.11 None 0.5 [malay]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"suarakami.available_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List available language models"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Size (MB) | \n",
"
\n",
" \n",
" \n",
" \n",
" v1-lm | \n",
" 846 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Size (MB)\n",
"v1-lm 846"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"suarakami.available_lm()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load model\n",
"\n",
"```python\n",
"def load(model: str = 'small-conformer', lm: str = None):\n",
" \"\"\"\n",
" Load suarakami model.\n",
"\n",
" Parameters\n",
" ----------\n",
" model : str, optional (default='small-conformer')\n",
" Model architecture supported. Allowed values:\n",
"\n",
" * ``'small-conformer'`` - Small Conformer model.\n",
"\n",
" lm: str, optional (default=None)\n",
" Language Model supported. Allowed values:\n",
"\n",
" * ``None`` - No Language Model will use.\n",
" * ``'v1-lm'`` - Will use V1 Language Model, size ~800 MB.\n",
"\n",
" Returns\n",
" -------\n",
" result : malaysia_ai_projects.suarakami.Model class\n",
" \"\"\"\n",
"```\n",
"\n",
"If you are going to load language model, make sure you already installed the dependencies,\n",
"\n",
"```bash\n",
"pip3 install pyctcdecode pypi-kenlm\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"model = suarakami.load()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"model_with_lm = suarakami.load(lm = 'v1-lm')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predict\n",
"\n",
"```python\n",
"def predict(self, input: np.array):\n",
" \"\"\"\n",
" Parameters\n",
" ----------\n",
" input: np.array\n",
" np.array, must in 16k rate, prefer from `librosa.load(file,16_000)`.\n",
"\n",
" Returns\n",
" -------\n",
" result: text, entropy, timesteps\n",
" \"\"\"\n",
"```"
]
},
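{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `predict` expects 16 kHz audio, it can help to check a file's sample rate before loading it with a tool that does not resample. The sketch below is only an illustration using Python's standard `wave` module; `wav_info` and `EXPECTED_RATE` are hypothetical names, not part of suarakami, and it assumes an uncompressed PCM WAV file.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"import wave\n",
"\n",
"EXPECTED_RATE = 16000  # sample rate that predict() expects, per the docstring above\n",
"\n",
"\n",
"def wav_info(path):\n",
"    # Hypothetical helper: read (sample_rate, duration_in_seconds) from a WAV header.\n",
"    with wave.open(path, 'rb') as wav:\n",
"        rate = wav.getframerate()\n",
"        return rate, wav.getnframes() / rate\n",
"\n",
"\n",
"# Write one second of 16-bit mono silence so the example runs without any download.\n",
"path = os.path.join(tempfile.mkdtemp(), 'silence.wav')\n",
"with wave.open(path, 'wb') as wav:\n",
"    wav.setnchannels(1)\n",
"    wav.setsampwidth(2)\n",
"    wav.setframerate(EXPECTED_RATE)\n",
"    wav.writeframes(bytes(2 * EXPECTED_RATE))\n",
"\n",
"rate, duration = wav_info(path)\n",
"needs_resampling = rate != EXPECTED_RATE\n",
"print(rate, duration, needs_resampling)\n",
"```\n",
"\n",
"`librosa.load` resamples to the requested `sr`, so this check mainly matters when the array comes from another loader."
]
},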
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I am going to download few samples from https://github.com/huseinzol05/malaya-speech"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/example-speaker/husein-zolkepli.wav\n",
"# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/khutbah/wadi-annuar.wav"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5.6306875"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import librosa\n",
"\n",
"sr = 16000\n",
"y = librosa.load('husein-zolkepli.wav', sr)[0]\n",
"len(y) / sr"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10.0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y2 = librosa.load('wadi-annuar.wav', sr)[0]\n",
"len(y2) / sr"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('testing nama saya hussin binzo kaple', -5691390.5, [0])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.predict(y)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('testing nama saya hussin binzokaple',\n",
" [-99643376.0, -244839264.0, -389759456.0, -2680290.0, -1222767.5],\n",
" [('testing', 1.01, 1.05),\n",
" ('nama', 2.03, 2.05),\n",
" ('saya', 2.05, 3.01),\n",
" ('hussin', 3.01, 3.04),\n",
" ('binzokaple', 3.05, 4.05)])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_with_lm.predict(y)"
]
},
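{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a language model loaded, `predict` returns the text, a list of entropy values, and a list of `(word, start, end)` tuples, as the output above shows. A minimal sketch of unpacking that result follows; `describe_words` is a hypothetical helper, the tuple is copied from the output above, and the unit of the start and end values is not stated in this notebook, so they are printed as reported. It assumes the language-model output shape; without a language model the third element is `[0]`.\n",
"\n",
"```python\n",
"# Result copied from the model_with_lm.predict(y) output shown in this notebook.\n",
"result = (\n",
"    'testing nama saya hussin binzokaple',\n",
"    [-99643376.0, -244839264.0, -389759456.0, -2680290.0, -1222767.5],\n",
"    [('testing', 1.01, 1.05), ('nama', 2.03, 2.05), ('saya', 2.05, 3.01),\n",
"     ('hussin', 3.01, 3.04), ('binzokaple', 3.05, 4.05)],\n",
")\n",
"\n",
"\n",
"def describe_words(result):\n",
"    # Hypothetical helper: build one 'word: start -> end' line per decoded word.\n",
"    text, entropy, timesteps = result\n",
"    return [f'{word}: {start} -> {end}' for word, start, end in timesteps]\n",
"\n",
"\n",
"lines = describe_words(result)\n",
"for line in lines:\n",
"    print(line)\n",
"```"
]
},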
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muas bin jabar tadi ini allah ini',\n",
" -6861158.5,\n",
" [0])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.predict(y2)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muasbinjabar tadi ni allah ini',\n",
" [-18959840.0,\n",
" -79510024.0,\n",
" -626076864.0,\n",
" -52262396.0,\n",
" -21833328.0,\n",
" -105376016.0,\n",
" -130774848.0,\n",
" -20116550.0,\n",
" -147432608.0,\n",
" -2211711.0,\n",
" -376740736.0,\n",
" -8059082.5,\n",
" -8033139.0,\n",
" -21874408.0,\n",
" -2780910.25,\n",
" -391667.3125],\n",
" [('jadi', 0.01, 0.02),\n",
" ('dalam', 0.03, 0.05),\n",
" ('perjalanan', 0.05, 1.03),\n",
" ('ini', 1.04, 1.04),\n",
" ('dunia', 2.02, 2.04),\n",
" ('yang', 2.04, 2.05),\n",
" ('susah', 2.06, 3.02),\n",
" ('ini', 3.02, 3.03),\n",
" ('ketika', 5.03, 5.05),\n",
" ('nabi', 6.0, 6.02),\n",
" ('mengajar', 6.02, 6.05),\n",
" ('muasbinjabar', 6.05, 7.05),\n",
" ('tadi', 7.05, 8.0),\n",
" ('ni', 8.01, 8.02),\n",
" ('allah', 8.02, 8.05),\n",
" ('ini', 9.03, 9.04)])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_with_lm.predict(y2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}