{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SuaraKami\n", "\n", "Heavily exported from https://github.com/redapesolutions/suara-kami-community, Malay Speech-to-Text developed by https://github.com/khursani8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install necessary requirements\n", "\n", "```bash\n", "pip3 install onnxruntime librosa\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from malaysia_ai_projects import suarakami" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List available models" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Size (MB)WERWER-LMCERCER-LMEntropyLanguage
small-conformer60.30.2390.140.110.030.6[malay]
tiny-conformer17.90.4None0.11None0.5[malay]
\n", "
" ], "text/plain": [ " Size (MB) WER WER-LM CER CER-LM Entropy Language\n", "small-conformer 60.3 0.239 0.14 0.11 0.03 0.6 [malay]\n", "tiny-conformer 17.9 0.4 None 0.11 None 0.5 [malay]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "suarakami.available_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List available language models" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Size (MB)
v1-lm846
\n", "
" ], "text/plain": [ " Size (MB)\n", "v1-lm 846" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "suarakami.available_lm()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load model\n", "\n", "```python\n", "def load(model: str = 'small-conformer', lm: str = None):\n", " \"\"\"\n", " Load suarakami model.\n", "\n", " Parameters\n", " ----------\n", " model : str, optional (default='small-conformer')\n", " Model architecture supported. Allowed values:\n", "\n", " * ``'small-conformer'`` - Small Conformer model.\n", "\n", " lm: str, optional (default=None)\n", " Language Model supported. Allowed values:\n", "\n", " * ``None`` - No Language Model will use.\n", " * ``'v1-lm'`` - Will use V1 Language Model, size ~800 MB.\n", "\n", " Returns\n", " -------\n", " result : malaysia_ai_projects.suarakami.Model class\n", " \"\"\"\n", "```\n", "\n", "If you are going to load language model, make sure you already installed the dependencies,\n", "\n", "```bash\n", "pip3 install pyctcdecode pypi-kenlm\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "model = suarakami.load()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "model_with_lm = suarakami.load(lm = 'v1-lm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predict\n", "\n", "```python\n", "def predict(self, input: np.array):\n", " \"\"\"\n", " Parameters\n", " ----------\n", " input: np.array\n", " np.array, must in 16k rate, prefer from `librosa.load(file,16_000)`.\n", "\n", " Returns\n", " -------\n", " result: text, entropy, timesteps\n", " \"\"\"\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I am going to download few samples from https://github.com/huseinzol05/malaya-speech" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/example-speaker/husein-zolkepli.wav\n", "# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/khutbah/wadi-annuar.wav" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.6306875" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import librosa\n", "\n", "sr = 16000\n", "y = librosa.load('husein-zolkepli.wav', sr)[0]\n", "len(y) / sr" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y2 = librosa.load('wadi-annuar.wav', sr)[0]\n", "len(y2) / sr" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('testing nama saya hussin binzo kaple', -5691390.5, [0])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.predict(y)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('testing nama saya hussin binzokaple',\n", " [-99643376.0, -244839264.0, -389759456.0, -2680290.0, -1222767.5],\n", " [('testing', 1.01, 1.05),\n", " ('nama', 2.03, 2.05),\n", " ('saya', 2.05, 3.01),\n", " ('hussin', 3.01, 3.04),\n", " ('binzokaple', 3.05, 4.05)])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_with_lm.predict(y)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muas bin jabar tadi ini allah ini',\n", " -6861158.5,\n", " [0])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.predict(y2)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muasbinjabar tadi ni allah ini',\n", " [-18959840.0,\n", " -79510024.0,\n", " -626076864.0,\n", " -52262396.0,\n", " -21833328.0,\n", " -105376016.0,\n", " -130774848.0,\n", " -20116550.0,\n", " -147432608.0,\n", " -2211711.0,\n", " -376740736.0,\n", " -8059082.5,\n", " -8033139.0,\n", " -21874408.0,\n", " -2780910.25,\n", " -391667.3125],\n", " [('jadi', 0.01, 0.02),\n", " ('dalam', 0.03, 0.05),\n", " ('perjalanan', 0.05, 1.03),\n", " ('ini', 1.04, 1.04),\n", " ('dunia', 2.02, 2.04),\n", " ('yang', 2.04, 2.05),\n", " ('susah', 2.06, 3.02),\n", " ('ini', 3.02, 3.03),\n", " ('ketika', 5.03, 5.05),\n", " ('nabi', 6.0, 6.02),\n", " ('mengajar', 6.02, 6.05),\n", " ('muasbinjabar', 6.05, 7.05),\n", " ('tadi', 7.05, 8.0),\n", " ('ni', 8.01, 8.02),\n", " ('allah', 8.02, 8.05),\n", " ('ini', 9.03, 9.04)])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_with_lm.predict(y2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }