SuaraKami
Largely ported from https://github.com/redapesolutions/suara-kami-community, a Malay speech-to-text project developed by https://github.com/khursani8.
Install the required dependencies
pip3 install onnxruntime librosa
[1]:
from malaysia_ai_projects import suarakami
List available models
[2]:
suarakami.available_model()
[2]:
Model | Size (MB) | WER | WER-LM | CER | CER-LM | Entropy | Language |
---|---|---|---|---|---|---|---|
small-conformer | 60.3 | 0.239 | 0.14 | 0.11 | 0.03 | 0.6 | [malay] |
tiny-conformer | 17.9 | 0.4 | None | 0.11 | None | 0.5 | [malay] |
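The table reports word error rate (WER) and character error rate (CER), but the notebook does not say how they were computed. The sketch below assumes the conventional definitions: the edit distance between reference and hypothesis, divided by the reference length in words or characters. The helper names `edit_distance`, `wer`, and `cer` are illustrative and are not part of `suarakami`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a transcript with one wrong word out of three has a WER of 1/3. Lower is better for both metrics.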
List available language models
[3]:
suarakami.available_lm()
[3]:
Language model | Size (MB) |
---|---|
v1-lm | 846 |
Load model
def load(model: str = 'small-conformer', lm: str = None):
    """
    Load suarakami model.

    Parameters
    ----------
    model : str, optional (default='small-conformer')
        Model architecture supported. Allowed values:

        * ``'small-conformer'`` - Small Conformer model.
    lm : str, optional (default=None)
        Language model supported. Allowed values:

        * ``None`` - No language model is used.
        * ``'v1-lm'`` - Use the V1 language model, size ~800 MB.

    Returns
    -------
    result : malaysia_ai_projects.suarakami.Model class
    """
If you plan to load a language model, make sure its dependencies are installed first:
pip3 install pyctcdecode pypi-kenlm
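Before loading a model with `lm = 'v1-lm'`, it can help to confirm that the optional packages are importable. The following is a minimal sketch using only the standard library. The helper `missing_modules` is hypothetical, and the import names `pyctcdecode` and `kenlm` are the modules these two packages are assumed to provide.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names in module_names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names: pyctcdecode -> `pyctcdecode`, pypi-kenlm -> `kenlm`.
LM_MODULES = ("pyctcdecode", "kenlm")

missing = missing_modules(LM_MODULES)
if missing:
    print("Missing language-model dependencies:", ", ".join(missing))
else:
    print("Language-model dependencies are installed.")
```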
[4]:
model = suarakami.load()
[5]:
model_with_lm = suarakami.load(lm = 'v1-lm')
Predict
def predict(self, input: np.array):
    """
    Parameters
    ----------
    input : np.array
        Audio samples as np.array, must be sampled at 16 kHz,
        preferably loaded with `librosa.load(file, sr=16_000)`.

    Returns
    -------
    result : text, entropy, timesteps
    """
The examples below use a few samples downloaded from https://github.com/huseinzol05/malaya-speech
[6]:
# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/example-speaker/husein-zolkepli.wav
# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/khutbah/wadi-annuar.wav
[7]:
import librosa
sr = 16000
# Pass sr by keyword; recent librosa releases no longer accept it positionally.
y = librosa.load('husein-zolkepli.wav', sr=sr)[0]
len(y) / sr
[7]:
5.6306875
[8]:
y2 = librosa.load('wadi-annuar.wav', sr=sr)[0]
len(y2) / sr
[8]:
10.0
[9]:
model.predict(y)
[9]:
('testing nama saya hussin binzo kaple', -5691390.5, [0])
[10]:
model_with_lm.predict(y)
[10]:
('testing nama saya hussin binzokaple',
[-99643376.0, -244839264.0, -389759456.0, -2680290.0, -1222767.5],
[('testing', 1.01, 1.05),
('nama', 2.03, 2.05),
('saya', 2.05, 3.01),
('hussin', 3.01, 3.04),
('binzokaple', 3.05, 4.05)])
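In the outputs shown in this notebook, the language-model result carries one entropy value and one `(word, start, end)` tuple per word, while the result without a language model carries a single score and `[0]`. The sketch below pairs the per-word values into records. It assumes the entropy list lines up one-to-one with the timesteps, which matches the lengths seen here but is not confirmed by the docstring. `summarize_words` is a hypothetical helper, and the start and end values are copied through unchanged.

```python
def summarize_words(result):
    """Turn a (text, entropy, timesteps) tuple into a list of per-word records."""
    text, entropies, timesteps = result
    if not isinstance(entropies, list):
        # Result without a language model: a single score, no per-word detail.
        return []
    if len(entropies) != len(timesteps):
        raise ValueError("expected one entropy value per word")
    return [
        {"word": word, "start": start, "end": end, "entropy": entropy}
        for (word, start, end), entropy in zip(timesteps, entropies)
    ]


# Output of model_with_lm.predict(y), copied from the cell above.
sample = (
    'testing nama saya hussin binzokaple',
    [-99643376.0, -244839264.0, -389759456.0, -2680290.0, -1222767.5],
    [('testing', 1.01, 1.05),
     ('nama', 2.03, 2.05),
     ('saya', 2.05, 3.01),
     ('hussin', 3.01, 3.04),
     ('binzokaple', 3.05, 4.05)],
)

for row in summarize_words(sample):
    print(row["word"], row["start"], row["end"], row["entropy"])
```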
[11]:
model.predict(y2)
[11]:
('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muas bin jabar tadi ini allah ini',
-6861158.5,
[0])
[12]:
model_with_lm.predict(y2)
[12]:
('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muasbinjabar tadi ni allah ini',
[-18959840.0,
-79510024.0,
-626076864.0,
-52262396.0,
-21833328.0,
-105376016.0,
-130774848.0,
-20116550.0,
-147432608.0,
-2211711.0,
-376740736.0,
-8059082.5,
-8033139.0,
-21874408.0,
-2780910.25,
-391667.3125],
[('jadi', 0.01, 0.02),
('dalam', 0.03, 0.05),
('perjalanan', 0.05, 1.03),
('ini', 1.04, 1.04),
('dunia', 2.02, 2.04),
('yang', 2.04, 2.05),
('susah', 2.06, 3.02),
('ini', 3.02, 3.03),
('ketika', 5.03, 5.05),
('nabi', 6.0, 6.02),
('mengajar', 6.02, 6.05),
('muasbinjabar', 6.05, 7.05),
('tadi', 7.05, 8.0),
('ni', 8.01, 8.02),
('allah', 8.02, 8.05),
('ini', 9.03, 9.04)])