SuaraKami

Largely adapted from https://github.com/redapesolutions/suara-kami-community, a Malay speech-to-text project developed by https://github.com/khursani8

Install necessary requirements

pip3 install onnxruntime librosa
[1]:
from malaysia_ai_projects import suarakami

List available models

[2]:
suarakami.available_model()
[2]:
                 Size (MB)  WER    WER-LM  CER   CER-LM  Entropy  Language
small-conformer  60.3       0.239  0.14    0.11  0.03    0.6      [malay]
tiny-conformer   17.9       0.4    None    0.11  None    0.5      [malay]
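
The WER and CER columns are word-level and character-level error rates, and the `-LM` variants are the same metrics with a language model attached. The table does not say how these figures were computed, so the following is only a sketch of the conventional definition (edit distance divided by reference length), not necessarily the evaluation used for these models. The example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference and hypothesis, not taken from any dataset.
reference = "selamat pagi semua"
hypothesis = "selamat pagi semu a"
print(round(wer(reference, hypothesis), 3), round(cer(reference, hypothesis), 3))
```

A single misplaced space costs two word errors but only one character error, which is one reason WER is usually much higher than CER for the same model.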

List available language models

[3]:
suarakami.available_lm()
[3]:
       Size (MB)
v1-lm  846

Load model

def load(model: str = 'small-conformer', lm: str = None):
    """
    Load suarakami model.

    Parameters
    ----------
    model : str, optional (default='small-conformer')
        Model architecture supported. Allowed values:

        * ``'small-conformer'`` - Small Conformer model.

    lm: str, optional (default=None)
        Language Model supported. Allowed values:

        * ``None`` - No language model is used.
        * ``'v1-lm'`` - Use the V1 language model, size ~800 MB.

    Returns
    -------
    result : malaysia_ai_projects.suarakami.Model class
    """

If you are going to load a language model, make sure you have already installed its dependencies:

pip3 install pyctcdecode pypi-kenlm
[4]:
model = suarakami.load()
[5]:
model_with_lm = suarakami.load(lm = 'v1-lm')
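
Because the language model needs extra packages, it can help to check for them before calling `suarakami.load(lm = 'v1-lm')`. The sketch below uses only the standard library. The import names `pyctcdecode` and `kenlm` are assumed from the package names in the `pip3 install` line above, and `missing_modules` is a hypothetical helper, not part of `malaysia_ai_projects`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the pyctcdecode and pypi-kenlm packages.
missing = missing_modules(["pyctcdecode", "kenlm"])
if missing:
    print("Install before loading a language model:", ", ".join(missing))
else:
    print("Language model dependencies found.")
```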

Predict

def predict(self, input: np.array):
    """
    Parameters
    ----------
    input: np.array
        np.array, must be sampled at 16 kHz, preferably loaded with `librosa.load(file, sr=16_000)`.

    Returns
    -------
    result: text, entropy, timesteps
    """

I am going to download a few samples from https://github.com/huseinzol05/malaya-speech

[6]:
# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/example-speaker/husein-zolkepli.wav
# !wget https://raw.githubusercontent.com/huseinzol05/malaya-speech/master/speech/khutbah/wadi-annuar.wav
[7]:
import librosa

sr = 16000
y = librosa.load('husein-zolkepli.wav', sr=sr)[0]  # pass sr by keyword; newer librosa releases require it
len(y) / sr
[7]:
5.6306875
[8]:
y2 = librosa.load('wadi-annuar.wav', sr=sr)[0]
len(y2) / sr
[8]:
10.0
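
`librosa.load` resamples to the requested rate, so the durations above are for the 16 kHz signal that is passed to `predict`. To inspect the original file before resampling, a minimal sketch with the standard-library `wave` module is shown below. It only handles PCM WAV files, and `wav_info` is a hypothetical helper name.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


# Only runs if the sample has already been downloaded.
if os.path.exists("husein-zolkepli.wav"):
    rate, channels, seconds = wav_info("husein-zolkepli.wav")
    print(f"{rate} Hz, {channels} channel(s), {seconds:.2f} s")
```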
[9]:
model.predict(y)
[9]:
('testing nama saya hussin binzo kaple', -5691390.5, [0])
[10]:
model_with_lm.predict(y)
[10]:
('testing nama saya hussin binzokaple',
 [-99643376.0, -244839264.0, -389759456.0, -2680290.0, -1222767.5],
 [('testing', 1.01, 1.05),
  ('nama', 2.03, 2.05),
  ('saya', 2.05, 3.01),
  ('hussin', 3.01, 3.04),
  ('binzokaple', 3.05, 4.05)])
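
With a language model, `predict` returns a list of entropy values and a list of `(word, start, end)` tuples instead of a single score and `[0]`. In the output above both lists have one entry per word, so the sketch below pairs them up row by row. That one-to-one alignment is an assumption based on the matching lengths; the docstring does not state it, and `word_table` is a hypothetical helper.

```python
# Output of model_with_lm.predict(y), copied from the cell above.
text, entropies, timesteps = (
    'testing nama saya hussin binzokaple',
    [-99643376.0, -244839264.0, -389759456.0, -2680290.0, -1222767.5],
    [('testing', 1.01, 1.05),
     ('nama', 2.03, 2.05),
     ('saya', 2.05, 3.01),
     ('hussin', 3.01, 3.04),
     ('binzokaple', 3.05, 4.05)],
)


def word_table(entropies, timesteps):
    """Pair each (word, start, end) tuple with the entropy at the same position."""
    if len(entropies) != len(timesteps):
        raise ValueError("expected one entropy value per word")
    return [
        {"word": word, "start": start, "end": end, "entropy": entropy}
        for entropy, (word, start, end) in zip(entropies, timesteps)
    ]


for row in word_table(entropies, timesteps):
    print(f"{row['word']:<12} {row['start']:>5} {row['end']:>5} {row['entropy']:>16.1f}")
```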
[11]:
model.predict(y2)
[11]:
('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muas bin jabar tadi ini allah ini',
 -6861158.5,
 [0])
[12]:
model_with_lm.predict(y2)
[12]:
('jadi dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muasbinjabar tadi ni allah ini',
 [-18959840.0,
  -79510024.0,
  -626076864.0,
  -52262396.0,
  -21833328.0,
  -105376016.0,
  -130774848.0,
  -20116550.0,
  -147432608.0,
  -2211711.0,
  -376740736.0,
  -8059082.5,
  -8033139.0,
  -21874408.0,
  -2780910.25,
  -391667.3125],
 [('jadi', 0.01, 0.02),
  ('dalam', 0.03, 0.05),
  ('perjalanan', 0.05, 1.03),
  ('ini', 1.04, 1.04),
  ('dunia', 2.02, 2.04),
  ('yang', 2.04, 2.05),
  ('susah', 2.06, 3.02),
  ('ini', 3.02, 3.03),
  ('ketika', 5.03, 5.05),
  ('nabi', 6.0, 6.02),
  ('mengajar', 6.02, 6.05),
  ('muasbinjabar', 6.05, 7.05),
  ('tadi', 7.05, 8.0),
  ('ni', 8.01, 8.02),
  ('allah', 8.02, 8.05),
  ('ini', 9.03, 9.04)])
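
The two transcripts of `wadi-annuar.wav` differ in only a few places. A quick way to see where the language model changed the output is a word-level diff with the standard-library `difflib`. This is a sketch for inspection only: `word_changes` is a hypothetical helper, and the diff says nothing about which transcript is closer to the true speech.

```python
import difflib

# Transcripts copied from the outputs of model.predict(y2) and model_with_lm.predict(y2).
without_lm = ('jadi dalam perjalanan ini dunia yang susah ini ketika nabi '
              'mengajar muas bin jabar tadi ini allah ini')
with_lm = ('jadi dalam perjalanan ini dunia yang susah ini ketika nabi '
           'mengajar muasbinjabar tadi ni allah ini')


def word_changes(a, b):
    """Return (tag, words_in_a, words_in_b) for every span where the transcripts differ."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    return [
        (tag, a_words[i1:i2], b_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for tag, before, after in word_changes(without_lm, with_lm):
    print(tag, before, "->", after)
```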