Silero Models

Silero Models: pre-trained enterprise-grade STT models and benchmarks. Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

- No Kaldi;
- No compilation;
- No 20-step instructions.

Speech-To-Text

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

Currently we provide the following checkpoints:

Language	PyTorch	ONNX	TensorFlow	Quantization	Quality
English (`en_v2`)	✔️	✔️	✔️	⌛	link
German (`de_v1`)	✔️	✔️	✔️	⌛	link
Spanish (`es_v1`)	✔️	✔️	✔️	⌛	link
Ukrainian (`ua_v3`)	✔️	✔️	⌛	✔️	N/A
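
The checkpoint listing can also be navigated programmatically. The sketch below is a minimal illustration, assuming the `stt_models.<language>.latest.<format>` layout that the ONNX and TensorFlow examples in this README rely on. It uses a plain dictionary as a stand-in for the parsed models.yml; the URLs are placeholders and `resolve_checkpoint` is a hypothetical helper, not part of the repository.

```python
# Stand-in for the parsed models.yml; the URLs are placeholders (assumption).
MODELS = {
    "stt_models": {
        "en": {"latest": {"onnx": "https://example.com/en_v2.onnx",
                          "tf": "https://example.com/en_v2_tf.tar.gz"}},
        "de": {"latest": {"onnx": "https://example.com/de_v1.onnx"}},
    }
}


def resolve_checkpoint(models, language, fmt, version="latest"):
    """Return the download URL for a language/format pair (hypothetical helper)."""
    languages = models["stt_models"]
    if language not in languages:
        raise KeyError(f"unknown language {language!r}; available: {sorted(languages)}")
    formats = languages[language][version]
    if fmt not in formats:
        raise KeyError(f"no {fmt!r} checkpoint for {language!r}; available: {sorted(formats)}")
    return formats[fmt]


print(resolve_checkpoint(MODELS, "en", "onnx"))
```

Failing early with the list of available keys makes a missing language or format obvious before any download starts.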

Dependencies

All examples:
- torch (used to clone the repo in tf and onnx examples)
- torchaudio
- soundfile
- omegaconf
Additional for ONNX examples:
- onnx
- onnxruntime
Additional for TensorFlow examples:
- tensorflow
- tensorflow_hub
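
Before running an example, it can help to check which of the packages above are importable. The following is a minimal sketch using only the standard library; the grouping mirrors the dependency list above, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Package names taken from the dependency list above.
COMMON = ["torch", "torchaudio", "soundfile", "omegaconf"]
EXTRAS = {"onnx": ["onnx", "onnxruntime"],
          "tensorflow": ["tensorflow", "tensorflow_hub"]}


def missing_packages(names):
    """Return the subset of top-level packages that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


for label, names in {"all examples": COMMON, **EXTRAS}.items():
    missing = missing_packages(names)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{label}: {status}")
```

`find_spec` only locates a package without importing it, so the check stays fast even when heavy frameworks are installed.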

Please see the provided Colab for details for each example below.

PyTorch

import torch
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file in any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
# read the first batch and convert it to model input on the chosen device
model_input = prepare_model_input(read_batch(batches[0]),
                                  device=device)

# run inference and decode every item in the batch
output = model(model_input)
for example in output:
    print(decoder(example.cpu()))
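
The example above processes only `batches[0]`. For longer file lists, the batching step amounts to chunking the list into groups of at most `batch_size` items. The sketch below shows that idea in plain Python; it is an illustration under that assumption, not the implementation shipped in utils.py, and `chunk_files` and the file names are hypothetical.

```python
def chunk_files(paths, batch_size=10):
    """Split a list of paths into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


files = [f"clip_{n}.wav" for n in range(23)]  # hypothetical file names
batches = chunk_files(files, batch_size=10)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Looping over every chunk, rather than indexing the first one, is what transcribes a whole directory.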

ONNX

You can run our models anywhere you can import an ONNX model or run ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file in any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# actual ONNX inference and decoding
onnx_input = model_input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])  # decode the first item in the batch
print(decoded)

TensorFlow

SavedModel example

import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file in any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
model_input = prepare_model_input(read_batch(batches[0]))

# TensorFlow inference and decoding of the first item in the batch
res = tf_model.signatures["serving_default"](tf.constant(model_input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))
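
The `rm -rf ... && tar ...` step above depends on a Unix shell. A portable alternative can be sketched with the standard library alone. This is a minimal sketch rather than the approach used by the repository, and `extract_fresh` is a hypothetical helper.

```python
import shutil
import tarfile
from pathlib import Path


def extract_fresh(archive_path, target_dir):
    """Recreate target_dir and unpack a .tar.gz archive into it."""
    target = Path(target_dir)
    shutil.rmtree(target, ignore_errors=True)  # same effect as `rm -rf`
    target.mkdir(parents=True)
    # Use the safer "data" extraction filter where this Python version offers it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target, **kwargs)
    return target


if __name__ == "__main__":
    # Assumes tf_model.tar.gz was downloaded as in the example above.
    if Path("tf_model.tar.gz").exists():
        extract_fresh("tf_model.tar.gz", "tf_model")
```

The resulting `tf_model` directory can then be passed to `tf.saved_model.load` exactly as in the example.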

Text-To-Speech

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.

Currently we provide the following speakers:

Speaker	Stress	Language	Sample rate (Hz)	PyTorch
`aidar_8khz`	yes	`ru`	8000	✔️
`baya_8khz`	yes	`ru`	8000	✔️
`ksenia_8khz`	yes	`ru`	8000	✔️
`irina_8khz`	yes	`ru`	8000	✔️
`natasha_8khz`	yes	`ru`	8000	✔️
`ruslan_8khz`	yes	`ru`	8000	✔️
`lj_8khz`	no	`en`	8000	✔️
`thorsten_8khz`	no	`de`	8000	✔️
`gilles_8khz`	no	`fr`	8000	✔️
`tux_8khz`	no	`es`	8000	✔️
`aidar_16khz`	yes	`ru`	16000	✔️
`baya_16khz`	yes	`ru`	16000	✔️
`ksenia_16khz`	yes	`ru`	16000	✔️
`irina_16khz`	yes	`ru`	16000	✔️
`natasha_16khz`	yes	`ru`	16000	✔️
`ruslan_16khz`	yes	`ru`	16000	✔️
`lj_16khz`	no	`en`	16000	✔️
`thorsten_16khz`	no	`de`	16000	✔️
`gilles_16khz`	no	`fr`	16000	✔️
`tux_16khz`	no	`es`	16000	✔️
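
Every speaker id in the table follows a `<name>_<rate>khz` pattern, so the sample rate can be derived from the id. The sketch below assumes that this naming pattern holds for all listed speakers; `parse_speaker` is a hypothetical helper, not part of the repository.

```python
import re

# Matches ids such as "aidar_8khz" or "thorsten_16khz".
SPEAKER_PATTERN = re.compile(r"^(?P<name>[a-z]+)_(?P<khz>\d+)khz$")


def parse_speaker(speaker_id):
    """Split an id such as 'aidar_16khz' into (name, sample_rate_hz)."""
    match = SPEAKER_PATTERN.match(speaker_id)
    if match is None:
        raise ValueError(f"unexpected speaker id: {speaker_id!r}")
    return match["name"], int(match["khz"]) * 1000


print(parse_speaker("aidar_16khz"))  # ('aidar', 16000)
```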

Dependencies

Basic dependencies (see colab):

- torch
- omegaconf
- torchaudio (required only because the TTS models are hosted together with the STT models; not needed for synthesis itself)

PyTorch

Coming soon

import torch

language = 'ru'
speaker = 'kseniya_16khz'  # must match a speaker key listed in models.yml
device = torch.device('cpu')
model, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                                                      model='silero_tts',
                                                                      language=language,
                                                                      speaker=speaker)
model = model.to(device)  # gpu or cpu
audio = apply_tts(texts=[example_text],
                  model=model,
                  sample_rate=sample_rate,
                  symbols=symbols,
                  device=device)
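
To keep the synthesized audio, the samples can be written to a WAV file. The sketch below uses only the standard library and a synthetic tone as a stand-in for model output. It assumes the audio can be converted to a sequence of floats in [-1.0, 1.0]; `save_wav` is a hypothetical helper.

```python
import math
import struct
import wave


def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in signal: one second of a 440 Hz tone instead of real model output.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
save_wav(tone, sample_rate, "tts_sketch.wav")
```

With real output, a call along the lines of `save_wav(audio[0].tolist(), sample_rate, 'out.wav')` would apply, assuming `apply_tts` returns one 1-D tensor per input text.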

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to the corresponding wiki sections.

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.
