Out-of-the-Box Test Sets for Validating Thai Automatic Speech Recognition System

This repository contains a collection of alternative Thai ASR test sets that we use to evaluate our own Thai ASR project. They may also be useful as out-of-the-box test sets for your Thai ASR project. All audio files in this repository are single-channel (mono) WAV files with a 16,000 Hz sample rate.
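
If you add your own recordings alongside these test sets, it may help to confirm that they match the format described above. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` and its parameters are illustrative and not part of this repository.

```python
import wave


def check_wav_format(path, expected_channels=1, expected_rate=16000):
    """Return True if the WAV file at `path` is mono and sampled at 16 kHz.

    `check_wav_format` is a hypothetical helper, not a script shipped
    with this repository.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == expected_channels
            and wav.getframerate() == expected_rate
        )
```

Calling `check_wav_format("some_clip.wav")` on each file before evaluation is one way to catch stereo or resampled audio early; the file name here is only a placeholder.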

Some of the test sets are drawn from Mozilla Common Voice TH and from the SER dataset by VISTEC-depa.

We also report the word error rate (WER) of our models on each test set. Our models' output transcriptions can be found in the output_samples folder. To keep WER consistent across test sets, we tokenized both the reference and output transcriptions with ThaiNLP before calculating WER with jiwer.
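
Because Thai text is written without spaces between words, WER depends on how the text is segmented, which is why both sides are tokenized the same way first. The sketch below illustrates the idea with a dependency-free word-level edit distance in place of jiwer, and a caller-supplied `tokenize` function in place of the Thai tokenizer. All names (`edit_distance`, `corpus_wer`, `tokenize`) are illustrative assumptions, and the sketch assumes errors are pooled over the whole test set rather than averaged per utterance, which this README does not specify.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference token
                    curr[j - 1] + 1,     # insertion of a hypothesis token
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses, tokenize):
    """WER in percent: total edits divided by total reference tokens.

    `tokenize` is any callable mapping a string to a list of tokens;
    for Thai it would be a word segmenter applied identically to
    references and hypotheses.
    """
    errors = 0
    ref_tokens = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = tokenize(ref), tokenize(hyp)
        errors += edit_distance(r, h)
        ref_tokens += len(r)
    return 100.0 * errors / ref_tokens
```

For example, with whitespace splitting as a stand-in tokenizer, `corpus_wer(["a b c d"], ["a x c"], str.split)` counts one substitution and one deletion against four reference tokens.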

1. Download Test Sets

Test Set	Description	Source	License
CommonVoice2000	A collection of 2,000 audio clips randomly selected from CommonVoiceTH.	Mozilla Common Voice	CC0 1.0
Male_Voice	A collection of 44 audio clips from 22 male speakers speaking the same sentence.	VISTEC-depa	CC BY-SA 4.0
Female_Voice	A collection of 44 audio clips from 22 female speakers speaking the same sentence as in Male_Voice.	VISTEC-depa	CC BY-SA 4.0
Piyabutr_Interview	A collection of 33 audio clips from one of Piyabutr’s interviews, recorded in a somewhat noisy environment.
Piyabutr_with_Music_BG	A collection of 7 audio clips from a political advertisement featuring a male speaker (Piyabutr Saengkanokkul) with upbeat music in the background.
Obodroid	A collection of 9 audio clips with clear speech from a male speaker talking about Obodroid’s products.
Reporter	A collection of 19 audio clips from more than 15 Thai reporters. Each clip comes from a different news program. Clip lengths vary from 8 seconds to 1 minute. Topics include daily news, weather, sports, politics, etc.

2. Benchmarks

Model / WER (%)	CommonVoice2000	Reporter	Piyabutr_Interview	Piyabutr_with_Music_BG	Male_Voice	Female_Voice	Obodroid
DeepSpeech 300 hrs	31.78	21.91	38.10	43.54	26.87	22.19	16.02
DeepSpeech 330 hrs	32.10	21.02	35.83	42.31	28.07	29.81	11.57

The number after each model name indicates the size, in hours, of the (private) dataset the model was trained on.
Both models were decoded with an external KenLM language model (trained on the ThaiSum dataset).
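
To read the benchmark table more easily, it can help to look at the per-test-set difference between the two models. The sketch below does only that arithmetic on the WER values copied from the table above; the helper name `wer_deltas` is illustrative, and no new measurements are involved.

```python
TEST_SETS = [
    "CommonVoice2000", "Reporter", "Piyabutr_Interview",
    "Piyabutr_with_Music_BG", "Male_Voice", "Female_Voice", "Obodroid",
]
# WER (%) values copied from the benchmark table.
WER_300 = [31.78, 21.91, 38.10, 43.54, 26.87, 22.19, 16.02]
WER_330 = [32.10, 21.02, 35.83, 42.31, 28.07, 29.81, 11.57]


def wer_deltas(names, baseline, candidate):
    """Map each test set to (candidate WER - baseline WER), in points."""
    return {
        name: round(cand - base, 2)
        for name, base, cand in zip(names, baseline, candidate)
    }


deltas = wer_deltas(TEST_SETS, WER_300, WER_330)
for name, delta in deltas.items():
    # A negative value means the 330-hour model has the lower WER.
    print(f"{name}: {delta:+.2f}")
```

A negative difference means the 330-hour model scored a lower WER on that test set; the differences go in both directions, so neither model is better on every set.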

3. Contributor

All test sets, except CommonVoice2000, were manually transcribed by Nakhun Chumpolsathien.
