Two Stage Automatic Speech Recognition System With Large Language Model

Project Overview

Developed by the Dain team at Huazhong University of Science and Technology, this project aims to integrate large language models (LLMs) into ASR systems to improve recognition accuracy. Note that the current work is based on the framework of existing datasets and requires pre-recognized text files.

Workflow Diagram

Usage Guide

Install Dependencies

pip install -r requirements.txt

Configuration File Settings

Select and Edit Configuration File: Choose the appropriate configuration file in the config directory based on the API provider.

Fill in API Information: Example:

# Example: gpt_conformer.yaml
provider: openai
model: gpt-3.5-turbo
api_key: YOUR_API_KEY
base_url: YOUR_BASE_URL

# Example: azure_conformer.yaml
provider: azure
model: gpt4o
api_key: YOUR_API_KEY
base_url: YOUR_AZURE_ENDPOINT
api_version: YOUR_API_VERSION

Specify Text Path and Label Information:

path:
    example1: "path_to_text1"
    example2: "path_to_text2"
text: "path_to_label"

Set Parameters: Adjust parameters based on the task language.

combination_num: 30 # Number of sentences per request
thread_num: 100 # Number of concurrent threads
temperature: 0.2
max_repeat_times: 1 # Number of repeat inquiries
top_p: 0.8

Add Prompt: Add as needed in the configuration file. Refer to PromptList
```
prompt: "Your custom prompt here"
```
Configuration File Path: Specify the configuration file path in main.py.
```
with open("path_to_your_config", 'r') as f:
```
Run the Program: Execute the command and generate the result directory.
```
python main.py
```

Contents of Result Directory

config: The configuration file used.
diff: Details of text modifications.
err: Error examples.
response: Responses from the LLM.
skips: Texts skipped based on filter settings.
text: Modified texts.
total: Statistical results.
wer: Word error rate files.
wrong_sentence: List of incorrect sentences.

Notes

Tokens: Using the Deepseekv2 model, the token consumption on the AISHELL-1 and Librispeech datasets is 250k(input)+260k(output) and 640k(input)+650k(output) respectively, with the combination numbers set to 30 and 10.
You can see the historical results in the result directory

Testing LLM Capabilities

Fill in the config/test/Chinese/gpt.yaml configuration file.
Run the test script:
```
python tools/test_model_capability.py
```
Results will be printed directly and need to be recorded manually. The testing process may take a long time without multithreading.

You can also fill in a new test

Result

The original model below is U2++ Conforemr

AISHELL-1

Two-stage	Decode	Chunk	Temp	Top p	Task num	Changed Sentence	Error Sentence	CER(%)	Changed
-	attention	full	-	-	-	-	2650	5.06	-
-	attention rescore	full	-	-	-	-	2493	4.62	-
-	ctc greedy search	full	-	-	-	-	2810	5.17	-
-	ctc prefix search	full	-	-	-	-	2810	5.17	-
deepseekv2	attention	full	0.2	0.8	20	1568	2365	4.69	-0.37(7.3%)
deepseekv2	attention rescore	full	0.2	0.8	20	1451	2189	4.21	-0.41(8.8%)
deepseekv2	ctc greedy search	full	0.2	0.8	20	1892	2331	4.51	-0.66(12.7%)
deepseekv2	ctc prefix search	full	0.2	0.8	20	1860	2324	4.48	-0.69(13%)
gpt-3.5-turbo	attention	full	0.8	0.8	20	595	2651	5.09	+0.03(0.5%)
gpt-3.5-turbo	attention rescore	full	0.8	0.8	20	568	2502	4.69	+0.07(1.5%)
gpt-3.5-turbo	ctc greedy search	full	0.8	0.8	20	519	2798	5.20	+0.03(0.6%)
gpt-3.5-turbo	ctc prefix search	full	0.8	0.8	20	433	2785	5.21	+0.04(0.7%)
gpt-4o	attention	full	0.2	0.8	20	1715	2295	4.32	-0.74(14.6%)
gpt-4o	attention rescore	full	0.2	0.8	20	1736	2153	4.01	-0.61(13%)
gpt-4o	ctc greedy search	full	0.2	0.8	20	2075	2272	4.06	-1.11(21%)
gpt-4o	ctc prefix search	full	0.2	0.8	20	2137	2234	4.06	-1.11(21%)

AISHELL-2

Two-stage	Decode	Chunk	Temp	Top p	Task num	Changed Sentence	Error Sentence	CER(%)	Change
-	attention rescore	16	-	-	-	-	1715	5.57	-
GPT4o	attention rescore	16	0.8	0.2	20	896	1517	4.95	-0.62(11%)
deepseekv2	attention rescore	16	0.8	0.8	20	223	1510	6.09	+0.52(9%)
GPT3.5-turbo	attention rescore	16	0.8	0.4	20	389	1715	5.65	+0.08(1.5%)

Librispeech

Two-stage	Decode	Chunk	Temp	Top p	Task num	test-clean		test-other
						WER(%)	Change	WER(%)	Change
-	attention	full	-	-		3.82	8.79	-
-	attention rescore	full	-	-		3.35	8.77	-
-	ctc greedy search	full	-	-		3.77	9.52	-
-	ctc prefix search	full	-	-		3.75	9.50	-
deepseekv2	attention	full	0.8	0.8	10	4.58	+0.76(19%)	10.11	+1.32(-15%)
deepseekv2	attention rescore	full	0.8	0.8	10	4.21	+0.86(25%)	10.42	+1.65(18%)
deepseekv2	ctc greedy search	full	0.8	0.8	10	4.45	+0.68(18%)	10.20	+0.68(7%)
deepseekv2	ctc prefix search	full	0.8	0.8	10	4.93	+1.18(31%)	9.97	+0.47(4.9%)
gpt-4o	attention	full	0.8	0.8	10	3.64	-0.18(4.7%)	8.32	-0.47(5.3%)
gpt-4o	attention rescore	full	0.8	0.8	10	3.19	-0.16(4.7%)	8.38	-0.39(4.4%)
gpt-4o	ctc greedy search	full	0.8	0.8	10	3.43	-0.34(9%)	8.45	-1.07(11.2%)
gpt-4o	ctc prefix search	full	0.8	0.8	10	3.58	-0.17(4.5%)	8.41	-1.09(11.4%)

Acknowledgments

Thanks to wenet for providing tools and pre-trained models.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
config		config
data		data
embedding		embedding
result		result
test		test
tools		tools
LICENSE		LICENSE
README.md		README.md
README_Chinise.md		README_Chinise.md
main.py		main.py
requirements.txt		requirements.txt
two_stage_recognition.png		two_stage_recognition.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Two Stage Automatic Speech Recognition System With Large Language Model

Project Overview

Workflow Diagram

Usage Guide

Install Dependencies

Configuration File Settings

Contents of Result Directory

Notes

Testing LLM Capabilities

Result

AISHELL-1

AISHELL-2

Librispeech

Acknowledgments

Citation

About

Releases

Packages

Languages

License

teamtee/LLM-ASR-Error-Correction

Folders and files

Latest commit

History

Repository files navigation

Two Stage Automatic Speech Recognition System With Large Language Model

Project Overview

Workflow Diagram

Usage Guide

Install Dependencies

Configuration File Settings

Contents of Result Directory

Notes

Testing LLM Capabilities

Result

AISHELL-1

AISHELL-2

Librispeech

Acknowledgments

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages