Step-by-step guide to training a new voice
Srikanth Ronanki edited this page Oct 13, 2016 · 3 revisions
- Choose the vocoder:
(https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/scripts/vocoder)
a) STRAIGHT - extracts 60-dim MGC, 25-dim BAP, 1-dim LF0
b) WORLD - extracts 60-dim MGC, variable-dim BAP, 1-dim LF0
   - BAP dim (1 for 16 kHz, 5 for 48 kHz)
c) WORLD_v2 - extracts 60-dim MGC, 5-dim BAP, 1-dim LF0
- For the WORLD vocoder, use the script below to extract features:
https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/scripts/vocoder/world/extract_features_for_merlin.sh
In the script, set the paths to the audio directory and the output feature directory, and set the sampling frequency to match the database you use.
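Because the WORLD BAP dimension depends on the sampling frequency, it can help to confirm the rate of your recordings before editing the extraction script. The following is a minimal Python sketch, not part of Merlin: the helper names `feature_dims` and `dataset_sample_rate` are hypothetical, the dimensions are simply the ones listed above, and it assumes the database consists of PCM `.wav` files in a single directory.

```python
import wave
from pathlib import Path

# Feature dimensions as listed in this guide (not read from Merlin itself).
MGC_DIM = 60
LF0_DIM = 1
WORLD_BAP_BY_RATE = {16000: 1, 48000: 5}  # only the two rates the guide mentions


def feature_dims(vocoder, sample_rate=None):
    """Return the MGC/BAP/LF0 dimensions this guide lists for a vocoder."""
    name = vocoder.upper()
    if name == "STRAIGHT":
        bap = 25
    elif name == "WORLD_V2":
        bap = 5
    elif name == "WORLD":
        if sample_rate not in WORLD_BAP_BY_RATE:
            raise ValueError(f"no BAP dimension listed for {sample_rate} Hz")
        bap = WORLD_BAP_BY_RATE[sample_rate]
    else:
        raise ValueError(f"unknown vocoder: {vocoder}")
    return {"mgc": MGC_DIM, "bap": bap, "lf0": LF0_DIM}


def dataset_sample_rate(wav_dir):
    """Return the single sampling rate shared by all .wav files in wav_dir."""
    rates = set()
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rates.add(wav.getframerate())
    if len(rates) != 1:
        raise ValueError(f"expected exactly one sampling rate, found {sorted(rates)}")
    return rates.pop()
```

For example, `feature_dims("WORLD", dataset_sample_rate("path/to/wav"))` (with a placeholder directory) gives the dimensions to expect from the extraction step, and raises an error if the recordings mix sampling rates or use a rate the guide does not cover.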
- To derive labels, use alignment scripts provided below:
a) state_align - https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/scripts/alignment/state_align
b) phone_align - https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/scripts/alignment/phone_align
- Create configuration files.
- Run duration model and acoustic model scripts.
- Use Merlin synthesis to generate new sentences.