Step-by-step guide to train a new voice

Srikanth Ronanki edited this page Oct 13, 2016 · 3 revisions

Data Preparation

  1. Choose the vocoder:
    (https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/scripts/vocoder)
    a) STRAIGHT - extracts 60-dim MGC, 25-dim BAP, 1-dim LF0
    b) WORLD - extracts 60-dim MGC, variable-dim BAP, 1-dim LF0
       - BAP dim: 1 for 16 kHz, 5 for 48 kHz
    c) WORLD_v2 - extracts 60-dim MGC, 5-dim BAP, 1-dim LF0

  2. For the WORLD vocoder, use the script below to extract features:
    https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/scripts/vocoder/world/extract_features_for_merlin.sh

Configure the paths to the audio directory and the output feature directory. Also set the sampling frequency to match the database you use.
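Before setting the sampling frequency, it can help to confirm what rate the recordings actually use. The following is a minimal sketch, not part of Merlin: it reads the headers of PCM `.wav` files with Python's standard `wave` module and looks up the WORLD BAP dimension quoted in this guide. The function names and the restriction to the two listed rates are assumptions made for illustration.

```python
import wave
from pathlib import Path

# WORLD BAP dimensions as quoted in this guide (1 for 16 kHz, 5 for 48 kHz).
# Other sampling rates are deliberately left out rather than guessed.
WORLD_BAP_DIM = {16000: 1, 48000: 5}


def sampling_rates(wav_dir):
    """Return the set of sampling rates (Hz) found in the .wav files of wav_dir."""
    rates = set()
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # wave.open reads uncompressed PCM files only.
        with wave.open(str(path), "rb") as wav:
            rates.add(wav.getframerate())
    return rates


def world_bap_dim(fs):
    """Look up the WORLD BAP dimension for a sampling rate listed in this guide."""
    if fs not in WORLD_BAP_DIM:
        raise ValueError(f"no BAP dimension listed in this guide for {fs} Hz")
    return WORLD_BAP_DIM[fs]


def check_database(wav_dir):
    """Return (sampling rate, WORLD BAP dimension) for a directory of recordings."""
    rates = sampling_rates(wav_dir)
    if len(rates) != 1:
        raise ValueError(f"expected exactly one sampling rate, found {sorted(rates)}")
    fs = rates.pop()
    return fs, world_bap_dim(fs)
```

Calling `check_database("wav")` on a hypothetical `wav` directory returns the rate to set in the extraction script, and raises an error if the directory is empty or mixes several rates.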

  3. To derive labels, use the alignment scripts below:
    a) state_align - https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/scripts/alignment/state_align
    b) phone_align - https://github.com/CSTR-Edinburgh/merlin/tree/master/misc/scripts/alignment/phone_align
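Once alignment has run, a quick check that every recording has a matching label file can catch utterances that were dropped along the way. This is a minimal sketch under assumptions not stated in this guide: label files are assumed to use a `.lab` extension and to share their base name with the corresponding `.wav` file, and the function names are hypothetical.

```python
from pathlib import Path


def file_ids(directory, extension):
    """Return the base names (without extension) of files ending in extension."""
    return {p.stem for p in Path(directory).glob(f"*{extension}")}


def unmatched(wav_dir, lab_dir, wav_ext=".wav", lab_ext=".lab"):
    """Return (audio files without labels, labels without audio) as sorted lists.

    The default extensions are assumptions; adjust them to your setup.
    """
    wavs = file_ids(wav_dir, wav_ext)
    labs = file_ids(lab_dir, lab_ext)
    return sorted(wavs - labs), sorted(labs - wavs)
```

Both returned lists are empty when the audio and label directories cover the same set of utterances.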

Model building

  1. Create configuration files.

  2. Run the duration model and acoustic model scripts.

  3. Use Merlin synthesis to generate new sentences.
