Training a new language (how to train the models with other languages)

RVC-Boss edited this page Jul 17, 2024 · 3 revisions

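1. The base model has so far only seen Chinese, Japanese, and English (Korean is planned next). To train a new language, at least 100 hours of training data in that language is recommended. Because this amounts to training a base model, more data is better.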
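To check a corpus against the 100-hour guideline, a minimal sketch like the following can total the audio duration. It assumes the dataset is a directory tree of uncompressed PCM `.wav` files, which the standard-library `wave` module can read; the function name `total_hours` is hypothetical and not part of GPT-SoVITS.

```python
import wave
from pathlib import Path


def total_hours(root):
    """Sum the duration of every .wav file under `root`, in hours.

    Assumes uncompressed PCM WAV files; other formats would need another reader.
    """
    seconds = 0.0
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0
```

Calling `total_hours("path/to/dataset")` returns a float that can be compared directly with 100.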

2. You need to provide your own text front-end (text cleaner) code for the new language:

(1) Add the phoneme symbols used by the new language to https://github.com/RVC-Boss/GPT-SoVITS/blob/main/GPT_SoVITS/text/symbols.py

(2) Provide a g2p (grapheme-to-phoneme) function for the new language. See these places in cleaner.py:

https://github.com/RVC-Boss/GPT-SoVITS/blob/main/GPT_SoVITS/text/cleaner.py#L3

https://github.com/RVC-Boss/GPT-SoVITS/blob/main/GPT_SoVITS/text/cleaner.py#L22
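The two sub-steps above can be sketched together as follows. This is a toy illustration under stated assumptions, not the repository's code: the language code `xx`, the phonemes, `TOY_LEXICON`, `LANGUAGE_FRONTENDS`, and this `clean_text` are all hypothetical, and keeping the inventory as a sorted, duplicate-free list is an assumption about layout. The point it shows is that every phoneme the new g2p emits must also exist in the symbol inventory.

```python
import re

# Hypothetical existing inventory plus made-up phonemes for an imaginary language "xx".
existing_symbols = ["!", ",", ".", "?", "UNK", "_", "a", "i", "k"]
new_language_symbols = ["ae", "k", "kk", "sh"]

# Step (1): merge the new phonemes into the inventory (sorted, no duplicates).
symbols = sorted(set(existing_symbols) | set(new_language_symbols))

# Step (2): a toy lexicon-based front end for the new language.
TOY_LEXICON = {"ka": ["k", "a"], "shi": ["sh", "i"], "kkae": ["kk", "ae"]}
PUNCTUATION = {",", ".", "!", "?"}


def text_normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def g2p(text):
    """Map normalized text to a flat phoneme list; unknown words become 'UNK'."""
    phones = []
    for token in re.findall(r"[^\W\d_]+|[,.!?]", text):
        if token in PUNCTUATION:
            phones.append(token)
        else:
            phones.extend(TOY_LEXICON.get(token, ["UNK"]))
    return phones


# Hypothetical registry: language code -> (normalizer, g2p function).
LANGUAGE_FRONTENDS = {"xx": (text_normalize, g2p)}


def clean_text(text, language):
    """Normalize, convert to phonemes, and verify each phoneme is a known symbol."""
    normalize, to_phones = LANGUAGE_FRONTENDS[language]
    norm_text = normalize(text)
    phones = to_phones(norm_text)
    missing = sorted(set(phones) - set(symbols))
    if missing:
        raise ValueError(f"phonemes missing from symbols: {missing}")
    return phones, norm_text


phones, norm_text = clean_text("Ka  shi, kkae!", "xx")
```

A real front end would replace the toy lexicon with a proper pronunciation dictionary or g2p library for the target language. Note that re-sorting the inventory can shift the ids of symbols that already existed.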

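3. If you instead fine-tune the existing base model on the new language, you can reduce the amount of training data somewhat, because what the base model learned from its other data carries over. Note, however, that the number of symbols has changed, so the text embedding is lost when the model is loaded. The current code supports this case. Alternatively, you can manually edit the base model's weights to extend the embedding to the new shape.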

Reference: https://huggingface.co/AkitoP/GPT-SoVITS-JA-ProsodyControl_model/blob/main/insert_symbol.ipynb
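The idea of extending the embedding by hand can be illustrated with a minimal, framework-free sketch. It is not taken from the linked notebook. It assumes one embedding row per symbol, represents the table as plain lists of floats, and uses a hypothetical helper `remap_embedding`. In a real checkpoint the same remapping would be applied to the text-embedding tensor, whose key name is not shown here. Rows are matched by symbol name rather than by position, so they stay aligned even if adding symbols shifted the ids.

```python
import random


def remap_embedding(old_symbols, new_symbols, old_rows, seed=0, scale=0.02):
    """Build an embedding table for `new_symbols`, reusing trained rows by symbol name.

    Symbols already present in `old_symbols` keep their trained vector;
    new symbols get a small random vector (an assumed initialization).
    """
    if len(old_symbols) != len(old_rows):
        raise ValueError("one embedding row per old symbol is required")
    dim = len(old_rows[0])
    rng = random.Random(seed)
    old_index = {symbol: i for i, symbol in enumerate(old_symbols)}
    new_rows = []
    for symbol in new_symbols:
        if symbol in old_index:
            # Copy the trained vector to the symbol's new position.
            new_rows.append(list(old_rows[old_index[symbol]]))
        else:
            # Fresh symbol: start from small Gaussian noise.
            new_rows.append([rng.gauss(0.0, scale) for _ in range(dim)])
    return new_rows


# Toy data: "ae" is new and lands between "a" and "k", shifting the id of "k".
old_symbols = ["_", "a", "k"]
new_symbols = ["_", "a", "ae", "k"]
old_rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_rows = remap_embedding(old_symbols, new_symbols, old_rows)
```

After this call, `new_rows` has four rows, and the row for `k` is still `[2.0, 2.0]` even though its index moved from 2 to 3.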