Training a new language (how to train the models with other languages)
RVC-Boss edited this page Jul 17, 2024 · 3 revisions
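1. The base model has so far only seen Chinese, Japanese, and English (Korean is planned for a future release). Training a new language therefore requires training data in that language, with at least 100 hours recommended. Because this trains a base model, more data is better.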
2. You need to provide your own text front-end (text cleaner) code for the new language:
(1) Add the phoneme symbols used by the new language to https://github.com/RVC-Boss/GPT-SoVITS/blob/main/GPT_SoVITS/text/symbols.py
(2) Provide a g2p (grapheme-to-phoneme) function for the new language. The relevant places in the text cleaner are:
https://github.com/RVC-Boss/GPT-SoVITS/blob/main/GPT_SoVITS/text/cleaner.py#L3
https://github.com/RVC-Boss/GPT-SoVITS/blob/main/GPT_SoVITS/text/cleaner.py#L22
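As a rough illustration of steps (1) and (2), here is a minimal sketch of a text front end for a made-up language. Everything in it is an assumption for demonstration: the language code `xx`, the phoneme inventory, the spelling rules, and the names `EXISTING_SYMBOLS`, `LANGUAGE_FRONTENDS`, `text_normalize`, `g2p`, and `clean_text` are hypothetical and are not copied from the repository. A real front end would normally rely on a pronunciation dictionary or an existing g2p library rather than a handful of rules.

```python
import re

# Toy stand-in for symbols already present in symbols.py (NOT the real list).
EXISTING_SYMBOLS = ["_", "!", ",", ".", "?", "a", "i", "k", "s"]

# Hypothetical phoneme inventory of the new language.
NEW_LANG_SYMBOLS = ["a", "e", "i", "o", "u", "k", "n", "s", "t", "tʃ", "ʃ"]

# Step (1): merge without duplicates. Sorting is an assumption about how the
# list is built; note that it can shift the indices of existing symbols.
symbols = sorted(set(EXISTING_SYMBOLS + NEW_LANG_SYMBOLS))
symbol_to_id = {s: i for i, s in enumerate(symbols)}

PUNCTUATION = {"!", ",", ".", "?"}

# Hypothetical spelling rules; digraphs must be tried before single letters.
GRAPHEME_TO_PHONEME = {
    "ch": "tʃ", "sh": "ʃ",
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "k": "k", "n": "n", "s": "s", "t": "t",
}


def text_normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower().strip())


def g2p(text):
    """Step (2): convert normalized text to a list of phoneme symbols."""
    phones = []
    i = 0
    while i < len(text):
        for size in (2, 1):  # longest match first
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in GRAPHEME_TO_PHONEME:
                phones.append(GRAPHEME_TO_PHONEME[chunk])
                i += size
                break
        else:
            # No rule matched: keep punctuation, skip spaces and unknowns.
            if text[i] in PUNCTUATION:
                phones.append(text[i])
            i += 1
    return phones


# Dispatch by language code, so a cleaner can pick the right front end.
LANGUAGE_FRONTENDS = {"xx": (text_normalize, g2p)}


def clean_text(text, language):
    """Normalize, run g2p, and verify every phoneme is a known symbol."""
    normalize, to_phones = LANGUAGE_FRONTENDS[language]
    phones = to_phones(normalize(text))
    unknown = [p for p in phones if p not in symbol_to_id]
    if unknown:
        raise ValueError(f"phonemes missing from the symbol list: {unknown}")
    return phones


phones = clean_text("Chasha  kisu!", "xx")
phone_ids = [symbol_to_id[p] for p in phones]
```

The check inside `clean_text` is the point of the sketch: every phoneme the g2p function can emit has to exist in the symbol list, otherwise the lookup to an ID fails at training time.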
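3. If you fine-tune the existing base model on a new language instead, you can reduce the amount of training data somewhat, because the other data the base model has seen helps carry the new language. Note, however, that the number of symbols changes, so the text embedding is lost when the model is loaded. The current code does support this case. Alternatively, you can manually edit the base model's weights and extend the embedding to the new shape.
Reference: https://huggingface.co/AkitoP/GPT-SoVITS-JA-ProsodyControl_model/blob/main/insert_symbol.ipynb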
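The idea of extending the embedding by hand can be sketched as follows. This is not the method used in the referenced notebook, only one plausible approach under stated assumptions: rows are matched by symbol name so that existing symbols keep their learned vectors even if their indices move, and rows for unseen symbols start as small random values (the `init_std` value is an arbitrary choice). The function name `remap_embedding_rows` is hypothetical, and plain Python lists stand in for the weight tensor; in a real checkpoint the same remapping would be applied to the text-embedding tensor, whose parameter name is not given here.

```python
import random


def remap_embedding_rows(old_rows, old_symbols, new_symbols,
                         init_std=0.02, seed=0):
    """Build an embedding table for new_symbols, reusing old rows by name."""
    rng = random.Random(seed)
    old_index = {s: i for i, s in enumerate(old_symbols)}
    dim = len(old_rows[0])
    new_rows = []
    for symbol in new_symbols:
        if symbol in old_index:
            # Known symbol: copy the learned vector to its new position.
            new_rows.append(list(old_rows[old_index[symbol]]))
        else:
            # Unseen symbol: small random initialization (an assumption).
            new_rows.append([rng.gauss(0.0, init_std) for _ in range(dim)])
    return new_rows


# Toy data: three old symbols with 2-dimensional embeddings.
old_symbols = ["_", "a", "k"]
old_rows = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]

# Adding two symbols and re-sorting moves "k" from index 2 to index 3.
new_symbols = sorted(set(old_symbols + ["e", "ʃ"]))
new_rows = remap_embedding_rows(old_rows, old_symbols, new_symbols)
```

Matching by name rather than simply appending rows matters whenever the symbol list is re-sorted after new symbols are added: a plain append would leave old vectors attached to the wrong symbols.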