Unlike VISinger, this project is just VITS without MAS and the DurationPredictor.
As a learning project, it stops here: pitch prediction is the part that still needs improvement.
Pitch and duration prediction will be developed as add-ons.
- 1 Download the dataset segments.zip and extract it
segments
|-- test.txt
|-- train.txt
|-- transcriptions.txt
`-- wavs
    |-- 2001000001.wav
    |-- 2001000002.wav
    |-- 2001000003.wav
- 2 Convert the sample rate: this project uses 32 kHz
python util/resample.py -w segments/wavs/ -o data_svs/wavs -s 32000
- 3 Generate the data labels
python util/generate_label.py --config configs/singing_base.yaml --data data_svs/ --file segments/transcriptions.txt
data_svs/labels.txt, content format: wave path|label path|score path|pitch path|slurs path
- 4 Split the training index
python util/generate_label.py --file data_svs/labels.txt
This generates filelists/singing_train.txt and filelists/singing_valid.txt
- 5 Start training
python svs_train.py -c configs/singing_base.yaml -n vits_svs
- 6 Train the pitch model
python pit_train.py -c configs/singing_base.yaml -n pitch
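
The label file format described in the steps above (five `|`-separated paths per line) and the train/valid split can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `LabelEntry`, `parse_label_line`, and `split_entries` are hypothetical, and the shuffled hold-out policy is an assumption, not necessarily what the project's scripts do.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class LabelEntry(NamedTuple):
    """One line of labels.txt; the field names are illustrative."""
    wave: str
    label: str
    score: str
    pitch: str
    slurs: str


def parse_label_line(line: str) -> LabelEntry:
    """Split 'wave|label|score|pitch|slurs' into its five paths."""
    parts = line.strip().split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return LabelEntry(*parts)


def split_entries(
    entries: Sequence[LabelEntry], valid_count: int = 10, seed: int = 0
) -> Tuple[List[LabelEntry], List[LabelEntry]]:
    """Assumed policy: shuffle deterministically, hold out `valid_count` items."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    valid_count = min(valid_count, len(shuffled))
    # Returns (train, valid); the two lists never overlap.
    return shuffled[valid_count:], shuffled[:valid_count]
```

A fixed seed keeps the split reproducible across runs, which helps when comparing checkpoints trained on the same data.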
- 0 Export the model
python svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt
- 1 Verify inference: F0 is generated from the score
python svs_infer.py --config configs/singing_base.yaml --model svs_opencpop.pt
- 2 Synthesize a complete song (using the release model)
python svs_song.py --config configs/singing_base.yaml --model svs_opencpop.pt
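
To make "F0 is generated from the score" concrete, here is a minimal sketch of the standard equal-temperament conversion from note numbers to frequency. It assumes the score stores MIDI note numbers, that 0 marks a rest, and that each note comes with a frame count; the function names are hypothetical, and how svs_infer.py actually derives F0 is not shown in this document.

```python
from typing import List, Sequence


def midi_to_hz(note: float) -> float:
    """Equal-tempered frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def score_to_f0(notes: Sequence[int], frames_per_note: Sequence[int]) -> List[float]:
    """Expand per-note pitches into a frame-level F0 curve (0.0 for rests)."""
    f0: List[float] = []
    for note, n_frames in zip(notes, frames_per_note):
        # Assumption: note number 0 (or below) encodes a rest.
        hz = 0.0 if note <= 0 else midi_to_hz(note)
        f0.extend([hz] * n_frames)
    return f0
```

A curve built this way is piecewise constant, with no vibrato or note transitions, which may be one reason a separate pitch model is trained.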
- 0 Export the models
python svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt
python pit_export.py --config configs/singing_base.yaml --model chkpt/pitch/pitch_****.pt
- 1 Verify inference
python svs_infer_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt
- 2 Synthesize a complete song (using the release model)
python svs_song_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt
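
The export commands above take a checkpoint path with a `****` placeholder. The helper below is a hedged sketch for picking the checkpoint to export, assuming the placeholder stands for a numeric training step and that the highest step is the one wanted. Neither assumption is confirmed by this document, and `latest_checkpoint` is a hypothetical name, not part of the project.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: str, prefix: str) -> Optional[Path]:
    """Return '<prefix>_<digits>.pt' with the largest number, or None."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.pt$")
    best: Optional[Path] = None
    best_step = -1
    for path in Path(directory).glob(f"{prefix}_*.pt"):
        match = pattern.match(path.name)
        # Skip files whose suffix is not purely numeric.
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best = path
    return best
```

For example, `latest_checkpoint("chkpt/vits_svs", "vits_svs")` and `latest_checkpoint("chkpt/pitch", "pitch")` would return the paths to pass as `--model` to svs_export.py and pit_export.py respectively.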
https://wenet.org.cn/opencpop/
https://github.com/SJTMusicTeam/Muskits
https://github.com/MoonInTheRiver/DiffSinger
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis
https://github.com/NVIDIA/BigVGAN
https://github.com/jaywalnut310/vits
https://github.com/mindslab-ai/univnet
https://github.com/PlayVoice/so-vits-svc-5.0
https://github.com/shivammehta25/Matcha-TTS
RoFormer: Enhanced Transformer with rotary position embedding
https://github.com/thuhcsi/DiffVar
https://github.com/hayeong0/Diff-HierVC
https://github.com/tonnetonne814/SiFi-VITS2-44100-Ja
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech