Yu Zhang, Ziyue Jiang, Ruiqi Li, Changhao Pan, Jinzheng He, Rongjie Huang, Chuxin Wang, Zhou Zhao | Zhejiang University
PyTorch Implementation of TCSinger (EMNLP 2024): Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control.
We provide our implementation and pre-trained models in this repository.
Visit our demo page for audio samples.
- 2024.09: We released the full dataset of GTSinger!
- 2024.09: TCSinger was accepted to EMNLP 2024!
- We present TCSinger, the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control. TCSinger excels in personalized and controllable SVS tasks.
- We introduce the clustering style encoder to extract styles, and the Style and Duration Language Model (S&D-LM) to predict both style information and phoneme duration, addressing style modeling, transfer, and control.
- We propose the style adaptive decoder to generate intricately detailed songs using a novel mel-style adaptive normalization method.
- Experimental results show that TCSinger surpasses baseline models in synthesis quality, singer similarity, and style controllability across various tasks: zero-shot style transfer, multi-level style control, cross-lingual style transfer, and speech-to-singing style transfer.
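To give a rough intuition for what a style-conditioned normalization layer does, here is a minimal sketch of the generic adaptive-normalization pattern: normalize each frame, then rescale and shift it with parameters predicted from a style embedding. This is an illustration under stated assumptions, not the mel-style adaptive normalization from the paper, whose exact formulation may differ. The function name, the projection weights `w_gamma` and `w_beta`, the `1 + gamma` scaling, and the tensor shapes are all hypothetical, and NumPy is used only to keep the example short.

```python
import numpy as np


def style_adaptive_norm(hidden, style, w_gamma, w_beta, eps=1e-5):
    """Normalize each frame, then modulate it with style-derived scale and shift.

    hidden:  (frames, channels) intermediate features (hypothetical shape)
    style:   (frames, style_dim) style embedding aligned to the frames
    w_gamma: (style_dim, channels) hypothetical projection for the scale
    w_beta:  (style_dim, channels) hypothetical projection for the shift
    """
    # Per-frame statistics over the channel axis.
    mean = hidden.mean(axis=-1, keepdims=True)
    std = hidden.std(axis=-1, keepdims=True)
    normalized = (hidden - mean) / (std + eps)

    # Style-dependent scale and shift, one value per frame and channel.
    gamma = style @ w_gamma
    beta = style @ w_beta

    # Assumption: scale is parameterized as 1 + gamma, so zero weights
    # reduce the layer to plain normalization.
    return (1.0 + gamma) * normalized + beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(4, 80))        # 4 frames, 80 channels
    style = rng.normal(size=(4, 16))         # 16-dim style vector per frame
    w_gamma = rng.normal(size=(16, 80)) * 0.1
    w_beta = rng.normal(size=(16, 80)) * 0.1
    out = style_adaptive_norm(hidden, style, w_gamma, w_beta)
    print(out.shape)
```

In a trained model the two projections would be learned, so different style embeddings push the same content features toward different statistics.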
We provide an example of how you can generate high-fidelity samples using TCSinger.
To try TCSinger on your own dataset or on GTSinger, clone this repo onto a local machine with an NVIDIA GPU, CUDA, and cuDNN, then follow the instructions below.
The code will come soon...
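Until the code is released, the GPU prerequisites mentioned above can be checked with a short script. This is a minimal sketch, not part of the repository: the function name `check_environment` and the report keys are hypothetical, and it only reports what it finds rather than installing anything. It uses the standard library plus, when PyTorch is already installed, `torch.cuda.is_available()` and `torch.backends.cudnn.is_available()`.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether common GPU prerequisites appear to be present."""
    report = {
        # nvidia-smi ships with the NVIDIA driver, so finding it on PATH
        # suggests that a driver is installed.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # find_spec only locates the package; it does not import it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "cudnn_available": False,
    }
    if report["torch_installed"]:
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
        report["cudnn_available"] = bool(torch.backends.cudnn.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `cuda_available` is reported as `no` on a machine that has an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or a driver that is too old for the installed CUDA runtime.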
This implementation uses parts of the code from the following GitHub repos, as described in our code: NATSpeech, StyleSinger.
If you find this code useful in your research, please cite our work:
@article{zhang2024tcsinger,
  title={TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control},
  author={Zhang, Yu and Jiang, Ziyue and Li, Ruiqi and Pan, Changhao and He, Jinzheng and Huang, Rongjie and Wang, Chuxin and Zhao, Zhou},
  journal={arXiv preprint arXiv:2409.15977},
  year={2024}
}
Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's singing without their consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.