Real-Time Live-Streaming Digital Human (bilibili video)
Audio model training code released! Details can be found here.
Details on the render model training can be found here.
This project is a real-time live-streaming digital human powered by few-shot learning. It is designed to run smoothly on all NVIDIA 30- and 40-series graphics cards, providing a seamless, interactive live-streaming experience.
- Real-time Performance: The digital human interacts in real time at 25+ fps on common NVIDIA 30- and 40-series GPUs.
- Few-shot Learning: The system is capable of learning from a few examples to generate realistic responses.
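A target of 25 fps leaves a budget of 1000 / 25 = 40 ms per frame for all of the per-frame work. As a rough sketch, the timing harness below checks whether a per-frame function keeps up with that target; `render_frame` is a hypothetical placeholder for whatever per-frame work you want to time, not a function exported by this project.

```python
import time
from typing import Callable

TARGET_FPS = 25  # real-time target stated in this README


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / fps


def measure_fps(render_frame: Callable[[], None], n_frames: int = 100) -> float:
    """Call a (hypothetical) per-frame function n_frames times and return achieved fps."""
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


if __name__ == "__main__":
    print(f"Budget at {TARGET_FPS} fps: {frame_budget_ms(TARGET_FPS):.1f} ms/frame")
    # Stand-in workload: sleep 10 ms per "frame" instead of real rendering.
    fps = measure_fps(lambda: time.sleep(0.01), n_frames=50)
    print(f"Stand-in workload: {fps:.1f} fps, meets target: {fps >= TARGET_FPS}")
```

Measuring over many frames, rather than a single one, smooths out scheduler jitter and one-off warm-up costs such as the first CUDA call.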
First, create a conda environment and install the dependencies, then navigate to the checkpoint directory and unzip the model file:
conda create -n dh_live python=3.12
conda activate dh_live
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
cd checkpoint
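Before unpacking the checkpoint, it can help to confirm that the CUDA build of PyTorch installed above actually sees a GPU. The following is a minimal sketch using standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`); the `check_environment` helper is hypothetical and not part of this repository.

```python
import importlib.util
import platform


def check_environment() -> dict:
    """Report the Python version and whether PyTorch can see a CUDA GPU."""
    report = {
        "python_version": platform.python_version(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "gpu_name": None,
    }
    if report["torch_installed"]:
        import torch  # only imported when present, so the check runs without it

        report["cuda_available"] = bool(torch.cuda.is_available())
        if report["cuda_available"]:
            report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` on a machine with an NVIDIA card, the CPU-only PyTorch wheel was most likely installed instead of the `cu124` one.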
On Linux:
cat render.pth.gz.001 render.pth.gz.002 > render.pth.gz
gzip -d -c render.pth.gz > render.pth
On Windows, use archive software such as 7-Zip or WinRAR to extract the checkpoint file.
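As a platform-independent alternative, the two Linux steps (concatenate the parts, then decompress) can be reproduced with the Python standard library. This sketch assumes the parts are plain byte-level splits of a single gzip file, which is what the `cat` command relies on; the file names are the ones from the Linux instructions, and `reassemble_checkpoint` is a hypothetical helper.

```python
import gzip
import shutil
from pathlib import Path


def reassemble_checkpoint(parts, joined_path, out_path) -> Path:
    """Concatenate split parts in order, then gunzip the result.

    Mirrors `cat part1 part2 > joined` followed by `gzip -d -c joined > out`.
    """
    joined_path, out_path = Path(joined_path), Path(out_path)
    # Step 1: byte-wise concatenation, like `cat`.
    with joined_path.open("wb") as joined:
        for part in parts:
            with Path(part).open("rb") as src:
                shutil.copyfileobj(src, joined)
    # Step 2: streaming decompression, like `gzip -d -c`.
    with gzip.open(joined_path, "rb") as src, out_path.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    # Run from inside the checkpoint directory.
    reassemble_checkpoint(
        ["render.pth.gz.001", "render.pth.gz.002"], "render.pth.gz", "render.pth"
    )
```

`shutil.copyfileobj` copies in fixed-size chunks, so the checkpoint never has to fit in memory at once.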
Next, from the project root, prepare your video with the data_preparation script. Replace YOUR_VIDEO_PATH with the path to your video:
python data_preparation.py YOUR_VIDEO_PATH
The result (video_info) will be stored in the ./video_data directory.
Run the demo script with an audio file. The audio must be a .wav file with a 16 kHz sample rate, 16-bit samples, and a single (mono) channel. Replace video_data/test with the path to your video_info output, video_data/audio0.wav with the path to your audio file, and 1.mp4 with the desired output video path:
python demo.py video_data/test video_data/audio0.wav 1.mp4
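Because a wrongly formatted audio file is an easy mistake to make, a quick pre-flight check can save a failed run. This is a minimal sketch using Python's built-in `wave` module; `check_wav_format` is a hypothetical helper, and it only verifies the three properties listed above (16 kHz, 16-bit, mono).

```python
import sys
import wave


def check_wav_format(path) -> list[str]:
    """Return a list of problems; an empty list means 16 kHz / 16-bit / mono PCM."""
    problems = []
    try:
        with wave.open(str(path), "rb") as wav:
            if wav.getframerate() != 16000:
                problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
            if wav.getsampwidth() != 2:
                problems.append(f"sample width is {8 * wav.getsampwidth()} bits, expected 16 bits")
            if wav.getnchannels() != 1:
                problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    except (wave.Error, EOFError) as exc:
        # `wave` only reads uncompressed PCM; float, compressed, or non-WAV files land here.
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems


if __name__ == "__main__":
    for p in sys.argv[1:]:
        issues = check_wav_format(p)
        print(f"{p}: {'OK' if not issues else '; '.join(issues)}")
```

A file that fails the check can usually be converted with ffmpeg, for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 output.wav`.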
For real-time operation with a microphone, run:
python demo_avatar.py
We thank the contributors of the Wav2Lip, DINet, and LiveSpeechPortrait repositories for their open research and contributions.
This project is licensed under the MIT License.
For any questions or suggestions, please contact us at kleinlee1@outlook.com.