[Paper] [Project Page] [Demo]
Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
The Hong Kong University of Science and Technology
🚩 Updates
- 🔥🔥✅ Add SPADE model, which produces more natural results.
- Python >= 3.7 (Anaconda or Miniconda is recommended)
- PyTorch >= 1.7
- Optional: NVIDIA GPU + CUDA
- Optional: Linux
We now provide a clean version of DaGAN, which does not require customized CUDA extensions.
- Clone repo

    git clone https://github.com/harlanhong/CVPR2022-DaGAN.git
    cd CVPR2022-DaGAN

- Install dependent packages

    pip install -r requirements.txt

    ## Install the Face Alignment lib
    cd face-alignment
    pip install -r requirements.txt
    python setup.py install
We take the paper version as an example. More models can be found here.
See config/vox-adv-256.yaml for a description of each parameter.
The pre-trained checkpoint of the face depth network and our DaGAN checkpoints can be found at the following link: OneDrive.
Inference! To run a demo, download a checkpoint and run the following command:
CUDA_VISIBLE_DEVICES=0 python demo.py --config config/vox-adv-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator
The result will be stored in result.mp4. The driving videos and source images must be cropped before they can be used with our method. To obtain semi-automatic crop suggestions, run python crop-video.py --inp some_youtube_video.mp4. It will generate ffmpeg commands for the crops.
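When the demo has to be run for many source/driving pairs, it can help to assemble the command in Python rather than by hand. The sketch below is only an illustration: the function name `build_demo_command` is hypothetical, and the flags and default values are copied from the demo command above rather than from any additional documentation.

```python
from typing import List


def build_demo_command(
    source_image: str,
    driving_video: str,
    checkpoint: str,
    config: str = "config/vox-adv-256.yaml",
    kp_num: int = 15,
    generator: str = "DepthAwareGenerator",
) -> List[str]:
    """Assemble the demo.py invocation shown above as an argument list.

    Only flags that appear in the README command are used; the defaults
    mirror that command and are not otherwise verified.
    """
    return [
        "python", "demo.py",
        "--config", config,
        "--driving_video", driving_video,
        "--source_image", source_image,
        "--checkpoint", checkpoint,
        "--relative",
        "--adapt_scale",
        "--kp_num", str(kp_num),
        "--generator", generator,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, with `CUDA_VISIBLE_DEVICES` set in the environment to select the GPU.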
- VoxCeleb. Please follow the instructions at https://github.com/AliaksandrSiarohin/video-preprocessing.
To train a model on a specific dataset, run:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12348 run.py --config config/vox-adv-256.yaml --name DaGAN --rgbd --batchsize 12 --kp_num 15 --generator DepthAwareGenerator
The code will create a folder in the log directory (each run will create a new name-specific directory).
Checkpoints will be saved to this folder.
To check the loss values during training, see log.txt.
By default, the batch size is tuned to run on 8 GeForce RTX 3090 GPUs (you can obtain the best performance after about 150 epochs). You can change the batch size in train_params in the .yaml file.
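If you prefer to adjust the batch size programmatically, a minimal sketch follows. It works on a config that has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`). The helper name `with_batch_size` is hypothetical, and the key name `batch_size` inside train_params is an assumption, so check your .yaml file before relying on it. Note also that the training command above passes `--batchsize` on the command line; which of the two settings takes precedence is not stated here.

```python
import copy
from typing import Any, Dict


def with_batch_size(config: Dict[str, Any], batch_size: int) -> Dict[str, Any]:
    """Return a copy of a loaded config with the training batch size replaced.

    The input dictionary is left untouched so the original settings can
    still be compared against or written back unchanged.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    updated = copy.deepcopy(config)
    # ASSUMPTION: the batch size lives under train_params as "batch_size".
    updated.setdefault("train_params", {})["batch_size"] = batch_size
    return updated
```

The result can be written to a new file with `yaml.safe_dump` and passed to the training script through `--config`.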
Also, you can watch the training loss by running the following command:
tensorboard --logdir log/DaGAN/log
If you kill the process in the middle of training, a zombie process may remain. You can kill it using our provided tool:
python kill_port.py PORT
- Resize all the videos to the same size, e.g. 256x256. The videos can be in '.gif' or '.mp4' format, or a folder of images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless and has better I/O performance.
- Create a folder data/dataset_name with 2 subfolders, train and test. Put the training videos in train and the testing videos in test.
- Create a config config/dataset_name.yaml. In dataset_params, specify the root dir: root_dir: data/dataset_name. Also adjust the number of epochs in train_params.
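The folder layout and the frame extraction described in the steps above can be sketched as follows. This is an illustration under stated assumptions, not part of the repository: both function names are hypothetical, the frame file pattern `%07d.png` is an arbitrary choice, and the ffmpeg `scale` filter is used for resizing, which assumes the videos have already been cropped to a square.

```python
from pathlib import Path
from typing import List


def make_dataset_layout(root: Path) -> None:
    """Create root/train and root/test, the two subfolders described above."""
    for split in ("train", "test"):
        (root / split).mkdir(parents=True, exist_ok=True)


def frame_extraction_command(video: Path, out_dir: Path, size: int = 256) -> List[str]:
    """Build an ffmpeg command that resizes a video and writes one PNG per frame.

    The output directory must exist before the command is run, since
    ffmpeg does not create it.
    """
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"scale={size}:{size}",
        # ASSUMPTION: zero-padded frame numbers; any ordered naming would do.
        str(out_dir / "%07d.png"),
    ]
```

For example, call `make_dataset_layout(Path("data/dataset_name"))`, create one folder per video under train or test, and run the list returned by `frame_extraction_command` with `subprocess.run(cmd, check=True)`.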
Our DaGAN implementation is inspired by FOMM. We thank the authors of FOMM for making their code publicly available.
@inproceedings{hong2022depth,
title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}
If you have any questions, please email fhongac@cse.ust.hk.