Necessary libraries:
# install python libs
$ python3 -m pip install -r requirements.txt
# install cmake and sndfile lib
$ sudo apt install libsndfile1 cmake
(Optional) If you want to prepare the dataset yourself, the Montreal Forced Aligner (MFA) must be installed. (Errors may occur during installation; watch the output carefully.)
$ bash scripts/install_mkl.sh
$ bash scripts/install_kaldi.sh
$ bash scripts/install_mfa.sh
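Before moving on, it can help to confirm the required tools ended up on your PATH. A minimal sketch (the command names `cmake` and `mfa` are assumptions; adjust them to your setup):

```python
import shutil

def missing_tools(tools):
    """Return the subset of command names not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    # cmake is always required; mfa only if you prepare the dataset yourself
    for tool in missing_tools(["cmake", "mfa"]):
        print(f"warning: '{tool}' not found on PATH")
```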
Download the pretrained model from Google Drive, unzip it, and put it in ./pretrained_models/dgrad.
Modify and run the evaluation script: bash evaluate.sh
Download VOCASET from https://voca.is.tue.mpg.de/ and unzip the following directories:
| VOCASET
-| unposedcleaneddata
-| sentencestext
-| templates
-| audio
Run the preload Python script:
python3 -m saberspeech.datasets.voca.preload \
--source_root <ROOT_VOCASET> \
--output_root <ROOT_PROCESSED>
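Equivalently, the preload step can be driven from Python, e.g. when scripting over several roots. A sketch (the paths are placeholders; only the module name and flags come from the command above):

```python
import subprocess
import sys

def build_preload_cmd(source_root, output_root):
    """Assemble the preload invocation shown above as an argument list."""
    return [
        sys.executable, "-m", "saberspeech.datasets.voca.preload",
        "--source_root", str(source_root),
        "--output_root", str(output_root),
    ]

if __name__ == "__main__":
    cmd = build_preload_cmd("/data/VOCASET", "/data/processed")
    print(" ".join(cmd))
    # to actually run it: subprocess.run(cmd, check=True)
```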
The preload script produces:
- dgrad
- offsets
- PCA of dgrad and offsets
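The PCA step can be illustrated with plain NumPy. This is a sketch of fitting a linear basis to per-frame feature vectors (offsets or flattened dgrad); the shapes and component count are illustrative, not the repo's actual settings:

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA via SVD of the centered data matrix X (frames x features).

    Returns the mean and the top principal directions (components x features).
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Encode frames as low-dimensional PCA coefficients."""
    return (X - mean) @ components.T

def reconstruct(coeffs, mean, components):
    """Decode PCA coefficients back to full feature vectors."""
    return coeffs @ components + mean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    offsets = rng.standard_normal((100, 30))  # 100 frames, 30-dim features
    mean, comps = fit_pca(offsets, n_components=10)
    coeffs = project(offsets, mean, comps)
    print(coeffs.shape)  # (100, 10)
```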
If you find this work useful, please cite:

@article{chai2022speech,
title={Speech-driven facial animation with spectral gathering and temporal attention},
author={Chai, Yujin and Weng, Yanlin and Wang, Lvdi and Zhou, Kun},
journal={Frontiers of Computer Science},
volume={16},
number={3},
pages={1--10},
year={2022},
publisher={Springer}
}