StarGAN-Voice-Conversion-2

A PyTorch implementation based on StarGAN-VC2: https://arxiv.org/pdf/1907.12279.pdf

Uses source and target domain codes in the discriminator (D) but not in the generator (G), as I found this gave better-quality output.
Does not use pixel shuffle (PS) in G.

Installation

Tested with Python 3.6.2 in a Linux VM.

A Linux environment is recommended; macOS and Windows have not been tested.

Python

Create a new environment using Anaconda

conda create -n stargan-vc python=3.6.2

Install conda dependencies

conda install pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.1 -c pytorch
conda install pillow=5.4.1
conda install -c conda-forge librosa=0.6.1
conda install -c conda-forge tqdm=4.43.0

Install the dependencies that are not available through conda using pip

pip install pyworld==0.2.8
pip install mcd==0.4

NB: macOS users who cannot install pyworld should see: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder
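After installing, it can be useful to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of this repository: the distribution names in PINNED are assumptions (conda's "pytorch" package is published as "torch" for pip metadata, and pillow as "Pillow"), and the helper names are hypothetical.

```python
# Sketch: compare installed package versions with the pins listed above.
# Distribution names are assumptions, not taken from this repository.
PINNED = {
    "torch": "1.4.0",
    "torchvision": "0.5.0",
    "Pillow": "5.4.1",
    "librosa": "0.6.1",
    "tqdm": "4.43.0",
    "pyworld": "0.2.8",
    "mcd": "0.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # setuptools fallback for older Pythons such as 3.6

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (expected, actual)} for every pin that is not satisfied."""
    return {
        name: (want, installed.get(name))
        for name, want in pinned.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    installed = {name: installed_version(name) for name in PINNED}
    for name, (want, got) in sorted(find_mismatches(installed).items()):
        print(f"{name}: expected {want}, found {got}")
```

Running the script inside the activated environment prints one line per package whose version differs from the pin, and nothing if everything matches.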

Libraries

Install binaries
- SoX: https://sourceforge.net/projects/sox/files/sox/14.4.2/
- libsndfile: http://linuxfromscratch.org/blfs/view/svn/multimedia/libsndfile.html
- yasm: http://www.linuxfromscratch.org/blfs/view/svn/general/yasm.html
- ffmpeg: https://ffmpeg.org/download.html
- libav: https://libav.org/download/
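To check which of the command-line tools above are reachable from the current shell, a small sketch like the following may help. The executable names are assumptions ("sox", "ffmpeg" and "yasm" are the usual names); libsndfile is a shared library rather than an executable, so it is not covered here.

```python
import shutil

# Assumed executable names; adjust if your distribution names them differently.
REQUIRED_TOOLS = ["sox", "ffmpeg", "yasm"]


def missing_tools(names=REQUIRED_TOOLS):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools found.")
```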

Usage

Download Datasets

- VCTK
- VCC2018

Example with VCTK:

mkdir -p ../data/VCTK-Data
wget -O VCTK-Corpus.zip "https://datashare.is.ed.ac.uk/bitstream/handle/10283/2651/VCTK-Corpus.zip?sequence=2&isAllowed=y"
unzip VCTK-Corpus.zip -d ../data/VCTK-Data

If the downloaded VCTK archive is a tar.gz file, extract it with:

tar -xzvf VCTK-Corpus.tar.gz -C ../data/VCTK-Data
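The same extraction step can be done from Python without knowing the archive type in advance. This is a minimal sketch using only the standard library; the function name extract_corpus is hypothetical and not part of this repository. Only use it on archives from a source you trust.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_corpus(archive, dest):
    """Extract a .zip or .tar.gz archive into dest and return dest as a Path."""
    archive, dest = str(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(str(dest))
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:
            tf.extractall(str(dest))
    else:
        raise ValueError("unsupported archive format: " + archive)
    return dest
```

For example, extract_corpus("VCTK-Corpus.zip", "../data/VCTK-Data") mirrors the unzip command above.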

Preprocessing data

We will use mel-cepstral coefficients (MCEPs) here.

Example script for VCTK data, which we resample to 22.05 kHz. The VCTK dataset is not split into train and test wavs, so we perform a data split.

# VCTK-Data
python preprocess.py --perform_data_split y \
                     --resample_rate 22050 \
                     --origin_wavpath ../data/VCTK-Data/VCTK-Corpus/wav48 \
                     --target_wavpath ../data/VCTK-Data/VCTK-Corpus/wav22 \
                     --mc_dir_train ../data/VCTK-Data/mc/train \
                     --mc_dir_test ../data/VCTK-Data/mc/test \
                     --speakers p229 p232 p236 p243
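To make the two preprocessing options concrete, here is a small sketch of the arithmetic behind resampling 48 kHz VCTK audio to 22.05 kHz, and of a reproducible per-speaker train/test split. It is an illustration only: the 10% test fraction, the seed, and both function names are assumptions, and preprocess.py may split the data differently.

```python
import random
from fractions import Fraction


def resample_ratio(src_rate, dst_rate):
    """Return dst_rate / src_rate as an exact fraction (up-factor / down-factor)."""
    return Fraction(dst_rate, src_rate)


def split_train_test(files, test_fraction=0.1, seed=0):
    """Shuffle files deterministically and return (train, test) lists.

    test_fraction and seed are illustrative defaults, not the repository's.
    """
    files = sorted(files)
    random.Random(seed).shuffle(files)
    n_test = max(1, int(round(len(files) * test_fraction))) if files else 0
    return files[n_test:], files[:n_test]


if __name__ == "__main__":
    ratio = resample_ratio(48000, 22050)
    print("resample by", ratio.numerator, "/", ratio.denominator)
    wavs = ["p229_%03d.wav" % i for i in range(1, 11)]
    train, test = split_train_test(wavs)
    print(len(train), "train files,", len(test), "test files")
```

The ratio 22050/48000 reduces to 147/320, so every 320 input samples map to 147 output samples.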

Example script for VCC2018 data, which is already separated into train and test wav folders and is already at 22.05 kHz.

# VCC2018-Data
python preprocess.py --perform_data_split n \
                     --target_wav_path_train ../data/VCC2018-Data/VCC2018-Corpus/wav22_train \
                     --target_wav_path_eval ../data/VCC2018-Data/VCC2018-Corpus/wav22_eval \
                     --mc_dir_train ../data/VCC2018-Data/mc/train \
                     --mc_dir_test ../data/VCC2018-Data/mc/test \
                     --speakers VCC2SF1 VCC2SF2 VCC2SM1 VCC2SM2

Training

Example script:

# example with VCTK
python main.py --train_data_dir ../data/VCTK-Data/mc/train \
               --test_data_dir ../data/VCTK-Data/mc/test \
               --wav_dir ../data/VCTK-Data/VCTK-Corpus/wav22 \
               --model_save_dir ./models/experiment_name \
               --sample_dir ./samples/experiment_name \
               --speakers p229 p232 p236 p243
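The --speakers list defines the domains the model converts between. As an illustration of what a domain code can look like, the sketch below maps each speaker to a one-hot vector and concatenates a source and a target code, which is one way to build the source-and-target code mentioned at the top of this README. The function names and the encoding are assumptions; the repository's own encoding may differ.

```python
def one_hot(speaker, speakers):
    """Return a one-hot list marking the position of speaker in speakers."""
    index = speakers.index(speaker)  # raises ValueError for unknown speakers
    return [1.0 if i == index else 0.0 for i in range(len(speakers))]


def source_target_code(source, target, speakers):
    """Concatenate the source and target one-hot codes (hypothetical layout)."""
    return one_hot(source, speakers) + one_hot(target, speakers)


if __name__ == "__main__":
    speakers = ["p229", "p232", "p236", "p243"]
    print(one_hot("p232", speakers))
    print(source_target_code("p229", "p243", speakers))
```

With four speakers, each one-hot code has four entries and the concatenated source-and-target code has eight.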

If you encounter an error such as:

ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found

You may need to export LD_LIBRARY_PATH (see Stack Overflow):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/<PATH>/<TO>/<YOUR>/.conda/envs/<ENV>/lib/
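If you are unsure of the environment path, the active interpreter can report it. The sketch below builds the export line from sys.prefix, which inside an activated conda environment normally points at the environment directory. The function name is hypothetical, and this is a convenience, not something the repository provides.

```python
import os
import posixpath
import sys


def ld_library_path_export(env_prefix=sys.prefix, current=None):
    """Return an export line that appends <env_prefix>/lib to LD_LIBRARY_PATH."""
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is a Linux concept, so POSIX separators are used.
    lib_dir = posixpath.join(env_prefix, "lib")
    parts = [p for p in current.split(":") if p]
    if lib_dir not in parts:
        parts.append(lib_dir)
    return "export LD_LIBRARY_PATH=" + ":".join(parts)


if __name__ == "__main__":
    print(ld_library_path_export())
```

Run it with the environment activated and paste the printed line into your shell.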

Conversion

For example, to restore the model saved at step 90000 and specify the speakers:

# example with VCTK
python convert.py --resume_model 90000 \
                  --speakers p229 p232 p236 p243 \
                  --train_data_dir ../data/VCTK-Data/mc/train/ \
                  --test_data_dir ../data/VCTK-Data/mc/test/ \
                  --wav_dir ../data/VCTK-Data/VCTK-Corpus/wav22 \
                  --model_save_dir ./models/experiment_name \
                  --convert_dir ./converted/experiment_name
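With several speakers, conversion can run between any ordered pair of distinct speakers. The sketch below enumerates those pairs for the speaker list used above. Whether convert.py processes every pair or only a subset is not stated here, so treat this as an illustration of the pair count rather than a description of the script.

```python
from itertools import permutations


def conversion_pairs(speakers):
    """Return every ordered (source, target) pair of distinct speakers."""
    return list(permutations(speakers, 2))


if __name__ == "__main__":
    pairs = conversion_pairs(["p229", "p232", "p236", "p243"])
    print(len(pairs), "source-target pairs")
    for source, target in pairs:
        print(source, "->", target)
```

For N speakers there are N * (N - 1) ordered pairs.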

TODO:

- Include converted samples
- Include the s-t loss as in the original paper (NB: not exactly the same, see top of this README)

SamuelBroughton/StarGAN-Voice-Conversion-2
