This repo contains two experiments: voice cloning (on the main branch) and multiple watermarks (on the multiwm branch).
This work embeds a watermark into audio so that, if a voice cloning model such as VALL-E-X uses that audio to generate new audio, the watermark can still be detected in the generated audio. The model we designed is a plugin that is decoupled from any particular voice cloning model, so you can easily train it with other voice cloning models and adopt it in them. In this way, we can mitigate societal risks such as voice scams.
Our model is based on Wavmark; however, we modified it, replacing the attack module with a voice cloning module, specifically VALL-E-X. We also wrote training code for the model, referring to Pixinwav. Since the official VALL-E-X is not open source, we adopted another available VALL-E-X implementation.
The architecture, similar to Wavmark's, is shown in Figure 1: we use VALL-E-X to transform the watermarked audio and then try to detect the watermark in the generated audio.
VALL-E-X is a TTS model (Figure 2, from the VALL-E-X paper): it needs a clip of source audio (the audio to be cloned), a source text (the transcript of the source audio), and a target text (the transcript of the audio to generate), and it generates audio in the same voice as the source.
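For illustration, cloning a voice with the implementation we adopted looks roughly like this (adapted from Plachtaa's VALL-E-X README; double-check that repo in case its API has changed):

```python
from scipy.io.wavfile import write as write_wav
from utils.generation import SAMPLE_RATE, generate_audio, preload_models
from utils.prompt_making import make_prompt

# Register the source audio and its transcript as a voice prompt.
make_prompt(name="speaker", audio_prompt_path="source.wav",
            transcript="The transcript of the source audio.")

# Download and load the pre-trained models.
preload_models()

# Generate the target text in the cloned voice.
audio = generate_audio("The transcript of the generated audio.", prompt="speaker")
write_wav("cloned.wav", SAMPLE_RATE, audio)
```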
Since our model is a plugin, it does not need to care about the implementation of the voice cloning model. Therefore, as shown in Figure 1, we adopted a gradient skip connection to skip the gradient calculation inside the voice cloning model.
We used a pre-trained VALL-E-X and froze its parameters.
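A minimal sketch of the freezing and gradient skip in PyTorch; this straight-through pattern is our illustration, not necessarily the exact code in this repo, and it assumes the cloned audio has been aligned to the same length as the watermarked input:

```python
import torch

def freeze(model: torch.nn.Module) -> None:
    # Freeze the pre-trained voice cloning model.
    model.eval()
    for p in model.parameters():
        p.requires_grad = False

def clone_with_gradient_skip(vc_model, watermarked: torch.Tensor) -> torch.Tensor:
    # Run voice cloning without building an autograd graph through it.
    with torch.no_grad():
        cloned = vc_model(watermarked)
    # The forward pass returns `cloned`, but on the backward pass the voice
    # cloning model is treated as the identity, so gradients skip straight
    # from the detector's loss back to the watermark encoder.
    return watermarked + (cloned - watermarked).detach()
```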
Important directories:

- `src/`
  - `main.py`: entry point for training; only the following args are unavailable: `dtw`, `stft_small`, `ft_container`, `thet`, `mp_encoder`, `mp_decoder`, `mp_join`, `permutation`, `embed`, `luma`
  - `train.py`: training code
  - `umodel.py`: the model class
  - `loader.py`: dataset class
  - `preprocessor.py`: builds the dataset for training based on LibriSpeech
- `scripts/`: for training
  - `run_train_multiwm.sh`: trains the model above
  - `run_train_multiwm_share.sh`: trains a variant in which the encoder and decoder share parameters
- `watermark_voice_clone.ipynb`: for testing the code in a clearer way
- VALL-E-X's code:
  - `data/`
  - `models/`
  - `modules/`
  - `utils/`
  - `macros.py`
Refer to Pixinwav, Wavmark, and Plachtaa's VALL-E-X for setup. You can also use `environment.yaml` to install the dependencies (e.g. `conda env create -f environment.yaml`).
This project runs on Python 3.10. We use `wandb` to log training progress.
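As a minimal sketch, the wandb logging pattern looks like this (the project name and metric key are placeholders, not the repo's actual ones):

```python
import wandb

wandb.init(project="voice-cloning-watermark")  # placeholder project name
for step in range(100):
    loss = 1.0 / (step + 1)  # stand-in for the real training loss
    wandb.log({"train/loss": loss}, step=step)
wandb.finish()
```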
Note: Whisper needs ffmpeg to read audio files. If you don't have ffmpeg and cannot install it, you can replace the original audio-loading code with `torchaudio.load()`, for example as sketched below.
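A minimal sketch of such a replacement, assuming the goal is to mimic Whisper's convention of a mono float32 waveform at 16 kHz (the function name and target rate are illustrative):

```python
import torchaudio

def load_audio(path: str, target_sr: int = 16000):
    # torchaudio reads the file directly, so ffmpeg is not required.
    wav, sr = torchaudio.load(path)  # shape: (channels, samples)
    wav = wav.mean(dim=0)            # downmix to mono
    if sr != target_sr:
        wav = torchaudio.functional.resample(wav, sr, target_sr)
    return wav.numpy()               # float32 array, like whisper.load_audio
```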
This work processes the LibriSpeech dataset into a customized dataset. You can use `src/preprocessor.py` to build the dataset.
Replace the variables in `.env` to set up your default paths.
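For example, a `.env` might look like the following; the variable names here are hypothetical, so use the keys that actually appear in the repo's `.env`:

```
# Hypothetical keys -- check the repo's .env for the real variable names.
DATA_PATH=/path/to/LibriSpeech
CHECKPOINT_PATH=/path/to/checkpoints
```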
Run `scripts/run_trainVC.sh` to train.
- Wavmark: the model architecture is based on Wavmark.
- Pixinwav: the training code refers to Pixinwav.
- Plachtaa's VALL-E-X: the VALL-E-X implementation we adopted.