[ECCV 2022] TargetCLIP: Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer
This repository finds a global direction in StyleGAN's space to edit images according to a target image. We transfer the essence of a target image to any source image.
The notebook lets you apply the directions to the sources presented in the examples. You can also edit your own inverted images with the pretrained directions by uploading your latent vector to the dirs folder.
We use images inverted by e4e.
NOTE: all the examples presented are available in our Colab notebook. The recommended coefficient is between 0.5 and 1.
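As a rough illustration of what the coefficient controls, the sketch below assumes the edit is additive, i.e. edited latent = source latent + coefficient * direction. That form and the helper names `apply_direction` and `coefficient_sweep` are assumptions made for illustration; they are not taken from this repository. Plain Python lists stand in for latent tensors.

```python
def apply_direction(latent, direction, coefficient):
    """Shift a latent along a direction: latent + coefficient * direction.

    Assumption: the edit is a simple additive shift in latent space.
    """
    if len(latent) != len(direction):
        raise ValueError("latent and direction must have the same length")
    return [w + coefficient * d for w, d in zip(latent, direction)]


def coefficient_sweep(start=0.5, stop=1.0, steps=6):
    """Evenly spaced coefficients covering [start, stop], inclusive."""
    if steps < 2:
        return [start]
    step = (stop - start) / (steps - 1)
    return [round(start + i * step, 4) for i in range(steps)]


if __name__ == "__main__":
    source_latent = [0.0, 1.0, -1.0]  # toy stand-in for an e4e latent
    direction = [2.0, -2.0, 0.5]      # toy stand-in for an essence direction
    for alpha in coefficient_sweep():
        print(alpha, apply_direction(source_latent, direction, alpha))
```

Sweeping the coefficient over the recommended range and inspecting the outputs side by side is one way to pick a value for a given source.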
The targets are plain images that were not inverted; the direction optimization is initialized at random.
NOTE: for the joker, we use relatively large coefficients, between 0.9 and 1.3.
The targets are plain images that are outside the domain StyleGAN was trained on; the direction optimization is initialized at random.
The targets are inverted images, and the latents are used as initialization for the optimization.
First, please download all the pretrained weights for the experiments to the folder pretrained_models. If you choose to save the pretrained weights in another path, please update the config file accordingly (configs/paths_config.py).
Our tests require downloading the pretrained StyleGAN2 weights and the pretrained ArcFace weights. For our encoder finetuning and optimizer initialization, please download the e4e pretrained weights.
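Before launching an experiment, it can help to confirm that the weights mentioned above are where the code expects them. The following is a minimal sketch, not part of the repository: `find_missing_weights` is a hypothetical helper, and the only file name used is `e4e_ffhq_encode.pt`, which appears in the training command of this README. The StyleGAN2 and ArcFace file names are not stated here, so they are left for you to fill in.

```python
from pathlib import Path


def find_missing_weights(weight_paths):
    """Return the labels whose weight files do not exist on disk."""
    return [
        label
        for label, path in weight_paths.items()
        if not Path(path).is_file()
    ]


if __name__ == "__main__":
    # Only e4e_ffhq_encode.pt is named in this README. Add the StyleGAN2 and
    # ArcFace entries using the file names you actually downloaded.
    weights = {"e4e": "pretrained_models/e4e_ffhq_encode.pt"}
    missing = find_missing_weights(weights)
    if missing:
        print("Missing pretrained weights:", ", ".join(missing))
    else:
        print("All listed pretrained weights were found.")
```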
To enable alignment, run the following:
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -dk shape_predictor_68_face_landmarks.dat.bz2
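If the `bzip2` command is not available (for example on some Windows setups), the decompression step can be approximated with Python's standard `bz2` module. This is a sketch of an equivalent to `bzip2 -dk` (decompress, keep the archive); the helper name `decompress_bz2_keep` is made up for illustration.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2_keep(src):
    """Decompress a .bz2 file next to itself and keep the original archive."""
    src = Path(src)
    if src.suffix != ".bz2":
        raise ValueError("expected a .bz2 file")
    dst = src.with_suffix("")  # drops only the trailing ".bz2"
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    archive = Path("shape_predictor_68_face_landmarks.dat.bz2")
    if archive.is_file():
        print("Wrote", decompress_bz2_keep(archive))
    else:
        print("Archive not found; download it first.")
```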
The targets for our celebrities test can be found here. To train the encoder, please download the CelebA-HQ dataset (both the test set and the train set). For the FFHQ tests, download the FFHQ set as well and extract the first 50 images from it.
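Extracting the first 50 FFHQ images can be scripted. The sketch below assumes that "first" means the first 50 files in sorted file-name order and that the images sit in a single flat directory; both are assumptions, as are the helper name `copy_first_images` and the placeholder paths.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def copy_first_images(src_dir, dst_dir, count=50):
    """Copy the first `count` images (sorted by file name) into dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    images = sorted(
        p for p in src_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for image in images[:count]:
        shutil.copy2(image, dst_dir / image.name)
        copied.append(image.name)
    return copied


# Example call (both paths are placeholders):
# copy_first_images("/path/to/FFHQ", "/path/to/ffhq_first_50", count=50)
```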
Run the following command:
PYTHONPATH=`pwd` python optimization.py --target_path /path/to/target/image --output_folder path/to/optimizer/output --lambda_transfer 1 --weight_decay 3e-3 --lambda_consistency 0.5 --step 1000 --lr 0.2 --num_directions 1 --num_images 4
where num_directions is the number of different directions you wish to train, and num_images is the number of images to use in the consistency tests.
Use the random_initiate parameter to initialize the direction randomly instead of from the inversion of the target.
The resulting manipulations on the training sources, as well as the produced essence directions, will be saved under output_folder.
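When optimizing directions for several targets, the command above can be assembled programmatically. The sketch below only reuses the flags and values shown in that command. Two things are assumptions: that `random_initiate` is passed as a bare `--random_initiate` switch, and the helper names `build_optimization_command` and `run_optimization`, which do not exist in the repository.

```python
import os
import shlex
import subprocess
import sys


def build_optimization_command(target_path, output_folder,
                               num_directions=1, num_images=4,
                               random_initiate=False):
    """Build the argument list for optimization.py using the README defaults."""
    cmd = [
        sys.executable, "optimization.py",
        "--target_path", str(target_path),
        "--output_folder", str(output_folder),
        "--lambda_transfer", "1",
        "--weight_decay", "3e-3",
        "--lambda_consistency", "0.5",
        "--step", "1000",
        "--lr", "0.2",
        "--num_directions", str(num_directions),
        "--num_images", str(num_images),
    ]
    if random_initiate:
        # Assumption: random_initiate is a boolean switch without a value.
        cmd.append("--random_initiate")
    return cmd


def run_optimization(cmd, repo_root="."):
    """Run the command from the repo root with PYTHONPATH set to it."""
    env = dict(os.environ, PYTHONPATH=os.path.abspath(repo_root))
    return subprocess.run(cmd, cwd=repo_root, env=env, check=True)


if __name__ == "__main__":
    command = build_optimization_command("/path/to/target/image",
                                         "path/to/optimizer/output")
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the command first, as in the example, makes it easy to check the arguments before calling `run_optimization`.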
- Download ninja=1.10.0, using the following commands:
wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
sudo unzip ninja-linux.zip -d /usr/local/bin/
sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force
- Randomly select 200 images from the CelebA-HQ train set and place them in: data/celeba_minimized.
- Randomly select 50 images from the CelebA-HQ test set and place them in: data/data1024x1024/test.
- We train our encoder on 5 RTX 2080 Ti GPUs with 11 GB of memory each. To train the encoder from scratch, run the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4 PYTHONPATH=`pwd` python scripts/train.py --exp_dir name/of/experiment/directory --lambda_consistency 0.5 --batch_size 1 --test_batch_size 1 --lambda_reg 3e-3 --checkpoint_path pretrained_models/e4e_ffhq_encode.pt --image_interval 1 --board_interval 5 --val_interval 31 --dataset_type celeba_encode_minimized --save_interval 200 --max_steps 3000
If you wish to train the encoder with a single GPU, please remove the use of DataParallel in the coach file (training/coach).
The best checkpoint will be saved to name/of/experiment/directory/checkpoints.
Important: Please make sure to download the pretrained e4e weights before training in order to enable the finetuning.
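The two random-selection steps in the list above (200 train images and 50 test images) can be scripted so that the subsets are reproducible. This is a sketch under assumptions: the helper name `copy_random_subset`, the fixed seed, and the source directory placeholders are illustrative, and the source folders are assumed to be flat directories of image files.

```python
import random
import shutil
from pathlib import Path


def copy_random_subset(src_dir, dst_dir, count, seed=0):
    """Copy `count` randomly chosen files from src_dir into dst_dir."""
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    if count > len(files):
        raise ValueError(f"requested {count} files, found only {len(files)}")
    # A seeded generator makes the selection repeatable across runs.
    chosen = random.Random(seed).sample(files, count)
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for path in chosen:
        shutil.copy2(path, dst_dir / path.name)
    return sorted(path.name for path in chosen)


# Example calls (the source directories are placeholders):
# copy_random_subset("/path/to/celeba_hq/train", "data/celeba_minimized", 200)
# copy_random_subset("/path/to/celeba_hq/test", "data/data1024x1024/test", 50)
```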
- The latents for our 68 sources are saved under pretrained_weights/celebs.pt.
- Use your method to produce a manipulation for each (source, target) pair, and save the manipulation results under a folder with the baseline name. The naming convention our tests expect is: {target_name}/{source_idx}.png. For example, the manipulation for ariel with source number 1 will be saved as: {baseline_name}/ariel/1.png.
- Produce results by running the following command:
PYTHONPATH=`pwd` python ./experiments/calc_metrics.py --style_img_path /path/to/target/images --manipulations_path /output/folder --input_img_path /path/to/source/images
where style_img_path is the path to the target images, manipulations_path is the path to the results of the manipulations, and input_img_path is the path to the 68 source images.
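Since the metrics script relies on the `{target_name}/{source_idx}.png` layout, it may be worth checking a baseline folder before running it. The sketch below is illustrative only: `manipulation_path` and `missing_manipulations` are hypothetical helpers, and the caller supplies the source indices because this README does not state whether they start at 0 or 1.

```python
from pathlib import Path


def manipulation_path(manipulations_root, target_name, source_idx):
    """Path of one result: {manipulations_root}/{target_name}/{source_idx}.png"""
    return Path(manipulations_root) / target_name / f"{source_idx}.png"


def missing_manipulations(manipulations_root, target_names, source_indices):
    """List expected result files that are not present on disk."""
    source_indices = list(source_indices)
    missing = []
    for target_name in target_names:
        for source_idx in source_indices:
            path = manipulation_path(manipulations_root, target_name, source_idx)
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "my_baseline" is a placeholder; range(1, 69) assumes 1-based indices
    # for the 68 sources, which is an assumption.
    gaps = missing_manipulations("my_baseline", ["ariel"], range(1, 69))
    print(f"{len(gaps)} expected files are missing")
```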
Important: Please note that our optimizer also finds coefficients per source. In our experiments, the average coefficient across the targets was usually about 1.2, so we used this value for manipulations with new sources (for both the celebrities and FFHQ experiments).
To run the FID test, follow these steps:
- Install the FID calculation package.
- Extract a random subset of size 7000 from the FFHQ test set.
- For each target name, the folder {baseline}/target_name needs to be compared to the subset of FFHQ:
python -m pytorch_fid --device cuda:{gpu_device} /path/to/FFHQ /outdir/target_name
- Calculate the average and standard deviation across the FID scores of all targets.
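The final aggregation step can be sketched with Python's standard `statistics` module. The helper name `summarize_fid` is hypothetical, and this README does not say whether the sample or the population standard deviation is intended, so the sketch exposes both. The scores in the example are toy placeholders, not measured results.

```python
import statistics


def summarize_fid(scores_by_target, sample=True):
    """Return (mean, std) over per-target FID scores.

    sample=True uses the sample standard deviation (n - 1 denominator);
    sample=False uses the population standard deviation (n denominator).
    """
    scores = list(scores_by_target.values())
    if not scores:
        raise ValueError("no FID scores provided")
    mean = statistics.mean(scores)
    if len(scores) < 2:
        return mean, 0.0
    std = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return mean, std


if __name__ == "__main__":
    # Toy placeholder numbers for illustration only.
    toy_scores = {"target_a": 2.0, "target_b": 4.0, "target_c": 6.0}
    mean, std = summarize_fid(toy_scores)
    print(f"FID: {mean:.2f} +/- {std:.2f}")
```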
If you make use of our work, please cite our paper:
@article{chefer2021targetclip,
title={Image-Based CLIP-Guided Essence Transfer},
author={Chefer, Hila and Benaim, Sagie and Paiss, Roni and Wolf, Lior},
journal={arXiv preprint arXiv:2110.12427},
year={2021}
}
The code in this repo draws from the StyleCLIP and e4e code bases.