Diffusion Curriculum (DisCL): Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion
Our approach is composed of two phases: (Phase 1) Interpolated Synthetic Generation and (Phase 2) Training with CL. In
Phase 1, we use a model pretrained on the original data to identify the "hard" samples, then generate data spanning a
full spectrum from synthetic to real images using varying levels of image guidance.
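The two phases above can be sketched minimally: hard samples are those the pretrained model is least confident on, and the synthetic-to-real spectrum is a set of image-guidance strengths. The confidence threshold and the linear parameterization below are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np

def guidance_spectrum(num_levels: int = 5) -> np.ndarray:
    """Evenly spaced image-guidance strengths from fully synthetic (0.0)
    to near-real (1.0). Hypothetical parameterization for illustration."""
    return np.linspace(0.0, 1.0, num_levels)

def hard_samples(confidences: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Indices of samples the pretrained model is least confident on;
    'hard' = max softmax confidence below `threshold` (an assumed criterion)."""
    return np.nonzero(confidences < threshold)[0]

levels = guidance_spectrum(5)             # array([0., 0.25, 0.5, 0.75, 1.])
conf = np.array([0.9, 0.3, 0.6, 0.4])     # per-sample model confidence
print(hard_samples(conf))                 # [1 3]
```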
conda create -n DisCL python=3.10
conda activate DisCL
pip3 install open_clip_torch
pip3 install wilds
pip3 install -r requirements.txt
We use two public datasets for training: ImageNet-LT and iWildCam.
- ImageNet-LT is a long-tailed subset of the ImageNet dataset. Its long-tailed meta information can be downloaded from Google Drive.
- iWildCam is an image classification dataset captured by wildlife camera traps. It is released by WILDS and can be downloaded with its official package.
- Prepare a data CSV file containing the hard samples
- A template of the CSV file is shown in sample.csv
- Use this CSV to generate synthetic data with varying guidance scales & random seeds
python3 data_generation/iWildCam/gene_img.py --part=1 --total_parts=1 --data_csv="${PATH_TO_CSV}" --output_path="${OUTPUT_FOLDER}"
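The `--part`/`--total_parts` flags suggest the CSV can be split across parallel workers. A plausible sketch of that sharding (the actual logic lives in gene_img.py; the row-interleaving scheme below is an assumption):

```python
import csv
import io

def shard_rows(rows, part: int, total_parts: int):
    """Worker `part` (1-indexed) takes every `total_parts`-th row, so
    `total_parts` workers cover the CSV with no overlap. Illustrative
    only; gene_img.py may split differently."""
    return rows[part - 1 :: total_parts]

sample_csv = "image,label\na.jpg,cat\nb.jpg,dog\nc.jpg,cat\nd.jpg,dog\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print([r["image"] for r in shard_rows(rows, part=1, total_parts=2)])  # ['a.jpg', 'c.jpg']
print([r["image"] for r in shard_rows(rows, part=2, total_parts=2)])  # ['b.jpg', 'd.jpg']
```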
- Compute CLIPScore to filter out poor-quality images.
python3 data_generation/iWildCam/comp_clip_scores.py --syn_path="${OUTPUT_FOLDER}" --real_path="${PATH_TO_WILDS}"
- Result 1 (clip_score.pkl): contains the image-image and image-text similarity scores for all synthetic images
- Result 2 (filtered_results.pkl): contains only the similarity scores of images that pass filtering
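A minimal sketch of consuming these pickle files downstream. The dict layout and threshold values below are assumptions for illustration; the actual structure is defined by comp_clip_scores.py:

```python
import os
import pickle
import tempfile

# Hypothetical layout: {image_path: {"img_sim": float, "txt_sim": float}}
scores = {
    "syn/001.png": {"img_sim": 0.82, "txt_sim": 0.31},
    "syn/002.png": {"img_sim": 0.40, "txt_sim": 0.12},
}

def filter_scores(scores, img_thresh=0.6, txt_thresh=0.2):
    """Keep images whose image-image and image-text CLIPScores both
    clear their thresholds (threshold values are assumptions)."""
    return {k: v for k, v in scores.items()
            if v["img_sim"] >= img_thresh and v["txt_sim"] >= txt_thresh}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "clip_score.pkl")
    with open(path, "wb") as f:
        pickle.dump(scores, f)          # mimic comp_clip_scores.py output
    with open(path, "rb") as f:
        kept = filter_scores(pickle.load(f))
print(sorted(kept))                     # ['syn/001.png']
```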
- Prepare a data CSV file containing the hard samples
- A template of the CSV file is shown in sample.csv
- Use this CSV to generate diversified text prompts for the hard classes
python3 data_generation/ImageNet_LT/get_text_prompt.py --data_csv="${PATH_TO_CSV}" --prompt_json="${PATH_TO_PROMPT}"
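A sketch of what prompt diversification for hard classes can look like. The templates and the JSON layout below are hypothetical; the real prompts are produced by get_text_prompt.py:

```python
import json

# Hypothetical templates; the released script may use different ones.
TEMPLATES = [
    "a photo of a {cls}",
    "a close-up photo of a {cls} in the wild",
    "a blurry photo of a {cls} at night",
]

def diversify(class_names):
    """Map each hard class to several prompt variants, one plausible
    shape for the prompt JSON passed via --prompt_json."""
    return {c: [t.format(cls=c) for t in TEMPLATES] for c in class_names}

prompts = diversify(["red fox", "snow leopard"])
print(json.dumps(prompts, indent=2))
```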
- Use this CSV to generate synthetic data with varying guidance scales & random seeds
python3 data_generation/ImageNet_LT/gene_img.py --part=1 --total_parts=1 --data_csv="${PATH_TO_CSV}" --output_path="${OUTPUT_FOLDER}" --prompt_json="${PATH_TO_PROMPT}"
- Compute CLIPScore to filter out poor-quality images.
python3 data_generation/ImageNet_LT/comp_clip_scores.py --syn_path="${OUTPUT_FOLDER}" --real_path="${PATH_TO_INLT}"
- Result 1 (clip_score.pkl): contains the image-image and image-text similarity scores for all synthetic images
- Result 2 (filtered_results.pkl): contains only the similarity scores of images that pass filtering
- Run the training script run_training.sh
cd curriculum_training/ImageNet
bash myshells/run_training.sh
- Run the training script run_training.sh
cd curriculum_training/iWildCam
bash myshells/run_training.sh
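Phase 2 trains with a curriculum over the generated spectrum. One minimal, assumed schedule (the released run_training.sh may use a different, e.g. adaptive, strategy): start on the most synthetic data and progress toward real images, spending a fixed number of epochs at each guidance level.

```python
def curriculum_schedule(guidance_levels, epochs_per_stage=2):
    """Per-epoch guidance level for a simple synthetic-to-real schedule.
    Ascending image guidance is assumed to move from synthetic toward
    real; this is an illustration, not the paper's exact scheduler."""
    stages = []
    for level in sorted(guidance_levels):
        stages.extend([level] * epochs_per_stage)
    return stages

print(curriculum_schedule([0.0, 0.5, 1.0]))  # [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
```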
Our code is heavily based on FLYP, LDMLR, and Open CLIP. We sincerely thank the authors for open-sourcing their code!
Please consider citing our paper if you find our code, data, or models useful. Thank you!
@inproceedings{liang-bhardwaj-zhou-2024-discl,
title = "Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion",
author = "Liang, Yijun and Bhardwaj, Shweta and Zhou, Tianyi",
booktitle = "arXiv:2410.xxxxx",
year = "2024",
}