IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

Introduction

We present IP-Adapter, an effective and lightweight adapter that adds image prompt capability to pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve comparable or even better performance than a fine-tuned image prompt model. IP-Adapter generalizes not only to other custom models fine-tuned from the same base model, but also to controllable generation with existing controllable tools. Moreover, the image prompt also works well together with a text prompt to accomplish multimodal image generation.

Release

[2023/8/30] 🔥 Add an IP-Adapter with face image as prompt. The demo is here.
[2023/8/29] 🔥 Release the training code.
[2023/8/23] 🔥 Add code and models of IP-Adapter with fine-grained features. The demo is here.
[2023/8/18] 🔥 Add code and models for SDXL 1.0. The demo is here.
[2023/8/16] 🔥 We release the code and models.

Dependencies

diffusers >= 0.19.3
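
To confirm that an environment satisfies this requirement before running the demos, a small check along the following lines can help. This is only a sketch: the helper names (`parse_version`, `meets_minimum`, `check_diffusers`) are hypothetical and not part of this repository, and the comparison looks only at the leading numeric parts of the version string, so pre-release suffixes are ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version stated in the Dependencies section.
MIN_DIFFUSERS = "0.19.3"


def parse_version(text):
    """Parse the leading numeric parts of a version, e.g. '0.19.3' -> (0, 19, 3)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component such as 'dev0'.
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_DIFFUSERS):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_diffusers():
    """Return True if diffusers is installed and new enough (hypothetical helper)."""
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed)
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "0.9.9" as newer than "0.19.3".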

Download Models

You can download models from here. To run the demo, you should also download the following models:

How to Use

ip_adapter_demo: image variations, image-to-image, and inpainting with image prompt.

ip_adapter_controlnet_demo: structural generation with image prompt.

ip_adapter_multimodal_prompts_demo: generation with multimodal prompts.

ip_adapter-plus_demo: the demo of IP-Adapter with fine-grained features.

ip_adapter-plus-face_demo: generation with face image as prompt.

Best Practice

If you only use the image prompt, you can set scale=1.0 and text_prompt="" (or a generic text prompt such as "best quality"; you can also use any negative text prompt). Lowering the scale produces more diverse images, but they may be less consistent with the image prompt.
For multimodal prompts, adjust the scale to get the best results. In most cases, scale=0.5 gives good results. For the SD 1.5 version, we recommend using community models to generate good images.
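
The effect of scale can be illustrated with a toy version of the decoupled cross-attention described in the IP-Adapter paper, where the output of a separate image-prompt attention branch is added to the output of the text branch with a weight. The sketch below runs in pure Python on tiny lists and assumes that scale plays the role of that weight. The function names are hypothetical, and this is not the repository's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]


def decoupled_cross_attention(query, text_kv, image_kv, scale):
    """Add the image branch to the text branch, weighted by scale.

    text_kv and image_kv are (keys, values) pairs. With scale=0.0 the
    result is the text branch alone; larger values give the image
    prompt more influence.
    """
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + scale * i for t, i in zip(text_out, image_out)]
```

Under this assumption, scale=1.0 adds the image branch at full weight, while scale=0.5 halves its contribution so that the text prompt has relatively more say in the result.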

How to Train

For training, you should install accelerate and prepare your own dataset as a JSON file.

accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
  tutorial_train.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --image_encoder_path="{image_encoder_path}" \
  --data_json_file="{data.json}" \
  --data_root_path="{image_path}" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=8 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="{output_dir}" \
  --save_steps=10000
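
The file passed to --data_json_file can be generated with a few lines of Python. The sketch below assumes that the training script expects a JSON list of records with an "image_file" key (a path relative to --data_root_path) and a "text" key (the caption). That layout is an assumption and should be checked against the dataset class in tutorial_train.py. The helper names are hypothetical.

```python
import json
from pathlib import Path


def build_records(pairs):
    """Turn (image_file, caption) pairs into a list of dataset records.

    The key names "image_file" and "text" are an assumed layout; adjust
    them to whatever the training script actually reads.
    """
    return [
        {"image_file": image_file, "text": caption}
        for image_file, caption in pairs
    ]


def write_data_json(pairs, json_path):
    """Write the records to json_path and return them."""
    records = build_records(pairs)
    Path(json_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return records
```

For example, write_data_json([("1.png", "A dog")], "data.json") would write a one-record file whose path can then be supplied as --data_json_file.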

Citation

If you find IP-Adapter useful for your research and applications, please cite using this BibTeX:

@article{ye2023ip-adapter,
  title={IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models},
  author={Ye, Hu and Zhang, Jun and Liu, Sibo and Han, Xiao and Yang, Wei},
  journal={arXiv preprint arXiv:2308.06721},
  year={2023}
}
