Finetune Stable Diffusion

base model，e.g.：

git lfs clone https://huggingface.co/runwayml/stable-diffusion-v1-5
or
git lfs clone https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/fp16

Dockerfile

./dk/Dockerfile

Dreambooth

Dreambooth 原理：

Finetune Stable Diffusion: Dreambooth

微调运行：

sh ./dk/train.sh dreambooth

推理运行：

  base_model = './weights/sd1-5_dreambooth/'
  
  docker run --rm -it -d --gpus '"device=3"' --name inference_diff\
                   -v /data/wgs/finetune_stable_diffusion:/home \
                   wgs-torch:control_diffusion \
                   sh -c "python -u /home/inference.py 1>>/home/log/inference.log 2>>/home/log/inference.err"

脚本参数说明：

--MODEL_NAME：base model
--INSTANCE_DIR：微调数据集
--CLASS_DIR：用以先验损失的图像，通常为 num_epochs * num_samples ，num_class_images设置生成的数量，通常200-300
--OUTPUT_DIR：模型输出路径
--with_prior_preservation，--prior_loss_weight=1.0：分别是使用先验知识保留和先验损失权重
# 如果你的数据样本比较少，那么可以使用这两个参数，可以提升训练效果，还可以防止过拟合（即生成的图片与训练的图片相似度过高）
--instance_prompt：微调注入新的概念prompt（注意，e.g.：kejisks dog）
--class_prompt：先验prompt，使用同一类的其他图像作为训练过程的一部分（注意，e.g.：dog）
--revision：精度
--mixed_precision：混合精度
--resolution：input feature map size
--gradient_accumulation_steps：梯度积累步骤
----lr_scheduler：可选项有constant, linear, cosine, cosine_with_restarts, cosine_with_hard_restarts
# 学习率调整策略，一般是constant，即不调整
--lr_warmup_steps，如果你使用的是constant，那么这个参数可以忽略，
# 如果使用其他的，那么这个参数可以设置为0，即不使用warmup
# 也可以设置为其他的值，比如1000，即在前1000个step中，学习率从0慢慢增加到learning_rate的值
# 一般不需要设置, 除非你的数据集很大，训练收敛很慢
--num_class_images：调用模型生成图像，提示词为 class_prompt，保存位置为 CLASS_DIR
--max_train_steps：训练的最大步数 max_train_steps  e.g.: len(imgs) * 100
--checkpointing_steps：多少steps保存ckpt
--use_8bit_adam：使用8bit
--enable_xformers_memory_efficient_attention：使用xformers

微调效果测试

测试：注入概念【柯基犬 --> kejisks】

instance_prompt="kejisks dog"
class_prompt="dog"

prompt = "a photo of kejisks dog"

prompt = "a photo of dog"

融入其它语境测试

prompt = "A kejisks dog with Eiffel Tower in the background"

prompt = "A kejisks dog in the mountain Fuji"

prompt = "There's a kejisks dog on the beach"

prompt = "A kejisks dog in the woods"

prompt = "A kejisks dog in the car"

prompt = "The kejisks dog is running in the water"

prompt = "A kejisks dog is playing with a yellow ball on the lawn"

prompt = "A kejisks dog with glasses"

prompt = "The child of a kejisks dog and a tiger"

prompt = "A kejisks dog is smile on the snowstorm"

通用能力测试

prompts = ['heavy traffic on the street', 
					 "Cute squirrel", 
					 "Church in the snow",
           "Ancient style moon lantern flower tree", 
           "Chicken in the woods", 
           "The worker under the umbrella"]

显存占用

float16 + mixed precision
8bit
xformers

Steps: 100%|██████████| 800/800 [15:54<00:00,  1.19s/it, loss=0.272, lr=1e-6]

测试中遇到的问题

结论先行：

只是更换新注入主题的背景，效果还可以，上述图例可证；
要对新注入主题本身进行合成，效果不理想，下述图例可证；
如果该主题对应类的合成效果不好，则该主题合成效果也不好，下述图例可证；

原因分析：

base model 学到的先验知识不足以与新注入的主题合成我们想要的东西；
对该主题过拟合，不足以泛化合成到我的想要的东西；
如果该主题的训练样本数不足，模型没有学到该主题的多样性知识，不能很好的与先验知识合成；

可尝试方案：

对于 1.：更换baseline；
对于 2. 3.：丰富该主题训练样本的多样性、合适的steps及lr、dreambooth+Lora；

测试出现问题e.g.：

prompts = ["A kejisks dog chewing on a bone", 
					 "A dog chewing on a bone"]

prompts = ["A kejisks dog with a helmet and a cape", 
					 "A dog with a helmet and a cape"]

prompt = ["A kejisks dog is diving", 
          "A dog is diving"]

LoRA

LoRA 原理

Finetune LLMs: LoRA

微调运行

sh ./dk/train.sh lora

推理运行

base_model = './stable-diffusion-v1-5/'
lora_path = './weights/sd1-5_lora/'

docker run --rm -it -d --gpus '"device=3"' --name inference_diff\
                   -v /data/wgs/finetune_stable_diffusion:/home \
                   wgs-torch:control_diffusion \
                   sh -c "python -u /home/inference.py 1>>/home/log/inference.log 2>>/home/log/inference.err"

##微调效果测试

训练 prompt：

Yang Mi
summer palace in spring
A good-looking girl in the virtual world
Zibo Barbecue

原始效果：

微调效果：

prompt = "Yang Mi"
# 学习人物需要更多的、多角度的清晰样本及epoch

prompt = "summer palace in spring"
# 样本质量和数量、epoch要多，佛香阁的细节才能学得更好。不过这前景都学到了确实是我没想到的。

prompt = "A good-looking girl in the virtual world"
# 我直接在C站上找的样本，图片质量高模型学的也好。

prompt = "Zibo Barbecue"
# 网友上传的照片都太杂了，有的还比较模糊，能学成这样已经不错了

通用能力测试

prompts = ['heavy traffic on the street', 
					 "Cute squirrel", 
					 "Church in the snow",
           "Ancient style moon lantern flower tree", 
           "Chicken in the woods", 
           "The worker under the umbrella"]

显存占用

微调后 LoRA 的权重大小仅 3.2M；
“虚拟女孩”、“淄博烧烤” 训练：
- batch_size=1
- mixed precision fp16
- 梯度累积=4
- epoch=2000

Steps: 100%|██████████| 58000/58000 [43:45:04<00:00,  2.72s/it, lr=8.54e-5, step_loss=0.467]

“杨幂”、“颐和园” 训练：
- batch_size=1
- mixed precision fp16
- 梯度累积=4
- epoch=3000

Steps: 100%|██████████| 78000/78000 [60:10:59<00:00,  2.78s/it, lr=8.54e-5, step_loss=0.494]

测试中遇到的问题

如果训练人物，应该：

像素质量高；
脸部最好无遮挡；
多角度照片，侧脸、本身、全身；；
样本量多；

我在网上找到杨幂，不是特别清楚，所以学起来需要更多的epoch和样本量，否则脸部会变形：

这些是20张杨幂的图片，epoch是2000的结果。上面的例子是50张杨幂照片、epoch是3000的结果，效果挺明显的。

如果训练风景：

像素质量高；
长焦照片必须要清楚、且样本量要多；
照片要聚焦到想要生成的风景上；
样本量多；

我在网上找的颐和园，也有一些不是很清楚的，长焦下的佛香阁背景虚化明显，导致学习效果不好，想要生成佛香阁+昆明湖的照片的话，需要更多的佛香阁样本量，否则佛香阁就不是很清楚：

Dreambooth + LoRA

ControlNet

Tools

如果有 torch 和 torchvision 版本不匹配，参考：https://pytorch.org/get-started/previous-versions/
Error caught was: No module named 'triton'，参考：https://blog.csdn.net/qq_42363032/article/details/130006537

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.idea		.idea
diffusers		diffusers
dk		dk
doc		doc
examples		examples
lora		lora
tools		tools
wandb		wandb
.dockerignore		.dockerignore
README.md		README.md
ckpt2hf.py		ckpt2hf.py
inference.py		inference.py
train_dreambooth.py		train_dreambooth.py
train_text_to_image_lora.py		train_text_to_image_lora.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Finetune Stable Diffusion

目录

Dockerfile

Dreambooth

微调效果测试

融入其它语境测试

通用能力测试

显存占用

测试中遇到的问题

LoRA

通用能力测试

显存占用

测试中遇到的问题

Dreambooth + LoRA

ControlNet

Tools

About

Releases

Packages

Languages

WGS-note/finetune_stable_diffusion

Folders and files

Latest commit

History

Repository files navigation

Finetune Stable Diffusion

目录

Dockerfile

Dreambooth

微调效果测试

融入其它语境测试

通用能力测试

显存占用

测试中遇到的问题

LoRA

通用能力测试

显存占用

测试中遇到的问题

Dreambooth + LoRA

ControlNet

Tools

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages