For my detailed observations and analysis follow this word document - Dissecting Image Generation.docx
This project focuses on image generation with Stable Diffusion and ControlNet, conditioned on depth maps and Canny edges. The objective is to evaluate these conditioning techniques and identify the configurations that produce the best output images. The project also explores the impact of different aspect ratios and measures generation latency.
- Python 3.9 or later
- PyTorch 1.11.0+
- Transformers (for ControlNet)
- Diffusers
- OpenCV
- Matplotlib
- scikit-image (skimage)
# Install PyTorch and torchvision (for GPU version, make sure CUDA is installed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install Diffusers for Stable Diffusion and ControlNet
pip install diffusers transformers accelerate
pip install timm
# Install PIL (Pillow) for image processing
pip install pillow
# Install OpenCV for Canny edge detection
pip install opencv-python
# Install Matplotlib (optional, for plotting images)
pip install matplotlib
- ControlNet Model:
lllyasviel/control_v11f1p_sd15_depth
- Stable Diffusion Checkpoint:
runwayml/stable-diffusion-v1-5
For this task, I used the provided depth maps and applied various conditioning techniques such as Canny edges to enhance the output. I experimented with different configurations to generate the "best" possible images.
- The number of inference steps significantly impacts both the quality and time taken to generate images.
- 25 steps: Provided faster results, but the image quality was lower than with 50 or 100 steps.
- 50 steps: Achieved a balance between speed and image quality.
- 100 steps: Produced the most detailed images but required much longer generation times.
- Images generated at 25, 50, and 100 steps, for each of the following prompts:
  - "beautiful landscape, mountains in the background."
  - "luxurious bedroom interior."
  - "room with chair."
  - "house in the forest."
In this task, I explored the impact of aspect ratio on image quality by generating images in 1:1 and 4:3 aspect ratios.
- The depth map image nocrop.png was resized to both 1:1 and 4:3 aspect ratios.
- I also cropped the original image to these aspect ratios to compare the visual differences between resizing and cropping.
- 1:1 Aspect Ratio: Maintains a balanced composition, but resizing may lead to distortion in some regions.
- 4:3 Aspect Ratio: Provides a wider field of view but introduces some stretching when resized. Cropping yielded better results for preserving the visual quality.
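The resize-versus-crop comparison can be reproduced with Pillow. A minimal sketch; a blank synthetic image stands in for nocrop.png so the snippet is self-contained, and the target sizes are illustrative:

```python
from PIL import Image

def resize_to(img: Image.Image, w: int, h: int) -> Image.Image:
    """Rescale to an exact size; may distort if the aspect ratio changes."""
    return img.resize((w, h), Image.LANCZOS)

def center_crop_to(img: Image.Image, ratio_w: int, ratio_h: int) -> Image.Image:
    """Crop the largest centered window with the target aspect ratio (no distortion)."""
    w, h = img.size
    target = ratio_w / ratio_h
    if w / h > target:                       # too wide: trim width
        new_w, new_h = int(h * target), h
    else:                                    # too tall: trim height
        new_w, new_h = w, int(w / target)
    left, top = (w - new_w) // 2, (h - new_h) // 2
    return img.crop((left, top, left + new_w, top + new_h))

img = Image.new("L", (800, 600))             # stand-in for the nocrop.png depth map
square = center_crop_to(img, 1, 1)           # 1:1 by cropping
resized = resize_to(img, 512, 512)           # 1:1 by resizing (distorts a 4:3 source)
```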
This task evaluates the time taken to generate images and explores ways to reduce latency.
- 25 steps: Faster but lower-quality images.
- 50 steps: Provides a balance between speed and quality.
- 100 steps: Best image quality, but the generation time is significantly longer.
- Model Quantization: By converting the model to INT8 precision, we can speed up inference without significantly compromising image quality.
- Scheduler Tuning: We experimented with different schedulers (DDIM, LMS, Euler) to reduce inference time.
- Low-Resolution Images: Reducing image resolution (e.g., 256x256) can decrease the overall generation time.
Image generation (25 steps) took 5.42 seconds.
Image generation (50 steps) took 10.82 seconds.
Image generation (100 steps) took 20.57 seconds.
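Timings like the ones above can be collected with a simple wall-clock wrapper. A minimal sketch; the `time.sleep` call is a placeholder standing in for the actual pipeline call:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and report its wall-clock latency in seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Image generation took {elapsed:.2f} seconds.")
    return result, elapsed

# Usage (placeholder standing in for pipe(prompt, image=depth_map, ...)):
_, secs = timed(time.sleep, 0.05)
```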
- Depth Map vs Depth Map + Canny Edges:
- The combination of depth maps and Canny edges provides sharper, more detailed images compared to using depth maps alone.
- Inference Steps:
- Higher inference steps (50 or 100) provide better quality, but with a significant increase in generation time.
- Aspect Ratio Differences:
- 1:1 vs 4:3 aspect ratios produced different compositions. The 1:1 aspect ratio provided a more balanced image, while 4:3 gave a broader view.
- Resized vs Cropped:
- Cropped images maintained visual quality better than resized images.
- Latency Optimization:
- Reducing inference steps and lowering image resolution both cut generation time, with only a slight impact on image quality.
The project demonstrates how depth maps and Canny edges can be effectively used to guide image generation with Stable Diffusion and ControlNet. Higher inference steps produce better quality images, but they also significantly increase the generation time. Using techniques like INT8 quantization and reducing image resolution can optimize the image generation process while still maintaining acceptable quality. Resizing and cropping images to different aspect ratios also provided interesting insights on composition and quality.