IMAGE_GEN.md
Table of Contents

Glossary

for total newbies

  • Prompt: A simple (or complex!) text description of what the image should portray. This is affected by the prompt weight (see below).
  • txt2img (text-to-image): This is basically what we think of in terms of AI art: input a text prompt, generate an image.
  • Negative prompt: Anything you don’t want to see in the final image.
  • img2img (image-to-image): Instead of generating a scene from scratch, you can upload an image and use it as inspiration for the output image. Want to turn your dog into a king? Upload the dog’s photo, then apply AI art generation on top of it.
  • Model: AI uses different generative models (Stable Diffusion 1.5 or 2.1 are the most common, though there are many others like DALL-E 2 and Midjourney’s custom model) and each model will bring its own “look” to a scene. Experiment and see what works!
  • Prompt weight: How closely the generated image adheres to the prompt. This is one variable you may want to tweak on the sites that allow it. Simply put, a strong prompt weight won’t allow for much creativity by the AI algorithm, while a weak weight will. (In the code sketch after this list, this roughly corresponds to guidance_scale.)
  • Sampler: Probably nothing you need to worry about, though different samplers also affect the look of an image.
  • Steps: How many iterations an AI art generator takes to construct an image, generally improving the output. While many services will allow you to adjust this, a general rule of thumb is that anything over 50 steps offers diminishing improvements. One user uploaded a visual comparison of how steps and samplers affect the resulting image.
  • Face fixing: Some sites offer the ability to “fix” faces using algorithms like GFPGAN, which can make portraits look more lifelike.
  • ControlNet: A new algorithm, and not widely used. ControlNet is specifically designed for image-to-image generation, “locking” aspects of the original image so they can’t be changed. If you have an image of a black cat and want to change it to a calico, ControlNet could be used to preserve the original pose, simply changing the color.
  • Upscaling: Default images are usually small and square, often 1,024×1,024, though not always. Though upscaling often “costs” more in terms of time and computing resources, it is one way to get a “big” image that you can use for purposes beyond showing off to your friends on social media.
  • Inpainting: This is a rather interesting form of image editing. Inpainting is basically Photoshop plus AI: you take an image, highlight a specific area, and then alter that area using AI. (Alternatively, you can edit everything but the highlighted area.) Imagine uploading a photo of your father, “inpainting” the area where his hair is, and then adding a crown or a clown’s wig with AI.
  • Outpainting: This uses AI to expand the bounds of the scene. Imagine you just have a small photo, shot on a beach in Italy. You could use outpainting to “expand” the shot, adding more of the (AI-generated) beach, perhaps a few birds or a distant building. It’s not something you’d normally think of!
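
Most of the knobs above map directly onto parameters in the Hugging Face diffusers library. A minimal txt2img sketch, assuming diffusers is installed and a CUDA GPU is available (the checkpoint, prompt, and file names are just examples):

```python
# Sketch: txt2img with Hugging Face diffusers, wiring up the glossary's knobs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a corgi wearing a crown, concept art, detailed",
    negative_prompt="ugly, disfigured, blurry",  # things you don't want to see
    num_inference_steps=50,   # "steps": diminishing returns past ~50
    guidance_scale=7.5,       # roughly the "prompt weight" knob
).images[0]
image.save("corgi_king.png")
```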

good reads

SD vs DallE vs MJ

July 2023: compare models: https://zoo.replicate.dev/

June 2023: https://news.ycombinator.com/item?id=36407272

Banned from DALL-E, so switched to SD: https://twitter.com/almost_digital/status/1556216820788609025?s=20&t=GCU5prherJvKebRrv9urdw

https://i.redd.it/fqgv82ihav9a1.png (but keep in mind that DALL-E 2 doesn't respond well to "photorealistic")

another comparison https://www.reddit.com/r/StableDiffusion/comments/zevuw2/a_simple_comparison_between_sd_15_20_21_and/

comparisons with other models https://www.reddit.com/r/StableDiffusion/comments/zlvrl6/i_tried_various_models_with_the_same_settings/

Lexica Aperture, a finetuned version of SD: https://lexica.art/aperture. Fast, focused on photorealistic portraits and landscapes, and supports negative prompting and custom dimensions.

midjourney

Midjourney v5

nice trick to mix images https://twitter.com/javilopen/status/1613107083959738369

"midjourney style" - just feed "prompt" to it https://twitter.com/rainisto/status/1606221760189317122

or emojis: https://twitter.com/LinusEkenstam/status/1616841985599365120

DallE 3

DallE vs Imagen vs Parti architecture

DallE 3 writeup and links https://www.latent.space/p/sep-2023

DallE 3 paper and system card https://twitter.com/swyx/status/1715075287262597236

Runway Gen-1/2

usage example https://twitter.com/nickfloats/status/1639709828603084801?s=20

Gen1 explainer https://twitter.com/c_valenzuelab/status/1652282840971722754?s=20

other text to image models

Tooling

Misc

Products

product placement

Stable Diffusion prompts

The basic intuition of Stable Diffusion is that you have to add descriptors to get what you want.

From here:

"George Washington riding a Unicorn in Times Square"

image

George Washington riding a unicorn in Times Square, cinematic composition, concept art, digital illustration, detailed

image

Prompts might go in the form of

[Prefix] [Subject], [Enhancers]

Adding the right enhancers can really tweak the outcome:

image
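
For illustration, a tiny hypothetical helper that assembles prompts in this shape (nothing here comes from a real library):

```python
# Hypothetical helper: builds a "[Prefix] [Subject], [Enhancers]" prompt string.
def build_prompt(subject: str, prefix: str = "", enhancers: tuple = ()) -> str:
    prompt = f"{prefix} {subject}".strip()
    if enhancers:
        prompt += ", " + ", ".join(enhancers)
    return prompt

print(build_prompt(
    "George Washington riding a unicorn in Times Square",
    enhancers=("cinematic composition", "concept art", "digital illustration", "detailed"),
))
# George Washington riding a unicorn in Times Square, cinematic composition, concept art, digital illustration, detailed
```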

SD v2 prompts

SD2 Prompt Book from Stability: https://stability.ai/sdv2-prompt-book

SD 1.4 vs 1.5 comparisons

Distilled Stable Diffusion

SD2 vs SD1 user notes

Hardware requirements

Stable Diffusion

stable diffusion specific notes

Required reading:

SD Distros

SD Major forks and UIs

Main Stable Diffusion repo: https://github.com/CompVis/stable-diffusion

OpenJourney: https://happyaccidents.ai/, https://www.bluewillow.ai/

an embedded version of SD named Tiny Dream: https://github.com/symisc/tiny-dream, which lets you generate high-definition output images (2048x2048) in less than 10 seconds and consumes less than 5GB per inference, unlike this one, which takes 11 hours to generate a 512x512 output despite being memory efficient.

| Name/Link | Stars | Description |
| --- | --- | --- |
| AUTOMATIC1111 | 116000 | The most well-known web UI, gradio-based. Features: https://github.com/AUTOMATIC1111/stable-diffusion-webui#features. Launch announcement: https://www.reddit.com/r/StableDiffusion/comments/x28a76/stable_diffusion_web_ui/. M1 Mac instructions: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon |
| Fooocus | 33000 | Fooocus is a rethinking of Stable Diffusion and Midjourney’s designs: learned from Stable Diffusion, the software is offline, open source, and free; learned from Midjourney, manual tweaking is not needed, and users only need to focus on the prompts and images. |
| ComfyUI | 29000 | The up-and-comer GUI, flowchart-based. Features: https://github.com/comfyanonymous/ComfyUI#features. See https://comfyworkflows.com/ for the hosted site. |
| easydiffusion | 8500 | "Easy Diffusion is easily my favorite UI." While it has a fraction of the features found in stable-diffusion-webui, it has the best out-of-the-box UI I've tried so far. The way it enqueues tasks and renders the generated images beats anything I've seen in the various UIs I've played with. I also like that you can easily write plugins in JavaScript, both for the UI and for server-side tweaks. |
| Disco Diffusion | 7400 | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI art and animations. |
| sd-webui (formerly hlky fork) | 6000 | A fully-integrated and easy way to work with Stable Diffusion right from a browser window. Long list of UI and SD features (incl. textual inversion, alternative samplers, prompt matrix): https://github.com/sd-webui/stable-diffusion-webui#project-features |
| InvokeAI (formerly lstein fork) | 8800 | This version of Stable Diffusion features a slick WebGUI, an interactive command-line script that combines txt2img and img2img functionality in a "dream bot" style interface, and multiple other enhancements. It runs on Windows, Mac and Linux machines, with GPU cards with as little as 4 GB of RAM. Universal Canvas (see YouTube). |
| XavierXiao/Dreambooth-Stable-Diffusion | 4900 | Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion. Dockerized: https://github.com/smy20011/dreambooth-docker |
| Basujindal: Optimized Stable Diffusion | 2600 | A modified version of the Stable Diffusion repo, optimized to use less VRAM than the original by sacrificing inference speed. img2img, txt2img and inpainting under 2.4GB VRAM. |
| stablediffusion-infinity | 2800 | Outpainting with Stable Diffusion on an infinite canvas. This project mainly works as a proof of concept. |
| Waifu Diffusion (huggingface, replicate) | 1600 | Stable Diffusion finetuned on anime/manga: "a model trained on danbooru (anime/manga drawing site with also lewds and nsfw on it) over 56k images. Produces FAR BETTER results if you're interested in getting manga and anime stuff out of stable diffusion." |
| AbdBarho/stable-diffusion-webui-docker | 1600 | Easy Docker setup for Stable Diffusion with both AUTOMATIC1111 and hlky UIs included. HOWEVER, no Mac support yet: AbdBarho/stable-diffusion-webui-docker#35 |
| fast-stable-diffusion | 3200 | +25-50% speed increase, memory efficient, DreamBooth. |
| nolibox/carefree-creator | 1800 | An infinite drawing board to save, review and edit all your creations. Almost every Stable Diffusion feature (txt2img, img2img, sketch2img, variations, outpainting, circular/tiling textures, sharing, ...), many useful image-editing methods (super resolution, inpainting, ...), and integrations of different Stable Diffusion versions (Waifu Diffusion, ...). GPU RAM optimizations make it possible to enjoy these features on an NVIDIA GeForce GTX 1080 Ti. It might be fair to consider this as: an AI-powered, open-source Figma; a more "interactable" Hugging Face Space; a place where you can try all the exciting and cutting-edge models together. |
| imaginAIry 🤖🧠 | 1600 | Pythonic generation of Stable Diffusion images with just `pip install imaginairy`. "Just works" on Linux and macOS (M1) (and maybe Windows). Memory-efficiency improvements, prompt-based editing, face enhancement, upscaling, tiled images, img2img, prompt matrices, prompt variables, BLIP image captions; comes with Dockerfile/Colab. Has unit tests. |
| neonsecret/stable-diffusion | 582 | A modified version of the Stable Diffusion repo, optimized to use less VRAM than the original by sacrificing inference speed. Also introduces the sliced attention technique, which pushes the model's abilities even further: it automatically determines the slice size from your VRAM and image size, then allocates slices one by one accordingly. You can generate practically any image size; it just depends on the generation speed you are willing to sacrifice. |
| Deforum Stable Diffusion | 591 | Animating prompts with Stable Diffusion. Weighted prompts, perspective 2D flipping, dynamic video masking, custom MATH expressions, Waifu and Robo Diffusion models. Twitter, changelog. Replicate demo: https://replicate.com/deforum/deforum_stable_diffusion |
| Maple Diffusion | 550 | Maple Diffusion runs Stable Diffusion models locally on macOS/iOS devices, in Swift, using the MPSGraph framework (not Python). Matt Waller is working on a CoreML implementation. |
| Doggettx/stable-diffusion | 158 | Allows resolutions that would require up to 64x more VRAM than possible on the default CompVis build. |
| Doohickey Diffusion | 29 | CLIP guidance, perceptual guidance, Perlin initial noise, and other features. |

Vid2Vid script: https://github.com/Filarius/stable-diffusion-webui/blob/master/scripts/vid2vid.py

akuma.ai https://x.com/AkumaAI_JP/status/1734899981583348067?s=20

Future Diffusion https://huggingface.co/nitrosocke/Future-Diffusion https://twitter.com/Nitrosocke/status/1599789199766716418

SD in Other languages

Other Lists of Forks

SD Model search and ratings: https://civitai.com/

Dormant projects, for historical/research interest:

Misc SD UIs

UIs that don't come with their own SD distro, just shelling out to one

| UI Name/Link | Stars | Self-Description |
| --- | --- | --- |
| ahrm/UnstableFusion | 815 | UnstableFusion is a desktop frontend for Stable Diffusion which combines image generation, inpainting, img2img and other image editing operations into a seamless workflow. https://www.youtube.com/watch?v=XLOhizAnSfQ&t=1s |
| stable-diffusion-2-gui | 262 | Lightweight Stable Diffusion v2.1 web UI: txt2img, img2img, depth2img, inpaint and upscale4x. |
| breadthe/sd-buddy | 165 | Companion desktop app for the self-hosted M1 Mac version of Stable Diffusion, built with Svelte and Tauri. |
| leszekhanusz/diffusion-ui | 65 | A web interface frontend for the generation of images using diffusion models. The goal is to provide an interface to online and offline backends doing image generation and inpainting, like Stable Diffusion. |
| GenerationQ | 21 | GenerationQ (for "image generation queue") is a cross-platform desktop application designed to provide a general-purpose GUI for generating images via text2img and img2img models. Its primary target is Stable Diffusion, but since there is such a variety of forked programs with their own particularities, the UI for configuring image generation tasks is designed to be generic enough to accommodate just about any script (even non-SD models). |

SD Prompt galleries and search engines

SD Visual search

SD Prompt generators

Img2prompt - Reverse Prompt Engineering

Explore Artists, styles, and modifiers

See https://github.com/sw-yx/prompt-eng/blob/main/PROMPTS.md for more details and notes

SD Prompt Tools directories and guides

Finetuning/Dreambooth

How to finetune

Now there's LoRA: https://github.com/cloneofsimo/lora (a loading sketch follows)
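
A hedged sketch of applying a trained LoRA on top of a base checkpoint with diffusers (requires a recent diffusers version and a diffusers-compatible LoRA file; the path and trigger word below are hypothetical):

```python
# Sketch: loading a LoRA over a base SD checkpoint with diffusers.
# Assumes a diffusers-compatible LoRA; path and trigger word are hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/my_style_lora.safetensors")

image = pipe("a portrait in mystyle", num_inference_steps=30).images[0]
image.save("lora_portrait.png")
```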

Stable Diffusion + Midjourney

Embeddings/Textual Inversion

Dreambooth

Trained examples

ControlNet
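
A hedged sketch of ControlNet conditioning with diffusers, using the canny-edge ControlNet (the checkpoints are real Hugging Face IDs; the input file name is hypothetical). Following the glossary's example, the outline of a black cat is locked while the prompt recolors it:

```python
# Sketch: ControlNet locks structure (here, canny edges) while the prompt changes content.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edges = load_image("black_cat_canny.png")  # edge map extracted from the original photo
image = pipe(
    "a calico cat",          # new content
    image=edges,             # structure to preserve
    num_inference_steps=30,
).images[0]
image.save("calico_cat.png")
```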

SD Tooling

How SD Works - Internals and Studies

SD Results

Img2Img
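
A minimal img2img sketch with diffusers, echoing the glossary's dog-to-king example (the input file name and prompt are just examples):

```python
# Sketch: img2img starts from an existing photo instead of pure noise.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = load_image("my_dog.jpg")  # hypothetical input photo
image = pipe(
    "a regal king on a golden throne, oil painting",
    image=init,
    strength=0.6,  # lower values stay closer to the input image
).images[0]
image.save("dog_king.png")
```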

InstructPix2Pix

  • https://www.timothybrooks.com/instruct-pix2pix (a diffusers sketch follows this list)
  • Pix2Pixzero - https://pix2pixzero.github.io/
    • We propose pix2pix-zero, a diffusion-based image-to-image approach that allows users to specify the edit direction on-the-fly (e.g., cat to dog). Our method can directly use pre-trained text-to-image diffusion models, such as Stable Diffusion, for editing real and synthetic images while preserving the input image's structure. Our method is training-free and prompt-free, as it requires neither manual text prompting for each input image nor costly fine-tuning for each task.
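
A minimal sketch of running the released InstructPix2Pix checkpoint via diffusers (the input file and edit instruction are examples):

```python
# Sketch: InstructPix2Pix edits an image by following a natural-language instruction.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

photo = load_image("cat.png")  # hypothetical input
edited = pipe(
    "turn the cat into a dog",
    image=photo,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly to stay faithful to the input image
).images[0]
edited.save("dog.png")
```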

Extremely detailed prompt examples

Solving Hands

  • Negative prompts: ugly, disfigured, too many fingers, too many arms, too many legs, too many hands

Midjourney prompts

Misc