SD-CN-Animation

This project allows you to automate the video stylization task using StableDiffusion and ControlNet. In contrast to other current text2video methods, it can also generate completely new videos from text at any resolution and length, using any Stable Diffusion model as a backbone, including custom ones. It uses the 'RAFT' optical flow estimation algorithm to keep the animation stable and to create an inpainting mask that is used to generate the next frame. In text-to-video mode it relies on the 'FloweR' method (work in progress), which predicts optical flow from the previous frames.

Video to Video Examples:

Original video | "Jessica Chastain" | "Watercolor painting"

The examples presented here were generated at 1024x576 resolution using the 'realisticVisionV13_v13' model as a base. They were cropped, downsized, and compressed for faster loading. You can see them in their original quality in the 'examples' folder.

Text to Video Examples:

"close up of a flower" "bonfire near the camp in the mountains at night" "close up of a diamond laying on the table"
"close up of macaroni on the plate" "close up of golden sphere" "a tree standing in the winter forest"

All the examples shown here were originally generated at 512x512 resolution using the 'sd-v1-5-inpainting' model as a base. They were downsized and compressed for faster loading. You can see them in their original quality in the 'examples' folder. The actual prompts followed the format "RAW photo, {subject}, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"; only the 'subject' part is shown in the table above.
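For example, assembling the full prompt for one of the subjects above looks like this in Python (the subject value is just an illustration):

subject = "close up of a flower"  # any of the subjects from the table above
PROMPT = f"RAW photo, {subject}, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"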

Dependencies

To install all the necessary dependencies, run this command:

pip install opencv-python opencv-contrib-python numpy tqdm h5py scikit-image

You also have to set up the RAFT repository as described here: https://github.com/princeton-vl/RAFT . Basically, it comes down to running "./download_models.sh" in the RAFT folder to download the models.

Running the scripts

These scripts work on top of the Automatic1111/web-ui interface via its API, so you have to set that up first. You should also have the sd-webui-controlnet extension installed, along with the control_hed-fp16 model. Once web-ui and ControlNet are working correctly, you also have to allow the API to control ControlNet. To do so, go to the web-ui settings -> ControlNet tab -> enable the "Allow other script to control this extension" checkbox, set "Multi ControlNet: Max models amount (requires restart)" to more than 2 -> press "Apply settings".
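For reference, here is a minimal sketch of the kind of API request the scripts send: an img2img call with a ControlNet HED unit. It follows the standard /sdapi/v1/img2img endpoint and the ControlNet extension's alwayson_scripts payload; the exact payload built by vid2vid.py may differ, and the local URL, file paths, and example values below are assumptions.

import base64
import requests

URL = "http://127.0.0.1:7860"  # assumed local web-ui address

def b64(path):
    # Encode an image file as base64 so it can be sent through the API.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "init_images": [b64("frame.png")],   # current frame, warped by the optical flow
    "mask": b64("mask.png"),             # inpainting mask for occluded regions
    "prompt": "RAW photo, a watercolor painting of a woman, 8k uhd, dslr, soft lighting",
    "negative_prompt": "blur, blurred, low quality",
    "width": 1024,
    "height": 576,
    "denoising_strength": 0.75,
    "alwayson_scripts": {                # ControlNet unit passed through the API
        "controlnet": {
            "args": [{"module": "hed", "model": "control_hed-fp16"}]
        }
    },
}

result = requests.post(f"{URL}/sdapi/v1/img2img", json=payload).json()
stylized_b64 = result["images"][0]       # stylized frame, base64-encoded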

Video To Video

Step 1.

To process a video, you first need to precompute the optical flow data before running web-ui, with this command:

python3 compute_flow.py -i "path to your video" -o "path to output file with *.h5 format" -v -W width_of_the_flow_map -H height_of_the_flow_map

The main reason to do this step separately is to save precious GPU memory that is better spent generating higher quality images. Choose the W and H parameters as high as your GPU can handle, keeping the proportions of the original video resolution. Do not worry if they are higher or lower than the processing resolution; flow maps will be scaled accordingly at the processing stage. This will generate quite a large file that may take up to several gigabytes on the drive, even for a minute-long video. If you want to process a long video, consider splitting it into several parts beforehand.
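Once the flow file is written, you can sanity-check it with h5py before moving on. This is only a generic inspection snippet; the path is a placeholder and the dataset layout inside the file depends on compute_flow.py, so treat the names it prints as the source of truth.

import h5py

# Inspect the flow file produced by compute_flow.py ("flow.h5" is a placeholder path).
with h5py.File("flow.h5", "r") as f:
    def describe(name, obj):
        # Datasets report their shape/dtype; groups are just listed by name.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
        else:
            print(name)
    f.visititems(describe)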

Step 2.

Run web-ui with the '--api' flag. It is also better to use the '--xformers' flag: you will want the highest resolution possible, and the xformers memory optimization helps greatly.

bash webui.sh --xformers --api

Step 3.

Go to the vid2vid.py file and change the main parameters (INPUT_VIDEO, FLOW_MAPS, OUTPUT_VIDEO, PROMPT, N_PROMPT, W, H) to the ones you need for your project; an example of this parameter block is shown below. The FLOW_MAPS parameter should contain the path to the flow file you generated in the first step. The script is pretty simple, so you may change other parameters as well, although I would recommend leaving them as they are the first time. Finally, run the script with the command:

python3 vid2vid.py
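As a reference point, the parameter block at the top of vid2vid.py looks roughly like this. The parameter names come from the list above, while all values are placeholders to replace with your own.

# Example values for the main parameters of vid2vid.py (placeholders, adjust to your project).
INPUT_VIDEO = "videos/input.mp4"        # source video to stylize
FLOW_MAPS = "videos/input_flow.h5"      # flow file generated by compute_flow.py in Step 1
OUTPUT_VIDEO = "videos/output.mp4"      # where the resulting video is written
PROMPT = "RAW photo, a watercolor painting of a woman, 8k uhd, dslr, soft lighting, high quality"
N_PROMPT = "blur, blurred, low quality, deformed, ugly"
W = 1024                                # processing width
H = 576                                 # processing height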

Text To Video

This method is still in development and works on top of 'Stable Diffusion' and 'FloweR', an optical flow reconstruction method that is itself at an early development stage. Do not expect much from it, as it is more a proof of concept than a complete solution.

Step 1.

Download the 'FloweR_0.1.pth' model from here: Google drive link, and place it in the 'FloweR' folder.

Step 2.

As in the vid2vid case, run web-ui with the '--api' flag. It is also better to use the '--xformers' flag: you will want the highest resolution possible, and the xformers memory optimization helps greatly.

bash webui.sh --xformers --api

Step 3.

Go to the txt2vid.py file and change the main parameters (OUTPUT_VIDEO, PROMPT, N_PROMPT, W, H) to the ones you need for your project; an example parameter block is shown below. Again, the script is simple, so you may change other parameters if you want to. Finally, run the script with the command:

python3 txt2vid.py
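For reference, the parameter block at the top of txt2vid.py looks roughly like this; the names are the ones listed above and the values are placeholders.

# Example values for the main parameters of txt2vid.py (placeholders, adjust to your project).
OUTPUT_VIDEO = "videos/flower.mp4"      # where the generated video is written
PROMPT = "RAW photo, close up of a flower, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
N_PROMPT = "blur, blurred, low quality, deformed, ugly"
W = 512                                 # generation width
H = 512                                 # generation height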

Latest version changes: v0.5

  • Fixed an issue with the wrong direction of the optical flow applied to an image.
  • Added a text-to-video mode via the txt2vid.py script. Make sure to install the new dependencies for this script to work!
  • Added a threshold for the optical flow before processing the frame, to remove white noise that might appear, as suggested by @alexfredo.
  • Background removal at the flow computation stage, implemented by @CaptnSeraph; it should reduce the ghosting effect in most videos processed with the vid2vid script.

Licence

This repository can only be used for personal/research/non-commercial purposes. For commercial requests, please contact me directly at borsky.alexey@gmail.com.
