
Releases: ExponentialML/Text-To-Video-Finetuning

Update 2023-12-14

14 Dec 21:41
d09d52d

First of all, a note from me. Thank you all for your support and feedback, and for joining me on the journey of discovering the nascent potential of video Diffusion Models.

@damo-vilab (the creators of ModelScope, among other projects) has released an official repository for finetuning all things Video Diffusion Models, and I recommend their implementations over this repository.
https://github.com/damo-vilab/i2vgen-xl

(Attached demo video: 62e33a713e863650.mp4)

This repository will no longer be updated; instead, it will remain archived for researchers and builders who wish to bootstrap their projects.
I will be leaving the issues, pull requests, and all related material up for posterity.

Thanks again!

Text To Video Finetuning v3

12 Jul 20:29
f35881d

New Release with some exciting features and bug fixes!

Changes

  • Added rescale_schedule to the config as an alternative to offset noise, following https://arxiv.org/abs/2305.08891 (see the example config after this list).

  • Set a default dropout of 0.1 on all temporal convolution layers.

  • Added support for training LoRA models for use with the text2video A1111 extension. Enable it with lora_version: "stable_lora" in the config.

  • Added the ability to choose different Accelerate loggers.

  • Downgraded Accelerate to version 0.19 to prevent model checkpoint saving issues.

  • Merged multiple contributions to inference.py for stability and ease of use. Thanks @bruefire, @JCBrouwer, and @bfasenfest!
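
A minimal config fragment showing how the new options might sit together. Only rescale_schedule and lora_version: "stable_lora" are named in this release; the remaining keys are illustrative placeholders, not the repository's actual schema:

    # Hypothetical training config excerpt (YAML).
    pretrained_model_path: "./models/model_scope_diffusers"  # placeholder path
    rescale_schedule: True        # alternative to offset noise (arXiv:2305.08891)
    lora_version: "stable_lora"   # save LoRA for the A1111 text2video extension
    logger_type: "tensorboard"    # placeholder: which Accelerate logger to use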

Add Full LoRA Training

11 Apr 02:05
053b517

What's New

  • LoRA training based on cloneofsimo's repository.
  • Added a LoraInjectedConv3d module (see the sketch after this list). 🎥
  • Added a config for LoRA-only training.
  • Added an option to save LoRA weights for the UNet & Text Encoder.
  • Fixed checkpointing of model files during training.
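
The module's source isn't shown in these notes, but here is a minimal sketch of what a LoRA-injected 3D convolution can look like, following the LoraInjectedConv2d pattern from cloneofsimo's lora repository (the constructor arguments and init choices are assumptions, not the repository's exact code):

    import torch.nn as nn

    class LoraInjectedConv3d(nn.Module):
        # Frozen Conv3d plus a trainable low-rank branch (sketch).
        def __init__(self, in_channels, out_channels, kernel_size,
                     padding=0, rank=4, scale=1.0):
            super().__init__()
            # Base convolution; its weights come from the pretrained UNet
            # and stay frozen during LoRA training.
            self.conv = nn.Conv3d(in_channels, out_channels, kernel_size,
                                  padding=padding)
            # Low-rank adapter: project down to rank channels, then back
            # up with a 1x1x1 convolution.
            self.lora_down = nn.Conv3d(in_channels, rank, kernel_size,
                                       padding=padding, bias=False)
            self.lora_up = nn.Conv3d(rank, out_channels, 1, bias=False)
            self.scale = scale
            # Zero-init the up projection so the adapter starts as a no-op.
            nn.init.normal_(self.lora_down.weight, std=1.0 / rank)
            nn.init.zeros_(self.lora_up.weight)

        def forward(self, x):
            return self.conv(x) + self.lora_up(self.lora_down(x)) * self.scale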

Text To Video Finetuning v2

09 Apr 00:43
9c85d2d

Changes and Updates

  • Added a high quality VRAM config.
  • Added text encoder training.
  • Allowed training on lower VRAM systems.
  • Allowed single image training.
  • Added training with image captions.
  • Added training with video captions in a folder.
  • Added gradient checkpointing support.
  • Added time-agnostic training.
  • Added aspect ratio bucketing.
  • Added hybrid LoRA for training.
  • Added latent VAE caching (see the sketch after this list).
  • Added optimizer-agnostic settings in the config.
  • Souped up the UNet finetuner for readability and efficiency.
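
Latent VAE caching isn't spelled out here, but the idea is to run each training video through the frozen VAE once and reuse the latents every epoch instead of re-encoding. A sketch, assuming a diffusers-style AutoencoderKL and a dataloader that yields (video, caption) batches (both assumptions, not the repository's exact interfaces):

    import torch

    @torch.no_grad()
    def cache_video_latents(vae, dataloader, device="cuda"):
        # Encode every clip once; later epochs read from this cache and
        # skip the expensive VAE encode step.
        cache = []
        vae.eval()
        for video, caption in dataloader:
            # Assumed layout: (batch, frames, channels, height, width).
            b, f, c, h, w = video.shape
            frames = video.reshape(b * f, c, h, w).to(device)
            latents = vae.encode(frames).latent_dist.sample() * 0.18215
            # Restore the frame dimension and park the latents on the CPU.
            latents = latents.reshape(b, f, *latents.shape[1:]).cpu()
            cache.append((latents, caption))
        return cache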