[Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`, `LinearAttentionProcessor`, `Flow-based DPM-solver` and so on. #9982

lawrence-cj · 2024-11-21T06:16:57Z

What does this PR do?

This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first to make text-to-image generation work in a 32x compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also includes several popular efficiency-related techniques, such as a DiT with a linear attention processor, and it uses a decoder-only LLM (Gemma-2B-IT) for low GPU requirements and fast speed.
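
For readers unfamiliar with linear attention, here is a minimal NumPy sketch of the ReLU-kernel formulation. It is an illustration only, not the `LinearAttentionProcessor` added in this PR: the function names, the single-head shapes, and the `eps` value are assumptions made for the example. It shows that computing `K^T V` first gives the same result as building the full N x N similarity matrix, while scaling linearly in the number of tokens.

```python
import numpy as np


def relu_linear_attention(q, k, v, eps=1e-6):
    """ReLU linear attention for a single head.

    q, k: (N, d) arrays; v: (N, d_v) array.
    K^T V is computed once and shared by every query, so the cost grows
    linearly with the number of tokens N.
    """
    q = np.maximum(q, 0.0)
    k = np.maximum(k, 0.0)
    kv = k.T @ v                 # (d, d_v) summary of all keys and values
    k_sum = k.sum(axis=0)        # (d,) normalizer shared by all queries
    numerator = q @ kv           # (N, d_v)
    denominator = q @ k_sum      # (N,)
    return numerator / (denominator[:, None] + eps)


def relu_quadratic_attention(q, k, v, eps=1e-6):
    """Same quantity computed through the explicit (N, N) similarity matrix."""
    q = np.maximum(q, 0.0)
    k = np.maximum(k, 0.0)
    scores = q @ k.T             # (N, N), quadratic in the number of tokens
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)
n_tokens, dim = 64, 8
q, k, v = (rng.standard_normal((n_tokens, dim)) for _ in range(3))

out_linear = relu_linear_attention(q, k, v)
out_quadratic = relu_quadratic_attention(q, k, v)
```

The two outputs agree up to floating-point error because matrix multiplication is associative; only the order of operations, and therefore the cost, differs.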

Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana

Core contributor of DC-AE:
work with @johnny_ez@163.com

We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu

Images are generated by `SanaPAGPipeline` with `FlowDPMSolverMultistepScheduler`.

# Conflicts: # src/diffusers/models/normalization.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

a-r-r-o-w · 2024-12-13T13:38:11Z

@lawrence-cj Awesome, tysm! I will complete the remaining docs and tests and merge soon!

lawrence-cj · 2024-12-13T13:40:26Z

Actually, we support BF16 here: https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers
@bghira, you can also host your model. It's cool.

bghira · 2024-12-13T13:45:17Z

yes, i am pointing this out merely for anyone who wants something to compare against. for my use case it is because i point the default model for simpletuner to this repo path and can more readily adjust config parameters (e.g. scheduler) without impacting other use cases or requiring downstream users to remember to pass the custom values in.

bghira · 2024-12-13T13:47:09Z

i think in the bf16 repository you have the fp32 weights as well; are those a fp32 copy of the bf16 compatible weights? if so that makes sense but otherwise it may confuse users that don't know to pass variant=bf16 in

lawrence-cj · 2024-12-13T14:29:07Z

> i think in the bf16 repository you have the fp32 weights as well; are those a fp32 copy of the bf16 compatible weights? if so that makes sense but otherwise it may confuse users that don't know to pass variant=bf16 in

Yes. It's just an FP32 copy of the BF16 weights, and I ran it successfully.
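
For anyone following along, a minimal sketch of loading the BF16 files explicitly might look like the following. The helper name `load_sana_bf16` is hypothetical; the repository path is the one linked above, and `variant` and `torch_dtype` are the standard `from_pretrained` arguments in diffusers. Calling the helper downloads the checkpoint and requires `torch` and a diffusers version that includes this PR.

```python
def load_sana_bf16(repo_id="Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers"):
    """Load the BF16 variant of the checkpoint and keep it in bfloat16.

    `load_sana_bf16` is a hypothetical helper written for this example.
    """
    # Imported lazily so that defining the helper does not need the heavy dependencies.
    import torch
    from diffusers import SanaPipeline

    return SanaPipeline.from_pretrained(
        repo_id,
        variant="bf16",               # select the bf16 weight files instead of the FP32 copy
        torch_dtype=torch.bfloat16,   # keep the loaded weights in bfloat16
    )
```

Usage would be `pipe = load_sana_bf16()`, followed by moving the pipeline to the target device as usual.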

bghira · 2024-12-13T18:47:02Z

without complex human instruction:

with:

is it possible there is something wrong with the CHI implementation here? it makes all images worse.

for example, with CHI enabled it's putting 508 tokens of input through the model instead of just 300 (206 from CHI plus the 300 prompt tokens (padded)), and i don't know why we need this many tokens. is it supposed to be 300 total?
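
To make the token arithmetic concrete, here is a toy sketch, not taken from the pipeline code. It prepends a 206-token instruction to a 300-token padded prompt and then shows one possible way to trim the result back to a fixed 300 positions: keep the first token and the last 299. The helper `select_fixed_window` and the stand-in token ids are hypothetical, and whether the pipeline does anything like this is exactly the open question here. Note that the naive sum is 506, while 508 is the count reported above; the thread does not say where the extra two tokens come from.

```python
def select_fixed_window(token_ids, max_len):
    """Keep the first token plus the last (max_len - 1) tokens.

    Hypothetical helper: one possible way to cap a sequence at max_len
    positions after an instruction prefix has been prepended.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(token_ids) <= max_len:
        return list(token_ids)
    tail = token_ids[len(token_ids) - (max_len - 1):]
    return [token_ids[0]] + list(tail)


chi_len, prompt_len = 206, 300  # counts quoted in the comment above

# Stand-in ids: 0 is a start token, -1 marks instruction tokens,
# positive values mark prompt (and padding) positions.
sequence = [0] + [-1] * (chi_len - 1) + list(range(1, prompt_len + 1))
trimmed = select_fixed_window(sequence, max_len=300)

total_tokens = len(sequence)   # instruction + prompt before trimming
kept_tokens = len(trimmed)     # fixed window after trimming
```

With this selection the instruction still influences the encoder's hidden states, but only 300 positions would be handed to the transformer.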

lawrence-cj · 2024-12-13T23:05:15Z

What’s your inference code? @bghira

bghira · 2024-12-14T02:06:45Z

we use `encode_prompt` via the pipeline to save the embed and then pass it back in at inference time so the text encoder can be unloaded first. other than this, just using the BF16 weights

lawrence-cj · 2024-12-14T03:14:46Z

What's your prompt? @bghira

a-r-r-o-w · 2024-12-15T15:08:22Z

@hlky Would you like to give the changes to schedulers here a review? I'm preparing to merge it shortly after I add the integration tests in the next hour since YiYi has approved and confirmed on Slack. I've tested all the normal models (not the multilingual ones) and they seem to work well (I did the conversions myself when testing, but for the integration tests, I will be using the remote checkpoints and match slices). I have not exhaustively tested all scheduler changes though - only DPMSolverMultistep and FlowMatchEulerDiscrete, but I think that should be okay since it is copied logic (from make fix-copies).
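
As a rough illustration of what "match slices" means in an integration test, the sketch below compares a small corner of an output image against stored reference values within a tolerance. It is a generic example under assumptions: the helper name `assert_slice_matches`, the 3x3 corner, the tolerance, and the random stand-in image are all hypothetical and are not the tests added in this PR.

```python
import numpy as np


def assert_slice_matches(image, expected_slice, atol=1e-3):
    """Compare the bottom-right 3x3 corner of the last channel to stored values.

    `image` is a (height, width, channels) array. Returns the maximum
    absolute difference so callers can log it.
    """
    actual = np.asarray(image)[-3:, -3:, -1].flatten()
    expected = np.asarray(expected_slice, dtype=actual.dtype)
    max_diff = float(np.abs(actual - expected).max())
    assert max_diff <= atol, f"max abs diff {max_diff:.2e} exceeds {atol:.0e}"
    return max_diff


# Stand-in for a generated image; a real test would run the pipeline
# on a remote checkpoint with a fixed seed instead.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3), dtype=np.float32)
expected_slice = image[-3:, -3:, -1].flatten().copy()

max_diff = assert_slice_matches(image, expected_slice)
```

Checking only a few pixels keeps the reference values small enough to store inline in the test file while still catching numerical regressions.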

HuggingFaceDocBuilderDev · 2024-12-15T15:19:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hlky

@a-r-r-o-w Scheduler changes look good, thanks

a-r-r-o-w

Thank you @lawrence-cj and team! The paper was very insightful and it was very cool to come across the ideas developed.

Thanks for bearing with our reviews too! Will merge the PR once the CI passes

lawrence-cj · 2024-12-17T07:44:35Z

Thank you so much for your effort! Love you guys. I was tied up with other things, sorry for the late reply!
@sayakpaul @a-r-r-o-w @bghira @yiyixuxu @hlky

lawrence-cj and others added 30 commits October 18, 2024 17:40

first add a script for DC-AE;

6e616a9

Merge remote-tracking branch 'upstream/main' into DC-AE

d2e187a

DC-AE init

90e8939

replace triton with custom implementation

825c975

1. rename file and remove un-used codes;

3a44fa4

no longer rely on omegaconf and dataclass

55b2615

merge

6fb7fdb

Merge remote-tracking branch 'upstream/main' into DC-AE

c323e76

replace custom activation with diffusers activation

da7caa5

remove dc_ae attention in attention_processor.py

fb6d92a

inherit from ModelMixin

5e63a1a

inherit from ConfigMixin

72cce2b

dc-ae reduce to one file

8f9b4e4

Merge remote-tracking branch 'upstream/main' into DC-AE

b7f68f9

Merge branch 'huggingface:main' into DC-AE

6d96b95

Merge remote-tracking branch 'refs/remotes/origin/main' into DC-AE

3c3cc51

# Conflicts: # src/diffusers/models/normalization.py

update downsample and upsample

1448681

merge

bf40fe8

clean code

dd7718a

support DecoderOutput

19986a5

Merge branch 'main' into DC-AE

3481e23

Merge branch 'main' into DC-AE

0e818df

remove get_same_padding and val2tuple

c6eb233

remove autocast and some assert

59de0a3

update ResBlock

ea604a4

remove contents within super().__init__

80dce02

Update src/diffusers/models/autoencoders/dc_ae.py

1752afd

Co-authored-by: YiYi Xu <yixu310@gmail.com>

remove opsequential

883bcf4

Merge branch 'DC-AE' of github.com:lawrence-cj/diffusers into DC-AE

25ae389

update other blocks to support the removal of build_norm

96e844b

yujincheng08 mentioned this pull request Dec 14, 2024

cannot import name 'SanaPipeline' from 'diffusers' NVlabs/Sana#90

Closed

a-r-r-o-w added 3 commits December 15, 2024 12:16

Merge branch 'main' into Sana

c948a67

update docs

b7837c0

make fix-copies

5fb973c

a-r-r-o-w added 2 commits December 15, 2024 16:09

fix imports

168a0af

fix docs

0d722cb

hlky approved these changes Dec 15, 2024

a-r-r-o-w added 5 commits December 15, 2024 17:20

add integration test

0d32ef5

update docs

ea7878c

Merge branch 'main' into Sana

7b82bdc

update examples

884d29e

fix convert_model_output in schedulers

1bc1554

a-r-r-o-w approved these changes Dec 15, 2024

a-r-r-o-w added the `roadmap` label (Add to current release roadmap) Dec 15, 2024

fix failing tests

cb21289

a-r-r-o-w merged commit 5a196e3 into huggingface:main Dec 15, 2024
12 checks passed

vladmandic mentioned this pull request Dec 16, 2024

Sana issues #10241

Closed

vladmandic mentioned this pull request Dec 17, 2024

UniPC with FlowMatch fails with index out-of-bounds #10266

Closed
