[Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-solver and so on. #9982

Merged: 177 commits, Dec 15, 2024

lawrence-cj
Contributor

What does this PR do?

This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first to make text-to-image generation work in a 32x compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also includes several popular efficiency techniques, such as a DiT with a linear attention processor, and we use a decoder-only LLM (Gemma-2B-IT) for low GPU requirements and fast inference.

Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana

Core contributor of DC-AE:
work with @johnny_ez@163.com

Core library:

We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu


Images are generated by SanaPAGPipeline with FlowDPMSolverMultistepScheduler

[Image: 5361732169697_ pic_hd]

@a-r-r-o-w
Member

@lawrence-cj Awesome, tysm! I will complete the remaining docs and tests and merge soon!

@lawrence-cj
Contributor Author

Actually, we support BF16 here: https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers
@bghira You can also host your own model. It's cool.

@bghira
Contributor

bghira commented Dec 13, 2024

yes i am pointing this out merely for anyone who wants something to compare against. for my use case it is because i point the default model for simpletuner to this repo path and can more readily adjust config parameters (eg. scheduler) without impacting other use cases or requiring downstream users remember to pass the custom values in.

@bghira
Contributor

bghira commented Dec 13, 2024

i think in the bf16 repository you have the fp32 weights as well; are those a fp32 copy of the bf16 compatible weights? if so that makes sense but otherwise it may confuse users that don't know to pass variant=bf16 in

@lawrence-cj
Contributor Author

i think in the bf16 repository you have the fp32 weights as well; are those a fp32 copy of the bf16 compatible weights? if so that makes sense but otherwise it may confuse users that don't know to pass variant=bf16 in

Yes. It's just an FP32 copy of the BF16 weights, and I ran it successfully.
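
For readers following the `variant=bf16` point above, here is a minimal sketch of loading the BF16 files explicitly. It assumes a diffusers release that includes `SanaPipeline` and that the repository linked earlier in this thread ships `bf16` variant files; the helper name `load_sana_bf16` and the default `device="cuda"` are illustrative, not part of the PR.

```python
REPO_ID = "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers"
VARIANT = "bf16"  # asks for the *.bf16.safetensors files rather than the FP32 copy


def load_sana_bf16(device="cuda"):
    """Load the BF16 Sana checkpoint (downloads the weights on first call)."""
    # Imported lazily so this module can be imported without torch/diffusers.
    import torch
    from diffusers import SanaPipeline

    pipe = SanaPipeline.from_pretrained(
        REPO_ID,
        variant=VARIANT,             # without this, the non-variant (FP32) files are loaded
        torch_dtype=torch.bfloat16,  # keep the weights in BF16 in memory
    )
    return pipe.to(device)
```

Calling `load_sana_bf16()` and then `pipe(prompt="...").images[0]` would be the usual way to generate an image, assuming a GPU with enough memory is available.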

@bghira
Contributor

bghira commented Dec 13, 2024

without complex human instruction:

image

with:

image

is it possible there is something wrong with the CHI implementation here? it makes all images worse.

for example with CHI enabled it's putting 508 tokens of input through the model instead of just 300 (206 from CHI plus the 300 prompt tokens (padded)), and i don't know why we need this many tokens. is it supposed to be 300 total?
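
To make the token-count question concrete, here is a small pure-Python sketch of one way a pipeline could encode the longer CHI + prompt sequence and still hand only 300 positions to the diffusion model: keep the first token plus the last 299. This is an illustration under assumptions, not a statement of what `SanaPipeline` does; `keep_prompt_window` is a hypothetical helper, the ids are placeholders, and the 206 and 300 figures are taken from the comment above.

```python
def keep_prompt_window(token_ids, max_sequence_length=300):
    """Keep the first token plus the last (max_sequence_length - 1) tokens."""
    # Index 0 (e.g. a BOS token) followed by the trailing window of the sequence.
    select_index = [0] + list(range(-max_sequence_length + 1, 0))
    return [token_ids[i] for i in select_index]


chi_tokens = 206      # instruction length reported in the comment above
prompt_tokens = 300   # padded prompt length reported in the comment above

# Placeholder ids standing in for a tokenized "CHI + prompt" sequence.
sequence = list(range(chi_tokens + prompt_tokens))
window = keep_prompt_window(sequence)

print(len(sequence), len(window))
```

With such a selection, the text encoder would still see the full instruction, while the downstream model would receive a fixed-length slice dominated by the user prompt.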

@lawrence-cj
Contributor Author

What’s your inference code? @bghira

@bghira
Contributor

bghira commented Dec 14, 2024

we use encode_prompt via pipeline to save the embed and then pass it back in for inference time so the text encoder can be unloaded first. other than this just using the BF16 weights
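
A rough sketch of the workflow described above, for anyone trying to reproduce it: encode once, unload the text encoder, then generate from the cached embeddings. It assumes `encode_prompt` returns the four tensors in the order (prompt embeds, prompt mask, negative embeds, negative mask) and that the pipeline call accepts them under the keyword names used below; both helper names are hypothetical, and how the encoder is unloaded is left to the caller.

```python
def encode_prompt_once(pipe, prompt, **encode_kwargs):
    """Run the text encoder once and return kwargs for a later pipeline call."""
    embeds, mask, neg_embeds, neg_mask = pipe.encode_prompt(prompt, **encode_kwargs)
    return {
        "prompt_embeds": embeds,
        "prompt_attention_mask": mask,
        "negative_prompt_embeds": neg_embeds,
        "negative_prompt_attention_mask": neg_mask,
    }


def generate_from_cached(pipe, cached, **call_kwargs):
    """Generate from cached embeddings without passing any raw prompt text."""
    # prompt/negative_prompt are set to None so that only the embeddings are used.
    return pipe(prompt=None, negative_prompt=None, **cached, **call_kwargs)
```

Between the two calls the text encoder can be released, for example by saving `cached` to disk and reloading the pipeline without it. Passing `complex_human_instruction=...` through `encode_kwargs` would be the place to toggle CHI, assuming the installed version exposes that argument.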

@lawrence-cj
Contributor Author

What's your prompt? @bghira

@a-r-r-o-w
Member

@hlky Would you like to give the changes to schedulers here a review? I'm preparing to merge it shortly after I add the integration tests in the next hour since YiYi has approved and confirmed on Slack. I've tested all the normal models (not the multilingual ones) and they seem to work well (I did the conversions myself when testing, but for the integration tests, I will be using the remote checkpoints and match slices). I have not exhaustively tested all scheduler changes though - only DPMSolverMultistep and FlowMatchEulerDiscrete, but I think that should be okay since it is copied logic (from make fix-copies).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@hlky hlky left a comment


@a-r-r-o-w Scheduler changes look good, thanks

Member

@a-r-r-o-w a-r-r-o-w left a comment


Thank you @lawrence-cj and team! The paper was very insightful and it was very cool to come across the ideas developed.

Thanks for bearing with our reviews too! Will merge the PR once the CI passes

@a-r-r-o-w a-r-r-o-w added the roadmap Add to current release roadmap label Dec 15, 2024
@a-r-r-o-w a-r-r-o-w merged commit 5a196e3 into huggingface:main Dec 15, 2024
12 checks passed
@vladmandic vladmandic mentioned this pull request Dec 16, 2024
@lawrence-cj
Contributor Author

Thank you so much for your effort! Love you guys. I was tied up with other things, sorry for the late reply!
@sayakpaul @a-r-r-o-w @bghira @yiyixuxu @hlky

Labels
close-to-merge roadmap Add to current release roadmap