
Flux1 nf4 #1

Open
freecode-ai opened this issue Aug 29, 2024 · 2 comments

@freecode-ai

Does this support the nf4 model?
https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4

@SplittyDev
Owner

Currently, it only supports the official models from Black Forest Labs. The main reason is that this is my first project running inference, and I don't yet know how to load models manually.

I'll try to look into it, because I'm interested in getting the quantized models to run myself, but can't promise that I'll get it to work.

@SplittyDev
Owner

@freecode-ai I've pushed a few commits with significant changes to model loading, which should in theory allow loading quantized models.

Experimental quantized model support

An fp8 version of dev and schnell is now available for selection, but I can't test it myself, because fp8 isn't supported on MPS, which is currently the only device I have access to. If you or anyone else reading this could test it, please let me know if it works correctly!
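
If anyone wants to check whether their device can even hold fp8 tensors before running the full pipeline, a tiny probe like this should be enough (a rough sketch; `torch.float8_e4m3fn` only exists in recent PyTorch builds):

```python
import torch

# torch.float8_e4m3fn is the fp8 dtype typically used for inference weights;
# it's only available in recent PyTorch builds.
assert hasattr(torch, "float8_e4m3fn"), "this PyTorch build has no fp8 dtype"

device = "cuda" if torch.cuda.is_available() else "cpu"

try:
    # Allocating a small tensor and casting it is enough to see whether
    # the backend supports fp8 storage on this device.
    x = torch.randn(4, 4, device=device).to(torch.float8_e4m3fn)
    print(f"fp8 storage works on {device}")
except Exception as e:
    print(f"fp8 unsupported on {device}: {e}")
```

On MPS the cast should raise, which is exactly the limitation I'm running into.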

Issues regarding NF4 support

Honestly, I just don't know how to load them properly.

After the recent refactoring, there's support for loading a FluxTransformer2DModel manually, which allows for loading other kinds of models. But I still don't know how I'd go about loading nf4 models, because as far as I can tell, torch doesn't come with an nf4 dtype.

I've been trying to figure out how to load these models, and so far I'm looking into the bitsandbytes package. However, I'm not very experienced with this stuff, and I don't know whether bitsandbytes can simply be used for NF4 support together with FluxTransformer2DModel, or whether I'd have to break down the pipeline even further and do more of the work manually.

If anyone knows how to do this, please let me know, or even better, submit a PR :)
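
For reference, the direction I'm exploring looks roughly like the sketch below. This is a minimal, untested sketch: it assumes a diffusers version that exposes BitsAndBytesConfig, and a CUDA device, since bitsandbytes doesn't support MPS. Note that it re-quantizes the official bf16 weights on load rather than reading lllyasviel's pre-quantized checkpoint directly:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, BitsAndBytesConfig

# NF4 quantization config from bitsandbytes, exposed through diffusers.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the official transformer weights to NF4 while loading.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# Plug the quantized transformer into the regular pipeline.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable
```

If this works, it would sidestep the missing torch nf4 dtype entirely, since bitsandbytes handles the 4-bit storage internally.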

Future GGUF support

Now that the codebase is a bit cleaner and there's better support for manual model loading, maybe we could support GGUF too and get access to all the nice integer GGUF quants?

My issues here are basically the same as with nf4: how can I load these models, and are they even compatible with the diffusers library? I can't seem to find any usable examples, or maybe I'm just really bad at googling this stuff.
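
If diffusers grows native GGUF loading, I'd expect it to look roughly like the sketch below. This is purely hypothetical on my part: the GGUFQuantizationConfig API and the city96/FLUX.1-dev-gguf checkpoint are assumptions, not something I've tested:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a community GGUF quant of the Flux transformer from a single .gguf
# file; the quantized weights are dequantized to compute_dtype on the fly.
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_0.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# The rest of the pipeline stays unquantized.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```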
