# feat: automatically adjust default gpu_layers by available GPU memory #3541
@mudler happy to take this task and work on it. I have to think a bit about the approach or google for alternatives.
@mudler rough design/thoughts on the addition of this feature. ChatGPT-generated markdown for the solution:

### Design Document: Optimizing GPU Layer Configuration in LocalAI Using gguf-parser

#### Overview

Rough solution to optimize the GPU layer configuration when using LocalAI to run large models.

#### Problem Statement

Large models can exceed the available VRAM when the default number of GPU layers is offloaded, forcing users to tune `gpu_layers` by hand.

#### Solution Approach

Dynamically adjust the GPU layer configuration based on the model metadata provided by gguf-parser.

#### Key Features

#### Components

#### Reference

#### Workflow

#### Estimation Process

##### VRAM Estimation (Per GPU)

Using the gguf-parser estimates, determine how much VRAM the model requires on each GPU.

##### Tensor Split for Multi-GPU Setup

The model can be distributed across multiple GPUs using the tensor split configuration, guided by the per-GPU VRAM estimates.
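A minimal sketch of the estimation step described above, assuming the parser yields a per-layer VRAM cost plus a fixed overhead (KV cache, output weights). It does not call the actual gguf-parser-go API; every name and number below is illustrative only.

```go
package main

import "fmt"

// estimateGPULayers returns how many transformer layers can be offloaded to
// the GPU, given the per-layer VRAM cost and the fixed overhead (KV cache,
// output tensors, ...) reported by an estimator, plus the free VRAM on the card.
func estimateGPULayers(totalLayers int, perLayerVRAM, overheadVRAM, freeVRAM uint64) int {
	if perLayerVRAM == 0 || freeVRAM <= overheadVRAM {
		return 0 // nothing fits on the GPU, keep everything on the CPU
	}
	layers := int((freeVRAM - overheadVRAM) / perLayerVRAM)
	if layers > totalLayers {
		layers = totalLayers // the whole model fits: full offload
	}
	return layers
}

func main() {
	// Illustrative numbers only (not taken from the issue): a 40-layer model,
	// ~300 MiB of VRAM per layer, ~1 GiB of fixed overhead, and a card with
	// 10 GiB of free VRAM.
	const MiB = uint64(1024 * 1024)
	fmt.Println(estimateGPULayers(40, 300*MiB, 1024*MiB, 10*1024*MiB))
}
```

For multi-GPU setups, the same arithmetic could be applied per device, with the resulting per-GPU layer counts used to derive the tensor split ratios.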
@mudler @sozercan Some more context now that I have a working prototype for parsing GGUF models. Using gguf-parser I get the following output for the model: RAM and VRAM estimates. Based on those values, rough math for a machine with 10 GB of VRAM gives a `gpu_layers` value of 37, which can then be set in LocalAI as a parameter to pass down to llama.cpp.
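The actual per-layer and overhead figures from the gguf-parser output are not reproduced in the thread, so the values below are made up purely to show the shape of that rough math:

```go
package main

import "fmt"

func main() {
	const MiB = uint64(1024 * 1024)
	freeVRAM := 10 * 1024 * MiB // 10 GB card
	perLayer := 220 * MiB       // hypothetical VRAM cost per offloaded layer
	overhead := 1536 * MiB      // hypothetical KV cache + output weights cost
	totalLayers := uint64(80)   // hypothetical block count from the GGUF metadata

	gpuLayers := (freeVRAM - overhead) / perLayer
	if gpuLayers > totalLayers {
		gpuLayers = totalLayers // whole model fits: full offload
	}
	// This value would then be set as gpu_layers and passed down to llama.cpp.
	fmt.Printf("gpu_layers: %d\n", gpuLayers)
}
```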
That sounds like the right direction - would be cool now to instrument the library from the code and set the GPU layers in the model defaults accordingly:

LocalAI/core/config/backend_config.go Line 291 in 04c0841
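A rough sketch of how the estimate could feed the defaults around the linked code; the `BackendConfig` shape, the `NGPULayers` field, and the estimator helper are assumptions for illustration, not the actual LocalAI code.

```go
package config

// Minimal stand-in for illustration; the real LocalAI type lives in
// core/config/backend_config.go (field name assumed here).
type BackendConfig struct {
	NGPULayers *int // yaml: "gpu_layers"
}

// estimateGPULayersForModel is a hypothetical helper that would wrap the
// gguf-parser based estimation sketched in the earlier comments.
func estimateGPULayersForModel(modelPath string, freeVRAM uint64) (int, error) {
	// ... parse the GGUF header, read the block count, divide the free VRAM
	// by the per-layer estimate ...
	return 0, nil
}

// applyGPULayersDefault only touches gpu_layers when the user left it unset,
// mirroring how the defaults function fills in other missing values.
func applyGPULayersDefault(cfg *BackendConfig, modelPath string, freeVRAM uint64) {
	if cfg.NGPULayers != nil {
		return // an explicit gpu_layers setting always wins
	}
	if layers, err := estimateGPULayersForModel(modelPath, freeVRAM); err == nil && layers > 0 {
		cfg.NGPULayers = &layers
	}
}
```

Keeping the override behind the nil check means an explicit `gpu_layers` in the model configuration is never overwritten by the estimate, and any estimation failure simply falls back to the existing static default.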
**Is your feature request related to a problem? Please describe.**

Having a high default number of GPU layers doesn't always work. For instance, big models can overflow the card's VRAM and force the user to configure `gpu_layers` manually.

**Describe the solution you'd like**

With libraries like https://github.com/gpustack/gguf-parser-go we could identify beforehand how much GPU VRAM is available and adjust the default settings accordingly.

**Describe alternatives you've considered**

Keep things as is.

**Additional context**