Bug: Crashes at the end of startup during first prompt processing #8096
Comments
Does 52fc870 still work correctly?
Can you please link the exact model that you were using?
https://huggingface.co/MaziyarPanahi/WizardLM-2-7B-GGUF/blob/main/WizardLM-2-7B.Q8_0.gguf
Btw, is there a way to compile it for OpenCL instead of CUDA? I only found some Python references when googling for this, but nothing for C. Maybe the problem happens only on CUDA, so I'd like to try OpenCL.
OpenCL was removed because there was no one to maintain it. You can try Vulkan.
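A minimal sketch of a Vulkan build, assuming the Vulkan SDK is installed and assuming the LLAMA_VULKAN option used by this era of the build system (later versions renamed it to GGML_VULKAN):

    # CMake
    cmake -B build -DLLAMA_VULKAN=1
    cmake --build build --config Release -j
    # or with the bundled Makefile
    make LLAMA_VULKAN=1 -j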
It's very likely this is a CUDA-specific problem. That's why I would like you to test 52fc870 since that is the last commit before I changed something that I suspect to be the problem.
I can't reproduce the issue. Can you post the exact command line you are using?
./llama-cli -m models/WizardLM-2-7B.Q8_0.gguf -t 6 --seed -1 -n -1 --keep -1 --color -i --in-prefix "Human:" --in-suffix "Helper" -f prompts/helper.txt -ngl 255 --interactive-first -c 8192 --temp 0.3 --repeat-penalty 1.1 --top_p 0.8 --top_k 100
Sorry, I confused two models. However, I just tried two other models at random (Llama-3-8B-Instruct-MopeyMule_q8.gguf and Meta-Llama-3-8B-Instruct.Q8_0.gguf) and I got exactly the same error on startup, so I don't think it's particular to this one specific model.
Edit: Googling didn't help me; I only found completely unrelated forums about mining where I read about "virtual memory needing to be increased" when this error happens, although in some other situations. No idea if this is somehow applicable here; I don't even know what "virtual memory" these guys were referring to. Another thread suggested lowering the GPU clock. Not sure how to do that either. Any way to test the GPU/memory for being faulty, perhaps?
Can you check whether this fix #8100 works?
Sure. (Just a note: I swapped the graphics card for exactly the same model (2080 Ti 22 GB) just to make sure this particular card wasn't broken. Got the same error. I assume that not both cards are faulty, so..) I added the #8100 diffs to ggml-cuda/mmq.cuh, cleared the build directory and rebuilt; still the same problem. :/ I could maybe add any kind of debug code if you tell me which files to edit and where to put it, if that helps...
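A less error-prone way to try a PR than copying diff lines by hand, sketched here under the assumption that origin points at the upstream GitHub repository:

    # fetch the PR head from GitHub and check it out as a local branch
    git fetch origin pull/8100/head:pr-8100
    git checkout pr-8100
    # or: download the raw diff and apply it to the current tree
    curl -L https://github.com/ggerganov/llama.cpp/pull/8100.diff | git apply

Appending .diff or .patch to a pull-request URL is also a general way to get the plain-text patch asked about further down in the thread.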
Are you using make or CMake?
cmake .. -DLLAMA_CUDA=ON -DLLAMA_BLAS_VENDOR=OpenBLAS |
I just realized CMake doesn't have an option for the debugging I need, sorry. I'll maybe try to add it. Or, if you're up to it, here is how you would do it with make:
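Presumably something along these lines; the exact block is an assumption, pieced together from the LLAMA_DEBUG and compute-sanitizer mentions that follow:

    make clean && LLAMA_CUDA=1 LLAMA_DEBUG=1 make -j && compute-sanitizer ./llama-cli <your usual arguments>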
How exactly do I do that?
In the project root directory:
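Roughly like this; the exact invocation is an assumption, with the model and options taken from the command posted earlier:

    # debug build with the bundled Makefile (no CMake involved)
    make clean
    LLAMA_CUDA=1 LLAMA_DEBUG=1 make -j
    # run the failing command under NVIDIA's compute-sanitizer to catch bad memory accesses
    compute-sanitizer ./llama-cli -m models/WizardLM-2-7B.Q8_0.gguf -ngl 255 -c 8192 --interactive-first -f prompts/helper.txt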
OK, I was just confused because of the warning that LLAMA_DEBUG has no effect. Running make now...
Just so there are no misunderstandings: you are not supposed to run any CMake commands at all. In the llama.cpp root directory there already is a Makefile; you are supposed to use that one directly, without any CMake commands.
Oh wow, super slow debug mode output.
I also let it run without the compute-sanitizer, and again it didn't crash, but produced the same output msg spam.
That's bad actually. If it crashes under compute-sanitizer we would at least get a report pointing at the problem.
> I also let it run without the compute-sanitizer, and again it didn't crash, but produced the same output msg spam (just much faster this time).
I assume that is a different issue and will be fixed with #8102.
How can I download that as a raw diff file? Last time I just copied the one line and manually erased the other two, because I couldn't figure out how to get this issue patch downloaded in a usable plain-text (diff/patch) format.
cmake with -DCMAKE_CUDA_FLAGS="-g -lineinfo" added to your usual options.
ohhh, that #8102 seems to have fixed the output indeed!
Alright, I issued cmake -B build -DLLAMA_CUDA=1 -DCMAKE_CUDA_FLAGS="-g -lineinfo" and the resulting llama-cli has no more "disabling CUDA" messages in it, runs very fast, no crash, and gives coherent output! :) I'm not sure where the debug info comes in now with this command line. But anyway...
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
Started up a 7B model, completely offloaded onto a 2080 Ti with 22 GB of VRAM. Startup is successful so far, but at the end it crashes during the first prompt processing.
https://huggingface.co/MaziyarPanahi/WizardLM-2-7B-GGUF/blob/main/WizardLM-2-7B.Q8_0.gguf
Name and Version
$ ./llama-cli --version
version: 3215 (d62e4aa)
built with cc (GCC) 14.1.1 20240522 for x86_64-pc-linux-gnu
What operating system are you seeing the problem on?
Linux archlinux 6.9.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 21 Jun 2024 19:49:19 +0000 x86_64 GNU/Linux
Relevant log output