RuntimeError due to dtype mismatch in fused_linear_cross_entropy_forward #305
Comments
Fixes linkedin#305: dtype mismatch in the `fused_linear_cross_entropy_forward` function. * Cast `logits_chunk` to the data type of `_input_chunk` before performing operations on it. --- For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/linkedin/Liger-Kernel/issues/305?shareId=XXXX-XXXX-XXXX-XXXX).
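The cast described in the PR can be sketched as follows. This is a minimal illustration, not the actual kernel code: the function name and the upcast-to-float step are assumptions used to show where the fix lands. The point is that if `logits_chunk` ends up in float32 while `_input_chunk` stays bfloat16, casting it back before any further matmul avoids the "mat1 and mat2 must have the same dtype" error.

```python
import torch

def fused_linear_cross_entropy_forward_chunk(_input_chunk, weight):
    # Hypothetical excerpt: compute logits for one chunk, upcast to
    # float32 for numerical stability (a common pattern), then cast
    # back to the input dtype -- this last line is the fix from the PR.
    logits_chunk = (_input_chunk @ weight.t()).float()
    logits_chunk = logits_chunk.to(_input_chunk.dtype)  # cast back to bf16
    return logits_chunk
```

After the cast, `logits_chunk` can safely participate in further matmuls against the bfloat16 weights in the backward pass.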
I had the same problem when trying to train the lm_head layer of LLaMA.

I have been busy lately, but I will test the new fix you provided and let you know if it solves the issue. I closed the old PR since it was just a temporary solution. @yundai424

+1, same issue here for llama3.2 1B + Trainer.

we just merged the change. can you try downloading
🐛 Describe the bug
I encountered a RuntimeError while running a full fine-tuning experiment with LLaMA-Factory on a model in BFloat16 precision. The error occurred during training when executing the `fused_linear_cross_entropy_forward` operation. The traceback indicates a data-type mismatch between mat1 and mat2, specifically BFloat16 and Float. The models used were Qwen2.5 3B and Llama 3.2 3B.
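The failure mode described above can be reproduced outside the kernel. This is an assumed minimal reproduction (the shapes are illustrative, not taken from the actual traceback): PyTorch's matmul requires both operands to share a dtype, so mixing a BFloat16 activation with a Float32 tensor raises the same kind of RuntimeError.

```python
import torch

# Assumed minimal reproduction of the reported error: a matmul between
# a BFloat16 tensor and a Float32 tensor raises a dtype-mismatch
# RuntimeError, because PyTorch matmul does not promote across dtypes.
mat1 = torch.randn(2, 8, dtype=torch.bfloat16)
mat2 = torch.randn(8, 4, dtype=torch.float32)
try:
    mat1 @ mat2
except RuntimeError as e:
    print(f"RuntimeError: {e}")
```

Either casting one operand to match the other, or keeping the whole chunked computation in a single dtype as the merged fix does, resolves this.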
Error Log
Versions
Main