idea: Precision scaling research #127
@bachvudinh, as co-author, please help me add some of the exploded (diverged) training attempts and the MMLU scores. We ran some tests training Llama 3.2 1B Instruct to check the hypothesis.
For the training config with fp32, a learning rate of 1e-4, and weight decay of 0.05, there are some weird MMLU results at checkpoint steps 1000, 2000, and 3000.
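For reference, a minimal sketch of the kind of training configuration described above (fp32, lr 1e-4, weight decay 0.05) using Hugging Face `TrainingArguments`; the output directory, batch size, and step counts are assumptions for illustration, not the exact setup used in these runs.

```python
# Hypothetical fp32 fine-tuning config for Llama 3.2 1B Instruct.
# Values other than lr / weight decay / precision are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3.2-1b-instruct-fp32",  # assumed path
    learning_rate=1e-4,
    weight_decay=0.05,
    fp16=False,                      # keep everything in full fp32
    bf16=False,
    max_steps=3000,
    save_steps=1000,                 # checkpoints at steps 1000 / 2000 / 3000 for MMLU eval
    per_device_train_batch_size=4,   # assumed batch size
)
```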
A few pending issues:
Next steps:
cc @0xSage if interested.
Problem Statement
Hypothesis: Increasing numerical precision during training can improve the performance of small language models (≈1B parameters), potentially enabling them to achieve capabilities comparable to larger models (3B-7B parameters).
Implications
If validated, this hypothesis could:
Idea
Reference: https://arxiv.org/pdf/2411.04330
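One way to probe this hypothesis is to hold the data and hyperparameters fixed and vary only the numeric precision, then compare a downstream score such as MMLU across runs. A minimal sketch, assuming a Hugging Face loading path; the model name, dtype set, and the `fine_tune` / `run_mmlu` stubs are illustrative assumptions, not part of the original proposal.

```python
# Hypothetical precision sweep: train the same 1B model under different
# numeric precisions and compare downstream scores.
import torch
from transformers import AutoModelForCausalLM

PRECISIONS = {
    "fp32": torch.float32,
    "bf16": torch.bfloat16,
    "fp16": torch.float16,
}

results = {}
for name, dtype in PRECISIONS.items():
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B-Instruct",  # assumed checkpoint
        torch_dtype=dtype,
    )
    # fine_tune(model, ...)            # identical data and hyperparameters per run (placeholder)
    # results[name] = run_mmlu(model)  # placeholder evaluation hook
```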