🚀 The feature, motivation and pitch
Often used to stabilize LM pretraining, e.g. in the recent Chameleon and PaLM.
Alternatives
flash-attn implements the above-mentioned features, but it does not support fusing them with the linear head.
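For readers unfamiliar with the term, here is a minimal, unfused sketch of z loss as described in the PaLM paper: an auxiliary penalty `coeff * log(Z)^2` added to the cross entropy, where `Z` is the softmax normalizer. The function name `cross_entropy_with_z_loss` and its signature are hypothetical and are not this project's API; the default coefficient of `1e-4` is the value reported for PaLM.

```python
import math

def cross_entropy_with_z_loss(logits, target, z_loss_coeff=1e-4):
    """Return (total, ce, z_loss) for a single token's logits.

    Hypothetical helper for illustration only; not a fused kernel.
    """
    m = max(logits)
    # log Z = log(sum(exp(logits))), computed stably by subtracting the max
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Standard cross entropy: -log softmax(logits)[target]
    ce = log_z - logits[target]
    # Z loss pushes log Z toward 0, i.e. keeps the softmax normalizer near 1
    z_loss = z_loss_coeff * log_z ** 2
    return ce + z_loss, ce, z_loss

total, ce, z_loss = cross_entropy_with_z_loss([2.0, 1.0, 0.1], target=0)
print(f"ce={ce:.4f} z_loss={z_loss:.6f} total={total:.4f}")
```

Since `log_z` is already computed for the cross entropy, the extra term is cheap, which is presumably why it is a natural candidate for fusing with the linear head.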
Legit ask! We are already tracking label smoothing in #81. I've changed the title to cover only Z loss to prevent duplication.
@ByronHsu #take To support z loss, I just need a few small add-ons to #198. I'll work on it after the label_smoothing PR is merged.
shivam15s