I'm also not sure. I usually set rope_percentage to 1, but I have seen implementations where it's set to 0.5. I guess the idea is that some dimensions never get rotated, which makes it easier for the model to attend based on content alone, without interference from the positional information.
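For intuition, here is a minimal sketch of what "rotating only a fraction of the features" looks like. This is not the repository's code; the function names, the split-half pairing convention, and the base of 10,000 are assumptions for illustration only.

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Rotate all features of x, shape [seq_len, d] with d even.
    Pairs (x[:, i], x[:, i + d/2]) are rotated by position-dependent angles."""
    seq_len, d = x.shape
    half = d // 2
    theta = base ** (-torch.arange(half, dtype=x.dtype) * 2 / d)              # one frequency per pair
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * theta[None, :]   # [seq_len, half]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def partial_rotary_embed(x: torch.Tensor, rope_percentage: float = 1.0) -> torch.Tensor:
    """Rotate only the first rope_percentage of the features; pass the rest through unchanged."""
    d_rope = int(x.shape[-1] * rope_percentage)
    d_rope -= d_rope % 2                      # keep an even number of rotated features
    x_rope, x_pass = x[:, :d_rope], x[:, d_rope:]
    return torch.cat([rotary_embed(x_rope), x_pass], dim=-1)

# With rope_percentage=0.5 the second half of the features is identical at every
# position, so its contribution to the query-key dot product depends on content only.
q = torch.randn(8, 16)
out = partial_rotary_embed(q, rope_percentage=0.5)
assert torch.allclose(out[:, 8:], q[:, 8:])
```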
I confirmed that there is code in the RotaryPEMultiHeadAttention class that reduces the dimension using a parameter called rope_percentage (annotated_deep_learning_paper_implementations/labml_nn/transformers/rope/__init__.py, line 205 at commit 285cb37).
I am curious about the cases in which you would set rope_percentage to a value less than 1.
(I did confirm that rope_percentage is set to 1.0 in experiment.py.)
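For reference, here is a rough sketch of the pattern described above: the rotation is sized to int(d_k * rope_percentage) features per head, and the remaining features enter the dot product unrotated. The class name PartialRotaryScores and its structure are placeholders of mine, not the repository's exact code.

```python
import math
import torch
import torch.nn as nn

class PartialRotaryScores(nn.Module):
    """Illustrative sketch: attention scores where only the first
    int(d_k * rope_percentage) features of each head are rotated and the
    remaining features are compared on content alone.
    Inputs query, key: [seq_len, batch, heads, d_k]."""

    def __init__(self, d_k: int, rope_percentage: float = 1.0, base: float = 10_000.0):
        super().__init__()
        d_rope = int(d_k * rope_percentage)
        self.d_rope = d_rope - d_rope % 2    # rotated slice must have even width
        self.d_k = d_k
        self.base = base

    def _rotate(self, x: torch.Tensor) -> torch.Tensor:
        # Rotate only the first self.d_rope features; pass the rest through.
        x_rope, x_pass = x[..., :self.d_rope], x[..., self.d_rope:]
        half = self.d_rope // 2
        theta = self.base ** (-torch.arange(half, dtype=x.dtype, device=x.device) * 2 / self.d_rope)
        pos = torch.arange(x.shape[0], dtype=x.dtype, device=x.device)
        angles = pos[:, None] * theta[None, :]            # [seq_len, half]
        cos = angles.cos()[:, None, None, :]              # broadcast over batch and heads
        sin = angles.sin()[:, None, None, :]
        x1, x2 = x_rope[..., :half], x_rope[..., half:]
        rotated = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
        return torch.cat([rotated, x_pass], dim=-1)

    def forward(self, query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
        # Scores have shape [query_seq, key_seq, batch, heads]
        q, k = self._rotate(query), self._rotate(key)
        return torch.einsum('ibhd,jbhd->ijbh', q, k) / math.sqrt(self.d_k)

# Example: with rope_percentage=0.5 only 32 of the 64 features per head carry
# relative-position information; the other 32 compare content directly.
scores = PartialRotaryScores(d_k=64, rope_percentage=0.5)(
    torch.randn(10, 2, 4, 64), torch.randn(12, 2, 4, 64))
```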