I'm also not sure. I usually set rope_percentage to 1, but I have seen implementations where it's set to 0.5. I guess the idea is that some dimensions never get rotated, which makes it easier for the model to attend based on content alone, without interference from the positional information.
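For intuition, here is a minimal sketch of what "rotating only a fraction of the features" looks like. This is not the repository's code; the function names, the split-half pairing convention, and the base of 10,000 are assumptions for illustration only.

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Rotate all features of x, shape [seq_len, d] with d even.
    Pairs (x[:, i], x[:, i + d/2]) are rotated by position-dependent angles."""
    seq_len, d = x.shape
    half = d // 2
    theta = base ** (-torch.arange(half, dtype=x.dtype) * 2 / d)              # one frequency per pair
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * theta[None, :]   # [seq_len, half]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def partial_rotary_embed(x: torch.Tensor, rope_percentage: float = 1.0) -> torch.Tensor:
    """Rotate only the first rope_percentage of the features; pass the rest through unchanged."""
    d_rope = int(x.shape[-1] * rope_percentage)
    d_rope -= d_rope % 2                      # keep an even number of rotated features
    x_rope, x_pass = x[:, :d_rope], x[:, d_rope:]
    return torch.cat([rotary_embed(x_rope), x_pass], dim=-1)

# With rope_percentage=0.5 the second half of the features is identical at every
# position, so its contribution to the query-key dot product depends on content only.
q = torch.randn(8, 16)
out = partial_rotary_embed(q, rope_percentage=0.5)
assert torch.allclose(out[:, 8:], q[:, 8:])
```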
I confirmed that there is code in the RotaryPEMultiHeadAttention class that reduces the dimension using a parameter called rope_percentage (annotated_deep_learning_paper_implementations/labml_nn/transformers/rope/__init__.py, line 205 at commit 285cb37).
I am curious about the cases in which you would set rope_percentage to a value less than 1.
(I did confirm that rope_percentage is set to 1.0 in experiment.py.)
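For reference, here is a rough sketch of the pattern described above: the rotation is sized to int(d_k * rope_percentage) features per head, and the remaining features enter the dot product unrotated. The class name PartialRotaryScores and its structure are placeholders of mine, not the repository's exact code.

```python
import math
import torch
import torch.nn as nn

class PartialRotaryScores(nn.Module):
    """Illustrative sketch: attention scores where only the first
    int(d_k * rope_percentage) features of each head are rotated and the
    remaining features are compared on content alone.
    Inputs query, key: [seq_len, batch, heads, d_k]."""

    def __init__(self, d_k: int, rope_percentage: float = 1.0, base: float = 10_000.0):
        super().__init__()
        d_rope = int(d_k * rope_percentage)
        self.d_rope = d_rope - d_rope % 2    # rotated slice must have even width
        self.d_k = d_k
        self.base = base

    def _rotate(self, x: torch.Tensor) -> torch.Tensor:
        # Rotate only the first self.d_rope features; pass the rest through.
        x_rope, x_pass = x[..., :self.d_rope], x[..., self.d_rope:]
        half = self.d_rope // 2
        theta = self.base ** (-torch.arange(half, dtype=x.dtype, device=x.device) * 2 / self.d_rope)
        pos = torch.arange(x.shape[0], dtype=x.dtype, device=x.device)
        angles = pos[:, None] * theta[None, :]            # [seq_len, half]
        cos = angles.cos()[:, None, None, :]              # broadcast over batch and heads
        sin = angles.sin()[:, None, None, :]
        x1, x2 = x_rope[..., :half], x_rope[..., half:]
        rotated = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
        return torch.cat([rotated, x_pass], dim=-1)

    def forward(self, query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
        # Scores have shape [query_seq, key_seq, batch, heads]
        q, k = self._rotate(query), self._rotate(key)
        return torch.einsum('ibhd,jbhd->ijbh', q, k) / math.sqrt(self.d_k)

# Example: with rope_percentage=0.5 only 32 of the 64 features per head carry
# relative-position information; the other 32 compare content directly.
scores = PartialRotaryScores(d_k=64, rope_percentage=0.5)(
    torch.randn(10, 2, 4, 64), torch.randn(12, 2, 4, 64))
```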