Do the discontinuous positional encodings confuse the model? #6
Comments
Hi @ovowei, thank you for your interest in our work! Quest does not use a process like the one in LM-Infinite or StreamingLLM. Instead, Quest applies the original positional embeddings directly to the selected pages. Here are the reasons for this approach:
Hi @Sakits, thanks for your answer. It makes sense to me. I used a model pre-trained on shorter sequences to process longer-sequence tasks. In this setting, I found that applying Quest and assigning the same positional encoding to all tokens beyond a certain distance yields better results. This suggests that Quest might help models process extremely long sequences. I will run more experiments to verify this. If you have conducted similar experiments, I would appreciate it if you could share your results. Thanks!
Hi @ovowei, sorry for the delayed reply! I've been busy working on a paper submission recently. Thank you for sharing your insights and for the interesting discussion! :) Yes, we also found that assigning the same positional encoding to tokens beyond a certain distance can extend the model's effective context range to some extent. Some interesting works discuss similar ideas, such as InfLLM and LongHeads. However, with more and more models offering extended context windows (from 128k up to 10M tokens), modifying positional encodings in this way might not be as necessary as before. Thank you again for your interest in our work!
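For readers following the thread, here is a minimal sketch of what "assigning the same positional encoding to tokens beyond a certain distance" could look like when expressed as relative distances. It is not taken from the Quest, InfLLM, or LongHeads code; the function name `clamped_relative_distance` and the values of `MAX_DISTANCE` and the positions are hypothetical and chosen only for illustration.

```python
def clamped_relative_distance(query_pos, key_pos, max_distance):
    """Relative distance from query to key, capped at max_distance.

    Every key farther away than max_distance is treated as if it sat
    exactly max_distance tokens back, so all distant keys share one
    positional value.
    """
    return min(query_pos - key_pos, max_distance)


# Hypothetical example: a query at position 10000 and a cap of 4096.
QUERY_POS = 10000
MAX_DISTANCE = 4096
key_positions = [0, 5000, 9000, 9990, 10000]

distances = [
    clamped_relative_distance(QUERY_POS, k, MAX_DISTANCE) for k in key_positions
]
print(distances)  # [4096, 4096, 1000, 10, 0]
```

The two most distant keys collapse onto the same value, while nearby keys keep their true distances. The model therefore never sees a relative distance larger than the cap.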
Hi,
I was reading your paper and have a question about the positional encodings. As I understand it, performing attention only on the selected pages means the selected pages are discontinuous, so their positional encodings are discontinuous as well. LM-Infinite and StreamingLLM handle this either by assigning continuous positional encodings or by assigning the same positional encoding to all tokens beyond the local window. Does Quest need similar processing?
Thanks!
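To make the question concrete, the sketch below contrasts two ways of assigning position indices to the tokens of a few selected pages: keeping each token's original index (which, per the reply in this thread, is what Quest does) and re-numbering the kept tokens contiguously by their slot in the cache. The helper names, the page size, and the selected page indices are all hypothetical; this is an illustration under those assumptions, not code from any of the projects mentioned.

```python
def page_token_positions(selected_pages, page_size):
    """Original token indices covered by the selected pages, in ascending order."""
    positions = []
    for page in sorted(selected_pages):
        start = page * page_size
        positions.extend(range(start, start + page_size))
    return positions


def contiguous_positions(original_positions):
    """Re-number the kept tokens 0..n-1 according to their slot in the cache."""
    return list(range(len(original_positions)))


# Hypothetical selection: pages 0, 3 and 7, with 4 tokens per page.
PAGE_SIZE = 4
selected = [0, 3, 7]

original = page_token_positions(selected, PAGE_SIZE)
renumbered = contiguous_positions(original)

print(original)    # [0, 1, 2, 3, 12, 13, 14, 15, 28, 29, 30, 31]
print(renumbered)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

With the original indices, the gaps between pages are preserved (3 jumps to 12, 15 jumps to 28). With contiguous re-numbering, the gaps disappear and the model sees the selected tokens as one unbroken run.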