Hey, what about making it work with Llama3?

Replies: 2 comments

-
Hi @agbarbosa, it seems that llama3 uses practically the same architecture. We haven't tested llama2 on llama3 models, but it looks like they should work.
-
The architecture is the same. You even had the foresight to implement grouped-query attention before they rolled it out in Llama3. But the vocab is bigger and no longer uses SentencePiece. The new BPE switches from merging the token pair with the highest score to merging the pair with the lowest rank. The vocab is provided in ranked order, so conveniently index = rank and you don't need to keep the score. But some small fiddling is needed if you want to stay compatible with both Llama2 and Llama3.
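Here is a minimal toy sketch of that difference. The tiny vocab, scores, and function names below are made up for illustration; this is not llama2.c's actual tokenizer code, just the two selection rules side by side:

```c
/* toy_bpe.c -- contrast Llama 2 (score-based) vs Llama 3 (rank-based)
 * merge selection on a made-up 5-entry vocab. Build: cc toy_bpe.c */
#include <stdio.h>
#include <string.h>

#define NVOCAB 5
/* Index order doubles as merge rank for the Llama 3 rule. */
static const char *vocab[NVOCAB] = {"a", "b", "c", "ab", "bc"};
/* SentencePiece-style scores for the Llama 2 rule (higher merges first).
 * Note the orders disagree: "bc" has the better score, "ab" the lower rank. */
static const float score[NVOCAB] = {0, 0, 0, -2.0f, -1.0f};

/* Vocab index of the concatenation of tokens a and b, or -1 if absent. */
static int lookup_merged(int a, int b) {
    char buf[64];
    snprintf(buf, sizeof buf, "%s%s", vocab[a], vocab[b]);
    for (int i = 0; i < NVOCAB; i++)
        if (strcmp(vocab[i], buf) == 0) return i;
    return -1;
}

/* One merge pass: pick the best adjacent pair, merge it, return the new
 * token count (unchanged if nothing can merge).
 * use_rank = 0: highest score wins (Llama 2 / SentencePiece).
 * use_rank = 1: lowest vocab index wins, since index == rank (Llama 3). */
static int merge_step(int *toks, int n, int use_rank) {
    int best_pos = -1, best_id = -1, best_rank = NVOCAB;
    float best_score = -1e10f;
    for (int i = 0; i + 1 < n; i++) {
        int id = lookup_merged(toks[i], toks[i + 1]);
        if (id < 0) continue;
        if (use_rank ? id < best_rank : score[id] > best_score) {
            best_pos = i; best_id = id;
            best_rank = id; best_score = score[id];
        }
    }
    if (best_pos < 0) return n;
    toks[best_pos] = best_id;  /* replace the pair with the merged token */
    memmove(&toks[best_pos + 1], &toks[best_pos + 2],
            (size_t)(n - best_pos - 2) * sizeof(int));
    return n - 1;
}

int main(void) {
    int t2[] = {0, 1, 2}, t3[] = {0, 1, 2};  /* "abc" as a,b,c */
    int n2 = 3, n3 = 3, m;
    while ((m = merge_step(t2, n2, 0)) != n2) n2 = m;
    while ((m = merge_step(t3, n3, 1)) != n3) n3 = m;
    printf("llama2 rule:");
    for (int i = 0; i < n2; i++) printf(" %s", vocab[t2[i]]);
    printf("\nllama3 rule:");
    for (int i = 0; i < n3; i++) printf(" %s", vocab[t3[i]]);
    printf("\n");  /* prints: a bc  /  ab c */
    return 0;
}
```

On the toy input "abc" the two rules disagree on the first merge: the score rule picks "bc" while the rank rule picks "ab", so the final tokenizations differ. That is the kind of small fiddling needed to support both tokenizers in one code path.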