
LLaMA 2 support for pre-training #77

Open
philschmid opened this issue Jul 19, 2023 · 6 comments

Comments

@philschmid

Hello,

Are you planning to add support for LLaMA 2 to further pretrain the models?

@juliensalinas
Contributor

That would be awesome 🥇

@philschmid
Author

I know the 7B and 13B models should have the same architecture, so it would be good if you could confirm that they work. Also, are there plans for the 70B, which uses GQA (grouped-query attention)?
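
For context, the main architectural difference in the 70B checkpoint is the attention layout. Below is a minimal sketch of grouped-query attention in plain JAX; the function name, argument names, and shapes are illustrative assumptions, not EasyLM code.

```python
# Minimal sketch of grouped-query attention (GQA), as used by LLaMA 2 70B.
# Names and shapes are illustrative, not EasyLM's actual implementation.
import jax
import jax.numpy as jnp

def grouped_query_attention(q, k, v, num_kv_groups):
    # q: [batch, seq, num_heads, head_dim]
    # k, v: [batch, seq, num_kv_heads, head_dim],
    # where num_heads = num_kv_heads * num_kv_groups.
    # Each key/value head is shared by `num_kv_groups` query heads.
    k = jnp.repeat(k, num_kv_groups, axis=2)  # expand KV heads to match Q heads
    v = jnp.repeat(v, num_kv_groups, axis=2)
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum('bqhd,bkhd->bhqk', q, k) * scale
    # Causal mask for autoregressive pre-training.
    seq = q.shape[1]
    mask = jnp.tril(jnp.ones((seq, seq), dtype=bool))
    logits = jnp.where(mask, logits, jnp.finfo(logits.dtype).min)
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum('bhqk,bkhd->bqhd', probs, v)
```

When `num_kv_groups` is 1 this reduces to standard multi-head attention, which is why the 7B and 13B checkpoints should not need attention changes.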

@windmaple

+1

@young-geng
Owner

Indeed this would be useful. Let me look into that.

@erfanzar

I have implemented a version of that, but I haven't tested it yet. I used the same architecture as EasyLM in some parts:
https://github.com/erfanzar/EasyDeL/blob/main/EasyDel/modules/llama/modelling_llama_flax.py

@iliemihai

Has anyone tried implementing further pre-training in Flax/JAX to run it on TPU?
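
For what it's worth, a further pre-training step in JAX/Flax on TPU typically boils down to a data-parallel causal-LM loss plus an optax update. Below is a minimal sketch under those assumptions; `apply_fn`, the batch layout, and the hyperparameters are placeholders, not EasyLM's actual training loop.

```python
# Minimal data-parallel pre-training step in JAX with optax.
# `apply_fn`, batch keys, and hyperparameters are illustrative assumptions.
import functools

import jax
import optax

optimizer = optax.adamw(learning_rate=3e-5)

def loss_fn(params, batch, apply_fn):
    # Standard causal-LM objective: predict token t+1 from tokens up to t.
    logits = apply_fn(params, batch['input_ids'][:, :-1])
    labels = batch['input_ids'][:, 1:]
    return optax.softmax_cross_entropy_with_integer_labels(logits, labels).mean()

@functools.partial(jax.pmap, axis_name='dp', static_broadcasted_argnums=(3,))
def train_step(params, opt_state, batch, apply_fn):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch, apply_fn)
    # Average loss and gradients across TPU cores (pure data parallelism;
    # a 70B model would additionally need parameter/tensor sharding).
    grads = jax.lax.pmean(grads, axis_name='dp')
    loss = jax.lax.pmean(loss, axis_name='dp')
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```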
