If training is interrupted, what parameter do I need to add to resume it? #421

Open
yimuu opened this issue Aug 14, 2024 · 3 comments

@yimuu

yimuu commented Aug 14, 2024

Training was interrupted at a step partway through. Can I continue training from that step?

@yuhangzang
Collaborator

You can modify the code and set resume_from_checkpoint=True in the TrainingArguments class.
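A minimal sketch of what resuming could look like, assuming the fine-tuning script is built on the Hugging Face transformers Trainer (the thread mentions TrainingArguments but does not show the script itself). In that library, Trainer.train() accepts a resume_from_checkpoint argument that may be True or a checkpoint path, and checkpoints are written as checkpoint-<step> folders under output_dir. Whether this project's script forwards the TrainingArguments field to train() is not shown here, so the usage lines are left as comments. find_last_checkpoint is a hypothetical helper name used only for illustration.

```python
import os
import re
from typing import Optional

# Assumption: checkpoints follow the Hugging Face Trainer naming scheme,
# "<output_dir>/checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint folder with the highest step, or None if there is none."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Only directories count; stray files with a matching name are ignored.
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


# Hypothetical usage inside the fine-tuning script (assumes `trainer` is a
# transformers.Trainer and `output_dir` is the directory it saves into):
#
#     last = find_last_checkpoint(output_dir)
#     trainer.train(resume_from_checkpoint=last)  # None starts from scratch
```

Passing an explicit path rather than True makes it visible in the logs which checkpoint the run is resuming from.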

@YerongLi

In #423 I found that resume_from_checkpoint=True has its own issue with LoRA training: the loss starts over after resuming. Not sure whether you have seen a similar issue. @yuhangzang
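One way to check whether a resumed run really picked up from the saved step is to look at the trainer_state.json file that the Hugging Face Trainer writes into each checkpoint folder. The sketch below assumes the checkpoint was written by that Trainer and that the file is present; summarize_trainer_state is a hypothetical helper name, and this only inspects the saved state, it does not diagnose the LoRA issue itself.

```python
import json
import os


def summarize_trainer_state(checkpoint_dir: str) -> dict:
    """Report the saved global step and the last logged training loss."""
    state_path = os.path.join(checkpoint_dir, "trainer_state.json")
    with open(state_path, encoding="utf-8") as f:
        state = json.load(f)
    # log_history is a list of dicts; training log entries carry a "loss" key.
    losses = [entry["loss"] for entry in state.get("log_history", []) if "loss" in entry]
    return {
        "global_step": state.get("global_step"),
        "last_logged_loss": losses[-1] if losses else None,
    }
```

Comparing last_logged_loss from the checkpoint with the first loss logged after resuming gives a quick indication of whether the weights were actually restored.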

@YerongLi

YerongLi commented Aug 19, 2024

I found a problem saving checkpoints with 2d5-7b, while internlm-xcomposer2-vl-7b saves checkpoints correctly in different settings. #423 #426
