Usage of LoadBestPeftModelCallback in Finetuning stage #136

ttssp · 2023-09-05T05:07:56Z

Hi friends,

I was trying to test the finetune/finetune.py script. It seems that state.best_model_checkpoint always returns None, which leads to a failure at the end of the program. Is it that the program did not save a "best model" during training? I am a bit new to this; could anyone explain what is happening and offer some hints on solving it? Thanks a lot!

command(single GPU):

python finetune/finetune.py --model_path="../../models/starcoder/" --dataset_name="../../datasets/ArmelR/stack-exchange-instruction" --subset="data/finetune" --split="train" --size_valid_set 10 --streaming --seq_length 2048 --max_steps 2 --batch_size 1 --input_column_name="question" --output_column_name="response" --gradient_accumulation_steps 1 --learning_rate 1e-4 --lr_scheduler_type="cosine" --num_warmup_steps 1 --weight_decay 0.05 --output_dir="./checkpoints"

error image(single GPU):

command(multi GPUs):

python -m torch.distributed.launch --nproc_per_node 4 finetune/finetune.py --model_path="../../models/starcoder/" --dataset_name="../../datasets/ArmelR/stack-exchange-instruction" --subset="data/finetune" --split="train" --size_valid_set 10000 --streaming --seq_length 2048 --max_steps 2 --batch_size 1 --input_column_name="question" --output_column_name="response" --gradient_accumulation_steps 16 --learning_rate 1e-4 --lr_scheduler_type="cosine" --num_warmup_steps 100 --weight_decay 0.05 --output_dir="./checkpoints"

error image(multi GPU):


upjabir · 2023-11-22T06:21:11Z

@ttssp I believe save_steps defaults to 100, and you are running fine-tuning for only 2 steps, so no checkpoint is ever saved. Try reducing save_steps to 1 or 2.
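To illustrate the reasoning, here is a minimal sketch of step-based checkpointing in plain Python. It is a simplified model, not the Trainer's actual code: the helper names `checkpoint_steps` and `best_checkpoint` are hypothetical, and "best" is approximated as "latest saved", whereas the real trainer compares evaluation metrics. It only shows that when max_steps is smaller than save_steps, nothing is saved and the "best checkpoint" stays None.

```python
from typing import List, Optional


def checkpoint_steps(max_steps: int, save_steps: int) -> List[int]:
    """Return the steps at which a step-based schedule would save a checkpoint."""
    return [s for s in range(1, max_steps + 1) if s % save_steps == 0]


def best_checkpoint(
    max_steps: int, save_steps: int, output_dir: str = "./checkpoints"
) -> Optional[str]:
    """Simplified stand-in for state.best_model_checkpoint (hypothetical helper)."""
    steps = checkpoint_steps(max_steps, save_steps)
    if not steps:
        # No step reached a multiple of save_steps, so nothing was saved.
        return None
    # Assumption: treat the latest checkpoint as the best one.
    return f"{output_dir}/checkpoint-{steps[-1]}"


print(best_checkpoint(max_steps=2, save_steps=100))  # None
print(best_checkpoint(max_steps=2, save_steps=1))    # ./checkpoints/checkpoint-2
```

Under this model, any code that runs at the end of training and joins a path onto the best checkpoint would need either a save interval no larger than max_steps or a guard for the None case.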
