{'loss': 6.2368, 'learning_rate': 5e-05, 'epoch': 1.0}
10%|████████▊ | 1/10 [00:13<02:01, 13.45s/it]/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py:1652: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)])
/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py:1652: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)])
{'loss': 0.46, 'learning_rate': 4.849231551964771e-05, 'epoch': 1.6}
{'loss': 0.7711, 'learning_rate': 4.415111107797445e-05, 'epoch': 2.0}
{'loss': 1.9513, 'learning_rate': 3.7500000000000003e-05, 'epoch': 3.0}
{'loss': 0.1499, 'learning_rate': 2.9341204441673266e-05, 'epoch': 3.2}
50%|████████████████████████████████████████████ | 5/10 [00:27<00:22, 4.44s/it]/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/peft/utils/save_and_load.py:148: UserWarning: Could not find a config file in /home/yerong2/models/internlm-xcomposer2d5-7b - will assume that the vocabulary was not modified.
warnings.warn(
/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
{'loss': 1.9427, 'learning_rate': 2.0658795558326743e-05, 'epoch': 4.0}
{'loss': 1.33, 'learning_rate': 1.2500000000000006e-05, 'epoch': 4.8}
{'loss': 0.1506, 'learning_rate': 5.848888922025553e-06, 'epoch': 5.0}
{'loss': 2.8562, 'learning_rate': 1.5076844803522922e-06, 'epoch': 6.0}
{'loss': 0.5747, 'learning_rate': 0.0, 'epoch': 6.4}
{'train_runtime': 192.5393, 'train_samples_per_second': 0.519, 'train_steps_per_second': 0.052, 'train_loss': 1.6423154383897782, 'epoch': 6.4}
100%|███████████████████████████████████████████████████████████████████████████████████████| 10/10 [03:12<00:00, 19.25s/it]
[2024-08-18 07:50:06,882] [INFO] [launch.py:347:main] Process 3472257 exits successfully.
Step 2: Merge the model with merge_peft_adapter.py and place it at merged/checkpoint-10.
Step 3: Start training from merged/checkpoint-10 and view the loss. The loss restarts from around 6.0 (5.7047 at the first step) !!!
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.7047, 'learning_rate': 5e-05, 'epoch': 1.0}
10%|████████▊ | 1/10 [00:12<01:56, 13.00s/it]/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py:1652: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)])
/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/deepspeed/runtime/zero/stage_1_and_2.py:1652: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)])
{'loss': 0.4522, 'learning_rate': 4.849231551964771e-05, 'epoch': 1.6}
{'loss': 0.5286, 'learning_rate': 4.415111107797445e-05, 'epoch': 2.0}
{'loss': 1.2067, 'learning_rate': 3.7500000000000003e-05, 'epoch': 3.0}
{'loss': 0.1496, 'learning_rate': 2.9341204441673266e-05, 'epoch': 3.2}
50%|████████████████████████████████████████████ | 5/10 [00:27<00:21, 4.39s/it]/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/yerong2/local/miniconda3/envs/mllm/lib/python3.11/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
{'loss': 1.3043, 'learning_rate': 2.0658795558326743e-05, 'epoch': 4.0}
{'loss': 0.731, 'learning_rate': 1.2500000000000006e-05, 'epoch': 4.8}
{'loss': 0.1483, 'learning_rate': 5.848888922025553e-06, 'epoch': 5.0}
{'loss': 1.6292, 'learning_rate': 1.5076844803522922e-06, 'epoch': 6.0}
{'loss': 0.3236, 'learning_rate': 0.0, 'epoch': 6.4}
{'train_runtime': 191.0236, 'train_samples_per_second': 0.523, 'train_steps_per_second': 0.052, 'train_loss': 1.2178180634975433, 'epoch': 6.4}
100%|███████████████████████████████████████████████████████████████████████████████████████| 10/10 [03:11<00:00, 19.10s/it]
[2024-08-18 08:01:03,083] [INFO] [launch.py:347:main] Process 3485551 exits successfully.
[2024-08-18 08:01:04,084] [INFO] [launch.py:347:main] Process 3485552 exits successfully.
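For reference, here is a minimal sketch of what merging an adapter into a base weight is expected to do numerically. It assumes the adapter is a LoRA adapter with update `W + (alpha / rank) * B @ A`, which the issue does not state, and it is not the implementation of merge_peft_adapter.py. The helper names (`matmul`, `merge_lora`, `apply_linear`, `apply_with_adapter`) and the tiny matrices are hypothetical and chosen only for illustration.

```python
# Dependency-free sketch of a LoRA-style merge (assumed adapter type).
# Matrices are lists of rows; all values below are made up for illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def apply_linear(weight, x):
    """Compute weight @ x for a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def apply_with_adapter(weight, lora_a, lora_b, alpha, rank, x):
    """Compute the unmerged forward pass: W x + scale * B (A x)."""
    scale = alpha / rank
    base = apply_linear(weight, x)
    low_rank = apply_linear(lora_b, apply_linear(lora_a, x))
    return [b + scale * l for b, l in zip(base, low_rank)]

weight = [[1.0, 0.0], [0.0, 1.0]]   # base weight, 2x2
lora_a = [[1.0, 2.0]]               # A, rank x in_features = 1x2
lora_b = [[0.5], [-1.0]]            # B, out_features x rank = 2x1
alpha, rank = 2, 1
x = [1.0, 1.0]

merged = merge_lora(weight, lora_a, lora_b, alpha, rank)
print(merged)                                                   # merged weight
print(apply_linear(merged, x))                                  # merged forward pass
print(apply_with_adapter(weight, lora_a, lora_b, alpha, rank, x))  # adapter forward pass
```

If a merge works as intended, the merged weights differ from the base weights and the merged forward pass matches the base-plus-adapter forward pass. A first-step loss close to the untrained value after merging is consistent with the merged checkpoint not containing the trained update, although the logs alone do not prove that.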
In the first run, the loss declines from 6.23 to about 0.4.
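The per-step losses in the two logs can be pulled out and compared side by side. The sketch below is a small stdlib-only helper; the function name `parse_losses` is hypothetical, and the sample strings copy a few lines from the logs in this issue. It assumes each logged record is a single-line Python dict literal, as in the output above.

```python
import ast

def parse_losses(log_text):
    """Return the 'loss' values from dict-style log lines, in order."""
    losses = []
    for line in log_text.splitlines():
        line = line.strip()
        # Skip progress bars, warnings, and launcher messages.
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        # The final summary line uses 'train_loss', so it is skipped here.
        if isinstance(record, dict) and "loss" in record:
            losses.append(record["loss"])
    return losses

# A few lines copied from the first run and from the run after merging.
first_run = """
{'loss': 6.2368, 'learning_rate': 5e-05, 'epoch': 1.0}
{'loss': 0.46, 'learning_rate': 4.849231551964771e-05, 'epoch': 1.6}
{'loss': 0.5747, 'learning_rate': 0.0, 'epoch': 6.4}
{'train_runtime': 192.5393, 'train_loss': 1.6423154383897782, 'epoch': 6.4}
"""
second_run = """
Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.7047, 'learning_rate': 5e-05, 'epoch': 1.0}
{'loss': 0.4522, 'learning_rate': 4.849231551964771e-05, 'epoch': 1.6}
"""

first_losses = parse_losses(first_run)
second_losses = parse_losses(second_run)
print("first run, first step:", first_losses[0])
print("first run, last step:", first_losses[-1])
print("second run, first step:", second_losses[0])
```

If the merged checkpoint carried over the training from the first run, the first-step loss of the second run would be expected to sit near the last-step loss of the first run rather than near its first-step loss.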