[2024-11-27 22:25:23,401] [ INFO] - Starting training from resume_from_checkpoint : None
n2:8140:8140 [0] NCCL INFO Bootstrap : Using ib0.8079:192.168.1.3<0>
n2:8140:8140 [0] NCCL INFO cudaDriverVersion 12000
n2:8140:8140 [0] NCCL INFO NCCL version 2.23.4+cuda12.0
n2:8140:8465 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal network plugin.
n2:8140:8465 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/IB [1]mlx5_0:1/IB [2]mlx5_3:1/IB [3]mlx5_1:1/IB ; OOB ib0.8079:192.168.1.3<0>
n2:8140:8465 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
n2:8140:8465 [0] NCCL INFO Using network IB
n2:8140:8465 [0] NCCL INFO ncclCommInitRank comm 0xaf438e60 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 10000 commId 0x74182825db762150 - Init START
Please ask your question
Here is my environment:
It hangs at the following point: