Unable to train using 1 GPU #234
Comments
@amira-essawy How did you solve it? I ran into the same problem.
@yjcreation I changed some parameters in the base config file, this line:
Thank you, this really helps.
@amira-essawy @ERGOWHO were you able to successfully train and obtain good results using a single GPU?
Can an experiment run on a single GPU also achieve good results?
I am trying to run tools/train.py on 1 GPU with this command:
python tools/train.py configs/soft_teacher/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k.py --gpus 1 --cfg-options fold=1 percent=10
Training started and ran until iteration 4000, then stopped with the error below. I see this problem on both the COCO dataset and my custom dataset.
2023-02-09 13:00:26,292 - mmdet.ssod - INFO - Saving checkpoint at 4000 iterations
2023-02-09 13:00:36,802 - mmdet.ssod - INFO - Exp name: cv3.py
2023-02-09 13:00:36,803 - mmdet.ssod - INFO - Iter [4000/1080000] lr: 1.000e-02, eta: 9598 days, 18:19:02, time: 15.415, data_time: 0.941, memory: 6573, ema_momentum: 0.9990, sup_loss_rpn_cls: 0.0315, sup_loss_rpn_bbox: 0.0125, sup_loss_cls: 0.0654, sup_acc: 97.9980, sup_loss_bbox: 0.0812, loss: 0.1906
Traceback (most recent call last):
  File "tools/train.py", line 198, in <module>
    main()
  File "tools/train.py", line 186, in main
    train_detector(
  File "/root/workspace/amiras/SoftTeacher/ssod/apis/train.py", line 206, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 144, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 70, in train
    self.call_hook('after_train_iter')
  File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 317, in call_hook
    getattr(hook, fn_name)(self)
  File "/root/workspace/amiras/SoftTeacher/ssod/utils/hooks/submodules_evaluation.py", line 38, in after_train_iter
    self._do_evaluate(runner)
  File "/root/workspace/amiras/SoftTeacher/ssod/utils/hooks/submodules_evaluation.py", line 52, in _do_evaluate
    dist.broadcast(module.running_var, 0)
  File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1399, in broadcast
    default_pg = _get_default_group()
  File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 584, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
Is the code not compatible with a single GPU?
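For anyone reading the traceback: the failing call is `dist.broadcast(module.running_var, 0)` inside `_do_evaluate`, and `torch.distributed` collectives require an initialized default process group, which this non-distributed run evidently did not have. Below is a minimal sketch of one possible workaround, assuming the hook only needs to synchronize BatchNorm buffers across ranks and that skipping the broadcast is harmless when there is a single process. The helper name `broadcast_bn_buffers` is hypothetical and is not part of SoftTeacher or mmcv; the real hook code may differ.

```python
import torch.distributed as dist
from torch import nn
from torch.nn.modules.batchnorm import _BatchNorm


def broadcast_bn_buffers(model: nn.Module) -> bool:
    """Broadcast BatchNorm running stats from rank 0, if distributed.

    Hypothetical helper (assumption, not the repository's code).
    Returns True if a broadcast was performed, False if it was skipped
    because no default process group has been initialized.
    """
    # Guard: collectives such as dist.broadcast raise RuntimeError
    # when init_process_group has not been called.
    if not (dist.is_available() and dist.is_initialized()):
        return False

    for module in model.modules():
        if isinstance(module, _BatchNorm) and module.track_running_stats:
            dist.broadcast(module.running_var, 0)
            dist.broadcast(module.running_mean, 0)
    return True


if __name__ == "__main__":
    # Single-process run: no process group, so the broadcast is skipped.
    toy_model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=1), nn.BatchNorm2d(4))
    print("broadcast performed:", broadcast_bn_buffers(toy_model))
```

With a guard like this, a single-process run would skip the synchronization step instead of raising. Whether the rest of the evaluation hook works without a process group is not something this sketch verifies.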