Training ScanNet200 dataset Error #19

Open
xiaotiancai899 opened this issue Jun 10, 2023 · 3 comments

Comments

@xiaotiancai899

When I was training on the ScanNet200 dataset, an error occurred at epoch 55 of 120.

Traceback (most recent call last):
File "tools/train.py", line 332, in <module>
main()
File "tools/train.py", line 323, in main
train(epoch, model, optimizer, scheduler, scaler, train_loader, cfg, logger, writer)
File "tools/train.py", line 80, in train
loss, log_vars = model(batch, return_loss=True, epoch=epoch - 1) # Is it possible for this epoch to become something like -1???
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 219, in forward
return self.forward_train(**batch, epoch=epoch)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/util/utils.py", line 172, in wrapper
return func(*new_args, **new_kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 265, in forward_train
feats, coords_float, voxel_coords, spatial_shape, batch_size, p2v_map
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/isbnet.py", line 632, in forward_backbone
output = self.unet(output)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 250, in forward
output_decoder = self.u(output_decoder)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/student/Documents/software/wsl/isbnet/isbnet-master/isbnet-master/isbnet/model/blocks.py", line 249, in forward
output_decoder = self.conv(output)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 137, in forward
input = module(input)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 404, in forward
raise e
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 395, in forward
timer=input._timer)
File "/home/clinton/anaconda3/envs/isbnet/lib/python3.7/site-packages/spconv/pytorch/ops.py", line 465, in get_indice_pairs_implicit_gemm
stream_int=stream)
RuntimeError: /tmp/pip-build-env-a41g0q_q/overlay/lib/python3.7/site-packages/cumm/include/tensorview/cuda/launch.h(53)
N > 0 assert faild. CUDA kernel launch blocks must be positive, but got N= 0

I used batch_size=1, and also avoided OOM during training by freezing all BatchNorm layers.
Any ideas about that? Thanks so much in advance!
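A hedged way to narrow this down: the assertion is raised inside spconv's index-pair computation (`get_indice_pairs_implicit_gemm`) and reports a kernel launch size of N = 0, which is consistent with a sparse tensor that has no active voxels reaching a convolution. The traceback shows the failure in `self.conv` after several nested `self.u` calls, i.e. at a deep downsampling level. The pure-Python sketch below counts how many active voxels survive each level, assuming every downsampling layer is a kernel_size=2, stride=2, padding=0 convolution (an assumption about the model config, not something stated in this thread). `conv_out_size` and `active_voxels_per_level` are hypothetical helpers, not part of ISBNet or spconv.

```python
# Hypothetical diagnostic: simulate how many active voxels survive each
# stride-2 downsampling level of a sparse U-Net.
# ASSUMPTION: every level downsamples with kernel_size=2, stride=2,
# padding=0, dilation=1 (not confirmed for ISBNet's actual config).


def conv_out_size(size: int, kernel: int = 2, stride: int = 2,
                  padding: int = 0, dilation: int = 1) -> int:
    """Standard convolution output-size formula, applied per axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def active_voxels_per_level(coords, spatial_shape, num_levels):
    """Return the active-voxel count at level 0 and after each downsampling.

    coords: iterable of (x, y, z) non-negative integer voxel coordinates.
    spatial_shape: (X, Y, Z) grid size at level 0.
    With kernel=2, stride=2, padding=0, input voxel c lands on output site
    c // 2, and that site only exists if it lies inside the output grid.
    """
    current = {tuple(c) for c in coords}
    shape = tuple(spatial_shape)
    counts = [len(current)]
    for _ in range(num_levels):
        shape = tuple(conv_out_size(s) for s in shape)
        # Map each voxel to its output site and drop sites outside the grid.
        current = {
            tuple(c // 2 for c in v)
            for v in current
            if all(0 <= c // 2 < s for c, s in zip(v, shape))
        }
        counts.append(len(current))
    return counts
```

For example, a single voxel at index 4 of a 5-wide grid is dropped by the first downsampling (`active_voxels_per_level([(4, 4, 4)], (5, 5, 5), 1)` gives `[1, 0]`), while an empty scan gives zero at every level. Applying the same counting to the `voxel_coords` and `spatial_shape` of the failing batch, for instance by logging the scan id when the error fires, could show whether the input was already empty or became empty at some level.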

@xiaotiancai899
Author

@ngoductuanlhp

@ngoductuanlhp
Collaborator

You could check similar issues on the original repo of spconv: traveller59/spconv#406, mit-han-lab/bevfusion#82.

Best.

@xiaotiancai899
Author

Those two did not solve my problem. Any other advice?
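If the root cause cannot be found, one common stopgap is to skip the offending batch instead of aborting the whole run. This is a minimal sketch under stated assumptions: `run_step` stands in for the forward/backward/optimizer code in `tools/train.py`, and `safe_step` and `has_enough_voxels` are hypothetical names, not ISBNet functions. Matching on the error text is brittle and may stop working if the spconv/cumm message changes.

```python
# Hypothetical workaround sketch: skip a training step when spconv raises
# the "N > 0" kernel-launch assertion instead of crashing the whole run.
# `run_step` is a stand-in for the forward/backward/optimizer code in
# tools/train.py; it is NOT an ISBNet function.
from typing import Callable, Optional

# Substring of the error message quoted in the traceback above.
SPCONV_EMPTY_MSG = "CUDA kernel launch blocks must be positive"


def has_enough_voxels(num_voxels: int, min_voxels: int = 1) -> bool:
    """Pre-check: reject batches whose voxel count is below a threshold."""
    return num_voxels >= min_voxels


def safe_step(run_step: Callable[[], float], num_voxels: int,
              min_voxels: int = 1) -> Optional[float]:
    """Run one step; return its loss, or None if the batch was skipped."""
    if not has_enough_voxels(num_voxels, min_voxels):
        return None  # skip before calling the model at all
    try:
        return run_step()
    except RuntimeError as e:
        if SPCONV_EMPTY_MSG in str(e):
            return None  # skip only this specific spconv failure
        raise  # any other error is still fatal
```

Skipping only hides the symptom; logging which scene triggered it (for example, checking whether it ends up with very few points after augmentation) is what would identify the real cause.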

