CUDA error: out of memory #72

Open
Me-Maped opened this issue Oct 12, 2023 · 1 comment

Comments

I am training on a 1080×1920 video about 5 seconds long. I extracted roughly 300 frames with ffmpeg and generated the masks with SegmentAndTrackAnything. Running train_multi throws this error:
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

My environment is WSL2 on Windows with 26 GB of RAM allocated; the GPU is a 4070 Ti with 12 GB.
base.yaml configuration:
img_wh: [1080, 1920]
canonical_wh: [1080, 1920]
lr: 0.001
bg_loss: 0.003
ref_idx: null # 0
N_xyz_w: [8,8]
flow_loss: 1
flow_step: -1
self_bg: True
deform_hash: True
vid_hash: True
num_steps: 10000
decay_step: [2500, 5000, 7500]
annealed_begin_step: 4000
annealed_step: 4000
save_model_iters: 2000

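I tried shortening the video and extracting only 10 frames, but training still aborts with CUDA error: out of memory. I really don't understand why. Is something misconfigured? The test examples run without problems.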

Me-Maped commented Oct 12, 2023

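If I add mask_dir: null to the YAML, training succeeds with 200 images, but the output video jitters severely. After compressing the video to half its original quality, a new run produces output without problems.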
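
For reference, a rough back-of-the-envelope sketch of how raw frame data scales with resolution. It assumes float32 RGB frames and that "half quality" means half the width and height; both are assumptions, not something confirmed by the project. It also ignores model, optimizer, and activation memory, so it is only a lower bound on what training needs. The helper names are hypothetical.

```python
# Rough estimate only. Assumptions (not confirmed by the project):
# frames are held as float32 RGB tensors; model, optimizer, and
# activation memory are ignored and may well dominate in practice.
BYTES_PER_FLOAT32 = 4
CHANNELS = 3


def frame_bytes(width: int, height: int) -> int:
    """Bytes needed for one float32 RGB frame of the given size."""
    return width * height * CHANNELS * BYTES_PER_FLOAT32


def frames_mib(width: int, height: int, num_frames: int) -> float:
    """MiB needed to hold num_frames float32 RGB frames."""
    return frame_bytes(width, height) * num_frames / 2**20


# img_wh from base.yaml above, and the same frames at half width/height.
full = frames_mib(1080, 1920, 300)
half = frames_mib(540, 960, 300)
print(f"full: {full:.1f} MiB, half: {half:.1f} MiB, ratio: {full / half:.1f}x")
```

Halving both dimensions cuts per-frame data by a factor of four regardless of frame count, which may be why reducing to 10 frames did not help while lowering the video quality did.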
