CUDA error: out of memory #72

Me-Maped · 2023-10-12T09:48:55Z

I am training on a 1080*1920 video about 5 seconds long. I used ffmpeg to extract roughly 300 frames and SegmentAndTrackAnything to extract the masks. Running train_multi throws this error:
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
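
The message above suggests passing CUDA_LAUNCH_BLOCKING=1 while debugging. As a minimal sketch of one way to do that from Python, the helper below runs an arbitrary command with the variable set in the child process's environment. The helper name `run_with_blocking_launches` and the stand-in `demo` command are hypothetical; the real training command would go in its place.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(cmd):
    """Run cmd with CUDA_LAUNCH_BLOCKING=1 added to the child's environment."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in command (an assumption for illustration): it only echoes the
# variable so the effect is visible without a GPU.
demo = [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
result = run_with_blocking_launches(demo)
print(result.stdout.strip())
```

Setting the variable does not reduce memory use. It only makes kernel launches synchronous, so the stack trace is more likely to point at the call that actually failed.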

My environment is WSL2 on Windows with 26 GB of RAM allocated; the GPU is a 4070 Ti with 12 GB.
base.yaml configuration:
img_wh: [1080, 1920]
canonical_wh: [1080, 1920]
lr: 0.001
bg_loss: 0.003
ref_idx: null # 0
N_xyz_w: [8,8]
flow_loss: 1
flow_step: -1
self_bg: True
deform_hash: True
vid_hash: True
num_steps: 10000
decay_step: [2500, 5000, 7500]
annealed_begin_step: 4000
annealed_step: 4000
save_model_iters: 2000
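
For a sense of scale, here is a rough sketch of the per-frame pixel count implied by the img_wh value above, and what halving each dimension would give. The helper names and the 0.5 factor are assumptions for illustration, not something the project prescribes, and this thread does not establish that resolution is the cause of the error.

```python
def pixels_per_frame(img_wh):
    """Return the number of pixels in one frame of size [w, h]."""
    w, h = img_wh
    return w * h


def scaled_wh(img_wh, factor):
    """Scale both dimensions by factor (a hypothetical knob for this sketch)."""
    w, h = img_wh
    return [int(w * factor), int(h * factor)]


full = [1080, 1920]          # img_wh from the config above
half = scaled_wh(full, 0.5)  # halve each dimension

print(pixels_per_frame(full))        # 2073600
print(half, pixels_per_frame(half))  # [540, 960] 518400
```

Halving both dimensions leaves a quarter of the pixels per frame, which is one inexpensive thing to try when a 12 GB card runs out of memory.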

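I tried shortening the video and extracting only 10 frames, but training still aborts with CUDA error: out of memory. I really don't understand why. Is something misconfigured? The test example runs without problems.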

Me-Maped · 2023-10-12T09:53:16Z

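If I add mask_dir: null to the yaml, training succeeds with 200 frames, but the output video jitters severely. After compressing the video to half its original quality and regenerating the output, there is no problem.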
