When using the complete dataset (about 800K objects), MiB Mem usage keeps increasing, resulting in OOM. Is there any solution? #42
Comments
Hi, the training is based on pytorch_lightning, which is supposed to manage resources correctly. The dataset class (SyncDreamer/ldm/data/sync_dreamer.py, line 57 at eb41a0c) simply loads data and is not supposed to cause increasing memory usage. Maybe you can check whether the memory usage grows when running the dataset on its own (a rough sketch of such a check is below).
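A minimal sketch (not from the repo) of the dataset-only check suggested above, assuming psutil is installed; `rss_mib` and `check_dataset_memory` are hypothetical helper names, and the dataset object would be constructed elsewhere in the same way the training config builds it:

```python
# Hypothetical helper: iterate the dataset by itself and print process memory,
# to check whether the dataset class alone causes the growth.
import gc

import psutil
from torch.utils.data import DataLoader


def rss_mib() -> float:
    """Resident set size of the current process in MiB."""
    return psutil.Process().memory_info().rss / 1024 ** 2


def check_dataset_memory(dataset, num_workers: int = 0, report_every: int = 500):
    # batch_size=1 keeps batches small so any growth comes from the dataset itself.
    loader = DataLoader(dataset, batch_size=1, num_workers=num_workers, shuffle=True)
    for i, _batch in enumerate(loader):
        if i % report_every == 0:
            gc.collect()  # rule out objects that are merely pending collection
            print(f"step {i:>7d}  RSS {rss_mib():.1f} MiB")
```

Running this once with `num_workers=0` and once with the training value may help localize the issue; memory held by worker processes shows up in `top`, not in the main process's RSS printed here.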
Thank you very much. We think this is a problem with our environment configuration. I am checking with the Docker environment and will give feedback if there is any result.
I have the same problem. Have you found a solution yet? How quickly OOM occurs is proportional to the number of num_workers.
When I run the code in the Docker image provided by the author, this problem is solved. You can try running with the author's Docker image.
Thank you for the information. I will also try exactly that Docker environment; this may indeed be a Docker environment problem.
When I train with the dataset of about 800k objects, the memory usage (MiB Mem) circled in the screenshot keeps increasing as the number of training steps increases (a sketch of a step-wise memory log follows below). My configs/syncdreamer-train.yaml is the same as the one provided by the author, except for the data path:
https://github.com/liuyuan-pal/SyncDreamer/blob/main/configs/syncdreamer-train.yaml
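If the growth only shows up during training, a rough way to track it (not part of the repo) is a small pytorch_lightning callback that prints the host RSS every N steps; `MemoryMonitor` is a hypothetical name and psutil is assumed to be available:

```python
# Hypothetical callback: print the training process's resident memory every
# N steps so the growth can be read off the logs instead of `top`.
import psutil
from pytorch_lightning.callbacks import Callback


class MemoryMonitor(Callback):
    def __init__(self, every_n_steps: int = 100):
        self.every_n_steps = every_n_steps

    # *args absorbs signature differences between pytorch_lightning versions.
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, *args):
        if trainer.global_step % self.every_n_steps == 0:
            rss = psutil.Process().memory_info().rss / 1024 ** 2
            print(f"step {trainer.global_step}: host RSS {rss:.1f} MiB")
```

The callback would be added to the Trainer's `callbacks` list. Memory held by DataLoader workers lives in separate processes and will not appear in this number, which may be relevant to the observation above that OOM speed scales with num_workers.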