
When using the complete dataset (about 800K objects), resident memory (MiB Mem) keeps increasing, resulting in OOM. Is there any solution? #42

Open

wyg-okk opened this issue Oct 25, 2023 · 5 comments

wyg-okk commented Oct 25, 2023

[screenshot: `top` output with the MiB Mem usage circled]

When I train with the dataset of about 800k objects, the value circled in the screenshot keeps increasing as the number of training steps increases.
My configs/syncdreamer-train.yaml is the same as the one provided by the author, except for the data path:
https://github.com/liuyuan-pal/SyncDreamer/blob/main/configs/syncdreamer-train.yaml

liuyuan-pal (Owner) commented

Hi, the training is based on pytorch_lightning, which is supposed to manage resources correctly. You can see that the dataset method

def get_data_for_index(self, index):

simply loads data and is not supposed to cause increasing memory usage. Maybe you can check whether the memory usage grows when you iterate over the dataset on its own.
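For example, a standalone check could look like the minimal sketch below. This is an illustrative snippet, not a script from the repo; the dataset construction is a placeholder, so build it exactly as configs/syncdreamer-train.yaml does.

```python
# Illustrative sketch: index the dataset directly (no DataLoader, no workers)
# and watch resident memory. Requires `pip install psutil`.
import os
import psutil

def rss_mib() -> float:
    # Resident set size of this process in MiB (roughly what `top` reports).
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2

def probe_dataset_memory(dataset, log_every: int = 1000) -> None:
    # `get_data_for_index` is the loading method mentioned above.
    for i in range(len(dataset)):
        _ = dataset.get_data_for_index(i)
        if i % log_every == 0:
            print(f"index={i:>7d}  rss={rss_mib():9.1f} MiB")

# Placeholder usage:
# dataset = <build the training dataset as in configs/syncdreamer-train.yaml>
# probe_dataset_memory(dataset)
```

If RSS stays flat here but grows during training, the leak is more likely in the training loop or the DataLoader setup than in the dataset itself.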

wyg-okk (Author) commented Oct 26, 2023

> Hi, the training is based on pytorch_lightning, which is supposed to manage resources correctly. You can see that the dataset method `get_data_for_index(self, index)` simply loads data and is not supposed to cause increasing memory usage. Maybe you can check whether the memory usage grows when you iterate over the dataset on its own.

Thank you very much. We think this is a problem with our configured environment. I am checking with a Docker environment and will report back if there are any results.

rgxie commented Jan 8, 2024

I have the same problem. Have you found a solution yet? The speed at which OOM occurs is proportional to the number of workers (num_workers).
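For reference, a rough way to check whether the per-step memory growth really scales with the worker count is to wrap the same dataset in a plain DataLoader and sweep num_workers. This is again an illustrative sketch, not project code, and the dataset construction is a placeholder.

```python
# Illustrative sketch: run the same dataset through a DataLoader with different
# num_workers values and compare how fast resident memory grows.
import os
import psutil
from torch.utils.data import DataLoader

def rss_mib() -> float:
    # Resident set size of this process in MiB.
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2

def probe_loader_memory(dataset, num_workers: int,
                        batch_size: int = 4, max_steps: int = 5000) -> None:
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, shuffle=True)
    for step, _batch in enumerate(loader):
        if step % 500 == 0:
            print(f"num_workers={num_workers}  step={step:>6d}  rss={rss_mib():9.1f} MiB")
        if step >= max_steps:
            break

# Placeholder usage:
# dataset = <build the training dataset as in configs/syncdreamer-train.yaml>
# for n in (0, 2, 4, 8):
#     probe_loader_memory(dataset, num_workers=n)
```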

wyg-okk (Author) commented Jan 8, 2024

> I have the same problem. Have you found a solution yet? The speed at which OOM occurs is proportional to the number of workers (num_workers).

When I ran the code in the Docker image provided by the author, this problem was solved. You can try running with the author's Docker image.

rgxie commented Jan 9, 2024

> > I have the same problem. Have you found a solution yet? The speed at which OOM occurs is proportional to the number of workers (num_workers).
>
> When I ran the code in the Docker image provided by the author, this problem was solved. You can try running with the author's Docker image.

Thank you for the information. I am also using exactly the Docker environment, so this may indeed be a Docker environment problem.
