MemoryError when fitting model.fit #34
Comments
Hi Tatyana! Are you using the GPU version? The non-GPU version does use sparse matrices, so I will check on this ASAP for you!
Thank you for getting back to me!
Hi Tatyana - thanks for catching this! I think that somehow the non-sparse matrix version ended up being the one pushed most recently. I will fix this ASAP and push the sparse version, hopefully by Friday at the latest, and update this thread when it's done.
Hi - just providing an update on this. I misunderstood which matrix the sparsity error was coming from. The kernel matrix itself is sparse, but the A_ and B_ matrices are not necessarily sparse (although in practice they tend to be). It'll take a bit longer to check how a sparse version of those matrices would perform, but in the meantime my recommendation would be to split the cells into chunks to process, e.g. you can compute SEACells within each sample and then merge them, or compute SEACells within cell types or low-resolution Leiden clusters as a way to chunk it.
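The chunking suggestion above can be sketched roughly as follows. This is a minimal illustration, not SEACells code: `assign_metacells_by_chunk` and `dummy_assign` are hypothetical names, and `dummy_assign` stands in for whatever fits SEACells on one chunk and returns a local metacell id per cell. Prefixing each local id with the chunk name keeps the merged labels unique across chunks.

```python
import numpy as np
import pandas as pd


def assign_metacells_by_chunk(chunk_labels, assign_fn):
    """Run assign_fn on each chunk of cells and merge the results.

    chunk_labels: pandas Series indexed by cell barcode, giving each cell's
        chunk (e.g. sample, cell type, or coarse Leiden cluster).
    assign_fn: hypothetical callable; takes the barcodes of one chunk and
        returns one local metacell id per cell, in the same order.
    """
    pieces = []
    for chunk, cells in chunk_labels.groupby(chunk_labels).groups.items():
        local_ids = assign_fn(cells.to_numpy())
        # Prefix with the chunk name so ids from different chunks never clash.
        names = [f"{chunk}-SEACell-{i}" for i in local_ids]
        pieces.append(pd.Series(names, index=cells))
    # Restore the original cell order.
    return pd.concat(pieces).reindex(chunk_labels.index)


def dummy_assign(cells):
    # Stand-in for fitting a model on one chunk: alternate between two ids.
    return np.arange(len(cells)) % 2


chunk_labels = pd.Series(
    ["s1", "s1", "s2", "s2", "s2"],
    index=[f"cell{i}" for i in range(5)],
)
merged = assign_metacells_by_chunk(chunk_labels, dummy_assign)
```

With this layout the number of metacells per chunk is whatever `assign_fn` decides for that chunk, so it has to be chosen per chunk rather than globally.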
Hi, thanks a lot! Got it. I thought about analyzing it by cell type/cluster. The problem is that the number of resulting metacells would then be determined by the original number of cells per cell type, while I was hoping that with SEACells more heterogeneous cell types would have more metacells and vice versa.
Just updated the CPU version to use sparse matrices - the current issue is that it's a bit slower than the CPU version without sparse matrices on small inputs, since computing the reconstruction error using scipy.sparse.linalg.norm is a decent amount slower than np.linalg.norm. Not totally sure how to solve this yet, but if you don't mind it being a bit slow you can use this version. You can use this option by specifying use_sparse=True as an argument to the SEACells model initialisation. WIP: Fully sparse version for GPU still in progress, will update when that one is complete!
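For readers unfamiliar with the two norm functions mentioned above, here is a small sketch showing that they compute the same Frobenius norm on a sparse matrix. The matrix `residual` is random stand-in data, not a real reconstruction error from SEACells, and `frobenius_from_data` is a hypothetical helper added only for comparison; it is not something the maintainers proposed, and no claim is made here about its speed.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def frobenius_from_data(matrix):
    """Frobenius norm computed from the stored non-zero values only."""
    m = sp.csr_matrix(matrix, copy=True)
    m.sum_duplicates()  # make sure each entry is stored exactly once
    return float(np.sqrt(np.dot(m.data, m.data)))


# Random sparse stand-in for a residual matrix (assumption, for illustration).
residual = sp.random(200, 300, density=0.05, format="csr", random_state=0)

norm_sparse = spla.norm(residual, "fro")           # sparse routine
norm_dense = np.linalg.norm(residual.toarray())    # dense routine (needs a dense copy)
norm_data = frobenius_from_data(residual)          # hypothetical alternative
```

Note that the dense route requires materialising the full matrix, which is exactly what runs out of memory on large inputs.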
Great, thanks a lot! I'll try it out.
Good day! Any news on when the fully sparse version for GPU will be available? :)
And I am not sure why, since it says that it cannot allocate 23.5 GB even though I am using a server with 500 GB RAM and an NVIDIA A100 80GB PCIe MIG 2g.20gb. I saved the model and can load it, but I would like to know if I could change the initialization parameters so I can set Thanks in advance!
I think you will have to reinitialize the model, but then you can just assign the precomputed kernel matrix and archetypes from the saved model.
Thanks, @tatyana-perlova, for your suggestion. In the end, I re-ran the model, but it did not finish after five days (the time limit for any job on our HPC). How long did it take for your 440K dataset? I am still waiting for an answer about the GPU implementation handling sparse matrices. I subset a population of interest (260K cells) and still could not run it using the GPU :(
Thank you for a great tool! I tested the tutorial workflow on a subset of my data and it worked like a charm. Now I'm doing it on the full dataset of 440k cells. When running
model.fit(min_iter=10, max_iter=50)
after 2.5 hours I got a memory error. Is it not using sparse matrices?
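As a rough back-of-the-envelope check on why dense matrices become a problem at this scale, the sketch below estimates the memory of a dense float64 array. The 440,000 cell count comes from the report above; the number of SEACells (5,000) is a hypothetical value chosen only for illustration, and the assumed shape of the A_ matrix (one row per SEACell, one column per cell) is an assumption, not confirmed in this thread.

```python
import numpy as np


def dense_gib(n_rows, n_cols, dtype=np.float64):
    """Memory in GiB needed to hold a dense n_rows x n_cols array."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2**30


n_cells = 440_000      # from the report above
n_seacells = 5_000     # hypothetical, for illustration only

# Assumed shape: one row per SEACell, one column per cell.
a_matrix_gib = dense_gib(n_seacells, n_cells)

# A fully dense cell-by-cell kernel matrix, for comparison.
kernel_gib = dense_gib(n_cells, n_cells)

print(f"dense A_ matrix: {a_matrix_gib:.1f} GiB")
print(f"dense kernel:    {kernel_gib:.1f} GiB")
```

Under these assumptions a single dense assignment matrix already takes tens of GiB, and a dense kernel would exceed a terabyte, which is why sparse storage matters here.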