It's difficult to say without knowing a bit more:
That's not possible to say, since it depends on the vocabulary size, the embedding model used, the HDBSCAN/UMAP parameters, the representation models, etc.
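To make the dependence on vocabulary size concrete, here is a back-of-envelope sketch (not BERTopic code) of the memory a document-term matrix needs. It assumes 8-byte values and 32-bit indices for the sparse CSR layout, which are common for CountVectorizer/SciPy output but not guaranteed. The document count, terms per document, and vocabulary size in the example are hypothetical placeholders.

```python
def dense_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows: int, nnz: int, data_itemsize: int = 8, index_itemsize: int = 4) -> int:
    """Bytes needed for a CSR sparse matrix: values + column indices + row pointers."""
    values = nnz * data_itemsize
    col_indices = nnz * index_itemsize
    row_pointers = (n_rows + 1) * index_itemsize
    return values + col_indices + row_pointers


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical corpus: 1M documents, ~300 distinct terms per document,
    # 500k-term vocabulary. Replace with numbers measured on your own data.
    n_docs, terms_per_doc, vocab_size = 1_000_000, 300, 500_000
    sparse = csr_bytes(n_docs, n_docs * terms_per_doc)
    dense = dense_bytes(n_docs, vocab_size)
    print(f"sparse document-term matrix: ~{gib(sparse):.1f} GiB")
    print(f"same matrix stored densely:  ~{gib(dense):,.0f} GiB")
```

With these made-up numbers the sparse matrix takes a few GiB, while a dense array of the same shape would need thousands of GiB, so the vocabulary size and the number of distinct terms per document dominate the estimate.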
That depends on where the computation fails. If it is the c-TF-IDF step, then it's a RAM issue that can be resolved with several tricks. If it's an LLM representation model, then it depends on the GPU memory. Therefore, it is always important to share your code. Regardless, these are some tricks that can be used (but again, it depends on where it is failing):
Could be, but kernel crashes are typically the result of memory issues. Either way, I'll need to see the code first.
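Since the diagnosis hinges on which step fails, one low-tech way to find out is to print a line before and after each stage, flushing output so the last line survives a kernel crash. This is a minimal standard-library sketch, assuming a Unix host (the `resource` module is not available on Windows). It only tracks CPU-side peak RSS; GPU memory would need a separate tool such as `nvidia-smi`. The stage name in the example is a hypothetical placeholder for whichever parts of the pipeline you run separately.

```python
import resource
import sys
import time
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB.

    ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


@contextmanager
def log_stage(name: str, records: Optional[List[Tuple[str, float, float]]] = None) -> Iterator[None]:
    """Print a flushed line before and after a stage.

    If the process dies inside the stage, the last '[start]' line without a
    matching '[done]' line names the stage that was running.
    """
    print(f"[start] {name} (peak RSS so far: {peak_rss_mib():.0f} MiB)", flush=True)
    start = time.perf_counter()
    yield
    # Only reached if the stage completes without raising.
    elapsed = time.perf_counter() - start
    peak = peak_rss_mib()
    print(f"[done]  {name} in {elapsed:.1f}s (peak RSS: {peak:.0f} MiB)", flush=True)
    if records is not None:
        records.append((name, elapsed, peak))


# Hypothetical usage: wrap whichever steps you can run separately.
records: List[Tuple[str, float, float]] = []
with log_stage("toy stage", records):
    total = sum(range(1_000_000))
```

Because the peak RSS value only ever grows, a jump between a stage's `[start]` and `[done]` lines shows that this stage raised the process's memory high-water mark.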
No. Since it depends on many things, there can't be a single recommendation: one set of millions of documents could have a much larger vocabulary than another set of millions of documents.
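The vocabulary point can be illustrated with a small sketch of document-frequency filtering, the idea behind the integer form of `min_df` in scikit-learn's `CountVectorizer`: a term is kept only if it appears in at least `min_df` documents. The tokenizer here is a naive lowercase whitespace split (unlike `CountVectorizer`'s default token pattern, which drops single-character tokens), and the toy documents are made up.

```python
from collections import Counter
from typing import Iterable, Set


def vocabulary(docs: Iterable[str], min_df: int = 1) -> Set[str]:
    """Return the terms that appear in at least `min_df` documents.

    Counts documents containing a term (document frequency),
    not total occurrences of the term.
    """
    doc_freq: Counter = Counter()
    for doc in docs:
        # set() so a term repeated within one document is counted once
        doc_freq.update(set(doc.lower().split()))
    return {term for term, df in doc_freq.items() if df >= min_df}


docs = ["the cat sat", "the dog sat", "a rare aardvark"]
print(len(vocabulary(docs, min_df=1)), sorted(vocabulary(docs, min_df=2)))
```

Raising `min_df` mostly removes rare terms, which on large corpora are often a large share of the vocabulary, so the same cutoff can shrink two equally sized collections by very different amounts.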
-
Hi, I'm trying to topic model a dataset of around 1 million articles from the Economist. I'm running into issues during the Representation step. I'm using an ml.g6.2xlarge on AWS SageMaker with RAPIDS/cuML installed. Embeddings, UMAP, and HDBSCAN all complete fairly quickly. I have "calculate_probabilities" set to True. I haven't been able to identify why the crash keeps occurring.
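With `calculate_probabilities=True`, BERTopic computes a probability for every topic for every document rather than only for the assigned topic, so that document-by-topic matrix grows with both counts. Below is a rough size estimate for one dense copy, assuming float64 (8 bytes per value; float32 would halve it). The topic counts are hypothetical, and the real peak can be higher if intermediate copies are made.

```python
def prob_matrix_gib(n_docs: int, n_topics: int, itemsize: int = 8) -> float:
    """GiB for one dense (n_docs x n_topics) array of probabilities."""
    return n_docs * n_topics * itemsize / 2**30


# ~1M documents as in the question; topic counts are hypothetical.
for n_topics in (100, 500, 2_000):
    size = prob_matrix_gib(1_000_000, n_topics)
    print(f"{n_topics:>5} topics: {size:5.1f} GiB (float64)")
```

Comparing estimates like these against the instance's memory (32 GB of RAM and 24 GB of GPU memory on the ml.g6.2xlarge, per the question) is a cheap sanity check before paying for a bigger machine.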
Since it fails during the Representation step, I suspect it is a memory issue. The docs give some guidance on how to handle this, specifically with the min_df parameter. My ml.g6.2xlarge instance has a GPU with 24 GB of memory, 8 vCPUs, and 32 GB of (regular) memory. I'm open to using a larger instance, but I don't want to guess and check, wasting time and money. This has led me to the following questions:
I have considered online training, but it doesn't fit my use case especially well, since I have no guarantee that, if I partition my data, the appropriate topics can be found in a given partition. Any suggestions appreciated!
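The partitioning concern can be quantified under one assumption: documents are shuffled and each chunk is drawn uniformly at random. The chance that a chunk of m documents contains none of a topic's k documents (out of N) is then the hypergeometric probability C(N-k, m)/C(N, m), which equals C(N-m, k)/C(N, k). This sketch uses the second form as a product of k factors to avoid huge binomials. The corpus split and topic size in the example are hypothetical, since real topic sizes are not known in advance.

```python
def p_topic_absent(n_docs: int, topic_size: int, chunk_size: int) -> float:
    """Probability that a random chunk of `chunk_size` documents, drawn
    without replacement from `n_docs`, contains none of the `topic_size`
    documents of one topic. Uses C(N-m, k) / C(N, k) = prod (N-m-j)/(N-j)."""
    if chunk_size > n_docs - topic_size:
        return 0.0  # the chunk is too large to avoid the topic entirely
    p = 1.0
    for j in range(topic_size):
        p *= (n_docs - chunk_size - j) / (n_docs - j)
    return p


def expected_in_chunk(n_docs: int, topic_size: int, chunk_size: int) -> float:
    """Expected number of the topic's documents that land in the chunk."""
    return topic_size * chunk_size / n_docs


# Hypothetical: 1M documents split into 10 chunks, a niche topic of 50 documents.
N, k, m = 1_000_000, 50, 100_000
print(f"P(topic absent from a chunk): {p_topic_absent(N, k, m):.4f}")
print(f"expected docs of that topic per chunk: {expected_in_chunk(N, k, m):.1f}")
```

A topic that is rarely missing entirely can still be too thin within a chunk: about 5 documents here, which may be below the minimum cluster size in use, so it might not form a cluster in that chunk.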