That's not easy to do. We only learn certain characteristics of the embeddings in low-dimensional space after reducing their dimensionality, so the parameter cannot simply be derived from the data up front; otherwise, there would have been no need for it in the first place. Ideally, the parameters would be a function of the data, but since the input data can vary wildly (embedding size, distribution of values, number of data points, etc.), there isn't a straightforward way to make sure all parameters are perfectly tuned to the data.
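If you still want to experiment with tying `n_neighbors` to the data, a minimal sketch might look like the following. The function name `heuristic_n_neighbors`, the square-root rule, and the default bounds are all assumptions made for illustration; none of this is part of BERTopic or UMAP, and there is no guarantee it produces better plots than the fixed defaults.

```python
import math


def heuristic_n_neighbors(n_points: int, lower: int = 2, upper: int = 50) -> int:
    """Hypothetical rule of thumb: roughly sqrt(n_points), clamped to a range.

    This is an illustration only, not something BERTopic or UMAP provides.
    """
    if n_points < 3:
        raise ValueError("need at least 3 points to build a neighbor graph")
    # UMAP needs n_neighbors >= 2; we also keep it below the number of points.
    cap = min(upper, n_points - 1)
    floor = min(lower, cap)
    return max(floor, min(cap, round(math.sqrt(n_points))))


# Example: a model with 40 topics versus a corpus of 10,000 documents.
print(heuristic_n_neighbors(40))
print(heuristic_n_neighbors(10_000))
```

The value could then be used as `n_neighbors` in your own `umap.UMAP(n_components=2, ...)` model. As far as I know, `visualize_documents()` accepts precomputed 2D coordinates through its `reduced_embeddings` argument, so you can compare a few settings side by side without changing the library defaults.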
-
Hi there!

Two of the visualizations, `visualize_topics()` and `visualize_documents()`, both use a 2D-reduced version of the embeddings. `visualize_topics()` uses UMAP with `n_neighbors = 2`, while `visualize_documents()` uses `n_neighbors = 10`.

My question is: should these parameters be varied based on certain attributes of the fitted model or data? For example, should the `n_neighbors` of `visualize_topics()` be a function of the number of topics in the model? Does this make mathematical sense? Should the same be applied to `visualize_documents()`, or should the parameter be scaled based on some other property, or not at all?

I've been doing my best to read the math behind UMAP, and the `n_neighbors` parameter seems like it should vary a bit more.

Thanks