Implementing SAE training on embeddings as a new approach to topic modeling #2063
-
Thank you for sharing these resources! These are cool techniques and definitely something that I would like to see integrated into BERTopic. Since its main sources of information are both embeddings and text, I believe it lends itself quite well to SAEs. We could automatically train an SAE on the embeddings + text and use the generated features as a representation model in BERTopic. That way, we won't have to face the problem of finding the "right" features, as some might be irrelevant. Having said that, we could technically still use the set of features (sort of a "Bag of Features") to perform the clustering. Then, applying (c-)TF-IDF would actually be quite natural in this case. Are you familiar with any packages already implementing easy-to-train SAEs?
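For what it's worth, a rough sketch of what that "Bag of Features" clustering could look like with the current API, assuming a hypothetical trained `sae` object with an `encode()` method that maps dense embeddings to sparse feature activations, and `docs` as the usual list of input documents (none of this exists in BERTopic today):

```python
# Sketch only: `sae` is a hypothetical trained sparse autoencoder; `docs` is an
# assumed list[str] of input documents. Nothing below is an existing BERTopic feature.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
dense_embeddings = embedding_model.encode(docs, show_progress_bar=True)

# "Bag of Features": each document becomes a sparse vector of SAE feature activations.
feature_activations = sae.encode(dense_embeddings)

# Cluster on the feature activations instead of the raw dense embeddings; c-TF-IDF
# over the resulting clusters then works the same way it does now.
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs, embeddings=feature_activations)
```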
-
Unsupervised interpretability techniques using sparse autoencoders (SAEs) have gained traction over the past few months. SAEs trained on model activations can extract sparse interpretable features from LLMs.[1] But this approach can also be used on dense sentence embeddings to extract interpretable sparse features from each embedding. (Sparse features can be automatically labeled by an LLM, similarly to how BERTopic already handles automatic topic labeling.) Linus Lee just wrote a blog post exploring this idea, with very promising results.[2]
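To make the mechanics concrete, here is a minimal sketch of such an SAE over sentence embeddings (not Linus's implementation; the dimensions, L1 coefficient, and training loop are illustrative assumptions):

```python
# Minimal sparse-autoencoder sketch over sentence embeddings. Assumptions: embedding
# dim 384, 4096 features, L1 coefficient 1e-3 -- all purely illustrative.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int = 384, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_features)
        self.decoder = nn.Linear(d_features, d_embed)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps feature activations non-negative; together with the L1 term
        # below, most activations end up at zero (sparse features).
        return torch.relu(self.encoder(x))

    def forward(self, x: torch.Tensor):
        features = self.encode(x)
        reconstruction = self.decoder(features)
        return reconstruction, features

def train_step(sae: SparseAutoencoder, optimizer: torch.optim.Optimizer,
               batch: torch.Tensor, l1_coeff: float = 1e-3) -> float:
    reconstruction, features = sae(batch)
    # Reconstruction loss keeps the features faithful to the embedding;
    # the L1 penalty keeps them sparse.
    loss = nn.functional.mse_loss(reconstruction, batch) + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Mini-batches of sentence embeddings (e.g. from sentence-transformers) would be fed through `train_step`, and the learned `encode()` output is the sparse feature vector for each document.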
Implementing a version of this in BERTopic could prove very useful. Essentially, interpretable sparse features could be treated as topics, and you could straightforwardly quantify how each document relates to specific features, as well as cluster documents that share common features. This would be a novel approach to topic modeling, but it could potentially be used in conjunction with BERTopic's current approach (e.g. using sparse features for better topic representations).
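As a purely illustrative sketch of that document-feature relationship (hypothetical names; assumes a document-by-feature activation matrix produced by an SAE encoder): a document's relation to a feature is simply its activation, and documents can be grouped naively by their strongest feature:

```python
import numpy as np

# feature_activations: (n_docs, n_features) matrix of SAE feature activations,
# produced elsewhere (hypothetical; see the sketches above).

def top_features(doc_idx: int, feature_activations: np.ndarray, k: int = 5):
    """Return the k features a document expresses most strongly."""
    row = feature_activations[doc_idx]
    top = np.argsort(row)[::-1][:k]
    return [(int(f), float(row[f])) for f in top if row[f] > 0]

def group_by_dominant_feature(feature_activations: np.ndarray) -> dict[int, list[int]]:
    """Naively cluster documents by their single strongest feature."""
    groups: dict[int, list[int]] = {}
    for i, row in enumerate(feature_activations):
        groups.setdefault(int(row.argmax()), []).append(i)
    return groups
```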
Linus has some interesting demos up on Nomic Atlas that illustrate the general idea.[3]
Footnotes

1. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html, http://arxiv.org/abs/2406.04093
2. https://thesephist.com/posts/prism/
3. https://atlas.nomic.ai/data/thesephist/sae-xl-v6-v2/map