Implementing SAE training on embeddings as a new approach to topic modeling #2063
-
Thank you for sharing these resources! These are cool techniques and definitely something that I would like to see integrated into BERTopic. Since its main sources of information are both embeddings and text, I believe it lends itself quite well to SAEs. We could automatically train an SAE on the embeddings + text and use the generated features as a representation model in BERTopic. That way, we won't have to face the problem of finding the "right" features, as some might be irrelevant. Having said that, we could technically still use the set of features (sort of a "Bag of Features") to perform the clustering. Then, applying (c-)TF-IDF would actually be quite natural in this case. Are you familiar with any packages already implementing easy-to-train SAEs?
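For what it's worth, a rough sketch of what that "Bag of Features" clustering could look like with the current API, assuming a hypothetical trained `sae` object with an `encode()` method that maps dense embeddings to sparse feature activations, and `docs` as the usual list of input documents (none of this exists in BERTopic today):

```python
# Sketch only: `sae` is a hypothetical trained sparse autoencoder; `docs` is an
# assumed list[str] of input documents. Nothing below is an existing BERTopic feature.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
dense_embeddings = embedding_model.encode(docs, show_progress_bar=True)

# "Bag of Features": each document becomes a sparse vector of SAE feature activations.
feature_activations = sae.encode(dense_embeddings)

# Cluster on the feature activations instead of the raw dense embeddings; c-TF-IDF
# over the resulting clusters then works the same way it does now.
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs, embeddings=feature_activations)
```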
-
Unsupervised interpretability techniques using sparse autoencoders (SAEs) have gained traction over the past few months. SAEs trained on model activations can extract sparse interpretable features from LLMs.[1] But this approach can also be used on dense sentence embeddings to extract interpretable sparse features from each embedding. (Sparse features can be automatically labeled by an LLM, similarly to how BERTopic already handles automatic topic labeling.) Linus Lee just wrote a blog post exploring this idea, with very promising results.[2]
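To make the mechanics concrete, here is a minimal sketch of such an SAE over sentence embeddings (not Linus's implementation; the dimensions, L1 coefficient, and training loop are illustrative assumptions):

```python
# Minimal sparse-autoencoder sketch over sentence embeddings. Assumptions: embedding
# dim 384, 4096 features, L1 coefficient 1e-3 -- all purely illustrative.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int = 384, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_features)
        self.decoder = nn.Linear(d_features, d_embed)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps feature activations non-negative; together with the L1 term
        # below, most activations end up at zero (sparse features).
        return torch.relu(self.encoder(x))

    def forward(self, x: torch.Tensor):
        features = self.encode(x)
        reconstruction = self.decoder(features)
        return reconstruction, features

def train_step(sae: SparseAutoencoder, optimizer: torch.optim.Optimizer,
               batch: torch.Tensor, l1_coeff: float = 1e-3) -> float:
    reconstruction, features = sae(batch)
    # Reconstruction loss keeps the features faithful to the embedding;
    # the L1 penalty keeps them sparse.
    loss = nn.functional.mse_loss(reconstruction, batch) + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Mini-batches of sentence embeddings (e.g. from sentence-transformers) would be fed through `train_step`, and the learned `encode()` output is the sparse feature vector for each document.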
Implementing a version of this in BERTopic could prove very useful. Essentially, interpretable sparse features could be treated as topics, and you could straightforwardly quantify how each document relates to specific features, as well as cluster documents that share common features. This would be a novel approach to topic modeling, but it could potentially be used in conjunction with BERTopic's current approach (e.g. using sparse features for better topic representations).
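As a purely illustrative sketch of that document-feature relationship (hypothetical names; assumes a document-by-feature activation matrix produced by an SAE encoder): a document's relation to a feature is simply its activation, and documents can be grouped naively by their strongest feature:

```python
import numpy as np

# feature_activations: (n_docs, n_features) matrix of SAE feature activations,
# produced elsewhere (hypothetical; see the sketches above).

def top_features(doc_idx: int, feature_activations: np.ndarray, k: int = 5):
    """Return the k features a document expresses most strongly."""
    row = feature_activations[doc_idx]
    top = np.argsort(row)[::-1][:k]
    return [(int(f), float(row[f])) for f in top if row[f] > 0]

def group_by_dominant_feature(feature_activations: np.ndarray) -> dict[int, list[int]]:
    """Naively cluster documents by their single strongest feature."""
    groups: dict[int, list[int]] = {}
    for i, row in enumerate(feature_activations):
        groups.setdefault(int(row.argmax()), []).append(i)
    return groups
```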
Linus has some interesting demos up on Nomic Atlas that illustrate the general idea.[3]
Footnotes

1. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html, http://arxiv.org/abs/2406.04093
2. https://thesephist.com/posts/prism/
3. https://atlas.nomic.ai/data/thesephist/sae-xl-v6-v2/map