Question about metrics for evaluating topic modelling #2182
MattBlue92 started this conversation in General
Replies: 1 comment
-
I think it's worth reading through this issue: #90. Topic coherence has already been discussed there at length, and I think it's a useful read. If anything is unclear, please let me know!
-
Hi,
I have a question about how to evaluate topic modelling using gensim.
I'm using BERTopic for legal research. I have judgments that I've segmented, and I've made several versions of my dataset.
I'm using topic coherence (c_v) as a metric, but BERTopic found only 2 topics in one version of my dataset. These have c_v values of 0.3940 and 0.7125. The latter is a good c_v value, but the result isn't very useful.
With another version of my dataset, BERTopic finds about 10 topics that exceed the 0.49 threshold, while the other topics do not. Those lower-scoring topics pull the average down, so it is a little lower than the previous result, but the topics themselves are actually more interesting.
Are these metrics useful for choosing the number of topics, or are they just an indicator of topic quality? And are they reliable?
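To make the comparison concrete, here is a minimal sketch of how the per-topic c_v scores for one run could be summarised, so that the average is read alongside the number of topics and the count above the threshold. The function name `summarize_coherence` and its output keys are hypothetical and only for illustration. The two scores and the 0.49 threshold are the ones quoted above.

```python
from statistics import mean, median


def summarize_coherence(scores, threshold=0.49):
    """Summarise per-topic coherence scores for a single model run.

    `scores` is a non-empty list of per-topic c_v values.
    `threshold` is the cut-off used in this post (0.49).
    """
    above = [s for s in scores if s > threshold]
    return {
        "n_topics": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "n_above_threshold": len(above),
        "share_above_threshold": len(above) / len(scores),
    }


# Per-topic c_v values from the two-topic run described above.
two_topic_run = [0.3940, 0.7125]
summary = summarize_coherence(two_topic_run)
print(summary)
```

Running the same function on the per-topic scores of the other dataset versions would show, side by side, how a run with more topics can have a lower mean while having more topics above the threshold.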