topic modeling #122
Comments
@herrtao Wow, somehow this completely slipped by me. Apologies for not responding. Tethne is primarily designed for cases where you start with bibliographic metadata (e.g. from Web of Science, JSTOR, or Zotero). If you're just working with a set of plain-text files, there are potentially simpler approaches. As a starting point, you might take a look at the notebooks in this project. They cover several different workflows; the topic modeling sections include notebooks that demonstrate LDA with Tethne/MALLET and with gensim. In particular, this notebook demonstrates LDA with gensim. If you don't have metadata, you can just skip or comment out those parts. I hope that helps! Let me know if you have any other questions. We can also discuss further off-channel if you'd prefer (erick.peirson@asu.edu).
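For the plain-text case, the preparation step is largely the same whichever library ends up fitting the model: read each file, tokenize it, and count words per document. The following is a minimal sketch of that step using only the Python standard library. It is not Tethne's or gensim's implementation; the function names `read_plain_text` and `bag_of_words`, the letters-only tokenizer, and the `min_length` cutoff are all assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a token is any run of ASCII letters; real corpora may need more care.
TOKEN_RE = re.compile(r"[a-z]+")


def read_plain_text(directory):
    """Return {file stem: list of lowercase tokens} for each .txt file in `directory`."""
    docs = {}
    for path in sorted(Path(directory).glob("*.txt")):
        # Ignore undecodable bytes rather than failing on one bad file.
        text = path.read_text(encoding="utf-8", errors="ignore")
        docs[path.stem] = TOKEN_RE.findall(text.lower())
    return docs


def bag_of_words(docs, min_length=3):
    """Count tokens per document, dropping tokens shorter than `min_length`."""
    return {
        name: Counter(token for token in tokens if len(token) >= min_length)
        for name, tokens in docs.items()
    }
```

With around 700 files this runs comfortably in memory. The token lists returned by `read_plain_text` (one list of strings per document) are in the shape that gensim's `corpora.Dictionary` accepts, so `list(docs.values())` can be handed on to a gensim LDA workflow like the one in the notebooks mentioned above.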
This will be TETHNE-131.
@herrtao Take a look at this thread for a related discussion. It's not exactly what you asked, but it may be helpful.
@herrtao OK, as of v0.8.1.dev5 this is now a feature! Since this is a pre-release version, you'll have to upgrade Tethne with the --pre flag:

    pip install -U tethne --pre

Here's an example. Please let me know what you think. If you run into issues, or have other requests, please check out our new Q/A group.

    >>> from tethne.readers.plain_text import read
    >>> corpus = read('/path/to/directory/with/texts')

To use the corpus for topic modeling, you could then do:

    >>> model = LDAModel(corpus, featureset_name='plain_text')
    >>> model.fit(Z=5, max_iter=200)

More documentation will be forthcoming, but here's the docstring for now:
Thanks for the reply!
Can I use Tethne to do topic modeling on my own .txt files? I have about 700 different files.