KeyError while trying to create a topicmodel #56

Open
khalidkhannz78PK opened this issue Jan 12, 2015 · 8 comments

@khalidkhannz78PK

Hi there,

I have been trying to follow the topic modelling tutorial on the main Tethne website. I installed Anaconda, Tethne, NLTK, and MALLET, but when I run the line

MyLDAModel = MyManager.build(Z=50, max_iter=300, prep=True)

I get the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/__init__.py", line 108, in build
    self.prep()
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/__init__.py", line 89, in prep
    self._generate_corpus(meta)
  File "//anaconda/lib/python2.7/site-packages/tethne/model/managers/mallet.py", line 152, in _generate_corpus
    vocab=self.D.features[self.feature]['index'] )
  File "//anaconda/lib/python2.7/site-packages/tethne/writers/corpora.py", line 59, in to_documents
    meta += [ str(metadict[p][f]) for f in metakeys ]
KeyError: '10.1525/rac.2006.16.1.95'

I would appreciate any help with this.

@erickpeirson
Collaborator

Ack, this bug won't die. There were a couple of places where we assumed that metadata records and feature sets were complete for all papers in a corpus, which is often false. This should be an easy fix, and I hope to get a patch out next week.

Thanks for reporting this!
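
To illustrate the failure mode described above: the traceback shows `metadict[p][f]` being indexed directly, so a paper with no metadata record (here the DOI in the KeyError) aborts the whole export. The following is only a minimal sketch of a tolerant lookup, not the actual Tethne patch. The names `metadict`, `metakeys`, `p` and `f` come from the traceback; the function `to_meta_rows`, the `missing` parameter and the sample record are hypothetical.

```python
def to_meta_rows(papers, metadict, metakeys, missing=""):
    """Build one row of metadata strings per paper.

    Hypothetical helper: papers without a metadata record, or records
    without a given key, yield the `missing` placeholder instead of
    raising KeyError.
    """
    rows = []
    for p in papers:
        record = metadict.get(p, {})  # the paper may have no record at all
        rows.append([str(record.get(f, missing)) for f in metakeys])
    return rows


# Made-up sample data: the second paper has no entry in metadict.
metadict = {"10.1000/example.1": {"date": 2006, "jtitle": "Example Journal"}}
papers = ["10.1000/example.1", "10.1525/rac.2006.16.1.95"]
rows = to_meta_rows(papers, metadict, ["date", "jtitle"])
```

With a direct `metadict[p][f]` lookup the second paper would raise the KeyError reported above; here it simply gets empty fields.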


@khalidkhannz78PK
Author

Hi Erick,

Is there any update on fixing this issue?

@erickpeirson
Collaborator

Yes, sorry it took so long. The patched version is available as release v0.6.3.3-beta2, or via PyPI.

If you're using pip, you should be able to just do:

$ pip uninstall tethne
$ pip install tethne --pre

Let me know whether this solves the problem.

@mubashirqasim

Hi Erick,

You may already have noticed the MALLET path error on Windows, or heard about it from another Tethne user.

When I try to build the model with the following line, I get the error below on Windows. The same program runs fine on Linux.

model = M.build(Z=50, max_iter=300, prep=True)

OSError                                   Traceback (most recent call last)
in ()
----> 1 model = M.build(Z=50, max_iter=300, prep=True)

C:\Anaconda\lib\site-packages\tethne\model\managers\__init__.pyc in build(self, Z, max_iter, prep, **kwargs)
    106         if not self.prepped:
    107             if prep:
--> 108                 self.prep()
    109             else:
    110                 raise RuntimeError('Not so fast! Call prep() or set prep=True.')

C:\Anaconda\lib\site-packages\tethne\model\managers\__init__.pyc in prep(self, meta)
     87         """
     88
---> 89         self._generate_corpus(meta)
     90         self.prepped = True
     91

C:\Anaconda\lib\site-packages\tethne\model\managers\mallet.pyc in _generate_corpus(self, meta)
    152                         vocab=self.D.features[self.feature]['index'] )
    153
--> 154         self._export_corpus()
    155
    156     def _export_corpus(self):

C:\Anaconda\lib\site-packages\tethne\model\managers\mallet.pyc in _export_corpus(self)
    171
    172         except OSError:     # Raised if mallet_path is bad.
--> 173             raise OSError("MALLET path invalid or non-existent.")
    174
    175         if exit != 0:

OSError: MALLET path invalid or non-existent.

Does the MALLET path need to be given in a specific format on Windows?

@erickpeirson
Collaborator

Hi @mubashirqasim,

Can you post your code for initializing the MALLETModelManager? Its constructor accepts a parameter mallet_path, and I'm specifically interested in what you're passing there.

Tethne is almost entirely untested in Windows. Maybe if I get some time/funding I'll start pushing it in that direction, but until then I'm afraid that you'll find plenty of odd things when you run Tethne in Windows.

@mubashirqasim

Hi Erick,

Thanks for the prompt response. Here is the code to call MALLETModelManager.

from tethne.model.managers import MALLETModelManager
malletpath = 'c:/mallet'
outpath = 'c:/tmp/out'
feature = 'unigrams_filtered'
MyManager = MALLETModelManager(MyCorpus, feature, outpath, mallet_path=malletpath)
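
The traceback alone does not show why the launch failed, because `_export_corpus` re-raises any OSError with the same message. One thing that may be worth checking is whether the launcher script exists under `mallet_path`. The sketch below rests on two assumptions: that the MALLET distribution ships its launchers as `bin/mallet` (shell script) and `bin/mallet.bat` (Windows), and that the manager looks for a launcher under `bin`. How Tethne actually builds the command is not shown in this thread. The helper names `mallet_launcher` and `check_mallet` are hypothetical.

```python
import os
import sys


def mallet_launcher(mallet_path, platform=sys.platform):
    """Return the launcher path expected under mallet_path.

    Assumption: Windows needs bin/mallet.bat, other platforms bin/mallet.
    """
    name = "mallet.bat" if platform.startswith("win") else "mallet"
    return os.path.join(mallet_path, "bin", name)


def check_mallet(mallet_path):
    """Return (launcher_path, exists) for the current platform."""
    launcher = mallet_launcher(mallet_path)
    return launcher, os.path.isfile(launcher)


# 'c:/mallet' is the value used in the code above.
launcher, exists = check_mallet("c:/mallet")
print(launcher, "found" if exists else "NOT found")
```

If the launcher is reported as missing, the path itself is wrong. If it is found and the error persists, the problem more likely lies in how the command is invoked on Windows.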

@erickpeirson
Collaborator

Flagging this for a future Windows-compatible version

@erickpeirson added this to the v2.0-windows milestone on May 26, 2015
@erickpeirson
Collaborator

This may be fixed in v0.8-beta. If anyone has a chance to test this in Windows, I'd appreciate hearing about it!
