Possibility to ignore certain WoS-tags #143

ghost · 2016-05-18T07:32:45Z

Hi,

First of all, thanks for this amazing Python package.

The corpus object in my analysis gets really large. I don't need, for example, funding information (FU), email addresses (EM), and a few other fields. Is it possible to ignore certain Web of Science field tags when populating the Paper objects?

Like this?
ignore_tags = ['EM', 'FU']


erickpeirson · 2016-05-18T10:12:50Z

@Epipremnum Yeah, that's a great idea. Over the last few days I have been working on "streaming" representations of corpora and papers (i.e. on disk, in a database) to cut down on memory overhead -- the logic is basically what you describe: pass over the metadata records once and load into memory only the fields that are immediately needed. So your suggestion is a logical continuation of that line of work. I'll keep this thread up to date as we work on it!
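A minimal sketch of that single-pass idea, assuming the WoS plain-text export layout: a two-letter tag and a space before each field, indented continuation lines, `ER` closing each record, and `FN`/`VR`/`EF` as file-level lines. `iter_records` and its `ignore_tags` argument are hypothetical, not tethne's reader API. It yields one record at a time and never stores the ignored fields:

```python
import io
from typing import Dict, Iterable, Iterator, List, Optional, Set

# File-level lines in the assumed WoS plain-text layout; they carry no record data.
FILE_TAGS = {"FN", "VR", "EF"}


def iter_records(lines: Iterable[str],
                 ignore_tags: Set[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one record at a time, dropping any field whose tag is in ignore_tags."""
    record: Dict[str, List[str]] = {}
    current_tag: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\r\n").lstrip("\ufeff")  # drop newline and a leading BOM
        if not line.strip():
            continue
        if line[:1].isspace():
            # Continuation line: belongs to the most recent tag, if we keep it.
            if current_tag is not None and current_tag not in ignore_tags:
                record[current_tag].append(line.strip())
            continue
        tag, value = line[:2], line[3:].strip()
        if tag == "ER":
            yield record  # finished record; only kept fields were ever stored
            record, current_tag = {}, None
        elif tag in FILE_TAGS:
            current_tag = None
        else:
            current_tag = tag
            if tag not in ignore_tags:
                record.setdefault(tag, []).append(value)


# Usage on a tiny in-memory sample (made-up values).
sample = """FN Thomson Reuters Web of Science
VR 1.0
PT J
TI A sample title
   continued here
EM someone@example.org
FU Some funder
PY 2015
ER
EF
"""
records = list(iter_records(io.StringIO(sample), ignore_tags={"EM", "FU"}))
print(records)
# [{'PT': ['J'], 'TI': ['A sample title', 'continued here'], 'PY': ['2015']}]
```

Passing an open file handle instead of `io.StringIO` (e.g. `with open(path, encoding="utf-8") as f:`, where `path` is a placeholder) keeps only the current record in memory rather than the whole file.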

Thanks for using tethne -- it would be good to hear more about your use-case, if you're willing to share. :-)

ghost · 2016-05-18T16:09:03Z

Hi Erick,

thanks for the quick response. That would be great.

For now I wrote a script to preprocess my bibliography files: it copies all lines belonging to the tags I want to keep into a new file, so Tethne only receives the lines I'm interested in. It works, but memory usage is still high.
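A preprocessing step like the one described might look roughly like this. It is a hypothetical sketch, not the script from this thread, and assumes the same WoS plain-text layout: structural lines (`FN`, `VR`, `ER`, `EF`) are always kept so a reader can still split records, and indented continuation lines follow the decision made for the tag they continue. `filter_wos_lines` and `keep_tags` are illustrative names only.

```python
from typing import Iterable, Iterator, Set

# Lines that delimit files and records in the assumed WoS layout; always kept.
STRUCTURAL_TAGS = {"FN", "VR", "ER", "EF"}


def filter_wos_lines(lines: Iterable[str], keep_tags: Set[str]) -> Iterator[str]:
    """Yield only lines whose field tag is in keep_tags, plus structural lines."""
    keeping = False
    for line in lines:
        if not line.strip():
            yield line  # preserve blank separators between records
            continue
        if line[:1].isspace():
            if keeping:  # continuation of the previous (kept) tag
                yield line
            continue
        tag = line.lstrip("\ufeff")[:2]  # ignore a leading BOM on the first line
        keeping = tag in STRUCTURAL_TAGS or tag in keep_tags
        if keeping:
            yield line


# Usage on a small made-up sample.
sample = [
    "FN Thomson Reuters Web of Science\n",
    "VR 1.0\n",
    "TI A sample title\n",
    "   continued here\n",
    "EM someone@example.org\n",
    "PY 2015\n",
    "ER\n",
    "EF\n",
]
kept = list(filter_wos_lines(sample, keep_tags={"TI", "PY"}))
```

To write a filtered copy, `dst.writelines(filter_wos_lines(src, {"TI", "PY"}))` with `src` and `dst` as open file handles would work. Note that this only shrinks the input; whatever the reader does with the remaining lines still determines peak memory.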

Thank you.

erickpeirson · 2016-05-19T03:27:28Z

This will be TETHNE-124.

erickpeirson · 2016-06-20T21:00:52Z

@Epipremnum On the develop branch I have added a parameter called parse_only to the WoS and DfR readers. If you have a moment, it would be great to hear whether or not this addresses your need.

The relevant tests are here: https://github.com/diging/tethne/blob/develop/tethne/tests/test_readers_parseonly.py

erickpeirson · 2016-07-11T15:13:07Z

This is now in v0.8.1.dev2, which can be installed via pip with --pre:

$ pip install -U tethne --pre

Example:

>>> from tethne.readers.wos import read
>>> corpus = read('/path/to/my/data', parse_only=['title', 'date'])

erickpeirson added the enhancement label May 18, 2016

erickpeirson added this to the v0.8-beta milestone May 18, 2016

erickpeirson self-assigned this May 18, 2016
