First of all, thanks for this amazing Python package.
The corpus object in my analysis gets really huge. For example, I don't need funding information, email addresses, and some other fields. Is it possible to ignore certain Web of Science field tags when populating the Paper objects?
Like this? ignore_tags = ['EM', 'FU']
@Epipremnum Yeah, that's a great idea. For the last few days I have been working on "streaming" representations of corpora and papers (i.e. on disk, in a database) to cut down on memory overhead -- the logic is basically as you describe: pass over the metadata records once and load into memory only the immediately-needed fields. So your suggestion is a logical continuation of that line of work. I'll keep this thread up to date as we work on it!
Thanks for using tethne -- it would be good to hear more about your use-case, if you're willing to share. :-)
Thanks for the quick response. That would be great.
For now I wrote a script to preprocess my bibliography files. It copies every line whose tag I want to keep into a new file, so Tethne only gets the lines I am interested in as input. It works, but a lot of memory is still used.
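For anyone wanting to try the same workaround, here is a minimal sketch of that kind of preprocessing step. It is not the actual script and not part of Tethne. It assumes the usual Web of Science field-tagged layout, where each field starts with a two-character tag and continuation lines start with whitespace. The names `filter_wos_lines` and `filter_wos_file` are hypothetical, and the sketch drops a list of unwanted tags rather than copying a list of wanted ones.

```python
# Tags to drop; 'EM' and 'FU' are the examples from this thread.
IGNORE_TAGS = {'EM', 'FU'}


def filter_wos_lines(lines, ignore_tags=IGNORE_TAGS):
    """Yield only the lines that do not belong to an ignored field.

    Assumes a field begins on a line whose first two characters are the
    tag, and that continuation lines begin with whitespace.
    """
    skipping = False
    for line in lines:
        tag = line[:2]
        if tag.strip():
            # A new field starts here; decide whether to skip it.
            skipping = tag in ignore_tags
        # Blank and continuation lines inherit the state of the current field.
        if not skipping:
            yield line


def filter_wos_file(src_path, dst_path, ignore_tags=IGNORE_TAGS):
    """Write a copy of src_path to dst_path without the ignored fields."""
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        dst.writelines(filter_wos_lines(src, ignore_tags))
```

Because the input is processed line by line, the filter itself never holds the whole file in memory; the filtered file can then be passed to the reader as usual.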
@Epipremnum On the develop branch I have added a parameter called parse_only to the WoS and DfR readers. If you have a moment, it would be great to hear whether or not this addresses your need.