datagenerater_for_torchIE

These programs generate the data used by the bioinfer branch of the torchIE project (https://github.com/qulizhen/torchIE).

The main program is BioJsonProcesser.java. Its arguments are:

inputfile -- args[0], the input json data file
output -- args[1]
entityListFile -- args[2]
dependencyFile -- args[3]
postagFile -- args[4]
lseq -- args[5]
rseq -- args[6]
text_toks -- args[7]
lPos -- args[8]
rPos -- args[9]

Please edit the arguments in Eclipse: Run -> Run Configurations -> select BioJsonProcesser in the file browser on the left -> edit the arguments in the "Arguments" tab on the right.
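Because the program reads its ten arguments purely by position, it is easy to swap two of them by accident. The following is a minimal sketch, not part of this repository, that assembles the line to paste into the "Arguments" tab from named paths. The helper name `build_program_arguments` and all file names are hypothetical, and it assumes the paths contain no whitespace.

```python
# Argument names in the order listed in this README
# (position in the list == index in args[]).
ARGUMENT_ORDER = [
    "inputfile",       # args[0]
    "output",          # args[1]
    "entityListFile",  # args[2]
    "dependencyFile",  # args[3]
    "postagFile",      # args[4]
    "lseq",            # args[5]
    "rseq",            # args[6]
    "text_toks",       # args[7]
    "lPos",            # args[8]
    "rPos",            # args[9]
]


def build_program_arguments(paths):
    """Return the ten paths as one space-separated line, in args[] order."""
    missing = [name for name in ARGUMENT_ORDER if name not in paths]
    if missing:
        raise KeyError("missing arguments: " + ", ".join(missing))
    values = [str(paths[name]) for name in ARGUMENT_ORDER]
    for value in values:
        # Assumption: paths contain no whitespace, so no quoting is needed.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError("path must be non-empty and free of whitespace: %r" % value)
    return " ".join(values)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = {name: "data/" + name + ".txt" for name in ARGUMENT_ORDER}
    example["inputfile"] = "data/bioinfer.json"
    print(build_program_arguments(example))
```

Keeping the names in one list means a missing or misplaced path is reported before the Java program is started.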

Additional part:

To generate data that can be used by torchIE, please follow these steps:

ReBioinfer dataset to xmi files: the classes BioinferReaqder and BioRERunner are in the package org.data61.bioinfer. Run: ./scala_run org.data61.bioinfer.BioRERunner

Xmi to json: in org.nicta.transferLearningTest.CreateJsonObject.scala, change the input xmi dir. To change the output path, edit org.nicta.ie.relationExtraction.uima.ae.instCreator.JsonInstanceCreator.java.

Xmi to embedding: org.nicta.transferLearningTest.Bioinferembedder.scala. Run: ./scala_run org.nicta.transferLearningTest.Bioinferembedder /home/chengchen/BIoinfer/xmi_result /media/data2tb4/wordEmbedings /home/chengchen/BIoinfer/embed_result

Add the number of embeddings (it can be counted with wc -l "file you got from the last step") and the number of dimensions (e.g. 200) to the embedding file you got from the last step. The word2vec text format read by gensim expects these two numbers on the first line, separated by a space.

Use gensim (https://radimrehurek.com/gensim/models/word2vec.html) and the word2vec.py modified by Zhuang Li (on izhuang@CRL-DLGPU:/media/data2tb1/SKIPGRAM_EMBEDDING, GPU 1 or 2 in Nicta) to convert the embedding file to a binary file.
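Adding the count and dimension by hand is error-prone, so this step can also be scripted. A minimal sketch, assuming the embedding file holds one word per line followed by its space-separated vector values. The helper name `add_word2vec_header` is hypothetical and not part of this repository or of gensim. Blank lines are skipped, so the count can differ from `wc -l` if the file contains any.

```python
def add_word2vec_header(src_path, dst_path):
    """Copy src_path to dst_path with a '<count> <dim>' first line prepended.

    Returns the (count, dim) pair that was written.
    """
    count = 0
    dim = None
    # First pass: count the embeddings and infer the dimension.
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            width = len(parts) - 1  # the first token is the word itself
            if dim is None:
                dim = width
            elif width != dim:
                raise ValueError("inconsistent vector length for %r" % parts[0])
            count += 1
    if dim is None:
        raise ValueError("no embeddings found in " + src_path)
    # Second pass: write the header, then the embeddings unchanged.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("%d %d\n" % (count, dim))
        for line in src:
            if line.strip():
                dst.write(line if line.endswith("\n") else line + "\n")
    return count, dim
```

Writing to a separate output file keeps the original result of the embedding step untouched in case the conversion has to be repeated.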

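In a Python terminal (in gensim 1.0 and later, load_word2vec_format is provided by gensim.models.KeyedVectors instead of Word2Vec):

import gensim
model = gensim.models.Word2Vec.load_word2vec_format('/home/chengchen/BIoinfer/embed_result', binary=False)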

model.save_word2vec_format('/home/chengchen/BIoinfer/embed_gensim',None,True)
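Before moving on to the Torch7 conversion, it may be worth checking that the saved file starts with the expected count and dimension. A minimal sketch, assuming the standard word2vec layout in which the first line holds the vocabulary size and the vector dimension as ASCII text, in both the text and the binary variant. `read_word2vec_header` is a hypothetical helper, not a gensim function.

```python
def read_word2vec_header(path):
    """Return (vocab_size, dim) read from the first line of a word2vec file."""
    # Open in binary mode so the same code works for text and binary files.
    with open(path, "rb") as f:
        header = f.readline()
    fields = header.split()
    if len(fields) != 2:
        raise ValueError("unexpected header line: %r" % header)
    return int(fields[0]), int(fields[1])
```

For example, `read_word2vec_header('/home/chengchen/BIoinfer/embed_gensim')` should return the same two numbers that were added to the text embedding file.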

Then use bintot7.lua (https://github.com/rotmanmi/word2vec.torch/blob/master/bintot7.lua) to convert the binary embedding file to a torch7-friendly file (save it as *.th).

Please send questions to cc94226@live.com.
