RDFer

Python3

Introduction

Structured inputs are commonly expected to pass through structure-aware embedding networks; in this repository we try to demonstrate the efficiency of simply rearranging the structure of the input data itself. In addition, contrary to the conventional impression that sequence-to-sequence pipelines depend heavily on recurrent neural networks, our model takes a step forward by adopting only lightweight attention-based networks. We will see how this works below.

Dependencies

Methodology

    1. Instead of taking dependency-parsed data as input directly, this model flattens each input item into a sentence consisting of multiple phrases (a minimal code sketch of this flattening follows the examples below).

The input sentence is built by chaining a list of triples, tokenized and delimited by SOS_token and EOS_token:

"互联网  advmod  将 [SOS_token]  专  amod  唯一  [SOS_token]  顶级  amod  全新 [SOS_token]  ...  为  top  专  [EOS_token]"

The target output is expected to be a list of tokens, in which the punctuation marks serve as segmentation marks for the original sentences:

"[MASKED]", "MOBI", "域名", "", "全球", "唯一", "", "", ..., "[MASKED]", "是", "为", "ICANN", "批准", "的", "全新", "国际", "顶级", "域名", "[MASKED]", "其", "致力", "于", "将", ..., "移动设备", "上", "[MASKED]"

[figure: dataset sample]
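As a rough illustration, the snippet below shows one way such triples can be flattened into the input format shown above. It is a minimal sketch, not the repository's actual preprocessing code; the helper name flatten_triples and the (token, relation, token) tuple layout are assumptions based on the example.

```python
# Minimal sketch (not this repository's code): chain dependency triples into
# one input sentence, separating triples with SOS_token and closing with
# EOS_token, mirroring the example above.
SOS_token, EOS_token = "[SOS_token]", "[EOS_token]"

def flatten_triples(triples):
    """triples: list of (token, relation, token) tuples."""
    tokens = []
    for i, (left, rel, right) in enumerate(triples):
        tokens += [left, rel, right]
        # last triple is closed with EOS_token, all others with SOS_token
        tokens.append(EOS_token if i == len(triples) - 1 else SOS_token)
    return " ".join(tokens)

print(flatten_triples([("互联网", "advmod", "将"), ("专", "amod", "唯一")]))
# -> 互联网 advmod 将 [SOS_token] 专 amod 唯一 [EOS_token]
```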

    2. Without a traditional recurrence mechanism (i.e., recurrent neural networks), we leverage attention mechanisms to optimize the prediction function between the input distribution and the target distribution:

We use the general attention-layer design from the paper Attention Is All You Need [1]. Denoting the $Key$, $Value$, and $Query$ of the original algorithm as $K$, $V$, and $Q$, the attention layer is defined as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $d_k$ is the dimension of the key vectors, following the scaled dot-product form given in [1].

We then build the multi-head attention function as follows, where $\mathrm{concat}$ concatenates the per-head tensors (again following [1]):

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}).$$
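As an illustration, the following is a minimal PyTorch sketch of the two layers defined above. It is not taken from this repository's source; the class and parameter names (MultiHeadAttention, d_model, num_heads) are illustrative assumptions, with d_model=512 chosen to match the hidden size visible in the training log below.

```python
import torch
import torch.nn as nn

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    return torch.softmax(scores, dim=-1) @ V

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)  # output projection W^O

    def forward(self, query, key, value):
        B = query.size(0)
        # project, then split into heads: (B, num_heads, L, d_head)
        q = self.w_q(query).view(B, -1, self.num_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(B, -1, self.num_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(B, -1, self.num_heads, self.d_head).transpose(1, 2)
        heads = scaled_dot_product_attention(q, k, v)
        # concat the heads back to (B, L, d_model) and project
        concat = heads.transpose(1, 2).contiguous().view(B, -1, self.num_heads * self.d_head)
        return self.w_o(concat)

# usage: batch of 6 sequences, 181 tokens each, model dimension 512
x = torch.randn(6, 181, 512)
out = MultiHeadAttention()(x, x, x)   # self-attention; out.shape == (6, 181, 512)
```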

Experiments & Comparison

Results

Currently, the initial training stage (43 batches) is shown in the plot below, which displays the loss values: [figure: loss_plot]

Comparison

Tree-LSTM [2] and Tree-based CNN [3] are feasible and suitable choices for an ablation study of how different layer structures affect the encoder throughout optimization, and they also serve as natural baselines.

| Models | Param | Score |
| --- | --- | --- |
| Tree-LSTM | - | - |
| Tree-based CNN | - | - |
| our model | - | - |
  • details coming soon

Instructions

Preprocessing

>python config.py --preprocess=true  --train_dataset=./dataset/train_sample.json  --save_dataset=./dataset/train_sample.pt

This command automatically generates the files below under ./dataset/ (a loading sketch follows the listing):

│  input_tensors.pt
│  src.2116.index2word.json
│  src.2116.word2index.json
│  srclex.pkl
│  target_tensors.pt
│  tgt.2214.index2word.json
│  tgt.2214.word2index.json
│  tgtlex.pkl
│  train_sample.json
│
└─ 
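As a rough illustration of how these artifacts might be inspected, the snippet below is a minimal sketch assuming the .pt files are torch-serialized tensors and the .json files are plain word/index maps; the repository's own loader may expect a different layout.

```python
import json
import torch

# Minimal sketch: load the preprocessed tensors and vocabularies generated
# above (assumed formats; not the repository's actual loading code).
input_tensors = torch.load("./dataset/input_tensors.pt")
target_tensors = torch.load("./dataset/target_tensors.pt")

with open("./dataset/src.2116.word2index.json", encoding="utf-8") as f:
    src_word2index = json.load(f)
with open("./dataset/tgt.2214.index2word.json", encoding="utf-8") as f:
    tgt_index2word = json.load(f)

print(len(src_word2index), len(tgt_index2word))  # expected: 2116 2214
```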

Training

>python config.py --train=true  --model_name=RDFer  --batch_size=62  --num_iter=100  --learning_rate=0.01  

The terminal log looks like the following standard-output record from a previous run (a note on reading the tensor shapes follows the log):

* -- Begining model with cpu -- *


 Initiating dataset from raw files... 


 SRC number of tokens:  2116  TGT number of tokens:  2214  

2116 2214
 Dataset loader ready, begin training. 


 Dataset loader length is  119 , save model every batch. 


  - Epoch  1  | batch  0 
 | input lenght:    torch.Size([181, 6]) 
 | target length:    torch.Size([75, 6])  

torch.Size([6, 181, 512])
0 torch.Size([450, 1, 512])
1 torch.Size([75, 6])
 output_gen :  torch.Size([75, 2214, 6])  

 loss: 7.976774215698242


  - Epoch  1  | batch  1 
 | input lenght:    torch.Size([181, 6]) 
 | target length:    torch.Size([75, 6])  

torch.Size([6, 181, 512])
0 torch.Size([450, 1, 512])
1 torch.Size([75, 6])
 output_gen :  torch.Size([75, 2214, 6])  

 loss: 6.059077262878418

 ...
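The shapes in this record appear to follow a (sequence length, batch size) convention; the sketch below annotates them under that assumption. This is an interpretation of the log, not documented behaviour, and the batch size of 6 is specific to this recorded run.

```python
# Annotated reading of the logged shapes for one batch (assumed convention:
# tensors are (sequence_length, batch_size) unless noted otherwise).
input_shape      = (181, 6)         # 181 source tokens per item, batch of 6
target_shape     = (75, 6)          # 75 target tokens per item
encoder_states   = (6, 181, 512)    # batch-first encoder output, hidden size 512
output_gen_shape = (75, 2214, 6)    # per-position scores over the 2214-token
                                    # target vocabulary (TGT number of tokens)
```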

References

[1] Ashish Vaswani et al. (2017) Attention Is All You Need

[2] Kai Sheng Tai et al. (2015) Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

[3] Lili Mou et al. (2014) Convolutional Neural Networks over Tree Structures for Programming Language Processing


Contact: erisjchurnng@gmail.com