
A Structured Self-attentive Sentence Embedding

TensorFlow implementation of "A Structured Self-attentive Sentence Embedding" (ICLR 2017).


Usage

Data

  • AG's News topic classification dataset.
  • The CSV files (in the data directory) are available from here.
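
For orientation, a minimal loading sketch is shown below. It assumes the standard AG's News CSV layout (class index, title, description, no header row) and is not necessarily the repository's exact loader.

     import csv

     # Sketch only: read AG's News rows as (class index, title, description),
     # concatenate title and description into one text, and shift labels to 0-3.
     def load_agnews(path="data/train.csv"):
         texts, labels = [], []
         with open(path, encoding="utf-8") as f:
             for cls, title, description in csv.reader(f):
                 texts.append(title + " " + description)
                 labels.append(int(cls) - 1)
         return texts, labels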

Train

  • "GoogleNews-vectors-negative300" is used as pre-trained word2vec model.

  • Display the help message (the --d_a_size, --r_size, and --p_coef flags correspond to the paper's W_s1 and W_s2 matrices and the attention penalty; see the sketch after the training example below):

     $ python train.py --help
     train.py:
     	--[no]allow_soft_placement: Allow device soft device placement
     		(default: 'true')
     	--batch_size: Batch Size
     		(default: '64')
     		(an integer)
     	--checkpoint_every: Save model after this many steps
     		(default: '100')
     		(an integer)
     	--d_a_size: Size of W_s1 embedding
     		(default: '350')
     		(an integer)
     	--dev_sample_percentage: Percentage of the training data to use for validation
     		(default: '0.1')
     		(a number)
     	--display_every: Number of iterations to display training info.
     		(default: '10')
     		(an integer)
     	--embedding_dim: Dimensionality of word embedding
     		(default: '300')
     		(an integer)
     	--evaluate_every: Evaluate model on dev set after this many steps
     		(default: '100')
     		(an integer)
     	--fc_size: Size of fully connected layer
     		(default: '2000')
     		(an integer)
     	--hidden_size: Size of LSTM hidden layer
     		(default: '256')
     		(an integer)
     	--learning_rate: Which learning rate to start with.
     		(default: '0.001')
     		(a number)
     	--[no]log_device_placement: Log placement of ops on devices
     		(default: 'false')
     	--max_sentence_length: Max sentence length in train/test data
     		(default: '50')
     		(an integer)
     	--num_checkpoints: Number of checkpoints to store
     		(default: '5')
     		(an integer)
     	--num_epochs: Number of training epochs
     		(default: '10')
     		(an integer)
     	--p_coef: Coefficient for penalty
     		(default: '1.0')
     		(a number)
     	--r_size: Size of W_s2 embedding
     		(default: '30')
     		(an integer)
     	--train_dir: Path of train data
     		(default: 'data/train.csv')
     	--word2vec: Word2vec file with pre-trained embeddings
  • Train Example (with word2vec):

    $ python train.py --word2vec "GoogleNews-vectors-negative300.bin"
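
For reference, --d_a_size and --r_size correspond to the paper's W_s1 (d_a x 2u) and W_s2 (r x d_a) matrices, and --p_coef weights the penalization term ||A A^T - I||_F^2. The TensorFlow 1.x sketch below shows that attention block under those assumptions; layer choices and names are illustrative, not the repository's exact code.

     import tensorflow as tf

     def structured_self_attention(H, d_a=350, r=30, p_coef=1.0):
         """H: BiLSTM outputs of shape (batch, n, 2*hidden_size)."""
         # A = softmax(W_s2 * tanh(W_s1 * H^T)); implemented as two bias-free
         # dense layers applied per time step, so shapes stay (batch, n, ...).
         hidden = tf.layers.dense(H, d_a, activation=tf.tanh, use_bias=False)  # (batch, n, d_a)
         scores = tf.layers.dense(hidden, r, use_bias=False)                   # (batch, n, r)
         A = tf.nn.softmax(tf.transpose(scores, [0, 2, 1]))                    # (batch, r, n), softmax over tokens

         # Sentence embedding M = A * H; it is flattened before the fully
         # connected layer whose width is controlled by --fc_size.
         M = tf.matmul(A, H)                                                   # (batch, r, 2*hidden_size)

         # Penalization term P = ||A A^T - I||_F^2, weighted by --p_coef.
         AA_T = tf.matmul(A, tf.transpose(A, [0, 2, 1]))                       # (batch, r, r)
         penalty = p_coef * tf.reduce_sum(tf.square(AA_T - tf.eye(r)), axis=[1, 2])

         return M, A, tf.reduce_mean(penalty)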

Evaluation

  • You must provide the "checkpoint_dir" argument, the path to the checkpoint (trained model) files, as in the example below.

  • If you do not want to visualize the attention, pass the option --visualize False.

  • Evaluation Example:

     $ python eval.py --checkpoint_dir "runs/1523902663/checkpoints/"
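
For context, a minimal sketch of restoring a trained run for evaluation is shown below; the runs/<timestamp>/checkpoints/ layout matches the example above, while the tensor names in the comments are hypothetical placeholders rather than eval.py's actual names.

     import tensorflow as tf

     # Sketch only: load the latest checkpoint from a run and restore the graph.
     checkpoint_file = tf.train.latest_checkpoint("runs/1523902663/checkpoints/")
     graph = tf.Graph()
     with graph.as_default():
         sess = tf.Session()
         saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
         saver.restore(sess, checkpoint_file)
         # Recover placeholders and prediction ops by name, then feed test batches, e.g.:
         # input_x = graph.get_operation_by_name("input_x").outputs[0]
         # predictions = graph.get_operation_by_name("output/predictions").outputs[0]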

Results

1) Test data accuracy = 0.920789

2) Visualization of Self Attention

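The attention visualization can be reproduced in spirit by plotting the r x n attention matrix for a sentence as a heatmap; the matplotlib sketch below is purely illustrative and independent of the repository's own visualization code.

     import matplotlib.pyplot as plt

     # Sketch only: one row per attention hop, one column per token.
     def plot_attention(A, tokens):
         fig, ax = plt.subplots(figsize=(len(tokens) * 0.5, A.shape[0] * 0.5))
         ax.imshow(A, cmap="Reds", aspect="auto")
         ax.set_xticks(range(len(tokens)))
         ax.set_xticklabels(tokens, rotation=90)
         ax.set_ylabel("attention hop")
         plt.tight_layout()
         plt.show()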

Reference

  • A Structured Self-attentive Sentence Embedding (Lin et al., ICLR 2017): https://arxiv.org/abs/1703.03130
