Transformer-backbone

This is a reproduction of the Transformer architecture from the paper "Attention Is All You Need". The aim of this repository is to help those who want insight into the details of a Transformer implementation, without being bothered by data preprocessing.
The structure of the Transformer is illustrated below (see the architecture figure).

Thus, we build the network hierarchically. From the top level to the bottom level:

Transformer → Fused_Embedding, Encoder, Decoder → Encoder_layer, Decoder_layer → Multiheaded Attention, PositionWise_FeedForwardNetwork

The corresponding module tree is shown below (a PyTorch composition sketch follows the tree):

- Transformer.py
  - Fus_Embeddings (AggregationModel.py)
    - Word Embedding Vectors
    - Positional Encoding (Modules.py)
  - Encoder (AggregationModel.py)
    - Encoder Layer (Model.py)
      - MultiHeadedAttention (Modules.py)
      - PostionWiseFFN (Modules.py)
  - Decoder (AggregationModel.py)
    - Decoder Layer (Model.py)
      - MultiHeadedAttention (Modules.py)
      - PostionWiseFFN (Modules.py)
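
The block below is a minimal, illustrative sketch of how these building blocks are typically composed in PyTorch, following the standard definitions from the paper. It is not the repository's actual code: the class names mirror the tree above, but the exact signatures and internals of Modules.py and Model.py may differ.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionalEncoding(nn.Module):
    """Adds the fixed sine/cosine position signal to the word embeddings."""
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        return x + self.pe[:, :x.size(1)]


class MultiHeadedAttention(nn.Module):
    """Scaled dot-product attention over n_head parallel heads."""
    def __init__(self, d_model, n_head, dropout=0.1):
        super().__init__()
        assert d_model % n_head == 0
        self.d_k = d_model // n_head
        self.n_head = n_head
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, q, k, v, mask=None):
        bsz = q.size(0)
        # Project, then split into heads: (batch, n_head, seq_len, d_k)
        q, k, v = [
            w(x).view(bsz, -1, self.n_head, self.d_k).transpose(1, 2)
            for w, x in zip((self.w_q, self.w_k, self.w_v), (q, k, v))
        ]
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = torch.matmul(attn, v).transpose(1, 2).contiguous()
        out = out.view(bsz, -1, self.n_head * self.d_k)
        return self.w_o(out)


class PositionWiseFFN(nn.Module):
    """Two-layer feed-forward network applied to every position independently."""
    def __init__(self, d_model, d_ff, dropout=0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.fc2(self.dropout(F.relu(self.fc1(x))))


class EncoderLayer(nn.Module):
    """Self-attention + FFN, each wrapped with a residual connection and LayerNorm."""
    def __init__(self, d_model, n_head, d_ff, dropout=0.1):
        super().__init__()
        self.self_attn = MultiHeadedAttention(d_model, n_head, dropout)
        self.ffn = PositionWiseFFN(d_model, d_ff, dropout)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        x = self.norm1(x + self.dropout(self.self_attn(x, x, x, mask)))
        return self.norm2(x + self.dropout(self.ffn(x)))
```

The decoder layer follows the same pattern, adding a second (encoder-decoder) attention sub-layer between the masked self-attention and the feed-forward network.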

Environment Configuration

  • pytorch 1.1.0
  • python 3.6.8
  • torchtext 0.5.0
  • tqdm
  • dill

Usage

WMT'17 Multimodal Translation: de-en BPE

  1. The byte-pair encoding has already been applied, so you can focus on the structure of the Transformer itself.
  2. Train the model (a note on the -warmup learning-rate schedule follows this list):
    python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -label_smoothing -save_model trained -b 256 -warmup 128000 -epoch 400
  3. GPU requirement: 4× TITAN X GPUs.
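
The -warmup flag above presumably controls the warm-up learning-rate schedule from the paper, lrate = d_model^(-0.5) · min(step^(-0.5), step · warmup_steps^(-1.5)). The snippet below is a minimal sketch of that formula, not necessarily the repository's exact optimizer wrapper; d_model = 512 is the paper's base setting and is an assumption here.

```python
# Illustrative sketch of the paper's warm-up schedule; the repository's actual
# optimizer wrapper may differ. d_model=512 is assumed (the paper's base model).
def noam_lr(step, d_model=512, warmup_steps=128000):
    """Learning rate at optimizer step `step` (step >= 1)."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The rate rises linearly for the first warmup_steps steps,
# then decays proportionally to 1/sqrt(step).
```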

Performance

Training loss and accuracy curves (plots shown in the repository).

Acknowledgement
