A bidirectional encoder-decoder LSTM neural network is trained for text summarization on the CNN/DailyMail dataset.
- The unprocessed dataset can be downloaded here
- The version used in this project (only CNN articles and summaries) can be found here
The skip-gram variant of word2vec is used to embed the encoder input sequence. A shallow neural network is trained to predict context words given a current word; after training, its hidden layer is used as the embedding layer. The embedding size was kept at 128. Skip-gram was pre-trained on both the articles and the golden summary words.
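A minimal sketch of this pre-training step with gensim (the corpus variable, window and min_count values are assumptions, not the exact settings in word2vec.py):

```python
from gensim.models import Word2Vec

# `tokenized_texts` is an assumed list of token lists covering both articles and golden summaries.
w2v = Word2Vec(
    sentences=tokenized_texts,
    vector_size=128,   # embedding size used in this project (named `size` in gensim < 4.0)
    sg=1,              # sg=1 selects the skip-gram architecture
    window=5,
    min_count=5,
    workers=4,
)

# The trained hidden layer serves as the encoder's embedding lookup.
article_word_vector = w2v.wv["police"]   # 128-d vector for a word in the vocabulary
```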
For the decoder input and output, one-hot encoding of the summary words is used. The vocabulary size was initially 50k but was reduced to 30k due to memory constraints. One-hot encoding was also chosen to allow the addition of an attention layer later.
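A rough sketch of how such one-hot decoder targets could be built with Keras preprocessing utilities (the variable names and maximum summary length are assumptions; the actual preprocessing lives in cnn_daily_load.py):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

VOCAB_SIZE = 30000       # reduced from 50k due to memory constraints
MAX_SUMMARY_LEN = 100    # assumed cap on summary length

# `summaries` is an assumed list of golden-summary strings.
tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<unk>")
tokenizer.fit_on_texts(summaries)

seqs = pad_sequences(tokenizer.texts_to_sequences(summaries),
                     maxlen=MAX_SUMMARY_LEN, padding="post")

# Decoder targets: one-hot vectors of shape (num_summaries, MAX_SUMMARY_LEN, VOCAB_SIZE).
decoder_targets = to_categorical(seqs, num_classes=VOCAB_SIZE)
```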
We use a bidirectional encoder LSTM with state size = 128, dropout = 0.2 and a tanh activation. The decoder is a unidirectional LSTM with state size = 128, dropout = 0.2 and a softmax activation.
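A simplified Keras sketch of this encoder-decoder pair (the input shapes and the way the forward and backward encoder states are merged are assumptions; the full model with attention is built in lstm_Attention.py):

```python
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dense, Add
from tensorflow.keras.models import Model

HIDDEN = 128
VOCAB_SIZE = 30000

# Encoder: bidirectional LSTM over the 128-d skip-gram embeddings (tanh is the Keras LSTM default).
enc_inputs = Input(shape=(None, 128))
enc_out, fh, fc, bh, bc = Bidirectional(
    LSTM(HIDDEN, return_sequences=True, return_state=True, dropout=0.2))(enc_inputs)

# Merge forward/backward states so they fit the 128-unit decoder
# (how the repo actually combines them is an assumption here).
state_h, state_c = Add()([fh, bh]), Add()([fc, bc])

# Decoder: unidirectional LSTM over one-hot summary tokens, softmax over the 30k vocabulary.
dec_inputs = Input(shape=(None, VOCAB_SIZE))
dec_out, _, _ = LSTM(HIDDEN, return_sequences=True, return_state=True, dropout=0.2)(
    dec_inputs, initial_state=[state_h, state_c])
predictions = Dense(VOCAB_SIZE, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], predictions)
```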
An attention layer sits between the encoder and decoder and attends over the source sequence's hidden states. Since the skip-gram embedding and one-hot vector sizes aren't the same, PCA is applied to the embeddings so they can be multiplied with the one-hot vectors to obtain attention weights and vectors. The final prediction of each output word in the decoder sequence is done by the attention layer, which gives the decoder access to individual encoder state information. (Forgive the figure below if it is unclear or messy.)
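For orientation, a textbook dot-product attention step over the encoder states looks roughly like the following; it is a simplified stand-in, not the PCA-based reconciliation described above:

```python
import tensorflow as tf

def attention(decoder_states, encoder_states):
    """decoder_states: (batch, dec_len, units); encoder_states: (batch, enc_len, units).
    Assumes both share the same `units` dimension."""
    # Alignment score between every decoder step and every encoder step.
    scores = tf.matmul(decoder_states, encoder_states, transpose_b=True)  # (batch, dec_len, enc_len)
    weights = tf.nn.softmax(scores, axis=-1)                              # attention weights
    context = tf.matmul(weights, encoder_states)                          # (batch, dec_len, units)
    # The word prediction is then made from [context; decoder_state],
    # e.g. via a Dense(vocab_size, activation="softmax") layer.
    return tf.concat([context, decoder_states], axis=-1)
```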
Download the data and run the following scripts in this order:
python cnn_daily_load.py
python word2vec.py
python lstm_Attention.py
Dependencies:
- tensorflow, keras, sklearn
- numpy, pandas, pyrouge, matplotlib
- regex (re), NLTK, gensim
LSTM encoder-decoder architectural and training parameters (a rough compile/fit sketch follows the list):
- batch_size = 50
- epochs = 20
- hidden_units = 128
- learning_rate = 0.005
- clip_norm = 2.0
- test_size = 0.2
- optimizer = RMSprop
- dropout = 0.2 (both encoder and decoder during training)
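Roughly, those settings correspond to a compile/fit call like the one below, reusing `model` from the earlier sketch (the loss choice, variable names and teacher-forcing slicing are assumptions; lstm_Attention.py may differ):

```python
from tensorflow.keras.optimizers import RMSprop
from sklearn.model_selection import train_test_split

# Hold out 20% of the (article, summary) pairs for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    encoder_inputs, decoder_targets, test_size=0.2)

model.compile(
    optimizer=RMSprop(learning_rate=0.005, clipnorm=2.0),  # gradient norms clipped at 2.0
    loss="categorical_crossentropy",                       # targets are one-hot summary words
)

# Teacher forcing: the decoder input is the summary shifted by one step relative to the target.
model.fit([X_train, y_train[:, :-1]], y_train[:, 1:],
          batch_size=50, epochs=20)
```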
The generated summaries are readable and make sense; however, they contain repetitions and sometimes skip over important facts or get the plot wrong altogether.
References:
- Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate."
- Ding, Zixiang, Rui Xia, Jianfei Yu, Xiang Li, and Jian Yang. "Densely Connected Bidirectional LSTM with Applications to Sentence Classification."