This project was completed as a self-chosen research assignment for the CSE 256 NLP course (Spring 2024) at the University of California, San Diego. The goal is to implement and experiment with different parts of the transformer architecture (i.e., the encoder and the decoder) from scratch, without using any existing transformer-related libraries (such as nn.MultiheadAttention), and to complete the following tasks:
- Part 1. Encoder with Classifier: Implement a transformer encoder and train it jointly with a feedforward classifier on a downstream task of predicting the speaker of a given speech segment (a sketch of the encoder follows this list).
- Part 2. Decoder for Language Modeling: Implement a GPT-like transformer decoder, pretrain it on an autoregressive language modeling task, and report perplexity on speeches from different politicians (a sketch of the causal mask and perplexity computation follows this list).
- Part 3. Exploration: Experiment with different parts of the architecture, such as alternative positional encodings and sparse attention patterns, to improve the performance of the classifier or the decoder (a sketch of both variations follows this list).
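
The Part 1 model can be hard to picture from the task description alone. Below is a minimal, hypothetical sketch of a bidirectional self-attention encoder block with a mean-pooled feedforward classifier head, written directly in PyTorch (no nn.MultiheadAttention). The class names and default hyperparameters (n_embd, n_head, n_layer, block_size, n_classes) are illustrative assumptions and may not match the ones used in transformer.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionHead(nn.Module):
    """One head of bidirectional scaled dot-product self-attention (no causal mask)."""
    def __init__(self, n_embd, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)

    def forward(self, x):                                    # x: (B, T, n_embd)
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)  # (B, T, T)
        att = F.softmax(att, dim=-1)                         # every token attends to every token
        return att @ v                                       # (B, T, head_size)

class EncoderBlock(nn.Module):
    """Multi-head self-attention followed by a feedforward sublayer, with residual connections."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        head_size = n_embd // n_head
        self.heads = nn.ModuleList(SelfAttentionHead(n_embd, head_size) for _ in range(n_head))
        self.proj = nn.Linear(n_embd, n_embd)
        self.ffwd = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
                                  nn.Linear(4 * n_embd, n_embd))
        self.ln1, self.ln2 = nn.LayerNorm(n_embd), nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.proj(torch.cat([h(self.ln1(x)) for h in self.heads], dim=-1))
        x = x + self.ffwd(self.ln2(x))
        return x

class EncoderClassifier(nn.Module):
    """Token + positional embeddings -> encoder blocks -> mean pool -> speaker logits."""
    def __init__(self, vocab_size, n_embd=64, n_head=2, n_layer=4, block_size=32, n_classes=3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)      # learned positions as the baseline choice
        self.blocks = nn.Sequential(*[EncoderBlock(n_embd, n_head) for _ in range(n_layer)])
        self.classifier = nn.Sequential(nn.Linear(n_embd, n_embd), nn.ReLU(),
                                        nn.Linear(n_embd, n_classes))

    def forward(self, idx):                                  # idx: (B, T) token ids
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        x = self.blocks(x)
        return self.classifier(x.mean(dim=1))                # average over positions, then classify
```

Training then amounts to minimizing cross-entropy between these logits and the speaker labels from train_CLS.tsv.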
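What distinguishes the Part 2 decoder from the encoder above is the causal attention mask, and the reported metric is perplexity. The sketch below illustrates both; the batch format (x, y) and the model returning per-token logits are assumed interfaces, not necessarily the repository's actual API.

```python
import math
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """Scaled dot-product attention with a lower-triangular mask, so each position
    attends only to itself and earlier positions (the autoregressive constraint)."""
    T = q.size(-2)
    att = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))            # (B, T, T)
    mask = torch.tril(torch.ones(T, T, device=q.device)).bool()
    att = att.masked_fill(~mask, float('-inf'))                      # block attention to future tokens
    return F.softmax(att, dim=-1) @ v

@torch.no_grad()
def perplexity(model, loader, device="cpu"):
    """Perplexity = exp(mean token-level cross-entropy) over a held-out speech file."""
    losses = []
    for x, y in loader:                                              # y is x shifted by one token
        logits = model(x.to(device))                                 # (B, T, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1).to(device))
        losses.append(loss.item())
    return math.exp(sum(losses) / len(losses))
```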
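For Part 3, two of the explorations mentioned above are a different positional encoding and a sparse attention pattern. Below is a hedged sketch of both: a fixed sinusoidal positional encoding and a sliding-window attention mask. The window size and the exact way the mask is plugged into the attention scores are assumptions for illustration, not a description of the final implementation.

```python
import math
import torch

def sinusoidal_positional_encoding(block_size, n_embd):
    """Fixed sin/cos positional encoding (an alternative to a learned position table).
    Assumes n_embd is even."""
    pos = torch.arange(block_size, dtype=torch.float32).unsqueeze(1)             # (T, 1)
    div = torch.exp(torch.arange(0, n_embd, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / n_embd))                             # (n_embd/2,)
    pe = torch.zeros(block_size, n_embd)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                                                    # (T, n_embd), added to token embeddings

def sliding_window_mask(T, window, causal=True):
    """Sparse attention pattern: each position may attend only to the `window`
    nearest positions instead of the full sequence."""
    idx = torch.arange(T)
    dist = idx.unsqueeze(0) - idx.unsqueeze(1)                                   # dist[i, j] = j - i
    if causal:
        allowed = (dist <= 0) & (dist > -window)                                 # only the most recent past positions
    else:
        allowed = dist.abs() < window                                            # symmetric local neighbourhood
    return allowed                                                               # bool (T, T)
```

The boolean mask is applied the same way as the causal mask in the Part 2 sketch, i.e. scores.masked_fill(~mask, float('-inf')) before the softmax.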
PA2_code/
│
├── speechesdataset/
│ ├── test_CLS.tsv
│ ├── test_LM_hbush.txt
│ ├── test_LM_obama.txt
│ ├── test_LM_wbush.txt
│ ├── train_CLS.tsv
│ └── train_LM.txt
│
├── Attention maps (PDF)
│
├── dataset.py
├── main.py
├── tokenizer.py
├── transformer.py
└── utilities.py
Ensure you have the following installed:
- Python 3.7+
- PyTorch
- NLTK
- NumPy
- Matplotlib
Part 1: Encoder with Classifier
python main.py part1
Part 2: Decoder for Language Modeling
Choose the intended test set (test_LM_obama.txt, test_LM_wbush.txt, or test_LM_hbush.txt) by changing the input path before running:
python main.py part2
Part 3: Exploration
python main.py part3
Model initialization with hyperparameters, optimizer setup, pretraining, and evaluation are handled in main.py.
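
The main.py workflow just described (model construction, optimizer setup, pretraining, evaluation) typically reduces to a loop like the one below. This is only an illustrative outline under assumed interfaces (a model that returns (logits, loss) when targets are passed, and a dataloader yielding (inputs, targets) pairs); the actual script may be organized differently.

```python
import torch

def pretrain(model, train_loader, max_iters=500, lr=1e-3, device="cpu"):
    """Illustrative pretraining loop; hyperparameter values are placeholders."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.to(device).train()
    it = 0
    while it < max_iters:
        for x, y in train_loader:                            # (inputs, targets) batches from dataset.py
            _, loss = model(x.to(device), y.to(device))      # assumes the model returns (logits, loss) given targets
            optimizer.zero_grad(set_to_none=True)
            loss.backward()
            optimizer.step()
            it += 1
            if it >= max_iters:
                break
    return model
```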
The required transformer models and the improved variants are implemented in transformer.py.