An Unofficial Implementation of "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks"
- Document Classification
- Data: dcard + ptt, 200,000 articles in total
- This repo contains the classification-task results of various pretrained models
- Python 3.7
- Install dependencies:
```
pip install -r requirements.txt
```
- Train the word2vec embeddings: see word2vec/train.py
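Below is a minimal, hypothetical sketch of what the word2vec training could look like with gensim; the corpus path, tokenization, and hyperparameters are assumptions, and the actual logic lives in word2vec/train.py.

```python
# Hypothetical sketch only; the real script is word2vec/train.py.
# Corpus path, tokenization, and hyperparameters below are assumptions.
from gensim.models import Word2Vec

def train_word2vec(corpus_path="data/corpus.txt", embed_size=250):
    # Assumes one whitespace-tokenized document per line.
    with open(corpus_path, encoding="utf-8") as f:
        sentences = [line.split() for line in f]
    model = Word2Vec(
        sentences,
        vector_size=embed_size,  # matches lstm_model.embed_size in config.py
        window=5,
        min_count=2,
        workers=4,
    )
    model.save("word2vec/word2vec.model")
    return model
```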
- Dataset preprocessing: see dataset.py
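A hedged sketch of the kind of PyTorch Dataset that dataset.py might implement; the vocab lookup and padding scheme are assumptions.

```python
# Hypothetical sketch only; the real implementation is dataset.py.
# Vocab lookup and padding scheme are assumptions.
import torch
from torch.utils.data import Dataset

class DocDataset(Dataset):
    def __init__(self, texts, labels, vocab, maxlen=350):
        self.texts, self.labels = texts, labels
        self.vocab, self.maxlen = vocab, maxlen  # maxlen mirrors data.maxlen in config.py

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        tokens = self.texts[idx].split()[: self.maxlen]
        ids = [self.vocab.get(tok, self.vocab["<unk>"]) for tok in tokens]
        ids += [self.vocab["<pad>"]] * (self.maxlen - len(ids))  # pad to fixed length
        return torch.tensor(ids), torch.tensor(self.labels[idx])
```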
- Configuration: see config.py
- Default pretrained model: RoBERTa. Fine-tune the teacher with:
```
python main.py --model=bert
```
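For orientation, a minimal sketch of loading a RoBERTa-style teacher for classification with HuggingFace transformers; the checkpoint name hfl/chinese-roberta-wwm-ext and the wrapper function are assumptions, not the repo's actual code.

```python
# Hypothetical sketch only; main.py defines the real teacher.
# The pretrained checkpoint name is an assumption.
from transformers import BertForSequenceClassification, BertTokenizer

def load_teacher(num_classes=16, pretrained="hfl/chinese-roberta-wwm-ext"):
    tokenizer = BertTokenizer.from_pretrained(pretrained)
    model = BertForSequenceClassification.from_pretrained(
        pretrained, num_labels=num_classes
    )
    return tokenizer, model
```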
- Distillation: set the config in config.py under distil_hparams, then run:
```
python main.py --model=bert
```
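For reference, the paper's distillation objective combines cross-entropy on the hard labels with an MSE between teacher and student logits. The sketch below assumes an alpha weighting and detached teacher logits; the repo's actual settings live under distil_hparams in config.py.

```python
# Hypothetical sketch of the paper's objective: cross-entropy on hard labels
# plus MSE between teacher and student logits. The alpha weighting is an
# assumption; the repo's actual values live in config.py (distil_hparams).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    hard_loss = F.cross_entropy(student_logits, labels)               # supervised term
    soft_loss = F.mse_loss(student_logits, teacher_logits.detach())   # logit regression
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```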
- RoBERTa, 3 epochs: 90% accuracy
- Distilled LSTM, 3 epochs, with the following config:
```python
'data': {
    'maxlen': 350
},
'lstm_model': {
    'freeze': False,
    'embed_size': 250,
    'hid_size': 256,
    'num_layers': 2,
    'dropout': 0.3,
    'with_attn': False,
    'num_classes': 16,
},
'bert_model': {
    'num_classes': 16,
    'ckpt': 'logs/roberta/version_6/epoch=1.ckpt'
}
```
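For context, a hypothetical sketch of an LSTM student matching the lstm_model config above; bidirectionality, last-timestep pooling, and the meaning of freeze (toggling embedding fine-tuning) are assumptions, and the with_attn option is not shown.

```python
# Hypothetical sketch of an LSTM student matching lstm_model above.
# Bidirectionality, pooling, and the meaning of 'freeze' are assumptions;
# the repo's actual model may differ.
import torch.nn as nn

class LSTMStudent(nn.Module):
    def __init__(self, vocab_size, embed_size=250, hid_size=256,
                 num_layers=2, dropout=0.3, num_classes=16, freeze=False):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.embed.weight.requires_grad = not freeze   # assumption: freeze word2vec embeddings
        self.lstm = nn.LSTM(embed_size, hid_size, num_layers=num_layers,
                            dropout=dropout, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hid_size * 2, num_classes)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))   # (batch, seq_len, 2 * hid_size)
        return self.fc(out[:, -1, :])       # classify from the final timestep
```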
- Reference: [Distilling Task-Specific Knowledge from BERT into Simple Neural Networks](https://arxiv.org/pdf/1903.12136.pdf)