- Requirements
  - Python 3
  - PyTorch
  - Hugging Face Transformers (https://github.com/huggingface/transformers)
- Install
  - Clone this repository
  - Download the FGC dataset, unzip it, and place it under a sub-directory named `json`
  - Download the DRCD corpus and place it under the same `json` sub-directory
- Preprocess dataset
  - Run `python FGC_merge_to_DRCD_json.py` to merge the FGC training data into DRCD
  - Run `python FGC_mocks_to_DRCD_json.py` to create development set data from the FGC mock tests
  - Run `python FGC_final_to_DRCD.py` to convert the official test set data to DRCD format
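The conversion scripts themselves are not reproduced here. As a rough illustration of what "DRCD format" means, the sketch below assumes DRCD follows the SQuAD v1.1-style layout (articles containing paragraphs, each with a `context` and a list of `qas` whose answers carry a character-offset `answer_start`). It shows how a question could be turned into such a record and appended to the DRCD `data` list. The input keys (`id`, `question`, `answer`) and the helpers `make_drcd_article` / `merge_into_drcd` are hypothetical; the real FGC fields and this repository's scripts may differ.

```python
import json
from typing import Any, Dict, List


def make_drcd_article(article_id: str, title: str, context: str,
                      qas: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build one SQuAD/DRCD-style article holding a single paragraph.

    Each item in `qas` is assumed (hypothetically) to have 'id',
    'question' and 'answer' keys; real FGC records may look different.
    """
    out_qas = []
    for qa in qas:
        # Extractive QA needs the answer to appear verbatim in the context.
        # str.find returns the FIRST occurrence, which may not be the
        # annotated span if the answer text appears more than once.
        start = context.find(qa["answer"])
        if start < 0:
            raise ValueError(f"answer for {qa['id']} not found in context")
        out_qas.append({
            "id": qa["id"],
            "question": qa["question"],
            "answers": [{"text": qa["answer"], "answer_start": start}],
        })
    return {
        "id": article_id,
        "title": title,
        "paragraphs": [
            {"id": f"{article_id}-1", "context": context, "qas": out_qas}
        ],
    }


def merge_into_drcd(drcd: Dict[str, Any],
                    articles: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of a DRCD-style dataset with extra articles appended."""
    merged = dict(drcd)  # shallow copy so the input dict is not mutated
    merged["data"] = list(drcd.get("data", [])) + list(articles)
    return merged


if __name__ == "__main__":
    context = "台灣最高的山是玉山。"
    article = make_drcd_article(
        "FGC-0001", "玉山", context,
        [{"id": "FGC-0001-Q1", "question": "台灣最高的山是哪一座？", "answer": "玉山"}],
    )
    merged = merge_into_drcd({"data": []}, [article])
    # ensure_ascii=False keeps the Chinese text readable in the output file.
    print(json.dumps(merged, ensure_ascii=False, indent=2))
```

Because `answer_start` is computed from a Python `str`, it is a character offset rather than a byte offset, which is what SQuAD-style readers generally expect.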
- Run
  - Run `run_fgc_baseline.ipynb` (it can also be run in Google Colab)
  - On a single Titan X GPU with 12 GB of memory, the hyperparameters listed here can be used
  - Multi-GPU support is in beta
  - Test set performance: 15 out of 50 questions answered correctly
ylhsieh/formosa-grand-challenge-2020-baseline
About
Chinese extractive question answering baseline using the Formosa Grand Challenge 2020 dataset and BERT.