JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification
This repository contains the implementation of the paper:
JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification (available on ACL Anthology, OpenReview, and arXiv)
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Henry Peng Zou, Cornelia Caragea
🌱 Feel free to check out our other work on semi-supervised learning and pseudo-label debiasing: DeCrisisMB!
conda create -n jointmatch python=3.8 -y
conda activate jointmatch
# install pytorch
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 -c pytorch
# install dependencies
pip install -r requirements.txt
The file structure should look like:
code/
|-- criterions
|-- models
|-- utils
|-- main.py
|-- panel_main.py
......
data/
|-- ag_news
|-- imdb
|-- yahoo
    |-- train.csv
    |-- val.csv
    |-- test.csv
......
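Before launching a run, it can help to confirm that the data folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_split_files` is hypothetical, and it assumes that every dataset folder (not only the last one listed) holds `train.csv`, `val.csv`, and `test.csv`.

```python
from pathlib import Path

# Assumption: each dataset folder contains these three split files.
EXPECTED_SPLITS = ("train.csv", "val.csv", "test.csv")


def missing_split_files(data_root, dataset):
    """Return the expected split files that are absent from data_root/dataset."""
    dataset_dir = Path(data_root) / dataset
    return [name for name in EXPECTED_SPLITS if not (dataset_dir / name).is_file()]


if __name__ == "__main__":
    # Dataset names taken from the tree above; "data" is the assumed root folder.
    for name in ("ag_news", "imdb", "yahoo"):
        missing = missing_split_files("data", name)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{name}: {status}")
```

Run from the directory that contains `data/`, this prints one status line per dataset, so a misplaced or misnamed split file shows up before training starts.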
To reproduce the main results in our paper, run:
python panel_main.py
Specify the dataset and output location if needed, e.g., dataset = 'ag_news' and experiment_home = './experiment'.
To reproduce the results with varying amounts of labeled data, run:
python panel_num_labels.py
To use your own dataset (which requires generating weak and strong data augmentations), follow the demo notebook preprocess.ipynb in the custom_dataset folder.
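To illustrate the idea of pairing each text with a weak and a strong augmentation, here is a toy sketch. It is not the preprocessing used in preprocess.ipynb: the two augmentation functions are simple stand-ins (an adjacent word swap and random word dropping plus shuffling), and the function names and the `text`, `weak`, and `strong` keys are hypothetical. Follow the notebook for the format the code actually expects.

```python
import random


def weak_augment(text, rng):
    """Toy weak augmentation (stand-in): swap one random pair of adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def strong_augment(text, rng, drop_prob=0.3):
    """Toy strong augmentation (stand-in): randomly drop words, then shuffle the rest."""
    words = text.split()
    kept = [w for w in words if rng.random() >= drop_prob]
    if not kept and words:
        # Keep at least one word so the augmented text is never empty.
        kept = [rng.choice(words)]
    rng.shuffle(kept)
    return " ".join(kept)


def build_rows(texts, seed=0):
    """Pair each text with one weak and one strong view (hypothetical keys)."""
    rng = random.Random(seed)
    return [
        {"text": t, "weak": weak_augment(t, rng), "strong": strong_augment(t, rng)}
        for t in texts
    ]
```

The weak view changes the text only slightly, while the strong view perturbs it more heavily. That contrast is the only point of the sketch; real pipelines typically use stronger, meaning-preserving methods.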
If you have any questions about the code or the paper, feel free to email Henry Peng Zou (pzou3@uic.edu). If you run into problems when using the code, or want to report a bug, please open an issue. Describing the problem in detail will help us respond faster and more effectively.
If you find this repository helpful, please consider citing our paper 💕:
@inproceedings{zou2023jointmatch,
title={JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification},
author={Zou, Henry and Caragea, Cornelia},
booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
pages={7290--7301},
year={2023}
}
@inproceedings{zou2023decrisismb,
title={DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank},
author={Zou, Henry and Zhou, Yue and Zhang, Weizhi and Caragea, Cornelia},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
pages={6104--6115},
year={2023}
}
This repo borrows some data and code from SAT and USB. We appreciate their great work.