This repository contains the code for our paper FLatS: Principled Out-of-Distribution Detection with Feature-Based Likelihood Ratio Score by Haowei Lin and Yuntian Gu.
Detecting out-of-distribution (OOD) instances is crucial for NLP models in practical applications. Although numerous OOD detection methods exist, most of them are empirical. Backed by theoretical analysis, this paper advocates measuring the "OOD-ness" of a test case by the likelihood ratio between the out-distribution and the in-distribution, estimated in the model's feature space.
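For intuition, the score can be thought of as comparing how close a test feature is to the in-distribution training features versus how close it is to out-distribution (background) features. Below is a minimal, illustrative sketch of such a feature-based likelihood-ratio score that uses k-nearest-neighbor distances as density proxies; the names (`knn_distance`, `in_feats`, `out_feats`) are placeholders, not the actual API of this repository.

```python
import numpy as np

def knn_distance(query: np.ndarray, bank: np.ndarray, k: int = 5) -> float:
    """Distance from `query` to its k-th nearest neighbor in `bank`."""
    dists = np.linalg.norm(bank - query, axis=1)
    return float(np.sort(dists)[k - 1])

def likelihood_ratio_score(query, in_feats, out_feats, k: int = 5) -> float:
    """Proxy for p_out(x) / p_in(x): larger values indicate a more OOD input.

    The k-NN distance is inversely related to local density, so the ratio of the
    in-distribution distance to the out-distribution distance grows as the query
    moves away from the training data and toward the background data.
    """
    d_in = knn_distance(query, in_feats, k)    # large if far from training data
    d_out = knn_distance(query, out_feats, k)  # large if far from background data
    return d_in / (d_out + 1e-12)
```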
First, install PyTorch by following the instructions from the official website. Please use the 1.6.0 version that matches your platform/CUDA version to faithfully reproduce our results. PyTorch versions higher than 1.6.0 should also work. For example, if you use Linux and CUDA 11 (how to check CUDA version), install PyTorch with the following command:
pip install torch==1.6.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html
If you instead use CUDA <11 or CPU, install PyTorch with the following command:
pip install torch==1.6.0
Then run the following command to install the remaining dependencies:
pip install -r requirements.txt
We use faiss to run the fast k-nearest-neighbor search, so please follow the instructions at https://github.com/facebookresearch/faiss to install faiss-cpu.
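As a quick sanity check that faiss works, a minimal exact k-NN search looks like this (the dimensionality, bank size, and k below are arbitrary examples, not the settings used by our scripts):

```python
import faiss
import numpy as np

d = 768                                              # feature dimension (e.g., RoBERTa hidden size)
bank = np.random.rand(1000, d).astype("float32")     # features to index
queries = np.random.rand(5, d).astype("float32")     # test features

index = faiss.IndexFlatL2(d)                         # exact (brute-force) L2 index
index.add(bank)                                      # add the feature bank
distances, neighbor_ids = index.search(queries, 10)  # 10 nearest neighbors per query
print(distances.shape, neighbor_ids.shape)           # (5, 10) (5, 10)
```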
In the following sections, we describe how to run FLatS with a RoBERTa model using our code.
Before training and evaluation, please download the CLINC150 and SNIPS datasets (note that the code downloads banking77 directly via the Huggingface API, and the data for ROSTD and wiki is already prepared). The default working directory is set to ./ (the current directory) in our code; you can change it to suit your needs.
We provide scripts to run FLatS on all the datasets. For example, for CLINC150, train and evaluate with this command:
bash scripts/clinc.sh
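For intuition only, the core computation behind such a script can be sketched as follows: encode sentences with a RoBERTa encoder, index in-distribution and out-distribution feature banks with faiss, and score test inputs by comparing their k-NN distances to the two banks. The model name, pooling choice, example sentences, and k below are placeholders, not the exact settings of our scripts.

```python
import faiss
import numpy as np
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base").eval()

@torch.no_grad()
def encode(sentences):
    """[CLS]-token features for a list of sentences (one pooling choice among several)."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0].numpy().astype("float32")

def kth_nn_distance(index, feats, k):
    """Distance to the k-th nearest neighbor for each query feature."""
    dists, _ = index.search(feats, k)
    return dists[:, -1]

# Placeholder feature banks; in practice the in-distribution bank comes from the
# training set and the out-distribution bank from a background corpus such as wiki.
in_bank = encode(["book a table for two tonight", "play some jazz music"])
out_bank = encode(["the treaty was signed in 1648 after decades of war"])

in_index = faiss.IndexFlatL2(in_bank.shape[1])
out_index = faiss.IndexFlatL2(out_bank.shape[1])
in_index.add(in_bank)
out_index.add(out_bank)

test_feats = encode(["what's the weather like today"])
scores = kth_nn_distance(in_index, test_feats, k=1) / (kth_nn_distance(out_index, test_feats, k=1) + 1e-12)
print(scores)  # larger value = more OOD under this sketch
```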
If you have any questions about the code or the paper, feel free to email Haowei. If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and faster!
We thank Sishuo Chen from Peking University for his repo, which provides an extensible framework for OOD detection in NLP; we used it as a reference when developing this code base.
Please cite our paper if you use this code or part of it in your work:
@inproceedings{lin2023flats,
  title={FLatS: Principled Out-of-Distribution Detection with Feature-Based Likelihood Ratio Score},
  author={Lin, Haowei and Gu, Yuntian},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2023}
}