Can you detect AI-written text? Test your skills at roft.io!
(Read our demo paper at EMNLP 2020 for more on our methodology.)
In this project, we aim to measure how good neural language models are at writing text. If you're familiar with the Turing Test, RoFT is a very similar experiment!
We hope that by testing how well humans can detect machine-generated text, we can better understand what makes text sound "human".
- View text one sentence at a time.
- Determine when the text switches from human written text to machine-generated text.
- Receive points based on how close your guess is to the true boundary (see the scoring sketch below).
- Climb the leaderboard and see how good you are at detecting generated text!
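The scoring rewards precision: pinpointing the exact sentence where the switch happens earns the most points, and guesses that drift past the boundary earn fewer. Below is a minimal Python sketch of one such distance-based rule; the function name, point values, and early-guess penalty are illustrative assumptions, not the exact scoring used on roft.io.

```python
# Hypothetical sketch of a distance-based scoring rule for the boundary-guessing
# game described above. The exact point values on roft.io may differ; this only
# illustrates the idea of "points according to precision".

def score_guess(guess_idx: int, true_boundary_idx: int, max_points: int = 5) -> int:
    """Return points for guessing that sentence `guess_idx` is the first
    machine-generated sentence, when the true boundary is `true_boundary_idx`.

    - Guessing the exact boundary earns `max_points`.
    - Guessing too late loses one point per sentence past the boundary.
    - Guessing too early (calling a human-written sentence machine-generated)
      earns nothing in this sketch.
    """
    if guess_idx < true_boundary_idx:
        return 0
    return max(0, max_points - (guess_idx - true_boundary_idx))


if __name__ == "__main__":
    true_boundary = 4  # sentences 0-3 are human-written, 4 onward are generated
    for guess in [2, 4, 6, 10]:
        print(f"guess={guess}: {score_guess(guess, true_boundary)} points")
```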
- How reliably can humans detect generated text?
- Can we train humans to detect generated text?
- How does the size of the model affect human detection accuracy?
- How does the length of the prompt affect human detection accuracy?
- How does the genre of prompt text affect human detection accuracy?
- How does the sampling strategy used for generation affect human detection accuracy?
- How does fine-tuning affect human detection accuracy?
- How do control codes for conditional generative models affect human detection accuracy?
- New York Times Annotated Corpus (Sandhaus, 2008)
- Reddit Writing Prompts (Fan et al., 2018)
- Corpus of Presidential Speeches (Brown, 2016)
- Recipe1M+ (Marin et al., 2019)
If you use the RoFT tool for your research, please cite us as:
@inproceedings{dugan2020roft,
  title={RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text},
  author={Dugan, Liam and Ippolito, Daphne and Kirubarajan, Arun and Callison-Burch, Chris},
  booktitle={Empirical Methods in Natural Language Processing, Demo Track},
  year={2020}
}