This repository contains the code and documentation for the paper "Towards AI-Assisted Protocol Analysis in Design Research: Automating Question Labeling with GPT-4 According to Eris' (2004) Taxonomy."
Presented at DCC 2024, the 11th International Conference on Design Computing and Cognition, Montreal, Canada, 8–10 July 2024.
Create a Python virtual environment and install the required dependencies:

```bash
pip install -r requirements.txt
```
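If you are unsure how to create the virtual environment, a typical approach on Unix-like systems is the following (the `.venv` name is an assumption, not a repository convention):

```bash
python3 -m venv .venv
source .venv/bin/activate
```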
Update `.env` with your settings. You can use `.env.example` as a reference (a sketch of how these settings might be consumed follows the list):
- `OPENAI_API_KEY=<your-key>`: Your OpenAI API key.
- `OPENAI_MODEL=gpt-4-1106-preview`: GPT model version.
- `PROMPT_COST_PER_1000=0.01`: Cost per 1,000 prompt tokens in USD.
- `COMPLETION_COST_PER_1000=0.03`: Cost per 1,000 completion tokens in USD.
- `DATA_DIR=dataset`: Dataset directory.
- `DATA_FILE=convo-qs-eris-labelled.xlsx`: Your dataset. A sample dataset is available in the `dataset` folder.
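As an illustration only, these settings might be loaded and used along the following lines (a minimal sketch assuming the `python-dotenv` package is among the dependencies; `estimate_cost` is a hypothetical helper, not part of the repository):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # read key-value pairs from .env into the process environment

MODEL = os.environ["OPENAI_MODEL"]
PROMPT_COST_PER_1000 = float(os.environ["PROMPT_COST_PER_1000"])
COMPLETION_COST_PER_1000 = float(os.environ["COMPLETION_COST_PER_1000"])

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one API call from its token counts."""
    prompt_usd = prompt_tokens / 1000 * PROMPT_COST_PER_1000
    completion_usd = completion_tokens / 1000 * COMPLETION_COST_PER_1000
    return prompt_usd + completion_usd
```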
Update the system message for the OpenAI Chat Completion API in the `system-message.txt` file.
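For reference, here is a minimal sketch of how such a system message might be passed to the Chat Completions API (assuming the v1 `openai` Python package; the sample question utterance is purely illustrative):

```python
import os

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

with open("system-message.txt") as f:
    system_message = f.read()

response = client.chat.completions.create(
    model=os.environ.get("OPENAI_MODEL", "gpt-4-1106-preview"),
    messages=[
        {"role": "system", "content": system_message},
        # An illustrative question utterance to be labelled:
        {"role": "user", "content": "How would this mechanism behave under load?"},
    ],
)

print(response.choices[0].message.content)  # the model's predicted label
print(response.usage.prompt_tokens, response.usage.completion_tokens)  # for cost tracking
```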
The `experiments` folder contains Jupyter notebooks detailing the experiments conducted for the paper (an illustrative evaluation sketch follows the list):
- Determine the baseline performance by classifying a test set of standalone question utterances, with and without a training set.
- Determine the effect of training-set size on the accuracy of GPT-4's labelling.
- Determine the sensitivity of the results across multiple “runs” of the experiment.
- Determine whether GPT-4 can also use context in the labelling task, and whether doing so improves labelling performance.
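The notebooks are the source of record for the methodology; purely as an illustration, agreement between GPT-4's labels and the human-coded ground truth might be measured along these lines (the column names here are assumptions, not the notebooks' actual schema):

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_excel("dataset/convo-qs-eris-labelled.xlsx")  # requires openpyxl

# Assumed columns: "human_label" (ground truth) and "gpt4_label" (model output).
print(accuracy_score(df["human_label"], df["gpt4_label"]))
print(classification_report(df["human_label"], df["gpt4_label"]))
```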
Key findings:
- A training set can be useful for improving labelling accuracy.
- Labelling is probabilistic; a larger training set reduces uncertainty.
- Providing the context surrounding each question degrades performance, which aligns with recent findings on LLMs' difficulty with long contexts; see Liu et al. (2024), "Lost in the Middle: How Language Models Use Long Contexts," Transactions of the Association for Computational Linguistics, 12:157–173.