- Run `preprocess_data.py` to generate dictionaries containing n-best ASR scores for each utterance.
- Run `lllm_scoring.py` to update the dictionaries with LLM scores for each utterance (for `gpt2` and `bert`).
- Run `combined_scores.py` with the argument `--lambda_param` to combine the ASR and LLM scores.
- Run `compute_error_rate.py` to compute the error rate for a given hypothesis dictionary.
- `gridsearch.sh` tests error rates on a range of lambda values.
- `hyp_comb_10_dict_test_other.json` contains the hypotheses and all the scores for the automasking experiment.
- `hyp_comb_masks_10_dict_test_other.json` contains the hypotheses and all the scores for the selective mask-based experiment.
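
The pipeline above (combine scores, pick the best hypothesis, compute the error rate, sweep lambda) can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the dictionary layout, the key names (`text`, `asr_score`, `llm_score`), the helper names, and the linear interpolation `asr_score + lambda * llm_score` are all assumptions made for the example. The actual scripts may combine and store scores differently.

```python
from typing import Dict, List

# Hypothetical layout (an assumption, not the repo's actual schema):
# utterance id -> list of n-best entries with a text and two log-scores.
NBest = Dict[str, List[dict]]


def rerank(nbest: NBest, lambda_param: float) -> Dict[str, str]:
    """Pick, per utterance, the hypothesis with the highest combined score.

    Assumes a linear interpolation: asr_score + lambda_param * llm_score.
    """
    best = {}
    for utt_id, hyps in nbest.items():
        top = max(hyps, key=lambda h: h["asr_score"] + lambda_param * h["llm_score"])
        best[utt_id] = top["text"]
    return best


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_error_rate(refs: Dict[str, str], hyps: Dict[str, str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(refs[u].split(), hyps[u].split()) for u in refs)
    words = sum(len(refs[u].split()) for u in refs)
    return errors / words


def grid_search(nbest: NBest, refs: Dict[str, str], lambdas: List[float]) -> Dict[float, float]:
    """Error rate for each candidate lambda value."""
    return {lam: word_error_rate(refs, rerank(nbest, lam)) for lam in lambdas}


# Toy data, invented purely for illustration.
toy_nbest = {
    "utt1": [
        {"text": "the cat sat", "asr_score": -1.0, "llm_score": -5.0},
        {"text": "the cat sad", "asr_score": -0.8, "llm_score": -9.0},
    ]
}
toy_refs = {"utt1": "the cat sat"}

for lam, wer in grid_search(toy_nbest, toy_refs, [0.0, 0.5]).items():
    print(f"lambda={lam}: WER={wer:.3f}")
```

In this toy case, a lambda of 0 ignores the LLM score and keeps the ASR-preferred hypothesis, while a larger lambda lets the LLM score override it. That trade-off is what a sweep over lambda values is meant to explore.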