xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
benchmark
regex
reliability
evaluation
dataset
gpt
phi
large-language-models
llm
open-compass
chatglm
qwen
lm-evaluation
llm-as-a-judge
llm-as-evaluator
xfinder
reliable-evaluation
key-answer-extraction
judge-model
Updated Oct 28, 2024 - Python