While running a correctness evaluation of an LLM's responses on the GSM8K dataset, a judge response containing LaTeX symbols could not be parsed as JSON in Python.
The code for the correctness evaluation:
# Correctness Evaluator: Run Evaluation
correctness_eval = LLMEval(
    subcolumn="category",
    additional_columns={"target_response": "target_response"},
    template=BinaryClassificationPromptTemplate(
        criteria="""
An ANSWER is correct when it is the same as the REFERENCE in all facts and details, even if worded differently.
The ANSWER is incorrect if it contradicts the REFERENCE, adds additional claims, omits or changes details.
REFERENCE:
=====
{target_response}
=====
""",
        target_category="incorrect",
        non_target_category="correct",
        uncertainty="unknown",
        include_reasoning=True,
        pre_messages=[("system", "You are an expert evaluator. will be given an ANSWER and REFERENCE.")],
    ),
    provider="openai",
    model="gpt-4o-mini",
    display_name="Correctness",
)
correctness_report = Report(metrics=[
    TextEvals(column_name="new_response", descriptors=[
        correctness_eval
    ])
])
correctness_report.run(reference_data=None,
                       current_data=golden_dataset)
correctness_report
Error message produced:
LLMResponseParseError: Failed to parse response '{
"category": "correct",
"reasoning": "The answer provided correctly follows the reasoning in the reference. It defines the number of people on the first ship as \( x \), and then accurately establishes the number of people on the subsequent ships as \( 2x \) and \( 4x \). The total is calculated correctly as \( x + 2x + 4x = 847 \), leading to the correct result of \( x = 121 \). Therefore, the final conclusion about the number of people on the first ship is correct."
}' as json
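A minimal sketch that may help narrow this down, using only Python's standard `json` and `re` modules rather than anything from the evaluation library. JSON strings allow a backslash only before `"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, or `u`, so LaTeX delimiters such as `\(` are invalid escapes. The helper `escape_stray_backslashes` is hypothetical and is only one possible workaround. It would not protect LaTeX commands that begin with a valid escape letter (for example `\times` or `\frac`), which would still be decoded as control characters.

```python
import json
import re

# Shortened version of the judge response above; the raw string keeps the
# LaTeX backslashes exactly as the model emitted them.
raw = r'{"category": "correct", "reasoning": "first ship as \( x \), total \( x + 2x + 4x = 847 \)"}'

# Reproduce the failure: "\(" is not a valid JSON escape sequence.
try:
    json.loads(raw)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

# Characters that may legally follow a backslash inside a JSON string.
VALID_ESCAPE_CHARS = '"\\/bfnrtu'


def escape_stray_backslashes(text: str) -> str:
    """Hypothetical helper: double any backslash that does not start a valid JSON escape.

    Backslash pairs are consumed left to right, so an already escaped
    backslash (two backslashes in a row) is left untouched.
    """
    def fix(match: re.Match) -> str:
        if match.group(1) in VALID_ESCAPE_CHARS:
            return match.group(0)
        return "\\\\" + match.group(1)

    return re.sub(r"\\(.)", fix, text, flags=re.DOTALL)


repaired = json.loads(escape_stray_backslashes(raw))
print(parsed_ok)
print(repaired["category"])
print(repaired["reasoning"])
```

Asking the judge model, in the prompt, to avoid LaTeX or to double its backslashes is another option, though whether the model follows that instruction reliably is untested here.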