Responses containing LaTeX symbols cannot be parsed as JSON during correctness evaluation #1305

Open
YEEthanCC opened this issue Sep 17, 2024 · 0 comments


While running a correctness evaluation of an LLM's responses on the GSM8K dataset, a judge response that contained LaTeX symbols could not be parsed as JSON by Python.

The code for the correctness evaluation:

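# Correctness Evaluator: Run Evaluation

# Imports assumed from the Evidently release in use when this issue was filed;
# module paths may differ in other versions.
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import LLMEval
from evidently.features.llm_judge import BinaryClassificationPromptTemplate

correctness_eval = LLMEval(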
    subcolumn="category",
    additional_columns={"target_response": "target_response"},
    template = BinaryClassificationPromptTemplate(
        criteria = """
An ANSWER is correct when it is the same as the REFERENCE in all facts and details, even if worded differently.
The ANSWER is incorrect if it contradicts the REFERENCE, adds additional claims, omits or changes details.

REFERENCE:

=====
{target_response}
=====
        """,
        target_category="incorrect",
        non_target_category="correct",
        uncertainty="unknown",
        include_reasoning=True,
        pre_messages=[("system", "You are an expert evaluator. You will be given an ANSWER and REFERENCE.")],
        ),
    provider = "openai",
    model = "gpt-4o-mini",
    display_name = "Correctness",
)

correctness_report = Report(metrics=[
    TextEvals(column_name="new_response", descriptors=[
        correctness_eval
    ])
])

correctness_report.run(reference_data=None, current_data=golden_dataset)
correctness_report

Error message produced:

LLMResponseParseError: Failed to parse response '{
  "category": "correct",
  "reasoning": "The answer provided correctly follows the reasoning in the reference. It defines the number of people on the first ship as \( x \), and then accurately establishes the number of people on the subsequent ships as \( 2x \) and \( 4x \). The total is calculated correctly as \( x + 2x + 4x = 847 \), leading to the correct result of \( x = 121 \). Therefore, the final conclusion about the number of people on the first ship is correct."
}' as json
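
The likely cause is that the LaTeX delimiters `\(` and `\)` in the "reasoning" string are not valid JSON escape sequences, so Python's strict `json.loads` rejects the whole response. The sketch below reproduces that failure on a shortened version of the response and shows one possible workaround: doubling any backslash that does not start a valid JSON escape, then parsing again. The helper name `loads_lenient` is hypothetical and is not part of Evidently; this is an illustration under those assumptions, not the library's fix.

```python
import json
import re

# A backslash followed by any single character.
_ESCAPE = re.compile(r"\\(.)", re.DOTALL)
# Characters that may legally follow a backslash inside a JSON string.
_VALID_AFTER_BACKSLASH = set('"\\/bfnrtu')


def _double_invalid(match):
    """Keep valid escapes as they are; double the backslash of invalid ones."""
    if match.group(1) in _VALID_AFTER_BACKSLASH:
        return match.group(0)
    return "\\\\" + match.group(1)


def loads_lenient(raw):
    """Hypothetical helper: strict parse first, then retry after repair.

    Caveat: LaTeX commands such as \\frac or \\times begin with a valid JSON
    escape (\\f, \\t), so they parse "successfully" as control characters and
    are not repaired by this approach.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(_ESCAPE.sub(_double_invalid, raw))


# Shortened version of the judge response from the error message.
raw = r'{"category": "correct", "reasoning": "defined as \( x \)"}'

try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print("strict parse failed:", err)

parsed = loads_lenient(raw)
print(parsed["category"], "|", parsed["reasoning"])
```

A more robust option may be to instruct the judge model, in the prompt, to avoid LaTeX or to escape backslashes in its JSON output, since a repair step like this cannot distinguish `\frac` from a form feed followed by `rac`.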