
This commit implements the F-beta score metric #1543

Merged

Conversation

Yuri-Albuquerque
Contributor

for the AnswerCorrectness class. The beta parameter is introduced to control the relative importance of recall and precision when calculating the score. Specifically:

  • beta > 1 places more emphasis on recall.
  • beta < 1 favors precision.
  • beta == 1 yields the regular F1 score, which can be interpreted as the harmonic mean of precision and recall.

Key Changes:
The method _compute_statement_presence is updated to calculate the F-beta score based on true positives (TP), false positives (FP), and false negatives (FN).

This ensures that we can balance between recall and precision, depending on the task's requirements, by tuning the beta value.

source: https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.fbeta_score.html
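The change described above can be sketched as follows. This is an illustrative stand-in, not the PR's actual `_compute_statement_presence` method; the function name `fbeta_from_counts` and the example counts are assumptions for demonstration:

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Compute the F-beta score from raw TP/FP/FN counts.

    beta > 1 weights recall higher, beta < 1 weights precision higher,
    and beta == 1 gives the standard F1 score.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denom

# With tp=8, fp=2, fn=4: precision = 0.8, recall = 8/12 ≈ 0.667.
# Raising beta above 1 shifts the score toward the (lower) recall here.
score = fbeta_from_counts(8, 2, 4, beta=2.0)
```
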

for the AnswerCorrectness class.
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Oct 20, 2024
@shahules786 shahules786 self-requested a review October 21, 2024 14:08
Member

@shahules786 shahules786 left a comment


Hey, this is useful. From ragas 0.2 onwards, we have factual correctness score - can you also add this to it?

score = 2 * (precision * recall) / (precision + recall + 1e-8)
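Wrapped in a function, the suggested formula might look like the sketch below (the function name and signature are illustrative, not the ragas API):

```python
def f1_score(precision: float, recall: float, eps: float = 1e-8) -> float:
    # The small eps term keeps the division stable when both
    # precision and recall are zero (nothing matched at all),
    # at the cost of a negligible downward bias in the score.
    return 2 * (precision * recall) / (precision + recall + eps)
```
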

@Yuri-Albuquerque
Contributor Author


Oh, thanks sir. I'll do it now.

_factual_correctness, which is a weighted harmonic mean of precision and recall, where the recall is weighted by a factor of beta. The F-beta score is defined as:

F-beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall)

The F-beta score is a generalization of the F1 score, where beta = 1.0. The F1 score is the harmonic mean of precision and recall, and is defined as:

F1 = 2 * (precision * recall) / (precision + recall)
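As a quick sanity check on the formulas above (an illustrative snippet, not code from the PR), the F-beta expression reduces to the harmonic-mean F1 at beta = 1.0:

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Arbitrary example values for precision and recall.
p, r = 0.75, 0.60
f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall
assert abs(fbeta(p, r, beta=1.0) - f1) < 1e-12
```
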
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Oct 21, 2024
calculation in factual correctness and keeping the F1 score as an F1-beta score as requested.
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Oct 22, 2024
@shahules786
Member

Hey @Yuri-Albuquerque thanks for the change.
Since the F1 score is used repeatedly across multiple metrics, I think we should define it in utils and reuse it everywhere else to avoid code duplication. Let me take care of this, and I will merge this PR. Thanks for your contribution.

@Yuri-Albuquerque
Contributor Author


Absolutely, @shahules786, feel free to take over this merge! I completely agree that this function belongs in utils. I'm still learning how to contribute effectively to this project, so I appreciate your guidance. I work at one of the largest private banks in Brazil (Itaú SA), and we're using RAGAS to evaluate various features in our chatbot. The "F1-beta" metric is something we've been wanting for a long time.
Thanks for your attention.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Oct 25, 2024
Member

@shahules786 shahules786 left a comment


Hey @Yuri-Albuquerque just made the changes from my end. Thanks a lot :)

@shahules786 shahules786 merged commit 6d114e5 into explodinggradients:main Oct 25, 2024
15 checks passed