Evaluate LLMs with custom metrics using an LLM as a judge #77
Labels
ack/guide: I have read through and am familiar with the contributing guide
ack/legal: I have read and understand the legal considerations for blog posting
ack/readme: I have configured my local development environment for building the website locally
blog/deep-dive: I want to write an in-depth guide blog
topic/core: I'm writing about MLflow public APIs or core features
topic/genai: I'm writing about GenAI use cases or features
Summary
This template captures a few base requirements that must be met before filing a PR that contains a new blog post submission.
Please fill out this form in its entirety so that an MLflow maintainer can work with you on drafting your blog content and review your blog submission PR.
PRs filed without a linked Blog Post Submission issue and prior agreement on the content and topics covered are not guaranteed to be reviewed or merged.
Acknowledgements
ack/guide: I have read through the contributing guide
ack/readme: I have configured my local development environment so that I can build a local instance of the MLflow website by following the development guide
ack/legal: I have verified that there are no legal considerations associated with the nature of the blog post, its content, or references to organizations, ideas, or individuals contained within my post. If I mention a particular organization, idea, or person, I will provide evidence of consent to post by any organization or individual that is mentioned prior to filing my PR.
Proposed Title
Evaluate LLMs with custom metrics using an LLM as a judge
Abstract
This blog post explores using large language models (LLMs) as automated judges to evaluate the quality of outputs from retrieval-augmented generation (RAG) pipelines within the MLflow framework. RAG pipelines combine information retrieval with language models to generate outputs informed by relevant textual sources. The post discusses how MLflow's mlflow.evaluate() function can leverage LLMs to score RAG outputs across dimensions such as relevance, coherence, and factuality, as well as user-defined custom metrics, providing an automated way to assess both the retrieved information and the generated text.
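To make the abstract concrete, here is a minimal sketch of what such an evaluation could look like. It assumes MLflow 2.x with the mlflow.metrics.genai module and an OpenAI endpoint configured as the judge (OPENAI_API_KEY set in the environment); the dataset, the custom "conciseness" metric, and its grading prompt are illustrative placeholders rather than content from the proposed post.

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_relevance, make_genai_metric

# Illustrative RAG-style outputs evaluated as a static dataset.
eval_data = pd.DataFrame(
    {
        "inputs": ["What is MLflow Tracking?"],
        "ground_truth": [
            "MLflow Tracking is an API and UI for logging parameters, code "
            "versions, metrics, and artifacts from ML runs."
        ],
        "predictions": [
            "MLflow Tracking lets you log parameters, metrics, and artifacts "
            "for each run and compare them in the UI."
        ],
    }
)

# Built-in LLM-judged metric: the named model scores answer relevance.
relevance_metric = answer_relevance(model="openai:/gpt-4")

# Custom LLM-judged metric defined from a plain-language grading prompt
# (hypothetical example metric, not from the original submission).
conciseness = make_genai_metric(
    name="conciseness",
    definition=(
        "Conciseness measures whether the answer conveys the needed "
        "information without unnecessary detail."
    ),
    grading_prompt=(
        "Score from 1 to 5, where 5 means the answer is complete and "
        "contains no unnecessary detail."
    ),
    model="openai:/gpt-4",
    greater_is_better=True,
)

# Evaluate the static predictions with both LLM-judged metrics.
with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_data,
        targets="ground_truth",
        predictions="predictions",
        model_type="question-answering",
        extra_metrics=[relevance_metric, conciseness],
    )
    # Aggregate scores and per-row judge scores with justifications.
    print(results.metrics)
    print(results.tables["eval_results_table"])
```

The post itself would presumably expand on this pattern with RAG-specific judged metrics (for example, faithfulness against the retrieved context) and the details of defining fully custom judges.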
Blog Type
blog/how-to: A how-to guide to using core MLflow functionality, focused on a common use-case user journey
blog/deep-dive: An in-depth guide that covers a specific feature in MLflow
blog/use-case: A comprehensive overview of a real-world project that leverages MLflow
blog/best-practices: A comprehensive tutorial that covers usage patterns of MLflow, focusing on an MLOps journey
blog/tips: A short blog covering tips and tricks for using MLflow APIs or the MLflow UI components
blog/features: A feature-focused announcement that introduces a significant new feature that is recently or not-yet released
blog/meetup: A report on an MLflow community event or other Linux Foundation MLflow Ambassador Program event
blog/news: Summaries of significant mentions of MLflow or major initiatives for the MLflow project
Topics Covered in Blog
topic/genai: Highlights MLflow's use in training, tuning, or deploying GenAI applications
topic/tracking: Covering the use of Model Tracking APIs and integrated Model Flavors
topic/deployment: Featuring topics related to the deployment of MLflow models and the MLflow Model Registry
topic/training: Concerned with the development loop of training and tuning models using MLflow for tracking
topic/mlflow-service: Topics related to the deployment of the MLflow Tracking Service or the MLflow Deployments Server
topic/core: Topics covering core MLflow APIs and related features
topic/advanced: Featuring guides on Custom Model Development or usage of the plugin architecture of MLflow
topic/ui: Covering features of the MLflow UI
topic/other: < please fill in >
Thank you for your proposal! An MLflow Maintainer will reach out to you with next steps!