Evaluate LLMs with custom metrics using an LLM as a judge #77
Labels
ack/guide: I have read through and am familiar with the contributing guide
ack/legal: I have read and understand the legal considerations for blog posting
ack/readme: I have configured my local development environment for building the website locally
blog/deep-dive: I want to write an in-depth guide blog
topic/core: I'm writing about MLflow public APIs or core features
topic/genai: I'm writing about GenAI use cases or features
Summary
This template captures a few base requirements that must be met before filing a PR that contains a new blog post submission.
Please fill out this form in its entirety so that an MLflow maintainer can work with you on drafting your blog content and review your blog submission PR.
PRs filed without a linked Blog Post Submission issue and prior agreement on the content and topics covered are not guaranteed to be reviewed or merged.
Acknowledgements
ack/guide: I have read through the contributing guide
ack/readme: I have configured my local development environment so that I can build a local instance of the MLflow website by following the development guide
ack/legal: I have verified that there are no legal considerations associated with the nature of the blog post, its content, or references to organizations, ideas, or individuals contained within my post. If I mention a particular organization, idea, or person, I will provide evidence of consent to post by any organization or individual that is mentioned prior to filing my PR.
Proposed Title
Evaluate LLMs with custom metrics using an LLM as a judge
Abstract
This blog post explores using large language models (LLMs) as automated judges to evaluate the quality of outputs from retrieval-augmented generation (RAG) pipelines within the MLflow framework. RAG pipelines combine information retrieval with language models to generate outputs informed by relevant textual sources. The post discusses how MLflow's mlflow.evaluate() function can leverage LLMs to score RAG outputs across dimensions such as relevance, coherence, and factuality, as well as user-defined custom metrics, providing an automated way to assess both the retrieved information and the generated text.
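To make the abstract concrete, here is a minimal sketch of what such an evaluation could look like. It assumes MLflow 2.x with the mlflow.metrics.genai module and an OpenAI endpoint configured as the judge (OPENAI_API_KEY set in the environment); the dataset, the custom "conciseness" metric, and its grading prompt are illustrative placeholders rather than content from the proposed post.

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_relevance, make_genai_metric

# Illustrative RAG-style outputs evaluated as a static dataset.
eval_data = pd.DataFrame(
    {
        "inputs": ["What is MLflow Tracking?"],
        "ground_truth": [
            "MLflow Tracking is an API and UI for logging parameters, code "
            "versions, metrics, and artifacts from ML runs."
        ],
        "predictions": [
            "MLflow Tracking lets you log parameters, metrics, and artifacts "
            "for each run and compare them in the UI."
        ],
    }
)

# Built-in LLM-judged metric: the named model scores answer relevance.
relevance_metric = answer_relevance(model="openai:/gpt-4")

# Custom LLM-judged metric defined from a plain-language grading prompt
# (hypothetical example metric, not from the original submission).
conciseness = make_genai_metric(
    name="conciseness",
    definition=(
        "Conciseness measures whether the answer conveys the needed "
        "information without unnecessary detail."
    ),
    grading_prompt=(
        "Score from 1 to 5, where 5 means the answer is complete and "
        "contains no unnecessary detail."
    ),
    model="openai:/gpt-4",
    greater_is_better=True,
)

# Evaluate the static predictions with both LLM-judged metrics.
with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_data,
        targets="ground_truth",
        predictions="predictions",
        model_type="question-answering",
        extra_metrics=[relevance_metric, conciseness],
    )
    # Aggregate scores and per-row judge scores with justifications.
    print(results.metrics)
    print(results.tables["eval_results_table"])
```

The post itself would presumably expand on this pattern with RAG-specific judged metrics (for example, faithfulness against the retrieved context) and the details of defining fully custom judges.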
Blog Type
blog/how-to: A how-to guide to using core MLflow functionality, focused on a common use-case user journey
blog/deep-dive: An in-depth guide that covers a specific feature in MLflow
blog/use-case: A comprehensive overview of a real-world project that leverages MLflow
blog/best-practices: A comprehensive tutorial that covers usage patterns of MLflow, focusing on an MLOps journey
blog/tips: A short blog covering tips and tricks for using MLflow APIs or the MLflow UI components
blog/features: A feature-focused announcement that introduces a significant new feature that is recently or not-yet released
blog/meetup: A report on an MLflow community event or other Linux Foundation MLflow Ambassador Program event
blog/news: Summaries of significant mentions of MLflow or major initiatives for the MLflow project
Topics Covered in Blog
topic/genai: Highlights MLflow's use in training, tuning, or deploying GenAI applications
topic/tracking: Covering the use of Model Tracking APIs and integrated Model Flavors
topic/deployment: Featuring topics related to the deployment of MLflow models and the MLflow Model Registry
topic/training: Concerned with the development loop of training and tuning models using MLflow for tracking
topic/mlflow-service: Topics related to the deployment of the MLflow Tracking Service or the MLflow Deployments Server
topic/core: Topics covering core MLflow APIs and related features
topic/advanced: Featuring guides on Custom Model Development or usage of the plugin architecture of MLflow
topic/ui: Covering features of the MLflow UI
topic/other: < please fill in >
Thank you for your proposal! An MLflow Maintainer will reach out to you with next steps!