Reducing uncertainty when introducing changes to AI apps or agents is the key unlock for widespread adoption. Over the past decade, test-driven development (TDD) paved the way for building robust, maintainable software. As we step into the next era, evaluation-driven development (Eval-Driven, or EDD) will play a pivotal role in ensuring that compound AI systems are reliable, observable, and maintainable in production.
This repository, `eval-driven-agents`, provides a series of samples and best practices to help developers and organizations confidently evolve their AI solutions. By integrating evaluation-driven methodologies, such as continuous evaluation, tracing, telemetry, and observability, teams can iterate rapidly, maintain high quality, and make data-driven improvements.
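As a rough illustration of the tracing and telemetry side, the sketch below wraps an agent call in an OpenTelemetry span; `call_agent` and the span attributes are hypothetical placeholders, not APIs from these samples.

```python
# Minimal tracing sketch: wrap an agent call in an OpenTelemetry span so that
# prompts, answers, and latency become observable alongside other telemetry.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console for local inspection; a production setup would
# send them to a collector or observability backend instead.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("eval-driven-agents.demo")

def call_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real function-calling agent."""
    return f"echo: {prompt}"

with tracer.start_as_current_span("agent.invoke") as span:
    question = "What is the refund policy?"
    span.set_attribute("agent.prompt", question)
    answer = call_agent(question)
    span.set_attribute("agent.answer", answer)
```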
- **Incremental Complexity:** Discover samples that start with basic function-calling agents with tracing and progress toward comprehensive, fully instrumented systems.
- **Observability & Tracing:** Gain visibility into model decisions, tool usage, system behavior, costs, latency metrics, and other key performance indicators to diagnose issues quickly and refine AI performance.
- **Evaluation-Driven Workflows:** Learn how to continuously evaluate changes through experimentation, measure their impact via automated CI/CD pipelines with GitHub Actions, and ensure that every update is a step toward greater reliability (see the evaluation sketch below).
- `<subfolder>`: Each folder highlights a specific capability or pattern (e.g., tracing, evaluations, experimentation, scenario testing), building on the fundamental concepts of evaluation-driven methodologies.
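To make the evaluation-driven workflow concrete, here is a minimal sketch of an evaluation gate that a CI pipeline (for example, a GitHub Actions job running pytest) could execute on every change; `run_agent`, the cases, and the 90% threshold are hypothetical placeholders rather than code from these samples.

```python
# Minimal evaluation-gate sketch: score agent answers against a small set of
# expected behaviors and fail the build when the pass rate drops below a threshold.

def run_agent(question: str) -> str:
    """Hypothetical stand-in for the agent under test."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(question, "I don't know.")

EVAL_CASES = [
    {"question": "What is the capital of France?", "must_contain": "Paris"},
    {"question": "What is 2 + 2?", "must_contain": "4"},
]

def test_agent_pass_rate():
    # Run every case, score it with a simple substring check, and compute the pass rate.
    passed = sum(
        1 for case in EVAL_CASES
        if case["must_contain"].lower() in run_agent(case["question"]).lower()
    )
    pass_rate = passed / len(EVAL_CASES)
    # A CI job running this test fails the pipeline when quality regresses.
    assert pass_rate >= 0.9, f"Pass rate {pass_rate:.0%} fell below the 90% threshold"
```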
As you explore these samples, you’ll see how evaluation-driven development transforms the way we approach building, testing, and deploying AI agents, ultimately driving more robust solutions and more confident decision-making.