structured-generation-benchmark

To use Large Language Models (LLMs) effectively and reliably, it's essential to apply structured generation techniques. Being able to constrain outputs to match a regular expression, a JSON schema, or a Pydantic data model is key to building useful software.

But what's the real effect of using libraries like Outlines or Instructor to achieve that goal?

This repository collects evaluations designed to answer this question.
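As a rough illustration of what structured generation does under the hood, here is a toy sketch of constrained decoding. At every step, tokens that could no longer lead to a valid output are masked out (given a score of minus infinity) before the next token is picked. Libraries such as Outlines apply a similar idea at scale by compiling regular expressions into finite-state machines over a real tokenizer. The vocabulary, the `toy_logits` scorer, and the fixed list of allowed answers below are hypothetical stand-ins, not any library's API.

```python
import math

# Hypothetical toy vocabulary: multi-character "tokens" plus an end-of-sequence marker.
EOS = "<eos>"
VOCAB = ["ye", "s", "no", "n", "o", "maybe", "Sure", "!", EOS]


def toy_logits(prefix: str) -> list[float]:
    """Hypothetical stand-in for an LLM forward pass (it ignores `prefix`).

    It strongly prefers chatty tokens such as "Sure", which is exactly the
    kind of output a constraint has to rule out.
    """
    scores = {"Sure": 5.0, "!": 4.0, "ye": 3.0, "s": 2.0, "no": 1.5,
              "n": 1.0, "o": 1.0, "maybe": 0.5, EOS: 0.1}
    return [scores[tok] for tok in VOCAB]


def is_valid_prefix(text: str, choices: list[str]) -> bool:
    """True if `text` can still be completed into one of the allowed outputs."""
    return any(choice.startswith(text) for choice in choices)


def constrained_greedy(choices: list[str], max_steps: int = 10) -> str:
    """Greedy decoding where invalid tokens are masked to -inf before the argmax."""
    out = ""
    for _ in range(max_steps):
        masked = []
        for tok, score in zip(VOCAB, toy_logits(out)):
            if tok == EOS:
                ok = out in choices  # may only stop on a complete, allowed output
            else:
                ok = is_valid_prefix(out + tok, choices)
            masked.append(score if ok else -math.inf)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            raise RuntimeError(f"no valid continuation for {out!r}")
        if VOCAB[best] == EOS:
            return out
        out += VOCAB[best]
    raise RuntimeError("max_steps reached before a complete output")


print(constrained_greedy(["yes", "no", "maybe"]))  # yes
```

Without the mask, greedy decoding with these scores would emit "Sure" at every step; with it, the only possible results are the allowed strings.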

Function Calling

Evaluates the ability of an LLM to call functions.
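In practice, "calling a function" usually means the model emits a structured description of the call (a function name plus arguments), which the application then validates and executes. The sketch below assumes a JSON payload of the form `{"name": ..., "arguments": {...}}`, a common convention but an assumption here, and uses two hypothetical Python functions as the available tools.

```python
import inspect
import json


# Hypothetical tools the model is allowed to call.
def get_weather(city: str, unit: str = "celsius") -> str:
    return f"Weather in {city} ({unit})"


def add(a: int, b: int) -> int:
    return a + b


TOOLS = {fn.__name__: fn for fn in (get_weather, add)}


def dispatch(raw_output: str):
    """Parse a JSON function call emitted by a model, check it, and execute it."""
    call = json.loads(raw_output)  # raises json.JSONDecodeError on malformed output
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        raise ValueError(f"unknown function: {call.get('name')!r}")
    args = call.get("arguments", {})
    # bind() raises TypeError on missing required or unexpected arguments.
    bound = inspect.signature(fn).bind(**args)
    return fn(*bound.args, **bound.kwargs)


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```

The signature check catches missing or unexpected arguments but not wrong argument types; constraining the model's output to a JSON schema before it is parsed is one way structured generation can help with that.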

Datasets

Berkeley Function Calling Leaderboard (BFCL) [April 16, 2024 update]

Evaluation

We deployed a Modal function to run open-source models with Transformers + Outlines.
We created custom model handlers to run the Gorilla BFCL scripts [April 6, 2024 version] on the AST simple evaluation category.
We evaluated the models and compared the results with those reported on the leaderboard website [April 26, 2024 version].
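An AST-based evaluation scores a model by parsing its generated call and comparing the function name and arguments against a set of acceptable answers, rather than executing the call. The following is a simplified, hypothetical re-implementation of that idea using Python's `ast` module, not the Gorilla BFCL checker itself. The answer format (each parameter mapped to a list of acceptable values, with `""` marking an optional parameter that may be omitted) is an assumption modelled loosely on BFCL's possible-answer files, and the example function and values are made up.

```python
import ast


def parse_call(text: str) -> tuple[str, dict]:
    """Parse a single call such as 'area(base=10, height=5)' into (name, kwargs)."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("output is not a function call")
    if node.args:
        raise ValueError("positional arguments are not accepted in this sketch")
    name = ast.unparse(node.func)  # also handles dotted names like 'math.hypot'
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def ast_match(output: str, possible_answer: dict) -> bool:
    """Simplified AST check against {func_name: {param: [acceptable values]}}."""
    try:
        name, kwargs = parse_call(output)
    except (SyntaxError, ValueError, TypeError):
        return False  # unparsable output counts as wrong
    if name not in possible_answer:
        return False
    expected = possible_answer[name]
    if set(kwargs) - set(expected):
        return False  # unexpected parameter
    for param, acceptable in expected.items():
        if param in kwargs:
            if kwargs[param] not in acceptable:
                return False
        elif "" not in acceptable:
            return False  # required parameter missing
    return True


def accuracy(outputs: list[str], answers: list[dict]) -> float:
    """Fraction of outputs that pass the AST check."""
    correct = sum(ast_match(o, a) for o, a in zip(outputs, answers))
    return correct / len(answers)


# Hypothetical example: one task, three candidate model outputs.
answer = {"calculate_triangle_area": {"base": [10], "height": [5], "unit": ["units", ""]}}
outputs = [
    "calculate_triangle_area(base=10, height=5)",             # correct
    "calculate_triangle_area(base=10, height=5, unit='cm')",  # value not acceptable
    "Sure! The area is 25.",                                  # not a call at all
]
print(f"{accuracy(outputs, [answer] * len(outputs)):.3f}")  # 0.333
```

An accuracy computed this way for a given model can then be set against the leaderboard's figure for the same category; no leaderboard numbers are reproduced here.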

aastroza/structured-generation-benchmark

Folders and files

Latest commit

History

Repository files navigation

structured-generation-benchmark

Function Calling

Datasets

Evaluation

Reports

Synthetic Data Generation

Reports

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages