fix: add reference tool call to required cols #1580

Merged
merged 1 commit into from Oct 25, 2024
7 changes: 3 additions & 4 deletions docs/concepts/metrics/available_metrics/agents.md
````diff
@@ -71,10 +71,9 @@ scorer = TopicAdherenceScore(mode="recall")
 `ToolCallAccuracy` is a metric that can be used to evaluate the performance of the LLM in identifying and calling the required tools to complete a given task. It needs `user_input` and `reference_tool_calls` to be set, and is computed by comparing the `reference_tool_calls` with the tool calls made by the AI. The values range between 0 and 1, with higher values indicating better performance.
 
 ```python
+from ragas.metrics import ToolCallAccuracy
 from ragas.dataset_schema import MultiTurnSample
 from ragas.messages import HumanMessage,AIMessage,ToolMessage,ToolCall
-from ragas.metrics import ToolCallAccuracy
-
 
 sample = [
     HumanMessage(content="What's the weather like in New York right now?"),
@@ -89,7 +88,7 @@ sample = [
     AIMessage(content="75°F is approximately 23.9°C.")
 ]
 
-sampl2 = MultiTurnSample(
+sample = MultiTurnSample(
     user_input=sample,
     reference_tool_calls=[
         ToolCall(name="weather_check", args={"location": "New York"}),
@@ -98,7 +97,7 @@ sampl2 = MultiTurnSample(
 )
 
 scorer = ToolCallAccuracy()
-await metric.multi_turn_ascore(sample)
+await scorer.multi_turn_ascore(sample)
 ```
 
 The tool call sequence specified in `reference_tool_calls` is used as the ideal outcome. If the tool calls made by the AI do not match the order or sequence of the `reference_tool_calls`, the metric will return a score of 0. This helps to ensure that the AI is able to identify and call the required tools in the correct order to complete a given task.
````
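The ordering rule described in the docs context above can be sketched in plain Python. This is a simplified illustration, not ragas' actual implementation: `score_tool_calls` and the `(name, args)` tuple representation are hypothetical.

```python
# Simplified sketch of the ordering rule: the reference tool-call
# sequence must appear, in order, among the predicted calls,
# otherwise the score is 0. Calls are modeled as (name, args) tuples.
# This is NOT ragas' code, just an illustration of the behavior.

def score_tool_calls(pred, ref):
    """Return 1.0 if every reference call appears in order among
    the predicted calls, else 0.0."""
    it = iter(pred)  # a single pass enforces ordering
    return 1.0 if all(r in it for r in ref) else 0.0

pred = [("weather_check", {"location": "New York"}),
        ("temperature_conversion", {"temperature_in_fahrenheit": 75})]
ref_in_order = list(pred)          # same calls, same order
ref_swapped = [pred[1], pred[0]]   # same calls, wrong order

print(score_tool_calls(pred, ref_in_order))  # 1.0
print(score_tool_calls(pred, ref_swapped))   # 0.0
```

Because `r in it` consumes the shared iterator, a later reference call can never match an earlier predicted call, which is what makes the check order-sensitive.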
4 changes: 2 additions & 2 deletions src/ragas/metrics/_tool_call_accuracy.py
```diff
@@ -20,7 +20,7 @@ class ToolCallAccuracy(MultiTurnMetric):
         default_factory=lambda: {
             MetricType.MULTI_TURN: {
                 "user_input",
-                "reference",
+                "reference_tool_calls",
             }
         }
     )
@@ -61,7 +61,7 @@ def is_sequence_aligned(
    async def _multi_turn_ascore(
        self, sample: MultiTurnSample, callbacks: Callbacks
    ) -> float:
-        assert sample.reference_tool_calls is not None, "Reference tool calls is not set"
+        assert sample.reference_tool_calls is not None, "Reference tool calls is not set"
 
        pred_tool_calls = []
        for item in sample.user_input:
```
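The required-columns fix above matters because that mapping is what sample validation consults before scoring: previously the metric declared `reference` while `_multi_turn_ascore` actually read `reference_tool_calls`. A minimal sketch of that kind of check follows; `REQUIRED_COLUMNS` and `validate_required_columns` are hypothetical names, not ragas' API.

```python
# Hypothetical sketch of required-column validation: a metric declares
# which sample fields it needs, and validation fails fast when one is
# missing. With the old declaration ("reference"), a sample lacking
# reference_tool_calls could pass validation and only fail later,
# inside the metric's assert.

REQUIRED_COLUMNS = {"multi_turn": {"user_input", "reference_tool_calls"}}

def validate_required_columns(sample: dict, metric_type: str = "multi_turn"):
    present = {k for k, v in sample.items() if v is not None}
    missing = REQUIRED_COLUMNS[metric_type] - present
    if missing:
        raise ValueError(f"sample is missing required columns: {sorted(missing)}")

ok = {"user_input": ["..."], "reference_tool_calls": [("weather_check", {})]}
bad = {"user_input": ["..."], "reference": [("weather_check", {})]}

validate_required_columns(ok)     # passes silently
# validate_required_columns(bad)  # raises ValueError: missing reference_tool_calls
```

Keeping the declared columns in sync with what the scoring code reads lets the error surface at validation time, with a clear message, instead of as an assertion failure mid-evaluation.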