Eval an Agent result with tools

Hi there,

I’m trying to Eval an Agent result that uses a tool.

In the dataset I have a test case that is supposed to test the “Final answer”.

The problem is that the Eval fails because the “output” is set as “Output is not an answer. Received TOOL instead”.

Here's the full output:

Output & Scoring

Score: 0.0%

Rationale: Output is not an answer. Received TOOL instead

Agent Output:
Tool used: Search Web
Args:
{
  "search_query": "carbonara recipe"
}

I understand that this info is useful if I want to evaluate whether an Agent picked the right tool with the right payload.

But how can I evaluate the final answer, independently of whether the Agent uses a tool or not?

Thanks

Hi @abusedmedia,

The amazing @kent gave me a rundown on the Eval tool for Agents, Eval Items and why the output from the Eval tool is giving you that message :sweat_smile:

The Eval tool only ever runs one iteration of an Agent, by design. So in this case, the Eval tool is more or less running the tool call, then running its scoring and exiting before the Agent gets the result of the tool call, which is what produces that message.

An analogy to this would be how in JavaScript, if you don't use async and await correctly, you might end up returning a Promise object (the tool call itself) instead of the resolved value of the async function call (the tool's result).
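If it helps to see that analogy concretely, here's a tiny JavaScript sketch (the `searchWeb` function is made up purely for illustration):

```js
async function searchWeb(query) {
  // Stand-in for the tool call; resolves to the tool's result.
  return `results for "${query}"`;
}

// Without await you're looking at the Promise itself, not its value —
// like the Eval tool scoring the tool call instead of the final answer.
const pending = searchWeb("carbonara recipe");
console.log(pending); // Promise { <pending> }

// With await you get the resolved value — like letting the Agent
// consume the tool result and produce its final answer.
(async () => {
  const result = await searchWeb("carbonara recipe");
  console.log(result); // results for "carbonara recipe"
})();
```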

We can fix this by adding the result of the tool call to the Eval Item. Then the LLM evaluating the Eval Item will have the additional context of both the tool call and the data returned by that tool call, so it can judge the accuracy properly.
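Conceptually, and purely as an illustration (these field names are my own assumptions, not the actual Eval Item schema), the difference looks something like this:

```js
// Hypothetical shapes, for illustration only — the real Eval Item
// fields in the product may be structured or named differently.
const evalItemToolCallOnly = {
  output: { tool: "Search Web", args: { search_query: "carbonara recipe" } },
}; // scorer sees a tool call, not an answer → "Received TOOL instead"

const evalItemWithResult = {
  toolCall: { tool: "Search Web", args: { search_query: "carbonara recipe" } },
  toolResult: "…search results…",
  output: "Here is a classic carbonara recipe: …",
}; // scorer can now judge the final answer with full context
```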

There are two ways to build out the Eval Item so it has both the tool call and the tool's results. One way is very manual; the other is quicker and works just as well, potentially even leaving less margin for typos or small bugs.

The quicker and easier way is done from the chat window, by clicking "Add eval to dataset" under the '...' button at the bottom.

This will fully create the Eval Item, which will appear in the modal that pops up. All you need to do is select your Dataset at the top, and then click 'Create' at the bottom!

Once it has been created you can run the Eval and get the correct scoring.

Thanks @Jack_T for the hint
