Nous.Eval.Evaluators.ToolUsage (nous v0.13.3)

Evaluator that verifies correct tool usage by the agent.

Expected Format

%{
  tools_called: ["tool_name1", "tool_name2"],
  tools_not_called: ["tool_name3"],
  output_contains: ["expected", "text"],
  min_tool_calls: 1,
  max_tool_calls: 5,
  tool_args: %{
    "tool_name" => %{"arg1" => "value1"}
  }
}

All fields are optional.
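
To make the meaning of each field concrete, here is a minimal sketch of how such an expected map could be checked against a recorded run. It is not the Nous implementation: the module ToolUsageSketch, the failures/3 function, the shape of the recorded calls (a list of maps with :name and :args), and the subset-style argument matching are all assumptions made for illustration.

```elixir
defmodule ToolUsageSketch do
  # Hypothetical helper, NOT part of Nous. Returns a list of failure messages;
  # an empty list means every provided expectation held.
  # `calls` is assumed to look like [%{name: "tool", args: %{"k" => "v"}}].
  def failures(expected, calls, output) do
    names = Enum.map(calls, & &1.name)
    count = length(calls)

    # Every key is optional, so each lookup falls back to a no-op default.
    missing = Enum.reject(Map.get(expected, :tools_called, []), &(&1 in names))
    forbidden = Enum.filter(Map.get(expected, :tools_not_called, []), &(&1 in names))

    absent_text =
      Enum.reject(Map.get(expected, :output_contains, []), &String.contains?(output, &1))

    # Assumption: a tool's args pass if at least one call to it contains
    # all of the expected key/value pairs (extra args are ignored).
    bad_args =
      for {tool, want} <- Map.get(expected, :tool_args, %{}),
          not Enum.any?(calls, &(&1.name == tool and args_match?(&1.args, want))),
          do: tool

    min = Map.get(expected, :min_tool_calls, 0)
    max = Map.get(expected, :max_tool_calls, count)

    count_errors =
      cond do
        count < min -> ["expected at least #{min} tool calls, got #{count}"]
        count > max -> ["expected at most #{max} tool calls, got #{count}"]
        true -> []
      end

    Enum.map(missing, &"expected tool not called: #{&1}") ++
      Enum.map(forbidden, &"forbidden tool called: #{&1}") ++
      Enum.map(absent_text, &"output missing text: #{&1}") ++
      Enum.map(bad_args, &"no call with expected args: #{&1}") ++
      count_errors
  end

  defp args_match?(actual, want) do
    Enum.all?(want, fn {key, value} -> Map.get(actual, key) == value end)
  end
end

expected = %{
  tools_called: ["get_weather"],
  tools_not_called: ["send_email"],
  output_contains: ["Tokyo"],
  min_tool_calls: 2
}

calls = [%{name: "get_weather", args: %{"city" => "Tokyo"}}]

IO.inspect(ToolUsageSketch.failures(expected, calls, "It is sunny in Tokyo."))
```

In this run only the min_tool_calls expectation fails, because a single call was recorded where two were required. Returning a list of messages rather than a boolean is a design choice of the sketch that makes it easy to see which field caused a failure.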

Configuration

:strict_order - Tools must be called in the order listed in tools_called (default: false)
:check_args - Verify tool arguments (default: true if tool_args is provided)
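
As an illustration of what an ordering check might involve, the sketch below tests whether the expected tool names occur in the recorded calls in the same relative order. The source does not say whether other calls may be interleaved or whether the sequence must match exactly; this sketch assumes interleaving is allowed. StrictOrderSketch and in_order?/2 are hypothetical names, not part of Nous.

```elixir
defmodule StrictOrderSketch do
  # Hypothetical helper, NOT part of Nous. Returns true when every name in
  # `expected` appears in `actual` in the same relative order. Unrelated
  # calls in between are tolerated (an assumption of this sketch).
  def in_order?([], _actual), do: true
  def in_order?(_expected, []), do: false
  # Heads are equal: consume one expected name and keep scanning.
  def in_order?([tool | rest], [tool | actual]), do: in_order?(rest, actual)
  # Heads differ: skip the unrelated call.
  def in_order?(expected, [_other | actual]), do: in_order?(expected, actual)
end

IO.inspect(StrictOrderSketch.in_order?(["search", "get_weather"], ["search", "lookup", "get_weather"]))
IO.inspect(StrictOrderSketch.in_order?(["search", "get_weather"], ["get_weather", "search"]))
```

The first call prints true and the second prints false, since get_weather is recorded before search in the second run.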

Examples

# Verify specific tools were called
TestCase.new(
  id: "tool_test",
  input: "What's the weather in Tokyo?",
  expected: %{
    tools_called: ["get_weather"],
    output_contains: ["Tokyo"]
  },
  eval_type: :tool_usage
)

# Verify tool call count
TestCase.new(
  id: "multi_tool",
  input: "Compare weather in Tokyo and Paris",
  expected: %{
    tools_called: ["get_weather"],
    min_tool_calls: 2
  },
  eval_type: :tool_usage
)