Puck.Eval.Result (Puck v0.2.11)

Aggregates grader results from an evaluation.

A Result struct holds the agent's final output, the captured trajectory, and the outcome of each grader applied to that agent execution.

Fields

  • :passed? - true only if every grader passed
  • :output - The agent's final output
  • :trajectory - The captured Puck.Eval.Trajectory
  • :grader_results - List of individual grader results, one grader_result() map per grader

Example

alias Puck.Eval.{Collector, Graders, Result}

# MyAgent and input stand in for your own agent module and its input.
{output, trajectory} = Collector.collect(fn -> MyAgent.run(input) end)

result = Result.from_graders(output, trajectory, [
  Graders.contains("john@example.com"),
  Graders.max_steps(5)
])

if result.passed? do
  IO.puts("All graders passed!")
else
  IO.puts("Failed graders:")
  for gr <- result.grader_results, !gr.passed? do
    IO.puts("  - #{gr.reason}")
  end
end

Types

grader_result()

@type grader_result() :: %{
  grader:
    module() | (term(), Puck.Eval.Trajectory.t() -> Puck.Eval.Grader.result()),
  result: Puck.Eval.Grader.result(),
  passed?: boolean(),
  reason: String.t() | nil
}

t()

@type t() :: %Puck.Eval.Result{
  grader_results: [grader_result()],
  output: term(),
  passed?: boolean(),
  trajectory: Puck.Eval.Trajectory.t()
}

Functions

failures(result)

Returns only the failed grader results.

Example

failed = Result.failures(result)
for f <- failed do
  IO.puts("Failed: #{f.reason}")
end

from_graders(output, trajectory, graders)

Creates a Result by applying graders to an output and trajectory.

Returns a Result struct with all grader results aggregated. The passed? field is true only if all graders pass.
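
The all-or-nothing rule for passed? can be illustrated without Puck itself. The sketch below is not the library's implementation: it works on plain maps shaped like grader_result() (with the :grader and :result keys left out), and the AggregateSketch module and the sample reason text are hypothetical.

```elixir
defmodule AggregateSketch do
  # Hypothetical helper: true only when every entry has passed?: true.
  def all_passed?(grader_results) do
    Enum.all?(grader_results, & &1.passed?)
  end
end

# Made-up data: every grader passed.
all_ok = [
  %{passed?: true, reason: nil},
  %{passed?: true, reason: nil}
]

# Made-up data: a single failing grader is enough to fail the whole result.
one_failed = [
  %{passed?: true, reason: nil},
  %{passed?: false, reason: "took 7 steps, limit is 5"}
]

IO.inspect(AggregateSketch.all_passed?(all_ok))     # => true
IO.inspect(AggregateSketch.all_passed?(one_failed)) # => false
```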

Example

result = Result.from_graders(output, trajectory, [
  Graders.contains("hello"),
  Graders.max_steps(3)
])

result.passed?         # => true if all passed
result.grader_results  # => list of individual results

passes(result)

Returns only the passed grader results.
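
A minimal sketch of the split between passed and failed entries, assuming that passes/1 and failures/1 simply filter grader_results on the passed? flag (the source does not confirm how they are implemented). It uses made-up plain maps rather than a real Result, and only standard-library calls.

```elixir
# Hypothetical grader results, shaped like grader_result() but without the
# :grader and :result keys, which this sketch does not need.
grader_results = [
  %{passed?: true, reason: nil},
  %{passed?: false, reason: "output did not contain \"hello\""},
  %{passed?: true, reason: nil}
]

# Split into entries that passed and entries that failed.
{passed, failed} = Enum.split_with(grader_results, & &1.passed?)

IO.puts("#{length(passed)} passed, #{length(failed)} failed")

for f <- failed do
  IO.puts("Failed: #{f.reason}")
end
```

With a real Result, the two halves would presumably correspond to Result.passes(result) and Result.failures(result).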

summary(result)

Returns a summary map of the result.

Useful for logging or serialization.

Example

summary = Result.summary(result)
# => %{
#   passed?: true,
#   total_graders: 3,
#   passed_count: 3,
#   failed_count: 0,
#   total_steps: 2,
#   total_tokens: 385,
#   total_duration_ms: 1200
# }
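
Because the summary is a flat map of atom keys to booleans and integers, it is straightforward to turn into a log line. The following is a rough sketch that reuses the literal map from the example above instead of calling Result.summary/1; the SummaryFormat module and its output format are hypothetical, not part of Puck.

```elixir
defmodule SummaryFormat do
  # Hypothetical helper: renders a summary map as a single log-friendly line.
  def to_line(summary) do
    status = if summary.passed?, do: "PASS", else: "FAIL"

    "#{status} #{summary.passed_count}/#{summary.total_graders} graders, " <>
      "#{summary.total_steps} steps, #{summary.total_tokens} tokens, " <>
      "#{summary.total_duration_ms} ms"
  end
end

# Same shape and values as the documented summary example.
summary = %{
  passed?: true,
  total_graders: 3,
  passed_count: 3,
  failed_count: 0,
  total_steps: 2,
  total_tokens: 385,
  total_duration_ms: 1200
}

IO.puts(SummaryFormat.to_line(summary))
# prints: PASS 3/3 graders, 2 steps, 385 tokens, 1200 ms
```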