DeepEvalEx.Result (DeepEvalEx v0.1.0)

Represents the result of evaluating a test case against a metric.

Fields

:metric - Name of the metric that produced this result
:score - Numeric score from 0.0 to 1.0
:success - Whether the score meets or exceeds the threshold
:reason - Explanation of the score (provided by LLM-based metrics)
:threshold - The threshold used for pass/fail determination
:metadata - Additional metric-specific data
:evaluation_cost - Cost of the LLM calls for this evaluation
:latency_ms - Time taken for the evaluation in milliseconds
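The field list above maps naturally onto an Elixir struct with a typespec. The sketch below is a hypothetical `ResultSketch` module, not DeepEvalEx's source. The types (for example `latency_ms` as a non-negative integer) and the defaults are assumptions inferred from the field descriptions and the example.

```elixir
defmodule ResultSketch do
  # Hypothetical mirror of the documented DeepEvalEx.Result fields.
  # Assumed defaults: metadata starts as an empty map, everything else is nil.
  defstruct metric: nil,
            score: nil,
            success: nil,
            reason: nil,
            threshold: nil,
            metadata: %{},
            evaluation_cost: nil,
            latency_ms: nil

  # Assumed types, inferred from the field descriptions.
  @type t :: %__MODULE__{
          metric: String.t() | nil,
          score: number() | nil,
          success: boolean() | nil,
          reason: String.t() | nil,
          threshold: number() | nil,
          metadata: map(),
          evaluation_cost: number() | nil,
          latency_ms: non_neg_integer() | nil
        }
end

# Fields left out of the literal fall back to their defaults.
%ResultSketch{metric: "Faithfulness", score: 0.8, threshold: 0.5}
```

Because a struct is a map with a fixed set of keys, a misspelled field in a `%ResultSketch{}` literal fails at compile time rather than being silently ignored.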

Examples

%DeepEvalEx.Result{
  metric: "Faithfulness",
  score: 0.8,
  success: true,
  reason: "4 out of 5 claims are supported by the retrieval context.",
  threshold: 0.5,
  metadata: %{
    claims: ["claim1", "claim2", "claim3", "claim4", "claim5"],
    verdicts: [:yes, :yes, :yes, :yes, :no]
  },
  evaluation_cost: 0.002,
  latency_ms: 1250
}
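The `reason` in this example implies that the Faithfulness score is the fraction of claims that received a `:yes` verdict. The following minimal sketch of that calculation assumes exactly that. `VerdictScoreSketch` is a hypothetical module, not part of DeepEvalEx, and returning `0.0` for an empty verdict list is an arbitrary choice made here to avoid dividing by zero.

```elixir
defmodule VerdictScoreSketch do
  # Hypothetical helper: score = share of claims whose verdict is :yes.
  @spec score([atom()]) :: float()
  def score([]), do: 0.0

  def score(verdicts) when is_list(verdicts) do
    supported = Enum.count(verdicts, &(&1 == :yes))
    # `/` always returns a float in Elixir.
    supported / length(verdicts)
  end
end

VerdictScoreSketch.score([:yes, :yes, :yes, :yes, :no])
# => 0.8
```

Under this assumption, the five verdicts in the example give 4 / 5 = 0.8.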

Summary

Types

t()

Functions

new(opts)

Creates a new result struct.
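This page does not say which options `new/1` accepts or whether it sets `:success` itself. The following hedged sketch assumes keyword options named after the fields. `NewSketch.new/1` is hypothetical, and deriving `:success` from `score >= threshold` when the caller omits it is an assumption.

```elixir
defmodule NewSketch do
  # Hypothetical struct mirroring the documented fields (not DeepEvalEx's source).
  defstruct [:metric, :score, :success, :reason, :threshold,
             :evaluation_cost, :latency_ms, metadata: %{}]

  # Builds a result from a keyword list. struct!/2 raises KeyError on unknown keys.
  def new(opts) when is_list(opts) do
    result = struct!(__MODULE__, opts)

    case result do
      # Assumption: derive :success only when it was not passed explicitly.
      %{success: nil, score: s, threshold: t} when is_number(s) and is_number(t) ->
        %{result | success: s >= t}

      _ ->
        result
    end
  end
end

# success is derived as true because 0.8 >= 0.5
NewSketch.new(metric: "Faithfulness", score: 0.8, threshold: 0.5)
```

Using `struct!/2` rather than `struct/2` makes a misspelled option raise `KeyError` instead of being silently dropped.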

success?(result)

Checks if the result is successful (score >= threshold).
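The documented rule, `score >= threshold`, means that a score exactly equal to the threshold passes. The following minimal sketch lives in a hypothetical `SuccessSketch` module. Treating a missing score or threshold as a failure is an assumption, not documented behavior.

```elixir
defmodule SuccessSketch do
  # Hypothetical re-implementation of the documented check.
  # Matches any map, so it also works on structs with :score and :threshold.
  def success?(%{score: score, threshold: threshold})
      when is_number(score) and is_number(threshold),
      do: score >= threshold

  # Assumption: a result without a numeric score or threshold fails.
  def success?(_), do: false
end

# The boundary case passes because the comparison is >=.
SuccessSketch.success?(%{score: 0.5, threshold: 0.5})
```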

summary(result)

Returns a human-readable summary of the result.
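This page does not document the exact text that `summary/1` returns. One plausible shape combines the metric name, pass/fail status, score, threshold, and the reason when one is present. The sketch below shows that shape in a hypothetical `SummarySketch` module. The format itself is an assumption.

```elixir
defmodule SummarySketch do
  # Hypothetical formatting; the real summary/1 output may differ.
  def summary(%{metric: metric, score: score, threshold: threshold, success: success} = result) do
    status = if success, do: "PASS", else: "FAIL"
    base = "#{metric}: #{status} (score #{format_number(score)}, threshold #{format_number(threshold)})"

    # Append the reason only when the result carries one.
    case Map.get(result, :reason) do
      nil -> base
      reason -> base <> " - " <> reason
    end
  end

  # Floats are shown with two decimals; anything else via to_string/1.
  defp format_number(n) when is_float(n), do: :erlang.float_to_binary(n, decimals: 2)
  defp format_number(n), do: to_string(n)
end

SummarySketch.summary(%{
  metric: "Faithfulness",
  score: 0.8,
  threshold: 0.5,
  success: true,
  reason: "4 out of 5 claims are supported by the retrieval context."
})
# => "Faithfulness: PASS (score 0.80, threshold 0.50) - 4 out of 5 claims are supported by the retrieval context."
```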