DeepEvalEx.Result (DeepEvalEx v0.1.0)

Represents the result of evaluating a test case against a metric.

Fields

  • :metric - Name of the metric that produced this result
  • :score - Numeric score from 0.0 to 1.0
  • :success - Whether the score meets the threshold (score >= threshold)
  • :reason - Explanation for the score (from LLM-based metrics)
  • :threshold - The threshold used for pass/fail determination
  • :metadata - Additional metric-specific data
  • :evaluation_cost - Cost of the LLM calls for this evaluation
  • :latency_ms - Time taken for the evaluation in milliseconds

Examples

%DeepEvalEx.Result{
  metric: "Faithfulness",
  score: 0.85,
  success: true,
  reason: "4 out of 5 claims are supported by the retrieval context.",
  threshold: 0.5,
  metadata: %{
    claims: ["claim1", "claim2", "claim3", "claim4", "claim5"],
    verdicts: [:yes, :yes, :yes, :yes, :no]
  },
  evaluation_cost: 0.002,
  latency_ms: 1250
}

Summary

Functions

  • new(opts) - Creates a new result struct.
  • success?(result) - Checks if the result is successful (score >= threshold).
  • summary(result) - Returns a human-readable summary of the result.

Types

t()

@type t() :: %DeepEvalEx.Result{
  evaluation_cost: float() | nil,
  latency_ms: non_neg_integer() | nil,
  metadata: map() | nil,
  metric: String.t(),
  reason: String.t() | nil,
  score: float(),
  success: boolean(),
  threshold: float()
}

Functions

new(opts)

@spec new(keyword()) :: t()

Creates a new result struct.

Options

  • :metric - Name of the metric (required)
  • :score - Numeric score from 0.0 to 1.0 (required)
  • :threshold - Pass/fail threshold (default: 0.5)
  • :reason - Explanation for the score
  • :metadata - Additional data
  • :evaluation_cost - LLM API cost
  • :latency_ms - Evaluation time in milliseconds

Examples

DeepEvalEx.Result.new(
  metric: "GEval",
  score: 0.8,
  threshold: 0.5,
  reason: "The response is accurate and relevant."
)
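
The option list above suggests that :metric and :score must be supplied, that :threshold falls back to 0.5, and that :success is derived rather than passed in. The following is a minimal sketch of a constructor with that behavior, not the library's source. ResultSketch is a hypothetical stand-in module, and both the use of Keyword.fetch!/2 for required keys and the computation of :success inside new/1 are assumptions.

```elixir
defmodule ResultSketch do
  # Hypothetical stand-in for DeepEvalEx.Result; field names follow the docs above.
  defstruct [
    :metric,
    :score,
    :success,
    :reason,
    :threshold,
    :metadata,
    :evaluation_cost,
    :latency_ms
  ]

  def new(opts) do
    # Assumption: required keys are enforced with Keyword.fetch!/2,
    # which raises KeyError when a key is missing.
    metric = Keyword.fetch!(opts, :metric)
    score = Keyword.fetch!(opts, :score)
    # Documented default threshold is 0.5.
    threshold = Keyword.get(opts, :threshold, 0.5)

    %__MODULE__{
      metric: metric,
      score: score,
      threshold: threshold,
      # Assumption: :success is derived as score >= threshold at construction time.
      success: score >= threshold,
      # Optional fields are left as nil when not given.
      reason: Keyword.get(opts, :reason),
      metadata: Keyword.get(opts, :metadata),
      evaluation_cost: Keyword.get(opts, :evaluation_cost),
      latency_ms: Keyword.get(opts, :latency_ms)
    }
  end
end

result = ResultSketch.new(metric: "GEval", score: 0.8)
IO.inspect({result.threshold, result.success})
# prints {0.5, true}
```

In this sketch, omitting :threshold yields the documented default of 0.5, and a score of 0.8 therefore passes.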

success?(result)

@spec success?(t()) :: boolean()

Checks if the result is successful (score >= threshold).
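
The rule stated above, score >= threshold, is inclusive, so a score exactly equal to the threshold counts as a pass. The following sketch illustrates that comparison with a hypothetical SuccessSketch helper that works on plain maps. Whether the library's success?/1 recomputes the comparison or simply reads the stored :success field is not stated here, so treat this as an illustration of the rule only.

```elixir
defmodule SuccessSketch do
  # Hypothetical helper mirroring the documented rule: score >= threshold.
  def success?(%{score: score, threshold: threshold}), do: score >= threshold
end

# Well above the threshold: passes.
IO.inspect(SuccessSketch.success?(%{score: 0.85, threshold: 0.5}))
# Exactly at the threshold: passes, because the comparison is inclusive.
IO.inspect(SuccessSketch.success?(%{score: 0.5, threshold: 0.5}))
# Just below the threshold: fails.
IO.inspect(SuccessSketch.success?(%{score: 0.49, threshold: 0.5}))
```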

summary(result)

@spec summary(t()) :: String.t()

Returns a human-readable summary of the result.