DeepEvalEx.Metrics.BaseMetric behaviour (DeepEvalEx v0.1.0)


Behaviour for evaluation metrics.

All metrics in DeepEvalEx implement this behaviour, which defines the interface for measuring test cases against evaluation criteria.

Implementing a Custom Metric

defmodule MyApp.CustomMetric do
  use DeepEvalEx.Metrics.BaseMetric

  @impl true
  def metric_name, do: "CustomMetric"

  @impl true
  def required_params, do: [:input, :actual_output]

  @impl true
  def measure(test_case, opts) do
    # Your evaluation logic
    score = calculate_score(test_case)
    threshold = Keyword.get(opts, :threshold, 0.5)

    {:ok, DeepEvalEx.Result.new(
      metric: metric_name(),
      score: score,
      threshold: threshold,
      reason: "Explanation..."
    )}
  end

  # Placeholder -- replace with your own scoring logic.
  defp calculate_score(_test_case), do: 1.0
end

Using the use Macro

The use DeepEvalEx.Metrics.BaseMetric macro provides:

  • Default implementation of validate_test_case/2
  • Telemetry instrumentation around measure/2
  • Consistent error handling

You can override any of these defaults.
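For example, you might override validate_test_case/2 to reject empty outputs before measurement runs, while keeping the macro's telemetry and error handling. This is a sketch: the :ok / {:error, reason} return shape and the struct field access are assumptions for illustration, not part of the documented contract.

```elixir
defmodule MyApp.StrictMetric do
  use DeepEvalEx.Metrics.BaseMetric

  @impl true
  def metric_name, do: "StrictMetric"

  @impl true
  def required_params, do: [:input, :actual_output]

  # Override the macro's default validation. The :ok / {:error, reason}
  # return shape here is assumed for illustration.
  def validate_test_case(test_case, _opts) do
    if test_case.actual_output in [nil, ""] do
      {:error, :empty_actual_output}
    else
      :ok
    end
  end

  @impl true
  def measure(_test_case, opts) do
    threshold = Keyword.get(opts, :threshold, 0.5)

    {:ok, DeepEvalEx.Result.new(
      metric: metric_name(),
      score: 1.0,
      threshold: threshold,
      reason: "Placeholder score"
    )}
  end
end
```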

Summary

Callbacks

default_opts()
Optional callback to provide default options for the metric.

measure(test_case, opts)
Measures a test case and returns a result.

metric_name()
Returns the name of this metric.

required_params()
Returns the list of required test case parameters for this metric.

Types

measure_result()

@type measure_result() :: {:ok, DeepEvalEx.Result.t()} | {:error, term()}

opts()

@type opts() :: keyword()

test_case()

@type test_case() :: DeepEvalEx.TestCase.t()

Callbacks

default_opts()

(optional)
@callback default_opts() :: keyword()

Optional callback to provide default options for the metric.
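A minimal implementation might centralize the metric's defaults like this (assuming the macro-provided plumbing merges them beneath any caller-supplied options):

```elixir
@impl true
def default_opts do
  [threshold: 0.7, include_reason: true]
end
```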

measure(test_case, opts)

@callback measure(test_case(), opts()) :: measure_result()

Measures a test case and returns a result.

Parameters

  • test_case - The test case to evaluate
  • opts - Options including:
    • :threshold - Pass/fail threshold (0.0 - 1.0)
    • :model - LLM model for LLM-based metrics
    • :adapter - LLM adapter to use
    • :include_reason - Whether to include reasoning (default: true)

Returns

  • {:ok, result} - Successful evaluation with result struct
  • {:error, reason} - Evaluation failed
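Invoking a metric directly might look like the following sketch. MyApp.CustomMetric is the example module from above; DeepEvalEx.TestCase.new/1 is assumed here as the test-case constructor, and the result fields mirror the keys passed to DeepEvalEx.Result.new/1.

```elixir
# Build a test case (TestCase.new/1 is an assumed constructor).
test_case = DeepEvalEx.TestCase.new(
  input: "What is the capital of France?",
  actual_output: "Paris."
)

case MyApp.CustomMetric.measure(test_case, threshold: 0.7) do
  {:ok, result} ->
    IO.puts("#{result.metric}: #{result.score} (threshold #{result.threshold})")

  {:error, reason} ->
    IO.puts("evaluation failed: #{inspect(reason)}")
end
```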

metric_name()

@callback metric_name() :: String.t()

Returns the name of this metric.

required_params()

@callback required_params() :: [atom()]

Returns the list of required test case parameters for this metric.

These are validated before measure/2 is called.

Common parameters:

  • :input - The input prompt
  • :actual_output - The LLM's response
  • :expected_output - Expected response (for comparison)
  • :retrieval_context - Retrieved context (for RAG metrics)
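For instance, a RAG-oriented metric that scores the answer against retrieved documents would declare the context as required, so validation fails fast when a test case omits it:

```elixir
@impl true
def required_params, do: [:input, :actual_output, :retrieval_context]
```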