DeepEvalEx.Metrics.BaseMetric behaviour (DeepEvalEx v0.1.0)
Behaviour for evaluation metrics.
All metrics in DeepEvalEx implement this behaviour, which defines the interface for measuring test cases against evaluation criteria.
Implementing a Custom Metric
defmodule MyApp.CustomMetric do
  use DeepEvalEx.Metrics.BaseMetric

  @impl true
  def metric_name, do: "CustomMetric"

  @impl true
  def required_params, do: [:input, :actual_output]

  @impl true
  def measure(test_case, opts) do
    # Your evaluation logic
    score = calculate_score(test_case)
    threshold = Keyword.get(opts, :threshold, 0.5)

    {:ok,
     DeepEvalEx.Result.new(
       metric: metric_name(),
       score: score,
       threshold: threshold,
       reason: "Explanation..."
     )}
  end
end

Using the use Macro
The use DeepEvalEx.Metrics.BaseMetric macro provides:
- Default implementation of validate_test_case/2
- Telemetry instrumentation around measure/2
- Consistent error handling

You can override any of these defaults.
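As a hedged sketch of how such overridable defaults can be wired up in plain Elixir: the module names and the {:missing_params, list} error shape below are illustrative assumptions, not the library's actual internals.

```elixir
# Illustrative sketch of a __using__ macro that injects an overridable
# default, mirroring the pattern described above. BaseSketch and the
# {:missing_params, list} error tuple are hypothetical.
defmodule BaseSketch do
  defmacro __using__(_opts) do
    quote do
      # Default validation: every required key must be present and non-nil.
      def validate_test_case(test_case, required_params) do
        missing = Enum.reject(required_params, &Map.get(test_case, &1))
        if missing == [], do: :ok, else: {:error, {:missing_params, missing}}
      end

      # Metrics may replace this default with their own implementation.
      defoverridable validate_test_case: 2
    end
  end
end

defmodule MyMetric do
  use BaseSketch
end

MyMetric.validate_test_case(%{input: "hi", actual_output: "ok"}, [:input, :actual_output])
# → :ok
```

Because the injected function is marked defoverridable, a metric that needs stricter validation can simply define its own validate_test_case/2 after the use line.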
Summary
Callbacks
Optional callback to provide default options for the metric.
Measures a test case and returns a result.
Returns the name of this metric.
Returns the list of required test case parameters for this metric.
Types
@type measure_result() :: {:ok, DeepEvalEx.Result.t()} | {:error, term()}
@type opts() :: keyword()
@type test_case() :: DeepEvalEx.TestCase.t()
Callbacks
@callback default_opts() :: keyword()
Optional callback to provide default options for the metric.
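A minimal sketch of how a metric's default_opts/0 could be combined with caller-supplied options; DefaultsSketch and resolve_opts/1 are illustrative names, not part of the library API.

```elixir
# Sketch: merging a metric's default options with caller options.
# Caller-supplied options take precedence over the defaults.
defmodule DefaultsSketch do
  def default_opts, do: [threshold: 0.5, include_reason: true]

  def resolve_opts(opts), do: Keyword.merge(default_opts(), opts)
end

opts = DefaultsSketch.resolve_opts(threshold: 0.8)
Keyword.get(opts, :threshold)       # → 0.8
Keyword.get(opts, :include_reason)  # → true
```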
@callback measure(test_case(), opts()) :: measure_result()
Measures a test case and returns a result.
Parameters
- test_case - The test case to evaluate
- opts - Options including:
  - :threshold - Pass/fail threshold (0.0 - 1.0)
  - :model - LLM model for LLM-based metrics
  - :adapter - LLM adapter to use
  - :include_reason - Whether to include reasoning (default: true)
Returns
- {:ok, result} - Successful evaluation with result struct
- {:error, reason} - Evaluation failed
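The contract can be exercised with a toy metric. Here the result is a plain map standing in for DeepEvalEx.Result.t(), and exact-match scoring is an illustrative choice rather than a library metric.

```elixir
# Toy metric satisfying the measure/2 contract: takes a test case and
# options, returns {:ok, result} with score, threshold, and pass/fail.
defmodule ExactMatchSketch do
  def measure(test_case, opts) do
    threshold = Keyword.get(opts, :threshold, 0.5)
    score = if test_case.actual_output == test_case.expected_output, do: 1.0, else: 0.0

    {:ok,
     %{
       metric: "ExactMatch",
       score: score,
       threshold: threshold,
       success: score >= threshold
     }}
  end
end

{:ok, result} =
  ExactMatchSketch.measure(%{actual_output: "4", expected_output: "4"}, threshold: 0.7)

result.success
# → true
```

Callers typically pattern-match on the {:ok, result} / {:error, reason} tuple to branch between reporting a score and surfacing a failure.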
@callback metric_name() :: String.t()
Returns the name of this metric.
@callback required_params() :: [atom()]
Returns the list of required test case parameters for this metric.
These are validated before measure/2 is called.
Common parameters:
- :input - The input prompt
- :actual_output - The LLM's response
- :expected_output - Expected response (for comparison)
- :retrieval_context - Retrieved context (for RAG metrics)
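A small sketch of how required_params/0 can drive a pre-measure check; RagMetricSketch and missing_params/1 are hypothetical names, and the library's actual validation may differ.

```elixir
# Sketch: a metric declares its required params, and a helper reports
# which ones are absent from a test case map before measure/2 runs.
defmodule RagMetricSketch do
  def required_params, do: [:input, :actual_output, :retrieval_context]

  def missing_params(test_case) do
    Enum.filter(required_params(), &is_nil(Map.get(test_case, &1)))
  end
end

RagMetricSketch.missing_params(%{input: "Q", actual_output: "A"})
# → [:retrieval_context]
```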