DeepEvalEx.Evaluator (DeepEvalEx v0.1.0)

Concurrent evaluation engine for DeepEvalEx.

Evaluates test cases against metrics using BEAM's lightweight processes for parallel execution.

Usage

# Single test case
[results] = DeepEvalEx.Evaluator.evaluate([test_case], [metric])

# Multiple test cases (concurrent)
all_results = DeepEvalEx.Evaluator.evaluate(test_cases, metrics,
  concurrency: 20
)

Options

  • :concurrency - Maximum number of concurrent evaluations (default: number of schedulers * 2)
  • :timeout - Timeout per test case, in milliseconds (default: 60_000)
  • :threshold - Default score threshold applied to all metrics
  • :model - Default model used by LLM-based metrics
  • :adapter - Default LLM adapter
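
All of these options can be combined in a single call. A sketch below; the model string and adapter module are illustrative placeholders, not values confirmed by this documentation:

results =
  DeepEvalEx.Evaluator.evaluate(test_cases, metrics,
    concurrency: 10,
    timeout: 120_000,
    threshold: 0.7,
    model: "gpt-4o-mini",          # placeholder model name
    adapter: MyApp.OpenAIAdapter   # placeholder adapter module
  )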

Results

Returns a list of result lists, one per test case:

[
  [%Result{metric: "Metric1", ...}, %Result{metric: "Metric2", ...}],
  [%Result{metric: "Metric1", ...}, %Result{metric: "Metric2", ...}]
]
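
The nested lists can be flattened for reporting. A minimal sketch, using only the metric field shown above:

# Count how many results each metric produced across all test cases.
all_results
|> List.flatten()
|> Enum.frequencies_by(& &1.metric)
# => %{"Metric1" => 2, "Metric2" => 2}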

Summary

Functions

evaluate(test_cases, metrics, opts \\ [])

  Evaluates test cases against metrics concurrently.

evaluate_metric(test_case, metric, opts)

  Evaluates a single test case against a single metric.

evaluate_single(test_case, metrics, opts \\ [])

  Evaluates a single test case against all metrics.

Functions

evaluate(test_cases, metrics, opts \\ [])

@spec evaluate([DeepEvalEx.TestCase.t()], [module() | struct()], keyword()) :: [
  [DeepEvalEx.Result.t()]
]

Evaluates test cases against metrics concurrently.

Parameters

  • test_cases - List of test cases to evaluate
  • metrics - List of metric modules or structs
  • opts - Evaluation options

Returns

List of result lists, one per test case.
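
Because there is exactly one result list per test case, results can be paired with their inputs by position (pairing by order is an assumption based on that one-to-one correspondence):

# Pair each test case with the list of results it produced.
test_cases
|> Enum.zip(DeepEvalEx.Evaluator.evaluate(test_cases, metrics))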

evaluate_metric(test_case, metric, opts)

@spec evaluate_metric(DeepEvalEx.TestCase.t(), module() | struct(), keyword()) ::
  DeepEvalEx.Result.t()

Evaluates a single test case against a single metric.
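
A sketch of a single-metric call. The TestCase fields (:input, :actual_output) and the metric module name are assumptions for illustration, not confirmed by this documentation:

test_case = %DeepEvalEx.TestCase{
  input: "What is the capital of France?",  # assumed field
  actual_output: "Paris"                    # assumed field
}

# Note: opts has no default for evaluate_metric/3, so pass [] explicitly.
result = DeepEvalEx.Evaluator.evaluate_metric(test_case, MyApp.ExactMatchMetric, [])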

evaluate_single(test_case, metrics, opts \\ [])

@spec evaluate_single(DeepEvalEx.TestCase.t(), [module() | struct()], keyword()) :: [
  DeepEvalEx.Result.t()
]

Evaluates a single test case against all metrics.
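
For example, scoring one test case against two metrics returns a flat list with one %Result{} per metric (the metric module names below are placeholders):

results =
  DeepEvalEx.Evaluator.evaluate_single(test_case, [MyApp.MetricA, MyApp.MetricB])

Enum.each(results, fn r -> IO.puts(r.metric) end)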