DeepEvalEx.Evaluator (DeepEvalEx v0.1.0)
Concurrent evaluation engine for DeepEvalEx.
Evaluates test cases against metrics using BEAM's lightweight processes for parallel execution.
Usage
# Single test case
[results] = DeepEvalEx.Evaluator.evaluate([test_case], [metric])

# Multiple test cases (concurrent)
all_results = DeepEvalEx.Evaluator.evaluate(test_cases, metrics,
  concurrency: 20
)

Options
- `:concurrency` - Maximum concurrent evaluations (default: schedulers * 2)
- `:timeout` - Timeout per test case in milliseconds (default: 60_000)
- `:threshold` - Default threshold for all metrics
- `:model` - Default LLM model for LLM-based metrics
- `:adapter` - Default LLM adapter
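A minimal sketch combining several options. `MyApp.Metrics.Relevancy`, the `"gpt-4o"` model string, and the `:openai` adapter name are placeholder assumptions, not identifiers shipped with DeepEvalEx:

# Sketch: tune concurrency and defaults in one call.
all_results = DeepEvalEx.Evaluator.evaluate(test_cases, [MyApp.Metrics.Relevancy],
  concurrency: 10,   # cap parallel evaluations at 10
  timeout: 30_000,   # give up on a test case after 30 seconds
  threshold: 0.7,    # default pass threshold for all metrics
  model: "gpt-4o",   # hypothetical default model for LLM-based metrics
  adapter: :openai   # hypothetical default adapter
)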
Results
Returns a list of result lists, one per test case:
[
  [%Result{metric: "Metric1", ...}, %Result{metric: "Metric2", ...}],
  [%Result{metric: "Metric1", ...}, %Result{metric: "Metric2", ...}]
]
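A sketch of consuming this structure. The `:passed` field on `%Result{}` is an assumption; only `:metric` is shown above:

# Flatten the nested lists and report any failing metrics.
all_results
|> List.flatten()
|> Enum.reject(& &1.passed)
|> Enum.each(fn result -> IO.puts("Failed metric: #{result.metric}") end)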
Summary
Functions
evaluate(test_cases, metrics, opts)
Evaluates test cases against metrics concurrently.

evaluate_metric(test_case, metric, opts)
Evaluates a single test case against a single metric.

evaluate_single(test_case, metrics, opts)
Evaluates a single test case against all metrics.
Functions
evaluate(test_cases, metrics, opts)

@spec evaluate([DeepEvalEx.TestCase.t()], [module() | struct()], keyword()) ::
        [[DeepEvalEx.Result.t()]]
Evaluates test cases against metrics concurrently.
Parameters
- `test_cases` - List of test cases to evaluate
- `metrics` - List of metric modules or structs
- `opts` - Evaluation options
Returns
List of result lists, one per test case.
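Since each test case maps to one result list, a sketch for pairing inputs with their results, assuming the output preserves input order (implied by the one-per-test-case structure, not stated explicitly):

# Zip each test case with its result list.
test_cases
|> Enum.zip(DeepEvalEx.Evaluator.evaluate(test_cases, metrics))
|> Enum.each(fn {test_case, results} ->
  IO.puts("#{length(results)} results for #{inspect(test_case)}")
end)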
evaluate_metric(test_case, metric, opts)

@spec evaluate_metric(DeepEvalEx.TestCase.t(), module() | struct(), keyword()) ::
        DeepEvalEx.Result.t()
Evaluates a single test case against a single metric.
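A minimal sketch; `MyApp.Metrics.Relevancy` is a placeholder metric module:

# Sketch: one test case, one metric, returns a single %Result{}.
result = DeepEvalEx.Evaluator.evaluate_metric(test_case, MyApp.Metrics.Relevancy, timeout: 30_000)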
evaluate_single(test_case, metrics, opts)

@spec evaluate_single(DeepEvalEx.TestCase.t(), [module() | struct()], keyword()) ::
        [DeepEvalEx.Result.t()]
Evaluates a single test case against all metrics.
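Unlike evaluate/3, this returns a flat list with one result per metric rather than a nested list. A sketch:

# Sketch: evaluate one test case against every metric in the list.
results = DeepEvalEx.Evaluator.evaluate_single(test_case, metrics, threshold: 0.8)
Enum.each(results, fn result -> IO.inspect(result.metric) end)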