ADR-0001: Behaviour-Based Plugin Architecture
Status
Accepted
Date
2024-12-25
Context
DeepEvalEx needs extensibility for two core abstractions:
- Metrics - Different evaluation strategies (exact match, G-Eval, faithfulness, etc.)
- LLM Adapters - Different LLM providers (OpenAI, Anthropic, Ollama, etc.)
Users should be able to implement custom metrics and adapters that integrate seamlessly with the framework, including automatic telemetry instrumentation, validation, and error handling.
Elixir offers two main polymorphism mechanisms: protocols and behaviours.
Decision
Use Elixir behaviours for both metrics and LLM adapters.
Metrics implement DeepEvalEx.Metrics.BaseMetric:
```elixir
defmodule DeepEvalEx.Metrics.BaseMetric do
  @callback metric_name() :: String.t()
  @callback required_params() :: [atom()]
  @callback do_measure(TestCase.t(), keyword()) :: Result.t()
end
```

LLM adapters implement DeepEvalEx.LLM.Adapter:
```elixir
defmodule DeepEvalEx.LLM.Adapter do
  @callback generate(String.t(), keyword()) :: {:ok, String.t()} | {:error, term()}
  @callback generate_with_schema(String.t(), map(), keyword()) :: {:ok, map()} | {:error, term()}
  @callback supports_structured_outputs?() :: boolean()
  @callback model_name(keyword()) :: String.t()
end
```

Consequences
Positive
- Clear contracts: Behaviours enforce explicit callback definitions with typespecs
- Compile-time checks: Missing callbacks produce compiler warnings
- Macro integration: `__using__` macros can inject shared functionality (validation, telemetry)
- Documentation: Behaviour modules serve as authoritative interface documentation
- Discoverability: IDE autocompletion and documentation tools understand behaviours
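As a sketch of the macro-integration point above (the `__using__` hook, the telemetry event name, and the injected `measure/2` wrapper are illustrative, not necessarily DeepEvalEx's actual implementation):

```elixir
defmodule DeepEvalEx.Metrics.BaseMetric do
  @callback metric_name() :: String.t()
  @callback required_params() :: [atom()]
  @callback do_measure(TestCase.t(), keyword()) :: Result.t()

  # Hypothetical __using__ hook: declares the behaviour for the implementor
  # and injects a public measure/2 that wraps do_measure/2 with telemetry.
  defmacro __using__(_opts) do
    quote do
      @behaviour DeepEvalEx.Metrics.BaseMetric

      def measure(test_case, opts \\ []) do
        # (shared validation against required_params() would also go here)
        :telemetry.span([:deep_eval_ex, :metric], %{metric: metric_name()}, fn ->
          {do_measure(test_case, opts), %{}}
        end)
      end
    end
  end
end
```

A custom metric would then write `use DeepEvalEx.Metrics.BaseMetric` and get the instrumented `measure/2` for free, which is how custom implementations can reach feature parity with built-in ones.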
Negative
- Less dynamic dispatch: Unlike protocols, behaviours require knowing the module at compile time or passing it explicitly
- No data-driven dispatch: Protocols dispatch on data type; behaviours require explicit module references
- More boilerplate: Each implementation requires a `@behaviour` declaration
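For illustration, here is what that per-implementation boilerplate buys: a minimal custom metric satisfying the contract (the `ExactMatch` module and the `Result` struct fields are assumptions for this sketch):

```elixir
defmodule MyApp.Metrics.ExactMatch do
  @behaviour DeepEvalEx.Metrics.BaseMetric

  @impl true
  def metric_name, do: "exact_match"

  @impl true
  def required_params, do: [:input, :actual_output, :expected_output]

  @impl true
  def do_measure(test_case, _opts) do
    score = if test_case.actual_output == test_case.expected_output, do: 1.0, else: 0.0
    # Result shape is assumed here; the behaviour only fixes the typespec.
    %DeepEvalEx.Result{score: score}
  end
end
```

The `@impl true` annotations make the compiler verify each function actually corresponds to a callback, catching typos such as `metric_nane/0` at compile time.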
Neutral
- Custom metrics/adapters have full feature parity with built-in ones
- Configuration must specify adapter modules explicitly (e.g., `adapter: DeepEvalEx.LLM.Adapters.OpenAI`)
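Explicit module dispatch might look like this at a framework call site (a sketch; the config key and default are assumptions):

```elixir
defmodule DeepEvalEx.LLM do
  # Resolve the adapter module from opts or application config, then
  # dispatch on it explicitly -- the behaviour guarantees generate/2 exists.
  def generate(prompt, opts \\ []) do
    adapter =
      Keyword.get_lazy(opts, :adapter, fn ->
        Application.get_env(:deep_eval_ex, :adapter, DeepEvalEx.LLM.Adapters.OpenAI)
      end)

    adapter.generate(prompt, opts)
  end
end
```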
Alternatives Considered
Protocols
- Rejected: Protocols dispatch on data types, but metrics and adapters are module-based, not data-based. There's no natural "data type" to dispatch on.
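For contrast, a protocol-based design would force every provider to be wrapped in a struct whose only purpose is to give the protocol something to dispatch on (illustrative, not proposed API):

```elixir
defprotocol DeepEvalEx.LLM.Generator do
  def generate(provider, prompt)
end

# An artificial wrapper struct, needed solely for protocol dispatch:
defmodule OpenAIProvider do
  defstruct [:api_key]
end

defimpl DeepEvalEx.LLM.Generator, for: OpenAIProvider do
  def generate(%OpenAIProvider{api_key: _key}, prompt) do
    # ...call the provider's API...
    {:ok, "response to " <> prompt}
  end
end
```

Since the struct carries no meaningful data, the indirection adds nothing over referencing the adapter module directly.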
Simple function contracts (no behaviour)
- Rejected: Loses compile-time checking and documentation benefits. No enforcement of required callbacks.
GenServer-based plugins
- Rejected: Adds unnecessary process overhead for stateless operations. Metrics and adapters don't need to maintain state between calls.