Behaviour for LLM-as-judge assertions.
All judges, built-in and custom, implement this behaviour, so every evaluation criterion is exposed through the same interface.
Example
defmodule MyApp.Judges.BrandVoice do
  @behaviour Tribunal.Judge

  @impl true
  def name, do: :brand_voice

  @impl true
  def prompt(test_case, _opts) do
    """
    Evaluate if the response matches our brand voice guidelines:
    - Friendly but professional tone
    - No jargon or technical terms
    - Empathetic and helpful
    Response to evaluate:
    #{test_case.actual_output}
    Query: #{test_case.input}
    """
  end
end
Configuration
Register your custom judges in config:
config :tribunal, :custom_judges, [
  MyApp.Judges.BrandVoice,
  MyApp.Judges.Compliance
]
Then use them like built-in assertions:
assert_judge :brand_voice, response, query: input
Callbacks
Optional: customize how the LLM result is interpreted.
By default, the result is interpreted with the verdict and threshold logic. Override this callback to apply your own pass/fail rules.
It should return {:pass, details} or {:fail, details}.
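As an illustration of custom pass/fail logic, the sketch below assumes the parsed LLM result is a map with the verdict, reason, and score fields described under prompt/2, and applies a hypothetical stricter rule: pass only on a "yes" verdict with a numeric score of at least 0.9. The module and function names (MyApp.StrictInterpretation, interpret/1) are hypothetical, and this page does not show the callback's actual name or arguments, so adapt the body to the callback's real signature.

```elixir
defmodule MyApp.StrictInterpretation do
  # Hypothetical helper: the real callback's name and arguments are not shown
  # on this page, so this only demonstrates the {:pass, details} /
  # {:fail, details} return contract.
  @min_score 0.9

  # Assumes `result` is a map with :verdict, :reason and :score keys.
  # is_number/1 guards against a nil score, which would otherwise compare
  # as greater than any number in Erlang term ordering.
  def interpret(%{verdict: "yes", score: score} = result)
      when is_number(score) and score >= @min_score do
    {:pass, %{reason: Map.get(result, :reason), score: score}}
  end

  # Any other verdict, or a "yes" below the threshold, fails.
  def interpret(%{} = result) do
    {:fail, %{reason: Map.get(result, :reason), score: Map.get(result, :score)}}
  end
end

MyApp.StrictInterpretation.interpret(%{verdict: "yes", reason: "On brand", score: 0.95})
# => {:pass, %{reason: "On brand", score: 0.95}}
```

Under this rule a "partial" verdict always fails, even with a high score; relax the first clause if partial answers should count.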
@callback name() :: atom()
Returns the atom name for this judge.
This name is used to invoke the judge in assertions:
assert_judge :my_judge_name, response, opts
@callback negative_metric?() :: boolean()
Optional: whether "no" verdict means pass (for negative metrics like toxicity).
When true, a "no" verdict passes and a "yes" verdict fails. When false (the default), a "yes" verdict passes and a "no" verdict fails.
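The mapping above can be written as a small function. This is a minimal sketch of the rule as stated, not the library's implementation; MyApp.VerdictOutcome and outcome/2 are hypothetical names. Because the text only defines the rule for "yes" and "no", any other verdict (such as "partial") is returned as :unspecified.

```elixir
defmodule MyApp.VerdictOutcome do
  # Hypothetical helper mirroring the negative_metric?/0 rule.
  # The second argument is what the judge's negative_metric?/0 returns.
  # A toxicity-style judge would opt in with:
  #   @impl true
  #   def negative_metric?, do: true
  def outcome("yes", false), do: :pass
  def outcome("no", false), do: :fail
  def outcome("no", true), do: :pass
  def outcome("yes", true), do: :fail

  # The rule does not say how other verdicts (e.g. "partial") map.
  def outcome(_verdict, _negative), do: :unspecified
end

MyApp.VerdictOutcome.outcome("no", true)
# => :pass
```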
@callback prompt(test_case :: Tribunal.TestCase.t(), opts :: keyword()) :: String.t()
Builds the evaluation prompt for the LLM judge.
Receives the test case and any options passed to the assertion. Should return a prompt string that asks the LLM to evaluate the response and return a JSON verdict.
The prompt should instruct the LLM to return JSON with:
- verdict: "yes", "no", or "partial"
- reason: explanation for the verdict
- score: confidence score from 0.0 to 1.0
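A prompt that follows these instructions might end with an explicit output format. The sketch below is hypothetical: the MyApp.Judges.Concise module and its wording are illustrative, and the test case is assumed to expose input and actual_output, as in the BrandVoice example. It also includes a hypothetical valid_verdict?/1 helper that checks an already-decoded result map against the three fields listed above.

```elixir
defmodule MyApp.Judges.Concise do
  # In a real judge you would also declare `@behaviour Tribunal.Judge` and
  # tag callbacks with `@impl true`; omitted so this sketch runs standalone.

  def name, do: :concise

  def prompt(test_case, _opts) do
    """
    Evaluate whether the response answers the query concisely.
    Query: #{test_case.input}
    Response to evaluate:
    #{test_case.actual_output}
    Respond only with a JSON object containing:
    - "verdict": "yes", "no", or "partial"
    - "reason": a short explanation for the verdict
    - "score": a confidence score between 0.0 and 1.0
    """
  end

  # Hypothetical helper: checks a result map already decoded from the LLM's
  # JSON reply (string keys, as a JSON decoder would typically produce).
  def valid_verdict?(%{"verdict" => verdict, "reason" => reason, "score" => score})
      when verdict in ["yes", "no", "partial"] and is_binary(reason) and
             is_number(score) and score >= 0 and score <= 1 do
    true
  end

  def valid_verdict?(_), do: false
end
```

Keeping the output format at the end of the prompt makes it the last instruction the model reads, which tends to help with format adherence; the exact wording is a judgment call.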
@callback validate(test_case :: Tribunal.TestCase.t()) :: :ok | {:error, String.t()}
Optional: validate that the test case has required fields.
Return :ok if valid, or {:error, reason} if not.
Default implementation always returns :ok.
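For example, a judge that grades answers against retrieved context could reject test cases that lack it. This is a minimal sketch: the context field and the MyApp.Judges.Grounded module are assumptions for illustration, not fields or modules confirmed by this page. Map.get/2 is used so the check works on plain maps as well as structs.

```elixir
defmodule MyApp.Judges.Grounded do
  # Sketch only: a real judge would declare `@behaviour Tribunal.Judge`
  # and tag this function with `@impl true`.

  # Hypothetical requirement: this judge needs a non-empty :context list.
  def validate(test_case) do
    case Map.get(test_case, :context) do
      [_ | _] -> :ok
      _ -> {:error, "Grounded judge requires a non-empty :context"}
    end
  end
end

MyApp.Judges.Grounded.validate(%{input: "q", actual_output: "a"})
# => {:error, "Grounded judge requires a non-empty :context"}
```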
Functions
Returns list of all judge names (built-in + custom).
Returns all judge modules (built-in + custom).
Checks if a name is a built-in judge.
Returns list of built-in judge names.
Returns all built-in judge modules.
Checks if a name is a registered custom judge.
Returns list of custom judge names.
Returns all configured custom judge modules.
Finds a judge module by name.
Returns {:ok, module} or :error.
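The lookup functions above can be pictured as a search over judge modules by their name/0 values. The sketch below is hypothetical: MyApp.JudgeRegistry, its function names, and the stand-in built-in list are illustrative, since this page does not show the module's actual function names. Only the :custom_judges config key comes from the text, and Application.get_env/3 and Application.put_env/3 are standard Elixir.

```elixir
defmodule MyApp.Judges.Alpha do
  def name, do: :alpha
end

defmodule MyApp.Judges.Beta do
  def name, do: :beta
end

defmodule MyApp.JudgeRegistry do
  # Hypothetical stand-in for the built-in judge modules.
  @builtin [MyApp.Judges.Alpha]

  # Reads the list registered under `config :tribunal, :custom_judges`.
  def custom_judges, do: Application.get_env(:tribunal, :custom_judges, [])

  def all_judges, do: @builtin ++ custom_judges()

  def all_names, do: Enum.map(all_judges(), & &1.name())

  def builtin?(name), do: Enum.any?(@builtin, &(&1.name() == name))

  # Returns {:ok, module} or :error, matching the contract described above.
  def find(name) do
    case Enum.find(all_judges(), &(&1.name() == name)) do
      nil -> :error
      module -> {:ok, module}
    end
  end
end

# Simulate `config :tribunal, :custom_judges, [MyApp.Judges.Beta]` in a script.
Application.put_env(:tribunal, :custom_judges, [MyApp.Judges.Beta])

MyApp.JudgeRegistry.find(:beta)
# => {:ok, MyApp.Judges.Beta}
```

Because names are resolved by calling name/0 on each module, two judges returning the same atom would shadow each other; in this sketch the built-in list is searched first.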