Represents a single evaluation test case.
Fields
input - The user query/prompt (required)
actual_output - The LLM response to evaluate (required for evaluation)
expected_output - Golden/ideal answer for comparison (optional)
context - Ground-truth context for faithfulness checks (optional)
retrieval_context - Actual retrieved docs from your RAG pipeline (optional)
metadata - Additional info such as latency, tokens, and cost (optional)
Example
test_case = %Tribunal.TestCase{
  input: "What's the return policy?",
  actual_output: "You can return items within 30 days.",
  context: ["Returns accepted within 30 days with receipt."],
  expected_output: "Items can be returned within 30 days with a receipt."
}
Summary
Functions
Creates a new test case from a map or keyword list.
Adds metadata (latency, tokens, cost, etc.).
Sets the actual output on an existing test case. Useful when the dataset provides input/context but output comes from your LLM.
Sets the retrieval context from your RAG pipeline.
Functions
Creates a new test case from a map or keyword list.
Examples
Tribunal.TestCase.new(input: "Hello", actual_output: "Hi there!")
Tribunal.TestCase.new(%{"input" => "Hello", "actual_output" => "Hi!"})
Adds metadata (latency, tokens, cost, etc.).
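A minimal sketch of what attaching run metrics might look like. The function name `put_metadata/2`, the merge behavior, and the metric keys are assumptions based on the field list above, not the library's actual implementation; a plain map stands in for the struct:

```elixir
# Hypothetical sketch: merge run metrics (latency, tokens, cost) into the
# test case's metadata map. `put_metadata/2` is an assumed name, not
# necessarily the library's real API; a plain map stands in for the struct.
defmodule MetadataSketch do
  def put_metadata(%{metadata: existing} = test_case, extra) when is_map(extra) do
    %{test_case | metadata: Map.merge(existing || %{}, extra)}
  end
end

test_case = %{input: "Hello", actual_output: "Hi there!", metadata: nil}
test_case = MetadataSketch.put_metadata(test_case, %{latency_ms: 420, tokens: 18})
# test_case.metadata now holds %{latency_ms: 420, tokens: 18}
```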
Sets the actual output on an existing test case. Useful when the dataset provides input/context but output comes from your LLM.
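A sketch of that dataset-first workflow, under stated assumptions: `fake_model/1` stands in for your real LLM call, and a plain map stands in for the `%Tribunal.TestCase{}` struct, so the exact setter API is not shown:

```elixir
# The dataset supplies the input and the golden answer; actual_output is
# filled in later from the model. `fake_model/1` is a stand-in for a real
# LLM call, and a plain map stands in for the test-case struct.
fake_model = fn _prompt -> "You can return items within 30 days." end

case_from_dataset = %{
  input: "What's the return policy?",
  expected_output: "Items can be returned within 30 days with a receipt.",
  actual_output: nil
}

ready = %{case_from_dataset | actual_output: fake_model.(case_from_dataset.input)}
```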
Sets the retrieval context from your RAG pipeline.
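The distinction this supports: `context` holds the ground truth while `retrieval_context` holds what the retriever actually returned, so contextual metrics can compare the two. A sketch with a plain map standing in for the struct and an invented second document as a retrieval distractor:

```elixir
# `context` is the ground truth; `retrieval_context` is what the RAG
# pipeline actually returned. The second retrieved document below is a
# made-up distractor for illustration; a plain map stands in for the struct.
test_case = %{
  input: "What's the return policy?",
  context: ["Returns accepted within 30 days with receipt."],
  retrieval_context: nil
}

retrieved = ["Returns accepted within 30 days with receipt.",
             "Free shipping on orders over $50."]

test_case = %{test_case | retrieval_context: retrieved}
```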