Represents a single evaluation test case.
Fields
input - The user query/prompt (required)
actual_output - The LLM response to evaluate (required for evaluation)
expected_output - Golden/ideal answer for comparison (optional)
context - Ground-truth context for faithfulness checks (optional)
retrieval_context - Actual retrieved docs from your RAG pipeline (optional)
metadata - Additional info such as latency, tokens, and cost (optional)
Example
test_case = %Tribunal.TestCase{
  input: "What's the return policy?",
  actual_output: "You can return items within 30 days.",
  context: ["Returns accepted within 30 days with receipt."],
  expected_output: "Items can be returned within 30 days with a receipt."
}
Summary
Functions
Creates a new test case from a map or keyword list.
Adds metadata (latency, tokens, cost, etc.).
Sets the actual output on an existing test case. Useful when the dataset provides input/context but output comes from your LLM.
Sets the retrieval context from your RAG pipeline.
Functions
Creates a new test case from a map or keyword list.
Examples
Tribunal.TestCase.new(input: "Hello", actual_output: "Hi there!")
Tribunal.TestCase.new(%{"input" => "Hello", "actual_output" => "Hi!"})
Adds metadata (latency, tokens, cost, etc.).
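A minimal sketch of what attaching run metrics might look like. The function name `put_metadata/2`, the merge behavior, and the metric keys are assumptions based on the field list above, not the library's actual implementation; a plain map stands in for the struct:

```elixir
# Hypothetical sketch: merge run metrics (latency, tokens, cost) into the
# test case's metadata map. `put_metadata/2` is an assumed name, not
# necessarily the library's real API; a plain map stands in for the struct.
defmodule MetadataSketch do
  def put_metadata(%{metadata: existing} = test_case, extra) when is_map(extra) do
    %{test_case | metadata: Map.merge(existing || %{}, extra)}
  end
end

test_case = %{input: "Hello", actual_output: "Hi there!", metadata: nil}
test_case = MetadataSketch.put_metadata(test_case, %{latency_ms: 420, tokens: 18})
# test_case.metadata now holds %{latency_ms: 420, tokens: 18}
```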
Sets the actual output on an existing test case. Useful when the dataset provides input/context but output comes from your LLM.
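A sketch of that dataset-first workflow, under stated assumptions: `fake_model/1` stands in for your real LLM call, and a plain map stands in for the `%Tribunal.TestCase{}` struct, so the exact setter API is not shown:

```elixir
# The dataset supplies the input and the golden answer; actual_output is
# filled in later from the model. `fake_model/1` is a stand-in for a real
# LLM call, and a plain map stands in for the test-case struct.
fake_model = fn _prompt -> "You can return items within 30 days." end

case_from_dataset = %{
  input: "What's the return policy?",
  expected_output: "Items can be returned within 30 days with a receipt.",
  actual_output: nil
}

ready = %{case_from_dataset | actual_output: fake_model.(case_from_dataset.input)}
```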
Sets the retrieval context from your RAG pipeline.
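The distinction this supports: `context` holds the ground truth while `retrieval_context` holds what the retriever actually returned, so contextual metrics can compare the two. A sketch with a plain map standing in for the struct and an invented second document as a retrieval distractor:

```elixir
# `context` is the ground truth; `retrieval_context` is what the RAG
# pipeline actually returned. The second retrieved document below is a
# made-up distractor for illustration; a plain map stands in for the struct.
test_case = %{
  input: "What's the return policy?",
  context: ["Returns accepted within 30 days with receipt."],
  retrieval_context: nil
}

retrieved = ["Returns accepted within 30 days with receipt.",
             "Free shipping on orders over $50."]

test_case = %{test_case | retrieval_context: retrieved}
```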