DeepEvalEx.TestCase (DeepEvalEx v0.1.0)

Represents a test case for LLM evaluation.

A test case contains the input, actual output, and optional context for evaluating LLM responses.

Fields

:input - The input prompt sent to the LLM (required)
:actual_output - The LLM's response to evaluate (required for most metrics)
:expected_output - The expected (ground-truth) output (optional)
:retrieval_context - List of retrieved context chunks for RAG evaluation
:context - Alias for :retrieval_context, kept for compatibility; :retrieval_context takes precedence when both are set
:tools_called - List of tool calls made by the LLM
:expected_tools - Expected tool calls for tool use evaluation
:metadata - Additional metadata for the test case

Examples

# Basic test case
test_case = %DeepEvalEx.TestCase{
  input: "What is the capital of France?",
  actual_output: "The capital of France is Paris."
}

# RAG evaluation test case
test_case = %DeepEvalEx.TestCase{
  input: "What are the benefits of exercise?",
  actual_output: "Exercise improves cardiovascular health and mood.",
  retrieval_context: [
    "Regular exercise strengthens the heart and improves circulation.",
    "Physical activity releases endorphins, improving mental well-being."
  ]
}

# With expected output
test_case = %DeepEvalEx.TestCase{
  input: "Summarize: The quick brown fox jumps over the lazy dog.",
  actual_output: "A fox jumped over a dog.",
  expected_output: "A fox leaps over a resting dog."
}

Summary

Types

t()

Functions

get_retrieval_context(test_case)

Returns the effective retrieval context, preferring :retrieval_context over :context.

new(attrs)

Creates a new test case struct.

new!(attrs)

Creates a new test case struct, raising on error.

validate_params(test_case, required_params)

Validates that the test case has the required parameters for a given metric.
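
The fallback and validation behaviour summarised above can be illustrated with a minimal, self-contained sketch. TestCaseSketch is a hypothetical stand-in module, not DeepEvalEx.TestCase itself. The rule that a nil :retrieval_context triggers the fallback and the :ok / {:error, ...} return shape are assumptions made for illustration; the library's actual return values may differ.

```elixir
defmodule TestCaseSketch do
  # Hypothetical stand-in for DeepEvalEx.TestCase; field names follow the Fields list.
  defstruct [:input, :actual_output, :expected_output, :retrieval_context, :context]

  # Prefer :retrieval_context; fall back to :context only when it is nil (assumed rule).
  def get_retrieval_context(%__MODULE__{retrieval_context: nil, context: context}), do: context
  def get_retrieval_context(%__MODULE__{retrieval_context: chunks}), do: chunks

  # Return :ok when every required field is non-nil, otherwise report the missing ones.
  # This return shape is an assumption, not the library's documented contract.
  def validate_params(%__MODULE__{} = test_case, required_params) do
    case Enum.filter(required_params, &is_nil(Map.get(test_case, &1))) do
      [] -> :ok
      missing -> {:error, {:missing_params, missing}}
    end
  end
end

test_case = struct!(TestCaseSketch, input: "q", actual_output: "a", context: ["chunk"])

# Only :context is set, so it is used as the effective retrieval context.
TestCaseSketch.get_retrieval_context(test_case)
# => ["chunk"]

# A metric that also needs :expected_output would reject this test case.
TestCaseSketch.validate_params(test_case, [:input, :actual_output, :expected_output])
# => {:error, {:missing_params, [:expected_output]}}
```

Checking required fields per metric, rather than enforcing them all on the struct, fits the note that :actual_output is required for most metrics but not every one.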