DeepEvalEx.TestCase (DeepEvalEx v0.1.0)


Represents a test case for LLM evaluation.

A test case bundles the input sent to an LLM, the output it produced, and optional reference data (expected output, retrieval context, tool calls) that metrics use to evaluate the response.

Fields

  • :input - The input prompt sent to the LLM (required)
  • :actual_output - The LLM's response to evaluate (required for most metrics)
  • :expected_output - The expected/ground truth output (optional)
  • :retrieval_context - List of retrieved context chunks for RAG evaluation
  • :context - Alias for retrieval_context (for compatibility)
  • :tools_called - List of tool calls made by the LLM
  • :expected_tools - Expected tool calls for tool use evaluation
  • :metadata - Additional metadata for the test case
  • :name - Optional name for the test case
  • :tags - Optional list of string tags

Examples

# Basic test case
test_case = %DeepEvalEx.TestCase{
  input: "What is the capital of France?",
  actual_output: "The capital of France is Paris."
}

# RAG evaluation test case
test_case = %DeepEvalEx.TestCase{
  input: "What are the benefits of exercise?",
  actual_output: "Exercise improves cardiovascular health and mood.",
  retrieval_context: [
    "Regular exercise strengthens the heart and improves circulation.",
    "Physical activity releases endorphins, improving mental well-being."
  ]
}

# With expected output
test_case = %DeepEvalEx.TestCase{
  input: "Summarize: The quick brown fox jumps over the lazy dog.",
  actual_output: "A fox jumped over a dog.",
  expected_output: "A fox leaps over a resting dog."
}

Summary

Functions

get_retrieval_context(test_case)
Returns the effective retrieval context, preferring :retrieval_context over :context.

new(attrs)
Creates a new test case struct.

new!(attrs)
Creates a new test case struct, raising on error.

validate_params(test_case, required_params)
Validates that the test case has the required parameters for a given metric.

Types

t()

@type t() :: %DeepEvalEx.TestCase{
  actual_output: String.t() | nil,
  context: [String.t()] | nil,
  expected_output: String.t() | nil,
  expected_tools: [DeepEvalEx.Schemas.ToolCall.t()] | nil,
  input: String.t(),
  metadata: map() | nil,
  name: String.t() | nil,
  retrieval_context: [String.t()] | nil,
  tags: [String.t()] | nil,
  tools_called: [DeepEvalEx.Schemas.ToolCall.t()] | nil
}

Functions

get_retrieval_context(test_case)

@spec get_retrieval_context(t()) :: [String.t()] | nil

Returns the effective retrieval context, preferring :retrieval_context over :context.
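The preference rule can be illustrated with a minimal sketch. RetrievalContextSketch is a hypothetical module, not part of DeepEvalEx, and it assumes that only nil counts as "unset", so an empty list in :retrieval_context is returned as-is. Whether the real function treats [] the same way is not stated here.

```elixir
defmodule RetrievalContextSketch do
  # Hypothetical illustration of get_retrieval_context/1 semantics:
  # use :retrieval_context when it is non-nil, otherwise fall back to :context.
  # Assumption: only nil is treated as missing; [] is returned unchanged.
  @spec effective(map()) :: [String.t()] | nil
  def effective(test_case) do
    case Map.get(test_case, :retrieval_context) do
      nil -> Map.get(test_case, :context)
      retrieval_context -> retrieval_context
    end
  end
end
```

With both keys set, :retrieval_context wins; with only :context set, that value is returned; with neither set, the result is nil.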

new(attrs)

@spec new(keyword() | map()) :: {:ok, t()} | {:error, Ecto.Changeset.t()}

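Creates a new test case struct. Accepts a keyword list or a map and returns {:ok, test_case} on success, or {:error, changeset} with an Ecto.Changeset describing the validation errors.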

Options

  • :input - The input prompt (required)
  • :actual_output - The LLM's response
  • :expected_output - Expected output for comparison
  • :retrieval_context - List of retrieved context strings
  • :context - Alias for retrieval_context
  • :tools_called - List of tool calls made
  • :expected_tools - Expected tool calls
  • :metadata - Additional metadata map

new!(attrs)

@spec new!(keyword() | map()) :: t()

Creates a new test case struct, raising on error. Behaves like new/1 but returns the struct directly instead of an {:ok, test_case} tuple.
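The difference between the tuple-returning new/1 and the raising new!/1 follows the usual Elixir bang convention. The sketch below is a hypothetical stand-in (TestCaseSketch is not DeepEvalEx code): it reuses the field names from t(), but validates by hand and returns a plain {:error, reason} tuple instead of an Ecto.Changeset. It also assumes atom keys only, since Kernel.struct/2 ignores string keys.

```elixir
defmodule TestCaseSketch do
  # Hypothetical stand-in for DeepEvalEx.TestCase with the same field names.
  defstruct [
    :input, :actual_output, :expected_output, :retrieval_context, :context,
    :tools_called, :expected_tools, :metadata, :name, :tags
  ]

  # Keyword lists are normalized to maps first, mirroring the
  # keyword() | map() input type documented for new/1.
  @spec new(keyword() | map()) :: {:ok, %__MODULE__{}} | {:error, term()}
  def new(attrs) when is_list(attrs), do: attrs |> Map.new() |> new()

  def new(attrs) when is_map(attrs) do
    case Map.get(attrs, :input) do
      input when is_binary(input) ->
        # struct/2 silently drops keys that are not struct fields.
        {:ok, struct(__MODULE__, attrs)}

      _ ->
        # The real library returns an Ecto.Changeset here instead.
        {:error, {:invalid_or_missing, :input}}
    end
  end

  # Bang variant: unwraps {:ok, struct} or raises on {:error, _}.
  @spec new!(keyword() | map()) :: %__MODULE__{}
  def new!(attrs) do
    case new(attrs) do
      {:ok, test_case} -> test_case
      {:error, reason} -> raise ArgumentError, "invalid test case: #{inspect(reason)}"
    end
  end
end
```

Callers that want to handle invalid input themselves pattern match on new/1, for example `{:ok, tc} = TestCaseSketch.new(input: "What is the capital of France?")`; scripts and tests that treat invalid input as a bug can call new!/1 and let it raise.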

validate_params(test_case, required_params)

@spec validate_params(t(), [atom()]) :: :ok | {:error, {:missing_params, [atom()]}}

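Validates that the test case has the required parameters for a given metric. Returns :ok when every required parameter is present; otherwise returns {:error, {:missing_params, missing}}, where missing lists the absent parameters.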

Handles aliases:

  • :context and :retrieval_context are interchangeable
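A minimal sketch of how the alias handling could work. ValidateParamsSketch is hypothetical and not part of DeepEvalEx; it assumes a parameter counts as present when its value is non-nil (so "" or [] would pass), which may differ from the real check.

```elixir
defmodule ValidateParamsSketch do
  # Hypothetical illustration: each required param must be non-nil, but
  # :context and :retrieval_context can stand in for each other.
  @aliases %{context: :retrieval_context, retrieval_context: :context}

  @spec validate_params(map(), [atom()]) :: :ok | {:error, {:missing_params, [atom()]}}
  def validate_params(test_case, required_params) do
    missing = Enum.reject(required_params, &present?(test_case, &1))

    case missing do
      [] -> :ok
      _ -> {:error, {:missing_params, missing}}
    end
  end

  # A param is satisfied if it, or its alias (if any), has a non-nil value.
  defp present?(test_case, param) do
    candidates = [param | List.wrap(Map.get(@aliases, param))]
    Enum.any?(candidates, fn key -> not is_nil(Map.get(test_case, key)) end)
  end
end
```

For a test case with context: ["doc"] and retrieval_context: nil, requiring [:input, :retrieval_context] returns :ok because :context satisfies the alias, while requiring [:input, :expected_output] returns {:error, {:missing_params, [:expected_output]}} when :expected_output is nil.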