ExactMatch Metric

A simple, non-LLM metric that checks whether the actual output exactly matches the expected output.

When to Use

  • Factual Q&A where exact answers are expected
  • Classification tasks
  • Structured output validation
  • Quick baseline evaluation

Required Parameters

Parameter        Type    Description
input            String  The input prompt
actual_output    String  The LLM's response
expected_output  String  The expected/ground-truth output

Usage

alias DeepEvalEx.{TestCase, Metrics}

test_case = TestCase.new!(
  input: "What is the capital of France?",
  actual_output: "Paris",
  expected_output: "Paris"
)

{:ok, result} = Metrics.ExactMatch.measure(test_case)

result.score    # => 1.0
result.success  # => true
result.reason   # => "The actual and expected outputs are exact matches."

Options

Option                 Type     Default  Description
:threshold             float    1.0      Score threshold for pass/fail
:case_sensitive        boolean  true     Whether the comparison is case-sensitive
:normalize_whitespace  boolean  false    Collapse runs of whitespace into a single space
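
The options can also be combined. A short sketch using the documented API; the result value shown is what the option semantics above imply, not a captured run:

test_case = TestCase.new!(
  input: "Greeting",
  actual_output: "HELLO    world",
  expected_output: "hello world"
)

{:ok, result} = Metrics.ExactMatch.measure(test_case,
  case_sensitive: false,
  normalize_whitespace: true
)
result.score  # => 1.0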

Case Insensitive Matching

test_case = TestCase.new!(
  input: "Capital of France?",
  actual_output: "PARIS",
  expected_output: "paris"
)

# Case sensitive (default) - fails
{:ok, result} = Metrics.ExactMatch.measure(test_case)
result.score  # => 0.0

# Case insensitive - passes
{:ok, result} = Metrics.ExactMatch.measure(test_case, case_sensitive: false)
result.score  # => 1.0

Whitespace Normalization

test_case = TestCase.new!(
  input: "Greeting",
  actual_output: "Hello    World",   # Multiple spaces
  expected_output: "Hello World"      # Single space
)

# Without normalization - fails
{:ok, result} = Metrics.ExactMatch.measure(test_case)
result.score  # => 0.0

# With normalization - passes
{:ok, result} = Metrics.ExactMatch.measure(test_case, normalize_whitespace: true)
result.score  # => 1.0

Score Interpretation

Score  Meaning
1.0    Exact match (after normalization, if enabled)
0.0    No match
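
The scoring logic can be pictured as a small normalization-then-compare pipeline. This is a hypothetical sketch for illustration (module and function names are made up here, and the library's actual implementation may differ), but it reflects the documented option semantics:

# Hypothetical sketch; not the library's actual implementation.
defmodule ExactMatchSketch do
  def score(actual, expected, opts \\ []) do
    case_sensitive = Keyword.get(opts, :case_sensitive, true)
    normalize = Keyword.get(opts, :normalize_whitespace, false)

    a = actual |> maybe_downcase(case_sensitive) |> maybe_normalize(normalize)
    e = expected |> maybe_downcase(case_sensitive) |> maybe_normalize(normalize)

    # Binary outcome: exact match after the enabled transformations, or no match.
    if a == e, do: 1.0, else: 0.0
  end

  # Lowercase both sides when the comparison is case-insensitive.
  defp maybe_downcase(s, true), do: s
  defp maybe_downcase(s, false), do: String.downcase(s)

  # Trim and collapse whitespace runs when normalization is enabled.
  defp maybe_normalize(s, false), do: s
  defp maybe_normalize(s, true), do: s |> String.trim() |> String.replace(~r/\s+/, " ")
end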

Result Metadata

The result includes metadata for debugging:

result.metadata
# => %{
#   expected: "Paris",
#   actual: "Paris",
#   case_sensitive: true,
#   normalize_whitespace: false
# }

Limitations

  • Binary scoring (1.0 or 0.0) - no partial credit
  • Sensitive to formatting differences
  • Not suitable for free-form text evaluation

For more flexible evaluation, consider GEval.

Source

Ported from deepeval/metrics/exact_match/exact_match.py