# Nous.Eval.Evaluators.LLMJudge (nous v0.9.0)

Evaluator that uses an LLM to judge output quality.
This evaluator is useful when:
- Expected output is subjective
- Multiple valid answers exist
- Complex reasoning is needed to evaluate the output
## Expected Format

```elixir
%{
  criteria: "Is the response helpful, accurate, and well-formatted?",
  rubric: "5: Excellent, 4: Good, 3: Average, 2: Poor, 1: Very Poor"
}
```

## Configuration
- `:judge_model` - Model to use for judging (default: from suite or config)
- `:criteria` - Evaluation criteria
- `:rubric` - Scoring rubric
- `:pass_threshold` - Minimum score to pass (default: 0.6)
- `:reference_answer` - Optional reference answer for comparison
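To see how `:pass_threshold` relates to a 1-5 rubric, here is a minimal sketch assuming the judge's raw rubric score is normalized to the 0.0-1.0 range before comparison (an assumption; the docs above do not state the normalization, and `normalize` below is a hypothetical helper, not part of the library):

```elixir
# Hypothetical normalization sketch -- not the library's actual implementation.
# Maps a raw 1..5 rubric score onto 0.0..1.0 for comparison with the threshold.
normalize = fn raw -> raw / 5 end

normalize.(3) >= 0.6  # 3/5 = 0.6, so a "3: Average" just meets the default
normalize.(2) >= 0.6  # 2/5 = 0.4, so a "2: Poor" falls below it
```

Under this assumption, the default threshold of 0.6 means a mid-rubric score is the minimum passing grade; raising `:pass_threshold` to 0.8 would require a 4 or better.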
## Examples

```elixir
TestCase.new(
  id: "quality_check",
  input: "Explain recursion",
  expected: %{
    criteria: "Is the explanation clear, accurate, and uses good examples?",
    rubric: "5: Excellent explanation with clear examples, 3: Adequate, 1: Poor"
  },
  eval_type: :llm_judge,
  eval_config: %{
    judge_model: "lmstudio:ministral-3-14b-reasoning",
    pass_threshold: 0.6
  }
)
```