Nous.Eval.Evaluators.LLMJudge (nous v0.13.3)

Evaluator that uses an LLM to judge output quality.

This evaluator is useful when:

  • Expected output is subjective
  • Multiple valid answers exist
  • Evaluating the output requires complex reasoning

Expected Format

%{
  criteria: "Is the response helpful, accurate, and well-formatted?",
  rubric: "5: Excellent, 4: Good, 3: Average, 2: Poor, 1: Very Poor"
}

Configuration

  • :judge_model - Model to use for judging (default: from suite or config)
  • :criteria - Evaluation criteria
  • :rubric - Scoring rubric
  • :pass_threshold - Minimum score to pass (default: 0.6)
  • :reference_answer - Optional reference answer for comparison

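The options above can be combined in a single `eval_config` map. A minimal sketch using only the documented keys (the `:reference_answer` value is illustrative, not taken from the docs):

```elixir
# All keys are optional; unset keys fall back to the suite or
# application-config defaults noted above.
eval_config = %{
  judge_model: "lmstudio:ministral-3-14b-reasoning",
  criteria: "Is the response helpful, accurate, and well-formatted?",
  rubric: "5: Excellent, 4: Good, 3: Average, 2: Poor, 1: Very Poor",
  pass_threshold: 0.6,
  reference_answer: "A concise, correct answer to compare against."
}
```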
Examples

TestCase.new(
  id: "quality_check",
  input: "Explain recursion",
  expected: %{
    criteria: "Is the explanation clear, accurate, and uses good examples?",
    rubric: "5: Excellent explanation with clear examples, 3: Adequate, 1: Poor"
  },
  eval_type: :llm_judge,
  eval_config: %{
    judge_model: "lmstudio:ministral-3-14b-reasoning",
    pass_threshold: 0.6
  }
)
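Assuming the judge's 1–5 rubric score is normalized to the 0–1 range before the threshold comparison (an assumption; the docs do not state the normalization), a `pass_threshold` of 0.6 corresponds to a raw rubric score of 3:

```elixir
# Hypothetical normalization: raw rubric score (1..5) mapped onto 0..1.
raw_score = 3
normalized = raw_score / 5
passed? = normalized >= 0.6
# raw_score of 3 normalizes to 0.6, exactly meeting the default threshold
```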