# Nous.Eval.Evaluators.LLMJudge (nous v0.9.0)

Evaluator that uses an LLM to judge output quality.
This evaluator is useful when:
- Expected output is subjective
- Multiple valid answers exist
- Complex reasoning is needed to evaluate the output
## Expected Format

```elixir
%{
  criteria: "Is the response helpful, accurate, and well-formatted?",
  rubric: "5: Excellent, 4: Good, 3: Average, 2: Poor, 1: Very Poor"
}
```

## Configuration
- `:judge_model` - Model to use for judging (default: from suite or config)
- `:criteria` - Evaluation criteria
- `:rubric` - Scoring rubric
- `:pass_threshold` - Minimum score to pass (default: 0.6)
- `:reference_answer` - Optional reference answer for comparison
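To see how `:pass_threshold` relates to a 1-5 rubric, here is a minimal sketch assuming the judge's raw rubric score is normalized to the 0.0-1.0 range before comparison (an assumption; the docs above do not state the normalization, and `normalize` below is a hypothetical helper, not part of the library):

```elixir
# Hypothetical normalization sketch -- not the library's actual implementation.
# Maps a raw 1..5 rubric score onto 0.0..1.0 for comparison with the threshold.
normalize = fn raw -> raw / 5 end

normalize.(3) >= 0.6  # 3/5 = 0.6, so a "3: Average" just meets the default
normalize.(2) >= 0.6  # 2/5 = 0.4, so a "2: Poor" falls below it
```

Under this assumption, the default threshold of 0.6 means a mid-rubric score is the minimum passing grade; raising `:pass_threshold` to 0.8 would require a 4 or better.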
## Examples

```elixir
TestCase.new(
  id: "quality_check",
  input: "Explain recursion",
  expected: %{
    criteria: "Is the explanation clear, accurate, and uses good examples?",
    rubric: "5: Excellent explanation with clear examples, 3: Adequate, 1: Poor"
  },
  eval_type: :llm_judge,
  eval_config: %{
    judge_model: "lmstudio:ministral-3-14b-reasoning",
    pass_threshold: 0.6
  }
)
```