ADR-0006: JSON Schema Mode for Structured LLM Outputs
View SourceStatus
Accepted
Date
2024-12-25
Context
DeepEvalEx metrics require structured outputs from LLMs:
- Claims extraction needs
{claims: ["claim1", "claim2"]} - Verdict generation needs
{verdicts: [{claim: "...", verdict: "yes|no", reason: "..."}]} - G-Eval needs
{score: 0.85, reason: "..."}
Without enforcement, LLMs may:
- Return malformed JSON
- Include extra fields
- Use wrong types (string instead of number)
- Wrap responses in markdown code blocks
Parsing failures cause metric evaluation to fail entirely.
Decision
Use OpenAI's JSON Schema mode with strict schema validation.
defmodule DeepEvalEx.Schemas.MetricOutputs.Faithfulness do
def claims_schema do
%{
type: "object",
properties: %{
claims: %{
type: "array",
items: %{type: "string"}
}
},
required: ["claims"],
additionalProperties: false
}
end
endThe adapter sends the schema with the API request:
def generate_with_schema(prompt, schema, opts) do
body = %{
model: model,
messages: [%{role: "user", content: prompt}],
response_format: %{
type: "json_schema",
json_schema: %{
name: "response",
strict: true,
schema: schema
}
}
}
# ...
endConsequences
Positive
- Guaranteed structure: LLM response always matches schema
- No parsing errors: JSON is always valid and typed correctly
- No hallucinated fields:
additionalProperties: falseblocks extras - Self-documenting: Schema modules define expected output format
- Reduced token waste: No "Please respond in JSON format" preamble
Negative
- Provider lock-in: JSON schema mode is OpenAI-specific (Anthropic has different approach)
- Model requirements: Only newer models support strict JSON schema
- Increased latency: Schema validation adds processing time
- Rigid outputs: Cannot include optional or variable fields
Neutral
- Schema modules live alongside metrics for easy reference
- Adapter abstraction allows different structured output approaches per provider
- Fallback to prompt-based JSON for providers without schema mode
Alternatives Considered
Prompt-based JSON ("Respond in JSON format: {...}")
- Rejected: Unreliable. LLMs frequently add markdown, explanations, or malformed JSON.
Function calling / Tool use
- Rejected: Semantically different purpose. Function calling is for actions, not structured data extraction.
Custom parsing with regex/string manipulation
- Rejected: Brittle and error-prone. JSON schema mode eliminates this category of bugs.
Pydantic-style class validation (via Instructor)
- Rejected: Adds dependency and complexity. Direct schema definition is simpler for our needs.