# Tribunal v1.3.6 - Table of Contents

LLM evaluation framework for Elixir

## Pages

- [Tribunal ⚖️](readme.md)
- Guides
  - [Getting Started](getting-started.md)
  - [Test Mode vs Evaluation Mode](evaluation-modes.md)
  - [ExUnit Integration](exunit-integration.md)
  - [Assertions Reference](assertions.md)
  - [LLM-as-Judge](llm-as-judge.md)
  - [Datasets](datasets.md)
  - [Red Team Testing](red-team-testing.md)
  - [Reporters](reporters.md)
  - [GitHub Actions Integration](github-actions.md)

## Modules

- [Tribunal.EvalCase.Assertions](Tribunal.EvalCase.Assertions.md): ExUnit-style assertion macros for LLM evaluation.
- [Tribunal.Judge](Tribunal.Judge.md): Behaviour for LLM-as-judge assertions.
- [Tribunal.Judges.Bias](Tribunal.Judges.Bias.md): Detects stereotypes and prejudice in LLM outputs.
- [Tribunal.Judges.Correctness](Tribunal.Judges.Correctness.md): Compares LLM output against an expected answer.
- [Tribunal.Judges.Faithful](Tribunal.Judges.Faithful.md): Evaluates whether LLM output is grounded in provided context.
- [Tribunal.Judges.Hallucination](Tribunal.Judges.Hallucination.md): Detects claims not supported by the provided context.
- [Tribunal.Judges.Harmful](Tribunal.Judges.Harmful.md): Detects dangerous or harmful content in LLM outputs.
- [Tribunal.Judges.Jailbreak](Tribunal.Judges.Jailbreak.md): Detects when an LLM has been manipulated to bypass safety measures.
- [Tribunal.Judges.PII](Tribunal.Judges.PII.md): Detects Personally Identifiable Information (PII) in LLM outputs.
- [Tribunal.Judges.Refusal](Tribunal.Judges.Refusal.md): Detects when an LLM appropriately refuses to comply with a request.
- [Tribunal.Judges.Relevant](Tribunal.Judges.Relevant.md): Evaluates whether LLM output is relevant to the input query.
- [Tribunal.Judges.Toxicity](Tribunal.Judges.Toxicity.md): Detects hostile, abusive, or toxic content in LLM outputs.
- Core
  - [Tribunal](Tribunal.md): LLM evaluation framework for Elixir.
  - [Tribunal.Assertions](Tribunal.Assertions.md): Assertion evaluation engine.
  - [Tribunal.TestCase](Tribunal.TestCase.md): Represents a single evaluation test case.
- Assertion Types
  - [Tribunal.Assertions.Deterministic](Tribunal.Assertions.Deterministic.md): Deterministic assertions that don't require LLM calls.
  - [Tribunal.Assertions.Embedding](Tribunal.Assertions.Embedding.md): Embedding-based semantic similarity assertions.
  - [Tribunal.Assertions.Judge](Tribunal.Assertions.Judge.md): LLM-as-judge assertions for evaluating LLM outputs.
- Testing
  - [Tribunal.Dataset](Tribunal.Dataset.md): Loads evaluation datasets from JSON or YAML files.
  - [Tribunal.EvalCase](Tribunal.EvalCase.md): ExUnit integration for LLM evaluations.
  - [Tribunal.RedTeam](Tribunal.RedTeam.md): Red team attack generators for testing LLM safety.
- Reporters
  - [Tribunal.Reporter](Tribunal.Reporter.md): Behaviour for eval result reporters.
  - [Tribunal.Reporter.Console](Tribunal.Reporter.Console.md): Pretty console output for eval results.
  - [Tribunal.Reporter.GitHub](Tribunal.Reporter.GitHub.md): GitHub Actions annotations format.
  - [Tribunal.Reporter.HTML](Tribunal.Reporter.HTML.md): HTML report for shareable results.
  - [Tribunal.Reporter.JSON](Tribunal.Reporter.JSON.md): JSON output for CI/machine consumption.
  - [Tribunal.Reporter.JUnit](Tribunal.Reporter.JUnit.md): JUnit XML format for CI tools.
  - [Tribunal.Reporter.Text](Tribunal.Reporter.Text.md): Plain ASCII text output (no unicode).

## Mix Tasks

- [mix tribunal.eval](Mix.Tasks.Tribunal.Eval.md): Runs LLM evaluations from dataset files.
- [mix tribunal.init](Mix.Tasks.Tribunal.Init.md): Creates the eval directory structure with example files.