LlmGuard.Stage (LlmGuard v0.3.1)


Pipeline stage for LLM security guardrails.

This module integrates LlmGuard with CrucibleIR pipelines by providing a stage implementation that validates inputs and outputs according to CrucibleIR.Reliability.Guardrail configuration.

Context Requirements

The stage expects the following in the context:

  • experiment.reliability.guardrails - Guardrail configuration
  • inputs or outputs - Content to validate

Configuration

The stage reads guardrail configuration from the experiment and converts it to LlmGuard configuration. Supported guardrail options:

  • prompt_injection_detection - Enable prompt injection detection
  • jailbreak_detection - Enable jailbreak detection
  • pii_detection - Enable PII detection
  • pii_redaction - Enable PII redaction (implies pii_detection)
  • content_moderation - Enable content moderation
  • fail_on_detection - Return an error on threat detection (instead of a warning)

Usage

# Create a guardrail configuration
guardrail = %CrucibleIR.Reliability.Guardrail{
  profiles: [:default],
  prompt_injection_detection: true,
  jailbreak_detection: true,
  pii_detection: true,
  pii_redaction: false,
  fail_on_detection: true
}

# Add to experiment context
context = %{
  experiment: %{
    reliability: %{
      guardrails: guardrail
    }
  },
  inputs: "User input to validate"
}

# Run the stage
{:ok, updated_context} = LlmGuard.Stage.run(context)

# Check results
results = updated_context.guardrails
# => %{
#   status: :safe | :detected | :error,
#   validated_inputs: [...],
#   detections: [...],
#   ...
# }

Results

The stage adds a :guardrails key to the context with validation results:

  • status - Overall status (:safe, :detected, :error)
  • validated_inputs or validated_outputs - Sanitized content
  • detections - List of detected threats (if any)
  • errors - List of errors (if any)
  • config - LlmGuard config used for validation

Error Handling

If fail_on_detection is true, the stage returns {:error, reason} when threats are detected. Otherwise, it returns {:ok, context} with detection details in context.guardrails.
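Callers can branch on the stage result to handle both modes; a minimal sketch (the detection payload shape is illustrative, per the Results section):

case LlmGuard.Stage.run(context) do
  {:ok, %{guardrails: %{status: :safe}} = ctx} ->
    # No threats detected; continue the pipeline with the validated content.
    {:ok, ctx}

  {:ok, %{guardrails: %{status: :detected, detections: detections}} = ctx} ->
    # fail_on_detection: false - detections are reported, but the pipeline continues.
    require Logger
    Logger.warning("Guardrail detections: #{inspect(detections)}")
    {:ok, ctx}

  {:error, reason} ->
    # fail_on_detection: true - a detected threat (or a validation error) halts the stage.
    {:error, reason}
end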

Summary

Functions

  • describe(opts \\ %{}) - Describes the stage for pipeline introspection.

  • from_ir_config(guardrail) - Converts CrucibleIR Guardrail configuration to LlmGuard Config.

  • run(context, opts \\ %{}) - Runs security checks on inputs or outputs.

Types

context()

@type context() :: map()

stage_opts()

@type stage_opts() :: map()

stage_result()

@type stage_result() :: {:ok, context()} | {:error, term()}

Functions

describe(opts \\ %{})

@spec describe(stage_opts()) :: map()

Describes the stage for pipeline introspection.

Returns a description of what this stage does and its configuration.

Parameters

  • opts - Stage options (currently unused)

Returns

A map describing the stage.

Examples

iex> LlmGuard.Stage.describe()
%{
  name: "LlmGuard Security Stage",
  description: "Validates inputs/outputs for security threats",
  type: :security
}

from_ir_config(guardrail)

@spec from_ir_config(struct()) :: LlmGuard.Config.t()

Converts CrucibleIR Guardrail configuration to LlmGuard Config.

Maps CrucibleIR guardrail settings to LlmGuard's configuration format.

Parameters

  • guardrail - CrucibleIR.Reliability.Guardrail struct

Returns

LlmGuard.Config struct

Examples

iex> guardrail = %CrucibleIR.Reliability.Guardrail{
...>   prompt_injection_detection: true,
...>   pii_detection: true
...> }
iex> config = LlmGuard.Stage.from_ir_config(guardrail)
iex> config.prompt_injection_detection
true

run(context, opts \\ %{})

@spec run(context(), stage_opts()) :: stage_result()

Runs security checks on inputs or outputs.

Expects context with:

  • experiment.reliability.guardrails - Guardrail configuration
  • inputs or outputs - Content to validate

Returns updated context with :guardrails results, or an error if fail_on_detection is enabled and threats are detected.

Parameters

  • context - Pipeline context map
  • opts - Stage options (currently unused, for future extensibility)

Returns

  • {:ok, updated_context} - Validation completed, results in context.guardrails
  • {:error, reason} - Validation error or threat detected (if fail_on_detection: true)

Examples

iex> guardrail = %CrucibleIR.Reliability.Guardrail{
...>   prompt_injection_detection: true,
...>   fail_on_detection: false
...> }
iex> context = %{
...>   experiment: %{reliability: %{guardrails: guardrail}},
...>   inputs: "Safe message"
...> }
iex> {:ok, result} = LlmGuard.Stage.run(context)
iex> result.guardrails.status
:safe
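
The stage validates outputs the same way when the context carries an outputs key instead of inputs; a hedged sketch (the result shape follows the Results section above):

guardrail = %CrucibleIR.Reliability.Guardrail{
  pii_detection: true,
  pii_redaction: true,
  fail_on_detection: false
}

context = %{
  experiment: %{reliability: %{guardrails: guardrail}},
  outputs: "Model response to validate"
}

{:ok, result} = LlmGuard.Stage.run(context)
# Sanitized content lands under :validated_outputs when outputs were checked.
result.guardrails.validated_outputs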