LlmGuard.Detectors.PromptInjection (LlmGuard v0.3.1)

Prompt injection detector (Layer 1: Pattern Matching).

Detects attempts to manipulate LLM behavior through malicious prompts, including:

  • Instruction override attacks
  • System prompt extraction
  • Delimiter injection
  • Mode switching
  • Role manipulation

Performance

  • Latency: <2ms (P95)
  • Precision: ~98%
  • Recall: ~60% (Layer 1 only, improves with additional layers)

Detection Categories

  • :instruction_override - Attempts to override previous instructions
  • :system_extraction - Attempts to extract system prompts
  • :delimiter_injection - Injection using delimiters or special tokens
  • :mode_switching - Attempts to switch AI mode (debug, admin, etc.)
  • :role_manipulation - Attempts to manipulate AI role or persona
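To illustrate how a Layer 1 pattern matcher can map categories like these to regular expressions, here is a minimal, self-contained sketch. The module name, the specific patterns, and the simplified return shapes are illustrative assumptions only, not the library's actual implementation (the real return shape, with confidence and patterns_matched, appears in the Examples below):

```elixir
defmodule PatternSketch do
  # Hypothetical sketch: each category pairs with a case-insensitive regex.
  # A production Layer 1 would carry many more patterns per category.
  @patterns [
    {:instruction_override, ~r/ignore (all )?previous instructions/i},
    {:system_extraction, ~r/(reveal|show|print) (your )?system prompt/i},
    {:mode_switching, ~r/enable (debug|admin|developer) mode/i}
  ]

  # Returns {:detected, %{category: ...}} on the first matching pattern,
  # or {:safe, %{patterns_checked: n}} when nothing matches.
  def detect(input) do
    case Enum.find(@patterns, fn {_category, regex} -> Regex.match?(regex, input) end) do
      {category, _regex} -> {:detected, %{category: category}}
      nil -> {:safe, %{patterns_checked: length(@patterns)}}
    end
  end
end
```

Because every check is a precompiled regex scan, this style of detector stays fast (sub-millisecond per pattern list) but catches only phrasings it has patterns for, which is why Layer 1 recall is limited and later layers are needed.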

Examples

iex> LlmGuard.Detectors.PromptInjection.detect("Ignore all previous instructions", [])
{:detected, %{
  confidence: 0.95,
  category: :instruction_override,
  patterns_matched: ["ignore_previous_instructions"],
  metadata: %{}
}}

iex> LlmGuard.Detectors.PromptInjection.detect("What's the weather?", [])
{:safe, %{patterns_checked: 50}}