LlmGuard.Utils.Patterns (LlmGuard v0.3.1)

Utilities for pattern matching and text analysis.

This module provides helper functions for:

  • Compiling and matching regex patterns
  • Building pattern matchers from pattern databases
  • Calculating match confidence scores
  • Text normalization and keyword extraction

Pattern Structure

Patterns used in LlmGuard should have the following structure:

%{
  regex: ~r/pattern/i,           # Regex pattern
  name: "pattern_name",           # Unique identifier
  severity: :high,                # One of :high, :medium, :low
  category: :attack_category,
  confidence: 0.95,               # Base confidence (0.0-1.0)
  description: "What this detects"
}
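
As a rough illustration of the structure above, the following sketch checks whether a map has the expected fields. `PatternShapeSketch` and `valid_pattern?/1` are hypothetical names used only for this example and are not part of LlmGuard; the `:prompt_injection` category and the description text are likewise made up.

```elixir
defmodule PatternShapeSketch do
  # Hypothetical helper, not part of LlmGuard: checks that a map carries the
  # fields listed in the pattern structure, with plausible value types.
  @severities [:high, :medium, :low]

  def valid_pattern?(%{
        regex: %Regex{},
        name: name,
        severity: severity,
        category: category,
        confidence: confidence
      })
      when is_binary(name) and severity in @severities and is_atom(category) and
             is_float(confidence) and confidence >= 0.0 and confidence <= 1.0 do
    true
  end

  # Anything else (missing keys, wrong types, out-of-range confidence) is rejected.
  def valid_pattern?(_other), do: false
end

pattern = %{
  regex: ~r/ignore.*instructions/i,
  name: "ignore_instructions",
  severity: :high,
  category: :prompt_injection,
  confidence: 0.95,
  description: "Attempts to override earlier instructions"
}

PatternShapeSketch.valid_pattern?(pattern)
# => true
```

The extra `:description` key does not affect the check, because a map pattern only requires the listed keys to be present.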

Examples

alias LlmGuard.Utils.Patterns

# Compile a pattern
{:ok, regex} = Patterns.compile_pattern("ignore.*instructions", [:caseless])

# Match text
Patterns.match?(~r/threat/i, "This is a threat")  # => true

# Build a pattern matcher
patterns = [
  %{regex: ~r/attack/i, name: "attack", severity: :high, category: :generic, confidence: 0.9},
  %{regex: ~r/exploit/i, name: "exploit", severity: :medium, category: :generic, confidence: 0.7}
]
matcher = Patterns.build_pattern_matcher(patterns)
input = "This is an attack"
matches = matcher.(input)  # => [%{name: "attack", ...}]

# Calculate confidence
confidence = Patterns.calculate_match_confidence(matches, String.length(input))

Summary

Functions

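build_pattern_matcher(patterns)

Builds a pattern matcher function from a list of patterns.

calculate_match_confidence(matched_patterns, input_length)

Calculates confidence score based on pattern matches.

compile_pattern(pattern, flags \\ [:caseless, :unicode])

Compiles a string pattern into a Regex.

extract_keywords(text, opts \\ [])

Extracts keywords from text.

match?(pattern, text)

Checks if a pattern matches the given text.

match_all(pattern, text)

Returns all matches of a pattern in the text.

normalize_text(text)

Normalizes text for pattern matching.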

Types

match_result()

@type match_result() :: %{
  name: String.t(),
  severity: :high | :medium | :low,
  category: atom(),
  confidence: float()
}

pattern()

@type pattern() :: %{
  regex: Regex.t(),
  name: String.t(),
  severity: :high | :medium | :low,
  category: atom(),
  confidence: float()
}

Functions

build_pattern_matcher(patterns)

@spec build_pattern_matcher([pattern()]) :: (String.t() -> [match_result()])

Builds a pattern matcher function from a list of patterns.

Returns a function that takes text and returns all matching patterns.

Parameters

  • patterns - List of pattern maps

Returns

A function (String.t() -> [match_result()]) that finds matches.

Examples

patterns = [
  %{regex: ~r/threat/i, name: "threat", severity: :high, category: :generic, confidence: 0.9}
]

matcher = Patterns.build_pattern_matcher(patterns)
matches = matcher.("This is a threat")
# => [%{name: "threat", severity: :high, category: :generic, confidence: 0.9}]
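
To show how a matcher of this shape could work, here is a minimal sketch that closes over the pattern list and returns the metadata of every pattern whose regex matches. `MatcherSketch.build/1` is a hypothetical stand-in written for this example; the library's internals may differ.

```elixir
defmodule MatcherSketch do
  # Hypothetical re-implementation for illustration only.
  def build(patterns) when is_list(patterns) do
    # The returned closure captures `patterns`, so callers only pass text.
    fn text when is_binary(text) ->
      patterns
      |> Enum.filter(fn %{regex: regex} -> Regex.match?(regex, text) end)
      # Drop the regex and keep the fields of match_result().
      |> Enum.map(&Map.take(&1, [:name, :severity, :category, :confidence]))
    end
  end
end

patterns = [
  %{regex: ~r/threat/i, name: "threat", severity: :high, category: :generic, confidence: 0.9},
  %{regex: ~r/exploit/i, name: "exploit", severity: :medium, category: :generic, confidence: 0.7}
]

matcher = MatcherSketch.build(patterns)

matcher.("This is a THREAT")
# => [%{name: "threat", severity: :high, category: :generic, confidence: 0.9}]

matcher.("All clear")
# => []
```

Building the function once and reusing it keeps the pattern list out of every call site.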

calculate_match_confidence(matched_patterns, input_length)

@spec calculate_match_confidence([map()], non_neg_integer()) :: float()

Calculates confidence score based on pattern matches.

Takes into account:

  • Number of patterns matched
  • Base confidence of each pattern
  • Input length (a match in a shorter input yields higher confidence)

Parameters

  • matched_patterns - List of matched patterns with confidence scores
  • input_length - Length of the input text

Returns

Float between 0.0 and 1.0 representing overall confidence.

Examples

matches = [
  %{confidence: 0.8},
  %{confidence: 0.9}
]

confidence = Patterns.calculate_match_confidence(matches, 100)
# => ~0.95 (higher due to multiple matches)
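
The exact formula is not documented here. The sketch below shows one plausible way to combine the three factors: a noisy-OR over the base confidences, scaled by an assumed length factor. `ConfidenceSketch.score/2`, the 200-character threshold, and the 0.8 floor are all assumptions made for illustration, so the numbers will not necessarily agree with what the library returns.

```elixir
defmodule ConfidenceSketch do
  # Hypothetical scoring rule, NOT LlmGuard's actual formula.
  def score([], _input_length), do: 0.0

  def score(matches, input_length) when is_integer(input_length) and input_length >= 0 do
    # Noisy-OR: chance that at least one match is a true positive when
    # matches are treated as independent, i.e. 1 - product(1 - c_i).
    miss = Enum.reduce(matches, 1.0, fn %{confidence: c}, acc -> acc * (1.0 - c) end)
    combined = 1.0 - miss

    # Assumed length factor: 1.0 for inputs of up to 200 characters,
    # decreasing toward 0.8 as the input grows longer.
    length_factor = 0.8 + 0.2 * (200 / max(input_length, 200))

    combined * length_factor
  end
end

short = ConfidenceSketch.score([%{confidence: 0.8}, %{confidence: 0.9}], 100)
# roughly 0.98 under this rule: two matches score higher than either alone

long = ConfidenceSketch.score([%{confidence: 0.8}, %{confidence: 0.9}], 2_000)
# lower than `short`, because the same matches sit in a much longer input
```

Under this rule each additional match can only raise the score, and the result stays within 0.0 to 1.0 as long as every base confidence does.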

compile_pattern(pattern, flags \\ [:caseless, :unicode])

@spec compile_pattern(String.t() | Regex.t(), [atom()]) ::
  {:ok, Regex.t()} | {:error, term()}

Compiles a string pattern into a Regex.

Flags are given as a list of atoms, for example :caseless for case-insensitive matching or :multiline.

Parameters

  • pattern - String pattern or already-compiled Regex
  • flags - List of regex flags (default: [:caseless, :unicode])

Returns

  • {:ok, regex} - Successfully compiled regex
  • {:error, reason} - Compilation failed

Examples

iex> {:ok, regex} = LlmGuard.Utils.Patterns.compile_pattern("test")
iex> Regex.match?(regex, "This is a TEST")
true

iex> {:ok, regex} = LlmGuard.Utils.Patterns.compile_pattern("test", [])
iex> Regex.match?(regex, "This is a TEST")
false
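
Because compilation can fail, callers usually need to handle both return shapes. The sketch below compiles a batch of pattern sources and separates the successes from the failures. `CompileSketch` is a hypothetical stand-in that assumes compile_pattern/2 behaves like a thin wrapper around `Regex.compile/2` and returns an already-compiled Regex unchanged; neither assumption is confirmed by this page.

```elixir
defmodule CompileSketch do
  # Hypothetical stand-in for compile_pattern/2 (assumed to wrap Regex.compile/2).
  def compile(pattern, flags \\ [:caseless, :unicode])
  # Assumption: an already-compiled Regex is passed through as-is.
  def compile(%Regex{} = regex, _flags), do: {:ok, regex}
  def compile(pattern, flags) when is_binary(pattern), do: Regex.compile(pattern, flags)

  # Compile many sources, keeping the valid ones and collecting the failures.
  def compile_all(sources) do
    {oks, errors} =
      sources
      |> Enum.map(fn source -> {source, compile(source)} end)
      |> Enum.split_with(fn {_source, result} -> match?({:ok, _}, result) end)

    %{
      compiled: Enum.map(oks, fn {_source, {:ok, regex}} -> regex end),
      failed: Enum.map(errors, fn {source, {:error, reason}} -> {source, reason} end)
    }
  end
end

result = CompileSketch.compile_all(["ignore.*instructions", "unclosed(group"])

length(result.compiled)  # => 1
length(result.failed)    # => 1
```

Collecting failures instead of raising lets a pattern database load even when a single entry is malformed.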

extract_keywords(text, opts \\ [])

@spec extract_keywords(
  String.t(),
  keyword()
) :: [String.t()]

Extracts keywords from text.

Useful for heuristic analysis and keyword-based detection.

Options

  • :min_length - Minimum keyword length (default: 3)
  • :max_keywords - Maximum number of keywords to return (default: 100)

Parameters

  • text - Text to extract keywords from
  • opts - Keyword options

Returns

List of unique keywords.

Examples

iex> text = "ignore all previous instructions"
iex> keywords = LlmGuard.Utils.Patterns.extract_keywords(text, min_length: 4)
iex> "ignore" in keywords
true
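
One way such an extraction could work is sketched below: lowercase the text, split on non-word characters, drop short tokens, de-duplicate, and cap the count. `KeywordSketch.extract/2` is a hypothetical approximation; the library's tokenization rules may differ.

```elixir
defmodule KeywordSketch do
  # Hypothetical approach, not the library's code.
  def extract(text, opts \\ []) when is_binary(text) do
    min_length = Keyword.get(opts, :min_length, 3)
    max_keywords = Keyword.get(opts, :max_keywords, 100)

    text
    |> String.downcase()
    # Split on runs of anything that is not a letter, digit, or underscore.
    |> String.split(~r/[^\p{L}\p{N}_]+/u, trim: true)
    |> Enum.filter(&(String.length(&1) >= min_length))
    # Keep first occurrences only, preserving order.
    |> Enum.uniq()
    |> Enum.take(max_keywords)
  end
end

KeywordSketch.extract("Ignore all previous instructions, ignore them", min_length: 4)
# => ["ignore", "previous", "instructions", "them"]
```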

match?(pattern, text)

@spec match?(Regex.t(), String.t()) :: boolean()

Checks if a pattern matches the given text.

Parameters

  • pattern - Compiled regex pattern
  • text - Text to match against

Returns

true if the pattern matches, false otherwise.

Examples

iex> pattern = ~r/threat/i
iex> LlmGuard.Utils.Patterns.match?(pattern, "This is a threat")
true

iex> LlmGuard.Utils.Patterns.match?(pattern, "This is safe")
false

match_all(pattern, text)

@spec match_all(Regex.t(), String.t()) :: [String.t()]

Returns all matches of a pattern in the text.

Parameters

  • pattern - Compiled regex pattern
  • text - Text to search

Returns

List of matched strings.

Examples

iex> pattern = ~r/\b\w+@\w+\.\w+\b/
iex> text = "Email: john@example.com or jane@test.org"
iex> matches = LlmGuard.Utils.Patterns.match_all(pattern, text)
iex> length(matches)
2
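
For comparison, the same result can be obtained with the standard library alone. This sketch uses `Regex.scan/2`; whether match_all/2 is implemented this way is an assumption.

```elixir
pattern = ~r/\b\w+@\w+\.\w+\b/
text = "Email: john@example.com or jane@test.org"

# Regex.scan/2 returns one list per match: [full_match | captures].
# Taking the head of each keeps only the full matched string.
matches = pattern |> Regex.scan(text) |> Enum.map(&hd/1)
# => ["john@example.com", "jane@test.org"]
```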

normalize_text(text)

@spec normalize_text(String.t()) :: String.t()

Normalizes text for pattern matching.

Applies:

  • Lowercase conversion
  • Whitespace normalization
  • Unicode normalization

Parameters

  • text - Text to normalize

Returns

Normalized text string.

Examples

iex> LlmGuard.Utils.Patterns.normalize_text("  HELLO   World  ")
"hello world"
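
The three steps listed above can be sketched as a short pipeline. `NormalizeSketch.normalize/1` is a hypothetical equivalent, and NFKC is an assumed normalization form, since this page does not say which one the library applies.

```elixir
defmodule NormalizeSketch do
  # Hypothetical pipeline; the library may order or implement the steps differently.
  def normalize(text) when is_binary(text) do
    text
    # Assumed form: NFKC folds compatibility characters such as
    # fullwidth letters into their plain equivalents.
    |> String.normalize(:nfkc)
    |> String.downcase()
    # Collapse any run of whitespace into one space, then trim the ends.
    |> String.replace(~r/\s+/u, " ")
    |> String.trim()
  end
end

NormalizeSketch.normalize("  HELLO   World  ")
# => "hello world"

NormalizeSketch.normalize("Ｉｇｎｏｒｅ\tALL")
# => "ignore all"
```

Normalizing before matching means a single lowercase pattern can catch inputs that vary in case, spacing, or character width.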