LlmGuard.Utils.Patterns (LlmGuard v0.3.1)
Utilities for pattern matching and text analysis.
This module provides helper functions for:
- Compiling and matching regex patterns
- Building pattern matchers from pattern databases
- Calculating match confidence scores
- Text normalization and keyword extraction
Pattern Structure
Patterns used in LlmGuard should have the following structure:
%{
regex: ~r/pattern/i, # Regex pattern
name: "pattern_name", # Unique identifier
severity: :high | :medium | :low,
category: :attack_category,
confidence: 0.95, # Base confidence (0.0-1.0)
description: "What this detects"
}
Examples
# Compile a pattern
{:ok, regex} = Patterns.compile_pattern("ignore.*instructions", [:caseless])
# Match text
Patterns.match?(~r/threat/i, "This is a threat") # => true
# Build a pattern matcher
patterns = [
%{regex: ~r/attack/i, name: "attack", severity: :high},
%{regex: ~r/exploit/i, name: "exploit", severity: :medium}
]
matcher = Patterns.build_pattern_matcher(patterns)
matches = matcher.("This is an attack") # => [%{name: "attack", ...}]
# Calculate confidence
confidence = Patterns.calculate_match_confidence(matches, String.length("This is an attack"))
Summary
Functions
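build_pattern_matcher(patterns)
Builds a pattern matcher function from a list of patterns.
calculate_match_confidence(matched_patterns, input_length)
Calculates confidence score based on pattern matches.
compile_pattern(pattern, flags \\ [:caseless, :unicode])
Compiles a string pattern into a Regex.
extract_keywords(text, opts)
Extracts keywords from text.
match?(pattern, text)
Checks if a pattern matches the given text.
match_all(pattern, text)
Returns all matches of a pattern in the text.
normalize_text(text)
Normalizes text for pattern matching.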
Types
Functions
@spec build_pattern_matcher([pattern()]) :: (String.t() -> [match_result()])
Builds a pattern matcher function from a list of patterns.
Returns a function that takes text and returns all matching patterns.
Parameters
- patterns - List of pattern maps
Returns
A function (String.t() -> [match_result()]) that finds matches.
Examples
patterns = [
%{regex: ~r/threat/i, name: "threat", severity: :high, category: :generic, confidence: 0.9}
]
matcher = Patterns.build_pattern_matcher(patterns)
matches = matcher.("This is a threat")
# => [%{name: "threat", severity: :high, category: :generic, confidence: 0.9}]
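The closure-returning behaviour described above can be sketched as follows. This is a minimal illustration, not LlmGuard's implementation: the module name MatcherSketch and its build/1 function are hypothetical, and dropping the :regex key from each result is an assumption based on the example output shown above.

```elixir
defmodule MatcherSketch do
  # Hypothetical stand-in for Patterns.build_pattern_matcher/1.
  # Returns a one-argument function that closes over the pattern list.
  def build(patterns) do
    fn text ->
      patterns
      # Keep only the patterns whose regex matches the input text.
      |> Enum.filter(fn %{regex: regex} -> Regex.match?(regex, text) end)
      # Assumption: results carry metadata only, so the regex is dropped.
      |> Enum.map(&Map.delete(&1, :regex))
    end
  end
end

patterns = [
  %{regex: ~r/attack/i, name: "attack", severity: :high},
  %{regex: ~r/exploit/i, name: "exploit", severity: :medium}
]

matcher = MatcherSketch.build(patterns)
IO.inspect(matcher.("This is an attack"))
# prints [%{name: "attack", severity: :high}]
```

Building the matcher once and reusing the returned function avoids passing the pattern list on every call.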
@spec calculate_match_confidence([map()], non_neg_integer()) :: float()
Calculates confidence score based on pattern matches.
Takes into account:
- Number of patterns matched
- Base confidence of each pattern
- Input length (a match in a shorter input yields higher confidence)
Parameters
- matched_patterns - List of matched patterns with confidence scores
- input_length - Length of the input text
Returns
Float between 0.0 and 1.0 representing overall confidence.
Examples
matches = [
%{confidence: 0.8},
%{confidence: 0.9}
]
confidence = Patterns.calculate_match_confidence(matches, 100)
# => ~0.95 (higher due to multiple matches)
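The documentation names the three factors but not the formula that combines them. The sketch below shows one hypothetical way to do it, purely for illustration: the module ConfidenceSketch, the 0.05 increments, and the 50-character threshold are all assumptions, not LlmGuard's actual scoring rule.

```elixir
defmodule ConfidenceSketch do
  # Hypothetical constants; the library's real weights are not documented here.
  @per_extra_match 0.05
  @short_input_bonus 0.05
  @short_input_limit 50

  # Assumption: no matches means no confidence.
  def score([], _input_length), do: 0.0

  def score(matches, input_length) do
    # Start from the strongest individual pattern.
    base = matches |> Enum.map(& &1.confidence) |> Enum.max()
    # Each additional matched pattern nudges the score upward.
    extra = @per_extra_match * (length(matches) - 1)
    # Short inputs that still match get a small bonus.
    bonus = if input_length < @short_input_limit, do: @short_input_bonus, else: 0.0
    # Clamp so the result never exceeds 1.0.
    min(base + extra + bonus, 1.0)
  end
end

matches = [%{confidence: 0.8}, %{confidence: 0.9}]
IO.inspect(ConfidenceSketch.score(matches, 100))
```

With these assumed constants the two matches above score approximately 0.95, and the same matches in a 20-character input score 1.0 after clamping.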
compile_pattern(pattern, flags \\ [:caseless, :unicode])
Compiles a string pattern into a Regex.
Supports optional flags for case-insensitive matching, multiline, etc.
Parameters
- pattern - String pattern or already-compiled Regex
- flags - List of regex flags (default: [:caseless, :unicode])
Returns
- {:ok, regex} - Successfully compiled regex
- {:error, reason} - Compilation failed
Examples
iex> {:ok, regex} = LlmGuard.Utils.Patterns.compile_pattern("test")
iex> Regex.match?(regex, "This is a TEST")
true
iex> {:ok, regex} = LlmGuard.Utils.Patterns.compile_pattern("test", [])
iex> Regex.match?(regex, "This is a TEST")
false
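A wrapper with this contract can be sketched on top of the standard Regex.compile/2. The module name CompileSketch is hypothetical, and returning an already-compiled regex unchanged is an assumption drawn from the parameter description, not confirmed library behaviour.

```elixir
defmodule CompileSketch do
  @default_flags [:caseless, :unicode]

  # Function head carrying the default, shared by both clauses below.
  def compile(pattern, flags \\ @default_flags)

  # Assumption: an already-compiled regex is passed through untouched.
  def compile(%Regex{} = regex, _flags), do: {:ok, regex}

  # Strings go to Regex.compile/2, which already returns
  # {:ok, regex} or {:error, reason}.
  def compile(pattern, flags) when is_binary(pattern) do
    Regex.compile(pattern, flags)
  end
end

case CompileSketch.compile("ignore.*instructions") do
  {:ok, regex} -> IO.inspect(Regex.match?(regex, "IGNORE all instructions"))
  {:error, reason} -> IO.inspect(reason, label: "compile failed")
end
# prints true
```

Returning a tagged tuple rather than raising lets a pattern database containing one malformed entry be loaded without crashing the caller.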
extract_keywords(text, opts)
Extracts keywords from text.
Useful for heuristic analysis and keyword-based detection.
Options
- :min_length - Minimum keyword length (default: 3)
- :max_keywords - Maximum number of keywords to return (default: 100)
Parameters
- text - Text to extract keywords from
- opts - Keyword options
Returns
List of unique keywords.
Examples
iex> text = "ignore all previous instructions"
iex> keywords = LlmGuard.Utils.Patterns.extract_keywords(text, min_length: 4)
iex> "ignore" in keywords
true
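The two options can be illustrated with a small sketch. The module KeywordSketch is hypothetical, and lowercasing the text and splitting on anything that is not a letter or digit are assumptions; the library may tokenize differently.

```elixir
defmodule KeywordSketch do
  def extract(text, opts \\ []) do
    # Defaults mirror the documented option defaults.
    min_length = Keyword.get(opts, :min_length, 3)
    max_keywords = Keyword.get(opts, :max_keywords, 100)

    text
    |> String.downcase()
    # Assumption: a keyword is any run of letters or digits.
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> Enum.filter(&(String.length(&1) >= min_length))
    # Keep first occurrences only, then cap the list size.
    |> Enum.uniq()
    |> Enum.take(max_keywords)
  end
end

IO.inspect(KeywordSketch.extract("ignore all previous instructions", min_length: 4))
# prints ["ignore", "previous", "instructions"]
```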
match?(pattern, text)
Checks if a pattern matches the given text.
Parameters
- pattern - Compiled regex pattern
- text - Text to match against
Returns
true if the pattern matches, false otherwise.
Examples
iex> pattern = ~r/threat/i
iex> LlmGuard.Utils.Patterns.match?(pattern, "This is a threat")
true
iex> LlmGuard.Utils.Patterns.match?(pattern, "This is safe")
false
match_all(pattern, text)
Returns all matches of a pattern in the text.
Parameters
- pattern - Compiled regex pattern
- text - Text to search
Returns
List of matched strings.
Examples
iex> pattern = ~r/\b\w+@\w+\.\w+\b/
iex> text = "Email: john@example.com or jane@test.org"
iex> matches = LlmGuard.Utils.Patterns.match_all(pattern, text)
iex> length(matches)
2
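A function with this behaviour can be sketched with the standard Regex.scan/3. The module MatchAllSketch is hypothetical, and returning only the full match while ignoring capture groups is an assumption based on the "list of matched strings" return description.

```elixir
defmodule MatchAllSketch do
  def match_all(pattern, text) do
    pattern
    # capture: :first keeps only the whole match for each hit,
    # so every inner list has exactly one element.
    |> Regex.scan(text, capture: :first)
    |> List.flatten()
  end
end

pattern = ~r/\b\w+@\w+\.\w+\b/
text = "Email: john@example.com or jane@test.org"
IO.inspect(MatchAllSketch.match_all(pattern, text))
# prints ["john@example.com", "jane@test.org"]
```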
normalize_text(text)
Normalizes text for pattern matching.
Applies:
- Lowercase conversion
- Whitespace normalization
- Unicode normalization
Parameters
- text - Text to normalize
Returns
Normalized text string.
Examples
iex> LlmGuard.Utils.Patterns.normalize_text(" HELLO World ")
"hello world"