Pattern-based detection and redaction of personally-identifiable information.
Useful as a sanitisation step before logging text, before sending user input to a third-party service, or before training/eval on a corpus that may contain accidental PII. The detectors are pure regex (with a Luhn check on credit cards) — fast, deterministic, and small enough to inspect.
Pattern coverage is conservative: false positives are minimised
at the cost of missing unusual formats. For broader recall on
names, addresses, and other open-class entities, combine this
with Text.NER (which uses a Bumblebee model).
Detected types
| Type | What it catches |
|---|---|
| `:email` | RFC-5322-ish email addresses. |
| `:phone` | International E.164 (`+1234567890`) and common US/EU dashed or parenthesized forms with 7+ digits. |
| `:credit_card` | 13–19 digit sequences that pass the Luhn check. |
| `:iban` | IBAN format (country code + 2 check digits + up to 30 alphanumerics). |
| `:ssn` | US Social Security numbers in the form `NNN-NN-NNNN`. |
| `:ipv4` | Dotted-quad IP addresses with octets 0–255. |
| `:ipv6` | IPv6 addresses (full and compressed forms). |
| `:url` | `http(s)://` URLs. |
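The Luhn check mentioned for `:credit_card` can be sketched as follows. This is a minimal illustration of the checksum, not the module's actual code: the `LuhnSketch` module and its `valid?/1` function are hypothetical names, and stripping spaces and dashes before the check is an assumption.

```elixir
defmodule LuhnSketch do
  # Hypothetical helper, not part of Text.PII.
  # Returns true when `candidate` holds 13 to 19 digits (spaces and dashes
  # ignored) that satisfy the Luhn checksum.
  def valid?(candidate) when is_binary(candidate) do
    stripped = String.replace(candidate, [" ", "-"], "")

    if Regex.match?(~r/\A\d{13,19}\z/, stripped) do
      checksum =
        stripped
        |> String.to_charlist()
        |> Enum.reverse()
        |> Enum.map(fn char -> char - ?0 end)
        |> Enum.with_index()
        |> Enum.map(fn {digit, index} -> luhn_digit(digit, index) end)
        |> Enum.sum()

      rem(checksum, 10) == 0
    else
      false
    end
  end

  # Every second digit, counting from the right, is doubled; a doubled
  # value above 9 has 9 subtracted (equivalent to summing its two digits).
  defp luhn_digit(digit, index) when rem(index, 2) == 1 do
    doubled = digit * 2
    if doubled > 9, do: doubled - 9, else: doubled
  end

  defp luhn_digit(digit, _index), do: digit
end

LuhnSketch.valid?("4111 1111 1111 1111")
# => true
LuhnSketch.valid?("4111 1111 1111 1112")
# => false
```

Filtering on the checksum is what keeps arbitrary long digit runs, such as order numbers, from being reported as card numbers.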
Summary
Functions
- `detect/2`: Detects PII matches in the text.
- `redact/2`: Replaces every detected PII match with a redaction placeholder.
- `types/0`: Returns the list of detector type atoms supported by this module.
Functions
@spec detect(String.t(), keyword()) :: [%{type: atom(), value: String.t(), start: non_neg_integer(), length: pos_integer()}]
Detects PII matches in the text.
Arguments
- `text` is the input string.
Options
- `:types` is the list of detector types to run. Default is all types from `types/0`. Pass `[:email, :phone]` to limit detection.
Returns
- A list of maps `%{type: atom, value: String.t(), start: integer, length: integer}` sorted by `:start`. The `:start` is a byte offset, so it pairs with `binary_part/3`; `String.slice/3` counts graphemes and agrees with it only for ASCII-only text. Credit-card matches are filtered to only those that pass the Luhn check.
Examples
iex> [m] = Text.PII.detect("contact me at alice@example.com please")
iex> {m.type, m.value}
{:email, "alice@example.com"}
iex> Text.PII.detect("nothing here")
[]
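Because `:start` is documented as a byte offset, the difference between byte-based and grapheme-based slicing matters once the text contains multi-byte characters. The sketch below uses a hand-built match map rather than a real `detect/2` call; the `OffsetSketch` module is hypothetical, and treating `:length` as a byte length is an assumption.

```elixir
defmodule OffsetSketch do
  # Hypothetical helper, not part of Text.PII.
  # Extracts the matched text using a byte offset and an (assumed) byte
  # length, as carried by a match map of the shape detect/2 returns.
  def extract(text, %{start: start, length: length}) do
    binary_part(text, start, length)
  end
end

text = "café alice@example.com"

# Hand-built match for illustration. "é" occupies 2 bytes in UTF-8, so the
# email begins at byte 6 even though it begins at grapheme 5.
match = %{type: :email, value: "alice@example.com", start: 6, length: 17}

OffsetSketch.extract(text, match)
# => "alice@example.com"

# Grapheme-based slicing with the same numbers drifts by one character.
String.slice(text, match.start, match.length)
# => "lice@example.com"
```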
Replaces every detected PII match with a redaction placeholder.
Arguments
- `text` is the input string.
Options
- `:types` is the same as for `detect/2`.
- `:placeholder` is either a string (used for every match) or a function `(type :: atom -> String.t())` returning the placeholder for each match type. The default is `fn type -> "[" <> String.upcase(to_string(type)) <> "]" end`.
Returns
- The text with every detected match replaced by the configured placeholder. If matches overlap, the earlier-starting match wins.
Examples
iex> Text.PII.redact("email me at alice@example.com")
"email me at [EMAIL]"
iex> Text.PII.redact("phone +1-555-123-4567 email alice@x.io",
...> placeholder: fn _ -> "***" end)
"phone *** email ***"
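The replacement and overlap rule described under Returns can be sketched as a single left-to-right pass over sorted matches. This is one possible way to get that behaviour, not necessarily how the module does it: `RedactSketch` and `apply_matches/3` are hypothetical names, and the matches are assumed to carry byte offsets and byte lengths.

```elixir
defmodule RedactSketch do
  # Hypothetical helper, not part of Text.PII.
  # Replaces each match span in `text` with a placeholder. `placeholder` is
  # either a string or a one-argument function receiving the match type.
  def apply_matches(text, matches), do: apply_matches(text, matches, &default_placeholder/1)

  def apply_matches(text, matches, placeholder) do
    {chunks, cursor} =
      matches
      |> Enum.sort_by(fn match -> match.start end)
      |> Enum.reduce({[], 0}, fn match, {acc, cursor} ->
        if match.start < cursor do
          # Overlaps a match that started earlier and was kept: skip it.
          {acc, cursor}
        else
          untouched = binary_part(text, cursor, match.start - cursor)
          replacement = render(placeholder, match.type)
          {[replacement, untouched | acc], match.start + match.length}
        end
      end)

    tail = binary_part(text, cursor, byte_size(text) - cursor)
    IO.iodata_to_binary(Enum.reverse([tail | chunks]))
  end

  defp render(placeholder, _type) when is_binary(placeholder), do: placeholder
  defp render(placeholder, type) when is_function(placeholder, 1), do: placeholder.(type)

  # Mirrors the documented default: :email becomes "[EMAIL]".
  defp default_placeholder(type), do: "[" <> String.upcase(to_string(type)) <> "]"
end

# Two hand-built, overlapping matches: the one starting at byte 2 wins.
overlapping = [
  %{type: :email, start: 4, length: 4},
  %{type: :url, start: 2, length: 5}
]

RedactSketch.apply_matches("abcdefghij", overlapping)
# => "ab[URL]hij"
RedactSketch.apply_matches("abcdefghij", overlapping, "***")
# => "ab***hij"
```

Working from a cursor keeps every untouched byte exactly once, so the output never duplicates or drops text around a skipped overlap.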
@spec types() :: [atom()]
Returns the list of detector type atoms supported by this module.