# `Text.PII`
[🔗](https://github.com/kipcole9/text/blob/v0.5.0/lib/pii.ex#L1)

Pattern-based detection and redaction of personally-identifiable
information.

Useful as a sanitisation step before logging text, before sending
user input to a third-party service, or before training/eval on a
corpus that may contain accidental PII. The detectors are pure
regex (with a Luhn check on credit cards), which keeps them fast,
deterministic, and small enough to inspect.

Pattern coverage is conservative: false positives are minimised
at the cost of missing unusual formats. For broader recall on
names, addresses, and other open-class entities, combine this
with `Text.NER` (which uses a Bumblebee model).

### Detected types

| Type            | What it catches                                          |
|-----------------|----------------------------------------------------------|
| `:email`        | RFC-5322-ish email addresses.                            |
| `:phone`        | International E.164 (`+1234567890`) and common US/EU dashed/parens forms with 7+ digits. |
| `:credit_card`  | 13–19 digit sequences that pass the Luhn check.          |
| `:iban`         | IBAN format (country code + 2 check digits + up to 30 alphanumerics). |
| `:ssn`          | US Social Security numbers `NNN-NN-NNNN`.                |
| `:ipv4`         | Dotted-quad IP addresses with octets 0–255.              |
| `:ipv6`         | IPv6 addresses (full and compressed forms).              |
| `:url`          | `http(s)://` URLs.                                       |

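The Luhn check mentioned for `:credit_card` can be sketched as follows. This is a minimal illustration of the standard checksum, not the module's actual code; `LuhnSketch` and `valid?/1` are hypothetical names, and ignoring spaces and dashes is an assumption about how candidates might be normalised.

```elixir
defmodule LuhnSketch do
  @moduledoc false

  # Returns true when `candidate` holds 13 to 19 digits that satisfy the
  # Luhn checksum. Spaces and dashes are stripped first (an assumption);
  # any other non-digit character fails the check.
  def valid?(candidate) when is_binary(candidate) do
    cleaned = String.replace(candidate, [" ", "-"], "")

    if Regex.match?(~r/\A\d{13,19}\z/, cleaned) do
      sum =
        cleaned
        |> String.to_charlist()
        |> Enum.reverse()
        |> Enum.with_index()
        |> Enum.map(fn {char, index} ->
          digit = char - ?0
          # Double every second digit, counting from the rightmost digit.
          if rem(index, 2) == 1, do: double(digit), else: digit
        end)
        |> Enum.sum()

      rem(sum, 10) == 0
    else
      false
    end
  end

  # Doubling a digit and subtracting 9 when the result exceeds 9 is
  # equivalent to summing the two digits of the doubled value.
  defp double(digit) do
    doubled = digit * 2
    if doubled > 9, do: doubled - 9, else: doubled
  end
end

LuhnSketch.valid?("4111 1111 1111 1111")
# => true
LuhnSketch.valid?("4111111111111112")
# => false
```

Filtering on the checksum is what lets a 13–19 digit pattern stay conservative: most arbitrary digit runs, such as order numbers, fail the check and are never reported.
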
# `detect`

```elixir
@spec detect(
  String.t(),
  keyword()
) :: [
  %{
    type: atom(),
    value: String.t(),
    start: non_neg_integer(),
    length: pos_integer()
  }
]
```

Detects PII matches in the text.

### Arguments

* `text` is the input string.

### Options

* `:types` is the list of detector types to run. Default is all
  types from `types/0`. Pass `[:email, :phone]` to limit detection.

### Returns

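* A list of maps `%{type: atom, value: String.t(), start: non_neg_integer, length: pos_integer}`
  sorted by `:start`. The `:start` is a byte offset, so it suits
  byte-oriented functions such as `binary_part/3`; `String.slice/3`
  indexes by grapheme and will drift on text that contains multi-byte
  characters. Credit-card matches are filtered to only those that
  pass the Luhn check.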

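The difference between byte offsets and grapheme offsets matters as soon as the text contains multi-byte characters. The sketch below uses a hand-built map in the documented result shape rather than calling `detect/2`; `OffsetSketch` is a hypothetical helper, and it assumes `:length` is measured in bytes like `:start`.

```elixir
defmodule OffsetSketch do
  @moduledoc false

  # Extracts the matched substring from `text` using byte offsets.
  # Assumes both :start and :length are expressed in bytes.
  def extract(text, %{start: start, length: len}) do
    binary_part(text, start, len)
  end
end

text = "café: alice@example.com"

# Stand-in for one element of a detect/2 result, built by hand.
# :binary.match/2 returns a byte-based {start, length} tuple.
{start, len} = :binary.match(text, "alice@example.com")
match = %{type: :email, value: "alice@example.com", start: start, length: len}

OffsetSketch.extract(text, match)
# => "alice@example.com"

# String.slice/3 counts graphemes, so the 2-byte "é" shifts the result.
String.slice(text, match.start, match.length)
# => "lice@example.com"
```
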
### Examples

    iex> [m] = Text.PII.detect("contact me at alice@example.com please")
    iex> {m.type, m.value}
    {:email, "alice@example.com"}

    iex> Text.PII.detect("nothing here")
    []

# `redact`

```elixir
@spec redact(
  String.t(),
  keyword()
) :: String.t()
```

Replaces every detected PII match with a redaction placeholder.

### Arguments

* `text` is the input string.

### Options

* `:types` is the same as in `detect/2`.

* `:placeholder` is either a string (used for every match) or a
  function `(type :: atom -> String.t())` returning the
  placeholder for each match type. The default is
  `fn type -> "[" <> String.upcase(to_string(type)) <> "]" end`.

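To make the function form of `:placeholder` concrete, the sketch below reproduces the default function quoted above and adds a custom variant. `PlaceholderSketch` and both function names are hypothetical, and the choice to collapse two types into one label is only an illustration.

```elixir
defmodule PlaceholderSketch do
  @moduledoc false

  # Same expression as the documented default placeholder function.
  def default(type), do: "[" <> String.upcase(to_string(type)) <> "]"

  # Hypothetical custom variant: hide which kind of identifier was found
  # for the most sensitive types, and fall back to the default otherwise.
  def custom(:credit_card), do: "[SENSITIVE]"
  def custom(:ssn), do: "[SENSITIVE]"
  def custom(type), do: default(type)
end

Enum.map([:email, :credit_card, :ipv4], &PlaceholderSketch.default/1)
# => ["[EMAIL]", "[CREDIT_CARD]", "[IPV4]"]

Enum.map([:email, :credit_card, :ssn], &PlaceholderSketch.custom/1)
# => ["[EMAIL]", "[SENSITIVE]", "[SENSITIVE]"]
```

A function like this would be passed as `placeholder: &PlaceholderSketch.custom/1`, since the option accepts any one-argument function from type atom to string.
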
### Returns

* The text with every detected match replaced by the configured
  placeholder. If matches overlap, the earlier-starting match wins.

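The overlap rule can be illustrated with a small splice over byte offsets. This is a sketch of one way to apply it, not the module's implementation; `RedactSketch.splice/3` is a hypothetical name, and it assumes matches carry byte-based `:start` and `:length` values in the shape returned by `detect/2`.

```elixir
defmodule RedactSketch do
  @moduledoc false

  # Replaces each match span in `text` with `placeholder_fun.(type)`.
  # A match that starts inside an already-replaced span is skipped,
  # so the earlier-starting match wins.
  def splice(text, matches, placeholder_fun) do
    {pieces, cursor} =
      matches
      |> Enum.sort_by(& &1.start)
      |> Enum.reduce({[], 0}, fn m, {acc, cursor} ->
        if m.start < cursor do
          # Overlaps a match that started earlier: leave it out.
          {acc, cursor}
        else
          # Keep the untouched text before the match, then the placeholder.
          before = binary_part(text, cursor, m.start - cursor)
          {[placeholder_fun.(m.type), before | acc], m.start + m.length}
        end
      end)

    # Append whatever follows the last replaced span.
    rest = binary_part(text, cursor, byte_size(text) - cursor)
    IO.iodata_to_binary(Enum.reverse([rest | pieces]))
  end
end

text = "mail a@b.io now"

# Hand-built matches: the second one lies inside the first.
matches = [
  %{type: :email, value: "a@b.io", start: 5, length: 6},
  %{type: :url, value: "b.io", start: 7, length: 4}
]

RedactSketch.splice(text, matches, fn type ->
  "[" <> String.upcase(to_string(type)) <> "]"
end)
# => "mail [EMAIL] now"
```
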
### Examples

    iex> Text.PII.redact("email me at alice@example.com")
    "email me at [EMAIL]"

    iex> Text.PII.redact("phone +1-555-123-4567 email alice@x.io",
    ...>   placeholder: fn _ -> "***" end)
    "phone *** email ***"

# `types`

```elixir
@spec types() :: [atom()]
```

Returns the list of detector type atoms supported by this module.

