# `Text.Extract.Tld`
[🔗](https://github.com/kipcole9/text/blob/v0.6.1/lib/extract/tld.ex#L1)

Top-level domain validation for `Text.Extract`.

At compile time, this module reads `priv/extract/tlds.txt` (the IANA
TLD list, refreshed by `mix text.download_tlds`) and bakes the entries
into a `MapSet` for O(1) lookup. The bundled file is committed to
source control; the mix task exists to make refreshes reproducible.

TLD comparison is case-insensitive and operates on the **ASCII** form
of a label. Internationalised TLDs in the IANA list are stored in
Punycode (`xn--…`), so a Unicode label will not match as-is: pass it
through `Unicode.IDNA.to_ascii/2` before lookup.
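
As a rough illustration of the compile-time approach described above, the sketch below builds a `MapSet` inside a module attribute and performs a case-insensitive ASCII lookup against it. It is not the module's actual source: `TldSketch`, `member?/1`, and the inline `@raw` text (a stand-in for `priv/extract/tlds.txt`) are all hypothetical names chosen for the example.

```elixir
defmodule TldSketch do
  # Hypothetical stand-in for the bundled file: one comment/header line
  # followed by one label per line.
  @raw """
  # placeholder header line
  COM
  ORG
  XN--P1AI
  """

  # Evaluated once, while the module compiles. The resulting MapSet is
  # embedded in the compiled module, so no file is read at runtime.
  @tlds @raw
        |> String.split("\n", trim: true)
        |> Enum.reject(&String.starts_with?(&1, "#"))
        |> Enum.map(&String.downcase(&1, :ascii))
        |> MapSet.new()

  # Case-insensitive membership test on the ASCII form of `label`.
  def member?(label) when is_binary(label) do
    MapSet.member?(@tlds, String.downcase(label, :ascii))
  end
end

IO.inspect(TldSketch.member?("COM"))      # true
IO.inspect(TldSketch.member?("xn--p1ai")) # true
IO.inspect(TldSketch.member?("baz"))      # false
```

Lowercasing both the stored entries and the incoming label is what makes the comparison case-insensitive. A Unicode label would still miss unless it is converted to its `xn--` form first.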

### Modes

* `:iana` — match against the full bundled IANA list (~1,440 entries).

* `:any` — accept any non-empty ASCII label (used by callers that
  need to bypass TLD validation, e.g. for intranet hostnames or
  ad-hoc strings).
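
The two modes can be pictured as two clauses of one predicate. The sketch below is an assumption about the shape of such a dispatch, not the library's implementation: `ModeSketch`, its three-entry `@known` set, and the `ascii?/1` helper are hypothetical.

```elixir
defmodule ModeSketch do
  # Tiny hypothetical stand-in for the bundled IANA set.
  @known MapSet.new(["com", "org", "net"])

  def tld?(label, mode \\ :iana)

  # :iana mode: the label must be a member of the known set.
  def tld?(label, :iana) when is_binary(label) do
    MapSet.member?(@known, String.downcase(label, :ascii))
  end

  # :any mode: accept any non-empty label made only of ASCII characters.
  def tld?(label, :any) when is_binary(label) do
    label != "" and ascii?(label)
  end

  # True when every code point is below 128.
  defp ascii?(label) do
    label |> String.to_charlist() |> Enum.all?(&(&1 < 128))
  end
end

IO.inspect(ModeSketch.tld?("baz"))       # false
IO.inspect(ModeSketch.tld?("baz", :any)) # true
IO.inspect(ModeSketch.tld?("", :any))    # false
```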

Twitter-style tiered ccTLD/gTLD lists could be layered on top by a
caller, but in practice the IANA list and a "must end in a known TLD"
rule reproduce twitter-text's behaviour for every URL conformance
fixture we've checked: the TLDs that twitter-text rejects (e.g.
`.baz`, `.govedu`, `.comm`) are simply not in IANA either.

# `ascii_sorted`

```elixir
@spec ascii_sorted() :: [String.t()]
```

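Returns the ASCII TLDs sorted longest-first. As the second example
shows, Punycode (`xn--`) entries are not part of this list.

Useful for building regex alternations where longer TLDs must be
tried first; otherwise a shorter TLD that is a prefix of a longer one
would match and end the alternation early.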
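
To see why the ordering matters, the following sketch compares an alternation built in arbitrary order with one built longest-first. It uses a hand-picked three-entry list rather than the real output of `ascii_sorted/0`, and `AlternationSketch` is a hypothetical helper module.

```elixir
defmodule AlternationSketch do
  # Build a regex matching a dot followed by one of `tlds`,
  # with the alternatives tried in the order given.
  def build(tlds) do
    alternation = tlds |> Enum.map(&Regex.escape/1) |> Enum.join("|")
    Regex.compile!("\\.(" <> alternation <> ")")
  end

  # Order labels so that longer ones come first.
  def longest_first(tlds), do: Enum.sort_by(tlds, &String.length/1, :desc)
end

tlds = ["co", "com", "community"]

naive = AlternationSketch.build(tlds)
sorted = tlds |> AlternationSketch.longest_first() |> AlternationSketch.build()

IO.inspect(Regex.run(naive, "example.community", capture: :all_but_first))
# ["co"]
IO.inspect(Regex.run(sorted, "example.community", capture: :all_but_first))
# ["community"]
```

The regex engine takes the first alternative that succeeds, so the unsorted pattern stops at `co`. A real scanner would typically also require a boundary after the TLD. The sketch leaves that out to keep the ordering effect visible.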

### Examples

    iex> ascii = Text.Extract.Tld.ascii_sorted()
    iex> "com" in ascii
    true

    iex> "xn--p1ai" in Text.Extract.Tld.ascii_sorted()
    false

# `count`

```elixir
@spec count() :: non_neg_integer()
```

Returns the count of TLDs in the bundled IANA list.

### Examples

    iex> Text.Extract.Tld.count() > 1000
    true

# `iana`

```elixir
@spec iana() :: MapSet.t(String.t())
```

Returns the IANA TLD list as a `MapSet` of lowercased ASCII labels.

### Examples

    iex> "com" in Text.Extract.Tld.iana()
    true

    iex> "googleusercontent" in Text.Extract.Tld.iana()
    false

# `idn_unicode`

```elixir
@spec idn_unicode() :: [String.t()]
```

Returns the IDN TLDs in their Unicode form.

Built at compile time from the `xn--` ACE entries by passing each
through `Unicode.IDNA.to_unicode/1`. Used by `Text.Extract.Scanner`
to extend its bare-host regex with explicit alternatives for IDN
TLDs.
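
How a scanner might fold Unicode TLDs into a host pattern can be sketched as below. The real regex used by `Text.Extract.Scanner` is not shown in this document, so the pattern, the `IdnRegexSketch` module, and the two-entry sample list are assumptions made for illustration only.

```elixir
defmodule IdnRegexSketch do
  # Build a Unicode-aware regex: an ASCII host label, a dot, then one of
  # the given Unicode TLDs as explicit alternatives.
  def build(unicode_tlds) do
    alternation = unicode_tlds |> Enum.map(&Regex.escape/1) |> Enum.join("|")
    Regex.compile!("[a-z0-9-]+\\.(?:" <> alternation <> ")", "u")
  end
end

# Hand-picked sample; a caller could pass the result of idn_unicode/0 instead.
regex = IdnRegexSketch.build(["みんな", "рф"])

IO.inspect(Regex.run(regex, "see example.みんな now"))
# ["example.みんな"]
```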

### Examples

    iex> tlds = Text.Extract.Tld.idn_unicode()
    iex> length(tlds) > 100
    true

    iex> "みんな" in Text.Extract.Tld.idn_unicode()
    true

# `tld?`

```elixir
@spec tld?(String.t(), :iana | :any) :: boolean()
```

Returns whether `label` is accepted as a TLD under `mode`.

### Arguments

* `label` is an ASCII string. Pass IDN labels through
  `Unicode.IDNA.to_ascii/2` first.

* `mode` is `:iana` (default) or `:any`.

### Returns

* `true` if the label is accepted under the mode: a member of the
  bundled IANA list for `:iana`, or any non-empty ASCII label for
  `:any`. `false` otherwise.

### Examples

    iex> Text.Extract.Tld.tld?("com")
    true

    iex> Text.Extract.Tld.tld?("COM")
    true

    iex> Text.Extract.Tld.tld?("baz")
    false

    iex> Text.Extract.Tld.tld?("baz", :any)
    true

# `version_line`

```elixir
@spec version_line() :: String.t() | nil
```

Returns the version header line from the bundled `tlds.txt`. The spec
also allows `nil`, presumably for a file that carries no header line.

### Examples

    iex> Text.Extract.Tld.version_line() =~ "Last Updated"
    true

---

*Consult [api-reference.md](api-reference.md) for complete listing*
