Text.Extract.Twitter (Text v0.6.1)

Twitter-text-specific URL handling quirks, gated behind the :twitter_quirks option.

Twitter's twitter-text library applies a few extraction rules that are not part of any RFC and do not generalise to other contexts (Mastodon, Slack, prose). They are useful when you specifically want behavioural parity with Twitter's auto-linking, for example in a Twitter-clone client, but they are surprising for general URL extraction:

  • t.co slug rules. Twitter's URL shortener (t.co) accepts only alphanumeric slugs of at most 40 characters. Anything after the alphanumeric run ('s, +c, .x, #a, -, …) is stripped from the URL. A slug longer than 40 characters causes the entire URL to be rejected.

  • English-possessive 's stripping. A trailing 's after any URL (not just t.co) is treated as English prose, not part of the URL.

Both behaviours match twitter-text's published conformance fixtures.
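
The two rules above can be sketched on bare URL strings. This is only an illustration, not the module's implementation: TcoQuirkSketch, tco_slug/1 and strip_possessive/1 are hypothetical names, the regex is an assumption about what counts as the alphanumeric run, and the real code works on URL records rather than strings.

```elixir
defmodule TcoQuirkSketch do
  # Hypothetical helper module; not part of Text.Extract.Twitter.
  @max_slug_length 40

  # t.co rule: keep only the leading alphanumeric run of the slug and
  # reject the URL when that run is longer than 40 characters.
  def tco_slug(url) do
    case Regex.run(~r{\Ahttps?://t\.co/([A-Za-z0-9]*)}, url) do
      # Not a t.co URL: this rule does not apply.
      nil ->
        {:ok, url}

      [_kept, slug] when byte_size(slug) > @max_slug_length ->
        {:error, :twitter_quirk_rejected}

      # The full match is the prefix plus the alphanumeric run.
      [kept, _slug] ->
        {:ok, kept}
    end
  end

  # Possessive rule: drop one trailing 's from any URL.
  def strip_possessive(url), do: String.replace_suffix(url, "'s", "")
end

IO.inspect(TcoQuirkSketch.tco_slug("http://t.co/abcde123's"))
# {:ok, "http://t.co/abcde123"}

IO.inspect(TcoQuirkSketch.tco_slug("http://t.co/" <> String.duplicate("a", 41)))
# {:error, :twitter_quirk_rejected}

IO.inspect(TcoQuirkSketch.strip_possessive("http://example.com/bob's"))
# "http://example.com/bob"
```

In the sketch a non-t.co URL passes through the slug rule unchanged, so the possessive rule is the only one that can affect it.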

Functions

apply(wrapped, text)

@spec apply(
  {:ok, map()},
  String.t()
) :: {:ok, map()} | {:error, :twitter_quirk_rejected}

Applies Twitter quirks to a parsed URL record.

Arguments

  • wrapped is {:ok, record}, where record is a URL record (from Text.Extract.Url). The wrapped form is used so that apply/2 can be a step in a with-style pipeline.

  • text is the original source string.

Returns

  • {:ok, record} when the URL is kept. The returned record may differ from the input where a quirk applied (e.g. shorter :url, :span, :path).

  • {:error, :twitter_quirk_rejected} when the URL is rejected entirely (e.g. the t.co slug exceeds 40 chars).

Examples

iex> [r] = Text.Extract.urls("see http://t.co/abcde123 today", twitter_quirks: true)
iex> r.url
"http://t.co/abcde123"

iex> Text.Extract.urls("http://t.co/abcdefghijklmnopqrstuvwxyz012345678901234",
...>                   twitter_quirks: true)
[]
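
The wrapped {:ok, record} argument described under Arguments is what lets a quirk step sit inside a with expression. The sketch below shows that shape only. QuirkPipelineSketch and run/3 are hypothetical, the quirk step is passed in as a function so the example does not depend on the library, and the one-field record is a placeholder rather than the real record layout.

```elixir
defmodule QuirkPipelineSketch do
  # Hypothetical pipeline; not part of the Text library.
  #
  # `parsed` is {:ok, record}. `quirk_step` is any 2-arity function with
  # the documented shape of apply/2: it returns {:ok, record} or
  # {:error, :twitter_quirk_rejected}.
  def run(parsed, text, quirk_step) do
    with {:ok, record} <- quirk_step.(parsed, text) do
      {:ok, record.url}
    end
  end
end

# Stand-in steps for illustration only.
strip = fn {:ok, record}, _text ->
  {:ok, %{record | url: String.replace_suffix(record.url, "'s", "")}}
end

reject = fn {:ok, _record}, _text -> {:error, :twitter_quirk_rejected} end

text = "see http://t.co/abc's"
parsed = {:ok, %{url: "http://t.co/abc's"}}

IO.inspect(QuirkPipelineSketch.run(parsed, text, strip))
# {:ok, "http://t.co/abc"}

IO.inspect(QuirkPipelineSketch.run(parsed, text, reject))
# {:error, :twitter_quirk_rejected}
```

Because with returns the first non-matching value unchanged, the rejection tuple falls straight through to the caller, which can then drop the URL.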