Twitter-text-specific URL handling quirks, gated behind the
:twitter_quirks option.
Twitter's twitter-text library applies a few extraction rules that aren't part of any RFC
and don't generalise to other contexts (Mastodon, Slack, prose).
These rules are useful when you specifically want behavioural parity with
Twitter's auto-linking (e.g. for a Twitter-clone client) but are
surprising in general URL extraction:
- t.co slug rules. Twitter's URL shortener (t.co) accepts only alphanumeric slugs and caps slugs at 40 characters. Anything after the alphanumeric run ('s, +c, .x, #a, -, …) is stripped. Slugs longer than 40 characters cause the entire URL to be rejected.
- English-possessive 's stripping. A trailing 's after any URL (not just t.co) is treated as English prose, not as part of the URL.
Both behaviours match twitter-text's published conformance fixtures.
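The two rules can be sketched in a few lines of Elixir. QuirksSketch, tco_slug/1 and strip_possessive/1 are hypothetical names, not part of Text.Extract. The sketch assumes that "alphanumeric" means ASCII letters and digits, that the 40-character cap is checked against the kept alphanumeric run, and that only a straight apostrophe is recognised.

```elixir
defmodule QuirksSketch do
  # Hypothetical helpers illustrating the two quirks; not the Text.Extract API.

  @max_slug_length 40

  # t.co rule: keep only the leading alphanumeric run of the slug.
  # Assumption: "alphanumeric" means ASCII letters and digits, and the
  # 40-character cap is checked against that run.
  @spec tco_slug(String.t()) :: {:ok, String.t()} | :reject
  def tco_slug(slug) do
    # \A anchors at the start; * means this always matches (possibly "").
    [run] = Regex.run(~r/\A[A-Za-z0-9]*/, slug)

    if String.length(run) > @max_slug_length do
      :reject
    else
      {:ok, run}
    end
  end

  # Possessive rule: a trailing 's is prose, so drop it from any URL.
  # Assumption: only the straight apostrophe (') is handled here.
  @spec strip_possessive(String.t()) :: String.t()
  def strip_possessive(url), do: String.replace_suffix(url, "'s", "")
end
```

With these helpers, tco_slug("abcde123's") keeps only "abcde123", while the 41-character slug from the rejection example below yields :reject.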
Functions
Applies Twitter quirks to a parsed URL record.
Arguments
- wrapped is {:ok, record}, where record is a URL record (from Text.Extract.Url). The wrapped form lets this function serve as a step in a with-style pipeline.
- text is the original source string.
Returns
- {:ok, record}: the record, possibly modified by the quirks (e.g. a shorter :url, :span or :path).
- {:error, :twitter_quirk_rejected}: the URL is rejected entirely (e.g. the t.co slug exceeds 40 characters).
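Because the first argument is already wrapped, the step's input can simply be the previous step's result inside a with chain, and a rejection falls out of the chain unchanged. The sketch below is hypothetical: parse/1 and quirks/2 stand in for the real extractor and for the documented function (whose name is not shown here), the record is assumed to be a plain map with only :url and a {start, length} :span, and only the t.co length rejection and the possessive strip are modelled.

```elixir
defmodule PipelineSketch do
  # Hypothetical sketch of the wrapped {:ok, record} calling convention.
  # parse/1 and quirks/2 are illustrative stand-ins, not the library's API.

  # Stand-in for an earlier step: wraps the whole text as one URL record.
  # Assumption: :span is {start, length} measured in characters.
  def parse(text) do
    {:ok, %{url: text, span: {0, String.length(text)}}}
  end

  # Stand-in quirks step with the documented shape:
  # ({:ok, record}, text) -> {:ok, record} | {:error, :twitter_quirk_rejected}
  def quirks({:ok, %{url: url} = record}, _text) do
    cond do
      # Reject t.co URLs whose slug is longer than 40 characters.
      String.starts_with?(url, ["http://t.co/", "https://t.co/"]) and
          slug_length(url) > 40 ->
        {:error, :twitter_quirk_rejected}

      # Strip a trailing 's and shrink the span to match.
      String.ends_with?(url, "'s") ->
        {start, len} = record.span
        {:ok, %{record | url: String.replace_suffix(url, "'s", ""), span: {start, len - 2}}}

      true ->
        {:ok, record}
    end
  end

  # The slug is taken to be everything after the last "/".
  defp slug_length(url) do
    url |> String.split("/") |> List.last() |> String.length()
  end

  # The wrapped result of parse/1 is passed straight into quirks/2;
  # an {:error, _} from quirks/2 is returned by `with` unchanged.
  def run(text) do
    with {:ok, _} = parsed <- parse(text),
         {:ok, record} <- quirks(parsed, text) do
      {:ok, record}
    end
  end
end
```

For instance, PipelineSketch.run("http://example.com's") returns a record whose :url is "http://example.com" and whose span has shrunk by two characters.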
Examples
iex> [r] = Text.Extract.urls("see http://t.co/abcde123 today", twitter_quirks: true)
iex> r.url
"http://t.co/abcde123"
iex> Text.Extract.urls("http://t.co/abcdefghijklmnopqrstuvwxyz012345678901234",
...> twitter_quirks: true)
[]