Unicode.Regex (Unicode Set v1.3.0)

Implements Unicode regular expressions by transforming them into regular expressions supported by the Elixir Regex module.

Summary

Functions

compile(string, options \\ "u")
Compiles a binary regular expression after expanding any Unicode Sets.

compile!(string, opts \\ "u")
Compiles a binary regular expression after interpolating any Unicode Sets.

match?(regex_string, string, opts \\ "u")
Returns a boolean indicating whether there was a match with a Unicode Set.

split_character_classes(string)
Splits a regex into character classes so that they can be compiled later.

Functions

compile(string, options \\ "u")

Compiles a binary regular expression after expanding any Unicode Sets.

Arguments

  • string is a regular expression in string form.

  • options is a string or a list that is passed unchanged to Regex.compile/2. The default is "u", meaning the regular expression operates in Unicode mode.

Returns

  • {:ok, regex} or

  • {:error, {message, index}}

Notes

This function operates by splitting the string at the boundaries of Unicode Set markers, which are:

  • Posix style: [: and :]
  • Perl style: \p{ and }

This parsing is naive, meaning that it does not take any character escaping into account when splitting the string.
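
The splitting described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the module name NaiveSplitSketch and its regex are assumptions made for this example, and only the two marker styles listed above are recognised.

```elixir
defmodule NaiveSplitSketch do
  # Hypothetical helper for illustration; not part of Unicode.Regex.
  # Splits at POSIX-style ([: ... :]) and Perl-style (\p{ ... }) markers,
  # keeping the markers themselves as separate list elements.
  def split(string) do
    marker = ~r/\[:.*?:\]|\\p\{.*?\}/u
    Regex.split(marker, string, include_captures: true, trim: true)
  end
end

# A marker in the middle of a string becomes its own element.
IO.inspect(NaiveSplitSketch.split("This is [:Zs:] and more"))

# Escaping is ignored: the backslash before "[" does not stop the split.
IO.inspect(NaiveSplitSketch.split("a\\[:Zs:]b"))
```

For the first input the sketch yields the same three-element list shown in the split_character_classes/1 example. The second input shows the naive behaviour: the escaped bracket is still treated as the start of a set.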

Example

iex> Unicode.Regex.compile("[:Zs:]")
{:ok, ~r/[\x{20}\x{A0}\x{1680}\x{2000}-\x{200A}\x{202F}\x{205F}\x{3000}]/u}

iex> Unicode.Regex.compile("\\p{Zs}")
{:ok, ~r/[\x{20}\x{A0}\x{1680}\x{2000}-\x{200A}\x{202F}\x{205F}\x{3000}]/u}

iex> Unicode.Regex.compile("[:ZZZZ:]")
{:error, {'POSIX named classes are supported only within a class', 0}}
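
Callers typically branch on the two return shapes listed above. The sketch below uses a hypothetical helper module, CompileResultSketch, and feeds it results from Regex.compile/2 so that it runs without the library installed. The assumption is that results from Unicode.Regex.compile/2 have the same {:ok, regex} and {:error, {message, index}} shapes documented here.

```elixir
defmodule CompileResultSketch do
  # Hypothetical helper: turns a compile result into a printable summary.
  def describe({:ok, %Regex{} = regex}) do
    "compiled: " <> Regex.source(regex)
  end

  # The message may be a charlist or a binary; to_string/1 handles both.
  def describe({:error, {message, index}}) do
    "error at #{index}: #{to_string(message)}"
  end
end

# Regex.compile/2 stands in for Unicode.Regex.compile/2 here.
"[[:alpha:]]+" |> Regex.compile("u") |> CompileResultSketch.describe() |> IO.puts()
"[:ZZZZ:]" |> Regex.compile("u") |> CompileResultSketch.describe() |> IO.puts()
```

With the library available, the same helper can be given the result of Unicode.Regex.compile("[:Zs:]") directly.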

compile!(string, opts \\ "u")

Compiles a binary regular expression after interpolating any Unicode Sets.

Arguments

  • string is a regular expression in string form.

  • opts is a string or a list that is passed unchanged to Regex.compile/2. The default is "u", meaning the regular expression operates in Unicode mode.

Returns

  • regex or

  • raises an exception

Example

iex> Unicode.Regex.compile!("[:Zs:]")
~r/[\x{20}\x{A0}\x{1680}\x{2000}-\x{200A}\x{202F}\x{205F}\x{3000}]/u

is_perl_set(c)

(macro)

match?(regex_string, string, opts \\ "u")

Returns a boolean indicating whether there was a match with a Unicode Set.

Arguments

  • regex_string is a regular expression in string form.

  • string is any string against which the regex match is executed.

  • opts is a string or a list that is passed unchanged to Regex.compile/2. The default is "u", meaning the regular expression operates in Unicode mode.

Returns

  • a boolean indicating whether there was a match or

  • raises an exception if regex_string is not a valid regular expression.

Example

iex> Unicode.Regex.match?("[:Sc:]", "$")
true
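
As a cross-check that needs only the standard library, PCRE already understands the Perl-style general category \p{Sc} (currency symbols), so Regex.match?/2 can be used directly for that form. This illustrates the category being matched; it is not a description of how Unicode.Regex.match?/3 is implemented.

```elixir
# Standard library only: \p{Sc} is the Unicode "Currency Symbol" category.
currency = ~r/\p{Sc}/u

# The dollar sign belongs to Sc, a plain letter does not.
IO.inspect(Regex.match?(currency, "$"))
IO.inspect(Regex.match?(currency, "a"))
```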

split_character_classes(string)

Splits a regex into character classes so that they can be compiled later.

Arguments

  • string is a regular expression in string form.

Returns

  • a list of strings split at the boundaries of Unicode Sets

Example

iex> Unicode.Regex.split_character_classes("This is [:Zs:] and more")
["This is ", "[:Zs:]", " and more"]