Unicode.Regex (Unicode Set v1.3.0)
Implements Unicode regular expressions by transforming them into regular expressions supported by the Elixir Regex module.
Summary
Functions
compile(string, options \\ "u")
Compiles a binary regular expression after expanding any Unicode Sets.
compile!(string, options \\ "u")
Compiles a binary regular expression after interpolating any Unicode Sets.
match?(regex_string, string, options \\ "u")
Returns a boolean indicating whether there was a match or not with a Unicode Set.
split_character_classes(string)
Splits a regex into character classes so that these can then be later compiled.
Functions
compile(string, options \\ "u")
Compiles a binary regular expression after expanding any Unicode Sets.
Arguments
- string is a regular expression in string form.
- options is a string or a list which is passed unchanged to Regex.compile/2. The default is "u", meaning the regular expression will operate in Unicode mode.
Returns
- {:ok, regex} or
- {:error, {message, index}}
Notes
This function operates by splitting the string at the boundaries of Unicode Set markers, which are:
- POSIX style: [: and :]
- Perl style: \p{ and }
This parsing is naive, meaning that it does not take any character escaping into account when splitting the string.
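The splitting described in these notes can be sketched with the standard Regex module alone. This is a minimal illustration, not the library's implementation: the module name NaiveSplitSketch and its pattern are hypothetical, and the pattern simply looks for the two marker styles listed above without any awareness of escaping.

```elixir
defmodule NaiveSplitSketch do
  # Hypothetical module for illustration only; not part of Unicode.Regex.

  # Splits a regex string at the boundaries of Unicode Set markers,
  # keeping the markers themselves as separate list elements.
  def split(string) do
    # Matches "[: ... :]" (POSIX style) or "\p{ ... }" (Perl style).
    # No attempt is made to detect escaped brackets or backslashes.
    marker = ~r/\[:.*?:\]|\\p\{.*?\}/u

    Regex.split(marker, string, include_captures: true, trim: true)
  end
end

NaiveSplitSketch.split("This is [:Zs:] and more")
# => ["This is ", "[:Zs:]", " and more"]

# An escaped opening bracket is still treated as the start of a marker.
NaiveSplitSketch.split("\\[:Zs:]")
# => ["\\", "[:Zs:]"]
```

The second call shows the consequence of naive parsing: the backslash before the bracket does not prevent the marker from being recognised.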
Example
iex> Unicode.Regex.compile("[:Zs:]")
{:ok, ~r/[\x{20}\x{A0}\x{1680}\x{2000}-\x{200A}\x{202F}\x{205F}\x{3000}]/u}
iex> Unicode.Regex.compile("\\p{Zs}")
{:ok, ~r/[\x{20}\x{A0}\x{1680}\x{2000}-\x{200A}\x{202F}\x{205F}\x{3000}]/u}
iex> Unicode.Regex.compile("[:ZZZZ:]")
{:error, {'POSIX named classes are supported only within a class', 0}}
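Because the result is a tagged tuple, callers would typically pattern match on it. The sketch below uses the standard Regex.compile/2, which returns the same {:ok, regex} or {:error, {message, index}} shape, so that it runs without the Unicode Set dependency. The module and function names are hypothetical; substituting Unicode.Regex.compile/2 for Regex.compile/2 is assumed to work the same way based on the return values documented above.

```elixir
defmodule CompileResultSketch do
  # Hypothetical helper for illustration only; not part of Unicode.Regex.

  # Compiles the source and turns either outcome into a simple tagged tuple.
  def describe(source, options \\ "u") do
    case Regex.compile(source, options) do
      {:ok, regex} ->
        # Compilation succeeded; return the pattern source.
        {:ok, Regex.source(regex)}

      {:error, {message, index}} ->
        # Compilation failed; message may be a charlist, so interpolate it.
        {:error, "#{message} at index #{index}"}
    end
  end
end

CompileResultSketch.describe("[a-z]+")
# => {:ok, "[a-z]+"}

# An unbalanced parenthesis takes the error branch.
CompileResultSketch.describe("(")
```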
compile!(string, options \\ "u")
Compiles a binary regular expression after interpolating any Unicode Sets.
Arguments
- string is a regular expression in string form.
- options is a string or a list which is passed unchanged to Regex.compile/2. The default is "u", meaning the regular expression will operate in Unicode mode.
Returns
- regex or
- raises an exception
Example
iex> Unicode.Regex.compile!("[:Zs:]")
~r/[\x{20}\x{A0}\x{1680}\x{2000}-\x{200A}\x{202F}\x{205F}\x{3000}]/u
match?(regex_string, string, options \\ "u")
Returns a boolean indicating whether there was a match or not with a Unicode Set.
Arguments
- regex_string is a regular expression in string form.
- string is any string against which the regex match is executed.
- options is a string or a list which is passed unchanged to Regex.compile/2. The default is "u", meaning the regular expression will operate in Unicode mode.
Returns
- a boolean indicating if there was a match or
- raises an exception if regex_string is not a valid regular expression.
Example
iex> Unicode.Regex.match?("[:Sc:]", "$")
true
split_character_classes(string)
Splits a regex into character classes so that these can then be later compiled.
Arguments
- string is a regular expression in string form.
Returns
- A list of strings split at the boundaries of Unicode Sets
Example
iex> Unicode.Regex.split_character_classes("This is [:Zs:] and more")
["This is ", "[:Zs:]", " and more"]