unicode v0.0.1 Unicode
Provides functionality to efficiently check properties of Unicode codepoints, graphemes and strings.
The current implementation is based on Unicode version 8.0.0.
Summary
Functions
alphabetic?/1: Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Alphabetic
alphanumeric?/1: True for alphanumeric characters, but much faster than a regular expression using the [[:alnum:]] class to check the same thing
lowercase?/1: Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Lowercase
math?/1: Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Math
numeric?/1: True for the digits [0-9], but much faster than a regular expression checking the same thing
uppercase?/1: Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Uppercase
Functions
Specs
alphabetic?(String.codepoint | String.t) :: boolean
Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Alphabetic.
These are the characters typically used to represent letters or syllables in words and sentences. The function takes a Unicode codepoint or a string as input.
For the string version, the result is true only if every codepoint in the string adheres to the property.
Examples
iex> Unicode.alphabetic?(?a)
true
iex> Unicode.alphabetic?("A")
true
iex> Unicode.alphabetic?("Elixir")
true
iex> Unicode.alphabetic?("الإكسير")
true
iex> Unicode.alphabetic?("foo, bar") # comma and whitespace
false
iex> Unicode.alphabetic?("42")
false
iex> Unicode.alphabetic?("龍王")
true
iex> Unicode.alphabetic?("∑") # Summation, ∑
false
iex> Unicode.alphabetic?("Σ") # Greek capital letter sigma, Σ
true
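The "codepoint or string" behaviour described above can be pictured as separate function clauses, one for an integer codepoint and one for a binary. The following is a minimal sketch, not the library's implementation: the module `AllCodepointsSketch` and its predicate are hypothetical, the predicate only knows ASCII letters as a stand-in for real Unicode property data, and treating the empty string as false is an assumption the source does not confirm.

```elixir
defmodule AllCodepointsSketch do
  # Hypothetical stand-in predicate: ASCII letters only.
  # A real implementation would consult Unicode property data instead.
  def ascii_letter?(cp) when is_integer(cp), do: cp in ?a..?z or cp in ?A..?Z

  # Assumption: the empty string is treated as false here; the source
  # does not say what the library returns for "".
  def ascii_letter?(""), do: false

  # String clause: true only if every codepoint passes the predicate.
  def ascii_letter?(string) when is_binary(string) do
    string
    |> String.to_charlist()
    |> Enum.all?(&ascii_letter?/1)
  end
end

IO.inspect(AllCodepointsSketch.ascii_letter?(?a))
IO.inspect(AllCodepointsSketch.ascii_letter?("foo, bar"))
```

A single failing codepoint, such as the comma in "foo, bar", is enough to make the string clause return false.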
Unicode.alphanumeric?/1 is true for alphanumeric characters, but much faster than a regular expression using the [[:alnum:]] class to check the same thing.
Returns true if Unicode.alphabetic?(x) or Unicode.numeric?(x).
Derived from http://www.unicode.org/reports/tr18/#alnum
Examples
iex> Unicode.alphanumeric? "1234"
true
iex> Unicode.alphanumeric? "KeyserSöze1995"
true
iex> Unicode.alphanumeric? "3段"
true
iex> Unicode.alphanumeric? "dragon@example.com"
false
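For comparison, the regular-expression check that this function is meant to replace might look like the sketch below. The module name `AlnumRegexSketch` is hypothetical, and the regular expression is not guaranteed to agree with the library on every codepoint; it only illustrates the kind of pattern being referred to.

```elixir
defmodule AlnumRegexSketch do
  # Hypothetical helper. The `u` modifier makes POSIX classes such as
  # [[:alnum:]] match Unicode letters and digits rather than ASCII only.
  def alnum?(string) when is_binary(string) do
    Regex.match?(~r/\A[[:alnum:]]+\z/u, string)
  end
end

IO.inspect(AlnumRegexSketch.alnum?("1234"))
IO.inspect(AlnumRegexSketch.alnum?("dragon@example.com"))
```

The anchors `\A` and `\z` make the pattern cover the whole string, which mirrors the "every character" behaviour of the string versions of these functions.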
Specs
lowercase?(String.codepoint | String.t) :: boolean
Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Lowercase.
Note that many scripts have no distinction between cases. Their characters are not included in this group.
The function takes a Unicode codepoint or a string as input.
For the string version, the result is true only if every codepoint in the string adheres to the property.
Examples
iex> Unicode.lowercase?(?a)
true
iex> Unicode.lowercase?("A")
false
iex> Unicode.lowercase?("Elixir")
false
iex> Unicode.lowercase?("léon")
true
iex> Unicode.lowercase?("foo, bar")
false
iex> Unicode.lowercase?("42")
false
iex> Unicode.lowercase?("Σ")
false
iex> Unicode.lowercase?("σ")
true
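A property check like this is not the same as comparing a string with its downcased form. The sketch below uses only the standard library (the module name `DowncaseContrastSketch` is hypothetical) to show that the comparison also accepts caseless input such as "42", which the examples above report as false for `Unicode.lowercase?/1`.

```elixir
defmodule DowncaseContrastSketch do
  # Hypothetical helper: true when String.downcase/1 changes nothing.
  # Caseless characters (digits, punctuation) are left untouched by
  # downcasing, so they pass this check.
  def unchanged_by_downcase?(string) when is_binary(string) do
    String.downcase(string) == string
  end
end

IO.inspect(DowncaseContrastSketch.unchanged_by_downcase?("léon"))   # true
IO.inspect(DowncaseContrastSketch.unchanged_by_downcase?("Elixir")) # false
IO.inspect(DowncaseContrastSketch.unchanged_by_downcase?("42"))     # true
```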
Specs
math?(String.codepoint | String.t) :: boolean
Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Math.
These are the characters used primarily in mathematical notation (rather than in alphabets).
Note that the numerical digits are not part of this group. Use Unicode.numeric?/1 instead.
The function takes a Unicode codepoint or a string as input.
For the string version, the result is true only if every codepoint in the string adheres to the property.
Examples
iex> Unicode.math?(?=)
true
iex> Unicode.math?("=")
true
iex> Unicode.math?("1+1=2") # Note that digits themselves are not part of `Math`.
false
iex> Unicode.math?("परिस")
false
iex> Unicode.math?("∑") # Summation, ∑
true
iex> Unicode.math?("Σ") # Greek capital letter sigma, Σ
false
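The difference between the summation sign and the Greek capital sigma in the examples above comes down to their Unicode categories. As a rough illustration using only the standard library, the sketch below matches the general category Sm (Symbol, math). The module name `MathSymbolSketch` is hypothetical, and Sm is only an approximation: the derived Math property also includes characters listed under Other_Math, so this is not the library's check.

```elixir
defmodule MathSymbolSketch do
  # Hypothetical helper: true when every codepoint is in the general
  # category Sm (Symbol, math). Approximates, but does not equal,
  # the derived Math property.
  def math_symbols?(string) when is_binary(string) do
    Regex.match?(~r/\A\p{Sm}+\z/u, string)
  end
end

IO.inspect(MathSymbolSketch.math_symbols?("="))     # true
IO.inspect(MathSymbolSketch.math_symbols?("1+1=2")) # false, digits are not Sm
IO.inspect(MathSymbolSketch.math_symbols?("∑"))     # true, U+2211 is Sm
IO.inspect(MathSymbolSketch.math_symbols?("Σ"))     # false, U+03A3 is a letter
```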
Unicode.numeric?/1 is true for the digits [0-9], but much faster than a regular expression checking the same thing.
Derived from http://www.unicode.org/reports/tr18/#digit
Examples
iex> Unicode.numeric?("65535")
true
iex> Unicode.numeric?("42")
true
iex> Unicode.numeric?("lapis philosophorum")
false
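One way a check for the digits [0-9] can avoid a regular expression is to pattern match on the binary directly. This is a minimal sketch under stated assumptions, not the library's code: the module `AsciiDigitsSketch` is hypothetical, and returning false for the empty string is an assumption.

```elixir
defmodule AsciiDigitsSketch do
  # Assumption: the empty string is treated as false.
  def digits?(""), do: false
  def digits?(string) when is_binary(string), do: all_digits?(string)

  # Walks the binary one byte at a time and stops at the first byte
  # outside ?0..?9.
  defp all_digits?(<<c, rest::binary>>) when c in ?0..?9, do: all_digits?(rest)
  defp all_digits?(<<>>), do: true
  defp all_digits?(_other), do: false
end

IO.inspect(AsciiDigitsSketch.digits?("65535"))               # true
IO.inspect(AsciiDigitsSketch.digits?("lapis philosophorum")) # false
```

Because the digits 0 to 9 are single bytes in UTF-8, matching byte by byte is sufficient here; any multi-byte character fails the guard on its first byte.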
Specs
uppercase?(String.codepoint | String.t) :: boolean
Checks if a single Unicode codepoint (or every character in the given binary string) adheres to the Derived Core Property Uppercase.
Note that many scripts have no distinction between cases. Their characters are not included in this group.
The function takes a Unicode codepoint or a string as input.
For the string version, the result is true only if every codepoint in the string adheres to the property.
Examples
iex> Unicode.uppercase?(?a)
false
iex> Unicode.uppercase?("A")
true
iex> Unicode.uppercase?("Elixir")
false
iex> Unicode.uppercase?("CAMEMBERT")
true
iex> Unicode.uppercase?("foo, bar")
false
iex> Unicode.uppercase?("42")
false
iex> Unicode.uppercase?("Σ")
true
iex> Unicode.uppercase?("σ")
false
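The remark about scripts without a case distinction can be illustrated with the standard library alone. The sketch below uses the general categories Lu and Ll as approximations; the module `CaseCategorySketch` and its function names are hypothetical, and the derived Uppercase and Lowercase properties also cover characters listed under Other_Uppercase and Other_Lowercase, so results may differ from the library for some codepoints.

```elixir
defmodule CaseCategorySketch do
  # Hypothetical helpers based on general categories, not on the
  # derived core properties the library checks.
  def upper?(string) when is_binary(string) do
    Regex.match?(~r/\A\p{Lu}+\z/u, string)
  end

  def lower?(string) when is_binary(string) do
    Regex.match?(~r/\A\p{Ll}+\z/u, string)
  end
end

IO.inspect(CaseCategorySketch.upper?("CAMEMBERT")) # true
IO.inspect(CaseCategorySketch.upper?("龍王"))       # false
IO.inspect(CaseCategorySketch.lower?("龍王"))       # false
```

Characters from a caseless script, such as "龍王", are neither uppercase nor lowercase, so both checks return false.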