unicode v0.0.1 Unicode

Provides functionality to efficiently check properties of Unicode codepoints, graphemes and strings.

The current implementation is based on Unicode version 8.0.0.

Summary

Functions

alphabetic?(codepoint_or_string)

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Alphabetic

alphanumeric?(codepoint)

True for alphanumeric characters, but much faster than a regular expression using the [[:alnum:]] class to check the same thing

lowercase?(codepoint_or_string)

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Lowercase

math?(codepoint_or_string)

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Math

numeric?(codepoint)

True for the digits [0-9], but much more performant than a regexp checking the same thing

uppercase?(codepoint_or_string)

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Uppercase

Functions

alphabetic?(codepoint_or_string)

Specs

alphabetic?(String.codepoint | String.t) :: boolean

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Alphabetic.

These are all characters that are usually used to represent letters or syllables in words and sentences. The function takes a Unicode codepoint or a string as input.

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Unicode.alphabetic?(?a)
true
iex> Unicode.alphabetic?("A")
true
iex> Unicode.alphabetic?("Elixir")
true
iex> Unicode.alphabetic?("الإكسير")
true
iex> Unicode.alphabetic?("foo, bar") # comma and whitespace
false
iex> Unicode.alphabetic?("42")
false
iex> Unicode.alphabetic?("龍王")
true
iex> Unicode.alphabetic?("∑") # Summation sign
false
iex> Unicode.alphabetic?("Σ") # Greek capital letter sigma
true
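
Because the string version requires every codepoint to be alphabetic, a single stray character makes the whole string fail. The following is a minimal sketch for finding which codepoints are responsible. It assumes the package is compiled as a dependency and that single-codepoint strings are accepted, as the spec suggests; `NonAlphabetic` and `offending/1` are hypothetical names, not part of the library.

```elixir
defmodule NonAlphabetic do
  # Hypothetical helper, not part of the Unicode package.
  # Splits the string into single-codepoint strings and keeps those
  # for which Unicode.alphabetic?/1 returns false.
  def offending(string) when is_binary(string) do
    string
    |> String.codepoints()
    |> Enum.reject(&Unicode.alphabetic?/1)
  end
end

NonAlphabetic.offending("foo, bar")
# expected to return [",", " "], the comma and the space
```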

alphanumeric?(codepoint)

True for alphanumeric characters, but much faster than a regular expression using the [[:alnum:]] class to check the same thing.

Returns true if Unicode.alphabetic?(x) or Unicode.numeric?(x).

Derived from http://www.unicode.org/reports/tr18/#alnum

Examples

iex> Unicode.alphanumeric? "1234"
true
iex> Unicode.alphanumeric? "KeyserSöze1995"
true
iex> Unicode.alphanumeric? "3段"
true
iex> Unicode.alphanumeric? "dragon@example.com"
false
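
The rule "alphabetic or numeric" can be spelled out per codepoint. This is only a sketch of the stated equivalence, not the library's implementation; it assumes the package is available as a dependency, and `AlnumSketch.alnum?/1` is a hypothetical name.

```elixir
defmodule AlnumSketch do
  # Hypothetical helper, not the library's implementation.
  # A codepoint counts as alphanumeric when it is alphabetic or numeric;
  # a string counts when every one of its codepoints does.
  # Note: Enum.all?/2 is vacuously true for "", which may or may not
  # match what the library returns for an empty string.
  def alnum?(string) when is_binary(string) do
    string
    |> String.codepoints()
    |> Enum.all?(fn cp -> Unicode.alphabetic?(cp) or Unicode.numeric?(cp) end)
  end
end

AlnumSketch.alnum?("3段")                 # expected: true
AlnumSketch.alnum?("dragon@example.com") # expected: false, "@" is neither
```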

lowercase?(codepoint_or_string)

Specs

lowercase?(String.codepoint | String.t) :: boolean

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Lowercase.

Note that many writing systems do not distinguish between cases. Their characters are not included in this group.

The function takes a Unicode codepoint or a string as input.

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Unicode.lowercase?(?a)
true
iex> Unicode.lowercase?("A")
false
iex> Unicode.lowercase?("Elixir")
false
iex> Unicode.lowercase?("léon")
true
iex> Unicode.lowercase?("foo, bar")
false
iex> Unicode.lowercase?("42")
false
iex> Unicode.lowercase?("Σ")
false
iex> Unicode.lowercase?("σ")
true

math?(codepoint_or_string)

Specs

math?(String.codepoint | String.t) :: boolean

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Math.

These are all characters used primarily in mathematical notation (rather than in alphabets). Note that the numerical digits are not part of this group; use Unicode.numeric?/1 for those.

The function takes a Unicode codepoint or a string as input.

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Unicode.math?(?=)
true
iex> Unicode.math?("=")
true
iex> Unicode.math?("1+1=2") # Note that digits themselves are not part of `Math`.
false
iex> Unicode.math?("परिस")
false
iex> Unicode.math?("∑") # Summation sign
true
iex> Unicode.math?("Σ") # Greek capital letter sigma
false
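
Since digits are excluded from Math, a mixed expression such as "1+1=2" fails as a whole even though some of its characters qualify. A small sketch of picking out only the Math codepoints, assuming the package is available as a dependency; `MathSymbols.extract/1` is a hypothetical helper, not part of the library.

```elixir
defmodule MathSymbols do
  # Hypothetical helper, not part of the Unicode package.
  # Keeps only the codepoints for which Unicode.math?/1 returns true.
  def extract(string) when is_binary(string) do
    string
    |> String.codepoints()
    |> Enum.filter(&Unicode.math?/1)
  end
end

MathSymbols.extract("1+1=2")
# expected to return ["+", "="]; the digits are not in Math
```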

numeric?(codepoint)

True for the digits [0-9], but much more performant than a regexp checking the same thing.

Derived from http://www.unicode.org/reports/tr18/#digit

Examples

iex> Unicode.numeric?("65535")
true
iex> Unicode.numeric?("42")
true
iex> Unicode.numeric?("lapis philosophorum")
false
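
For comparison, the regular-expression check that this function is meant to replace might look like the sketch below. It uses only the Elixir standard library; `DigitRegex.digits?/1` is a hypothetical name, and no claim is made here about the actual speed difference.

```elixir
defmodule DigitRegex do
  # Regular-expression baseline using only the standard library.
  # \A and \z anchor the match to the whole string, so every
  # character must be one of the ASCII digits 0-9.
  def digits?(string) when is_binary(string) do
    Regex.match?(~r/\A[0-9]+\z/, string)
  end
end

DigitRegex.digits?("65535")               # true
DigitRegex.digits?("lapis philosophorum") # false
```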

uppercase?(codepoint_or_string)

Specs

uppercase?(String.codepoint | String.t) :: boolean

Checks if a single Unicode codepoint (or all characters in the given binary string) adhere to the Derived Core Property Uppercase.

Note that many writing systems do not distinguish between cases. Their characters are not included in this group.

The function takes a Unicode codepoint or a string as input.

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Unicode.uppercase?(?a)
false
iex> Unicode.uppercase?("A")
true
iex> Unicode.uppercase?("Elixir")
false
iex> Unicode.uppercase?("CAMEMBERT")
true
iex> Unicode.uppercase?("foo, bar")
false
iex> Unicode.uppercase?("42")
false
iex> Unicode.uppercase?("Σ")
true
iex> Unicode.uppercase?("σ")
false
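
A string that is not uppercase is not necessarily lowercase: it may mix cases or contain characters that have no case at all. The sketch below combines both checks to make that third outcome explicit. It assumes the package is available as a dependency; `CaseSketch.classify/1` and its return atoms are hypothetical, not part of the library.

```elixir
defmodule CaseSketch do
  # Hypothetical helper, not part of the Unicode package.
  # Returns :upper or :lower only when every codepoint has that property;
  # anything else (mixed case, digits, caseless scripts) falls through.
  def classify(string) when is_binary(string) do
    cond do
      Unicode.uppercase?(string) -> :upper
      Unicode.lowercase?(string) -> :lower
      true -> :mixed_or_caseless
    end
  end
end

CaseSketch.classify("CAMEMBERT") # expected: :upper
CaseSketch.classify("Elixir")    # expected: :mixed_or_caseless
CaseSketch.classify("龍王")       # expected: :mixed_or_caseless, Han has no case
```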