Cldr Unicode v0.8.0 Cldr.Unicode

Functions to introspect the Unicode character database and to provide fast codepoint lookups.

Summary

Types

Functions

alphabetic?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) adhere to the Derived Core Property Alphabetic, otherwise returns false

alphanumeric?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) are either alphabetic?/1 or numeric?/1 otherwise returns false

block(codepoint_or_string)

Returns the block name of a codepoint or the list of block names for each codepoint in a string

cased?(codepoint_or_string)

Returns true if the codepoint has the :cased property, otherwise returns false

category(codepoint_or_string)

Returns the Unicode category for a codepoint or a list of categories for a string

class(codepoint_or_string)

digits?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) adhere to Unicode category :Nd otherwise returns false

emoji?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) are emoji otherwise returns false

lowercase?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) are in the category :Ll, otherwise returns false

math?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) are in the category :Sm, otherwise returns false

numeric?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) belong to one of the Unicode categories :Nd, :Nl or :No, otherwise returns false

properties(codepoint_or_string)

Returns the list of properties of a codepoint or a list of property lists, one for each codepoint in a given string

script(codepoint_or_string)

Returns the script name of a codepoint or the list of script names for each codepoint in a string

unaccent(string)

Removes accents (diacritical marks) from a string

uppercase?(codepoint_or_string)

Returns true if a single Unicode codepoint (or all characters in the given string) are in the category :Lu, otherwise returns false

version()

Returns the version of Unicode in Cldr.Unicode

Types

codepoint()

codepoint() :: non_neg_integer()

codepoint_or_string()

codepoint_or_string() :: codepoint() | String.t()

Functions

alphabetic?(codepoint_or_string)

alphabetic?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) adhere to the Derived Core Property Alphabetic otherwise returns false.

These are all characters that are usually used as representations of letters or syllables in words and sentences.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.alphabetic?(?a)
true

iex> Cldr.Unicode.alphabetic?("A")
true

iex> Cldr.Unicode.alphabetic?("Elixir")
true

iex> Cldr.Unicode.alphabetic?("الإكسير")
true

iex> Cldr.Unicode.alphabetic?("foo, bar") # comma and whitespace
false

iex> Cldr.Unicode.alphabetic?("42")
false

iex> Cldr.Unicode.alphabetic?("龍王")
true

iex> Cldr.Unicode.alphabetic?("∑") # Summation, ∑
false

iex> Cldr.Unicode.alphabetic?("Σ") # Greek capital letter sigma, Σ
true
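
The rule stated under Returns (a string yields true only if every codepoint in it has the property) can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the module AllCodepointsSketch, its all?/2 function and the ASCII-only ascii_letter?/1 predicate are hypothetical names, and the predicate is a deliberately narrow stand-in for a real Unicode property lookup.

```elixir
defmodule AllCodepointsSketch do
  # Hypothetical helper, not part of Cldr.Unicode.
  # Lifts a per-codepoint predicate so it accepts a codepoint or a string.
  def all?(codepoint, predicate) when is_integer(codepoint) do
    predicate.(codepoint)
  end

  def all?(string, predicate) when is_binary(string) do
    string
    |> String.to_charlist()
    |> Enum.all?(predicate)
  end

  # ASCII-only stand-in predicate, used purely for illustration.
  def ascii_letter?(codepoint) do
    codepoint in ?a..?z or codepoint in ?A..?Z
  end
end

AllCodepointsSketch.all?("Elixir", &AllCodepointsSketch.ascii_letter?/1)
AllCodepointsSketch.all?("foo, bar", &AllCodepointsSketch.ascii_letter?/1)
```

In this sketch the first call is true and the second is false, because the comma and the space fail the predicate.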

alphanumeric?(codepoint_or_string)

alphanumeric?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) are either alphabetic?/1 or numeric?/1 otherwise returns false.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.alphanumeric? "1234"
true

iex> Cldr.Unicode.alphanumeric? "KeyserSöze1995"
true

iex> Cldr.Unicode.alphanumeric? "3段"
true

iex> Cldr.Unicode.alphanumeric? "dragon@example.com"
false

block(codepoint_or_string)

block(codepoint_or_string()) :: atom() | [atom(), ...]

Returns the block name of a codepoint or the list of block names for each codepoint in a string.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

in the case of a single codepoint, an atom block name
in the case of a string, a list of atom block names for each codepoint in the codepoint_or_string

Examples

iex> Cldr.Unicode.block ?ä
:latin_1_supplement

iex> Cldr.Unicode.block ?A
:basic_latin

iex> Cldr.Unicode.block "äA"
[:latin_1_supplement, :basic_latin]

cased?(codepoint_or_string)

cased?(codepoint_or_string()) :: boolean()

Returns true if the codepoint has the :cased property, otherwise returns false.

The :cased property means that the character has at least an uppercase and a lowercase representation and possibly a titlecase representation too.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.cased? ?ယ
false

iex> Cldr.Unicode.cased? ?A
true

category(codepoint_or_string)

category(codepoint_or_string()) :: atom() | [atom(), ...]

Returns the Unicode category for a codepoint or a list of categories for a string.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

in the case of a single codepoint, an atom representing one of the categories listed below
in the case of a string, a list representing the category for each codepoint in the string

Notes

These categories match the names of the Unicode character classes used in various regular expression engines. The full list of categories is:

Category	Matches
:C	Other
:Cc	Control
:Cf	Format
:Cn	Unassigned
:Co	Private use
:Cs	Surrogate
:L	Letter
:Ll	Lower case letter
:Lm	Modifier letter
:Lo	Other letter
:Lt	Title case letter
:Lu	Upper case letter
:M	Mark
:Mc	Spacing mark
:Me	Enclosing mark
:Mn	Non-spacing mark
:N	Number
:Nd	Decimal number
:Nl	Letter number
:No	Other number
:P	Punctuation
:Pc	Connector punctuation
:Pd	Dash punctuation
:Pe	Close punctuation
:Pf	Final punctuation
:Pi	Initial punctuation
:Po	Other punctuation
:Ps	Open punctuation
:S	Symbol
:Sc	Currency symbol
:Sk	Modifier symbol
:Sm	Mathematical symbol
:So	Other symbol
:Z	Separator
:Zl	Line separator
:Zp	Paragraph separator
:Zs	Space separator

Note too that the group level categories like :L, :M, :S and so on are not assigned to any codepoint. They can only be identified by combining the results for each of the subsidiary categories.

Examples

iex> Cldr.Unicode.category ?ä
:Ll

iex> Cldr.Unicode.category ?A
:Lu

iex> Cldr.Unicode.category ?🧐
:So

iex> Cldr.Unicode.category ?+
:Sm

iex> Cldr.Unicode.category ?1
:Nd

iex> Cldr.Unicode.category "äA"
[:Ll, :Lu]
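
Because the group-level categories are never returned directly, a caller who wants to know whether a result falls under :L, :N and so on has to derive the group from the two-letter category. The following is a minimal sketch of one way to do that, assuming only that a group name is the first letter of its subsidiary categories, as in the table above. CategoryGroupSketch and its functions are hypothetical and not part of Cldr.Unicode.

```elixir
defmodule CategoryGroupSketch do
  # Hypothetical helper, not part of Cldr.Unicode.
  # Maps a category atom such as :Lu to its group atom (:L)
  # by taking the first letter of the category name.
  def group(category) when is_atom(category) do
    category
    |> Atom.to_string()
    |> String.first()
    |> String.to_atom()
  end

  # True when every category in the list belongs to the given group.
  def all_in_group?(categories, group) when is_list(categories) do
    Enum.all?(categories, fn category -> group(category) == group end)
  end
end

CategoryGroupSketch.group(:Lu)
CategoryGroupSketch.all_in_group?([:Ll, :Lu], :L)
```

The list passed to all_in_group?/2 has the same shape as the result of calling category/1 on a string, so the two can be piped together.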

class(codepoint_or_string)

digits?(codepoint_or_string)

digits?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) adhere to Unicode category :Nd otherwise returns false.

This group of characters represents the decimal digits zero through nine (0..9) and the equivalents in non-Latin scripts.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

emoji?(codepoint_or_string)

emoji?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) are emoji otherwise returns false.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.emoji? "🧐🤓🤩🤩️🤯"
true

lowercase?(codepoint_or_string)

lowercase?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) are in the category :Ll, otherwise returns false.

Notice that there are many languages that do not have a distinction between cases. Their characters are not included in this group.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.lowercase?(?a)
true

iex> Cldr.Unicode.lowercase?("A")
false

iex> Cldr.Unicode.lowercase?("Elixir")
false

iex> Cldr.Unicode.lowercase?("léon")
true

iex> Cldr.Unicode.lowercase?("foo, bar")
false

iex> Cldr.Unicode.lowercase?("42")
false

iex> Cldr.Unicode.lowercase?("Σ")
false

iex> Cldr.Unicode.lowercase?("σ")
true

math?(codepoint_or_string)

math?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) are in the category :Sm, otherwise returns false.

These are all characters whose primary usage is in mathematical concepts (and not in alphabets). Notice that the numerical digits are not part of this group.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.math?(?=)
true

iex> Cldr.Unicode.math?("=")
true

iex> Cldr.Unicode.math?("1+1=2") # Digits do not have the `:math` property.
false

iex> Cldr.Unicode.math?("परिस")
false

iex> Cldr.Unicode.math?("∑") # Summation, \u2211
true

iex> Cldr.Unicode.math?("Σ") # Greek capital letter sigma, \u03a3
false

numeric?(codepoint_or_string)

numeric?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) belong to one of the Unicode categories :Nd, :Nl or :No, otherwise returns false.

This group of characters covers the decimal digits zero through nine (0..9) and their equivalents in non-Latin scripts (:Nd), together with letter numbers (:Nl) and other numbers (:No).

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.numeric?("65535")
true

iex> Cldr.Unicode.numeric?("42")
true

iex> Cldr.Unicode.numeric?("lapis philosophorum")
false

properties(codepoint_or_string)

properties(codepoint_or_string()) :: [atom(), ...] | [[atom(), ...], ...]

Returns the list of properties of a codepoint or a list of property lists, one for each codepoint in a given string.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

in the case of a single codepoint, a list of property atoms
in the case of a string, a list of property atom lists, one for each codepoint in the codepoint_or_string

Examples

iex> Cldr.Unicode.properties 0x1bf0
[:alphabetic, :case_ignorable]

iex> Cldr.Unicode.properties ?A
[:alphabetic, :uppercase, :cased]

iex> Cldr.Unicode.properties ?+
[:math]

iex> Cldr.Unicode.properties "a1+"
[[:alphabetic, :lowercase, :cased], [:numeric, :emoji], [:math]]

script(codepoint_or_string)

script(codepoint_or_string()) :: String.t() | [String.t(), ...]

Returns the script name of a codepoint or the list of script names for each codepoint in a string.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

in the case of a single codepoint, a string script name
in the case of a string, a list of string script names for each codepoint in the codepoint_or_string

Examples

iex> Cldr.Unicode.script ?ä
"latin"

iex> Cldr.Unicode.script ?خ
"arabic"

iex> Cldr.Unicode.script ?अ
"devanagari"

iex> Cldr.Unicode.script ?א
"hebrew"

iex> Cldr.Unicode.script ?Ж
"cyrillic"

iex> Cldr.Unicode.script ?δ
"greek"

iex> Cldr.Unicode.script ?ก
"thai"

iex> Cldr.Unicode.script ?ယ
"myanmar"

unaccent(string)

Removes accents (diacritical marks) from a string.

Arguments

string is any String.t

Returns

A string with all diacritical marks removed

Notes

The string is first normalised to :nfd form and then all characters in the block :combining_diacritical_marks are removed from the string.

Example

iex> Cldr.Unicode.unaccent("Et Ça sera sa moitié.")
"Et Ca sera sa moitie."
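
The two steps described in the notes (normalise to NFD, then drop the combining marks) can be approximated with the standard library. This is a sketch under stated assumptions, not the library's code: UnaccentSketch is a hypothetical module, and the range U+0300..U+036F is assumed here to stand for the Combining Diacritical Marks block.

```elixir
defmodule UnaccentSketch do
  # Hypothetical stand-in for illustration; not the Cldr.Unicode implementation.
  def unaccent(string) when is_binary(string) do
    string
    # Decompose characters such as "é" into a base letter plus combining marks.
    |> String.normalize(:nfd)
    |> String.to_charlist()
    # Assumption: U+0300..U+036F covers the Combining Diacritical Marks block.
    |> Enum.reject(fn codepoint -> codepoint in 0x0300..0x036F end)
    |> List.to_string()
  end
end

UnaccentSketch.unaccent("Et Ça sera sa moitié.")
```

Combining marks outside that one block pass through unchanged in this sketch.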

uppercase?(codepoint_or_string)

uppercase?(codepoint_or_string()) :: boolean()

Returns true if a single Unicode codepoint (or all characters in the given string) are in the category :Lu, otherwise returns false.

Notice that there are many languages that do not have a distinction between cases. Their characters are not included in this group.

Arguments

codepoint_or_string is a single integer codepoint or a String.t.

Returns

true or false

For the string-version, the result will be true only if all codepoints in the string adhere to the property.

Examples

iex> Cldr.Unicode.uppercase?(?a)
false

iex> Cldr.Unicode.uppercase?("A")
true

iex> Cldr.Unicode.uppercase?("Elixir")
false

iex> Cldr.Unicode.uppercase?("CAMEMBERT")
true

iex> Cldr.Unicode.uppercase?("foo, bar")
false

iex> Cldr.Unicode.uppercase?("42")
false

iex> Cldr.Unicode.uppercase?("Σ")
true

iex> Cldr.Unicode.uppercase?("σ")
false

version()

Returns the version of Unicode in Cldr.Unicode.
