Unicode.category
Specs
category(codepoint_or_string()) :: atom() | [atom(), ...]
Returns the Unicode category for a codepoint or a list of categories for a string.
Argument
codepoint_or_string is a single integer codepoint or a String.t.
Returns
in the case of a single codepoint, an atom representing one of the categories listed below
in the case of a string, a list representing the category for each codepoint in the string
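Because the return shape depends on the argument, calling code that accepts either form may want to normalize the result first. The following is a minimal sketch; CategoryShape and to_list/1 are hypothetical names used for illustration and are not part of the Unicode library.

```elixir
defmodule CategoryShape do
  # Hypothetical helper, not part of the Unicode library.
  # Accepts either return shape of Unicode.category/1 (a single atom or a
  # list of atoms) and always yields a list.
  def to_list(category_or_categories), do: List.wrap(category_or_categories)
end

CategoryShape.to_list(:Ll)        # => [:Ll]
CategoryShape.to_list([:Ll, :Lu]) # => [:Ll, :Lu]
```

With the library loaded, the result of Unicode.category/1 could be piped straight into CategoryShape.to_list/1.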
Notes
These categories match the names of the Unicode character classes used in various regular expression engines and in Unicode Sets. The full list of categories is:
Category | Matches |
---|---|
:C | Other |
:Cc | Control |
:Cf | Format |
:Cn | Unassigned |
:Co | Private use |
:Cs | Surrogate |
:L | Letter |
:Ll | Lower case letter |
:Lm | Modifier letter |
:Lo | Other letter |
:Lt | Title case letter |
:Lu | Upper case letter |
:M | Mark |
:Mc | Spacing mark |
:Me | Enclosing mark |
:Mn | Non-spacing mark |
:N | Number |
:Nd | Decimal number |
:Nl | Letter number |
:No | Other number |
:P | Punctuation |
:Pc | Connector punctuation |
:Pd | Dash punctuation |
:Pe | Close punctuation |
:Pf | Final punctuation |
:Pi | Initial punctuation |
:Po | Other punctuation |
:Ps | Open punctuation |
:S | Symbol |
:Sc | Currency symbol |
:Sk | Modifier symbol |
:Sm | Mathematical symbol |
:So | Other symbol |
:Z | Separator |
:Zl | Line separator |
:Zp | Paragraph separator |
:Zs | Space separator |
Note too that the group-level categories such as :L, :M, :S and so on are not assigned to any codepoint. They can only be identified by combining the results for each of the subsidiary categories.
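One way to work with group-level categories is to derive the group from the subcategory atom. The sketch below assumes that every subcategory name begins with its group letter, as in the table above; CategoryGroup, group/1 and any_in_group?/2 are hypothetical names and are not part of the Unicode library.

```elixir
defmodule CategoryGroup do
  # Hypothetical helper, not part of the Unicode library.
  # Assumes each subcategory atom starts with its group letter (:Ll -> :L).
  def group(category) when is_atom(category) do
    <<major::binary-size(1), _rest::binary>> = Atom.to_string(category)
    String.to_atom(major)
  end

  # True when any category in the list belongs to the wanted group.
  def any_in_group?(categories, wanted) when is_list(categories) do
    Enum.any?(categories, fn category -> group(category) == wanted end)
  end
end

CategoryGroup.group(:Ll)                       # => :L
CategoryGroup.any_in_group?([:Ll, :Lu], :L)    # => true
CategoryGroup.any_in_group?([:Nd], :L)         # => false
```

With the library loaded, the list returned by Unicode.category("äA") could be passed as the first argument to CategoryGroup.any_in_group?/2.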
Examples
iex> Unicode.category ?ä
:Ll
iex> Unicode.category ?A
:Lu
iex> Unicode.category ?🧐
:So
iex> Unicode.category ?+
:Sm
iex> Unicode.category ?1
:Nd
iex> Unicode.category "äA"
[:Ll, :Lu]