Ucwidth (ucwidth v0.2.0)
A module to determine the display width of a Unicode character (or codepoint) on monospaced screens.
A quick comparison of full-width and half-width:
"δΈ" # 1 full-width grapheme
"gg" # 2 half-width graphemes
This module was originally ported from Dr Markus Kuhn's wcwidth implementation in C, but with an updated Unicode database (currently v13.0.0).
Furthermore, Emoji characters are supported, e.g.:
iex> Ucwidth.width("π")
2
Functions provided by this module are grouped into:
- width/2 for determining the display width
- wide?/1, ambiguous?/1, combining?/1 for determining the properties of a grapheme
Ambiguous width
According to the Unicode specification of East Asian Width, some characters have variable width, depending on the context. The left single quotation mark "‘" (\u{2018}), for example, may take one or two cells depending on whether it is in an East Asian context or not.
See https://www.unicode.org/reports/tr11/#ED6 for more information.
This module provides an option to specify how ambiguous characters are treated. See width/2 for more information.
Combined Emoji characters
Following the latest Unicode specifications, a combined Emoji grapheme is counted as a single emoji, which is 2 cells wide. Note that not all terminals support the latest version of the Unicode specification, so these combined Emoji characters may be displayed differently from what this module reports.
For example, the "woman scientist" emoji's width is 2:
iex> Ucwidth.width("👩‍🔬")
2
But in some terminals it may be displayed as two separate emoji: 👩🔬
This problem is implementation-related; this library follows the canonical Unicode specifications.
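To make the distinction concrete, the sketch below builds the "woman scientist" sequence from its three codepoints (U+1F469, the zero-width joiner U+200D, and U+1F52C) and compares the codepoint count with the width reported by the library. It assumes the package is published on Hex as `ucwidth` and can be fetched with `Mix.install/1` (Elixir 1.12 or later); the variable names are illustrative only.

```elixir
# Assumption: the package is available on Hex under the name :ucwidth.
Mix.install([{:ucwidth, "~> 0.2"}])

# Woman (U+1F469) + zero-width joiner (U+200D) + microscope (U+1F52C).
woman_scientist = "\u{1F469}\u{200D}\u{1F52C}"

# The sequence is made up of three codepoints...
codepoint_count = woman_scientist |> String.codepoints() |> length()

# ...but the documentation above states that the combined grapheme is 2 cells wide.
cells = Ucwidth.width(woman_scientist)

IO.inspect(codepoint_count, label: "codepoints")
IO.inspect(cells, label: "cells")
```

A terminal that does not implement the joiner sequence may still draw the two emoji side by side, which is why the reported width and the rendered width can disagree.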
Summary
Functions
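- ambiguous?(codepoint_or_grapheme): Check if a grapheme is ambiguous in Unicode.
- combining?(codepoint_or_grapheme): Check if a Unicode grapheme is a combining character.
- wide?(codepoint_or_grapheme): Check if a grapheme is wide in Unicode.
- wide_or_ambiguous?(codepoint_or_grapheme): Check if a grapheme is wide or ambiguous in Unicode.
- width(codepoint_or_graphemes, ambiguous_as \\ :narrow): Get width of a codepoint or grapheme.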
Functions
ambiguous?(codepoint_or_grapheme)
@spec ambiguous?(non_neg_integer() | String.t()) :: boolean()
Check if a grapheme is ambiguous in Unicode.
The dataset is generated with uniset: uniset eaw:A
The display width of an ambiguous grapheme is determined by the context provided. It may take two cells in an East Asian context, and one cell otherwise.
iex> Ucwidth.ambiguous?(0x273d)
true
iex> Ucwidth.ambiguous?("ε¨")
false
combining?(codepoint_or_grapheme)
@spec combining?(non_neg_integer() | String.t()) :: boolean()
Check if a Unicode grapheme is a combining character.
The dataset is generated with uniset: uniset cat:Me,Mn,Cf + U+00AD + U+1160..U+11FF + U+200B + U+000C
For example:
iex> Ucwidth.combining?("\u061c")
true
iex> Ucwidth.combining?("-")
false
wide?(codepoint_or_grapheme)
@spec wide?(non_neg_integer() | String.t()) :: boolean()
Check if a grapheme is wide in Unicode.
The dataset is generated with uniset: uniset eaw:W,F
A grapheme is considered wide only if it:
- is East Asian Wide (W), or
- is East Asian Fullwidth (F)
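As a minimal sketch of how wide?/1 might be called, the snippet below checks three sample characters. It assumes the package is installed from Hex as `ucwidth` via `Mix.install/1`. The samples were chosen for illustration based on their East Asian Width property in the Unicode data and are not taken from the library's own tests, so no output is asserted here.

```elixir
# Assumption: the package is available on Hex under the name :ucwidth.
Mix.install([{:ucwidth, "~> 0.2"}])

# Illustrative samples, labelled with their East Asian Width property.
samples = [
  {"U+4E2D, East Asian Width W", "\u{4E2D}"},
  {"U+FF21, East Asian Width F", "\u{FF21}"},
  {"U+0061, East Asian Width Na", "a"}
]

# Query each sample and print what the library reports.
results =
  for {label, grapheme} <- samples do
    wide = Ucwidth.wide?(grapheme)
    IO.puts("#{label}: wide? = #{inspect(wide)}")
    {grapheme, wide}
  end
```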
wide_or_ambiguous?(codepoint_or_grapheme)
@spec wide_or_ambiguous?(non_neg_integer() | String.t()) :: boolean()
Check if a grapheme is wide or ambiguous in Unicode.
The dataset is generated with uniset: uniset eaw:W,F,A
See wide?/1 for the definition of wide.
See ambiguous?/1 for the definition of ambiguous.
width(codepoint_or_graphemes, ambiguous_as \\ :narrow)
@spec width(non_neg_integer() | String.t(), :wide | :narrow) :: 0 | 1 | 2 | {:error, :bad_arg}
Get width of a codepoint or grapheme.
Parameters
- codepoint_or_graphemes - a string or Unicode codepoint
  - an integer within the valid Unicode code range (0..0x11ffff)
  - a string, e.g. "c", "\u{3f0a1}", "hey"
- ambiguous_as - the treatment of ambiguous characters, by default :narrow
  - :narrow - treated as if they are narrow
  - :wide - treated as if they are wide
For example:
iex> Ucwidth.width("\u00a1", :narrow)
1
iex> Ucwidth.width("\u00a1", :wide)
2
Return values
Returns the width of the grapheme/codepoint:
- 0 means this grapheme is invisible and takes no space on screen.
- 1 means it takes one cell to display. For instance, English letters are one cell wide.
- 2 means it takes two cells to display. This is quite common in East Asian character sets.
Examples
iex> Ucwidth.width(0)
0
iex> Ucwidth.width("5")
1
iex> Ucwidth.width("〿")
1
iex> Ucwidth.width("〈")
2
iex> Ucwidth.width("⺀")
2
iex> Ucwidth.width(255)
1
If the string contains more than one grapheme, the sum of its graphemes' widths is returned.
iex> Ucwidth.width("abc")
3
iex> Ucwidth.width("δ»δ»")
4
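Because width/2 sums the widths of all graphemes in a string, it can be used to align text in columns. The following is a minimal sketch under stated assumptions: `PadDemo` and `pad_trailing/2` are hypothetical names that are not part of Ucwidth, and the package is assumed to be installable from Hex as `ucwidth` via `Mix.install/1`.

```elixir
# Assumption: the package is available on Hex under the name :ucwidth.
Mix.install([{:ucwidth, "~> 0.2"}])

defmodule PadDemo do
  @moduledoc false

  # Hypothetical helper, not part of Ucwidth: pads `string` with spaces on the
  # right until it occupies `target` terminal cells.
  def pad_trailing(string, target) when is_binary(string) and is_integer(target) do
    case Ucwidth.width(string) do
      used when is_integer(used) and used < target ->
        string <> String.duplicate(" ", target - used)

      # Already wide enough, or width/2 returned {:error, :bad_arg}:
      # return the string unchanged.
      _too_wide_or_error ->
        string
    end
  end
end

# "abc" is 3 cells wide per the example above, so 3 spaces are appended.
IO.puts("[" <> PadDemo.pad_trailing("abc", 6) <> "]")
```

Padding by display cells rather than by `String.length/1` keeps columns aligned when a row mixes half-width and full-width text, as long as the terminal agrees with the widths reported here.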