Unicode.String.Case.Mapping (Unicode String v1.4.1)
The Unicode Case Mapping algorithm defines the process and data to transform text into upper case, lower case or title case.
Since most languages are not bicameral, characters which have no appropriate mapping remain unchanged.
Three case mapping functions are provided as a public API, with their implementations in this module:
- Unicode.String.upcase/2, which converts text to upper case.
- Unicode.String.downcase/2, which converts text to lower case.
- Unicode.String.titlecase/2, which converts text to title case. Title case means that the first character of each word is set to upper case and all other characters in the word are set to lower case. Unicode.String.split/2 is used to split the string into words before title casing.
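The title case description above can be illustrated with a minimal sketch. It is not this module's implementation: the module name `TitlecaseSketch` is hypothetical, and it splits on whitespace with the standard library rather than using the Unicode word-break rules applied by Unicode.String.split/2.

```elixir
defmodule TitlecaseSketch do
  # Hypothetical helper for illustration only; not part of Unicode.String.
  # Assumption: words are separated by whitespace. The real library uses
  # Unicode word-break rules, which handle punctuation and scripts better.
  def titlecase(string) when is_binary(string) do
    ~r/\s+/
    # Keep the whitespace separators so the original spacing is preserved.
    |> Regex.split(string, include_captures: true)
    # String.capitalize/1 upper-cases the first character and
    # lower-cases the rest of each piece.
    |> Enum.map_join(&String.capitalize/1)
  end
end

TitlecaseSketch.titlecase("the QUICK brown fox")
```

Locale-specific behavior, such as the Dutch IJ rule, is deliberately left out of this sketch.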
Each function operates in a locale-aware manner, implementing some basic capabilities:
- Casing rules for the Turkish dotted capital İ and dotless small ı.
- Casing rules for the retention of dots over i for Lithuanian letters with additional accents.
- Titlecasing of IJ at the start of words in Dutch.
- Removal of accents when upper casing letters in Greek.
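To show why a locale matters, here is a rough sketch of the Turkish rule built only on Elixir's standard String module. The module name `TurkicCasingSketch` is hypothetical and this is not how the library implements the rule; it also ignores the combining dot above (U+0307) that a complete implementation has to handle.

```elixir
defmodule TurkicCasingSketch do
  # Hypothetical helper for illustration only; not part of Unicode.String.

  # In Turkish, dotted "i" upper-cases to dotted "İ" (U+0130).
  # Dotless "ı" (U+0131) already upper-cases to "I" by default.
  def upcase(string) when is_binary(string) do
    string
    |> String.replace("i", "İ")
    |> String.upcase()
  end

  # In Turkish, "İ" lower-cases to "i" and plain "I" lower-cases to "ı".
  def downcase(string) when is_binary(string) do
    string
    |> String.replace("İ", "i")
    |> String.replace("I", "ı")
    |> String.downcase()
  end
end

TurkicCasingSketch.upcase("Diyarbakır")
TurkicCasingSketch.downcase("DİYARBAKIR")
```

Without the locale rule, the default mapping turns both "i" and "ı" into "I", which loses the distinction between the two letters.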
There are other casing rules that are not currently implemented, such as:
- Titlecasing of second or subsequent letters in words in orthographies that include caseless letters such as apostrophes.
- Uppercasing of U+00DF ß (latin small letter sharp s) to U+1E9E ẞ (latin capital letter sharp s).
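Because the sharp s rule is not implemented, a caller who wants that result would need to handle it separately. The following is one possible workaround using only the standard String module; the module name `SharpSSketch` is hypothetical and nothing here is provided by this library.

```elixir
defmodule SharpSSketch do
  # Hypothetical helper for illustration only; not part of Unicode.String.
  # Replaces ß (U+00DF) with ẞ (U+1E9E) before upper casing, so the
  # default expansion of ß to "SS" never applies.
  def upcase(string) when is_binary(string) do
    string
    |> String.replace("ß", "ẞ")
    |> String.upcase()
  end
end

SharpSSketch.upcase("straße")
```

With the default mapping alone, `String.upcase("straße")` expands ß to "SS" instead.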
Examples
# Basic case transformation
iex> Unicode.String.Case.Mapping.upcase("the quick brown fox")
"THE QUICK BROWN FOX"
# Dotted-I in Turkish and Azeri
iex> Unicode.String.Case.Mapping.upcase("Diyarbakır", :tr)
"DİYARBAKIR"
# Upper case in Greek removes diacritics
iex> Unicode.String.Case.Mapping.upcase("Πατάτα, Αέρας, Μυστήριο", :el)
"ΠΑΤΑΤΑ, ΑΕΡΑΣ, ΜΥΣΤΗΡΙΟ"
# Lower case Greek with a final sigma
iex> Unicode.String.Case.Mapping.downcase("ὈΔΥΣΣΕΎΣ", :el)
"ὀδυσσεύς"
# Title case Dutch with leading diphthong
iex> Unicode.String.Case.Mapping.titlecase("ijsselmeer", :nl)
"IJsselmeer"
Functions
downcase
Replace upper case characters with their lower case equivalents.
titlecase
Apply the Unicode title case algorithm.
upcase
Replace lower case characters with their upper case equivalents.
Lower case characters are replaced with their upper case equivalents. All other characters remain unchanged.
For the Greek language (:el), all accents are removed prior to capitalization, as is the normal practice for this language.
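The Greek behavior can be approximated with the standard library as a rough sketch. The module name `GreekUpcaseSketch` is hypothetical and this is not the library's implementation: it strips every nonspacing mark, which is broader than the actual Greek rule (for example, it would also drop a dialytika).

```elixir
defmodule GreekUpcaseSketch do
  # Hypothetical helper for illustration only; not part of Unicode.String.
  def upcase(string) when is_binary(string) do
    string
    # Decompose accented letters into base letter + combining marks.
    |> String.normalize(:nfd)
    # Assumption: removing all nonspacing marks (\p{Mn}) is acceptable here.
    |> String.replace(~r/\p{Mn}/u, "")
    # Upper case the remaining base letters.
    |> String.upcase()
  end
end

GreekUpcaseSketch.upcase("Πατάτα, Αέρας, Μυστήριο")
```

Removing the marks before upper casing mirrors the order described above, where accents are removed prior to capitalization.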