Elixir v1.7.4 String
A String in Elixir is a UTF-8 encoded binary.
Codepoints and grapheme cluster
The functions in this module act according to the Unicode Standard, version 11.0.0.
As per the standard, a codepoint is a single Unicode Character, which may be represented by one or more bytes.
For example, the codepoint “é” is two bytes:
iex> byte_size("é")
2
However, this module returns the proper length:
iex> String.length("é")
1
Furthermore, this module also presents the concept of grapheme cluster (from now on referenced as graphemes). Graphemes can consist of multiple codepoints that may be perceived as a single character by readers. For example, “é” can be represented either as a single “e with acute” codepoint or as the letter “e” followed by a “combining acute accent” (two codepoints):
iex> string = "\u0065\u0301"
iex> byte_size(string)
3
iex> String.length(string)
1
iex> String.codepoints(string)
["e", "́"]
iex> String.graphemes(string)
["é"]
Although the example above is made of two characters, it is perceived by users as one.
Graphemes can also be two characters that are interpreted as one by some languages. For example, some languages may consider “ch” as a single character. However, since this information depends on the locale, it is not taken into account by this module.
In general, the functions in this module rely on the Unicode Standard, but do not contain any of the locale specific behaviour.
More information about graphemes can be found in the Unicode Standard Annex #29. The current Elixir version implements the Extended Grapheme Cluster algorithm.
For converting a binary to a different encoding and for Unicode normalization mechanisms, see Erlang’s :unicode module.
String and binary operations
To act according to the Unicode Standard, many functions in this module run in linear time, as they need to traverse the whole string considering the proper Unicode codepoints.
For example, String.length/1 will take longer as the input grows. On the other hand, Kernel.byte_size/1 always runs in constant time (i.e. regardless of the input size).
This means there are often performance costs in using the functions in this module, compared to the more low-level operations that work directly with binaries:
- Kernel.binary_part/3 - retrieves part of the binary
- Kernel.bit_size/1 and Kernel.byte_size/1 - size related functions
- Kernel.is_bitstring/1 and Kernel.is_binary/1 - type checking functions
- Plus a number of functions for working with binaries (bytes) in the :binary module
There are many situations where using the String module can be avoided in favor of binary functions or pattern matching. For example, imagine you have a string prefix and you want to remove this prefix from another string named full.
One may be tempted to write:
iex> take_prefix = fn full, prefix ->
...> base = String.length(prefix)
...> String.slice(full, base, String.length(full) - base)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"
Although the function above works, it performs poorly. To calculate the length of the string, we need to traverse it fully, so we traverse both prefix and full strings, then slice the full one, traversing it again.
A first attempt at improving it could be with ranges:
iex> take_prefix = fn full, prefix ->
...> base = String.length(prefix)
...> String.slice(full, base..-1)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"
While this is much better (we don’t traverse full twice), it could still be improved. In this case, since we want to extract a substring from a string, we can use Kernel.byte_size/1 and Kernel.binary_part/3 as there is no chance we will slice in the middle of a codepoint made of more than one byte:
iex> take_prefix = fn full, prefix ->
...> base = byte_size(prefix)
...> binary_part(full, base, byte_size(full) - base)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"
Or simply use pattern matching:
iex> take_prefix = fn full, prefix ->
...> base = byte_size(prefix)
...> <<_::binary-size(base), rest::binary>> = full
...> rest
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"
On the other hand, if you want to dynamically slice a string based on an integer value, then using String.slice/3 is the best option as it guarantees we won’t incorrectly split a valid codepoint into multiple bytes.
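The binary-based snippets above assume that full really starts with prefix. As a minimal sketch (the name take_prefix_checked is made up for illustration and is not part of this module), the same byte-level approach can verify that assumption first and return the string untouched otherwise:

```elixir
# Hypothetical helper: strips `prefix` from `full` only when it is actually there.
take_prefix_checked = fn full, prefix ->
  base = byte_size(prefix)

  # Both checks work on raw bytes, so no grapheme traversal is needed.
  if byte_size(full) >= base and binary_part(full, 0, base) == prefix do
    binary_part(full, base, byte_size(full) - base)
  else
    full
  end
end

IO.inspect(take_prefix_checked.("Mr. John", "Mr. "))
#=> "John"
IO.inspect(take_prefix_checked.("Dr. John", "Mr. "))
#=> "Dr. John"
```

When the check is all that is needed, String.starts_with?/2 expresses the same intent directly.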
Integer codepoints
Although codepoints could be represented as integers, this module represents all codepoints as strings. For example:
iex> String.codepoints("olá")
["o", "l", "á"]
There are a couple of ways to retrieve the integer codepoint of a character. One may use the ? construct:
iex> ?o
111
iex> ?á
225
Or also via pattern matching:
iex> <<aacute::utf8>> = "á"
iex> aacute
225
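Going the other way, from an integer codepoint back to a string, can be sketched with the same utf8 binary modifier or with List.to_string/1. This is only an illustration; the variable names are arbitrary:

```elixir
# Build a one-codepoint string from its integer value.
aacute = <<225::utf8>>
IO.inspect(aacute == "\u00E1")
#=> true

# A list of integer codepoints (a charlist) converts the same way.
ola = List.to_string([?o, ?l, 225])
IO.inspect(String.codepoints(ola) == ["o", "l", "\u00E1"])
#=> true
```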
As we have seen above, codepoints can be inserted into a string by their hexadecimal code:
"ol\u0061\u0301" #=> "olá"
Self-synchronization
The UTF-8 encoding is self-synchronizing. This means that if malformed data (i.e., data that is not possible according to the definition of the encoding) is encountered, only one codepoint needs to be rejected.
This module relies on this behaviour to ignore such invalid characters. For example, length/1 will return a correct result even if an invalid codepoint is fed into it.
In other words, this module expects invalid data to be detected elsewhere, usually when retrieving data from an external source. For example, a driver that reads strings from a database will be responsible for checking the validity of the encoding. String.chunk/2 can be used for breaking a string into valid and invalid parts.
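As a rough sketch of such a check at the boundary, String.valid?/1 can flag malformed input and String.chunk/2 can isolate the offending bytes. Dropping the invalid runs, as done here, is just one possible policy and an assumption of this example:

```elixir
# A binary with one stray byte (0xFF never appears in valid UTF-8).
data = <<"abc", 0xFF, "def">>

IO.inspect(String.valid?(data))
#=> false

# Split into alternating valid and invalid runs, then keep only the valid ones.
chunks = String.chunk(data, :valid)
cleaned = chunks |> Enum.filter(&String.valid?/1) |> Enum.join()

IO.inspect(cleaned)
#=> "abcdef"
```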
Patterns
Many functions in this module work with patterns. For example, String.split/2 can split a string into multiple strings given a pattern. This pattern can be a string, a list of strings or a compiled pattern:
iex> String.split("foo bar", " ")
["foo", "bar"]
iex> String.split("foo bar!", [" ", "!"])
["foo", "bar", ""]
iex> pattern = :binary.compile_pattern([" ", "!"])
iex> String.split("foo bar!", pattern)
["foo", "bar", ""]
The compiled pattern is useful when the same match will be done over and over again. Note though that the compiled pattern cannot be stored in a module attribute, as the pattern is generated at runtime and does not survive compile time.
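Since the pattern has to be built at runtime, one way to reuse it is to compile it once inside a function and apply it to many inputs. The module and function names below are hypothetical and serve only as a sketch:

```elixir
defmodule PatternSketch do
  # Hypothetical helper: compiles the separators once per call and reuses
  # the compiled pattern for every line instead of recompiling each time.
  def split_lines(lines) do
    pattern = :binary.compile_pattern([" ", ","])
    Enum.map(lines, &String.split(&1, pattern))
  end
end

IO.inspect(PatternSketch.split_lines(["1,2 3", "a b"]))
#=> [["1", "2", "3"], ["a", "b"]]
```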
Types
codepoint() :: t()
A UTF-8 codepoint. It may be one or more bytes.
grapheme() :: t()
Multiple codepoints that may be perceived as a single character by readers.
t() :: binary()
A UTF-8 encoded binary.
Note String.t() and binary() are equivalent to analysis tools. However, for those reading the documentation, String.t() implies it is a UTF-8 encoded binary.
Functions
at/2
Returns the grapheme at the position of the given UTF-8 string. If position is greater than string length, then it returns nil.
Examples
iex> String.at("elixir", 0)
"e"
iex> String.at("elixir", 1)
"l"
iex> String.at("elixir", 10)
nil
iex> String.at("elixir", -1)
"r"
iex> String.at("elixir", -10)
nil
capitalize/2
Converts the first character in the given string to uppercase and the remainder to lowercase according to mode.
mode may be :default, :ascii or :greek. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii lowercases only the letters A to Z. :greek includes the context sensitive mappings found in Greek.
Examples
iex> String.capitalize("abcd")
"Abcd"
iex> String.capitalize("fin")
"Fin"
iex> String.capitalize("olá")
"Olá"
chunk/2
Splits the string into chunks of characters that share a common trait.
The trait can be one of two options:
- :valid - the string is split into chunks of valid and invalid character sequences
- :printable - the string is split into chunks of printable and non-printable character sequences
Returns a list of binaries each of which contains only one kind of characters.
If the given string is empty, an empty list is returned.
Examples
iex> String.chunk(<<?a, ?b, ?c, 0>>, :valid)
["abc\0"]
iex> String.chunk(<<?a, ?b, ?c, 0, 0xFFFF::utf16>>, :valid)
["abc\0", <<0xFFFF::utf16>>]
iex> String.chunk(<<?a, ?b, ?c, 0, 0x0FFFF::utf8>>, :printable)
["abc", <<0, 0x0FFFF::utf8>>]
codepoints/1
Returns all codepoints in the string.
For details about codepoints and graphemes, see the String module documentation.
Examples
iex> String.codepoints("olá")
["o", "l", "á"]
iex> String.codepoints("оптими зации")
["о", "п", "т", "и", "м", "и", " ", "з", "а", "ц", "и", "и"]
iex> String.codepoints("ἅἪῼ")
["ἅ", "Ἢ", "ῼ"]
iex> String.codepoints("\u00e9")
["é"]
iex> String.codepoints("\u0065\u0301")
["e", "́"]
contains?/2
Checks if string contains any of the given contents.
contents can be either a string, a list of strings, or a compiled pattern.
Examples
iex> String.contains?("elixir of life", "of")
true
iex> String.contains?("elixir of life", ["life", "death"])
true
iex> String.contains?("elixir of life", ["death", "mercury"])
false
The argument can also be a compiled pattern:
iex> pattern = :binary.compile_pattern(["life", "death"])
iex> String.contains?("elixir of life", pattern)
true
An empty string will always match:
iex> String.contains?("elixir of life", "")
true
iex> String.contains?("elixir of life", ["", "other"])
true
Note this function can match within or across grapheme boundaries. For example, take the grapheme “é” which is made of the characters “e” and the acute accent. The following returns true:
iex> String.contains?(String.normalize("é", :nfd), "e")
true
However, if “é” is represented by the single character “e with acute” accent, then it will return false:
iex> String.contains?(String.normalize("é", :nfc), "e")
false
downcase/2
Converts all characters in the given string to lowercase according to mode.
mode may be :default, :ascii or :greek. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii lowercases only the letters A to Z. :greek includes the context sensitive mappings found in Greek.
Examples
iex> String.downcase("ABCD")
"abcd"
iex> String.downcase("AB 123 XPTO")
"ab 123 xpto"
iex> String.downcase("OLÁ")
"olá"
The :ascii mode ignores Unicode characters and provides a more performant implementation when you know the string contains only ASCII characters:
iex> String.downcase("OLÁ", :ascii)
"olÁ"
And :greek properly handles the context sensitive sigma in Greek:
iex> String.downcase("ΣΣ")
"σσ"
iex> String.downcase("ΣΣ", :greek)
"σς"
duplicate/2
Returns a string subject duplicated n times.
Inlined by the compiler.
Examples
iex> String.duplicate("abc", 0)
""
iex> String.duplicate("abc", 1)
"abc"
iex> String.duplicate("abc", 2)
"abcabc"
ends_with?/2
Returns true if string ends with any of the suffixes given.
suffixes can be either a single suffix or a list of suffixes.
Examples
iex> String.ends_with?("language", "age")
true
iex> String.ends_with?("language", ["youth", "age"])
true
iex> String.ends_with?("language", ["youth", "elixir"])
false
An empty suffix will always match:
iex> String.ends_with?("language", "")
true
iex> String.ends_with?("language", ["", "other"])
true
equivalent?/2
Returns true if string1 is canonically equivalent to string2.
It performs Normalization Form Canonical Decomposition (NFD) on the strings before comparing them. This function is equivalent to:
String.normalize(string1, :nfd) == String.normalize(string2, :nfd)
Therefore, if you plan to compare multiple strings, multiple times in a row, you may normalize them upfront and compare them directly to avoid multiple normalization passes.
Examples
iex> String.equivalent?("abc", "abc")
true
iex> String.equivalent?("man\u0303ana", "mañana")
true
iex> String.equivalent?("abc", "ABC")
false
iex> String.equivalent?("nø", "nó")
false
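The advice about normalizing upfront can be sketched as follows. The word list is made up for illustration; the point is that normalization happens once, and every later comparison is a plain equality check:

```elixir
# "ma\u00F1ana" uses the precomposed n with tilde,
# "man\u0303ana" uses n followed by a combining tilde.
words = ["ma\u00F1ana", "man\u0303ana", "manana"]

# Normalize once...
normalized = Enum.map(words, &String.normalize(&1, :nfd))

# ...then plain equality (here via Enum.uniq/1) is enough afterwards.
IO.inspect(length(Enum.uniq(normalized)))
#=> 2
```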
first/1
Returns the first grapheme from a UTF-8 string, nil if the string is empty.
Examples
iex> String.first("elixir")
"e"
iex> String.first("եոգլի")
"ե"
graphemes/1
Returns Unicode graphemes in the string as per the Extended Grapheme Cluster algorithm.
The algorithm is outlined in the Unicode Standard Annex #29, Unicode Text Segmentation.
For details about codepoints and graphemes, see the String module documentation.
Examples
iex> String.graphemes("Ńaïve")
["Ń", "a", "ï", "v", "e"]
iex> String.graphemes("\u00e9")
["é"]
iex> String.graphemes("\u0065\u0301")
["é"]
jaro_distance/2
Returns a float value between 0 (equates to no similarity) and 1 (is an exact match) representing Jaro distance between string1 and string2.
The Jaro distance metric is designed and best suited for short strings such as person names.
Examples
iex> String.jaro_distance("dwayne", "duane")
0.8222222222222223
iex> String.jaro_distance("even", "odd")
0.0
iex> String.jaro_distance("same", "same")
1.0
last/1
Returns the last grapheme from a UTF-8 string, nil if the string is empty.
Examples
iex> String.last("elixir")
"r"
iex> String.last("եոգլի")
"ի"
length/1
Returns the number of Unicode graphemes in a UTF-8 string.
Examples
iex> String.length("elixir")
6
iex> String.length("եոգլի")
5
match?/2
Checks if string matches the given regular expression.
Examples
iex> String.match?("foo", ~r/foo/)
true
iex> String.match?("bar", ~r/foo/)
false
myers_difference/2
Returns a keyword list that represents an edit script.
Check List.myers_difference/2 for more information.
Examples
iex> string1 = "fox hops over the dog"
iex> string2 = "fox jumps over the lazy cat"
iex> String.myers_difference(string1, string2)
[eq: "fox ", del: "ho", ins: "jum", eq: "ps over the ", del: "dog", ins: "lazy cat"]
next_codepoint/1
Returns the next codepoint in a string.
The result is a tuple with the codepoint and the remainder of the string or nil in case the string reached its end.
As with other functions in the String module, this function does not check for the validity of the codepoint. That said, if an invalid codepoint is found, it will be returned by this function.
Examples
iex> String.next_codepoint("olá")
{"o", "lá"}
next_grapheme/1
Returns the next grapheme in a string.
The result is a tuple with the grapheme and the remainder of the string or nil in case the string reached its end.
Examples
iex> String.next_grapheme("olá")
{"o", "lá"}
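Because the remainder is returned alongside each grapheme, a string can be walked recursively until nil signals the end. The module below is a hypothetical sketch that counts graphemes this way; in practice String.length/1 already does this:

```elixir
defmodule GraphemeWalkSketch do
  # Hypothetical helper: counts graphemes by repeatedly taking the next one.
  def count(string), do: count(string, 0)

  defp count(string, acc) do
    case String.next_grapheme(string) do
      {_grapheme, rest} -> count(rest, acc + 1)
      nil -> acc
    end
  end
end

IO.inspect(GraphemeWalkSketch.count("ol\u00E1"))
#=> 3
IO.inspect(GraphemeWalkSketch.count("e\u0301"))
#=> 1
```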
next_grapheme_size/1
next_grapheme_size(t()) :: {pos_integer(), t()} | nil
Returns the size of the next grapheme.
The result is a tuple with the next grapheme size and the remainder of the string or nil in case the string reached its end.
Examples
iex> String.next_grapheme_size("olá")
{1, "lá"}
normalize/2
Converts all characters in string to Unicode normalization form identified by form.
Forms
The supported forms are:
- :nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
- :nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.
Examples
iex> String.normalize("yêṩ", :nfd)
"yêṩ"
iex> String.normalize("leña", :nfc)
"leña"
pad_leading/3
pad_leading(t(), non_neg_integer(), t() | [t()]) :: t()
Returns a new string padded with a leading filler which is made of elements from the padding.
Passing a list of strings as padding will take one element of the list for every missing entry. If the list is shorter than the number of inserts, the filling will start again from the beginning of the list. Passing a string padding is equivalent to passing the list of graphemes in it. If no padding is given, it defaults to whitespace.
When count is less than or equal to the length of string, the given string is returned.
Raises ArgumentError if the given padding contains a non-string element.
Examples
iex> String.pad_leading("abc", 5)
" abc"
iex> String.pad_leading("abc", 4, "12")
"1abc"
iex> String.pad_leading("abc", 6, "12")
"121abc"
iex> String.pad_leading("abc", 5, ["1", "23"])
"123abc"
pad_trailing/3
pad_trailing(t(), non_neg_integer(), t() | [t()]) :: t()
Returns a new string padded with a trailing filler which is made of elements from the padding.
Passing a list of strings as padding will take one element of the list for every missing entry. If the list is shorter than the number of inserts, the filling will start again from the beginning of the list. Passing a string padding is equivalent to passing the list of graphemes in it. If no padding is given, it defaults to whitespace.
When count is less than or equal to the length of string, the given string is returned.
Raises ArgumentError if the given padding contains a non-string element.
Examples
iex> String.pad_trailing("abc", 5)
"abc "
iex> String.pad_trailing("abc", 4, "12")
"abc1"
iex> String.pad_trailing("abc", 6, "12")
"abc121"
iex> String.pad_trailing("abc", 5, ["1", "23"])
"abc123"
printable?/2
printable?(t(), 0) :: true
printable?(t(), pos_integer() | :infinity) :: boolean()
Checks if a string contains only printable characters up to character_limit.
Takes an optional character_limit as a second argument. If character_limit is 0, this function will return true.
Examples
iex> String.printable?("abc")
true
iex> String.printable?("abc" <> <<0>>)
false
iex> String.printable?("abc" <> <<0>>, 2)
true
iex> String.printable?("abc" <> <<0>>, 0)
true
replace/4
Returns a new string created by replacing occurrences of pattern in subject with replacement.
The pattern may be a string, a regular expression, or a compiled pattern.
By default it replaces all occurrences but this behaviour can be controlled through the :global option; see the “Options” section below.
Options
- :global - (boolean) if true, all occurrences of pattern are replaced with replacement, otherwise only the first occurrence is replaced. Defaults to true
- :insert_replaced - (integer or list of integers) specifies the position where to insert the replaced part inside the replacement. If any position given in the :insert_replaced option is larger than the replacement string, or is negative, an ArgumentError is raised. See the examples below
Examples
iex> String.replace("a,b,c", ",", "-")
"a-b-c"
iex> String.replace("a,b,c", ",", "-", global: false)
"a-b,c"
When the pattern is a regular expression, one can give \N or \g{N} in the replacement string to access a specific capture in the regular expression:
iex> String.replace("a,b,c", ~r/,(.)/, ",\\1\\g{1}")
"a,bb,cc"
Notice we had to escape the backslash escape character (i.e., we used \\N instead of just \N to escape the backslash; same thing for \\g{N}). By giving \0, one can inject the whole matched pattern in the replacement string.
When the pattern is a string, a developer can use the replaced part inside the replacement by using the :insert_replaced option and specifying the position(s) inside the replacement where the string pattern will be inserted:
iex> String.replace("a,b,c", "b", "[]", insert_replaced: 1)
"a,[b],c"
iex> String.replace("a,b,c", ",", "[]", insert_replaced: 2)
"a[],b[],c"
iex> String.replace("a,b,c", ",", "[]", insert_replaced: [1, 1])
"a[,,]b[,,]c"
A compiled pattern can also be given:
iex> pattern = :binary.compile_pattern(",")
iex> String.replace("a,b,c", pattern, "[]", insert_replaced: 2)
"a[],b[],c"
When an empty string is provided as a pattern, the function will treat it as an implicit empty string between each grapheme and the string will be interspersed. If an empty string is provided as both pattern and replacement, the subject will be returned unchanged:
iex> String.replace("ELIXIR", "", ".")
".E.L.I.X.I.R."
iex> String.replace("ELIXIR", "", "")
"ELIXIR"
replace_leading/3
Replaces all leading occurrences of match by replacement in string.
Returns the string untouched if there are no occurrences.
If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the beginning of string, and it’s impossible to replace “multiple” occurrences of "".
Examples
iex> String.replace_leading("hello world", "hello ", "")
"world"
iex> String.replace_leading("hello hello world", "hello ", "")
"world"
iex> String.replace_leading("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_leading("hello hello world", "hello ", "ola ")
"ola ola world"
replace_prefix/3
Replaces prefix in string by replacement if it matches match.
Returns the string untouched if there is no match. If match is an empty string (""), replacement is just prepended to string.
Examples
iex> String.replace_prefix("world", "hello ", "")
"world"
iex> String.replace_prefix("hello world", "hello ", "")
"world"
iex> String.replace_prefix("hello hello world", "hello ", "")
"hello world"
iex> String.replace_prefix("world", "hello ", "ola ")
"world"
iex> String.replace_prefix("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_prefix("hello hello world", "hello ", "ola ")
"ola hello world"
iex> String.replace_prefix("world", "", "hello ")
"hello world"
replace_suffix/3
Replaces suffix in string by replacement if it matches match.
Returns the string untouched if there is no match. If match is an empty string (""), replacement is just appended to string.
Examples
iex> String.replace_suffix("hello", " world", "")
"hello"
iex> String.replace_suffix("hello world", " world", "")
"hello"
iex> String.replace_suffix("hello world world", " world", "")
"hello world"
iex> String.replace_suffix("hello", " world", " mundo")
"hello"
iex> String.replace_suffix("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_suffix("hello world world", " world", " mundo")
"hello world mundo"
iex> String.replace_suffix("hello", "", " world")
"hello world"
replace_trailing/3
Replaces all trailing occurrences of match by replacement in string.
Returns the string untouched if there are no occurrences.
If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the end of string, and it’s impossible to replace “multiple” occurrences of "".
Examples
iex> String.replace_trailing("hello world", " world", "")
"hello"
iex> String.replace_trailing("hello world world", " world", "")
"hello"
iex> String.replace_trailing("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_trailing("hello world world", " world", " mundo")
"hello mundo mundo"
reverse/1
Reverses the graphemes in the given string.
Examples
iex> String.reverse("abcd")
"dcba"
iex> String.reverse("hello world")
"dlrow olleh"
iex> String.reverse("hello ∂og")
"go∂ olleh"
Keep in mind reversing the same string twice does not necessarily yield the original string:
iex> "̀e"
"̀e"
iex> String.reverse("̀e")
"è"
iex> String.reverse(String.reverse("̀e"))
"è"
In the first example the accent is before the vowel, so it is considered two graphemes. However, when you reverse it once, you have the vowel followed by the accent, which becomes one grapheme. Reversing it again will keep it as one single grapheme.
slice/2
Returns a substring from the offset given by the start of the range to the offset given by the end of the range.
If the start of the range is not a valid offset for the given string or if the range is in reverse order, returns "".
If the start or end of the range is negative, the whole string is traversed first in order to convert the negative indices into positive ones.
Remember this function works with Unicode graphemes and considers the slices to represent grapheme offsets. If you want to split on raw bytes, check Kernel.binary_part/3 instead.
Examples
iex> String.slice("elixir", 1..3)
"lix"
iex> String.slice("elixir", 1..10)
"lixir"
iex> String.slice("elixir", 10..3)
""
iex> String.slice("elixir", -4..-1)
"ixir"
iex> String.slice("elixir", 2..-1)
"ixir"
iex> String.slice("elixir", -4..6)
"ixir"
iex> String.slice("elixir", -1..-4)
""
iex> String.slice("elixir", -10..-7)
""
iex> String.slice("a", 0..1500)
"a"
iex> String.slice("a", 1..1500)
""
slice/3
slice(t(), integer(), non_neg_integer()) :: grapheme()
Returns a substring starting at the offset start, and of length len.
If the offset is greater than string length, then it returns "".
Remember this function works with Unicode graphemes and considers the slices to represent grapheme offsets. If you want to split on raw bytes, check Kernel.binary_part/3 instead.
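The difference between grapheme offsets and raw byte offsets can be sketched with a string containing a two-byte character. The sample string is arbitrary and only meant as an illustration:

```elixir
string = "h\u00E9llo"

# Grapheme offsets: the two-byte "é" is kept whole.
IO.inspect(String.slice(string, 0, 2) == "h\u00E9")
#=> true

# Byte offsets: the cut lands in the middle of "é", leaving invalid UTF-8.
bytes = binary_part(string, 0, 2)
IO.inspect(String.valid?(bytes))
#=> false
```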
Examples
iex> String.slice("elixir", 1, 3)
"lix"
iex> String.slice("elixir", 1, 10)
"lixir"
iex> String.slice("elixir", 10, 3)
""
iex> String.slice("elixir", -4, 4)
"ixir"
iex> String.slice("elixir", -10, 3)
""
iex> String.slice("a", 0, 1500)
"a"
iex> String.slice("a", 1, 1500)
""
iex> String.slice("a", 2, 1500)
""
split/1
Divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored. Groups of whitespace are treated as a single occurrence. Divisions do not occur on non-breaking whitespace.
Examples
iex> String.split("foo bar")
["foo", "bar"]
iex> String.split("foo" <> <<194, 133>> <> "bar")
["foo", "bar"]
iex> String.split(" foo bar ")
["foo", "bar"]
iex> String.split("no\u00a0break")
["no\u00a0break"]
split/3
Divides a string into substrings based on a pattern.
Returns a list of these substrings. The pattern can be a string, a list of strings, a regular expression, or a compiled pattern.
The string is split into as many parts as possible by default, but can be controlled via the :parts option.
Empty strings are only removed from the result if the :trim option is set to true.
When the pattern used is a regular expression, the string is split using Regex.split/3.
Options
- :parts (positive integer or :infinity) - the string is split into at most as many parts as this option specifies. If :infinity, the string will be split into all possible parts. Defaults to :infinity.
- :trim (boolean) - if true, empty strings are removed from the resulting list.
This function also accepts all options accepted by Regex.split/3 if pattern is a regular expression.
Examples
Splitting with a string pattern:
iex> String.split("a,b,c", ",")
["a", "b", "c"]
iex> String.split("a,b,c", ",", parts: 2)
["a", "b,c"]
iex> String.split(" a b c ", " ", trim: true)
["a", "b", "c"]
A list of patterns:
iex> String.split("1,2 3,4", [" ", ","])
["1", "2", "3", "4"]
A regular expression:
iex> String.split("a,b,c", ~r{,})
["a", "b", "c"]
iex> String.split("a,b,c", ~r{,}, parts: 2)
["a", "b,c"]
iex> String.split(" a b c ", ~r{\s}, trim: true)
["a", "b", "c"]
iex> String.split("abc", ~r{b}, include_captures: true)
["a", "b", "c"]
A compiled pattern:
iex> pattern = :binary.compile_pattern([" ", ","])
iex> String.split("1,2 3,4", pattern)
["1", "2", "3", "4"]
Splitting on empty string returns graphemes:
iex> String.split("abc", "")
["", "a", "b", "c", ""]
iex> String.split("abc", "", trim: true)
["a", "b", "c"]
iex> String.split("abc", "", parts: 1)
["abc"]
iex> String.split("abc", "", parts: 3)
["", "a", "bc"]
Note this function can split within or across grapheme boundaries. For example, take the grapheme “é” which is made of the characters “e” and the acute accent. The following will split the string into two parts:
iex> String.split(String.normalize("é", :nfd), "e")
["", "́"]
However, if “é” is represented by the single character “e with acute” accent, then it will split the string into just one part:
iex> String.split(String.normalize("é", :nfc), "e")
["é"]
split_at/2
Splits a string into two at the specified offset. When the offset given is negative, location is counted from the end of the string.
The offset is capped to the length of the string. Returns a tuple with two elements.
Note: keep in mind this function splits on graphemes and, as such, it has to linearly traverse the string. If you want to split a string or a binary based on the number of bytes, use Kernel.binary_part/3 instead.
Examples
iex> String.split_at("sweetelixir", 5)
{"sweet", "elixir"}
iex> String.split_at("sweetelixir", -6)
{"sweet", "elixir"}
iex> String.split_at("abc", 0)
{"", "abc"}
iex> String.split_at("abc", 1000)
{"abc", ""}
iex> String.split_at("abc", -1000)
{"", "abc"}
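A byte-based counterpart built on Kernel.binary_part/3 might look like the sketch below. The name split_bytes is hypothetical, the offset is assumed to be non-negative, and, unlike split_at/2, the cut may land in the middle of a multi-byte codepoint:

```elixir
# Hypothetical helper: splits on a byte offset, capped to the binary size.
split_bytes = fn binary, offset when offset >= 0 ->
  offset = min(offset, byte_size(binary))
  {binary_part(binary, 0, offset), binary_part(binary, offset, byte_size(binary) - offset)}
end

IO.inspect(split_bytes.("sweetelixir", 5))
#=> {"sweet", "elixir"}
IO.inspect(split_bytes.("abc", 1000))
#=> {"abc", ""}
```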
splitter/3
splitter(t(), pattern(), keyword()) :: Enumerable.t()
Returns an enumerable that splits a string on demand.
This is in contrast to split/3 which splits the entire string upfront.
Note splitter does not support regular expressions (as it is often more efficient to have the regular expressions traverse the string at once than in multiple passes).
Options
- :trim - when true, does not emit empty patterns
Examples
iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", [" ", ","]) |> Enum.take(4)
["1", "2", "3", "4"]
iex> String.splitter("abcd", "") |> Enum.take(10)
["", "a", "b", "c", "d", ""]
iex> String.splitter("abcd", "", trim: true) |> Enum.take(10)
["a", "b", "c", "d"]
A compiled pattern can also be given:
iex> pattern = :binary.compile_pattern([" ", ","])
iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", pattern) |> Enum.take(4)
["1", "2", "3", "4"]
starts_with?/2
Returns true if string starts with any of the prefixes given.
prefix can be either a string, a list of strings, or a compiled pattern.
Examples
iex> String.starts_with?("elixir", "eli")
true
iex> String.starts_with?("elixir", ["erlang", "elixir"])
true
iex> String.starts_with?("elixir", ["erlang", "ruby"])
false
A compiled pattern can also be given:
iex> pattern = :binary.compile_pattern(["erlang", "elixir"])
iex> String.starts_with?("elixir", pattern)
true
An empty string will always match:
iex> String.starts_with?("elixir", "")
true
iex> String.starts_with?("elixir", ["", "other"])
true
to_atom/1
Converts a string to an atom.
Warning: this function creates atoms dynamically and atoms are not garbage-collected. Therefore, string should not be an untrusted value, such as input received from a socket or during a web request. Consider using to_existing_atom/1 instead.
By default, the maximum number of atoms is 1_048_576. This limit can be raised or lowered using the VM option +t.
The maximum atom size is 255 characters. Prior to Erlang/OTP 20, only latin1 characters are allowed.
Inlined by the compiler.
Examples
iex> String.to_atom("my_atom")
:my_atom
to_charlist/1
Converts a string into a charlist.
Specifically, this function takes a UTF-8 encoded binary and returns a list of its integer codepoints. It is similar to codepoints/1 except that the latter returns a list of codepoints as strings.
In case you need to work with bytes, take a look at the :binary module.
Examples
iex> String.to_charlist("æß")
'æß'
to_existing_atom/1
Converts a string to an existing atom.
The maximum atom size is 255 characters. Prior to Erlang/OTP 20, only latin1 characters are allowed.
Inlined by the compiler.
Examples
iex> _ = :my_atom
iex> String.to_existing_atom("my_atom")
:my_atom
iex> String.to_existing_atom("this_atom_will_never_exist")
** (ArgumentError) argument error
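For untrusted input, the raising behaviour shown above can be wrapped so that unknown names become a plain :error value. The helper name to_known_atom is an assumption made for this sketch, not part of the module:

```elixir
# Hypothetical wrapper: never creates new atoms, returns :error for unknown names.
to_known_atom = fn string ->
  try do
    {:ok, String.to_existing_atom(string)}
  rescue
    ArgumentError -> :error
  end
end

# Make sure the atom exists before looking it up.
_ = :my_atom
IO.inspect(to_known_atom.("my_atom"))
#=> {:ok, :my_atom}
```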
to_float/1
Returns a float whose text representation is string.
string must be the string representation of a float including a decimal point; otherwise, an ArgumentError will be raised. To parse a string without a decimal point as a float, use Float.parse/1.
Inlined by the compiler.
Examples
iex> String.to_float("2.2017764e+0")
2.2017764
iex> String.to_float("3.0")
3.0
String.to_float("3")
#=> ** (ArgumentError) argument error
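A non-raising alternative based on Float.parse/1 can be sketched as follows. The helper name parse_float is hypothetical, and rejecting trailing text is a choice made for this example rather than something Float.parse/1 enforces:

```elixir
# Hypothetical helper: accepts "3" as well as "3.0", rejects trailing garbage.
parse_float = fn string ->
  case Float.parse(string) do
    {float, ""} -> {:ok, float}
    _ -> :error
  end
end

IO.inspect(parse_float.("3"))
#=> {:ok, 3.0}
IO.inspect(parse_float.("3.5kg"))
#=> :error
```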
to_integer/1
Returns an integer whose text representation is string.
Inlined by the compiler.
Examples
iex> String.to_integer("123")
123
to_integer/2
Returns an integer whose text representation is string in base base.
Inlined by the compiler.
Examples
iex> String.to_integer("3FF", 16)
1023
trim/1
Returns a string where all leading and trailing Unicode whitespaces have been removed.
Examples
iex> String.trim("\n abc\n ")
"abc"
trim/2
Returns a string where all leading and trailing occurrences of to_trim have been removed.
Examples
iex> String.trim("a abc a", "a")
" abc "
trim_leading/1
Returns a string where all leading Unicode whitespaces have been removed.
Examples
iex> String.trim_leading("\n abc ")
"abc "
trim_leading/2
Returns a string where all leading occurrences of to_trim have been removed.
Examples
iex> String.trim_leading("__ abc _", "_")
" abc _"
iex> String.trim_leading("1 abc", "11")
"1 abc"
trim_trailing/1
Returns a string where all trailing Unicode whitespaces have been removed.
Examples
iex> String.trim_trailing(" abc\n ")
" abc"
trim_trailing/2
Returns a string where all trailing occurrences of to_trim have been removed.
Examples
iex> String.trim_trailing("_ abc __", "_")
"_ abc "
iex> String.trim_trailing("abc 1", "11")
"abc 1"
upcase/2
Converts all characters in the given string to uppercase according to mode.
mode may be :default, :ascii or :greek. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii uppercases only the letters a to z. :greek includes the context sensitive mappings found in Greek.
Examples
iex> String.upcase("abcd")
"ABCD"
iex> String.upcase("ab 123 xpto")
"AB 123 XPTO"
iex> String.upcase("olá")
"OLÁ"
The :ascii mode ignores Unicode characters and provides a more performant implementation when you know the string contains only ASCII characters:
iex> String.upcase("olá", :ascii)
"OLá"