String (Elixir v1.12.3)

Strings in Elixir are UTF-8 encoded binaries.

Strings in Elixir are a sequence of Unicode characters, typically written between double quotes, such as "hello" and "héllò".

In case a string must have a double-quote in itself, the double quotes must be escaped with a backslash, for example: "this is a string with \"double quotes\"".

You can concatenate two strings with the <>/2 operator:

iex> "hello" <> " " <> "world"
"hello world"

Interpolation

Strings in Elixir also support interpolation. This allows you to place some value in the middle of a string by using the #{} syntax:

iex> name = "joe"
iex> "hello #{name}"
"hello joe"

Any Elixir expression is valid inside the interpolation. If a string is given, the string is interpolated as is. If any other value is given, Elixir will attempt to convert it to a string using the String.Chars protocol. This makes it possible, for example, to output an integer from the interpolation:

iex> "2 + 2 = #{2 + 2}"
"2 + 2 = 4"

If the value you want to interpolate cannot be converted to a string, because it doesn't have a human-readable textual representation, a protocol error will be raised.
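As a minimal sketch of that failure mode, the snippet below interpolates a tuple, which has no String.Chars implementation, and rescues the resulting Protocol.UndefinedError. The variable names are illustrative only, and inspect/1 is shown as one common way to get a printable representation of any term.

```elixir
# A tuple does not implement the String.Chars protocol.
value = {:ok, 1}

# Interpolating it raises Protocol.UndefinedError at runtime.
result =
  try do
    "value: #{value}"
  rescue
    error in Protocol.UndefinedError -> {:error, error.protocol}
  end

# inspect/1 returns a string for any term, so this interpolation succeeds.
inspected = "value: #{inspect(value)}"
```

Here result is bound to {:error, String.Chars}, while inspected holds "value: {:ok, 1}".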

Escape characters

Besides allowing double-quotes to be escaped with a backslash, strings also support the following escape characters:

\a - Bell
\b - Backspace
\t - Horizontal tab
\n - Line feed (New lines)
\v - Vertical tab
\f - Form feed
\r - Carriage return
\e - Command Escape
\# - Returns the # character itself, skipping interpolation
\xNN - A byte represented by the hexadecimal NN
\uNNNN - A Unicode code point represented by NNNN

Note it is generally not advised to use \xNN in Elixir strings, as introducing an invalid byte sequence would make the string invalid. If you have to introduce a character by its hexadecimal representation, it is best to work with Unicode code points, such as \uNNNN. In fact, understanding Unicode code points can be essential when doing low-level manipulations of strings, so let's explore them in detail next.
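To make the difference concrete, the following sketch builds one string from a raw byte and one from a code point, then checks both with String.valid?/1. The variable names are illustrative.

```elixir
# "\xFF" inserts the raw byte 0xFF, which can never appear in valid UTF-8.
raw_byte = "\xFF"

# "\u00FF" inserts the code point U+00FF, which UTF-8 encodes as two bytes.
code_point = "\u00FF"

raw_valid = String.valid?(raw_byte)            # false
code_point_valid = String.valid?(code_point)   # true
code_point_bytes = byte_size(code_point)       # 2
```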

Code points and grapheme cluster

The functions in this module act according to the Unicode Standard, version 13.0.0.

As per the standard, a code point is a single Unicode Character, which may be represented by one or more bytes.

For example, although the code point "é" is a single character, its underlying representation uses two bytes:

iex> String.length("é")
1
iex> byte_size("é")
2

Furthermore, this module also presents the concept of grapheme cluster (from now on referenced as graphemes). Graphemes can consist of multiple code points that may be perceived as a single character by readers. For example, "é" can be represented either as a single "e with acute" code point or as the letter "e" followed by a "combining acute accent" (two code points):

iex> string = "\u0065\u0301"
iex> byte_size(string)
3
iex> String.length(string)
1
iex> String.codepoints(string)
["e", "́"]
iex> String.graphemes(string)
["é"]

Although the example above is made of two characters, it is perceived by users as one.

Graphemes can also be two characters that are interpreted as one by some languages. For example, some languages may consider "ch" as a single character. However, since this information depends on the locale, it is not taken into account by this module.

In general, the functions in this module rely on the Unicode Standard, but do not contain any of the locale specific behaviour. More information about graphemes can be found in the Unicode Standard Annex #29.

For converting a binary to a different encoding and for Unicode normalization mechanisms, see Erlang's :unicode module.
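As a rough illustration of what the :unicode module offers, the sketch below converts a string to Latin-1 and back with :unicode.characters_to_binary/3, and composes a decomposed "é" with :unicode.characters_to_nfc_binary/1. The variable names are illustrative.

```elixir
# Convert a UTF-8 string to Latin-1: "á" (U+00E1) becomes the single byte 225.
latin1 = :unicode.characters_to_binary("ol\u00E1", :utf8, :latin1)

# Convert the Latin-1 bytes back to UTF-8.
utf8 = :unicode.characters_to_binary(latin1, :latin1, :utf8)

# Compose "e" followed by a combining acute accent into one code point (NFC).
composed = :unicode.characters_to_nfc_binary("e\u0301")
```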

String and binary operations

To act according to the Unicode Standard, many functions in this module run in linear time, as they need to traverse the whole string considering the proper Unicode code points.

For example, String.length/1 will take longer as the input grows. On the other hand, Kernel.byte_size/1 always runs in constant time (i.e. regardless of the input size).

This means often there are performance costs in using the functions in this module, compared to the more low-level operations that work directly with binaries:

Kernel.binary_part/3 - retrieves part of the binary
Kernel.bit_size/1 and Kernel.byte_size/1 - size related functions
Kernel.is_bitstring/1 and Kernel.is_binary/1 - type-check function
Plus a number of functions for working with binaries (bytes) in the :binary module

There are many situations where using the String module can be avoided in favor of binary functions or pattern matching. For example, imagine you have a string prefix and you want to remove this prefix from another string named full.

One may be tempted to write:

iex> take_prefix = fn full, prefix ->
...>   base = String.length(prefix)
...>   String.slice(full, base, String.length(full) - base)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

Although the function above works, it performs poorly. To calculate the length of the string, we need to traverse it fully, so we traverse both prefix and full strings, then slice the full one, traversing it again.

A first attempt at improving it could be with ranges:

iex> take_prefix = fn full, prefix ->
...>   base = String.length(prefix)
...>   String.slice(full, base..-1)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

While this is much better (we don't traverse full twice), it could still be improved. In this case, since we want to extract a substring from a string, we can use Kernel.byte_size/1 and Kernel.binary_part/3 as there is no chance we will slice in the middle of a code point made of more than one byte:

iex> take_prefix = fn full, prefix ->
...>   base = byte_size(prefix)
...>   binary_part(full, base, byte_size(full) - base)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

Or simply use pattern matching:

iex> take_prefix = fn full, prefix ->
...>   base = byte_size(prefix)
...>   <<_::binary-size(base), rest::binary>> = full
...>   rest
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

On the other hand, if you want to dynamically slice a string based on an integer value, then using String.slice/3 is the best option as it guarantees we won't incorrectly split a valid code point into multiple bytes.
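The sketch below shows what can go wrong when an arbitrary integer is used as a byte offset, and how String.slice/3 avoids it. The input string and variable names are illustrative.

```elixir
# "héllo": the "é" (U+00E9) occupies two bytes.
string = "h\u00E9llo"

# Slicing by bytes cuts "é" in half and leaves an invalid binary.
by_bytes = binary_part(string, 0, 2)

# Slicing by characters keeps "é" whole.
by_chars = String.slice(string, 0, 2)

bytes_valid = String.valid?(by_bytes)   # false
chars_valid = String.valid?(by_chars)   # true
```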

Integer code points

Although code points are represented as integers, this module represents code points in their encoded format as strings. For example:

iex> String.codepoints("olá")
["o", "l", "á"]

There are a couple of ways to retrieve the character code point. One may use the ? construct:

iex> ?o
111

iex> ?á
225

Or also via pattern matching:

iex> <<aacute::utf8>> = "á"
iex> aacute
225

As we have seen above, code points can be inserted into a string by their hexadecimal code:

iex> "ol\u00E1"
"olá"

Finally, to convert a String into a list of integer code points, known as "charlists" in Elixir, you can call String.to_charlist:

iex> String.to_charlist("olá")
[111, 108, 225]
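Going in the other direction is also possible. As a small sketch, List.to_string/1 turns a charlist back into a string, and the utf8 binary modifier encodes a single integer code point. The variable names are illustrative.

```elixir
# [111, 108, 225]
charlist = String.to_charlist("ol\u00E1")

# List.to_string/1 encodes the integer code points back into a UTF-8 binary.
string = List.to_string(charlist)

# A single integer code point can be encoded with the utf8 binary modifier.
aacute = <<225::utf8>>
```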

Self-synchronization

The UTF-8 encoding is self-synchronizing. This means that if malformed data (i.e., data that is not possible according to the definition of the encoding) is encountered, only one code point needs to be rejected.

This module relies on this behaviour to ignore such invalid characters. For example, length/1 will return a correct result even if an invalid code point is fed into it.

In other words, this module expects invalid data to be detected elsewhere, usually when retrieving data from an external source. For example, a driver that reads strings from a database is responsible for checking the validity of the encoding. String.chunk/2 can be used for breaking a string into valid and invalid parts.
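As a sketch of how such a check might look at the boundary of a system, the snippet below validates a hypothetical input with String.valid?/1 and uses String.chunk/2 to isolate the invalid byte. Dropping the invalid chunks is only one possible policy and is shown purely for illustration.

```elixir
# Hypothetical input: two valid characters around one invalid byte.
input = <<?a, 0xFF, ?b>>

valid = String.valid?(input)

# String.chunk/2 separates the valid runs from the invalid one.
chunks = String.chunk(input, :valid)

# One possible policy: keep only the chunks that are valid UTF-8.
cleaned = chunks |> Enum.filter(&String.valid?/1) |> Enum.join()
```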

Compile binary patterns

Many functions in this module work with patterns. For example, String.split/3 can split a string into multiple strings given a pattern. This pattern can be a string, a list of strings or a compiled pattern:

iex> String.split("foo bar", " ")
["foo", "bar"]

iex> String.split("foo bar!", [" ", "!"])
["foo", "bar", ""]

iex> pattern = :binary.compile_pattern([" ", "!"])
iex> String.split("foo bar!", pattern)
["foo", "bar", ""]

The compiled pattern is useful when the same match will be done over and over again. Note though that the compiled pattern cannot be stored in a module attribute as the pattern is generated at runtime and does not survive compile time.
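Since the pattern has to be built at runtime, one option is to compile it once inside a function and reuse it for every input. The module and function names below are hypothetical and exist only to illustrate the idea.

```elixir
defmodule TokenizerSketch do
  # Hypothetical module, for illustration only.
  # The pattern is compiled once per call, at runtime, and then reused
  # for every line instead of being rebuilt for each split.
  def split_lines(lines) do
    pattern = :binary.compile_pattern([" ", ","])
    Enum.map(lines, fn line -> String.split(line, pattern) end)
  end
end

result = TokenizerSketch.split_lines(["foo bar", "baz,qux"])
```

In this sketch result is [["foo", "bar"], ["baz", "qux"]].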

Summary

Types

codepoint()

A single Unicode code point encoded in UTF-8. It may be one or more bytes.

grapheme()

Multiple code points that may be perceived as a single character by readers

pattern()

Pattern used in functions like replace/4 and split/3

t()

A UTF-8 encoded binary.

Functions

at(string, position)

Returns the grapheme at the position of the given UTF-8 string. If position is greater than string length, then it returns nil.

bag_distance(string1, string2)

Computes the bag distance between two strings.

capitalize(string, mode \\ :default)

Converts the first character in the given string to uppercase and the remainder to lowercase according to mode.

chunk(string, trait)

Splits the string into chunks of characters that share a common trait.

codepoints(string)

Returns a list of code points encoded as strings.

contains?(string, contents)

Checks if string contains any of the given contents.

downcase(string, mode \\ :default)

Converts all characters in the given string to lowercase according to mode.

duplicate(subject, n)

Returns a string subject repeated n times.

ends_with?(string, suffix)

Returns true if string ends with any of the suffixes given.

equivalent?(string1, string2)

Returns true if string1 is canonically equivalent to string2.

first(string)

Returns the first grapheme from a UTF-8 string, nil if the string is empty.

graphemes(string)

Returns Unicode graphemes in the string as per Extended Grapheme Cluster algorithm.

jaro_distance(string1, string2)

Computes the Jaro distance (similarity) between two strings.

last(string)

Returns the last grapheme from a UTF-8 string, nil if the string is empty.

length(string)

Returns the number of Unicode graphemes in a UTF-8 string.

match?(string, regex)

Checks if string matches the given regular expression.

myers_difference(string1, string2)

Returns a keyword list that represents an edit script.

next_codepoint(string)

Returns the next code point in a string.

next_grapheme(binary)

Returns the next grapheme in a string.

next_grapheme_size(string)

Returns the size (in bytes) of the next grapheme.

normalize(string, form)

Converts all characters in string to Unicode normalization form identified by form.

pad_leading(string, count, padding \\ [" "])

Returns a new string padded with a leading filler which is made of elements from the padding.

pad_trailing(string, count, padding \\ [" "])

Returns a new string padded with a trailing filler which is made of elements from the padding.

printable?(string, character_limit \\ :infinity)

Checks if a string contains only printable characters up to character_limit.

replace(subject, pattern, replacement, options \\ [])

Returns a new string created by replacing occurrences of pattern in subject with replacement.

replace_leading(string, match, replacement)

Replaces all leading occurrences of match by replacement in string.

replace_prefix(string, match, replacement)

Replaces prefix in string by replacement if it matches match.

replace_suffix(string, match, replacement)

Replaces suffix in string by replacement if it matches match.

replace_trailing(string, match, replacement)

Replaces all trailing occurrences of match by replacement in string.

reverse(string)

Reverses the graphemes in given string.

slice(string, range)

Returns a substring from the offset given by the start of the range to the offset given by the end of the range.

slice(string, start, length)

Returns a substring starting at the offset start, and of the given length.

split(binary)

Divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored. Groups of whitespace are treated as a single occurrence. Divisions do not occur on non-breaking whitespace.

split(string, pattern, options \\ [])

Divides a string into parts based on a pattern.

split_at(string, position)

Splits a string into two at the specified offset. When the offset given is negative, location is counted from the end of the string.

splitter(string, pattern, options \\ [])

Returns an enumerable that splits a string on demand.

starts_with?(string, prefix)

Returns true if string starts with any of the prefixes given.

to_atom(string)

Converts a string to an atom.

to_charlist(string)

Converts a string into a charlist.

to_existing_atom(string)

Converts a string to an existing atom.

to_float(string)

Returns a float whose text representation is string.

to_integer(string)

Returns an integer whose text representation is string.

to_integer(string, base)

Returns an integer whose text representation is string in base base.

trim(string)

Returns a string where all leading and trailing Unicode whitespaces have been removed.

trim(string, to_trim)

Returns a string where all leading and trailing to_trim characters have been removed.

trim_leading(string)

Returns a string where all leading Unicode whitespaces have been removed.

trim_leading(string, to_trim)

Returns a string where all leading to_trim characters have been removed.

trim_trailing(string)

Returns a string where all trailing Unicode whitespaces have been removed.

trim_trailing(string, to_trim)

Returns a string where all trailing to_trim characters have been removed.

upcase(string, mode \\ :default)

Converts all characters in the given string to uppercase according to mode.

valid?(arg1)

Checks whether string contains only valid characters.

Types

codepoint()

@type codepoint() :: t()

A single Unicode code point encoded in UTF-8. It may be one or more bytes.

grapheme()

@type grapheme() :: t()

Multiple code points that may be perceived as a single character by readers

pattern()

@type pattern() :: t() | [t()] | :binary.cp()

Pattern used in functions like replace/4 and split/3

t()

@type t() :: binary()

A UTF-8 encoded binary.

The types String.t() and binary() are equivalent for analysis tools. For those reading the documentation, however, String.t() implies a UTF-8 encoded binary.

Functions

at(string, position)

@spec at(t(), integer()) :: grapheme() | nil

Returns the grapheme at the position of the given UTF-8 string. If position is greater than string length, then it returns nil.

Examples

iex> String.at("elixir", 0)
"e"

iex> String.at("elixir", 1)
"l"

iex> String.at("elixir", 10)
nil

iex> String.at("elixir", -1)
"r"

iex> String.at("elixir", -10)
nil

bag_distance(string1, string2)

(since 1.8.0)

@spec bag_distance(t(), t()) :: float()

Computes the bag distance between two strings.

Returns a float value between 0 and 1 representing the bag distance between string1 and string2.

The bag distance is meant to be an efficient approximation of the distance between two strings to quickly rule out strings that are largely different.

The algorithm is outlined in the "String Matching with Metric Trees Using an Approximate Distance" paper by Ilaria Bartolini, Paolo Ciaccia, and Marco Patella.

Examples

iex> String.bag_distance("abc", "")
0.0
iex> String.bag_distance("abcd", "a")
0.25
iex> String.bag_distance("abcd", "ab")
0.5
iex> String.bag_distance("abcd", "abc")
0.75
iex> String.bag_distance("abcd", "abcd")
1.0

capitalize(string, mode \\ :default)

@spec capitalize(t(), :default | :ascii | :greek | :turkic) :: t()

Converts the first character in the given string to uppercase and the remainder to lowercase according to mode.

mode may be :default, :ascii, :greek or :turkic. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii capitalizes only the letters A to Z. :greek includes the context sensitive mappings found in Greek. :turkic properly handles the letter i with the dotless variant.
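To illustrate how the modes differ, the sketch below capitalizes a word that starts with a non-ASCII letter in both :default and :ascii mode. The input is illustrative.

```elixir
# "éCOLE", written with an escape so the code point is unambiguous.
word = "\u00E9COLE"

# :default applies the Unicode mappings, so "é" becomes "É" (U+00C9).
default = String.capitalize(word)

# :ascii only changes the letters a-z and A-Z, so "é" is left untouched.
ascii = String.capitalize(word, :ascii)
```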

Examples

iex> String.capitalize("abcd")
"Abcd"

iex> String.capitalize("ﬁn")
"Fin"

iex> String.capitalize("olá")
"Olá"

chunk(string, trait)

@spec chunk(t(), :valid | :printable) :: [t()]

Splits the string into chunks of characters that share a common trait.

The trait can be one of two options:

:valid - the string is split into chunks of valid and invalid character sequences
:printable - the string is split into chunks of printable and non-printable character sequences

Returns a list of binaries, each of which contains only one kind of character.

If the given string is empty, an empty list is returned.

Examples

iex> String.chunk(<<?a, ?b, ?c, 0>>, :valid)
["abc\0"]

iex> String.chunk(<<?a, ?b, ?c, 0, 0xFFFF::utf16>>, :valid)
["abc\0", <<0xFFFF::utf16>>]

iex> String.chunk(<<?a, ?b, ?c, 0, 0x0FFFF::utf8>>, :printable)
["abc", <<0, 0x0FFFF::utf8>>]

codepoints(string)

@spec codepoints(t()) :: [codepoint()]

Returns a list of code points encoded as strings.

To retrieve code points in their natural integer representation, see to_charlist/1. For details about code points and graphemes, see the String module documentation.

Examples

iex> String.codepoints("olá")
["o", "l", "á"]

iex> String.codepoints("оптими зации")
["о", "п", "т", "и", "м", "и", " ", "з", "а", "ц", "и", "и"]

iex> String.codepoints("ἅἪῼ")
["ἅ", "Ἢ", "ῼ"]

iex> String.codepoints("\u00e9")
["é"]

iex> String.codepoints("\u0065\u0301")
["e", "́"]

contains?(string, contents)

@spec contains?(t(), pattern()) :: boolean()

Checks if string contains any of the given contents.

contents can be either a string, a list of strings, or a compiled pattern.

Examples

iex> String.contains?("elixir of life", "of")
true
iex> String.contains?("elixir of life", ["life", "death"])
true
iex> String.contains?("elixir of life", ["death", "mercury"])
false

The argument can also be a compiled pattern:

iex> pattern = :binary.compile_pattern(["life", "death"])
iex> String.contains?("elixir of life", pattern)
true

An empty string will always match:

iex> String.contains?("elixir of life", "")
true
iex> String.contains?("elixir of life", ["", "other"])
true

Be aware that this function can match within or across grapheme boundaries. For example, take the grapheme "é" which is made of the characters "e" and the acute accent. The following returns true:

iex> String.contains?(String.normalize("é", :nfd), "e")
true

However, if "é" is represented by the single character "e with acute" accent, then it will return false:

iex> String.contains?(String.normalize("é", :nfc), "e")
false

downcase(string, mode \\ :default)

@spec downcase(t(), :default | :ascii | :greek | :turkic) :: t()

Converts all characters in the given string to lowercase according to mode.

mode may be :default, :ascii, :greek or :turkic. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii lowercases only the letters A to Z. :greek includes the context sensitive mappings found in Greek. :turkic properly handles the letter i with the dotless variant.

Examples

iex> String.downcase("ABCD")
"abcd"

iex> String.downcase("AB 123 XPTO")
"ab 123 xpto"

iex> String.downcase("OLÁ")
"olá"

The :ascii mode ignores Unicode characters and provides a more performant implementation when you know the string contains only ASCII characters:

iex> String.downcase("OLÁ", :ascii)
"olÁ"

The :greek mode properly handles the context sensitive sigma in Greek:

iex> String.downcase("ΣΣ")
"σσ"

iex> String.downcase("ΣΣ", :greek)
"σς"

And :turkic properly handles the letter i with the dotless variant:

iex> String.downcase("Iİ")
"ii̇"

iex> String.downcase("Iİ", :turkic)
"ıi"

duplicate(subject, n)

@spec duplicate(t(), non_neg_integer()) :: t()

Returns a string subject repeated n times.

Inlined by the compiler.

Examples

iex> String.duplicate("abc", 0)
""

iex> String.duplicate("abc", 1)
"abc"

iex> String.duplicate("abc", 2)
"abcabc"

ends_with?(string, suffix)

@spec ends_with?(t(), t() | [t()]) :: boolean()

Returns true if string ends with any of the suffixes given.

suffixes can be either a single suffix or a list of suffixes.

Examples

iex> String.ends_with?("language", "age")
true
iex> String.ends_with?("language", ["youth", "age"])
true
iex> String.ends_with?("language", ["youth", "elixir"])
false

An empty suffix will always match:

iex> String.ends_with?("language", "")
true
iex> String.ends_with?("language", ["", "other"])
true

equivalent?(string1, string2)

@spec equivalent?(t(), t()) :: boolean()

Returns true if string1 is canonically equivalent to string2.

It performs Normalization Form Canonical Decomposition (NFD) on the strings before comparing them. This function is equivalent to:

String.normalize(string1, :nfd) == String.normalize(string2, :nfd)

If you plan to compare multiple strings, multiple times in a row, you may normalize them upfront and compare them directly to avoid multiple normalization passes.

Examples

iex> String.equivalent?("abc", "abc")
true

iex> String.equivalent?("man\u0303ana", "mañana")
true

iex> String.equivalent?("abc", "ABC")
false

iex> String.equivalent?("nø", "nó")
false
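When the same strings are compared repeatedly, normalizing them upfront might look like the sketch below. The word list is hypothetical; the first two entries spell "mañana" with different code points.

```elixir
# Hypothetical data: decomposed "ñ", precomposed "ñ" (U+00F1), and plain "n".
words = ["man\u0303ana", "ma\u00F1ana", "manana"]

# Normalize once...
[decomposed, precomposed, plain] = Enum.map(words, &String.normalize(&1, :nfd))

# ...then compare with ==, as many times as needed.
same = decomposed == precomposed   # true
plain_matches = decomposed == plain   # false
```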

first(string)

@spec first(t()) :: grapheme() | nil

Returns the first grapheme from a UTF-8 string, nil if the string is empty.

Examples

iex> String.first("elixir")
"e"

iex> String.first("եոգլի")
"ե"

iex> String.first("")
nil

graphemes(string)

@spec graphemes(t()) :: [grapheme()]

Returns Unicode graphemes in the string as per Extended Grapheme Cluster algorithm.

The algorithm is outlined in the Unicode Standard Annex #29, Unicode Text Segmentation.

For details about code points and graphemes, see the String module documentation.

Examples

iex> String.graphemes("Ńaïve")
["Ń", "a", "ï", "v", "e"]

iex> String.graphemes("\u00e9")
["é"]

iex> String.graphemes("\u0065\u0301")
["é"]

jaro_distance(string1, string2)

@spec jaro_distance(t(), t()) :: float()

Computes the Jaro distance (similarity) between two strings.

Returns a float value between 0.0 (equates to no similarity) and 1.0 (is an exact match) representing Jaro distance between string1 and string2.

The Jaro distance metric is designed and best suited for short strings such as person names. Elixir itself uses this function to provide the "did you mean?" functionality. For instance, when you are calling a function in a module and you have a typo in the function name, we attempt to suggest the most similar function name available, if any, based on the jaro_distance/2 score.

Examples

iex> String.jaro_distance("Dwayne", "Duane")
0.8222222222222223
iex> String.jaro_distance("even", "odd")
0.0
iex> String.jaro_distance("same", "same")
1.0
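A "did you mean?" style suggestion can be sketched with Enum.max_by/2. This is not how Elixir itself implements the feature, only an illustration of the idea; the candidate list and the typo are hypothetical.

```elixir
# Hypothetical list of known names and a mistyped input.
known = ["length", "last", "first"]
typo = "lenght"

# Pick the candidate with the highest Jaro score.
suggestion = Enum.max_by(known, fn name -> String.jaro_distance(typo, name) end)
```

Here suggestion is "length". A real implementation would likely also apply a minimum score so that unrelated names are not suggested.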

last(string)

@spec last(t()) :: grapheme() | nil

Returns the last grapheme from a UTF-8 string, nil if the string is empty.

Examples

iex> String.last("elixir")
"r"

iex> String.last("եոգլի")
"ի"

length(string)

@spec length(t()) :: non_neg_integer()

Returns the number of Unicode graphemes in a UTF-8 string.

Examples

iex> String.length("elixir")
6

iex> String.length("եոգլի")
5

match?(string, regex)

@spec match?(t(), Regex.t()) :: boolean()

Checks if string matches the given regular expression.

Examples

iex> String.match?("foo", ~r/foo/)
true

iex> String.match?("bar", ~r/foo/)
false

myers_difference(string1, string2)

(since 1.3.0)

@spec myers_difference(t(), t()) :: [{:eq | :ins | :del, t()}]

Returns a keyword list that represents an edit script.

Check List.myers_difference/2 for more information.

Examples

iex> string1 = "fox hops over the dog"
iex> string2 = "fox jumps over the lazy cat"
iex> String.myers_difference(string1, string2)
[eq: "fox ", del: "ho", ins: "jum", eq: "ps over the ", del: "dog", ins: "lazy cat"]

next_codepoint(string)

@spec next_codepoint(t()) :: {codepoint(), t()} | nil

Returns the next code point in a string.

The result is a tuple with the code point and the remainder of the string or nil in case the string reached its end.

As with other functions in the String module, next_codepoint/1 works with binaries that are invalid UTF-8. If the string starts with a sequence of bytes that is not valid in UTF-8 encoding, the first element of the returned tuple is a binary with the first byte.

Examples

iex> String.next_codepoint("olá")
{"o", "lá"}

iex> invalid = "\x80\x80OK" # first two bytes are invalid in UTF-8
iex> {_, rest} = String.next_codepoint(invalid)
{<<128>>, <<128, 79, 75>>}
iex> String.next_codepoint(rest)
{<<128>>, "OK"}

Comparison with binary pattern matching

Binary pattern matching provides a similar way to decompose a string:

iex> <<codepoint::utf8, rest::binary>> = "Elixir"
"Elixir"
iex> codepoint
69
iex> rest
"lixir"

though not entirely equivalent because codepoint comes as an integer, and the pattern won't match invalid UTF-8.

Binary pattern matching, however, is simpler and more efficient, so pick the option that better suits your use case.
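If invalid UTF-8 has to be handled with pattern matching as well, an extra clause that matches a single byte can act as a fallback. The function and tuple shapes below are illustrative, not part of the String module.

```elixir
# Anonymous function with one clause per case; the names are illustrative.
decode_first = fn
  # A valid UTF-8 code point comes back as an integer.
  <<codepoint::utf8, rest::binary>> -> {:ok, codepoint, rest}
  # Anything else is consumed one byte at a time.
  <<byte, rest::binary>> -> {:invalid, byte, rest}
  <<>> -> :done
end

valid = decode_first.("Elixir")
invalid = decode_first.(<<0x80, ?O, ?K>>)
empty = decode_first.("")
```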

next_grapheme(binary)

@spec next_grapheme(t()) :: {grapheme(), t()} | nil

Returns the next grapheme in a string.

The result is a tuple with the grapheme and the remainder of the string or nil in case the string reached its end.

Examples

iex> String.next_grapheme("olá")
{"o", "lá"}

iex> String.next_grapheme("")
nil

next_grapheme_size(string)

@spec next_grapheme_size(t()) :: {pos_integer(), t()} | nil

Returns the size (in bytes) of the next grapheme.

The result is a tuple with the next grapheme size in bytes and the remainder of the string or nil in case the string reached its end.

Examples

iex> String.next_grapheme_size("olá")
{1, "lá"}

iex> String.next_grapheme_size("")
nil
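Because the size is given in bytes, it can be combined with binary_part/3. The sketch below uses a grapheme made of two code points to show that the returned size covers the whole cluster; the input is illustrative.

```elixir
# "é" written as "e" plus a combining acute accent, followed by "x".
string = "e\u0301x"

{size, rest} = String.next_grapheme_size(string)

# The size is in bytes, so binary_part/3 can extract the grapheme directly.
first = binary_part(string, 0, size)
```

In this sketch size is 3 (one byte for "e" and two for the combining accent) and rest is "x".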

normalize(string, form)

Converts all characters in string to Unicode normalization form identified by form.

Invalid Unicode code points are skipped and the remainder of the string is converted. If you want the algorithm to stop and return on an invalid code point, use :unicode.characters_to_nfd_binary/1, :unicode.characters_to_nfc_binary/1, :unicode.characters_to_nfkd_binary/1, and :unicode.characters_to_nfkc_binary/1 instead.

Normalization forms :nfkc and :nfkd should not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets.

Forms

The supported forms are:

:nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
:nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.
:nfkd - Normalization Form Compatibility Decomposition. Characters are decomposed by compatibility equivalence, and multiple combining characters are arranged in a specific order.
:nfkc - Normalization Form Compatibility Composition. Characters are decomposed and then recomposed by compatibility equivalence.

Examples

iex> String.normalize("yêṩ", :nfd)
"yêṩ"

iex> String.normalize("leña", :nfc)
"leña"

iex> String.normalize("ﬁ", :nfkd)
"fi"

iex> String.normalize("fi", :nfkc)
"fi"

pad_leading(string, count, padding \\ [" "])

@spec pad_leading(t(), non_neg_integer(), t() | [t()]) :: t()

Returns a new string padded with a leading filler which is made of elements from the padding.

Passing a list of strings as padding will take one element of the list for every missing entry. If the list is shorter than the number of inserts, the filling will start again from the beginning of the list. Passing a string padding is equivalent to passing the list of graphemes in it. If no padding is given, it defaults to whitespace.

When count is less than or equal to the length of string, the given string is returned.

Raises ArgumentError if the given padding contains a non-string element.
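A common use is producing fixed-width, zero-padded numbers. The sketch below is one way to do it; the list of integers is hypothetical.

```elixir
# Hypothetical use case: identifiers padded with zeros to at least 3 characters.
ids =
  Enum.map([7, 42, 1234], fn n ->
    n |> Integer.to_string() |> String.pad_leading(3, "0")
  end)
```

Here ids is ["007", "042", "1234"]; the last entry is already longer than count, so it is returned unchanged.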

Examples

iex> String.pad_leading("abc", 5)
"  abc"

iex> String.pad_leading("abc", 4, "12")
"1abc"

iex> String.pad_leading("abc", 6, "12")
"121abc"

iex> String.pad_leading("abc", 5, ["1", "23"])
"123abc"

pad_trailing(string, count, padding \\ [" "])

@spec pad_trailing(t(), non_neg_integer(), t() | [t()]) :: t()

Returns a new string padded with a trailing filler which is made of elements from the padding.

When count is less than or equal to the length of string, the given string is returned.

Raises ArgumentError if the given padding contains a non-string element.

Examples

iex> String.pad_trailing("abc", 5)
"abc  "

iex> String.pad_trailing("abc", 4, "12")
"abc1"

iex> String.pad_trailing("abc", 6, "12")
"abc121"

iex> String.pad_trailing("abc", 5, ["1", "23"])
"abc123"

printable?(string, character_limit \\ :infinity)

@spec printable?(t(), 0) :: true

@spec printable?(t(), pos_integer() | :infinity) :: boolean()

Checks if a string contains only printable characters up to character_limit.

Takes an optional character_limit as a second argument. If character_limit is 0, this function will return true.

Examples

iex> String.printable?("abc")
true

iex> String.printable?("abc" <> <<0>>)
false

iex> String.printable?("abc" <> <<0>>, 2)
true

iex> String.printable?("abc" <> <<0>>, 0)
true

replace(subject, pattern, replacement, options \\ [])

@spec replace(t(), pattern() | Regex.t(), t() | (t() -> t() | iodata()), keyword()) ::
  t()

Returns a new string created by replacing occurrences of pattern in subject with replacement.

The subject is always a string.

The pattern may be a string, a list of strings, a regular expression, or a compiled pattern.

The replacement may be a string or a function that receives the matched pattern and must return the replacement as a string or iodata.

By default it replaces all occurrences but this behaviour can be controlled through the :global option; see the "Options" section below.

Options

:global - (boolean) if true, all occurrences of pattern are replaced with replacement, otherwise only the first occurrence is replaced. Defaults to true

Examples

iex> String.replace("a,b,c", ",", "-")
"a-b-c"

iex> String.replace("a,b,c", ",", "-", global: false)
"a-b,c"

The pattern may also be a list of strings and the replacement may also be a function that receives the matches:

iex> String.replace("a,b,c", ["a", "c"], fn <<char>> -> <<char + 1>> end)
"b,b,d"

When the pattern is a regular expression, one can give \N or \g{N} in the replacement string to access a specific capture in the regular expression:

iex> String.replace("a,b,c", ~r/,(.)/, ",\\1\\g{1}")
"a,bb,cc"

Note that we had to escape the backslash escape character (i.e., we used \\N instead of just \N to escape the backslash; same thing for \\g{N}). By giving \0, one can inject the whole match in the replacement string.

A compiled pattern can also be given:

iex> pattern = :binary.compile_pattern(",")
iex> String.replace("a,b,c", pattern, "[]")
"a[]b[]c"

When an empty string is provided as a pattern, the function will treat it as an implicit empty string between each grapheme and the string will be interspersed. If an empty string is provided as replacement the subject will be returned:

iex> String.replace("ELIXIR", "", ".")
".E.L.I.X.I.R."

iex> String.replace("ELIXIR", "", "")
"ELIXIR"

replace_leading(string, match, replacement)

@spec replace_leading(t(), t(), t()) :: t()

Replaces all leading occurrences of match by replacement in string.

Returns the string untouched if there are no occurrences.

If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the beginning of string, and it's impossible to replace "multiple" occurrences of "".
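The sketch below contrasts this with replace_prefix/3, which accepts an empty match and simply prepends the replacement. The rescue clause and the :argument_error atom are only there to make the exception visible in the example.

```elixir
# replace_prefix/3 accepts an empty match and prepends the replacement.
prefixed = String.replace_prefix("world", "", "hello ")

# replace_leading/3 rejects an empty match with an ArgumentError.
result =
  try do
    String.replace_leading("world", "", "hello ")
  rescue
    ArgumentError -> :argument_error
  end
```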

Examples

iex> String.replace_leading("hello world", "hello ", "")
"world"
iex> String.replace_leading("hello hello world", "hello ", "")
"world"

iex> String.replace_leading("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_leading("hello hello world", "hello ", "ola ")
"ola ola world"

replace_prefix(string, match, replacement)

@spec replace_prefix(t(), t(), t()) :: t()

Replaces prefix in string by replacement if it matches match.

Returns the string untouched if there is no match. If match is an empty string (""), replacement is just prepended to string.
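
The difference from replace_leading/3 is easiest to see side by side. The sketch below uses a hypothetical relative path as input; the variable names are illustrative only:

```elixir
# Hypothetical input, chosen only to show the difference.
path = "../../lib/app.ex"

# replace_prefix/3 replaces at most one occurrence at the start.
once = String.replace_prefix(path, "../", "")
# once == "../lib/app.ex"

# replace_leading/3 keeps replacing while the string still starts with the match.
all = String.replace_leading(path, "../", "")
# all == "lib/app.ex"
```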

Examples

iex> String.replace_prefix("world", "hello ", "")
"world"
iex> String.replace_prefix("hello world", "hello ", "")
"world"
iex> String.replace_prefix("hello hello world", "hello ", "")
"hello world"

iex> String.replace_prefix("world", "hello ", "ola ")
"world"
iex> String.replace_prefix("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_prefix("hello hello world", "hello ", "ola ")
"ola hello world"

iex> String.replace_prefix("world", "", "hello ")
"hello world"

replace_suffix(string, match, replacement)

@spec replace_suffix(t(), t(), t()) :: t()

Replaces suffix in string by replacement if it matches match.

Returns the string untouched if there is no match. If match is an empty string (""), replacement is just appended to string.

Examples

iex> String.replace_suffix("hello", " world", "")
"hello"
iex> String.replace_suffix("hello world", " world", "")
"hello"
iex> String.replace_suffix("hello world world", " world", "")
"hello world"

iex> String.replace_suffix("hello", " world", " mundo")
"hello"
iex> String.replace_suffix("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_suffix("hello world world", " world", " mundo")
"hello world mundo"

iex> String.replace_suffix("hello", "", " world")
"hello world"

replace_trailing(string, match, replacement)

@spec replace_trailing(t(), t(), t()) :: t()

Replaces all trailing occurrences of match by replacement in string.

Returns the string untouched if there are no occurrences.

If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the end of string, and it's impossible to replace "multiple" occurrences of "".

Examples

iex> String.replace_trailing("hello world", " world", "")
"hello"
iex> String.replace_trailing("hello world world", " world", "")
"hello"

iex> String.replace_trailing("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_trailing("hello world world", " world", " mundo")
"hello mundo mundo"

reverse(string)

@spec reverse(t()) :: t()

Reverses the graphemes in the given string.

Examples

iex> String.reverse("abcd")
"dcba"

iex> String.reverse("hello world")
"dlrow olleh"

iex> String.reverse("hello ∂og")
"go∂ olleh"

Keep in mind reversing the same string twice does not necessarily yield the original string:

iex> "̀e"
"̀e"
iex> String.reverse("̀e")
"è"
iex> String.reverse(String.reverse("̀e"))
"è"

In the first example the accent is before the vowel, so it is considered two graphemes. However, when you reverse it once, you have the vowel followed by the accent, which becomes one grapheme. Reversing it again will keep it as one single grapheme.
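
The same effect can be observed by counting graphemes. This is a small sketch that writes the combining grave accent as the escape \u0300 so the ordering is unambiguous; the variable names are illustrative:

```elixir
# "\u0300" is the combining grave accent; here it precedes the vowel.
original = "\u0300e"
reversed = String.reverse(original)

# Two graphemes before reversing: the lone accent, then "e".
before_count = String.length(original)

# One grapheme after reversing: "e" followed by its accent.
after_count = String.length(reversed)

# Reversing again does not restore the original byte sequence.
round_trips? = String.reverse(reversed) == original
```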

slice(string, range)

@spec slice(t(), Range.t()) :: t()

Returns a substring from the offset given by the start of the range to the offset given by the end of the range.

If the start of the range is not a valid offset for the given string or if the range is in reverse order, returns "".

If the start or end of the range is negative, the whole string is traversed first in order to convert the negative indices into positive ones.

Remember this function works with Unicode graphemes and considers the slices to represent grapheme offsets. If you want to split on raw bytes, check Kernel.binary_part/3 instead.

Examples

iex> String.slice("elixir", 1..3)
"lix"

iex> String.slice("elixir", 1..10)
"lixir"

iex> String.slice("elixir", -4..-1)
"ixir"

iex> String.slice("elixir", -4..6)
"ixir"

For ranges where start > stop, you need to explicitly mark them as increasing:

iex> String.slice("elixir", 2..-1//1)
"ixir"

iex> String.slice("elixir", 1..-2//1)
"lixi"

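If the start of the range is out of bounds or the range is in reverse order, it returns an empty string; an end beyond the string is capped to the string length: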

iex> String.slice("elixir", 10..3)
""

iex> String.slice("elixir", -10..-7)
""

iex> String.slice("a", 0..1500)
"a"

iex> String.slice("a", 1..1500)
""

slice(string, start, length)

@spec slice(t(), integer(), non_neg_integer()) :: grapheme()

Returns a substring starting at the offset start, and of the given length.

If the offset is greater than or equal to the string length, then it returns "".

Remember this function works with Unicode graphemes and considers the slices to represent grapheme offsets. If you want to split on raw bytes, check Kernel.binary_part/3 instead.
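
To make the grapheme-versus-byte distinction concrete, here is a minimal sketch with a made-up word containing one two-byte character. It relies only on String.slice/3, Kernel.binary_part/3 and String.valid?/1:

```elixir
# "\u00e9" (e with acute accent) is one grapheme but takes two bytes in UTF-8.
word = "h\u00e9llo"

# Grapheme offsets: the first two graphemes.
by_grapheme = String.slice(word, 0, 2)

# Byte offsets: the first three bytes, "h" (1 byte) plus the accented letter (2 bytes).
by_byte = binary_part(word, 0, 3)

# Cutting in the middle of a multi-byte character yields an invalid string.
broken = binary_part(word, 0, 2)
broken_valid? = String.valid?(broken)
```

Both by_grapheme and by_byte hold the same two characters here, but only the grapheme-based call is guaranteed never to cut a character in half.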

Examples

iex> String.slice("elixir", 1, 3)
"lix"

iex> String.slice("elixir", 1, 10)
"lixir"

iex> String.slice("elixir", 10, 3)
""

iex> String.slice("elixir", -4, 4)
"ixir"

iex> String.slice("elixir", -10, 3)
""

iex> String.slice("a", 0, 1500)
"a"

iex> String.slice("a", 1, 1500)
""

iex> String.slice("a", 2, 1500)
""

split(binary)

@spec split(t()) :: [t()]

Divides a string into substrings at each Unicode whitespace occurrence, ignoring leading and trailing whitespace. Groups of whitespace are treated as a single occurrence. Divisions do not occur on non-breaking whitespace.

Examples

iex> String.split("foo bar")
["foo", "bar"]

iex> String.split("foo" <> <<194, 133>> <> "bar")
["foo", "bar"]

iex> String.split(" foo   bar ")
["foo", "bar"]

iex> String.split("no\u00a0break")
["no\u00a0break"]

split(string, pattern, options \\ [])

@spec split(t(), pattern() | Regex.t(), keyword()) :: [t()]

Divides a string into parts based on a pattern.

Returns a list of these parts.

The pattern may be a string, a list of strings, a regular expression, or a compiled pattern.

The string is split into as many parts as possible by default, but this can be controlled via the :parts option.

Empty strings are only removed from the result if the :trim option is set to true.

When the pattern used is a regular expression, the string is split using Regex.split/3.

Options

:parts (positive integer or :infinity) - the string is split into at most as many parts as this option specifies. If :infinity, the string will be split into all possible parts. Defaults to :infinity.
:trim (boolean) - if true, empty strings are removed from the resulting list.

This function also accepts all options accepted by Regex.split/3 if pattern is a regular expression.
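
As a short sketch of how the :parts and :trim options described above are typically used (the input strings are hypothetical), :parts: 2 is handy when only the first separator is significant, and :trim drops empty fields:

```elixir
# Hypothetical "key=value" line where the value itself contains "=".
line = "query=a=b"

# Split on the first "=" only; the rest of the string is kept intact.
[key, value] = String.split(line, "=", parts: 2)
# key == "query", value == "a=b"

# Consecutive separators produce empty strings unless :trim is set.
with_empty = String.split("a,,b", ",")
without_empty = String.split("a,,b", ",", trim: true)
```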

Examples

Splitting with a string pattern:

iex> String.split("a,b,c", ",")
["a", "b", "c"]

iex> String.split("a,b,c", ",", parts: 2)
["a", "b,c"]

iex> String.split(" a b c ", " ", trim: true)
["a", "b", "c"]

A list of patterns:

iex> String.split("1,2 3,4", [" ", ","])
["1", "2", "3", "4"]

A regular expression:

iex> String.split("a,b,c", ~r{,})
["a", "b", "c"]

iex> String.split("a,b,c", ~r{,}, parts: 2)
["a", "b,c"]

iex> String.split(" a b c ", ~r{\s}, trim: true)
["a", "b", "c"]

iex> String.split("abc", ~r{b}, include_captures: true)
["a", "b", "c"]

A compiled pattern:

iex> pattern = :binary.compile_pattern([" ", ","])
iex> String.split("1,2 3,4", pattern)
["1", "2", "3", "4"]

Splitting on empty string returns graphemes:

iex> String.split("abc", "")
["", "a", "b", "c", ""]

iex> String.split("abc", "", trim: true)
["a", "b", "c"]

iex> String.split("abc", "", parts: 1)
["abc"]

iex> String.split("abc", "", parts: 3)
["", "a", "bc"]

Be aware that this function can split within or across grapheme boundaries. For example, take the grapheme "é" which is made of the characters "e" and the acute accent. The following will split the string into two parts:

iex> String.split(String.normalize("é", :nfd), "e")
["", "́"]

However, if "é" is represented by the single character "e with acute accent", then the string is not split at all and is returned as a single part:

iex> String.split(String.normalize("é", :nfc), "e")
["é"]

split_at(string, position)

@spec split_at(t(), integer()) :: {t(), t()}

Splits a string into two at the specified offset. When the offset given is negative, location is counted from the end of the string.

The offset is capped to the length of the string. Returns a tuple with two elements.

Note: keep in mind this function splits on graphemes and therefore has to linearly traverse the string. If you want to split a string or a binary based on the number of bytes, use Kernel.binary_part/3 instead.

Examples

iex> String.split_at("sweetelixir", 5)
{"sweet", "elixir"}

iex> String.split_at("sweetelixir", -6)
{"sweet", "elixir"}

iex> String.split_at("abc", 0)
{"", "abc"}

iex> String.split_at("abc", 1000)
{"abc", ""}

iex> String.split_at("abc", -1000)
{"", "abc"}

splitter(string, pattern, options \\ [])

@spec splitter(t(), pattern(), keyword()) :: Enumerable.t()

Returns an enumerable that splits a string on demand.

This is in contrast to split/3 which splits the entire string upfront.

This function does not support regular expressions by design. When using regular expressions, it is often more efficient to have the regular expressions traverse the string at once than in parts, like this function does.

Options

:trim - when true, does not emit empty patterns

Examples

iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", [" ", ","]) |> Enum.take(4)
["1", "2", "3", "4"]

iex> String.splitter("abcd", "") |> Enum.take(10)
["", "a", "b", "c", "d", ""]

iex> String.splitter("abcd", "", trim: true) |> Enum.take(10)
["a", "b", "c", "d"]

A compiled pattern can also be given:

iex> pattern = :binary.compile_pattern([" ", ","])
iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", pattern) |> Enum.take(4)
["1", "2", "3", "4"]
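
To illustrate the on-demand behaviour, the following sketch (with made-up input) converts fields to integers lazily. It assumes only Stream.map/2 and Enum.take/2 from the standard library; the malformed last field is never reached, so no error is raised:

```elixir
# The last field is not a valid integer, but it is never split out or converted
# because Enum.take/2 stops the enumeration after two elements.
first_two =
  "10,20,oops"
  |> String.splitter(",")
  |> Stream.map(&String.to_integer/1)
  |> Enum.take(2)
# first_two == [10, 20]
```

With split/3 the whole string would be split first, and mapping String.to_integer/1 eagerly over the result would raise on the last field.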

starts_with?(string, prefix)

@spec starts_with?(t(), pattern()) :: boolean()

Returns true if string starts with any of the prefixes given.

prefix can be either a string, a list of strings, or a compiled pattern.

Examples

iex> String.starts_with?("elixir", "eli")
true
iex> String.starts_with?("elixir", ["erlang", "elixir"])
true
iex> String.starts_with?("elixir", ["erlang", "ruby"])
false

A compiled pattern can also be given:

iex> pattern = :binary.compile_pattern(["erlang", "elixir"])
iex> String.starts_with?("elixir", pattern)
true

An empty string will always match:

iex> String.starts_with?("elixir", "")
true
iex> String.starts_with?("elixir", ["", "other"])
true

to_atom(string)

@spec to_atom(t()) :: atom()

Converts a string to an atom.

Warning: this function creates atoms dynamically and atoms are not garbage-collected. Therefore, string should not be an untrusted value, such as input received from a socket or during a web request. Consider using to_existing_atom/1 instead.

By default, the maximum number of atoms is 1_048_576. This limit can be raised or lowered using the VM option +t.

The maximum atom size is 255 Unicode code points.

Inlined by the compiler.

Examples

iex> String.to_atom("my_atom")
:my_atom

to_charlist(string)

@spec to_charlist(t()) :: charlist()

Converts a string into a charlist.

Specifically, this function takes a UTF-8 encoded binary and returns a list of its integer code points. It is similar to codepoints/1 except that the latter returns a list of code points as strings.

In case you need to work with bytes, take a look at the :binary module.

Examples

iex> String.to_charlist("æß")
'æß'

to_existing_atom(string)

@spec to_existing_atom(t()) :: atom()

Converts a string to an existing atom.

The maximum atom size is 255 Unicode code points.

Inlined by the compiler.

Examples

iex> _ = :my_atom
iex> String.to_existing_atom("my_atom")
:my_atom
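
When the atom does not already exist, to_existing_atom/1 raises an ArgumentError. One possible way to handle untrusted input is sketched below; SafeAtom and fetch/1 are hypothetical names introduced for this example and are not part of the String module:

```elixir
defmodule SafeAtom do
  # Hypothetical helper, not part of the standard library.
  # Returns {:ok, atom} only when the atom already exists in the VM,
  # and :error otherwise, so no new atom is ever created.
  def fetch(string) when is_binary(string) do
    {:ok, String.to_existing_atom(string)}
  rescue
    ArgumentError -> :error
  end
end

known = SafeAtom.fetch("ok")
unknown = SafeAtom.fetch("no_such_atom_7c1e9d_for_this_sketch")
```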

to_float(string)

@spec to_float(t()) :: float()

Returns a float whose text representation is string.

string must be the string representation of a float including a decimal point; otherwise, an ArgumentError will be raised. To parse a string without a decimal point as a float, use Float.parse/1.

Inlined by the compiler.

Examples

iex> String.to_float("2.2017764e+0")
2.2017764

iex> String.to_float("3.0")
3.0

String.to_float("3")
** (ArgumentError) argument error
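
For comparison, here is a brief sketch of the Float.parse/1 alternative mentioned above. It returns a tuple with the parsed float and the remaining text, or :error, instead of raising; the inputs are made up for illustration:

```elixir
# No decimal point is required, and the unparsed remainder is returned.
{whole, rest_whole} = Float.parse("3")
{weight, unit} = Float.parse("3.5kg")

# Input that does not start with a number yields :error rather than an exception.
not_a_number = Float.parse("abc")
```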

to_integer(string)

@spec to_integer(t()) :: integer()

Returns an integer whose text representation is string.

string must be the string representation of an integer. Otherwise, an ArgumentError will be raised. If you want to parse a string that may contain an ill-formatted integer, use Integer.parse/1.

Inlined by the compiler.

Examples

iex> String.to_integer("123")
123

Passing a string that does not represent an integer leads to an error:

String.to_integer("invalid data")
** (ArgumentError) argument error
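
If raising is not desirable, Integer.parse/1 can be wrapped so that only fully numeric input is accepted. This is a sketch under that assumption; StrictInt and parse/1 are hypothetical names, not part of Elixir:

```elixir
defmodule StrictInt do
  # Hypothetical helper: accept the input only if the whole string is an integer.
  def parse(string) when is_binary(string) do
    case Integer.parse(string) do
      # An empty remainder means every character was consumed.
      {int, ""} -> {:ok, int}
      # Trailing text ({int, rest}) or no number at all (:error).
      _ -> :error
    end
  end
end

ok = StrictInt.parse("123")
trailing = StrictInt.parse("123abc")
invalid = StrictInt.parse("invalid data")
```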

to_integer(string, base)

@spec to_integer(t(), 2..36) :: integer()

Returns an integer whose text representation is string in base base.

Inlined by the compiler.

Examples

iex> String.to_integer("3FF", 16)
1023
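
A few more conversions, sketched with arbitrary inputs, show the range of digits available in different bases:

```elixir
# Base 2 uses only the digits 0 and 1.
binary = String.to_integer("1010", 2)

# Base 36 uses 0-9 followed by a-z, so "z" is worth 35.
base36 = String.to_integer("zz", 36)

# A leading sign is accepted, and letter digits may be lowercase.
negative = String.to_integer("-ff", 16)
```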

trim(string)

@spec trim(t()) :: t()

Returns a string where all leading and trailing Unicode whitespaces have been removed.

Examples

iex> String.trim("\n  abc\n  ")
"abc"

trim(string, to_trim)

@spec trim(t(), t()) :: t()

Returns a string where all leading and trailing to_trim characters have been removed.

Examples

iex> String.trim("a  abc  a", "a")
"  abc  "

trim_leading(string)

@spec trim_leading(t()) :: t()

Returns a string where all leading Unicode whitespaces have been removed.

Examples

iex> String.trim_leading("\n  abc   ")
"abc   "

trim_leading(string, to_trim)

@spec trim_leading(t(), t()) :: t()

Returns a string where all leading to_trim characters have been removed.

Examples

iex> String.trim_leading("__ abc _", "_")
" abc _"

iex> String.trim_leading("1 abc", "11")
"1 abc"

trim_trailing(string)

@spec trim_trailing(t()) :: t()

Returns a string where all trailing Unicode whitespaces have been removed.

Examples

iex> String.trim_trailing("   abc\n  ")
"   abc"

trim_trailing(string, to_trim)

@spec trim_trailing(t(), t()) :: t()

Returns a string where all trailing to_trim characters have been removed.

Examples

iex> String.trim_trailing("_ abc __", "_")
"_ abc "

iex> String.trim_trailing("abc 1", "11")
"abc 1"

upcase(string, mode \\ :default)

@spec upcase(t(), :default | :ascii | :greek | :turkic) :: t()

Converts all characters in the given string to uppercase according to mode.

mode may be :default, :ascii, :greek or :turkic. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii uppercases only the letters a to z. :greek includes the context sensitive mappings found in Greek. :turkic properly handles the letter i with the dotless variant.

Examples

iex> String.upcase("abcd")
"ABCD"

iex> String.upcase("ab 123 xpto")
"AB 123 XPTO"

iex> String.upcase("olá")
"OLÁ"

The :ascii mode ignores Unicode characters and provides a more performant implementation when you know the string contains only ASCII characters:

iex> String.upcase("olá", :ascii)
"OLá"

And :turkic properly handles the letter i with the dotless variant:

iex> String.upcase("ıi")
"II"

iex> String.upcase("ıi", :turkic)
"Iİ"

valid?(arg1)

@spec valid?(t()) :: boolean()

Checks whether string contains only valid characters.

Examples

iex> String.valid?("a")
true

iex> String.valid?("ø")
true

iex> String.valid?(<<0xFFFF::16>>)
false

iex> String.valid?(<<0xEF, 0xB7, 0x90>>)
true

iex> String.valid?("asd" <> <<0xFFFF::16>>)
false
