Elixir v1.4.2 String

A String in Elixir is a UTF-8 encoded binary.

Codepoints and grapheme cluster

The functions in this module act according to the Unicode Standard, version 9.0.0.

As per the standard, a codepoint is a single Unicode Character, which may be represented by one or more bytes.

For example, the codepoint “é” is two bytes:

iex> byte_size("é")
2

However, this module returns the proper length:

iex> String.length("é")
1

Furthermore, this module also presents the concept of grapheme clusters (from now on referred to as graphemes). Graphemes can consist of multiple codepoints that may be perceived as a single character by readers. For example, “é” can be represented either as a single “e with acute” codepoint or as the letter “e” followed by a “combining acute accent” (two codepoints):

iex> string = "\u0065\u0301"
iex> byte_size(string)
3
iex> String.length(string)
1
iex> String.codepoints(string)
["e", "́"]
iex> String.graphemes(string)
["é"]

Although the example above is made of two codepoints, it is perceived by users as one character.

Graphemes can also be two characters that are interpreted as one by some languages. For example, some languages may consider “ch” as a single character. However, since this information depends on the locale, it is not taken into account by this module.

In general, the functions in this module rely on the Unicode Standard, but do not contain any locale-specific behaviour.

More information about graphemes can be found in the Unicode Standard Annex #29. The current Elixir version implements the Extended Grapheme Cluster algorithm.

String and binary operations

To act according to the Unicode Standard, many functions in this module run in linear time, as they need to traverse the whole string considering the proper Unicode codepoints.

For example, String.length/1 will take longer as the input grows. On the other hand, Kernel.byte_size/1 always runs in constant time (i.e. regardless of the input size).

This means there are often performance costs in using the functions in this module compared to the lower-level operations that work directly with binaries.

There are many situations where using the String module can be avoided in favor of binary functions or pattern matching. For example, imagine you have a string prefix and you want to remove this prefix from another string named full.

One may be tempted to write:

iex> take_prefix = fn full, prefix ->
...>   base = String.length(prefix)
...>   String.slice(full, base, String.length(full) - base)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

Although the function above works, it performs poorly. To calculate the length of the string, we need to traverse it fully, so we traverse both prefix and full strings, then slice the full one, traversing it again.

A first attempt at improving it could be with ranges:

iex> take_prefix = fn full, prefix ->
...>   base = String.length(prefix)
...>   String.slice(full, base..-1)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

While this is much better (we don’t traverse full twice), it could still be improved. In this case, since we want to extract a substring from a string, we can use Kernel.byte_size/1 and Kernel.binary_part/3 as there is no chance we will slice in the middle of a codepoint made of more than one byte:

iex> take_prefix = fn full, prefix ->
...>   base = byte_size(prefix)
...>   binary_part(full, base, byte_size(full) - base)
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

Or simply use pattern matching:

iex> take_prefix = fn full, prefix ->
...>   base = byte_size(prefix)
...>   <<_::binary-size(base), rest::binary>> = full
...>   rest
...> end
iex> take_prefix.("Mr. John", "Mr. ")
"John"

On the other hand, if you want to dynamically slice a string based on an integer value, then using String.slice/3 is the best option as it guarantees we won’t incorrectly split a valid codepoint into multiple bytes.
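
Returning to the prefix example: the byte-based versions above assume that full really starts with prefix. A minimal sketch of a variant that checks this while matching is shown below. PrefixSketch and strip_prefix/2 are hypothetical names used only for illustration; they are not part of the String module.

```elixir
defmodule PrefixSketch do
  # Hypothetical helper for illustration; not part of the String module.
  # Returns {:ok, rest} when `full` starts with `prefix`, :error otherwise.
  def strip_prefix(full, prefix) when is_binary(full) and is_binary(prefix) do
    size = byte_size(prefix)

    case full do
      # Take the first `size` bytes and compare them with the prefix.
      <<head::binary-size(size), rest::binary>> when head == prefix -> {:ok, rest}
      # Either `full` is shorter than `prefix` or the bytes differ.
      _ -> :error
    end
  end
end

matched = PrefixSketch.strip_prefix("Mr. John", "Mr. ")
unmatched = PrefixSketch.strip_prefix("Dr. John", "Mr. ")
```

Here matched is {:ok, "John"} and unmatched is :error. Because the comparison is done on whole bytes of a valid prefix, it cannot cut a multi-byte codepoint in half.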

Integer codepoints

Although codepoints could be represented as integers, this module represents all codepoints as strings. For example:

iex> String.codepoints("olá")
["o", "l", "á"]

There are a couple of ways to retrieve a character's integer codepoint. One may use the ? construct:

iex> ?o
111

iex> ?á
225

Or also via pattern matching:

iex> <<aacute::utf8>> = "á"
iex> aacute
225
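
Going the other way, an integer codepoint can be encoded back into a string with the same utf8 modifier. The snippet below is a small sketch of a round trip; the variable names are illustrative.

```elixir
# Integer codepoints of "olá" (the á is written as an escape to avoid ambiguity).
integers = String.to_charlist("ol\u00e1")

# Encode each integer back into UTF-8 and collect the pieces into one binary.
rebuilt = for codepoint <- integers, into: "", do: <<codepoint::utf8>>
```

With this input, integers is [111, 108, 225] and rebuilt equals the original string.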

As we have seen above, codepoints can be inserted into a string by their hexadecimal code:

"ol\u0061\u0301" #=>
"olá"

Self-synchronization

The UTF-8 encoding is self-synchronizing. This means that if malformed data (i.e., data that is not possible according to the definition of the encoding) is encountered, only one codepoint needs to be rejected.

This module relies on this behaviour to ignore such invalid characters. For example, length/1 will return a correct result even if an invalid codepoint is fed into it.

In other words, this module expects invalid data to be detected elsewhere, usually when retrieving data from an external source. For example, a driver that reads strings from a database is responsible for checking the validity of the encoding. String.chunk/2 can be used for breaking a string into valid and invalid parts.
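
As a rough sketch of such a boundary check, the code below validates a binary once and, on failure, reports the valid and invalid runs. BoundarySketch and accept/1 are hypothetical names chosen for this example.

```elixir
defmodule BoundarySketch do
  # Hypothetical boundary check; module and function names are illustrative.
  def accept(binary) when is_binary(binary) do
    if String.valid?(binary) do
      {:ok, binary}
    else
      # Report the valid and invalid runs so the caller can decide what to do.
      {:error, String.chunk(binary, :valid)}
    end
  end
end

good = BoundarySketch.accept("abc")
bad = BoundarySketch.accept("abc" <> <<0xFF>> <> "def")
```

Here good is {:ok, "abc"}, while bad is {:error, ["abc", <<255>>, "def"]} because the byte 0xFF can never appear in valid UTF-8.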

Patterns

Many functions in this module work with patterns. For example, String.split/2 can split a string into multiple strings given a pattern. This pattern can be a string, a list of strings or a compiled pattern:

iex> String.split("foo bar", " ")
["foo", "bar"]

iex> String.split("foo bar!", [" ", "!"])
["foo", "bar", ""]

iex> pattern = :binary.compile_pattern([" ", "!"])
iex> String.split("foo bar!", pattern)
["foo", "bar", ""]

The compiled pattern is useful when the same match will be done over and over again. Note though that the compiled pattern cannot be stored in a module attribute, as the pattern is generated at runtime and does not survive compile time.
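
One way to benefit from this is to compile the pattern once at runtime and pass it to every call. A minimal sketch, with made-up input data:

```elixir
# Compile once at runtime, then reuse the result for every line.
pattern = :binary.compile_pattern([" ", ","])

rows =
  ["1,2 3", "a b,c"]
  |> Enum.map(&String.split(&1, pattern))
```

Each element of rows is the list of fields for one input line, so rows is [["1", "2", "3"], ["a", "b", "c"]].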

Summary

Functions

at(string, position)
Returns the grapheme at the position of the given UTF-8 string. If position is greater than string length, then it returns nil

capitalize(string)
Converts the first character in the given string to uppercase and the remainder to lowercase

chunk(string, trait)
Splits the string into chunks of characters that share a common trait

codepoints(string)
Returns all codepoints in the string

contains?(string, contents)
Checks if string contains any of the given contents

downcase(binary)
Converts all characters in the given string to lowercase

duplicate(subject, n)
Returns a string subject duplicated n times

ends_with?(string, suffixes)
Returns true if string ends with any of the suffixes given

equivalent?(string1, string2)
Returns true if string1 is canonically equivalent to string2

first(string)
Returns the first grapheme from a UTF-8 string, nil if the string is empty

graphemes(string)
Returns Unicode graphemes in the string as per Extended Grapheme Cluster algorithm

jaro_distance(string1, string2)
Returns a float value between 0 (equates to no similarity) and 1 (is an exact match) representing Jaro distance between string1 and string2

last(string)
Returns the last grapheme from a UTF-8 string, nil if the string is empty

length(string)
Returns the number of Unicode graphemes in a UTF-8 string

match?(string, regex)
Checks if string matches the given regular expression

myers_difference(string1, string2)
Returns a keyword list that represents an edit script

next_codepoint(string)
Returns the next codepoint in a string

next_grapheme(binary)
Returns the next grapheme in a string

next_grapheme_size(string)
Returns the size of the next grapheme

normalize(string, form)
Converts all characters in string to Unicode normalization form identified by form

pad_leading(string, count, padding \\ [" "])
Returns a new string padded with a leading filler which is made of elements from the padding

pad_trailing(string, count, padding \\ [" "])
Returns a new string padded with a trailing filler which is made of elements from the padding

printable?(string)
Checks if a string contains only printable characters

replace(subject, pattern, replacement, options \\ [])
Returns a new string created by replacing occurrences of pattern in subject with replacement

replace_leading(string, match, replacement)
Replaces all leading occurrences of match by replacement in string

replace_prefix(string, match, replacement)
Replaces prefix in string by replacement if it matches match

replace_suffix(string, match, replacement)
Replaces suffix in string by replacement if it matches match

replace_trailing(string, match, replacement)
Replaces all trailing occurrences of match by replacement in string

reverse(string)
Reverses the graphemes in given string

slice(string, range)
Returns a substring from the offset given by the start of the range to the offset given by the end of the range

slice(string, start, len)
Returns a substring starting at the offset start, and of length len

split(binary)
Divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored. Groups of whitespace are treated as a single occurrence. Divisions do not occur on non-breaking whitespace

split(string, pattern, options \\ [])
Divides a string into substrings based on a pattern

split_at(string, position)
Splits a string into two at the specified offset. When the offset given is negative, location is counted from the end of the string

splitter(string, pattern, options \\ [])
Returns an enumerable that splits a string on demand

starts_with?(string, prefix)
Returns true if string starts with any of the prefixes given

to_atom(string)
Converts a string to an atom

to_charlist(string)
Converts a string into a charlist

to_existing_atom(string)
Converts a string to an existing atom

to_float(string)
Returns a float whose text representation is string

to_integer(string)
Returns an integer whose text representation is string

to_integer(string, base)
Returns an integer whose text representation is string in base base

trim(string)
Returns a string where all leading and trailing Unicode whitespaces have been removed

trim(string, to_trim)
Returns a string where all leading and trailing to_trims have been removed

trim_leading(string)
Returns a string where all leading Unicode whitespaces have been removed

trim_leading(string, to_trim)
Returns a string where all leading to_trims have been removed

trim_trailing(string)
Returns a string where all trailing Unicode whitespaces have been removed

trim_trailing(string, to_trim)
Returns a string where all trailing to_trims have been removed

upcase(binary)
Converts all characters in the given string to uppercase

valid?(string)
Checks whether string contains only valid characters

Types

codepoint() :: t
grapheme() :: t
pattern() :: t | [t] | :binary.cp

Functions

at(string, position)
at(t, integer) :: grapheme | nil

Returns the grapheme at the position of the given UTF-8 string. If position is greater than string length, then it returns nil.

Examples

iex> String.at("elixir", 0)
"e"

iex> String.at("elixir", 1)
"l"

iex> String.at("elixir", 10)
nil

iex> String.at("elixir", -1)
"r"

iex> String.at("elixir", -10)
nil

capitalize(string)
capitalize(t) :: t

Converts the first character in the given string to uppercase and the remainder to lowercase.

This relies on the titlecase information provided by the Unicode Standard. Note this function makes no attempt to capitalize all words in the string (usually known as titlecase).
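
If capitalizing every word is what is needed, one possible approach is to split on whitespace, capitalize each piece, and join the pieces again. This is only a sketch: it collapses runs of whitespace into single spaces and applies no locale-specific rules.

```elixir
# Capitalize each whitespace-separated word, then join with single spaces.
titled =
  "the elixir of life"
  |> String.split()
  |> Enum.map(&String.capitalize/1)
  |> Enum.join(" ")
```

For this input, titled is "The Elixir Of Life".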

Examples

iex> String.capitalize("abcd")
"Abcd"

iex> String.capitalize("fin")
"Fin"

iex> String.capitalize("olá")
"Olá"

chunk(string, trait)
chunk(t, :valid | :printable) :: [t]

Splits the string into chunks of characters that share a common trait.

The trait can be one of two options:

  • :valid - the string is split into chunks of valid and invalid character sequences

  • :printable - the string is split into chunks of printable and non-printable character sequences

Returns a list of binaries each of which contains only one kind of characters.

If the given string is empty, an empty list is returned.

Examples

iex> String.chunk(<<?a, ?b, ?c, 0>>, :valid)
["abc\0"]

iex> String.chunk(<<?a, ?b, ?c, 0, 0x0FFFF::utf8>>, :valid)
["abc\0", <<0x0FFFF::utf8>>]

iex> String.chunk(<<?a, ?b, ?c, 0, 0x0FFFF::utf8>>, :printable)
["abc", <<0, 0x0FFFF::utf8>>]

codepoints(string)
codepoints(t) :: [codepoint]

Returns all codepoints in the string.

For details about codepoints and graphemes, see the String module documentation.

Examples

iex> String.codepoints("olá")
["o", "l", "á"]

iex> String.codepoints("оптими зации")
["о", "п", "т", "и", "м", "и", " ", "з", "а", "ц", "и", "и"]

iex> String.codepoints("ἅἪῼ")
["ἅ", "Ἢ", "ῼ"]

iex> String.codepoints("\u00e9")
["é"]

iex> String.codepoints("\u0065\u0301")
["e", "́"]

contains?(string, contents)
contains?(t, pattern) :: boolean

Checks if string contains any of the given contents.

contents can be either a single string or a list of strings.

Examples

iex> String.contains? "elixir of life", "of"
true
iex> String.contains? "elixir of life", ["life", "death"]
true
iex> String.contains? "elixir of life", ["death", "mercury"]
false

An empty string will always match:

iex> String.contains? "elixir of life", ""
true
iex> String.contains? "elixir of life", ["", "other"]
true

The argument can also be a precompiled pattern:

iex> pattern = :binary.compile_pattern(["life", "death"])
iex> String.contains? "elixir of life", pattern
true

downcase(binary)
downcase(t) :: t

Converts all characters in the given string to lowercase.

Examples

iex> String.downcase("ABCD")
"abcd"

iex> String.downcase("AB 123 XPTO")
"ab 123 xpto"

iex> String.downcase("OLÁ")
"olá"

duplicate(subject, n)
duplicate(t, non_neg_integer) :: t

Returns a string subject duplicated n times.

Examples

iex> String.duplicate("abc", 0)
""

iex> String.duplicate("abc", 1)
"abc"

iex> String.duplicate("abc", 2)
"abcabc"

ends_with?(string, suffixes)
ends_with?(t, t | [t]) :: boolean

Returns true if string ends with any of the suffixes given.

suffixes can be either a single suffix or a list of suffixes.

Examples

iex> String.ends_with? "language", "age"
true
iex> String.ends_with? "language", ["youth", "age"]
true
iex> String.ends_with? "language", ["youth", "elixir"]
false

An empty suffix will always match:

iex> String.ends_with? "language", ""
true
iex> String.ends_with? "language", ["", "other"]
true

equivalent?(string1, string2)
equivalent?(t, t) :: boolean

Returns true if string1 is canonically equivalent to string2.

It performs Normalization Form Canonical Decomposition (NFD) on the strings before comparing them. This function is equivalent to:

String.normalize(string1, :nfd) == String.normalize(string2, :nfd)

Therefore, if you plan to compare multiple strings, multiple times in a row, you may normalize them upfront and compare them directly to avoid multiple normalization passes.
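
The following sketch illustrates that advice: the target and every candidate are normalized once, after which plain equality is enough. The candidate list is made up for the example, and escapes are used so the composed and decomposed forms are unambiguous.

```elixir
# Decomposed ñ, composed ñ, and a plain n.
candidates = ["man\u0303ana", "ma\u00f1ana", "manana"]

# Normalize the target once instead of on every comparison.
target = String.normalize("ma\u00f1ana", :nfd)

matches =
  candidates
  |> Enum.map(&String.normalize(&1, :nfd))
  |> Enum.map(&(&1 == target))
```

Here matches is [true, true, false]: both spellings with ñ are canonically equivalent, while the plain "manana" is not.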

Examples

iex> String.equivalent?("abc", "abc")
true

iex> String.equivalent?("man\u0303ana", "mañana")
true

iex> String.equivalent?("abc", "ABC")
false

iex> String.equivalent?("nø", "nó")
false

first(string)
first(t) :: grapheme | nil

Returns the first grapheme from a UTF-8 string, nil if the string is empty.

Examples

iex> String.first("elixir")
"e"

iex> String.first("եոգլի")
"ե"

graphemes(string)
graphemes(t) :: [grapheme]

Returns Unicode graphemes in the string as per Extended Grapheme Cluster algorithm.

The algorithm is outlined in the Unicode Standard Annex #29, Unicode Text Segmentation.

For details about codepoints and graphemes, see the String module documentation.

Examples

iex> String.graphemes("Ńaïve")
["Ń", "a", "ï", "v", "e"]

iex> String.graphemes("\u00e9")
["é"]

iex> String.graphemes("\u0065\u0301")
["é"]

jaro_distance(string1, string2)
jaro_distance(t, t) :: float

Returns a float value between 0 (equates to no similarity) and 1 (is an exact match) representing Jaro distance between string1 and string2.

The Jaro distance metric is designed and best suited for short strings such as person names.

Examples

iex> String.jaro_distance("dwayne", "duane")
0.8222222222222223
iex> String.jaro_distance("even", "odd")
0.0

last(string)
last(t) :: grapheme | nil

Returns the last grapheme from a UTF-8 string, nil if the string is empty.

Examples

iex> String.last("elixir")
"r"

iex> String.last("եոգլի")
"ի"

length(string)
length(t) :: non_neg_integer

Returns the number of Unicode graphemes in a UTF-8 string.

Examples

iex> String.length("elixir")
6

iex> String.length("եոգլի")
5

match?(string, regex)
match?(t, Regex.t) :: boolean

Checks if string matches the given regular expression.

Examples

iex> String.match?("foo", ~r/foo/)
true

iex> String.match?("bar", ~r/foo/)
false

myers_difference(string1, string2)
myers_difference(t, t) :: [{:eq | :ins | :del, t}] | nil

Returns a keyword list that represents an edit script.

Check List.myers_difference/2 for more information.

Examples

iex> string1 = "fox hops over the dog"
iex> string2 = "fox jumps over the lazy cat"
iex> String.myers_difference(string1, string2)
[eq: "fox ", del: "ho", ins: "jum", eq: "ps over the ", del: "dog", ins: "lazy cat"]

next_codepoint(string)
next_codepoint(t) :: {codepoint, t} | nil

Returns the next codepoint in a string.

The result is a tuple with the codepoint and the remainder of the string or nil in case the string reached its end.

As with other functions in the String module, this function does not check for the validity of the codepoint. That said, if an invalid codepoint is found, it will be returned by this function.

Examples

iex> String.next_codepoint("olá")
{"o", "lá"}

next_grapheme(binary)
next_grapheme(t) :: {grapheme, t} | nil

Returns the next grapheme in a string.

The result is a tuple with the grapheme and the remainder of the string or nil in case the string reached its end.
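
Because the remainder is returned alongside the grapheme, the function lends itself to recursive traversal. The sketch below counts graphemes this way; WalkSketch and count/2 are hypothetical names, and in practice String.length/1 already does this job.

```elixir
defmodule WalkSketch do
  # Hypothetical helper: counts graphemes by repeatedly calling next_grapheme/1.
  def count(string, acc \\ 0) when is_binary(string) do
    case String.next_grapheme(string) do
      # One more grapheme consumed; continue with the remainder.
      {_grapheme, rest} -> count(rest, acc + 1)
      # End of the string reached.
      nil -> acc
    end
  end
end

plain = WalkSketch.count("ol\u00e1")
combined = WalkSketch.count("e\u0301")
```

Here plain is 3 and combined is 1, since the letter and its combining accent form a single grapheme.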

Examples

iex> String.next_grapheme("olá")
{"o", "lá"}

next_grapheme_size(string)
next_grapheme_size(t) :: {pos_integer, t} | nil

Returns the size of the next grapheme.

The result is a tuple with the next grapheme size and the remainder of the string or nil in case the string reached its end.

Examples

iex> String.next_grapheme_size("olá")
{1, "lá"}

normalize(string, form)
normalize(t, atom) :: t

Converts all characters in string to Unicode normalization form identified by form.

Forms

The supported forms are:

  • :nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.

  • :nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.
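
Since composed and decomposed strings usually look identical when printed, the difference between the two forms is easier to see in the bytes. A small sketch using escapes for the input:

```elixir
composed = "\u00e9"      # single codepoint: e with acute
decomposed = "e\u0301"   # letter e followed by a combining acute accent

# :nfd splits the composed character, :nfc joins the decomposed pair.
nfd = String.normalize(composed, :nfd)
nfc = String.normalize(decomposed, :nfc)

sizes = {byte_size(nfd), byte_size(nfc)}
```

Here nfd equals decomposed, nfc equals composed, and sizes is {3, 2}.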

Examples

iex> String.normalize("yêṩ", :nfd)
"yêṩ"

iex> String.normalize("leña", :nfc)
"leña"

pad_leading(string, count, padding \\ [" "])
pad_leading(t, non_neg_integer, t | [t]) :: t

Returns a new string padded with a leading filler which is made of elements from the padding.

Passing a list of strings as padding will take one element of the list for every missing entry. If the list is shorter than the number of inserts, the filling will start again from the beginning of the list. Passing a string padding is equivalent to passing the list of graphemes in it. If no padding is given, it defaults to whitespace.

When count is less than or equal to the length of string, the given string is returned unchanged.

Raises ArgumentError if the given padding contains a non-string element.

Examples

iex> String.pad_leading("abc", 5)
"  abc"

iex> String.pad_leading("abc", 4, "12")
"1abc"

iex> String.pad_leading("abc", 6, "12")
"121abc"

iex> String.pad_leading("abc", 5, ["1", "23"])
"123abc"

pad_trailing(string, count, padding \\ [" "])
pad_trailing(t, non_neg_integer, t | [t]) :: t

Returns a new string padded with a trailing filler which is made of elements from the padding.

Passing a list of strings as padding will take one element of the list for every missing entry. If the list is shorter than the number of inserts, the filling will start again from the beginning of the list. Passing a string padding is equivalent to passing the list of graphemes in it. If no padding is given, it defaults to whitespace.

When count is less than or equal to the length of string, the given string is returned unchanged.

Raises ArgumentError if the given padding contains a non-string element.

Examples

iex> String.pad_trailing("abc", 5)
"abc  "

iex> String.pad_trailing("abc", 4, "12")
"abc1"

iex> String.pad_trailing("abc", 6, "12")
"abc121"

iex> String.pad_trailing("abc", 5, ["1", "23"])
"abc123"

printable?(string)
printable?(t) :: boolean

Checks if a string contains only printable characters.

Examples

iex> String.printable?("abc")
true

replace(subject, pattern, replacement, options \\ [])
replace(t, pattern | Regex.t, t, Keyword.t) :: t

Returns a new string created by replacing occurrences of pattern in subject with replacement.

By default, it replaces all occurrences, unless the global option is set to false, in which case it replaces only the first one.

The pattern may be a string or a regular expression.

Examples

iex> String.replace("a,b,c", ",", "-")
"a-b-c"

iex> String.replace("a,b,c", ",", "-", global: false)
"a-b,c"

When the pattern is a regular expression, one can give \N or \g{N} in the replacement string to access a specific capture in the regular expression:

iex> String.replace("a,b,c", ~r/,(.)/, ",\\1\\g{1}")
"a,bb,cc"

Notice we had to escape the backslash escape character (i.e., we used \\N instead of just \N to escape the backslash; same thing for \\g{N}). By giving \0, one can inject the whole matched pattern in the replacement string.
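
Two further sketches of capture references follow: \0 wraps each whole match, and numbered captures are used to reorder two words. The inputs are made up for illustration.

```elixir
# \\0 refers to the whole match, so each letter is wrapped in angle brackets.
wrapped = String.replace("a,b,c", ~r/[a-c]/, "<\\0>")

# \\1 and \\2 refer to the first and second capture groups.
swapped = String.replace("john smith", ~r/(\w+) (\w+)/, "\\2, \\1")
```

Here wrapped is "<a>,<b>,<c>" and swapped is "smith, john".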

When the pattern is a string, a developer can use the replaced part inside the replacement by using the :insert_replaced option and specifying the position(s) inside the replacement where the string pattern will be inserted:

iex> String.replace("a,b,c", "b", "[]", insert_replaced: 1)
"a,[b],c"

iex> String.replace("a,b,c", ",", "[]", insert_replaced: 2)
"a[],b[],c"

iex> String.replace("a,b,c", ",", "[]", insert_replaced: [1, 1])
"a[,,]b[,,]c"

If any position given in the :insert_replaced option is larger than the replacement string, or is negative, an ArgumentError is raised.

replace_leading(string, match, replacement)
replace_leading(t, t, t) :: t | no_return

Replaces all leading occurrences of match by replacement in string.

Returns the string untouched if there are no occurrences.

If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the beginning of string, and it’s impossible to replace “multiple” occurrences of "".

Examples

iex> String.replace_leading("hello world", "hello ", "")
"world"
iex> String.replace_leading("hello hello world", "hello ", "")
"world"

iex> String.replace_leading("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_leading("hello hello world", "hello ", "ola ")
"ola ola world"

replace_prefix(string, match, replacement)
replace_prefix(t, t, t) :: t

Replaces prefix in string by replacement if it matches match.

Returns the string untouched if there is no match. If match is an empty string (""), replacement is just prepended to string.

Examples

iex> String.replace_prefix("world", "hello ", "")
"world"
iex> String.replace_prefix("hello world", "hello ", "")
"world"
iex> String.replace_prefix("hello hello world", "hello ", "")
"hello world"

iex> String.replace_prefix("world", "hello ", "ola ")
"world"
iex> String.replace_prefix("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_prefix("hello hello world", "hello ", "ola ")
"ola hello world"

iex> String.replace_prefix("world", "", "hello ")
"hello world"

replace_suffix(string, match, replacement)
replace_suffix(t, t, t) :: t

Replaces suffix in string by replacement if it matches match.

Returns the string untouched if there is no match. If match is an empty string (""), replacement is just appended to string.

Examples

iex> String.replace_suffix("hello", " world", "")
"hello"
iex> String.replace_suffix("hello world", " world", "")
"hello"
iex> String.replace_suffix("hello world world", " world", "")
"hello world"

iex> String.replace_suffix("hello", " world", " mundo")
"hello"
iex> String.replace_suffix("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_suffix("hello world world", " world", " mundo")
"hello world mundo"

iex> String.replace_suffix("hello", "", " world")
"hello world"

replace_trailing(string, match, replacement)
replace_trailing(t, t, t) :: t | no_return

Replaces all trailing occurrences of match by replacement in string.

Returns the string untouched if there are no occurrences.

If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the end of string, and it’s impossible to replace “multiple” occurrences of "".

Examples

iex> String.replace_trailing("hello world", " world", "")
"hello"
iex> String.replace_trailing("hello world world", " world", "")
"hello"

iex> String.replace_trailing("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_trailing("hello world world", " world", " mundo")
"hello mundo mundo"

reverse(string)
reverse(t) :: t

Reverses the graphemes in given string.

Examples

iex> String.reverse("abcd")
"dcba"

iex> String.reverse("hello world")
"dlrow olleh"

iex> String.reverse("hello ∂og")
"go∂ olleh"

Keep in mind reversing the same string twice does not necessarily yield the original string:

iex> "̀e"
"̀e"
iex> String.reverse("̀e")
"è"
iex> String.reverse String.reverse("̀e")
"è"

In the first example the accent is before the vowel, so it is considered two graphemes. However, when you reverse it once, you have the vowel followed by the accent, which becomes one grapheme. Reversing it again will keep it as one single grapheme.
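
The same behaviour can be checked by counting graphemes at each step. In this sketch the combining grave accent is written as the escape \u0300 so its position is unambiguous:

```elixir
original = "\u0300e"              # accent first, then the vowel
once = String.reverse(original)   # vowel first, then the accent
twice = String.reverse(once)

lengths = {String.length(original), String.length(once), String.length(twice)}
```

Here lengths is {2, 1, 1}, and twice equals once rather than original.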

slice(string, range)
slice(t, Range.t) :: t

Returns a substring from the offset given by the start of the range to the offset given by the end of the range.

If the start of the range is not a valid offset for the given string or if the range is in reverse order, returns "".

If the start or end of the range is negative, the whole string is traversed first in order to convert the negative indices into positive ones.

Remember this function works with Unicode graphemes and considers the slices to represent grapheme offsets. If you want to split on raw bytes, check Kernel.binary_part/3 instead.

Examples

iex> String.slice("elixir", 1..3)
"lix"

iex> String.slice("elixir", 1..10)
"lixir"

iex> String.slice("elixir", 10..3)
""

iex> String.slice("elixir", -4..-1)
"ixir"

iex> String.slice("elixir", 2..-1)
"ixir"

iex> String.slice("elixir", -4..6)
"ixir"

iex> String.slice("elixir", -1..-4)
""

iex> String.slice("elixir", -10..-7)
""

iex> String.slice("a", 0..1500)
"a"

iex> String.slice("a", 1..1500)
""

slice(string, start, len)
slice(t, integer, integer) :: grapheme

Returns a substring starting at the offset start, and of length len.

If the offset is greater than string length, then it returns "".

Remember this function works with Unicode graphemes and considers the slices to represent grapheme offsets. If you want to split on raw bytes, check Kernel.binary_part/3 instead.
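
To make the difference concrete, the sketch below takes the third character of "olá!" by grapheme offset and by byte offset. The á is written as \u00e1, which occupies two bytes in UTF-8.

```elixir
string = "ol\u00e1!"   # 4 graphemes, 5 bytes

# Grapheme offsets: position 2, length 1.
by_grapheme = String.slice(string, 2, 1)

# Byte offsets: the same character starts at byte 2 and is 2 bytes long.
by_bytes = binary_part(string, 2, 2)

# Taking only 1 byte cuts the codepoint in half.
broken = binary_part(string, 2, 1)
```

Here by_grapheme and by_bytes are both "á", while broken is the single byte <<195>>, which is not valid UTF-8.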

Examples

iex> String.slice("elixir", 1, 3)
"lix"

iex> String.slice("elixir", 1, 10)
"lixir"

iex> String.slice("elixir", 10, 3)
""

iex> String.slice("elixir", -4, 4)
"ixir"

iex> String.slice("elixir", -10, 3)
""

iex> String.slice("a", 0, 1500)
"a"

iex> String.slice("a", 1, 1500)
""

iex> String.slice("a", 2, 1500)
""

split(binary)
split(t) :: [t]

Divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored. Groups of whitespace are treated as a single occurrence. Divisions do not occur on non-breaking whitespace.

Examples

iex> String.split("foo bar")
["foo", "bar"]

iex> String.split("foo" <> <<194, 133>> <> "bar")
["foo", "bar"]

iex> String.split(" foo   bar ")
["foo", "bar"]

iex> String.split("no\u00a0break")
["no\u00a0break"]

split(string, pattern, options \\ [])
split(t, pattern | Regex.t, Keyword.t) :: [t]

Divides a string into substrings based on a pattern.

Returns a list of these substrings. The pattern can be a string, a list of strings or a regular expression.

The string is split into as many parts as possible by default, but can be controlled via the parts: pos_integer option. If you pass parts: :infinity, it will return all possible parts (:infinity is the default).

Empty strings are only removed from the result if the trim option is set to true (default is false).
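
A short sketch of how these two options are typically combined with a string pattern; the inputs are made up for illustration:

```elixir
# parts: 2 stops after the first separator, so the value may contain "=".
[key, value] = String.split("token=abc=def", "=", parts: 2)

# trim: true drops the empty strings produced by leading, trailing
# and repeated separators.
fields = String.split(",a,,b,", ",", trim: true)
```

Here key is "token", value is "abc=def", and fields is ["a", "b"].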

When the pattern used is a regular expression, the string is split using Regex.split/3. In that case this function accepts additional options which are documented in Regex.split/3.

Examples

Splitting with a string pattern:

iex> String.split("a,b,c", ",")
["a", "b", "c"]

iex> String.split("a,b,c", ",", parts: 2)
["a", "b,c"]

iex> String.split(" a b c ", " ", trim: true)
["a", "b", "c"]

A list of patterns:

iex> String.split("1,2 3,4", [" ", ","])
["1", "2", "3", "4"]

A regular expression:

iex> String.split("a,b,c", ~r{,})
["a", "b", "c"]

iex> String.split("a,b,c", ~r{,}, parts: 2)
["a", "b,c"]

iex> String.split(" a b c ", ~r{\s}, trim: true)
["a", "b", "c"]

iex> String.split("abc", ~r{b}, include_captures: true)
["a", "b", "c"]

Splitting on empty patterns returns graphemes:

iex> String.split("abc", ~r{})
["a", "b", "c", ""]

iex> String.split("abc", "")
["a", "b", "c", ""]

iex> String.split("abc", "", trim: true)
["a", "b", "c"]

iex> String.split("abc", "", parts: 2)
["a", "bc"]

A precompiled pattern can also be given:

iex> pattern = :binary.compile_pattern([" ", ","])
iex> String.split("1,2 3,4", pattern)
["1", "2", "3", "4"]

split_at(string, position)
split_at(t, integer) :: {t, t}

Splits a string into two at the specified offset. When the offset given is negative, location is counted from the end of the string.

The offset is capped to the length of the string. Returns a tuple with two elements.

Note: keep in mind this function splits on graphemes and, as such, it has to linearly traverse the string. If you want to split a string or a binary based on the number of bytes, use Kernel.binary_part/3 instead.
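
The sketch below performs the same split twice, once by grapheme offset and once by byte offset. It assumes the caller already knows that the first character is two bytes long (é is written as \u00e9).

```elixir
string = "\u00e9cole"

# Grapheme-based: offset 1 means "after the first grapheme".
{by_grapheme_head, by_grapheme_tail} = String.split_at(string, 1)

# Byte-based: the first character is known to occupy 2 bytes.
by_bytes_head = binary_part(string, 0, 2)
by_bytes_tail = binary_part(string, 2, byte_size(string) - 2)
```

Both approaches yield "é" and "cole" here, but only the byte-based one avoids traversing the string.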

Examples

iex> String.split_at "sweetelixir", 5
{"sweet", "elixir"}

iex> String.split_at "sweetelixir", -6
{"sweet", "elixir"}

iex> String.split_at "abc", 0
{"", "abc"}

iex> String.split_at "abc", 1000
{"abc", ""}

iex> String.split_at "abc", -1000
{"", "abc"}

splitter(string, pattern, options \\ [])
splitter(t, pattern, Keyword.t) :: Enumerable.t

Returns an enumerable that splits a string on demand.

This is in contrast to split/3 which splits all the string upfront.

Note splitter does not support regular expressions (as it is often more efficient to have the regular expressions traverse the string at once than in multiple passes).

Options

  • :trim - when true, does not emit empty patterns

Examples

iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", [" ", ","]) |> Enum.take(4)
["1", "2", "3", "4"]

iex> String.splitter("abcd", "") |> Enum.take(10)
["a", "b", "c", "d", ""]

iex> String.splitter("abcd", "", trim: true) |> Enum.take(10)
["a", "b", "c", "d"]

starts_with?(string, prefix)
starts_with?(t, t | [t]) :: boolean

Returns true if string starts with any of the prefixes given.

prefix can be either a single prefix or a list of prefixes.

Examples

iex> String.starts_with? "elixir", "eli"
true
iex> String.starts_with? "elixir", ["erlang", "elixir"]
true
iex> String.starts_with? "elixir", ["erlang", "ruby"]
false

An empty string will always match:

iex> String.starts_with? "elixir", ""
true
iex> String.starts_with? "elixir", ["", "other"]
true

to_atom(string)
to_atom(String.t) :: atom

Converts a string to an atom.

Currently Elixir does not support the conversion of strings that contain Unicode codepoints greater than 0xFF.

Inlined by the compiler.

Examples

iex> String.to_atom("my_atom")
:my_atom

to_charlist(string)
to_charlist(t) :: charlist

Converts a string into a charlist.

Specifically, this function takes a UTF-8 encoded binary and returns a list of its integer codepoints. It is similar to codepoints/1 except that the latter returns a list of codepoints as strings.

In case you need to work with bytes, take a look at the :binary module.

Examples

iex> String.to_charlist("æß")
'æß'

to_existing_atom(string)
to_existing_atom(String.t) :: atom

Converts a string to an existing atom.

Currently Elixir does not support the conversion of strings that contain Unicode codepoints greater than 0xFF.

Inlined by the compiler.

Examples

iex> _ = :my_atom
iex> String.to_existing_atom("my_atom")
:my_atom

iex> String.to_existing_atom("this_atom_will_never_exist")
** (ArgumentError) argument error
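
When the input is not trusted, the raised error can be turned into a tagged result. The following is one possible wrapper; AtomSketch and fetch_existing/1 are hypothetical names, not part of the String module.

```elixir
defmodule AtomSketch do
  # Hypothetical wrapper: returns {:ok, atom} or :error instead of raising.
  def fetch_existing(string) when is_binary(string) do
    {:ok, String.to_existing_atom(string)}
  rescue
    ArgumentError -> :error
  end
end

known = AtomSketch.fetch_existing("ok")
unknown = AtomSketch.fetch_existing("no_such_atom_in_this_vm_8f3a1c")
```

Here known is {:ok, :ok}, since the atom :ok always exists, and unknown is :error provided no such atom has been created elsewhere.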

to_float(string)
to_float(String.t) :: float

Returns a float whose text representation is string.

string must be the string representation of a float. To convert the string representation of an integer, use Float.parse/1 instead; otherwise an argument error will be raised.
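
A sketch of a lenient conversion built on Float.parse/1 is shown below. NumberSketch and to_float_lenient/1 are hypothetical names; the helper accepts both "3" and "3.0" and rejects input with trailing characters.

```elixir
defmodule NumberSketch do
  # Hypothetical helper: accepts integer-looking and float-looking strings.
  def to_float_lenient(string) when is_binary(string) do
    case Float.parse(string) do
      # The whole string was consumed, so it is a clean number.
      {value, ""} -> {:ok, value}
      # Either nothing was parsed or trailing characters were left over.
      _ -> :error
    end
  end
end

from_integer = NumberSketch.to_float_lenient("3")
from_float = NumberSketch.to_float_lenient("2.5")
rejected = NumberSketch.to_float_lenient("3abc")
```

Here from_integer is {:ok, 3.0}, from_float is {:ok, 2.5}, and rejected is :error.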

Inlined by the compiler.

Examples

iex> String.to_float("2.2017764e+0")
2.2017764

iex> String.to_float("3.0")
3.0

to_integer(string)
to_integer(String.t) :: integer

Returns an integer whose text representation is string.

Inlined by the compiler.

Examples

iex> String.to_integer("123")
123

to_integer(string, base)
to_integer(String.t, 2..36) :: integer

Returns an integer whose text representation is string in base base.

Inlined by the compiler.

Examples

iex> String.to_integer("3FF", 16)
1023

trim(string)
trim(t) :: t

Returns a string where all leading and trailing Unicode whitespaces have been removed.

Examples

iex> String.trim("\n  abc\n  ")
"abc"

trim(string, to_trim)
trim(t, t) :: t

Returns a string where all leading and trailing to_trims have been removed.

Examples

iex> String.trim("a  abc  a", "a")
"  abc  "

trim_leading(string)
trim_leading(t) :: t

Returns a string where all leading Unicode whitespaces have been removed.

Examples

iex> String.trim_leading("\n  abc   ")
"abc   "

trim_leading(string, to_trim)
trim_leading(t, t) :: t

Returns a string where all leading to_trims have been removed.

Examples

iex> String.trim_leading("__ abc _", "_")
" abc _"

iex> String.trim_leading("1 abc", "11")
"1 abc"

trim_trailing(string)
trim_trailing(t) :: t

Returns a string where all trailing Unicode whitespaces have been removed.

Examples

iex> String.trim_trailing("   abc\n  ")
"   abc"

trim_trailing(string, to_trim)
trim_trailing(t, t) :: t

Returns a string where all trailing to_trims have been removed.

Examples

iex> String.trim_trailing("_ abc __", "_")
"_ abc "

iex> String.trim_trailing("abc 1", "11")
"abc 1"

upcase(binary)
upcase(t) :: t

Converts all characters in the given string to uppercase.

Examples

iex> String.upcase("abcd")
"ABCD"

iex> String.upcase("ab 123 xpto")
"AB 123 XPTO"

iex> String.upcase("olá")
"OLÁ"

valid?(string)
valid?(t) :: boolean

Checks whether string contains only valid characters.

Examples

iex> String.valid?("a")
true

iex> String.valid?("ø")
true

iex> String.valid?(<<0xFFFF :: 16>>)
false

iex> String.valid?("asd" <> <<0xFFFF :: 16>>)
false