Unicode.String (Unicode String v1.1.0)

This module provides functions that implement some of the Unicode standards:

  • The Unicode Case Folding algorithm to provide case-independent equality checking irrespective of language or script.

  • The Unicode Segmentation algorithm to detect, break or split strings into grapheme clusters, words and sentences.

Summary

Functions

break(arg, options \\ [])

Returns match data indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.

break?(arg, options \\ [])

Returns a boolean indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.

equals_ignoring_case?(string_a, string_b, type \\ :full)

Compares two strings in a case insensitive manner.

next(string, options \\ [])

Returns the next segment in a string.

split(string, options \\ [])

Splits a string according to the specified break type.

splitter(string, options)

Returns an enumerable that splits a string on demand.

Types

@type break_match() ::
  {break_or_no_break(), {String.t(), {String.t(), String.t()}}}
  | {break_or_no_break(), {String.t(), String.t()}}
@type break_or_no_break() :: :break | :no_break
@type break_type() :: :grapheme | :word | :line | :sentence
@type error_return() :: {:error, String.t()}
@type options() :: [locale: String.t(), break: break_type(), suppressions: boolean()]
@type split_options() :: [
  locale: String.t(),
  break: break_type(),
  suppressions: boolean(),
  trim: boolean()
]
@type string_interval() :: {String.t(), String.t()}

Functions

break(arg, options \\ [])

@spec break(string_interval(), options()) :: break_match() | error_return()

Returns match data indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.

Arguments

  • arg is a tuple {string_before, string_after} of two String.t values. The break is tested at the point between them.

  • options is a keyword list of options.

Returns

A tuple indicating if a break would be applicable at this point between string_before and string_after.

  • {:break, {string_before, {matched_string, remaining_string}}} or

  • {:no_break, {string_before, {matched_string, remaining_string}}} or

  • {:error, reason}

Options

  • :locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root" which corresponds to the break rules defined by the Unicode Segmentation rules.

  • :break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.

  • :suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.

Examples

iex> Unicode.String.break {"This is ", "some words"}
{:break, {"This is ", {"s", "ome words"}}}

iex> Unicode.String.break {"This is ", "some words"}, break: :sentence
{:no_break, {"This is ", {"s", "ome words"}}}

iex> Unicode.String.break {"This is one. ", "This is some words."}, break: :sentence
{:break, {"This is one. ", {"T", "his is some words."}}}
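
The return shapes listed under Returns lend themselves to pattern matching. The following is a minimal sketch, not part of the library: BreakDemo and describe/1 are hypothetical names, and the function only inspects a value shaped like the documented result of break/2 (including the second shape listed in break_match()).

```elixir
defmodule BreakDemo do
  # Hypothetical helper, not part of Unicode.String: summarises a value
  # shaped like the documented return of Unicode.String.break/2.
  def describe({:error, reason}), do: "error: " <> reason

  # First shape in break_match(): the text after the break point is split
  # into the matched segment and the remainder.
  def describe({kind, {before, {matched, rest}}}) when kind in [:break, :no_break] do
    format(kind, before, matched <> rest)
  end

  # Second shape in break_match(): no separate matched segment.
  def describe({kind, {before, following}})
      when kind in [:break, :no_break] and is_binary(following) do
    format(kind, before, following)
  end

  defp format(:break, before, following),
    do: "break between #{inspect(before)} and #{inspect(following)}"

  defp format(:no_break, before, following),
    do: "no break between #{inspect(before)} and #{inspect(following)}"
end

# Value copied from the first documented example of break/2.
result = {:break, {"This is ", {"s", "ome words"}}}
IO.puts(BreakDemo.describe(result))
# prints: break between "This is " and "some words"
```

With the library loaded, the result of Unicode.String.break/2 could be passed straight to BreakDemo.describe/1 instead of the literal tuple.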

break?(arg, options \\ [])

@spec break?(string_interval(), options()) :: boolean()

Returns a boolean indicating if the requested break is applicable at the point between the two string segments represented by {string_before, string_after}.

Arguments

  • arg is a tuple {string_before, string_after} of two String.t values. The break is tested at the point between them.

  • options is a keyword list of options.

Returns

  • true or false or

  • raises an exception if there is an error

Options

  • :locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root" which corresponds to the break rules defined by the Unicode Segmentation rules.

  • :break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.

  • :suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.

Examples

iex> Unicode.String.break? {"This is ", "some words"}
true

iex> Unicode.String.break? {"This is ", "some words"}, break: :sentence
false

iex> Unicode.String.break? {"This is one. ", "This is some words."}, break: :sentence
true

equals_ignoring_case?(string_a, string_b, type \\ :full)

@spec equals_ignoring_case?(String.t(), String.t(), atom()) :: boolean()

Compares two strings in a case insensitive manner.

Case folding is applied to the two string arguments which are then compared with the == operator.

Arguments

  • string_a and string_b are two strings to be compared

  • type is the case folding type to be applied. The alternatives are :full, :simple and :turkic. The default is :full.

Returns

  • true or false

Notes

  • This function applies the Unicode Case Folding algorithm.

  • The algorithm does not apply any treatment to diacritical marks, so comparing strings without regard to accents is not part of this function.

Examples

iex> Unicode.String.equals_ignoring_case? "ABC", "abc"
true

iex> Unicode.String.equals_ignoring_case? "beißen", "beissen"
true

iex> Unicode.String.equals_ignoring_case? "grüßen", "grussen"
false

See Unicode.String.Case.Folding.fold/1.

See Unicode.String.Case.Folding.fold/2.
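
For contrast, the sketch below compares strings by lowercasing them with String.downcase/1 from the Elixir standard library. It is a baseline for illustration only: NaiveCompare and same?/2 are hypothetical names and are not how this module works. Lowercasing leaves "ß" unchanged, so the pair that equals_ignoring_case?/3 is documented to treat as equal is reported as different here.

```elixir
defmodule NaiveCompare do
  # Hypothetical baseline, not part of Unicode.String: lowercases both
  # strings with the standard library and compares the results with ==.
  def same?(a, b), do: String.downcase(a) == String.downcase(b)
end

IO.inspect(NaiveCompare.same?("ABC", "abc"))
# prints: true

IO.inspect(NaiveCompare.same?("beißen", "beissen"))
# prints: false
```

The second result is the gap that case folding closes, since the documented example for the same pair returns true.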

next(string, options \\ [])

@spec next(String.t(), split_options()) :: String.t() | nil | error_return()

Returns the next segment in a string.

Arguments

  • string is any String.t.

  • options is a keyword list of options.

Returns

A tuple with the next segment and the remainder of the string. The remainder is "" when the end of the string is reached.

  • {next_string, rest_of_the_string} or

  • {:error, reason}

Options

  • :locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root" which corresponds to the break rules defined by the Unicode Segmentation rules.

  • :break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.

  • :suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.

Examples

iex> Unicode.String.next "This is a sentence. And another.", break: :word
{"This", " is a sentence. And another."}

iex> Unicode.String.next "This is a sentence. And another.", break: :sentence
{"This is a sentence. ", "And another."}
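
Because each call hands back the remainder, a string can be consumed one segment at a time. The following is a minimal sketch of such a loop, not part of the library. NextDemo and segments/2 are hypothetical names, and the segmenting function is passed in as an argument so the control flow can be read on its own. Treating nil (listed in the @spec) as "nothing left" is an assumption.

```elixir
defmodule NextDemo do
  # Hypothetical helper, not part of Unicode.String. `next_fun` is any
  # one-argument function with the documented result shape of
  # Unicode.String.next/2: {segment, rest} or {:error, reason}.
  def segments(string, next_fun) when is_binary(string) do
    collect(string, next_fun, [])
  end

  # Nothing left to consume: return the segments in their original order.
  defp collect("", _next_fun, acc), do: {:ok, Enum.reverse(acc)}

  defp collect(string, next_fun, acc) do
    case next_fun.(string) do
      {segment, rest} when is_binary(segment) ->
        collect(rest, next_fun, [segment | acc])

      {:error, reason} ->
        {:error, reason}

      # Assumption: nil means the input is exhausted.
      nil ->
        {:ok, Enum.reverse(acc)}
    end
  end
end
```

With the library loaded, a call might look like NextDemo.segments(text, &Unicode.String.next(&1, break: :word)).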

split(string, options \\ [])

@spec split(String.t(), split_options()) :: [String.t(), ...] | error_return()

Splits a string according to the specified break type.

Arguments

  • string is any String.t.

  • options is a keyword list of options.

Returns

  • A list of strings after applying the specified break rules or

  • {:error, reason}

Options

  • :locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root" which corresponds to the break rules defined by the Unicode Segmentation rules.

  • :break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.

  • :suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.

  • :trim is a boolean indicating if segments that are comprised of only white space are to be excluded from the returned list. The default is false.

Examples

iex> Unicode.String.split "This is a sentence. And another.", break: :word
["This", " ", "is", " ", "a", " ", "sentence", ".", " ", "And", " ", "another", "."]

iex> Unicode.String.split "This is a sentence. And another.", break: :word, trim: true
["This", "is", "a", "sentence", ".", "And", "another", "."]

iex> Unicode.String.split "This is a sentence. And another.", break: :sentence
["This is a sentence. ", "And another."]
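
As the word-break examples show, punctuation is returned as its own segment even with trim: true. The following is a minimal sketch of counting only the segments that contain a letter or digit. WordCount and count/1 are hypothetical names, not part of the library, and the input list is copied from the documented output above.

```elixir
defmodule WordCount do
  # Hypothetical helper, not part of Unicode.String: counts the segments
  # that contain at least one letter or digit, which skips segments made
  # up only of punctuation or white space.
  def count(segments) when is_list(segments) do
    Enum.count(segments, &String.match?(&1, ~r/[\p{L}\p{N}]/u))
  end

  # Pass the documented error tuple through untouched.
  def count({:error, _reason} = error), do: error
end

# Segments copied from the documented `break: :word, trim: true` example.
segments = ["This", "is", "a", "sentence", ".", "And", "another", "."]
IO.inspect(WordCount.count(segments))
# prints: 6
```

With the library loaded, the list could come directly from Unicode.String.split(text, break: :word, trim: true).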

splitter(string, options)

@spec splitter(String.t(), split_options()) :: function() | error_return()

Returns an enumerable that splits a string on demand.

Arguments

  • string is any String.t.

  • options is a keyword list of options.

Returns

  • A function that implements the enumerable protocol or

  • {:error, reason}

Options

  • :locale is any locale returned by Unicode.String.Segment.known_locales/0. The default is "root" which corresponds to the break rules defined by the Unicode Segmentation rules.

  • :break is the type of break. It is one of :grapheme, :word, :line or :sentence. The default is :word.

  • :suppressions is a boolean which, if true, will suppress breaks for common abbreviations defined for the locale. The default is true.

  • :trim is a boolean indicating if segments that are comprised of only white space are to be excluded from the results. The default is false.

Examples

iex> enum = Unicode.String.splitter "This is a sentence. And another.", break: :word, trim: true
iex> Enum.take enum, 3
["This", "is", "a"]
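
To illustrate what "on demand" means, the sketch below builds a lazy enumerable with Stream.unfold/2 from the Elixir standard library. It is an assumption about the general technique and not necessarily how splitter/2 is implemented. LazySegments and stream/2 are hypothetical names, and String.next_grapheme/1 stands in for a segmenter so that the block runs without the library.

```elixir
defmodule LazySegments do
  # Hypothetical helper, not part of Unicode.String. `next_fun` must return
  # {segment, rest} while input remains and nil once it is exhausted, which
  # is the contract Stream.unfold/2 expects.
  def stream(string, next_fun) when is_binary(string) and is_function(next_fun, 1) do
    Stream.unfold(string, next_fun)
  end
end

# Stand-in segmenter from the standard library: one grapheme per step.
# Only three steps are evaluated; the rest of the string is never segmented.
"hello world"
|> LazySegments.stream(&String.next_grapheme/1)
|> Enum.take(3)
|> IO.inspect()
# prints: ["h", "e", "l"]
```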