Essence.Tokenizer (essence v0.3.0)
The Essence.Tokenizer module exposes useful functions for transforming text into tokens and for working with those tokens.
Summary
Functions
Splits a given String into tokens on punctuation and includes the punctuation marks as tokens. This function supports Unicode text.
Splits a given text into tokens on punctuation, but omits the punctuation tokens. This function supports Unicode text.
Splits a given String into tokens. A token is a sequence of characters to be treated as a group. The tokenize function splits on whitespace and punctuation, treating words and punctuation marks as tokens and removing whitespace.
Tokenizes a given stream and returns a list of tokens. Commonly the given stream is a :line stream.
Functions
split_with_punctuation(text)
Specs
split_with_punctuation(String.t()) :: List.t()
Splits a given String into tokens on punctuation and includes the punctuation marks as tokens. This function supports Unicode text.
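For illustration, a usage sketch with an assumed result (the exact token list depends on the library's punctuation-splitting rules):

iex> Essence.Tokenizer.split_with_punctuation("Hello, world!")
["Hello", ",", "world", "!"]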
split_without_punctuation(text)
Specs
split_without_punctuation(String.t()) :: List.t()
Splits a given text into tokens on punctuation, but omits the punctuation tokens. This function supports Unicode text.
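A hedged example with the same input as above; the returned list is an assumption about how the function discards punctuation:

iex> Essence.Tokenizer.split_without_punctuation("Hello, world!")
["Hello", "world"]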
tokenize(text)
Specs
tokenize(String.t()) :: List.t()
Splits a given String into tokens. A token is a sequence of characters to be treated as a group. The tokenize function splits on whitespace and punctuation, treating words and punctuation marks as tokens and removing whitespace.
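As a sketch, assuming words and punctuation marks each become separate tokens while whitespace is dropped:

iex> Essence.Tokenizer.tokenize("Good morning, Dave.")
["Good", "morning", ",", "Dave", "."]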
tokenize_s(stream)
Specs
tokenize_s(File.Stream.t()) :: List.t()
Tokenizes a given stream and returns a list of tokens. Commonly the given stream is a :line stream.
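A minimal sketch of tokenizing a file line by line; the file name is hypothetical, and it is assumed that the tokens from all lines are returned as a single flat list. File.stream!/1 yields a :line stream by default:

iex> File.stream!("corpus.txt") |> Essence.Tokenizer.tokenize_s()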