Essence.Tokenizer (essence v0.3.0)

The Essence.Tokenizer module exposes functions for transforming text into tokens and for working with those tokens.

Summary

Functions

Splits a given String into tokens on punctuation, and includes the punctuation marks as tokens. This function supports Unicode text.

Splits a given text into tokens on punctuation, but omits the punctuation tokens. This function supports Unicode text.

Splits a given String into tokens. A token is a sequence of characters to be treated as a group. The tokenize function splits on whitespace and punctuation, treating words and punctuation marks as tokens and removing whitespace.

Tokenizes a given stream and returns a list of tokens. Commonly, the given stream is a :line stream.

Functions


split_with_punctuation(text)

Specs

split_with_punctuation(String.t()) :: List.t()

Splits a given String into tokens on punctuation, and includes the punctuation marks as tokens. This function supports Unicode text.
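A hedged usage sketch; the input string and the exact shape of the returned list are illustrative assumptions, not doctests from the library:

    iex> Essence.Tokenizer.split_with_punctuation("Hello,World!")
    ["Hello", ",", "World", "!"]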


split_without_punctuation(text)

Specs

split_without_punctuation(String.t()) :: List.t()

Splits a given text into tokens on punctuation, but omits the punctuation tokens. This function supports Unicode text.
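A hedged sketch along the same lines, with the punctuation tokens dropped (the output is assumed from the description above):

    iex> Essence.Tokenizer.split_without_punctuation("Hello,World!")
    ["Hello", "World"]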

tokenize(text)

Specs

tokenize(String.t()) :: List.t()

Splits a given String into tokens. A token is a sequence of characters to be treated as a group. The tokenize function splits on whitespace and punctuation, treating words and punctuation marks as tokens and removing whitespace.
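A hedged example of the behaviour described above: whitespace is removed and each punctuation mark becomes its own token (the concrete output is an assumption, not a doctest):

    iex> Essence.Tokenizer.tokenize("Hello, World! How are you?")
    ["Hello", ",", "World", "!", "How", "are", "you", "?"]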


tokenize_s(stream)

Specs

tokenize_s(File.Stream.t()) :: List.t()

Tokenizes a given stream and returns a list of tokens. Commonly, the given stream is a :line stream.
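A hedged sketch of feeding a line-based file stream into tokenize_s/1, as the File.Stream.t() spec suggests; the file name is hypothetical:

    # Build a :line stream from a (hypothetical) text file and tokenize it.
    "corpus.txt"
    |> File.stream!()
    |> Essence.Tokenizer.tokenize_s()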