essence v0.1.0 Essence.Tokenizer
The Essence.Tokenizer module provides functions for splitting text into
tokens and for working with tokens.
Functions
Splits a given String into tokens on punctuation, and includes each punctuation mark as a token. This function supports Unicode text.
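The following is a minimal Elixir sketch of splitting on punctuation while keeping each punctuation mark as its own token. It is not Essence's implementation. The module PunctuationSplitSketch and the function split_with_punctuation/1 are hypothetical names, and the sketch assumes that "punctuation" means the Unicode punctuation category \p{P}, that whitespace is not itself a split point, and that whitespace around each piece is trimmed.

```elixir
defmodule PunctuationSplitSketch do
  # Hypothetical module: an illustration, not part of Essence.

  # Splits `text` at every Unicode punctuation character (\p{P}) and keeps
  # each punctuation character as a separate token (include_captures: true).
  # Whitespace around each piece is trimmed and empty pieces are dropped.
  # The `u` modifier makes the regex work on Unicode code points.
  def split_with_punctuation(text) when is_binary(text) do
    ~r/\p{P}/u
    |> Regex.split(text, include_captures: true)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end
end

PunctuationSplitSketch.split_with_punctuation("Grüße, schöne Welt!")
# → ["Grüße", ",", "schöne Welt", "!"]
```

Because this sketch splits only on punctuation, "schöne Welt" stays a single token; splitting on whitespace as well is the job of the tokenize function described below.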
Splits a given String into tokens on punctuation, but omits the punctuation tokens. This function supports Unicode text.
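A minimal sketch of the variant that drops punctuation, under the same assumptions as a punctuation-keeping split (Unicode category \p{P} as the split set, surrounding whitespace trimmed). PunctuationDropSketch and split_without_punctuation/1 are hypothetical names, not Essence's actual API.

```elixir
defmodule PunctuationDropSketch do
  # Hypothetical module: an illustration, not part of Essence.

  # Splits `text` at every Unicode punctuation character and discards the
  # punctuation itself. Pieces are trimmed, and empty pieces (for example the
  # one after a trailing "!") are removed.
  def split_without_punctuation(text) when is_binary(text) do
    ~r/\p{P}/u
    |> Regex.split(text)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end
end

PunctuationDropSketch.split_without_punctuation("Grüße, schöne Welt!")
# → ["Grüße", "schöne Welt"]
```

One consequence of using \p{P} is that apostrophes and hyphens count as punctuation, so "don't" becomes ["don", "t"]; a real tokenizer may treat these characters specially.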
Splits a given String into tokens. A token is a sequence of characters to
be treated as a group. The tokenize function splits on whitespace and
punctuation, treating words and punctuation marks as tokens and discarding whitespace.
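The behaviour described above can be sketched by scanning for tokens instead of splitting on delimiters. This is an assumption-laden illustration, not Essence's code: TokenizeSketch is a hypothetical module, and it assumes a word is a maximal run of characters that are neither whitespace nor Unicode punctuation.

```elixir
defmodule TokenizeSketch do
  # Hypothetical module: an illustration, not part of Essence.

  # Each match is either a run of characters that are neither whitespace (\s)
  # nor punctuation (\p{P}), i.e. a word, or a single punctuation character.
  # Whitespace never matches, so it is discarded. With the `u` modifier,
  # \s also matches Unicode whitespace.
  def tokenize(text) when is_binary(text) do
    ~r/[^\s\p{P}]+|\p{P}/u
    |> Regex.scan(text)
    # Regex.scan/2 returns one list per match, e.g. [["The"], ["quick"]].
    |> List.flatten()
  end
end

TokenizeSketch.tokenize("The quick brown fox, naïvely, jumps.")
# → ["The", "quick", "brown", "fox", ",", "naïvely", ",", "jumps", "."]
```

Scanning with Regex.scan/2 rather than splitting with Regex.split/3 means adjacent delimiters never produce empty strings that would have to be filtered out.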
Specs
tokenize_s(File.Stream.t) :: List.t
Tokenizes a given stream and returns a list of tokens. Typically, the given stream is a :line stream, such as the one File.stream!/1 returns by default.
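A minimal sketch of stream tokenization, assuming each line is tokenized independently with a word/punctuation rule and the per-line results are concatenated. TokenizeStreamSketch is a hypothetical module; unlike the documented spec, which takes a File.Stream.t, this sketch accepts any enumerable of strings so it can also be tried on a plain list. The sample file path is illustrative only.

```elixir
defmodule TokenizeStreamSketch do
  # Hypothetical module: an illustration, not the Essence implementation.

  # Tokenizes every line of `lines` (any enumerable of strings, such as the
  # :line stream returned by File.stream!/1) and concatenates the results.
  # Stream.flat_map/2 is lazy; Enum.to_list/1 forces the stream and collects
  # all tokens into one list.
  @spec tokenize_s(Enumerable.t()) :: [String.t()]
  def tokenize_s(lines) do
    lines
    |> Stream.flat_map(&tokenize_line/1)
    |> Enum.to_list()
  end

  # Words are runs of non-whitespace, non-punctuation characters; each
  # punctuation character is its own token. Trailing newlines are whitespace
  # and are therefore dropped.
  defp tokenize_line(line) do
    ~r/[^\s\p{P}]+|\p{P}/u
    |> Regex.scan(line)
    |> List.flatten()
  end
end

# Usage: write a small sample file to the temp directory, then stream it.
path = Path.join(System.tmp_dir!(), "tokenize_s_sketch.txt")
File.write!(path, "Hello, world!\nGrüße, Welt.\n")

tokens = path |> File.stream!() |> TokenizeStreamSketch.tokenize_s()
# tokens → ["Hello", ",", "world", "!", "Grüße", ",", "Welt", "."]
```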