Haystack.Tokenizer (Haystack v0.1.0)
A module for tokenizing values.
This module provides utilities for tokenizing values. The default tokenizer keeps only runs of alphanumeric characters and records the position of each word as a start offset and a length.
There's also a :full tokenizer that can be used when the full value should be treated as a single token, for example a serial code.
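The offset-and-length extraction described above can be illustrated with a minimal sketch built only on the standard library. This is not the module's actual implementation: the downcasing step is inferred from the example output further down, and the `offset` and `length` map keys are hypothetical names chosen for illustration (only `v` appears in this documentation).

```elixir
# Sketch only: approximates the default tokenizer with Regex.scan/3.
value = "Needle in a Haystack"
separator = ~r/([[:alnum:]]+)/

# Assumption: tokens are downcased, as the documented example output suggests.
downcased = String.downcase(value)

tokens =
  separator
  # return: :index yields {byte_offset, byte_length} tuples instead of strings
  |> Regex.scan(downcased, capture: :first, return: :index)
  |> Enum.map(fn [{offset, len}] ->
    # `offset` and `length` are hypothetical field names
    %{v: binary_part(downcased, offset, len), offset: offset, length: len}
  end)

IO.inspect(tokens)
```

Offsets and lengths here are in bytes, which matches character positions for ASCII input such as this one.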
Functions
Return the separator for the given tokenizer.
Examples
iex> Tokenizer.separator(:default)
~r/([[:alnum:]]+)/
iex> Tokenizer.separator(:full)
~r/(.+)/
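To see how the two separators differ in practice, the regexes shown above can be applied directly with `Regex.scan/3`. This is a sketch using the standard library only; the serial code string is a made-up example value.

```elixir
# The two regexes are copied from the examples above.
default = ~r/([[:alnum:]]+)/
full = ~r/(.+)/

# Hypothetical serial code used only for illustration.
serial = "SN-1234-XY"

# The default separator splits on every non-alphanumeric character.
default_matches = default |> Regex.scan(serial, capture: :first) |> List.flatten()

# The full separator keeps the whole value as a single match.
full_matches = full |> Regex.scan(serial, capture: :first) |> List.flatten()

IO.inspect(default_matches)
IO.inspect(full_matches)
```

With the default separator the hyphens are dropped and the serial code is split into three pieces, while the full separator preserves it as one unit.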
@spec tokenize(term()) :: [Haystack.Tokenizer.Token.t()]
Tokenize a value using the default separator.
Examples
iex> tokens = Tokenizer.tokenize("Needle in a Haystack")
iex> Enum.map(tokens, & &1.v)
~w{needle in a haystack}
@spec tokenize(term(), Regex.t()) :: [Haystack.Tokenizer.Token.t()]
Tokenize a value with a given separator.
Examples
iex> tokens = Tokenizer.tokenize("Needle in a Haystack", Tokenizer.separator(:default))
iex> Enum.map(tokens, & &1.v)
~w{needle in a haystack}