Haystack.Tokenizer (Haystack v0.1.0)

A module for tokenizing values.

This module provides utilities for tokenizing values. The default tokenizer discards everything except alphanumeric characters and records each word's position as a start offset and a length.

There's also a :full tokenizer for cases where the whole value should be treated as a single token, for example a serial code.
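To make "start offset and length" concrete, here is a minimal sketch of how such positions can be pulled out of a string with Elixir's standard Regex module. The TokenSketch module and its positions/2 function are hypothetical and are not part of Haystack. The downcasing mirrors the output shown in the tokenize examples on this page; that Haystack applies it at this step is an assumption.

```elixir
defmodule TokenSketch do
  # Hypothetical helper, not Haystack's implementation.
  # Returns {downcased_word, byte_offset, byte_length} for every match.
  def positions(value, separator \\ ~r/([[:alnum:]]+)/) do
    separator
    |> Regex.scan(value, capture: :first, return: :index)
    |> Enum.map(fn [{offset, len}] ->
      word = value |> binary_part(offset, len) |> String.downcase()
      {word, offset, len}
    end)
  end
end

IO.inspect(TokenSketch.positions("Needle in a Haystack"))
# [{"needle", 0, 6}, {"in", 7, 2}, {"a", 10, 1}, {"haystack", 12, 8}]
```

Note that Regex.scan/3 with return: :index reports byte offsets, so positions in text with multi-byte characters would be counted in bytes rather than graphemes.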

Summary

Functions

separator/1

Return the separator regex for the given tokenizer.

tokenize/1

Tokenize a value.

tokenize/2

Tokenize a value with a given separator.

Functions

@spec separator(atom()) :: Regex.t()

Return the separator regex for the given tokenizer.

Examples

iex> Tokenizer.separator(:default)
~r/([[:alnum:]]+)/

iex> Tokenizer.separator(:full)
~r/(.+)/
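
The difference between the two patterns is easiest to see on a value that contains punctuation. The sketch below applies the two regexes shown above directly with Regex.scan/3; the SeparatorSketch module and the serial code are made up for illustration and do not call Haystack itself.

```elixir
defmodule SeparatorSketch do
  # Hypothetical helper: lists the substrings a separator regex matches.
  def fragments(value, separator) do
    separator
    |> Regex.scan(value, capture: :first)
    |> List.flatten()
  end
end

serial = "SN-4821-XK"

# The default pattern stops at each hyphen, so the code is split up.
IO.inspect(SeparatorSketch.fragments(serial, ~r/([[:alnum:]]+)/))
# ["SN", "4821", "XK"]

# The full pattern matches the entire value in one piece.
IO.inspect(SeparatorSketch.fragments(serial, ~r/(.+)/))
# ["SN-4821-XK"]
```

Because `.` does not match a newline by default, the full pattern would treat each line of a multi-line value as a separate match.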

@spec tokenize(term()) :: [Haystack.Tokenizer.Token.t()]

Tokenize a value.

Examples

iex> tokens = Tokenizer.tokenize("Needle in a Haystack")
iex> Enum.map(tokens, & &1.v)
~w{needle in a haystack}

@spec tokenize(term(), Regex.t()) :: [Haystack.Tokenizer.Token.t()]

Tokenize a value with a given separator.

Examples

iex> tokens = Tokenizer.tokenize("Needle in a Haystack", Tokenizer.separator(:default))
iex> Enum.map(tokens, & &1.v)
~w{needle in a haystack}