ExNlp.Tokenizer.Keyword (ex_nlp v0.1.0)
Keyword tokenizer - treats entire input as a single token.
Useful for exact-match searches. Similar to keyword tokenizers in NLTK.
Examples
iex> ExNlp.Tokenizer.Keyword.tokenize("Hello world")
[%ExNlp.Token{text: "Hello world", position: 0, start_offset: 0, end_offset: 11}]
iex> ExNlp.Tokenizer.Keyword.span_tokenize("Hello world")
[{0, 11}]
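To illustrate the behavior above, here is a minimal, self-contained sketch of a keyword tokenizer in the same style. The `Token` struct below is a hypothetical stand-in for ExNlp.Token (assumed to carry text, position, start_offset, and end_offset fields); the real module's internals may differ.

```elixir
defmodule KeywordSketch do
  # Hypothetical stand-in for ExNlp.Token.
  defmodule Token do
    defstruct [:text, :position, :start_offset, :end_offset]
  end

  # The whole input becomes one token spanning [0, String.length(text)).
  def tokenize(text) do
    [%Token{text: text, position: 0, start_offset: 0, end_offset: String.length(text)}]
  end

  # Just the offsets, mirroring NLTK's span_tokenize.
  def span_tokenize(text), do: [{0, String.length(text)}]

  # Just the raw string, no Token structs.
  def tokenize_text(text), do: [text]
end
```

Because the input is never split, `tokenize("Hello world")` yields a single token whose end_offset equals the input length (11 here).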
Summary
Functions
span_tokenize(text)
Returns spans (start_offset, end_offset) for tokens.
tokenize(text)
Tokenizes text by treating the entire input as a single token.
tokenize_text(text)
Tokenizes text and returns just the text strings (no Token structs).
Types
@type span() :: ExNlp.Tokenizer.Base.span()
@type token() :: ExNlp.Tokenizer.Base.token()
Functions
span_tokenize(text)
Returns spans (start_offset, end_offset) for tokens.
Similar to NLTK's span_tokenize method.
tokenize(text)
Tokenizes text by treating the entire input as a single token.
tokenize_text(text)
Tokenizes text and returns just the text strings (no Token structs).
Returns a single-element list with the entire input text.
Examples
iex> ExNlp.Tokenizer.Keyword.tokenize_text("Hello world")
["Hello world"]