Text.Vocabulary behaviour (Text v0.2.0)

A vocabulary is the encoded form of a training text that is used to support language matching.

A vocabulary is a mapping of an n-gram to its rank and probability.

Summary

Functions

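calculate_corpus_ngrams(corpus, language, range)

calculate_ngrams(content, range, top_n \\ 300)
Calculates the n-grams for a given text.

get_ngrams(content, arg)
Returns the n-grams for a given text and a range of n-gram sizes.

get_vocabulary(vocabulary, language)
Gets the vocabulary entry for a given vocabulary and language.

known_vocabularies(corpus)

load_vocabulary!(vocabulary)
Loads the given vocabulary.

top_n(language_vocabulary, n \\ 300)
Returns the top n entries by rank from the list of entries in a given language's vocabulary.

top_n(vocabulary, language, n)
Returns a list of the top n vocabulary entries by rank for a given language and vocabulary.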

Types

Callbacks

Specs

calculate_ngrams(String.t()) :: map()

Specs

filename() :: String.t()

Specs

get_vocabulary(String.t()) :: map()

Specs

load_vocabulary!() :: map()

Specs

ngram_range() :: Range.t()
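
As a rough illustration of how a module might satisfy the callbacks listed above, here is a minimal sketch. Everything in it is an assumption made for the example: `VocabularyBehaviourSketch` is a local stand-in that mirrors the listed specs so the code compiles without the library, `BigramVocabularySketch` and the file name are hypothetical, and the string passed to `get_vocabulary/1` is assumed to be a language code. The callbacks themselves are not described on this page, so the real semantics may differ.

```elixir
defmodule VocabularyBehaviourSketch do
  # Stand-in for Text.Vocabulary so the example compiles on its own;
  # the callbacks mirror the specs listed above.
  @callback calculate_ngrams(String.t()) :: map()
  @callback filename() :: String.t()
  @callback get_vocabulary(String.t()) :: map()
  @callback load_vocabulary!() :: map()
  @callback ngram_range() :: Range.t()
end

defmodule BigramVocabularySketch do
  # Hypothetical implementation; with the real library this line would
  # presumably read `@behaviour Text.Vocabulary`.
  @behaviour VocabularyBehaviourSketch

  @impl true
  def ngram_range, do: 2..2

  @impl true
  def filename, do: "bigram_sketch.etf"

  @impl true
  def calculate_ngrams(text) do
    # Count every two-grapheme window in the text.
    text
    |> String.graphemes()
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(&Enum.join/1)
    |> Enum.frequencies()
  end

  @impl true
  def load_vocabulary! do
    # A real implementation would presumably read the file named by
    # filename/0; a tiny in-memory corpus keeps this sketch runnable.
    %{"en" => calculate_ngrams("the cat"), "fr" => calculate_ngrams("le chat")}
  end

  @impl true
  def get_vocabulary(language) do
    # Assumes the argument is a language key; raises if it is unknown.
    Map.fetch!(load_vocabulary!(), language)
  end
end
```

The `@impl true` annotations ask the compiler to check each function against the behaviour, which catches a misspelled or missing callback early.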

Functions

calculate_corpus_ngrams(corpus, language, range)

calculate_ngrams(content, range, top_n \\ 300)

Calculates the n-grams for a given text.

N-grams are calculated for each size in range, and the top_n highest-ranked n-grams from the text are returned.
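
The calculation described above can be sketched roughly as follows. This is not the library's implementation: `NgramSketch` is a hypothetical module, and it assumes that n-grams are taken over graphemes, that rank is the position when sorting by descending frequency, and that probability is the n-gram's count divided by the total number of n-grams extracted. Any text normalization the library may perform is left out.

```elixir
defmodule NgramSketch do
  # Hypothetical module, not part of the Text library.

  # Returns a map of n-gram => %{rank: ..., probability: ...} for the
  # `top_n` most frequent n-grams whose sizes fall within `range`.
  def calculate_ngrams(content, range, top_n \\ 300) do
    graphemes = String.graphemes(content)

    # Slide a window of each size in `range` across the text.
    ngrams =
      Enum.flat_map(range, fn n ->
        graphemes
        |> Enum.chunk_every(n, 1, :discard)
        |> Enum.map(&Enum.join/1)
      end)

    total = length(ngrams)

    ngrams
    |> Enum.frequencies()
    # Sort by descending count; ties are broken alphabetically so the
    # result is deterministic.
    |> Enum.sort_by(fn {ngram, count} -> {-count, ngram} end)
    |> Enum.take(top_n)
    |> Enum.with_index(1)
    |> Map.new(fn {{ngram, count}, rank} ->
      {ngram, %{rank: rank, probability: count / total}}
    end)
  end
end
```

For example, `NgramSketch.calculate_ngrams("abab", 2..2)` yields two entries, with "ab" at rank 1 and "ba" at rank 2.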

get_ngrams(content, arg)

Returns the n-grams for a given text and a range of n-gram sizes.

get_vocabulary(vocabulary, language)

Gets the vocabulary entry for a given vocabulary and language.

known_vocabularies(corpus)

load_vocabulary!(vocabulary)

Loads the given vocabulary.

Vocabularies are placed in :persistent_term, since this reduces memory copies and gives efficient access from multiple processes.
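
A minimal sketch of this storage pattern, assuming the store in question is Erlang's `:persistent_term` module. `PersistentVocabularySketch`, its function names, and the key shape are hypothetical and not taken from the library.

```elixir
defmodule PersistentVocabularySketch do
  # Hypothetical helper, not part of the Text library.

  # Stores `vocabulary` under a key derived from `name`. Writes to
  # :persistent_term are expensive, so this suits load-once data.
  def store!(name, vocabulary) when is_map(vocabulary) do
    :persistent_term.put({__MODULE__, name}, vocabulary)
    vocabulary
  end

  # Reads the vocabulary back; the term is not copied onto the
  # caller's heap. Returns nil if nothing was stored under `name`.
  def fetch(name) do
    :persistent_term.get({__MODULE__, name}, nil)
  end
end
```

Namespacing the key with the module name is a common way to avoid collisions with other users of the same global store.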

top_n(language_vocabulary, n \\ 300)

Returns the top n entries by rank from the list of entries in a given language's vocabulary.
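
Selecting the top entries by rank might look like the sketch below. It assumes each entry is an `{ngram, %{rank: integer}}` pair, which this page does not confirm, and `TopNSketch` is a hypothetical module.

```elixir
defmodule TopNSketch do
  # Hypothetical helper; assumes each entry is {ngram, %{rank: integer}}.
  # Accepts a map or a list of such pairs and returns a list.
  def top_n(language_vocabulary, n \\ 300) do
    language_vocabulary
    |> Enum.sort_by(fn {_ngram, entry} -> entry.rank end)
    |> Enum.take(n)
  end
end
```

Because rank 1 is the most frequent n-gram, an ascending sort followed by `Enum.take/2` gives the n best-ranked entries.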

top_n(vocabulary, language, n)

Returns a list of the top n vocabulary entries by rank for a given language and vocabulary.

This function is primarily intended for debugging support.