Text.Language.Classifier.NaiveBayesian (Text v0.2.0)

A language detection model that uses n-gram frequencies.

It scores each language by multiplying the frequencies of the n-grams detected in the text. Because the frequencies are stored as log(frequency), adding the stored entries is equivalent to multiplying the raw frequencies: log(a) + log(b) = log(a * b).
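The equivalence can be checked with a few lines of Elixir. This is a minimal sketch, not code from the library: the frequency values are made up for illustration.

```elixir
# Hypothetical n-gram frequencies for one language (illustrative values only).
frequencies = [0.5, 0.25, 0.125]

# Multiplying the raw frequencies directly...
product = Enum.reduce(frequencies, 1.0, fn f, acc -> f * acc end)

# ...corresponds to summing their natural logarithms.
log_sum = frequencies |> Enum.map(&:math.log/1) |> Enum.sum()

# The two agree up to floating-point rounding.
true = abs(:math.log(product) - log_sum) < 1.0e-12
```

Working in log space also avoids floating-point underflow, which a long product of small frequencies would otherwise run into.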

Summary

Functions

Returns the {language, score} tuples in the correct order for this classifier.

Sums the log frequencies of each n-gram.

Functions

Returns the {language, score} tuples in the correct order for this classifier.
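As an illustration only, ordering might look like the sketch below. It assumes that a higher summed log score (closer to zero) means a better match, so the best candidate comes first. The page does not state the actual ordering rule, and the scores shown are invented.

```elixir
# Hypothetical {language, score} tuples; scores are summed log frequencies.
scores = [{"fr", -42.7}, {"en", -12.3}, {"de", -30.1}]

# Assumption: the least negative score is the best match, so sort descending.
ordered = Enum.sort_by(scores, fn {_language, score} -> score end, :desc)

# The first element is then the most likely language under this assumption.
{best_language, _score} = hd(ordered)
```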

score_one_language(language, text_ngrams, vocabulary)

Sums the log frequencies of each n-gram.

A strong negative weighting is applied to any n-gram that is not in the given vocabulary.
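The scoring rule described above can be sketched as follows. This is not the library's implementation: the module name, the penalty value, and the data shapes (a list of n-gram strings and a map of n-gram to log frequency) are all assumptions made for the example.

```elixir
defmodule NaiveBayesSketch do
  # Hypothetical penalty for n-grams missing from the vocabulary.
  # The value the real classifier uses is not stated on this page.
  @missing_penalty -20.0

  # Assumes `text_ngrams` is a list of n-gram strings and `vocabulary`
  # is a map of n-gram => log(frequency) for the given language.
  def score(language, text_ngrams, vocabulary) do
    total =
      text_ngrams
      |> Enum.map(&Map.get(vocabulary, &1, @missing_penalty))
      |> Enum.sum()

    {language, total}
  end
end

# Illustrative vocabulary; "zz" is absent and therefore receives the penalty.
vocabulary = %{"th" => :math.log(0.5), "he" => :math.log(0.25)}
result = NaiveBayesSketch.score("en", ["th", "he", "zz"], vocabulary)
```

Using a large negative constant instead of log(0), which is undefined, keeps the total finite while still pushing down languages whose vocabulary lacks the n-gram.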