API Reference Text v0.2.0

Modules

Functions for basic text processing and analysis.

Defines the behaviour for a language corpus, with convenience functions to simplify the creation of corpus vocabularies.

Pluralisation for the English language based on the paper An Algorithmic Approach to English Pluralization.

A module to support natural language detection.

A behaviour definition module for language classifiers.

A language detection model that uses cumulative frequencies.

A language detection model that uses n-gram frequencies.

A language detection model that uses a rank order coefficient to determine language similarity.

Compute n-grams and their counts from a given UTF-8 string.

A vocabulary is the encoded form of a training text that is used to support language matching.

Implements word counting for lists, streams and flows.
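
The n-gram counting that several of the modules above describe can be illustrated with a short, language-neutral sketch. This is not the library's API: the function name `ngram_counts` and its parameters are hypothetical, and the sketch assumes character n-grams taken over Unicode code points, which may differ from how the library segments text.

```python
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams in `text`.

    Hypothetical helper for illustration only; it slides a window of
    width `n` over the string one code point at a time.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A string shorter than n yields an empty range, hence an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


counts = ngram_counts("banana", 2)
# "banana" contains the bigrams ba, an, na, an, na
print(counts.most_common())
```

Counts like these are the raw material for frequency-based language models: a vocabulary built from training text can be compared against the counts of an unknown string.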