Restore case to text that has been lowercased.
Useful as a preprocessing step when downstream tooling — POS tagging, NER, sentiment classifiers — was trained on properly cased text but the input has been lowercased earlier in the pipeline (a common artefact of search and chat systems).
The module uses a small curated lexicon of words with non-default
casing (proper nouns, brand names, common acronyms) plus a
sentence-boundary-aware capitalization pass on the first word of
each sentence. The lexicon can be extended at runtime via
add_terms/1 for project-specific vocabulary.
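The two passes described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the module name `TruecaseSketch`, the six-entry lexicon, and both regular expressions are assumptions made for the example.

```elixir
defmodule TruecaseSketch do
  # Hypothetical stand-in for the bundled lexicon: lowercase => canonical casing.
  @lexicon %{
    "i" => "I",
    "london" => "London",
    "july" => "July",
    "api" => "API",
    "json" => "JSON",
    "iphone" => "iPhone"
  }

  def truecase(text, lexicon \\ @lexicon) do
    text
    |> apply_lexicon(lexicon)
    |> capitalize_sentences()
  end

  # Pass 1: replace each run of letters and digits with its lexicon entry, if any.
  defp apply_lexicon(text, lexicon) do
    Regex.replace(~r/[\p{L}\p{N}]+/u, text, fn word ->
      Map.get(lexicon, String.downcase(word), word)
    end)
  end

  # Pass 2: upcase a lowercase letter at the start of the text
  # or directly after ".", "!" or "?" followed by whitespace.
  defp capitalize_sentences(text) do
    Regex.replace(~r/(^\s*|[.!?]\s+)(\p{Ll})/u, text, fn _match, lead, letter ->
      lead <> String.upcase(letter)
    end)
  end
end
```

One limitation of this sketch: a lexicon entry that deliberately starts with a lowercase letter (such as "iPhone") would be upcased by the second pass if it opened a sentence. How the real module treats that case is not stated here.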
This is intentionally minimal: a frequency-based truecaser trained on a cased corpus would be more accurate, but harder to bundle and unnecessary for the common cleanup case. The exception list below covers the words that show up most often in mixed-domain text.
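For comparison, the frequency-based alternative mentioned above could look roughly like this. The module name `FrequencyTruecaseSketch` and the `train/1` function are hypothetical and not part of this library. The sketch counts every surface form seen in a cased corpus and keeps the most frequent one per lowercased word.

```elixir
defmodule FrequencyTruecaseSketch do
  # Builds a map of lowercase word => most frequently seen cased form.
  def train(corpus) do
    ~r/[\p{L}\p{N}]+/u
    |> Regex.scan(corpus)
    |> List.flatten()
    |> Enum.frequencies()
    |> Enum.group_by(fn {form, _count} -> String.downcase(form) end)
    |> Map.new(fn {lower, forms} ->
      {best, _count} = Enum.max_by(forms, fn {_form, count} -> count end)
      {lower, best}
    end)
  end
end
```

A realistic version would also need to discount sentence-initial capitals and would only be useful with a large corpus, which is the bundling cost noted above.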
Functions
add_terms/1
Adds project-specific casings to the runtime lexicon.
Useful for product names, internal acronyms, and proper nouns that aren't covered by the bundled list.
Arguments
terms is a map or keyword list of lowercase => Canonical pairs.
Returns
:ok on success.
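One way a runtime-extensible lexicon with this contract could be built is sketched below. How the module actually stores its lexicon is not documented here, so the `Agent`-based storage, the module name `LexiconSketch`, and the `lookup/1` helper are all assumptions made for illustration.

```elixir
defmodule LexiconSketch do
  use Agent

  # Starts the lexicon process with an optional initial map.
  def start_link(initial \\ %{}) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  # Accepts a map or keyword list of lowercase => Canonical pairs and
  # merges it into the stored lexicon. Returns :ok.
  def add_terms(terms) do
    new = Map.new(terms, fn {lower, canonical} -> {to_string(lower), canonical} end)
    Agent.update(__MODULE__, &Map.merge(&1, new))
  end

  # Returns the canonical casing for a word, or the word unchanged if unknown.
  def lookup(word) do
    Agent.get(__MODULE__, &Map.get(&1, String.downcase(word), word))
  end
end
```

Keyword-list keys are atoms, so the sketch converts every key to a string before merging; later additions overwrite earlier entries for the same lowercase key.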
truecase/2
Restores casing in lowercased text.
Arguments
text is the (possibly lowercased) input string.
Options
:capitalize_sentences is a boolean. If true (the default), capitalizes the first letter of each sentence after applying the lexicon.
Returns
- The text with proper nouns, acronyms, and brand names re-capitalized, sentence starts capitalized when requested, and other words left as-is.
Examples
iex> Text.Truecase.truecase("i visited london in july.")
"I visited London in July."
iex> Text.Truecase.truecase("the api returned json.")
"The API returned JSON."
iex> Text.Truecase.truecase("apple released the iphone.")
"Apple released the iPhone."