Text.Truecase (Text v0.5.0)


Restore case to text that has been lowercased.

Useful as a preprocessing step when downstream tooling (POS tagging, NER, sentiment classifiers) was trained on properly cased text but the input was lowercased earlier in the pipeline, a common artefact of search and chat systems.

The module uses a small curated lexicon of words with non-default casing (proper nouns, brand names, common acronyms), followed by a sentence-boundary-aware pass that capitalizes the first word of each sentence. The lexicon can be extended at runtime via add_terms/1 for project-specific vocabulary.

This is intentionally minimal: a frequency-based truecaser trained on a cased corpus would be more accurate, but it would be harder to bundle and is unnecessary for the common cleanup case. The bundled lexicon covers the words that show up most often in mixed-domain text.
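The two-pass approach described above can be illustrated with a minimal sketch. This is not the library's implementation: the module name TruecaseSketch, the six-entry lexicon, and the regular expressions are assumptions made for illustration, and the real bundled lexicon and sentence-boundary rules are not shown on this page.

```elixir
defmodule TruecaseSketch do
  # Hypothetical mini-lexicon (lowercase => canonical casing).
  # The entries here are illustrative, not the library's bundled list.
  @lexicon %{
    "i" => "I",
    "london" => "London",
    "july" => "July",
    "api" => "API",
    "json" => "JSON",
    "iphone" => "iPhone"
  }

  def truecase(text, opts \\ []) do
    capitalize = Keyword.get(opts, :capitalize_sentences, true)

    # Pass 1: replace every alphabetic word found in the lexicon,
    # leaving unknown words untouched.
    cased =
      Regex.replace(~r/[[:alpha:]]+/u, text, fn word ->
        Map.get(@lexicon, String.downcase(word), word)
      end)

    # Pass 2 (optional): capitalize the first letter of each sentence.
    if capitalize, do: capitalize_sentences(cased), else: cased
  end

  # Naive boundary rule: start of input, or ., ! or ? followed by whitespace.
  defp capitalize_sentences(text) do
    Regex.replace(~r/(^|[.!?]\s+)([[:lower:]])/u, text, fn _whole, lead, letter ->
      lead <> String.upcase(letter)
    end)
  end
end
```

Under these assumptions, TruecaseSketch.truecase("the api returned json. it was valid.") yields "The API returned JSON. It was valid.", and passing capitalize_sentences: false leaves both sentence starts in lower case. The naive second pass would also turn a sentence-initial "iPhone" into "IPhone"; how the library treats that case is not stated here.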

Summary

Functions

add_terms(terms)

Adds project-specific casings to the runtime lexicon.

truecase(text, options \\ [])

Restores casing in lowercased text.

Functions

add_terms(terms)

@spec add_terms(map() | keyword()) :: :ok

Adds project-specific casings to the runtime lexicon.

Useful for product names, internal acronyms, and proper nouns that aren't covered by the bundled list.

Arguments

  • terms is a map or keyword list of lowercase => Canonical pairs.

Returns

  • :ok on success.
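As a rough illustration of how a runtime-extensible lexicon accepting either a map or a keyword list could be kept, here is a sketch built on Agent. The module LexiconSketch, its lookup/1 helper, and the choice of an Agent for storage are all hypothetical; the page does not say how Text.Truecase stores its terms.

```elixir
defmodule LexiconSketch do
  use Agent

  # Starts the (hypothetical) lexicon process with an optional initial map.
  def start_link(initial \\ %{}) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  # Accepts a map or a keyword list. Keys are converted to lowercase
  # strings so that atom keys from keyword lists and string keys from
  # maps end up in the same form.
  def add_terms(terms) when is_map(terms) or is_list(terms) do
    normalized =
      Map.new(terms, fn {key, canonical} ->
        {key |> to_string() |> String.downcase(), canonical}
      end)

    # Agent.update/2 returns :ok, matching the documented return value.
    Agent.update(__MODULE__, &Map.merge(&1, normalized))
  end

  # Returns the canonical casing, or the word unchanged if it is unknown.
  def lookup(word) do
    Agent.get(__MODULE__, &Map.get(&1, String.downcase(word), word))
  end
end
```

With this sketch, LexiconSketch.add_terms(%{"acme" => "ACME"}) and LexiconSketch.add_terms(hexdocs: "HexDocs") both return :ok, after which lookup/1 resolves "acme" and "hexdocs" to their canonical forms.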

truecase(text, options \\ [])

@spec truecase(
  String.t(),
  keyword()
) :: String.t()

Restores casing in lowercased text.

Arguments

  • text is the (possibly lowercased) input string.

Options

  • :capitalize_sentences — if true (the default), capitalizes the first letter of each sentence after applying the lexicon.

Returns

  • The text with proper nouns, acronyms, and brand names re-capitalized, sentence starts capitalized when requested, and other words left as-is.

Examples

iex> Text.Truecase.truecase("i visited london in july.")
"I visited London in July."

iex> Text.Truecase.truecase("the api returned json.")
"The API returned JSON."

iex> Text.Truecase.truecase("apple released the iphone.")
"Apple released the iPhone."