Cldr Messages


Introduction and Getting Started

Implements the ICU Message Format for Elixir.

In any application that addresses audiences from different cultures, the need arises to support the presentation of user interfaces, messages, alerts and other content in the appropriate language for a user.

For nearly 30 years the go-to solution for this requirement in many computer languages has been gettext. There is a full-featured implementation for Elixir, Gettext, that is installed by default with Phoenix and has over 10,000,000 downloads.

Given the maturity and widespread adoption of Gettext, why implement another format? Leveraging the content from the Unicode CLDR project we can address some of the shortcomings of Gettext. A good description of motivations and differences can be found in this presentation by Mark Davis from Google in 2012.

Two specific shortcomings that the ICU message format addresses:

Grammatical Gender

Many languages inflect in a gender-specific way. One example in French might be:

 # You are the only participant for a male and female
 Vous êtes le seul participant
 Vous êtes la seule participante

 # Married for a male and a female
 Marié
 Mariée

In Gettext this requires individual messages and conditional code in the application in order to present the correct message to an audience. This is compounded by the fact that some languages have more than two grammatical genders (most have between two and four, but some are attested with up to 20).

The ICU message format provides a mechanism (the select format) that helps translators and UX designers write a single message that encapsulates the variants conditional on grammatical gender (or any other selector).
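As a rough illustration, the French example above could be collapsed into one message using a `select` argument. This is a minimal sketch: the module name `ParticipantMessage` and the binding name `gender` are hypothetical, and `render/1` assumes that `ex_cldr_messages` is installed and a default Cldr backend is configured.

```elixir
defmodule ParticipantMessage do
  # Hypothetical module; the message text reuses the French example above.
  # One message holds every gender variant, keyed by the `gender` binding.
  @message "{gender, select, " <>
             "female {Vous êtes la seule participante} " <>
             "male {Vous êtes le seul participant} " <>
             "other {Vous êtes le seul participant}}"

  # Returns the raw ICU message so it can be handed to translators.
  def message, do: @message

  # Assumption: ex_cldr_messages is available and a default backend is set.
  def render(gender) when gender in ["female", "male", "other"] do
    Cldr.Message.format!(@message, gender: gender)
  end
end
```

The application code passes only the selector value; the choice of wording stays inside the message, where a translator can adjust it per language.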

Standardised plural rules

Although Gettext supports pluralisation for messages through the Gettext.Plural module in Elixir and the Gettext functions like Gettext.ngettext/4, the plural rules for a language have to be implemented for each message. Given the wide differences in how plural forms are structured in different languages, this can be a material challenge. For example:

  • English has two plural forms: singular and plural
  • French applies the singular rule to two values (0 and 1) and a plural form to larger groupings
  • Japanese does not differentiate
  • Russian has 4 categories
  • Arabic has 6 categories

Since CLDR has a strong set of pluralization rules defined for ~500 locales, each of which is supported by ex_cldr for Elixir, the ICU message format can reuse these pluralization rules in a simple and consistent fashion using the plural format.
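To give a feel for how much these rules vary, here is a hand-written sketch of plural category selection for the five languages listed above. It is not how ex_cldr implements its rules: the module name `PluralSketch` is hypothetical, and the rules are simplified to non-negative integers, ignoring the fractional and other special cases that the CLDR rules also cover.

```elixir
defmodule PluralSketch do
  @moduledoc """
  Simplified, integer-only plural categories for a few locales.
  Illustrative only; not the rules engine or API of ex_cldr.
  """

  # English: only exactly 1 is singular.
  def category("en", 1), do: :one
  def category("en", n) when is_integer(n) and n >= 0, do: :other

  # French: 0 and 1 both take the singular form.
  def category("fr", n) when n in [0, 1], do: :one
  def category("fr", n) when is_integer(n) and n >= 0, do: :other

  # Japanese: no grammatical distinction by count.
  def category("ja", n) when is_integer(n) and n >= 0, do: :other

  # Russian: the category depends on the last one or two digits.
  def category("ru", n) when is_integer(n) and n >= 0 do
    cond do
      rem(n, 10) == 1 and rem(n, 100) != 11 -> :one
      rem(n, 10) in 2..4 and rem(n, 100) not in 12..14 -> :few
      true -> :many
    end
  end

  # Arabic: six categories.
  def category("ar", n) when is_integer(n) and n >= 0 do
    cond do
      n == 0 -> :zero
      n == 1 -> :one
      n == 2 -> :two
      rem(n, 100) in 3..10 -> :few
      rem(n, 100) in 11..99 -> :many
      true -> :other
    end
  end
end
```

With this sketch, `PluralSketch.category("ru", 21)` gives `:one` while `PluralSketch.category("ru", 11)` gives `:many`. Rules like these are what a message author would otherwise have to handle by hand.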

Message format overview

ICU message formats are Elixir strings with embedded formatting directives inserted between {}. Some examples:

 # Insert the binding `name` into the string
 "My name is {name}"

 # Insert a date, formatting in a localized `short` format plus a localized plural form
 # for the binding `num_photos`
 "On {taken_date, date, short} {name} took {num_photos, plural,
   =0 {no photos.}
   =1 {one photo.}
   other {# photos.}}"

 # Insert localized messages based upon the gender of the audience with
 # appropriate localized plural forms
 "{gender_of_host, select,
   female {
     {num_guests, plural, offset: 1
       =0 {{host} does not give a party.}
       =1 {{host} invites {guest} to her party.}
       =2 {{host} invites {guest} and one other person to her party.}
       other {{host} invites {guest} and # other people to her party.}}}
   male {
     {num_guests, plural, offset: 1
       =0 {{host} does not give a party.}
       =1 {{host} invites {guest} to his party.}
       =2 {{host} invites {guest} and one other person to his party.}
       other {{host} invites {guest} and # other people to his party.}}}
   other {
     {num_guests, plural, offset: 1
       =0 {{host} does not give a party.}
       =1 {{host} invites {guest} to their party.}
       =2 {{host} invites {guest} and one other person to their party.}
       other {{host} invites {guest} and # other people to their party.}}}
 }"

Message formatting

Using the above messages as examples:

iex> Cldr.Message.format! "My name is {name}", name: "Kip"
"My name is Kip"

iex> Cldr.Message.format!  "On {taken_date, date, short} {name} took {num_photos, plural,
       =0 {no photos.}
       =1 {one photo.}
       other {# photos.}}", taken_date: Date.utc_today, name: "Kip", num_photos: 10
"On 8/26/19 Kip took 10 photos."

As of ex_cldr_messages version 0.3.0, a macro form is available that parses the message at compile time in order to optimize performance at run time. To use the macro, a backend module must be imported (or required) into the module that uses formatting. For example:

defmodule SomeModule do
  # Import a <backend>.Cldr.Message module
  import MyApp.Cldr.Message

  def my_function do
    format("this is a string with a param {param}", param: 3)
  end
end
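The `MyApp.Cldr.Message` module imported above is generated from a Cldr backend. A minimal sketch of such a backend follows, based on ex_cldr conventions. The module name `MyApp.Cldr` and the locale list are placeholders, and the `providers` list is an assumption about which providers need to be enabled, so check it against the version of ex_cldr_messages you install.

```elixir
defmodule MyApp.Cldr do
  # Placeholder backend: adjust the locales to those your application supports.
  # Assumption: listing Cldr.Message as a provider generates MyApp.Cldr.Message.
  use Cldr,
    locales: ["en", "fr"],
    default_locale: "en",
    providers: [Cldr.Number, Cldr.DateTime, Cldr.Unit, Cldr.List, Cldr.Calendar, Cldr.Message]
end
```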

Installation

def deps do
  [
    {:ex_cldr_messages, "~> 0.5.0"}
  ]
end

Documentation is at https://hexdocs.pm/cldr_messages.

To Do

The initial release provides a simple function interface to message formatting. Before 1.0 it also needs a means, like Gettext, of managing messages in multiple locales for the same message content.

  • [ ] Documentation

  • [X] Ignore whitespace between nested complex arguments at the top level. Example:

    {:select, {:named_arg, "gender_of_host"}, %{
      "female" => [
        {:literal, "\n    "},  <---- Ignore this when it's whitespace only
        {:plural, {:named_arg, "num_guests"},

  • [X] Support decimal numbers for selectors

  • [X] Won't do. Support spellout format for Money.t types ? (Maybe can't because of floating point RBNF rule limitations)

  • [X] Check for all occurrences of Cldr.get_current_locale/0 in the READMEs and change them to Cldr.get_locale/0

  • [X] Implement explicit =0 argument selection for plurals

  • [X] Add remaining formatters for dates, times, datetimes

  • [X] In ex_money, if no configured default_cldr_backend, delegate to Cldr.default_backend/0

  • [X] Implement a {arg, list, format} formatter that uses ex_cldr_lists

  • [X] Implement a {arg, unit, format} formatter that uses ex_cldr_units

  • [X] Implement offset

  • [X] Implement custom formats in backend config provider; probably requires updating the struct in ex_cldr

  • [X] Implement selectordinal

  • [X] Assert that plural, select and selectordinal all have an other clause

  • [X] Tests

  • [X] @specs

  • [X] Dialyzer. Ask José to push nimble_parsec 0.5.2 to remove combinator errors

  • [X] Implement to_string for parse trees. This will define a canonical form which we can use to compare messages and create keys.