Text.Readability (Text v0.5.0)


Readability metrics for English text.

Implements the classic readability indices used to estimate the reading difficulty of a passage:

  • flesch/2 — Flesch Reading Ease (higher = easier).
  • flesch_kincaid/2 — Flesch-Kincaid Grade Level (US school grade).
  • gunning_fog/2 — Gunning-Fog Index (years of education).
  • smog/2 — SMOG Index (years of education; needs ~30 sentences).
  • ari/2 — Automated Readability Index (US school grade).
  • coleman_liau/2 — Coleman-Liau Index (US school grade).
  • lix/2 — LIX (Läsbarhetsindex; language-agnostic-ish).
  • dale_chall/2 — Dale-Chall (US school grade); uses the bundled ~3,000-word easy-words list.
  • spache/2 — Spache (US school grade, K-3 readers); uses the bundled ~1,000-word easy-words list.

Use metrics/2 to compute every index in one pass over the text, and statistics/2 to inspect the raw counts the metrics are built from (characters, letters, words, sentences, syllables, polysyllables, long words, difficult words, unfamiliar words).

Sentence and word segmentation use Text.Segment (UAX #29 with CLDR abbreviation suppressions). Syllable counting uses Text.Syllable, which currently supports English only — so all metrics that depend on syllable counts (Flesch, Flesch-Kincaid, Gunning-Fog, SMOG) are English-only. ARI, Coleman-Liau, and LIX are character/length-based and work for any whitespace-segmented language. Dale-Chall and Spache are English-only and use the bundled easy-word lists in priv/readability/.

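Summary

Types

metrics()

Map of every readability metric computed for a text.

statistics()

Raw text statistics used to compute the readability metrics.

Functions

ari(text_or_stats, options \\ [])

Returns the Automated Readability Index.

coleman_liau(text_or_stats, options \\ [])

Returns the Coleman-Liau Index.

dale_chall(text_or_stats, options \\ [])

Returns the Dale-Chall readability score.

flesch(text_or_stats, options \\ [])

Returns the Flesch Reading Ease score.

flesch_kincaid(text_or_stats, options \\ [])

Returns the Flesch-Kincaid Grade Level.

gunning_fog(text_or_stats, options \\ [])

Returns the Gunning-Fog Index.

lix(text_or_stats, options \\ [])

Returns the LIX (Läsbarhetsindex) readability score.

metrics(text, options \\ [])

Computes every readability metric in one pass.

smog(text_or_stats, options \\ [])

Returns the SMOG Index (Simple Measure Of Gobbledygook).

spache(text_or_stats, options \\ [])

Returns the Spache readability score.

statistics(text, options \\ [])

Returns the raw text statistics used by the readability metrics.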

Types

metrics()

@type metrics() :: %{
  flesch: float(),
  flesch_kincaid: float(),
  gunning_fog: float(),
  smog: float(),
  ari: float(),
  coleman_liau: float(),
  lix: float(),
  dale_chall: float(),
  spache: float()
}

Map of every readability metric computed for a text.

statistics()

@type statistics() :: %{
  characters: non_neg_integer(),
  letters: non_neg_integer(),
  words: non_neg_integer(),
  sentences: non_neg_integer(),
  syllables: non_neg_integer(),
  polysyllables: non_neg_integer(),
  long_words: non_neg_integer(),
  difficult_words: non_neg_integer(),
  unfamiliar_words: non_neg_integer(),
  average_sentence_length: float(),
  average_syllables_per_word: float()
}

Raw text statistics used to compute the readability metrics.

Functions

ari(text_or_stats, options \\ [])

@spec ari(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the Automated Readability Index.

Output is a US school grade level. Character-based rather than syllable-based, so it works for any whitespace-segmented language.

Formula: 4.71 × (letters/words) + 0.5 × (words/sentences) − 21.43.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the segmentation locale, default :en.

Returns

  • A float grade level. Returns 0.0 for empty input.

Examples

iex> Text.Readability.ari("The cat sat on the mat.") |> Float.round(1)
-5.1
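To make the formula concrete, here is a minimal sketch that evaluates it from raw counts. The module AriSketch and its score/3 function are hypothetical names for illustration and are not part of Text.Readability; the counts for the example sentence were tallied by hand.

```elixir
defmodule AriSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  # Returns 0.0 when there are no words or no sentences, mirroring the
  # documented behaviour for empty input.
  def score(_letters, 0, _sentences), do: 0.0
  def score(_letters, _words, 0), do: 0.0

  def score(letters, words, sentences) do
    4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43
  end
end

# "The cat sat on the mat." has 17 letters, 6 words, and 1 sentence.
AriSketch.score(17, 6, 1) |> Float.round(1) |> IO.inspect()
# prints -5.1
```

Negative grade levels are expected for very short, simple sentences: the constant term dominates when both ratios are small.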

coleman_liau(text_or_stats, options \\ [])

@spec coleman_liau(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the Coleman-Liau Index.

Output is a US school grade level. Character-based; no syllable counting required.

Formula: 0.0588 × L − 0.296 × S − 15.8, where L is letters per 100 words and S is sentences per 100 words.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the segmentation locale, default :en.

Returns

  • A float grade level. Returns 0.0 for empty input.

Examples

iex> Text.Readability.coleman_liau("The cat sat on the mat.") |> Float.round(1)
-4.1
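The per-100-words scaling in the formula is easy to get wrong, so the following sketch spells it out. ColemanLiauSketch is a hypothetical module written for this illustration, not the library's implementation; the counts are hand-tallied.

```elixir
defmodule ColemanLiauSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  def score(_letters, 0, _sentences), do: 0.0

  def score(letters, words, sentences) do
    l = letters / words * 100    # L: letters per 100 words
    s = sentences / words * 100  # S: sentences per 100 words
    0.0588 * l - 0.296 * s - 15.8
  end
end

# "The cat sat on the mat." has 17 letters, 6 words, and 1 sentence.
ColemanLiauSketch.score(17, 6, 1) |> Float.round(1) |> IO.inspect()
# prints -4.1
```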

dale_chall(text_or_stats, options \\ [])

@spec dale_chall(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the Dale-Chall readability score.

Output is a US school grade level. Computed from the percentage of difficult words — words not in the bundled ~3,000-word easy-words list — and the average sentence length. Uses the Chall-Dale 1995 adjustment: the raw score is shifted by +3.6365 when more than 5% of words are difficult.

Formula: 0.1579 × (PDW × 100) + 0.0496 × ASL [+ 3.6365 if PDW > 0.05], where PDW is the proportion of difficult words and ASL is average sentence length.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the segmentation locale, default :en. The bundled easy-words list is English-only.

Returns

  • A float grade level. Returns 0.0 for empty input.

Examples

iex> Text.Readability.dale_chall("The cat sat on the mat.") |> is_float()
true
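The conditional adjustment is the part of the formula most worth seeing in code. The sketch below works from counts rather than text, so it does not need the easy-words list. DaleChallSketch and the word counts are hypothetical, chosen only to show one case on each side of the 5% threshold.

```elixir
defmodule DaleChallSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  @adjustment 3.6365

  def score(_difficult, 0, _sentences), do: 0.0
  def score(_difficult, _words, 0), do: 0.0

  def score(difficult, words, sentences) do
    pdw = difficult / words   # proportion of difficult words
    asl = words / sentences   # average sentence length
    raw = 0.1579 * (pdw * 100) + 0.0496 * asl
    # The adjustment applies only when more than 5% of words are difficult.
    if pdw > 0.05, do: raw + @adjustment, else: raw
  end
end

# Made-up passage: 100 words in 5 sentences.
DaleChallSketch.score(3, 100, 5) |> Float.round(2) |> IO.inspect()
# prints 1.47 (3% difficult, no adjustment)
DaleChallSketch.score(10, 100, 5) |> Float.round(2) |> IO.inspect()
# prints 6.21 (10% difficult, adjustment applied)
```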

flesch(text_or_stats, options \\ [])

@spec flesch(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the Flesch Reading Ease score.

Higher scores mean easier text. 60-70 is "plain English"; 30-50 is "difficult"; 0-30 is "very confusing". The score is not capped at 100: very short, simple text can score higher.

Formula: 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words).

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the syllable-counting language, default :en.

Returns

  • A float. Returns 0.0 if the text has no sentences or words.

Examples

iex> Text.Readability.flesch("The cat sat on the mat.") |> Float.round(1)
116.1
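As a cross-check on the formula, this sketch evaluates it from hand-tallied counts. FleschSketch is a hypothetical module for illustration only; the syllable count assumes every word in the example sentence is one syllable.

```elixir
defmodule FleschSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  def score(_syllables, 0, _sentences), do: 0.0
  def score(_syllables, _words, 0), do: 0.0

  def score(syllables, words, sentences) do
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
  end
end

# "The cat sat on the mat." has 6 syllables, 6 words, and 1 sentence.
FleschSketch.score(6, 6, 1) |> Float.round(1) |> IO.inspect()
# prints 116.1
```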

flesch_kincaid(text_or_stats, options \\ [])

@spec flesch_kincaid(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the Flesch-Kincaid Grade Level.

Output is a US school grade level (e.g. 8.5 ≈ mid eighth grade).

Formula: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the syllable-counting language, default :en.

Returns

  • A float grade level. Returns 0.0 for empty input.

Examples

iex> Text.Readability.flesch_kincaid("The cat sat on the mat.") |> Float.round(1)
-1.4
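The grade-level formula can be sketched from counts alone. FleschKincaidSketch is a hypothetical module, and the second set of counts describes a made-up passage rather than real text.

```elixir
defmodule FleschKincaidSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  def score(_syllables, 0, _sentences), do: 0.0
  def score(_syllables, _words, 0), do: 0.0

  def score(syllables, words, sentences) do
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
  end
end

# "The cat sat on the mat.": 6 syllables, 6 words, 1 sentence.
# The result is approximately -1.45.
FleschKincaidSketch.score(6, 6, 1) |> IO.inspect()

# Made-up passage: 150 syllables, 100 words, 5 sentences.
FleschKincaidSketch.score(150, 100, 5) |> Float.round(1) |> IO.inspect()
# prints 9.9
```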

gunning_fog(text_or_stats, options \\ [])

@spec gunning_fog(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the Gunning-Fog Index.

Output is roughly the years of formal education a reader needs. 12 is a US high-school senior; 17+ is graduate-level.

Formula: 0.4 × ((words/sentences) + 100 × (complex_words/words)), where a complex word is one with 3 or more syllables.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the syllable-counting language, default :en.

Returns

  • A float. Returns 0.0 for empty input.

Examples

iex> Text.Readability.gunning_fog("The complicated explanation confused everyone immediately.") |> Float.round(1)
35.7
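The index is very sensitive to the complex-word count on short input, which the sketch below illustrates. GunningFogSketch is a hypothetical module, not the library's code. Working backwards from the formula, a result of 35.7 for a 6-word sentence corresponds to 5 words being counted as complex; that is an inference from the arithmetic, not a statement about how Text.Syllable counts any particular word.

```elixir
defmodule GunningFogSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  def score(_complex, 0, _sentences), do: 0.0
  def score(_complex, _words, 0), do: 0.0

  def score(complex, words, sentences) do
    0.4 * (words / sentences + 100 * (complex / words))
  end
end

# One 6-word sentence with 5 complex words.
GunningFogSketch.score(5, 6, 1) |> Float.round(1) |> IO.inspect()
# prints 35.7

# The same sentence with 4 complex words scores noticeably lower.
GunningFogSketch.score(4, 6, 1) |> Float.round(1) |> IO.inspect()
# prints 29.1
```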

lix(text_or_stats, options \\ [])

@spec lix(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the LIX (Läsbarhetsindex) readability score.

Designed for Swedish but used as a language-agnostic indicator. A long word is one with 7 or more characters.

Formula: (words/sentences) + 100 × (long_words/words).

Interpretation: <30 very easy, 30-40 easy, 40-50 medium, 50-60 difficult, >60 very difficult.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the segmentation locale, default :en.

Returns

  • A float. Returns 0.0 for empty input.

Examples

iex> Text.Readability.lix("The cat sat on the mat.") |> Float.round(1)
6.0
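The formula and the interpretation bands can be combined in a few lines. In this sketch LixSketch is a hypothetical module, words are passed in already split, and treating each band as a half-open interval (so a score of exactly 30 counts as easy) is an assumption, since the table above does not say which side the boundaries fall on.

```elixir
defmodule LixSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.

  # A long word has 7 or more characters.
  def long_word?(word), do: String.length(word) >= 7

  def score([], _sentences), do: 0.0
  def score(_words, 0), do: 0.0

  def score(words, sentences) do
    total = length(words)
    long = Enum.count(words, &long_word?/1)
    total / sentences + 100 * (long / total)
  end

  # Band boundaries are an assumption (lower bound inclusive).
  def band(score) when score < 30, do: :very_easy
  def band(score) when score < 40, do: :easy
  def band(score) when score < 50, do: :medium
  def band(score) when score < 60, do: :difficult
  def band(_score), do: :very_difficult
end

words = ~w(The cat sat on the mat)
lix = LixSketch.score(words, 1)
IO.inspect({lix, LixSketch.band(lix)})
# prints {6.0, :very_easy}
```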

metrics(text, options \\ [])

@spec metrics(
  String.t(),
  keyword()
) :: metrics()

Computes every readability metric in one pass.

Arguments

  • text is a string of one or more sentences.

Options

  • :language is the syllable-counting and segmentation language, default :en.

Returns

  • A map with the keys :flesch, :flesch_kincaid, :gunning_fog, :smog, :ari, :coleman_liau, :lix, :dale_chall, and :spache. Empty input yields 0.0 for every metric.

Examples

iex> Text.Readability.metrics("The cat sat on the mat.") |> Map.keys() |> Enum.sort()
[:ari, :coleman_liau, :dale_chall, :flesch, :flesch_kincaid, :gunning_fog, :lix, :smog, :spache]
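The values in the map are not all on the same scale, so it can help to group them before comparing. The sketch below splits a metrics()-shaped map according to the scales listed in the module overview. MetricsSketch is a hypothetical module, and the sample values are placeholders, not library output.

```elixir
defmodule MetricsSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  # Grouping follows the scale each metric is documented to report.
  @grade_level [:flesch_kincaid, :ari, :coleman_liau, :dale_chall, :spache]
  @years_of_education [:gunning_fog, :smog]

  def by_scale(metrics) do
    %{
      grade_level: Map.take(metrics, @grade_level),
      years_of_education: Map.take(metrics, @years_of_education),
      # Flesch Reading Ease and LIX each use their own scale.
      other: Map.drop(metrics, @grade_level ++ @years_of_education)
    }
  end
end

# Placeholder values shaped like metrics(); not actual library output.
sample = %{
  flesch: 70.0, flesch_kincaid: 7.2, gunning_fog: 9.1, smog: 8.8, ari: 6.9,
  coleman_liau: 8.0, lix: 38.0, dale_chall: 6.5, spache: 3.9
}

MetricsSketch.by_scale(sample).other |> Map.keys() |> Enum.sort() |> IO.inspect()
# prints [:flesch, :lix]
```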

smog(text_or_stats, options \\ [])

@spec smog(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the SMOG Index (Simple Measure Of Gobbledygook).

Output is roughly the years of formal education a reader needs. Designed to be applied to passages of about 30 sentences; results on shorter passages are less reliable.

Formula: 1.0430 × √(polysyllables × 30 / sentences) + 3.1291.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the syllable-counting language, default :en.

Returns

  • A float. Returns 0.0 for empty input.

Examples

iex> Text.Readability.smog("The cat sat on the mat. The dog ran fast.") |> is_float()
true
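A short sketch shows how the 30-sentence normalization works and why short passages are unreliable: with no polysyllables at all, the formula bottoms out at its constant term, 3.1291. SmogSketch is a hypothetical module and the counts are made up for illustration.

```elixir
defmodule SmogSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  def score(_polysyllables, 0), do: 0.0

  def score(polysyllables, sentences) do
    # Scale the polysyllable count to a 30-sentence sample.
    1.0430 * :math.sqrt(polysyllables * 30 / sentences) + 3.1291
  end
end

# Two sentences with no polysyllables: the formula's floor.
SmogSketch.score(0, 2) |> IO.inspect()
# prints 3.1291

# Made-up passage: 45 polysyllables across 30 sentences.
SmogSketch.score(45, 30) |> Float.round(1) |> IO.inspect()
# prints 10.1
```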

spache(text_or_stats, options \\ [])

@spec spache(
  String.t() | statistics(),
  keyword()
) :: float()

Returns the Spache readability score.

Output is a US school grade level, intended for K-3 reading material. Computed from the percentage of unfamiliar words — words not in the bundled ~1,000-word Spache easy-words list — and average sentence length.

Formula: 0.121 × ASL + 0.082 × (UPW × 100) + 0.659, where ASL is average sentence length and UPW is the proportion of unfamiliar words.

Arguments

  • text_or_stats is a string of one or more sentences, or a statistics() map as returned by statistics/2.

Options

  • :language is the segmentation locale, default :en. The bundled easy-words list is English-only.

Returns

  • A float grade level. Returns 0.0 for empty input.

Examples

iex> Text.Readability.spache("The cat sat on the mat.") |> is_float()
true
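Because the result depends on the easy-words list, the doctest above only checks the return type. The sketch below evaluates the formula from counts instead. SpacheSketch is a hypothetical module, and the counts describe a made-up passage.

```elixir
defmodule SpacheSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.
  def score(_unfamiliar, 0, _sentences), do: 0.0
  def score(_unfamiliar, _words, 0), do: 0.0

  def score(unfamiliar, words, sentences) do
    asl = words / sentences    # average sentence length
    upw = unfamiliar / words   # proportion of unfamiliar words
    0.121 * asl + 0.082 * (upw * 100) + 0.659
  end
end

# Made-up passage: 50 words in 10 sentences, 4 unfamiliar words.
SpacheSketch.score(4, 50, 10) |> Float.round(2) |> IO.inspect()
# prints 1.92
```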

statistics(text, options \\ [])

@spec statistics(
  String.t(),
  keyword()
) :: statistics()

Returns the raw text statistics used by the readability metrics.

Arguments

  • text is a string of one or more sentences.

Options

  • :language is the language used for syllable counting and sentence segmentation. The default is :en. Only :en is supported by the syllable counter today.

Returns

  • A map with the keys :characters, :letters, :words, :sentences, :syllables, :polysyllables, :long_words, :difficult_words, :unfamiliar_words, :average_sentence_length, and :average_syllables_per_word. Empty input returns zeroed counts and 0.0 averages.

Examples

iex> stats = Text.Readability.statistics("The cat sat on the mat. The dog ran.")
iex> stats.words
9
iex> stats.sentences
2
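The two average fields can be related to the counts with a small sketch. It assumes the averages are the plain ratios their names suggest (words per sentence, syllables per word), which this page does not state explicitly. StatsSketch is a hypothetical module, and the syllable count is a hand tally that treats every word in the example as one syllable.

```elixir
defmodule StatsSketch do
  # Hypothetical helper for illustration; not part of Text.Readability.

  # Guarded ratio: 0.0 when the denominator is zero, matching the
  # documented 0.0 averages for empty input.
  def ratio(_numerator, 0), do: 0.0
  def ratio(numerator, denominator), do: numerator / denominator

  def averages(%{words: words, sentences: sentences, syllables: syllables}) do
    %{
      average_sentence_length: ratio(words, sentences),
      average_syllables_per_word: ratio(syllables, words)
    }
  end
end

# "The cat sat on the mat. The dog ran.": 9 words, 2 sentences, 9 syllables.
avg = StatsSketch.averages(%{words: 9, sentences: 2, syllables: 9})
IO.inspect(avg.average_sentence_length)
# prints 4.5
IO.inspect(avg.average_syllables_per_word)
# prints 1.0
```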