Nasty.Language.Catalan.Summarizer (Nasty v0.3.0)


Generates extractive summaries of Catalan documents.

Uses sentence scoring based on multiple features:

  • Term frequency (TF-IDF)
  • Position in document
  • Named entity density
  • Sentence length
  • Catalan discourse markers
  • Coreference participation
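
A minimal sketch of how some of these features could be combined into one score, assuming each sentence is already a list of lowercased content words. It covers only TF-IDF, position and length. Named entity density, discourse markers and coreference need the parsed AST and are left out. The module name ScoreSketch and the weights are hypothetical, not Nasty's actual values.

```elixir
defmodule ScoreSketch do
  # Hypothetical weights; Nasty's real weighting is not documented here.
  @weights %{tfidf: 0.5, position: 0.3, length: 0.2}

  # `sentences` is a list of token lists (lowercased, stop words removed).
  # Returns one score per sentence, in document order.
  def score(sentences) do
    n = length(sentences)
    idf_table = idf(sentences, n)

    sentences
    |> Enum.with_index()
    |> Enum.map(fn {tokens, i} ->
      @weights.tfidf * tfidf(tokens, idf_table) +
        @weights.position * position(i, n) +
        @weights.length * length_score(tokens)
    end)
  end

  # idf(t) = ln(N / df(t)), treating each sentence as its own "document".
  defp idf(sentences, n) do
    sentences
    |> Enum.flat_map(&Enum.uniq/1)
    |> Enum.frequencies()
    |> Map.new(fn {term, df} -> {term, :math.log(n / df)} end)
  end

  # Sum of tf(t) * idf(t), with tf normalized by sentence length.
  defp tfidf([], _idf_table), do: 0.0

  defp tfidf(tokens, idf_table) do
    total = length(tokens)

    tokens
    |> Enum.frequencies()
    |> Enum.reduce(0.0, fn {term, count}, acc ->
      acc + count / total * Map.get(idf_table, term, 0.0)
    end)
  end

  # Earlier sentences score higher: 1.0 for the first, 1/n for the last.
  defp position(i, n), do: 1.0 - i / n

  # Rewards sentences up to 20 content words (hypothetical cap).
  defp length_score(tokens), do: min(length(tokens) / 20, 1.0)
end
```

Note that a word occurring in every sentence gets an IDF of ln(1) = 0, so it contributes nothing through the TF-IDF term.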

Catalan-Specific Features

  • Stop words from priv/languages/ca/stopwords.txt
  • Catalan discourse markers (en conclusió, per tant, a més, tanmateix)
  • Catalan sentence boundaries (., !, ?)
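
A minimal sketch of the three Catalan-specific steps, assuming plain-text input rather than Nasty's AST: sentence splitting on ., ! and ?, stop-word removal, and detection of the four discourse markers listed above. The stop-word set is passed in as a MapSet instead of being read from priv/languages/ca/stopwords.txt. CatalanFeatureSketch and its functions are hypothetical names, not part of Nasty's API.

```elixir
defmodule CatalanFeatureSketch do
  # Only the four markers listed in the docs; Nasty's real list may be longer.
  @markers ["en conclusió", "per tant", "a més", "tanmateix"]

  # Splits after ., ! or ? when followed by whitespace.
  def split_sentences(text) do
    String.split(text, ~r/(?<=[.!?])\s+/u, trim: true)
  end

  # Lowercases, splits on anything that is not a letter or the Catalan
  # middle dot (as in "col·lecció"), then drops stop words.
  def content_words(sentence, stopwords) do
    sentence
    |> String.downcase()
    |> String.split(~r/[^\p{L}·]+/u, trim: true)
    |> Enum.reject(&MapSet.member?(stopwords, &1))
  end

  # True if the sentence contains one of the markers as a whole phrase
  # (case-insensitive, Unicode-aware word boundaries).
  def discourse_marker?(sentence) do
    Enum.any?(@markers, fn marker ->
      pattern = Regex.compile!("\\b" <> Regex.escape(marker) <> "\\b", "iu")
      Regex.match?(pattern, sentence)
    end)
  end
end
```

In a scorer, discourse_marker?/1 would typically add a fixed bonus to a sentence's score, since markers such as "en conclusió" often introduce summary-worthy statements; the size of any such bonus is an assumption.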

Examples

iex> {:ok, document} = Catalan.parse(tokens)
iex> Summarizer.summarize(document, ratio: 0.3)
{:ok, %Document{...}}

iex> Summarizer.summarize(document, max_sentences: 5, method: :mmr)
{:ok, %Document{...}}

Functions

summarize(doc, opts \\ [])

@spec summarize(
  Nasty.AST.Document.t(),
  keyword()
) :: {:ok, Nasty.AST.Document.t()} | {:error, term()}

Generates an extractive summary of a Catalan document.

Scores each sentence and selects the most important ones. Two selection methods are supported:

  • :greedy - Top-N sentences by score (default)
  • :mmr - Maximal Marginal Relevance (reduces redundancy)
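
A minimal sketch of MMR selection, assuming relevance scores have already been computed and using Jaccard overlap of word sets as the similarity measure (Nasty may use a different one). Each round picks the sentence maximizing lambda * relevance - (1 - lambda) * (highest similarity to an already selected sentence), so lambda = 1.0 reduces to the greedy top-N method. Returning indices in document order is also an assumption. MMRSketch is a hypothetical name.

```elixir
defmodule MMRSketch do
  # Jaccard similarity of two word lists (an assumed similarity measure).
  def jaccard(a, b) do
    sa = MapSet.new(a)
    sb = MapSet.new(b)
    union = MapSet.size(MapSet.union(sa, sb))

    if union == 0 do
      0.0
    else
      MapSet.size(MapSet.intersection(sa, sb)) / union
    end
  end

  # Picks up to k sentence indices by MMR and returns them in document order.
  # `scores` holds one relevance score per sentence.
  def select(sentences, scores, k, lambda \\ 0.7) do
    items =
      sentences
      |> Enum.zip(scores)
      |> Enum.with_index()
      |> Enum.map(fn {{tokens, score}, i} -> {i, tokens, score} end)

    items
    |> pick([], min(k, length(items)), lambda)
    |> Enum.map(fn {i, _tokens, _score} -> i end)
    |> Enum.sort()
  end

  defp pick(_remaining, selected, 0, _lambda), do: selected

  defp pick(remaining, selected, k, lambda) do
    best =
      Enum.max_by(remaining, fn {_i, tokens, score} ->
        # Redundancy = highest similarity to any already selected sentence.
        redundancy =
          selected
          |> Enum.map(fn {_j, other, _s} -> jaccard(tokens, other) end)
          |> Enum.max(fn -> 0.0 end)

        lambda * score - (1 - lambda) * redundancy
      end)

    pick(List.delete(remaining, best), [best | selected], k - 1, lambda)
  end
end
```

With two near-duplicate sentences, MMR skips the second one in favor of a less redundant sentence, while lambda 1.0 behaves like :greedy:

iex> sentences = [["gat", "negre", "dorm"], ["gat", "negre", "dorm", "sofà"], ["gos", "corre"]]
iex> MMRSketch.select(sentences, [1.0, 0.9, 0.6], 2)
[0, 2]
iex> MMRSketch.select(sentences, [1.0, 0.9, 0.6], 2, 1.0)
[0, 1]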

Options

  • :ratio - Fraction of sentences to include (default: 0.3)
  • :max_sentences - Maximum number of sentences; overrides :ratio when given
  • :min_sentences - Minimum number of sentences (default: 1)
  • :method - Selection method, :greedy or :mmr (default: :greedy)
  • :mmr_lambda - MMR trade-off parameter between 0.0 and 1.0 (default: 0.7); in standard MMR, higher values favor relevance over diversity
  • :min_sentence_length - Minimum sentence length in words (default: 5)
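
One possible reading of how these options combine, written as a sketch: :max_sentences (if given) replaces the ratio-based count, the result is raised to :min_sentences and capped at the number of available sentences, and sentences shorter than :min_sentence_length words are excluded from the candidates. The rounding and the order of these steps are assumptions, and TargetCountSketch is a hypothetical name.

```elixir
defmodule TargetCountSketch do
  # How many sentences to keep, given the total and the documented options.
  def target_count(total, opts) do
    ratio = Keyword.get(opts, :ratio, 0.3)
    min_sentences = Keyword.get(opts, :min_sentences, 1)

    wanted =
      case Keyword.get(opts, :max_sentences) do
        # No explicit count: derive it from :ratio (rounding is assumed).
        nil -> round(total * ratio)
        # :max_sentences overrides :ratio.
        n -> n
      end

    wanted
    |> max(min_sentences)
    |> min(total)
  end

  # Keeps only sentences with at least :min_sentence_length words.
  def eligible(sentences, opts) do
    min_len = Keyword.get(opts, :min_sentence_length, 5)
    Enum.filter(sentences, fn words -> length(words) >= min_len end)
  end
end
```

Under these assumptions:

iex> TargetCountSketch.target_count(10, [])
3
iex> TargetCountSketch.target_count(10, max_sentences: 2, ratio: 0.9)
2
iex> TargetCountSketch.target_count(2, ratio: 0.1)
1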

Examples

iex> {:ok, summary} = Summarizer.summarize(doc, ratio: 0.3)
iex> {:ok, summary} = Summarizer.summarize(doc, max_sentences: 3, method: :mmr)