Nasty.Language.Catalan.Summarizer (Nasty v0.3.0)
Generates extractive summaries of Catalan documents.
Scores sentences using multiple features:
- Term frequency (TF-IDF)
- Position in document
- Named entity density
- Sentence length
- Catalan discourse markers
- Coreference participation
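The module docs list the scoring features but do not say how they are combined. One common approach is a weighted linear combination, and the sketch below assumes exactly that. The module name, the feature keys, the weights, the 0.0..1.0 normalization of each feature, and the linear position decay are all hypothetical, not Nasty's actual values.

```elixir
defmodule ScoringSketch do
  # HYPOTHETICAL weights (they sum to 1.0); each feature value is assumed
  # to be pre-normalized to the range 0.0..1.0 by an earlier step.
  @weights %{
    tf_idf: 0.3,
    position: 0.2,
    entity_density: 0.15,
    length: 0.1,
    discourse_markers: 0.15,
    coreference: 0.1
  }

  # Weighted linear combination of a feature map; missing features count as 0.0.
  def score(features) when is_map(features) do
    Enum.reduce(@weights, 0.0, fn {name, weight}, acc ->
      acc + weight * Map.get(features, name, 0.0)
    end)
  end

  # HYPOTHETICAL position feature: 1.0 for the first sentence,
  # decaying linearly toward 0.0 at the end of the document.
  def position_score(index, total) when total > 0, do: 1.0 - index / total
end
```

A linear combination keeps each feature's contribution easy to inspect and tune; a real implementation may weight or normalize features differently.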
Catalan-Specific Features
- Stop words from priv/languages/ca/stopwords.txt
- Catalan discourse markers (en conclusió, per tant, a més, tanmateix)
- Catalan sentence boundaries (., !, ?)
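A minimal sketch of the three Catalan-specific pieces, assuming plain strings as input rather than Nasty's AST. The module name is hypothetical, the stop-word set is a small illustrative subset standing in for priv/languages/ca/stopwords.txt, and the regex-based splitting and word-boundary marker matching are simplifications.

```elixir
defmodule CatalanFeaturesSketch do
  # Illustrative subset only; the real list lives in priv/languages/ca/stopwords.txt.
  @stop_words MapSet.new(~w(el la els les de del i a al en que un una per amb))

  # Discourse markers named in the module docs.
  @discourse_markers ["en conclusió", "per tant", "a més", "tanmateix"]

  # Split after ., ! or ? followed by whitespace.
  def split_sentences(text) do
    String.split(text, ~r/(?<=[.!?])\s+/u, trim: true)
  end

  # Lowercase, split on anything that is not a letter or the Catalan
  # middle dot (as in "col·lecció"), then drop stop words.
  def content_words(sentence) do
    sentence
    |> String.downcase()
    |> String.split(~r/[^\p{L}·]+/u, trim: true)
    |> Enum.reject(&MapSet.member?(@stop_words, &1))
  end

  # Number of distinct markers present, matched on Unicode word boundaries
  # so that "casa més" does not count as "a més".
  def discourse_marker_count(sentence) do
    lower = String.downcase(sentence)

    Enum.count(@discourse_markers, fn marker ->
      regex = Regex.compile!("\\b" <> Regex.escape(marker) <> "\\b", "u")
      Regex.match?(regex, lower)
    end)
  end
end
```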
Examples
iex> {:ok, document} = Catalan.parse(tokens)
iex> Summarizer.summarize(document, ratio: 0.3)
{:ok, %Document{...}}
iex> Summarizer.summarize(document, max_sentences: 5, method: :mmr)
{:ok, %Document{...}}
Functions
@spec summarize(Nasty.AST.Document.t(), keyword()) :: {:ok, Nasty.AST.Document.t()} | {:error, term()}
Generates an extractive summary of a Catalan document.
Scores each sentence and selects the highest-ranked ones. Supports two selection methods:
- :greedy - Top-N sentences by score (default)
- :mmr - Maximal Marginal Relevance (reduces redundancy)
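The two selection methods can be sketched as follows. Greedy selection keeps the top-N sentences by score. MMR picks sentences one at a time, maximizing lambda * relevance - (1 - lambda) * (highest similarity to an already-selected sentence), so a higher lambda favors relevance and a lower lambda favors diversity. The module name, the `{index, score, words}` candidate shape, Jaccard similarity over word sets, and returning indices in document order are assumptions; the module docs do not specify them.

```elixir
defmodule SelectionSketch do
  # Each candidate is {index, score, words}, where words is a MapSet of content words.

  # Greedy: highest scores first, then restore document order.
  def greedy(candidates, n) do
    candidates
    |> Enum.sort_by(fn {_i, score, _w} -> score end, :desc)
    |> Enum.take(n)
    |> Enum.map(fn {i, _s, _w} -> i end)
    |> Enum.sort()
  end

  # MMR with lambda in 0.0..1.0 (0.7 mirrors the documented :mmr_lambda default).
  def mmr(candidates, n, lambda \\ 0.7) do
    candidates
    |> do_mmr([], n, lambda)
    |> Enum.map(fn {i, _s, _w} -> i end)
    |> Enum.sort()
  end

  defp do_mmr(_remaining, selected, 0, _lambda), do: selected
  defp do_mmr([], selected, _n, _lambda), do: selected

  defp do_mmr(remaining, selected, n, lambda) do
    best =
      Enum.max_by(remaining, fn {_i, score, words} ->
        # Redundancy = similarity to the closest already-selected sentence (0.0 if none).
        redundancy =
          selected
          |> Enum.map(fn {_j, _s, other} -> jaccard(words, other) end)
          |> Enum.max(fn -> 0.0 end)

        lambda * score - (1 - lambda) * redundancy
      end)

    do_mmr(List.delete(remaining, best), [best | selected], n - 1, lambda)
  end

  # ASSUMED similarity measure: Jaccard index of two word sets.
  defp jaccard(a, b) do
    union = MapSet.size(MapSet.union(a, b))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(a, b)) / union
  end
end
```

With two near-duplicate high-scoring sentences and one unrelated lower-scoring sentence, greedy selection of two keeps both duplicates, while MMR at lambda 0.7 replaces the second duplicate with the unrelated sentence.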
Options
- :ratio - Fraction of sentences to include (default: 0.3)
- :max_sentences - Maximum number of sentences (overrides :ratio)
- :min_sentences - Minimum number of sentences (default: 1)
- :method - Selection method, :greedy or :mmr (default: :greedy)
- :mmr_lambda - MMR lambda parameter, 0.0-1.0 (default: 0.7)
- :min_sentence_length - Minimum sentence length in words (default: 5)
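The docs do not spell out exactly how the count options interact. A plausible reading, sketched here under stated assumptions: drop sentences shorter than :min_sentence_length words, use :max_sentences if given, otherwise round(:ratio * eligible count), then clamp to at least :min_sentences and at most the number of eligible sentences. The module name, the rounding rule, the whitespace word count, and the clamping order are all hypothetical.

```elixir
defmodule OptionsSketch do
  # Defaults copied from the documented options.
  @defaults [ratio: 0.3, min_sentences: 1, method: :greedy, mmr_lambda: 0.7, min_sentence_length: 5]

  # Caller options override the defaults.
  def resolve(opts), do: Keyword.merge(@defaults, opts)

  # HYPOTHETICAL: number of sentences to keep out of `total` eligible ones.
  # :max_sentences overrides :ratio; round/1 is an assumed rounding rule.
  def target_count(total, opts) do
    opts = resolve(opts)

    base =
      case Keyword.get(opts, :max_sentences) do
        nil -> round(total * opts[:ratio])
        n -> n
      end

    base |> max(opts[:min_sentences]) |> min(total)
  end

  # HYPOTHETICAL: keep sentences with at least :min_sentence_length
  # whitespace-separated words.
  def eligible(sentences, opts) do
    min_len = resolve(opts)[:min_sentence_length]
    Enum.filter(sentences, fn s -> length(String.split(s)) >= min_len end)
  end
end
```

Clamping to the eligible total last means an empty document yields zero sentences even though :min_sentences defaults to 1.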
Examples
iex> {:ok, summary} = Summarizer.summarize(doc, ratio: 0.3)
iex> {:ok, summary} = Summarizer.summarize(doc, max_sentences: 3, method: :mmr)