Nasty.Operations.Summarization.Abstractive behaviour (Nasty v0.3.0)


Template-based abstractive summarization.

Unlike extractive summarization, which selects existing sentences, abstractive summarization generates new sentences by:

  1. Extracting key semantic facts (subject-verb-object triples)
  2. Identifying important entities and actions
  3. Generating new sentences using templates
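Step 2 can be pictured as tallying which entities and actions recur across the extracted facts. The sketch below is illustrative only. `EntityActionSketch.tally/1` is a hypothetical helper, and treating subjects and objects as entities and verbs as actions is an assumption, not the library's documented logic.

```elixir
defmodule EntityActionSketch do
  @moduledoc "Hypothetical illustration of step 2; not part of Nasty."

  @type fact :: {String.t(), String.t(), String.t()}

  # Counts how often each entity (subject or object) and each action (verb)
  # appears across a list of {subject, verb, object} facts.
  @spec tally([fact()]) :: %{entities: map(), actions: map()}
  def tally(facts) do
    entities = Enum.flat_map(facts, fn {subject, _verb, object} -> [subject, object] end)
    actions = Enum.map(facts, fn {_subject, verb, _object} -> verb end)

    %{entities: Enum.frequencies(entities), actions: Enum.frequencies(actions)}
  end
end

EntityActionSketch.tally([
  {"John", "works at", "Google"},
  {"Google", "is", "a tech company"}
])
```

Here "Google" is counted twice (once as an object, once as a subject), which is the kind of signal a later ranking step could reward.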

This is a rule-based approach suitable for a pure Elixir implementation. Neural abstractive summarization (seq2seq models, transformers) would require external models.

Approach

  • Extract semantic facts from sentences
  • Rank facts by importance
  • Generate new sentences from top-ranked facts using templates
  • Combine related facts into coherent summaries

Example

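iex> alias Nasty.Operations.Summarization.Abstractive
iex> doc = Nasty.parse("John works at Google. Google is a tech company.", language: :en)
iex> summary = Abstractive.summarize(impl, doc, max_facts: 2)
["John works at Google, a tech company."]

Here impl is a language-specific module that implements this behaviour.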

Summary

Callbacks

  • extract_facts(t): Callback for extracting semantic facts from a sentence. Returns a list of {subject, verb, object} triples.
  • generate_sentence(list): Callback for generating a sentence from facts (optional). Receives facts and generates a natural language sentence.
  • rank_facts(list, t): Callback for ranking facts by importance (optional). Receives facts and document context, returns scored facts.

Functions

  • extract_all_sentences(document): Extracts all sentences from a document.
  • extract_basic_facts(sentence): Extracts subject-verb-object triples from a sentence using basic parsing.
  • generate_combined_summary(impl, facts, max_sentences): Generates a summary by combining related facts into sentences.
  • generate_default_sentence(list): Default sentence generation using simple templates.
  • generate_simple_summary(impl, facts, max_sentences): Generates a simple summary with one fact per sentence.
  • score_fact(arg, document): Scores a fact based on entity presence and verb importance.
  • summarize(impl, document, opts \\ []): Generates an abstractive summary by extracting and reformulating key facts.

Types

fact()

@type fact() :: {subject :: String.t(), verb :: String.t(), object :: String.t()}

Callbacks

extract_facts(t)

@callback extract_facts(Nasty.AST.Sentence.t()) :: [{String.t(), String.t(), String.t()}]

Callback for extracting semantic facts from a sentence. Returns a list of {subject, verb, object} triples.

generate_sentence(list)

(optional)
@callback generate_sentence([fact()]) :: String.t()

Callback for generating a sentence from facts (optional). Receives facts and generates a natural language sentence.

rank_facts(list, t)

(optional)
@callback rank_facts([fact()], Nasty.AST.Document.t()) :: [{fact(), float()}]

Callback for ranking facts by importance (optional). Receives facts and document context, and returns scored facts as {fact, score} tuples.
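A minimal sketch of the rank_facts/2 return shape, assuming a caller-supplied scoring function. The module `RankSketch` and its `score_fun` argument are hypothetical, and the document is stood in for by a plain string rather than `Nasty.AST.Document.t()`.

```elixir
defmodule RankSketch do
  @moduledoc "Hypothetical illustration of the rank_facts/2 contract."

  @type fact :: {String.t(), String.t(), String.t()}

  # Pairs every fact with a float score, then sorts highest score first,
  # producing the [{fact(), float()}] shape the callback specifies.
  @spec rank_facts([fact()], term(), (fact(), term() -> number())) :: [{fact(), float()}]
  def rank_facts(facts, document, score_fun) do
    facts
    |> Enum.map(fn fact -> {fact, score_fun.(fact, document) * 1.0} end)
    |> Enum.sort_by(fn {_fact, score} -> score end, :desc)
  end
end

# Toy score: how many times the fact's subject occurs in the text.
score = fn {subject, _verb, _object}, text -> length(String.split(text, subject)) - 1 end

RankSketch.rank_facts(
  [{"John", "works at", "Google"}, {"Google", "is", "a tech company"}],
  "John works at Google. Google is a tech company.",
  score
)
```

Because "Google" occurs twice in the text and "John" once, the "Google is a tech company" fact ranks first with a score of 2.0.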

Functions

extract_all_sentences(document)

@spec extract_all_sentences(Nasty.AST.Document.t()) :: [Nasty.AST.Sentence.t()]

Extracts all sentences from a document.
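A sketch of the flattening this function performs, assuming a document holds paragraphs and each paragraph holds sentences. The field names `paragraphs` and `sentences` and the module `SentenceWalkSketch` are assumptions for illustration; plain maps stand in for the real AST structs (map patterns also match structs with those fields).

```elixir
defmodule SentenceWalkSketch do
  # Assumed shape: %{paragraphs: [%{sentences: [...]}, ...]}.
  # Concatenates the sentences of every paragraph, preserving order.
  def extract_all_sentences(%{paragraphs: paragraphs}) do
    Enum.flat_map(paragraphs, fn %{sentences: sentences} -> sentences end)
  end
end

SentenceWalkSketch.extract_all_sentences(%{
  paragraphs: [
    %{sentences: ["John works at Google."]},
    %{sentences: ["Google is a tech company.", "It is large."]}
  ]
})
```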

extract_basic_facts(sentence)

@spec extract_basic_facts(Nasty.AST.Sentence.t()) :: [fact()]

Extracts subject-verb-object triples from a sentence using basic parsing.

This is a simple heuristic-based extraction. For better results, language implementations should override it with more sophisticated parsing.
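To make the idea of a heuristic extraction concrete, here is a toy version over pre-tagged tokens. Everything in it is an assumption: the `{word, tag}` input format, the tag atoms (`:propn`, `:noun`, `:det`, `:verb`, `:adp`, `:punct`), and the split-at-first-verb rule are illustrative and are not how `extract_basic_facts/1` is documented to work.

```elixir
defmodule SvoSketch do
  @moduledoc "Toy subject-verb-object extraction over hypothetical {word, tag} tokens."

  # Splits a tagged sentence at its first verb: words before it form the
  # subject, the verb plus any directly following prepositions (:adp) form
  # the verb phrase, and the remaining non-punctuation words form the object.
  def extract_basic_facts(tokens) do
    words = Enum.reject(tokens, fn {_word, tag} -> tag == :punct end)

    case Enum.split_while(words, fn {_word, tag} -> tag != :verb end) do
      # No subject before the verb (or no words at all).
      {[], _} ->
        []

      # No verb in the sentence.
      {_subject, []} ->
        []

      {subject, [verb | rest]} ->
        {preps, object} = Enum.split_while(rest, fn {_word, tag} -> tag == :adp end)
        if object == [], do: [], else: [{join(subject), join([verb | preps]), join(object)}]
    end
  end

  defp join(tokens), do: tokens |> Enum.map(&elem(&1, 0)) |> Enum.join(" ")
end

SvoSketch.extract_basic_facts([
  {"John", :propn},
  {"works", :verb},
  {"at", :adp},
  {"Google", :propn},
  {".", :punct}
])
```

For "John works at Google." this yields [{"John", "works at", "Google"}]; a sentence with no verb or no object yields no facts.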

generate_combined_summary(impl, facts, max_sentences)

@spec generate_combined_summary(module(), [fact()], integer()) :: [String.t()]

Generates a summary by combining related facts into sentences.
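The module's example turns two related facts into one sentence with an appositive ("John works at Google, a tech company."). One way such combining could work, as a hedged sketch: the module `CombineSketch` and the rule that only "is" facts are folded in as descriptions are assumptions, not the library's documented behaviour.

```elixir
defmodule CombineSketch do
  @moduledoc "Hypothetical fact-combining rule; not Nasty's implementation."

  # When a fact's object is also the subject of an "is" fact, the "is" fact's
  # object is appended as an appositive and the "is" fact is not emitted on
  # its own. At most max_sentences sentences are returned.
  def combine(facts, max_sentences) do
    descriptions = for {subject, "is", object} <- facts, into: %{}, do: {subject, object}

    facts
    |> Enum.reject(fn {subject, verb, _object} ->
      verb == "is" and used_as_object?(facts, subject)
    end)
    |> Enum.map(fn {subject, verb, object} ->
      case Map.fetch(descriptions, object) do
        {:ok, description} -> "#{subject} #{verb} #{object}, #{description}."
        :error -> "#{subject} #{verb} #{object}."
      end
    end)
    |> Enum.take(max_sentences)
  end

  # True if some non-"is" fact uses the entity as its object.
  defp used_as_object?(facts, entity) do
    Enum.any?(facts, fn {_subject, verb, object} -> object == entity and verb != "is" end)
  end
end

CombineSketch.combine([{"John", "works at", "Google"}, {"Google", "is", "a tech company"}], 2)
```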

generate_default_sentence(list)

@spec generate_default_sentence([fact()]) :: String.t()

Default sentence generation using simple templates.
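A minimal sketch of a template-based generator with the same input and output types as generate_default_sentence/1. The template itself ("<Subject> <verb> <object>.", joining same-subject facts with "and") and the module name `TemplateSketch` are assumptions; the library's actual templates are not documented here.

```elixir
defmodule TemplateSketch do
  @moduledoc "Hypothetical template-based sentence generation."

  def generate_default_sentence([]), do: ""

  # Facts sharing one subject become a single sentence with predicates
  # joined by "and"; otherwise each fact becomes its own sentence.
  def generate_default_sentence([{subject, _, _} | _] = facts) do
    if Enum.all?(facts, fn {s, _, _} -> s == subject end) do
      predicates = Enum.map_join(facts, " and ", fn {_, verb, object} -> "#{verb} #{object}" end)
      capitalize_first("#{subject} #{predicates}.")
    else
      Enum.map_join(facts, " ", fn {s, v, o} -> capitalize_first("#{s} #{v} #{o}.") end)
    end
  end

  # Upcases only the first grapheme; String.capitalize/1 would also
  # lowercase the rest of the sentence (e.g. "Google" -> "google").
  defp capitalize_first(text) do
    {first, rest} = String.split_at(text, 1)
    String.upcase(first) <> rest
  end
end

TemplateSketch.generate_default_sentence([
  {"Google", "is", "a tech company"},
  {"Google", "employs", "John"}
])
```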

generate_simple_summary(impl, facts, max_sentences)

@spec generate_simple_summary(module(), [fact()], integer()) :: [String.t()]

Generates a simple summary with one fact per sentence.
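Because generate_sentence/1 is an optional callback, a one-fact-per-sentence generator has to decide at runtime whether the implementation module provides it. The sketch below assumes this is done with `function_exported?/3` and falls back to a plain template; the modules `SimpleSummarySketch`, `PlainImpl`, and `ShoutingImpl` are hypothetical.

```elixir
defmodule SimpleSummarySketch do
  @moduledoc "Hypothetical one-fact-per-sentence summary."

  # Uses impl.generate_sentence/1 when the module exports it,
  # otherwise renders each fact as "<subject> <verb> <object>.".
  def generate_simple_summary(impl, facts, max_sentences) do
    generate =
      if Code.ensure_loaded?(impl) and function_exported?(impl, :generate_sentence, 1) do
        fn fact_list -> impl.generate_sentence(fact_list) end
      else
        fn [{subject, verb, object}] -> "#{subject} #{verb} #{object}." end
      end

    facts
    |> Enum.take(max_sentences)
    |> Enum.map(fn fact -> generate.([fact]) end)
  end
end

defmodule PlainImpl do
  # Toy implementation without the optional callback.
end

defmodule ShoutingImpl do
  # Toy implementation that overrides sentence generation.
  def generate_sentence([{subject, verb, object}]),
    do: String.upcase("#{subject} #{verb} #{object}!")
end

SimpleSummarySketch.generate_simple_summary(
  ShoutingImpl,
  [{"John", "works at", "Google"}, {"Google", "is", "a tech company"}],
  1
)
```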

score_fact(arg, document)

@spec score_fact(fact(), Nasty.AST.Document.t()) :: float()

Scores a fact based on entity presence and verb importance.
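One possible reading of "entity presence and verb importance", sketched under loud assumptions: capitalized words stand in for named entities, an entity's weight is its substring count in the document text (a plain string here instead of `Nasty.AST.Document.t()`), and a small hypothetical list of light verbs halves the score. None of these weights come from the library.

```elixir
defmodule ScoreSketch do
  @moduledoc "Hypothetical fact scoring; weights are illustrative only."

  # Light verbs carry little content, so they are down-weighted.
  @light_verbs ~w(is are was were be has have had)

  def score_fact({subject, verb, object}, text) do
    entity_score =
      (String.split(subject) ++ String.split(object))
      |> Enum.filter(&capitalized?/1)
      |> Enum.map(&occurrences(text, &1))
      |> Enum.sum()

    verb_weight = if String.downcase(verb) in @light_verbs, do: 0.5, else: 1.0
    (1 + entity_score) * verb_weight
  end

  # Crude named-entity proxy: the word starts with an uppercase letter.
  defp capitalized?(word), do: String.match?(word, ~r/^\p{Lu}/u)

  # Substring count, so "Google" would also match inside "Googles".
  defp occurrences(text, word), do: length(String.split(text, word)) - 1
end

ScoreSketch.score_fact(
  {"John", "works at", "Google"},
  "John works at Google. Google is a tech company."
)
```

With this scheme, {"John", "works at", "Google"} scores (1 + 1 + 2) * 1.0 = 4.0, while {"Google", "is", "a tech company"} scores (1 + 2) * 0.5 = 1.5.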

summarize(impl, document, opts \\ [])

@spec summarize(module(), Nasty.AST.Document.t(), keyword()) :: [String.t()]

Generates an abstractive summary by extracting and reformulating key facts.

Options

  • :max_facts - Maximum number of facts to include (default: 3)
  • :max_sentences - Maximum number of generated sentences (default: 2)
  • :combine_related - Combine related facts into single sentences (default: true)

Returns a list of generated summary strings.
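A sketch of how the documented options could interact, assuming the pipeline is rank, then keep :max_facts, then generate, then keep :max_sentences. To stay self-contained it takes an already-ranked fact list instead of (impl, document, opts); the module `SummarizePipelineSketch` and its same-subject combining rule are hypothetical. Only the option names and defaults come from the documentation above.

```elixir
defmodule SummarizePipelineSketch do
  @moduledoc "Hypothetical option handling for summarize/3."

  # Documented defaults; caller options override them.
  @defaults [max_facts: 3, max_sentences: 2, combine_related: true]

  def summarize(ranked_facts, opts \\ []) do
    opts = Keyword.merge(@defaults, opts)

    # :max_facts bounds the input to generation.
    top_facts = Enum.take(ranked_facts, opts[:max_facts])

    sentences =
      if opts[:combine_related] do
        # Hypothetical rule: adjacent facts sharing a subject form one sentence.
        top_facts
        |> Enum.chunk_by(fn {subject, _, _} -> subject end)
        |> Enum.map(fn [{subject, _, _} | _] = group ->
          predicates = Enum.map_join(group, " and ", fn {_, verb, object} -> "#{verb} #{object}" end)
          "#{subject} #{predicates}."
        end)
      else
        Enum.map(top_facts, fn {subject, verb, object} -> "#{subject} #{verb} #{object}." end)
      end

    # :max_sentences bounds the output.
    Enum.take(sentences, opts[:max_sentences])
  end
end

SummarizePipelineSketch.summarize([
  {"Google", "is", "a tech company"},
  {"Google", "employs", "John"},
  {"John", "likes", "tea"},
  {"Ana", "reads", "books"}
])
```

With the defaults, the fourth fact is dropped by :max_facts and the two Google facts merge, giving ["Google is a tech company and employs John.", "John likes tea."].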