Extractive text summarization.
Picks the most representative sentences from a document and returns them in their original order. Two algorithms are implemented:
- :textrank (default): builds a sentence similarity graph and runs PageRank over it. Sentences that are similar to many other sentences score higher. From Mihalcea & Tarau (2004).
- :lexrank: the same idea, but with a hard similarity threshold. Edges below the threshold are dropped before scoring. From Erkan & Radev (2004).
Both algorithms use Jaccard similarity over content words (with stopwords removed) as the inter-sentence weight. The graph construction and ranking are pure Elixir — no embeddings, no external models — so summarization works on any text the segmenter can handle.
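As a rough illustration of the similarity step described above, the following sketch computes Jaccard similarity over content words and derives edge weights from it. The module name JaccardSketch, its function names, the regex tokenizer, and the tiny stopword list are all assumptions made for this example. They are not the library's internals, which rely on its own segmenter and per-language stopword data.

```elixir
defmodule JaccardSketch do
  # Tiny illustrative stopword list (an assumption, not the library's data).
  @stopwords MapSet.new(~w(a an and are in is of the to))

  # Lowercases, splits on runs of non-letter characters, drops stopwords,
  # and returns the remaining words as a set.
  def content_words(sentence) do
    sentence
    |> String.downcase()
    |> String.split(~r/[^\p{L}]+/u, trim: true)
    |> Enum.reject(&MapSet.member?(@stopwords, &1))
    |> MapSet.new()
  end

  # Jaccard similarity: size of the intersection divided by size of the
  # union. Two empty sets are treated as having similarity 0.0.
  def jaccard(a, b) do
    union = MapSet.size(MapSet.union(a, b))

    if union == 0 do
      0.0
    else
      MapSet.size(MapSet.intersection(a, b)) / union
    end
  end

  # Edge weights keyed by {i, j} for i != j. Zero-weight edges are omitted.
  # With a threshold, edges whose similarity falls below it are dropped,
  # which mirrors the LexRank-style variant.
  def edges(sentences, threshold \\ 0.0) do
    sets = sentences |> Enum.map(&content_words/1) |> Enum.with_index()

    for {a, i} <- sets,
        {b, j} <- sets,
        i != j,
        w = jaccard(a, b),
        w > 0.0 and w >= threshold,
        into: %{} do
      {{i, j}, w}
    end
  end
end
```

For example, "Cats are lovely pets." and "Cats are loyal pets." share two of four distinct content words, so their similarity in this sketch is 0.5.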
This approach suits documents ranging from a few sentences to a few hundred sentences. For very long documents, consider chunking first. Abstractive summarization (rewriting rather than selecting) is a Bumblebee-backed feature on the deferred list.
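One possible way to chunk a long document first is sketched below. The module ChunkedSummarySketch, its function names, and the naive regex-based sentence splitter are assumptions for illustration only. The summarizer is passed in as a function so the sketch does not depend on any particular library call.

```elixir
defmodule ChunkedSummarySketch do
  # Naive segmentation (an assumption): break on whitespace that follows
  # a ".", "!" or "?". A real segmenter handles abbreviations and more.
  def split_sentences(text) do
    String.split(text, ~r/(?<=[.!?])\s+/, trim: true)
  end

  # Groups sentences into chunks of chunk_size, summarizes each chunk with
  # summarize_fun (a one-argument function from string to string), and
  # joins the per-chunk summaries with single spaces.
  def summarize_in_chunks(text, chunk_size, summarize_fun)
      when is_integer(chunk_size) and chunk_size > 0 do
    text
    |> split_sentences()
    |> Enum.chunk_every(chunk_size)
    |> Enum.map(&Enum.join(&1, " "))
    |> Enum.map(summarize_fun)
    |> Enum.join(" ")
  end
end
```

A caller might pass `&Text.Summarize.summarize(&1, sentences: 3)` as summarize_fun, and could run the joined result through the summarizer once more if it is still too long.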
Functions
Returns the per-sentence importance scores from the chosen algorithm.
Useful for callers that want to render a heatmap of sentence importance, or implement their own selection policy on top of the raw scores.
Returns
- A list of floats, one per sentence in the input, in document order.
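A custom selection policy on top of the raw scores might look like the sketch below. It takes the sentences and their scores as plain lists, in document order, so it makes no assumption about how the scores were obtained. The module SelectionSketch and both function names are hypothetical.

```elixir
defmodule SelectionSketch do
  # Keeps the k highest-scoring sentences, then restores document order.
  def top_k(sentences, scores, k) do
    sentences
    |> Enum.zip(scores)
    |> Enum.with_index()
    |> Enum.sort_by(fn {{_sentence, score}, _index} -> score end, :desc)
    |> Enum.take(k)
    |> Enum.sort_by(fn {_pair, index} -> index end)
    |> Enum.map(fn {{sentence, _score}, _index} -> sentence end)
  end

  # Alternative policy: keep every sentence scoring at or above the mean.
  def above_mean([], []), do: []

  def above_mean(sentences, scores) do
    mean = Enum.sum(scores) / length(scores)
    for {sentence, score} <- Enum.zip(sentences, scores), score >= mean, do: sentence
  end
end
```

The second policy returns a variable number of sentences, which can be useful when the score distribution is very uneven.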
Returns the most representative sentences from the input text.
Arguments
- text is the document as a string.
Options
- :sentences is the number of sentences to return. Default 3. If the document has fewer sentences than this, every sentence is returned.
- :algorithm is :textrank (default) or :lexrank.
- :language is the language atom used for sentence segmentation and stopword removal. Default :en.
- :damping is the PageRank damping factor. Default 0.85.
- :iterations is the number of PageRank iterations. Default 30.
- :threshold is the LexRank similarity cutoff. Edges with similarity below this value are dropped. Only used by :lexrank. Default 0.1.
Returns
- A string of selected sentences joined with single spaces, in original document order.
Examples
iex> text = "Cats are lovely pets. Dogs are loyal animals. Goldfish swim quietly. Birds sing in the morning."
iex> Text.Summarize.summarize(text, sentences: 2) |> String.contains?(".")
true
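To make the roles of :damping and :iterations more concrete, here is a minimal sketch of weighted PageRank run for a fixed number of iterations. The module PageRankSketch, the edge-map representation, and the uniform starting scores are assumptions for this example. The library's actual ranking code may differ in details such as initialization or how sentences with no edges are handled.

```elixir
defmodule PageRankSketch do
  # weights: %{{i, j} => w} holding an entry for each direction of an edge.
  # n: number of sentences (nodes), indexed 0..n-1.
  # Returns a list of n scores in node order.
  def rank(weights, n, damping \\ 0.85, iterations \\ 30)
      when is_integer(n) and n > 0 do
    # Total outgoing weight per node, used to normalize each edge.
    out_sum =
      Enum.reduce(weights, %{}, fn {{i, _j}, w}, acc ->
        Map.update(acc, i, w, &(&1 + w))
      end)

    # Start from a uniform distribution.
    initial = List.duplicate(1.0 / n, n)

    Enum.reduce(1..iterations//1, initial, fn _step, scores ->
      for j <- 0..(n - 1)//1 do
        # Sum the share of each neighbor's score that flows into node j.
        incoming =
          for {{i, ^j}, w} <- weights, reduce: 0.0 do
            acc -> acc + w / Map.fetch!(out_sum, i) * Enum.at(scores, i)
          end

        # Nodes without edges simply keep the baseline term in this sketch.
        (1.0 - damping) / n + damping * incoming
      end
    end)
  end
end
```

A higher damping factor gives more weight to the graph structure and less to the uniform baseline, while more iterations bring the scores closer to convergence at the cost of extra work.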
Returns the selected sentences as a list (rather than joined).
Same options as summarize/2. Useful when the caller wants to
format the output (bullets, numbered lists) rather than receive
pre-joined prose.
Examples
iex> text = "First sentence. Second sentence. Third sentence. Fourth sentence."
iex> result = Text.Summarize.summarize_sentences(text, sentences: 2)
iex> length(result)
2
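The list form makes custom formatting straightforward. The sketch below shows two possible formatters. The module SummaryFormatSketch and its function names are hypothetical and not part of the library.

```elixir
defmodule SummaryFormatSketch do
  # Renders each sentence as a "- " bullet on its own line.
  def bullets(sentences) do
    Enum.map_join(sentences, "\n", &("- " <> &1))
  end

  # Renders each sentence as a numbered item, starting from 1.
  def numbered(sentences) do
    sentences
    |> Enum.with_index(1)
    |> Enum.map_join("\n", fn {sentence, i} -> "#{i}. #{sentence}" end)
  end
end
```

A caller could pipe the result of `Text.Summarize.summarize_sentences(text, sentences: 3)` into either function.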