Redlines (redlines v0.9.2)

Extract and normalize tracked changes ("redlines") from documents.

This library provides a single normalized shape (Redlines.Change) across:

  • Legacy DOC track changes (via doc_redlines)
  • DOCX track changes (<w:ins>, <w:del>)
  • PDFs with embedded tracked-changes markup (via pdf_redlines)

Summary

Functions

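clean_docx(docx_path, opts \\ [])
Accept tracked changes in a DOCX and return the cleaned DOCX bytes.

clean_docx_binary(docx_binary, opts \\ [])
Like clean_docx/2, but accepts raw DOCX bytes.

clean_docx_binary_with_warnings(docx_binary, opts \\ [])
Like clean_docx_binary/2, but also returns informational warnings about revision markup that was present while cleaning.

clean_docx_with_warnings(docx_path, opts \\ [])
Like clean_docx/2, but also returns informational warnings about revision markup that was present while cleaning.

extract(path, opts \\ [])
Extract tracked changes from a file path, inferring type from the extension.

format_for_llm(input, opts \\ [])
Format tracked changes for LLM prompts.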

Types

doc_type()

@type doc_type() :: :pdf | :docx | :doc

Functions

clean_docx(docx_path, opts \\ [])

@spec clean_docx(
  Path.t(),
  keyword()
) :: {:ok, binary()} | {:error, term()}

Accept tracked changes in a DOCX and return the cleaned DOCX bytes.

By default, this removes deletions (<w:del>…</w:del>) and unwraps insertions (<w:ins>…</w:ins>) in word/document.xml. Use the :parts option to clean other zip entries.

The cleaner also drops other WordprocessingML revision markup where possible, such as moved text and *PrChange property-change history.

Options

  • :parts - Zip entry names to clean (default ["word/document.xml"])
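
To make the accept-changes semantics concrete, here is a minimal sketch of what "remove deletions, unwrap insertions" means for a fragment of word/document.xml. This is not the library's implementation: the module name AcceptChangesSketch is hypothetical, it uses regular expressions rather than an XML parser, and it assumes <w:ins> and <w:del> elements that are neither nested nor self-closing.

```elixir
defmodule AcceptChangesSketch do
  # Hypothetical helper for illustration only; not part of Redlines.
  def accept(xml) when is_binary(xml) do
    # Whole <w:del ...>...</w:del> spans: the deleted text disappears.
    # The \b keeps <w:delText> from being mistaken for <w:del>.
    del_span = ~r{<w:del\b[^>]*>.*?</w:del>}s
    # Only the <w:ins ...> and </w:ins> wrapper tags: their content is kept.
    ins_tags = ~r{</?w:ins\b[^>]*>}

    xml
    |> String.replace(del_span, "")
    |> String.replace(ins_tags, "")
  end
end

xml =
  ~s(<w:p><w:r><w:t>Keep </w:t></w:r>) <>
    ~s(<w:del w:id="1"><w:r><w:delText>old</w:delText></w:r></w:del>) <>
    ~s(<w:ins w:id="2"><w:r><w:t>new</w:t></w:r></w:ins></w:p>)

IO.puts(AcceptChangesSketch.accept(xml))
# prints: <w:p><w:r><w:t>Keep </w:t></w:r><w:r><w:t>new</w:t></w:r></w:p>
```

The deleted run is gone and the inserted run remains as ordinary text, which is the document a reader would see after choosing "accept all changes".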

clean_docx_binary(docx_binary, opts \\ [])

@spec clean_docx_binary(
  binary(),
  keyword()
) :: {:ok, binary()} | {:error, term()}

Like clean_docx/2, but accepts raw DOCX bytes.

clean_docx_binary_with_warnings(docx_binary, opts \\ [])

@spec clean_docx_binary_with_warnings(
  binary(),
  keyword()
) :: {:ok, binary(), [Redlines.DOCX.clean_warning()]} | {:error, term()}

Like clean_docx_binary/2, but also returns informational warnings about revision markup that was present while cleaning.

See Redlines.DOCX.clean_binary_with_warnings/2.

clean_docx_with_warnings(docx_path, opts \\ [])

@spec clean_docx_with_warnings(
  Path.t(),
  keyword()
) :: {:ok, binary(), [Redlines.DOCX.clean_warning()]} | {:error, term()}

Like clean_docx/2, but also returns informational warnings about revision markup that was present while cleaning.

See Redlines.DOCX.clean_with_warnings/2.
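
As a usage sketch, the three-element success tuple from the spec above can be handled like this. The wrapper module RedlinesCleanSketch and its clean_to_file/2 function are hypothetical, and the example assumes the redlines package is available as a dependency. The shape of each warning is not documented on this page, so the sketch only inspects it.

```elixir
defmodule RedlinesCleanSketch do
  # Hypothetical wrapper for illustration; only Redlines.clean_docx_with_warnings/2
  # comes from the documented API.
  require Logger

  def clean_to_file(src, dest) do
    with {:ok, bytes, warnings} <- Redlines.clean_docx_with_warnings(src),
         :ok <- File.write(dest, bytes) do
      # Warnings are informational, so log them and carry on.
      Enum.each(warnings, fn warning ->
        Logger.info("redlines warning: #{inspect(warning)}")
      end)

      {:ok, length(warnings)}
    end
  end
end
```

Any {:error, reason} from cleaning or from File.write/2 falls through the with unchanged, so callers can match on it directly.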

extract(path, opts \\ [])

@spec extract(
  Path.t(),
  keyword()
) :: {:ok, Redlines.Result.t()} | {:error, term()}

Extract tracked changes from a file path, inferring type from the extension.

Options

  • :type - Override the inferred type (:pdf, :docx, or :doc)
  • :pdf_opts - Options forwarded to PDFRedlines (only when extracting PDFs)
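
The interaction between extension inference and the :type override can be sketched as follows. The module DocTypeSketch and its error tuples are assumptions made for illustration; the library's actual inference rules (for example, how it treats upper-case extensions) and its error terms are not documented here.

```elixir
defmodule DocTypeSketch do
  # Hypothetical illustration of "infer from extension unless :type is given".
  @type doc_type() :: :pdf | :docx | :doc

  @spec infer(Path.t(), keyword()) :: {:ok, doc_type()} | {:error, term()}
  def infer(path, opts \\ []) do
    case Keyword.fetch(opts, :type) do
      # An explicit, supported :type wins over the file extension.
      {:ok, type} when type in [:pdf, :docx, :doc] -> {:ok, type}
      {:ok, other} -> {:error, {:unsupported_type, other}}
      :error -> from_extension(path)
    end
  end

  defp from_extension(path) do
    # Assumption: compare extensions case-insensitively.
    case path |> Path.extname() |> String.downcase() do
      ".pdf" -> {:ok, :pdf}
      ".docx" -> {:ok, :docx}
      ".doc" -> {:ok, :doc}
      ext -> {:error, {:unknown_extension, ext}}
    end
  end
end

DocTypeSketch.infer("contract.DOCX")
# returns {:ok, :docx}
DocTypeSketch.infer("upload.bin", type: :pdf)
# returns {:ok, :pdf}
```

The override matters for files whose names do not carry a meaningful extension, such as uploads stored under generated names.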

format_for_llm(input, opts \\ [])

@spec format_for_llm(
  Redlines.Result.t() | [Redlines.Change.t()] | map() | list(),
  keyword()
) :: String.t()

Format tracked changes for LLM prompts.

Accepts:

  • Redlines.Result
  • a list of Redlines.Change
  • a DOCX track_changes map (%{insertions: [...], deletions: [...]})
  • a list of PDF redline structs/maps (anything with :type, :deletion, :insertion, :location)
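
A typical pipeline feeds the result of extract/2 straight into format_for_llm/2. The sketch below uses only those two documented calls; the module RedlinesPromptSketch, the function prompt_for/2, and the prompt wording are hypothetical, and the example assumes the redlines package is available as a dependency.

```elixir
defmodule RedlinesPromptSketch do
  # Hypothetical wrapper for illustration; not part of Redlines.
  def prompt_for(path, opts \\ []) do
    # opts are passed through to Redlines.extract/2 (e.g. type: :docx).
    with {:ok, result} <- Redlines.extract(path, opts) do
      changes = Redlines.format_for_llm(result)
      {:ok, "Summarize the following tracked changes:\n\n" <> changes}
    end
  end
end
```

Because format_for_llm/2 returns a plain String.t(), the formatted changes can be concatenated into any prompt template without further conversion.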