Docxir.XmlExtractor (docxir v0.1.0)

Extracts XML content from DOCX files.

DOCX files are ZIP archives containing XML documents. This module extracts the main document XML (document.xml) from the archive.
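
To illustrate the idea of pulling one XML entry out of a ZIP archive, here is a minimal sketch built on Erlang's `:zip` module. It is not Docxir's actual implementation. The module name `ZipSketch`, the function `read_document_xml/1`, and the `:document_xml_not_found` reason are hypothetical, and the entry name `word/document.xml` is an assumption based on the conventional DOCX layout.

```elixir
defmodule ZipSketch do
  # Assumed location of the main document part inside the archive.
  @document_entry ~c"word/document.xml"

  # Hypothetical helper: reads a single entry into memory without
  # unpacking the rest of the archive to disk.
  def read_document_xml(docx_path) do
    # :zip expects the archive name as a charlist.
    archive = docx_path |> IO.chardata_to_string() |> String.to_charlist()

    case :zip.unzip(archive, [:memory, {:file_list, [@document_entry]}]) do
      {:ok, [{_name, xml}]} -> {:ok, xml}
      # The archive opened, but the requested entry was not in it.
      {:ok, []} -> {:error, :document_xml_not_found}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

The `:memory` option makes `:zip.unzip/2` return the entry contents as binaries, and `:file_list` restricts extraction to the named entry.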

Summary

Functions

extract(docx_path)

Extracts the document.xml content from a DOCX file.

extract!(docx_path)

Extracts the document.xml content from a DOCX file, raising on error.

Functions

extract(docx_path)

@spec extract(Path.t()) :: {:ok, binary()} | {:error, atom()}

Extracts the document.xml content from a DOCX file.

Parameters

  • docx_path - Path to the DOCX file as a string or charlist

Returns

  • {:ok, xml_content} - The XML content as a binary
  • {:error, reason} - Extraction failed; reason is an atom, such as :enoent when the file does not exist

Examples

iex> Docxir.XmlExtractor.extract("contract.docx")
{:ok, "<?xml version=\"1.0\"..."}

iex> Docxir.XmlExtractor.extract("nonexistent.docx")
{:error, :enoent}
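
A caller will usually branch on the returned tuple. The sketch below shows one way to do that with `case`. The module `ExtractUsageSketch` and its `describe/1` function are hypothetical names introduced for illustration; only the `{:ok, binary}` and `{:error, atom}` shapes come from the spec above.

```elixir
defmodule ExtractUsageSketch do
  # Hypothetical helper: turns an extract/1 result into a short message.
  def describe(result) do
    case result do
      {:ok, xml} -> "extracted #{byte_size(xml)} bytes"
      # :enoent is the reason shown in the example above for a missing file.
      {:error, :enoent} -> "file not found"
      {:error, reason} -> "extraction failed: #{inspect(reason)}"
    end
  end
end

# Possible usage, assuming Docxir is available in the project:
#   "contract.docx" |> Docxir.XmlExtractor.extract() |> ExtractUsageSketch.describe()
```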

extract!(docx_path)

@spec extract!(Path.t()) :: binary()

Extracts the document.xml content from a DOCX file, raising on error.

Parameters

  • docx_path - Path to the DOCX file as a string or charlist

Returns

  • The XML content as a binary

Raises

  • File.Error if the file cannot be read or is not a valid DOCX

Examples

iex> Docxir.XmlExtractor.extract!("contract.docx")
"<?xml version=\"1.0\"..."
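
Bang functions in Elixir commonly wrap their tuple-returning counterpart and raise on the error branch. The sketch below shows that pattern under stated assumptions; it is not Docxir's source. The module `BangSketch`, the function `unwrap!/2`, and the `action` text are hypothetical, while `File.Error` and its `:reason`, `:action`, and `:path` fields are part of Elixir's standard library.

```elixir
defmodule BangSketch do
  # Hypothetical helper: returns the XML on success.
  def unwrap!({:ok, xml}, _path), do: xml

  # On failure, raises File.Error carrying the original reason and path.
  def unwrap!({:error, reason}, path) do
    raise File.Error,
      reason: reason,
      action: "extract document.xml from",
      path: IO.chardata_to_string(path)
  end
end

# Possible usage, assuming Docxir is available in the project:
#   try do
#     Docxir.XmlExtractor.extract!("contract.docx")
#   rescue
#     e in File.Error -> IO.puts("could not extract: #{Exception.message(e)}")
#   end
```

Prefer `extract/1` when a missing or invalid file is an expected case, and `extract!/1` when failure should stop the calling process.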