Docxir.Parser (docxir v0.1.0)

Parses Word document XML into HTML with Tailwind CSS classes.

This module handles the conversion of Word XML elements (paragraphs, runs, tables) into corresponding HTML elements styled with Tailwind CSS.

HTML Structure

  • Paragraphs: Converted to <div> elements with Tailwind CSS classes
  • Text runs: Plain text when unstyled, or <div class="inline-block"> elements carrying the styling classes
  • Tables: Standard HTML <table> structure

All styled text uses inline-block divs instead of spans to allow nesting of block elements and provide better control over padding, margin, and dimensions.

Supported Features

  • Paragraphs: Alignment, indentation, spacing
  • Text Styling: Bold, italic, underline, font sizes
  • Tables: Basic structure with colspan support
  • Page Breaks: Both manual (<w:br w:type="page"/>) and paragraph-level (<w:pageBreakBefore/>)

Page Break Handling

Page breaks are converted to Tailwind CSS break utilities:

  • <w:pageBreakBefore/> in paragraph properties → break-before-page class on div
  • <w:br w:type="page"/> in run → <div class="break-after-page"></div> element

These utilities set the CSS break-before: page and break-after: page properties, so the breaks take effect when the HTML is printed or rendered to PDF.
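
As a rough illustration of the two mappings above, the following sketch builds a minimal document containing both kinds of page break and passes it to parse/1. It assumes docxir is available as a dependency; the call is guarded so the snippet still runs without it. The exact markup emitted around the classes is not specified here, so the snippet only checks whether each class name appears in the output.

```elixir
ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Minimal WordprocessingML document with both kinds of page break.
xml =
  Enum.join([
    ~s(<w:document xmlns:w="#{ns}"><w:body>),
    # Manual break: <w:br w:type="page"/> inside a run.
    ~s(<w:p><w:r><w:t>First page</w:t><w:br w:type="page"/></w:r></w:p>),
    # Paragraph-level break: <w:pageBreakBefore/> in the paragraph properties.
    ~s(<w:p><w:pPr><w:pageBreakBefore/></w:pPr><w:r><w:t>Next page</w:t></w:r></w:p>),
    ~s(</w:body></w:document>)
  ])

# Assumption: docxir is installed. The guard keeps the snippet runnable without it.
if Code.ensure_loaded?(Docxir.Parser) do
  html = Docxir.Parser.parse(xml)

  # Report whether each break class shows up in the generated HTML.
  IO.inspect(String.contains?(html, "break-before-page"), label: "break-before-page present")
  IO.inspect(String.contains?(html, "break-after-page"), label: "break-after-page present")
else
  IO.puts("docxir is not loaded; skipping the parse step")
end
```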

Functions

parse(xml_content)

@spec parse(binary()) :: binary()

Parses Word document XML content into HTML.

Processes paragraphs and tables in document order, preserving the sequence they appear in the original Word document.

Parameters

  • xml_content - The document.xml content as a binary

Returns

  • HTML body content as a string

Examples

iex> xml = "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>"
iex> html = Docxir.Parser.parse(xml)
iex> html =~ "Hello"
true
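
parse/1 expects the raw document.xml, which a .docx file (a ZIP archive) stores at word/document.xml. If you need to extract it yourself, a minimal sketch using Erlang's :zip module might look like the following. The module DocxSketch and its read_document_xml/1 function are hypothetical helpers for illustration, not part of Docxir, which may already provide its own way to do this.

```elixir
defmodule DocxSketch do
  @moduledoc "Hypothetical helper for illustration; not part of Docxir."

  # Location of the main document part inside a .docx archive.
  @entry ~c"word/document.xml"

  @doc "Reads word/document.xml from the .docx file at `path`."
  @spec read_document_xml(String.t()) :: {:ok, binary()} | {:error, term()}
  def read_document_xml(path) do
    # :zip expects charlist file names. The :memory option returns the
    # contents as binaries instead of writing extracted files to disk.
    case :zip.unzip(String.to_charlist(path), [:memory, {:file_list, [@entry]}]) do
      {:ok, [{_name, xml}]} -> {:ok, xml}
      {:ok, []} -> {:error, :document_xml_not_found}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

The binary returned in the {:ok, xml} tuple can then be passed to Docxir.Parser.parse/1 as xml_content.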