Docxir.Parser (docxir v0.1.0)
Parses Word document XML into HTML with Tailwind CSS classes.
This module handles the conversion of Word XML elements (paragraphs, runs, tables) into corresponding HTML elements styled with Tailwind CSS.
HTML Structure
- Paragraphs: Converted to <div> elements with Tailwind CSS classes
- Text runs: Plain text or <div class="inline-block"> elements with styling classes
- Tables: Standard HTML <table> structure
All styled text uses inline-block divs instead of spans to allow nesting of block elements and provide better control over padding, margin, and dimensions.
Supported Features
- Paragraphs: Alignment, indentation, spacing
- Text Styling: Bold, italic, underline, font sizes
- Tables: Basic structure with colspan support
- Page Breaks: Both manual (<w:br w:type="page"/>) and paragraph-level (<w:pageBreakBefore/>)
Page Break Handling
Page breaks are converted to Tailwind CSS print utilities:
- <w:pageBreakBefore/> in paragraph properties → break-before-page class on the paragraph's div
- <w:br w:type="page"/> in a run → <div class="break-after-page"></div> element
These classes set the CSS break-before: page and break-after: page properties, which create page breaks when printing or generating PDFs.
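The mapping above can be sketched as a small standalone module. This is an illustration only: PageBreakSketch, paragraph/2, and run_break/1 are hypothetical names that are not part of Docxir, and the real parser's paragraph divs may carry additional Tailwind classes.

```elixir
defmodule PageBreakSketch do
  # Hypothetical illustration only; none of these functions exist in Docxir.

  # Paragraph-level break (<w:pageBreakBefore/>): the class goes on the
  # paragraph's own div, so the break happens before the paragraph.
  def paragraph(text, opts \\ []) do
    if Keyword.get(opts, :page_break_before, false) do
      ~s(<div class="break-before-page">#{text}</div>)
    else
      "<div>#{text}</div>"
    end
  end

  # Manual break inside a run (<w:br w:type="page"/>): a separate empty div
  # carries the class, so the break happens after that point in the flow.
  def run_break(:page), do: ~s(<div class="break-after-page"></div>)
end

IO.puts(PageBreakSketch.paragraph("Chapter 2", page_break_before: true))
IO.puts(PageBreakSketch.run_break(:page))
```

The difference between the two cases is where the class lives: on an existing element for the paragraph-level break, and on a new empty element for the manual break.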
Functions
parse(xml_content)
Parses Word document XML content into HTML.
Processes paragraphs and tables in document order, preserving the order in which they appear in the original Word document.
Parameters
- xml_content - The document.xml content as a binary
Returns
- HTML body content as a string
Examples
iex> xml = "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>"
iex> html = Docxir.Parser.parse(xml)
iex> html =~ "Hello"
true
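In practice the xml_content binary usually comes from a .docx file, which is a ZIP archive that stores the main document at word/document.xml. A minimal sketch of extracting that entry with Erlang's :zip module follows; DocxSource and read_document_xml/1 are hypothetical names chosen for this example and are not part of Docxir.

```elixir
defmodule DocxSource do
  # Hypothetical helper, not part of Docxir.

  # Name of the main document entry inside a .docx archive (as a charlist,
  # which is what :zip expects).
  @document_entry ~c"word/document.xml"

  # Reads word/document.xml from the .docx at `path`.
  # Returns {:ok, xml_binary} or {:error, reason}.
  def read_document_xml(path) when is_binary(path) do
    # :memory keeps the extracted entry in memory instead of writing it to disk;
    # :file_list restricts extraction to the one entry we need.
    options = [:memory, {:file_list, [@document_entry]}]

    case :zip.unzip(String.to_charlist(path), options) do
      {:ok, [{_name, xml}]} -> {:ok, xml}
      {:ok, []} -> {:error, :document_xml_not_found}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Assuming a file named report.docx exists, the result can be passed straight to the parser:

    {:ok, xml} = DocxSource.read_document_xml("report.docx")
    html = Docxir.Parser.parse(xml)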