html-to-markdown
Elixir bindings for the Rust html-to-markdown engine. The package exposes a fast HTML-to-Markdown converter implemented as a Rustler NIF, so Elixir applications get the same Markdown output as the engine's other language bindings, at native speed.
Installation
Requires Elixir 1.19+ and OTP 28. Add {:html_to_markdown, "~> 2.19.0"} to the deps in your mix.exs:
```elixir
def deps do
  [
    {:html_to_markdown, "~> 2.19.0"}
  ]
end
```

Performance Snapshot
Apple M4 • Real Wikipedia documents • convert() (Elixir)
| Document | Size | Ops/sec | Throughput |
|---|---|---|---|
| Lists (Timeline) | 129KB | 2,547 | 321.7 MB/s |
| Tables (Countries) | 360KB | 835 | 293.8 MB/s |
| Medium (Python) | 656KB | 439 | 281.5 MB/s |
| Large (Rust) | 567KB | 485 | 268.7 MB/s |
| Small (Intro) | 463KB | 581 | 262.9 MB/s |
| HOCR German PDF | 44KB | 7,106 | 303.1 MB/s |
| HOCR Embedded Tables | 37KB | 6,231 | 226.1 MB/s |
| HOCR Invoice | 4KB | 62,657 | 256.4 MB/s |
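The throughput column follows from the other two: document size multiplied by operations per second. A minimal check in Elixir, assuming the table uses binary units (1 KB = 1024 bytes, 1 MB = 1024 * 1024 bytes), which the figures appear to fit; the BenchMath module is purely illustrative:

```elixir
defmodule BenchMath do
  # Assumption: "KB" = 1024 bytes and "MB/s" = 1024 * 1024 bytes per second.
  # Throughput = document size * operations per second, converted KiB -> MiB.
  @spec throughput_mib_s(number(), number()) :: float()
  def throughput_mib_s(size_kib, ops_per_sec) do
    size_kib * ops_per_sec / 1024
  end
end

# Lists (Timeline) row: 129KB at 2,547 ops/sec
BenchMath.throughput_mib_s(129, 2547) |> Float.round(1) |> IO.inspect()
```

For the Lists row this gives about 320.9 MB/s against the listed 321.7 MB/s; the small gap is consistent with the sizes being rounded to whole kilobytes.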
See Performance Guide for detailed benchmarks.
Quick Start
Basic conversion:
```elixir
iex> {:ok, markdown} = HtmlToMarkdown.convert("<h1>Hello</h1>")
iex> markdown
"# Hello\n"
```

With conversion options:
```elixir
# Pre-build reusable options
iex> handle = HtmlToMarkdown.options(%Options{wrap: true, wrap_width: 40})
iex> HtmlToMarkdown.convert_with_options("<p>Reusable</p>", handle)
{:ok, "Reusable\n"}
```

API Reference
Core Functions
HtmlToMarkdown.convert(html, options \\ nil) :: {:ok, String.t()} | {:error, term()}
Basic HTML-to-Markdown conversion. Fast and simple.
HtmlToMarkdown.convert_with_metadata(html, options \\ nil, config \\ nil) :: {String.t(), map()}
Extract Markdown plus metadata in a single pass. See Metadata Extraction Guide.
HtmlToMarkdown.convert_with_inline_images(html, config \\ nil) :: {String.t(), list(map()), list(String.t())}
Extract base64-encoded inline images with metadata.
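When converting many documents, the options handle from the Quick Start can be built once and reused for every call. The sketch below assumes conversion functions return {:ok, markdown} on success and {:error, reason} on failure (only the success shape is shown above). BatchConvert and convert_all/2 are hypothetical names, and the converter is passed in as a function so the helper can be exercised without loading the NIF:

```elixir
defmodule BatchConvert do
  @moduledoc """
  Hypothetical helper (not part of html_to_markdown): converts many HTML
  documents with one converter function and separates successes from failures.
  """

  # `convert_fun` is injected so this can be tested without the NIF.
  # Assumption: failures are reported as {:error, reason}.
  @spec convert_all([String.t()], (String.t() -> {:ok, String.t()} | {:error, term()})) ::
          {[String.t()], [{String.t(), term()}]}
  def convert_all(docs, convert_fun) do
    {oks, errors} =
      Enum.reduce(docs, {[], []}, fn html, {oks, errors} ->
        case convert_fun.(html) do
          {:ok, markdown} -> {[markdown | oks], errors}
          {:error, reason} -> {oks, [{html, reason} | errors]}
        end
      end)

    # Restore input order after prepending in the reduce.
    {Enum.reverse(oks), Enum.reverse(errors)}
  end
end

# Example wiring (requires the html_to_markdown package; Options as in the Quick Start):
# handle = HtmlToMarkdown.options(%Options{wrap: true, wrap_width: 40})
# {markdown_docs, failures} =
#   BatchConvert.convert_all(pages, &HtmlToMarkdown.convert_with_options(&1, handle))
```

Building the handle once keeps option parsing out of the per-document loop.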
Options
ConversionOptions – Key configuration fields:
- heading_style: Heading format ("underlined" | "atx" | "atx_closed"); default: "underlined"
- list_indent_width: Spaces per indent level; default: 2
- bullets: Bullet characters, cycled by list nesting level; default: "*+-"
- wrap: Enable text wrapping; default: false
- wrap_width: Column at which to wrap; default: 80
- code_language: Default language for fenced code blocks; default: none
- extract_metadata: Embed metadata as YAML frontmatter; default: false
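To make the bullets and list_indent_width options concrete, here is a rough model of how a list item's prefix could be derived from its nesting depth. ListMarkerSketch is a hypothetical illustration, not the converter's code, and the exact whitespace the library emits may differ:

```elixir
defmodule ListMarkerSketch do
  # Illustration only: models the documented `bullets` ("*+-") and
  # `list_indent_width` (2) defaults. Not the library's implementation.
  @spec marker(non_neg_integer(), String.t(), pos_integer()) :: String.t()
  def marker(depth, bullets \\ "*+-", indent_width \\ 2) do
    # Cycle through the bullet characters: depth 0 -> "*", 1 -> "+", 2 -> "-", 3 -> "*", ...
    bullet = String.at(bullets, rem(depth, String.length(bullets)))
    # Indent by `indent_width` spaces per nesting level.
    String.duplicate(" ", depth * indent_width) <> bullet <> " "
  end
end

for depth <- 0..3, do: IO.puts(ListMarkerSketch.marker(depth) <> "item at depth #{depth}")
```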
MetadataConfig – Selective metadata extraction:
- extract_headers: h1-h6 elements; default: true
- extract_links: Hyperlinks; default: true
- extract_images: Image elements; default: true
- extract_structured_data: JSON-LD, Microdata, and RDFa; default: true
- max_structured_data_size: Size limit in bytes; default: 100KB
Metadata Extraction
The metadata extraction feature enables comprehensive document analysis during conversion. Extract document properties, headers, links, images, and structured data in a single pass.
Use Cases:
- SEO analysis – Extract title, description, Open Graph tags, Twitter cards
- Table of contents generation – Build structured outlines from heading hierarchy
- Content migration – Document all external links and resources
- Accessibility audits – Check for images without alt text, empty links, invalid heading hierarchy
- Link validation – Classify and validate anchor, internal, external, email, and phone links
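The link-validation use case relies on sorting hrefs into those five kinds. As a rough, hypothetical illustration (not the classifier the library uses), the categories can be derived with Elixir's standard URI module; site_host, the host treated as internal, is an assumption supplied by the caller:

```elixir
defmodule LinkKindSketch do
  # Hypothetical classifier, not the library's. `site_host` is the host that
  # counts as "internal"; relative links are also treated as internal.
  @spec classify(String.t(), String.t()) :: :anchor | :email | :phone | :external | :internal
  def classify(href, site_host) do
    uri = URI.parse(href)

    cond do
      String.starts_with?(href, "#") -> :anchor
      uri.scheme == "mailto" -> :email
      uri.scheme == "tel" -> :phone
      # Any link that names a host other than our own (including "//cdn.example/x").
      uri.host not in [nil, site_host] -> :external
      true -> :internal
    end
  end
end
```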
Zero Overhead When Disabled: Metadata is collected during the same HTML parsing pass as the conversion, so enabling it adds only negligible cost. Disable unused metadata types in MetadataConfig to trim that cost further.
Example: Quick Start
```elixir
html = "<h1>Article</h1><img src=\"test.jpg\" alt=\"test\">"
{_markdown, metadata} = HtmlToMarkdown.convert_with_metadata(html)

IO.inspect(metadata.document.title)  # Document title
IO.inspect(metadata.headers)         # All h1-h6 elements
IO.inspect(metadata.links)           # All hyperlinks
IO.inspect(metadata.images)          # All images, with their alt text
IO.inspect(metadata.structured_data) # JSON-LD, Microdata, RDFa
```

For detailed examples including SEO extraction, table-of-contents generation, link validation, and accessibility audits, see the Metadata Extraction Guide.
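The headers list can feed the table-of-contents use case. The sketch below assumes each header is a map with :level and :text keys; that shape is a guess, so inspect metadata.headers from your own run before relying on it. TocSketch is a hypothetical module:

```elixir
defmodule TocSketch do
  # Hypothetical: assumes each entry in `metadata.headers` is a map with a
  # `:level` (1..6) and a `:text` key. Verify the real shape first.
  @spec build([%{level: 1..6, text: String.t()}]) :: String.t()
  def build([]), do: ""

  def build(headers) do
    min_level = headers |> Enum.map(& &1.level) |> Enum.min()

    headers
    |> Enum.map(fn %{level: level, text: text} ->
      # Indent two spaces per level below the shallowest heading present.
      String.duplicate("  ", level - min_level) <> "- " <> text
    end)
    |> Enum.join("\n")
  end
end
```

With real metadata the call would look something like metadata.headers |> TocSketch.build() |> IO.puts(). Indenting relative to the shallowest level keeps the outline flush-left even when a page has no h1.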
Visitor Pattern
The visitor pattern enables custom HTML→Markdown conversion logic by providing callbacks for specific HTML elements during traversal. Use visitors to transform content, filter elements, validate structure, or collect analytics.
Use Cases:
- Custom Markdown dialects – Convert to Obsidian, Notion, or other flavors
- Content filtering – Remove tracking pixels, ads, or unwanted elements
- URL rewriting – Rewrite CDN URLs, add query parameters, validate links
- Accessibility validation – Check alt text, heading hierarchy, link text
- Analytics – Track element usage, link destinations, image sources
Supported Visitor Methods: 40+ callbacks for text, inline elements, links, images, headings, lists, blocks, and tables.
Example: Quick Start
```elixir
defmodule MyVisitor do
  # Rewrite links that point at the old CDN
  def visit_link(_ctx, href, text, _title) do
    href = String.replace_prefix(href, "https://old-cdn.com", "https://new-cdn.com")
    {:custom, "[#{text}](#{href})"}
  end

  # Skip tracking pixels; convert every other image normally
  def visit_image(_ctx, src, _alt, _title) do
    if String.contains?(src, "tracking") do
      :skip
    else
      :continue
    end
  end
end

html = "<a href=\"https://old-cdn.com/file.pdf\">Download</a>"
markdown = HtmlToMarkdown.convert_with_visitor(html, visitor: MyVisitor)
```

For comprehensive examples including content filtering, link footnotes, accessibility validation, and asynchronous URL validation, see the Visitor Pattern Guide.
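Because visitor callbacks are ordinary module functions, they can be unit-tested without running a conversion. The hypothetical AltTextVisitor below covers the accessibility use case using only the visit_image/4 signature and the :continue and {:custom, markdown} return values from the example above; whether alt can arrive as nil is an assumption, so both nil and blank strings are handled:

```elixir
defmodule AltTextVisitor do
  # Hypothetical accessibility visitor: flags images that lack alt text.
  # Assumption: `alt` may be nil or a (possibly blank) string.
  def visit_image(_ctx, src, alt, _title) do
    if is_nil(alt) or String.trim(alt) == "" do
      # Emit a visible placeholder so missing alt text stands out in review.
      {:custom, "![MISSING ALT TEXT](#{src})"}
    else
      # Let the converter handle well-described images as usual.
      :continue
    end
  end
end
```

It would be passed the same way as MyVisitor, via visitor: AltTextVisitor in HtmlToMarkdown.convert_with_visitor/2.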
Examples
Links
Hex.pm: hex.pm/packages/html_to_markdown
Kreuzberg Ecosystem: kreuzberg.dev
Discord: discord.gg/pXxagNK2zN
Contributing
We welcome contributions! Please see our Contributing Guide for details on:
- Setting up the development environment
- Running tests locally
- Submitting pull requests
- Reporting issues
All contributions must follow our code quality standards (enforced via pre-commit hooks):
- Proper test coverage (Rust 95%+, language bindings 80%+)
- Formatting and linting checks
- Documentation for public APIs
License
MIT License – see LICENSE.
Support
If you find this library useful, consider sponsoring the project.
Have questions or run into issues? We're here to help:
- GitHub Issues: github.com/kreuzberg-dev/html-to-markdown/issues
- Discussions: github.com/kreuzberg-dev/html-to-markdown/discussions
- Discord Community: discord.gg/pXxagNK2zN