html-to-markdown


Elixir bindings for the Rust html-to-markdown engine, implemented as a Rustler NIF. The package provides a fast HTML-to-Markdown converter that runs at native speed and produces the same Markdown as the engine's bindings for other runtimes.

Installation

Requires Elixir 1.19+ and OTP 28. Add to your mix.exs:

def deps do
  [
    {:html_to_markdown, "~> 2.19.0"}
  ]
end
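
For a quick trial outside a Mix project (a standalone script or a Livebook cell), Mix.install/1 can fetch the same dependency. This is a minimal sketch, assuming the package's native code builds or downloads on your platform; the file name is hypothetical and the convert call mirrors the Quick Start.

```elixir
# try_html_to_markdown.exs (hypothetical file name)
# Run with: elixir try_html_to_markdown.exs
Mix.install([{:html_to_markdown, "~> 2.19.0"}])

# Same call as in the Quick Start; match on the :ok tuple
{:ok, markdown} = HtmlToMarkdown.convert("<h1>Hello</h1>")
IO.puts(markdown)
```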

Performance Snapshot

Apple M4 • Real Wikipedia documents • convert() (Elixir)

| Document | Size | Ops/sec | Throughput |
| --- | --- | --- | --- |
| Lists (Timeline) | 129KB | 2,547 | 321.7 MB/s |
| Tables (Countries) | 360KB | 835 | 293.8 MB/s |
| Medium (Python) | 656KB | 439 | 281.5 MB/s |
| Large (Rust) | 567KB | 485 | 268.7 MB/s |
| Small (Intro) | 463KB | 581 | 262.9 MB/s |
| HOCR German PDF | 44KB | 7,106 | 303.1 MB/s |
| HOCR Embedded Tables | 37KB | 6,231 | 226.1 MB/s |
| HOCR Invoice | 4KB | 62,657 | 256.4 MB/s |

See Performance Guide for detailed benchmarks.
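
The throughput column can be sanity-checked from the other two columns, since throughput is roughly document size times operations per second. The sketch below assumes binary units (1 MB = 1024 KB), which the benchmark notes do not state; ThroughputCheck is a hypothetical helper, not part of the package.

```elixir
defmodule ThroughputCheck do
  # Assumption: binary units, 1 MB = 1024 KB
  @kb_per_mb 1024

  # Approximate throughput in MB/s from document size (KB) and ops/sec
  def mb_per_sec(size_kb, ops_per_sec) do
    Float.round(size_kb * ops_per_sec / @kb_per_mb, 1)
  end
end

IO.inspect(ThroughputCheck.mb_per_sec(129, 2_547)) # Lists (Timeline): 320.9
IO.inspect(ThroughputCheck.mb_per_sec(360, 835))   # Tables (Countries): 293.6
```

Both results land within about 1 MB/s of the published figures. The gap is consistent with the document sizes being rounded to whole kilobytes.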

Quick Start

Basic conversion:

iex> {:ok, markdown} = HtmlToMarkdown.convert("<h1>Hello</h1>")
iex> markdown
"# Hello\n"

With conversion options:

# Pre-build reusable options (assumes the struct module is HtmlToMarkdown.Options)
iex> handle = HtmlToMarkdown.options(%HtmlToMarkdown.Options{wrap: true, wrap_width: 40})
iex> HtmlToMarkdown.convert_with_options("<p>Reusable</p>", handle)
{:ok, "Reusable\n"}
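
A pre-built handle pays off when many documents share the same settings. The following is a minimal sketch of that pattern: BatchConvert is a hypothetical module name, and because the shape of a failed result is not documented here, anything other than {:ok, markdown} simply raises.

```elixir
defmodule BatchConvert do
  # `handle` is whatever HtmlToMarkdown.options/1 returned
  def convert_all(html_docs, handle) do
    Enum.map(html_docs, fn html ->
      case HtmlToMarkdown.convert_with_options(html, handle) do
        {:ok, markdown} -> markdown
        # Failure shape is an assumption, so surface it verbatim
        other -> raise "conversion failed: #{inspect(other)}"
      end
    end)
  end
end
```

Build the handle once, then call BatchConvert.convert_all(pages, handle) for each batch of HTML strings.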

API Reference

Core Functions

HtmlToMarkdown.convert(html, options \\ nil) :: {:ok, String.t()} | {:error, term()}

Basic HTML-to-Markdown conversion. Returns {:ok, markdown} on success, as in the Quick Start.

HtmlToMarkdown.convert_with_metadata(html, options \\ nil, config \\ nil) :: {String.t(), map()}

Extract Markdown plus metadata in a single pass. See Metadata Extraction Guide.

HtmlToMarkdown.convert_with_inline_images(html, config \\ nil) :: {String.t(), list(map()), list(String.t())}

Extract base64-encoded inline images with metadata.

Options

ConversionOptions – Key configuration fields:

  • heading_style: Heading format ("underlined" | "atx" | "atx_closed") — default: "underlined"
  • list_indent_width: Spaces per indent level — default: 2
  • bullets: Bullet characters cycle — default: "*+-"
  • wrap: Enable text wrapping — default: false
  • wrap_width: Wrap at column — default: 80
  • code_language: Default fenced code block language — default: none
  • extract_metadata: Embed metadata as YAML frontmatter — default: false
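
To illustrate what a cycling bullets setting means, the sketch below picks a bullet character per nesting depth and wraps around when the string is exhausted. It assumes the cycle advances with list nesting level; BulletCycle is a hypothetical helper that only mimics the idea and is not the library's implementation.

```elixir
defmodule BulletCycle do
  # Pick the bullet for a nesting depth (0 = top level), wrapping around
  def bullet_for(depth, bullets \\ "*+-") do
    chars = String.graphemes(bullets)
    Enum.at(chars, rem(depth, length(chars)))
  end
end

IO.inspect(Enum.map(0..4, &BulletCycle.bullet_for/1))
# ["*", "+", "-", "*", "+"]
```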

MetadataConfig – Selective metadata extraction:

  • extract_headers: h1-h6 elements — default: true
  • extract_links: Hyperlinks — default: true
  • extract_images: Image elements — default: true
  • extract_structured_data: JSON-LD, Microdata, RDFa — default: true
  • max_structured_data_size: Size limit in bytes — default: 100KB

Metadata Extraction

The metadata extraction feature enables comprehensive document analysis during conversion. Extract document properties, headers, links, images, and structured data in a single pass.

Use Cases:

  • SEO analysis – Extract title, description, Open Graph tags, Twitter cards
  • Table of contents generation – Build structured outlines from heading hierarchy
  • Content migration – Document all external links and resources
  • Accessibility audits – Check for images without alt text, empty links, invalid heading hierarchy
  • Link validation – Classify and validate anchor, internal, external, email, and phone links
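
As an illustration of the link categories named in the last use case, here is a standalone sketch that classifies an href string. LinkKind is hypothetical and independent of the library, whose own classification may differ; in particular, this version treats every URL with a host as external.

```elixir
defmodule LinkKind do
  # Classify an href into anchor, email, phone, external, or internal
  def classify("#" <> _), do: :anchor
  def classify("mailto:" <> _), do: :email
  def classify("tel:" <> _), do: :phone

  def classify(href) when is_binary(href) do
    case URI.parse(href) do
      # Any URL that carries a host is treated as external
      %URI{host: host} when is_binary(host) and host != "" -> :external
      # Relative paths have no host
      _ -> :internal
    end
  end
end
```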

Low Overhead: Metadata is collected during the HTML parsing pass, so extraction adds negligible overhead when enabled and none when disabled. Disable unused metadata types in MetadataConfig to reduce the cost further.

Example: Quick Start

html = "<h1>Article</h1><img src=\"test.jpg\" alt=\"test\">"
{markdown, metadata} = HtmlToMarkdown.convert_with_metadata(html)

IO.inspect(metadata.document.title)        # Document title
IO.inspect(metadata.headers)               # All h1-h6 elements
IO.inspect(metadata.links)                 # All hyperlinks
IO.inspect(metadata.images)                # All images with alt text
IO.inspect(metadata.structured_data)       # JSON-LD, Microdata, RDFa

For detailed examples including SEO extraction, table-of-contents generation, link validation, and accessibility audits, see the Metadata Extraction Guide.
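
As a small taste of the table-of-contents use case, the sketch below turns a list of headers into an indented outline. The header shape (maps with :level and :text keys) is an assumption made for illustration; check the Metadata Extraction Guide for the actual field names. Toc is a hypothetical module.

```elixir
defmodule Toc do
  # headers: list of maps with :level (1..6) and :text (assumed shape)
  def build(headers) do
    Enum.map_join(headers, "\n", fn %{level: level, text: text} ->
      # Indent two spaces per heading level below h1
      String.duplicate("  ", level - 1) <> "- " <> text
    end)
  end
end

headers = [
  %{level: 1, text: "Article"},
  %{level: 2, text: "Background"},
  %{level: 2, text: "Results"}
]

IO.puts(Toc.build(headers))
```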

Visitor Pattern

The visitor pattern enables custom HTML→Markdown conversion logic by providing callbacks for specific HTML elements during traversal. Use visitors to transform content, filter elements, validate structure, or collect analytics.

Use Cases:

  • Custom Markdown dialects – Convert to Obsidian, Notion, or other flavors
  • Content filtering – Remove tracking pixels, ads, or unwanted elements
  • URL rewriting – Rewrite CDN URLs, add query parameters, validate links
  • Accessibility validation – Check alt text, heading hierarchy, link text
  • Analytics – Track element usage, link destinations, image sources

Supported Visitor Methods: 40+ callbacks for text, inline elements, links, images, headings, lists, blocks, and tables.

Example: Quick Start

defmodule MyVisitor do
  def visit_link(_ctx, href, text, _title) do
    # Rewrite CDN URLs
    href = if String.starts_with?(href, "https://old-cdn.com") do
      String.replace(href, "https://old-cdn.com", "https://new-cdn.com")
    else
      href
    end
    {:custom, "[#{text}](#{href})"}
  end

  def visit_image(_ctx, src, _alt, _title) do
    # Skip tracking pixels
    if String.contains?(src, "tracking") do
      :skip
    else
      :continue
    end
  end
end

html = "<a href=\"https://old-cdn.com/file.pdf\">Download</a>"
markdown = HtmlToMarkdown.convert_with_visitor(html, visitor: MyVisitor)

For comprehensive examples including content filtering, link footnotes, accessibility validation, and asynchronous URL validation, see the Visitor Pattern Guide.
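
Because visitor callbacks are plain functions, they can be exercised directly without running a conversion. The sketch below assumes that passing nil for the context and title is acceptable in a test, and that :continue is a valid result for links as it is for images; CdnVisitor is a hypothetical module.

```elixir
defmodule CdnVisitor do
  @old_host "https://old-cdn.com"
  @new_host "https://new-cdn.com"

  # Rewrite links on the old CDN host; leave all others to the default
  def visit_link(_ctx, href, text, _title) do
    if String.starts_with?(href, @old_host) do
      new_href = String.replace_prefix(href, @old_host, @new_host)
      {:custom, "[#{text}](#{new_href})"}
    else
      :continue
    end
  end
end

IO.inspect(CdnVisitor.visit_link(nil, "https://old-cdn.com/file.pdf", "Download", nil))
# {:custom, "[Download](https://new-cdn.com/file.pdf)"}
IO.inspect(CdnVisitor.visit_link(nil, "https://example.com", "Home", nil))
# :continue
```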

Contributing

We welcome contributions! Please see our Contributing Guide for details on:

  • Setting up the development environment
  • Running tests locally
  • Submitting pull requests
  • Reporting issues

All contributions must follow our code quality standards (enforced via pre-commit hooks):

  • Proper test coverage (Rust 95%+, language bindings 80%+)
  • Formatting and linting checks
  • Documentation for public APIs

License

MIT License – see LICENSE.

Support

If you find this library useful, consider sponsoring the project.

Have questions or run into issues? We're here to help: