# `HTML2Text`
[🔗](https://github.com/fuelen/html2text/blob/v0.3.0/lib/html2_text.ex#L6)

A high-performance HTML-to-text converter implemented as a Rust NIF.

Two conversion modes are available:

- `convert/2` — plain text with markdown-like decorations (`**bold**`, `*italic*`, link footnotes)
- `convert_rich/2` — structured `{text, annotations}` tuples for building custom renderers (Slack, Discord, etc.)

Additionally, `HTML2Text.HTML` is a container struct whose `Inspect` implementation renders the wrapped HTML
as formatted text with ANSI styles directly in IEx.

## HTML container

Wrap HTML in `HTML2Text.HTML.new/1` to get readable output when inspecting data structures:

    email = %{subject: "Welcome", body: HTML2Text.HTML.new("<p>Hello <strong>world</strong></p>")}

In IEx this prints as:

    %{body: #HTML2Text.HTML<Hello **world**>, subject: "Welcome"}

Bold, italic, links (clickable in supported terminals), code, strikeout, and CSS
colours are rendered with ANSI escape sequences. `to_string/1` returns the original HTML.

Conversion is handled by the underlying Rust crate; see https://github.com/jugglerchris/rust-html2text

# `annotation`

```elixir
@type annotation() ::
  :default
  | :emphasis
  | :strong
  | :strikeout
  | :code
  | {:link, url :: String.t()}
  | {:image, src :: String.t()}
  | {:preformat, continuation :: boolean()}
  | {:colour,
     {r :: non_neg_integer(), g :: non_neg_integer(), b :: non_neg_integer()}}
  | {:bg_colour,
     {r :: non_neg_integer(), g :: non_neg_integer(), b :: non_neg_integer()}}
```

# `line`

```elixir
@type line() :: [segment()]
```

# `opts`

```elixir
@type opts() :: [
  width: pos_integer() | :infinity,
  decorate: boolean(),
  link_footnotes: boolean(),
  table_borders: boolean(),
  pad_block_width: boolean(),
  allow_width_overflow: boolean(),
  min_wrap_width: pos_integer(),
  raw: boolean(),
  wrap_links: boolean(),
  unicode_strikeout: boolean(),
  empty_img_mode: :ignore | {:replace, String.t()} | :filename
]
```

# `rich_opts`

```elixir
@type rich_opts() :: [
  width: pos_integer() | :infinity,
  table_borders: boolean(),
  pad_block_width: boolean(),
  allow_width_overflow: boolean(),
  min_wrap_width: pos_integer(),
  raw: boolean(),
  wrap_links: boolean(),
  empty_img_mode: :ignore | {:replace, String.t()} | :filename,
  use_doc_css: boolean(),
  css: String.t()
]
```

# `segment`

```elixir
@type segment() :: {text :: String.t(), annotations :: [annotation()]}
```

# `convert`

```elixir
@spec convert(html :: String.t(), opts()) ::
  {:ok, text :: String.t()} | {:error, reason :: String.t()}
```

Converts HTML content to plain text.

## Options
- `:width` - Maximum line width (positive integer or `:infinity`). Defaults to `80`. Setting it to `:infinity` disables line wrapping, so each paragraph or block is emitted on a single line.
- `:decorate` - Enables markdown-like text decorations such as `**bold**` and `*italic*`. Boolean, defaults to `true`. When `false`, the output is plain text without styling markers.
- `:link_footnotes` - Appends numbered footnotes listing the link URLs at the end of the text. Boolean, defaults to `true`. When `false`, the link text is kept but the URLs are not emitted.
- `:table_borders` - Draws borders around table cells using Unicode box-drawing characters. Boolean, defaults to `true`. When `false`, tables render without borders.
- `:pad_block_width` - Pads blocks with trailing spaces so every line fills the full width. Boolean, defaults to `false`. Useful for fixed-width layouts.
- `:allow_width_overflow` - Allows lines to exceed `:width` when the content cannot be wrapped to fit. Boolean, defaults to `false`. When `false`, such content makes the conversion return `{:error, reason}` instead.
- `:min_wrap_width` - Minimum length of text chunks when wrapping lines. Integer ≥ 1, defaults to `3`. Helps avoid awkwardly narrow wraps.
- `:raw` - Enables raw mode, which extracts text with minimal processing and formatting. Boolean, defaults to `false`.
- `:wrap_links` - Wraps long URLs onto multiple lines. Boolean, defaults to `true`. When `false`, links stay on a single line and may overflow.
- `:unicode_strikeout` - Uses Unicode characters to render strikeout text. Boolean, defaults to `true`. When `false`, strikeout text is rendered without these characters.
- `:empty_img_mode` - Controls how images without alt text are rendered. Accepts `:ignore` (skip such images, the default), `{:replace, text}` (replace them with static text such as `"[image]"`), or `:filename` (use the filename from the image URL).

## Examples

    iex> html = "<h1>Title</h1><p>Some paragraph text.</p>"
    ...> HTML2Text.convert(html, width: 15)
    {:ok, "# Title\n\nSome paragraph\ntext.\n"}

    iex> HTML2Text.convert("<b>Important</b>", decorate: false)
    {:ok, "Important\n"}

    iex> HTML2Text.convert("<table><tr><td>A</td><td>B</td></tr></table>", [])
    {:ok, "─┬─\nA│B\n─┴─\n"}

    iex> HTML2Text.convert("<p><a href=\"https://example.com\">link</a></p>", link_footnotes: false)
    {:ok, "[link]\n"}

# `convert!`

```elixir
@spec convert!(html :: String.t(), opts :: opts()) :: String.t()
```

Converts HTML content to plain text, raising on failure.

This function behaves like `convert/2`, but raises an error if conversion fails.

## Examples

    iex> HTML2Text.convert!("<p>hello</p>")
    "hello\n"

    iex> HTML2Text.convert!("<em>italic</em>")
    "*italic*\n"

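Because `convert/2` reports failures as `{:error, reason}`, callers that must never crash can fall back to a placeholder instead of calling `convert!/2`. The sketch below is a minimal illustration under stated assumptions: the `EmailPreview` module and the `width: 60` option value are hypothetical choices, and the converter is injectable so the function can be exercised without loading the NIF.

```elixir
defmodule EmailPreview do
  # Hypothetical module: degrade gracefully when conversion fails.
  # `converter` defaults to HTML2Text.convert/2 but can be swapped out
  # (e.g. in tests); it must return {:ok, text} or {:error, reason}.
  def preview(html, converter \\ &HTML2Text.convert/2) do
    case converter.(html, width: 60) do
      # Drop the trailing newline that the converter appends.
      {:ok, text} -> String.trim_trailing(text)
      # Show the error reason instead of raising.
      {:error, reason} -> "[could not render HTML: #{reason}]"
    end
  end
end
```

Reach for `convert!/2` instead when a conversion failure indicates a bug that should surface immediately.
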
# `convert_rich`

```elixir
@spec convert_rich(html :: String.t(), rich_opts()) ::
  {:ok, [line()]} | {:error, reason :: String.t()}
```

Converts HTML content to annotated rich text.

Returns a list of lines, where each line is a list of `{text, annotations}` tuples.
Annotations are stacked: a text segment inside `<a href="..."><strong>` has
`[{:link, url}, :strong]`, with the outer annotation first.
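
Since each line is just a list of segments, discarding the annotations gives a plain-text view of the rich output. This is a minimal sketch assuming the line structure described above; `RichText.to_plain/1` is a hypothetical helper, and its result is not guaranteed to match `convert/2`, which adds its own decorations.

```elixir
defmodule RichText do
  # Hypothetical helper: flattens [line()] (a list of lines, each a list of
  # {text, annotations} segments) into one string, ignoring annotations.
  def to_plain(lines) do
    Enum.map_join(lines, "\n", fn line ->
      # Concatenate the text of every segment on this line.
      Enum.map_join(line, fn {text, _annotations} -> text end)
    end)
  end
end
```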

## Options
- `:width` - Maximum line width (positive integer or `:infinity`). Defaults to `80`.
- `:table_borders` - Draws borders around table cells using Unicode box-drawing characters. Boolean, defaults to `true`.
- `:pad_block_width` - Pads blocks with trailing spaces so every line fills the full width. Boolean, defaults to `false`.
- `:allow_width_overflow` - Allows lines to exceed `:width` when the content cannot be wrapped to fit. Boolean, defaults to `false`.
- `:min_wrap_width` - Minimum length of text chunks when wrapping. Integer ≥ 1, defaults to `3`.
- `:raw` - Enables raw mode with minimal processing. Boolean, defaults to `false`.
- `:wrap_links` - Wraps long URLs onto multiple lines. Boolean, defaults to `true`.
- `:empty_img_mode` - Controls how images without alt text are rendered. Accepts `:ignore` (default), `{:replace, text}`, or `:filename`.
- `:use_doc_css` - Parses `<style>` tags in the HTML to derive colour annotations. Boolean, defaults to `false`.
- `:css` - Additional CSS rules to apply. String, defaults to `nil` (no extra rules).
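
`convert/2` and `convert_rich/2` accept different option sets, so an application that keeps one shared option list can filter it per call. This is a minimal sketch: the key lists are copied from the `opts()` and `rich_opts()` types above, the `TextOptions` module is hypothetical, and this documentation does not say whether unsupported keys are rejected or ignored, so the sketch simply avoids passing them.

```elixir
defmodule TextOptions do
  # Hypothetical module. Key lists mirror the opts() and rich_opts() types.
  @plain_keys [:width, :decorate, :link_footnotes, :table_borders,
               :pad_block_width, :allow_width_overflow, :min_wrap_width,
               :raw, :wrap_links, :unicode_strikeout, :empty_img_mode]

  @rich_keys [:width, :table_borders, :pad_block_width,
              :allow_width_overflow, :min_wrap_width, :raw, :wrap_links,
              :empty_img_mode, :use_doc_css, :css]

  # Keyword.take/2 keeps only the listed keys, preserving their order.
  def for_convert(opts), do: Keyword.take(opts, @plain_keys)
  def for_convert_rich(opts), do: Keyword.take(opts, @rich_keys)
end
```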

## Annotations
- `:default` — Normal text
- `:emphasis` — `<em>` tag
- `:strong` — `<strong>` / `<b>` tag
- `:strikeout` — `<s>` / `<del>` tag
- `:code` — `<code>` tag
- `{:link, url}` — `<a href="...">` tag
- `{:image, src}` — `<img src="...">` tag
- `{:preformat, bool}` — `<pre>` block (`true` if continuation line)
- `{:colour, {r, g, b}}` — CSS text color
- `{:bg_colour, {r, g, b}}` — CSS background color
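
These annotations are enough to drive a custom renderer. The sketch below targets Slack's mrkdwn syntax (`*bold*`, `_italic_`, `~strike~`, `` `code` ``, `<url|text>`); the `SlackRenderer` module is hypothetical and handles only a subset of annotations. Because annotations are listed outer-first, `List.foldr/3` applies the innermost wrapper first and the outermost last.

```elixir
defmodule SlackRenderer do
  # Hypothetical renderer over convert_rich/2 output (a list of lines,
  # each a list of {text, annotations} segments).
  def render(lines) do
    Enum.map_join(lines, "\n", fn line ->
      Enum.map_join(line, &render_segment/1)
    end)
  end

  # Annotations are outer-first, so fold from the right: the last
  # (innermost) annotation wraps the text first.
  defp render_segment({text, annotations}) do
    List.foldr(annotations, text, &wrap/2)
  end

  defp wrap(:strong, inner), do: "*" <> inner <> "*"
  defp wrap(:emphasis, inner), do: "_" <> inner <> "_"
  defp wrap(:strikeout, inner), do: "~" <> inner <> "~"
  defp wrap(:code, inner), do: "`" <> inner <> "`"
  defp wrap({:link, url}, inner), do: "<" <> url <> "|" <> inner <> ">"
  # Colours, images, preformat and :default are passed through unchanged.
  defp wrap(_other, inner), do: inner
end
```

A production renderer would also need to escape `&`, `<` and `>` for Slack and move leading or trailing spaces outside the markers, since text such as `*Hello *` may not be formatted as bold.
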

## Examples

    iex> HTML2Text.convert_rich("<p>Hello <strong>world</strong></p>")
    {:ok, [[{"Hello ", []}, {"world", [:strong]}]]}

    iex> HTML2Text.convert_rich("<em>text</em>")
    {:ok, [[{"text", [:emphasis]}]]}

    iex> HTML2Text.convert_rich(~s(<a href="https://example.com">click</a>))
    {:ok, [[{"click", [{:link, "https://example.com"}]}]]}

    iex> HTML2Text.convert_rich(~s(<a href="https://ex.com"><strong>bold link</strong></a>))
    {:ok, [[{"bold link", [{:link, "https://ex.com"}, :strong]}]]}
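
To collect the links in a document, a comprehension can pattern-match on `{:link, url}` annotations; generator patterns that do not match are skipped, so other annotations are ignored. A sketch with a hypothetical `RichLinks` module, assuming input shaped like the results above:

```elixir
defmodule RichLinks do
  # Hypothetical helper: returns {link_text, url} pairs in document order.
  def links(lines) do
    for line <- lines,
        {text, annotations} <- line,
        # Non-matching annotations (e.g. :strong) are filtered out here.
        {:link, url} <- annotations,
        do: {text, url}
  end
end
```

A long link that wraps across lines may appear as several segments with the same URL, so deduplicate by URL if that matters.
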

# `convert_rich!`

```elixir
@spec convert_rich!(html :: String.t(), rich_opts()) :: [line()]
```

Converts HTML content to annotated rich text, raising on failure.

This function behaves like `convert_rich/2`, but raises an error if conversion fails.

## Examples

    iex> HTML2Text.convert_rich!("<p>hello</p>")
    [[{"hello", []}]]

    iex> HTML2Text.convert_rich!("<code>x = 1</code>")
    [[{"x = 1", [:code]}]]

