# `Newxp.PreProcessing`

Functions for processing HTML content into plain text for different use cases.

# `get_html2text_handler`

Get configured html2text options.

Returns a keyword list suitable for passing to `HTML2Text.convert/2`:

- `link_footnotes: false` — omits link footnotes
- `empty_img_mode: :ignore` — skips images without alt text
- `width: :infinity` — disables line wrapping

## Examples

    Newxp.PreProcessing.get_html2text_handler()
    # => [link_footnotes: false, empty_img_mode: :ignore, width: :infinity]
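The return value is an ordinary keyword list, so a caller that wants different behavior can override individual keys before passing it on. This minimal sketch uses a hypothetical stand-in module, `Html2TextOptionsSketch`, which hard-codes the documented defaults instead of calling `Newxp.PreProcessing`. The `width: 80` override assumes that `HTML2Text.convert/2` accepts an integer width. The `:infinity` setting suggests this, but this page does not state it.

```elixir
# Hypothetical stand-in: hard-codes the keyword list documented above
# instead of calling Newxp.PreProcessing.get_html2text_handler/0.
defmodule Html2TextOptionsSketch do
  def defaults do
    [link_footnotes: false, empty_img_mode: :ignore, width: :infinity]
  end

  # Applies caller-supplied overrides on top of the defaults.
  # Keyword.merge/2 removes keys from the first list that also appear
  # in the second, so each override replaces the matching default.
  def with_overrides(overrides) when is_list(overrides) do
    Keyword.merge(defaults(), overrides)
  end
end

Html2TextOptionsSketch.with_overrides(width: 80)
# => [link_footnotes: false, empty_img_mode: :ignore, width: 80]
```

Keys that are not overridden keep their default values, so disabling footnotes and skipping empty images still apply when only the width changes.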

# `process_for_general`

Process HTML content for general-purpose use.

This includes:

- Core HTML cleaning (figures, tables, noscript, read-more)
- Conversion to plain text that preserves most of the HTML structure

## Examples

    html = "<p>Hello</p><figure><img/></figure>"
    Newxp.PreProcessing.process_for_general(html)
    # => "Hello\n"
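The two steps listed above can be approximated without any dependencies. This is a minimal sketch under stated assumptions. `GeneralTextSketch` is a hypothetical module. Its regex-based removal of `figure`, `table`, and `noscript` elements stands in for whatever HTML handling the real function uses. The read-more cleanup is omitted because its exact rule is not documented here.

```elixir
# Hypothetical, dependency-free approximation of the two steps above.
# Regexes keep the sketch self-contained; they do not handle nested
# elements of the same type, comments, or malformed markup.
defmodule GeneralTextSketch do
  # Step 1: remove <figure>, <table> and <noscript> elements together with
  # their contents. \1 requires the closing tag to match the opening one.
  # Read-more cleanup is not modeled.
  def strip_elements(html) do
    Regex.replace(~r/<(figure|table|noscript)\b[^>]*>.*?<\/\1>/is, html, "")
  end

  # Step 2: crude text conversion. Each closing </p> becomes a newline and
  # every other tag is dropped. HTML entities are not decoded.
  def to_text(html) do
    html
    |> String.replace(~r{</p>}i, "\n")
    |> String.replace(~r{<[^>]*>}, "")
  end

  def process(html), do: html |> strip_elements() |> to_text()
end

GeneralTextSketch.process("<p>Hello</p><figure><img/></figure>")
# => "Hello\n"
```

A production version would more likely rely on an HTML parser, because non-greedy regexes stop at the first closing tag and therefore break on nested elements of the same type.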

# `process_for_summary`

Convert HTML to plain text for summarization.

Removes link URLs (keeping the link text), images, and formatting. The output
is unwrapped plain text suitable for feeding into summarization models.

## Examples

    html = "<p>Hello <a href=\"https://example.com\">world</a></p>"
    Newxp.PreProcessing.process_for_summary(html)
    # => "Hello world\n"
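For the summary path, the important behavior is that a link keeps its text while its URL disappears, and images are dropped entirely. The following is a rough, dependency-free sketch of that behavior. It uses a hypothetical `SummaryTextSketch` module and plain regexes rather than the `HTML2Text` conversion that the real function presumably performs.

```elixir
# Hypothetical stand-in for process_for_summary/1; not the real implementation.
defmodule SummaryTextSketch do
  def to_summary_text(html) do
    html
    # Drop <img> tags entirely, whatever their alt text.
    |> String.replace(~r{<img\b[^>]*>}i, "")
    # Turn each closing </p> into a line break.
    |> String.replace(~r{</p>}i, "\n")
    # Remove every remaining tag. For <a href="...">text</a> this discards
    # the tag and its URL but keeps "text".
    |> String.replace(~r{<[^>]*>}, "")
  end
end

SummaryTextSketch.to_summary_text("<p>Hello <a href=\"https://example.com\">world</a></p>")
# => "Hello world\n"
```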

---

*Consult [api-reference.md](api-reference.md) for the complete listing*