MDEx (MDEx v0.2.0)

A CommonMark-compliant fast and extensible Markdown parser and formatter for Elixir.

Hex Version Hex Docs MIT

Features

Compliant with the CommonMark and GitHub Flavored Markdown specifications, with extra extensions such as Wiki Links, Discord Markdown tags, and emoji. It also supports syntax highlighting out-of-the-box using the Autumn library.

Under the hood, MDEx calls the comrak APIs to process Markdown. comrak is a fast Rust crate that ports the cmark fork maintained by GitHub, a widely adopted Markdown implementation.

The AST format is based on Floki so the same API to manipulate HTML can be used to manipulate Markdown documents. Check out some examples at mdex/examples/

And some samples are available at https://mdex-c31.pages.dev

Installation

Add :mdex dependency:

def deps do
  [
    {:mdex, "~> 0.2"}
  ]
end

Usage

Mix.install([{:mdex, "~> 0.2"}])
MDEx.to_html!("# Hello")
"<h1>Hello</h1>\n"
MDEx.to_html!("# Hello :smile:", extension: [shortcodes: true])
"<h1>Hello 😄</h1>\n"

Sigils

Convert between Markdown, HTML, and AST.

import MDEx.Sigil
~M|# Hello from `~M` sigil|
"<h1>Hello from <code>~M</code> sigil</h1>\n"
~M|`~M` can return the AST too|AST
[
  {"document", [], [{"paragraph", [], [{"code", [{"num_backticks", 1}, {"literal", "~M"}], []}, " can return the AST too"]}]}
]
title = "Hello from variable"

~m|[{"document", [], [{"heading", [], ["#{title}"]}]}]|
"<h1>Hello from variable</h1>\n"

See all modifiers and examples at https://hexdocs.pm/mdex/MDEx.Sigil.html

Safety

For security reasons, every piece of raw HTML is omitted from the output by default:

MDEx.to_html!("<h1>Hello</h1>")
"<!-- raw HTML omitted -->\n"

That's not very useful in most cases, so you can render the raw HTML in escaped form instead:

MDEx.to_html!("<h1>Hello</h1>", render: [escape: true])
"&lt;h1&gt;Hello&lt;/h1&gt;\n"

If the input is provided by external sources, it might be a good idea to sanitize it instead for extra security:

MDEx.to_html!("<a href=https://elixir-lang.org/>Elixir</a>", render: [unsafe_: true], features: [sanitize: true])
"<p><a href=\"https://elixir-lang.org/\" rel=\"noopener noreferrer\">Elixir</a></p>\n"

Note that you must pass the unsafe_: true option to first generate the raw HTML in order to sanitize it.

All sanitization rules are defined in the ammonia docs. For example, the link in the example above was marked as noopener noreferrer to prevent attacks.

If those rules are too strict and you really trust the input, or you really need to render raw HTML, you can render it directly without escaping or sanitizing:

MDEx.to_html!("<script>alert('hello')</script>", render: [unsafe_: true])
"<script>alert('hello')</script>\n"

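The levels of trust described in this section map onto a small set of option lists. As a minimal sketch, the hypothetical SafetyOpts module below (not part of MDEx) gives each choice a name. It only builds the keyword lists shown above and assumes nothing beyond them.

```elixir
defmodule SafetyOpts do
  # Hypothetical helper, not part of MDEx: picks the option list
  # from the Safety section that matches how much the input is trusted.

  # Default behavior: raw HTML is omitted from the output.
  def for_input(:default), do: []

  # Raw HTML is rendered as escaped text.
  def for_input(:escaped), do: [render: [escape: true]]

  # External input: generate raw HTML first, then sanitize it.
  def for_input(:untrusted), do: [render: [unsafe_: true], features: [sanitize: true]]

  # Fully trusted input: raw HTML is rendered as-is.
  def for_input(:trusted), do: [render: [unsafe_: true]]
end

SafetyOpts.for_input(:untrusted)
# => [render: [unsafe_: true], features: [sanitize: true]]
```

With MDEx installed, the result can be passed straight through, for example MDEx.to_html!(markdown, SafetyOpts.for_input(:untrusted)).
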
Parsing

Converts Markdown to an AST data structure that can be inspected and manipulated to change the content of the document.

The data structure shape is exactly the same as the one used by Floki so we can reuse the same APIs and keep the same mental model when working with these documents, either Markdown or HTML, where each node is represented as:

{name, attributes, children}

Example:

MDEx.parse_document!("# Hello")
[{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}]

Note that text nodes have neither attributes nor children, so each one is represented as a plain string inside its parent's list of children.

You can find the full AST spec in the Types section of the MDEx module.
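
Because every node is either a string or a {name, attributes, children} tuple, a document can be walked with plain pattern matching. The sketch below is only an illustration: the ASTSketch module and its text/1 function are hypothetical names, not part of MDEx, and the code relies solely on the node shape described above.

```elixir
defmodule ASTSketch do
  # Hypothetical helper, not part of MDEx: concatenates the content
  # of every text node found in an AST.

  # A document (or a list of children) is a list of nodes.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)

  # Text nodes are plain strings.
  def text(node) when is_binary(node), do: node

  # Elements are {name, attributes, children} tuples; recurse into the children.
  def text({_name, _attributes, children}), do: text(children)
end

ast = [{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}]
ASTSketch.text(ast)
# => "Hello"
```

Some nodes keep their content in attributes rather than in children (the "code" node in the sigil example stores it under "literal"), so this sketch skips that content.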

Formatting

Converts the AST to a human-readable document, most commonly HTML. For example:

MDEx.to_html!([{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}])
"<h1>Hello</h1>\n"

More formats can be added in the future.

Any missing attribute will be filled with the default value, and extra attributes will be ignored. So you could have the same result with:

MDEx.to_html!([{"document", [], [{"heading", [], ["Hello"]}]}])
"<h1>Hello</h1>\n"

Default values are chosen on a best-effort basis, so as a good practice you should provide all attributes for each node.
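
One way to follow that advice is to build nodes through small constructor functions that always fill in every attribute. This is a minimal sketch under that assumption: ASTBuilder, document/1, and heading/2 are hypothetical names, not part of MDEx, and the attributes are the ones shown for headings in the examples above.

```elixir
defmodule ASTBuilder do
  # Hypothetical helpers, not part of MDEx: build AST nodes with all
  # heading attributes spelled out instead of relying on defaults.

  # Wraps a list of nodes in the document root node.
  def document(children) when is_list(children), do: [{"document", [], children}]

  # Builds a heading with a single text child; heading levels range from 1 to 6.
  def heading(level, text) when level in 1..6 and is_binary(text) do
    {"heading", [{"level", level}, {"setext", false}], [text]}
  end
end

ASTBuilder.document([ASTBuilder.heading(1, "Hello")])
# => [{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}]
```

The result has the same shape as the first AST in this section, so with MDEx installed it can be passed to MDEx.to_html!/1.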

Formatting a malformed AST returns {:error, %MDEx.DecodeError{}}, which describes what went wrong and where. For example:

{:error, decode_error} = MDEx.to_html([{"code", [{1, "foo"}], []}], [])
{:error,
 %MDEx.DecodeError{
   reason: :attr_key_not_string,
   found: "1",
   node: "(<<\"code\">>, [{1,<<\"foo\">>}], [])",
   attr: "(1, <<\"foo\">>)",
   kind: "Integer"
 }}

decode_error |> Exception.message() |> IO.puts()
# invalid attribute key
#
# Expected an attribute key encoded as UTF-8 binary
#
# Got:
#
#   1
#
# Type:
#
#   Integer
#
# In this node:
#
#   (<<"code">>, [{1,<<"foo">>}], [])
#
# In this attribute:
#
#   (1, <<"foo">>)
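
Since the non-raising functions return either {:ok, result} or {:error, exception}, the two cases can be handled with pattern matching. The following is a minimal sketch, not MDEx code: RenderSketch and html_or_message/1 are hypothetical names, and the clause for errors accepts any exception struct so the example runs without MDEx installed.

```elixir
defmodule RenderSketch do
  # Hypothetical wrapper, not part of MDEx: turns a tagged result
  # into either the rendered output or a readable error message.

  # Successful conversion: return the output unchanged.
  def html_or_message({:ok, html}) when is_binary(html), do: html

  # Failed conversion: the error is an exception struct, so its
  # message can be extracted with Exception.message/1.
  def html_or_message({:error, error}) when is_exception(error) do
    "could not render document: " <> Exception.message(error)
  end
end

RenderSketch.html_or_message({:ok, "<h1>Hello</h1>\n"})
# => "<h1>Hello</h1>\n"
```

With MDEx installed, the malformed AST above could be piped through it: [{"code", [{1, "foo"}], []}] |> MDEx.to_html([]) |> RenderSketch.html_or_message().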

Options

You can enable extensions and change the generated output by passing any of the available Comrak options as keyword lists, plus an additional :features option.

The full documentation and list of all options with description and examples can be found on the links below:

Features Options

  • :sanitize (defaults to false) - sanitize output using ammonia. See the Safety section for more info.
  • :syntax_highlight_theme (defaults to "onedark") - syntax highlight code fences using autumn themes. Pass the theme's filename without special characters and without the extension; for example, pass syntax_highlight_theme: "adwaita_dark" to use the Adwaita Dark theme.
  • :syntax_highlight_inline_style (defaults to true) - embed styles in the output for each generated token. You'll need to serve CSS themes if inline styles are disabled to properly highlight code

See some examples below on how to use the provided options:

GitHub Flavored Markdown with emojis

MDEx.to_html!(
  ~S"""
  # GitHub Flavored Markdown :rocket:

  - [x] Task A
  - [x] Task B
  - [ ] Task C

  | Feature | Status |
  | ------- | ------ |
  | Fast | :white_check_mark: |
  | GFM  | :white_check_mark: |

  Check out the spec at https://github.github.com/gfm/
  """,
  extension: [
    strikethrough: true,
    tagfilter: true,
    table: true,
    autolink: true,
    tasklist: true,
    footnotes: true,
    shortcodes: true
  ],
  parse: [
    smart: true,
    relaxed_tasklist_matching: true,
    relaxed_autolinks: true
  ],
  render: [
    github_pre_lang: true,
    escape: true
  ]
) |> IO.puts()
# <h1>GitHub Flavored Markdown 🚀</h1>
# <ul>
#   <li><input type="checkbox" checked="" disabled="" /> Task A</li>
#   <li><input type="checkbox" checked="" disabled="" /> Task B</li>
#   <li><input type="checkbox" disabled="" /> Task C</li>
# </ul>
# <table>
#   <thead>
#     <tr>
#       <th>Feature</th>
#       <th>Status</th>
#     </tr>
#   </thead>
#   <tbody>
#     <tr>
#       <td>Fast</td>
#       <td>✅</td>
#     </tr>
#     <tr>
#       <td>GFM</td>
#       <td>✅</td>
#     </tr>
#   </tbody>
# </table>
# <p>Check out the spec at <a href="https://github.github.com/gfm/">https://github.github.com/gfm/</a></p>

Code Syntax Highlighting

MDEx.to_html!(~S"""
```elixir
String.upcase("elixir")
```
""",
features: [syntax_highlight_theme: "catppuccin_latte"]
) |> IO.puts()
# Note: the colors below match the default "onedark" theme; with another theme the color values differ.
# <pre class="autumn highlight" style="background-color: #282C34; color: #ABB2BF;">
#   <code class="language-elixir" translate="no">
#     <span class="namespace" style="color: #61AFEF;">String</span><span class="operator" style="color: #C678DD;">.</span><span class="function" style="color: #61AFEF;">upcase</span><span class="" style="color: #ABB2BF;">(</span><span class="string" style="color: #98C379;">&quot;elixir&quot;</span><span class="" style="color: #ABB2BF;">)</span>
#   </code>
# </pre>

Demo and Samples

A livebook and a script are available to play with and experiment with this library, or you can check out all available samples at https://mdex-c31.pages.dev

Used By

Are you using MDEx and want to list your project here? Please send a PR!

Benchmark

A simple script is available to compare existing libs:

Name              ips        average  deviation         median         99th %
cmark         22.82 K      0.0438 ms    ±16.24%      0.0429 ms      0.0598 ms
mdex           3.57 K        0.28 ms     ±9.79%        0.28 ms        0.33 ms
md             0.34 K        2.95 ms    ±10.56%        2.90 ms        3.62 ms
earmark        0.25 K        4.04 ms     ±4.50%        4.00 ms        4.44 ms

Comparison:
cmark         22.82 K
mdex           3.57 K - 6.39x slower +0.24 ms
md             0.34 K - 67.25x slower +2.90 ms
earmark        0.25 K - 92.19x slower +4.00 ms

Motivation

MDEx was born out of the need to parse CommonMark files, to parse hundreds of files quickly, and to be easily extensible by consumers of the library.

  • earmark is extensible but can't parse all kinds of documents and is slow to convert hundreds of Markdown files.
  • md is very extensible but the doc says "If one needs to perfectly parse the common markdown, Md is probably not the correct choice" and CommonMark was a requirement to parse many existing files.
  • markdown is not precompiled and has not received updates in a while.
  • cmark is a fast CommonMark parser but it requires compiling the C library, is hard to extend, and was archived in April 2024.

Note that MDEx is the only one that syntax highlights out-of-the-box, which contributes to making it slower than cmark.

To finish, a friendly reminder that all libs have their own strengths and trade-offs, so use the one that best suits your needs.

Looking for help with your Elixir project?

At DockYard we are ready to help you build your next Elixir project. We have a unique expertise in Elixir and Phoenix development that is unmatched and we love to write about Elixir.

Have a project in mind? Get in touch!

Acknowledgements

Types

@type md_ast() :: [md_node()]

The AST (Abstract Syntax Tree) representation of a Markdown document.

It's composed of a list of nodes starting with the document root node.

Example

[
  {"document", [], [
    {"heading", [{"level", 1}], ["Elixir"]}
  ]}
]

See md_node/0 for more info.

@type md_attribute() :: {String.t(), term()}

Attributes of node elements are key-value pairs where key is always a string and value can be of multiple different types.

Examples

{"level", 1}
{"delimiter", "period"}

@type md_element() ::
  {name :: String.t(), attributes :: [md_attribute()], children :: [md_node()]}

Elements are composed of a name, a list of attributes, and a list of children.

Example

{"heading", [{"level", 1}, {"setext", false}], children}

@type md_node() :: md_element() | md_text()

Each node of the AST document.

Represented either as a tuple or a string.

@type md_text() :: String.t()

Text element. It has no attributes or children so it's represented just as a string.

Functions

attribute(ast_or_node, attribute_name)

@spec attribute(md_ast() | md_node(), String.t()) :: list()

Returns a list with attribute values for a given attribute_name, otherwise returns an empty list.

Example

iex> MDEx.attribute({
...>   "code_block",
...>   [{"info", "mermaid"}, {"literal", "graph TD;\n  A-->B;"}],
...>   []
...> }, "literal")
["graph TD;\n  A-->B;"]

iex> MDEx.attribute({
...>   "code_block",
...>   [{"info", "mermaid"}, {"literal", "graph TD;\n  A-->B;"}],
...>   []
...> }, "other")
[]

parse_document(markdown, opts \\ [])

@spec parse_document(
  String.t(),
  keyword()
) :: {:ok, md_ast()} | {:error, term()}

Parses Markdown and returns the AST.

Options

See the Options section for the available options.

Examples

iex> MDEx.parse_document!("# Languages\n Elixir and Rust")
[
  {"document", [], [
    {"heading", [{"level", 1}, {"setext", false}], ["Languages"]},
    {"paragraph", [], ["Elixir and Rust"]}
  ]}
]

iex> MDEx.parse_document!("Darth Vader is ||Luke's father||", extension: [spoiler: true])
[
  {"document", [], [
    {"paragraph", [], [
      "Darth Vader is ",
      {"spoilered_text", [], ["Luke's father"]}
    ]}
  ]}
]

parse_document!(markdown, opts \\ [])

@spec parse_document!(
  String.t(),
  keyword()
) :: md_ast()

Same as parse_document/2 but raises if the parsing fails.

to_commonmark(ast)

@spec to_commonmark(ast :: md_ast()) ::
  {:ok, String.t()} | {:error, MDEx.DecodeError.t()}

Convert an AST to CommonMark using default options.

To customize the output, use to_commonmark/2.

Example

iex> MDEx.to_commonmark([{"document", [], [{"heading", [{"level", 3}], ["Hello"]}]}])
{:ok, "### Hello\n"}

to_commonmark(ast, opts)

@spec to_commonmark(
  ast :: md_ast(),
  keyword()
) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()}

Convert an AST to CommonMark with custom options.

Options

See the Options section for the available options.

to_commonmark!(ast)

@spec to_commonmark!(ast :: md_ast()) :: String.t()

Same as to_commonmark/1 but raises MDEx.DecodeError if the conversion fails.

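Example

The non-raising to_commonmark/1 returns the failure as an error tuple, as shown here; to_commonmark!/1 raises an MDEx.DecodeError for the same input instead.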

iex> MDEx.to_commonmark([{"document", [], [{"heading", [{}], ["Hello"]}]}])
{:error,
 %MDEx.DecodeError{
   reason: :missing_attr_field,
   found: "[{}]",
   node: "(<<\"heading\">>, [{}], [<<\"Hello\">>])",
   attr: nil,
   kind: nil
 }}

to_commonmark!(ast, opts)

@spec to_commonmark!(
  ast :: md_ast(),
  keyword()
) :: String.t()

Same as to_commonmark/2 but raises MDEx.DecodeError if the conversion fails.

to_html(md_or_ast)

@spec to_html(md_or_ast :: String.t() | md_ast()) ::
  {:ok, String.t()} | {:error, MDEx.DecodeError.t()}

Convert either markdown or an AST to HTML using default options.

To customize the output, use to_html/2.

Examples

iex> MDEx.to_html("# MDEx")
{:ok, "<h1>MDEx</h1>\n"}

iex> MDEx.to_html("Implemented with:\n1. Elixir\n2. Rust")
{:ok, "<p>Implemented with:</p>\n<ol>\n<li>Elixir</li>\n<li>Rust</li>\n</ol>\n"}

iex> MDEx.to_html([{"document", [], [{"heading", [{"level", 3}], ["MDEx"]}]}])
{:ok, "<h3>MDEx</h3>\n"}

to_html(md_or_ast, opts)

@spec to_html(
  md_or_ast :: String.t() | md_ast(),
  keyword()
) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()}

Convert either markdown or an AST to HTML with custom options.

Options

See the Options section for the available options.

Examples

iex> MDEx.to_html("Hello ~world~ there", extension: [strikethrough: true])
{:ok, "<p>Hello <del>world</del> there</p>\n"}

iex> MDEx.to_html("<marquee>visit https://beaconcms.org</marquee>", extension: [autolink: true], render: [unsafe_: true])
{:ok, "<p><marquee>visit <a href=\"https://beaconcms.org\">https://beaconcms.org</a></marquee></p>\n"}

to_html!(md_or_ast)

@spec to_html!(md_or_ast :: String.t() | md_ast()) :: String.t()

Same as to_html/1 but raises MDEx.DecodeError if the conversion fails.

to_html!(md_or_ast, opts)

@spec to_html!(
  md_or_ast :: String.t() | md_ast(),
  keyword()
) :: String.t()

Same as to_html/2 but raises MDEx.DecodeError if the conversion fails.

traverse_and_update(ast, fun)

@spec traverse_and_update(
  md_node() | md_ast(),
  (md_node() -> md_node() | [md_node()] | nil)
) :: md_node() | md_ast()

Traverses and updates a Markdown AST.

Example

iex> ast = [{"document", [], [{"heading", [{"level", 1}, {"setext", false}], ["Hello"]}]}]
iex> MDEx.traverse_and_update(ast, fn
...>   {"heading", _attrs, children} -> {"heading", [{"level", 2}], children}
...>   other -> other
...> end)
[{"document", [], [{"heading", [{"level", 2}], ["Hello"]}]}]

See more in the examples directory.