MDEx (MDEx v0.3.0)
A fast, extensible, CommonMark-compliant Markdown parser and formatter for Elixir.
Features
Compliant with the CommonMark and GitHub Flavored Markdown specifications, with extra extensions such as Wiki Links, Discord Markdown tags, and emoji. Syntax highlighting is supported out of the box through the Autumn library.
Under the hood, it calls comrak to process Markdown. comrak is a fast Rust crate that ports the cmark fork maintained by GitHub, a widely adopted Markdown implementation.
The AST structure is based on Floki, so an API similar to the one used to manipulate HTML can be used to manipulate Markdown documents.
Examples are available at mdex/examples/ and rendered samples at https://mdex-c31.pages.dev
Installation
Add :mdex as a dependency:
def deps do
[
{:mdex, "~> 0.2"}
]
end
Usage
Mix.install([{:mdex, "~> 0.2"}])
iex> MDEx.to_html!("# Hello")
"<h1>Hello</h1>"
iex> MDEx.to_html!("# Hello :smile:", extension: [shortcodes: true])
"<h1>Hello 😄</h1>"
Sigils
Convert and generate AST, Markdown (CommonMark), HTML, and XML formats.
First, import the sigils:
iex> import MDEx.Sigil
iex> ~M|# Hello from `~M` sigil|
%MDEx.Document{
nodes: [
%MDEx.Heading{
nodes: [
%MDEx.Text{literal: "Hello from "},
%MDEx.Code{num_backticks: 1, literal: "~M"},
%MDEx.Text{literal: " sigil"}
],
level: 1,
setext: false
}
]
}
iex> import MDEx.Sigil
iex> ~M|`~M` also converts to HTML format|HTML
"<p><code>~M</code> also converts to HTML format</p>"
iex> import MDEx.Sigil
iex> ~M|and to XML as well|XML
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE document SYSTEM \"CommonMark.dtd\">\n<document xmlns=\"http://commonmark.org/xml/1.0\">\n <paragraph>\n <text xml:space=\"preserve\">and to XML as well</text>\n </paragraph>\n</document>\n"
Use ~m to interpolate variables:
iex> import MDEx.Sigil
iex> lang = :elixir
iex> ~m|`lang = #{inspect(lang)}`|
%MDEx.Document{nodes: [%MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "lang = :elixir"}]}]}
See more info at https://hexdocs.pm/mdex/MDEx.Sigil.html
Safety
For security reasons, every piece of raw HTML is omitted from the output by default:
iex> MDEx.to_html!("<h1>Hello</h1>")
"<!-- raw HTML omitted -->"
That's not very useful for most cases, but you can render raw HTML and escape it instead:
iex> MDEx.to_html!("<h1>Hello</h1>", render: [escape: true])
"&lt;h1&gt;Hello&lt;/h1&gt;"
If the input is provided by external sources, it might be a good idea to sanitize it instead for extra security:
iex> MDEx.to_html!("<a href=https://elixir-lang.org/>Elixir</a>", render: [unsafe_: true], features: [sanitize: true])
"<p><a href=\"https://elixir-lang.org/\" rel=\"noopener noreferrer\">Elixir</a></p>"
Note that you must pass the unsafe_: true option to first generate the raw HTML in order to sanitize it.
All sanitization rules are defined in the ammonia docs.
For example, the link in the example above was marked as noopener noreferrer to prevent attacks.
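The two approaches above for untrusted input can be wrapped in small helpers so that callers do not have to remember the option combinations. This is only a sketch: the UntrustedMarkdown module and its function names are hypothetical and not part of MDEx, and the 0.3.x version pin is an assumption made to match this page. Only options shown in this section are used.

```elixir
Mix.install([{:mdex, "~> 0.3.0"}])

defmodule UntrustedMarkdown do
  # Hypothetical helper (not part of MDEx) wrapping the options shown above.

  # Keep raw HTML visible in the output, but escaped.
  def escaped(markdown),
    do: MDEx.to_html!(markdown, render: [escape: true])

  # Generate raw HTML first (unsafe_: true), then let ammonia sanitize it.
  def sanitized(markdown),
    do: MDEx.to_html!(markdown, render: [unsafe_: true], features: [sanitize: true])
end

input = "<script>alert('hello')</script>"
escaped = UntrustedMarkdown.escaped(input)
sanitized = UntrustedMarkdown.sanitized(input)

IO.puts(escaped)
IO.puts(sanitized)
```

Escaping keeps the original markup readable as text, while sanitizing drops whatever the ammonia rules do not allow, so the choice depends on whether readers should see the rejected markup.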
If those rules are too strict and you really trust the input, or you really need to render raw HTML, you can render it directly without escaping or sanitizing:
iex> MDEx.to_html!("<script>alert('hello')</script>", render: [unsafe_: true])
"<script>alert('hello')</script>"
Parsing
Converts Markdown to an AST data structure that can be inspected and manipulated to change the content of the document programmatically.
The data structure format is inspired by Floki (with :attributes_as_maps = true), so we can keep similar APIs and the same mental model when working with these documents, either Markdown or HTML. Each node is represented as a struct that holds the node name as the struct name, plus its attributes and children, for example:
%MDEx.Heading{
  level: 1,
  nodes: [...]
}
The parent node that represents the root of the document is the MDEx.Document struct, where you can find more information about the AST and what operations are available.
The complete list of nodes is available in the documentation, in the Document Nodes section.
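Because nodes are plain structs, ordinary Elixir pattern matching is enough to inspect a document. The following is a minimal sketch, assuming the node shapes shown on this page (MDEx.Heading with level and nodes, MDEx.Text with literal) and a dependency pinned to 0.3.x; it collects the level and text of each top-level heading.

```elixir
Mix.install([{:mdex, "~> 0.3.0"}])

doc = MDEx.parse_document!("# Languages\n\n- Elixir\n- Rust\n")

# Keep only top-level headings; other nodes fail the pattern and are skipped.
headings =
  for %MDEx.Heading{level: level, nodes: nodes} <- doc.nodes do
    # Join the literal of each direct Text child of the heading.
    text = for %MDEx.Text{literal: literal} <- nodes, into: "", do: literal
    {level, text}
  end

IO.inspect(headings)
```

Given the AST that parse_document!/2 returns for this input elsewhere on this page, the result should be a single {1, "Languages"} tuple. Headings nested inside other blocks are not visited by this sketch, since it only looks at doc.nodes.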
Formatting
Formatting is the process of converting from one format to another, for example from AST or Markdown to HTML. Formatting to XML and to Markdown is also supported.
You can use MDEx.parse_document/2 to generate an AST or any of the to_* functions to convert to Markdown (CommonMark), HTML, or XML.
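As a rough illustration of moving between formats, the sketch below parses Markdown into an AST once and then formats that same AST as HTML, XML, and CommonMark. It uses only functions listed on this page; the 0.3.x version pin is an assumption.

```elixir
Mix.install([{:mdex, "~> 0.3.0"}])

markdown = "# Hello"

# Markdown -> AST
doc = MDEx.parse_document!(markdown)

# AST -> HTML, XML, and back to Markdown (CommonMark)
html = MDEx.to_html!(doc)
xml = MDEx.to_xml!(doc)
commonmark = MDEx.to_commonmark!(doc)

IO.puts(html)
IO.puts(xml)
IO.puts(commonmark)
```

Parsing once and formatting several times avoids re-parsing the same input, and leaves room to modify the AST between the two steps.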
Options
Use options to change the behavior and the generated output.
All the comrak options are available as keyword lists, plus an additional :features option to extend them further.
The full documentation and list of all options, with descriptions and examples, can be found at the links below:
- :extension - https://docs.rs/comrak/latest/comrak/struct.ExtensionOptions.html
- :parse - https://docs.rs/comrak/latest/comrak/struct.ParseOptions.html
- :render - https://docs.rs/comrak/latest/comrak/struct.RenderOptions.html
- :features - see the available options below
Features Options
- :sanitize (defaults to false) - sanitize output using ammonia. See the Safety section for more info.
- :syntax_highlight_theme (defaults to "onedark") - syntax highlight code fences using autumn themes. Pass the theme filename without special characters and without the extension; for example, pass syntax_highlight_theme: "adwaita_dark" to use the Adwaita Dark theme.
- :syntax_highlight_inline_style (defaults to true) - embed styles in the output for each generated token. If inline styles are disabled, you'll need to serve the CSS themes yourself to highlight code properly.
See some examples below on how to use the provided options:
GitHub Flavored Markdown with emojis
MDEx.to_html!(~S"""
# GitHub Flavored Markdown :rocket:

- [x] Task A
- [x] Task B
- [ ] Task C

| Feature | Status |
| ------- | ------ |
| Fast | :white_check_mark: |
| GFM | :white_check_mark: |

Check out the spec at https://github.github.com/gfm/
""",
extension: [
strikethrough: true,
tagfilter: true,
table: true,
autolink: true,
tasklist: true,
footnotes: true,
shortcodes: true
],
parse: [
smart: true,
relaxed_tasklist_matching: true,
relaxed_autolinks: true
],
render: [
github_pre_lang: true,
unsafe_: true
],
features: [
sanitize: true
]) |> IO.puts()
"""
<h1>GitHub Flavored Markdown 🚀</h1>
<ul>
<li><input type="checkbox" checked="" disabled="" /> Task A</li>
<li><input type="checkbox" checked="" disabled="" /> Task B</li>
<li><input type="checkbox" disabled="" /> Task C</li>
</ul>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fast</td>
<td>✅</td>
</tr>
<tr>
<td>GFM</td>
<td>✅</td>
</tr>
</tbody>
</table>
<p>Check out the spec at <a href="https://github.github.com/gfm/">https://github.github.com/gfm/</a></p>
"""
Code Syntax Highlighting
MDEx.to_html!(~S"""
```elixir
String.upcase("elixir")
```
""",
features: [syntax_highlight_theme: "onedark"]
) |> IO.puts()
"""
<pre class=\"autumn highlight\" style=\"background-color: #282C34; color: #ABB2BF;\">
<code class=\"language-elixir\" translate=\"no\">
<span class=\"namespace\" style=\"color: #61AFEF;\">String</span><span class=\"operator\" style=\"color: #C678DD;\">.</span><span class=\"function\" style=\"color: #61AFEF;\">upcase</span><span class=\"\" style=\"color: #ABB2BF;\">(</span><span class=\"string\" style=\"color: #98C379;\">&quot;elixir&quot;</span><span class=\"\" style=\"color: #ABB2BF;\">)</span>
</code>
</pre>
"""
Demo and Samples
A livebook and a script are available to play with and experiment with this library, or you can check out all available samples at https://mdex-c31.pages.dev
Used By
Are you using MDEx and want to list your project here? Please send a PR!
Benchmark
A simple script is available to compare existing libs:
Name              ips        average  deviation         median         99th %
cmark         22.82 K      0.0438 ms    ±16.24%      0.0429 ms      0.0598 ms
mdex           3.57 K        0.28 ms     ±9.79%        0.28 ms        0.33 ms
md             0.34 K        2.95 ms    ±10.56%        2.90 ms        3.62 ms
earmark        0.25 K        4.04 ms     ±4.50%        4.00 ms        4.44 ms

Comparison:
cmark         22.82 K
mdex           3.57 K - 6.39x slower +0.24 ms
md             0.34 K - 67.25x slower +2.90 ms
earmark        0.25 K - 92.19x slower +4.00 ms
Motivation
MDEx was born out of the need to parse CommonMark files, to parse hundreds of files quickly, and to be easily extensible by consumers of the library.
- earmark is extensible but can't parse all kinds of documents and is slow to convert hundreds of Markdown files.
- md is very extensible but the doc says "If one needs to perfectly parse the common markdown, Md is probably not the correct choice" and CommonMark was a requirement to parse many existing files.
- markdown is not precompiled and has not received updates in a while.
- cmark is a fast CommonMark parser but it requires compiling the C library, is hard to extend, and was archived in April 2024.
Note that MDEx is the only one that syntax highlights out of the box, which contributes to making it slower than cmark.
To finish, a friendly reminder that all libs have their own strengths and trade-offs, so use the one that best suits your needs.
Looking for help with your Elixir project?
At DockYard we are ready to help you build your next Elixir project. We have a unique expertise in Elixir and Phoenix development that is unmatched and we love to write about Elixir.
Have a project in mind? Get in touch!
Acknowledgements
- comrak crate for all the heavy work on parsing Markdown and rendering HTML
- Floki for the AST manipulation
- Logo created by Freepik - Flaticon
- Logo font designed by Alan Greene
Summary
Functions
- parse_document/2 - Parse a markdown string and return an MDEx.Document.
- parse_document!/2 - Same as parse_document/2 but raises if the parsing fails.
- parse_fragment/2 - Parse a markdown string and return only the node that represents the fragment.
- parse_fragment!/2 - Same as parse_fragment/2 but raises if the parsing fails.
- to_commonmark/1 - Convert an AST to CommonMark using default options.
- to_commonmark/2 - Convert an AST to CommonMark with custom options.
- to_commonmark!/1 - Same as to_commonmark/1 but raises MDEx.DecodeError if the conversion fails.
- to_commonmark!/2 - Same as to_commonmark/2 but raises MDEx.DecodeError if the conversion fails.
- to_html/1 - Convert Markdown or MDEx.Document to HTML using default options.
- to_html/2 - Convert Markdown or MDEx.Document to HTML using custom options.
- to_html!/1 - Same as to_html/1 but raises an error if the conversion fails.
- to_html!/2 - Same as to_html/2 but raises an error if the conversion fails.
- to_xml/1 - Convert Markdown or MDEx.Document to XML using default options.
- to_xml/2 - Convert Markdown or MDEx.Document to XML using custom options.
- to_xml!/1 - Same as to_xml/1 but raises an error if the conversion fails.
- to_xml!/2 - Same as to_xml/2 but raises an error if the conversion fails.
- traverse_and_update/2 - Traverse and update the Markdown document preserving the tree structure format.
- traverse_and_update/3 - Traverse and update the Markdown document preserving the tree structure format and keeping an accumulator.
Functions
@spec parse_document(String.t(), keyword()) :: {:ok, MDEx.Document.t()} | {:error, term()}
Parse a markdown string and return an MDEx.Document.
Options
See the Options section for the available opts.
Examples
iex> MDEx.parse_document!("""
...> # Languages
...>
...> - Elixir
...> - Rust
...> """)
%MDEx.Document{
nodes: [
%MDEx.Heading{nodes: [%MDEx.Text{literal: "Languages"}], level: 1, setext: false},
%MDEx.List{
nodes: [
%MDEx.ListItem{
nodes: [%MDEx.Paragraph{nodes: [%MDEx.Text{literal: "Elixir"}]}],
list_type: :bullet,
marker_offset: 0,
padding: 2,
start: 1,
delimiter: :period,
bullet_char: "-",
tight: false
},
%MDEx.ListItem{
nodes: [%MDEx.Paragraph{nodes: [%MDEx.Text{literal: "Rust"}]}],
list_type: :bullet,
marker_offset: 0,
padding: 2,
start: 1,
delimiter: :period,
bullet_char: "-",
tight: false
}
],
list_type: :bullet,
marker_offset: 0,
padding: 2,
start: 1,
delimiter: :period,
bullet_char: "-",
tight: true
}
]
}
iex> MDEx.parse_document!("Darth Vader is ||Luke's father||", extension: [spoiler: true])
%MDEx.Document{
nodes: [
%MDEx.Paragraph{
nodes: [
%MDEx.Text{literal: "Darth Vader is "},
%MDEx.SpoileredText{nodes: [%MDEx.Text{literal: "Luke's father"}]}
]
}
]
}
@spec parse_document!(String.t(), keyword()) :: MDEx.Document.t()
Same as parse_document/2 but raises if the parsing fails.
@spec parse_fragment(String.t(), keyword()) :: {:ok, MDEx.Document.md_node()} | nil
Parse a markdown string and return only the node that represents the fragment.
Usually that means filtering out the parent document and paragraphs.
That's useful to generate fragment nodes and inject them into the document when you're manipulating it.
Use parse_document/2 to generate a complete document.
Experimental
Consider this function experimental and subject to change.
Examples
iex> MDEx.parse_fragment("# Elixir")
{:ok, %MDEx.Heading{nodes: [%MDEx.Text{literal: "Elixir"}], level: 1, setext: false}}
iex> MDEx.parse_fragment("<h1>Elixir</h1>")
{:ok, %MDEx.HtmlBlock{nodes: [], block_type: 6, literal: "<h1>Elixir</h1>\n"}}
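One way to inject a fragment into a document is to append it to the document's top-level nodes. This is a minimal sketch rather than a recommended pattern: it assumes that updating the nodes field of MDEx.Document directly is acceptable for your use case, and the 0.3.x version pin is an assumption.

```elixir
Mix.install([{:mdex, "~> 0.3.0"}])

doc = MDEx.parse_document!("# Languages")

# parse_fragment/2 returns the node itself, without the Document wrapper.
{:ok, fragment} = MDEx.parse_fragment("## Elixir")

# Append the fragment to the document's top-level nodes.
updated = %{doc | nodes: doc.nodes ++ [fragment]}

html = MDEx.to_html!(updated)
IO.puts(html)
```

The rendered HTML should contain both the original heading and the injected one.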
Same as parse_fragment/2 but raises if the parsing fails.
Experimental
Consider this function experimental and subject to change.
@spec to_commonmark(MDEx.Document.t()) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()}
Convert an AST to CommonMark using default options.
To customize the output, use to_commonmark/2.
Example
iex> MDEx.to_commonmark(%MDEx.Document{nodes: [%MDEx.Heading{nodes: [%MDEx.Text{literal: "Hello"}], level: 3, setext: false}]})
{:ok, "### Hello"}
@spec to_commonmark(MDEx.Document.t(), keyword()) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()}
Convert an AST to CommonMark with custom options.
Options
See the Options section for the available options.
@spec to_commonmark!(MDEx.Document.t()) :: String.t()
Same as to_commonmark/1 but raises MDEx.DecodeError if the conversion fails.
@spec to_commonmark!(MDEx.Document.t(), keyword()) :: String.t()
Same as to_commonmark/2 but raises MDEx.DecodeError if the conversion fails.
@spec to_html(md_or_doc :: String.t() | MDEx.Document.t()) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()} | {:error, MDEx.InvalidInputError.t()}
Convert Markdown or MDEx.Document to HTML using default options.
Use to_html/2 to pass options and customize the generated HTML.
Examples
iex> MDEx.to_html("# MDEx")
{:ok, "<h1>MDEx</h1>"}
iex> MDEx.to_html("Implemented with:\n1. Elixir\n2. Rust")
{:ok, "<p>Implemented with:</p>\n<ol>\n<li>Elixir</li>\n<li>Rust</li>\n</ol>"}
iex> MDEx.to_html(%MDEx.Document{nodes: [%MDEx.Heading{nodes: [%MDEx.Text{literal: "MDEx"}], level: 3, setext: false}]})
{:ok, "<h3>MDEx</h3>"}
Fragments of a document are also supported:
iex> MDEx.to_html(%MDEx.Paragraph{nodes: [%MDEx.Text{literal: "MDEx"}]})
{:ok, "<p>MDEx</p>"}
@spec to_html(md_or_doc :: String.t() | MDEx.Document.t(), opts :: keyword()) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()} | {:error, MDEx.InvalidInputError.t()}
Convert Markdown or MDEx.Document to HTML using custom options.
Options
See the Options section for the available options.
Examples
iex> MDEx.to_html("Hello ~world~ there", extension: [strikethrough: true])
{:ok, "<p>Hello <del>world</del> there</p>"}
iex> MDEx.to_html("<marquee>visit https://beaconcms.org</marquee>", extension: [autolink: true], render: [unsafe_: true])
{:ok, "<p><marquee>visit <a href=\"https://beaconcms.org\">https://beaconcms.org</a></marquee></p>"}
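Since to_html/2 returns tagged tuples, callers that should not crash on bad input can match on both shapes from the @spec. The sketch below is illustrative only: the Render module and html_or_empty/2 are hypothetical names, the empty-string fallback is an arbitrary choice, and the 0.3.x version pin is an assumption.

```elixir
Mix.install([{:mdex, "~> 0.3.0"}])

defmodule Render do
  # Hypothetical wrapper (not part of MDEx): returns the HTML on success,
  # or logs the error to stderr and returns an empty string.
  def html_or_empty(markdown, opts \\ []) do
    case MDEx.to_html(markdown, opts) do
      {:ok, html} ->
        html

      {:error, error} ->
        IO.puts(:stderr, "could not render Markdown: #{inspect(error)}")
        ""
    end
  end
end

html = Render.html_or_empty("Hello ~world~ there", extension: [strikethrough: true])
IO.puts(html)
```

Use to_html!/2 instead when a failed conversion should raise.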
@spec to_html!(md_or_doc :: String.t() | MDEx.Document.t()) :: String.t()
Same as to_html/1 but raises an error if the conversion fails.
@spec to_html!(md_or_doc :: String.t() | MDEx.Document.t(), opts :: keyword()) :: String.t()
Same as to_html/2 but raises an error if the conversion fails.
@spec to_xml(md_or_doc :: String.t() | MDEx.Document.t()) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()} | {:error, MDEx.InvalidInputError.t()}
Convert Markdown or MDEx.Document to XML using default options.
Use to_xml/2 to pass options and customize the generated XML.
Examples
iex> {:ok, xml} = MDEx.to_xml("# MDEx")
iex> xml
"""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<heading level="1">
<text xml:space="preserve">MDEx</text>
</heading>
</document>
"""
iex> {:ok, xml} = MDEx.to_xml("Implemented with:\n1. Elixir\n2. Rust")
iex> xml
"""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<text xml:space="preserve">Implemented with:</text>
</paragraph>
<list type="ordered" start="1" delim="period" tight="true">
<item>
<paragraph>
<text xml:space="preserve">Elixir</text>
</paragraph>
</item>
<item>
<paragraph>
<text xml:space="preserve">Rust</text>
</paragraph>
</item>
</list>
</document>
"""
iex> {:ok, xml} = MDEx.to_xml(%MDEx.Document{nodes: [%MDEx.Heading{nodes: [%MDEx.Text{literal: "MDEx"}], level: 3, setext: false}]})
iex> xml
"""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<heading level="3">
<text xml:space="preserve">MDEx</text>
</heading>
</document>
"""
Fragments of a document are also supported:
iex> {:ok, xml} = MDEx.to_xml(%MDEx.Paragraph{nodes: [%MDEx.Text{literal: "MDEx"}]})
iex> xml
"""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<text xml:space="preserve">MDEx</text>
</paragraph>
</document>
"""
@spec to_xml(md_or_doc :: String.t() | MDEx.Document.t(), opts :: keyword()) :: {:ok, String.t()} | {:error, MDEx.DecodeError.t()} | {:error, MDEx.InvalidInputError.t()}
Convert Markdown or MDEx.Document to XML using custom options.
Options
See the Options section for the available options.
Examples
iex> {:ok, xml} = MDEx.to_xml("Hello ~world~ there", extension: [strikethrough: true])
iex> xml
"""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<text xml:space="preserve">Hello </text>
<strikethrough>
<text xml:space="preserve">world</text>
</strikethrough>
<text xml:space="preserve"> there</text>
</paragraph>
</document>
"""
iex> {:ok, xml} = MDEx.to_xml("<marquee>visit https://beaconcms.org</marquee>", extension: [autolink: true], render: [unsafe_: true])
iex> xml
"""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<html_inline xml:space="preserve">&lt;marquee&gt;</html_inline>
<text xml:space="preserve">visit </text>
<link destination="https://beaconcms.org" title="">
<text xml:space="preserve">https://beaconcms.org</text>
</link>
<html_inline xml:space="preserve">&lt;/marquee&gt;</html_inline>
</paragraph>
</document>
"""
@spec to_xml!(md_or_doc :: String.t() | MDEx.Document.t()) :: String.t()
Same as to_xml/1 but raises an error if the conversion fails.
@spec to_xml!(md_or_doc :: String.t() | MDEx.Document.t(), opts :: keyword()) :: String.t()
Same as to_xml/2 but raises an error if the conversion fails.
@spec traverse_and_update(MDEx.Document.t(), (MDEx.Document.md_node() -> MDEx.Document.md_node())) :: MDEx.Document.t()
Traverse and update the Markdown document preserving the tree structure format.
Examples
Traverse an entire Markdown document:
iex> import MDEx.Sigil
iex> doc = ~M"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> MDEx.traverse_and_update(doc, fn
...> %MDEx.Code{literal: "elixir"} = node -> %{node | literal: "ex"}
...> %MDEx.Code{literal: "rust"} = node -> %{node | literal: "rs"}
...> node -> node
...> end)
%MDEx.Document{
nodes: [
%MDEx.Heading{nodes: [%MDEx.Text{literal: "Languages"}], level: 1, setext: false},
%MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "ex"}]},
%MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "rs"}]}
]
}
Or fragments of a document:
iex> fragment = MDEx.parse_fragment!("Lang: `elixir`")
iex> MDEx.traverse_and_update(fragment, fn
...> %MDEx.Code{literal: "elixir"} = node -> %{node | literal: "ex"}
...> node -> node
...> end)
%MDEx.Paragraph{nodes: [%MDEx.Text{literal: "Lang: "}, %MDEx.Code{num_backticks: 1, literal: "ex"}]}
@spec traverse_and_update(MDEx.Document.t(), term(), (MDEx.Document.md_node() -> MDEx.Document.md_node())) :: MDEx.Document.t()
Traverse and update the Markdown document preserving the tree structure format and keeping an accumulator.
Example
iex> import MDEx.Sigil
iex> doc = ~M"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> MDEx.traverse_and_update(doc, 0, fn
...> %MDEx.Code{literal: "elixir"} = node, acc -> {%{node | literal: "ex"}, acc + 1}
...> %MDEx.Code{literal: "rust"} = node, acc -> {%{node | literal: "rs"}, acc + 1}
...> node, acc -> {node, acc}
...> end)
{%MDEx.Document{
nodes: [
%MDEx.Heading{nodes: [%MDEx.Text{literal: "Languages"}], level: 1, setext: false},
%MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "ex"}]},
%MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "rs"}]}
]
}, 2}
Also works with fragments.
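The accumulator is handy when an update also needs to report something about the document. As a hedged sketch, following the callback shape used in the example above (the function receives the node and the accumulator and returns both), the code below shifts every heading down one level and counts how many headings it changed. The input text and the 0.3.x version pin are assumptions.

```elixir
Mix.install([{:mdex, "~> 0.3.0"}])

doc = MDEx.parse_document!("# Title\n\n## Section\n\nSome text")

# Shift every heading down one level (capped at 6) and count how many
# headings were changed along the way.
{shifted, count} =
  MDEx.traverse_and_update(doc, 0, fn
    %MDEx.Heading{level: level} = node, acc ->
      {%{node | level: min(level + 1, 6)}, acc + 1}

    node, acc ->
      {node, acc}
  end)

html = MDEx.to_html!(shifted)
IO.inspect(count)
IO.puts(html)
```

This kind of transformation is useful when embedding a document inside a page that already has its own top-level heading.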