MDEx.Document (MDEx v0.8.5)

Tree representation of a Markdown document.

%MDEx.Document{
  nodes: [
    %MDEx.Paragraph{
      nodes: [
        %MDEx.Code{num_backticks: 1, literal: "Elixir"}
      ]
    }
  ]
}

Each node may contain attributes and child nodes. In the example above, MDEx.Document contains an MDEx.Paragraph node, which in turn contains an MDEx.Code node with the attributes :num_backticks and :literal.

You can check out each node's documentation in the Document Nodes section, for example MDEx.HtmlBlock.

The MDEx.Document module represents the root of a document and implements several behaviours and protocols to enable operations to fetch, update, and manipulate the document tree.

In these examples we will be using the ~MD sigil.

Tree Traversal

Understanding tree traversal is fundamental to working with MDEx documents, as it affects how all Enum functions, Access operations, and other protocols behave.

The document tree is enumerated using depth-first pre-order traversal. This means:

  1. The parent node is visited first
  2. Then each child node is visited recursively
  3. Children are processed in the order they appear in the :nodes list

This traversal order affects all Enum functions, including Enum.at/2, Enum.map/2, Enum.find/2, etc.

iex> doc = ~MD[# Hello]
iex> Enum.at(doc, 0)
%MDEx.Document{nodes: [%MDEx.Heading{nodes: [%MDEx.Text{literal: "Hello"}], level: 1, setext: false}]}
iex> Enum.at(doc, 1)
%MDEx.Heading{nodes: [%MDEx.Text{literal: "Hello"}], level: 1, setext: false}
iex> Enum.at(doc, 2)
%MDEx.Text{literal: "Hello"}

More complex traversal with nested elements:

iex> doc = ~MD[**bold** text]
iex> Enum.at(doc, 0)
%MDEx.Document{nodes: [%MDEx.Paragraph{nodes: [%MDEx.Strong{nodes: [%MDEx.Text{literal: "bold"}]}, %MDEx.Text{literal: " text"}]}]}
iex> Enum.at(doc, 1)
%MDEx.Paragraph{nodes: [%MDEx.Strong{nodes: [%MDEx.Text{literal: "bold"}]}, %MDEx.Text{literal: " text"}]}
iex> Enum.at(doc, 2)
%MDEx.Strong{nodes: [%MDEx.Text{literal: "bold"}]}
iex> Enum.at(doc, 3)
%MDEx.Text{literal: "bold"}
iex> Enum.at(doc, 4)
%MDEx.Text{literal: " text"}
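
The traversal order can also be illustrated without MDEx. The following is a minimal sketch over plain maps; the PreOrderSketch module and the :name key are hypothetical names used only for illustration, and this is not necessarily how MDEx implements its traversal.

```elixir
defmodule PreOrderSketch do
  # Hypothetical helper, not part of MDEx: flattens a tree of maps into
  # depth-first pre-order. Any map with a :nodes list is treated as a parent.
  def flatten(%{nodes: children} = node) when is_list(children) do
    # Visit the parent first, then each child subtree in list order.
    [node | Enum.flat_map(children, &flatten/1)]
  end

  # Leaf nodes (no :nodes key) are visited on their own.
  def flatten(node), do: [node]
end

# Mirrors the shape of the "**bold** text" example using plain maps.
tree = %{
  name: :document,
  nodes: [
    %{
      name: :paragraph,
      nodes: [
        %{name: :strong, nodes: [%{name: :text, literal: "bold"}]},
        %{name: :text, literal: " text"}
      ]
    }
  ]
}

order = tree |> PreOrderSketch.flatten() |> Enum.map(& &1.name)
IO.inspect(order)
# => [:document, :paragraph, :strong, :text, :text]
```

The position of each entry in the flattened list corresponds to the index passed to Enum.at/2 in the examples above.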

Enumerable

The Enumerable protocol allows us to call Enum functions to iterate over and manipulate the document tree. All enumeration follows the depth-first traversal order described above.

Count the nodes in a document:

iex> doc = ~MD"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> Enum.count(doc)
7

Count how many nodes have the :literal attribute:

iex> doc = ~MD"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> Enum.reduce(doc, 0, fn
...>   %{literal: _literal}, acc -> acc + 1
...>
...>   _node, acc -> acc
...> end)
3

Check if a node is a member of the document:

iex> doc = ~MD"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> Enum.member?(doc, %MDEx.Code{literal: "elixir", num_backticks: 1})
true

Map each node to its module name:

iex> doc = ~MD"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> Enum.map(doc, fn %node{} -> inspect(node) end)
["MDEx.Document", "MDEx.Heading", "MDEx.Text", "MDEx.Paragraph", "MDEx.Code", "MDEx.Paragraph", "MDEx.Code"]
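
For intuition on why Enum functions work on a tree at all, here is a minimal sketch of an Enumerable implementation for a hypothetical TreeSketch struct. It is an illustration under stated assumptions, not MDEx's actual implementation: it flattens the tree in depth-first pre-order and delegates to the list implementation that ships with Elixir.

```elixir
defmodule TreeSketch do
  # Hypothetical struct used only for this sketch; not an MDEx type.
  defstruct name: nil, nodes: []

  # Depth-first pre-order: the node itself, then each child subtree in order.
  def to_list(%TreeSketch{nodes: nodes} = node) do
    [node | Enum.flat_map(nodes, &to_list/1)]
  end
end

defimpl Enumerable, for: TreeSketch do
  # Reduce over the flattened node list using the built-in list implementation.
  def reduce(tree, acc, fun) do
    tree |> TreeSketch.to_list() |> Enumerable.List.reduce(acc, fun)
  end

  # Returning {:error, __MODULE__} tells Enum to fall back to reduce/3.
  def count(_tree), do: {:error, __MODULE__}
  def member?(_tree, _node), do: {:error, __MODULE__}
  def slice(_tree), do: {:error, __MODULE__}
end

tree = %TreeSketch{
  name: :document,
  nodes: [%TreeSketch{name: :heading, nodes: [%TreeSketch{name: :text}]}]
}

IO.inspect(Enum.count(tree))
# => 3
IO.inspect(Enum.map(tree, & &1.name))
# => [:document, :heading, :text]
```

Because every Enum function is built on reduce/3, defining the traversal once is enough to make functions such as Enum.count/1, Enum.map/2, and Enum.at/2 agree on the same ordering.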

Collectable

The Collectable protocol allows you to build documents by collecting nodes or merging multiple documents together. This is particularly useful for programmatically constructing documents from various sources.

Merge two documents together using Enum.into/2:

iex> first_doc = ~MD[# First Document]
iex> second_doc = ~MD[# Second Document]
iex> Enum.into(second_doc, first_doc)
%MDEx.Document{
  nodes: [
    %MDEx.Heading{nodes: [%MDEx.Text{literal: "First Document"}], level: 1, setext: false},
    %MDEx.Heading{nodes: [%MDEx.Text{literal: "Second Document"}], level: 1, setext: false}
  ]
}

Collect individual nodes into a document:

iex> chunks = [
...>   %MDEx.Text{literal: "Hello "},
...>   %MDEx.Code{literal: "world", num_backticks: 1}
...> ]
iex> document = Enum.into(chunks, %MDEx.Document{})
%MDEx.Document{
  nodes: [
    %MDEx.Text{literal: "Hello "},
    %MDEx.Code{literal: "world", num_backticks: 1}
  ]
}
iex> MDEx.to_html!(document)
"Hello <code>world</code>"

Build a document incrementally by collecting mixed content:

iex> chunks = [
...>   %MDEx.Heading{nodes: [%MDEx.Text{literal: "Title"}], level: 1, setext: false},
...>   %MDEx.Paragraph{nodes: []},
...>   %MDEx.Text{literal: "Some text"},
...>   %MDEx.ListItem{nodes: [%MDEx.Text{literal: "Item 1"}]},
...>   %MDEx.Text{literal: " - WIP"},
...> ]
iex> document = Enum.into(chunks, %MDEx.Document{})
%MDEx.Document{
  nodes: [
    %MDEx.Heading{
      level: 1,
      nodes: [%MDEx.Text{literal: "Title"}],
      setext: false
    },
    %MDEx.Paragraph{
      nodes: [%MDEx.Text{literal: "Some text"}]
    },
    %MDEx.List{
      bullet_char: "-",
      delimiter: :period,
      is_task_list: false,
      list_type: :bullet,
      marker_offset: 0,
      nodes: [%MDEx.ListItem{nodes: [%MDEx.Text{literal: "Item 1 - WIP"}], list_type: :bullet, marker_offset: 0, padding: 2, start: 1, delimiter: :period, bullet_char: "-", tight: true, is_task_list: false}],
      padding: 2,
      start: 1,
      tight: true
    }
  ]
}
iex> MDEx.to_html!(document)
"<h1>Title</h1>\n<p>Some text</p>\n<ul>\n<li>Item 1 - WIP</li>\n</ul>"

Access

The Access behaviour gives you the ability to fetch and update nodes using different types of keys. Access operations also follow the depth-first traversal order when searching through nodes.

Access by Index

You can access nodes by their position in the depth-first traversal using integer indices:

iex> doc = ~MD[# Hello]
iex> doc[0]  # First node (the document itself)
%MDEx.Document{nodes: [%MDEx.Heading{nodes: [%MDEx.Text{literal: "Hello"}], level: 1, setext: false}]}
iex> doc[1]  # Second node (the heading)
%MDEx.Heading{nodes: [%MDEx.Text{literal: "Hello"}], level: 1, setext: false}
iex> doc[2]  # Third node (the text)
%MDEx.Text{literal: "Hello"}

Negative indices access nodes from the end:

iex> doc = ~MD[# Hello **world**]
iex> doc[-1]  # Last node
%MDEx.Text{literal: "world"}

Access by Node Type

Starting with a simple Markdown document, let's fetch only the text node by matching the MDEx.Text node:

iex> ~MD[# Hello][%MDEx.Text{literal: "Hello"}]
[%MDEx.Text{literal: "Hello"}]

That's essentially the same as:

doc = %MDEx.Document{nodes: [%MDEx.Heading{nodes: [%MDEx.Text{literal: "Hello"}], level: 1, setext: false}]}

Enum.filter(
  doc,
  fn node -> node == %MDEx.Text{literal: "Hello"} end
)

The key can also be modules, atoms, and even functions! For example:

Fetch all Code nodes, either by MDEx.Code module or the :code atom representing the Code node:

iex> doc = ~MD"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> doc[MDEx.Code]
[%MDEx.Code{num_backticks: 1, literal: "elixir"}, %MDEx.Code{num_backticks: 1, literal: "rust"}]
iex> doc[:code]
[%MDEx.Code{num_backticks: 1, literal: "elixir"}, %MDEx.Code{num_backticks: 1, literal: "rust"}]

Dynamically fetch Code nodes where the :literal (node content) starts with "eli" using a function to filter the result:

iex> doc = ~MD"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...> """
iex> doc[fn node -> String.starts_with?(Map.get(node, :literal, ""), "eli") end]
[%MDEx.Code{num_backticks: 1, literal: "elixir"}]

That's the most flexible option, for cases where structs, modules, or atoms are not enough to match the node you want.

The Access behaviour also allows us to update nodes that match a selector. In the example below we'll capitalize the content of all MDEx.Code nodes:

iex> doc = ~MD"""
...> # Languages
...>
...> `elixir`
...>
...> `rust`
...>
...> Continue...
...> """
iex> update_in(doc, [:document, Access.key!(:nodes), Access.all(), :code, Access.key!(:literal)], fn literal ->
...>   String.upcase(literal)
...> end)
%MDEx.Document{
  nodes: [
    %MDEx.Heading{nodes: [%MDEx.Text{literal: "Languages"}], level: 1, setext: false},
    %MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "ELIXIR"}]},
    %MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "RUST"}]},
    %MDEx.Paragraph{nodes: [%MDEx.Text{literal: "Continue..."}]}
  ]
}

String.Chars

Calling Kernel.to_string/1 formats the document as CommonMark text:

iex> to_string(~MD[# Hello])
"# Hello"

Fragments (nodes without the parent %Document{}) are also formatted:

iex> to_string(%MDEx.Heading{nodes: [%MDEx.Text{literal: "Hello"}], level: 1})
"# Hello"

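Because string interpolation goes through the String.Chars protocol, a document can be embedded directly in a string. The sketch below assumes MDEx is installed from Hex with a version requirement matching this page (v0.8.5), and uses MDEx.parse_document!/1 instead of the sigil so that it runs as a standalone script.

```elixir
# Assumption: MDEx is fetched from Hex; the requirement is inferred from the page header.
Mix.install([{:mdex, "~> 0.8"}])

doc = MDEx.parse_document!("# Hello")

# Interpolation calls to_string/1 on the document, which dispatches to String.Chars.
message = "Rendered: #{doc}"
IO.puts(message)
```
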
Traverse and Update

You can also use the low-level MDEx.traverse_and_update/2 and MDEx.traverse_and_update/3 APIs to traverse each node of the AST and either update the nodes or do some calculation with an accumulator.
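
As a rough illustration of both arities, the sketch below upcases every inline code node and, separately, counts them. It assumes MDEx is installed from Hex with a version requirement matching this page, that the /2 callback returns the replacement node, and that the /3 callback returns a {node, acc} tuple with the call returning {document, acc}; check the MDEx.traverse_and_update documentation for the authoritative contract.

```elixir
# Assumption: MDEx is fetched from Hex; the requirement is inferred from the page header.
Mix.install([{:mdex, "~> 0.8"}])

doc =
  MDEx.parse_document!("""
  # Languages

  `elixir`

  `rust`
  """)

# traverse_and_update/2: the function receives each node and returns the node to keep.
upcased =
  MDEx.traverse_and_update(doc, fn
    %MDEx.Code{literal: literal} = node -> %{node | literal: String.upcase(literal)}
    node -> node
  end)

# traverse_and_update/3: the function also threads an accumulator through the traversal.
{_doc, code_count} =
  MDEx.traverse_and_update(doc, 0, fn
    %MDEx.Code{} = node, acc -> {node, acc + 1}
    node, acc -> {node, acc}
  end)

IO.inspect(Enum.map(upcased[MDEx.Code], & &1.literal))
IO.inspect(code_count)
```

The /3 variant is convenient when a transformation also needs to collect something along the way, such as a count or a list of headings.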

Practical Examples

Here are some common patterns for working with MDEx documents that combine the protocols described above.

Update code block nodes matched by a selector function

Prepend a "// Modified" line to Rust code blocks:

iex> doc = ~MD"""
...> # Code Examples
...>
...> ```elixir
...> def hello do
...>   :world
...> end
...> ```
...>
...> ```rust
...> fn main() {
...>   println!("Hello");
...> }
...> ```
...> """
iex> selector = fn
...>   %MDEx.CodeBlock{info: "rust"} -> true
...>   _ -> false
...> end
iex> update_in(doc, [:document, Access.key!(:nodes), Access.all(), selector], fn node ->
...>   %{node | literal: "// Modified\n" <> node.literal}
...> end)
%MDEx.Document{
  nodes: [
    %MDEx.Heading{
      nodes: [%MDEx.Text{literal: "Code Examples"}],
      level: 1,
      setext: false
    },
    %MDEx.CodeBlock{
      info: "elixir",
      literal: "def hello do\n  :world\nend\n"
    },
    %MDEx.CodeBlock{
      info: "rust",
      literal: "// Modified\nfn main() {\n  println!(\"Hello\");\n}\n"
    }
  ]
}

Collect headings by level

iex> doc = ~MD"""
...> # Main Title
...>
...> ## Section 1
...>
...> ### Subsection
...>
...> ## Section 2
...> """
iex> Enum.reduce(doc, %{}, fn
...>   %MDEx.Heading{level: level, nodes: [%MDEx.Text{literal: text}]}, acc ->
...>     Map.update(acc, level, [text], &[text | &1])
...>   _node, acc -> acc
...> end)
%{
  1 => ["Main Title"],
  2 => ["Section 2", "Section 1"],
  3 => ["Subsection"]
}

Extract and transform task list items

iex> doc = ~MD"""
...> # Todo List
...>
...> - [ ] Buy groceries
...> - [x] Call mom
...> - [ ] Read book
...> """
iex> Enum.map(doc, fn
...>   %MDEx.TaskItem{checked: checked, nodes: [%MDEx.Paragraph{nodes: [%MDEx.Text{literal: text}]}]} ->
...>     {checked, text}
...>   _ -> nil
...> end)
...> |> Enum.reject(&is_nil/1)
[
  {false, "Buy groceries"},
  {true, "Call mom"},
  {false, "Read book"}
]

Bump all heading levels, except level 6

iex> doc = ~MD"""
...> # Main Title
...>
...> ## Subtitle
...>
...> ###### Notes
...> """
iex> selector = fn
...>   %MDEx.Heading{level: level} when level < 6 -> true
...>   _ -> false
...> end
iex> update_in(doc, [:document, Access.key!(:nodes), Access.all(), selector], fn node ->
...>   %{node | level: node.level + 1}
...> end)
%MDEx.Document{
  nodes: [
    %MDEx.Heading{nodes: [%MDEx.Text{literal: "Main Title"}], level: 2, setext: false},
    %MDEx.Heading{nodes: [%MDEx.Text{literal: "Subtitle"}], level: 3, setext: false},
    %MDEx.Heading{nodes: [%MDEx.Text{literal: "Notes"}], level: 6, setext: false}
  ]
}

Types

md_node()

Fragment of a Markdown document, a single node. May contain children nodes.

selector()

@type selector() :: md_node() | module() | atom() | (md_node() -> boolean())

Selector used to match nodes in the document.

A valid selector is a node struct, a node module, an atom representing the node name, or a function that receives a node and returns a boolean.

See MDEx.Document for more info and examples.

t()

@type t() :: %MDEx.Document{nodes: [md_node()]}

Tree root of a Markdown document, including all children nodes.

Functions

fetch(document, selector)

@spec fetch(t(), selector()) :: {:ok, [md_node()]} | :error

Callback implementation for Access.fetch/2.

See the Access section for examples.
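
A minimal sketch of calling fetch/2 directly, based only on the @spec above. It assumes MDEx is installed from Hex with a version requirement matching this page; the bracket syntax used elsewhere on this page goes through the same callback.

```elixir
# Assumption: MDEx is fetched from Hex; the requirement is inferred from the page header.
Mix.install([{:mdex, "~> 0.8"}])

doc = MDEx.parse_document!("`elixir` and `rust`")

# fetch/2 wraps the matching nodes in an :ok tuple, per the @spec.
{:ok, codes} = MDEx.Document.fetch(doc, MDEx.Code)
IO.inspect(Enum.map(codes, & &1.literal))

# The bracket syntax unwraps the same result via Access.
IO.inspect(doc[MDEx.Code] == codes)
```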

get_and_update(document, selector, fun)

Callback implementation for Access.get_and_update/3.

See the Access section for examples.

pop(document, key, default \\ nil)

Callback implementation for Access.pop/2.

See the Access section for examples.

wrap(document)