Document Model Architecture

Quillon uses an AST (Abstract Syntax Tree) to represent document structures. This guide documents the architecture and extension strategies for supporting rich content elements.


Current Architecture

AST Structure

Documents are represented as nested Elixir tuples following a strict grammar:

{:type, attrs, children}

Example:

{:document, %{id: "doc_123", name: "Welcome"},
  [
    {:heading, %{level: 1}, [
      {:text, %{text: "Hello ", marks: []}, []},
      {:text, %{text: "World", marks: [:bold]}, []}
    ]},
    {:paragraph, %{}, [
      {:text, %{text: "This is a ", marks: []}, []},
      {:text, %{text: "rich text", marks: [:bold, :italic]}, []},
      {:text, %{text: " editor.", marks: []}, []}
    ]}
  ]}
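
Because every node has the same three-element shape, most traversals reduce to two function clauses: one for text nodes and one for everything else. The following is a minimal sketch of plain-text extraction; `PlainTextSketch` is a hypothetical module used only for illustration, not part of Quillon's documented API.

```elixir
defmodule PlainTextSketch do
  # Hypothetical helper for illustration only.

  # A text node carries its content in attrs and has no children.
  def extract({:text, %{text: text}, []}), do: text

  # Any other node: concatenate whatever its children produce.
  def extract({_type, _attrs, children}) when is_list(children) do
    Enum.map_join(children, "", &extract/1)
  end
end

doc =
  {:document, %{id: "doc_123", name: "Welcome"}, [
    {:heading, %{level: 1}, [
      {:text, %{text: "Hello ", marks: []}, []},
      {:text, %{text: "World", marks: [:bold]}, []}
    ]}
  ]}

PlainTextSketch.extract(doc)
# => "Hello World"
```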

Element Categories

| Category | Types | Purpose |
|----------|-------|---------|
| Block | heading, paragraph, blockquote, callout, code_block, divider, image, video, bullet_list, ordered_list, table | Vertical stacking elements |
| Inline | text | Text content with marks |
| Container | document, list_item, table_row, table_cell | Structural containers |

Type Definitions

@block_types [
  :heading, :paragraph, :blockquote, :callout, :code_block, :divider,
  :image, :video, :embed,
  :bullet_list, :ordered_list,
  :table
]

@inline_types [:text]

@container_types [:document, :list_item, :table_row, :table_cell]

Key Operations

All operations are immutable: each one returns a new AST and leaves the original untouched.

# Path-based access (indices into children)
Quillon.get(doc, [0, 2])           # Get node at path
Quillon.update(doc, [0, 2], fun)   # Update node at path (fun is a function value)
Quillon.insert(doc, [0, 3], node)  # Insert at position
Quillon.delete(doc, [0, 2])        # Delete node
Quillon.reorder(doc, [0], ids)     # Reorder children by ID list
Quillon.move(doc, [0, 1], [1, 0])  # Move between positions

# ID-based access
Quillon.find_path(doc, "node_id")        # Find path to node
Quillon.get_by_id(doc, "id")             # Get node by ID
Quillon.update_by_id(doc, "id", fun)     # Update by ID (fun is a function value)

# Factory functions
Quillon.new(:document, %{name: "My Doc"})
Quillon.new(:heading, %{level: 1}, "Title")
Quillon.new(:paragraph, "Some text content")
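
To illustrate how path-based access can stay immutable, here is a small sketch of `get` and `update` over child indices. `PathSketch` is a hypothetical stand-in; Quillon's actual implementation and error handling may differ. Only the tuples along the path are rebuilt, and sibling subtrees are reused unchanged.

```elixir
defmodule PathSketch do
  # Hypothetical stand-in for path-based access, for illustration only.

  # An empty path addresses the node itself.
  def get(node, []), do: node

  def get({_type, _attrs, children}, [index | rest]) do
    case Enum.at(children, index) do
      nil -> nil
      child -> get(child, rest)
    end
  end

  # Apply `fun` to the node at `path`, rebuilding each ancestor tuple.
  def update(node, [], fun), do: fun.(node)

  def update({type, attrs, children}, [index | rest], fun) do
    {type, attrs, List.update_at(children, index, &update(&1, rest, fun))}
  end
end

doc =
  {:document, %{}, [
    {:paragraph, %{}, [{:text, %{text: "Hi", marks: []}, []}]}
  ]}

updated =
  PathSketch.update(doc, [0, 0], fn {:text, attrs, []} ->
    {:text, %{attrs | marks: [:bold]}, []}
  end)

PathSketch.get(updated, [0, 0])
# => {:text, %{text: "Hi", marks: [:bold]}, []}

PathSketch.get(doc, [0, 0])
# => {:text, %{text: "Hi", marks: []}, []}   (the original is unchanged)
```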

JSON Serialization

# AST → JSON
Quillon.to_json({:paragraph, %{}, [{:text, %{text: "Hello", marks: [:bold]}, []}]})
# => %{"type" => "paragraph", "attrs" => %{}, "children" => [
#      %{"type" => "text", "attrs" => %{"text" => "Hello", "marks" => ["bold"]}, "children" => []}
#    ]}

# JSON → AST
Quillon.from_json(%{"type" => "paragraph", "attrs" => %{}, "children" => [...]})
# => {:paragraph, %{}, [...]}

Block Elements

Heading

{:heading, %{level: 2}, [
  {:text, %{text: "Section Title", marks: []}, []}
]}

| Attribute | Type | Default | Description |
|-----------|------|---------|-------------|
| level | integer | 2 | Heading level (1-6) |

Children: inline content (text nodes with marks)

Paragraph

{:paragraph, %{}, [
  {:text, %{text: "Hello ", marks: []}, []},
  {:text, %{text: "world", marks: [:bold]}, []}
]}

Children: inline content (text nodes with marks)

Divider

{:divider, %{style: :solid}, []}

| Attribute | Type | Default | Description |
|-----------|------|---------|-------------|
| style | atom | :solid | Line style (:solid, :dashed, :dotted) |

Design Goals

  1. Pure Elixir - No JavaScript dependencies in the core library
  2. Immutable operations - All transforms return new AST, never mutate
  3. Structured inline content - Text nodes with marks, not markdown strings
  4. Schema validation - Content expressions define valid structures
  5. Framework agnostic - Core library works without Phoenix/LiveView
  6. CRDT-ready - Structure supports collaborative editing (via separate package)

Block Types Reference

A block element occupies its own vertical space, so blocks stack from top to bottom. Inline elements, by contrast, flow within a line of text inside a block.

Block vs Inline:

{:heading, ...}                          # block

{:paragraph, %{}, [                      # block
  {:text, %{text: "Hello "}, []},        #   inline
  {:text, %{text: "world",
            marks: [:bold]}, []}         #   inline
]}

{:image, ...}                            # block

Block Type Categories

@block_types %{
  # Text blocks
  text: [:heading, :paragraph, :blockquote, :callout, :code_block],

  # Media
  media: [:image, :video, :embed],

  # Lists
  list: [:bullet_list, :ordered_list],

  # Tables
  table: [:table],

  # Structural
  structural: [:divider]
}

Blockquote

{:blockquote, %{citation: "Shakespeare"}, [
  {:paragraph, %{}, [
    {:text, %{text: "To be or not to be", marks: []}, []}
  ]}
]}

| Attribute | Type | Default | Description |
|-----------|------|---------|-------------|
| citation | string | nil | Attribution |

Children: paragraph blocks

Callout

{:callout, %{type: :info, title: "Note"}, [
  {:paragraph, %{}, [
    {:text, %{text: "Important information", marks: []}, []}
  ]}
]}

| Attribute | Type | Default | Description |
|-----------|------|---------|-------------|
| type | atom | :info | Callout type (:info, :warning, :success, :error) |
| title | string | nil | Optional title |

Children: paragraph blocks

Code Block

{:code_block, %{code: "def hello, do: :world", language: "elixir"}, []}

| Attribute | Type | Default | Description |
|-----------|------|---------|-------------|
| code | string | required | Code content |
| language | string | nil | Syntax highlighting language |

Children: none (code stored in attrs)

Image

{:image, %{src: "/uploads/photo.jpg", alt: "Photo", caption: "A nice photo"}, []}
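
| Attribute | Type | Default | Description |
|-----------|------|---------|-------------|
| src | string | required | Image URL |
| alt | string | "" | Alt text |
| caption | string | nil | Optional caption |
| width | integer | nil | Width in pixels |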

Children: none

Lists

{:bullet_list, %{}, [
  {:list_item, %{}, [
    {:paragraph, %{}, [{:text, %{text: "First item", marks: []}, []}]}
  ]},
  {:list_item, %{}, [
    {:paragraph, %{}, [{:text, %{text: "Second item", marks: []}, []}]}
  ]}
]}

{:ordered_list, %{start: 1}, [
  {:list_item, %{}, [
    {:paragraph, %{}, [{:text, %{text: "Step one", marks: []}, []}]}
  ]},
  {:list_item, %{}, [
    {:paragraph, %{}, [{:text, %{text: "Step two", marks: []}, []}]}
  ]}
]}

Tables

{:table, %{}, [
  {:table_row, %{header: true}, [
    {:table_cell, %{}, [
      {:paragraph, %{}, [{:text, %{text: "Name", marks: []}, []}]}
    ]},
    {:table_cell, %{}, [
      {:paragraph, %{}, [{:text, %{text: "Email", marks: []}, []}]}
    ]}
  ]},
  {:table_row, %{}, [
    {:table_cell, %{}, [
      {:paragraph, %{}, [{:text, %{text: "John", marks: []}, []}]}
    ]},
    {:table_cell, %{}, [
      {:paragraph, %{}, [{:text, %{text: "john@example.com", marks: []}, []}]}
    ]}
  ]}
]}

Inline Content (Rich Text)

Inline content uses text nodes with marks, similar to ProseMirror/Tiptap, Lexical, and Slate.

Text Node Structure

{:text, %{text: "content", marks: [mark1, mark2, ...]}, []}

All inline content uses the :text node type. Formatting (including links) is applied via marks.

Example

# "Hello world! Visit our site for more."
#        ^^^^^ bold
#              ^^^^^^^^^^ link

{:paragraph, %{}, [
  {:text, %{text: "Hello ", marks: []}, []},
  {:text, %{text: "world", marks: [:bold]}, []},
  {:text, %{text: "! Visit ", marks: []}, []},
  {:text, %{text: "our site", marks: [{:link, %{href: "https://example.com"}}]}, []},
  {:text, %{text: " for more.", marks: []}, []}
]}

Mark Types

Marks can be simple atoms or tuples with attributes:

# Simple marks (no attributes needed)
:bold
:italic
:underline
:strike
:code
:subscript
:superscript

# Marks with attributes
{:link, %{href: "https://example.com", title: "Link title", target: "_blank"}}
{:highlight, %{color: "yellow"}}
{:font_color, %{color: "#FF5500"}}
{:mention, %{id: "user_123", type: :user, label: "@john"}}

Mark Reference

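| Mark | Type | Attributes | Description |
|------|------|------------|-------------|
| :bold | atom | - | Bold text |
| :italic | atom | - | Italic text |
| :underline | atom | - | Underlined text |
| :strike | atom | - | Strikethrough |
| :code | atom | - | Inline code (monospace) |
| :subscript | atom | - | Subscript text |
| :superscript | atom | - | Superscript text |
| :link | tuple | href, title, target | Hyperlink |
| :highlight | tuple | color | Background highlight |
| :font_color | tuple | color | Text color |
| :mention | tuple | id, type, label | User/item mention |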

Complex Formatted Text

# "Hello world! Click here for more info."
#        ^^^^^ bold
#              ^^^^^^^^^^ bold + italic + link

{:paragraph, %{}, [
  {:text, %{text: "Hello ", marks: []}, []},
  {:text, %{text: "world", marks: [:bold]}, []},
  {:text, %{text: "! ", marks: []}, []},
  {:text, %{text: "Click here", marks: [:bold, :italic, {:link, %{href: "/info"}}]}, []},
  {:text, %{text: " for more info.", marks: []}, []}
]}
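
Rendering such a paragraph comes down to wrapping each text node once per mark. The sketch below shows one way to do that for HTML; `InlineHtmlSketch` and its tag mapping are assumptions made for illustration and do not describe Quillon's renderer. Marks without a mapping (for example `:subscript` or `{:highlight, ...}`) are simply skipped here.

```elixir
defmodule InlineHtmlSketch do
  # Hypothetical renderer; the tag choices are assumptions.
  @tags %{bold: "strong", italic: "em", underline: "u", strike: "s", code: "code"}

  def render(nodes) when is_list(nodes) do
    Enum.map_join(nodes, "", &render_text/1)
  end

  defp render_text({:text, %{text: text, marks: marks}, []}) do
    # Wrap the escaped text once per mark, innermost first, so the
    # first mark in the list ends up as the outermost element.
    marks
    |> Enum.reverse()
    |> Enum.reduce(escape(text), &wrap/2)
  end

  defp wrap({:link, %{href: href}}, inner) do
    "<a href=\"#{escape(href)}\">#{inner}</a>"
  end

  defp wrap(mark, inner) do
    case Map.fetch(@tags, mark) do
      {:ok, tag} -> "<#{tag}>#{inner}</#{tag}>"
      :error -> inner
    end
  end

  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

InlineHtmlSketch.render([
  {:text, %{text: "Click here", marks: [:bold, :italic, {:link, %{href: "/info"}}]}, []}
])
# => "<strong><em><a href=\"/info\">Click here</a></em></strong>"
```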

Why Structured Text (Not Markdown)

  1. Programmatic manipulation - add/remove formatting without parsing strings
  2. Collaborative editing - CRDT can track changes to individual text nodes
  3. Validation - enforce allowed marks per context
  4. Rendering flexibility - same structure renders to HTML, plain text, or other formats
  5. Cursor positioning - track cursor position within rich text

Mark Configuration

Each mark type has configuration that controls its behavior:

@mark_config %{
  bold: %{
    inclusive: true,       # New text at mark boundary inherits the mark
    keep_on_split: true,   # Mark persists when pressing Enter
    excludes: []           # No conflicts with other marks
  },
  italic: %{
    inclusive: true,
    keep_on_split: true,
    excludes: []
  },
  code: %{
    inclusive: false,      # New text doesn't inherit code formatting
    keep_on_split: false,
    excludes: [:bold, :italic, :underline]  # Code excludes other formatting
  },
  link: %{
    inclusive: false,      # New text doesn't become part of link
    keep_on_split: false,  # Link doesn't span newlines
    excludes: [],
    attrs: [:href, :title, :target]
  },
  highlight: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [],
    attrs: [:color]
  }
}

| Property | Description |
|----------|-------------|
| inclusive | Whether new text typed at mark boundary gets the mark |
| keep_on_split | Whether mark persists when node is split (e.g., pressing Enter) |
| excludes | List of marks that cannot coexist with this mark |
| attrs | List of attributes for marks with data |
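
As an illustration of the `inclusive` property, the sketch below computes which marks newly typed text would carry, given the marks on the text just before the cursor. `InclusiveSketch` and `marks_for_typed_text/1` are hypothetical names, and treating unknown marks as non-inclusive is an assumption of this sketch.

```elixir
defmodule InclusiveSketch do
  # Subset of the configuration above; only :inclusive is consulted here.
  @mark_config %{
    bold: %{inclusive: true},
    italic: %{inclusive: true},
    code: %{inclusive: false},
    link: %{inclusive: false}
  }

  # Keep only the marks whose configuration says they extend to new text.
  def marks_for_typed_text(marks_before) do
    Enum.filter(marks_before, fn mark ->
      get_in(@mark_config, [mark_type(mark), :inclusive]) == true
    end)
  end

  defp mark_type({type, _attrs}), do: type
  defp mark_type(type) when is_atom(type), do: type
end

InclusiveSketch.marks_for_typed_text([:bold, {:link, %{href: "/info"}}])
# => [:bold]
```

Typing at the end of a bold link therefore continues the bold formatting but does not extend the link.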

Text Transforms

Text Splitting Algorithm

When applying a mark to a selection, text nodes are split at the selection boundaries:

defmodule Quillon.Transforms do
  alias Quillon.Normalizer

  @doc """
  Apply a mark to text in a paragraph at the given offsets.
  Splits text nodes at boundaries, applies the mark, then normalizes.
  """
  def apply_mark({:paragraph, attrs, children}, start_offset, end_offset, mark) do
    new_children =
      children
      |> split_at_offset(end_offset)    # Split at END first (preserves start offset)
      |> split_at_offset(start_offset)  # Then split at START
      |> add_mark_in_range(start_offset, end_offset, mark)
      |> Normalizer.normalize()         # normalize/1 is defined in Quillon.Normalizer

    {:paragraph, attrs, new_children}
  end

  defp split_at_offset(nodes, offset) do
    {before, at_offset, after_nodes} = find_node_at_offset(nodes, offset)

    case at_offset do
      nil ->
        nodes

      {:text, %{text: text, marks: marks}, []} ->
        split_pos = offset - total_length(before)

        if split_pos == 0 or split_pos == String.length(text) do
          nodes  # No split needed at boundary
        else
          left_text = String.slice(text, 0, split_pos)
          right_text = String.slice(text, split_pos..-1//1)

          left_node = {:text, %{text: left_text, marks: marks}, []}
          right_node = {:text, %{text: right_text, marks: marks}, []}

          before ++ [left_node, right_node] ++ after_nodes
        end
    end
  end
end
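
The module above calls `find_node_at_offset/2` and `total_length/1` without showing them. The following is one possible sketch of those two helpers, written so that `offset - total_length(before)` yields the split position inside the returned node. `OffsetSketch` is a hypothetical module, and the convention that an offset on a node boundary resolves to the earlier node is an assumption.

```elixir
defmodule OffsetSketch do
  # Hypothetical versions of the helpers referenced above.

  # Total text length (in graphemes) across a list of text nodes.
  def total_length(nodes) do
    nodes
    |> Enum.map(fn {:text, %{text: text}, []} -> String.length(text) end)
    |> Enum.sum()
  end

  # Returns {before, node, after}. `node` is the first text node whose
  # span reaches `offset`; it is nil when the offset lies past the end.
  def find_node_at_offset(nodes, offset), do: find(nodes, offset, [])

  defp find([], _offset, seen), do: {Enum.reverse(seen), nil, []}

  defp find([{:text, %{text: text}, []} = node | rest], offset, seen) do
    len = String.length(text)

    if offset <= len do
      {Enum.reverse(seen), node, rest}
    else
      find(rest, offset - len, [node | seen])
    end
  end
end

nodes = [
  {:text, %{text: "Hello ", marks: []}, []},
  {:text, %{text: "world", marks: [:bold]}, []}
]

{before, node, _after} = OffsetSketch.find_node_at_offset(nodes, 8)
# node   => {:text, %{text: "world", marks: [:bold]}, []}
# 8 - OffsetSketch.total_length(before) => 2, so "world" splits into "wo" and "rld"
```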

Normalization Algorithm

After every edit, normalize the paragraph to merge adjacent text nodes with identical marks:

defmodule Quillon.Normalizer do
  @doc """
  Normalize paragraph content by:
  1. Removing empty text nodes
  2. Merging adjacent text nodes with identical marks
  """
  def normalize(children) when is_list(children) do
    children
    |> Enum.reject(&empty_text_node?/1)
    |> merge_adjacent()
  end

  defp empty_text_node?({:text, %{text: ""}, []}), do: true
  defp empty_text_node?(_), do: false

  defp merge_adjacent([]), do: []
  defp merge_adjacent([node]), do: [node]
  defp merge_adjacent([{:text, a1, []}, {:text, a2, []} | rest]) do
    if marks_equal?(a1.marks, a2.marks) do
      # Merge: combine text, keep marks
      merged = {:text, %{text: a1.text <> a2.text, marks: a1.marks}, []}
      merge_adjacent([merged | rest])
    else
      [{:text, a1, []} | merge_adjacent([{:text, a2, []} | rest])]
    end
  end
  defp merge_adjacent([node | rest]), do: [node | merge_adjacent(rest)]

  @doc """
  Compare two mark lists for equality regardless of order.
  Both lists are sorted into a canonical order before comparison.
  """
  def marks_equal?(marks1, marks2) do
    sort_marks(marks1 || []) == sort_marks(marks2 || [])
  end

  @mark_priority %{bold: 0, italic: 1, underline: 2, strike: 3, code: 4,
                   subscript: 5, superscript: 6, link: 7, highlight: 8}

  defp sort_marks(marks) do
    Enum.sort_by(marks, fn
      m when is_atom(m) -> {@mark_priority[m] || 99, to_string(m)}
      {type, _attrs} -> {@mark_priority[type] || 99, to_string(type)}
    end)
  end
end
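
For a worked example of the two normalization steps, here is a compact alternative formulation built on `Enum.chunk_by/2`. `NormalizeSketch` is a hypothetical module, and it assumes every child is a text node with a `:marks` list; it is a sketch of the same idea rather than the module shown above.

```elixir
defmodule NormalizeSketch do
  # Alternative formulation for illustration; assumes only text nodes.
  def normalize(children) do
    children
    # Step 1: drop empty text nodes.
    |> Enum.reject(fn {:text, %{text: text}, []} -> text == "" end)
    # Step 2: group consecutive nodes whose sorted marks are identical...
    |> Enum.chunk_by(fn {:text, %{marks: marks}, []} -> Enum.sort(marks) end)
    # ...and collapse each group into a single node.
    |> Enum.map(fn [{:text, attrs, []} | _] = run ->
      text = Enum.map_join(run, "", fn {:text, %{text: t}, []} -> t end)
      {:text, %{attrs | text: text}, []}
    end)
  end
end

NormalizeSketch.normalize([
  {:text, %{text: "Hel", marks: [:bold]}, []},
  {:text, %{text: "", marks: []}, []},
  {:text, %{text: "lo", marks: [:bold]}, []},
  {:text, %{text: " world", marks: []}, []}
])
# => [
#   {:text, %{text: "Hello", marks: [:bold]}, []},
#   {:text, %{text: " world", marks: []}, []}
# ]
```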

Toggle Mark Command

A high-level command that checks whether the mark is already active across the whole selection, then removes it if so and applies it otherwise:

defmodule Quillon.Commands do
  alias Quillon.{Transforms, Normalizer}

  def toggle_mark(paragraph, {start_offset, end_offset}, mark) do
    if selection_has_mark?(paragraph, start_offset, end_offset, mark) do
      remove_mark(paragraph, start_offset, end_offset, mark)
    else
      Transforms.apply_mark(paragraph, start_offset, end_offset, mark)
    end
  end

  defp selection_has_mark?({:paragraph, _attrs, children}, start_off, end_off, mark) do
    # Compare by mark type so tuple marks such as {:link, attrs} also match
    target = mark_type(mark)

    # Check if ALL text in range has the mark
    children
    |> nodes_in_range(start_off, end_off)
    |> Enum.all?(fn {:text, %{marks: marks}, []} ->
      target in Enum.map(marks, &mark_type/1)
    end)
  end

  # Reduce a mark to its type: :bold stays :bold, {:link, attrs} becomes :link
  defp mark_type(m) when is_atom(m), do: m
  defp mark_type({type, _attrs}), do: type
end
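
`remove_mark` is called above but not shown. Its core step can be sketched as follows, assuming the children have already been split at the selection boundaries so that each text node lies wholly inside or wholly outside the range. `RemoveMarkSketch` and `remove_mark_in_range/4` are hypothetical names; a full version would split first and normalize afterwards.

```elixir
defmodule RemoveMarkSketch do
  # Assumes `children` were already split at start_offset and end_offset.
  def remove_mark_in_range(children, start_offset, end_offset, mark) do
    target = mark_type(mark)

    {nodes, _final_offset} =
      Enum.map_reduce(children, 0, fn {:text, %{text: text, marks: marks} = attrs, []}, pos ->
        next = pos + String.length(text)

        new_marks =
          if pos >= start_offset and next <= end_offset do
            # Node is inside the selection: drop every mark of the target type.
            Enum.reject(marks, &(mark_type(&1) == target))
          else
            marks
          end

        {{:text, %{attrs | marks: new_marks}, []}, next}
      end)

    nodes
  end

  defp mark_type({type, _attrs}), do: type
  defp mark_type(type) when is_atom(type), do: type
end

children = [
  {:text, %{text: "Hello ", marks: []}, []},
  {:text, %{text: "world", marks: [:bold, :italic]}, []},
  {:text, %{text: "!", marks: [:bold]}, []}
]

RemoveMarkSketch.remove_mark_in_range(children, 6, 11, :bold)
# => [
#   {:text, %{text: "Hello ", marks: []}, []},
#   {:text, %{text: "world", marks: [:italic]}, []},
#   {:text, %{text: "!", marks: [:bold]}, []}
# ]
```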

Schema Validation

Schema validation ensures documents conform to valid structures, similar to ProseMirror's schema system.

Quillon.validate(doc)   # Returns {:ok, doc} or {:error, errors}
Quillon.validate!(doc)  # Returns doc or raises ValidationError

Groups

Groups simplify content rules by categorizing node types:

@groups %{
  # Block-level content
  block: [
    :paragraph, :heading, :blockquote, :callout, :code_block,
    :image, :video, :bullet_list, :ordered_list, :table, :divider
  ],

  # Inline content (text with marks)
  inline: [:text],

  # List items
  list_content: [:list_item]
}

Content Expressions

Content rules are written declaratively as ProseMirror-style content expressions:

| Expression | Meaning |
|------------|---------|
| "block+" | One or more block nodes |
| "block*" | Zero or more block nodes |
| "inline*" | Zero or more inline nodes (text) |
| "paragraph" | Exactly one paragraph |
| "(paragraph \| heading)+" | One or more paragraphs or headings |
| "paragraph block*" | One paragraph followed by zero or more blocks |

Node Schema

@node_schema %{
  # Root type
  document: %{
    content: "block*",
    attrs: [:id, :name]
  },

  # Text container blocks
  paragraph: %{
    content: "inline*",
    group: :block,
    marks: :all
  },
  heading: %{
    content: "inline*",
    group: :block,
    marks: [:bold, :italic, :underline, :strike, :link],
    attrs: [:level]
  },
  blockquote: %{
    content: "paragraph+",
    group: :block,
    attrs: [:citation]
  },
  callout: %{
    content: "paragraph+",
    group: :block,
    attrs: [:type, :title]
  },
  code_block: %{
    content: nil,
    group: :block,
    marks: [],
    attrs: [:code, :language]
  },
  divider: %{
    content: nil,
    group: :block,
    attrs: [:style]
  },

  # Lists
  bullet_list: %{
    content: "list_item+",
    group: :block
  },
  ordered_list: %{
    content: "list_item+",
    group: :block,
    attrs: [:start]
  },
  list_item: %{
    content: "paragraph (bullet_list | ordered_list)?",
    marks: :parent
  },

  # Tables
  table: %{
    content: "table_row+",
    group: :block
  },
  table_row: %{
    content: "table_cell+",
    attrs: [:header]
  },
  table_cell: %{
    content: "paragraph+",
    marks: :all,
    attrs: [:colspan, :rowspan]
  },

  # Media
  image: %{
    content: nil,
    group: :block,
    attrs: [:src, :alt, :caption, :width],
    required_attrs: [:src]
  },
  video: %{
    content: nil,
    group: :block,
    attrs: [:src, :poster],
    required_attrs: [:src]
  },

  # Inline text node
  text: %{
    content: nil,
    group: :inline,
    attrs: [:text, :marks]
  }
}

Mark Schema

The mark schema defines which marks exist and how they behave:

@mark_schema %{
  # Simple formatting marks
  bold: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [],
    attrs: []
  },
  italic: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [],
    attrs: []
  },
  underline: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [],
    attrs: []
  },
  strike: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [],
    attrs: []
  },
  code: %{
    inclusive: false,
    keep_on_split: false,
    excludes: [:bold, :italic, :underline, :strike, :link],  # code is exclusive
    attrs: []
  },
  subscript: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [:superscript],  # can't be both
    attrs: []
  },
  superscript: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [:subscript],
    attrs: []
  },

  # Marks with attributes
  link: %{
    inclusive: false,  # typing at end doesn't extend link
    keep_on_split: false,
    excludes: [],
    attrs: [:href, :title, :target],
    required_attrs: [:href]
  },
  highlight: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [],
    attrs: [:color],
    default_attrs: %{color: "yellow"}
  },
  font_color: %{
    inclusive: true,
    keep_on_split: true,
    excludes: [],
    attrs: [:color],
    required_attrs: [:color]
  },
  mention: %{
    inclusive: false,
    keep_on_split: false,
    excludes: [],
    attrs: [:id, :type, :label],
    required_attrs: [:id, :type]
  }
}

Mark Allowance per Node

Some nodes restrict which marks are allowed:

def allowed_marks(node_type) do
  case @node_schema[node_type][:marks] do
    :all -> Map.keys(@mark_schema)
    :parent -> :inherit  # look up to parent
    nil -> []  # no marks allowed
    list when is_list(list) -> list
  end
end

# Examples:
# allowed_marks(:paragraph)  => all marks
# allowed_marks(:heading)    => [:bold, :italic, :underline, :strike, :link]
# allowed_marks(:code_block) => []

Validation Rules

defmodule Quillon.Schema do
  @moduledoc """
  Schema-based validation for document AST.
  Uses content expressions and mark schemas like ProseMirror.
  """

  def valid_content?(parent_type, children) do
    expression = @node_schema[parent_type][:content]
    matches_expression?(children, expression)
  end

  def matches_expression?(children, expression) do
    case expression do
      nil -> children == []
      "block+" -> length(children) >= 1 and Enum.all?(children, &in_group?(&1, :block))
      "block*" -> Enum.all?(children, &in_group?(&1, :block))
      "inline*" -> Enum.all?(children, &in_group?(&1, :inline))
      "list_item+" -> length(children) >= 1 and Enum.all?(children, &is_type?(&1, :list_item))
      "paragraph+" -> length(children) >= 1 and Enum.all?(children, &is_type?(&1, :paragraph))
      "table_row+" -> length(children) >= 1 and Enum.all?(children, &is_type?(&1, :table_row))
      "table_cell+" -> length(children) >= 1 and Enum.all?(children, &is_type?(&1, :table_cell))
      _ -> true  # Complex expressions need parser
    end
  end

  defp in_group?({type, _, _}, group), do: type in (@groups[group] || [])
  defp is_type?({type, _, _}, expected), do: type == expected

  def valid_marks?(parent_type, marks) do
    allowed = allowed_marks(parent_type)
    mark_types = Enum.map(marks, fn
      m when is_atom(m) -> m
      {type, _} -> type
    end)
    Enum.all?(mark_types, &(&1 in allowed))
  end

  def mark_allowed?(existing_marks, new_mark) do
    new_type = case new_mark do
      m when is_atom(m) -> m
      {type, _} -> type
    end
    excludes = @mark_schema[new_type][:excludes] || []
    existing_types = Enum.map(existing_marks, fn
      m when is_atom(m) -> m
      {type, _} -> type
    end)
    not Enum.any?(existing_types, &(&1 in excludes))
  end

  def validate({type, attrs, children} = node) do
    with :ok <- validate_type_exists(type),
         :ok <- validate_attrs(type, attrs),
         :ok <- validate_content(type, children),
         :ok <- validate_children(children) do
      {:ok, node}
    end
  end
end
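
The catch-all clause in `matches_expression?/2` accepts any expression it does not recognize, so rules such as `"paragraph (bullet_list | ordered_list)?"` are not actually enforced. One way to cover them without writing a dedicated parser is to translate the expression into a regular expression over the sequence of child types. The sketch below takes that approach; `ContentExprSketch` is a hypothetical module, the `;` delimiter is an arbitrary choice, and `nil` content (leaf nodes) is not handled.

```elixir
defmodule ContentExprSketch do
  # Group names expand to their member types. Names are strings here so
  # they can be spliced into a regular expression.
  @groups %{
    "block" => ~w(paragraph heading blockquote callout code_block image video
                  bullet_list ordered_list table divider),
    "inline" => ~w(text)
  }

  def matches?(children, expression) when is_binary(expression) do
    # Encode the child sequence as "type;type;..." and match it whole.
    encoded = Enum.map_join(children, "", fn {type, _attrs, _kids} -> "#{type};" end)
    Regex.match?(compile(expression), encoded)
  end

  defp compile(expression) do
    source =
      ~r/[a-z_]+/
      |> Regex.replace(expression, &name_to_pattern/1)
      |> String.replace(~r/\s+/, "")

    # Quantifiers, parentheses and "|" already mean the same thing in a regex.
    Regex.compile!("^(?:" <> source <> ")$")
  end

  # A type name matches itself; a group name matches any of its members.
  defp name_to_pattern(name) do
    alternatives =
      @groups
      |> Map.get(name, [name])
      |> Enum.map_join("|", &(&1 <> ";"))

    "(?:" <> alternatives <> ")"
  end
end

para = {:paragraph, %{}, []}
list = {:bullet_list, %{}, []}

ContentExprSketch.matches?([para, list], "paragraph (bullet_list | ordered_list)?")
# => true
ContentExprSketch.matches?([para, list, list], "paragraph (bullet_list | ordered_list)?")
# => false
```

The trailing delimiter after each type name keeps `table` from matching the start of `table_row`.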

Validation Errors

# Invalid: list_item outside list
{:paragraph, %{}, [
  {:list_item, %{}, [...]}
]}
# => {:error, "Invalid content for paragraph: expected inline*, got list_item"}

# Invalid: code mark in heading
{:heading, %{level: 1}, [
  {:text, %{text: "Hello", marks: [:code]}, []}
]}
# => {:error, "Mark :code not allowed in heading"}

# Invalid: conflicting marks (subscript + superscript)
{:text, %{text: "x", marks: [:subscript, :superscript]}, []}
# => {:error, "Mark :superscript conflicts with :subscript"}

# Invalid: link without href
{:text, %{text: "click", marks: [{:link, %{title: "Link"}}]}, []}
# => {:error, "Mark :link requires attr :href"}
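
As an illustration of how the conflict error above could be produced, the sketch below walks a mark list and reports the first mark that clashes with an earlier one according to `excludes`. `MarkConflictSketch` is a hypothetical module that copies only a few `excludes` entries, and treating exclusion as symmetric is an assumption of this sketch.

```elixir
defmodule MarkConflictSketch do
  # Only the :excludes entries needed for this example.
  @excludes %{
    subscript: [:superscript],
    superscript: [:subscript],
    code: [:bold, :italic, :underline, :strike, :link]
  }

  # Returns :ok, or {:error, message} for the first conflicting mark.
  def check(marks) do
    marks
    |> Enum.map(&mark_type/1)
    |> Enum.reduce_while({:ok, []}, fn type, {:ok, seen} ->
      case Enum.find(seen, &conflict?(type, &1)) do
        nil ->
          {:cont, {:ok, seen ++ [type]}}

        other ->
          {:halt, {:error, "Mark #{inspect(type)} conflicts with #{inspect(other)}"}}
      end
    end)
    |> case do
      {:ok, _seen} -> :ok
      error -> error
    end
  end

  # Two marks conflict when either one lists the other in its excludes.
  defp conflict?(a, b) do
    b in Map.get(@excludes, a, []) or a in Map.get(@excludes, b, [])
  end

  defp mark_type({type, _attrs}), do: type
  defp mark_type(type) when is_atom(type), do: type
end

MarkConflictSketch.check([:subscript, :superscript])
# => {:error, "Mark :superscript conflicts with :subscript"}

MarkConflictSketch.check([:bold, {:link, %{href: "/"}}])
# => :ok
```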

JSON Serialization

{
  "type": "document",
  "attrs": { "id": "doc_123", "name": "My Document" },
  "children": [
    {
      "type": "heading",
      "attrs": { "level": 1 },
      "children": [
        { "type": "text", "attrs": { "text": "Hello", "marks": [] }, "children": [] }
      ]
    },
    {
      "type": "paragraph",
      "attrs": {},
      "children": [
        { "type": "text", "attrs": { "text": "Hello ", "marks": [] }, "children": [] },
        { "type": "text", "attrs": { "text": "world", "marks": ["bold"] }, "children": [] }
      ]
    }
  ]
}

Mark serialization:

  • Simple marks: "bold", "italic", "code"
  • Marks with attrs: { "type": "link", "attrs": { "href": "..." } }
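
The two mark shapes above can be converted in both directions with a pair of small functions. The sketch below is one way to do it; `MarkJsonSketch` is a hypothetical module, and the whitelist of known marks is an assumption added so that decoding never creates atoms from arbitrary input. Unknown marks raise in this sketch.

```elixir
defmodule MarkJsonSketch do
  # Marks this sketch is willing to decode from JSON.
  @simple ~w(bold italic underline strike code subscript superscript)
  @with_attrs %{"link" => [:href, :title, :target], "highlight" => [:color]}

  # Simple marks become plain strings.
  def encode(mark) when is_atom(mark), do: Atom.to_string(mark)

  # Marks with attributes become %{"type" => ..., "attrs" => ...}.
  def encode({type, attrs}) when is_atom(type) and is_map(attrs) do
    %{
      "type" => Atom.to_string(type),
      "attrs" => Map.new(attrs, fn {key, value} -> {Atom.to_string(key), value} end)
    }
  end

  def decode(name) when is_binary(name) and name in @simple do
    String.to_atom(name)
  end

  def decode(%{"type" => type, "attrs" => attrs}) do
    # Raises KeyError for mark types that are not whitelisted.
    allowed = Map.fetch!(@with_attrs, type)

    decoded =
      for key <- allowed, Map.has_key?(attrs, Atom.to_string(key)), into: %{} do
        {key, Map.fetch!(attrs, Atom.to_string(key))}
      end

    {String.to_atom(type), decoded}
  end
end

mark = {:link, %{href: "https://example.com", title: "Link title"}}

MarkJsonSketch.encode(mark)
# => %{"type" => "link",
#      "attrs" => %{"href" => "https://example.com", "title" => "Link title"}}

mark |> MarkJsonSketch.encode() |> MarkJsonSketch.decode()
# => {:link, %{href: "https://example.com", title: "Link title"}}
```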

Comparison with JS Editors

| Feature | Quillon | ProseMirror/Tiptap | Lexical | Slate |
|---------|---------|--------------------|---------|-------|
| Structure | Elixir tuples | JS objects | JS classes | JS objects |
| Immutable | Yes (language native) | Yes (persistent document values) | Yes | Yes |
| Block types | Schema-defined | Schema-defined | Node classes | Element types |
| Inline formatting | Structured nodes + marks | Mark system | Format states | Leaf nodes |
| Collaboration | CRDT-ready (separate pkg) | Yjs plugin | Yjs plugin | Yjs plugin |
| Server-side | Native Elixir | N/A | N/A | N/A |

Quillon advantages:

  • Native Elixir immutability (provided by the language, with no extra library layer)
  • Same structure on client and server
  • JSON serialization built-in
  • Server-side rendering without JS dependency
  • Framework agnostic (works without Phoenix/LiveView)