Kreuzberg.DocumentNode (kreuzberg v4.4.2)


A single node in the document tree.

Each node has a deterministic ID, typed content, optional parent and child references, and metadata such as its page number and content layer classification.

Fields

  • :id - Deterministic identifier (hash of content + position)
  • :node_type - Node type discriminant (paragraph, heading, list, etc.)
  • :content - Node content as a map with type-specific fields
  • :content_layer - Content layer classification (body, header, footer, footnote)
  • :parent - Parent node index (nil if root node)
  • :children - List of child node indices in reading order
  • :page_number - Page number where node starts (1-indexed)
  • :page_number_end - Page number where node ends (for multi-page elements)
  • :bbox - Bounding box coordinates if available
  • :annotations - List of inline text annotations

Examples

iex> node = %Kreuzberg.DocumentNode{
...>   id: "node-1",
...>   node_type: "paragraph",
...>   content: %{"text" => "Hello world"},
...>   page_number: 1
...> }
iex> node.node_type
"paragraph"
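
Because :parent and :children hold indices and not nested structs, walking the tree means resolving each index against whatever collection holds the nodes. The sketch below is one way to do that, under two assumptions this page does not confirm: that the nodes arrive as a flat list, and that the indices are zero-based positions in that list. `TreeWalk` is a hypothetical helper, not part of Kreuzberg, and plain maps stand in for `Kreuzberg.DocumentNode` structs so the snippet runs on its own.

```elixir
defmodule TreeWalk do
  # Hypothetical helper, not part of Kreuzberg.
  # Assumes `nodes` is a flat list and that :children holds
  # zero-based positions into that list.
  def depth_first(nodes, index \\ 0) do
    # Build an index => node lookup once, so each child resolves in one step.
    lookup =
      nodes
      |> Enum.with_index()
      |> Map.new(fn {node, i} -> {i, node} end)

    walk(lookup, index)
  end

  # Emit the node itself, then its children in their stored (reading) order.
  defp walk(lookup, index) do
    node = Map.fetch!(lookup, index)
    [node | Enum.flat_map(node.children, &walk(lookup, &1))]
  end
end

# Plain maps used as stand-ins for DocumentNode structs.
nodes = [
  %{id: "n0", node_type: "group", parent: nil, children: [1, 2]},
  %{id: "n1", node_type: "heading", parent: 0, children: []},
  %{id: "n2", node_type: "paragraph", parent: 0, children: []}
]

ids = nodes |> TreeWalk.depth_first() |> Enum.map(& &1.id)
# ids == ["n0", "n1", "n2"]
```

The same function should accept real structs, since it only reads the :children field. If the indices turn out to be one-based, or the nodes are stored in another container, the lookup step is the only part that needs to change.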

Summary

Functions

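  • from_map(data) - Creates a DocumentNode struct from a map.
  • has_children?(document_node) - Check if this node has children.
  • is_root?(document_node) - Check if this is a root node (no parent).
  • readable_type(document_node) - Get node type with readable formatting.
  • to_map(node) - Converts a DocumentNode struct to a map.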

Types

node_type()

@type node_type() ::
  :title
  | :heading
  | :paragraph
  | :list
  | :list_item
  | :table
  | :image
  | :code
  | :quote
  | :formula
  | :footnote
  | :group
  | :page_break

t()

@type t() :: %Kreuzberg.DocumentNode{
  annotations: [Kreuzberg.DocumentTextAnnotation.t()],
  bbox: Kreuzberg.BoundingBox.t() | nil,
  children: [non_neg_integer()],
  content: map(),
  content_layer: String.t() | nil,
  id: String.t(),
  node_type: String.t(),
  page_number: non_neg_integer() | nil,
  page_number_end: non_neg_integer() | nil,
  parent: non_neg_integer() | nil
}

Functions

from_map(data)

@spec from_map(map()) :: t()

Creates a DocumentNode struct from a map.

Converts a plain map (typically one returned by the Rust NIF) into a proper struct, including its nested content and annotation data.

Parameters

  • data - A map containing node fields

Returns

A DocumentNode struct with properly typed fields.

Examples

iex> node_map = %{
...>   "id" => "node-1",
...>   "node_type" => "paragraph",
...>   "content" => %{"text" => "Hello"},
...>   "page" => 1
...> }
iex> node = Kreuzberg.DocumentNode.from_map(node_map)
iex> node.node_type
"paragraph"

has_children?(document_node)

@spec has_children?(t()) :: boolean()

Check if this node has children.

Parameters

  • document_node - A DocumentNode struct

Returns

Boolean indicating whether the node has child nodes.

Examples

iex> node = %Kreuzberg.DocumentNode{children: [1, 2]}
iex> Kreuzberg.DocumentNode.has_children?(node)
true

is_root?(document_node)

@spec is_root?(t()) :: boolean()

Check if this is a root node (no parent).

Parameters

  • document_node - A DocumentNode struct

Returns

Boolean indicating whether the node is a root node.

Examples

iex> node = %Kreuzberg.DocumentNode{parent: nil}
iex> Kreuzberg.DocumentNode.is_root?(node)
true

readable_type(document_node)

@spec readable_type(t()) :: String.t()

Get node type with readable formatting.

Converts snake_case node types to Title Case for display.

Parameters

  • document_node - A DocumentNode struct

Returns

A human-readable string representation of the node type.

Examples

iex> node = %Kreuzberg.DocumentNode{node_type: "list_item"}
iex> Kreuzberg.DocumentNode.readable_type(node)
"List Item"
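
For readers who want the same formatting for strings that are not wrapped in a node, the snake_case to Title Case conversion can be approximated with the standard library. This is a minimal sketch of the described behaviour, not the library's actual implementation; `ReadableTypeSketch` and `title_case/1` are hypothetical names.

```elixir
defmodule ReadableTypeSketch do
  # Hypothetical stand-in that mimics the documented behaviour:
  # split on underscores, capitalize each word, join with spaces.
  def title_case(snake) when is_binary(snake) do
    snake
    |> String.split("_", trim: true)
    |> Enum.map_join(" ", &String.capitalize/1)
  end
end

label = ReadableTypeSketch.title_case("page_break")
# label == "Page Break"
```

Note that `String.capitalize/1` lowercases everything after the first character, so an input such as "HTML_block" would come out as "Html Block". Whether readable_type/1 behaves the same way on such input is not stated here.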

to_map(node)

@spec to_map(t()) :: map()

Converts a DocumentNode struct to a map.

Useful for serialization and passing to external systems.

Parameters

  • node - A DocumentNode struct

Returns

A map with string keys representing all fields.

Examples

iex> node = %Kreuzberg.DocumentNode{
...>   id: "node-1",
...>   node_type: "paragraph",
...>   content: %{"text" => "Hello"}
...> }
iex> map = Kreuzberg.DocumentNode.to_map(node)
iex> map["node_type"]
"paragraph"
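
To make the "string keys" part of the return value concrete, the sketch below shows one way a struct can be turned into a string-keyed map. It uses a hypothetical stand-in struct, `NodeSketch`, with only a subset of the documented fields, and it is not the library's implementation: the real to_map/1 may handle nested values such as :bbox and :annotations differently.

```elixir
defmodule NodeSketch do
  # Stand-in struct with a subset of the documented fields;
  # this is not the real Kreuzberg.DocumentNode.
  defstruct id: nil, node_type: nil, content: %{}, children: [], parent: nil

  # Drop the struct tag, then convert every atom key to a string key.
  def to_string_keyed_map(%__MODULE__{} = node) do
    node
    |> Map.from_struct()
    |> Map.new(fn {key, value} -> {Atom.to_string(key), value} end)
  end
end

node =
  struct!(NodeSketch,
    id: "node-1",
    node_type: "paragraph",
    content: %{"text" => "Hello"}
  )

map = NodeSketch.to_string_keyed_map(node)
# map["node_type"] == "paragraph"
# map["parent"] == nil
```

A map shaped like this contains only strings, lists, maps, and nil, which is generally what JSON encoders and other external consumers expect.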