Low-level NIF bindings for XML parsing.
This module provides direct access to the Rust NIF functions. For normal use,
prefer the higher-level RustyXML module with its ~x sigil support.
Strategies
The module exposes parsing strategies:
- parse/1 + xpath_query/2: structural index with XPath (main path)
- streaming_*: stateful streaming parser for large files
- sax_parse/1: SAX event parser
Memory Efficiency
The structural index (parse/1) uses ~4x input size vs SweetXml's ~600x.
Strings are stored as byte offsets into the original input, not copies.
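Here is the offset idea in plain Elixir. The real index lives in Rust and its layout is not shown here. In this sketch, a node records a hypothetical {offset, length} span instead of a copied binary, and binary_part/3 resolves the span only when the bytes are needed. The OffsetSketch module and its functions are illustrative only.

```elixir
defmodule OffsetSketch do
  # Hypothetical span type: {byte_offset, byte_length} into the original input.
  @type span :: {non_neg_integer(), non_neg_integer()}

  # Record where the first tag name sits in `input` without copying it.
  @spec first_tag_span(binary()) :: span() | nil
  def first_tag_span(input) do
    case :binary.match(input, "<") do
      {lt, 1} ->
        start = lt + 1
        rest = binary_part(input, start, byte_size(input) - start)

        # Naive scan: the tag name ends at the first space, '/' or '>'.
        len =
          case :binary.match(rest, [" ", "/", ">"]) do
            {pos, _} -> pos
            :nomatch -> byte_size(rest)
          end

        {start, len}

      :nomatch ->
        nil
    end
  end

  # Turn a span back into bytes only when a caller actually needs them.
  @spec resolve(binary(), span()) :: binary()
  def resolve(input, {offset, len}), do: binary_part(input, offset, len)
end
```

The tag-name scan is deliberately naive. The point is that resolve/2 reads from the original input rather than from a stored copy.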
Scheduler Behaviour
NIFs that parse raw XML input run on the dirty CPU scheduler to avoid blocking BEAM schedulers. Query NIFs on pre-parsed documents run on normal schedulers for sub-millisecond lookups.
Types
@opaque document_ref()
Opaque reference to a parsed XML document (structural index)
@opaque parser_ref()
Opaque reference to a streaming parser
@type xml_event() :: {:start_element, binary(), [{binary(), binary()}]} | {:end_element, binary()} | {:empty_element, binary(), [{binary(), binary()}]} | {:text, binary()} | {:cdata, binary()} | {:comment, binary()}
XML event from parser
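These event shapes can be consumed with ordinary pattern matching. The sketch below assumes a list of xml_event() values, such as one returned by streaming_take_events/2. The EventTally module is hypothetical.

```elixir
defmodule EventTally do
  # Count elements by name. Start and empty-element events each count once;
  # end, text, cdata and comment events do not introduce a new element.
  @spec element_counts([tuple()]) :: %{optional(binary()) => pos_integer()}
  def element_counts(events) do
    Enum.reduce(events, %{}, fn
      {:start_element, name, _attrs}, acc -> Map.update(acc, name, 1, &(&1 + 1))
      {:empty_element, name, _attrs}, acc -> Map.update(acc, name, 1, &(&1 + 1))
      _other, acc -> acc
    end)
  end

  # Concatenate all character data (text and CDATA), skipping comments.
  @spec text_content([tuple()]) :: binary()
  def text_content(events) do
    events
    |> Enum.flat_map(fn
      {:text, t} -> [t]
      {:cdata, t} -> [t]
      _ -> []
    end)
    |> IO.iodata_to_binary()
  end
end
```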
Functions
Feed a chunk of data to the document accumulator.
@spec accumulator_new() :: reference()
Create a new document accumulator for streaming SimpleForm parsing.
Returns an opaque accumulator reference.
Validate, index, and convert accumulated data to SimpleForm.
Returns {:ok, tree} or {:error, reason}.
@spec get_root(document_ref()) :: term() | nil
Get the root element of a parsed document.
Returns the root element as a tuple:
{:element, name, attributes, children}
Examples
doc = RustyXML.Native.parse(~s(<root attr="value"><child/></root>))
RustyXML.Native.get_root(doc)
#=> {:element, "root", [{"attr", "value"}], [...]}
@spec get_rust_memory() :: non_neg_integer()
Get current Rust heap allocation in bytes.
Requires the memory_tracking Cargo feature; returns 0 otherwise.
@spec get_rust_memory_peak() :: non_neg_integer()
Get peak Rust heap allocation since last reset.
@spec parse(binary()) :: document_ref()
Parse XML into a structural index document.
Runs on the dirty CPU scheduler since parse time scales with input size.
Returns an opaque document reference that can be used with xpath_query/2
and get_root/1. The document is cached and can be queried multiple times.
This is the primary parse function; it uses about 4x the input size in memory.
Examples
doc = RustyXML.Native.parse(~s(<root><item id="1"/></root>))
RustyXML.Native.xpath_query(doc, "//item")
Parse XML and execute an XPath query in one call.
Runs on the dirty CPU scheduler since it parses raw XML input.
More efficient than parse/1 + xpath_query/2 for single queries
since it doesn't create a persistent document reference.
Examples
RustyXML.Native.parse_and_xpath("<root><item/></root>", "//item")
Parse and immediately query, returning text values for node sets.
Optimized path for is_value: true — avoids building element tuples.
@spec parse_strict(binary()) :: {:ok, document_ref()} | {:error, binary()}
Parse XML in strict mode (returns {:ok, doc} or {:error, reason}).
Runs on the dirty CPU scheduler since parse time scales with input size.
Returns {:ok, document_ref} on success, or {:error, reason} if the
document is not well-formed per the XML 1.0 specification.
Examples
{:ok, doc} = RustyXML.Native.parse_strict("<root>valid</root>")
{:error, reason} = RustyXML.Native.parse_strict("<1invalid/>")
Parse XML directly into SimpleForm {name, attrs, children} tree.
Bypasses the SAX event pipeline — builds the tree in Rust from the structural index, decoding entities as needed.
Returns {:ok, tree} or {:error, reason}.
@spec reset_rust_memory_stats() :: {non_neg_integer(), non_neg_integer()}
Reset memory tracking statistics.
Returns {current_bytes, previous_peak_bytes}.
Parse XML and return SAX events.
Events are returned as tuples similar to Saxy's format.
Parse XML and return SAX events in Saxy-compatible format.
Events are emitted directly in Saxy format:
- {:start_element, {name, attrs}}
- {:end_element, name}
- {:characters, content}
- {:cdata, content}
Comments and processing instructions (PIs) are skipped. Empty elements emit a start event followed by an end event.
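Empty elements arrive as a start/end pair, so a consumer of the Saxy-compatible events only has to handle the four shapes listed above. The sketch below is not part of RustyXML. It folds such a list into the same {name, attrs, children} SimpleForm shape described above for the direct SimpleForm parser, and it assumes a well-formed event list with a single root element. The SaxyTree module is hypothetical.

```elixir
defmodule SaxyTree do
  # Fold Saxy-style events into a {name, attrs, children} tree.
  @spec build([tuple()]) :: {binary(), list(), list()}
  def build(events) do
    # Stack of open elements as {name, attrs, reversed_children}; the bottom
    # entry is a sentinel that ends up holding the finished root.
    [{nil, nil, [root]}] = Enum.reduce(events, [{nil, nil, []}], &step/2)
    root
  end

  defp step({:start_element, {name, attrs}}, stack), do: [{name, attrs, []} | stack]

  # `name` appears twice in the pattern, so the end tag must match the open element.
  defp step({:end_element, name}, [{name, attrs, kids}, {pname, pattrs, pkids} | rest]) do
    [{pname, pattrs, [{name, attrs, Enum.reverse(kids)} | pkids]} | rest]
  end

  # Character data outside the root (e.g. whitespace) is dropped.
  defp step({kind, _content}, [{nil, nil, _} | _] = stack) when kind in [:characters, :cdata],
    do: stack

  defp step({kind, content}, [{name, attrs, kids} | rest]) when kind in [:characters, :cdata],
    do: [{name, attrs, [content | kids]} | rest]
end
```

A mismatched end tag makes the fold raise a FunctionClauseError. This suits a sketch, but real input should be validated first.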
@spec streaming_available_elements(parser_ref()) :: non_neg_integer() | {:error, :mutex_poisoned}
Get number of available complete elements.
@spec streaming_feed(parser_ref(), binary()) :: {non_neg_integer(), non_neg_integer()} | {:error, :mutex_poisoned}
Feed a chunk of XML data to the streaming parser.
Returns {available_events, buffer_size} on success, or
{:error, :mutex_poisoned} if the parser mutex is poisoned.
Feed a chunk and return SAX events as a compact binary.
When the tail buffer is empty (common case), the NIF tokenizes the BEAM binary in-place (zero copy) and writes events directly into an OwnedBinary on the BEAM heap — no intermediate Rust Vec allocation. Only the unprocessed tail (~100 bytes) is saved between calls.
Format: sequence of <<type::8, ...>> where type 1=start, 2=end, 3=chars, 4=cdata.
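The tail-buffer idea can be illustrated with a deliberately simplified pure-Elixir sketch using a hypothetical TailBuffer module. Bytes up to the last > are treated as complete, and the remainder is carried into the next call. The NIF itself tokenizes properly. This version would mis-split on a > inside an attribute value or text and is meant only to show the carry-over between chunks.

```elixir
defmodule TailBuffer do
  # Join the saved tail with the new chunk, return {complete, new_tail}.
  # Simplification: "complete" means everything up to and including the last '>'.
  @spec push(binary(), binary()) :: {binary(), binary()}
  def push(tail, chunk) do
    data = tail <> chunk

    case :binary.matches(data, ">") do
      [] ->
        # Nothing complete yet: keep the whole buffer for the next call.
        {"", data}

      matches ->
        {pos, 1} = List.last(matches)
        cut = pos + 1
        {binary_part(data, 0, cut), binary_part(data, cut, byte_size(data) - cut)}
    end
  end
end
```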
@spec streaming_finalize(parser_ref()) :: [xml_event()] | {:error, :mutex_poisoned}
Finalize the streaming parser and get remaining events.
Returns {:error, :mutex_poisoned} if the parser mutex is poisoned.
Finalize the streaming SAX parser, processing any remaining bytes.
Returns final events as a compact binary (same format as streaming_feed_sax/3).
@spec streaming_new() :: parser_ref()
Create a new streaming XML parser.
The streaming parser processes XML in chunks with bounded memory usage.
Examples
parser = RustyXML.Native.streaming_new()
RustyXML.Native.streaming_feed(parser, "<root>")
RustyXML.Native.streaming_feed(parser, "<item/></root>")
events = RustyXML.Native.streaming_take_events(parser, 100)
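A fuller loop that feeds several chunks and drains events in batches might look like the sketch below. It is an illustration built on assumptions, not library code. It assumes streaming_take_events/2 returns [] once no complete events are ready. It also takes the native module as a parameter, defaulting to RustyXML.Native, so a fake can stand in during tests. StreamDrain is a hypothetical name.

```elixir
defmodule StreamDrain do
  # Feed each chunk, drain ready events in batches of `batch`, then finalize.
  def run(chunks, batch \\ 100, native \\ RustyXML.Native) do
    parser = native.streaming_new()

    events =
      Enum.flat_map(chunks, fn chunk ->
        case native.streaming_feed(parser, chunk) do
          {:error, reason} -> raise "feed failed: #{inspect(reason)}"
          {_available, _buffer_size} -> drain(native, parser, batch, [])
        end
      end)

    case native.streaming_finalize(parser) do
      {:error, reason} -> raise "finalize failed: #{inspect(reason)}"
      rest -> events ++ rest
    end
  end

  # Take batches until the parser reports no ready events (assumed to be []).
  defp drain(native, parser, batch, acc) do
    case native.streaming_take_events(parser, batch) do
      [] -> acc |> Enum.reverse() |> Enum.concat()
      {:error, reason} -> raise "take failed: #{inspect(reason)}"
      events -> drain(native, parser, batch, [events | acc])
    end
  end
end
```

Draining after every chunk keeps the parser's pending event queue small. This fits the bounded-memory goal.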
@spec streaming_new_with_filter(binary()) :: parser_ref()
Create a streaming parser with a tag filter.
Only events for the specified tag name will be emitted. Useful for extracting specific elements from large documents.
Examples
parser = RustyXML.Native.streaming_new_with_filter("item")
RustyXML.Native.streaming_feed(parser, "<root><item/><other/></root>")
# Only item events will be returned
@spec streaming_sax_new() :: reference()
Create a new streaming SAX parser.
Returns an opaque parser reference for use with streaming_feed_sax/3.
@spec streaming_status(parser_ref()) :: {non_neg_integer(), non_neg_integer(), boolean()} | {:error, :mutex_poisoned}
Get streaming parser status.
Returns {available_events, buffer_size, has_pending} on success, or
{:error, :mutex_poisoned} if the parser mutex is poisoned.
@spec streaming_take_elements(parser_ref(), non_neg_integer()) :: [binary()] | {:error, :mutex_poisoned}
Take up to max complete elements from the streaming parser.
Returns a list of XML binaries for complete elements. This is faster than using events because the element strings are built in Rust without needing reconstruction in Elixir.
@spec streaming_take_events(parser_ref(), non_neg_integer()) :: [xml_event()] | {:error, :mutex_poisoned}
Take up to max events from the streaming parser.
Returns {:error, :mutex_poisoned} if the parser mutex is poisoned.
@spec streaming_take_saxy_events(reference(), non_neg_integer(), boolean()) :: [tuple()] | {:error, :mutex_poisoned}
Take events from streaming parser in Saxy-compatible format.
@spec xpath_query(document_ref(), binary()) :: term()
Execute an XPath query on a parsed document.
Returns the result based on the XPath expression:
- Node-set queries return a list of element tuples
- String queries return a string
- Number queries return a float
- Boolean queries return true/false
Examples
doc = RustyXML.Native.parse("<root><item>text</item></root>")
RustyXML.Native.xpath_query(doc, "//item")
#=> [{:element, "item", [], ["text"]}]
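The return type depends on the expression, so callers usually branch on the result shape. The hypothetical helper below does this for the four cases listed above. It uses the element-tuple shape from the example.

```elixir
defmodule XPathResult do
  # Tag an xpath_query/2-style result with its kind; for node sets, keep the
  # element names. Non-element entries in a node set are skipped by the filter.
  def summarize(nodes) when is_list(nodes) do
    {:node_set, for({:element, name, _attrs, _children} <- nodes, do: name)}
  end

  def summarize(s) when is_binary(s), do: {:string, s}
  def summarize(n) when is_float(n), do: {:number, n}
  def summarize(b) when is_boolean(b), do: {:boolean, b}
end
```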
@spec xpath_query_raw(document_ref(), binary()) :: [binary()] | term()
Execute an XPath query returning XML strings for node sets (fast path).
Instead of building nested Elixir tuples for each element, this returns the serialized XML string for each node. Much faster for queries returning many elements.
Examples
doc = RustyXML.Native.parse("<root><item>text</item></root>")
RustyXML.Native.xpath_query_raw(doc, "//item")
#=> ["<item>text</item>"]
Execute XPath and return string value of result.
Runs on the dirty CPU scheduler since it parses raw XML input. For node-sets, returns the text content of the first node.
Examples
RustyXML.Native.xpath_string_value("<root>hello</root>", "//root/text()")
#=> "hello"
@spec xpath_string_value_doc(document_ref(), binary()) :: binary()
Execute XPath on document reference and return string value.
@spec xpath_text_list(document_ref(), binary()) :: [binary()] | term()
Execute XPath query returning text values for node sets (optimized fast path).
Instead of building nested Elixir tuples for each element, returns the
concatenated text content of each node as a string. Much faster for the
common case where is_value: true (no e modifier).
For non-NodeSet results (numbers, strings, booleans), returns the value as-is.
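For comparison, the pure-Elixir sketch below computes the concatenated text of an element tuple as returned by xpath_query/2. This is the kind of value xpath_text_list/2 produces natively without building the tuples first. It assumes children are element tuples or binaries. TextValue is a hypothetical name.

```elixir
defmodule TextValue do
  # Concatenated text content of a node: the text of all descendant
  # binaries, in document order.
  @spec text(term()) :: binary()
  def text({:element, _name, _attrs, children}), do: Enum.map_join(children, &text/1)
  def text(text) when is_binary(text), do: text
  # Any other child kind (assumed, e.g. comments) contributes nothing.
  def text(_other), do: ""
end
```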
Execute parent XPath and evaluate subspecs for each result node.
Runs on the dirty CPU scheduler since it parses raw XML input.
Returns a list of maps, one per parent node, with each subspec evaluated relative to that node.
Examples
xml = "<items><item><id>1</id><name>A</name></item></items>"
RustyXML.Native.xpath_with_subspecs(xml, "//item", [{"id", "./id/text()"}, {"name", "./name/text()"}])
#=> [%{id: "1", name: "A"}]