RustyXML (RustyXML v0.2.3)


Ultra-fast XML parsing for Elixir with full XPath 1.0 support.

RustyXML is a high-performance XML parser built from scratch as a Rust NIF with SIMD acceleration. It achieves 100% W3C/OASIS XML Conformance (1089/1089 test cases) and provides a drop-in replacement for SweetXml with the familiar ~x sigil syntax.

Quick Start

import RustyXML

xml = ~s(<root><item id="1">Hello</item><item id="2">World</item></root>)

# Get a list of items
xpath(xml, ~x"//item"l)
#=> [{:element, "item", ...}, {:element, "item", ...}]

# Get text content as string
xpath(xml, ~x"//item/text()"s)
#=> "Hello"

# Map multiple values
xmap(xml, items: ~x"//item"l, count: ~x"count(//item)"i)
#=> %{items: [...], count: 2}

Sigil Modifiers

The ~x sigil supports modifiers for result transformation:

  • e - Return entity (element) for chaining, not text value
  • s - Return as string (binary)
  • S - Soft string (empty string on error)
  • l - Return as list
  • o - Optional (return nil instead of raising on missing)
  • i - Cast to integer
  • I - Soft integer (0 on error)
  • f - Cast to float
  • F - Soft float (0.0 on error)
  • k - Return as keyword list
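The difference between the strict casts (i, f) and their soft counterparts (I, F) can be illustrated in plain Elixir. The sketch below is not RustyXML's implementation: the CastSketch module and its function names are hypothetical, and whitespace trimming is an assumption. It only mirrors the behaviour described in the list above, where the strict variant raises on bad input and the soft variant falls back to 0.

```elixir
defmodule CastSketch do
  # Hypothetical helpers; they only mimic the documented i / I semantics.

  # Strict cast: raises ArgumentError when the text is not an integer.
  def strict_integer(text), do: text |> String.trim() |> String.to_integer()

  # Soft cast: returns 0 instead of raising.
  def soft_integer(text) do
    case Integer.parse(String.trim(text)) do
      {n, ""} -> n
      _ -> 0
    end
  end
end

CastSketch.strict_integer(" 42 ") #=> 42
CastSketch.soft_integer("n/a")    #=> 0
```

The soft form suits optional or dirty fields, while the strict form surfaces bad data early.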

XPath 1.0 Functions

RustyXML supports all 27+ XPath 1.0 functions including:

  • Node: position(), last(), count(), local-name(), namespace-uri(), name()
  • String: string(), concat(), starts-with(), contains(), substring(), etc.
  • Boolean: boolean(), not(), true(), false(), lang()
  • Number: number(), sum(), floor(), ceiling(), round()

Streaming

For large files, use the streaming API:

"large.xml"
|> RustyXML.stream_tags(:item)
|> Stream.each(&process_item/1)
|> Stream.run()

Summary

Functions

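  • add_namespace(spec, prefix, uri) - Add a namespace binding to an XPath expression.
  • encode!(content, opts \\ []) - Encode an XML element tree to a string.
  • encode_to_iodata!(content, opts \\ []) - Encode an XML element tree to iodata.
  • parse(xml, opts \\ []) - Parse an XML document.
  • parse_document(xml) - Parse an XML document, returning {:ok, doc} or {:error, reason}.
  • parse_stream(stream, handler, initial_state, opts \\ []) - Parse an XML stream with a SAX event handler.
  • parse_string(xml, handler, initial_state, opts \\ []) - Parse an XML string with a SAX event handler.
  • root(doc) - Get the root element of a parsed document.
  • sigil_x(arg, modifiers) - The ~x sigil for XPath expressions.
  • stream_tags(source, tag, opts \\ []) - Stream XML events from a file.
  • stream_tags!(source, tag, opts \\ []) - Stream XML events from a file. Raises on error.
  • transform_by(spec, fun) - Add a transformation function to an XPath expression.
  • xmap(xml_or_doc, specs, opts \\ false) - Execute multiple XPath queries and return as a map.
  • xpath(xml_or_doc, spec) - Execute an XPath query on XML.
  • xpath(xml_or_doc, spec, subspecs) - Execute an XPath query with a mapping spec for nested extraction.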

Types

document()

@type document() :: RustyXML.Native.document_ref()

handler()

@type handler() :: module()

parse_option()

@type parse_option() ::
  {:cdata_as_characters, boolean()}
  | {:expand_entity, :keep | :skip | (String.t() -> String.t())}

parse_options()

@type parse_options() :: [parse_option()]

xml_node()

@type xml_node() ::
  {:element, binary(), [{binary(), binary()}], [xml_node() | binary()]}
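Because xml_node() is a plain recursive tuple, it can be traversed with ordinary pattern matching. The following is a minimal sketch that relies only on the shape given in the type above; the NodeWalk module and its functions are hypothetical helpers, not part of RustyXML.

```elixir
defmodule NodeWalk do
  # Hypothetical helper module; it relies only on the xml_node() shape
  # {:element, name, attributes, children} given in the type above.

  # Concatenate all text found beneath a node, in document order.
  def text_content(text) when is_binary(text), do: text

  def text_content({:element, _name, _attrs, children}) do
    Enum.map_join(children, "", &text_content/1)
  end

  # Look up an attribute value by name; returns nil when it is absent.
  def attribute({:element, _name, attrs, _children}, key) do
    case List.keyfind(attrs, key, 0) do
      {^key, value} -> value
      nil -> nil
    end
  end
end

tree = {:element, "item", [{"id", "1"}], ["Hello ", {:element, "b", [], ["World"]}]}

NodeWalk.text_content(tree)    #=> "Hello World"
NodeWalk.attribute(tree, "id") #=> "1"
```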

Functions

add_namespace(spec, prefix, uri)

@spec add_namespace(RustyXML.SweetXpath.t(), binary(), binary()) ::
  RustyXML.SweetXpath.t()

Add a namespace binding to an XPath expression.

Returns a new %SweetXpath{} with the namespace added.

Examples

xpath_with_ns = add_namespace(~x"//ns:item"l, "ns", "http://example.com/ns")
RustyXML.xpath(xml, xpath_with_ns)

encode!(content, opts \\ [])

@spec encode!(
  term(),
  keyword()
) :: String.t()

Encode an XML element tree to a string.

Drop-in replacement for Saxy.encode!/2.

Examples

import RustyXML.XML

element("root", [], ["text"]) |> RustyXML.encode!()
#=> "<root>text</root>"

encode_to_iodata!(content, opts \\ [])

@spec encode_to_iodata!(
  term(),
  keyword()
) :: iodata()

Encode an XML element tree to iodata.

Drop-in replacement for Saxy.encode_to_iodata!/2.

parse(xml, opts \\ [])

@spec parse(
  binary() | charlist(),
  keyword()
) :: document()

Parse an XML document.

By default, RustyXML uses strict mode to match SweetXml/xmerl behavior. Malformed XML raises RustyXML.ParseError.

Returns an opaque document reference that can be used with xpath/2,3 for multiple queries on the same document.

Options

  • :lenient - If true, accept malformed XML without raising. Useful for processing third-party or legacy XML. Default: false.

Examples

# Strict mode (default) - matches SweetXml behavior
doc = RustyXML.parse("<root><item/></root>")
RustyXML.xpath(doc, ~x"//item"l)

# Raises on malformed XML (like SweetXml)
RustyXML.parse("<1invalid/>")
#=> ** (RustyXML.ParseError) Invalid element name...

# Lenient mode - accepts malformed XML
doc = RustyXML.parse("<1invalid/>", lenient: true)

parse_document(xml)

@spec parse_document(binary() | charlist()) :: {:ok, document()} | {:error, binary()}

Parse an XML document, returning {:ok, doc} or {:error, reason}.

Unlike parse/2, this function returns a tuple instead of raising, allowing pattern matching on parse results.

Examples

{:ok, doc} = RustyXML.parse_document("<root/>")
{:error, reason} = RustyXML.parse_document("<1invalid/>")

parse_stream(stream, handler, initial_state, opts \\ [])

@spec parse_stream(Enumerable.t(), handler(), any(), parse_options()) ::
  {:ok, any()} | {:halt, any()} | {:error, any()}

Parse an XML stream with a SAX event handler.

Drop-in replacement for Saxy.parse_stream/4.

Accepts any Enumerable that yields binary chunks (e.g. File.stream!/3).

Memory use is bounded through zero-copy tokenization and direct BEAM binary encoding. When the internal buffer is empty (the common case), the NIF tokenizes the BEAM binary in place without copying. Events are written directly into an OwnedBinary on the BEAM heap, with no intermediate Rust Vec allocation. Elixir then decodes one event at a time via binary pattern matching, so only one event tuple is ever live on the heap.

Combined NIF + BEAM peak memory is ~128 KB for a 2.93 MB document, comparable to Saxy, while running ~1.8x faster.

Examples

File.stream!("large.xml", [], 64 * 1024)
|> RustyXML.parse_stream(MyHandler, initial_state)
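Since parse_stream/4 accepts any Enumerable of binary chunks, an in-memory document can be fed in pieces as well, which may be handy for exercising a handler without touching the file system. The sketch below builds such an enumerable in plain Elixir. The Chunker module is a hypothetical helper, and the final call is left as a comment because it assumes RustyXML and a handler module are available.

```elixir
defmodule Chunker do
  # Hypothetical helper: lazily split a binary into chunks of at most
  # `size` bytes, similar in spirit to what File.stream!/3 yields in bytes mode.
  def chunks(binary, size) when is_binary(binary) and size > 0 do
    Stream.unfold(binary, fn
      <<>> -> nil
      rest when byte_size(rest) <= size -> {rest, <<>>}
      <<chunk::binary-size(size), rest::binary>> -> {chunk, rest}
    end)
  end
end

xml = "<root><a/><b/></root>"

Enum.to_list(Chunker.chunks(xml, 8))
#=> ["<root><a", "/><b/></", "root>"]

# With RustyXML available, the stream could be passed on like this:
# RustyXML.parse_stream(Chunker.chunks(xml, 8), MyHandler, [])
```

Note that the chunk boundaries fall in the middle of tags, which is the situation a streaming parser has to cope with when reading fixed-size blocks from a file.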

parse_string(xml, handler, initial_state, opts \\ [])

@spec parse_string(binary(), handler(), any(), parse_options()) ::
  {:ok, any()} | {:halt, any()} | {:error, any()}

Parse an XML string with a SAX event handler.

Drop-in replacement for Saxy.parse_string/4.

The handler module must implement RustyXML.Handler (same callback as Saxy.Handler). Events are dispatched in document order.

Options

  • :cdata_as_characters - Emit CDATA as :characters events (default: false)
  • :expand_entity - Accepted for Saxy API compatibility (default: :keep)

Examples

defmodule MyHandler do
  @behaviour RustyXML.Handler

  def handle_event(:start_element, {name, _attrs}, acc), do: {:ok, [name | acc]}
  def handle_event(_, _, acc), do: {:ok, acc}
end

{:ok, names} = RustyXML.parse_string("<root><a/><b/></root>", MyHandler, [])
#=> {:ok, ["b", "a", "root"]}
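A handler can also carry richer state than a list of names. The sketch below collects the text of every title element. It rests on two assumptions not shown in the example above: that :end_element carries the element name and that :characters carries a binary, as in Saxy.Handler. The TitleHandler module is hypothetical, and the events are replayed by hand rather than produced by the parser, so the snippet runs without RustyXML.

```elixir
defmodule TitleHandler do
  # Sketch of a SAX handler that collects the text of every <title> element.
  # In a project that depends on RustyXML, add: @behaviour RustyXML.Handler
  # Assumption: :end_element carries the element name and :characters carries
  # a binary, as in Saxy.Handler.

  # State is {inside_title?, collected_titles}.
  def handle_event(:start_element, {"title", _attrs}, {_inside, titles}),
    do: {:ok, {true, titles}}

  def handle_event(:characters, chars, {true, titles}),
    do: {:ok, {true, [chars | titles]}}

  def handle_event(:end_element, "title", {_inside, titles}),
    do: {:ok, {false, titles}}

  def handle_event(_event, _data, state), do: {:ok, state}
end

# Simulated events for <lib><title>Dune</title><title>Emma</title></lib>,
# replayed without the parser to show how the state evolves.
events = [
  {:start_element, {"lib", []}},
  {:start_element, {"title", []}},
  {:characters, "Dune"},
  {:end_element, "title"},
  {:start_element, {"title", []}},
  {:characters, "Emma"},
  {:end_element, "title"},
  {:end_element, "lib"}
]

{_inside, titles} =
  Enum.reduce(events, {false, []}, fn {event, data}, state ->
    {:ok, next} = TitleHandler.handle_event(event, data, state)
    next
  end)

Enum.reverse(titles)
#=> ["Dune", "Emma"]
```

If the parser delivers the text of one element in several :characters events, this sketch records each piece as a separate entry; joining them on :end_element would be the natural refinement.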

root(doc)

@spec root(document()) :: xml_node() | nil

Get the root element of a parsed document.

Examples

doc = RustyXML.parse("<root><child/></root>")
RustyXML.root(doc)
#=> {:element, "root", [], [...]}

sigil_x(arg, modifiers)

(macro)

The ~x sigil for XPath expressions.

Creates a %SweetXpath{} struct with the specified path and modifiers.

Modifiers

  • e - Return entity (element) for chaining
  • s - Return as string
  • S - Soft string (empty on error)
  • l - Return as list
  • o - Optional (nil on missing)
  • i - Cast to integer
  • I - Soft integer (0 on error)
  • f - Cast to float
  • F - Soft float (0.0 on error)
  • k - Return as keyword list

Examples

import RustyXML

~x"//item"l          # List of items
~x"//name/text()"s   # String value
~x"count(//item)"i   # Integer count
~x"//optional"so     # Optional string

stream_tags(source, tag, opts \\ [])

@spec stream_tags(binary() | Enumerable.t(), atom() | binary(), keyword()) ::
  Enumerable.t()

Stream XML events from a file.

Returns a Stream that yields events as the file is read. Uses bounded memory regardless of file size.

Options

  • :chunk_size - Bytes to read per IO operation (default: 64 KB)
  • :batch_size - Accepted for SweetXml API compatibility but has no effect. RustyXML's streaming parser yields complete elements directly from Rust as they are parsed — there is no event batching step to tune.
  • :discard - Accepted for SweetXml API compatibility but has no effect. RustyXML's streaming parser already operates in bounded memory (~128 KB combined NIF + BEAM peak for a 2.93 MB document) by only materializing one element at a time, so tag discarding for memory reduction is unnecessary.

Examples

"large.xml"
|> RustyXML.stream_tags(:item)
|> Stream.each(&process/1)
|> Stream.run()

stream_tags!(source, tag, opts \\ [])

@spec stream_tags!(binary() | Enumerable.t(), atom() | binary(), keyword()) ::
  Enumerable.t()

Stream XML events from a file. Raises on error.

Provided for SweetXml API compatibility. Behaves identically to stream_tags/3, which already raises on read errors.

transform_by(spec, fun)

@spec transform_by(RustyXML.SweetXpath.t(), (term() -> term())) ::
  RustyXML.SweetXpath.t()

Add a transformation function to an XPath expression.

The function will be applied to the result after all other modifiers.

Examples

spec = transform_by(~x"//price/text()"s, &String.to_float/1)
RustyXML.xpath(xml, spec)
#=> 45.99
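One caveat with the example above: String.to_float/1 raises ArgumentError for text without a decimal point, such as "45". Where price text may be integral, a more tolerant function can be passed to transform_by/2 instead. The sketch below uses Float.parse/1 for that purpose; the PriceParser module is a hypothetical helper, and the transform_by call is shown only as a comment because it assumes RustyXML is imported.

```elixir
defmodule PriceParser do
  # Hypothetical helper for use with transform_by/2.
  # Float.parse/1 accepts "45" as well as "45.99", unlike String.to_float/1.
  def to_float(text) do
    case Float.parse(String.trim(text)) do
      {value, ""} -> value
      _ -> raise ArgumentError, "not a number: #{inspect(text)}"
    end
  end
end

PriceParser.to_float("45.99") #=> 45.99
PriceParser.to_float("45")    #=> 45.0

# With RustyXML imported, the function slots in as:
# spec = transform_by(~x"//price/text()"s, &PriceParser.to_float/1)
```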

xmap(xml_or_doc, specs, opts \\ false)

@spec xmap(binary() | document(), keyword(), boolean() | map()) :: map() | keyword()

Execute multiple XPath queries and return as a map.

Options

The third argument is accepted for SweetXml API compatibility but is not required. Use the k sigil modifier instead for keyword output.

Examples

xml = "<root><a>1</a><b>2</b></root>"

RustyXML.xmap(xml, [
  a: ~x"//a/text()"s,
  b: ~x"//b/text()"s
])
#=> %{a: "1", b: "2"}

xpath(xml_or_doc, spec)

@spec xpath(binary() | document(), RustyXML.SweetXpath.t() | binary()) :: term()

Execute an XPath query on XML.

The first argument can be either:

  • A raw XML binary
  • A parsed document reference from parse/1

The second argument can be:

  • A %SweetXpath{} struct (from ~x sigil)
  • A plain XPath string (binary)

Examples

# On raw XML
RustyXML.xpath("<root>text</root>", ~x"//root/text()"s)
#=> "text"

# On parsed document
doc = RustyXML.parse("<root><a/><b/></root>")
RustyXML.xpath(doc, ~x"//a"l)

xpath(xml_or_doc, spec, subspecs)

@spec xpath(binary() | document(), RustyXML.SweetXpath.t() | binary(), keyword()) ::
  term()

Execute an XPath query with a mapping spec for nested extraction.

The third argument is a keyword list of {name, xpath_spec} pairs that will be evaluated for each node in the parent result.

Examples

xml = ~s(<items><item id="1"><name>A</name></item><item id="2"><name>B</name></item></items>)

RustyXML.xpath(xml, ~x"//item"l, [
  id: ~x"./@id"s,
  name: ~x"./name/text()"s
])
#=> [%{id: "1", name: "A"}, %{id: "2", name: "B"}]
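Since the nested form returns a list of plain maps, the result can be post-processed with the usual Enum and Map functions. As a small sketch, the hypothetical ItemIndex helper below turns that list into a lookup table keyed by integer id. The input is copied from the output shown above, so no parser call is involved.

```elixir
defmodule ItemIndex do
  # Hypothetical helper: index nested-extraction results by integer id.
  def by_id(items) do
    Map.new(items, fn %{id: id, name: name} -> {String.to_integer(id), name} end)
  end
end

# Result shape copied from the example above; no parser call is made here.
items = [%{id: "1", name: "A"}, %{id: "2", name: "B"}]

ItemIndex.by_id(items)
#=> %{1 => "A", 2 => "B"}
```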