Expath (Expath v0.2.0)

Lightning-fast XML parsing and XPath querying for Elixir, powered by Rust NIFs.

Expath processes XML through the battle-tested Rust libraries sxd-document and sxd-xpath. Compared with existing Elixir XML libraries, it runs 2-10x faster and is up to 195,000x more memory-efficient.

Key Features

  • 🚀 Blazing Fast: 2-10x faster than SweetXml with Rust-powered NIFs
  • 🔄 Parse-Once, Query-Many: Efficient document reuse for multiple XPath queries
  • 🛡️ Battle-Tested: Built on proven Rust XML libraries
  • 🎯 Simple API: Clean, intuitive interface with comprehensive error handling
  • ⚡ Thread-Safe: Safe concurrent access to parsed documents

Quick Start

For single XPath queries:

iex> xml = "<library><book id='1'><title>1984</title></book></library>"
iex> {:ok, titles} = Expath.select(xml, "//title/text()")
iex> titles
["1984"]

For multiple queries on the same document (more efficient):

iex> xml = "<library><book id='1'><title>1984</title><author>Orwell</author></book></library>"
iex> {:ok, doc} = Expath.new(xml)
iex> {:ok, titles} = Expath.query(doc, "//title/text()")
iex> {:ok, authors} = Expath.query(doc, "//author/text()")
iex> {titles, authors}
{["1984"], ["Orwell"]}

Performance

Benchmark results comparing Expath vs SweetXml:

Document Size     Speed
Small (644B)      2-3x faster
Medium (5.6KB)    2.3x faster
Large (904KB)     8-10x faster

XPath Support

Expath supports the full XPath 1.0 specification:

# Node selection
Expath.select(xml, "//book")                    # All book elements
Expath.select(xml, "//book[@id='1']")           # Specific book

# Text extraction
Expath.select(xml, "//title/text()")            # All title text
Expath.select(xml, "//book/@id")                # All id attributes

# Functions
Expath.select(xml, "count(//book)")             # Count elements
Expath.select(xml, "//book[position()=1]")      # First book within each parent

# Complex expressions
Expath.select(xml, "//book[price > 10]/title/text()") # Conditional
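
Results come back as lists of strings, so a function such as count(//book) yields a value like "2" rather than a number. The sketch below shows one way to convert such a result. CountSketch, to_integer/1, and count_books/1 are hypothetical names for illustration, not part of Expath.

```elixir
defmodule CountSketch do
  # Hypothetical helper, not part of Expath: converts a single-value
  # result such as {:ok, ["2"]} into {:ok, 2}.
  def to_integer({:ok, [value]}) when is_binary(value) do
    case Integer.parse(value) do
      # Accept only strings that are entirely an integer.
      {int, ""} -> {:ok, int}
      _ -> {:error, :not_an_integer}
    end
  end

  # Zero or several values cannot be read as one number.
  def to_integer({:ok, _other}), do: {:error, :not_a_single_value}

  # Pass Expath errors through unchanged.
  def to_integer({:error, _reason} = error), do: error

  # Usage with a parsed document (assumes Expath is installed).
  def count_books(doc) do
    doc |> Expath.query("count(//book)") |> to_integer()
  end
end
```

Whether every numeric XPath result is formatted as a plain integer string is an assumption here. If a query can return a fraction, Float.parse/1 would be the safer choice.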

Error Handling

All functions return {:ok, result} or {:error, reason} tuples:

{:error, :invalid_xml}    # XML parsing failed
{:error, :invalid_xpath}  # XPath expression invalid
{:error, :xpath_error}    # XPath evaluation failed
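
A minimal sketch of branching on these tuples with function clauses. ErrorHandlingSketch and its functions are hypothetical names, and the messages are illustrative only.

```elixir
defmodule ErrorHandlingSketch do
  # Hypothetical helper: turns each documented result tuple into a message.
  def describe({:ok, results}), do: "matched #{length(results)} value(s)"
  def describe({:error, :invalid_xml}), do: "the XML could not be parsed"
  def describe({:error, :invalid_xpath}), do: "the XPath expression is invalid"
  def describe({:error, :xpath_error}), do: "the XPath evaluation failed"

  # Runs one query and describes the outcome (assumes Expath is installed).
  def run(xml, xpath) do
    xml |> Expath.select(xpath) |> describe()
  end
end
```

Matching on the tuple instead of asserting {:ok, _} keeps malformed input from raising a MatchError in the calling process.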

When to Use Parse-Once vs Single Query

Use select/2 for:

  • One-off XML processing
  • Small documents
  • Simple scripts

Use new/1 + query/2 for:

  • Multiple queries on the same document
  • Large documents (>1KB)
  • Performance-critical applications
  • Concurrent processing scenarios

Summary

Functions

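new(xml_string)
Parse XML from a string and return a Document resource.

parse_document(xml_binary)
Parse XML and return a Document resource for efficient reuse (low-level function).

parse_xml(xml_binary)
Parse XML from binary data (low-level function).

query(document, xpath)
Query a Document resource with XPath.

query(document, xpath, namespaces)
Query a Document resource with XPath and namespace support.

query_document(document, xpath_str)
Query a parsed Document resource with XPath (low-level function).

query_document_with_namespaces(document, xpath_str, namespaces)
Query a parsed Document resource with XPath and namespace support (low-level function).

select(xml_string, xpath)
Select nodes from an XML string using XPath.

select(xml_string, xpath, namespaces)
Select nodes from an XML string using XPath with namespace support.

xpath_select(xml_binary, xpath_str)
Select nodes from XML using XPath expressions (low-level function).

xpath_select_with_namespaces(xml_binary, xpath_str, namespaces)
Select nodes from XML using XPath expressions with namespace support (low-level function).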

Functions

new(xml_string)

Parse XML from a string and return a Document resource.

This is the recommended function for parse-once, query-many scenarios. The parsed document can be reused for multiple XPath queries without re-parsing the XML, providing significant performance benefits for applications that need to run multiple queries on the same document.

The Document resource is freed automatically by the Erlang garbage collector once no process holds a reference to it.

Parameters

  • xml_string - XML content as a binary string

Returns

  • {:ok, %Expath.Document{}} - Successfully parsed document resource
  • {:error, :invalid_xml} - XML parsing failed

Examples

iex> xml = "<library><book><title>1984</title><author>Orwell</author></book></library>"
iex> {:ok, doc} = Expath.new(xml)
iex> is_struct(doc, Expath.Document)
true

iex> Expath.new("<invalid><unclosed>")
{:error, :invalid_xml}

Usage Pattern

# Parse once
{:ok, doc} = Expath.new(large_xml_string)

# Query multiple times efficiently
{:ok, titles} = Expath.query(doc, "//title/text()")
{:ok, authors} = Expath.query(doc, "//author/text()")
{:ok, [count]} = Expath.query(doc, "count(//book)")

# Document automatically cleaned up when `doc` goes out of scope

Performance Benefits

  • Memory: Document stored efficiently in Rust
  • Speed: No XML re-parsing for subsequent queries
  • Concurrency: Document can be safely shared across processes
  • Large Documents: Particularly beneficial for documents >1KB

parse_document(xml_binary)

Parse XML and return a Document resource for efficient reuse (low-level function).

This is a low-level function that creates an Expath.Document resource. For most use cases, prefer the higher-level new/1 function.

Parameters

  • xml_binary - XML content as binary data

Returns

  • {:ok, %Expath.Document{}} - Successfully parsed document
  • {:error, :invalid_xml} - XML parsing failed

Examples

iex> {:ok, doc} = Expath.parse_document("<root><item>value</item></root>")
iex> is_struct(doc, Expath.Document)
true

iex> Expath.parse_document("<invalid><unclosed>")
{:error, :invalid_xml}

parse_xml(xml_binary)

Parse XML from binary data (low-level function).

This function validates that the provided XML binary is well-formed. For most use cases, prefer select/2 or new/1 which provide more functionality.

Parameters

  • xml_binary - XML content as binary data

Returns

  • {:ok, "parsed"} - XML is well-formed
  • {:error, :invalid_xml} - XML parsing failed

Examples

iex> Expath.parse_xml("<root><item>value</item></root>")
{:ok, "parsed"}

iex> Expath.parse_xml("<invalid><unclosed>")
{:error, :invalid_xml}
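
Because parse_xml/1 only reports whether the input is well-formed, it can act as a cheap guard before further processing. A minimal sketch follows; WellFormedSketch and well_formed?/2 are hypothetical names. The parser is passed as an argument, defaulting to Expath.parse_xml/1, so the helper can be exercised without the NIF loaded.

```elixir
defmodule WellFormedSketch do
  # Hypothetical helper: maps the {:ok, _} / {:error, _} result of the
  # parser function to a boolean.
  def well_formed?(xml, parse_fun \\ &Expath.parse_xml/1) when is_binary(xml) do
    match?({:ok, _}, parse_fun.(xml))
  end
end
```

With Expath installed, WellFormedSketch.well_formed?(xml) calls Expath.parse_xml/1 directly.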

query(document, xpath)

Query a Document resource with XPath.

This function executes XPath queries on a previously parsed Document resource. It's designed for high-performance scenarios where multiple queries need to be run on the same XML document without re-parsing.

Parameters

  • document - An %Expath.Document{} resource from new/1
  • xpath - XPath expression as a binary string

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - Document resource is invalid
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> xml = "<library><book><title>1984</title><author>Orwell</author></book></library>"
iex> {:ok, doc} = Expath.new(xml)
iex> {:ok, titles} = Expath.query(doc, "//title/text()")
iex> titles
["1984"]

iex> {:ok, doc} = Expath.new("<root><item>value1</item><item>value2</item></root>")
iex> {:ok, results} = Expath.query(doc, "//item/text()")
iex> Enum.sort(results)
["value1", "value2"]

iex> {:ok, doc} = Expath.new("<books><book id='1'/><book id='2'/></books>")
iex> {:ok, [count]} = Expath.query(doc, "count(//book)")
iex> count
"2"

Typical Usage

# Parse large document once
{:ok, doc} = Expath.new(large_xml_content)

# Run multiple queries efficiently
{:ok, products} = Expath.query(doc, "//product/@id")
{:ok, prices} = Expath.query(doc, "//price/text()")
{:ok, categories} = Expath.query(doc, "//category/text()")
{:ok, [total]} = Expath.query(doc, "count(//product)")
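
When the list of queries grows, it can help to drive them from data. The sketch below runs a set of named XPath expressions against one parsed document and stops at the first error. NamedQueriesSketch and run/3 are hypothetical names, and the query function is injectable (defaulting to Expath.query/2) purely so the helper can be tested in isolation.

```elixir
defmodule NamedQueriesSketch do
  # Hypothetical helper: `queries` is a keyword list or map of
  # name => xpath. Returns {:ok, %{name => results}} or
  # {:error, {name, reason}} for the first query that fails.
  def run(doc, queries, query_fun \\ &Expath.query/2) do
    Enum.reduce_while(queries, {:ok, %{}}, fn {name, xpath}, {:ok, acc} ->
      case query_fun.(doc, xpath) do
        {:ok, results} -> {:cont, {:ok, Map.put(acc, name, results)}}
        {:error, reason} -> {:halt, {:error, {name, reason}}}
      end
    end)
  end
end
```

With Expath installed, a call might look like NamedQueriesSketch.run(doc, products: "//product/@id", prices: "//price/text()").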

Concurrent Usage

# Document can be safely shared across processes
{:ok, doc} = Expath.new(xml_content)

tasks = for xpath <- xpath_list do
  Task.async(fn -> Expath.query(doc, xpath) end)
end

results = Task.await_many(tasks)

Performance

This approach is particularly beneficial when:

  • Running 2+ queries on the same document
  • Working with large documents (>1KB)
  • Processing documents in tight loops
  • Sharing documents across multiple processes

query(document, xpath, namespaces)

Query a Document resource with XPath and namespace support.

This function executes XPath queries with namespace mappings on a previously parsed Document resource. It's designed for high-performance scenarios where multiple queries need to be run on the same XML document without re-parsing.

Parameters

  • document - An %Expath.Document{} resource from new/1
  • xpath - XPath expression as a binary string
  • namespaces - Map of namespace prefix to URI mappings

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - Document resource is invalid
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> xml = "<library xmlns:book='http://example.com/book'><book:title>1984</book:title><book:author>Orwell</book:author></library>"
iex> {:ok, doc} = Expath.new(xml)
iex> namespaces = %{"book" => "http://example.com/book"}
iex> {:ok, titles} = Expath.query(doc, "//book:title/text()", namespaces)
iex> titles
["1984"]

iex> xml = "<catalog xmlns:prod='http://product.com' xmlns:meta='http://metadata.com'><prod:item meta:id='123'><prod:name>Widget</prod:name><prod:price>9.99</prod:price></prod:item></catalog>"
iex> {:ok, doc} = Expath.new(xml)
iex> namespaces = %{"prod" => "http://product.com", "meta" => "http://metadata.com"}
iex> {:ok, names} = Expath.query(doc, "//prod:name/text()", namespaces)
iex> {:ok, prices} = Expath.query(doc, "//prod:price/text()", namespaces)
iex> {:ok, ids} = Expath.query(doc, "//prod:item/@meta:id", namespaces)
iex> {names, prices, ids}
{["Widget"], ["9.99"], ["123"]}

Typical Usage with Namespaces

# Parse document once
xml = ~s[
  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:web="http://webservice.example.com">
    <soap:Body>
      <web:GetUserResponse>
        <web:User>
          <web:Id>123</web:Id>
          <web:Name>John Doe</web:Name>
          <web:Email>john@example.com</web:Email>
        </web:User>
      </web:GetUserResponse>
    </soap:Body>
  </soap:Envelope>
]

{:ok, doc} = Expath.new(xml)

# Define namespace mappings once
namespaces = %{
  "soap" => "http://schemas.xmlsoap.org/soap/envelope/",
  "web" => "http://webservice.example.com"
}

# Run multiple namespace-aware queries efficiently
{:ok, [user_id]} = Expath.query(doc, "//web:Id/text()", namespaces)
{:ok, [user_name]} = Expath.query(doc, "//web:Name/text()", namespaces)
{:ok, [user_email]} = Expath.query(doc, "//web:Email/text()", namespaces)
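
The pattern {:ok, [user_id]} = ... raises a MatchError when an element is missing or appears more than once. A sketch of a more defensive variant using with is shown below. SoapUserSketch, fetch_user/2, and the error tuple shape are hypothetical. The query function defaults to Expath.query/3 and is injectable only for testing.

```elixir
defmodule SoapUserSketch do
  @namespaces %{
    "soap" => "http://schemas.xmlsoap.org/soap/envelope/",
    "web" => "http://webservice.example.com"
  }

  # Hypothetical helper: expects exactly one match per field and returns
  # an error tuple instead of raising when a field is absent or repeated.
  def fetch_user(doc, query_fun \\ &Expath.query/3) do
    with {:ok, id} <- one(query_fun, doc, "//web:Id/text()"),
         {:ok, name} <- one(query_fun, doc, "//web:Name/text()"),
         {:ok, email} <- one(query_fun, doc, "//web:Email/text()") do
      {:ok, %{id: id, name: name, email: email}}
    end
  end

  # Runs one namespace-aware query and insists on a single result.
  defp one(query_fun, doc, xpath) do
    case query_fun.(doc, xpath, @namespaces) do
      {:ok, [value]} -> {:ok, value}
      {:ok, other} -> {:error, {:expected_one, xpath, length(other)}}
      {:error, _reason} = error -> error
    end
  end
end
```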

Performance

This approach is particularly beneficial when:

  • Running 2+ namespace-aware queries on the same document
  • Working with large namespaced documents (>1KB)
  • Processing documents with complex namespace hierarchies
  • Sharing namespaced documents across multiple processes

query_document(document, xpath_str)

Query a parsed Document resource with XPath (low-level function).

This is a low-level function that queries an Expath.Document resource. For most use cases, prefer the higher-level query/2 function.

Parameters

  • document - An %Expath.Document{} resource
  • xpath_str - XPath expression as string

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - Document is invalid
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> {:ok, doc} = Expath.parse_document("<root><item>value1</item><item>value2</item></root>")
iex> {:ok, results} = Expath.query_document(doc, "//item/text()")
iex> Enum.sort(results)
["value1", "value2"]

iex> {:ok, doc} = Expath.parse_document("<books><book id='1'/><book id='2'/></books>")
iex> Expath.query_document(doc, "count(//book)")
{:ok, ["2"]}

query_document_with_namespaces(document, xpath_str, namespaces)

Query a parsed Document resource with XPath and namespace support (low-level function).

This is a low-level function that queries an Expath.Document resource with namespace mappings. For most use cases, prefer the higher-level query/3 function.

Parameters

  • document - An %Expath.Document{} resource
  • xpath_str - XPath expression as string
  • namespaces - Map of namespace prefix to URI mappings

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - Document is invalid
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> xml = "<root xmlns:ns='http://example.com'><ns:item>value</ns:item></root>"
iex> {:ok, doc} = Expath.parse_document(xml)
iex> namespaces = %{"ns" => "http://example.com"}
iex> {:ok, results} = Expath.query_document_with_namespaces(doc, "//ns:item/text()", namespaces)
iex> results
["value"]

select(xml_string, xpath)

Select nodes from an XML string using XPath.

This is the recommended function for single XPath queries. It parses the XML and executes the XPath query in one operation with clean error handling.

For multiple queries on the same XML document, consider using new/1 followed by multiple query/2 calls for better performance.

Parameters

  • xml_string - XML content as a binary string
  • xpath - XPath expression as a binary string

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - XML parsing failed
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> xml = "<library><book><title>1984</title></book></library>"
iex> {:ok, titles} = Expath.select(xml, "//title/text()")
iex> titles
["1984"]

iex> xml = "<root><item>value1</item><item>value2</item></root>"
iex> {:ok, results} = Expath.select(xml, "//item/text()")
iex> Enum.sort(results)
["value1", "value2"]

iex> xml = "<books><book id='1'/><book id='2'/></books>"
iex> {:ok, [count]} = Expath.select(xml, "count(//book)")
iex> count
"2"

iex> Expath.select("<invalid><xml>", "//title")
{:error, :invalid_xml}

iex> Expath.select("<root/>", "//[invalid")
{:error, :invalid_xpath}

Performance Tips

  • For single queries: use select/2
  • For multiple queries on the same document: use new/1 + query/2
  • Prefer specific XPath expressions over broad searches
  • Use the included benchmark suite to test your specific use case
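
For a quick comparison before reaching for the benchmark suite, a rough timing sketch using :timer.tc/1 is shown below. TimingSketch, time/2, and compare/2 are hypothetical names and this is not the project's benchmark suite. Treat the numbers as indicative only, since a single timed run is noisy.

```elixir
defmodule TimingSketch do
  # Returns the wall-clock time in microseconds taken to call `fun` n times.
  def time(fun, n \\ 100) when is_function(fun, 0) and n > 0 do
    {micros, _value} = :timer.tc(fn -> Enum.each(1..n, fn _ -> fun.() end) end)
    micros
  end

  # Compares select/2 per query against new/1 followed by query/2
  # (assumes Expath is installed and `xml` is well-formed).
  def compare(xml, xpaths) do
    select_us = time(fn -> Enum.each(xpaths, &Expath.select(xml, &1)) end)

    reuse_us =
      time(fn ->
        {:ok, doc} = Expath.new(xml)
        Enum.each(xpaths, &Expath.query(doc, &1))
      end)

    %{select_us: select_us, new_query_us: reuse_us}
  end
end
```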

select(xml_string, xpath, namespaces)

Select nodes from an XML string using XPath with namespace support.

This is the recommended function for single XPath queries that require namespace support. It parses the XML and executes the XPath query with namespace mappings in one operation.

For multiple queries on the same XML document, consider using new/1 followed by multiple query/3 calls for better performance.

Parameters

  • xml_string - XML content as a binary string
  • xpath - XPath expression as a binary string
  • namespaces - Map of namespace prefix to URI mappings

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - XML parsing failed
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> xml = "<library xmlns:book='http://example.com/book'><book:title>1984</book:title></library>"
iex> namespaces = %{"book" => "http://example.com/book"}
iex> {:ok, titles} = Expath.select(xml, "//book:title/text()", namespaces)
iex> titles
["1984"]

iex> xml = "<root xmlns:ns1='http://ns1.com' xmlns:ns2='http://ns2.com'><ns1:item ns2:attr='value'>text</ns1:item></root>"
iex> namespaces = %{"ns1" => "http://ns1.com", "ns2" => "http://ns2.com"}
iex> {:ok, attrs} = Expath.select(xml, "//ns1:item/@ns2:attr", namespaces)
iex> attrs
["value"]

Namespace Usage

When working with namespaced XML documents, you must:

  1. Register namespace prefixes - Map prefixes to URIs in the namespaces parameter
  2. Use prefixes in XPath - Reference elements and attributes with their prefixes
  3. Match namespace URIs - The URI in your mapping must exactly match the XML

Common Patterns

# Default namespace (elements without prefix in XML)
xml = "<root xmlns='http://example.com'><item>value</item></root>"
namespaces = %{"" => "http://example.com"}  # Empty prefix for default namespace
Expath.select(xml, "//item/text()", namespaces)

# Multiple namespaces
xml = "<soap:Envelope xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/' xmlns:web='http://example.com/webservice'><soap:Body><web:GetUser><web:UserId>123</web:UserId></web:GetUser></soap:Body></soap:Envelope>"
namespaces = %{
  "soap" => "http://schemas.xmlsoap.org/soap/envelope/",
  "web" => "http://example.com/webservice"
}
Expath.select(xml, "//web:UserId/text()", namespaces)

# Namespace-aware attribute selection
xml = "<root xmlns:meta='http://metadata.com'><item meta:id='123'>content</item></root>"
namespaces = %{"meta" => "http://metadata.com"}
Expath.select(xml, "//item/@meta:id", namespaces)

xpath_select(xml_binary, xpath_str)

Select nodes from XML using XPath expressions (low-level function).

This is a low-level function that parses XML and executes an XPath query in one operation. For most use cases, prefer the higher-level select/2 function which provides better error handling and API consistency.

Parameters

  • xml_binary - XML content as binary data
  • xpath_str - XPath expression as string

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - XML parsing failed
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> xml = "<root><item>value1</item><item>value2</item></root>"
iex> {:ok, results} = Expath.xpath_select(xml, "//item/text()")
iex> Enum.sort(results)
["value1", "value2"]

iex> Expath.xpath_select("<root/>", "count(//*)")
{:ok, ["1"]}

xpath_select_with_namespaces(xml_binary, xpath_str, namespaces)

Select nodes from XML using XPath expressions with namespace support (low-level function).

This is a low-level function that parses XML and executes an XPath query with namespace mappings in one operation. For most use cases, prefer the higher-level select/3 function.

Parameters

  • xml_binary - XML content as binary data
  • xpath_str - XPath expression as string
  • namespaces - Map of namespace prefix to URI mappings

Returns

  • {:ok, results} - List of matching values as strings
  • {:error, :invalid_xml} - XML parsing failed
  • {:error, :invalid_xpath} - XPath expression is invalid
  • {:error, :xpath_error} - XPath evaluation failed

Examples

iex> xml = "<root xmlns:ns='http://example.com'><ns:item>value</ns:item></root>"
iex> namespaces = %{"ns" => "http://example.com"}
iex> {:ok, results} = Expath.xpath_select_with_namespaces(xml, "//ns:item/text()", namespaces)
iex> results
["value"]