RustyXML Compliance & Validation

RustyXML takes correctness seriously. Its 1296+ tests span multiple suites, including all 1089 applicable tests from the W3C/OASIS XML Conformance Test Suite, the industry-standard XML validation tests, on which RustyXML achieves 100% compliance.

This document describes W3C XML 1.0 compliance, XPath 1.0 support, and the validation methodology.


W3C XML 1.0 Compliance

RustyXML is designed to comply with W3C XML 1.0 (Fifth Edition).

Core XML Requirements

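| Section | Requirement | Status |
|---------|-------------|--------|
| 2.1 | Well-formed documents | ✅ |
| 2.2 | Characters (Unicode) | ✅ |
| 2.3 | Common syntactic constructs | ✅ |
| 2.4 | Character data | ✅ |
| 2.5 | Comments | ✅ |
| 2.6 | Processing instructions | ✅ |
| 2.7 | CDATA sections | ✅ |
| 2.8 | Prolog and document type declaration | ✅ Parsed |
| 2.9 | Standalone document declaration | ✅ |
| 2.10 | White space handling | ✅ |
| 2.11 | End-of-line handling | ✅ |
| 2.12 | Language identification | ✅ |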

Element and Attribute Support

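| Feature | Status | Notes |
|---------|--------|-------|
| Start tags | ✅ | Full attribute support |
| End tags | ✅ | Name matching validation |
| Empty element tags | ✅ | `<element/>` syntax |
| Attributes | ✅ | Single and double quotes |
| Namespaces | ✅ | Prefix resolution |
| Default namespaces | ✅ | `xmlns="..."` |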

Character and Entity Support

| Feature | Status | Notes |
|---------|--------|-------|
| Character references | ✅ | `&#N;` and `&#xN;` |
| Predefined entities | ✅ | `&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;` |
| Character encoding | ✅ | UTF-8 (primary), UTF-16 detection |
| Unicode characters | ✅ | Full Unicode support |
| BOM handling | ✅ | UTF-8/UTF-16 BOM detection |
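
To make the two character reference forms concrete, here is a minimal sketch in plain Elixir of how `&#N;` (decimal) and `&#xN;` (hexadecimal) references map to characters. The module `CharRefSketch` and its functions are hypothetical and written for illustration only; this is not RustyXML's implementation, which lives on the Rust side. The range check follows the `Char` production of XML 1.0.

```elixir
defmodule CharRefSketch do
  # Hypothetical illustration; not part of RustyXML's API.

  # Hexadecimal form: &#xN; (the "x" must be lowercase)
  def decode("&#x" <> rest), do: parse(rest, 16, ~r/\A[0-9A-Fa-f]+;\z/)
  # Decimal form: &#N;
  def decode("&#" <> rest), do: parse(rest, 10, ~r/\A[0-9]+;\z/)
  def decode(_other), do: :error

  defp parse(rest, base, pattern) do
    if Regex.match?(pattern, rest) do
      code = rest |> String.trim_trailing(";") |> String.to_integer(base)
      # Only code points allowed by the XML 1.0 Char production are accepted.
      if xml_char?(code), do: {:ok, <<code::utf8>>}, else: :error
    else
      :error
    end
  end

  # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  defp xml_char?(c) when c in [0x9, 0xA, 0xD], do: true
  defp xml_char?(c) when c in 0x20..0xD7FF, do: true
  defp xml_char?(c) when c in 0xE000..0xFFFD, do: true
  defp xml_char?(c) when c in 0x10000..0x10FFFF, do: true
  defp xml_char?(_c), do: false
end
```

Under these assumptions, `CharRefSketch.decode("&#65;")` and `CharRefSketch.decode("&#x41;")` both yield `{:ok, "A"}`, while a reference to a disallowed code point such as `&#0;` yields `:error`.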

Document Structure

| Feature | Status | Notes |
|---------|--------|-------|
| XML declaration | ✅ | Version, encoding, standalone |
| DOCTYPE declaration | ✅ | Parsed but not validated |
| Single root element | ✅ | Enforced |
| Comments | ✅ | `<!-- ... -->` |
| Processing instructions | ✅ | `<?target data?>` |
| CDATA sections | ✅ | `<![CDATA[...]]>` |

Differences from Strict XML 1.0

RustyXML makes practical concessions shared by most XML implementations:

  1. Non-validating parser - DTD declarations are parsed but entity definitions are not expanded (except predefined entities)
  2. Lenient character handling - Control characters in content generate warnings rather than errors
  3. Flexible encoding - Automatically detects UTF-8, UTF-16 LE/BE
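
As an illustration of the encoding detection in item 3, the sketch below shows byte order mark (BOM) sniffing with Elixir binary pattern matching. `BomSketch` is a hypothetical module, not RustyXML's code, and it covers only the BOM case; a real parser may also inspect the first bytes of the XML declaration when no BOM is present.

```elixir
defmodule BomSketch do
  # Hypothetical illustration; not part of RustyXML's API.
  # Returns the detected encoding and the input with the BOM stripped.

  # UTF-8 BOM: EF BB BF
  def detect(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: {:utf8, rest}
  # UTF-16 little-endian BOM: FF FE
  def detect(<<0xFF, 0xFE, rest::binary>>), do: {:utf16_le, rest}
  # UTF-16 big-endian BOM: FE FF
  def detect(<<0xFE, 0xFF, rest::binary>>), do: {:utf16_be, rest}
  # No BOM: leave the decision (UTF-8 by default in XML) to the caller.
  def detect(data) when is_binary(data), do: {:none, data}
end
```

For example, `BomSketch.detect(<<0xEF, 0xBB, 0xBF, "<a/>">>)` returns `{:utf8, "<a/>"}`, and input without a BOM is returned unchanged with the tag `:none`.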

XPath 1.0 Support

RustyXML implements the complete XPath 1.0 specification.

Axes (13 of 13)

All XPath axes are fully supported:

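| Axis | Status | Description |
|------|--------|-------------|
| `child` | ✅ | Direct children |
| `parent` | ✅ | Parent node |
| `self` | ✅ | Context node |
| `attribute` | ✅ | Attributes of context node |
| `descendant` | ✅ | All descendants |
| `descendant-or-self` | ✅ | Context node and descendants |
| `ancestor` | ✅ | All ancestors |
| `ancestor-or-self` | ✅ | Context node and ancestors |
| `following` | ✅ | All following nodes |
| `following-sibling` | ✅ | Following siblings |
| `preceding` | ✅ | All preceding nodes |
| `preceding-sibling` | ✅ | Preceding siblings |
| `namespace` | ✅ | Namespace nodes |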

Node Tests

| Test | Status | Example |
|------|--------|---------|
| Node name | ✅ | `child::book` |
| Wildcard | ✅ | `child::*` |
| Node type | ✅ | `text()`, `comment()`, `processing-instruction()`, `node()` |
| Processing instruction target | ✅ | `processing-instruction('xml-stylesheet')` |

Predicates

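| Feature | Status | Example |
|---------|--------|---------|
| Position predicates | ✅ | `[1]`, `[last()]` |
| Attribute predicates | ✅ | `[@id='1']` |
| Element predicates | ✅ | `[title]` |
| Comparison operators | ✅ | `=`, `!=`, `<`, `>`, `<=`, `>=` |
| Boolean operators | ✅ | `and`, `or` |
| Arithmetic operators | ✅ | `+`, `-`, `*`, `div`, `mod` |
| Nested predicates | ✅ | `[item[@type='book']]` |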

Functions (27+)

Node Set Functions

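| Function | Status | Description |
|----------|--------|-------------|
| `position()` | ✅ | Current position in node set |
| `last()` | ✅ | Size of node set |
| `count(node-set)` | ✅ | Number of nodes |
| `local-name()` | ✅ | Local part of name |
| `namespace-uri()` | ✅ | Namespace URI |
| `name()` | ✅ | Qualified name |
| `id(string)` | ✅ | Select by ID |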

String Functions

| Function | Status | Description |
|----------|--------|-------------|
| `string()` | ✅ | Convert to string |
| `concat(str, str, ...)` | ✅ | Concatenate strings |
| `starts-with(str, prefix)` | ✅ | Test string prefix |
| `contains(str, substr)` | ✅ | Test substring presence |
| `substring(str, start, len?)` | ✅ | Extract substring |
| `substring-before(str, delim)` | ✅ | String before delimiter |
| `substring-after(str, delim)` | ✅ | String after delimiter |
| `string-length(str?)` | ✅ | String length |
| `normalize-space(str?)` | ✅ | Normalize whitespace |
| `translate(str, from, to)` | ✅ | Character translation |

Boolean Functions

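| Function | Status | Description |
|----------|--------|-------------|
| `boolean()` | ✅ | Convert to boolean |
| `not(bool)` | ✅ | Logical negation |
| `true()` | ✅ | Boolean true |
| `false()` | ✅ | Boolean false |
| `lang(lang)` | ✅ | Test language |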

Number Functions

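| Function | Status | Description |
|----------|--------|-------------|
| `number()` | ✅ | Convert to number |
| `sum(node-set)` | ✅ | Sum of node values |
| `floor(num)` | ✅ | Floor function |
| `ceiling(num)` | ✅ | Ceiling function |
| `round(num)` | ✅ | Round to nearest |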

Abbreviated Syntax

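| Syntax | Expansion | Status |
|--------|-----------|--------|
| `//` | `/descendant-or-self::node()/` | ✅ |
| `.` | `self::node()` | ✅ |
| `..` | `parent::node()` | ✅ |
| `@attr` | `attribute::attr` | ✅ |
| `[n]` | `[position() = n]` | ✅ |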

Conformance Test Suite

RustyXML includes a comprehensive conformance test suite based on W3C and OASIS standards.

Test Categories

| Category | Tests | Description |
|----------|-------|-------------|
| Well-Formedness | 18 | Basic XML structure |
| Characters | 12 | Unicode and special characters |
| Whitespace | 8 | Whitespace preservation and normalization |
| Entities | 10 | Entity references and escaping |
| CDATA | 8 | CDATA section handling |
| Comments | 6 | Comment parsing |
| Processing Instructions | 6 | PI parsing and data extraction |
| Namespaces | 12 | Namespace declaration and resolution |
| Attributes | 10 | Attribute parsing and quoting |
| Elements | 8 | Element naming and nesting |
| XML Declaration | 6 | Version, encoding, standalone |
| DOCTYPE | 4 | DOCTYPE declaration parsing |
| Edge Cases | 8 | Complex real-world scenarios |
| XPath Axes | 15 | All 13 axes plus edge cases |
| Total | 131 | Conformance tests |

Test File

test/xml_conformance_test.exs

Running Conformance Tests

# Run all conformance tests
mix test test/xml_conformance_test.exs

# Run specific category
mix test test/xml_conformance_test.exs --only wellformedness
mix test test/xml_conformance_test.exs --only xpath

W3C/OASIS XML Conformance Test Suite

RustyXML is tested against the official W3C XML Conformance Test Suite (xmlconf), the industry standard with 2000+ test cases from Sun, IBM, OASIS/NIST, and others.

Test Results

Strict Mode (Default)

| Category | Tests | Passed | Status |
|----------|-------|--------|--------|
| Valid documents (must accept) | 218 | 218 | 100% |
| Not-well-formed (must reject) | 871 | 871 | 100% |
| Invalid (DTD validation) | - | - | N/A (non-validating) |

RustyXML achieves 100% compliance with all 1089 applicable OASIS/W3C XML Conformance tests.
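
The pass criteria behind these numbers can be sketched as follows. This is a hypothetical illustration, not code from RustyXML's test files: it assumes each test case is reduced to a type (`:valid`, `:not_wf`, or `:invalid`) plus a parse result shaped like `{:ok, doc}` or `{:error, reason}`.

```elixir
defmodule ConformanceTallySketch do
  # Hypothetical illustration; not taken from RustyXML's test suite.

  # Valid documents must be accepted.
  def verdict(:valid, {:ok, _doc}), do: :pass
  def verdict(:valid, {:error, _reason}), do: :fail
  # Not-well-formed documents must be rejected.
  def verdict(:not_wf, {:error, _reason}), do: :pass
  def verdict(:not_wf, {:ok, _doc}), do: :fail
  # DTD validity tests do not apply to a non-validating parser.
  def verdict(:invalid, _result), do: :not_applicable

  # Counts outcomes per {type, verdict} pair.
  def tally(results) do
    Enum.frequencies_by(results, fn {type, result} -> {type, verdict(type, result)} end)
  end
end
```

With this rule, a parser that accepts every input passes all `:valid` cases and fails all `:not_wf` cases.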

Lenient Mode (lenient: true)

| Category | Tests | Passed | Status |
|----------|-------|--------|--------|
| Valid documents (must accept) | 218 | 218 | 100% |
| Not-well-formed (must reject) | 871 | 0 | ⚠️ Lenient |
| Invalid (DTD validation) | - | - | N/A (non-validating) |

Lenient mode accepts malformed XML for processing third-party or legacy documents.

Parser Behavior

RustyXML supports two modes:

Strict Mode (Default) - Matches SweetXml/xmerl behavior:

  • Validates element and attribute names
  • Checks comment content (no -- sequences)
  • Validates text content (no unescaped ]]>)
  • Raises ParseError for malformed documents

Lenient Mode (lenient: true) - Accepts malformed XML:

  • Best for processing real-world documents that may have minor issues
  • 100% acceptance of valid documents
  • Does not reject malformed documents

# Strict mode (default) - matches SweetXml
doc = RustyXML.parse("<root/>")
RustyXML.parse("<1invalid/>")  # Raises ParseError

# Lenient mode - accepts malformed XML
doc = RustyXML.parse("<1invalid/>", lenient: true)

# Tuple-based error handling (no exceptions)
{:ok, doc} = RustyXML.parse_document("<root/>")
{:error, reason} = RustyXML.parse_document("<1invalid/>")

| Malformed Input | Strict Mode (Default) | Lenient Mode |
|-----------------|-----------------------|--------------|
| `<!-- comment -- inside -->` | ❌ Error | ✅ Accepts |
| `<1invalid-name>` | ❌ Error | ✅ Accepts |
| `<valid>text ]]> more</valid>` | ❌ Error | ✅ Accepts |
| `<?XML version="1.0"?>` (wrong case) | ❌ Error | ✅ Accepts |
| `standalone="YES"` (wrong case) | ❌ Error | ✅ Accepts |
| `&undefined;` in attributes | ❌ Error | ✅ Accepts |
| External entity in attribute | ❌ Error | ✅ Accepts |

Rationale: Strict mode by default ensures SweetXml compatibility and full XML 1.0 compliance. Lenient mode is available for processing third-party or legacy XML that may have minor issues.
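
One way to combine the two modes is to try strict parsing first and fall back to lenient parsing only when the input is rejected. The sketch below is a hypothetical helper, not part of RustyXML. It takes the two parsers as function arguments so the control flow can be run without the library, and it assumes the strict parser returns `{:ok, doc}` or `{:error, reason}` as `parse_document/1` does above.

```elixir
defmodule ParseFallbackSketch do
  # Hypothetical illustration; not part of RustyXML's API.
  # `strict` must return {:ok, doc} | {:error, reason}; `lenient` returns a doc.
  def parse(xml, strict, lenient) when is_function(strict, 1) and is_function(lenient, 1) do
    case strict.(xml) do
      # Well-formed input: keep the strict result.
      {:ok, doc} -> {:strict, doc}
      # Malformed input: retry leniently and keep the strict error for logging.
      {:error, reason} -> {:lenient, lenient.(xml), reason}
    end
  end
end

# With RustyXML, using only the calls shown above, this might be wired up as:
#
#   ParseFallbackSketch.parse(
#     xml,
#     &RustyXML.parse_document/1,
#     &RustyXML.parse(&1, lenient: true)
#   )
```

Returning the strict error alongside the lenient result lets callers record which third-party documents were malformed while still processing them.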

Obtaining the Test Suite

The W3C/OASIS XML Conformance Test Suite (~50MB of test data) is not included in the RustyXML package, which keeps the package download small. To run the conformance tests locally:

Option 1: Download directly from W3C

mkdir -p test/xmlconf && cd test/xmlconf
curl -LO https://www.w3.org/XML/Test/xmlts20130923.tar.gz
tar -xzf xmlts20130923.tar.gz && rm xmlts20130923.tar.gz

Option 2: Use the convenience script

./scripts/download-xmlconf.sh

The test suite version xmlts20130923 (September 2013) is the latest official release from the W3C. Since XML 1.0 Fifth Edition (2008) has been stable for over 15 years, no updates to the conformance tests have been necessary.

Running the Test Suite

# Run all conformance tests (requires test suite download)
mix test test/oasis_conformance_test.exs

# Run only valid document tests
mix test test/oasis_conformance_test.exs --only valid

# Run only not-well-formed tests
mix test test/oasis_conformance_test.exs --only not_wf

# Include skipped tests (shows full results)
mix test test/oasis_conformance_test.exs --include skip

XPath Conformance

XPath compliance is tested against:

  • W3C XPath 1.0 specification examples
  • XSLT/XPath conformance test suite
  • Real-world query patterns from SweetXml users

SweetXml Compatibility

RustyXML is designed as a drop-in replacement for SweetXml.

API Compatibility

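| Function | SweetXml | RustyXML | Status |
|----------|----------|----------|--------|
| `xpath/2` | ✅ | ✅ | Compatible |
| `xpath/3` | ✅ | ✅ | Compatible |
| `xmap/2` | ✅ | ✅ | Compatible |
| `xmap/3` | ✅ | ✅ | Compatible |
| `~x` sigil | ✅ | ✅ | Compatible |
| `stream_tags/2` | ✅ | ✅ | Compatible |
| `stream_tags/3` | ✅ | ✅ | Compatible |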

Sigil Modifiers

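| Modifier | SweetXml | RustyXML | Status |
|----------|----------|----------|--------|
| `s` (string) | ✅ | ✅ | Compatible |
| `l` (list) | ✅ | ✅ | Compatible |
| `e` (entities) | ✅ | ✅ | Compatible |
| `o` (optional) | ✅ | ✅ | Compatible |
| `i` (integer) | ✅ | ✅ | Compatible |
| `f` (float) | ✅ | ✅ | Compatible |
| `k` (keyword) | ✅ | ✅ | Compatible |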

Migration from SweetXml

# Before
import SweetXml
doc |> xpath(~x"//item"l)

# After
import RustyXML
doc |> xpath(~x"//item"l)

Cross-Path Validation

All parsing paths produce consistent output for the same input.

| Path | Description | Validates Against |
|------|-------------|-------------------|
| Structural Index (`parse/1`) | Main parse path (~4x input memory) | All test suites |
| Streaming (`stream_tags/3`) | Bounded-memory chunks | All test suites |
| SAX (`sax_parse/1`) | Event-based processing | All test suites |

# Paths are validated in test/rusty_xml_test.exs
test "parse produces consistent output" do
  xml = "<root><item>test</item></root>"
  doc = RustyXML.parse(xml)
  assert is_reference(doc)
end

Streaming Compliance

The streaming parser (stream_tags/3) is validated for:

| Feature | Status | Notes |
|---------|--------|-------|
| Complete element reconstruction | ✅ | Builds valid XML strings |
| Nested element handling | ✅ | Captures full subtrees |
| Whitespace preservation | ✅ | All whitespace preserved |
| Attribute handling | ✅ | All attributes captured |
| CDATA sections | ✅ | Preserved in output |
| Entity preservation | ✅ | Entities maintained |
| Chunk boundary handling | ✅ | Elements spanning chunks work correctly |
| Early termination | ✅ | `Stream.take` works without hanging |

SweetXml Issue Compatibility

RustyXML's streaming implementation addresses known SweetXml issues:

| Issue | SweetXml | RustyXML | Status |
|-------|----------|----------|--------|
| #97 - `Stream.take` hangs | ❌ Hangs | ✅ Works | Fixed |
| #50 - Nested text order | ❌ Wrong order | ✅ Correct | Fixed |
| Element boundary chunks | ⚠️ Can fail | ✅ Handles correctly | Fixed |

Validation Methodology

Test Data Sources

  1. Synthetic tests - Generated XML covering edge cases
  2. Real-world XML - RSS feeds, configuration files, SOAP messages
  3. Conformance suites - W3C and OASIS standard tests
  4. Fuzz testing - Random input to find parsing errors

Test Execution

  • All tests run on every CI build
  • Cross-platform testing (Linux, macOS, Windows)
  • Multiple Elixir/OTP version matrix
  • Memory leak detection with Valgrind (Rust side)

Reporting Issues

If you find XML that RustyXML doesn't handle correctly:

  1. Create a minimal reproduction case
  2. Open an issue with:
    • Input XML (or link to conformance test)
    • Expected output
    • Actual output
    • RustyXML version

Test Summary

| Suite | Tests | Purpose |
|-------|-------|---------|
| OASIS/W3C Conformance | 1089 | Industry-standard XML validation |
| RustyXML Unit Tests | 207+ | API, XPath, streaming, SAX, sigils |
| Total | 1296+ | |

Strict Mode Validation

RustyXML's strict mode (default) implements comprehensive XML 1.0 validation:

Well-Formedness Checks

  • ✅ Element and attribute names (XML 1.0 Edition 4 NameStartChar/NameChar)
  • ✅ Comment content (no -- sequences)
  • ✅ Text content (no unescaped ]]>)
  • ✅ Standalone declaration values (yes or no only)
  • ✅ Document structure ordering (XMLDecl → DOCTYPE → root)
  • ✅ Processing instruction target validation (xml reserved)
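
Three of the checks above are simple enough to sketch in plain Elixir. The module below is a hypothetical illustration of the rules themselves, following the XML 1.0 grammar; it is not RustyXML's implementation and covers only the comment, `]]>`, and standalone rules.

```elixir
defmodule WellFormedSketch do
  # Hypothetical illustration; not part of RustyXML's API.

  # `body` is the text between "<!--" and "-->". The XML 1.0 Comment
  # production forbids "--" inside the body and a body ending in "-".
  def valid_comment_body?(body) do
    not String.contains?(body, "--") and not String.ends_with?(body, "-")
  end

  # Character data must not contain the CDATA end marker "]]>".
  # (A full check would also reject raw "<" and "&"; omitted here.)
  def valid_char_data?(text), do: not String.contains?(text, "]]>")

  # The standalone declaration is case-sensitive: only "yes" or "no".
  def valid_standalone?(value), do: value in ["yes", "no"]
end
```

Applied to the malformed inputs listed earlier, the comment body ` comment -- inside `, the text `text ]]> more`, and the value `"YES"` are each rejected by the corresponding function.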

Entity Validation

  • ✅ Entity registry tracking (declared entities, types, values)
  • ✅ Undefined entity detection in attribute values
  • ✅ Case-sensitive entity matching
  • ✅ External entity detection (SYSTEM/PUBLIC)
  • ✅ WFC: No External Entity References in attributes
  • ✅ Unparsed entity (NDATA) restrictions
  • ✅ Entity replacement text validation:
    • Split character reference detection (&#38; + #)
    • Balanced markup validation
    • Invalid name character detection (CombiningChar as first char)
    • XML declaration in entity prohibition

Not Planned

  • XML 1.1 support - Minimal adoption, incompatible changes
  • External entity resolution - Security concerns (XXE attacks)
  • Full DTD processing - Complexity vs. benefit
  • XPath 2.0 - Different specification, significant effort
  • XSD validation - Out of scope for a parsing library

References