Module xmlrat_parse

Parse XML documents.

Description

This module contains functions for parsing XML documents. By default, the xmlrat XML parser attempts to parse and expand namespaces, and it expands parameters and entities as long as they are small (less than 4 kbytes).

The parser's implementation of XML entities and parameters is a subset of the XML 1.0 specification: parameters may not expand to any part of the definition of another entity or parameter, and parameters and entities must be defined after any other parameter or entity they refer to.

The parser does not support DTDs beyond basic parsing. In particular, it does not perform DTD validation, which has been a source of many vulnerabilities in XML parsers.

The parser never follows external references of any kind, whether they appear in DTDs, parameters or entities.

By default, the parser also refuses to parse large binaries (larger than 256 kbytes). Both this limit and the entity size limit can be configured in the options given to file/2 or string/2.
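As a rough sketch of how the limits might be raised, the module below builds an options map and passes it to string/2. It assumes that the size_limit and entity_size_limit keys from the options() type are the two limits described above and that both are given in bytes. The module name limits_example and the chosen values are hypothetical.

```erlang
-module(limits_example).
-export([opts/0, parse_large/1]).

%% Hypothetical limits: 1 MiB of input and 16 KiB per expanded entity.
%% Assumes size_limit and entity_size_limit are the limits described
%% in the module documentation, expressed in bytes.
opts() ->
    #{size_limit => 1024 * 1024,
      entity_size_limit => 16 * 1024}.

%% Parse a string or binary using the raised limits.
parse_large(Xml) ->
    xmlrat_parse:string(Xml, opts()).
```

Raising these limits weakens the parser's protection against oversized or maliciously expanding input, so it is probably best done only for trusted sources.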

Both the file/1 and string/1 functions attempt basic character-encoding sniffing on the input. Sniffing is limited to the UTF family of encodings, ASCII and ISO-8859-1 (Latin-1). The parser always normalises input data to UTF-8, so binaries within the returned document are UTF-8 encoded.

Data Types

bytes()

bytes() = integer()

filename()

filename() = string() | binary()

options()

options() = #{entities => #{binary() => term()}, allow_unknown_entities => boolean(), namespaces => #{xmlrat:nsname() => xmlrat:uri()}, expand_namespaces => boolean(), elide_empty_attributes => boolean(), entity_size_limit => bytes(), size_limit => bytes()}

Options that can be passed as the final argument to file/2 or string/2.
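Every key in the options() type is written with `=>`, which in Erlang type specifications marks a key as optional, so a partial map should be acceptable. The sketch below sets only two keys. The module name options_example is hypothetical, and the effect attributed to each key is inferred from its name and from the module description rather than confirmed behaviour.

```erlang
-module(options_example).
-export([raw_opts/0, parse_raw/1]).

%% Only the keys being set are present; all others are left out.
%% Assumption: expand_namespaces => false turns off the namespace
%% expansion that the parser attempts by default, and
%% allow_unknown_entities => false rejects undefined entities.
raw_opts() ->
    #{expand_namespaces => false,
      allow_unknown_entities => false}.

%% Parse a string or binary with the options above.
parse_raw(Xml) ->
    xmlrat_parse:string(Xml, raw_opts()).
```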

Function Index

clean_whitespace/1    Removes whitespace from a document, simplifying it for output.
file/1                Parses a file as an XML document.
file/2                Parses a file as an XML document, with configurable options.
string/1              Parses a string as an XML document.
string/2              Parses a string as an XML document, with configurable options.

Function Details

clean_whitespace/1

clean_whitespace(Rest::xmlrat:document()) -> xmlrat:document()

Removes whitespace from a document, simplifying it for output.

Removes whitespace:
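One plausible use, going only by the type signature, is to clean a freshly parsed document before handing it on for output. The sketch below chains string/1 and clean_whitespace/1. The module name clean_example is hypothetical, and nothing is assumed about the internal shape of xmlrat:document().

```erlang
-module(clean_example).
-export([parse_clean/1]).

%% Parse Xml and, on success, strip whitespace from the resulting
%% document. Errors from the parser are returned unchanged.
parse_clean(Xml) ->
    case xmlrat_parse:string(Xml) of
        {ok, Doc} ->
            {ok, xmlrat_parse:clean_whitespace(Doc)};
        {error, _Reason} = Err ->
            Err
    end.
```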

file/1

file(Filename::filename()) -> {ok, xmlrat:document()} | {error, term()}

Parses a file as an XML document.
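A minimal sketch of calling file/1 and handling both result shapes from the signature above. The module name file_example and the choice to print errors to standard error are illustrative assumptions, and the structure of the error term is not specified here, so it is printed as-is.

```erlang
-module(file_example).
-export([load/1]).

%% Parse the file at Path, which may be a string or a binary.
%% On failure, report the reason on standard error and return
%% the error tuple unchanged.
load(Path) ->
    case xmlrat_parse:file(Path) of
        {ok, Doc} ->
            {ok, Doc};
        {error, Reason} = Err ->
            io:format(standard_error,
                      "could not parse ~ts: ~p~n", [Path, Reason]),
            Err
    end.
```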

file/2

file(Filename::filename(), Opts::options()) -> {ok, xmlrat:document()} | {error, term()}

Parses a file as an XML document, with configurable options.

string/1

string(StringOrBinary::string() | binary()) -> {ok, xmlrat:document()} | {error, term()}

Parses a string as an XML document.
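For illustration, the sketch below passes a small binary literal to string/1. The module name string_example and the sample document are hypothetical. The result is either {ok, Document} or {error, Reason}, as given in the signature above.

```erlang
-module(string_example).
-export([parse_greeting/0]).

%% Parse a small in-memory document. The input is a UTF-8 binary;
%% a plain string (list of characters) is also accepted by the type.
parse_greeting() ->
    Xml = <<"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            "<greeting lang=\"en\">hello</greeting>">>,
    xmlrat_parse:string(Xml).
```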

string/2

string(String::string() | binary(), Opts::options()) -> {ok, xmlrat:document()} | {error, term()}

Parses a string as an XML document, with configurable options.


Generated by EDoc