Parse XML documents.
This module contains functions for parsing XML documents. By default, the xmlrat XML parser attempts to parse and expand namespaces, and expands parameters and entities as long as they are small (under 4 kbytes).
The parser implements only a subset of the XML 1.0 specification for entities and parameters: a parameter may not expand to any part of the definition of another entity or parameter, and a parameter or entity must be defined after any other parameter or entity it refers to.
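To make the ordering rule concrete, here is a minimal Erlang sketch (not xmlrat's own implementation) that processes entity definitions in document order and expands each replacement text using only the entities defined before it. The module name entity_order_sketch, the list-of-{Name, Text} input and the error terms are all hypothetical; the sketch also ignores parameter entities, character references and the predefined XML entities.

```erlang
-module(entity_order_sketch).
-export([define_all/1]).

%% Hypothetical model of the "define before use" rule: definitions are
%% processed in document order, and each replacement text may only refer
%% to entities that have already been defined.
define_all(Defs) ->
    define_all(Defs, #{}).

define_all([], Defined) ->
    {ok, Defined};
define_all([{Name, Text} | Rest], Defined) ->
    case expand(Text, Defined) of
        {ok, Expanded} ->
            %% Store the fully expanded text, so later definitions only
            %% ever need to look up values in this map.
            define_all(Rest, Defined#{Name => Expanded});
        {error, _} = Error ->
            Error
    end.

%% Replace every &name; reference in Text with its already-defined value.
expand(Text, Defined) ->
    case binary:split(Text, <<"&">>) of
        [Plain] ->
            {ok, Plain};
        [Before, After] ->
            case binary:split(After, <<";">>) of
                [Ref, Tail] ->
                    case maps:find(Ref, Defined) of
                        {ok, Value} ->
                            case expand(Tail, Defined) of
                                {ok, TailExpanded} ->
                                    {ok, <<Before/binary, Value/binary,
                                           TailExpanded/binary>>};
                                {error, _} = Error ->
                                    Error
                            end;
                        error ->
                            %% Not defined yet: a later definition does not help.
                            {error, {undefined_entity, Ref}}
                    end;
                [_] ->
                    {error, unterminated_reference}
            end
    end.
```

With this model, defining a as "y" and then b as "x&a;x" succeeds, while defining a as "&b;" before b exists fails, which is the same forward-reference case the restriction above rules out.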
DTDs are supported only as far as basic parsing; in particular, DTD validation is not supported (it has been a source of many vulnerabilities in XML parsers).
External references of any kind are never followed, whether they come via DTDs, parameters or entities.
The parser also refuses to parse large binaries (over 256 kbytes) by default, but both this limit and the entity size limit can be configured in the options given to file/2 or string/2.
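The sketch below models how the two documented limits combine with the options map: absent keys fall back to the defaults stated above, and an oversized input is rejected before any parsing. It assumes 1 kbyte = 1024 bytes; the module name limits_sketch and the {too_large, Size, Limit} error term are hypothetical, and the parser's real error value is not documented here. In practice the limit is changed by passing, for example, #{size_limit => 1048576} to string/2 or file/2.

```erlang
-module(limits_sketch).
-export([effective_limits/1, check_size/2]).

%% Defaults as described in the module documentation,
%% assuming 1 kbyte = 1024 bytes.
-define(DEFAULT_SIZE_LIMIT, 256 * 1024).
-define(DEFAULT_ENTITY_SIZE_LIMIT, 4 * 1024).

%% Resolve both limits from an options map, using the defaults
%% for any key that is absent.
effective_limits(Opts) when is_map(Opts) ->
    #{size_limit => maps:get(size_limit, Opts, ?DEFAULT_SIZE_LIMIT),
      entity_size_limit => maps:get(entity_size_limit, Opts,
                                    ?DEFAULT_ENTITY_SIZE_LIMIT)}.

%% Reject an input binary larger than the effective size_limit,
%% mirroring "return an error and do not parse".
check_size(Bin, Opts) when is_binary(Bin) ->
    #{size_limit := Limit} = effective_limits(Opts),
    case byte_size(Bin) of
        Size when Size > Limit -> {error, {too_large, Size, Limit}};
        _ -> ok
    end.
```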
The file/1 and string/1 functions attempt basic character encoding sniffing on the input, limited to the UTF family of encodings, ASCII and ISO-8859-1 (Latin-1). The parser always normalises input data to UTF-8, so binaries within the returned document are UTF-8 encoded.
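As an illustration of this kind of sniffing, here is a simplified Erlang sketch built only on the standard unicode module (unicode:bom_to_encoding/1 and unicode:characters_to_binary/3). It is not xmlrat's algorithm: it checks for a byte order mark, otherwise accepts input that is valid UTF-8 (which includes plain ASCII), and falls back to Latin-1. The module name sniff_sketch is hypothetical.

```erlang
-module(sniff_sketch).
-export([to_utf8/1]).

%% Detect the encoding of Bin and return it normalised to UTF-8.
to_utf8(Bin) when is_binary(Bin) ->
    case unicode:bom_to_encoding(Bin) of
        {latin1, 0} ->
            %% No byte order mark. ASCII and valid UTF-8 pass through
            %% unchanged; anything else is assumed to be ISO-8859-1.
            case unicode:characters_to_binary(Bin, utf8, utf8) of
                Utf8 when is_binary(Utf8) ->
                    {ok, utf8, Utf8};
                _Invalid ->
                    {ok, latin1, unicode:characters_to_binary(Bin, latin1, utf8)}
            end;
        {Encoding, BomLen} ->
            %% Strip the BOM and transcode the remainder to UTF-8.
            <<_:BomLen/binary, Rest/binary>> = Bin,
            case unicode:characters_to_binary(Rest, Encoding, utf8) of
                Utf8 when is_binary(Utf8) ->
                    {ok, Encoding, Utf8};
                Other ->
                    {error, {bad_encoding, Encoding, Other}}
            end
    end.
```

A fuller sniffer would also consult the encoding declaration in the <?xml ... ?> prologue when no BOM is present; this sketch deliberately does not.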
bytes() = integer()
filename() = string() | binary()
options() = #{entities => #{binary() => term()}, allow_unknown_entities => boolean(), namespaces => #{xmlrat:nsname() => xmlrat:uri()}, expand_namespaces => boolean(), elide_empty_attributes => boolean(), entity_size_limit => bytes(), size_limit => bytes()}
Options which can be given as the final argument to file/2 or string/2.
entities
Provides an initial set of predefined entities to expand in the document. The basic set of XML 1.0 entities (lt, gt, amp, etc.) can also be overridden by this map.
allow_unknown_entities
Instructs the parser to leave unresolved {entity, xmlname()} tuples in attribute values or element content as they are when an entity definition cannot be found.
namespaces
Provides an initial set of predefined namespaces to expand in the document.
expand_namespaces
Allows namespace parsing and expansion to be disabled. If set to false, names and tags in the returned document are always in the single-binary or xmlnsname form (see xmlrat:xmlname()).
elide_empty_attributes
Instructs the parser to elide (remove) any attributes whose value is empty.
entity_size_limit
Sets the maximum size of entities that will be expanded. If an entity in the document exceeds this size, parsing is aborted and an error is returned.
size_limit
Sets the maximum size of XML document that file/2 or string/2 will parse. Any input binary larger than this is not parsed, and an error is returned instead.
clean_whitespace/1 | Removes whitespace from a document, simplifying it for output. |
file/1 | Parses a file as an XML document. |
file/2 | Parses a file as an XML document, with configurable options. |
string/1 | Parses a string as an XML document. |
string/2 | Parses a string as an XML document, with configurable options. |
clean_whitespace(Rest::xmlrat:document()) -> xmlrat:document()
Removes whitespace from a document, simplifying it for output.
Removes whitespace:
file(Filename::filename()) -> {ok, xmlrat:document()} | {error, term()}
Parses a file as an XML document.
file(Filename::filename(), Opts::options()) -> {ok, xmlrat:document()} | {error, term()}
Parses a file as an XML document, with configurable options.
string(StringOrBinary::string() | binary()) -> {ok, xmlrat:document()} | {error, term()}
Parses a string as an XML document.
string(String::string() | binary(), Opts::options()) -> {ok, xmlrat:document()} | {error, term()}
Parses a string as an XML document, with configurable options.
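As a usage sketch, the module below builds an options map from keys listed in options() and passes it to string/2. It assumes the module documented here is called xmlrat_parse (the name is not stated on this page) and that a binary is acceptable as an entity's replacement value (the type only says term()); the module name parse_example and the option values are arbitrary examples.

```erlang
-module(parse_example).
-export([example_opts/0, parse/1]).

%% Illustrative options map; every key appears in the options() type.
example_opts() ->
    #{size_limit => 64 * 1024,         % reject input documents over 64 KiB
      entity_size_limit => 1024,       % abort if an entity exceeds 1 KiB
      allow_unknown_entities => true,  % keep {entity, Name} tuples for unknown refs
      elide_empty_attributes => true,  % drop attributes whose value is empty
      %% Assumed: key is the bare entity name, value its replacement text.
      entities => #{<<"product">> => <<"xmlrat">>}}.

%% Parse a string or binary with the options above. Assumes the module
%% documented here is named xmlrat_parse; the result is either
%% {ok, Document} or {error, Reason}.
parse(Xml) ->
    xmlrat_parse:string(Xml, example_opts()).
```

Calling parse_example:parse(<<"<p>&product;</p>">>) would be expected to return {ok, Doc} with the entity expanded, where the shape of Doc is given by xmlrat:document().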
Generated by EDoc