xmlm
For usage examples and strategies for working with this library, check out
the programs located in the test/examples
directory.
Notes
- Don’t forget to work with the Input that is returned by any “inputting” function rather than the original.
- If something is marked as being “unspecified”, do not depend on it. It may change at any time without a major version bump. This mainly applies to the various *_to_string functions.
Types
Type for attributes.
Example
In the following XML fragment <fruit color="green">
, the attribute
color="green"
would look like this:
Attribute(name: Name(uri: "", local: "color"), value: "green")
pub type Attribute {
Attribute(name: Name, value: String)
}
Constructors
-
Attribute(name: Name, value: String)
Arguments
-
name
The
name
of the Attribute
-
value
The
value
of the Attribute
-
The type for character encodings
pub type Encoding {
Utf8
Utf16
Utf16Be
Utf16Le
Iso8859x1
Iso8859x15
UsAscii
}
Constructors
-
Utf8
-
Utf16
UTF-16 endianness is determined from the BOM.
-
Utf16Be
UTF-16 big-endian
-
Utf16Le
UTF-16 little-endian
-
Iso8859x1
-
Iso8859x15
-
UsAscii
The type of error returned by any “inputting” function.
pub opaque type InputError
Type for names of attributes and elements. An empty uri
represents a
name without a namespace, i.e., an unprefixed name that is not under the
scope of a default namespace.
pub type Name {
Name(uri: String, local: String)
}
Constructors
-
Name(uri: String, local: String)
Arguments
-
uri
The URI of the Name.
Note that this likely* will not be the literal value of the prefix string before the colon (:). E.g.,
<a xmlns:snazzy="https://www.example.com/snazzy">
  <snazzy:b />
</a>
The b tag would look something like this:
Tag(name: Name(uri: "https://www.example.com/snazzy", local: "b"), attributes: [])
Note how the uri is not "snazzy", but "https://www.example.com/snazzy".
*I say “likely”, because you could define a namespace_callback that maps prefixes to themselves rather than to a URI.
-
local
The non-prefixed (i.e., local) part of the Name.
-
The type for signals
A well-formed sequence of signals belongs to the language of the document grammar:
document := Dtd tree ;
tree := ElementStart child ElementEnd ;
child := ( Data trees ) | trees ;
trees := ( tree child ) | epsilon ;
Note the trees
production, which expresses the fact that there will never
be two consecutive Data
signals in the children of an element.
The Input
type and functions that work with it deal only with well-formed
signal sequences; otherwise, Errors
are returned.
pub type Signal {
Dtd(Option(String))
ElementStart(Tag)
ElementEnd
Data(String)
}
Constructors
-
Dtd(Option(String))
-
ElementStart(Tag)
-
ElementEnd
-
Data(String)
Functions
pub fn attribute_to_string(attribute: Attribute) -> String
Convert attribute
into an unspecified string representation.
pub fn attributes_to_string(
attributes: List(Attribute),
) -> String
Convert attributes
into an unspecified string representation.
pub fn document_tree(
input: Input,
element_callback element_callback: fn(Tag, List(a)) -> a,
data_callback data_callback: fn(String) -> a,
) -> Result(#(Option(String), a, Input), InputError)
xmlm.document_tree(input, element_callback, data_callback)
reads a
complete, well-formed sequence of signals.
See tree for getting a tree produced by a single Signal
.
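As a rough illustration of how the two callbacks fit together, the sketch below builds a small tree from a whole document. The Node type and the parse function are hypothetical names introduced only for this example; they are not part of the library. The calls to xmlm use only the signatures listed on this page.

```gleam
import xmlm

/// Hypothetical tree type for this example (not provided by xmlm).
pub type Node {
  Element(tag: xmlm.Tag, children: List(Node))
  Text(String)
}

/// Parse a complete document and return its root node.
pub fn parse(xml: String) -> Result(Node, xmlm.InputError) {
  let input = xmlm.from_string(xml)
  case
    xmlm.document_tree(
      input,
      // Called once per element, with the results for its children.
      element_callback: fn(tag, children) { Element(tag, children) },
      // Called once per Data signal.
      data_callback: fn(data) { Text(data) },
    )
  {
    // The Dtd value and the remaining Input are ignored here.
    Ok(#(_dtd, root, _input)) -> Ok(root)
    Error(e) -> Error(e)
  }
}
```

Because both callbacks return the same type, any representation works here: a custom type as above, a String, a count, and so on.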
pub fn eoi(input: Input) -> Result(#(Bool, Input), InputError)
eoi(input)
tells if the end of input is reached.
pub fn fold_signals(
over input: Input,
from acc: a,
with f: fn(a, Signal) -> a,
) -> Result(#(a, Input), InputError)
xmlm.fold_signals(over: input, from: acc, with: f)
reduces the Signals
of the input
to a single value starting with acc
by calling the given
function f
on each Signal
in the input
.
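A minimal sketch of a fold, assuming all we want is the number of elements in a document. The function name count_elements is made up for this example.

```gleam
import xmlm

/// Count the elements in a document by counting ElementStart signals.
pub fn count_elements(xml: String) -> Result(Int, xmlm.InputError) {
  let input = xmlm.from_string(xml)
  case
    xmlm.fold_signals(over: input, from: 0, with: fn(count, signal) {
      case signal {
        // Every element contributes exactly one ElementStart signal.
        xmlm.ElementStart(_) -> count + 1
        _ -> count
      }
    })
  {
    // Drop the returned Input; only the accumulated count is needed.
    Ok(#(count, _input)) -> Ok(count)
    Error(e) -> Error(e)
  }
}
```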
pub fn from_bit_array(source: BitArray) -> Input
xmlm.from_bit_array(source)
returns a new Input
abstraction from the
given source
.
pub fn from_string(source: String) -> Input
xmlm.from_string(source)
returns a new Input
abstraction from the given
source
.
pub fn input_error_to_string(input_error: InputError) -> String
Converts the input_error
into a non-specified human readable format.
pub fn input_to_string(input: Input) -> String
Convert input
into an unspecified string representation.
pub fn name_to_string(name: Name) -> String
Convert name
into an unspecified string representation.
pub fn peek(input: Input) -> Result(#(Signal, Input), InputError)
xmlm.peek(input)
is the same as xmlm.signal(input)
except that
the signal is not removed from the sequence.
pub fn signal(
input: Input,
) -> Result(#(Signal, Input), InputError)
xmlm.signal(input)
inputs a Signal
.
Repeatedly invoking the function with the same input abstraction will either generate a well-formed sequence of signals or return an error. Additionally, no two consecutive Data signals can appear in the sequence, and their strings will always be non-empty.
Note: Currently, after a well-formed sequence has been input, another sequence can be input. However, this behavior is deprecated. (It is inherited from the OCaml library on which this one is based, and will change at some point in the future.)
pub fn signal_to_string(signal: Signal) -> String
Convert signal
into an unspecified string representation.
pub fn signals(
input: Input,
) -> Result(#(List(Signal), Input), InputError)
Return a list of all Signals
in the given input
.
pub fn signals_to_string(signals: List(Signal)) -> String
Convert signals
into an unspecified string representation.
pub fn tag_to_string(tag: Tag) -> String
Convert tag
into an unspecified string representation.
pub fn tree(
input: Input,
element_callback element_callback: fn(Tag, List(a)) -> a,
data_callback data_callback: fn(String) -> a,
) -> Result(#(a, Input), InputError)
xmlm.tree(input, element_callback, data_callback)
inputs signals in
different ways depending on the next signal.
If the next signal is a…
- Data signal, tree inputs the signal and invokes data_callback with the character data of the signal.
- ElementStart signal, tree inputs the sequence of signals until its matching ElementEnd and invokes element_callback and data_callback as follows:
  - element_callback is called on each ElementEnd signal with the corresponding ElementStart tag and the result of the callback invocation for the element’s children.
  - data_callback is called on each Data signal with the character data. This function won’t be called twice consecutively or with the empty string.
- Other signals, returns an error.
See document_tree for getting the entire document as a tree.
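Since tree returns an error for signals other than Data and ElementStart, a fresh input (whose first signal is Dtd, per the grammar above) needs that signal consumed first. The following is a sketch under that assumption; text_content is a hypothetical helper that concatenates all character data below the root element.

```gleam
import gleam/string
import xmlm

/// Collect all character data below the root element into one String.
pub fn text_content(xml: String) -> Result(String, xmlm.InputError) {
  let input = xmlm.from_string(xml)
  // Consume the leading Dtd signal, then keep working with the returned Input.
  case xmlm.signal(input) {
    Error(e) -> Error(e)
    Ok(#(_dtd, input)) ->
      case
        xmlm.tree(
          input,
          // An element's text is the concatenation of its children's text.
          element_callback: fn(_tag, children) { string.concat(children) },
          data_callback: fn(data) { data },
        )
      {
        Ok(#(text, _input)) -> Ok(text)
        Error(e) -> Error(e)
      }
  }
}
```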
pub fn with_encoding(input: Input, encoding: Encoding) -> Input
xmlm.with_encoding(input, encoding)
sets the input
to use the given encoding
.
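For instance, when the source is a BitArray whose encoding is known in advance, the encoding could be set before reading any signals. This is a sketch only; signals_from_utf8 is a name invented for the example.

```gleam
import xmlm

/// Read every signal from a source that is known to be UTF-8 encoded.
pub fn signals_from_utf8(
  source: BitArray,
) -> Result(List(xmlm.Signal), xmlm.InputError) {
  let input =
    xmlm.from_bit_array(source)
    // Set the encoding explicitly for this input.
    |> xmlm.with_encoding(xmlm.Utf8)
  case xmlm.signals(input) {
    Ok(#(signals, _input)) -> Ok(signals)
    Error(e) -> Error(e)
  }
}
```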
pub fn with_entity_callback(
input: Input,
entity_callback: fn(String) -> Option(String),
) -> Input
xmlm.with_entity_callback(input, entity_callback)
sets the input
to
use the given entity_callback
to resolve non-predefined entity references.
Example
Imagine an XML document that looks something like this:
<p> &apple; &pie; </p>
It has non-predefined entity references, so parsing it will give an error. To address this, we could use an entity callback function to resolve these references.
xmlm.from_string(xml_data)
|> xmlm.with_entity_callback(fn(entity_reference) {
  case entity_reference {
    "apple" -> Some("APPLE!")
    "pie" -> Some("PIE!")
    _ -> None
  }
})
With that entity callback, the parsed Data
signal would look something
like this:
Data("APPLE! PIE!")
pub fn with_namespace_callback(
input: Input,
namespace_callback: fn(String) -> Option(String),
) -> Input
xmlm.with_namespace_callback(input, namespace_callback)
sets the input
to use the given namespace_callback
to bind undeclared namespace prefixes.
Example
Imagine an XML document like this one, which declares a namespace:
<a xmlns:snazzy="https://www.example.com/snazzy">
<snazzy:b />
</a>
This will parse Ok because the namespace is properly declared.
However, the following XML document would give an error, telling you about
the unknown namespace prefix snazzy
.
<a>
<snazzy:b />
</a>
To address this, you may provide a function to bind undeclared namespace prefixes.
xmlm.from_string(xml_data)
|> xmlm.with_namespace_callback(fn(prefix) {
  case prefix {
    "snazzy" -> Some("https://www.example.com/snazzy")
    _ -> None
  }
})
In this way, the snazzy
prefix will be bound and no error will occur.
pub fn with_stripping(input: Input, stripping: Bool) -> Input
xmlm.with_stripping(input, stripping)
sets the input
to use the given
stripping
.