xmlm
For usage examples and strategies for working with this library, check out
the programs located in the test/examples directory.
Notes
- Don’t forget to work with the Input that is returned by any “inputting” function rather than the original.
- If something is marked as being “unspecified”, do not depend on it. It may change at any time without a major version bump. This mainly applies to the various *_to_string functions.
Types
Type for attributes.
Example
In the following XML fragment <fruit color="green">, the attribute
color="green" would look like this:
Attribute(name: Name(uri: "", local: "color"), value: "green")
pub type Attribute {
Attribute(name: Name, value: String)
}
Constructors
- Attribute(name: Name, value: String)
  Arguments
  - name: The name of the Attribute
  - value: The value of the Attribute
The type for character encodings
pub type Encoding {
Utf8
Utf16
Utf16Be
Utf16Le
Iso8859x1
Iso8859x15
UsAscii
}
Constructors
- Utf8
- Utf16: UTF-16 endianness is determined from the BOM.
- Utf16Be: UTF-16 big-endian
- Utf16Le: UTF-16 little-endian
- Iso8859x1
- Iso8859x15
- UsAscii
The type of error returned by any “inputting” function.
pub opaque type InputError
Type for names of attributes and elements. An empty uri represents a
name without a namespace, i.e., an unprefixed name that is not under the
scope of a default namespace.
pub type Name {
Name(uri: String, local: String)
}
Constructors
- Name(uri: String, local: String)
  Arguments
  - uri: The URI of the Name.
    Note that this likely* will not be the literal value of the prefix string before the :. E.g.,
    <a xmlns:snazzy="https://www.example.com/snazzy">
      <snazzy:b />
    </a>
    The b tag would look something like this:
    Tag(
      name: Name(uri: "https://www.example.com/snazzy", local: "b"),
      attributes: []
    )
    Note how the uri is not "snazzy", but "https://www.example.com/snazzy".
    *I say “likely” because you could define a namespace_callback that maps prefixes to themselves rather than to a URI.
  - local: The non-prefixed (i.e., local) part of the Name.
The type for signals
A well-formed sequence of signals belongs to the language of the document grammar:
document := Dtd tree ;
tree := ElementStart child ElementEnd ;
child := ( Data trees ) | trees ;
trees := ( tree child ) | epsilon ;
Note the trees production, which expresses the fact that there will never
be two consecutive Data signals in the children of an element.
The Input type and functions that work with it deal only with well-formed
signal sequences; otherwise, errors are returned.
pub type Signal {
Dtd(option.Option(String))
ElementStart(Tag)
ElementEnd
Data(String)
}
Constructors
- Dtd(option.Option(String))
- ElementStart(Tag)
- ElementEnd
- Data(String)
Values
pub fn attribute_to_string(attribute: Attribute) -> String
Convert attribute into an unspecified string representation.
pub fn attributes_to_string(
attributes: List(Attribute),
) -> String
Convert attributes into an unspecified string representation.
pub fn document_tree(
input: Input,
element_callback element_callback: fn(Tag, List(a)) -> a,
data_callback data_callback: fn(String) -> a,
) -> Result(#(option.Option(String), a, Input), InputError)
xmlm.document_tree(input, element_callback, data_callback) reads a
complete, well-formed sequence of signals.
See tree for getting a tree produced by a single Signal.
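As a rough illustration of how the two callbacks fit together, the sketch below builds a small tree from a whole document. The `Node` type and the `parse` function are hypothetical names made up for this example and are not part of xmlm; only `from_string` and `document_tree` come from the library.

```gleam
import gleam/option.{type Option}
import xmlm.{type InputError, type Tag}

// Hypothetical tree type for this example; it is not provided by xmlm.
pub type Node {
  Element(name: String, children: List(Node))
  Text(String)
}

// Parse a whole document into the DTD (if any) and a Node tree.
pub fn parse(xml: String) -> Result(#(Option(String), Node), InputError) {
  let input = xmlm.from_string(xml)
  case
    xmlm.document_tree(
      input,
      // Called once per element, with the results for its children.
      element_callback: fn(tag: Tag, children: List(Node)) {
        Element(tag.name.local, children)
      },
      // Called once per Data signal.
      data_callback: Text,
    )
  {
    // The returned Input is dropped here because nothing else is read.
    Ok(#(dtd, tree, _input)) -> Ok(#(dtd, tree))
    Error(e) -> Error(e)
  }
}
```

The example ignores namespaces by keeping only the local part of each element name; a real program may want to keep the whole `Name`.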
pub fn eoi(input: Input) -> Result(#(Bool, Input), InputError)
eoi(input) tells if the end of input is reached.
pub fn fold_signals(
over input: Input,
from acc: acc,
with f: fn(acc, Signal) -> acc,
) -> Result(#(acc, Input), InputError)
xmlm.fold_signals(over: input, from: acc, with: f) reduces the Signals
of the input to a single value starting with acc by calling the given
function f on each Signal in the input.
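For instance, a fold can count the elements in a document by adding one for every ElementStart signal. This is only a sketch; `count_elements` is a hypothetical helper name, not something xmlm provides.

```gleam
import xmlm.{type InputError, ElementStart}

// Count the elements in a document by folding over its signals.
pub fn count_elements(xml: String) -> Result(Int, InputError) {
  let result =
    xmlm.from_string(xml)
    |> xmlm.fold_signals(from: 0, with: fn(count, signal) {
      case signal {
        // Every element contributes exactly one ElementStart.
        ElementStart(_) -> count + 1
        _ -> count
      }
    })
  case result {
    Ok(#(count, _input)) -> Ok(count)
    Error(e) -> Error(e)
  }
}
```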
pub fn from_bit_array(source: BitArray) -> Input
xmlm.from_bit_array(source) returns a new Input abstraction from the
given source.
pub fn from_string(source: String) -> Input
xmlm.from_string(source) returns a new Input abstraction from the given
source.
pub fn input_error_to_string(input_error: InputError) -> String
Converts the input_error into an unspecified human-readable format.
pub fn input_to_string(input: Input) -> String
Convert input into an unspecified string representation.
pub fn name_to_string(name: Name) -> String
Convert name into an unspecified string representation.
pub fn peek(input: Input) -> Result(#(Signal, Input), InputError)
xmlm.peek(input) is the same as xmlm.signal(input) except that
the signal is not removed from the sequence.
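A small sketch of the difference: peeking and then reading should hand back the same signal, because peek leaves it in the sequence. The function name `peek_then_read` is an assumption for this example only.

```gleam
import gleam/result
import xmlm.{type InputError, type Signal}

// Peek at the next signal, then actually read it.
pub fn peek_then_read(xml: String) -> Result(#(Signal, Signal), InputError) {
  let input = xmlm.from_string(xml)
  // Keep using the Input returned by each call, not the original.
  use #(peeked, input) <- result.try(xmlm.peek(input))
  use #(read, _input) <- result.try(xmlm.signal(input))
  Ok(#(peeked, read))
}
```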
pub fn signal(
input: Input,
) -> Result(#(Signal, Input), InputError)
xmlm.signal(input) inputs a Signal.
Repeatedly invoking the function with the same input abstraction will either generate a well-formed sequence of signals or return an error. Additionally, no two consecutive Data signals can appear in the sequence, and their strings will always be non-empty.
Note: Currently, after a well-formed sequence has been input, another sequence can be input. However, this behavior is deprecated. (It is inherited from the OCaml library on which this one is based, and will change at some point in the future.)
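To show how the returned Input is threaded from one call to the next, here is a minimal sketch that reads the first two signals and reports the root element's local name. It relies on the document grammar above, under which the first signal is a Dtd and the second an ElementStart. `root_name` is a hypothetical helper, not part of xmlm.

```gleam
import gleam/option.{type Option, None, Some}
import gleam/result
import xmlm.{type InputError, ElementStart}

// Read two signals and return the local name of the root element.
pub fn root_name(xml: String) -> Result(Option(String), InputError) {
  let input = xmlm.from_string(xml)
  // First signal: the Dtd. Rebind input to the value that was returned.
  use #(_dtd, input) <- result.try(xmlm.signal(input))
  // Second signal: the start of the root element.
  use #(second, _input) <- result.try(xmlm.signal(input))
  case second {
    ElementStart(tag) -> Ok(Some(tag.name.local))
    _ -> Ok(None)
  }
}
```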
pub fn signal_to_string(signal: Signal) -> String
Convert signal into an unspecified string representation.
pub fn signals(
input: Input,
) -> Result(#(List(Signal), Input), InputError)
Return a list of all Signals in the given input.
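When the whole document fits comfortably in memory, the list of signals can be processed with ordinary list functions. The sketch below collects the character data of every Data signal; `text_chunks` is a name invented for this example.

```gleam
import gleam/list
import gleam/result
import xmlm.{type InputError, Data}

// Collect the character data of every Data signal, in document order.
pub fn text_chunks(xml: String) -> Result(List(String), InputError) {
  use #(signals, _input) <- result.try(xmlm.signals(xmlm.from_string(xml)))
  Ok(
    list.filter_map(signals, fn(signal) {
      case signal {
        Data(text) -> Ok(text)
        // Skip Dtd, ElementStart, and ElementEnd signals.
        _ -> Error(Nil)
      }
    }),
  )
}
```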
pub fn signals_to_string(signals: List(Signal)) -> String
Convert signals into an unspecified string representation.
pub fn tag_to_string(tag: Tag) -> String
Convert tag into an unspecified string representation.
pub fn tree(
input: Input,
element_callback element_callback: fn(Tag, List(a)) -> a,
data_callback data_callback: fn(String) -> a,
) -> Result(#(a, Input), InputError)
xmlm.tree(input, element_callback, data_callback) inputs signals in
different ways depending on the next signal.
If the next signal is a…
- Data signal, tree inputs the signal and invokes data_callback with the character data of the signal.
- ElementStart signal, tree inputs the sequence of signals until its matching ElementEnd and invokes element_callback and data_callback as follows:
  - element_callback is called on each ElementEnd signal with the corresponding ElementStart tag and the result of the callback invocation for the element’s children.
  - data_callback is called on each Data signal with the character data. This function won’t be called twice consecutively or with the empty string.
- Other signals, returns an error.
See document_tree for getting the entire document as a tree.
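Because a Dtd signal falls under "other signals", a fresh Input presumably needs its leading Dtd consumed before tree is called. The sketch below does that with signal and then folds the root element into its concatenated text. `text_content` is a hypothetical helper, and the example assumes children are passed to element_callback in document order.

```gleam
import gleam/result
import gleam/string
import xmlm.{type InputError, type Tag}

// Concatenate all character data under the root element.
pub fn text_content(xml: String) -> Result(String, InputError) {
  let input = xmlm.from_string(xml)
  // Consume the leading Dtd signal so the next signal is an ElementStart.
  use #(_dtd, input) <- result.try(xmlm.signal(input))
  use #(text, _input) <- result.try(xmlm.tree(
    input,
    // Join the results already computed for the element's children.
    element_callback: fn(_tag: Tag, children: List(String)) {
      string.concat(children)
    },
    // Character data is passed through unchanged.
    data_callback: fn(data) { data },
  ))
  Ok(text)
}
```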
pub fn with_encoding(input: Input, encoding: Encoding) -> Input
xmlm.with_encoding(input, encoding) sets the input to use the given encoding.
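A minimal sketch of forcing an encoding on raw bytes follows. `utf8_input` is a hypothetical wrapper name; the calls it makes are `from_bit_array` and `with_encoding` with the `Utf8` constructor.

```gleam
import xmlm.{type Input, Utf8}

// Build an Input from raw bytes and state explicitly that they are UTF-8.
pub fn utf8_input(bytes: BitArray) -> Input {
  xmlm.from_bit_array(bytes)
  |> xmlm.with_encoding(Utf8)
}
```

The returned Input can then be passed to any of the inputting functions, such as signals or document_tree.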
pub fn with_entity_callback(
input: Input,
entity_callback: fn(String) -> option.Option(String),
) -> Input
xmlm.with_entity_callback(input, entity_callback) sets the input to
use the given entity_callback to resolve non-predefined entity references.
Example
Imagine an XML document that looks something like this:
<p> &apple; &pie; </p>
It has non-predefined entity references, so parsing it will give an error. To address this, we could use an entity callback function to resolve these references.
xmlm.from_string(xml_data)
|> xmlm.with_entity_callback(fn(entity_reference) {
  case entity_reference {
    "apple" -> Some("APPLE!")
    "pie" -> Some("PIE!")
    _ -> None
  }
})
With that entity callback, the parsed Data signal would look something
like this:
Data("APPLE! PIE!")
pub fn with_namespace_callback(
input: Input,
namespace_callback: fn(String) -> option.Option(String),
) -> Input
xmlm.with_namespace_callback(input, namespace_callback) sets the input
to use the given namespace_callback to bind undeclared namespace prefixes.
Example
Imagine an XML document something like this, that specifies a namespace.
<a xmlns:snazzy="https://www.example.com/snazzy">
<snazzy:b />
</a>
This will parse Ok because the namespace is properly declared.
However, the following XML document would give an error, telling you about
the unknown namespace prefix snazzy.
<a>
<snazzy:b />
</a>
To address this, you may provide a function to bind undeclared namespace prefixes.
xmlm.from_string(xml_data)
|> xmlm.with_namespace_callback(fn(prefix) {
  case prefix {
    "snazzy" -> Some("https://www.example.com/snazzy")
    _ -> None
  }
})
In this way, the snazzy prefix will be bound and no error will occur.