xmlm

For usage examples and strategies for working with this library, check out the programs located in the test/examples directory.

Notes

Types

The type for attributes.

Example

In the following XML fragment, <fruit color="green">, the attribute color="green" would look like this:

Attribute(name: Name(uri: "", local: "color"), value: "green")

pub type Attribute {
  Attribute(name: Name, value: String)
}

Constructors

  • Attribute(name: Name, value: String)

    Arguments

    • name

      The name of the Attribute

    • value

      The value of the Attribute

The type for character encodings.

pub type Encoding {
  Utf8
  Utf16
  Utf16Be
  Utf16Le
  Iso8859x1
  Iso8859x15
  UsAscii
}

Constructors

  • Utf8
  • Utf16

    UTF-16 endianness is determined from the BOM.

  • Utf16Be

    UTF-16 big-endian

  • Utf16Le

    UTF-16 little-endian

  • Iso8859x1
  • Iso8859x15
  • UsAscii

The type for input abstractions.

pub opaque type Input

The type of error returned by any of the input functions.

pub opaque type InputError

The type for names of attributes and elements. An empty uri represents a name without a namespace, i.e., an unprefixed name that is not under the scope of a default namespace.

pub type Name {
  Name(uri: String, local: String)
}

Constructors

  • Name(uri: String, local: String)

    Arguments

    • uri

      The URI of the Name.

      Note that this will likely* not be the literal prefix string that appears before the colon. For example,

      <a xmlns:snazzy="https://www.example.com/snazzy">
        <snazzy:b />
      </a>
      

      The b tag would look something like this:

      Tag(
        name: Name(uri: "https://www.example.com/snazzy", local: "b"), 
        attributes: []
      )
      

      Note how the uri is not "snazzy", but "https://www.example.com/snazzy".

      *I say “likely” because you could define a namespace_callback that maps a prefix to itself rather than to a URI.

    • local

      The non-prefixed (i.e., local) part of the Name.

The type for signals.

A well-formed sequence of signals belongs to the language of the document grammar:

document := Dtd tree ;
tree     := ElementStart child ElementEnd ;
child    := ( Data trees ) | trees ;
trees    := ( tree child ) | epsilon ;

Note the trees production, which expresses the fact that there will never be two consecutive Data signals in the children of an element.

The Input type and the functions that work with it deal only with well-formed signal sequences; otherwise, errors are returned.

pub type Signal {
  Dtd(Option(String))
  ElementStart(Tag)
  ElementEnd
  Data(String)
}

Constructors

  • Dtd(Option(String))
  • ElementStart(Tag)
  • ElementEnd
  • Data(String)
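
As a sketch of how these constructors are typically consumed, the helper below pattern matches on every Signal variant. The name describe is hypothetical (it is not part of xmlm), and the sketch assumes xmlm and gleam_stdlib are dependencies of your project. Because the case expression is exhaustive, the compiler will report any variant that is left out.

```gleam
import gleam/option.{None, Some}
import xmlm.{type Signal, Data, Dtd, ElementEnd, ElementStart}

/// Hypothetical helper (not part of xmlm): render one signal as a short label.
pub fn describe(signal: Signal) -> String {
  case signal {
    // Dtd carries the optional DOCTYPE string of the document.
    Dtd(None) -> "no DTD"
    Dtd(Some(dtd)) -> "DTD: " <> dtd
    // Only the local part of the element name is shown here.
    ElementStart(tag) -> "start " <> tag.name.local
    ElementEnd -> "end"
    Data(text) -> "data: " <> text
  }
}
```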

The type for an element tag.

pub type Tag {
  Tag(name: Name, attributes: List(Attribute))
}

Constructors

  • Tag(name: Name, attributes: List(Attribute))

    Arguments

    • name

      Name of the tag

    • attributes

      Attribute list of the tag
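
Since attributes are a plain List(Attribute), looking one up is an ordinary list search. The attribute_value function below is a hypothetical helper (xmlm does not provide it); it is a minimal sketch that only matches attributes without a namespace (empty uri), which is what unprefixed attributes such as color="green" have.

```gleam
import gleam/list
import gleam/result
import xmlm.{type Attribute, type Tag}

/// Hypothetical helper (not part of xmlm): return the value of the
/// un-namespaced attribute whose local name is `local`, if the tag has one.
pub fn attribute_value(tag: Tag, local: String) -> Result(String, Nil) {
  tag.attributes
  |> list.find(fn(attribute: Attribute) {
    attribute.name.uri == "" && attribute.name.local == local
  })
  |> result.map(fn(attribute: Attribute) { attribute.value })
}
```

For <fruit color="green">, attribute_value(tag, "color") would give Ok("green"), and a missing attribute gives Error(Nil).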

Functions

pub fn attribute_to_string(attribute: Attribute) -> String

Convert attribute into an unspecified string representation.

pub fn attributes_to_string(
  attributes: List(Attribute),
) -> String

Convert attributes into an unspecified string representation.

pub fn document_tree(
  input: Input,
  element_callback element_callback: fn(Tag, List(a)) -> a,
  data_callback data_callback: fn(String) -> a,
) -> Result(#(Option(String), a, Input), InputError)

xmlm.document_tree(input, element_callback, data_callback) reads a complete, well-formed sequence of signals.

See tree for reading only the tree (or character data) that starts at the next signal.
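
The callbacks decide what the tree is made of. As a sketch, the example below builds a small custom tree; the Node type and the parse function are hypothetical names chosen for this illustration, not part of xmlm. element_callback runs when an element ends and receives the already-converted children, so the tree is assembled bottom-up.

```gleam
import gleam/option.{type Option}
import gleam/result
import xmlm.{type InputError, type Tag}

/// Hypothetical tree type for this example; xmlm does not define it.
pub type Node {
  Element(name: String, children: List(Node))
  Text(String)
}

/// Parse a whole document into a Node tree, keeping the optional DTD.
pub fn parse(xml: String) -> Result(#(Option(String), Node), InputError) {
  xmlm.from_string(xml)
  |> xmlm.document_tree(
    // Called at each ElementEnd with the element's tag and converted children.
    element_callback: fn(tag: Tag, children: List(Node)) {
      Element(tag.name.local, children)
    },
    // Called for each run of character data.
    data_callback: fn(text: String) { Text(text) },
  )
  // Drop the remaining Input; keep the DTD and the root node.
  |> result.map(fn(parsed) {
    let #(dtd, root, _rest) = parsed
    #(dtd, root)
  })
}
```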

pub fn eoi(input: Input) -> Result(#(Bool, Input), InputError)

xmlm.eoi(input) tells whether the end of input has been reached.

pub fn fold_signals(
  over input: Input,
  from acc: a,
  with f: fn(a, Signal) -> a,
) -> Result(#(a, Input), InputError)

xmlm.fold_signals(over: input, from: acc, with: f) reduces the Signals of the input to a single value starting with acc by calling the given function f on each Signal in the input.
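
For example, counting the elements of a document only needs to look at ElementStart signals, since each element produces exactly one. The count_elements name is a hypothetical helper for this sketch.

```gleam
import gleam/result
import xmlm.{type InputError, ElementStart}

/// Hypothetical helper: count the elements in a document.
pub fn count_elements(xml: String) -> Result(Int, InputError) {
  let counted =
    xmlm.fold_signals(over: xmlm.from_string(xml), from: 0, with: fn(count, signal) {
      case signal {
        // Each element contributes exactly one ElementStart signal.
        ElementStart(_) -> count + 1
        _ -> count
      }
    })
  // fold_signals also returns the remaining Input, which is not needed here.
  use pair <- result.map(counted)
  let #(count, _rest) = pair
  count
}
```

With this helper, count_elements("<a><b/><c>x</c></a>") should be Ok(3).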

pub fn from_bit_array(source: BitArray) -> Input

xmlm.from_bit_array(source) returns a new Input abstraction from the given source.

pub fn from_string(source: String) -> Input

xmlm.from_string(source) returns a new Input abstraction from the given source.

pub fn input_error_to_string(input_error: InputError) -> String

Convert input_error into an unspecified, human-readable string representation.
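
Because the format is unspecified, the string is best used for messages shown to people rather than matched on in code. A minimal sketch, where check is a hypothetical helper:

```gleam
import xmlm

/// Hypothetical helper: parse `xml` and turn any failure into a message.
pub fn check(xml: String) -> Result(Nil, String) {
  case xmlm.signals(xmlm.from_string(xml)) {
    Ok(_) -> Ok(Nil)
    // The exact wording is unspecified, so only the prefix is ours.
    Error(error) -> Error("invalid XML: " <> xmlm.input_error_to_string(error))
  }
}
```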

pub fn input_to_string(input: Input) -> String

Convert input into an unspecified string representation.

pub fn name_to_string(name: Name) -> String

Convert name into an unspecified string representation.

pub fn peek(input: Input) -> Result(#(Signal, Input), InputError)

xmlm.peek(input) is the same as xmlm.signal(input) except that the signal is not removed from the sequence.
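
Since the signal is left in place, calling xmlm.signal on the Input returned by peek yields the same signal again. The sketch below (peek_then_read is a hypothetical name) shows that; note that the Input returned by each call is threaded into the next one.

```gleam
import xmlm.{type Signal}

/// Hypothetical helper: peek at the next signal, then read it for real.
pub fn peek_then_read(xml: String) -> #(Signal, Signal) {
  let input = xmlm.from_string(xml)
  // peek returns the next signal plus an Input that still contains it.
  let assert Ok(#(peeked, input)) = xmlm.peek(input)
  // signal consumes it, so both values are expected to be equal.
  let assert Ok(#(read, _rest)) = xmlm.signal(input)
  #(peeked, read)
}
```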

pub fn signal(
  input: Input,
) -> Result(#(Signal, Input), InputError)

xmlm.signal(input) inputs a Signal.

Repeatedly invoking the function with the same input abstraction will either generate a well-formed sequence of signals or return an error. Additionally, no two consecutive Data signals can appear in the sequence, and their strings will always be non-empty.

Note: Currently, after a well-formed sequence has been input, another sequence can be input. However, this behavior is deprecated. (It is inherited from the OCaml library on which this one is based, and will change at some point in the future.)
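
To see how signal and eoi fit together, here is a sketch of a manual reading loop: check for the end of input, otherwise read one signal and recurse with the returned Input. The read_all and do_read_all names are hypothetical; for a single document this should produce the same list as xmlm.signals.

```gleam
import gleam/list
import gleam/result
import xmlm.{type Input, type InputError, type Signal}

/// Hypothetical helper: read signals one at a time until the end of input.
pub fn read_all(input: Input) -> Result(List(Signal), InputError) {
  do_read_all(input, [])
}

fn do_read_all(
  input: Input,
  acc: List(Signal),
) -> Result(List(Signal), InputError) {
  use pair <- result.try(xmlm.eoi(input))
  let #(at_end, input) = pair
  case at_end {
    True -> Ok(list.reverse(acc))
    False -> {
      // Each call returns a new Input, which is passed to the next call.
      use pair <- result.try(xmlm.signal(input))
      let #(signal, input) = pair
      do_read_all(input, [signal, ..acc])
    }
  }
}
```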

pub fn signal_to_string(signal: Signal) -> String

Convert signal into an unspecified string representation.

pub fn signals(
  input: Input,
) -> Result(#(List(Signal), Input), InputError)

Return a list of all Signals in the given input.
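
Following the document grammar, a document without a DOCTYPE should start with Dtd(None), followed by the element's signals. The sketch below pairs a small input with the signal list the grammar predicts for it; greeting_signals and expected_signals are hypothetical names for this illustration.

```gleam
import gleam/option.{None}
import xmlm.{
  type Input, type InputError, type Signal, Attribute, Data, Dtd, ElementEnd,
  ElementStart, Name, Tag,
}

/// Hypothetical helper: all signals of a one-element document.
pub fn greeting_signals() -> Result(#(List(Signal), Input), InputError) {
  xmlm.from_string("<greeting lang=\"en\">hi</greeting>")
  |> xmlm.signals
}

/// The signals the document grammar predicts for that input.
pub fn expected_signals() -> List(Signal) {
  let lang = Attribute(name: Name(uri: "", local: "lang"), value: "en")
  [
    Dtd(None),
    ElementStart(Tag(name: Name(uri: "", local: "greeting"), attributes: [lang])),
    Data("hi"),
    ElementEnd,
  ]
}
```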

pub fn signals_to_string(signals: List(Signal)) -> String

Convert signals into an unspecified string representation.

pub fn tag_to_string(tag: Tag) -> String

Convert tag into an unspecified string representation.

pub fn tree(
  input: Input,
  element_callback element_callback: fn(Tag, List(a)) -> a,
  data_callback data_callback: fn(String) -> a,
) -> Result(#(a, Input), InputError)

xmlm.tree(input, element_callback, data_callback) inputs signals in different ways depending on the next signal.

If the next signal is a…

  • Data signal, tree inputs the signal and invokes data_callback with the character data of the signal.
  • ElementStart signal, tree inputs the sequence of signals until its matching ElementEnd and invokes element_callback and data_callback as follows:
    • element_callback is called on each ElementEnd signal with the corresponding ElementStart tag and the result of the callback invocation for the element’s children.
    • data_callback is called on each Data signal with the character data.
      This function won’t be called twice consecutively or with the empty string.
  • Any other signal, tree returns an error.

See document_tree for getting the entire document as a tree.
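
Because every document begins with a Dtd signal, calling tree on a fresh Input would hit the "other signals" case. A minimal sketch (root_text is a hypothetical name) that consumes the Dtd first and then concatenates all character data under the root element:

```gleam
import gleam/result
import gleam/string
import xmlm.{type InputError}

/// Hypothetical helper: all character data under the root, concatenated.
pub fn root_text(xml: String) -> Result(String, InputError) {
  let input = xmlm.from_string(xml)
  // tree only accepts a Data or ElementStart signal next, so the
  // leading Dtd signal is consumed first with xmlm.signal.
  use pair <- result.try(xmlm.signal(input))
  let #(_dtd, input) = pair
  use pair <- result.try(xmlm.tree(
    input,
    element_callback: fn(_tag, children) { string.concat(children) },
    data_callback: fn(data) { data },
  ))
  let #(text, _rest) = pair
  Ok(text)
}
```

With this helper, root_text("<a>hi<b>yo</b></a>") should be Ok("hiyo").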

pub fn with_encoding(input: Input, encoding: Encoding) -> Input

xmlm.with_encoding(input, encoding) sets the input to use the given encoding.
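
Setting the encoding matters for byte input that cannot be detected, for example Latin-1 bytes with no BOM or XML declaration. The byte 0xE9 is "é" in ISO 8859-1 but is not valid UTF-8 on its own. This sketch assumes the explicitly set encoding is used when the document carries no BOM or declaration; latin1_text is a hypothetical name.

```gleam
import gleam/list
import gleam/result
import gleam/string
import xmlm.{type InputError, Data, Iso8859x1}

/// Hypothetical helper: decode a Latin-1 document and return its text.
pub fn latin1_text() -> Result(String, InputError) {
  let input =
    <<"<a>caf":utf8, 0xE9, "</a>":utf8>>
    |> xmlm.from_bit_array
    |> xmlm.with_encoding(Iso8859x1)
  use pair <- result.try(xmlm.signals(input))
  let #(signals, _rest) = pair
  // Keep only character data; the Dtd and element signals are dropped.
  signals
  |> list.filter_map(fn(signal) {
    case signal {
      Data(text) -> Ok(text)
      _ -> Error(Nil)
    }
  })
  |> string.concat
  |> Ok
}
```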

pub fn with_entity_callback(
  input: Input,
  entity_callback: fn(String) -> Option(String),
) -> Input

xmlm.with_entity_callback(input, entity_callback) sets the input to use the given entity_callback to resolve non-predefined entity references.

Example

Imagine an XML document that looks something like this:

<p> &apple; &pie; </p>

It has non-predefined entity references, so parsing it returns an error. To address this, we can use an entity callback function to resolve these references.

xmlm.from_string(xml_data)
|> xmlm.with_entity_callback(fn(entity_reference) {
  case entity_reference {
    "apple" -> Some("APPLE!")
    "pie" -> Some("PIE!")
    _ -> None
  }
})

With that entity callback, the parsed Data signal would look something like this:

Data("APPLE! PIE!")
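
The fragment above can be turned into a complete, runnable sketch. Here xml_data becomes a function argument, and resolve_entity and expanded_text are hypothetical names. The text is trimmed because, without stripping, the spaces around the references are kept in the Data signal.

```gleam
import gleam/list
import gleam/option.{type Option, None, Some}
import gleam/result
import gleam/string
import xmlm.{type InputError, Data}

/// Resolve the two custom entities from the example above.
fn resolve_entity(entity_reference: String) -> Option(String) {
  case entity_reference {
    "apple" -> Some("APPLE!")
    "pie" -> Some("PIE!")
    // Returning None leaves the reference unresolved, which is an error.
    _ -> None
  }
}

/// Hypothetical helper: the document's character data, trimmed.
pub fn expanded_text(xml: String) -> Result(String, InputError) {
  let input =
    xmlm.from_string(xml)
    |> xmlm.with_entity_callback(resolve_entity)
  use pair <- result.try(xmlm.signals(input))
  let #(signals, _rest) = pair
  signals
  |> list.filter_map(fn(signal) {
    case signal {
      Data(text) -> Ok(text)
      _ -> Error(Nil)
    }
  })
  |> string.concat
  |> string.trim
  |> Ok
}
```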

pub fn with_namespace_callback(
  input: Input,
  namespace_callback: fn(String) -> Option(String),
) -> Input

xmlm.with_namespace_callback(input, namespace_callback) sets the input to use the given namespace_callback to bind undeclared namespace prefixes.

Example

Imagine an XML document like this one, which declares a namespace:

<a xmlns:snazzy="https://www.example.com/snazzy">
  <snazzy:b />
</a>

This will parse Ok because the namespace is properly declared.

However, the following XML document would give an error, telling you about the unknown namespace prefix snazzy.

<a>
  <snazzy:b />
</a>

To address this, you may provide a function to bind undeclared namespace prefixes.

xmlm.from_string(xml_data)
|> xmlm.with_namespace_callback(fn(prefix) {
  case prefix {
    "snazzy" -> Some("https://www.example.com/snazzy")
    _ -> None
  }
})

In this way, the snazzy prefix will be bound and no error will occur.
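
As a complete sketch, the function below applies the same callback and collects the (uri, local) pair of every element name, which shows that snazzy:b ends up with the full namespace URI. bind_prefix and element_names are hypothetical names for this illustration.

```gleam
import gleam/list
import gleam/option.{type Option, None, Some}
import gleam/result
import xmlm.{type InputError, ElementStart}

/// Bind the undeclared snazzy prefix; other prefixes stay unbound.
fn bind_prefix(prefix: String) -> Option(String) {
  case prefix {
    "snazzy" -> Some("https://www.example.com/snazzy")
    _ -> None
  }
}

/// Hypothetical helper: the (uri, local) name of every element, in order.
pub fn element_names(
  xml: String,
) -> Result(List(#(String, String)), InputError) {
  let input =
    xmlm.from_string(xml)
    |> xmlm.with_namespace_callback(bind_prefix)
  use pair <- result.try(xmlm.signals(input))
  let #(signals, _rest) = pair
  signals
  |> list.filter_map(fn(signal) {
    case signal {
      ElementStart(tag) -> Ok(#(tag.name.uri, tag.name.local))
      _ -> Error(Nil)
    }
  })
  |> Ok
}
```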

pub fn with_stripping(input: Input, stripping: Bool) -> Input

xmlm.with_stripping(input, stripping) sets whether the input strips whitespace from character data.
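
A sketch comparing both settings, assuming the behavior matches the strip option of the OCaml Xmlm library this port is based on: consecutive whitespace in Data is collapsed to a single space, and leading and trailing whitespace is removed. The data_text name is hypothetical.

```gleam
import gleam/list
import gleam/result
import gleam/string
import xmlm.{type InputError, Data}

/// Hypothetical helper: the document's character data with or without stripping.
pub fn data_text(xml: String, stripping: Bool) -> Result(String, InputError) {
  let input =
    xmlm.from_string(xml)
    |> xmlm.with_stripping(stripping)
  use pair <- result.try(xmlm.signals(input))
  let #(signals, _rest) = pair
  signals
  |> list.filter_map(fn(signal) {
    case signal {
      Data(text) -> Ok(text)
      _ -> Error(Nil)
    }
  })
  |> string.concat
  |> Ok
}
```

Under that assumption, data_text("<p>  hello \n  world  </p>", True) gives Ok("hello world"), while passing False keeps the whitespace as written.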
