Antikythera.Xml (antikythera v0.5.1)

Convenient XML parser module wrapping fast_xml.

decode/2 parses XML into Antikythera.Xml.Element.t, and encode/2 serializes Antikythera.Xml.Element.t back into an XML string.

Antikythera.Xml.Element.t is the XML element data structure, and it is a JSON-convertible struct. You can safely convert elements to JSON using Poison.encode/2 while keeping the order of appearance of children, and convert them back to Antikythera.Xml.Element.t with Poison.decode/2 and Antikythera.Xml.Element.new/1.
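
The round trip described above might look like the following minimal sketch. It assumes the antikythera and poison packages are available, and that Antikythera.Xml.Element.new/1 returns an {:ok, element} tuple; that return shape is an assumption, not something stated on this page.

```elixir
# Sketch of a JSON round trip; requires the antikythera and poison packages.
element = Antikythera.Xml.decode!("<a>foo<b>bar</b>baz</a>")

# Struct -> JSON; children are a list, so their order of appearance is kept.
{:ok, json} = Poison.encode(element)

# JSON -> plain map with string keys (not yet a struct).
{:ok, map} = Poison.decode(json)

# Assumption: new/1 returns {:ok, element} and rebuilds nested children as well.
{:ok, restored} = Antikythera.Xml.Element.new(map)

# Compare the rebuilt struct with the original one.
round_trip_ok? = restored == element
```

Since attributes are stored in a map, comparing the two structs with == does not depend on attribute order.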

Note that the order of attributes is not preserved, since it is not significant in XML (the XML 1.0 specification notes this in its section on start-tags).

Namespace prefixes of tags (e.g. "ns" in <ns:tag>) are kept as is in the :name of elements.

Namespace definitions (e.g. xmlns:ns='http://example.com/ns') are treated as plain attributes, and kept as is in :attributes of elements.
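
As a small illustration of the two namespace rules above, the sketch below decodes a namespaced document and reads the struct fields directly. The document content and the namespace URI are made up for the example.

```elixir
# Hypothetical input: one namespaced root element with one namespaced child.
xml = "<ns:tag xmlns:ns='http://example.com/ns'><ns:child/></ns:tag>"
element = Antikythera.Xml.decode!(xml)

# The prefix is not resolved or stripped; it stays part of the element name.
name = element.name

# The namespace definition is stored like any other attribute.
ns_uri = element.attributes["xmlns:ns"]

# Child elements keep their prefixed names as well.
child_names = for %Antikythera.Xml.Element{name: n} <- element.children, do: n
```

Per the description above, name should be "ns:tag" and ns_uri should be "http://example.com/ns"; no namespace resolution takes place.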

Access behaviour

Antikythera.Xml.Element implements the Access behaviour for convenient lookups and updates. The following access patterns are available:

  • element[:name], element[:attributes], element[:children]
    • Fetch values of fields in dynamic lookup style.
  • element["@some_attr"]
    • Fetch the value of "some_attr" in the :attributes map.
  • element[:texts]
    • Fetch text (character data) children. Always returns a list.
  • element["some_name"]
    • Fetch child elements with name: "some_name". Always returns a list.

You can also use these patterns in Kernel.get_in/2 and its variants.

iex> xml = "<a>foo<b>bar</b>baz</a>"
iex> element = Antikythera.Xml.decode!(xml)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
  "foo",
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  "baz",
]}
iex> get_in(element, [:texts])
["foo", "baz"]
iex> get_in(element, ["b", Access.at(0), :texts])
["bar"]
iex> get_and_update_in(element, [:children, Access.at(0)], fn _ -> :pop end)
{"foo",
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  "baz",
]}}
iex> update_in(element, [:children, Access.all()], fn
...>   text when is_binary(text) -> %Antikythera.Xml.Element{name: "b", attributes: %{}, children: [text]}
...>   e -> e
...> end)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["foo"]},
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["baz"]},
]}
iex> update_in(element, ["@id"], fn _ -> "001" end)
%Antikythera.Xml.Element{name: "a", attributes: %{"id" => "001"}, children: [
  "foo",
  %Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
  "baz",
]}

Notes on updating with Kernel.get_and_update_in/3 and its variants:

  • Struct fields are static and cannot be popped.
  • Custom access keys other than "@some_attr" cannot be used for updating. Use :children instead to update children while preserving their order of appearance.

Types

@type decode_option() :: {:trim, boolean()}
@type encode_option() :: {:pretty | :with_header, boolean()}

Functions

decode(xml_string, opts \\ [])

Reads an XML string and parses it into Antikythera.Xml.Element.t.

Comments and the XML header are discarded.

It can read XHTML documents as long as they are well-formed. However, it does not understand Document Type Definitions (DTD; a header line such as "<!DOCTYPE html PUBLIC ..."), so you must remove them beforehand.

It tries to read a document with UTF-8 encoding, regardless of the "encoding" attribute in the header.

Options:

  • :trim - Drop whitespace-only texts. Default false.
    • There is no universal way to distinguish significant from insignificant whitespace, so this option may alter the meaning of the original document. Use with caution.
    • The W3C recommendation states that whitespace texts (character data) are, in general, significant and must be preserved.
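
The effect of :trim can be sketched by decoding the same indented document twice. The input below is a made-up example; decode!/2 is used so that the element is returned directly.

```elixir
# Hypothetical input: an indented document whose only text inside <a> is whitespace.
xml = "<a>\n  <b>bar</b>\n</a>"

untrimmed = Antikythera.Xml.decode!(xml)
trimmed = Antikythera.Xml.decode!(xml, trim: true)

# Without :trim, the indentation survives as whitespace-only text children.
untrimmed_texts = untrimmed[:texts]

# With :trim, whitespace-only texts are dropped, so <a> has no text children left.
trimmed_texts = trimmed[:texts]

# Text that is not whitespace-only, such as "bar", is kept either way.
inner_texts = get_in(trimmed, ["b", Access.at(0), :texts])
```

If the indentation were meaningful (for example inside preformatted content), the trimmed result would no longer represent the same document, which is why the option defaults to false.
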
@spec decode!(String.t(), [decode_option()]) :: Antikythera.Xml.Element.t()
encode(xml_element, opts \\ [])

Serializes Antikythera.Xml.Element.t into an XML string.

Specifications:

  • No trailing newline is generated.
  • All single and double quotation marks in attribute values or entity values are escaped to &apos; and &quot;, respectively.
  • All attribute values are SINGLE-quoted.
  • No whitespace is inserted before "/>" in elements without children.

Options:

  • :pretty - Pretty print with 2-space indents. Default false.
    • As with the :trim option in decode/2, inserted whitespace may be significant, so it can alter the meaning of the original document. Use with caution.
    • It does not insert whitespace into elements with mixed content or their descendants, in order to reduce the chance of altering the meaning of the original document.
  • :with_header - Prepend <?xml version='1.0' encoding='UTF-8'?>\n. Default false.
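
The specifications and options above can be sketched as follows. The element is built by hand for illustration, and the sketch assumes encode/2 returns the XML string directly, as its description suggests.

```elixir
alias Antikythera.Xml.Element

# Hand-built element for illustration: <a> with one attribute and one empty child.
element = %Element{name: "a", attributes: %{"id" => "1"}, children: [
  %Element{name: "b", attributes: %{}, children: []}
]}

# Default output: single-quoted attribute values, no space before "/>",
# and no trailing newline.
compact = Antikythera.Xml.encode(element)

# :with_header prepends the XML declaration followed by a newline.
with_header = Antikythera.Xml.encode(element, with_header: true)

# :pretty adds 2-space indentation; print it to inspect the layout.
pretty = Antikythera.Xml.encode(element, pretty: true)
IO.puts(pretty)
```

Following the rules listed above, compact should be "<a id='1'><b/></a>". Because pretty printing inserts whitespace, decoding its output without trim: true will generally not give back the original element.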