Antikythera.Xml (antikythera v0.5.1)
Convenient XML parser module wrapping fast_xml.
decode/2 parses XML into Antikythera.Xml.Element.t, and encode/2 serializes Antikythera.Xml.Element.t back into an XML string.
Antikythera.Xml.Element.t is the XML element data structure, and it is a JSON-convertible struct.
You can safely convert elements to JSON using Poison.encode/2 while keeping the order of appearance of children,
and convert them back to Antikythera.Xml.Element.t with Poison.decode/2 and Antikythera.Xml.Element.new/1.
Note that the order of attributes will not be preserved, since it is not significant in XML (see the XML 1.0 specification, section 3.1).
Namespace prefixes of tags (e.g. "ns" in <ns:tag>) are kept as is in :name of elements.
Namespace definitions (e.g. xmlns:ns='http://example.com/ns') are treated as plain attributes,
and kept as is in :attributes of elements.
Access behaviour
Antikythera.Xml.Element implements the Access behaviour for convenient lookups and updates.
The following access patterns are available:
- element[:name], element[:attributes], element[:children] - Fetch values of fields in dynamic lookup style.
- element["@some_attr"] - Fetch the value of "some_attr" in the :attributes map.
- element[:texts] - Fetch text (character data) children. It always returns a list.
- element["some_name"] - Fetch child elements with name: "some_name". It always returns a list.
You can also use these patterns in Kernel.get_in/2 and its variants.
iex> xml = "<a>foo<b>bar</b>baz</a>"
iex> element = Antikythera.Xml.decode!(xml)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
"foo",
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}
iex> get_in(element, [:texts])
["foo", "baz"]
iex> get_in(element, ["b", Access.at(0), :texts])
["bar"]
iex> get_and_update_in(element, [:children, Access.at(0)], fn _ -> :pop end)
{"foo",
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}}
iex> update_in(element, [:children, Access.all()], fn
...> text when is_binary(text) -> %Antikythera.Xml.Element{name: "b", attributes: %{}, children: [text]}
...> e -> e
...> end)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["foo"]},
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["baz"]},
]}
iex> update_in(element, ["@id"], fn _ -> "001" end)
%Antikythera.Xml.Element{name: "a", attributes: %{"id" => "001"}, children: [
"foo",
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}
Notes on updating with Kernel.get_and_update_in/3 and its variants:
- Struct fields are static and cannot be popped.
- Custom access keys except "@some_attr" cannot be used in updating. Use :children instead, in order to update children while preserving order of appearance.
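As a rough illustration of how access patterns like these can be provided, the sketch below implements the Access callbacks on a stand-in struct. SketchElement and everything inside it are hypothetical names written for this example; this is not the actual Antikythera.Xml.Element code, only a minimal model of the behaviour documented above.

```elixir
defmodule SketchElement do
  # Hypothetical stand-in for Antikythera.Xml.Element (not the library's code).
  @behaviour Access
  defstruct name: "", attributes: %{}, children: []

  @impl Access
  # element[:name], element[:attributes], element[:children]
  def fetch(%__MODULE__{} = e, key) when key in [:name, :attributes, :children] do
    Map.fetch(e, key)
  end

  # element[:texts] always yields a list of the text children
  def fetch(%__MODULE__{children: children}, :texts) do
    {:ok, Enum.filter(children, &is_binary/1)}
  end

  # element["@some_attr"] looks into the attributes map
  def fetch(%__MODULE__{attributes: attrs}, "@" <> attr), do: Map.fetch(attrs, attr)

  # element["some_name"] always yields a list of matching child elements
  def fetch(%__MODULE__{children: children}, name) when is_binary(name) do
    matching =
      Enum.filter(children, fn
        %__MODULE__{name: ^name} -> true
        _ -> false
      end)

    {:ok, matching}
  end

  @impl Access
  # Struct fields can be updated but not popped.
  def get_and_update(%__MODULE__{} = e, key, fun)
      when key in [:name, :attributes, :children] do
    case fun.(Map.fetch!(e, key)) do
      {get, update} -> {get, Map.put(e, key, update)}
      :pop -> raise ArgumentError, "struct fields cannot be popped"
    end
  end

  # "@some_attr" is the only custom key that supports updates in this sketch;
  # any other key (:texts, "some_name") raises FunctionClauseError.
  def get_and_update(%__MODULE__{attributes: attrs} = e, "@" <> attr, fun) do
    {get, new_attrs} = Map.get_and_update(attrs, attr, fun)
    {get, %{e | attributes: new_attrs}}
  end

  @impl Access
  def pop(%__MODULE__{attributes: attrs} = e, "@" <> attr) do
    {value, new_attrs} = Map.pop(attrs, attr)
    {value, %{e | attributes: new_attrs}}
  end
end

element = %SketchElement{
  name: "a",
  children: ["foo", %SketchElement{name: "b", children: ["bar"]}, "baz"]
}

IO.inspect(element[:texts])
# ["foo", "baz"]
IO.inspect(get_in(element, ["b", Access.at(0), :texts]))
# ["bar"]
IO.inspect(update_in(element, ["@id"], fn _ -> "001" end).attributes)
# %{"id" => "001"}
```

Leaving get_and_update/3 undefined for :texts and "some_name" is one way to model the restriction noted above: a filtered view of the children cannot be written back without losing the original ordering.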
Summary
Functions
decode/2: Reads an XML string and parses it into Antikythera.Xml.Element.t.
encode/2: Serializes Antikythera.Xml.Element.t into an XML string.
Types
Functions
@spec decode(String.t(), [decode_option()]) :: Croma.Result.t(Antikythera.Xml.Element.t())
Reads an XML string and parses it into Antikythera.Xml.Element.t.
Comments and the header will be discarded.
It can read XHTML documents as long as they are well-formed, though it does not understand Document Type Definitions (DTD, header line with "<!DOCTYPE html PUBLIC ..."), so you must remove them.
It tries to read a document with UTF-8 encoding, regardless of the "encoding" attribute in the header.
Options:
- :trim - Drop whitespace-only texts. Default false.
  - There is no universal way to distinguish significant from insignificant whitespace, so this option may alter the meaning of the original document. Use with caution.
  - The W3C recommendation states that whitespace texts (character data) are basically significant and must be preserved.
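To make the caveat about :trim concrete, here is a minimal sketch of what dropping whitespace-only texts does to a tree. TrimSketch is a hypothetical helper operating on plain maps; it is not part of Antikythera.Xml, and the choice of whitespace characters (space, tab, CR, LF) is an assumption based on the XML definition of whitespace.

```elixir
defmodule TrimSketch do
  # Hypothetical helper; elements are modelled as plain maps with :name and :children.
  def trim(%{children: children} = element) do
    trimmed =
      children
      |> Enum.reject(&whitespace_only?/1)
      |> Enum.map(fn
        %{children: _} = child -> trim(child)
        text -> text
      end)

    %{element | children: trimmed}
  end

  # Assumption: "whitespace" means the XML whitespace characters only.
  defp whitespace_only?(text) when is_binary(text) do
    String.match?(text, ~r/\A[ \t\r\n]*\z/)
  end

  defp whitespace_only?(_element), do: false
end

# Models <p><b>bold</b> <i>italic</i></p>, where the single space is significant.
doc = %{
  name: "p",
  children: [
    %{name: "b", children: ["bold"]},
    " ",
    %{name: "i", children: ["italic"]}
  ]
}

IO.inspect(TrimSketch.trim(doc))
```

After trimming, the space between the two inline elements is gone, so the rendered text would read "bolditalic". This is the kind of change in meaning the option description warns about.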
@spec decode!(String.t(), [decode_option()]) :: Antikythera.Xml.Element.t()
@spec encode(Antikythera.Xml.Element.t(), [encode_option()]) :: String.t()
Serializes Antikythera.Xml.Element.t into an XML string.
Specifications:
- A trailing newline will not be generated.
- All single and double quotation marks in attribute values or entity values are escaped to &apos; and &quot; respectively.
- All attribute values are SINGLE-quoted.
- Does not insert whitespace before "/>" in elements without children.
Options:
- :pretty - Pretty print with 2-space indents. Default false.
  - Similar to the :trim option in decode/2, inserted whitespace may be significant, thus it can alter the meaning of the original document. Use with caution.
  - It does not insert whitespace into elements with mixed content or their descendants, in order to reduce the probability of altering the meaning of the original document.
- :with_header - Prepend <?xml version='1.0' encoding='UTF-8'?>\n. Default false.
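The serialization rules listed under "Specifications" can be modelled with a small sketch. EncodeSketch is a hypothetical, simplified serializer over plain maps, not the antikythera or fast_xml implementation. Escaping of &, < and > is an assumption added here because well-formed XML requires it, and sorting the attributes is only there to make the output deterministic.

```elixir
defmodule EncodeSketch do
  # Hypothetical, simplified serializer; not the library's implementation.
  def encode(text) when is_binary(text), do: escape(text)

  def encode(%{name: name, attributes: attrs, children: children}) do
    # Attribute values are always single-quoted.
    attr_string =
      attrs
      |> Enum.sort()
      |> Enum.map_join(fn {key, value} -> " #{key}='#{escape(value)}'" end)

    case children do
      # No whitespace before "/>" in an element without children.
      [] -> "<#{name}#{attr_string}/>"
      _ -> "<#{name}#{attr_string}>#{Enum.map_join(children, &encode/1)}</#{name}>"
    end
  end

  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("'", "&apos;")
    |> String.replace("\"", "&quot;")
  end
end

element = %{
  name: "a",
  attributes: %{"id" => "it's \"x\""},
  children: [%{name: "b", attributes: %{}, children: []}, "hi"]
}

IO.puts(EncodeSketch.encode(element))
# <a id='it&apos;s &quot;x&quot;'><b/>hi</a>
```

Escaping both kinds of quotation marks means the output stays valid regardless of which quote character delimits the attribute value, even though only single quotes are used here.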