Antikythera.Xml (antikythera v0.5.1)
Convenient XML parser module wrapping fast_xml.
decode/2 can parse XML into Antikythera.Xml.Element.t, and encode/2 can serialize Antikythera.Xml.Element.t back to an XML string.
Antikythera.Xml.Element.t is the XML element data structure, and it is a JSON-convertible struct.
You can safely convert elements to JSON using Poison.encode/2 while keeping the order of appearance of children, and convert them back to Antikythera.Xml.Element.t with Poison.decode/2 and Antikythera.Xml.Element.new/1.
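The JSON round trip described above can be sketched as follows. This is a minimal sketch, assuming antikythera and Poison are available as dependencies; the sample document and the variable names are illustrative only.

```elixir
alias Antikythera.Xml
alias Antikythera.Xml.Element

element = Xml.decode!("<a id='1'>foo<b>bar</b>baz</a>")

# Encode to JSON, decode to a plain map, then rebuild the struct.
# The `with` falls through unchanged if either Poison step fails.
round_trip =
  with {:ok, json} <- Poison.encode(element),
       {:ok, map} <- Poison.decode(json) do
    # Whatever Element.new/1 returns is passed through as is.
    # If the round trip is lossless as described, it should hold
    # an element equal to the original.
    Element.new(map)
  end
```

The example does not pattern-match on the return value of Element.new/1, so it makes no assumption about its exact shape; check the result before using it.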
Note that the order of attributes will not be preserved, since the XML specification does not treat it as significant.
Namespace prefixes of tags (e.g. "ns" in <ns:tag>) are kept as is in :name of elements.
Namespace definitions (e.g. xmlns:ns='http://example.com/ns') are treated as plain attributes, and kept as is in :attributes of elements.
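To illustrate the namespace handling described above, here is a small sketch, assuming antikythera is available as a dependency. The document string is made up for illustration, and the values in the comments are what the description above implies rather than verified output.

```elixir
xml = "<ns:tag xmlns:ns='http://example.com/ns'>text</ns:tag>"
element = Antikythera.Xml.decode!(xml)

# The prefix stays in the element name: expected "ns:tag".
name = element[:name]

# The namespace definition is an ordinary attribute:
# expected %{"xmlns:ns" => "http://example.com/ns"}.
attributes = element[:attributes]
```

Because no namespace resolution takes place, code that matches on :name has to include the prefix exactly as it is written in the document.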
Access behaviour
Antikythera.Xml.Element implements the Access behaviour for convenient lookups and updates.
The following access patterns are available:
- element[:name], element[:attributes], element[:children] - Fetch values of fields in dynamic lookup style.
- element["@some_attr"] - Fetch value of "some_attr" in :attributes map.
- element[:texts] - Fetch text (character data) children. It always returns a list.
- element["some_name"] - Fetch child elements with name: "some_name". It always returns a list.
You can also use these patterns in Kernel.get_in/2 and its variants.
iex> xml = "<a>foo<b>bar</b>baz</a>"
iex> element = Antikythera.Xml.decode!(xml)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
"foo",
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}
iex> get_in(element, [:texts])
["foo", "baz"]
iex> get_in(element, ["b", Access.at(0), :texts])
["bar"]
iex> get_and_update_in(element, [:children, Access.at(0)], fn _ -> :pop end)
{"foo",
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}}
iex> update_in(element, [:children, Access.all()], fn
...> text when is_binary(text) -> %Antikythera.Xml.Element{name: "b", attributes: %{}, children: [text]}
...> e -> e
...> end)
%Antikythera.Xml.Element{name: "a", attributes: %{}, children: [
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["foo"]},
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["baz"]},
]}
iex> update_in(element, ["@id"], fn _ -> "001" end)
%Antikythera.Xml.Element{name: "a", attributes: %{"id" => "001"}, children: [
"foo",
%Antikythera.Xml.Element{name: "b", attributes: %{}, children: ["bar"]},
"baz",
]}
Notes on updating with Kernel.get_and_update_in/3 and its variants:
- Struct fields are static and cannot be popped.
- Custom access keys except "@some_attr" cannot be used in updating. Use :children instead, in order to update children while preserving order of appearance.
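Since element["some_name"] cannot be used for updates, one way to update only the children with a given name is to go through :children and narrow the list with Access.filter/1 from the Elixir standard library. The following is a sketch under the assumption that antikythera is available as a dependency; named_b? is a hypothetical helper introduced here.

```elixir
alias Antikythera.Xml
alias Antikythera.Xml.Element

element = Xml.decode!("<a>foo<b>bar</b>baz</a>")

# Hypothetical predicate: true only for child elements named "b".
# Text children are plain binaries and fall through to the second clause.
named_b? = fn
  %Element{name: "b"} -> true
  _ -> false
end

# Replace the children of every "b" child, leaving sibling order untouched.
updated =
  update_in(element, [:children, Access.filter(named_b?), :children], fn _ ->
    ["BAR"]
  end)

# updated.children is expected to be
# ["foo", %Element{name: "b", attributes: %{}, children: ["BAR"]}, "baz"]
```

The text children "foo" and "baz" stay in place because the update walks the original :children list rather than a filtered copy.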
Functions
@spec decode(String.t(), [decode_option()]) :: Croma.Result.t(Antikythera.Xml.Element.t())
Reads an XML string and parses it into Antikythera.Xml.Element.t.
Comments and the header will be discarded.
It can read XHTML documents as long as they are well-formed, though it does not understand Document Type Definitions (DTD, a header line such as "<!DOCTYPE html PUBLIC ..."), so you must remove them.
It tries to read a document with UTF-8 encoding, regardless of the "encoding" attribute in the header.
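Because the DTD line has to be removed before decoding, a small pre-processing step can help. The module below is a hypothetical helper, not part of Antikythera, and only a sketch: it assumes a simple DOCTYPE declaration without an internal subset (no "[ ... ]" part) and uses only Regex.replace/4 from the Elixir standard library.

```elixir
defmodule DoctypeStripper do
  @moduledoc "Hypothetical helper: removes a simple DOCTYPE line from a document."

  # Drops the first "<!DOCTYPE ...>" declaration and the whitespace after it.
  # Declarations with an internal subset ("[ ... ]") are deliberately not matched.
  def strip(xml) when is_binary(xml) do
    Regex.replace(~r/<!DOCTYPE[^>\[]*>\s*/i, xml, "", global: false)
  end
end

xhtml =
  "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" " <>
    "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n" <>
    "<html><body/></html>"

stripped = DoctypeStripper.strip(xhtml)
# stripped is "<html><body/></html>"
```

The stripped string can then be passed to decode/2. A regular expression is used here only for brevity; documents with more complex DTDs need a more careful approach.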
Options:
- :trim - Drop whitespace-only texts. Default false.
  - There is no universal way to distinguish significant and insignificant whitespace, so this option may alter the meaning of the original document. Use with caution.
  - In the W3C recommendation, it is stated that whitespace texts (character data) are basically significant and must be preserved.
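The effect of :trim can be sketched as follows, assuming antikythera is available as a dependency and that the option is passed as a keyword (trim: true), which matches the option list in the spec but is not shown explicitly above. The indented sample document is made up for illustration.

```elixir
xml = "<a>\n  <b>bar</b>\n</a>"

# Default behaviour: whitespace-only texts around <b> are kept as children.
untrimmed = Antikythera.Xml.decode!(xml)

# With trim: true, whitespace-only texts are dropped,
# so only the <b> element is expected to remain as a child of <a>.
trimmed = Antikythera.Xml.decode!(xml, trim: true)

untrimmed_count = length(untrimmed[:children])
trimmed_count = length(trimmed[:children])
```

Comparing the two child counts is a quick way to see whether trimming changes a particular document before relying on the option.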
@spec decode!(String.t(), [decode_option()]) :: Antikythera.Xml.Element.t()
@spec encode(Antikythera.Xml.Element.t(), [encode_option()]) :: String.t()
Serializes Antikythera.Xml.Element.t into an XML string.
Specifications:
- Trailing newline will not be generated.
- All single- and double-quotations in attribute values or entity values are escaped to &apos; and &quot; respectively.
- All attribute values are SINGLE-quoted.
- Does not insert a whitespace before "/>" in an element without children.
Options:
- :pretty - Pretty print with 2-space indents. Default false.
  - Similar to the :trim option in decode/2, inserted whitespace may be significant, thus it can alter the meaning of the original document. Use with caution.
  - It does not insert whitespace into elements with mixed content or their descendants, in order to reduce the probability of altering the meaning of the original document.
- :with_header - Prepend <?xml version='1.0' encoding='UTF-8'?>\n. Default false.
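The specifications and options above can be exercised with a short sketch, assuming antikythera is available as a dependency and that options are passed as keywords (with_header: true, pretty: true). The strings in the comments are what the specification list implies, not verified output.

```elixir
alias Antikythera.Xml
alias Antikythera.Xml.Element

element = %Element{name: "a", attributes: %{"id" => "1"}, children: []}

# Single-quoted attribute, no space before "/>", no trailing newline:
# expected "<a id='1'/>".
plain = Xml.encode(element)

# The header line is prepended, followed by a newline:
# expected "<?xml version='1.0' encoding='UTF-8'?>\n<a id='1'/>".
with_header = Xml.encode(element, with_header: true)

# Pretty printing inserts whitespace, so use it only where that is harmless.
pretty = Xml.encode(element, pretty: true)
```

Both options default to false, so encode/2 called without options produces the most compact form.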