Saxy.SimpleForm (Saxy v1.5.0)

Provides functions to parse an XML document into a simple-form data structure.

Data structure

Simple form is a basic representation of a parsed XML document. It consists of a single root element, and every element has the following format:

element = {tag_name, attributes, content}
content = (element | binary | cdata)*

See the "Types" section for more information.
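Because content can contain binaries, {:cdata, data} tuples, and nested elements, code that consumes simple form is naturally recursive. The following is a minimal sketch that collects all text of an element, depth first. The module name SimpleFormText is hypothetical and not part of Saxy; it relies only on the three content shapes listed above.

```elixir
defmodule SimpleFormText do
  # Hypothetical helper (not part of Saxy): concatenates all character data
  # found anywhere inside a simple-form element, in document order.
  def text({_tag_name, _attributes, content}) when is_list(content) do
    content
    |> Enum.map(&content_text/1)
    |> IO.iodata_to_binary()
  end

  # Plain character data is kept as is.
  defp content_text(binary) when is_binary(binary), do: binary

  # CDATA sections appear as {:cdata, data} when cdata_as_characters is false.
  defp content_text({:cdata, data}) when is_binary(data), do: data

  # Nested elements are walked recursively.
  defp content_text({_tag, _attrs, _content} = element), do: text(element)
end

element = {"p", [{"class", "note"}], ["Hello, ", {"b", [], ["world"]}, {:cdata, "!"}]}
SimpleFormText.text(element)
# => "Hello, world!"
```

Pattern matching on tuple shape keeps each content kind in its own clause, so a new kind would surface as a FunctionClauseError rather than being silently ignored.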

Summary

parse_string(data, options \\ [])

Parse given string into simple form.

Types


attributes() :: [{name :: String.t(), value :: String.t()}]


content() :: [String.t() | {:cdata, String.t()} | t()]


t() :: {tag_name(), attributes(), content()}


tag_name() :: String.t()
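Since attributes() is a plain list of {name, value} pairs and content() mixes text with child elements, standard list functions are enough to query a parsed element. A minimal sketch, assuming only the types above; SimpleFormQuery and its functions are hypothetical names, not Saxy API:

```elixir
defmodule SimpleFormQuery do
  # Hypothetical helpers (not part of Saxy) built only on the simple-form types.

  # attributes() is a list of {name, value} tuples, so List.keyfind/3 applies.
  def attribute({_tag_name, attributes, _content}, name) do
    case List.keyfind(attributes, name, 0) do
      {^name, value} -> value
      nil -> nil
    end
  end

  # Returns direct child elements with the given tag name. Entries of
  # content() that do not match the 3-tuple pattern (binaries and
  # {:cdata, _} tuples) are skipped by the comprehension.
  def children({_tag_name, _attributes, content}, tag_name) do
    for {^tag_name, _attrs, _content} = child <- content, do: child
  end
end

movie = {"movie", [{"url", ""}, {"id", "tt0120338"}], ["\n    ", {"name", [], ["Titanic"]}, "\n  "]}
SimpleFormQuery.attribute(movie, "id")
# => "tt0120338"
SimpleFormQuery.children(movie, "name")
# => [{"name", [], ["Titanic"]}]
```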

Functions

parse_string(data, options \\ [])


parse_string(data :: binary(), options :: Keyword.t()) ::
  {:ok, t()} | {:error, exception :: Saxy.ParseError.t()}

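Parses the given string into simple form. Returns {:ok, simple_form} on success, or {:error, exception} with a Saxy.ParseError if parsing fails.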
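Because the return value is a tagged tuple, callers typically branch on it with case. The sketch below is hypothetical (the ParseExample module is not part of Saxy); it assumes a mismatched closing tag makes parse_string/2 return {:error, %Saxy.ParseError{}}, and uses Mix.install/1 (Elixir 1.12+) only so the snippet runs as a standalone script.

```elixir
# Only needed when running this as a standalone .exs script.
Mix.install([{:saxy, "~> 1.5"}])

defmodule ParseExample do
  # Hypothetical wrapper: returns the root element, or a readable message
  # built from the Saxy.ParseError exception.
  def root_or_message(xml) do
    case Saxy.SimpleForm.parse_string(xml) do
      {:ok, root} -> {:ok, root}
      {:error, %Saxy.ParseError{} = error} -> {:error, Exception.message(error)}
    end
  end
end

ParseExample.root_or_message("<greeting>Hello</greeting>")
# => {:ok, {"greeting", [], ["Hello"]}}

# Mismatched closing tag; the exact message text comes from Saxy.ParseError.
ParseExample.root_or_message("<greeting>Hello</farewell>")
```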

Options

  • :expand_entity - specifies how external entity references should be handled. Four strategies are supported:
    • :keep - keep the original binary; for example, Orange &reg; will be expanded to "Orange &reg;". This is the default strategy.
    • :skip - drop the original binary; for example, Orange &reg; will be expanded to "Orange ".
    • {mod, fun, args} - use the result of applying the specified MFA.
    • :never - keep the original binary, including predefined entity references; for example, "Orange &amp;" will remain "Orange &amp;".
  • :cdata_as_characters - true to return CDATA as characters, false to wrap CDATA as {:cdata, data}. Defaults to true.
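The :expand_entity strategies above can be compared side by side on one input. This is a sketch: the EntityExpander module is hypothetical, and the sketch assumes Saxy calls the MFA with the entity name (without "&" and ";") prepended to args and expects a binary back. Mix.install/1 is there only to make the script standalone.

```elixir
# Only needed when running this as a standalone .exs script.
Mix.install([{:saxy, "~> 1.5"}])

defmodule EntityExpander do
  # Hypothetical {mod, fun, args} callback: maps the "reg" entity to the
  # registered sign and leaves any other external entity untouched.
  def expand("reg"), do: "®"
  def expand(other), do: "&" <> other <> ";"
end

xml = "<p>Orange &reg;</p>"

# Default strategy, as described above.
Saxy.SimpleForm.parse_string(xml, expand_entity: :keep)
# => {:ok, {"p", [], ["Orange &reg;"]}}

Saxy.SimpleForm.parse_string(xml, expand_entity: :skip)
# => {:ok, {"p", [], ["Orange "]}}

# The entity text is replaced by EntityExpander.expand("reg").
Saxy.SimpleForm.parse_string(xml, expand_entity: {EntityExpander, :expand, []})

# :never also keeps predefined references such as &amp; verbatim.
Saxy.SimpleForm.parse_string("<p>Orange &amp;</p>", expand_entity: :never)
# => {:ok, {"p", [], ["Orange &amp;"]}}
```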

Note that it is recommended to disable :cdata_as_characters if the resulting simple form is meant to be re-encoded later. Consider the following example, in which the re-encoded document has different semantics from the original one.

iex> xml = "<foo><![CDATA[<greeting>Hello, world!</greeting>]]></foo>"
iex> {:ok, simple_form} = Saxy.SimpleForm.parse_string(xml, cdata_as_characters: true)
{:ok, {"foo", [], ["<greeting>Hello, world!</greeting>"]}}
iex> Saxy.encode!(simple_form)
"<foo><greeting>Hello, world!</greeting></foo>"

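For comparison, here is a sketch of the recommended setting applied to the same document. With cdata_as_characters: false the CDATA section stays tagged as {:cdata, data} in content(); the sketch assumes that Saxy.encode!/1 writes such tuples back out as a CDATA section. Mix.install/1 is used only so the snippet runs standalone.

```elixir
# Only needed when running this as a standalone .exs script.
Mix.install([{:saxy, "~> 1.5"}])

xml = "<foo><![CDATA[<greeting>Hello, world!</greeting>]]></foo>"

# The CDATA section is preserved as a tagged tuple instead of plain text.
{:ok, simple_form} = Saxy.SimpleForm.parse_string(xml, cdata_as_characters: false)
# simple_form => {"foo", [], [{:cdata, "<greeting>Hello, world!</greeting>"}]}

# Re-encode; the {:cdata, _} entry is expected to come back as <![CDATA[...]]>.
encoded = Saxy.encode!(simple_form)
IO.puts(encoded)
```
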

Examples

Given the following XML document:

iex> xml = """
...> <?xml version="1.0" encoding="utf-8" ?>
...> <menu>
...>   <movie url="" id="tt0120338">
...>     <name>Titanic</name>
...>     <characters>Jack &amp; Rose</characters>
...>   </movie>
...>   <movie url="" id="tt0109830">
...>     <name>Forest Gump</name>
...>     <characters>Forest &amp; Jenny</characters>
...>   </movie>
...> </menu>
...> """
iex> Saxy.SimpleForm.parse_string(xml)
{:ok,
 {"menu", [],
  [
    "\n  ",
    {"movie", [{"url", ""}, {"id", "tt0120338"}],
     [
       "\n    ",
       {"name", [], ["Titanic"]},
       "\n    ",
       {"characters", [], ["Jack & Rose"]},
       "\n  "
     ]},
    "\n  ",
    {"movie", [{"url", ""}, {"id", "tt0109830"}],
     [
       "\n    ",
       {"name", [], ["Forest Gump"]},
       "\n    ",
       {"characters", [], ["Forest & Jenny"]},
       "\n  "
     ]},
    "\n"
  ]}}