XmlQuery

A concise API for querying XML. There are just 5 main functions: all/2, find/2 and find!/2 for finding things, plus attr/2 and text/1 for extracting information. There are also a handful of other useful functions, referenced below and described in detail in the module docs. XML parsing is handled by Erlang/OTP’s built-in xmerl library.

The input can be:

  • A string of XML.
  • An Xmerl.xml_attribute(), Xmerl.xml_document(), Xmerl.xml_element(), or Xmerl.xml_text().
  • An XmlQuery.Element struct.
  • Anything that implements the String.Chars protocol.

We created a related library called HtmlQuery which has the same API but is used for querying HTML.

This library is MIT licensed and is part of a growing number of Elixir open source libraries published at github.com/synchronal.

Installation

def deps do
  [
    {:xml_query, "~> 0.2.1"}
  ]
end

Usage

Detailed docs are in the XmlQuery module docs; a quick usage overview follows.

We typically alias XmlQuery to Xq:

alias XmlQuery, as: Xq

The rest of these examples use the following XML:

xml = """
<?xml version="1.0"?>
<family>
  <child age="12" name="Alice" />
  <child age="9" name="Billy">
    <toys>
      <toy name="Voltron" part-number="voltr-123">
        <limb>Leg</limb>
        <animal>Lion</animal>
        <color>Blue</color>
      </toy>
    </toys>
  </child>
</family>
"""

Querying

Query functions use XPath selector strings for finding nodes. The Devhints XPath cheatsheet is a helpful XPath reference.

Xq.find!(xml, "//toy/animal")

The result of a query is an XmlQuery.Element struct, a list of XmlQuery.Element structs, an XmlQuery.Attribute struct, or an XmlQuery.Text struct. All of these structs implement String.Chars so you can convert them to strings with to_string/1:

Xq.find!(xml, "//toy/animal") |> to_string() # returns "<animal>Lion</animal>"

Finding

Xq.all(xml, "//child") # returns a list of all the <child> elements
Xq.find(xml, ~s|//child[@name="Alice"]|) # returns the <child> with name "Alice"
Xq.find!(xml, ~s|//child[@name="foo"]|) # raises because no such element exists

See the module docs for more details.

Extracting

text/1 is the simplest extraction function:

xml |> Xq.find!("//toy") |> Xq.text() # returns "Leg Lion Blue"

attr/2 returns the value of an attribute:

xml |> Xq.find!(~s|//toy[@part-number="voltr-123"]|) |> Xq.attr("name") # returns "Voltron"

To extract data from multiple XML nodes, we found that it is clearer to compose multiple functions rather than to have a more complicated API:

xml |> Xq.all("//toy/*") |> Enum.map(&Xq.text/1) # returns ["Leg", "Lion", "Blue"]
xml |> Xq.all("//child") |> Enum.map(&Xq.attr(&1, "age")) # returns ["12", "9"]

See the module docs for more details.

Parsing

parse/1 parses an XML document and returns an XmlQuery.Element struct. It is rarely needed since all the XmlQuery functions will parse XML if needed. See the module docs for more details.

Utilities

pretty/1 formats XML in a human-friendly format. See the module docs for more details.

Development

bin/dev/doctor
bin/dev/update
bin/dev/audit
bin/dev/shipit

References