sweet_xml v0.6.6 SweetXml

SweetXml is a thin wrapper around :xmerl. It allows you to convert a string or an xmlElement record, as defined in :xmerl, to an Elixir value such as a map, list, char_list, or any combination of these.

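For normal-sized documents, SweetXml primarily exposes 3 functions:

  • SweetXml.xpath/2 - return a value based on the xpath expression
  • SweetXml.xpath/3 - similar to above but allowing nesting of mapping
  • SweetXml.xmap/2 - return a map with keywords mapped to values returned from xpath

For something larger, SweetXml mainly exposes 1 function: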

  • SweetXml.stream_tags/3 - stream a given tag or a list of tags, and optionally "discard" some dom elements in order to free memory during streaming for big files which cannot fit entirely in memory

Examples

Simple Xpath

iex> import SweetXml
iex> doc = "<h1><a>Some linked title</a></h1>"
iex> doc |> xpath(~x"//a/text()")
'Some linked title'

Nested Mapping

iex> import SweetXml
iex> doc = "<body><header><p>Message</p><ul><li>One</li><li><a>Two</a></li></ul></header></body>"
iex> doc |> xpath(~x"//header", message: ~x"./p/text()", a_in_li: ~x".//li/a/text()"l)
%{a_in_li: ['Two'], message: 'Message'}

Streaming

iex> import SweetXml
iex> doc = ["<ul><li>l1</li><li>l2", "</li><li>l3</li></ul>"]
iex> SweetXml.stream_tags(doc, :li)
...> |> Stream.map(fn {:li, doc} ->
...>      doc |> SweetXml.xpath(~x"./text()")
...>    end)
...> |> Enum.to_list
['l1', 'l2', 'l3']

For more examples, please see the help for each individual function.

The ~x Sigil

Notice that in the above examples we used the expression ~x"//a/text()" to define the path. The sigil lets us specify more precisely what is returned, by appending modifiers to the path.

  • ~x"//some/path"

    without any modifiers, xpath/2 will return the value of the entity if the entity is of type xmlText, xmlAttribute, xmlPI, xmlComment as defined in :xmerl

  • ~x"//some/path"e

    e stands for (e)ntity. This forces xpath/2 to return the entity with which you can further chain your xpath/2 call

  • ~x"//some/path"l

    'l' stands for (l)ist. This forces xpath/2 to return a list. Without l, xpath/2 will only return the first element of the match

  • ~x"//some/path"el - mix of the above

  • ~x"//some/path"k

    'k' stands for (K)eyword. This forces xpath/2 to return a Keyword instead of a Map.

  • ~x"//some/path"s

    's' stands for (s)tring. This forces xpath/2 to return the value as string instead of a char list.

  • ~x"//some/path"o

    'o' stands for (O)ptional. This allows the path to not exist, and will return nil.

  • ~x"//some/path"sl - string list.

Notice also that in the examples section we always import SweetXml first. This makes sigil_x available in the current scope, which is what enables the ~x syntax. Without the import, you can build the SweetXpath struct directly instead of using ~x:

iex> doc = "<h1><a>Some linked title</a></h1>"
iex> doc |> SweetXml.xpath(%SweetXpath{path: '//a/text()', is_value: true, cast_to: false, is_list: false, is_keyword: false})
'Some linked title'

Note the use of char_list in the path definition.
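The modifiers described above can be combined on one small document. The following is a minimal sketch, not part of the original documentation: the XML content and variable names are made up for illustration, and the Mix.install line assumes the code runs as a standalone script (drop it inside a Mix project that already depends on sweet_xml).

```elixir
# Assumption: standalone script; inside a Mix project, remove this line.
Mix.install([{:sweet_xml, "~> 0.6"}])

import SweetXml

# Hypothetical sample document used only for this example.
doc = "<list><item id=\"1\">apple</item><item id=\"2\">pear</item></list>"

# No modifiers: first match only, returned as a char list.
first = xpath(doc, ~x"//item/text()")

# s + l: every match, each cast to a binary string.
names = xpath(doc, ~x"//item/text()"sl)

# Attribute values are extracted the same way as text nodes.
ids = xpath(doc, ~x"//item/@id"sl)

# s + o: the path does not exist, so the result is nil.
missing = xpath(doc, ~x"//price/text()"so)

IO.inspect({first, names, ids, missing})
```

Combining s with l is usually the most convenient form when the values are going to be used as ordinary Elixir strings.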

Summary

Functions

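  • parse(doc, options) - parses doc, which can be a byte list (iodata), a binary, or any enumerable of binaries, and returns an xmlElement record
  • sigil_x(path, modifiers \\ []) - simply returns a SweetXpath struct, with modifiers converted to boolean fields
  • stream(doc, options_callback) - create an element stream from an XML doc
  • stream_tags(doc, tags, options \\ []) - most common usage of streaming: stream a given tag or a list of tags, and optionally "discard" some dom elements in order to free memory during streaming for big files which cannot fit entirely in memory
  • transform_by(sweet_xpath, fun) - tags %SweetXpath{} with fun to be applied at the end of the xpath query
  • xmap(parent, mapping) - returns a mapping with each value being the result of xpath
  • xpath(parent, spec) - allows you to query an XML document with xpath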

Functions

add_namespace(xpath, prefix, uri)

parse(doc, options)

doc can be

  • a byte list (iodata)
  • a binary
  • any enumerable of binaries (for instance File.stream!/3 result)

options are xmerl options described here http://www.erlang.org/doc/man/xmerl_scan.html, see the Erlang tutorial for usage.

When doc is an enumerable, the :cont_fun option cannot be given.

Returns an xmlElement record.
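When several queries run against the same document, it can help to call parse once and pass the resulting record to xpath. The sketch below is illustrative only: the XML and variable names are assumptions, the empty options list is just the simplest valid choice, and the Mix.install line assumes a standalone script.

```elixir
# Assumption: standalone script; inside a Mix project, remove this line.
Mix.install([{:sweet_xml, "~> 0.6"}])

import SweetXml

# Hypothetical sample document used only for this example.
xml = "<order><id>A-42</id><customer>Ada</customer></order>"

# Parse once with no extra xmerl options. The result is an
# :xmerl xmlElement record, which is a tuple tagged with :xmlElement.
parsed = SweetXml.parse(xml, [])

# The parsed record can be passed to xpath/2 in place of the raw string,
# so the document is not parsed again for each query.
id = xpath(parsed, ~x"//id/text()"s)
customer = xpath(parsed, ~x"//customer/text()"s)

IO.inspect({elem(parsed, 0), id, customer})
```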

sigil_x(path, modifiers \\ [])

sigil_x/2 simply returns a SweetXpath struct, with modifiers converted to boolean fields

iex> SweetXml.sigil_x("//some/path", 'e')
%SweetXpath{path: '//some/path', is_value: false, cast_to: false, is_list: false, is_keyword: false}

or you can simply import and use the ~x expression

iex> import SweetXml
iex> ~x"//some/path"e
%SweetXpath{path: '//some/path', is_value: false, cast_to: false, is_list: false, is_keyword: false}

Valid modifiers are e, l, k, s, o, i and f. Below is the full explanation

  • ~x"//some/path"

    without any modifiers, xpath/2 will return the value of the entity if the entity is of type xmlText, xmlAttribute, xmlPI, xmlComment as defined in :xmerl

  • ~x"//some/path"e

    e stands for (e)ntity. This forces xpath/2 to return the entity with which you can further chain your xpath/2 call

  • ~x"//some/path"l

    'l' stands for (l)ist. This forces xpath/2 to return a list. Without l, xpath/2 will only return the first element of the match

  • ~x"//some/path"el - mix of the above

  • ~x"//some/path"k

    'k' stands for (K)eyword. This forces xpath/2 to return a Keyword instead of a Map.

  • ~x"//some/path"s

    's' stands for (s)tring. This forces xpath/2 to return the value as string instead of a char list.

  • ~x"//some/path"o

    'o' stands for (O)ptional. This allows the path to not exist, and will return nil.

  • ~x"//some/path"sl - string list.

  • ~x"//some/path"i

    'i' stands for (i)nteger. This forces xpath/2 to return the value as integer instead of a char list.

  • ~x"//some/path"f

    'f' stands for (f)loat. This forces xpath/2 to return the value as float instead of a char list.

  • ~x"//some/path"il - integer list
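The numeric modifiers can be tried on a small document. This is a minimal sketch under stated assumptions: the XML and variable names are invented for the example, and the Mix.install line assumes a standalone script.

```elixir
# Assumption: standalone script; inside a Mix project, remove this line.
Mix.install([{:sweet_xml, "~> 0.6"}])

import SweetXml

# Hypothetical sample document used only for this example.
doc = "<cart><qty>2</qty><qty>5</qty><price>9.5</price></cart>"

# i: first match cast to an integer.
qty = xpath(doc, ~x"//qty/text()"i)

# i + l: every match cast to an integer.
all_qty = xpath(doc, ~x"//qty/text()"il)

# f: first match cast to a float.
price = xpath(doc, ~x"//price/text()"f)

IO.inspect({qty, all_qty, price})
```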

stream(doc, options_callback)

Create an element stream from a xml doc.

This is a lower-level API compared to SweetXml.stream_tags. You can use the options_callback argument to get fine control over what data is streamed.

  • doc is an enumerable; data will be pulled during the result stream enumeration, e.g. File.stream!("some_file.xml")
  • options_callback is an anonymous function fn emit -> xmerl_opts. Use it to define your :xmerl callbacks and put data into the stream by calling emit.(elem) in the callbacks.

For example, here you define a stream of all xmlElement records:

iex> import Record
iex> doc = ["<h1", "><a>Som", "e linked title</a><a>other</a></h1>"]
iex> SweetXml.stream(doc, fn emit ->
...>   [
...>     hook_fun: fn
...>       entity, xstate when is_record(entity, :xmlElement) ->
...>         emit.(entity)
...>         {entity, xstate}
...>       entity, xstate ->
...>         {entity, xstate}
...>     end
...>   ]
...> end) |> Enum.count
3

stream_tags(doc, tags, options \\ [])

Most common usage of streaming: stream a given tag or a list of tags, and optionally "discard" some dom elements in order to free memory during streaming for big files which cannot fit entirely in memory.

Note that each matched tag produces its own tree. If a given tag appears in the discarded options, it is ignored.

  • doc is an enumerable, data will be pulled during the result stream enumeration. e.g. File.stream!("some_file.xml")
  • tags is an atom or a list of atoms you want to extract. Each stream element will be {:tagname, xmlelem}. e.g. :li, :header
  • options[:discard] is the list of tags which will be discarded, that is, not added to their parent DOM.

Examples:

iex> import SweetXml
iex> doc = ["<ul><li>l1</li><li>l2", "</li><li>l3</li></ul>"]
iex> SweetXml.stream_tags(doc, :li, discard: [:li])
...> |> Stream.map(fn {:li, doc} -> doc |> SweetXml.xpath(~x"./text()") end)
...> |> Enum.to_list
['l1', 'l2', 'l3']
iex> SweetXml.stream_tags(doc, [:ul, :li])
...> |> Stream.map(fn {_, doc} -> doc |> SweetXml.xpath(~x"./text()") end)
...> |> Enum.to_list
['l1', 'l2', 'l3', nil]

Be careful if you set options[:discard]. If any of the discarded tags are nested inside a kept tag, you will not be able to access them.

Examples:

iex> import SweetXml
iex> doc = ["<header>", "<title>XML</title", "><header><title>Nested</title></header></header>"]
iex> SweetXml.stream_tags(doc, :header)
...> |> Stream.map(fn {_, doc} -> SweetXml.xpath(doc, ~x".//title/text()") end)
...> |> Enum.to_list
['Nested', 'XML']
iex> SweetXml.stream_tags(doc, :header, discard: [:title])
...> |> Stream.map(fn {_, doc} -> SweetXml.xpath(doc, ~x"./title/text()") end)
...> |> Enum.to_list
[nil, nil]
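Since doc is typically a File.stream!/1 result for large inputs, the following sketch streams from a file rather than from an in-memory list. The file name, its contents, and the variable names are assumptions made for illustration, and the Mix.install line assumes a standalone script.

```elixir
# Assumption: standalone script; inside a Mix project, remove this line.
Mix.install([{:sweet_xml, "~> 0.6"}])

import SweetXml

# Hypothetical temporary file standing in for a large XML document.
path = Path.join(System.tmp_dir!(), "sweet_xml_stream_example.xml")

File.write!(path, """
<feed>
<entry><title>A</title></entry>
<entry><title>B</title></entry>
</feed>
""")

titles =
  path
  # Lazily read the file line by line instead of loading it whole.
  |> File.stream!()
  # Emit one {:entry, element} tuple per <entry> tag.
  |> SweetXml.stream_tags(:entry)
  # Query each emitted subtree on its own.
  |> Stream.map(fn {:entry, entry} -> xpath(entry, ~x"./title/text()"s) end)
  |> Enum.to_list()

File.rm!(path)
IO.inspect(titles)
```

Nothing is read from the file until the stream is enumerated, here by Enum.to_list/1.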

transform_by(sweet_xpath, fun)

Tags %SweetXpath{} with fun to be applied at the end of xpath query.

Examples

iex> import SweetXml
iex> string_to_range = fn str ->
...>     [first, last] = str |> String.split("-", trim: true) |> Enum.map(&String.to_integer/1)
...>     first..last
...>   end
iex> doc = "<weather><zone><name>north</name><wind-speed>5-15</wind-speed></zone></weather>"
iex> doc
...> |> xpath(
...>      ~x"//weather/zone"l,
...>      name: ~x"//name/text()"s |> transform_by(&String.capitalize/1),
...>      wind_speed: ~x"./wind-speed/text()"s |> transform_by(string_to_range)
...>    )
[%{name: "North", wind_speed: 5..15}]
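A smaller variation may make the mechanics easier to see: transform_by attaches a function to a SweetXpath, and that function receives the value after any cast from the modifiers. This is a sketch only; the XML, the split_tags helper, and the variable names are hypothetical, and the Mix.install line assumes a standalone script.

```elixir
# Assumption: standalone script; inside a Mix project, remove this line.
Mix.install([{:sweet_xml, "~> 0.6"}])

import SweetXml

# Hypothetical sample document used only for this example.
doc = "<user><name>ada</name><tags>math, engines</tags></user>"

# Hypothetical helper: split a comma-separated string into a list.
split_tags = fn str -> String.split(str, ", ", trim: true) end

user =
  xpath(doc, ~x"//user",
    # The s modifier casts to a string first, then the function is applied.
    name: ~x"./name/text()"s |> transform_by(&String.upcase/1),
    tags: ~x"./tags/text()"s |> transform_by(split_tags)
  )

IO.inspect(user)
```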

xmap(parent, mapping)

xmap returns a mapping with each value being the result of xpath

Just as with xpath, you can nest the mapping structure. Please see xpath for more detail.

Examples

Simple

iex> import SweetXml
iex> doc = "<h1><a>Some linked title</a></h1>"
iex> doc |> xmap(a: ~x"//a/text()")
%{a: 'Some linked title'}

With optional mapping

iex> import SweetXml
iex> doc = "<body><header><p>Message</p><ul><li>One</li><li><a>Two</a></li></ul></header></body>"
iex> doc |> xmap(message: ~x"//p/text()", a_in_li: ~x".//li/a/text()"l)
%{a_in_li: ['Two'], message: 'Message'}

With optional mapping and nesting

iex> import SweetXml
iex> doc = "<body><header><p>Message</p><ul><li>One</li><li><a>Two</a></li></ul></header></body>"
iex> doc
...> |> xmap(
...>      message: ~x"//p/text()",
...>      ul: [
...>        ~x"//ul",
...>        a: ~x"./li/a/text()"
...>      ]
...>    )
%{message: 'Message', ul: %{a: 'Two'}}
iex> doc
...> |> xmap(
...>      message: ~x"//p/text()",
...>      ul: [
...>        ~x"//ul"k,
...>        a: ~x"./li/a/text()"
...>      ]
...>    )
%{message: 'Message', ul: [a: 'Two']}
iex> doc
...> |> xmap([
...>      message: ~x"//p/text()",
...>      ul: [
...>        ~x"//ul",
...>        a: ~x"./li/a/text()"
...>      ]
...>    ], true)
[message: 'Message', ul: %{a: 'Two'}]

xmap(parent, arg2, atom)

xmlAttribute(args \\ []) (macro)

xmlAttribute(record, args) (macro)

xmlComment(args \\ []) (macro)

xmlComment(record, args) (macro)

xmlDecl(args \\ []) (macro)

xmlDecl(record, args) (macro)

xmlDocument(args \\ []) (macro)

xmlDocument(record, args) (macro)

xmlElement(args \\ []) (macro)

xmlElement(record, args) (macro)

xmlNamespace(args \\ []) (macro)

xmlNamespace(record, args) (macro)

xmlNsNode(args \\ []) (macro)

xmlNsNode(record, args) (macro)

xmlObj(args \\ []) (macro)

xmlObj(record, args) (macro)

xmlPI(args \\ []) (macro)

xmlPI(record, args) (macro)

xmlText(args \\ []) (macro)

xmlText(record, args) (macro)

xpath(parent, spec)

xpath allows you to query an xml document with xpath.

The second argument to xpath is a SweetXpath struct. The optional third argument is a keyword list, such that the value of each keyword is also either a SweetXpath or a list with head being a SweetXpath and tail being another keyword list exactly like before. Please see examples below for better understanding.

Examples

Simple

iex> import SweetXml
iex> doc = "<h1><a>Some linked title</a></h1>"
iex> doc |> xpath(~x"//a/text()")
'Some linked title'

With optional mapping

iex> import SweetXml
iex> doc = "<body><header><p>Message</p><ul><li>One</li><li><a>Two</a></li></ul></header></body>"
iex> doc |> xpath(~x"//header", message: ~x"./p/text()", a_in_li: ~x".//li/a/text()"l)
%{a_in_li: ['Two'], message: 'Message'}

With optional mapping and nesting

iex> import SweetXml
iex> doc = "<body><header><p>Message</p><ul><li>One</li><li><a>Two</a></li></ul></header></body>"
iex> doc
...> |> xpath(
...>      ~x"//header",
...>      ul: [
...>        ~x"./ul",
...>        a: ~x"./li/a/text()"
...>      ]
...>    )
%{ul: %{a: 'Two'}}
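When the first SweetXpath carries the l modifier, the mapping is applied to every matched entity and a list of maps comes back. The sketch below illustrates this under stated assumptions: the XML and variable names are invented for the example, and the Mix.install line assumes a standalone script.

```elixir
# Assumption: standalone script; inside a Mix project, remove this line.
Mix.install([{:sweet_xml, "~> 0.6"}])

import SweetXml

# Hypothetical sample document used only for this example.
doc =
  "<shop>" <>
    "<item><name>tea</name><tag>hot</tag><tag>leaf</tag></item>" <>
    "<item><name>ice</name><tag>cold</tag></item>" <>
    "</shop>"

items =
  xpath(doc, ~x"//item"l,
    # Relative paths are evaluated against each matched <item>.
    name: ~x"./name/text()"s,
    tags: ~x"./tag/text()"sl
  )

IO.inspect(items)
```

Using paths that start with ./ in the mapping keeps each value scoped to the current item instead of searching the whole document.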

xpath(parent, sweet_xpath, subspec)