SweetXml
SweetXml
is a thin wrapper around :xmerl
. It allows you to convert a
char_list
or xmlElement
record as defined in :xmerl
to an elixir value such
as map
, list
, string
, integer
, float
or any combination of these.
Installation
Add dependency to your project's mix.exs
:
def deps do
[{:sweet_xml, "~> 0.7.1"}]
end
SweetXml
depends on :xmerl
. On some Linux systems, you might need
to install the package erlang-xmerl
.
Examples
Given an XML document such as below:
<?xml version="1.05" encoding="UTF-8"?>
<game>
<matchups>
<matchup winner-id="1">
<name>Match One</name>
<teams>
<team>
<id>1</id>
<name>Team One</name>
</team>
<team>
<id>2</id>
<name>Team Two</name>
</team>
</teams>
</matchup>
<matchup winner-id="2">
<name>Match Two</name>
<teams>
<team>
<id>2</id>
<name>Team Two</name>
</team>
<team>
<id>3</id>
<name>Team Three</name>
</team>
</teams>
</matchup>
<matchup winner-id="1">
<name>Match Three</name>
<teams>
<team>
<id>1</id>
<name>Team One</name>
</team>
<team>
<id>3</id>
<name>Team Three</name>
</team>
</teams>
</matchup>
</matchups>
</game>
We can do the following:
import SweetXml
doc = "..." # as above
Get the name of the first match:
result = doc |> xpath(~x"//matchup/name/text()") # `sigil_x` for (x)path
assert result == 'Match One'
Get the XML record of the name of the first match:
result = doc |> xpath(~x"//matchup/name"e) # `e` is the modifier for (e)ntity
assert result == {:xmlElement, :name, :name, [], {:xmlNamespace, [], []},
[matchup: 2, matchups: 2, game: 1], 2, [],
[{:xmlText, [name: 2, matchup: 2, matchups: 2, game: 1], 1, [],
'Match One', :text}], [],
...}
Get the full list of matchup name:
result = doc |> xpath(~x"//matchup/name/text()"l) # `l` stands for (l)ist
assert result == ['Match One', 'Match Two', 'Match Three']
Get a list of winner-id by attributes:
result = doc |> xpath(~x"//matchup/@winner-id"l)
assert result == ['1', '2', '1']
Get a list of matchups with different map structure:
result = doc |> xpath(
~x"//matchups/matchup"l,
name: ~x"./name/text()",
winner: [
~x".//team/id[.=ancestor::matchup/@winner-id]/..",
name: ~x"./name/text()"
]
)
assert result == [
%{name: 'Match One', winner: %{name: 'Team One'}},
%{name: 'Match Two', winner: %{name: 'Team Two'}},
%{name: 'Match Three', winner: %{name: 'Team One'}}
]
Or directly return a mapping of your liking:
result = doc |> xmap(
matchups: [
~x"//matchups/matchup"l,
name: ~x"./name/text()",
winner: [
~x".//team/id[.=ancestor::matchup/@winner-id]/..",
name: ~x"./name/text()"
]
],
last_matchup: [
~x"//matchups/matchup[last()]",
name: ~x"./name/text()",
winner: [
~x".//team/id[.=ancestor::matchup/@winner-id]/..",
name: ~x"./name/text()"
]
]
)
assert result == %{
matchups: [
%{name: 'Match One', winner: %{name: 'Team One'}},
%{name: 'Match Two', winner: %{name: 'Team Two'}},
%{name: 'Match Three', winner: %{name: 'Team One'}}
],
last_matchup: %{name: 'Match Three', winner: %{name: 'Team One'}}
}
The ~x Sigil
Warning ! Because we use xmerl
internally, only XPath 1.0 paths are handled.
In the above examples, we used the expression ~x"//some/path"
to
define the path. The reason is it allows us to more precisely specify what
is being returned.
~x"//some/path"
without any modifiers,
xpath/2
will return the value of the entity if the entity is of typexmlText
,xmlAttribute
,xmlPI
,xmlComment
as defined in:xmerl
~x"//some/path"e
e
stands for (e)ntity. This forcesxpath/2
to return the entity with which you can further chain yourxpath/2
call~x"//some/path"l
'l' stands for (l)ist. This forces
xpath/2
to return a list. Withoutl
,xpath/2
will only return the first element of the match~x"//some/path"k
'k' stands for (k)eyword. This forces
xpath/2
to return a Keyword instead of a Map.~x"//some/path"el
- mix of the above~x"//some/path"s
's' stands for (s)tring. This forces
xpath/2
to return the value as string instead of a char list.~x"//some/path"S
'S' stands for soft (S)tring. This forces
xpath/2
to return the value as string instead of a char list, but if node content is incompatible with a string, set""
.~x"//some/path"o
'o' stands for (o)ptional. This allows the path to not exist, and will return nil.
~x"//some/path"sl
- string list.~x"//some/path"i
'i' stands for (i)nteger. This forces
xpath/2
to return the value as integer instead of a char list.~x//some/path"I
'I' stands for soft (I)nteger. This forces
xpath/2
to return the value as integer instead of a char list, but if node content is incompatible with an integer, set0
.~x"//some/path"f
'f' stands for (f)loat. This forces
xpath/2
to return the value as float instead of a char list.~x//some/path"F
'F' stands for soft (F)loat. This forces
xpath/2
to return the value as float instead of a char list, but if node content is incompatible with a float, set0.0
.~x"//some/path"il
- integer list.
If you use the optional modifier o
together with a soft cast modifier
(uppercase), then the value is set to nil
when the value is not compatible
for instance ~x//some/path/text()"Fo
return nil
if the text is not a number.
Also in the examples section, we always import SweetXml first. This
makes x_sigil
available in the current scope. Without it, instead of using
~x
, you can use the %SweetXpath
struct
assert ~x"//some/path"e == %SweetXpath{path: '//some/path', is_value: false, is_list: false, cast_to: false}
Note the use of char_list in the path definition.
Namespace support
Given a XML document such as below
<?xml version="1.05" encoding="UTF-8"?>
<game xmlns="http://example.com/fantasy-league" xmlns:ns1="http://example.com/baseball-stats">
<matchups>
<matchup winner-id="1">
<name>Match One</name>
<teams>
<team>
<id>1</id>
<name>Team One</name>
<ns1:runs>5</ns1:runs>
</team>
<team>
<id>2</id>
<name>Team Two</name>
<ns1:runs>2</ns1:runs>
</team>
</teams>
</matchup>
</matchups>
</game>
We can do the following:
import SweetXml
xml_str = "..." # as above
doc = parse(xml_str, namespace_conformant: true)
Note the fact that we explicitly parse the XML with the namespace_conformant: true
option. This is needed to allow nodes to be identified in a prefix
independent way.
We can use namespace prefixes of our preference, regardless of what prefix is used in the document:
result = doc
|> xpath(~x"//ff:matchup/ff:name/text()"
|> add_namespace("ff", "http://example.com/fantasy-league"))
assert result == 'Match One'
We can specify multiple namespace prefixes:
result = doc
|> xpath(~x"//ff:matchup//bb:runs/text()"
|> add_namespace("ff", "http://example.com/fantasy-league")
|> add_namespace("bb", "http://example.com/baseball-stats"))
assert result == '5'
From Chaining to Nesting
Here's a brief explanation to how nesting came about.
Chaining
Both xpath
and xmap
can take an :xmerl
XML record as the first argument.
Therefore you can chain calls to these functions like below:
doc
|> xpath(~x"//li"l)
|> Enum.map fn (li_node) ->
%{
name: li_node |> xpath(~x"./name/text()"),
age: li_node |> xpath(~x"./age/text()")
}
end
Mapping to a structure
Since the previous example is such a common use case, SweetXml allows you just simply do the following
doc
|> xpath(
~x"//li"l,
name: ~x"./name/text()",
age: ~x"./age/text()"
)
Nesting
But what you want is sometimes more complex than just that, SweetXml thus also allows nesting
doc
|> xpath(
~x"//li"l,
name: [
~x"./name",
first: ~x"./first/text()",
last: ~x"./last/text()"
],
age: ~x"./age/text()"
)
Transform By
Sometimes we need to transform the value to what we need, SweetXml supports that
via transform_by/2
doc = "<li><name><first>john</first><last>doe</last></name><age>30</age></li>"
result = doc |> xpath(
~x"//li"l,
name: [
~x"./name",
first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
],
age: ~x"./age/text()"i
)
^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]
The same can be used to break parsing code into reusable functions that can be used in nesting:
doc = "<li><name><first>john</first><last>doe</last></name><age>30</age></li>"
parse_name = fn xpath_node ->
xpath_node |> xmap(
first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
)
end
result = doc |> xpath(
~x"//li"l,
name: ~x"./name" |> transform_by(parse_name),
age: ~x"./age/text()"i
)
^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]
For more examples, please take a look at the tests and help.
Streaming
SweetXml
now also supports streaming in various forms. Here's a sample XML doc.
Notice the certain lines have XML tags that span multiple lines:
<?xml version="1.05" encoding="UTF-8"?>
<html>
<head>
<title>XML Parsing</title>
<head><title>Nested Head</title></head>
</head>
<body>
<p>Neato €</p><ul>
<li class="first star" data-index="1">
First</li><li class="second">Second
</li><li
class="third">Third</li>
</ul>
<div>
<ul>
<li>Forth</li>
</ul>
</div>
<special_match_key>first star</special_match_key>
</body>
</html>
Working with File.stream!/1
Working with streams is exactly the same as working with binaries:
File.stream!("file_above.xml") |> xpath(...)
SweetXml
element streaming
Once you have a file stream, you may not want to work with the entire document to save memory:
file_stream = File.stream!("file_above.xml")
result = file_stream
|> stream_tags([:li, :special_match_key])
|> Stream.map(fn
{_, doc} ->
xpath(doc, ~x"./text()")
end)
|> Enum.to_list
assert result == ['\n First', 'Second\n ', 'Third', 'Forth', 'first star']
Warning: In case of large document, you may want to use the discard
option to avoid memory leak.
result = file_stream
|> stream_tags([:li, :special_match_key], discard: [:li, :special_match_key])
Security
Whenever you have to deal with some XML that was not generated by your system (untrusted document),
it is highly recommended that you separate the parsing step from the mapping step, in order to be able
to prevent some default behavior through options. You can check the doc for SweetXml.parse/2
for more details.
The current recommendations are:
doc |> parse(dtd: :none) |> xpath(spec, subspec)
enum |> stream_tags(tags, dtd: :none)
Copyright and License
Copyright (c) 2014, Frank Liu
SweetXml source code is licensed under the MIT License.
CONTRIBUTING
Hi, and thank you for wanting to contribute. Please refer to the centralized informations available at: https://github.com/kbrw#contributing