Sassone (Sassone v1.0.0)
Sassone is an XML SAX parser and encoder.
Sassone provides functions to parse XML documents, either as a complete binary or as a stream, in compliance with Extensible Markup Language (XML) 1.0 (Fifth Edition).
Sassone also offers a DSL and an API to transform structs to and from XML documents. See the "Encoder" section below for more information.
Parser
The Sassone parser supports two parsing modes: SAX and simple form.
SAX mode (Simple API for XML)
SAX is an event-driven algorithm for parsing XML documents. A SAX parser takes an XML document as input and emits events to a pre-configured event handler during parsing.
There are several types of SAX events supported by Sassone:
:start_document - after the prolog is parsed.
:start_element - when an open tag is parsed.
:characters - when a chunk of CharData is parsed.
:cdata - when a chunk of CData is parsed.
:end_element - when an end tag is parsed.
:end_document - when the root element is closed.
See Sassone.Handler for more information.
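The handler contract can be pictured as a fold over the event sequence. The sketch below is not Sassone's implementation: `ElementCounter` and the hand-written `events` list are hypothetical, the payload shapes are copied from the handler examples in this module, and `Enum.reduce/3` stands in for the parser so the snippet runs without the library. A real handler would also declare `@behaviour Sassone.Handler`.

```elixir
defmodule ElementCounter do
  # Hypothetical handler: counts opening tags per element name.
  # Every clause returns {:ok, new_state}, as in the handler examples in this module.
  def handle_event(:start_element, {_namespace, name, _attributes}, counts) do
    {:ok, Map.update(counts, name, 1, &(&1 + 1))}
  end

  # All other events leave the state untouched.
  def handle_event(_event, _data, counts), do: {:ok, counts}
end

# Hand-written events for <list><item>a</item><item>b</item></list>.
# The :end_document payload is a placeholder; this handler ignores it.
events = [
  {:start_document, [version: "1.0"]},
  {:start_element, {nil, "list", []}},
  {:start_element, {nil, "item", []}},
  {:characters, "a"},
  {:end_element, {nil, "item"}},
  {:start_element, {nil, "item", []}},
  {:characters, "b"},
  {:end_element, {nil, "item"}},
  {:end_element, {nil, "list"}},
  {:end_document, {}}
]

# Stand-in for the parser: thread the state through each event in order.
counts =
  Enum.reduce(events, %{}, fn {event, data}, state ->
    {:ok, new_state} = ElementCounter.handle_event(event, data, state)
    new_state
  end)

IO.inspect(counts)
```

The state can be any term; a map is used here only because it makes per-name counting convenient.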
Encoding
Sassone only supports UTF-8 encoding. It also respects the encoding declared in the XML document prolog: if the declared encoding is not UTF-8, the parser stops. When no encoding is declared, Sassone defaults to UTF-8.
Reference expansion
Sassone supports expanding character references and XML 1.0 predefined entity references, for example &#65; is expanded to "A", &#x26; to "&", and &amp; to "&".
Sassone does not expand external entity references, but provides an option to specify how they should be handled. See the "Shared options" section for details.
Creation of atoms
Sassone does not create atoms during the parsing process.
DTD and XSD
Sassone does not support parsing DTD (Document Type Definition) and XSD schemas. When the parser encounters a DTD, it simply skips it.
Shared options
:expand_entity - specifies how external entity references should be handled. The supported strategies are:
  :keep - keep the original binary, for example Orange &reg; will be expanded to "Orange &reg;". This is the default strategy.
  :skip - skip the original binary, for example Orange &reg; will be expanded to "Orange ".
  {mod, fun, args} - take the applied result of the specified MFA.
  :never - keep the original binary, including predefined entity references, e.g. "Orange &amp;" will remain "Orange &amp;".
:cdata_as_characters - true to emit CData events as :characters. Defaults to true.
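For the `{mod, fun, args}` strategy, a resolver module might look like the sketch below. Everything in it is an assumption made for illustration: the module name `MyEntities`, the function name `resolve`, and in particular the calling convention, which is assumed to be `apply(mod, fun, [entity_name | args])` with the entity name (without `&` and `;`) prepended to `args`. Check `Sassone.Handler` and the parser source before relying on it.

```elixir
defmodule MyEntities do
  # Hypothetical lookup table for entities a document may reference.
  @known %{"reg" => "®", "copy" => "©"}

  # Assumed to be invoked as resolve(entity_name, fallback), where `fallback`
  # is the single element of the `args` list given in the option.
  def resolve(entity_name, fallback) do
    Map.get(@known, entity_name, fallback)
  end
end

# Direct calls that mirror the assumed apply(mod, fun, [name | args]).
IO.inspect(apply(MyEntities, :resolve, ["reg" | ["?"]]))
IO.inspect(apply(MyEntities, :resolve, ["unknown" | ["?"]]))
```

Under that assumption, the option would be written as `expand_entity: {MyEntities, :resolve, ["?"]}`.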
Encoder
Sassone offers two APIs to build simple forms and encode XML documents.
Use Sassone.XML to build and compose an XML simple form, then Sassone.encode!/2
to encode the built element into an XML binary.
iex> import Sassone.XML
iex> element = element("person", [attribute("gender", "female")], ["Alice"])
{nil, "person", [{nil, "gender", "female"}], ["Alice"]}
iex> Sassone.encode!(element, [version: "1.0"])
"<?xml version=\"1.0\"?><person gender=\"female\">Alice</person>"
See Sassone.XML for more XML building APIs.
Summary
Functions
encode! - Encodes a simple form XML element into a string.
encode_to_iodata! - Encodes a simple form element into IO data.
parse_stream - Parses XML stream data.
parse_string - Parses XML binary data.
stream_events - Parses an XML stream and returns a stream of events.
Functions
@spec encode!(Sassone.XML.element(), Sassone.Prolog.t() | Keyword.t() | nil) :: String.t()
Encodes a simple form XML element into a string.
This function encodes an element in simple form, together with a prolog, into an XML document.
Examples
iex> import Sassone.XML
iex> root = element("foo", [attribute("foo", "bar")], ["bar"])
iex> prolog = [version: "1.0"]
iex> Sassone.encode!(root, prolog)
"<?xml version=\"1.0\"?><foo foo=\"bar\">bar</foo>"
iex> prolog = [version: "1.0", encoding: "UTF-8"]
iex> Sassone.encode!(root, prolog)
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo foo=\"bar\">bar</foo>"
@spec encode_to_iodata!(Sassone.XML.element(), Sassone.Prolog.t() | Keyword.t() | nil) :: iodata()
Encodes a simple form element into IO data.
Same as encode!/2 but this encodes the document into IO data.
Examples
iex> import Sassone.XML
iex> root = element("foo", [attribute("foo", "bar")], ["bar"])
iex> prolog = [version: "1.0"]
iex> Sassone.encode_to_iodata!(root, prolog)
[
[~c'<?xml', [32, ~c'version', 61, 34, "1.0", 34], [], [], ~c'?>'],
[60, "foo", 32, "foo", 61, 34, "bar", 34],
62,
["bar"],
[60, 47, "foo", 62]
]
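IO data is a nested list of binaries, charlists and bytes that can be written to a file or socket without being concatenated first. As a check on the output above, the sketch below flattens that exact literal with `IO.iodata_to_binary/1` from the standard library; it does not call Sassone.

```elixir
# The IO data literal from the example above, with integer bytes and
# charlists written exactly as returned.
iodata = [
  [~c"<?xml", [32, ~c"version", 61, 34, "1.0", 34], [], [], ~c"?>"],
  [60, "foo", 32, "foo", 61, 34, "bar", 34],
  62,
  ["bar"],
  [60, 47, "foo", 62]
]

# Flatten into a single binary; this is the string shown for encode!/2
# with the same element and prolog.
binary = IO.iodata_to_binary(iodata)
IO.puts(binary)

# The byte length can be computed without flattening.
IO.inspect(IO.iodata_length(iodata))
```

Functions such as `IO.binwrite/2` and `File.write/2` accept IO data directly, so flattening is only needed when a single binary is actually required.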
@spec parse_stream( Enumerable.t(), Sassone.Handler.t(), Sassone.Handler.state(), options :: Keyword.t() ) :: {:ok, state :: term()} | {:halt, state :: term(), rest :: String.t()} | {:error, exception :: Sassone.ParseError.t()}
Parses XML stream data.
This function takes a stream, a SAX event handler (see Sassone.Handler) and an initial state as input. It returns
{:ok, state} if parsing is successful, otherwise {:error, exception}, where exception is a
Sassone.ParseError struct that can be converted into a readable message with Exception.message/1.
Examples
defmodule MyTestHandler do
  @behaviour Sassone.Handler

  def handle_event(:start_document, prolog, state) do
    {:ok, [{:start_document, prolog} | state]}
  end

  def handle_event(:end_document, _data, state) do
    {:ok, [{:end_document} | state]}
  end

  # The namespace is kept so the recorded event matches the output below.
  def handle_event(:start_element, {namespace, name, attributes}, state) do
    {:ok, [{:start_element, namespace, name, attributes} | state]}
  end

  def handle_event(:end_element, name, state) do
    {:ok, [{:end_element, name} | state]}
  end

  def handle_event(:characters, chars, state) do
    {:ok, [{:characters, chars} | state]}
  end
end

iex> stream = File.stream!("./test/support/fixture/foo.xml")
iex> Sassone.parse_stream(stream, MyTestHandler, [])
{:ok,
 [{:end_document},
  {:end_element, {nil, "foo"}},
  {:start_element, nil, "foo", [{nil, "bar", "value"}]},
  {:start_document, [version: "1.0"]}]}
Memory usage
Sassone.parse_stream/3 takes a File.Stream or Stream as input, so the number of bytes buffered in each
chunk can be controlled through the File.stream!/3 API.
During parsing, the actual memory used by Sassone might be higher than the size configured for each chunk, because Sassone keeps some already-parsed parts of the original binary in memory to take advantage of Erlang sub-binary extraction. Sassone tries to free them when it makes sense.
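Because the spec accepts any `Enumerable.t()`, chunk size can also be controlled by hand when the XML does not come from a file. The sketch below splits a binary into fixed-size byte chunks with `Stream.unfold/2`. `Chunker.chunks/2` is a hypothetical helper, not part of Sassone, and it assumes the parser tolerates chunk boundaries at arbitrary byte offsets, as byte-sized `File.stream!/3` chunks also produce.

```elixir
defmodule Chunker do
  # Hypothetical helper: lazily split `binary` into chunks of at most `size` bytes.
  def chunks(binary, size) when is_binary(binary) and is_integer(size) and size > 0 do
    Stream.unfold(binary, fn
      # Nothing left: end the stream.
      <<>> ->
        nil

      # Last (possibly short) chunk.
      rest when byte_size(rest) <= size ->
        {rest, <<>>}

      # Take exactly `size` bytes and continue with the tail.
      rest ->
        <<chunk::binary-size(size), tail::binary>> = rest
        {chunk, tail}
    end)
  end
end

xml = "<?xml version='1.0' ?><foo bar='value'></foo>"
stream = Chunker.chunks(xml, 16)

# Each element is one chunk the parser would receive.
IO.inspect(Enum.to_list(stream))
```

The resulting stream could be passed as the first argument to `Sassone.parse_stream/3`. Smaller chunks lower the per-chunk buffer at the cost of more iterations.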
Options
See the “Shared options” section in the module documentation.
:character_data_max_length - tells the parser to emit a :characters event as soon as the character data exceeds the specified length. This option is useful when the tag being parsed contains a very large chunk of data. Defaults to :infinity.
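With a finite `:character_data_max_length`, the text of one element may arrive as several `:characters` events rather than one. A handler that needs the whole text can buffer the pieces until the element closes. The sketch below is hypothetical: `TextCollector` and the hand-written events are not part of Sassone, the exact chunk sizes the parser would produce are not known here, and `Enum.reduce/3` drives the handler in place of the parser.

```elixir
defmodule TextCollector do
  # State: {buffered_pieces, completed_texts}; pieces are kept as IO data.
  def handle_event(:characters, chunk, {buffer, done}) do
    {:ok, {[buffer, chunk], done}}
  end

  # On a closing tag, flatten the buffered pieces into one binary.
  def handle_event(:end_element, {_namespace, name}, {buffer, done}) do
    {:ok, {[], [{name, IO.iodata_to_binary(buffer)} | done]}}
  end

  # Other events do not affect the buffer.
  def handle_event(_event, _data, state), do: {:ok, state}
end

# What a parser might emit for <note>Hello, world!</note> with a small limit.
events = [
  {:start_element, {nil, "note", []}},
  {:characters, "Hello"},
  {:characters, ", wor"},
  {:characters, "ld!"},
  {:end_element, {nil, "note"}}
]

{_buffer, texts} =
  Enum.reduce(events, {[], []}, fn {event, data}, state ->
    {:ok, new_state} = TextCollector.handle_event(event, data, state)
    new_state
  end)

IO.inspect(texts)
```

This version ignores nesting: buffered text is attributed to whichever element closes next, which is enough for leaf elements but not for mixed content.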
@spec parse_string( String.t(), Sassone.Handler.t(), Sassone.Handler.state(), options :: Keyword.t() ) :: {:ok, state :: term()} | {:halt, Sassone.Handler.state(), rest :: String.t()} | {:error, Sassone.ParseError.t()}
Parses XML binary data.
This function takes an XML binary, a SAX event handler (see Sassone.Handler) and an initial state as input. It returns
{:ok, state} if parsing is successful, otherwise {:error, exception}, where exception is a
Sassone.ParseError struct that can be converted into a readable message with Exception.message/1.
The third argument, state, can be used to keep track of data and parsing progress while parsing is in progress, and is
returned when parsing finishes.
Options
See the “Shared options” section in the module documentation.
Examples
defmodule MyTestHandler do
  @behaviour Sassone.Handler

  def handle_event(:start_document, prolog, state) do
    {:ok, [{:start_document, prolog} | state]}
  end

  def handle_event(:end_document, _data, state) do
    {:ok, [{:end_document} | state]}
  end

  # The namespace is kept so the recorded event matches the output below.
  def handle_event(:start_element, {namespace, name, attributes}, state) do
    {:ok, [{:start_element, namespace, name, attributes} | state]}
  end

  def handle_event(:end_element, name, state) do
    {:ok, [{:end_element, name} | state]}
  end

  def handle_event(:characters, chars, state) do
    {:ok, [{:characters, chars} | state]}
  end
end

iex> xml = "<?xml version='1.0' ?><foo bar='value'></foo>"
iex> Sassone.parse_string(xml, MyTestHandler, [])
{:ok,
 [{:end_document},
  {:end_element, {nil, "foo"}},
  {:start_element, nil, "foo", [{nil, "bar", "value"}]},
  {:start_document, [version: "1.0"]}]}
@spec stream_events(Enumerable.t(), options :: Keyword.t()) :: Enumerable.t()
Parses an XML stream and returns a stream of events.
This function takes a stream and returns a stream of XML SAX events.
If a parsing error occurs, it raises a Sassone.ParseError exception.
Examples
iex> stream = File.stream!("./test/support/fixture/foo.xml")
iex> Enum.to_list Sassone.stream_events stream
[
start_document: [version: "1.0"],
start_element: {nil, "foo", [{nil, "bar", "value"}]},
end_element: {nil, "foo"}
]
iex> Enum.to_list Sassone.stream_events ["<foo>unclosed value"]
** (Sassone.ParseError) unexpected end of input, expected token: :chardata
Warning
The input stream is evaluated lazily, so some events may be emitted before the exception is raised.
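The practical consequence is that side effects performed for early events are not undone when a later chunk fails to parse. The sketch below imitates this with a plain lazy stream from the standard library: the event atoms are made up, and `ArgumentError` stands in for `Sassone.ParseError` so the snippet runs without the library.

```elixir
# Stand-in for a lazy event stream that fails partway through.
events =
  Stream.map([:start_document, :start_element, :boom], fn
    :boom -> raise ArgumentError, "simulated parse error"
    event -> event
  end)

# Side effects run for every event that arrives before the failure.
result =
  try do
    Enum.each(events, fn event -> send(self(), {:handled, event}) end)
    :ok
  rescue
    error in ArgumentError -> {:error, Exception.message(error)}
  end

# Drain the mailbox to see which events were already handled.
handled =
  Stream.repeatedly(fn ->
    receive do
      {:handled, event} -> event
    after
      0 -> nil
    end
  end)
  |> Enum.take_while(&(&1 != nil))

IO.inspect({result, handled})
```

If partial processing is not acceptable, one option is to collect events first (for example with `Enum.to_list/1` inside a `try`) and act on them only once the whole stream has been consumed, at the cost of holding all events in memory.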
Memory usage
Sassone.stream_events/2 takes a File.Stream or Stream as input, so the number of bytes buffered in each
chunk can be controlled through the File.stream!/3 API.
During parsing, the actual memory used by Sassone might be higher than the size configured for each chunk, because Sassone keeps some already-parsed parts of the original binary in memory to take advantage of Erlang sub-binary extraction. Sassone tries to free them when it makes sense.
Options
See the “Shared options” section in the module documentation.
:character_data_max_length - tells the parser to emit a :characters event as soon as the character data exceeds the specified length. This option is useful when the tag being parsed contains a very large chunk of data. Defaults to :infinity.