Html5ever (html5ever v0.16.0)

This is an HTML parser written in Rust.

The project provides a NIF (Native Implemented Function) built on top of the parser of the same name from the Servo project.

By default, this library tries to download a precompiled NIF from the GitHub releases page, so you don't need the Rust toolchain installed. If no precompiled file is found and the Mix env is production, an error is raised.

You can force compilation from source by setting the HTML5EVER_BUILD environment variable to "true" or "1". Alternatively, set the application env :build_from_source to true:

config :html5ever, Html5ever, build_from_source: true

This project is possible thanks to Rustler.

Summary

Functions

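flat_parse(html)

Parses an HTML document from a string and returns a map.

flat_parse_with_attributes_as_maps(html)

Same as flat_parse/1, but with attributes as maps.

parse(html)

Parses an HTML document from a string.

parse_with_attributes_as_maps(html)

Same as parse/1, but with attributes as maps.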

Functions

flat_parse(html)

Parses an HTML document from a string and returns a map.

The map contains the document structure.

Example

iex> Html5ever.flat_parse("<!doctype html><html><body><h1>Hello world</h1></body></html>")
{:ok,
 %{
   nodes: %{
     0 => %{id: 0, parent: nil, type: :document},
     1 => %{id: 1, parent: 0, type: :doctype},
     2 => %{
       attrs: [],
       children: [3, 4],
       id: 2,
       name: "html",
       parent: 0,
       type: :element
     },
     3 => %{
       attrs: [],
       children: [],
       id: 3,
       name: "head",
       parent: 2,
       type: :element
     },
     4 => %{
       attrs: [],
       children: [5],
       id: 4,
       name: "body",
       parent: 2,
       type: :element
     },
     5 => %{
       attrs: [],
       children: [6],
       id: 5,
       name: "h1",
       parent: 4,
       type: :element
     },
     6 => %{contents: "Hello world", id: 6, parent: 5, type: :text}
   },
   root: 0
 }}
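
Because the flat form stores every node in one map keyed by id, lookups do not require walking the tree. The following is a minimal sketch of that idea, assuming only the node shape shown in the example above. The FlatTreeQuery module and its functions are hypothetical helpers for illustration and are not part of Html5ever.

```elixir
defmodule FlatTreeQuery do
  # Hypothetical helper module for illustration; it is not part of Html5ever.

  # Returns every element node with the given tag name, ordered by node id.
  def elements_by_name(%{nodes: nodes}, name) do
    nodes
    |> Map.values()
    |> Enum.filter(&match?(%{type: :element, name: ^name}, &1))
    |> Enum.sort_by(& &1.id)
  end

  # Returns the contents of the text nodes that are direct children of `node`.
  def direct_text(%{nodes: nodes}, %{children: children}) do
    for id <- children,
        %{type: :text, contents: contents} <- [Map.fetch!(nodes, id)],
        do: contents
  end
end

# A trimmed copy of the map from the example above (only the nodes we query).
doc = %{
  nodes: %{
    4 => %{attrs: [], children: [5], id: 4, name: "body", parent: 2, type: :element},
    5 => %{attrs: [], children: [6], id: 5, name: "h1", parent: 4, type: :element},
    6 => %{contents: "Hello world", id: 6, parent: 5, type: :text}
  },
  root: 0
}

[h1] = FlatTreeQuery.elements_by_name(doc, "h1")
IO.inspect(FlatTreeQuery.direct_text(doc, h1))
# prints ["Hello world"]
```

In real use, the doc value would come from matching {:ok, doc} on the result of Html5ever.flat_parse/1 instead of a literal map.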

flat_parse_with_attributes_as_maps(html)

Same as flat_parse/1, but with attributes as maps.

Duplicated attributes are removed: when an element repeats an attribute name, only the first occurrence is kept.
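
The "first occurrence wins" rule can be illustrated in plain Elixir. This is only a sketch of the documented behaviour, not the NIF's actual implementation. It assumes attributes arrive as a list of {name, value} tuples, and the FirstWins module is a hypothetical name.

```elixir
defmodule FirstWins do
  # Hypothetical helper; it mimics the documented rule, not the NIF's actual code.
  # Assumes attributes arrive as a list of {name, value} tuples.
  def to_map(attrs) do
    Enum.reduce(attrs, %{}, fn {name, value}, acc ->
      # Map.put_new/3 leaves an existing key untouched, so the first value wins.
      Map.put_new(acc, name, value)
    end)
  end
end

IO.inspect(FirstWins.to_map([{"class", "a"}, {"id", "x"}, {"class", "b"}]))
# prints %{"class" => "a", "id" => "x"}
```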

parse(html)

Parses an HTML document from a string.

This returns a list of tuples representing the HTML tree.

Example

iex> Html5ever.parse("<!doctype html><html><body><h1>Hello world</h1></body></html>")
{:ok,
 [
   {:doctype, "html", "", ""},
   {"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}
 ]}
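
The tuple tree is convenient to process with pattern matching. As a minimal sketch, assuming only the node shapes visible in the example above (a doctype 4-tuple, {tag, attrs, children} elements, and binaries for text), the hypothetical TupleTreeText module below concatenates all text in a tree. Any other node shape is ignored.

```elixir
defmodule TupleTreeText do
  # Hypothetical helper module for illustration; it is not part of Html5ever.

  # A list of nodes: concatenate the text of each node in order.
  def text(nodes) when is_list(nodes) do
    nodes |> Enum.map(&text/1) |> IO.iodata_to_binary()
  end

  # A text node is a plain binary.
  def text(text) when is_binary(text), do: text

  # An element: recurse into its children.
  def text({tag, _attrs, children}) when is_binary(tag), do: text(children)

  # Anything else (for example the doctype tuple) contributes no text.
  def text(_other), do: ""
end

# The tree from the example above, written as a literal.
tree = [
  {:doctype, "html", "", ""},
  {"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}
]

IO.puts(TupleTreeText.text(tree))
# prints Hello world
```

In real use, the tree value would come from matching {:ok, tree} on the result of Html5ever.parse/1.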

parse_with_attributes_as_maps(html)

Same as parse/1, but with attributes as maps.

Duplicated attributes are removed: when an element repeats an attribute name, only the first occurrence is kept.

Example

iex> Html5ever.parse_with_attributes_as_maps(
...>   "<!doctype html><html><body><h1 class=title>Hello world</h1></body></html>"
...> )
{:ok,
 [
   {:doctype, "html", "", ""},
   {"html", %{}, [{"head", %{}, []}, {"body", %{}, [{"h1", %{"class" => "title"}, ["Hello world"]}]}]}
 ]}
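
With attributes as maps, a lookup by attribute name is a single Map.get/2 call. The sketch below collects every element whose attribute has a given value, assuming the tree shape shown in the example above. The AttrMapSearch module is a hypothetical helper and is not part of Html5ever.

```elixir
defmodule AttrMapSearch do
  # Hypothetical helper module for illustration; it is not part of Html5ever.

  # A list of nodes: search each node and join the matches in document order.
  def find(nodes, key, value) when is_list(nodes) do
    Enum.flat_map(nodes, &find(&1, key, value))
  end

  # An element with a map of attributes: check it, then search its children.
  def find({tag, attrs, children} = node, key, value)
      when is_binary(tag) and is_map(attrs) do
    rest = find(children, key, value)
    if Map.get(attrs, key) == value, do: [node | rest], else: rest
  end

  # Text nodes, the doctype tuple, and anything else never match.
  def find(_other, _key, _value), do: []
end

# The tree from the example above, written as a literal.
tree = [
  {:doctype, "html", "", ""},
  {"html", %{},
   [{"head", %{}, []}, {"body", %{}, [{"h1", %{"class" => "title"}, ["Hello world"]}]}]}
]

IO.inspect(AttrMapSearch.find(tree, "class", "title"))
# prints [{"h1", %{"class" => "title"}, ["Hello world"]}]
```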