scrape v3.1.0 Scrape.Tools.DOM

Utility module for selecting and extracting data from a "DOM" (an HTML/XML tree-like structure). It can find text values and attribute values; the API is inspired by jQuery and implemented with Floki.

Summary

Types

dom()
DOM tree representation, same as Floki's html_tree.

Functions

attr(dom, selector, name)
Similar to text/2, but returns a chosen attribute value instead of the node's text value (or nil).

attrs(dom, selector, name)
Similar to attr/3, but returns a list of all matching results.

first(dom, list)
Cascading query helper: applies either text/2 or attr/3 until something returns a non-nil result or all queries have been tried.

from_string(string)
Create a DOM from a given (HTML/XML) string.

text(dom, selector)
Get the text value of a DOM node (including nested nodes).

texts(dom, selector)
Similar to text/2, but iterates over all matching nodes.

to_string(dom)
Builds an (HTML/XML) string from a DOM structure.

Types

dom()
dom() :: String.t() | tuple() | [any()]

DOM tree representation, same as Floki's html_tree.

Can be created via from_string/1.

Functions

attr(dom, selector, name)
attr(dom(), String.t(), String.t()) :: nil | String.t()

Similar to text/2, but returns a chosen attribute value instead of the node's text value (or nil).

Examples

iex> "<meta name='a' content='b' />" |> DOM.from_string() |> DOM.attr("meta", "unknown")
nil

iex> "<meta name='a' content='b' />" |> DOM.from_string() |> DOM.attr("meta", "content")
"b"

iex> "<meta name='a' content='b' />" |> DOM.from_string() |> DOM.attr("meta[name=a]", "content")
"b"

attrs(dom, selector, name)
attrs(dom(), String.t(), String.t()) :: [String.t()]

Similar to attr/3, but returns a list of all matching results.

Examples

iex> "<p class='a'>b</p><p class='c' />" |> DOM.from_string() |> DOM.attrs("div", "class")
[]

iex> "<p class='a'>b</p><p class='c' />" |> DOM.from_string() |> DOM.attrs("p", "id")
[]

iex> "<p class='a'>b</p><p class='c' />" |> DOM.from_string() |> DOM.attrs("p", "class")
["a", "c"]

first(dom, list)
first(dom(), [{String.t()} | {String.t(), String.t()}]) :: nil | String.t()

Cascading query helper: applies text/2 (for {selector} tuples) or attr/3 (for {selector, name} tuples) to each query in turn, until one returns a non-nil result or all queries have been tried.

Examples

iex> DOM.first([], [])
nil

iex> DOM.first([], [{"b"}, {"i"}, {"div", "class"}])
nil

iex> "<div id='1'>abc</div>" |> DOM.from_string() |> DOM.first([{"i"}, {"div", "id"}])
"1"

iex> "<b>abc</b>" |> DOM.from_string() |> DOM.first([{"i"}, {"b"}])
"abc"
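
The cascading behaviour described above can be sketched in plain Elixir. This is a minimal illustration under stated assumptions, not the library's implementation: the module name CascadeSketch, the callbacks text_fun and attr_fun (standing in for text/2 and attr/3), and the map-backed fake_dom are all hypothetical.

```elixir
defmodule CascadeSketch do
  # Hypothetical stand-in for the cascading idea behind first/2; this is not
  # the library's code. `text_fun` and `attr_fun` are assumed callbacks that
  # play the roles of text/2 and attr/3 and return a string or nil.
  def first(dom, queries, text_fun, attr_fun) do
    # Enum.find_value/2 returns the first truthy callback result, or nil
    # when every query yields nil (or when the query list is empty).
    Enum.find_value(queries, fn
      {selector} -> text_fun.(dom, selector)
      {selector, name} -> attr_fun.(dom, selector, name)
    end)
  end
end

# Toy lookups backed by a map instead of a real DOM, purely for illustration.
fake_dom = %{{"div", "id"} => "1", {"b"} => "abc"}
text_fun = fn dom, selector -> Map.get(dom, {selector}) end
attr_fun = fn dom, selector, name -> Map.get(dom, {selector, name}) end

CascadeSketch.first(fake_dom, [{"i"}, {"div", "id"}], text_fun, attr_fun)
# → "1"
CascadeSketch.first(fake_dom, [{"i"}, {"p", "class"}], text_fun, attr_fun)
# → nil
```

Listing the most specific query first and the most general fallback last keeps the call site short when a value may live in one of several places.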

from_string(string)
from_string(String.t()) :: dom()

Create a DOM from a given (HTML/XML) string.

Examples

iex> DOM.from_string("")
[]

iex> DOM.from_string("<html></html>")
{"html", [], []}

text(dom, selector)
text(dom(), String.t()) :: nil | String.t()

Get the text value of a DOM node (including nested nodes).

If many nodes match the selector, the first one is used.

Examples

iex> "<div>abc</div>" |> DOM.from_string() |> DOM.text("p")
nil

iex> "<div>abc</div>" |> DOM.from_string() |> DOM.text("div")
"abc"
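
To make "including nested nodes" concrete, here is a rough sketch of collecting text from a Floki-style tree, in which an element is a {tag, attributes, children} tuple and text is a plain binary. The module TextSketch is hypothetical and only illustrates the idea; the library's actual handling of whitespace and separators may differ.

```elixir
defmodule TextSketch do
  # Hypothetical helper, not part of Scrape.Tools.DOM: walks a Floki-style
  # tree and concatenates every text fragment it encounters.

  # A text node is already a binary.
  def text(node) when is_binary(node), do: node

  # An element contributes the text of its children, however deeply nested.
  def text({_tag, _attrs, children}), do: text(children)

  # A list of nodes is joined without a separator.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)

  # Anything else (for example comment nodes) contributes no text.
  def text(_other), do: ""
end

TextSketch.text({"div", [], ["a", {"b", [], ["c"]}]})
# → "ac"
```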

texts(dom, selector)
texts(dom(), String.t()) :: [String.t()]

Similar to text/2, but iterates over all matching nodes.

Always returns a list, with nil values filtered out.

Examples

iex> "<div>abc</div>" |> DOM.from_string() |> DOM.texts("p")
[]

iex> "<div>abc</div>" |> DOM.from_string() |> DOM.texts("div")
["abc"]

iex> "<p>a</p><p>b</p>" |> DOM.from_string() |> DOM.texts("p")
["a", "b"]

to_string(dom)
to_string(dom()) :: String.t()

Builds an (HTML/XML) string from a DOM structure.

Examples

iex> DOM.to_string([])
""

iex> DOM.to_string({"html", [], []})
"<html></html>"