scrape v3.1.0 Scrape.Tools.DOM
Utility module for selecting/extracting data from a "DOM" (HTML/XML tree-like structure). Can find text values and attribute values, inspired by jQuery and implemented with Floki.
Summary
Functions
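attr(dom, selector, name)
Similar to text/2, but returns the value of a chosen attribute (or nil) instead of the node's text.
attrs(dom, selector, name)
Similar to attr/3, but returns a list of all matching results.
first(dom, list)
Cascading query helper that applies either text/2 or attr/3 until a query returns a non-nil result or all queries have been tried.
from_string(string)
Create a DOM from a given (HTML/XML) string.
text(dom, selector)
Get the text value of a DOM node (including nested nodes).
texts(dom, selector)
Similar to text/2, but iterates over all matching nodes.
to_string(dom)
Builds an (HTML/XML) string from a DOM structure.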
Types
dom()
DOM tree representation, same as Floki's html_tree.
Can be created via from_string/1.
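To make the tree shape concrete: Floki represents an element as a `{tag, attributes, children}` tuple, where attributes are `{name, value}` pairs and children are nested nodes or text strings. The sketch below reads an attribute straight from such a tuple. The module name `DomShapeSketch` and the hand-written node are illustrative only and are not part of this library.

```elixir
defmodule DomShapeSketch do
  # Hypothetical helper: looks up an attribute on a single element node
  # of the form {tag, attributes, children}. Returns nil if it is absent.
  def attribute({_tag, attrs, _children}, name) do
    case List.keyfind(attrs, name, 0) do
      {^name, value} -> value
      nil -> nil
    end
  end
end

# Hand-written node, equivalent in shape to <p class='a'>b</p>
node = {"p", [{"class", "a"}], ["b"]}

DomShapeSketch.attribute(node, "class")
DomShapeSketch.attribute(node, "id")
```

In practice the functions documented below are preferable to manual pattern matching, since they also handle selectors and lists of nodes.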
Functions
attr(dom, selector, name)
Similar to text/2, but returns the value of a chosen attribute (or nil) instead of the node's text.
Examples
iex> "<meta name='a' content='b' />" |> DOM.from_string |> DOM.attr("meta", "unknown")
nil
iex> "<meta name='a' content='b' />" |> DOM.from_string |> DOM.attr("meta", "content")
"b"
iex> "<meta name='a' content='b' />" |> DOM.from_string |> DOM.attr("meta[name=a]", "content")
"b"
attrs(dom, selector, name)
Similar to attr/3, but returns a list of all matching results.
Examples
iex> "<p class='a'>b</p><p class='c' />" |> DOM.from_string() |> DOM.attrs("div", "class")
[]
iex> "<p class='a'>b</p><p class='c' />" |> DOM.from_string() |> DOM.attrs("p", "id")
[]
iex> "<p class='a'>b</p><p class='c' />" |> DOM.from_string() |> DOM.attrs("p", "class")
["a", "c"]
first(dom, list)
Cascading query helper: applies either text/2 or attr/3 for each query in turn
until one returns a non-nil result or all queries have been tried.
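The cascade can be pictured roughly as follows. This is a minimal sketch, not the library's actual implementation: the module name `CascadeSketch` is hypothetical, and it assumes (going by the examples in this section) that a one-element tuple `{selector}` is a text query and a two-element tuple `{selector, name}` is an attribute query. It also assumes the scrape package is available as a dependency.

```elixir
defmodule CascadeSketch do
  alias Scrape.Tools.DOM

  # Tries each query in order and returns the first non-nil result.
  # Returns nil when the list is empty or no query matches.
  def first(dom, queries) do
    Enum.find_value(queries, fn
      # Assumed convention: {selector} asks for the node's text
      {selector} -> DOM.text(dom, selector)
      # Assumed convention: {selector, name} asks for an attribute value
      {selector, name} -> DOM.attr(dom, selector, name)
    end)
  end
end
```

`Enum.find_value/2` stops at the first truthy return value, which is what makes the queries behave as an ordered list of fallbacks.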
Examples
iex> DOM.first([], [])
nil
iex> DOM.first([], [{"b"}, {"i"}, {"div", "class"}])
nil
iex> "<div id='1'>abc</div>" |> DOM.from_string() |> DOM.first([{"i"}, {"div", "id"}])
"1"
iex> "<b>abc</b>" |> DOM.from_string() |> DOM.first([{"i"}, {"b"}])
"abc"
from_string(string)
Create a DOM from a given (HTML/XML) string.
Examples
iex> DOM.from_string("")
[]
iex> DOM.from_string("<html></html>")
{"html", [], []}
text(dom, selector)
Get the text value of a DOM node (including nested nodes).
If many nodes match the selector, the first one is used.
Examples
iex> "<div>abc</div>" |> DOM.from_string() |> DOM.text("p")
nil
iex> "<div>abc</div>" |> DOM.from_string() |> DOM.text("div")
"abc"
texts(dom, selector)
Similar to text/2, but iterates over all matching nodes.
Always returns a list, with nil values filtered out.
Examples
iex> "<div>abc</div>" |> DOM.from_string() |> DOM.texts("p")
[]
iex> "<div>abc</div>" |> DOM.from_string() |> DOM.texts("div")
["abc"]
iex> "<p>a</p><p>b</p>" |> DOM.from_string() |> DOM.texts("p")
["a", "b"]
to_string(dom)
Builds a (HTML/XML) string from a DOM structure.
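Together with from_string/1, this allows a simple parse-and-serialize round trip. The snippet below only combines the two documented examples and assumes the scrape package is available; for arbitrary input the output may not be byte-identical to the original string (for instance, quoting or whitespace may be normalized).

```elixir
alias Scrape.Tools.DOM

html = "<html></html>"

# Parse into a DOM tree; per the from_string/1 example this is {"html", [], []}
dom = DOM.from_string(html)

# Serialize back; per the to_string/1 example this is "<html></html>"
DOM.to_string(dom)
```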
Examples
iex> DOM.to_string([])
""
iex> DOM.to_string({"html", [], []})
"<html></html>"