HtmlQuery (HtmlQuery v1.2.2)

Some simple HTML query functions. Delegates the hard work to Floki.

Data types

Most functions accept HTML in the form of a string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node. The rest expect only a Floki HTML node or a Floki HTML tree. See HtmlQuery.html/0.

Some functions take a CSS selector, which can be a string, a keyword list, or a list. See HtmlQuery.Css.selector/0.

Main query functions

The main query functions take an HTML string or some parsed HTML, and a selector.

  • all/2 - return all elements matching the selector
  • find/2 - return the first element that matches the selector
  • find!/2 - return the only element that matches the selector

Extraction functions

  • attr/2 - returns the attribute value as a string
  • form_fields/1 - returns the names and values of form fields as a map
  • meta_tags/1 - returns the names and values of metadata fields
  • table/2 - returns the cells of a table as a list of lists
  • text/1 - returns the text contents as a single string

Parsing functions

  • parse/1 - parses an HTML fragment into a Floki HTML tree
  • parse_doc/1 - parses an HTML doc into a Floki HTML tree

Utility functions

  • inspect_html/2 - prints prettified HTML with a label
  • normalize/1 - parses and re-stringifies HTML
  • pretty/1 - prettifies HTML
  • reject/2 - removes nodes that match the selector

Alias

If you use HtmlQuery a lot, you may want to alias it to the recommended shortcut "Hq":

alias HtmlQuery, as: Hq
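As a small sketch of how the alias reads in practice, the snippet below repeats a query with the shortened module name. The markup is only illustrative.

```elixir
alias HtmlQuery, as: Hq

html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|

# The same calls as HtmlQuery.find/2 and HtmlQuery.attr/2, written with the alias.
html |> Hq.find("select option[selected]") |> Hq.attr("value")
```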

Examples

Get the value of a selected option:

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]") |> HtmlQuery.attr("value")
"a"

Get the text of the selected option, raising unless exactly one option is selected:

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"

Get the text of all the options:

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "select option") |> Enum.map(&HtmlQuery.text/1)
["apples", "bananas"]

Use a keyword list as the selector (see HtmlQuery.Css for details on selectors):

iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"

Summary

Types

attr() - A string or atom representing an attribute name. If an atom, underscores are converted to dashes.

html() - A string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node.

Functions

all/2 - Finds all elements in html that match selector, returning a Floki HTML tree.

attr/2 - Returns the value of attr from the outermost element of html. If attr is an atom, any underscores are converted to dashes.

find/2 - Finds the first element in html that matches selector, returning a Floki HTML node.

find!/2 - Like find/2, but raises unless exactly one element is found.

form_fields/1 - Returns a map containing the form fields of the form in html. Because it returns a map, any information about the order of form fields is lost.

inspect_html/2 - Prints prettified html with a label, and then returns the original html.

meta_tags/1 - Extracts all the meta tags from html, returning a list of maps.

normalize/1 - Parses and then re-stringifies html, increasing the likelihood that two equivalent HTML strings can be considered equal.

parse/1 - Parses an HTML fragment using Floki.parse_fragment!/1, returning a Floki HTML tree.

parse_doc/1 - Parses an HTML document using Floki.parse_document!/1, returning a Floki HTML tree.

pretty/1 - Returns html as a prettified string (delegates to Floki.raw_html/2 with the pretty: true option).

reject/2 - Returns html after removing all nodes that match selector (delegates to Floki.filter_out/2).

table/2 - Returns the contents of the table as a list of lists.

text/1 - Returns the text value of html.

Types

@type attr() :: binary() | atom()

A string or atom representing an attribute name. If an atom, underscores are converted to dashes.

@type html() :: binary() | String.Chars.t() | Floki.html_tree() | Floki.html_node()

A string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node.

Functions

all/2

Finds all elements in html that match selector, returning a Floki HTML tree.

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "option")
[
  {"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]},
  {"option", [{"value", "b"}], ["bananas"]}
]

attr/2

@spec attr(html(), attr()) :: binary() | nil

Returns the value of attr from the outermost element of html. If attr is an atom, any underscores are converted to dashes.
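As a sketch of the atom form, the snippet below reads the test-role attribute by passing :test_role. The markup is illustrative, and the equivalence noted in the comment follows from the conversion rule stated above rather than from a recorded session.

```elixir
html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
link = HtmlQuery.find!(html, "a")

# :test_role is converted to "test-role" before the lookup, so this
# should be equivalent to HtmlQuery.attr(link, "test-role").
HtmlQuery.attr(link, :test_role)
```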

iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"

find/2

@spec find(html(), HtmlQuery.Css.selector()) :: Floki.html_node() | nil

Finds the first element in html that matches selector, returning a Floki HTML node.

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]")
{"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]}

find!/2

Like find/2, but raises unless exactly one element is found.
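The difference from find/2 can be sketched as follows. The exception type is not stated here, so this example rescues any exception instead of naming one; the markup is illustrative.

```elixir
html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|

# Exactly one element matches, so find!/2 returns that node.
HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()

# Two elements match, so find!/2 raises instead of returning the first one.
try do
  HtmlQuery.find!(html, "select option")
rescue
  error -> IO.puts("find!/2 raised: " <> Exception.message(error))
end
```

In tests, the raising variant tends to be the safer default, because an unexpectedly ambiguous selector fails loudly instead of silently picking the first match.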

form_fields/1

@spec form_fields(html()) :: %{required(atom()) => binary() | map()}

Returns a map containing the form fields of the form in html. Because it returns a map, any information about the order of form fields is lost.

iex> html = ~s|<form> <input type="text" name="color" value="green"> <textarea name="desc">A tree</textarea> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{color: "green", desc: "A tree"}

Field names are converted to snake case atoms:

iex> html = ~s|<form> <input type="text" name="favorite-color" value="green"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{favorite_color: "green"}

If form field names are in foo[bar] format, then foo becomes a key to a nested map containing bar:

iex> html = ~s|<form> <input type="text" name="profile[name]" value="fido"> <input type="text" name="profile[age]" value="10"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{profile: %{name: "fido", age: "10"}}

If a text field has no value attribute, it will not be returned at all:

iex> html = ~s|<form> <input type="text" name="no-value"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{}

iex> html = ~s|<form> <input type="text" name="empty-value" value=""> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{empty_value: ""}

iex> html = ~s|<form> <input type="text" name="non-empty-value" value="something"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{non_empty_value: "something"}

The checked value of a radio button set is returned, or nil is returned if no value is checked:

iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: "2"}

iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: nil}

All checked values of checkboxes are returned as a list, or [] is returned if no values are checked:

iex> html = ~s|<form> <input type="checkbox" name="x" value="1" checked> <input type="checkbox" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: ["1", "2"]}

iex> html = ~s|<form> <input type="checkbox" name="x" value="1"> <input type="checkbox" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: []}

inspect_html(html, label \\ "INSPECTED HTML")

@spec inspect_html(html(), binary()) :: html()

Prints prettified html with a label, and then returns the original html.
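Because the original html is returned, inspect_html/2 can sit in the middle of a pipeline without changing its result. A minimal sketch, where the label text and markup are made up for illustration:

```elixir
html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|

html
|> HtmlQuery.find!(test_role: "logout-link")
# Prints the selected node under the given label, then passes it along unchanged.
|> HtmlQuery.inspect_html("LOGOUT LINK")
|> HtmlQuery.attr("href")
```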

meta_tags/1

@spec meta_tags(html()) :: [%{required(binary()) => binary()}]

Extracts all the meta tags from html, returning a list of maps.

iex> html = ~s|<head> <meta charset="utf-8"/> <meta http-equiv="X-UA-Compatible" content="IE=edge"/> </head>|
iex> HtmlQuery.meta_tags(html)
[%{"charset" => "utf-8"}, %{"content" => "IE=edge", "http-equiv" => "X-UA-Compatible"}]

normalize/1

@spec normalize(html()) :: binary()

Parses and then re-stringifies html, increasing the likelihood that two equivalent HTML strings can be considered equal.

iex> a = ~s|<p id="color">green</p>|
iex> b = ~s|<p  id = "color" >green</p>|
iex> a == b
false
iex> HtmlQuery.normalize(a) == HtmlQuery.normalize(b)
true

parse/1

@spec parse(html()) :: Floki.html_tree()

Parses an HTML fragment using Floki.parse_fragment!/1, returning a Floki HTML tree.
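Since the query functions also accept a Floki HTML tree, a fragment can be parsed once and then queried several times. A minimal sketch with illustrative markup:

```elixir
html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|

# Parse once, then run several queries against the resulting tree.
tree = HtmlQuery.parse(html)

tree |> HtmlQuery.all("option") |> Enum.map(&HtmlQuery.text/1)
tree |> HtmlQuery.find("option[selected]") |> HtmlQuery.attr("value")
```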

parse_doc/1

@spec parse_doc(html()) :: Floki.html_tree()

Parses an HTML document using Floki.parse_document!/1, returning a Floki HTML tree.
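For a complete page rather than a fragment, usage might look like the sketch below. The document content and the id value are invented for illustration.

```elixir
doc_html = """
<!DOCTYPE html>
<html>
  <head><title>Fruit</title></head>
  <body><p id="favorite">apples</p></body>
</html>
"""

# Parse the whole document, then query the tree like any other html value.
doc = HtmlQuery.parse_doc(doc_html)

HtmlQuery.find!(doc, "title") |> HtmlQuery.text()
HtmlQuery.find!(doc, id: "favorite") |> HtmlQuery.text()
```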

pretty/1

@spec pretty(html()) :: binary()

Returns html as a prettified string (delegates to Floki.raw_html/2 with the pretty: true option).
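A minimal usage sketch with illustrative markup. The exact indentation of the result is decided by Floki, so no output is shown here.

```elixir
html = ~s|<div><span id="name">Alice</span><span id="password">topaz</span></div>|

# pretty/1 returns a binary, so it can be passed straight to IO.puts/1.
html |> HtmlQuery.pretty() |> IO.puts()
```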

reject/2

@spec reject(html(), HtmlQuery.Css.selector()) :: html()

Returns html after removing all nodes that match selector (delegates to Floki.filter_out/2).

iex> html = ~s|<div> <span id="name">Alice</span> <span id="password">topaz</span> </div>|
iex> HtmlQuery.reject(html, id: "password") |> HtmlQuery.normalize()
~s|<div><span id="name">Alice</span></div>|

table/2

@spec table(html(), keyword()) :: [[]]

Returns the contents of the table as a list of lists.

Options:

  • as - if :lists (the default), returns the table as a list of lists; if :maps, returns the table as a list of maps.
  • columns - a list of the indices of the columns to return; a list of column headers (as strings) to return, assuming that the first row of the table contains the column names; or :all to return all columns (which is the same as not specifying this option at all).
  • headers - if true (the default), returns the list of headers along with the rows. Ignored if the as option is :maps.

iex> html = "<table> <tr><th>A</th><th>B</th><th>C</th></tr> <tr><td>1</td><td>2</td><td>3</td></tr> </table>"
iex> HtmlQuery.table(html)
[
  ["A", "B", "C"],
  ["1", "2", "3"]
]
iex> HtmlQuery.table(html, as: :maps)
[
  %{"A" => "1", "B" => "2", "C" => "3"}
]
iex> HtmlQuery.table(html, columns: [0, 2])
[
  ["A", "C"],
  ["1", "3"]
]
iex> HtmlQuery.table(html, columns: [2, 0])
[
  ["C", "A"],
  ["3", "1"]
]
iex> HtmlQuery.table(html, columns: ["C", "A"])
[
  ["C", "A"],
  ["3", "1"]
]
iex> HtmlQuery.table(html, columns: ["C", "A"], headers: false)
[
  ["3", "1"]
]

text/1

@spec text(html()) :: binary()

Returns the text value of html.

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"