View Source HtmlQuery (HtmlQuery v0.7.1)

Some simple HTML query functions. Delegates the hard work to Floki.

data-types

Data types

All functions accept HTML in the form of a string, a Floki HTML tree, or a Floki HTML node. Others expect only a Floki HTML node or a Floki HTML tree. See HtmlQuery.html/0.

Some functions take a CSS selector, which can be a string, a keyword list, or a list. See HtmlQuery.Css.selector/0.

main-query-functions

Main query functions

The main query functions take an HTML string or some parsed HTML, and a selector.

all/2return all elements matching the selector
find/2return the first element that matches the selector
find!/2return the only element that matches the selector

parsing-functions

Parsing functions

parse/1parses an HTML fragment into a [Floki HTML tree]
parse_doc/1parses an HTML doc into a [Floki HTML tree]

extraction-functions

Extraction functions

attr/2returns the attribute value as a string
form_fields/1returns the names and values of form fields as a map
meta_tags/1returns the names and values of metadata fields
table/2returns the cells of a table as a list of lists
text/1returns the text contents as a single string

utility-functions

Utility functions

inspect_html/2prints prettified HTML with a label
normalize/1parses and re-stringifies HTML
pretty/1prettifies HTML

alias

Alias

If you use HtmlQuery a lot, you may want to alias it to the recommended shortcut "Hq":

alias HtmlQuery, as: Hq

examples

Examples

Get the value of a selected option:

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]") |> HtmlQuery.attr("value")
"a"

Get the text of a selected option, raising if there are more than one:

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"

Get the text of all the options:

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "select option") |> Enum.map(&HtmlQuery.text/1)
["apples", "bananas"]

Use a keyword list as the selector (see HtmlQuery.CSS for details on selectors):

iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"

Link to this section Summary

Types

A string or atom representing an attribute name. If an atom, underscores are converted to dashes.

A string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node.

Functions

Finds all elements in html that match selector, returning a Floki HTML tree.

Returns the value of attr from the outermost element of html. If attr is an atom, any underscores are converted to dashes.

Finds the first element in html that matches selector, returning a Floki HTML node.

Like find/2 but raises unless exactly one element is found.

Returns a map containing the form fields of form selector in html. Because it returns a map, any information about the order of form fields is lost.

Prints prettified html with a label, and then returns the original html.

Extracts all the meta tags from html, returning a list of maps.

Parses and then re-stringifies html, increasing the liklihood that two equivalent HTML strings can be considered equal.

Parses an HTML fragment using Floki.parse_fragment!/1, returning a Floki HTML tree.

Parses an HTML document using Floki.parse_document!/1, returning a Floki HTML tree.

Returns html as a prettified string, using Floki.raw_html/2 and its pretty: true option.

Returns the contents of the table as a list of lists.

Returns the text value of html.

Link to this section Types

@type attr() :: binary() | atom()

A string or atom representing an attribute name. If an atom, underscores are converted to dashes.

@type html() :: binary() | String.Chars.t() | Floki.html_tree() | Floki.html_node()

A string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node.

Link to this section Functions

Finds all elements in html that match selector, returning a Floki HTML tree.

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "option")
[
  {"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]},
  {"option", [{"value", "b"}], ["bananas"]}
]
@spec attr(html(), attr()) :: binary() | nil

Returns the value of attr from the outermost element of html. If attr is an atom, any underscores are converted to dashes.

iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"
@spec find(html(), HtmlQuery.Css.selector()) :: Floki.html_node() | nil

Finds the first element in html that matches selector, returning a Floki HTML node.

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]")
{"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]}

Like find/2 but raises unless exactly one element is found.

@spec form_fields(html()) :: %{required(atom()) => binary() | map()}

Returns a map containing the form fields of form selector in html. Because it returns a map, any information about the order of form fields is lost.

iex> html = ~s|<form> <input type="text" name="color" value="green"> <textarea name="desc">A tree</textarea> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{color: "green", desc: "A tree"}

Field names are converted to snake case atoms:

iex> html = ~s|<form> <input type="text" name="favorite-color" value="green"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{favorite_color: "green"}

If form field names are in foo[bar] format, then foo becomes a key to a nested map containing bar:

iex> html = ~s|<form> <input type="text" name="profile[name]" value="fido"> <input type="text" name="profile[age]" value="10"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{profile: %{name: "fido", age: "10"}}

If a text field has no value attribute, it will not be returned at all:

iex> html = ~s|<form> <input type="text" name="no-value"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{}

iex> html = ~s|<form> <input type="text" name="empty-value" value=""> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{empty_value: ""}

iex> html = ~s|<form> <input type="text" name="non-empty-value" value="something"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{non_empty_value: "something"}

The checked value of a radio button set is returned, or nil is returned if no value is checked:

iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: "2"}

iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: nil}

All checked values of checkboxes are returned as a list, or [] is returned if no values are checked:

iex> html = ~s|<form> <input type="checkbox" name="x" value="1" checked> <input type="checkbox" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: ["1", "2"]}

iex> html = ~s|<form> <input type="checkbox" name="x" value="1"> <input type="checkbox" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: []}
Link to this function

inspect_html(html, label \\ "INSPECTED HTML")

View Source
@spec inspect_html(html(), binary()) :: html()

Prints prettified html with a label, and then returns the original html.

@spec meta_tags(html()) :: [%{required(binary()) => binary()}]

Extracts all the meta tags from html, returning a list of maps.

iex> html = ~s|<head> <meta charset="utf-8"/> <meta http-equiv="X-UA-Compatible" content="IE=edge"/> </head>|
iex> HtmlQuery.meta_tags(html)
[%{"charset" => "utf-8"}, %{"content" => "IE=edge", "http-equiv" => "X-UA-Compatible"}]
@spec normalize(html()) :: binary()

Parses and then re-stringifies html, increasing the liklihood that two equivalent HTML strings can be considered equal.

iex> a = ~s|<p id="color">green</p>|
iex> b = ~s|<p  id = "color" >green</p>|
iex> a == b
false
iex> HtmlQuery.normalize(a) == HtmlQuery.normalize(b)
true
@spec parse(html()) :: Floki.html_tree()

Parses an HTML fragment using Floki.parse_fragment!/1, returning a Floki HTML tree.

@spec parse_doc(html()) :: Floki.html_tree()

Parses an HTML document using Floki.parse_document!/1, returning a Floki HTML tree.

@spec pretty(html()) :: binary()

Returns html as a prettified string, using Floki.raw_html/2 and its pretty: true option.

@spec table(
  html(),
  keyword()
) :: [[]]

Returns the contents of the table as a list of lists.

Options:

  • columns - a list of the indices of the columns to return; a list of column headers (as strings) to return, assuming that the first row of the table is the columns names; or :all to return all columns (which is the same as not specifying this option all all).
iex> html = "<table> <tr><th>A</th><th>B</th><th>C</th></tr> <tr><td>1</td><td>2</td><td>3</td></tr> </table>"
iex> HtmlQuery.table(html)
[
  ["A", "B", "C"],
  ["1", "2", "3"]
]
iex> HtmlQuery.table(html, columns: [0, 2])
[
  ["A", "C"],
  ["1", "3"]
]
iex> HtmlQuery.table(html, columns: [2, 0])
[
  ["C", "A"],
  ["3", "1"]
]
iex> HtmlQuery.table(html, columns: ["C", "A"])
[
  ["C", "A"],
  ["3", "1"]
]
@spec text(html()) :: binary()

Returns the text value of html.

iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"