HtmlQuery (HtmlQuery v0.7.1)
Some simple HTML query functions. Delegates the hard work to Floki.
Data types
Some functions accept HTML in the form of a string, a Floki HTML tree, or a Floki HTML node.
Others expect only a Floki HTML node or a Floki HTML tree. See HtmlQuery.html/0.
Some functions take a CSS selector, which can be a string, a keyword list, or a list.
See HtmlQuery.Css.selector/0.
Main query functions
The main query functions take an HTML string or some parsed HTML, and a selector.
all/2 | return all elements matching the selector |
find/2 | return the first element that matches the selector |
find!/2 | return the only element that matches the selector |
Parsing functions
parse/1 | parses an HTML fragment into a Floki HTML tree |
parse_doc/1 | parses an HTML doc into a Floki HTML tree |
Extraction functions
attr/2 | returns the attribute value as a string |
form_fields/1 | returns the names and values of form fields as a map |
meta_tags/1 | returns the names and values of metadata fields |
table/2 | returns the cells of a table as a list of lists |
text/1 | returns the text contents as a single string |
Utility functions
inspect_html/2 | prints prettified HTML with a label |
normalize/1 | parses and re-stringifies HTML |
pretty/1 | prettifies HTML |
Alias
If you use HtmlQuery a lot, you may want to alias it to the recommended shortcut "Hq":
alias HtmlQuery, as: Hq
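As a minimal sketch of what the alias buys you, the snippet below repeats the "selected option" query from the Examples section using the Hq shortcut. It assumes the html_query package is available in the project; nothing else is introduced.

```elixir
alias HtmlQuery, as: Hq

html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|

# Same calls as HtmlQuery.find/2 and HtmlQuery.attr/2, written through the alias.
value = html |> Hq.find("select option[selected]") |> Hq.attr("value")

IO.inspect(value)
# prints "a"
```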
Examples
Get the value of a selected option:
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]") |> HtmlQuery.attr("value")
"a"
Get the text of a selected option, raising if there is more than one:
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"
Get the text of all the options:
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "select option") |> Enum.map(&HtmlQuery.text/1)
["apples", "bananas"]
Use a keyword list as the selector (see HtmlQuery.Css for details on selectors):
iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"
Summary
Types
A string or atom representing an attribute name. If an atom, underscores are converted to dashes.
A string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node.
Functions
all/2 | Finds all elements in html that match selector, returning a Floki HTML tree. |
attr/2 | Returns the value of attr from the outermost element of html. If attr is an atom, any underscores are converted to dashes. |
find/2 | Finds the first element in html that matches selector, returning a Floki HTML node. |
find!/2 | Like find/2 but raises unless exactly one element is found. |
form_fields/1 | Returns a map containing the form fields of form selector in html. Because it returns a map, any information about the order of form fields is lost. |
inspect_html/2 | Prints prettified html with a label, and then returns the original html. |
meta_tags/1 | Extracts all the meta tags from html, returning a list of maps. |
normalize/1 | Parses and then re-stringifies html, increasing the likelihood that two equivalent HTML strings can be considered equal. |
parse/1 | Parses an HTML fragment using Floki.parse_fragment!/1, returning a Floki HTML tree. |
parse_doc/1 | Parses an HTML document using Floki.parse_document!/1, returning a Floki HTML tree. |
pretty/1 | Returns html as a prettified string, using Floki.raw_html/2 and its pretty: true option. |
table/2 | Returns the contents of the table as a list of lists. |
text/1 | Returns the text value of html. |
Types
A string or atom representing an attribute name. If an atom, underscores are converted to dashes.
@type html() :: binary() | String.Chars.t() | Floki.html_tree() | Floki.html_node()
A string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node.
Functions
@spec all(html(), HtmlQuery.Css.selector()) :: Floki.html_tree()
Finds all elements in html that match selector, returning a Floki HTML tree.
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "option")
[
{"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]},
{"option", [{"value", "b"}], ["bananas"]}
]
attr/2
Returns the value of attr from the outermost element of html.
If attr is an atom, any underscores are converted to dashes.
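The atom form described above can be sketched as follows. This is an illustration based only on the documented underscore-to-dash rule, reusing the logout link markup from this page; the variable names are made up for the sketch.

```elixir
html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|

link = HtmlQuery.find!(html, test_role: "logout-link")

# :test_role is looked up as the "test-role" attribute.
role = HtmlQuery.attr(link, :test_role)

# An atom without underscores behaves like the equivalent string.
href = HtmlQuery.attr(link, :href)

IO.inspect({role, href})
# prints {"logout-link", "/logout"}
```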
iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"
@spec find(html(), HtmlQuery.Css.selector()) :: Floki.html_node() | nil
Finds the first element in html that matches selector, returning a Floki HTML node.
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]")
{"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]}
@spec find!(html(), HtmlQuery.Css.selector()) :: Floki.html_node()
Like find/2 but raises unless exactly one element is found.
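A rough sketch of the difference from find/2: the snippet below calls find!/2 with selectors that match one, several, and zero elements. The try_find helper is hypothetical and exists only for this illustration; the exception type and message are not specified on this page, so the sketch rescues any exception rather than naming one.

```elixir
html = ~s|<ul> <li>one</li> <li>two</li> </ul>|

# Hypothetical helper for this sketch: runs find!/2 and reports whether it raised.
try_find = fn selector ->
  try do
    {:ok, HtmlQuery.find!(html, selector)}
  rescue
    error -> {:raised, Exception.message(error)}
  end
end

one = try_find.("ul")      # exactly one match, so a Floki HTML node is returned
many = try_find.("li")     # two matches, so find!/2 raises
none = try_find.("table")  # no matches, so find!/2 raises
```

Use find/2 instead when zero matches is an expected outcome, since its spec allows a nil return.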
form_fields/1
Returns a map containing the form fields of form selector in html. Because it returns a map, any information about the order of form fields is lost.
iex> html = ~s|<form> <input type="text" name="color" value="green"> <textarea name="desc">A tree</textarea> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{color: "green", desc: "A tree"}
Field names are converted to snake case atoms:
iex> html = ~s|<form> <input type="text" name="favorite-color" value="green"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{favorite_color: "green"}
If form field names are in foo[bar] format, then foo becomes a key to a nested map containing bar:
iex> html = ~s|<form> <input type="text" name="profile[name]" value="fido"> <input type="text" name="profile[age]" value="10"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{profile: %{name: "fido", age: "10"}}
If a text field has no value attribute, it will not be returned at all:
iex> html = ~s|<form> <input type="text" name="no-value"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{}
iex> html = ~s|<form> <input type="text" name="empty-value" value=""> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{empty_value: ""}
iex> html = ~s|<form> <input type="text" name="non-empty-value" value="something"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{non_empty_value: "something"}
The checked value of a radio button set is returned, or nil is returned if no value is checked:
iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: "2"}
iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: nil}
All checked values of checkboxes are returned as a list, or [] is returned if no values are checked:
iex> html = ~s|<form> <input type="checkbox" name="x" value="1" checked> <input type="checkbox" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: ["1", "2"]}
iex> html = ~s|<form> <input type="checkbox" name="x" value="1"> <input type="checkbox" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: []}
inspect_html/2
Prints prettified html with a label, and then returns the original html.
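Because the original html is returned, the call can sit in the middle of a pipeline while debugging. The sketch below assumes the argument order is html first and label second, and that the label may be a plain string; neither detail is spelled out on this page.

```elixir
html = ~s|<div> <p id="color">green</p> </div>|

# Assumption: inspect_html(html, label) with a string label.
# The prettified HTML is printed, then the untouched html flows on to find/2.
color =
  html
  |> HtmlQuery.inspect_html("before find")
  |> HtmlQuery.find("p")
  |> HtmlQuery.text()
```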
meta_tags/1
Extracts all the meta tags from html, returning a list of maps.
iex> html = ~s|<head> <meta charset="utf-8"/> <meta http-equiv="X-UA-Compatible" content="IE=edge"/> </head>|
iex> HtmlQuery.meta_tags(html)
[%{"charset" => "utf-8"}, %{"content" => "IE=edge", "http-equiv" => "X-UA-Compatible"}]
normalize/1
Parses and then re-stringifies html, increasing the likelihood that two equivalent HTML strings can be considered equal.
iex> a = ~s|<p id="color">green</p>|
iex> b = ~s|<p id = "color" >green</p>|
iex> a == b
false
iex> HtmlQuery.normalize(a) == HtmlQuery.normalize(b)
true
@spec parse(html()) :: Floki.html_tree()
Parses an HTML fragment using Floki.parse_fragment!/1, returning a Floki HTML tree.
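As a small illustration, parsing a one-element fragment gives a one-node tree, and that tree can be passed to the query functions in place of the original string. The markup is borrowed from the normalize example on this page.

```elixir
tree = HtmlQuery.parse(~s|<p id="color">green</p>|)

IO.inspect(tree)
# prints [{"p", [{"id", "color"}], ["green"]}]

# A parsed tree is accepted wherever html() is expected.
id = tree |> HtmlQuery.find("p") |> HtmlQuery.attr("id")
```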
@spec parse_doc(html()) :: Floki.html_tree()
Parses an HTML document using Floki.parse_document!/1, returning a Floki HTML tree.
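A minimal sketch of parsing a whole document and then querying it. The document markup is invented for the example, and the exact shape of the resulting tree is left unstated because it depends on the parser Floki is configured with.

```elixir
html = ~s|<html> <head> <title>Fruit</title> </head> <body> <p>apples</p> </body> </html>|

doc = HtmlQuery.parse_doc(html)

# Query the parsed document just like an HTML string.
title = doc |> HtmlQuery.find("title") |> HtmlQuery.text()
body_text = doc |> HtmlQuery.find("p") |> HtmlQuery.text()
```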
pretty/1
Returns html as a prettified string, using Floki.raw_html/2 and its pretty: true option.
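One way to try this out is to prettify a compact string and print the result. The exact indentation and line breaks come from Floki, so no expected output is shown here.

```elixir
html = ~s|<div><p id="color">green</p></div>|

pretty = HtmlQuery.pretty(html)

# The result is a string, so it can be printed or written to a file.
IO.puts(pretty)
```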
table/2
Returns the contents of the table as a list of lists.
Options:
columns - a list of the indices of the columns to return; a list of column headers (as strings) to return, assuming that the first row of the table contains the column names; or :all to return all columns (which is the same as not specifying this option at all).
iex> html = "<table> <tr><th>A</th><th>B</th><th>C</th></tr> <tr><td>1</td><td>2</td><td>3</td></tr> </table>"
iex> HtmlQuery.table(html)
[
["A", "B", "C"],
["1", "2", "3"]
]
iex> HtmlQuery.table(html, columns: [0, 2])
[
["A", "C"],
["1", "3"]
]
iex> HtmlQuery.table(html, columns: [2, 0])
[
["C", "A"],
["3", "1"]
]
iex> HtmlQuery.table(html, columns: ["C", "A"])
[
["C", "A"],
["3", "1"]
]
text/1
Returns the text value of html.
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"