HtmlQuery (HtmlQuery v2.0.0)
A concise HTML query API. HTML parsing is handled by Floki.
We created a related library called XmlQuery which has the same API but is used for querying XML. You can read more about them in Querying HTML and XML in Elixir with HtmlQuery and XmlQuery.
Data types
All functions can accept HTML in the form of a string, a Floki HTML tree, a Floki HTML node, or anything that implements the String.Chars protocol. See HtmlQuery.html/0.
Some functions take a CSS selector, which can be a string, a keyword list, or a list. See HtmlQuery.Css.selector/0.
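As a rough sketch of how these input forms might be mixed, the snippet below passes the HTML both as a raw string and as a pre-parsed Floki tree, and the selector both as a CSS string and as a keyword list. The `Mix.install` line and its version requirement are assumptions made so the snippet can run as a standalone script; the markup is borrowed from the examples on this page.

```elixir
# Assumption: the package is published on Hex as :html_query; adjust the
# version requirement to match your project.
Mix.install([{:html_query, "~> 2.0"}])

html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|

# HTML given as a string, selector given as a CSS string.
from_string =
  html
  |> HtmlQuery.find!(~s|a[test-role="logout-link"]|)
  |> HtmlQuery.attr("href")

# HTML given as an already-parsed Floki tree, selector given as a keyword list.
from_tree =
  html
  |> HtmlQuery.parse()
  |> HtmlQuery.find!(test_role: "logout-link")
  |> HtmlQuery.attr("href")

IO.inspect({from_string, from_tree})
```

Parsing once and reusing the tree may be worthwhile when the same HTML is queried several times.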
Query functions
all/2 | returns all elements matching the selector |
find/2 | returns the first element that matches the selector |
find!/2 | returns the only element that matches the selector, or raises |
Extraction functions
attr/2 | returns the attribute value as a string |
form_fields/1 | returns the names and values of form fields as a map |
meta_tags/1 | returns the names and values of metadata fields |
table/2 | returns the cells of a table as a list of lists or maps |
text/1 | returns the text contents as a single string |
Parsing functions
parse/1 | parses an HTML fragment into a Floki HTML tree |
parse_doc/1 | parses an HTML doc into a Floki HTML tree |
Utility functions
inspect_html/2 | prints prettified HTML with a label |
normalize/1 | parses and re-stringifies HTML |
pretty/1 | prettifies HTML |
reject/2 | removes nodes that match the selector |
Alias
If you use HtmlQuery a lot, you may want to alias it to the recommended shortcut "Hq":
alias HtmlQuery, as: Hq
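With the alias in place, a query pipeline might read as follows. This is a minimal sketch; the `Mix.install` line is an assumption for running it as a standalone script, and the markup is the same `<select>` used in the examples on this page.

```elixir
# Assumption: the package is published on Hex as :html_query.
Mix.install([{:html_query, "~> 2.0"}])

alias HtmlQuery, as: Hq

html =
  ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|

# Same calls as the unaliased examples, just shorter to type.
selected = html |> Hq.find!("select option[selected]") |> Hq.text()

IO.inspect(selected)
```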
Examples
Get the value of a selected option:
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]") |> HtmlQuery.attr("value")
"a"
Get the text of a selected option, raising if there is more than one:
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"
Get the text of all the options:
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "select option") |> Enum.map(&HtmlQuery.text/1)
["apples", "bananas"]
Use a keyword list as the selector (see HtmlQuery.Css for details on selectors):
iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"
Types
A string or atom representing an attribute name. If an atom, underscores are converted to dashes.
@type html() :: binary() | String.Chars.t() | Floki.html_tree() | Floki.html_node()
A string, a struct that implements the String.Chars protocol, a Floki HTML tree, or a Floki HTML node.
Functions
all/2
@spec all(html(), HtmlQuery.Css.selector()) :: Floki.html_tree()
Finds all elements in html that match selector, returning a Floki HTML tree.
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.all(html, "option")
[
{"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]},
{"option", [{"value", "b"}], ["bananas"]}
]
attr/2
Returns the value of attr from the outermost element of html.
If attr is an atom, any underscores are converted to dashes.
iex> html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
iex> HtmlQuery.find!(html, test_role: "logout-link") |> HtmlQuery.attr("href")
"/logout"
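The underscore-to-dash conversion described for attr/2 can be sketched by reading the same attribute once with a string name and once with an atom name. The `Mix.install` line is an assumption for running the snippet as a standalone script.

```elixir
# Assumption: the package is published on Hex as :html_query.
Mix.install([{:html_query, "~> 2.0"}])

html = ~s|<div> <a href="/logout" test-role="logout-link">logout</a> </div>|
link = HtmlQuery.find!(html, "a")

# A string attribute name is used as written.
by_string = HtmlQuery.attr(link, "test-role")

# An atom attribute name has its underscores converted to dashes,
# so :test_role reads the "test-role" attribute.
by_atom = HtmlQuery.attr(link, :test_role)

IO.inspect({by_string, by_atom})
```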
find/2
@spec find(html(), HtmlQuery.Css.selector()) :: Floki.html_node() | nil
Finds the first element in html that matches selector, returning a Floki HTML node.
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find(html, "select option[selected]")
{"option", [{"value", "a"}, {"selected", "selected"}], ["apples"]}
find!/2
@spec find!(html(), HtmlQuery.Css.selector()) :: Floki.html_node()
Like find/2 but raises unless exactly one element is found.
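To illustrate the raising behavior, the sketch below runs find!/2 with a selector that matches two elements and rescues whatever exception comes back. The exception type and message are not specified on this page, so the code deliberately avoids depending on either; the `Mix.install` line is an assumption for standalone use.

```elixir
# Assumption: the package is published on Hex as :html_query.
Mix.install([{:html_query, "~> 2.0"}])

html =
  ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|

# "select option" matches both options, so find!/2 is expected to raise.
result =
  try do
    HtmlQuery.find!(html, "select option")
  rescue
    # Rescue any exception; the concrete exception module is not assumed.
    error -> {:raised, Exception.message(error)}
  end

IO.inspect(result)
```

When more than one match is acceptable, find/2 (first match) or all/2 (every match) may be the better fit.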
form_fields/1
Returns a map containing the fields of the form in html. Because it returns a map, any information about the order of form fields is lost.
iex> html = ~s|<form> <input type="text" name="color" value="green"> <textarea name="desc">A tree</textarea> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{color: "green", desc: "A tree"}
Field names are converted to snake case atoms:
iex> html = ~s|<form> <input type="text" name="favorite-color" value="green"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{favorite_color: "green"}
If form field names are in foo[bar] format, then foo becomes a key to a nested map containing bar:
iex> html = ~s|<form> <input type="text" name="profile[name]" value="fido"> <input type="text" name="profile[age]" value="10"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{profile: %{name: "fido", age: "10"}}
If a text field has no value attribute, it will not be returned at all:
iex> html = ~s|<form> <input type="text" name="no-value"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{}
iex> html = ~s|<form> <input type="text" name="empty-value" value=""> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{empty_value: ""}
iex> html = ~s|<form> <input type="text" name="non-empty-value" value="something"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{non_empty_value: "something"}
The checked value of a radio button set is returned, or nil is returned if no value is checked:
iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: "2"}
iex> html = ~s|<form> <input type="radio" name="x" value="1"> <input type="radio" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: nil}
For checkboxes, the name attribute of the input determines whether a single value or a list is returned. A name that ends in [] allows a browser to send multiple values, so form_fields/1 returns a list of the checked values (an empty list if none are checked). A name that does not end in [] evaluates to a single value: the last checked value, or nil if none are checked:
iex> html = ~s|<form> <input type="checkbox" name="x" value="1" checked> <input type="checkbox" name="x" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: "2"}
iex> html = ~s|<form> <input type="checkbox" name="x" value="1"> <input type="checkbox" name="x" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: nil}
iex> html = ~s|<form> <input type="checkbox" name="x[]" value="1" checked> <input type="checkbox" name="x[]" value="2" checked> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: ["1", "2"]}
iex> html = ~s|<form> <input type="checkbox" name="x[]" value="1"> <input type="checkbox" name="x[]" value="2"> </form>|
iex> html |> HtmlQuery.find("form") |> HtmlQuery.form_fields()
%{x: []}
inspect_html/2
Prints prettified html with a label, and then returns the original html.
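Because the original html is returned, inspect_html/2 can presumably be dropped into the middle of a pipeline, much like IO.inspect/2. The sketch below assumes the label is the second argument (the description and arity suggest this, but the signature is not shown here) and that the package can be installed with the `Mix.install` line given.

```elixir
# Assumption: the package is published on Hex as :html_query.
Mix.install([{:html_query, "~> 2.0"}])

html = ~s|<div> <span id="name">Alice</span> <span id="password">topaz</span> </div>|

# Prints the prettified HTML with the label, then hands the input back.
# Assumption: the argument order is (html, label).
returned = HtmlQuery.inspect_html(html, "before query")

# The returned value can be queried as if inspect_html/2 were not there.
name = returned |> HtmlQuery.find!(id: "name") |> HtmlQuery.text()

IO.inspect(name)
```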
meta_tags/1
Extracts all the meta tags from html, returning a list of maps.
iex> html = ~s|<head> <meta charset="utf-8"/> <meta http-equiv="X-UA-Compatible" content="IE=edge"/> </head>|
iex> HtmlQuery.meta_tags(html)
[%{"charset" => "utf-8"}, %{"content" => "IE=edge", "http-equiv" => "X-UA-Compatible"}]
normalize/1
Parses and then re-stringifies html, increasing the likelihood that two equivalent HTML strings can be considered equal.
iex> a = ~s|<p id="color">green</p>|
iex> b = ~s|<p id = "color" >green</p>|
iex> a == b
false
iex> HtmlQuery.normalize(a) == HtmlQuery.normalize(b)
true
parse/1
@spec parse(html()) :: Floki.html_tree()
Parses an HTML fragment using Floki.parse_fragment!/1, returning a Floki HTML tree.
parse_doc/1
@spec parse_doc(html()) :: Floki.html_tree()
Parses an HTML document using Floki.parse_document!/1, returning a Floki HTML tree.
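As a small sketch of the two parsing functions side by side, the snippet below parses a fragment with parse/1 and a complete document with parse_doc/1, then queries each resulting tree. The sample markup and the `Mix.install` line are assumptions for illustration.

```elixir
# Assumption: the package is published on Hex as :html_query.
Mix.install([{:html_query, "~> 2.0"}])

# A fragment: no <html> or <body> wrapper.
fragment = HtmlQuery.parse(~s|<p>hi</p>|)

# A complete document.
doc =
  HtmlQuery.parse_doc(
    ~s|<html><head><title>Greeting</title></head><body><p>hi</p></body></html>|
  )

# Both results are Floki HTML trees, so they can be passed to the query functions.
paragraph = fragment |> HtmlQuery.find!("p") |> HtmlQuery.text()
title = doc |> HtmlQuery.find!("title") |> HtmlQuery.text()

IO.inspect({paragraph, title})
```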
pretty/1
Returns html as a prettified string (delegates to Floki.raw_html/2 with the pretty: true option).
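A minimal usage sketch for pretty/1 follows. The exact whitespace of the result depends on Floki's pretty printer, so no expected output is shown; the `Mix.install` line is an assumption for standalone use.

```elixir
# Assumption: the package is published on Hex as :html_query.
Mix.install([{:html_query, "~> 2.0"}])

html = ~s|<div><span id="name">Alice</span></div>|

# The result is a string, typically with nested elements on separate lines.
pretty = HtmlQuery.pretty(html)

IO.puts(pretty)
```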
reject/2
@spec reject(html(), HtmlQuery.Css.selector()) :: html()
Returns html after removing all nodes that match selector (delegates to Floki.filter_out/2).
iex> html = ~s|<div> <span id="name">Alice</span> <span id="password">topaz</span> </div>|
iex> HtmlQuery.reject(html, id: "password") |> HtmlQuery.normalize()
~s|<div><span id="name">Alice</span></div>|
table/2
Returns the contents of the table as a list of lists (the default) or as a list of maps.
Options:
as - if :lists (the default), returns the table as a list of lists; if :maps, returns the table as a list of maps.
only - a list of the indices of the columns to return; a list of column headers (as strings) to return, assuming that the first row of the table contains the column names; or :all to return all columns (which is the same as not specifying this option at all).
except - returns all the columns except the ones whose indices or names are given. only and except can be combined to further reduce the set of columns.
headers - if true (the default), returns the list of headers along with the rows. Ignored if the as option is :maps.
Deprecated options:
columns - use only instead.
iex> html = "<table> <tr><th>A</th><th>B</th><th>C</th></tr> <tr><td>1</td><td>2</td><td>3</td></tr> </table>"
iex> HtmlQuery.table(html)
[
["A", "B", "C"],
["1", "2", "3"]
]
iex> HtmlQuery.table(html, as: :maps)
[
%{"A" => "1", "B" => "2", "C" => "3"}
]
iex> HtmlQuery.table(html, only: [0, 2])
[
["A", "C"],
["1", "3"]
]
iex> HtmlQuery.table(html, only: [2, 0])
[
["C", "A"],
["3", "1"]
]
iex> HtmlQuery.table(html, only: ["C", "A"])
[
["C", "A"],
["3", "1"]
]
iex> HtmlQuery.table(html, except: ["C", "A"])
[
["B"],
["2"]
]
iex> HtmlQuery.table(html, only: ["C", "A"], headers: false)
[
["3", "1"]
]
text/1
Returns the text value of html.
iex> html = ~s|<select> <option value="a" selected>apples</option> <option value="b">bananas</option> </select>|
iex> HtmlQuery.find!(html, "select option[selected]") |> HtmlQuery.text()
"apples"