Floki v0.26.0
Floki is a simple HTML parser that lets you search for nodes using CSS selectors.
Example
Assuming that you have the following HTML:
<!doctype html>
<html>
<body>
<section id="content">
<p class="headline">Floki</p>
<a href="http://github.com/philss/floki">Github page</a>
<span data-model="user">philss</span>
</section>
</body>
</html>
To parse this, you can use the function Floki.parse_document/1, which returns {:ok, html} on success:
# doc is a string containing the HTML above
{:ok, html} = Floki.parse_document(doc)
# html =>
# [{"html", [],
# [
# {"body", [],
# [
# {"section", [{"id", "content"}],
# [
# {"p", [{"class", "headline"}], ["Floki"]},
# {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
# {"span", [{"data-model", "user"}], ["philss"]}
# ]}
# ]}
# ]}]
With this document you can perform queries such as:
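Floki.find(html, "#content")           # the element with id "content"
Floki.find(html, ".headline")          # elements with class "headline"
Floki.find(html, "a")                  # all a elements
Floki.find(html, "[data-model=user]")  # elements whose data-model attribute is "user"
Floki.find(html, "#content a")         # a elements nested inside #content
Floki.find(html, ".headline, a")       # elements matching either selector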
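To make the tuple tree more concrete, here is a minimal sketch in plain Elixir (no Floki dependency) of the kind of walk a bare tag selector such as "a" implies: visit every node depth-first and keep the elements whose tag matches. FindSketch.find_by_tag/2 is a hypothetical helper for illustration only. Floki's real selector engine also handles ids, classes, attributes and combinators.

```elixir
defmodule FindSketch do
  # Hypothetical helper, not part of Floki's API.
  # A list of nodes: search each node and concatenate the matches.
  def find_by_tag(nodes, tag) when is_list(nodes) do
    Enum.flat_map(nodes, &find_by_tag(&1, tag))
  end

  # An element: keep it if the tag matches, then keep searching its children.
  def find_by_tag({name, _attrs, children} = node, tag) when is_binary(name) do
    matches = if name == tag, do: [node], else: []
    matches ++ find_by_tag(children, tag)
  end

  # Text, comments and doctypes can never match a tag selector.
  def find_by_tag(_other, _tag), do: []
end

tree = [
  {"section", [{"id", "content"}],
   [
     {"p", [{"class", "headline"}], ["Floki"]},
     {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}
   ]}
]

FindSketch.find_by_tag(tree, "a")
# => [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}]
```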
Each HTML node is represented by a tuple like:
{tag_name, attributes, children_nodes}
Example of a node:
{"p", [{"class", "headline"}], ["Floki"]}
So even when the only child node is the element's text, it is wrapped in a list.
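Every element has the same three-element shape, so plain Elixir pattern matching is enough to take a node apart. The sketch below uses two hypothetical helpers that are not part of Floki, NodeSketch.describe/1 and NodeSketch.own_text/1. It shows that the text child of the example node still lives inside the children list.

```elixir
defmodule NodeSketch do
  # Hypothetical helper: splits an element into its parts.
  # Attributes and children are always lists.
  def describe({tag, attrs, children}) when is_list(attrs) and is_list(children) do
    %{tag: tag, attribute_count: length(attrs), child_count: length(children)}
  end

  # Hypothetical helper: joins the text nodes (plain binaries) found
  # directly inside an element, ignoring nested elements.
  def own_text({_tag, _attrs, children}) when is_list(children) do
    children
    |> Enum.filter(&is_binary/1)
    |> Enum.join()
  end
end

p_node = {"p", [{"class", "headline"}], ["Floki"]}
NodeSketch.describe(p_node)
# => %{attribute_count: 1, child_count: 1, tag: "p"}
NodeSketch.own_text(p_node)
# => "Floki"
```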
Summary
Functions
attr/4: Changes the attribute values of the elements matched by selector with the function mutation and returns the whole element tree.
attribute/2: Returns a list with attribute values from elements.
attribute/3: Returns a list with attribute values for a given selector.
children/2: Returns the direct child nodes of an HTML tree.
filter_out/2: Returns the HTML tree without the nodes that match the filter selector.
find/2: Finds elements inside an HTML tree or string.
map/2: Receives an HTML tree structure as a tuple and applies a given function, which receives a {name, attributes} tuple, to every node.
parse/1: Parses an HTML document from a string.
parse_document/1: Parses an HTML document from a string.
parse_document!/1: Parses an HTML document from a string, raising on error.
parse_fragment/1: Parses an HTML fragment from a string.
parse_fragment!/1: Parses an HTML fragment from a string, raising on error.
raw_html/2: Converts an HTML tree to raw HTML. Note that the resulting HTML may differ from the original one: spaces after tags and doctypes are ignored.
text/2: Returns the text of an HTML tree as a single string. By default, it performs a deep search through the tree. Set the option deep to false to disable this, set js to true to include the content of script tags, and use sep to specify a separator between node contents.
traverse_and_update/2: Traverses and updates an HTML tree structure.
traverse_and_update/3: Traverses and updates an HTML tree structure with an accumulator.
Types
html_tag()
html_tag() :: {String.t(), [html_attribute()], [html_tag() | String.t() | html_comment()]}
html_tree()
html_tree() :: [html_comment() | html_doctype() | html_tag()]
Functions
attr/4
Changes the attribute values of the elements matched by selector with the function mutation and returns the whole element tree. If a matched element does not have the attribute yet, it is added.
Examples
iex> Floki.attr([{"div", [{"id", "a"}], []}], "#a", "id", fn(id) -> String.replace(id, "a", "b") end)
[{"div", [{"id", "b"}], []}]
iex> Floki.attr([{"div", [{"class", "name"}], []}], "div", "id", fn _ -> "b" end)
[{"div", [{"id", "b"}, {"class", "name"}], []}]
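The attribute list is an ordinary list of {name, value} pairs, so the per-element part of this update can be sketched with List.keyfind/3 and List.keyreplace/4. AttrSketch.update_attr/3 is hypothetical and skips selector matching entirely. To reproduce the second example's output, it assumes that a missing attribute is prepended after calling mutation with nil. Floki's actual handling of that case may differ.

```elixir
defmodule AttrSketch do
  # Hypothetical helper working on one element; no selector matching.
  def update_attr({tag, attrs, children}, name, mutation) do
    new_attrs =
      case List.keyfind(attrs, name, 0) do
        # Existing attribute: replace its value in place.
        {^name, value} -> List.keyreplace(attrs, name, 0, {name, mutation.(value)})
        # Missing attribute (assumption): call mutation with nil and prepend the pair.
        nil -> [{name, mutation.(nil)} | attrs]
      end

    {tag, new_attrs, children}
  end
end
```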
attribute/2
Returns a list with attribute values from elements.
Examples
iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "href")
["https://google.com"]
attribute/3
Returns a list with attribute values from the elements matched by a given selector.
Examples
iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "a", "href")
["https://google.com"]
iex> Floki.attribute([{"a", [{"class", "foo"}, {"href", "https://google.com"}], ["Google"]}], "a", "class")
["foo"]
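Here is a hypothetical sketch of the two-argument form. It only reads the attributes of the elements it is given: no selector is applied and it does not descend into children, which is this sketch's assumption. AttributeSketch.values/2 is not a Floki function.

```elixir
defmodule AttributeSketch do
  # Hypothetical helper: values of one attribute across the given elements.
  def values(elements, name) do
    elements
    # Accept a single element as well as a list of them.
    |> List.wrap()
    |> Enum.flat_map(fn
      {_tag, attrs, _children} when is_list(attrs) ->
        for {attr_name, value} <- attrs, attr_name == name, do: value

      # Text nodes and other non-elements have no attributes.
      _other ->
        []
    end)
  end
end
```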
children/2
Returns the direct child nodes of an HTML tree.
By default, it also includes all text nodes. You can disable this behaviour by setting the option include_text to false.
Examples
iex> Floki.children({"div", [], ["text", {"span", [], []}]})
["text", {"span", [], []}]
iex> Floki.children({"div", [], ["text", {"span", [], []}]}, include_text: false)
[{"span", [], []}]
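Text children are plain binaries, so the include_text behaviour can be sketched in a few lines as a filter. ChildrenSketch.children/2 is a hypothetical stand-in, not Floki's code.

```elixir
defmodule ChildrenSketch do
  # Hypothetical helper: returns an element's direct children,
  # dropping text nodes (binaries) when include_text: false is given.
  def children({_tag, _attrs, children}, opts \\ []) do
    if Keyword.get(opts, :include_text, true) do
      children
    else
      Enum.reject(children, &is_binary/1)
    end
  end
end
```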
filter_out/2
Returns the HTML tree with the nodes that match the filter selector removed. The selector can also be :comment, which removes comment nodes.
Examples
iex> Floki.filter_out({"div", [], [{"script", [], ["hello"]}, " world"]}, "script")
{"div", [], [" world"]}
iex> Floki.filter_out([{"body", [], [{"script", [], []},{"div", [], []}]}], "script")
[{"body", [], [{"div", [], []}]}]
iex> Floki.filter_out({"div", [], [{:comment, "comment"}, " text"]}, :comment)
{"div", [], [" text"]}
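Here is a recursive sketch that filters by a bare tag name, plus the :comment special case from the last example. FilterOutSketch.filter_out/2 is hypothetical and supports no other selectors. When given a single element, it only filters that element's descendants.

```elixir
defmodule FilterOutSketch do
  # Hypothetical helper. A list of nodes: drop the matching ones,
  # then filter inside the survivors.
  def filter_out(nodes, tag) when is_list(nodes) do
    nodes
    |> Enum.reject(&matches?(&1, tag))
    |> Enum.map(&filter_out(&1, tag))
  end

  # An element: keep it and filter its children.
  def filter_out({name, attrs, children}, tag) do
    {name, attrs, filter_out(children, tag)}
  end

  # Text and other nodes pass through unchanged.
  def filter_out(other, _tag), do: other

  defp matches?({:comment, _text}, :comment), do: true
  defp matches?({name, _attrs, _children}, tag) when is_binary(name), do: name == tag
  defp matches?(_node, _tag), do: false
end
```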
find/2
Finds elements inside an HTML tree or string.
Examples
iex> {:ok, html} = Floki.parse_fragment("<p><span class=hint>hello</span></p>")
iex> Floki.find(html, ".hint")
[{"span", [{"class", "hint"}], ["hello"]}]
iex> {:ok, html} = Floki.parse_fragment("<div id=important><div>Content</div></div>")
iex> Floki.find(html, "#important")
[{"div", [{"id", "important"}], [{"div", [], ["Content"]}]}]
iex> {:ok, html} = Floki.parse_fragment("<p><a href='https://google.com'>Google</a></p>")
iex> Floki.find(html, "a")
[{"a", [{"href", "https://google.com"}], ["Google"]}]
iex> Floki.find([{ "div", [], [{"a", [{"href", "https://google.com"}], ["Google"]}]}], "div a")
[{"a", [{"href", "https://google.com"}], ["Google"]}]
map/2
Receives an HTML tree structure as a tuple and applies the given function to every node. The function receives a tuple with {name, attributes} and returns a tuple of the same shape.
It returns the structure transformed by the function.
Examples
iex> html = {"div", [{"class", "foo"}], ["text"]}
iex> Floki.map(html, fn({name, attrs}) -> {name, [{"data-name", "bar"} | attrs]} end)
{"div", [{"data-name", "bar"}, {"class", "foo"}], ["text"]}
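The same idea can be sketched in plain Elixir. This sketch assumes the function is applied to every element in the tree, parents before their children, and that text nodes pass through untouched. MapSketch.map/2 is a hypothetical helper, not Floki's implementation.

```elixir
defmodule MapSketch do
  # Hypothetical helper. A list of nodes: map each one.
  def map(nodes, fun) when is_list(nodes), do: Enum.map(nodes, &map(&1, fun))

  # An element: transform its {name, attrs} pair, then recurse into children.
  def map({name, attrs, children}, fun) do
    {new_name, new_attrs} = fun.({name, attrs})
    {new_name, new_attrs, map(children, fun)}
  end

  # Text, comments and doctypes are returned unchanged.
  def map(other, _fun), do: other
end
```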
parse/1
Parses an HTML document from a string.
The string is expected to be valid HTML, but the parser will try to parse it even if it contains errors.
parse_document/1
Parses an HTML document from a string.
It uses the available parser. Check https://github.com/philss/floki#alternative-html-parsers for more details.
Example
iex> Floki.parse_document("<html><head></head><body>hello</body></html>")
{:ok, [{"html", [], [{"head", [], []}, {"body", [], ["hello"]}]}]}
parse_document!/1
Parses an HTML document from a string.
Similar to Floki.parse_document/1, but raises Floki.ParseError if there was an error parsing the document.
Example
iex> Floki.parse_document!("<html><head></head><body>hello</body></html>")
[{"html", [], [{"head", [], []}, {"body", [], ["hello"]}]}]
parse_fragment/1
Parses an HTML fragment from a string.
It uses the available parser. Check https://github.com/philss/floki#alternative-html-parsers for more details.
parse_fragment!/1
Parses an HTML fragment from a string.
Similar to Floki.parse_fragment/1, but raises Floki.ParseError if there was an error parsing the fragment.
raw_html/2
Converts an HTML tree to raw HTML. Note that the resulting HTML may differ from the original one: spaces after tags and doctypes are ignored.
Floki.raw_html/2 accepts a keyword list of options. Currently, the only supported option is :encode, which can be set to true or false to turn HTML entity encoding on or off.
You can also control the encoding behaviour at the application level by setting :encode_raw_html to true or false in your configuration, for example:
config :floki, :encode_raw_html, false
Examples
iex> Floki.raw_html({:div, [class: "wrapper"], ["my content"]})
~s(<div class="wrapper">my content</div>)
iex> Floki.raw_html({:div, [class: "wrapper"], ["10 > 5"]}, encode: true)
~s(<div class="wrapper">10 &gt; 5</div>)
iex> Floki.raw_html({:div, [class: "wrapper"], ["10 > 5"]}, encode: false)
~s(<div class="wrapper">10 > 5</div>)
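To see what :encode changes, here is a small serializer sketch for string-tagged or atom-tagged nodes. RawHtmlSketch.to_html/2 is hypothetical and makes several simplifying assumptions:
- It escapes only &, <, > and ".
- It always escapes attribute values.
- It encodes text unless told otherwise.
- It writes a closing tag even for void elements such as br, unlike a real serializer.

```elixir
defmodule RawHtmlSketch do
  # Hypothetical serializer; not Floki.raw_html/2.
  def to_html(nodes, opts \\ [])

  # A list of nodes is serialized node by node and concatenated.
  def to_html(nodes, opts) when is_list(nodes) do
    Enum.map_join(nodes, &to_html(&1, opts))
  end

  # Text is entity-encoded unless encode: false (encoding by default is an assumption).
  def to_html(text, opts) when is_binary(text) do
    if Keyword.get(opts, :encode, true), do: escape(text), else: text
  end

  # Elements: tags and attribute names may be strings or atoms.
  # Attribute values are always escaped; void elements are not special-cased.
  def to_html({tag, attrs, children}, opts) do
    attr_string = Enum.map_join(attrs, fn {name, value} -> " #{name}=\"#{escape(value)}\"" end)
    "<#{tag}#{attr_string}>" <> to_html(children, opts) <> "</#{tag}>"
  end

  # & must be replaced first so the other entities are not double-escaped.
  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end
```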
text/2
Returns the text of an HTML tree as a single string.
By default, it performs a deep search through the HTML tree. You can disable deep search by setting the option deep to false.
You can include the content of script tags by setting the option js to true.
You can specify a separator between the contents of the text nodes with the option sep.
The option style controls whether the content of style tags is included.
Examples
iex> Floki.text({"div", [], [{"span", [], ["hello"]}, " world"]})
"hello world"
iex> Floki.text({"div", [], [{"span", [], ["hello"]}, " world"]}, deep: false)
" world"
iex> Floki.text({"div", [], [{"script", [], ["hello"]}, " world"]})
" world"
iex> Floki.text({"div", [], [{"script", [], ["hello"]}, " world"]}, js: true)
"hello world"
iex> Floki.text({"ul", [], [{"li", [], ["hello"]}, {"li", [], ["world"]}]}, sep: "-")
"hello-world"
iex> Floki.text([{"div", [], ["hello world"]}])
"hello world"
iex> Floki.text([{"p", [], ["1"]},{"p", [], ["2"]}])
"12"
iex> Floki.text({"div", [], [{"style", [], ["hello"]}, " world"]}, style: false)
" world"
iex> Floki.text({"div", [], [{"style", [], ["hello"]}, " world"]}, style: true)
"hello world"
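Here is a sketch of the deep text walk, assuming text nodes are collected depth-first and joined with sep. TextSketch.text/2 is hypothetical and ignores the deep option. Its defaults (js: false, style: true, sep: "") are this sketch's own choices, made to agree with the examples above rather than taken from Floki's documentation.

```elixir
defmodule TextSketch do
  # Hypothetical helper: collects text nodes depth-first and joins them with :sep.
  def text(tree, opts \\ []) do
    tree
    |> List.wrap()
    |> collect(opts)
    |> Enum.join(Keyword.get(opts, :sep, ""))
  end

  defp collect(nodes, opts) do
    Enum.flat_map(nodes, fn
      text when is_binary(text) ->
        [text]

      # Script content only when js: true.
      {"script", _attrs, children} ->
        if Keyword.get(opts, :js, false), do: collect(children, opts), else: []

      # Style content unless style: false.
      {"style", _attrs, children} ->
        if Keyword.get(opts, :style, true), do: collect(children, opts), else: []

      {_tag, _attrs, children} when is_list(children) ->
        collect(children, opts)

      # Comments, doctypes and anything else contribute no text.
      _other ->
        []
    end)
  end
end
```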
traverse_and_update/2
Traverses and updates an HTML tree structure.
This function returns a new tree structure that is the result of applying the given fun to all nodes.
The function fun receives a tuple with {name, attributes, children} and should return either a similar tuple or nil to delete the current node.
Examples
iex> html = {"div", [], ["hello"]}
iex> Floki.traverse_and_update(html, fn {"div", attrs, children} ->
...> {"p", attrs, children}
...> end)
{"p", [], ["hello"]}
iex> html = {"div", [], [{"span", [], ["hello"]}]}
iex> Floki.traverse_and_update(html, fn
...> {"span", _attrs, _children} -> nil
...> tag -> tag
...> end)
{"div", [], []}
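Here is a minimal sketch of such a traversal in plain Elixir. TraverseSketch.traverse_and_update/2 is hypothetical. It updates children before calling fun on their parent (post-order), which is an assumption about visiting order, and drops any node for which fun returns nil. In this sketch, text nodes are never passed to fun.

```elixir
defmodule TraverseSketch do
  # Hypothetical helper. A list of nodes: update each one,
  # then drop those that became nil.
  def traverse_and_update(nodes, fun) when is_list(nodes) do
    nodes
    |> Enum.map(&traverse_and_update(&1, fun))
    |> Enum.reject(&is_nil/1)
  end

  # An element: update its children first, then hand the rebuilt tuple to fun.
  def traverse_and_update({name, attrs, children}, fun) do
    fun.({name, attrs, traverse_and_update(children, fun)})
  end

  # Text (and, in this sketch, any other node) is returned unchanged.
  def traverse_and_update(other, _fun), do: other
end
```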
traverse_and_update(html_tree, acc, fun)
traverse_and_update(html_tree(), traverse_acc(), (html_tag(), traverse_acc() -> {html_tag() | nil, traverse_acc()})) :: {html_tree(), traverse_acc()}
Traverses and updates an HTML tree structure with an accumulator.
This function returns a new tree structure and the final value of the accumulator, both the result of applying the given fun to all nodes.
The function fun receives a tuple with {name, attributes, children} and an accumulator. It should return a 2-tuple like {new_node, new_acc}, where new_node is either a similar tuple or nil to delete the current node, and new_acc is an updated value for the accumulator.
Examples
iex> html = [{"div", [], ["hello"]}, {"div", [], ["world"]}]
iex> Floki.traverse_and_update(html, 0, fn {"div", attrs, children}, acc ->
...> {{"p", [{"data-count", to_string(acc)} | attrs], children}, acc + 1}
...> end)
{[
{"p", [{"data-count", "0"}], ["hello"]},
{"p", [{"data-count", "1"}], ["world"]}
], 2}
iex> html = {"div", [], [{"span", [], ["hello"]}]}
iex> Floki.traverse_and_update(html, [deleted: 0], fn
...> {"span", _attrs, _children}, acc ->
...> {nil, Keyword.put(acc, :deleted, acc[:deleted] + 1)}
...> tag, acc ->
...> {tag, acc}
...> end)
{{"div", [], []}, [deleted: 1]}
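The accumulator version can be sketched with Enum.flat_map_reduce/3. It maps a list while threading a value through it in one pass, and lets a deleted node contribute nothing to the result. TraverseAccSketch.traverse_and_update/3 is hypothetical and assumes children are visited before their parent, left to right.

```elixir
defmodule TraverseAccSketch do
  # Hypothetical helper. A list of nodes: thread acc through each node;
  # a node replaced by nil contributes an empty list, i.e. is deleted.
  def traverse_and_update(nodes, acc, fun) when is_list(nodes) do
    Enum.flat_map_reduce(nodes, acc, fn node, acc ->
      case traverse_and_update(node, acc, fun) do
        {nil, acc} -> {[], acc}
        {new_node, acc} -> {[new_node], acc}
      end
    end)
  end

  # An element: update the children first, then call fun on the rebuilt element.
  def traverse_and_update({name, attrs, children}, acc, fun) do
    {new_children, acc} = traverse_and_update(children, acc, fun)
    fun.({name, attrs, new_children}, acc)
  end

  # Text and other nodes are kept as-is and leave the accumulator untouched.
  def traverse_and_update(other, acc, _fun), do: {other, acc}
end
```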