Floki v0.25.0

Floki is a simple HTML parser that enables search for nodes using CSS selectors.

Example

Assuming that you have the following HTML:

<!doctype html>
<html>
<body>
  <section id="content">
    <p class="headline">Floki</p>
    <a href="http://github.com/philss/floki">Github page</a>
    <span data-model="user">philss</span>
  </section>
</body>
</html>

To parse this, you can pass the HTML as a string (bound to doc below) to the function Floki.parse_document/1:

{:ok, html} = Floki.parse_document(doc)
# =>
# [{"html", [],
#   [
#     {"body", [],
#      [
#        {"section", [{"id", "content"}],
#         [
#           {"p", [{"class", "headline"}], ["Floki"]},
#           {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]},
#           {"span", [{"data-model", "user"}], ["philss"]}
#         ]}
#      ]}
#   ]}]

With this document you can perform queries such as:

  • Floki.find(html, "#content")
  • Floki.find(html, ".headline")
  • Floki.find(html, "a")
  • Floki.find(html, "[data-model=user]")
  • Floki.find(html, "#content a")
  • Floki.find(html, ".headline, a")
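As a rough sketch, a few of the queries above can be run end to end as a script. The Mix.install/2 line and the pinned version are assumptions made so the snippet is self-contained (it needs Elixir 1.12 or later); in a Mix project Floki would be declared in mix.exs instead.

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

doc = """
<!doctype html>
<html>
<body>
  <section id="content">
    <p class="headline">Floki</p>
    <a href="http://github.com/philss/floki">Github page</a>
    <span data-model="user">philss</span>
  </section>
</body>
</html>
"""

{:ok, html} = Floki.parse_document(doc)

# Class selector: matches the <p class="headline"> element.
headline = Floki.find(html, ".headline")

# Descendant selector: matches <a> elements anywhere inside #content.
links = Floki.find(html, "#content a")

# Selector list: matches every element that satisfies either selector.
headline_or_link = Floki.find(html, ".headline, a")

IO.inspect(headline, label: "headline")
IO.inspect(links, label: "links")
IO.inspect(length(headline_or_link), label: "headline or link count")
```

Every call returns a list of nodes, even when only one element matches.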

Each HTML node is represented by a tuple like:

{tag_name, attributes, children_nodes}

Example of a node:

{"p", [{"class", "headline"}], ["Floki"]}

So even if the only child node is the element text, it is represented inside a list.
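Because nodes are plain tuples, they can be taken apart with ordinary pattern matching. The sketch below uses only the Elixir standard library; the variable names are illustrative.

```elixir
# A node as described above: {tag_name, attributes, children_nodes}.
headline_node = {"p", [{"class", "headline"}], ["Floki"]}

# Destructure the three parts with a single pattern match.
{tag_name, attributes, children} = headline_node

# Attributes are {name, value} tuples, so List.keyfind/3 can look one up by name.
{"class", class} = List.keyfind(attributes, "class", 0)

# Children is always a list; text children are plain binaries.
text_children = Enum.filter(children, &is_binary/1)

IO.inspect({tag_name, class, text_children})
```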

Summary

Functions

attr(html_elem_tuple, selector, attribute_name, mutation)
Changes the attribute values of the elements matched by selector with the function mutation and returns the whole element tree.

attribute(html, attribute_name)
Returns a list with attribute values from elements.

attribute(html, selector, attribute_name)
Returns a list with attribute values for a given selector.

children(html, opts \\ [include_text: true])
Returns the direct child nodes of a HTML tree.

filter_out(html, selector)
Returns the nodes from a HTML tree that don't match the filter selector.

find(html, selector)
Find elements inside a HTML tree or string.

map(html_tree_list, fun)
It receives a HTML tree structure as tuple and maps through all nodes with a given function that receives a tuple with {name, attributes}.

parse(html) deprecated
Parses a HTML Document from a string.

parse_document(document)
Parses a HTML Document from a string.

parse_document!(document)
Parses a HTML Document from a string.

parse_fragment(fragment)
Parses a HTML fragment from a string.

parse_fragment!(fragment)
Parses a HTML fragment from a string.

raw_html(html_tree, options \\ [])
Converts HTML tree to raw HTML. Note that the resultant HTML may be different from the original one. Spaces after tags and doctypes are ignored.

text(html, opts \\ [deep: true, js: false, style: true, sep: ""])
Returns the text nodes from a HTML tree. By default, it will perform a deep search through the HTML tree. You can disable deep search with the option deep assigned to false. You can include content of script tags with the option js assigned to true. You can specify a separator between nodes content.

traverse_and_update(html_tree, fun)
Traverses and updates a HTML tree structure.

traverse_and_update(html_tree, acc, fun)
Traverses and updates a HTML tree structure with an accumulator.

Types

html_attribute()

html_attribute() :: {String.t(), String.t()}

html_comment()

html_comment() :: {:comment, String.t()}

html_doctype()

html_doctype() :: {:doctype, String.t(), String.t(), String.t()}

traverse_acc()

traverse_acc() :: any()

Functions

attr(html_elem_tuple, selector, attribute_name, mutation)

attr(binary() | html_tree(), binary(), binary(), (binary() -> binary())) ::
  html_tree()

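Changes the attribute values of the elements matched by selector with the function mutation and returns the whole element tree. If a matched element does not have the attribute, it is added with the value returned by mutation.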

Examples

iex> Floki.attr([{"div", [{"id", "a"}], []}], "#a", "id", fn(id) -> String.replace(id, "a", "b") end)
[{"div", [{"id", "b"}], []}]

iex> Floki.attr([{"div", [{"class", "name"}], []}], "div", "id", fn _ -> "b" end)
[{"div", [{"id", "b"}, {"class", "name"}], []}]
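A slightly larger sketch: rewriting the href of a nested link while keeping the rest of the tree. The tree and the upgrade_scheme function are made up for illustration, and the Mix.install/2 line is an assumption for running this as a script.

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

html = [
  {"section", [{"id", "content"}],
   [{"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}]}
]

# The mutation receives the current attribute value and returns the new one.
upgrade_scheme = fn href -> String.replace_prefix(href, "http://", "https://") end

# Only elements matched by the selector are changed; the whole tree is returned.
updated = Floki.attr(html, "#content a", "href", upgrade_scheme)

IO.inspect(Floki.attribute(updated, "a", "href"))
```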

attribute(html, attribute_name)

attribute(binary() | html_tree(), binary()) :: list()

Returns a list with attribute values from elements.

Examples

iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "href")
["https://google.com"]

attribute(html, selector, attribute_name)

attribute(binary() | html_tree(), binary(), binary()) :: list()

Returns a list with attribute values for a given selector.

Examples

iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "a", "href")
["https://google.com"]

iex> Floki.attribute([{"a", [{"class", "foo"}, {"href", "https://google.com"}], ["Google"]}], "a", "class")
["foo"]

children(html, opts \\ [include_text: true])

children(html_tree(), Keyword.t()) :: html_tree()

Returns the direct child nodes of a HTML tree.

By default, it will also include all text nodes. You can disable this behaviour by setting the option include_text to false.

Examples

iex> Floki.children({"div", [], ["text", {"span", [], []}]})
["text", {"span", [], []}]

iex> Floki.children({"div", [], ["text", {"span", [], []}]}, include_text: false)
[{"span", [], []}]

filter_out(html, selector)

filter_out(binary() | html_tree(), Floki.FilterOut.selector()) :: list()

Returns the nodes from a HTML tree that don't match the filter selector.

Examples

iex> Floki.filter_out({"div", [], [{"script", [], ["hello"]}, " world"]}, "script")
{"div", [], [" world"]}

iex> Floki.filter_out([{"body", [], [{"script", [], []},{"div", [], []}]}], "script")
[{"body", [], [{"div", [], []}]}]

iex> Floki.filter_out({"div", [], [{:comment, "comment"}, " text"]}, :comment)
{"div", [], [" text"]}

find(html, selector)

find(binary() | html_tree(), binary()) :: html_tree()

Find elements inside a HTML tree or string.

Examples

iex> {:ok, html} = Floki.parse_fragment("<p><span class=hint>hello</span></p>")
iex> Floki.find(html, ".hint")
[{"span", [{"class", "hint"}], ["hello"]}]

iex> {:ok, html} = Floki.parse_fragment("<div id=important><div>Content</div></div>")
iex> Floki.find(html, "#important")
[{"div", [{"id", "important"}], [{"div", [], ["Content"]}]}]

iex> {:ok, html} = Floki.parse_fragment("<p><a href='https://google.com'>Google</a></p>")
iex> Floki.find(html, "a")
[{"a", [{"href", "https://google.com"}], ["Google"]}]

iex> Floki.find([{ "div", [], [{"a", [{"href", "https://google.com"}], ["Google"]}]}], "div a")
[{"a", [{"href", "https://google.com"}], ["Google"]}]

map(html_tree_list, fun)

Receives a HTML tree structure as a tuple and maps over all of its nodes with a given function that receives a tuple with {name, attributes}.

It returns that structure transformed by the function.

Examples

iex> html = {"div", [{"class", "foo"}], ["text"]}
iex> Floki.map(html, fn({name, attrs}) -> {name, [{"data-name", "bar"} | attrs]} end)
{"div", [{"data-name", "bar"}, {"class", "foo"}], ["text"]}

parse(html)

This function is deprecated. Please use parse_document/1 or parse_fragment/1.

Parses a HTML Document from a string.

The expected string is valid HTML, but the parser will try to parse it even if it contains errors.

parse_document(document)

parse_document(binary()) :: {:ok, html_tree()} | {:error, String.t()}

Parses a HTML Document from a string.

It will use the available parser. Check https://github.com/philss/floki#alternative-html-parsers for more details.

Example

iex> Floki.parse_document("<html><head></head><body>hello</body></html>")
{:ok, [{"html", [], [{"head", [], []}, {"body", [], ["hello"]}]}]}
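Since the result is either {:ok, html_tree} or {:error, reason}, callers usually match on both shapes. A minimal sketch, where the TitleExtractor module is hypothetical and the Mix.install/2 line is an assumption for running this as a script:

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

defmodule TitleExtractor do
  # Hypothetical helper: returns {:ok, title_text} or passes the parse error through.
  def title(document) when is_binary(document) do
    case Floki.parse_document(document) do
      {:ok, html} ->
        {:ok, html |> Floki.find("title") |> Floki.text()}

      {:error, reason} ->
        {:error, reason}
    end
  end
end

result =
  TitleExtractor.title("<html><head><title>Floki</title></head><body>hello</body></html>")

IO.inspect(result)
```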

parse_document!(document)

parse_document!(binary()) :: html_tree()

Parses a HTML Document from a string.

Similar to Floki.parse_document/1, but raises Floki.ParseError if there was an error parsing the document.

Example

iex> Floki.parse_document!("<html><head></head><body>hello</body></html>")
[{"html", [], [{"head", [], []}, {"body", [], ["hello"]}]}]

parse_fragment(fragment)

parse_fragment(binary()) :: {:ok, html_tree()} | {:error, String.t()}

Parses a HTML fragment from a string.

It will use the available parser. Check https://github.com/philss/floki#alternative-html-parsers for more details.
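A short sketch of parsing a fragment with more than one top-level element. It assumes the default parser that ships with Floki, and the Mix.install/2 line is an assumption for running this as a script.

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

# A fragment does not need <html> or <body>, and may have several top-level nodes.
{:ok, html} = Floki.parse_fragment(~s(<p class="headline">Floki</p><span>philss</span>))

# The result is a list with one tuple per top-level element.
IO.inspect(html)
```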

parse_fragment!(fragment)

parse_fragment!(binary()) :: html_tree()

Parses a HTML fragment from a string.

Similar to Floki.parse_fragment/1, but raises Floki.ParseError if there was an error parsing the fragment.
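Because the raising variant returns the tree directly, it fits naturally at the head of a pipeline. A minimal sketch, with a made-up fragment and the Mix.install/2 line assumed for running this as a script:

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

hrefs =
  ~s(<p><a href="/docs">Docs</a> and <a href="/source">Source</a></p>)
  # Raises Floki.ParseError instead of returning {:error, reason}.
  |> Floki.parse_fragment!()
  # Collects the href value of every <a> element, in document order.
  |> Floki.attribute("a", "href")

IO.inspect(hrefs)
```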

raw_html(html_tree, options \\ [])

raw_html(html_tree() | binary(), keyword()) :: binary()

Converts HTML tree to raw HTML. Note that the resultant HTML may be different from the original one. Spaces after tags and doctypes are ignored.

Floki.raw_html/2 accepts a keyword list of options. Currently, the only supported option is :encode, which can be set to true or false.

You can also control the encoding behaviour at the application level via config :floki, :encode_raw_html, true | false

Examples

iex> Floki.raw_html({:div, [class: "wrapper"], ["my content"]})
~s(<div class="wrapper">my content</div>)

iex> Floki.raw_html({:div, [class: "wrapper"], ["10 > 5"]}, encode: true)
~s(<div class="wrapper">10 &gt; 5</div>)

iex> Floki.raw_html({:div, [class: "wrapper"], ["10 > 5"]}, encode: false)
~s(<div class="wrapper">10 > 5</div>)
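The same function works on trees with string tag and attribute names, as produced by the parsing functions. The sketch below serializes a small hand-written tree and parses it back; the Mix.install/2 line is an assumption for running this as a script.

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

tree = [
  {"section", [{"id", "content"}], [{"p", [{"class", "headline"}], ["Floki"]}]}
]

# Serialize the tree to an HTML string.
html = Floki.raw_html(tree)
IO.puts(html)

# Parsing the string again gives back an equal tree for this simple input.
{:ok, reparsed} = Floki.parse_fragment(html)
IO.inspect(reparsed == tree)
```

The round trip is only exact here because the tree contains no doctype, comments, or significant whitespace.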

text(html, opts \\ [deep: true, js: false, style: true, sep: ""])

Returns the text nodes from a HTML tree. By default, it will perform a deep search through the HTML tree. You can disable deep search by setting the option deep to false. You can include the content of script tags by setting the option js to true, and exclude the content of style tags by setting the option style to false. You can specify a separator between the contents of nodes with the option sep.

Examples

iex> Floki.text({"div", [], [{"span", [], ["hello"]}, " world"]})
"hello world"

iex> Floki.text({"div", [], [{"span", [], ["hello"]}, " world"]}, deep: false)
" world"

iex> Floki.text({"div", [], [{"script", [], ["hello"]}, " world"]})
" world"

iex> Floki.text({"div", [], [{"script", [], ["hello"]}, " world"]}, js: true)
"hello world"

iex> Floki.text({"ul", [], [{"li", [], ["hello"]}, {"li", [], ["world"]}]}, sep: "-")
"hello-world"

iex> Floki.text([{"div", [], ["hello world"]}])
"hello world"

iex> Floki.text([{"p", [], ["1"]},{"p", [], ["2"]}])
"12"

iex> Floki.text({"div", [], [{"style", [], ["hello"]}, " world"]}, style: false)
" world"

iex> Floki.text({"div", [], [{"style", [], ["hello"]}, " world"]}, style: true)
"hello world"
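Options can be combined in a single call. A small sketch with a hand-written tree, where the Mix.install/2 line is an assumption for running this as a script:

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

section =
  {"section", [{"id", "content"}],
   [
     {"p", [{"class", "headline"}], ["Floki"]},
     {"script", [], ["console.log('x')"]},
     {"span", [{"data-model", "user"}], ["philss"]}
   ]}

# Script content is skipped by default; sep is placed between text nodes.
without_js = Floki.text(section, sep: " ")

# With js: true the script content is included as well.
with_js = Floki.text(section, sep: " ", js: true)

IO.inspect(without_js)
IO.inspect(with_js)
```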

traverse_and_update(html_tree, fun)

traverse_and_update(html_tree(), (html_tag() -> html_tag() | nil)) ::
  html_tree()

Traverses and updates a HTML tree structure.

This function returns a new tree structure that is the result of applying the given fun on all nodes.

The function fun receives a tuple with {name, attributes, children}, and should either return a similar tuple or nil to delete the current node.

Examples

iex> html = {"div", [], ["hello"]}
iex> Floki.traverse_and_update(html, fn {"div", attrs, children} ->
...>   {"p", attrs, children}
...> end)
{"p", [], ["hello"]}

iex> html = {"div", [], [{"span", [], ["hello"]}]}
iex> Floki.traverse_and_update(html, fn
...>   {"span", _attrs, _children} -> nil
...>   tag -> tag
...> end)
{"div", [], []}

traverse_and_update(html_tree, acc, fun)

traverse_and_update(
  html_tree(),
  traverse_acc(),
  (html_tag(), traverse_acc() -> {html_tag() | nil, traverse_acc()})
) :: html_tree()

Traverses and updates a HTML tree structure with an accumulator.

This function returns a tuple with the new tree structure and the final value of the accumulator, which are the result of applying the given fun to all nodes.

The function fun receives a tuple with {name, attributes, children} and an accumulator, and should return a 2-tuple like {new_node, new_acc}, where new_node is either a similar tuple or nil to delete the current node, and new_acc is an updated value for the accumulator.

Examples

iex> html = [{"div", [], ["hello"]}, {"div", [], ["world"]}]
iex> Floki.traverse_and_update(html, 0, fn {"div", attrs, children}, acc ->
...>   {{"p", [{"data-count", to_string(acc)} | attrs], children}, acc + 1}
...> end)
{[
   {"p", [{"data-count", "0"}], ["hello"]},
   {"p", [{"data-count", "1"}], ["world"]}
 ], 2}

iex> html = {"div", [], [{"span", [], ["hello"]}]}
iex> Floki.traverse_and_update(html, [deleted: 0], fn
...>   {"span", _attrs, _children}, acc ->
...>     {nil, Keyword.put(acc, :deleted, acc[:deleted] + 1)}
...>   tag, acc ->
...>     {tag, acc}
...> end)
{{"div", [], []}, [deleted: 1]}
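
The accumulator can also collect data while the tree is rewritten. The sketch below adds a rel attribute to every link and gathers the href values along the way; the tree is made up, and the Mix.install/2 line is an assumption for running this as a script.

```elixir
# Assumption: run as a standalone script; Floki is fetched by Mix.install/2.
Mix.install([{:floki, "0.25.0"}])

html = [
  {"p", [],
   [
     {"a", [{"href", "/a"}], ["A"]},
     " and ",
     {"a", [{"href", "/b"}], ["B"]}
   ]}
]

{updated, collected} =
  Floki.traverse_and_update(html, [], fn
    # For links: prepend a rel attribute and remember the href.
    {"a", attrs, children}, acc ->
      {"href", href} = List.keyfind(attrs, "href", 0)
      {{"a", [{"rel", "nofollow"} | attrs], children}, [href | acc]}

    # Every other tag is returned unchanged.
    tag, acc ->
      {tag, acc}
  end)

# Hrefs were prepended to the accumulator, so reverse to get document order.
hrefs = Enum.reverse(collected)

IO.inspect(updated)
IO.inspect(hrefs)
```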