Floki v0.17.2
Floki is a simple HTML parser that enables search for nodes using CSS selectors.
Example
Assuming that you have the following HTML:
<!doctype html>
<html>
<body>
<section id="content">
<p class="headline">Floki</p>
<a href="http://github.com/philss/floki">Github page</a>
<span data-model="user">philss</span>
</section>
</body>
</html>
Examples of queries that you can perform:
- Floki.find(html, "#content")
- Floki.find(html, ".headline")
- Floki.find(html, "a")
- Floki.find(html, "[data-model=user]")
- Floki.find(html, "#content a")
- Floki.find(html, ".headline, a")
Each HTML node is represented by a tuple like:
{tag_name, attributes, children_nodes}
Example of node:
{"p", [{"class", "headline"}], ["Floki"]}
So even if the only child node is the element text, it is represented inside a list.
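Because nodes are plain tuples, they can be traversed with ordinary pattern matching. The sketch below is not part of Floki: the module name NodeWalk and the function tag_names/1 are hypothetical, and it assumes a tree made of element tuples and text binaries as described above.

```elixir
defmodule NodeWalk do
  # Hypothetical helper, not part of Floki: collects the tag names of a
  # tree in document order, assuming the {tag_name, attributes, children}
  # representation described above.

  # A list of nodes: walk each one and concatenate the results.
  def tag_names(nodes) when is_list(nodes), do: Enum.flat_map(nodes, &tag_names/1)

  # An element node: emit its tag, then the tags of its children.
  def tag_names({tag, _attributes, children}), do: [tag | tag_names(children)]

  # Anything else (text binaries here) contributes no tag.
  def tag_names(_other), do: []
end

tree =
  {"section", [{"id", "content"}],
   [
     {"p", [{"class", "headline"}], ["Floki"]},
     {"span", [{"data-model", "user"}], ["philss"]}
   ]}

IO.inspect(NodeWalk.tag_names(tree))
# prints ["section", "p", "span"]
```

The text child of each element is skipped by the last clause, which is why only the three tag names appear.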
You can write a simple HTML crawler (with the help of HTTPoison) in a few lines of code:
html
|> Floki.find(".pages a")
|> Floki.attribute("href")
|> Enum.map(fn(url) -> HTTPoison.get!(url) end)
It is as simple as that!
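The href values returned by Floki.attribute/2 are whatever the page contains, so they may be relative. A minimal sketch of resolving them before fetching, using only URI.merge/2 and URI.to_string/1 from the Elixir standard library. The module name LinkResolver, the function absolute_urls/2, and the example.com URLs are hypothetical.

```elixir
defmodule LinkResolver do
  # Hypothetical helper: turns href values into absolute URLs relative to
  # the page they were found on, and drops duplicates.
  # base_url must be an absolute URL, otherwise URI.merge/2 raises.
  def absolute_urls(hrefs, base_url) do
    hrefs
    |> Enum.map(fn href -> base_url |> URI.merge(href) |> URI.to_string() end)
    |> Enum.uniq()
  end
end

hrefs = ["/pages/2", "3.html", "http://example.com/pages/2"]

IO.inspect(LinkResolver.absolute_urls(hrefs, "http://example.com/pages/index.html"))
```

In a crawler like the one above, a step of this kind would sit between Floki.attribute("href") and the Enum.map call that fetches each URL.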
Summary
Functions
- attribute/2: Returns a list with attribute values from elements
- attribute/3: Returns a list with attribute values for a given selector
- filter_out/2: Returns the nodes from an HTML tree that don’t match the filter selector
- find/2: Finds elements inside an HTML tree or string
- parse/1: Parses an HTML string
- raw_html/1: Converts an HTML tree to raw HTML. The resulting HTML may differ from the original: spaces after tags and doctypes are ignored
- text/2: Returns the text nodes from an HTML tree. By default, it performs a deep search through the tree. Pass deep: false to disable deep search, js: true to include the content of script tags, and sep to specify a separator between the contents of nodes
Types
Functions
attribute(binary | html_tree, binary) :: list
Returns a list with attribute values from elements.
Examples
iex> Floki.attribute("<a href=https://google.com>Google</a>", "href")
["https://google.com"]
iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "href")
["https://google.com"]
attribute(binary | html_tree, binary, binary) :: list
Returns a list with attribute values for a given selector.
Examples
iex> Floki.attribute("<a href='https://google.com'>Google</a>", "a", "href")
["https://google.com"]
iex> Floki.attribute([{"a", [{"href", "https://google.com"}], ["Google"]}], "a", "href")
["https://google.com"]
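Since attributes are stored as a list of {name, value} tuples, the attribute of a single node can also be read without Floki. A minimal sketch using List.keyfind/3 from the standard library. The module name NodeAttr and the function get/2 are hypothetical, not Floki functions.

```elixir
defmodule NodeAttr do
  # Hypothetical helper: returns the value of attribute `name` on one
  # element node, or nil when the attribute is absent.
  def get({_tag, attributes, _children}, name) do
    case List.keyfind(attributes, name, 0) do
      {^name, value} -> value
      nil -> nil
    end
  end
end

link = {"a", [{"href", "https://google.com"}], ["Google"]}

IO.inspect(NodeAttr.get(link, "href"))
# prints "https://google.com"
IO.inspect(NodeAttr.get(link, "title"))
# prints nil
```

Unlike Floki.attribute, this looks at exactly one node and returns a single value rather than a list.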
filter_out(binary | html_tree, binary) :: list
Returns the nodes from an HTML tree that don’t match the filter selector.
Examples
iex> Floki.filter_out("<div><script>hello</script> world</div>", "script")
{"div", [], [" world"]}
iex> Floki.filter_out([{"body", [], [{"script", [], []},{"div", [], []}]}], "script")
[{"body", [], [{"div", [], []}]}]
iex> Floki.filter_out("<div><!-- comment --> text</div>", :comment)
{"div", [], [" text"]}
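To make the effect on the tree concrete, here is a sketch of the same idea on the tuple representation in plain Elixir. It is not Floki's implementation: the module name TagFilter and the function drop_tag/2 are hypothetical, and it only understands plain tag names, not general CSS selectors or :comment.

```elixir
defmodule TagFilter do
  # Hypothetical helper: removes every element whose tag name equals
  # `tag`, at any depth below the node or list it is given.

  # A list of nodes: drop matching elements, then recurse into the rest.
  def drop_tag(nodes, tag) when is_list(nodes) do
    nodes
    |> Enum.reject(&match?({^tag, _, _}, &1))
    |> Enum.map(&drop_tag(&1, tag))
  end

  # An element node: keep it and filter its children.
  def drop_tag({name, attributes, children}, tag) do
    {name, attributes, drop_tag(children, tag)}
  end

  # Text nodes are kept unchanged.
  def drop_tag(text, _tag), do: text
end

tree = [{"body", [], [{"script", [], []}, {"div", [], []}]}]

IO.inspect(TagFilter.drop_tag(tree, "script"))
# prints [{"body", [], [{"div", [], []}]}]
```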
find/2
Finds elements inside an HTML tree or string.
Examples
iex> Floki.find("<p><span class=hint>hello</span></p>", ".hint")
[{"span", [{"class", "hint"}], ["hello"]}]
iex> Floki.find("<body><div id=important><div>Content</div></div></body>", "#important")
[{"div", [{"id", "important"}], [{"div", [], ["Content"]}]}]
iex> Floki.find("<p><a href='https://google.com'>Google</a></p>", "a")
[{"a", [{"href", "https://google.com"}], ["Google"]}]
iex> Floki.find([{ "div", [], [{"a", [{"href", "https://google.com"}], ["Google"]}]}], "div a")
[{"a", [{"href", "https://google.com"}], ["Google"]}]
parse/1
Parses an HTML string.
Examples
iex> Floki.parse("<div class=js-action>hello world</div>")
{"div", [{"class", "js-action"}], ["hello world"]}
iex> Floki.parse("<div>first</div><div>second</div>")
[{"div", [], ["first"]}, {"div", [], ["second"]}]
raw_html/1
Converts an HTML tree to raw HTML. Note that the resulting HTML may differ from the original: spaces after tags and doctypes are ignored.
Examples
iex> Floki.parse(~s(<div class="wrapper">my content</div>)) |> Floki.raw_html
~s(<div class="wrapper">my content</div>)
text/2
Returns the text nodes from an HTML tree.
By default, it performs a deep search through the HTML tree.
You can disable deep search by passing the option deep: false.
You can include the content of script tags by passing the option js: true.
You can specify a separator between the contents of nodes with the option sep.
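As an illustration of what deep search and the separator mean on the tuple representation, here is a plain Elixir sketch. It is not Floki's implementation: the module name TextSketch is hypothetical, and the sketch ignores the js option, so script content is not treated specially.

```elixir
defmodule TextSketch do
  # Hypothetical illustration of the deep and sep options on trees made of
  # {tag_name, attributes, children} tuples and text binaries.
  def text(tree, opts \\ []) do
    deep = Keyword.get(opts, :deep, true)
    sep = Keyword.get(opts, :sep, "")

    tree
    |> List.wrap()
    |> Enum.flat_map(&collect(&1, deep))
    |> Enum.join(sep)
  end

  # A text node contributes itself.
  defp collect(text, _deep) when is_binary(text), do: [text]

  # Deep search: descend into every child.
  defp collect({_tag, _attributes, children}, true) do
    Enum.flat_map(children, &collect(&1, true))
  end

  # Shallow search: keep only the direct text children.
  defp collect({_tag, _attributes, children}, false) do
    Enum.filter(children, &is_binary/1)
  end

  # Anything else (for example comments) contributes nothing.
  defp collect(_other, _deep), do: []
end

tree = {"div", [], [{"span", [], ["hello"]}, " world"]}

IO.inspect(TextSketch.text(tree))
# prints "hello world"
IO.inspect(TextSketch.text(tree, deep: false))
# prints " world"
```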
Examples
iex> Floki.text("<div><span>hello</span> world</div>")
"hello world"
iex> Floki.text("<div><span>hello</span> world</div>", deep: false)
" world"
iex> Floki.text("<div><script>hello</script> world</div>")
" world"
iex> Floki.text("<div><script>hello</script> world</div>", js: true)
"hello world"
iex> Floki.text("<ul><li>hello</li><li>world</li></ul>", sep: " ")
"hello world"
iex> Floki.text([{"div", [], ["hello world"]}])
"hello world"
iex> Floki.text([{"p", [], ["1"]},{"p", [], ["2"]}])
"12"