Floki
An HTML parser and seeker.
This is a simple HTML parser that enables searching using CSS-like selectors.
You can search elements by class, tag name and id.
Example
Assuming that you have the following HTML:
<!doctype html>
<html>
<body>
<section id="content">
<p class="headline">Floki</p>
<a href="http://github.com/philss/floki">Github page</a>
</section>
</body>
</html>
You can perform the following queries:
- Floki.find(html, "#content") : returns the section with all children;
- Floki.find(html, ".headline") : returns a list with the p element;
- Floki.find(html, "a") : returns a list with the a element;
- Floki.find(html, "#content a") : returns all links inside the content section;
- Floki.find(html, ".headline, a") : returns the .headline elements and links.
Each HTML node is represented by a tuple like:
{tag_name, attributes, children_nodes}
Example of node:
{"p", [{"class", "headline"}], ["Floki"]}
So even if the only child node is the element text, it is represented inside a list.
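Because every node has the same three-element shape, a tree can be walked with plain pattern matching. The following is a minimal sketch that lists the tag names in a tree; the NodeWalk module and its tags function are hypothetical helpers written for illustration and are not part of Floki. The tree is written by hand in the node format described above.

```elixir
defmodule NodeWalk do
  # Hypothetical helper, not part of Floki: lists the tag names found in a
  # node tuple of the form {tag_name, attributes, children_nodes}.
  def tags({tag, _attributes, children}) do
    [tag | Enum.flat_map(children, &tags/1)]
  end

  # Text children are plain binaries and carry no tag.
  def tags(text) when is_binary(text), do: []
end

node =
  {"section", [{"id", "content"}],
   [
     {"p", [{"class", "headline"}], ["Floki"]},
     {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}
   ]}

IO.inspect(NodeWalk.tags(node))
# prints ["section", "p", "a"]
```

The second clause is what makes the "text inside a list" rule convenient: children can always be enumerated, and a binary simply ends the recursion.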
You can write a simple HTML crawler (with the help of HTTPoison) in a few lines of code:
html
|> Floki.find(".pages")
|> Floki.find("a")
|> Floki.attribute("href")
|> Enum.map(fn(url) -> HTTPoison.get!(url) end)
It is as simple as that!
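The pipeline above hands every href value straight to HTTPoison.get!. On real pages some of those values may be relative paths, fragments or mailto links, so a crawler may want to filter them first. The sketch below is one possible way to do that using only Elixir's standard URI module; the LinkFilter module and its fetchable function are hypothetical names, not part of Floki or HTTPoison.

```elixir
defmodule LinkFilter do
  # Hypothetical helper: keeps unique absolute http/https URLs so that
  # relative paths, fragments and mailto links are never passed to the client.
  def fetchable(hrefs) do
    hrefs
    |> Enum.filter(fn href ->
      uri = URI.parse(href)
      uri.scheme in ["http", "https"] and is_binary(uri.host) and uri.host != ""
    end)
    |> Enum.uniq()
  end
end

hrefs = [
  "http://github.com/philss/floki",
  "/about",
  "#top",
  "mailto:someone@example.com",
  "http://github.com/philss/floki"
]

IO.inspect(LinkFilter.fetchable(hrefs))
# prints ["http://github.com/philss/floki"]
```

Under these assumptions, such a step would sit between the Floki.attribute("href") call and the Enum.map that performs the requests.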
Summary
attribute(elements, attribute_name) | Returns a list with attribute values from elements |
attribute(html, selector, attribute_name) | Returns a list with attribute values for a given selector |
find(html, selector) | Finds elements inside an HTML tree or string. You can search by class, tag name or id |
parse(html) | Parses an HTML string |
text(html) | Returns the text nodes from an HTML tree |
Types
html_tree :: tuple | list
Functions
Specs:
- attribute(binary | html_tree, binary) :: list
Returns a list with attribute values from elements.
Examples
iex> Floki.attribute("<a href='https://google.com'>Google</a>", "href")
["https://google.com"]
Specs:
- attribute(binary | html_tree, binary, binary) :: list
Returns a list with attribute values for a given selector.
Examples
iex> Floki.attribute("<a href='https://google.com'>Google</a>", "a", "href")
["https://google.com"]
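To make the relationship between attribute values and the node format concrete, here is a minimal sketch of reading one attribute from a list of node tuples. It is not Floki's implementation: the AttrLookup module is a hypothetical helper, and skipping nodes that lack the attribute is an assumption made for this example.

```elixir
defmodule AttrLookup do
  # Hypothetical helper, not Floki's implementation: reads one attribute
  # from each node in a list, skipping nodes that do not have it.
  def values(nodes, name) do
    Enum.flat_map(nodes, fn {_tag, attributes, _children} ->
      case List.keyfind(attributes, name, 0) do
        {^name, value} -> [value]
        nil -> []
      end
    end)
  end
end

nodes = [
  {"a", [{"href", "https://google.com"}], ["Google"]},
  {"a", [{"class", "no-link"}], ["Plain"]}
]

IO.inspect(AttrLookup.values(nodes, "href"))
# prints ["https://google.com"]
```

Attributes are stored as a list of {name, value} tuples, so List.keyfind/3 with position 0 is enough to look one up by name.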
Specs:
Finds elements inside an HTML tree or string. You can search by class, tag name or id.
It is possible to compose searches:
Floki.find(html_string, ".class")
|> Floki.find(".another-class-inside-small-scope")
Examples
iex> Floki.find("<p><span class=hint>hello</span></p>", ".hint")
[{"span", [{"class", "hint"}], ["hello"]}]
iex> "<body><div id=important><div>Content</div></div></body>" |> Floki.find("#important")
{"div", [{"id", "important"}], [{"div", [], ["Content"]}]}
iex> Floki.find("<p><a href='https://google.com'>Google</a></p>", "a")
[{"a", [{"href", "https://google.com"}], ["Google"]}]
Specs:
- parse(binary) :: html_tree
Parses an HTML string.
Examples
iex> Floki.parse("<div class=js-action>hello world</div>")
{"div", [{"class", "js-action"}], ["hello world"]}
Specs:
- text(html_tree | binary) :: binary
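Returns the text nodes from an HTML tree. In the example below, only the text directly inside the div is returned, not the text inside the nested span.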
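When the text of nested elements is wanted as well, the node format makes a recursive walk straightforward. The sketch below is an illustration only: the DeepText module is a hypothetical helper, not part of Floki, and the tree is written by hand as {tag_name, attributes, children_nodes} tuples.

```elixir
defmodule DeepText do
  # Hypothetical helper, not part of Floki: concatenates every text node
  # in document order, including text inside nested elements.
  def collect(text) when is_binary(text), do: text

  def collect({_tag, _attributes, children}) do
    Enum.map_join(children, "", &collect/1)
  end
end

tree = {"div", [], [{"span", [], ["something else"]}, "hello world"]}

IO.inspect(DeepText.collect(tree))
# prints "something elsehello world"
```

No separator is inserted between text nodes here; passing a different joiner to Enum.map_join/3 would change that.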
Examples
iex> Floki.text("<div><span>something else</span>hello world</div>")
"hello world"