Floki

An HTML parser and seeker.

This is a simple HTML parser that enables searching with CSS-like selectors.

You can search for elements by class, tag name, and id.

Check the documentation.

Example

Assuming that you have the following HTML:

<html>
<body>
<section id="content">
  <p class="headline">Floki</p>
  <a href="http://github.com/philss/floki">Github page</a>
</section>
</body>
</html>

You can perform the following queries:
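
The three selector kinds mentioned earlier (id, class, and tag name) can be sketched against this document as follows. This assumes the markup is bound to a variable named `html` (a name chosen here for illustration) and that Floki is available as a dependency. Return values are left out because their exact shape may differ between Floki versions.

```elixir
# Assumption: `html` holds the example document as a string.
html = """
<html>
<body>
<section id="content">
  <p class="headline">Floki</p>
  <a href="http://github.com/philss/floki">Github page</a>
</section>
</body>
</html>
"""

# By id: selects the <section id="content"> element and its children.
Floki.find(html, "#content")

# By class: selects the <p class="headline"> element.
Floki.find(html, ".headline")

# By tag name: selects the <a> element.
Floki.find(html, "a")
```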

Each HTML node is represented by a tuple like:

{tag_name, attributes, children_nodes}

Example of node:

{"p", [{"class", "headline"}], ["Floki"]}

So even if the only child node is the element text, it is represented inside a list.
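
Because a node is a plain tuple, it can be taken apart with ordinary pattern matching. The snippet below is a minimal sketch in plain Elixir with no Floki calls; the variable names (`headline`, `class`, `text`) are chosen here for illustration.

```elixir
# The example node from above: {tag_name, attributes, children_nodes}
headline = {"p", [{"class", "headline"}], ["Floki"]}

# Destructure the tuple with a pattern match.
{tag_name, attributes, children} = headline

# Attributes are {name, value} pairs, so List.keyfind/3 can look one up by name.
class =
  case List.keyfind(attributes, "class", 0) do
    {"class", value} -> value
    nil -> nil
  end

# Children may mix text (binaries) and nested node tuples; keep only the direct text.
text =
  children
  |> Enum.filter(&is_binary/1)
  |> Enum.join()

IO.inspect({tag_name, class, text})
# prints {"p", "headline", "Floki"}
```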

You can write a simple HTML crawler (with the help of HTTPoison) in a few lines of code:

html
  |> Floki.find(".pages")
  |> Floki.find("a")
  |> Floki.attribute("href")
  |> Enum.map(fn(url) -> HTTPoison.get!(url) end)

It is as simple as that!
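
One thing to keep in mind is that `HTTPoison.get!/1` raises on the first failed request, which stops the whole crawl. The variant below is a sketch of the same pipeline using the non-raising `HTTPoison.get/1` instead. The sample markup, the URL, and the shape of the result tuples are assumptions made for illustration, and running it performs real HTTP requests.

```elixir
# Assumption: a page whose links live inside an element with the "pages" class.
# Both Floki and HTTPoison must be listed as dependencies of the project.
html = """
<ul class="pages">
  <li><a href="http://example.com/page/1">Page 1</a></li>
</ul>
"""

results =
  html
  |> Floki.find(".pages")
  |> Floki.find("a")
  |> Floki.attribute("href")
  |> Enum.map(fn url ->
    # HTTPoison.get/1 returns a tagged tuple instead of raising.
    case HTTPoison.get(url) do
      {:ok, %HTTPoison.Response{status_code: 200, body: body}} -> {:ok, url, body}
      # Any other successful exchange (non-200 status, redirect info, ...)
      {:ok, other} -> {:error, url, other}
      {:error, %HTTPoison.Error{reason: reason}} -> {:error, url, reason}
    end
  end)

# Keep only the pages that were fetched successfully.
pages = for {:ok, url, body} <- results, do: {url, body}
```

Collecting tagged tuples lets the caller decide what to do with failed URLs instead of aborting at the first error.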

API

To parse an HTML document, try:

html = """
  <html>
  <body>
    <div class="example"></div>
  </body>
  </html>
"""

Floki.parse(html)
# => {"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}

To find elements with the class example, try:

Floki.find(html, ".example")
# => [{"div", [{"class", "example"}], []}]

To fetch some attribute from elements, try:

Floki.attribute(html, ".example", "class") # href or src are good possibilities to fetch links
# => ["example"]

You can also get attributes from elements that you already have:

html
|> Floki.find(".example")
|> Floki.attribute("class")
# => ["example"]

License

Floki is released under the MIT license. Check the LICENSE file for more details.