floki v0.0.4 → README

Floki

An HTML parser and seeker.

This is a simple HTML parser that enables searching with CSS-like selectors.

You can search elements by class, tag name and id.

Example

Assuming that you have the following HTML:

<html>
<body>
<section id="content">
  <p class="headline">Floki</p>
  <a href="http://github.com/philss/floki">Github page</a>
</section>
</body>
</html>

You can perform the following queries:

Floki.find(html, "#content") : returns the section with all children;
Floki.find(html, ".headline") : returns a list with the p element;
Floki.find(html, "a") : returns a list with the a element.

Each HTML node is represented by a tuple like:

{tag_name, attributes, children_nodes}

Example of node:

{"p", [{"class", "headline"}], ["Floki"]}

So even if the only child node is the element text, it is represented inside a list.
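
Because a node is a plain tuple, it can be taken apart with ordinary pattern matching. The following is a minimal sketch of that idea; `NodeSketch` and its `text/1` function are hypothetical names made up for illustration and are not part of Floki. It assumes every child is either a binary (text) or another node tuple.

```elixir
defmodule NodeSketch do
  # Hypothetical helper, not part of Floki: collects the text found in a
  # node of the form {tag_name, attributes, children_nodes}.

  # A text child is a plain binary, so it is returned unchanged.
  def text(binary) when is_binary(binary), do: binary

  # An element node contributes the text of all of its children, in order.
  def text({_tag_name, _attributes, children}) do
    children
    |> Enum.map(&text/1)
    |> Enum.join()
  end
end

node = {"p", [{"class", "headline"}], ["Floki"]}

# Destructure the tuple into its three parts.
{tag_name, attributes, children} = node

IO.inspect(tag_name)              # prints "p"
IO.inspect(attributes)            # prints [{"class", "headline"}]
IO.inspect(children)              # prints ["Floki"]
IO.inspect(NodeSketch.text(node)) # prints "Floki"
```

Keeping the children in a list, even for a single text child, is what lets one recursive clause handle every element.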

You can write a simple HTML crawler (with the help of HTTPoison) in a few lines of code:

html
  |> Floki.find(".pages")
  |> Floki.find("a")
  |> Floki.attribute("href")
  |> Enum.map(fn(url) -> HTTPoison.get!(url) end)

It is as simple as that!

API

To parse an HTML document, try:

html = """
  <html>
  <body>
    <div class="example"></div>
  </body>
  </html>
"""

Floki.parse(html)
# => {"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}

To find elements with the class example, try:

Floki.find(html, ".example")
# => [{"div", [{"class", "example"}], []}]
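
To give a feel for what a class search over the parsed tree involves, here is a rough sketch written against the tuple structure only. It is not Floki's implementation: `ClassSearchSketch` and `by_class/2` are hypothetical names, and the sketch assumes the tree has already been parsed into the `{tag_name, attributes, children_nodes}` form shown above.

```elixir
defmodule ClassSearchSketch do
  # Hypothetical helper, not Floki's implementation: returns every node in
  # the tree whose "class" attribute contains the given class name.
  def by_class({_tag_name, attributes, children} = node, class_name) do
    # Search the children first, then decide whether this node matches too.
    nested = Enum.flat_map(children, &by_class(&1, class_name))

    if has_class?(attributes, class_name) do
      [node | nested]
    else
      nested
    end
  end

  # Text children are plain binaries and can never carry a class.
  def by_class(text, _class_name) when is_binary(text), do: []

  # A class attribute may hold several space-separated names.
  defp has_class?(attributes, class_name) do
    case List.keyfind(attributes, "class", 0) do
      {"class", value} -> class_name in String.split(value)
      nil -> false
    end
  end
end

# The parsed tree from the Floki.parse example above.
tree = {"html", [], [{"body", [], [{"div", [{"class", "example"}], []}]}]}

IO.inspect(ClassSearchSketch.by_class(tree, "example"))
# prints [{"div", [{"class", "example"}], []}]
```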

To fetch some attribute from elements, try:

Floki.attribute(html, ".example", "class") # "href" and "src" are good choices for fetching links
# => ["example"]

You can also get attributes from elements that you already have:

Floki.find(html, ".example")
|> Floki.attribute("class")
# => ["example"]
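
Since attributes are stored as a list of `{name, value}` tuples, reading one from nodes you already hold comes down to a key lookup. The sketch below illustrates this under that assumption; `AttributeSketch.values/2` is a hypothetical helper, not Floki's own code, and it simply skips nodes that lack the attribute.

```elixir
defmodule AttributeSketch do
  # Hypothetical helper, not Floki's implementation: collects the value of
  # one attribute from a list of {tag_name, attributes, children_nodes} nodes.
  def values(nodes, name) do
    Enum.flat_map(nodes, fn {_tag_name, attributes, _children} ->
      case List.keyfind(attributes, name, 0) do
        {^name, value} -> [value]
        # Nodes without the attribute contribute nothing to the result.
        nil -> []
      end
    end)
  end
end

nodes = [
  {"div", [{"class", "example"}], []},
  {"a", [{"href", "http://github.com/philss/floki"}], ["Github page"]}
]

IO.inspect(AttributeSketch.values(nodes, "class"))
# prints ["example"]
IO.inspect(AttributeSketch.values(nodes, "href"))
# prints ["http://github.com/philss/floki"]
```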

License

Floki is released under the MIT license. Check the LICENSE file for more details.