ExCrawlzy.Utils (ExCrawlzy v0.1.1)

Utilities shared across the whole library.

They help with parsing data and extracting it from HTML documents.

Summary

Functions

binary_to_string/1: Transforms a binary into a readable string.

exist/1: Returns whether an element exists.

props/2: Extracts the value of a given attribute (prop) from an HTML element.

text/1: Extracts the inner text of an HTML element. Works well for simple elements such as span, p, and h1.

Functions

@spec binary_to_string(binary()) :: String.t()

Transforms a binary into a readable string.

iex> ExCrawlzy.Utils.binary_to_string(<<115, 111, 109, 101, 32, 115, 116, 114, 105, 110, 103>>)
"some string"

@spec exist(String.t() | Floki.html_tree() | Floki.html_node()) :: boolean()

Returns whether an element exists.

Examples:

  iex> ExCrawlzy.Utils.exist([{"h1", [{"class", "some_class"}], ["My text inside a h1"]}])
  true

  iex> ExCrawlzy.Utils.exist([])
  false

@spec iframe(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

@spec img(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

@spec link(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

@spec props(String.t(), String.t() | Floki.html_tree() | Floki.html_node()) ::
  String.t()

Extracts the value of a given attribute (prop) from an HTML element.

For example, given a simple link such as <a href="http://site.example">My Link</a>, you can extract just the value of the href attribute.

Examples:

iex> ExCrawlzy.Utils.props("href", [{"a", [{"href", "http://site.example"}], []}])
"http://site.example"
iex> ExCrawlzy.Utils.props("target", [{"span", [{"target", "some_value"}], []}])
"some_value"

@spec text(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

Extracts the inner text of an HTML element. Works well for simple elements such as span, p, and h1, among others.

For example, given a simple heading such as <h1>My text inside a h1</h1>, you can extract the text inside the element.

Examples:

  iex> ExCrawlzy.Utils.text([{"h1", [{"class", "some_class"}], ["My text inside a h1"]}])
  "My text inside a h1"