ExCrawlzy.Utils (ExCrawlzy v0.1.1)

Utilities used across the whole library.

They help with parsing data and extracting it from documents selected via CSS.

Summary

Functions

binary_to_string(data)

Transforms a binary into a readable string.

exist(crawled_data)

Returns whether an element exists.

iframe(html_element)

img(html_element)

link(html_element)

props(prop_keys, arg2)

Extracts specific data from an HTML element's props.

text(arg1)

Extracts the inner text of an HTML element. Works well for simple HTML elements such as span, p, and h1, among others.

Functions

binary_to_string(data)

@spec binary_to_string(binary()) :: String.t()

Transforms a binary into a readable string.

iex> ExCrawlzy.Utils.binary_to_string(<<115, 111, 109, 101, 32, 115, 116, 114, 105, 110, 103>>)
"some string"
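
The doctest above passes the string as raw bytes. As background, here is a minimal plain-Elixir sketch (no ExCrawlzy calls, and not a description of how the library implements the function) showing that a valid UTF-8 binary already is a string:

```elixir
# Plain Elixir only: the byte list from the doctest and its string form
# are the same binary.
bytes = <<115, 111, 109, 101, 32, 115, 116, 114, 105, 110, 103>>

# A valid UTF-8 binary compares equal to the matching string literal.
IO.inspect(bytes == "some string")      # => true
IO.inspect(String.valid?(bytes))        # => true

# Bytes that are not valid UTF-8 are not a printable string.
IO.inspect(String.valid?(<<255, 254>>)) # => false
```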

exist(crawled_data)

@spec exist(String.t() | Floki.html_tree() | Floki.html_node()) :: boolean()

Returns whether an element exists.

Examples:

  iex> ExCrawlzy.Utils.exist([{"h1", [{"class", "some_class"}], ["My text inside a h1"]}])
  true

  iex> ExCrawlzy.Utils.exist([])
  false

iframe(html_element)

@spec iframe(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

img(html_element)

@spec img(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

link(html_element)

@spec link(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

props(prop_keys, arg2)

@spec props(String.t(), String.t() | Floki.html_tree() | Floki.html_node()) ::
  String.t()

Extracts specific data from an HTML element's props.

For example, given a simple link <a href="http://site.example">My Link</a>, you can extract just the value of the href prop.

Examples:

iex> ExCrawlzy.Utils.props("href", [{"a", [{"href", "http://site.example"}], []}])
"http://site.example"
iex> ExCrawlzy.Utils.props("target", [{"span", [{"target", "some_value"}], []}])
"some_value"
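
To make the shape of the input concrete, the following is a rough sketch of how a prop lookup over such `{tag, attributes, children}` tuples could work in plain Elixir. `PropsSketch` and `prop/2` are hypothetical names used only for illustration; this is not ExCrawlzy's implementation, and returning `nil` for a missing prop is an assumption of the sketch, not documented library behaviour.

```elixir
defmodule PropsSketch do
  # Hypothetical helper, not part of ExCrawlzy: reads one attribute from the
  # first node of a list of {tag, attributes, children} tuples.
  def prop(key, [{_tag, attrs, _children} | _rest]) do
    case List.keyfind(attrs, key, 0) do
      {^key, value} -> value
      # Assumption of this sketch: a missing prop yields nil.
      nil -> nil
    end
  end

  # No nodes at all, so there is nothing to read.
  def prop(_key, []), do: nil
end

link = [{"a", [{"href", "http://site.example"}], []}]
IO.inspect(PropsSketch.prop("href", link))  # => "http://site.example"
IO.inspect(PropsSketch.prop("title", link)) # => nil
```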

text(arg1)

@spec text(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()

Extracts the inner text of an HTML element. Works well for simple HTML elements such as span, p, and h1, among others.

For example, given a simple heading <h1>My text inside a h1</h1>, you can extract the text inside the element.

Examples:

  iex> ExCrawlzy.Utils.text([{"h1", [{"class", "some_class"}], ["My text inside a h1"]}])
  "My text inside a h1"
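
As an illustration of what extracting inner text means for this tuple representation, here is a small plain-Elixir sketch. `TextSketch` and `inner_text/1` are hypothetical names, and the handling of nested elements and non-element nodes is an assumption of the sketch rather than documented ExCrawlzy behaviour.

```elixir
defmodule TextSketch do
  # Hypothetical helper, not ExCrawlzy's implementation: concatenates the
  # text found inside a list of {tag, attributes, children} tuples.
  def inner_text(nodes) when is_list(nodes) do
    nodes |> Enum.map(&inner_text/1) |> Enum.join()
  end

  # Text nodes are plain binaries.
  def inner_text(text) when is_binary(text), do: text

  # Element nodes contribute the text of their children.
  def inner_text({_tag, _attrs, children}), do: inner_text(children)

  # Any other node shape is ignored in this sketch.
  def inner_text(_other), do: ""
end

tree = [{"h1", [{"class", "some_class"}], ["My text inside a h1"]}]
IO.inspect(TextSketch.inner_text(tree)) # => "My text inside a h1"

nested = [{"p", [], ["Hello ", {"b", [], ["world"]}]}]
IO.inspect(TextSketch.inner_text(nested)) # => "Hello world"
```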
