ExCrawlzy.Utils (ExCrawlzy v0.1.1)
Utilities used across the whole library
to help parse data and extract it from HTML documents.
Summary
Functions
Transforms a binary into a readable string.
Returns whether an element exists.
Extracts specific data from an HTML element's props (attributes).
Extracts the inner text of an HTML element. Works well for simple HTML elements such as span, p, h1, and more.
Functions
Transforms a binary into a readable string.
iex> ExCrawlzy.Utils.binary_to_string(<<115, 111, 109, 101, 32, 115, 116, 114, 105, 110, 103>>)
"some string"
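The example above works because an Elixir string is itself a UTF-8 encoded binary. The following sketch uses only the standard library to illustrate that relationship; it does not show how binary_to_string is implemented, and the variable names are illustrative only.

```elixir
binary = <<115, 111, 109, 101, 32, 115, 116, 114, 105, 110, 103>>

# An Elixir string is a UTF-8 encoded binary, so both literals are the same value.
same? = binary == "some string"

# String.valid?/1 reports whether a binary is valid UTF-8.
printable? = String.valid?(binary)

# Bytes that are not valid UTF-8 are still a binary, but not a valid string.
raw_bytes? = not String.valid?(<<255, 254>>)

IO.inspect({same?, printable?, raw_bytes?})
# prints {true, true, true}
```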
@spec exist(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()
Returns whether an element exists.
Examples:
iex> ExCrawlzy.Utils.exist([{"h1", [{"class", "some_class"}], ["My text inside a h1"]}])
true
iex> ExCrawlzy.Utils.exist([])
false
@spec iframe(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()
@spec img(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()
@spec link(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()
@spec props(String.t(), String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()
Extracts specific data from an HTML element's props (attributes).
For example, given a simple link <a href="http://site.example">My Link</a>,
you can extract just the value of the href prop.
Examples:
iex> ExCrawlzy.Utils.props("href", [{"a", [{"href", "http://site.example"}], []}])
"http://site.example"
iex> ExCrawlzy.Utils.props("target", [{"span", [{"target", "some_value"}], []}])
"some_value"
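To make the shape of the input clearer, here is a minimal sketch of an attribute lookup over the {tag, attributes, children} tuples used in the examples above. PropsSketch and attr/2 are hypothetical names written for illustration; this is not the library's implementation, and it only inspects top-level nodes.

```elixir
defmodule PropsSketch do
  # Hypothetical helper, not part of ExCrawlzy: returns the value of `name`
  # on the first top-level node that carries it, or nil when none does.
  def attr(name, nodes) when is_list(nodes) do
    Enum.find_value(nodes, fn
      {_tag, attrs, _children} when is_list(attrs) ->
        case List.keyfind(attrs, name, 0) do
          {^name, value} -> value
          nil -> nil
        end

      # Text nodes and anything else carry no attributes.
      _other ->
        nil
    end)
  end
end

nodes = [{"a", [{"href", "http://site.example"}], ["My Link"]}]
IO.inspect(PropsSketch.attr("href", nodes))
# prints "http://site.example"
```

Returning nil for a missing attribute is a choice made for this sketch; the behaviour of props in that case is not stated here.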
@spec text(String.t() | Floki.html_tree() | Floki.html_node()) :: String.t()
Extracts the inner text of an HTML element. Works well for simple HTML elements such as span, p, h1, and more.
For example, given a simple heading <h1>My text inside a h1</h1>,
you can extract the text inside the element.
Examples:
iex> ExCrawlzy.Utils.text([{"h1", [{"class", "some_class"}], ["My text inside a h1"]}])
"My text inside a h1"
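As a rough illustration of what extracting inner text means on this tree shape, the sketch below walks the {tag, attributes, children} tuples depth-first and concatenates every text node. TextSketch is a hypothetical module written for this example; it is not the library's implementation, and whether text trims or separates nested fragments is not covered here.

```elixir
defmodule TextSketch do
  # Hypothetical helper, not part of ExCrawlzy: joins all text nodes in order.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)

  # An element contributes the text of its children.
  def text({_tag, _attrs, children}), do: text(children)

  # A text node is a plain binary.
  def text(binary) when is_binary(binary), do: binary

  # Anything else (for example comment nodes) contributes nothing.
  def text(_other), do: ""
end

tree = [{"h1", [{"class", "some_class"}], ["My text inside a h1"]}]
IO.inspect(TextSketch.text(tree))
# prints "My text inside a h1"
```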