ExCrawlzy (ExCrawlzy v0.1.1)

Documentation for ExCrawlzy.

Summary

Functions

crawl(link, clients \\ [])

Requests the link and returns the raw content.

parse(mapping, raw_content)

Extracts fields from the raw content according to the selector mapping.

Types

@type map_key() :: String.t() | atom()
@type post_processing() :: atom() | {module(), atom()} | (any() -> String.t())
@type result() :: :ok | :error
@type selector_tuple() :: {String.t(), post_processing()}

Functions

crawl(link, clients \\ [])

Requests the link and returns the raw content as `{:ok, content}`.

Examples

iex> ExCrawlzy.crawl("http://some.site")
{:ok, "<!doctype html><html>  <head>    <title>the title</title>  </head>  <body>    <div id=\"the_body\">      the body      <div id=\"inner_field\">        inner field      </div>      <div id=\"inner_second_field\">        inner second field        <div id=\"the_number\">          2023        </div>      </div>      <div id=\"exist\">        this field exist      </div>      <a class=\"link_class\" href=\"http://some_external.link\"></a>      <img class=\"img_class\" src=\"http://some_external.link/image_path.jpg\" alt=\"some image\">    </div>  </body></html>"}
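
A caller will usually want to branch on the result rather than assume success. The sketch below is one way to do that. `CrawlExample` and `fetch_size/1` are hypothetical names, not part of ExCrawlzy, and because this page does not document what `crawl/2` returns on failure, anything other than `{:ok, content}` is simply passed through as an error.

```elixir
defmodule CrawlExample do
  # Hypothetical helper, not part of ExCrawlzy.
  # Returns the size in bytes of the fetched page.
  def fetch_size(link) do
    case ExCrawlzy.crawl(link) do
      {:ok, raw_content} when is_binary(raw_content) ->
        {:ok, byte_size(raw_content)}

      # Assumption: the failure shape is not documented here,
      # so wrap whatever came back instead of matching on it.
      other ->
        {:error, other}
    end
  end
end
```

Calling `CrawlExample.fetch_size("http://some.site")` needs the ExCrawlzy dependency and a reachable site.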

parse(mapping, raw_content)

@spec parse(
  %{required(map_key()) => selector_tuple()},
  String.t() | Floki.html_tree() | Floki.html_node()
) :: {result(), %{required(map_key()) => String.t()}}

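Parses the raw content and returns one extracted value per key in the mapping. Each mapping value is a `selector_tuple()`: a CSS selector paired with the post-processing step applied to the matched content.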

Examples

iex> raw_content = "<html><head><title>the title</title></head><body><div id=\"the_body\">the body</div></body></html>"
iex> ExCrawlzy.parse(%{body: {"#the_body", :text}}, raw_content)
{:ok, %{body: "the body"}}
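
The two functions are likely to be used together: fetch a page, then pull several fields out of it. The following is a minimal sketch of that flow, not a definitive implementation. `PageFields` and `extract/1` are hypothetical names, the selectors assume the sample page shown in the `crawl/2` example, and only the `:text` post-processing step from the example above is used.

```elixir
defmodule PageFields do
  # Hypothetical wrapper, not part of ExCrawlzy.
  # Selectors assume the sample page from the crawl/2 example.
  @mapping %{
    title: {"title", :text},
    inner_field: {"#inner_field", :text}
  }

  def extract(link) do
    with {:ok, raw_content} <- ExCrawlzy.crawl(link),
         {:ok, fields} <- ExCrawlzy.parse(@mapping, raw_content) do
      {:ok, fields}
    else
      # Assumption: failure shapes are not documented here,
      # so any non-matching result is wrapped unchanged.
      other -> {:error, other}
    end
  end
end
```

Keeping the mapping in a module attribute puts the selectors in one place, so a change to the page layout only touches `@mapping`.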