crawlie v1.0.0 Crawlie.ParserLogic behaviour

Summary

Callbacks

extract_data(arg0, parsed, options)
Extracts the final data from the parsed page

extract_uris(arg0, parsed, options)
Extracts the URIs to be crawled subsequently

parse(arg0, options)
Parses the retrieved page into user-defined data

Types

parse_result()
parse_result() ::
  {:ok, parsed} |
  {:error, term} |
  :skip |
  {:skip, reason :: atom}
parsed()
parsed() :: term
result()
result() :: term

Callbacks

extract_data(arg0, parsed, options)
extract_data(Crawlie.Response.t, parsed, options :: Keyword.t) :: [result]

Extracts the final data from the parsed page.

Note that this callback should return a list. The list can contain zero, one or many items, and each of them is put into the Flow.t/0 returned by Crawlie.crawl/3, similar to how Enum.flat_map/2 works.
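A minimal sketch of the flat_map-like behaviour described above. The module name `MyParser.ExtractData` is hypothetical, and the parsed value is assumed to be a list of `{title, price}` tuples produced by an earlier parse/2 call; the response argument is ignored, so `nil` is passed in the usage lines.

```elixir
defmodule MyParser.ExtractData do
  # Hypothetical extract_data/3: `parsed` is assumed to be a list of
  # {title, price} tuples. Items without a price are dropped, so a single
  # page can yield zero, one or many results.
  def extract_data(_response, parsed, _options) do
    for {title, price} <- parsed, price != nil, do: %{title: title, price: price}
  end
end

# Because every call returns a list, results from several pages are
# concatenated, much like Enum.flat_map/2 does:
pages = [
  [{"Book", 10}, {"Pen", nil}],
  [],
  [{"Lamp", 25}, {"Desk", 120}]
]

results = Enum.flat_map(pages, &MyParser.ExtractData.extract_data(nil, &1, []))
# results == [%{title: "Book", price: 10}, %{title: "Lamp", price: 25}, %{title: "Desk", price: 120}]
```

The empty second page contributes nothing and the first page contributes only one item, which is why returning a list (rather than a single value) is the natural contract here.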

extract_uris(arg0, parsed, options)
extract_uris(Crawlie.Response.t, parsed, options :: Keyword.t) :: [URI.t | String.t]

Extracts the URIs to be crawled subsequently. Each element can be either a URI.t or a String.t.
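A rough sketch of an extract_uris/3 implementation that pulls `href` attributes out of the page with a regular expression and resolves them against the page address. The module name is hypothetical, and the response is assumed to expose the page address as `response.uri` (a URI struct) and the raw HTML as `response.body`; check Crawlie.Response for the actual field names. A plain map with those keys stands in for the response below.

```elixir
defmodule MyParser.ExtractUris do
  # Hypothetical extract_uris/3. Assumes `response.uri` is a %URI{} and
  # `response.body` is the raw HTML string.
  def extract_uris(response, _parsed, _options) do
    ~r/href="([^"]+)"/
    |> Regex.scan(response.body, capture: :all_but_first)
    |> List.flatten()
    # Resolve relative links against the page address; URI.merge/2 returns a URI struct.
    |> Enum.map(&URI.merge(response.uri, &1))
    # Only follow http(s) links that stay on the same host.
    |> Enum.filter(&(&1.scheme in ["http", "https"] and &1.host == response.uri.host))
  end
end

response = %{
  uri: URI.parse("https://example.com/blog/"),
  body: ~s(<a href="/about">About</a> <a href="post-1">Post</a> <a href="https://other.org/">x</a>)
}

uris = MyParser.ExtractUris.extract_uris(response, nil, [])
Enum.map(uris, &URI.to_string/1)
# ["https://example.com/about", "https://example.com/blog/post-1"]
```

Returning URI structs here is a choice, not a requirement: since the spec also accepts String.t, the raw `href` strings could be returned directly if no filtering or resolution is needed. A real crawler would more likely use an HTML parser than a regex.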

parse(arg0, options)
parse(Crawlie.Response.t, options :: Keyword.t) :: parse_result

Parses the retrieved page into user-defined data.

The resulting parsed/0 value is passed on to extract_data/3 and extract_uris/3, along with the original Crawlie.Response.t/0.

Returning :skip or {:skip, reason} excludes the page from further processing without signalling an error. This is useful for omitting pages with unsupported or uninteresting content types.
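A minimal sketch of a parse/2 callback that uses each shape of parse_result/0: it skips non-HTML pages, reports an error for an empty body, and otherwise returns the text of every `<h2>` element as the parsed value. The module name and the `:unsupported_content_type` / `:empty_body` atoms are hypothetical, and the response is assumed to expose `content_type` and `body` fields; a plain map stands in for it in the usage lines.

```elixir
defmodule MyParser.Parse do
  # Hypothetical parse/2. Assumes `response.content_type` holds the
  # Content-Type header value and `response.body` the raw page.
  def parse(response, _options) do
    cond do
      not html?(response.content_type) ->
        # Not an error: the page is simply dropped from further processing.
        {:skip, :unsupported_content_type}

      response.body == "" ->
        {:error, :empty_body}

      true ->
        # Illustrative "parsing": collect the text of every <h2> element.
        titles =
          ~r{<h2>(.*?)</h2>}s
          |> Regex.scan(response.body, capture: :all_but_first)
          |> List.flatten()

        {:ok, titles}
    end
  end

  defp html?(content_type) when is_binary(content_type),
    do: String.starts_with?(content_type, "text/html")

  defp html?(_), do: false
end

MyParser.Parse.parse(%{content_type: "image/png", body: "..."}, [])
# {:skip, :unsupported_content_type}
```

Whatever is wrapped in `{:ok, ...}` (here the list of titles) becomes the `parsed` argument of the other two callbacks.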

If you don’t need to transform the received Crawlie.Response.t/0, you can use the default implementation or return {:ok, :this_can_be_whatever}.
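A sketch of the "no transformation" case: parse/2 returns a placeholder, and the other two callbacks read what they need straight from the response. The module name is hypothetical, and the response is assumed to have `uri` (a URI struct) and `status_code` fields. In a project that depends on crawlie the module would also declare `@behaviour Crawlie.ParserLogic`; that line is commented out so the sketch runs on its own.

```elixir
defmodule MyParser.Passthrough do
  # @behaviour Crawlie.ParserLogic  # enable inside a project that depends on crawlie

  # No real parsing: the placeholder is passed on as `parsed` and ignored.
  def parse(_response, _options), do: {:ok, :this_can_be_whatever}

  # Emit one {url, status} record per page, read directly from the response
  # (assumes `uri` and `status_code` fields).
  def extract_data(response, _parsed, _options) do
    [{URI.to_string(response.uri), response.status_code}]
  end

  # Do not follow any links from this page.
  def extract_uris(_response, _parsed, _options), do: []
end

response = %{uri: URI.parse("https://example.com/"), status_code: 200}
{:ok, parsed} = MyParser.Passthrough.parse(response, [])
MyParser.Passthrough.extract_data(response, parsed, [])
# [{"https://example.com/", 200}]
```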