Summary

Types

parse_result()

parsed()

result()

Callbacks

extract_data(arg0, parsed, options)

Extracts the final data from the parsed page

extract_uris(arg0, parsed, options)

Extracts the URIs to be crawled subsequently

parse(arg0, options)

Parses the retrieved page into user-defined data

Types

parse_result()

parse_result ::
  {:ok, parsed} |
  {:error, term} |
  :skip |
  {:skip, reason :: atom}

parsed()

parsed() :: term

result()

result() :: term

Callbacks

extract_data(arg0, parsed, options)

extract_data(Crawlie.Response.t, parsed, options :: Keyword.t) :: [result]

Extracts the final data from the parsed page.

Note that this callback should return a list: you can return zero, one, or many items, which will be put into the Flow.t/0 returned by Crawlie.crawl/3, similar to Enum.flat_map/2.
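A minimal sketch of the list-returning contract, assuming a hypothetical parse/2 that produced a list of words as `parsed`. The module name `WordCountParser` is invented for illustration, and `@behaviour Crawlie.ParserLogic` is left out so the snippet runs without the crawlie dependency. The last lines only mimic the flattening with Enum.flat_map/2; they do not call Crawlie.crawl/3.

```elixir
defmodule WordCountParser do
  # A real implementation would declare `@behaviour Crawlie.ParserLogic`;
  # it is omitted here so the sketch compiles without the crawlie dependency.

  # `parsed` is assumed to be a list of words produced by a hypothetical parse/2.
  # Returns every word that occurs more than once on the page, so a single page
  # can contribute zero, one or many results.
  def extract_data(_response, parsed, _options) do
    parsed
    |> Enum.frequencies()
    |> Enum.filter(fn {_word, count} -> count > 1 end)
    |> Enum.sort()
  end
end

# Mimic how per-page result lists are flattened into one stream,
# like Enum.flat_map/2. An empty map stands in for the response.
pages = [["a", "b", "a"], ["c"], ["d", "d", "e", "e"]]
results = Enum.flat_map(pages, &WordCountParser.extract_data(%{}, &1, []))
IO.inspect(results)
# prints [{"a", 2}, {"d", 2}, {"e", 2}]
```

The middle page contributes nothing because it returns an empty list, which is how a page can be "filtered out" at this stage without skipping it in parse/2.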

extract_uris(arg0, parsed, options)

extract_uris(Crawlie.Response.t, parsed, options :: Keyword.t) :: [URI.t | String.t]

Extracts the URIs to be crawled subsequently.
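A sketch of one way to implement this callback, assuming `parsed` is a list of raw `href` strings collected by a hypothetical parse/2. A plain map with a `:uri` key stands in for Crawlie.Response.t; that key is an assumption, not the documented struct. Relative links are resolved with the standard library's URI.merge/2, and only links on the same host are kept.

```elixir
defmodule LinkParser do
  # `response` is a plain map standing in for Crawlie.Response.t; the `:uri`
  # key is an assumption made for this sketch.
  # `parsed` is assumed to be a list of raw href strings found on the page.
  def extract_uris(%{uri: page_uri}, parsed, _options) do
    base = URI.parse(page_uri)

    parsed
    # resolve relative links against the page URI
    |> Enum.map(&URI.merge(base, &1))
    # stay on the same host as the crawled page
    |> Enum.filter(&(&1.host == base.host))
    # return strings; the spec allows URI.t or String.t
    |> Enum.map(&URI.to_string/1)
    |> Enum.uniq()
  end
end

response = %{uri: "https://example.com/blog/"}
hrefs = ["post-1", "/about", "https://other.org/x", "post-1"]
IO.inspect(LinkParser.extract_uris(response, hrefs, []))
# prints ["https://example.com/blog/post-1", "https://example.com/about"]
```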

parse(arg0, options)

parse(Crawlie.Response.t, options :: Keyword.t) :: parse_result

Parses the retrieved page into user-defined data.

The parsed/0 value is passed on to the subsequent callbacks, extract_uris/3 and extract_data/3, along with the original Crawlie.Response.t/0.
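To illustrate that data flow, here is a hypothetical driver. It is not Crawlie's implementation; `PipelineSketch`, `EchoLogic` and the `:skipped` return value are invented names. It only shows the value returned by parse/2 being handed, together with the unchanged response, to both extract callbacks. A plain map with `:uri` and `:body` keys stands in for Crawlie.Response.t (an assumption).

```elixir
defmodule PipelineSketch do
  # Hypothetical driver, NOT Crawlie's internals: `logic` is any module
  # exposing parse/2, extract_uris/3 and extract_data/3.
  def process(logic, response, options) do
    case logic.parse(response, options) do
      {:ok, parsed} ->
        # the same `parsed` value and the original response go to both callbacks
        {logic.extract_uris(response, parsed, options),
         logic.extract_data(response, parsed, options)}

      {:error, reason} ->
        {:error, reason}

      # both skip forms end processing for this page without an error
      :skip ->
        :skipped

      {:skip, _reason} ->
        :skipped
    end
  end
end

defmodule EchoLogic do
  # Toy callbacks; `response` is a plain map with assumed :uri and :body keys.
  def parse(%{body: body}, _options), do: {:ok, String.upcase(body)}
  def extract_uris(%{uri: uri}, _parsed, _options), do: [uri <> "next"]
  def extract_data(_response, parsed, _options), do: [parsed]
end

IO.inspect(PipelineSketch.process(EchoLogic, %{uri: "https://example.com/", body: "hi"}, []))
# prints {["https://example.com/next"], ["HI"]}
```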

Returning :skip or {:skip, reason} excludes the page from further processing without signalling an error. This can be used to omit pages with unsupported or uninteresting content types.

If you don’t need to transform the received Crawlie.Response.t/0, you can use the default implementation or return {:ok, :this_can_be_whatever}.
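A sketch of a parse/2 that covers every parse_result variant, skipping pages by content type. `TextOnlyParser` is a hypothetical name, and the plain map with `:content_type` and `:body` keys is an assumed stand-in for Crawlie.Response.t; a real implementation would read the content type from the actual response instead. Splitting the body into words is a placeholder for real parsing.

```elixir
defmodule TextOnlyParser do
  # Sketch only: `response` is a plain map standing in for Crawlie.Response.t,
  # and its :content_type and :body keys are assumptions, not the real struct.

  # HTML with an empty body is reported as an error.
  def parse(%{content_type: "text/html" <> _, body: ""}, _options),
    do: {:error, :empty_body}

  # HTML is "parsed" by splitting the body into words (a placeholder transform).
  def parse(%{content_type: "text/html" <> _, body: body}, _options),
    do: {:ok, String.split(body)}

  # Images are skipped with a reason atom, matching {:skip, reason :: atom}.
  def parse(%{content_type: "image/" <> _}, _options), do: {:skip, :image}

  # Every other content type is skipped without a reason.
  def parse(_response, _options), do: :skip
end

IO.inspect(TextOnlyParser.parse(%{content_type: "text/html; charset=utf-8", body: "hello crawl"}, []))
# prints {:ok, ["hello", "crawl"]}
IO.inspect(TextOnlyParser.parse(%{content_type: "image/png", body: ""}, []))
# prints {:skip, :image}
```

The binary prefix match (`"text/html" <> _`) also accepts content types that carry parameters such as `; charset=utf-8`.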


crawlie v1.0.0 Crawlie.ParserLogic behaviour
