crawlie v1.0.0 Crawlie.ParserLogic behaviour
Summary
Callbacks
Extracts the final data from the parsed page
Extracts the URIs to be crawled next
Parses the retrieved page into user-defined data
Types
Callbacks
extract_data(Crawlie.Response.t, parsed, options :: Keyword.t) :: [result]
Extracts the final data from the parsed page.
Note that this callback should return a list: you can return zero, one or many items, which will be put in the Flow.t/0 returned by Crawlie.crawl/3, similar to Enum.flat_map/2.
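A minimal sketch of the zero/one/many contract. It assumes that a hypothetical parse step has already turned each page into a list of words (the parsed value is user-defined, so that shape is our own choice), and the module name LongWordsLogic is hypothetical. Only extract_data/3 is shown. The final Enum.flat_map/2 call only mimics how the per-page lists end up flattened into one stream; it is not Crawlie's internal code.

```elixir
defmodule LongWordsLogic do
  # ASSUMPTION: `parsed` is a list of words produced by our own parse step.
  # The response and options are not needed here, so they are ignored.
  def extract_data(_response, parsed, _options) do
    # Each kept word becomes one result item, so a page may yield
    # zero, one or many items.
    Enum.filter(parsed, fn word -> String.length(word) > 4 end)
  end
end

# Mimic the flattening: every page's list is concatenated, just as
# Enum.flat_map/2 concatenates the lists returned by its mapper.
# `nil` stands in for the Crawlie.Response.t, which this callback ignores.
pages = [["crawler", "web", "elixir"], ["a", "tiny", "page"], ["behaviour"]]

Enum.flat_map(pages, &LongWordsLogic.extract_data(nil, &1, []))
# => ["crawler", "elixir", "behaviour"]
```

Here the second page contributes no items at all, which is a valid result and not an error.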
extract_uris(Crawlie.Response.t, parsed, options :: Keyword.t) :: [URI.t | String.t]
Extracts the URIs to be crawled next.
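A sketch of extract_uris/3, assuming (hypothetically) that our own parse step stored the page's raw href strings as the parsed value. The module name LinkLogic is illustrative. The callback returns URI structs, which the [URI.t | String.t] return type allows. Plain strings would work just as well. To keep the example self-contained, it drops relative links instead of resolving them against the page's address.

```elixir
defmodule LinkLogic do
  # ASSUMPTION: `parsed` is a list of raw href strings collected by our own
  # parse step; the Crawlie.Response.t is not read here.
  def extract_uris(_response, parsed, _options) do
    parsed
    |> Enum.map(&URI.parse/1)
    # Follow only absolute http(s) links; relative and mailto: links are dropped.
    |> Enum.filter(fn uri -> uri.scheme in ["http", "https"] and uri.host != nil end)
  end
end

links = ["https://example.com/a", "/about", "mailto:me@example.com"]
LinkLogic.extract_uris(nil, links, []) |> Enum.map(&URI.to_string/1)
# => ["https://example.com/a"]
```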
Parses the retrieved page into user-defined data.
The resulting parsed/0 value is passed on to subsequent operations (extract_uris/3 and extract_data/3) along with the original Crawlie.Response.t/0.
Returning :skip or {:skip, reason} excludes the page from further processing without signalling an error. Use this to omit pages with unsupported or uninteresting content types.
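A sketch of a parse callback that skips non-HTML pages. The exact fields of Crawlie.Response.t are not documented on this page, so the sketch treats the response as a map with :content_type and :body keys. That is an assumption: check the Crawlie.Response documentation for the real field names before reusing it. The module name HtmlOnlyLogic and the reason tuple are hypothetical.

```elixir
defmodule HtmlOnlyLogic do
  # ASSUMPTION: the response exposes `content_type` and `body`; the real
  # Crawlie.Response.t field names may differ.
  def parse(%{content_type: content_type, body: body}, _options) do
    if String.starts_with?(content_type || "", "text/html") do
      # The parsed value is user-defined; here it is just the list of words.
      {:ok, String.split(body)}
    else
      # Not an error: the page is simply not interesting to us.
      {:skip, {:unsupported_content_type, content_type}}
    end
  end
end

HtmlOnlyLogic.parse(%{content_type: "image/png", body: <<137, 80>>}, [])
# => {:skip, {:unsupported_content_type, "image/png"}}
```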
If you don’t need to transform the received Crawlie.Response.t/0, you can use the default implementation or return {:ok, :this_can_be_whatever}.
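A sketch of a complete pass-through logic module (the name RawResponseLogic is hypothetical). parse/2 returns a placeholder that the other callbacks ignore. extract_data/3 emits the untouched response as the page's single result, and extract_uris/3 returns an empty list, so no further pages are discovered from any page. Outside a project that depends on crawlie, the @behaviour line only triggers a compiler warning.

```elixir
defmodule RawResponseLogic do
  @behaviour Crawlie.ParserLogic

  # No transformation needed: the parsed value is just a placeholder.
  def parse(_response, _options), do: {:ok, :this_can_be_whatever}

  # Do not follow any links from the page.
  def extract_uris(_response, _parsed, _options), do: []

  # Emit the untouched response itself as the single result for this page.
  def extract_data(response, _parsed, _options), do: [response]
end
```

Because extract_data/3 only needs the original response, the placeholder returned by parse/2 never has to mean anything.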