Crawly.RequestsStorage.Worker (Crawly v0.17.2)

Requests Storage is the module responsible for storing requests for a given spider.

It automatically filters out requests that have already been seen, using request fingerprints to detect already visited pages.

It pipes all requests through a list of middlewares, which pre-process each request before it is stored.
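The two behaviours described above (middleware pre-processing and fingerprint-based filtering) can be illustrated with a minimal, self-contained sketch. It is not Crawly's implementation: `DedupSketch`, the plain-map requests, the `{:ok, request}` / `:drop` middleware contract, and the use of `:erlang.phash2/1` as a fingerprint are all assumptions made for illustration.

```elixir
defmodule DedupSketch do
  # Hypothetical fingerprint: a hash of the request URL.
  def fingerprint(%{url: url}), do: :erlang.phash2(url)

  # Run the request through each middleware in order.
  # A middleware returns {:ok, request} to continue or :drop to discard it.
  def run_middlewares(request, middlewares) do
    Enum.reduce_while(middlewares, {:ok, request}, fn middleware, {:ok, req} ->
      case middleware.(req) do
        {:ok, new_req} -> {:cont, {:ok, new_req}}
        :drop -> {:halt, :drop}
      end
    end)
  end

  # Store the request unless a middleware dropped it or its
  # fingerprint has already been seen.
  def store(%{seen: seen, queue: queue} = state, request, middlewares) do
    with {:ok, req} <- run_middlewares(request, middlewares),
         fp = fingerprint(req),
         false <- MapSet.member?(seen, fp) do
      %{state | seen: MapSet.put(seen, fp), queue: queue ++ [req]}
    else
      _ -> state
    end
  end
end

# Two illustrative middlewares: normalise trailing slashes, skip PDFs.
normalize = fn req -> {:ok, %{req | url: String.trim_trailing(req.url, "/")}} end
skip_pdf = fn req -> if String.ends_with?(req.url, ".pdf"), do: :drop, else: {:ok, req} end

incoming = [
  %{url: "https://example.com/a"},
  %{url: "https://example.com/a/"},
  %{url: "https://example.com/b.pdf"}
]

state =
  Enum.reduce(incoming, %{seen: MapSet.new(), queue: []}, fn req, acc ->
    DedupSketch.store(acc, req, [normalize, skip_pdf])
  end)

IO.inspect(Enum.map(state.queue, & &1.url))
# prints ["https://example.com/a"]
```

Only the first request is kept: the second normalises to the same URL and therefore the same fingerprint, and the third is dropped by a middleware before it reaches the storage.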

Summary

Functions

Returns a specification to start this module under a supervisor.

Callback implementation for GenServer.init/1.

Pop a request out of requests storage

Returns all scheduled requests (useful for previewing what is queued)

Get statistics from the requests storage

Store individual request or multiple requests

Functions

Returns a specification to start this module under a supervisor.

See Supervisor.
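As a rough illustration of what a child specification looks like for a worker started with two arguments, the sketch below supervises a stand-in module. `WorkerSketch`, `MySpider`, and the `"crawl-1"` id are hypothetical names; the child specification actually returned by this module is not shown on this page, so the map below is an assumption about its general shape, not a copy of it.

```elixir
defmodule WorkerSketch do
  use GenServer

  # Mirrors the two-argument start_link(spider_name, crawl_id) shape.
  def start_link(spider_name, crawl_id) do
    GenServer.start_link(__MODULE__, [spider_name, crawl_id])
  end

  @impl true
  def init([spider_name, crawl_id]) do
    {:ok, %{spider_name: spider_name, crawl_id: crawl_id}}
  end
end

# An explicit child spec map: the supervisor calls
# WorkerSketch.start_link(MySpider, "crawl-1") to start the child.
child = %{
  id: WorkerSketch,
  start: {WorkerSketch, :start_link, [MySpider, "crawl-1"]}
}

{:ok, sup} = Supervisor.start_link([child], strategy: :one_for_one)
[{WorkerSketch, pid, :worker, [WorkerSketch]}] = Supervisor.which_children(sup)
```

An explicit map is used here because the `child_spec/1` generated by `use GenServer` calls `start_link/1` with a single argument, which does not match a two-argument `start_link`.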

Callback implementation for GenServer.init/1.

Specs

pop(pid()) :: Crawly.Request.t() | nil

Pops a request out of the requests storage. Per the spec, the result is either a Crawly.Request.t() or nil.

Specs

requests(pid()) :: {:requests, [Crawly.Request.t()]}

Returns all scheduled requests, wrapped in a {:requests, list} tuple (useful for previewing what is queued).

start_link(spider_name, crawl_id)

Specs

stats(pid()) :: {:stored_requests, non_neg_integer()}

Gets statistics from the requests storage, returned as a {:stored_requests, count} tuple.

Specs

Stores an individual request or multiple requests.
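
The return shapes given in the specs above can be seen together in a small stand-in GenServer. This is a sketch under stated assumptions, not the real worker: `StorageSketch` is a hypothetical module, requests are plain maps rather than `Crawly.Request` structs, the `store/2` signature is assumed (its spec is not shown on this page), middlewares and fingerprint filtering are left out, and the count reported by `stats/1` is assumed to be the number of requests currently queued.

```elixir
defmodule StorageSketch do
  use GenServer

  def start_link(), do: GenServer.start_link(__MODULE__, [])

  # Assumed signature: accepts either a list of requests or a single request.
  def store(pid, requests) when is_list(requests), do: GenServer.call(pid, {:store, requests})
  def store(pid, request), do: store(pid, [request])

  def pop(pid), do: GenServer.call(pid, :pop)
  def requests(pid), do: GenServer.call(pid, :requests)
  def stats(pid), do: GenServer.call(pid, :stats)

  @impl true
  def init(_), do: {:ok, %{queue: []}}

  @impl true
  def handle_call({:store, reqs}, _from, state) do
    {:reply, :ok, %{state | queue: state.queue ++ reqs}}
  end

  # An empty storage yields nil, matching the `| nil` branch of the pop spec.
  def handle_call(:pop, _from, %{queue: []} = state), do: {:reply, nil, state}

  def handle_call(:pop, _from, %{queue: [req | rest]} = state) do
    {:reply, req, %{state | queue: rest}}
  end

  def handle_call(:requests, _from, state) do
    {:reply, {:requests, state.queue}, state}
  end

  # Assumption: the statistic is the number of requests currently queued.
  def handle_call(:stats, _from, state) do
    {:reply, {:stored_requests, length(state.queue)}, state}
  end
end

{:ok, storage} = StorageSketch.start_link()
:ok = StorageSketch.store(storage, [%{url: "https://example.com/1"}, %{url: "https://example.com/2"}])

stats = StorageSketch.stats(storage)        # {:stored_requests, 2}
first = StorageSketch.pop(storage)          # %{url: "https://example.com/1"}
remaining = StorageSketch.requests(storage) # {:requests, [%{url: "https://example.com/2"}]}
```

Calling `pop/1` once more returns the second request, and a further call returns `nil` because the sketch's queue is then empty.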