Crawly.RequestsStorage.Worker (Crawly v0.13.0)

Requests Storage is a module responsible for storing requests for a given spider.

It automatically filters out already-seen requests, using request fingerprints to detect pages that have already been visited.
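The fingerprint idea can be illustrated with a minimal, self-contained sketch. This is not Crawly's implementation: the module `FingerprintFilterSketch`, its functions, and the choice of a SHA-256 hash over the URL are assumptions made purely for illustration.

```elixir
defmodule FingerprintFilterSketch do
  # Hypothetical helper (not part of Crawly): derives a fingerprint
  # from a URL by hashing it with SHA-256 and hex-encoding the digest.
  def fingerprint(url) when is_binary(url) do
    :crypto.hash(:sha256, url) |> Base.encode16(case: :lower)
  end

  # Returns {unseen_urls, updated_seen_set}.
  # URLs whose fingerprint is already in `seen` are dropped;
  # the order of the remaining URLs is preserved.
  def filter_unseen(urls, seen \\ MapSet.new()) do
    {kept, seen} =
      Enum.reduce(urls, {[], seen}, fn url, {kept, seen} ->
        fp = fingerprint(url)

        if MapSet.member?(seen, fp) do
          {kept, seen}
        else
          {[url | kept], MapSet.put(seen, fp)}
        end
      end)

    {Enum.reverse(kept), seen}
  end
end
```

Passing the returned set back into the next call keeps the filter consistent across batches, so a URL seen in an earlier batch is still rejected later.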

It pipes all requests through a list of middlewares, which pre-process each request before it is stored.
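A middleware chain of this kind can be sketched as a fold over a list of functions. The example below is a simplified illustration, not Crawly's own pipeline code: the module name, the two sample middlewares, the plain-map request, and the convention that a middleware returns `{request, state}` to continue or `{false, state}` to drop the request are all assumptions of this sketch.

```elixir
defmodule MiddlewarePipelineSketch do
  # Runs `request` through each middleware in order.
  # A middleware is a 2-arity function that returns {request, state}
  # to continue, or {false, state} to drop the request.
  # Processing stops at the first middleware that drops the request.
  def run(middlewares, request, state) do
    Enum.reduce_while(middlewares, {request, state}, fn middleware, {req, st} ->
      case middleware.(req, st) do
        {false, new_state} -> {:halt, {false, new_state}}
        {new_req, new_state} -> {:cont, {new_req, new_state}}
      end
    end)
  end

  # Hypothetical middleware: attaches a user agent to the request map.
  def add_user_agent(request, state) do
    {Map.put(request, :user_agent, "sketch-bot"), state}
  end

  # Hypothetical middleware: drops requests that leave the host in `state`.
  def same_host_only(request, state) do
    if URI.parse(request.url).host == state.host do
      {request, state}
    else
      {false, state}
    end
  end
end
```

A caller would pass the middlewares as captures, for example `[&MiddlewarePipelineSketch.add_user_agent/2, &MiddlewarePipelineSketch.same_host_only/2]`, and store the request only when the first element of the result is not `false`.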

Summary

Functions

child_spec/1: Returns a specification to start this module under a supervisor.

init/1: Callback implementation for GenServer.init/1.

pop/1: Pop a request out of requests storage

stats/1: Get statistics from the requests storage

store/2: Store a single request or a list of requests

Functions

Returns a specification to start this module under a supervisor.

See Supervisor.

Callback implementation for GenServer.init/1.

Specs

pop(pid()) :: Crawly.Request.t() | nil

Pop a request out of requests storage

start_link(spider_name, crawl_id)

Specs

stats(pid()) :: {:stored_requests, non_neg_integer()}

Get statistics from the requests storage

Specs

store(spider_name, requests) :: :ok
when spider_name: atom(), requests: [Crawly.Request.t()]
store(spider_name, request) :: :ok
when spider_name: atom(), request: Crawly.Request.t()

Store a single request or a list of requests
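To show how the store, pop, and stats operations fit together, here is a toy GenServer with the same three operations and the same return shapes as the specs above. It is a sketch only: `ToyRequestsStorage` is a hypothetical module, requests are plain maps rather than `Crawly.Request` structs, every function takes the process pid, and the first-in, first-out ordering is an assumption of this example rather than documented behaviour of the real worker. Filtering and middlewares are left out.

```elixir
defmodule ToyRequestsStorage do
  use GenServer

  # Client API (hypothetical, modelled loosely on the specs above)
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Accepts either a list of requests or a single request.
  def store(pid, requests) when is_list(requests), do: GenServer.call(pid, {:store, requests})
  def store(pid, request), do: store(pid, [request])

  def pop(pid), do: GenServer.call(pid, :pop)
  def stats(pid), do: GenServer.call(pid, :stats)

  # Server callbacks
  @impl true
  def init(:ok), do: {:ok, %{queue: :queue.new(), count: 0}}

  @impl true
  def handle_call({:store, requests}, _from, state) do
    # Append each request to the back of the queue.
    queue = Enum.reduce(requests, state.queue, fn req, q -> :queue.in(req, q) end)
    {:reply, :ok, %{state | queue: queue, count: state.count + length(requests)}}
  end

  def handle_call(:pop, _from, state) do
    case :queue.out(state.queue) do
      {{:value, request}, rest} ->
        {:reply, request, %{state | queue: rest, count: state.count - 1}}

      {:empty, _queue} ->
        # Nothing stored: reply with nil, as in the pop spec.
        {:reply, nil, state}
    end
  end

  def handle_call(:stats, _from, state) do
    {:reply, {:stored_requests, state.count}, state}
  end
end
```

In this sketch the counter reported by `stats/1` reflects the requests currently held, so it decreases as requests are popped.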