Crawly.RequestsStorage (Crawly v0.17.2)
Requests storage, a module responsible for storing URLs (requests) for crawling
┌──────────────────┐
│                  │        ┌------------------┐
│  RequestsStorage <────────┤ From crawlers1,2 │
│                  │        └------------------┘
└─────────┬────────┘
          │
          ├──────────────────────────┐
          │                          │
┌─────────▼────────────┐  ┌──────────▼───────────┐
│RequestsStorageWorker1│  │RequestsStorageWorker2│
│      (Crawler1)      │  │      (Crawler2)      │
└──────────────────────┘  └──────────────────────┘
All requests go through a single RequestsStorage process, which quickly finds the appropriate worker, which then stores the request.
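For illustration, assuming a spider module named MySpider (a hypothetical name) whose storage worker has already been started, the flow might look like:

    iex> request = Crawly.Request.new("https://www.example.com")
    iex> Crawly.RequestsStorage.store(MySpider, request)
    :ok
    iex> Crawly.RequestsStorage.pop(MySpider)
    %Crawly.Request{url: "https://www.example.com", ...}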
Summary
Functions
Returns a specification to start this module under a supervisor.
Callback implementation for GenServer.init/1.
Pop a request out of requests storage
Starts a worker for a given spider
Get statistics from the requests storage
Store individual request or multiple requests in related child worker
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
Callback implementation for GenServer.init/1.
Specs
pop(Crawly.spider()) :: nil | Crawly.Request.t() | {:error, :storage_worker_not_running}
Pop a request out of requests storage
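A minimal sketch (the spider name MySpider is hypothetical); nil is returned when the storage is empty, and {:error, :storage_worker_not_running} when no worker was started for the spider:

    iex> Crawly.RequestsStorage.pop(MySpider)
    %Crawly.Request{url: "https://www.example.com", ...}
    iex> Crawly.RequestsStorage.pop(MySpider)
    nil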
Specs
requests(atom()) :: {:requests, [Crawly.Request.t()]} | {:error, :spider_not_running}
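Judging from the spec, this returns the requests currently held for the given spider; a usage sketch under that assumption (MySpider is a hypothetical spider name):

    iex> Crawly.RequestsStorage.requests(MySpider)
    {:requests, [%Crawly.Request{url: "https://www.example.com", ...}]}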
Specs
start_worker(Crawly.spider(), crawl_id :: String.t()) :: {:ok, pid()} | {:error, :already_started}
Starts a worker for a given spider
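A sketch of starting a worker (MySpider and the crawl_id value "my-crawl-id" are arbitrary example values); starting it twice for the same spider yields the error tuple from the spec:

    iex> Crawly.RequestsStorage.start_worker(MySpider, "my-crawl-id")
    {:ok, #PID<0.123.0>}
    iex> Crawly.RequestsStorage.start_worker(MySpider, "my-crawl-id")
    {:error, :already_started}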
Specs
stats(Crawly.spider()) :: {:stored_requests, non_neg_integer()} | {:error, :storage_worker_not_running}
Get statistics from the requests storage
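For example (MySpider is a hypothetical spider with a running storage worker; the count 10 is illustrative):

    iex> Crawly.RequestsStorage.stats(MySpider)
    {:stored_requests, 10}
    iex> Crawly.RequestsStorage.stats(NotStartedSpider)
    {:error, :storage_worker_not_running}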
Specs
store(Crawly.spider(), Crawly.Request.t() | [Crawly.Request.t()]) :: :ok | {:error, :storage_worker_not_running}
Store individual request or multiple requests in related child worker
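A sketch of storing a single request and a list of requests (MySpider and the URLs are hypothetical example values):

    iex> Crawly.RequestsStorage.store(MySpider, Crawly.Request.new("https://www.example.com/1"))
    :ok
    iex> urls = ["https://www.example.com/2", "https://www.example.com/3"]
    iex> Crawly.RequestsStorage.store(MySpider, Enum.map(urls, &Crawly.Request.new/1))
    :ok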