Crawly.RequestsStorage (Crawly v0.17.0)

Request storage: a module responsible for storing the URLs to be crawled.

           
                 ┌─────────────────┐
                 │ RequestsStorage │◄──── From crawlers 1, 2
                 └────────┬────────┘
                          │
             ┌────────────┴────────────┐
             │                         │
 ┌───────────▼──────────┐  ┌───────────▼──────────┐
 │RequestsStorageWorker1│  │RequestsStorageWorker2│
 │      (Crawler1)      │  │      (Crawler2)      │
 └──────────────────────┘  └──────────────────────┘

All requests go through a single RequestsStorage process, which quickly looks up the worker responsible for the given crawler; that worker then stores the request.
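The routing idea described above can be illustrated with a small, self-contained sketch. This is not Crawly's implementation: the module name `StorageRouterSketch` is hypothetical, each worker is modeled as a plain `Agent` holding a list, and the first-in, first-out ordering is an assumption made only for the example. The return values mirror the ones listed in the specs on this page.

```elixir
defmodule StorageRouterSketch do
  # Hypothetical illustration of "one front process, one worker per spider".
  # Not Crawly's actual code.
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  # Starts one worker (here: an Agent holding a list) for the given spider.
  def start_worker(spider), do: GenServer.call(__MODULE__, {:start_worker, spider})

  # Routes the request to the worker registered for the spider.
  def store(spider, request), do: GenServer.call(__MODULE__, {:store, spider, request})

  # Takes the oldest stored request from the spider's worker, or nil if empty.
  def pop(spider), do: GenServer.call(__MODULE__, {:pop, spider})

  @impl true
  def init(workers), do: {:ok, workers}

  @impl true
  def handle_call({:start_worker, spider}, _from, workers) do
    case Map.fetch(workers, spider) do
      {:ok, _pid} ->
        {:reply, {:error, :already_started}, workers}

      :error ->
        {:ok, pid} = Agent.start_link(fn -> [] end)
        {:reply, {:ok, pid}, Map.put(workers, spider, pid)}
    end
  end

  def handle_call({:store, spider, request}, _from, workers) do
    reply =
      case Map.fetch(workers, spider) do
        {:ok, pid} -> Agent.update(pid, fn requests -> requests ++ [request] end)
        :error -> {:error, :storage_worker_not_running}
      end

    {:reply, reply, workers}
  end

  def handle_call({:pop, spider}, _from, workers) do
    reply =
      case Map.fetch(workers, spider) do
        {:ok, pid} ->
          Agent.get_and_update(pid, fn
            [] -> {nil, []}
            [head | tail] -> {head, tail}
          end)

        :error ->
          {:error, :storage_worker_not_running}
      end

    {:reply, reply, workers}
  end
end
```

The front process only keeps the spider-to-worker map, so a lookup is a single `Map.fetch/2`; the stored requests themselves live in the per-spider workers.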

Summary

Functions

child_spec/1
Returns a specification to start this module under a supervisor.

init/1
Callback implementation for GenServer.init/1.

pop/1
Pop a request out of requests storage

requests/1

start_worker/2
Starts a worker for a given spider

stats/1
Get statistics from the requests storage

store/2
Store an individual request or multiple requests in the related child worker

Functions

child_spec/1

Returns a specification to start this module under a supervisor.

See Supervisor.

init/1

Callback implementation for GenServer.init/1.

pop/1

Specs

pop(Crawly.spider()) ::
  nil | Crawly.Request.t() | {:error, :storage_worker_not_running}

Pop a request out of requests storage
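The spec above lists three possible results, and a caller usually wants one branch per result. The following is a minimal sketch: `PopResultSketch` and the atoms it returns are hypothetical, and matching on `url` assumes that a `Crawly.Request` carries a `url` field.

```elixir
defmodule PopResultSketch do
  # Hypothetical helper, not part of Crawly.
  # One clause per return shape in the pop/1 spec.

  # nil: the worker is running but has no stored requests.
  def describe(nil), do: :empty

  # No storage worker has been started for this spider.
  def describe({:error, :storage_worker_not_running}), do: :no_worker

  # A request was returned (assumption: it exposes a :url field).
  def describe(%{url: url}), do: {:request, url}
end

# Intended use (needs a running Crawly application; MySpider is a placeholder):
#   MySpider |> Crawly.RequestsStorage.pop() |> PopResultSketch.describe()
```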

requests/1

Specs

requests(atom()) ::
  {:requests, [Crawly.Request.t()]} | {:error, :spider_not_running}

start_worker(spider_name, crawl_id)

Specs

start_worker(Crawly.spider(), crawl_id :: String.t()) ::
  {:ok, pid()} | {:error, :already_started}

Starts a worker for a given spider

stats/1

Specs

stats(Crawly.spider()) ::
  {:stored_requests, non_neg_integer()} | {:error, :storage_worker_not_running}

Get statistics from the requests storage
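As the spec shows, the count arrives wrapped in a `{:stored_requests, count}` tuple rather than the more common `{:ok, value}` shape. A small sketch of normalizing it follows; the `StatsSketch` module and its `pending_count/1` function are hypothetical names used only for illustration.

```elixir
defmodule StatsSketch do
  # Hypothetical helper, not part of Crawly.
  # Converts a stats/1 result into {:ok, count}; errors pass through unchanged.
  def pending_count({:stored_requests, count}) when is_integer(count) and count >= 0 do
    {:ok, count}
  end

  def pending_count({:error, :storage_worker_not_running} = error), do: error
end

# Intended use (needs a running Crawly application; MySpider is a placeholder):
#   MySpider |> Crawly.RequestsStorage.stats() |> StatsSketch.pending_count()
```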

store(spider_name, request)

Specs

store(Crawly.spider(), Crawly.Request.t() | [Crawly.Request.t()]) ::
  :ok | {:error, :storage_worker_not_running}

Store an individual request or multiple requests in the related child worker
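Because `store/2` accepts either one request or a list, a thin wrapper can treat both cases the same way. The sketch below is only an illustration: `StoreSketch`, `store_all/3`, and the injectable `storage` argument are hypothetical, introduced so the example can be exercised without a running Crawly application.

```elixir
defmodule StoreSketch do
  # Hypothetical wrapper, not part of Crawly.
  # `storage` defaults to Crawly.RequestsStorage; any module exposing store/2
  # with the same return values can be substituted, for example in tests.
  def store_all(spider, requests, storage \\ Crawly.RequestsStorage) do
    case storage.store(spider, requests) do
      # List.wrap/1 turns a single request into a one-element list for counting.
      :ok -> {:ok, length(List.wrap(requests))}
      {:error, :storage_worker_not_running} = error -> error
    end
  end
end
```

On success the wrapper reports how many requests were handed over, which can be convenient for logging; the error tuple is returned unchanged so callers can still match on it.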