Crawly.RequestsStorage (Crawly v0.13.0)
Requests storage: the module responsible for storing requests (URLs) to crawl.
┌──────────────────┐
│                  │    ┌------------------┐
│ RequestsStorage <─────┤ From crawlers1,2 │
│                  │    └------------------┘
└─────────┬────────┘
          │
          │
    ┌─────┴────────────────────────┐
    │                              │
┌───▼──────────────────┐   ┌───────▼──────────────┐
│RequestsStorageWorker1│   │RequestsStorageWorker2│
│      (Crawler1)      │   │      (Crawler2)      │
└──────────────────────┘   └──────────────────────┘
All requests go through a single RequestsStorage process, which quickly finds the appropriate worker; that worker then stores the request.
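A sketch of that end-to-end flow, assuming the Crawly application is running; the spider module `MySpider` and the crawl id are hypothetical names used only for illustration:

```elixir
# Illustrative only: requires a running Crawly application.
# `MySpider` and "crawl-2024-01" are hypothetical.
{:ok, _pid} = Crawly.RequestsStorage.start_worker(MySpider, "crawl-2024-01")

# RequestsStorage routes the request to MySpider's worker.
:ok = Crawly.RequestsStorage.store(MySpider, Crawly.Request.new("https://example.com"))

# Later, a crawler worker fetches the next request to process.
request = Crawly.RequestsStorage.pop(MySpider)
```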
Summary
Functions
Returns a specification to start this module under a supervisor.
Callback implementation for GenServer.init/1.
Pops a request out of the requests storage.
Starts a worker for a given spider.
Gets statistics from the requests storage.
Stores a request in the related child worker.
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
Callback implementation for GenServer.init/1.
Specs

    pop(spider_name) :: result
      when spider_name: atom(),
           result: nil | Crawly.Request.t() | {:error, :storage_worker_not_running}
Pops a request out of the requests storage.
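A small usage sketch covering each result in the spec above; `MySpider` is a hypothetical spider module:

```elixir
case Crawly.RequestsStorage.pop(MySpider) do
  nil ->
    # Worker is running, but its queue is empty.
    :no_work

  {:error, :storage_worker_not_running} ->
    # start_worker/2 was never called for this spider.
    :not_started

  request ->
    # A Crawly.Request.t() ready to be fetched.
    request.url
end
```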
Specs

    start_worker(spider_name, crawl_id) :: result
      when spider_name: atom(),
           crawl_id: String.t(),
           result: {:ok, pid()} | {:error, :already_started}
Starts a worker for a given spider.
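The return value distinguishes a fresh start from a duplicate one; a sketch with a hypothetical spider module and crawl id:

```elixir
case Crawly.RequestsStorage.start_worker(MySpider, "crawl-abc") do
  {:ok, pid} ->
    # A new storage worker was started for this spider.
    {:started, pid}

  {:error, :already_started} ->
    # A worker for this spider already exists; nothing to do.
    :noop
end
```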
Specs

    stats(spider_name) :: result
      when spider_name: atom(),
           result: {:stored_requests, non_neg_integer()} | {:error, :storage_worker_not_running}
Gets statistics from the requests storage.
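A sketch of reading the queue depth for the hypothetical `MySpider`; treating a missing worker as zero queued requests is an illustrative choice, not library behaviour:

```elixir
queued =
  case Crawly.RequestsStorage.stats(MySpider) do
    {:stored_requests, count} -> count  # number of currently queued requests
    {:error, :storage_worker_not_running} -> 0  # hypothetical fallback
  end
```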
Specs

    store(spider_name, requests) :: result
      when spider_name: atom(),
           requests: [Crawly.Request.t()],
           result: :ok | {:error, :storage_worker_not_running}

    store(spider_name, request) :: :ok
      when spider_name: atom(),
           request: Crawly.Request.t()
Stores a request in the related child worker.
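Per the specs above, store/2 accepts either a single request or a list of requests; a sketch with the hypothetical `MySpider` (assumes its storage worker is running):

```elixir
request = Crawly.Request.new("https://example.com/page")

:ok = Crawly.RequestsStorage.store(MySpider, request)    # a single request
:ok = Crawly.RequestsStorage.store(MySpider, [request])  # a list of requests
```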