Crawly v0.9.0 Crawly.RequestsStorage

Request storage: a module responsible for storing the URLs (requests) to be crawled.

                 ┌──────────────────┐
                 │                  │             ┌------------------┐
                 │ RequestsStorage  <─────────────┤ From crawlers1,2 │
                 │                  │             └------------------┘
                 └─────────┬────────┘
                           │
                           │
                           │
                           │
              ┌────────────▼─────────────────┐
              │                              │
              │                              │
              │                              │
  ┌───────────▼──────────┐       ┌───────────▼──────────┐
  │RequestsStorageWorker1│       │RequestsStorageWorker2│
  │      (Crawler1)      │       │      (Crawler2)      │
  └──────────────────────┘       └──────────────────────┘

All requests go through a single RequestsStorage process, which quickly looks up the worker responsible for the given spider; that worker then stores the request.
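The routing idea can be sketched with a small GenServer that keeps a map from spider name to worker pid. This is only an illustration of the pattern, not Crawly's actual implementation: `RouterSketch` is a hypothetical name, and each worker is reduced to an Agent holding a list.

```elixir
defmodule RouterSketch do
  # Hypothetical front process: maps spider names to per-spider worker pids.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def start_worker(router, spider), do: GenServer.call(router, {:start_worker, spider})
  def store(router, spider, request), do: GenServer.call(router, {:store, spider, request})
  def pop(router, spider), do: GenServer.call(router, {:pop, spider})

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:start_worker, spider}, _from, workers) do
    case workers do
      %{^spider => _pid} ->
        {:reply, {:error, :already_started}, workers}

      _ ->
        # Assumption: a worker is just an Agent holding a FIFO list of requests.
        {:ok, pid} = Agent.start_link(fn -> [] end)
        {:reply, {:ok, pid}, Map.put(workers, spider, pid)}
    end
  end

  def handle_call({:store, spider, request}, _from, workers) do
    reply =
      with_worker(workers, spider, fn pid ->
        Agent.update(pid, fn queue -> queue ++ [request] end)
      end)

    {:reply, reply, workers}
  end

  def handle_call({:pop, spider}, _from, workers) do
    reply =
      with_worker(workers, spider, fn pid ->
        Agent.get_and_update(pid, fn
          [] -> {nil, []}
          [head | rest] -> {head, rest}
        end)
      end)

    {:reply, reply, workers}
  end

  # Looks up the worker for a spider; the lookup is the only work the router does.
  defp with_worker(workers, spider, fun) do
    case Map.fetch(workers, spider) do
      {:ok, pid} -> fun.(pid)
      :error -> {:error, :storage_worker_not_running}
    end
  end
end
```

Keeping the front process limited to a map lookup is what lets a single process serve several spiders without becoming the place where requests are actually held.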

Summary

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

init(args)

Invoked when the server is started. start_link/3 or start/3 will block until it returns.

pop(spider_name)

Pop a request out of requests storage

start_link(list)

start_worker(spider_name)

Starts a worker for a given spider

stats(spider_name)

Get statistics from the requests storage

store(spider_name, requests)

Stores a request, or a list of requests, in the child worker for the given spider.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
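As a generic illustration of what `child_spec/1` is for, the sketch below defines a stand-in GenServer and starts it under a supervisor. `StorageSketch` is a hypothetical module, not part of Crawly; the point is only that `use GenServer` generates a default `child_spec/1`, which the supervisor calls when given a `{module, init_arg}` tuple.

```elixir
defmodule StorageSketch do
  # Hypothetical stand-in for a GenServer-based storage module.
  use GenServer

  def start_link(init_arg) do
    GenServer.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(init_arg), do: {:ok, init_arg}
end

# The supervisor turns {StorageSketch, []} into StorageSketch.child_spec([])
# and then calls StorageSketch.start_link([]).
children = [{StorageSketch, []}]
{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)
```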

init(args)

Invoked when the server is started. start_link/3 or start/3 will block until it returns.

init_arg is the argument term (second argument) passed to start_link/3.

Returning {:ok, state} will cause start_link/3 to return {:ok, pid} and the process to enter its loop.

Returning {:ok, state, timeout} is similar to {:ok, state}, except that it also sets a timeout. See the "Timeouts" section in the module documentation for more information.

Returning {:ok, state, :hibernate} is similar to {:ok, state} except the process is hibernated before entering the loop. See c:handle_call/3 for more information on hibernation.

Returning {:ok, state, {:continue, continue}} is similar to {:ok, state} except that immediately after entering the loop the c:handle_continue/2 callback will be invoked with the value continue as first argument.

Returning :ignore will cause start_link/3 to return :ignore and the process will exit normally without entering the loop or calling c:terminate/2. If used when part of a supervision tree the parent supervisor will not fail to start nor immediately try to restart the GenServer. The remainder of the supervision tree will be started and so the GenServer should not be required by other processes. It can be started later with Supervisor.restart_child/2 as the child specification is saved in the parent supervisor. The main use cases for this are:

The GenServer is disabled by configuration but might be enabled later.
An error occurred and it will be handled by a different mechanism than the Supervisor. Likely this approach involves calling Supervisor.restart_child/2 after a delay to attempt a restart.

Returning {:stop, reason} will cause start_link/3 to return {:error, reason} and the process to exit with reason reason without entering the loop or calling c:terminate/2.

Callback implementation for GenServer.init/1.
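The return values described above can be seen in a small, generic GenServer. This is a sketch of the callback contract only, not the body of `Crawly.RequestsStorage.init/1`; `InitSketch` and its `:enabled` and `:name` options are hypothetical. It uses `GenServer.start/3` rather than `start_link/3` so that a failing `init/1` cannot send an exit signal to the caller.

```elixir
defmodule InitSketch do
  use GenServer

  # Unlinked start, so {:stop, reason} in init/1 only affects the new process.
  def start(opts), do: GenServer.start(__MODULE__, opts)

  @impl true
  def init(opts) do
    cond do
      # Disabled by configuration: start/1 returns :ignore.
      Keyword.get(opts, :enabled, true) == false ->
        :ignore

      # Invalid argument: start/1 returns {:error, :missing_name}.
      not Keyword.has_key?(opts, :name) ->
        {:stop, :missing_name}

      # Normal case: start/1 returns {:ok, pid} and the process enters its loop.
      true ->
        {:ok, %{name: Keyword.fetch!(opts, :name)}}
    end
  end
end
```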

pop(spider_name)

pop(spider_name) :: result
when spider_name: atom(),
     result: nil | Crawly.Request.t() | {:error, :storage_worker_not_running}

Pop a request out of requests storage
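A caller has to deal with all three result shapes in the spec. The helper below is a hypothetical sketch that classifies a value shaped like the result of `pop/1`; it assumes, without confirmation from the docs, that `nil` means nothing is currently queued for the spider.

```elixir
defmodule PopSketch do
  # Hypothetical helper: decides what to do with a pop/1-shaped result.

  # Assumption: nil means the spider's queue is currently empty.
  def next_step(nil), do: :queue_empty

  # The spider has no storage worker, so nothing can be popped.
  def next_step({:error, :storage_worker_not_running}), do: :worker_missing

  # Anything else is treated as a request to be fetched.
  def next_step(request), do: {:fetch, request}
end
```

With Crawly running, this could be used as `Crawly.RequestsStorage.pop(MySpider) |> PopSketch.next_step()`, where `MySpider` is a placeholder spider module.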

start_link(list)

start_worker(spider_name)

start_worker(spider_name) :: result
when spider_name: atom(), result: {:ok, pid()} | {:error, :already_started}

Starts a worker for a given spider
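Because the spec allows `{:error, :already_started}`, callers that only need a worker to exist may want to treat that result as success. The sketch below shows one way to do so; `WorkerSketch` is a hypothetical helper that takes the start call as a zero-arity function, so it does not depend on Crawly being loaded.

```elixir
defmodule WorkerSketch do
  # Hypothetical helper: makes "start the worker" idempotent for the caller.
  def ensure_started(start_fun) when is_function(start_fun, 0) do
    case start_fun.() do
      {:ok, pid} when is_pid(pid) -> :ok
      # A worker for this spider already exists, which is what we wanted.
      {:error, :already_started} -> :ok
    end
  end
end
```

A possible call site would be `WorkerSketch.ensure_started(fn -> Crawly.RequestsStorage.start_worker(MySpider) end)`, with `MySpider` standing in for a real spider module.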

stats(spider_name)

stats(spider_name) :: result
when spider_name: atom(),
     result:
       {:stored_requests, non_neg_integer()}
       | {:error, :storage_worker_not_running}

Get statistics from the requests storage
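Note that the success result is tagged `:stored_requests` rather than `:ok`, so a plain `{:ok, value}` match would not fit. The hypothetical helper below converts a `stats/1`-shaped result into the more common `{:ok, count}` form.

```elixir
defmodule StatsSketch do
  # Hypothetical helper: normalizes a stats/1-shaped result.
  def pending({:stored_requests, count}) when is_integer(count) and count >= 0 do
    {:ok, count}
  end

  # The error tuple is passed through unchanged.
  def pending({:error, :storage_worker_not_running} = error), do: error
end
```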

store(spider_name, requests)

store(spider_name, requests) :: result
when spider_name: atom(),
     requests: [Crawly.Request.t()],
     result: :ok | {:error, :storage_worker_not_running}

store(spider_name, request) :: :ok
when spider_name: atom(), request: Crawly.Request.t()

Stores a request, or a list of requests, in the child worker for the given spider.
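The two specs mean the second argument may be either a list or a single request. One common way to support both shapes is a pair of function clauses, sketched below with a hypothetical `StoreSketch` module; how Crawly handles this internally is not stated here.

```elixir
defmodule StoreSketch do
  # Hypothetical helper: accepts a list of requests or a single request
  # and always returns a list, so later code handles one shape only.
  def normalize(requests) when is_list(requests), do: requests
  def normalize(request), do: [request]
end
```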
