Crawly v0.9.0 Crawly.RequestsStorage
Request storage: the module responsible for storing the requests (URLs) queued for crawling.
               ┌──────────────────┐
               │                  │             ┌------------------┐
               │ RequestsStorage  <─────────────┤ From crawlers1,2 │
               │                  │             └------------------┘
               └─────────┬────────┘
                         │
                         │
                         │
                         │
            ┌────────────▼─────────────────┐
            │                              │
            │                              │
            │                              │
┌───────────▼──────────┐       ┌───────────▼──────────┐
│RequestsStorageWorker1│       │RequestsStorageWorker2│
│      (Crawler1)      │       │      (Crawler2)      │
└──────────────────────┘       └──────────────────────┘
All requests go through a single RequestsStorage process, which looks up the worker for the given spider and hands the request over to it; the worker is what actually stores the request.
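The routing described above can be illustrated with a small, self-contained sketch. It is not Crawly's implementation: the module name `RoutingSketch`, the use of an `Agent` as the per-spider worker, and the `{:error, :already_started}` reply are assumptions made for illustration. Only the `{:error, :storage_worker_not_running}` shape and the `nil` result for `pop` are taken from the specs on this page.

```elixir
defmodule RoutingSketch do
  # Hypothetical stand-in for the single dispatcher process; not Crawly's code.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{})

  def start_worker(server, spider_name),
    do: GenServer.call(server, {:start_worker, spider_name})

  def store(server, spider_name, request),
    do: GenServer.call(server, {:store, spider_name, request})

  def pop(server, spider_name),
    do: GenServer.call(server, {:pop, spider_name})

  # State is a map of spider_name => worker pid.
  @impl true
  def init(workers), do: {:ok, workers}

  @impl true
  def handle_call({:start_worker, spider_name}, _from, workers) do
    case Map.fetch(workers, spider_name) do
      {:ok, _pid} ->
        # Assumed reply for a duplicate start; chosen for this sketch only.
        {:reply, {:error, :already_started}, workers}

      :error ->
        # Each worker in this sketch is an Agent holding a list of requests.
        {:ok, pid} = Agent.start_link(fn -> [] end)
        {:reply, {:ok, pid}, Map.put(workers, spider_name, pid)}
    end
  end

  def handle_call({:store, spider_name, request}, _from, workers) do
    reply =
      with {:ok, pid} <- fetch_worker(workers, spider_name) do
        Agent.update(pid, fn requests -> requests ++ [request] end)
      end

    {:reply, reply, workers}
  end

  def handle_call({:pop, spider_name}, _from, workers) do
    reply =
      with {:ok, pid} <- fetch_worker(workers, spider_name) do
        Agent.get_and_update(pid, fn
          [] -> {nil, []}
          [request | rest] -> {request, rest}
        end)
      end

    {:reply, reply, workers}
  end

  # The dispatcher only locates the worker; the worker holds the data.
  defp fetch_worker(workers, spider_name) do
    case Map.fetch(workers, spider_name) do
      {:ok, pid} -> {:ok, pid}
      :error -> {:error, :storage_worker_not_running}
    end
  end
end
```

In this sketch a call for a spider without a started worker returns the error tuple instead of crashing, which mirrors the result types listed in the specs for pop/1, stats/1 and store/2.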
Summary
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
init(args)
Invoked when the server is started. start_link/3 or start/3 will block until it returns.
pop(spider_name)
Pop a request out of requests storage
start_worker(spider_name)
Starts a worker for a given spider
stats(spider_name)
Get statistics from the requests storage
store(spider_name, requests)
Store request in related child worker
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
See Supervisor.
init(args)
Invoked when the server is started. start_link/3 or start/3 will
block until it returns.
init_arg is the argument term (second argument) passed to start_link/3.
Returning {:ok, state} will cause start_link/3 to return
{:ok, pid} and the process to enter its loop.
Returning {:ok, state, timeout} is similar to {:ok, state},
except that it also sets a timeout. See the "Timeouts" section
in the module documentation for more information.
Returning {:ok, state, :hibernate} is similar to {:ok, state}
except the process is hibernated before entering the loop. See
handle_call/3 for more information on hibernation.
Returning {:ok, state, {:continue, continue}} is similar to
{:ok, state} except that immediately after entering the loop
the handle_continue/2 callback will be invoked with the value
continue as first argument.
Returning :ignore will cause start_link/3 to return :ignore and
the process will exit normally without entering the loop or calling
terminate/2. If used when part of a supervision tree the parent
supervisor will not fail to start nor immediately try to restart the
GenServer. The remainder of the supervision tree will be started
and so the GenServer should not be required by other processes.
It can be started later with Supervisor.restart_child/2 as the child
specification is saved in the parent supervisor. The main use cases for
this are:
- The GenServer is disabled by configuration but might be enabled later.
- An error occurred and it will be handled by a different mechanism than the Supervisor. Likely this approach involves calling Supervisor.restart_child/2 after a delay to attempt a restart.
Returning {:stop, reason} will cause start_link/3 to return
{:error, reason} and the process to exit with reason reason without
entering the loop or calling terminate/2.
Callback implementation for GenServer.init/1.
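The return values listed above can be seen side by side in a minimal sketch. The module `InitSketch` and its options `:enabled` and `:fail_with` are hypothetical and exist only for this illustration; the sketch starts the server with `GenServer.start/3` (unlinked) so that a `{:stop, reason}` result does not also send an exit signal to the calling process.

```elixir
defmodule InitSketch do
  # Hypothetical GenServer used only to show init/1 return values.
  use GenServer

  # Unlinked start, so {:stop, reason} in init/1 cannot take the caller down.
  def start(opts), do: GenServer.start(__MODULE__, opts)

  @impl true
  def init(opts) do
    case {Keyword.get(opts, :enabled, true), Keyword.get(opts, :fail_with)} do
      # Disabled by configuration: start/3 returns :ignore, no loop is entered.
      {false, _} -> :ignore
      # Normal start: enter the loop, then run handle_continue/2 first.
      {true, nil} -> {:ok, %{ready: false}, {:continue, :warm_up}}
      # Startup failure: start/3 returns {:error, reason}.
      {true, reason} -> {:stop, reason}
    end
  end

  @impl true
  def handle_continue(:warm_up, state), do: {:noreply, %{state | ready: true}}

  @impl true
  def handle_call(:ready?, _from, state), do: {:reply, state.ready, state}
end
```

Because the continue callback runs before any other message is handled, a `:ready?` call made right after a successful start already sees the warmed-up state.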
pop(spider_name)
pop(spider_name) :: result
when spider_name: atom(),
result: nil | Crawly.Request.t() | {:error, :storage_worker_not_running}
Pop a request out of requests storage
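The spec above gives pop/1 three possible result shapes, and a caller usually needs to tell them apart. A minimal sketch of that pattern match follows; the module `PopResultSketch` and the atoms it returns are hypothetical, and reading `nil` as "nothing stored right now" is an assumption based on the spec rather than something this page states.

```elixir
defmodule PopResultSketch do
  # Hypothetical helper: classifies a value shaped like pop/1's documented result.

  # Assumed meaning: the worker is running but has no request to hand out.
  def describe(nil), do: :empty

  # No storage worker has been started for this spider.
  def describe({:error, :storage_worker_not_running}), do: :no_worker

  # Anything else is treated as the popped request itself.
  def describe(request), do: {:request, request}
end
```

With Crawly available, the helper could be applied as `:my_spider |> Crawly.RequestsStorage.pop() |> PopResultSketch.describe()`, where `:my_spider` is a placeholder spider name.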
start_link(list)
start_worker(spider_name)
Starts a worker for a given spider
stats(spider_name)
stats(spider_name) :: result
when spider_name: atom(),
result:
{:stored_requests, non_neg_integer()}
| {:error, :storage_worker_not_running}
Get statistics from the requests storage
store(spider_name, requests)
store(spider_name, requests) :: result
when spider_name: atom(),
requests: [Crawly.Request.t()],
result: :ok | {:error, :storage_worker_not_running}
store(spider_name, request) :: :ok
when spider_name: atom(), request: Crawly.Request.t()
Stores a request, or a list of requests, in the child worker for the given spider.