Crawly v0.7.0 Crawly.RequestsStorage.Worker
Requests storage is a module responsible for storing requests for a given spider.
It automatically filters out already-seen requests, using request fingerprints
to detect pages that have already been visited.
It pipes all requests through a list of middlewares, which pre-process each request before it is stored.
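The two behaviours described above, fingerprint-based filtering and a middleware pipeline, can be sketched roughly as follows. This is a minimal illustration and not Crawly's actual implementation: the module name StorageSketch, the plain-map requests, the two middleware functions, the `{false, state}` drop convention and the use of `:erlang.phash2/1` as a fingerprint are all assumptions made for the example.

```elixir
defmodule StorageSketch do
  # Requests are plain maps here (%{url: ..., headers: [...]}); this is a
  # hypothetical stand-in for the real Crawly.Request struct.

  # Empty storage: a set of seen fingerprints and a list of stored requests.
  def new_state, do: %{seen: MapSet.new(), requests: []}

  # Assumption: the fingerprint is just a hash of the URL.
  def fingerprint(%{url: url}), do: :erlang.phash2(url)

  # Hypothetical middleware: drop the request if its fingerprint was seen.
  def unique_request(request, state) do
    fp = fingerprint(request)

    if MapSet.member?(state.seen, fp) do
      {false, state}
    else
      {request, %{state | seen: MapSet.put(state.seen, fp)}}
    end
  end

  # Hypothetical middleware: add a User-Agent header to the request.
  def add_user_agent(request, state) do
    {%{request | headers: [{"User-Agent", "sketch-agent"} | request.headers]}, state}
  end

  # Run the request through each middleware in order; stop at the first drop.
  def pipe(middlewares, request, state) do
    Enum.reduce_while(middlewares, {request, state}, fn middleware, {req, st} ->
      case middleware.(req, st) do
        {false, new_state} -> {:halt, {false, new_state}}
        {new_req, new_state} -> {:cont, {new_req, new_state}}
      end
    end)
  end

  # Store the request only if it survived every middleware.
  def store(state, request) do
    middlewares = [&unique_request/2, &add_user_agent/2]

    case pipe(middlewares, request, state) do
      {false, new_state} -> new_state
      {new_req, new_state} -> %{new_state | requests: [new_req | new_state.requests]}
    end
  end
end
```

Storing the same request twice with `StorageSketch.store/2` leaves a single entry in `requests`, because the second attempt is dropped by the fingerprint check before any further pre-processing runs.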
Summary
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
init(spider_name)
Invoked when the server is started. start_link/3 or start/3 will
block until it returns.
pop(pid)
Pop a request out of the requests storage.
start_link(spider_name)
stats(pid)
Get statistics from the requests storage.
store(pid, requests)
Store an individual request or a list of requests.
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
See Supervisor.
init(spider_name)
Invoked when the server is started. start_link/3 or start/3 will
block until it returns.
init_arg is the argument term (second argument) passed to start_link/3.
Returning {:ok, state} will cause start_link/3 to return
{:ok, pid} and the process to enter its loop.
Returning {:ok, state, timeout} is similar to {:ok, state}
except handle_info(:timeout, state) will be called after timeout
milliseconds if no messages are received within the timeout.
Returning {:ok, state, :hibernate} is similar to {:ok, state}
except the process is hibernated before entering the loop. See
handle_call/3 for more information on hibernation.
Returning {:ok, state, {:continue, continue}} is similar to
{:ok, state} except that immediately after entering the loop
the handle_continue/2 callback will be invoked with the value
continue as first argument.
Returning :ignore will cause start_link/3 to return :ignore and
the process will exit normally without entering the loop or calling
terminate/2. If used as part of a supervision tree, the parent
supervisor will not fail to start nor immediately try to restart the
GenServer. The remainder of the supervision tree will be started
and so the GenServer should not be required by other processes.
It can be started later with Supervisor.restart_child/2 as the child
specification is saved in the parent supervisor. The main use cases for
this are:
- The GenServer is disabled by configuration but might be enabled later.
- An error occurred and it will be handled by a different mechanism than the Supervisor. Likely this approach involves calling Supervisor.restart_child/2 after a delay to attempt a restart.
Returning {:stop, reason} will cause start_link/3 to return
{:error, reason} and the process to exit with reason reason without
entering the loop or calling terminate/2.
Callback implementation for GenServer.init/1.
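The return values described above can be tried out with a small GenServer. This is only a sketch: the module name InitSketch and the argument shapes (`:disabled`, `{:fail, reason}`, `{:warm_up, name}`) are hypothetical and have nothing to do with how Crawly.RequestsStorage.Worker itself initialises. It uses GenServer.start/3 (unlinked) so that a `{:stop, reason}` result does not also take down the calling process through the link.

```elixir
defmodule InitSketch do
  use GenServer

  # Unlinked start, so an init/1 failure is only reported as a return value.
  def start(arg), do: GenServer.start(__MODULE__, arg)

  @impl true
  # :ignore -> start/3 returns :ignore and the process exits normally.
  def init(:disabled), do: :ignore
  # {:stop, reason} -> start/3 returns {:error, reason}.
  def init({:fail, reason}), do: {:stop, reason}
  # {:continue, term} -> handle_continue/2 runs right after entering the loop.
  def init({:warm_up, name}), do: {:ok, %{name: name, ready: false}, {:continue, :warm_up}}
  # {:ok, state} -> start/3 returns {:ok, pid}.
  def init(name), do: {:ok, %{name: name, ready: true}}

  @impl true
  def handle_continue(:warm_up, state), do: {:noreply, %{state | ready: true}}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}
end
```

With this module, `InitSketch.start(:disabled)` returns `:ignore`, `InitSketch.start({:fail, :bad_config})` returns `{:error, :bad_config}`, and a server started with `{:warm_up, name}` has already run its handle_continue/2 clause by the time it answers its first call.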
pop(pid)
pop(pid()) :: Crawly.Request.t() | nil
Pop a request out of requests storage
start_link(spider_name)
stats(pid)
stats(pid()) :: {:stored_requests, non_neg_integer()}
Get statistics from the requests storage
store(pid, requests)
store(spider_name, requests) :: :ok
  when spider_name: atom(), requests: [Crawly.Request.t()]
store(spider_name, request) :: :ok
  when spider_name: atom(), request: Crawly.Request.t()
Store an individual request or a list of requests.
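To show how the return shapes in the specs above fit together, here is a toy storage process with the same function names. It is a sketch under stated assumptions, not the worker's real code: the module name ToyStorage is hypothetical, requests are plain maps rather than Crawly.Request structs, every function is addressed by pid, the most recently stored request is popped first, and the counter reports the total number of requests ever stored.

```elixir
defmodule ToyStorage do
  use GenServer

  def start_link(spider_name), do: GenServer.start_link(__MODULE__, spider_name)

  # Accept either a list of requests or a single request; both return :ok.
  def store(pid, requests) when is_list(requests) do
    Enum.each(requests, &store(pid, &1))
  end

  def store(pid, request), do: GenServer.call(pid, {:store, request})

  # Returns a stored request, or nil when the storage is empty.
  def pop(pid), do: GenServer.call(pid, :pop)

  # Returns {:stored_requests, count}.
  def stats(pid), do: GenServer.call(pid, :stats)

  @impl true
  def init(spider_name) do
    {:ok, %{spider_name: spider_name, requests: [], count: 0}}
  end

  @impl true
  def handle_call({:store, request}, _from, state) do
    new_state = %{state | requests: [request | state.requests], count: state.count + 1}
    {:reply, :ok, new_state}
  end

  def handle_call(:pop, _from, %{requests: []} = state) do
    {:reply, nil, state}
  end

  def handle_call(:pop, _from, %{requests: [request | rest]} = state) do
    {:reply, request, %{state | requests: rest}}
  end

  def handle_call(:stats, _from, state) do
    {:reply, {:stored_requests, state.count}, state}
  end
end
```

After `{:ok, pid} = ToyStorage.start_link(:my_spider)`, calling `ToyStorage.pop(pid)` on the empty storage returns `nil`; storing a list of two requests and then one more makes `ToyStorage.stats(pid)` return `{:stored_requests, 3}`.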