Crawly.Engine (Crawly v0.17.2)
Crawly Engine: the process responsible for starting and stopping spiders.
It also keeps track of all currently running spiders.
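To illustrate the bookkeeping described above, here is a toy GenServer that keeps a map of started spiders and returns the same result shapes that start_spider/2 and stop_spider/2 document. It is a hypothetical sketch: the ToyEngine module and its functions are invented for illustration and are not Crawly's implementation. The real engine starts supervised spider managers, while the toy only stores a reference as a stand-in identifier.

```elixir
defmodule ToyEngine do
  # Hypothetical toy model of an engine process; NOT Crawly.Engine's code.
  use GenServer

  # Client API (names chosen to echo the documented functions)
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def start_spider(engine, spider), do: GenServer.call(engine, {:start_spider, spider})
  def stop_spider(engine, spider), do: GenServer.call(engine, {:stop_spider, spider})
  def running_spiders(engine), do: GenServer.call(engine, :running_spiders)

  @impl true
  def init(:ok) do
    # started_spiders maps spider => identifier(), like the started_spiders() type
    {:ok, %{started_spiders: %{}}}
  end

  @impl true
  def handle_call({:start_spider, spider}, _from, state) do
    if Map.has_key?(state.started_spiders, spider) do
      {:reply, {:error, :spider_already_started}, state}
    else
      # The real engine would start a spider manager here; the toy
      # stores a fresh reference as a placeholder identifier.
      started = Map.put(state.started_spiders, spider, make_ref())
      {:reply, :ok, %{state | started_spiders: started}}
    end
  end

  def handle_call({:stop_spider, spider}, _from, state) do
    case Map.pop(state.started_spiders, spider) do
      {nil, _unchanged} -> {:reply, {:error, :spider_not_running}, state}
      {_ref, rest} -> {:reply, :ok, %{state | started_spiders: rest}}
    end
  end

  def handle_call(:running_spiders, _from, state) do
    {:reply, state.started_spiders, state}
  end
end
```

Keeping the started spiders in a single process means that "already started" and "not running" checks are serialized through one mailbox, which is one plausible reason an engine like this is modelled as a GenServer.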
Summary
Functions
child_spec/1: Returns a specification to start this module under a supervisor.
handle_continue/2: Callback implementation for GenServer.handle_continue/2.
init/1: Callback implementation for GenServer.init/1.
start_spider/2: Starts a spider. All options passed in the second argument will be passed along to the spider's init/1 callback.
Types
Specs
crawl_id_opt() :: {:crawl_id, binary()} | GenServer.option()
Specs
spider_info() :: %{ name: Crawly.spider(), status: :stopped | :started, pid: identifier() | nil }
Specs
started_spiders() :: %{optional(Crawly.spider()) => identifier()}
Specs
t() :: %Crawly.Engine{ known_spiders: [Crawly.spider()], started_spiders: started_spiders() }
Functions
child_spec/1: Returns a specification to start this module under a supervisor.
See Supervisor.
Specs
get_crawl_id(Crawly.spider()) :: {:error, :spider_not_running} | {:ok, binary()}
Specs
get_manager(Crawly.spider()) :: pid() | {:error, :spider_not_found}
Specs
get_spider_info(Crawly.spider()) :: spider_info() | nil
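The specs above fix the shapes a caller has to handle: get_crawl_id/1 returns {:ok, crawl_id} or {:error, :spider_not_running}, and get_spider_info/1 returns a spider_info() map or nil. The sketch below pattern-matches on exactly those shapes. The EngineLookups module, its function names, and the :unknown return value are hypothetical; the lookup function is passed in so the helpers can be exercised with stubs, and in an application you might pass &Crawly.Engine.get_crawl_id/1 or &Crawly.Engine.get_spider_info/1.

```elixir
defmodule EngineLookups do
  # Hypothetical helpers built only on the documented return types.

  # Returns the crawl id of a running spider, or `fallback` if it is not running.
  def crawl_id_or(get_crawl_id, spider, fallback) do
    case get_crawl_id.(spider) do
      {:ok, crawl_id} -> crawl_id
      {:error, :spider_not_running} -> fallback
    end
  end

  # Returns :started or :stopped from a spider_info() map,
  # or :unknown (an invented marker) when the lookup returns nil.
  def status(get_spider_info, spider) do
    case get_spider_info.(spider) do
      nil -> :unknown
      %{status: status} -> status
    end
  end
end
```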
Callback implementation for GenServer.handle_continue/2.
Callback implementation for GenServer.init/1.
Specs
list_known_spiders() :: [spider_info()]
Specs
running_spiders() :: started_spiders()
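The types t() and spider_info() suggest how the list returned by list_known_spiders/0 relates to the engine's state: every known spider gets an entry, with status :started and a pid when it appears in started_spiders, and :stopped with pid nil otherwise. The sketch below assumes that relationship; SpiderInfoSketch is a hypothetical module, and Crawly may compute the list differently.

```elixir
defmodule SpiderInfoSketch do
  # Hypothetical: derives spider_info()-shaped maps from a state shaped
  # like t(), i.e. %{known_spiders: [...], started_spiders: %{spider => pid}}.
  def list_known_spiders(%{known_spiders: known, started_spiders: started}) do
    Enum.map(known, fn spider ->
      case Map.fetch(started, spider) do
        {:ok, pid} -> %{name: spider, status: :started, pid: pid}
        :error -> %{name: spider, status: :stopped, pid: nil}
      end
    end)
  end
end
```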
Specs
start_spider(Crawly.spider(), opts) :: result when opts: [crawl_id_opt()], result: :ok | {:error, :spider_already_started} | {:error, :atom}
Starts a spider. All options passed in the second argument will be passed along to the spider's init/1 callback.
Reserved Options
:crawl_id (binary). Optional; automatically generated if not set.
:closespider_itemcount (integer | :disabled). Optional; overrides the spider's close-spider item count at startup.
:closespider_timeout (integer | :disabled). Optional; overrides the spider's close-spider timeout at startup.
:concurrent_requests_per_domain (integer). Optional; overrides the number of workers for the given spider.
Backward compatibility
If the second positional argument is a binary, it is used as the :crawl_id. This form is deprecated and will be removed in a future release.
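The option handling described above (a legacy binary treated as :crawl_id, and a crawl id generated when none is given) can be sketched as a small normalization step. StartOptsSketch is a hypothetical module, not Crawly's code, and the "crawl-<integer>" id format is an assumption made for illustration; Crawly generates its own ids.

```elixir
defmodule StartOptsSketch do
  # Hypothetical: mirrors the documented handling of start_spider/2's
  # second argument; not Crawly's implementation.

  # Deprecated form: a bare binary is treated as the :crawl_id.
  def normalize(crawl_id) when is_binary(crawl_id), do: normalize(crawl_id: crawl_id)

  def normalize(opts) when is_list(opts) do
    # Keep a caller-supplied :crawl_id; otherwise generate one lazily.
    Keyword.put_new_lazy(opts, :crawl_id, &generate_crawl_id/0)
  end

  # Assumed id format, for illustration only.
  defp generate_crawl_id do
    "crawl-" <> Integer.to_string(System.unique_integer([:positive]))
  end
end
```

With the real API, a call using the reserved options might look like Crawly.Engine.start_spider(MySpider, crawl_id: "my-crawl", closespider_itemcount: 100), where MySpider and "my-crawl" are placeholders.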
Specs
stop_spider(Crawly.spider(), reason) :: result when reason: :itemcount_limit | :itemcount_timeout | atom(), result: :ok | {:error, :spider_not_running} | {:error, :spider_not_found}
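Per the spec, stop_spider/2 returns :ok, {:error, :spider_not_running}, or {:error, :spider_not_found}. A caller that wants readable log lines can match on those three results. StopResultSketch and its messages are hypothetical and only rely on the documented result shapes.

```elixir
defmodule StopResultSketch do
  # Hypothetical: turns a stop_spider/2-shaped result into a message.
  def describe(:ok, spider), do: "stopped #{inspect(spider)}"
  def describe({:error, :spider_not_running}, spider), do: "#{inspect(spider)} was not running"
  def describe({:error, :spider_not_found}, spider), do: "#{inspect(spider)} is not a known spider"
end
```

In an application this could be used as Crawly.Engine.stop_spider(MySpider, :manual) |> StopResultSketch.describe(MySpider), where MySpider is a placeholder and :manual stands for any atom() reason the spec allows.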