Crawly.Engine (Crawly v0.13.0)
The Crawly engine is the process responsible for starting and stopping spiders.
It also keeps track of all currently running spiders.
Summary
Functions
Returns a specification to start this module under a supervisor.
Callback implementation for GenServer.init/1.
Starts a spider. All options passed in the second argument will be passed along to the spider's init/1 callback.
Types
Specs
crawl_id_opt() :: {:crawl_id, binary()}
Specs
spider_info() :: %{ name: module(), status: :stopped | :started, pid: identifier() | nil }
Specs
started_spiders() :: %{optional(module()) => identifier()}
Specs
t() :: %Crawly.Engine{ known_spiders: [module()], started_spiders: started_spiders() }
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
Specs
get_spider_info(module()) :: spider_info()
Specs
Callback implementation for GenServer.init/1.
Specs
list_known_spiders() :: [spider_info()]
Specs
running_spiders() :: started_spiders()
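The spider_info() maps returned by get_spider_info/1 and list_known_spiders/0 can be pattern-matched on their :status field. The following is a minimal sketch, assuming the Crawly application is already started; EngineStatusSketch and its functions are hypothetical helpers and not part of Crawly.

```elixir
defmodule EngineStatusSketch do
  # Hypothetical helper (not part of Crawly): turns a spider_info() map
  # into a short human-readable line.
  def describe(%{name: name, status: :started, pid: pid}) do
    "#{inspect(name)} is running (pid: #{inspect(pid)})"
  end

  def describe(%{name: name, status: :stopped}) do
    "#{inspect(name)} is stopped"
  end

  # Assumes Crawly.Engine is running; returns one line per known spider.
  def report do
    Crawly.Engine.list_known_spiders()
    |> Enum.map(&describe/1)
  end
end
```

Calling EngineStatusSketch.report() from an IEx session attached to a project that depends on Crawly would return a list of such lines, one per known spider.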
Specs
start_spider(spider_name, opts) :: result when spider_name: module(), opts: [crawl_id_opt()], result: :ok | {:error, :spider_already_started} | {:error, :atom}
Starts a spider. All options passed in the second argument will be passed along to the spider's init/1 callback.
Reserved Options
:crawl_id (binary). Optional, automatically generated if not set.
:closespider_itemcount (integer | disabled). Optional, overrides the close spider item count on startup.
:closespider_timeout (integer | disabled). Optional, overrides the closespider timeout on startup.
:concurrent_requests_per_domain (integer). Optional, overrides the number of workers for a given spider.
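As an illustration of the reserved options, here is a minimal sketch that builds a keyword list and handles each result shape from the start_spider/2 spec. StartSpiderSketch is a hypothetical wrapper, the spider argument stands for any module implementing the Crawly.Spider behaviour, and the numeric values are arbitrary placeholders.

```elixir
defmodule StartSpiderSketch do
  # Hypothetical helper: builds the option list for one crawl.
  # The values 100 and 4 are arbitrary examples, not recommendations.
  def opts(crawl_id) when is_binary(crawl_id) do
    [
      crawl_id: crawl_id,
      closespider_itemcount: 100,
      concurrent_requests_per_domain: 4
    ]
  end

  # Assumes the Crawly application is started and `spider` is a
  # module implementing the Crawly.Spider behaviour.
  def start(spider, crawl_id) when is_atom(spider) do
    case Crawly.Engine.start_spider(spider, opts(crawl_id)) do
      :ok -> {:ok, crawl_id}
      {:error, :spider_already_started} -> {:error, :already_running}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Any non-reserved keys added to the list would, per the description above, be passed along to the spider's init/1 callback.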
Backward compatibility
If the second positional argument is a binary, it is used as the :crawl_id. This form is deprecated and will be removed in a future release.
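The backward-compatibility rule can be pictured as a small normalization step. This is an illustrative sketch only: CrawlIdCompatSketch is a hypothetical module that mirrors the documented behaviour, and it is not how Crawly implements it internally.

```elixir
defmodule CrawlIdCompatSketch do
  # Deprecated form: a bare binary is treated as the :crawl_id.
  def normalize(crawl_id) when is_binary(crawl_id), do: [crawl_id: crawl_id]

  # Preferred form: a keyword list is used unchanged.
  def normalize(opts) when is_list(opts), do: opts
end
```

In new code, prefer passing the keyword form, for example Crawly.Engine.start_spider(MySpider, crawl_id: "abc"), where MySpider is a hypothetical spider module, over the deprecated Crawly.Engine.start_spider(MySpider, "abc").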