Crawly.Engine (Crawly v0.17.0)

Crawly Engine - process responsible for starting and stopping spiders.

Stores all currently running spiders.

Types

Specs

crawl_id_opt() :: {:crawl_id, binary()} | GenServer.option()

Specs

spider_info() :: %{
  name: Crawly.spider(),
  status: :stopped | :started,
  pid: identifier() | nil
}

Specs

started_spiders() :: %{optional(Crawly.spider()) => identifier()}

Specs

t() :: %Crawly.Engine{
  known_spiders: [Crawly.spider()],
  started_spiders: started_spiders()
}

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

get_crawl_id(spider_name)

Specs

get_crawl_id(Crawly.spider()) :: {:error, :spider_not_running} | {:ok, binary()}
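
As a minimal sketch of handling both return shapes in the spec above, the helper below turns the result into a label. The module `CrawlIdExample`, the function `crawl_id_label/2`, and the injectable `engine` argument are hypothetical and not part of Crawly; the injection only exists so the function can be tried without a running Crawly application.

```elixir
defmodule CrawlIdExample do
  # Hypothetical helper, not part of Crawly.
  # `engine` defaults to Crawly.Engine but can be swapped for a stub.
  def crawl_id_label(spider, engine \\ Crawly.Engine) do
    case engine.get_crawl_id(spider) do
      # The spider is running and has a crawl id.
      {:ok, crawl_id} -> "crawl " <> crawl_id
      # The only error documented in the spec.
      {:error, :spider_not_running} -> "not running"
    end
  end
end
```

With Crawly started, `CrawlIdExample.crawl_id_label(MySpider)` would go through `Crawly.Engine.get_crawl_id/1`; `MySpider` is a placeholder spider module.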

get_manager(spider_name)

Specs

get_manager(Crawly.spider()) :: pid() | {:error, :spider_not_found}
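
Note that, per the spec above, success is a bare `pid()` rather than an `{:ok, pid}` tuple, so callers need a guard instead of a tuple match. The sketch below illustrates this; `ManagerExample`, `manager_alive?/2`, and the injectable `engine` argument are hypothetical names introduced for illustration.

```elixir
defmodule ManagerExample do
  # Hypothetical helper, not part of Crawly.
  # Reports whether a manager process exists and is alive for the spider.
  def manager_alive?(spider, engine \\ Crawly.Engine) do
    case engine.get_manager(spider) do
      # Success is a bare pid, so match with a guard.
      pid when is_pid(pid) -> Process.alive?(pid)
      {:error, :spider_not_found} -> false
    end
  end
end
```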

get_spider_info(spider_name)

Specs

get_spider_info(Crawly.spider()) :: spider_info() | nil

Specs

init(any()) :: {:ok, t()}

Callback implementation for GenServer.init/1.

Specs

list_known_spiders() :: [spider_info()]
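
Because the result is a list of `spider_info()` maps, it can be filtered with plain pattern matching. The following is a small sketch under that assumption; `SpiderInfoExample` and `started_names/1` are hypothetical names, and the function works on any list shaped like the `spider_info()` type.

```elixir
defmodule SpiderInfoExample do
  # Hypothetical helper, not part of Crawly.
  # Returns the names of entries whose status is :started; entries that
  # do not match the pattern are skipped by the comprehension.
  def started_names(infos) do
    for %{name: name, status: :started} <- infos, do: name
  end
end

# Usage against a running application (requires Crawly to be started):
#   Crawly.Engine.list_known_spiders() |> SpiderInfoExample.started_names()
```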

Specs

running_spiders() :: started_spiders()

start_spider(spider_name, opts \\ [])

Specs

start_spider(Crawly.spider(), opts) :: result
when opts: [crawl_id_opt()],
     result: :ok | {:error, :spider_already_started} | {:error, :atom}

Starts a spider. All options passed in the second argument will be passed along to the spider's init/1 callback.

Reserved Options

  • :crawl_id (binary). Optional, automatically generated if not set.
  • :closespider_itemcount (integer | :disabled). Optional, overrides the close spider item count on startup.
  • :closespider_timeout (integer | :disabled). Optional, overrides the close spider timeout on startup.
  • :concurrent_requests_per_domain (integer). Optional, overrides the number of workers for a given spider.

Backward compatibility

If the second positional argument is a binary, it is used as the :crawl_id. This form is deprecated and will be removed in a future release.
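
A minimal sketch of starting a spider with the reserved options as a keyword list (the non-deprecated form) and handling the documented results. `StartSpiderExample`, `start/2`, the injectable `engine` argument, and all option values are assumptions made for illustration.

```elixir
defmodule StartSpiderExample do
  # Hypothetical helper, not part of Crawly.
  # `engine` defaults to Crawly.Engine but can be swapped for a stub.
  def start(spider, engine \\ Crawly.Engine) do
    # Illustrative values only; every reserved option is optional.
    opts = [
      crawl_id: "nightly-run",
      closespider_itemcount: 100,
      closespider_timeout: 10,
      concurrent_requests_per_domain: 2
    ]

    case engine.start_spider(spider, opts) do
      :ok -> {:ok, opts[:crawl_id]}
      {:error, :spider_already_started} -> {:error, :already_started}
      # Any other error tuple is passed through unchanged.
      {:error, other} -> {:error, other}
    end
  end
end
```

Supplying `:crawl_id` explicitly, as done here, lets the caller know the id without a follow-up `get_crawl_id/1` call; leaving it out lets the engine generate one.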

stop_spider(spider_name, reason \\ :ignore)

Specs

stop_spider(Crawly.spider(), reason) :: result
when reason: :itemcount_limit | :itemcount_timeout | atom(),
     result: :ok | {:error, :spider_not_running} | {:error, :spider_not_found}
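
The three results in the spec above can be handled exhaustively, as in this sketch. `StopSpiderExample`, `stop/3`, the returned atoms, and the injectable `engine` argument are hypothetical; treating `:spider_not_running` as "already stopped" is a design choice of this example, not documented Crawly behaviour.

```elixir
defmodule StopSpiderExample do
  # Hypothetical helper, not part of Crawly.
  # `reason` mirrors the documented default of :ignore.
  def stop(spider, reason \\ :ignore, engine \\ Crawly.Engine) do
    case engine.stop_spider(spider, reason) do
      :ok -> :stopped
      # Assumption of this sketch: a spider that is not running needs no action.
      {:error, :spider_not_running} -> :already_stopped
      {:error, :spider_not_found} -> :unknown_spider
    end
  end
end
```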