Crawly.Engine (Crawly v0.13.0)

The Crawly Engine is the process responsible for starting and stopping spiders.

It also stores all currently running spiders.

Summary

Types

crawl_id_opt()

spider_info()

started_spiders()

t()

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

get_crawl_id(spider_name)

get_manager(spider_name)

get_spider_info(name)

init(args)

Callback implementation for GenServer.init/1.

list_known_spiders()

refresh_spider_list()

running_spiders()

start_link()

start_spider(spider_name, opts \\ [])

Starts a spider. All options passed in the second argument will be passed along to the spider's init/1 callback.

stop_spider(spider_name, reason \\ :ignore)

Types

crawl_id_opt()

Specs

crawl_id_opt() :: {:crawl_id, binary()}

spider_info()

Specs

spider_info() :: %{
  name: module(),
  status: :stopped | :started,
  pid: identifier() | nil
}

started_spiders()

Specs

started_spiders() :: %{optional(module()) => identifier()}

t()

Specs

t() :: %Crawly.Engine{
  known_spiders: [module()],
  started_spiders: started_spiders()
}

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

get_crawl_id(spider_name)

Specs

get_crawl_id(atom()) :: {:error, :spider_not_running} | {:ok, binary()}
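
As a minimal sketch, both return shapes in the spec above can be handled with a `case`. `CrawlLookup` and `describe/2` are hypothetical names, not part of Crawly; the `engine` argument defaults to `Crawly.Engine` and is a parameter only so that the example can be tried with a stand-in module.

```elixir
defmodule CrawlLookup do
  # Hypothetical helper, not part of Crawly.
  # `engine` defaults to Crawly.Engine; it is a parameter only so that a
  # stand-in module can be swapped in when experimenting.
  def describe(spider_name, engine \\ Crawly.Engine) do
    case engine.get_crawl_id(spider_name) do
      {:ok, crawl_id} ->
        "#{inspect(spider_name)} is running as crawl #{crawl_id}"

      {:error, :spider_not_running} ->
        "#{inspect(spider_name)} is not running"
    end
  end
end
```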

get_manager(spider_name)

Specs

get_manager(module()) :: pid() | {:error, :spider_not_found}
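
Because the success case in the spec above is a bare `pid()` rather than an `{:ok, pid}` tuple, callers may want to guard on `is_pid/1`. The sketch below assumes that reading of the spec; `ManagerLookup` and `manager_alive?/2` are hypothetical names, and `engine` is a parameter only so a stand-in module can be used.

```elixir
defmodule ManagerLookup do
  # Hypothetical helper, not part of Crawly.
  # Returns true only when a manager pid is returned and that process is alive.
  def manager_alive?(spider_name, engine \\ Crawly.Engine) do
    case engine.get_manager(spider_name) do
      pid when is_pid(pid) -> Process.alive?(pid)
      {:error, :spider_not_found} -> false
    end
  end
end
```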

get_spider_info(name)

Specs

get_spider_info(module()) :: spider_info()

init(args)

Specs

init(any()) :: {:ok, t()}

Callback implementation for GenServer.init/1.

list_known_spiders()

Specs

list_known_spiders() :: [spider_info()]
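
Since the result is a list of `spider_info()` maps, it can be filtered by pattern matching on `:status`. The sketch below works on that shape only; `SpiderReport` and `started_names/1` are hypothetical names, not part of Crawly.

```elixir
defmodule SpiderReport do
  # Hypothetical helper, not part of Crawly.
  # Keeps the names of entries whose status is :started; entries that do not
  # match the pattern are skipped by the comprehension.
  def started_names(spider_infos) do
    for %{name: name, status: :started} <- spider_infos, do: name
  end
end
```

With Crawly running, this could be used as `Crawly.Engine.list_known_spiders() |> SpiderReport.started_names()`.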

refresh_spider_list()

running_spiders()

Specs

running_spiders() :: started_spiders()

start_link()

start_spider(spider_name, opts \\ [])

Specs

start_spider(spider_name, opts) :: result
when spider_name: module(),
     opts: [crawl_id_opt()],
     result: :ok | {:error, :spider_already_started} | {:error, :atom}

Starts a spider. All options passed in the second argument will be passed along to the spider's init/1 callback.

Reserved Options

:crawl_id (binary). Optional, automatically generated if not set.
:closespider_itemcount (integer | disabled). Optional, overrides the close spider item count on startup.
:closespider_timeout (integer | disabled). Optional, overrides the close spider timeout on startup.
:concurrent_requests_per_domain (integer). Optional, overrides the number of workers for a given spider.
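
The reserved options above are passed as a keyword list in the second argument. The following is a sketch only: `SpiderControl`, `start_limited/2`, and all option values are illustrative assumptions, and `engine` is a parameter only so a stand-in module can be used in place of `Crawly.Engine`.

```elixir
defmodule SpiderControl do
  # Hypothetical wrapper, not part of Crawly. Option values are illustrative.
  def start_limited(spider, engine \\ Crawly.Engine) do
    opts = [
      crawl_id: "nightly-run",
      closespider_itemcount: 100,
      concurrent_requests_per_domain: 2
    ]

    # Match the result shapes listed in the spec for start_spider/2.
    case engine.start_spider(spider, opts) do
      :ok -> {:ok, :started}
      {:error, :spider_already_started} -> {:ok, :already_running}
      {:error, reason} -> {:error, reason}
    end
  end
end
```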

Backward compatibility

If the second positional argument is a binary, it is used as the :crawl_id. This form is deprecated and will be removed in the future.

stop_spider(spider_name, reason \\ :ignore)

Specs

stop_spider(module(), reason) :: result
when reason: :itemcount_limit | :itemcount_timeout | atom(),
     result: :ok | {:error, :spider_not_running} | {:error, :spider_not_found}
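
A minimal sketch of stopping a spider and handling each result in the spec above. `SpiderShutdown`, `stop/2`, and the `:manual_stop` reason are hypothetical (the spec accepts any atom as a reason); `engine` is a parameter only so a stand-in module can be used in place of `Crawly.Engine`.

```elixir
defmodule SpiderShutdown do
  # Hypothetical helper, not part of Crawly.
  # :manual_stop is an illustrative reason, not one defined by the library.
  def stop(spider, engine \\ Crawly.Engine) do
    case engine.stop_spider(spider, :manual_stop) do
      :ok -> :stopped
      {:error, :spider_not_running} -> :already_stopped
      {:error, :spider_not_found} -> :unknown_spider
    end
  end
end
```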