SpiderMan behaviour (spider_man v0.2.0)
Documentation for SpiderMan.
Spider Life Cycle
1. Spider.settings()
2. Spider.prepare_for_start(:pre, state)
3. Spider.prepare_for_start_component(:downloader, state)
4. Spider.prepare_for_start_component(:spider, state)
5. Spider.prepare_for_start_component(:item_processor, state)
6. Spider.prepare_for_start(:post, state)
7. Spider.init(state)
8. Spider.handle_response(response, context)
9. Spider.prepare_for_stop_component(:downloader, state)
10. Spider.prepare_for_stop_component(:spider, state)
11. Spider.prepare_for_stop_component(:item_processor, state)
12. Spider.prepare_for_stop(state)
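The life cycle can be made concrete with a callback module. The following is a minimal sketch rather than the library's own example: `MyApp.QuotesSpider` is a hypothetical name, the callback bodies are placeholders that pass the state through, and only the callbacks whose specs appear on this page are implemented. The per-component callbacks are left out because their specs are not shown here, so the compiler may warn about unimplemented callbacks.

```elixir
defmodule MyApp.QuotesSpider do
  # Hypothetical spider module; the name and all callback bodies are assumptions.
  @behaviour SpiderMan

  # Called first in the life cycle; returns a keyword list of settings.
  @impl true
  def settings, do: [log2file: false]

  # Called once with :pre (before the components are prepared) and once
  # with :post (afterwards); returns the state, possibly updated.
  @impl true
  def prepare_for_start(stage, state) when stage in [:pre, :post], do: state

  # Called after prepare_for_start(:post, state).
  @impl true
  def init(state), do: state

  # Called for each response. Both keys of the returned map are optional.
  @impl true
  def handle_response(_response, _context) do
    %{requests: [], items: []}
  end

  # Called last in the life cycle, after the components have been stopped.
  @impl true
  def prepare_for_stop(_state), do: :ok
end
```

A real spider would return new requests and parsed items from `handle_response/2` instead of empty lists.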
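Summary
Functions
continue(spider, timeout \\ :infinity) - Continue a spider.
ets_stats(spider) - Fetch statistics for all of the spider's ETS tables.
get_state(spider) - Fetch the spider's state.
insert_request(spider, request) - Insert a request into a spider.
insert_requests(spider, requests) - Insert multiple requests into a spider.
list_spiders() - List the spiders that have already been started.
retry_failed(spider, max_retries \\ 3, timeout \\ :infinity) - Retry failed events for a spider.
run_until(spider, settings \\ [], fun)
run_until_zero(spider, settings \\ [], check_interval \\ 1500)
start(spider, settings \\ []) - Start a spider.
stats(spider) - Fetch the spider's statistics.
stats(spider, component) - Fetch a component's statistics.
status(spider) - Fetch the spider's status.
stop(spider) - Stop a spider.
suspend(spider, timeout \\ :infinity) - Suspend a spider.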
Types
component()
Specs
component() :: :downloader | :spider | :item_processor
ets_stats()
Specs
ets_stats() :: [size: pos_integer(), memory: pos_integer()] | nil
prepare_for_start_stage()
Specs
prepare_for_start_stage() :: :pre | :post
request()
Specs
request() :: SpiderMan.Request.t()
requests()
Specs
requests() :: [request()]
settings()
Specs
settings() :: keyword()
spider()
Specs
status()
Specs
status() :: :running | :suspended
Functions
continue(spider, timeout \\ :infinity)
Specs
continue a spider
ets_stats(spider)
Specs
ets_stats(spider()) :: [ common_pipeline_tid: ets_stats(), downloader_tid: ets_stats(), failed_tid: ets_stats(), spider_tid: ets_stats(), item_processor_tid: ets_stats() ]
Fetch statistics for all of the spider's ETS tables.
get_state(spider)
Specs
get_state(spider()) :: SpiderMan.Engine.state()
fetch spider's state
insert_request(spider, request)
Specs
Insert a request into a spider.
insert_requests(spider, requests)
Specs
Insert multiple requests into a spider.
list_spiders()
Specs
list_spiders() :: [spider()]
List the spiders that have already been started.
retry_failed(spider, max_retries \\ 3, timeout \\ :infinity)
Specs
retry failed events for a spider
run_until(spider, settings \\ [], fun)
Specs
run_until_zero(spider, settings \\ [], check_interval \\ 1500)
Specs
start(spider, settings \\ [])
Specs
start(spider(), settings()) :: Supervisor.on_start_child()
start a spider
Settings
:log2file - The default value is true.
:status - The default value is :running.
:spider_module
:ets_file
:downloader_options
:spider_options
:item_processor_options
Downloader options
:requester - The default value is {{SpiderMan.Requester.Finch, []}}.
:producer - The default value is SpiderMan.Producer.ETS.
:context - The default value is %{}.
:processor - The default value is [max_demand: 1].
  :stages
  :concurrency - The default value is 8.
  :min_demand
  :max_demand - The default value is 10.
  :partition_by
  :spawn_opt
  :hibernate_after
:rate_limiting - The default value is [allowed_messages: 10, interval: 1000].
  :allowed_messages - Required.
  :interval - Required.
:pipelines - The default value is [SpiderMan.Pipeline.DuplicateFilter].
:post_pipelines - The default value is [].
Spider options
:producer - The default value is SpiderMan.Producer.ETS.
:context - The default value is %{}.
:processor - The default value is [max_demand: 1].
  :stages
  :concurrency - The default value is 8.
  :min_demand
  :max_demand - The default value is 10.
  :partition_by
  :spawn_opt
  :hibernate_after
:rate_limiting
  :allowed_messages - Required.
  :interval - Required.
:pipelines - The default value is [].
:post_pipelines - The default value is [].
Batchers options
:concurrency - The default value is 1.
:batch_size - The default value is 100.
:batch_timeout - The default value is 1000.
:partition_by
:spawn_opt
:hibernate_after
ItemProcessor options
:storage - The default value is SpiderMan.Storage.JsonLines.
:batchers - The default value is [default: [concurrency: 1, batch_size: 50, batch_timeout: 1000]].
:producer - The default value is SpiderMan.Producer.ETS.
:context - The default value is %{}.
:processor - The default value is [].
  :stages
  :concurrency - The default value is 8.
  :min_demand
  :max_demand - The default value is 10.
  :partition_by
  :spawn_opt
  :hibernate_after
:rate_limiting
  :allowed_messages - Required.
  :interval - Required.
:pipelines - The default value is [SpiderMan.Pipeline.DuplicateFilter].
:post_pipelines - The default value is [].
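The options above combine into one keyword list that is passed to `start/2`. The sketch below shows one possible shape, under two assumptions: that `:processor`, `:rate_limiting` and `:pipelines` are nested under the per-component option keys, and that a spider is identified by its module. `MyApp.Crawl` and `MyApp.QuotesSpider` are hypothetical names and every value is illustrative.

```elixir
defmodule MyApp.Crawl do
  # Hypothetical wrapper module; not part of SpiderMan.

  # Builds the settings keyword list. The nesting is an assumption based on
  # the option listing; the values are examples, not recommendations.
  def settings do
    [
      log2file: false,
      downloader_options: [
        processor: [max_demand: 1],
        # At most 5 messages per 1000 ms interval.
        rate_limiting: [allowed_messages: 5, interval: 1000],
        pipelines: [SpiderMan.Pipeline.DuplicateFilter]
      ],
      spider_options: [
        pipelines: []
      ],
      item_processor_options: [
        storage: SpiderMan.Storage.JsonLines,
        batchers: [default: [concurrency: 1, batch_size: 50, batch_timeout: 1000]]
      ]
    ]
  end

  # Starts the hypothetical spider with the settings above.
  # Returns whatever SpiderMan.start/2 returns (Supervisor.on_start_child()).
  def start do
    SpiderMan.start(MyApp.QuotesSpider, settings())
  end
end
```

Keeping the settings in a function makes them easy to inspect or adjust before the spider is started.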
stats(spider)
Specs
stats(spider()) :: [ status: status(), common_pipeline_tid: ets_stats(), downloader_tid: ets_stats(), failed_tid: ets_stats(), spider_tid: ets_stats(), item_processor_tid: ets_stats() ]
fetch spider's statistics
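The keyword list returned by `stats/1` can be processed with ordinary `Keyword` and `Enum` functions. As a small illustration, the helper below adds up the `:size` of every ETS table and treats a `nil` entry as zero. `MyApp.StatsReport` is a hypothetical module, not part of SpiderMan, and the sample data is made up to match the spec above.

```elixir
defmodule MyApp.StatsReport do
  # Hypothetical helper: sums the :size of every ETS table in a keyword
  # list shaped like the result of SpiderMan.stats/1.
  def total_size(stats) do
    stats
    |> Keyword.delete(:status)
    |> Enum.map(fn
      # ets_stats() may be nil
      {_table, nil} -> 0
      {_table, ets_stats} -> Keyword.fetch!(ets_stats, :size)
    end)
    |> Enum.sum()
  end
end

# Made-up sample data in the documented shape.
sample = [
  status: :running,
  common_pipeline_tid: [size: 1, memory: 100],
  downloader_tid: [size: 2, memory: 200],
  failed_tid: nil,
  spider_tid: [size: 3, memory: 300],
  item_processor_tid: [size: 4, memory: 400]
]

MyApp.StatsReport.total_size(sample)
# => 10
```

With a running spider, the argument would be the result of `SpiderMan.stats(spider)` instead of the sample list.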
stats(spider, component)
Specs
fetch component's statistics
status(spider)
Specs
fetch spider's status
stop(spider)
Specs
stop(spider()) :: :ok | {:error, error} when error: :not_found | :running | :restarting
stop a spider
suspend(spider, timeout \\ :infinity)
Specs
suspend a spider
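`suspend/2` and `continue/2` are natural to use as a pair. The sketch below wraps them so that a spider is always continued after some work has run while it was suspended. `MyApp.SpiderControl` is a hypothetical helper, the return values of `suspend` and `continue` are deliberately ignored because their specs are not shown on this page, and a spider is assumed to be identified by its module.

```elixir
defmodule MyApp.SpiderControl do
  # Hypothetical helper; not part of SpiderMan.

  # Suspends the spider, runs `fun`, then continues the spider again,
  # even if `fun` raises. Returns the result of `fun`.
  def while_suspended(spider, fun) when is_function(fun, 0) do
    SpiderMan.suspend(spider)

    try do
      fun.()
    after
      SpiderMan.continue(spider)
    end
  end
end
```

For example, `MyApp.SpiderControl.while_suspended(MyApp.QuotesSpider, fn -> SpiderMan.stats(MyApp.QuotesSpider) end)` would read the statistics of a hypothetical `MyApp.QuotesSpider` while it is suspended.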
Callbacks
handle_response(arg1, context)
Specs
handle_response(SpiderMan.Response.t(), context :: map()) :: %{ optional(:requests) => [SpiderMan.Request.t()], optional(:items) => [SpiderMan.Item.t()] }
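init(state)
Specs
init(state) :: state when state: SpiderMan.Engine.state()
prepare_for_start(stage, state)
Specs
prepare_for_start(prepare_for_start_stage(), state) :: state when state: SpiderMan.Engine.state()
prepare_for_start_component(component, state)
Specs
prepare_for_stop(state)
Specs
prepare_for_stop(SpiderMan.Engine.state()) :: :ok
prepare_for_stop_component(component, state)
Specs
settings()
Specs
settings() :: settings()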