Crawly v0.9.0 Crawly.Spider behaviour View Source

A behavior module for implementing a Crawly Spider

A Spider is a module which is responsible for defining:

  1. init/0 function, which must return a keyword list with start_urls list
  2. base_url/0 function responsible for filtering out requests not related to a given website
  3. parse_item/1 function which is responsible for parsing the downloaded request and converting it into items which can be stored and new requests which can be scheduled
  4. custom_settings/0 an optional callback which can be used in order to provide custom spider specific settings. Should define a list with custom settings and their values. These values will take precedence over the global settings defined in the config.

Link to this section Summary

Link to this section Callbacks

Link to this callback

init() View Source
init() :: [{:start_urls, list()}]

Link to this callback

override_settings() View Source
override_settings() :: Crawly.Settings.t()

Link to this callback

parse_item(response) View Source
parse_item(response :: HTTPoison.Response.t()) :: Crawly.ParsedItem.t()