API Reference spider_man v0.6.3

Modules

SpiderMan, a fast high-level web crawling & scraping framework for Elixir.

A common spider whose callbacks are set as functions instead of being defined in a module.

Downloads requests.

Analyze web pages.

Handles settings for a spider.

Item Struct

Sets the user-agent for requests.

Message counter for a component.

Used for debugging messages by component.

Filters out messages with duplicate keys.

Encodes item.value to JSON for the ItemProcessor component.

Encodes item.value to JSON and saves it to files for the ItemProcessor component.

A post_pipeline used to download files directly for the downloader component.

Automatically saves cookies for the spider component and sets cookies for the downloader component.

Uses Splash as the JavaScript rendering service.

ETS Producer

Request Struct

A Requester used by the downloader component.

Uses Finch as the Requester.

Uses Hackney as the Requester.

Response Struct

Storage that saves items to *.csv files.

Storage that saves items to a *.ets file.

Storage that saves items to a JSON Lines (*.jsonl) file.

Storage that simply logs each item with Logger.

Supports setting multiple Storages for the ItemProcessor component.