API Reference Crawler v1.5.0

Modules

Crawler

A high-performance web crawler in Elixir.

Crawler.Dispatcher

Dispatches requests to a queue for crawling.

Crawler.Dispatcher.Worker

A worker that performs the crawling.

Crawler.Example.GoogleSearch

This example performs a Google search, then scrapes the results to find GitHub projects and outputs their names and descriptions.

Crawler.Example.GoogleSearch.Scraper

Scrapes only GitHub pages, looking specifically for a project's name and description.

Crawler.Example.GoogleSearch.UrlFilter

Starts with Google, then crawls only GitHub.

Crawler.Fetcher

Fetches pages and performs tasks on them.

Crawler.Fetcher.HeaderPreparer

Captures and prepares HTTP response headers.

Crawler.Fetcher.Modifier

Modifies request options and headers before dispatch.

Crawler.Fetcher.Modifier.Spec

Crawler.Fetcher.Policer

Checks a series of conditions to determine whether it is okay to continue.

Crawler.Fetcher.Recorder

Records information about each crawl for internal use.

Crawler.Fetcher.Requester

Makes HTTP requests.

Crawler.Fetcher.Retrier

Handles retries for failed crawls.

Crawler.Fetcher.Retrier.Spec

Spec for defining a fetch retrier.

Crawler.Fetcher.UrlFilter

A placeholder module that lets all URLs pass through.

Crawler.Fetcher.UrlFilter.Spec

Spec for defining a URL filter.

Crawler.HTTP

Custom HTTPoison base module for potential customisation.

Crawler.Linker

A set of high-level functions for making online and offline URLs and links.

Crawler.Linker.PathBuilder

Builds a path for a link (either a full URL or a relative link) based on the input string, which is a URL with or without its protocol.

Crawler.Linker.PathExpander

Expands the path by resolving any . and .. segments.

Crawler.Linker.PathFinder

Finds different components of a given URL, e.g. its domain name, directory path, or full path.

Crawler.Linker.PathOffliner

Transforms a link to be storable and linkable offline.

Crawler.Linker.PathPrefixer

Returns prefixes (one or more ../) according to the given URL's structure.

Crawler.Options

Options for the crawler.

Crawler.Parser

Parses pages and calls a link handler to handle the detected links.

Crawler.Parser.CssParser

Parses CSS files.

Crawler.Parser.Guarder

Detects whether a page is parsable.

Crawler.Parser.HtmlParser

Parses HTML files.

Crawler.Parser.LinkParser

Parses links and transforms them if necessary.

Crawler.Parser.LinkParser.LinkExpander

Expands a link into a full URL.

Crawler.Parser.Spec

Spec for defining a parser.

Crawler.QueueHandler

Handles the queueing of crawl requests.

Crawler.Scraper

A placeholder module that demonstrates the scraping interface.

Crawler.Scraper.Spec

Spec for defining a scraper.

Crawler.Snapper

Stores crawled pages offline.

Crawler.Snapper.DirMaker

Makes a new (nested) folder according to the options provided.

Crawler.Snapper.LinkReplacer

Replaces links found in a page so they work offline.

Crawler.Store

An internal data store for information related to each crawl.

Crawler.Store.Counter

Crawler.Store.Page

An internal struct for keeping the URL and content of a crawled page.

Crawler.Worker

Handles the crawl tasks.
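
The Crawler.Linker.PathExpander entry above describes resolving . and .. segments in a path. As a rough illustration of that idea only, the sketch below shows one way such an expansion could be written. The module name DotSegmentSketch and its expand/1 function are hypothetical and are not part of Crawler's API. The sketch assumes protocol-less inputs such as example.com/a/../b and does not special-case a leading slash.

```elixir
defmodule DotSegmentSketch do
  @moduledoc false

  # Hypothetical helper for illustration, not Crawler's implementation.
  # Walks the "/"-separated segments left to right, keeping a stack
  # (most recent segment first) of the segments seen so far.
  def expand(path) when is_binary(path) do
    path
    |> String.split("/")
    |> Enum.reduce([], fn
      # "." refers to the current directory, so it is dropped.
      ".", acc -> acc
      # ".." with nothing left to pop is ignored.
      "..", [] -> []
      # ".." removes the most recently kept segment.
      "..", [_parent | rest] -> rest
      # Any other segment is kept.
      segment, acc -> [segment | acc]
    end)
    |> Enum.reverse()
    |> Enum.join("/")
  end
end

IO.puts(DotSegmentSketch.expand("example.com/a/./b/../c.html"))
# prints "example.com/a/c.html"
```

Keeping the segments on a stack means each .. is resolved against whatever segment precedes it after earlier expansions, so nested sequences such as a/b/../../c collapse correctly.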
