API Reference Crawler v1.1.2

Modules

A high-performance web crawler in Elixir.
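
A minimal usage sketch: the top-level module exposes crawl/2 as the entry point. The option name and return shape shown here follow the library's README and are assumptions that may differ slightly between versions.

    # Start crawling, following links up to 2 levels deep.
    # The {:ok, opts} return shape is assumed from the README.
    {:ok, opts} = Crawler.crawl("http://elixir-lang.org", max_depths: 2)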

Dispatches requests to a queue for crawling.

A worker that performs the crawling.

Fetches pages and performs tasks on them.

Captures and prepares HTTP response headers.

Modifies request options and headers before dispatch.

Checks a series of conditions to determine whether it is okay to continue crawling.

Records information about each crawl for internal use.

Makes HTTP requests.

Handles retries for failed crawls.

Spec for defining a fetch retrier.
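
As an illustrative sketch, a custom retrier conforms to this spec. The perform/2 callback shape (a fetch function plus the crawl options) is an assumption based on the default retrier.

    # A retrier sketch that disables retries by calling the fetch
    # function exactly once (assumes a perform/2 callback in the spec).
    defmodule NoRetrier do
      @behaviour Crawler.Fetcher.Retrier.Spec

      def perform(fetch_url, _opts), do: fetch_url.()
    end

    # Hypothetical usage:
    # Crawler.crawl("http://example.com", retrier: NoRetrier)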

A placeholder module that lets all URLs pass through.

Spec for defining a URL filter.
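
For illustration, a custom URL filter might implement the spec as below; the filter(url, opts) :: {:ok, boolean} callback shape is assumed from the placeholder module above.

    # Only follow URLs on a single host (a sketch; "example.com" and the
    # callback signature are assumptions, not part of the library).
    defmodule SameHostFilter do
      @behaviour Crawler.Fetcher.UrlFilter.Spec

      def filter(url, _opts) do
        {:ok, URI.parse(url).host == "example.com"}
      end
    end

    # Hypothetical usage:
    # Crawler.crawl("http://example.com", url_filter: SameHostFilter)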

A custom HTTPoison base module that allows the HTTP behaviour to be customised.

A set of high-level functions for building online and offline URLs and links.

Builds a path for a link (either a full URL or a relative link) based on an input string, which is a URL with or without its protocol.

Expands a path by resolving any . and .. segments.

Finds different components of a given URL, e.g. its domain name, directory path, or full path.

Transforms a link to be storable and linkable offline.

Returns prefixes (../s) according to the given URL's structure.

Options for the crawler.
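
Options are passed to the crawl call as a keyword list. The option names below come from the library's README and should be treated as a sketch to check against this release.

    # A more fully configured crawl (option names are assumptions).
    Crawler.crawl("http://example.com",
      max_depths: 3,
      workers:    10,
      interval:   500,
      timeout:    10_000,
      user_agent: "MyCrawler/1.0",
      save_to:    "/tmp/crawls"
    )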

Parses pages and calls a link handler to process the detected links.

Parses CSS files.

Detects whether a page is parsable.

Parses HTML files.

Parses links and transforms them if necessary.

Expands a link into a full URL.

Spec for defining a parser.
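
A custom parser, as a sketch, conforms to this spec; the parse/1 callback receiving and returning a page is an assumption based on the default parser's description.

    # A pass-through parser sketch: a real parser would inspect page.body
    # and hand detected links to a link handler.
    defmodule PassthroughParser do
      @behaviour Crawler.Parser.Spec

      def parse(page), do: {:ok, page}
    end

    # Hypothetical usage:
    # Crawler.crawl("http://example.com", parser: PassthroughParser)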

Handles the queueing of crawl requests.

A placeholder module that demonstrates the scraping interface.

Spec for defining a scraper.
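
As a sketch, a scraper implementing this spec receives each crawled page; the scrape/1 callback and the Page struct fields (:url, :body) are assumptions based on the placeholder scraper and the Store.Page struct listed below.

    # Write each crawled page's body to disk (file naming is illustrative).
    defmodule FileScraper do
      @behaviour Crawler.Scraper.Spec

      alias Crawler.Store.Page

      def scrape(%Page{url: url, body: body} = page) do
        File.write!("/tmp/" <> Base.url_encode64(url, padding: false), body)
        {:ok, page}
      end
    end

    # Hypothetical usage:
    # Crawler.crawl("http://example.com", scraper: FileScraper)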

Stores crawled pages offline.

Makes a new (nested) folder according to the options provided.

Replaces links found in a page so they work offline.

An internal data store for information related to each crawl.

An internal struct for keeping the URL and content of a crawled page.

Handles the crawl tasks.