Scrape (scrape v3.1.0)

An Elixir toolkit for extracting meaningful structured data out of common web resources.

This process is often called "web scraping". The normalization and transformation of data into a well-known structured form is also known as "data engineering", which in turn is a prerequisite for most data-science and machine-learning algorithms in the wild.

Scrape currently supports three types of common web data:

  • Feeds: RSS or Atom XML feeds
  • Domains: "root" pages of a web presence
  • Articles: "content" pages of a web presence

Summary

Functions

article(url, opts \\ [])
Given a valid url, return structured data of the content.

article!(url, opts \\ [])
Same as article/2 but will return the result directly or raise an error if the result is not :ok.

domain(url, opts \\ [])
Given a valid url, return structured data of the domain.

domain!(url, opts \\ [])
Same as domain/2 but will return the result directly or raise an error if the result is not :ok.

feed(url, opts \\ [])
Given a valid url, return structured data of the feed.

feed!(url, opts \\ [])
Same as feed/2 but will return the result directly or raise an error if the result is not :ok.

Functions

article(url, opts \\ [])
article(String.t(), [{atom(), any()}]) :: {:ok, map()} | {:error, any()}

Given a valid url, return structured data of the content.

This function is intended for "content" pages.
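
A minimal sketch of handling both branches of the documented return type, `{:ok, map()} | {:error, any()}`. The module `ArticleSketch`, its function names, and the example URL are hypothetical. The fields inside the returned map are not listed here, so the sketch only counts them instead of assuming any key names.

```elixir
defmodule ArticleSketch do
  # Hypothetical helper: turns the documented result tuple into a log line.
  def summarize({:ok, data}) when is_map(data) do
    "scraped article with #{map_size(data)} top-level fields"
  end

  def summarize({:error, reason}) do
    "could not scrape article: #{inspect(reason)}"
  end

  # Requires the :scrape dependency and network access.
  # The URL passed in is up to the caller; opts are left at their default.
  def run(url) do
    url
    |> Scrape.article()
    |> summarize()
  end
end
```

Calling `ArticleSketch.run("https://example.com/some-article")` (a placeholder URL) would return one of the two strings, depending on whether the scrape succeeded.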

article!(url, opts \\ [])

Same as article/2 but will return the result directly or raise an error if the result is not :ok.
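
To illustrate the relationship between a bang function and its non-bang counterpart, here is a sketch of the usual Elixir convention. It is not taken from Scrape's source: `BangSketch`, `unwrap!/1`, and the error message are assumptions, and the exception type Scrape actually raises is not stated in this documentation.

```elixir
defmodule BangSketch do
  # Hypothetical stand-in for the "bang" convention:
  # unwrap {:ok, value}, raise on anything tagged :error.
  def unwrap!({:ok, value}), do: value

  def unwrap!({:error, reason}) do
    # Raising with a string produces a RuntimeError.
    raise "scrape failed: #{inspect(reason)}"
  end
end
```

Under this convention, `Scrape.article!(url)` behaves roughly like `url |> Scrape.article() |> BangSketch.unwrap!()`, which is convenient in scripts where a failed scrape should stop execution.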

domain(url, opts \\ [])
domain(String.t(), [{atom(), any()}]) :: {:ok, map()} | {:error, any()}

Given a valid url, return structured data of the domain.

This function is intended for "root" pages of a web presence. Its most important use case is detecting possible feeds for the domain.
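
The name of the field that carries the detected feeds is not given here, so a reasonable first step is to look at what the result map contains. The sketch below lists the top-level keys of a successful result; `DomainSketch` and its functions are hypothetical names.

```elixir
defmodule DomainSketch do
  # Returns the sorted top-level keys of a successful result,
  # or an empty list when the scrape failed.
  def field_names({:ok, data}) when is_map(data) do
    data |> Map.keys() |> Enum.sort()
  end

  def field_names({:error, _reason}), do: []

  # Requires the :scrape dependency and network access.
  def run(url) do
    url
    |> Scrape.domain()
    |> field_names()
  end
end
```

Running `DomainSketch.run/1` once in IEx against a known site shows which key to read before passing anything on to `feed/2`.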

domain!(url, opts \\ [])

Same as domain/2 but will return the result directly or raise an error if the result is not :ok.

feed(url, opts \\ [])
feed(String.t(), [{atom(), any()}]) :: {:ok, map()} | {:error, any()}

Given a valid url, return structured data of the feed.
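
Because `feed/2` returns a tagged tuple rather than raising, it lends itself to fetching several feeds at once and sorting the outcomes afterwards. The following is a sketch under stated assumptions: `FeedBatchSketch`, the concurrency limit, and the timeout are illustrative choices, and the `fetch` argument exists only so the function can be exercised without network access.

```elixir
defmodule FeedBatchSketch do
  # Fetches every URL concurrently and splits the results into
  # successes and failures, keeping the input order within each list.
  # `fetch` defaults to Scrape.feed/1 (requires the :scrape dependency).
  def fetch_all(urls, fetch \\ &Scrape.feed/1) do
    acc =
      urls
      |> Task.async_stream(fn url -> {url, fetch.(url)} end,
        max_concurrency: 4,
        timeout: 15_000,
        on_timeout: :kill_task
      )
      |> Enum.reduce(%{ok: [], failed: []}, fn
        {:ok, {url, {:ok, data}}}, acc ->
          %{acc | ok: [{url, data} | acc.ok]}

        {:ok, {url, {:error, reason}}}, acc ->
          %{acc | failed: [{url, reason} | acc.failed]}

        # A task killed by the timeout arrives as {:exit, :timeout};
        # the URL is not part of that tuple, so it is recorded as :unknown.
        {:exit, reason}, acc ->
          %{acc | failed: [{:unknown, reason} | acc.failed]}
      end)

    %{ok: Enum.reverse(acc.ok), failed: Enum.reverse(acc.failed)}
  end
end
```

The result is a map with `:ok` and `:failed` lists of `{url, value}` pairs, so one unreachable feed does not discard the others.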

feed!(url, opts \\ [])

Same as feed/2 but will return the result directly or raise an error if the result is not :ok.
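
When a raising variant is used at a boundary where a crash is not acceptable, the call can be wrapped in `try/rescue`. This is only a sketch: `FeedBangSketch` and `with_fallback/2` are hypothetical, and since the exception type raised by `feed!/2` is not documented here, the clause rescues any exception.

```elixir
defmodule FeedBangSketch do
  # Runs a zero-arity function that may raise; on any exception,
  # logs the message to stderr and returns `default` instead.
  def with_fallback(fun, default) when is_function(fun, 0) do
    try do
      fun.()
    rescue
      error ->
        IO.puts(:stderr, "feed scrape failed: #{Exception.message(error)}")
        default
    end
  end
end
```

For example, `FeedBangSketch.with_fallback(fn -> Scrape.feed!("https://example.com/feed.xml") end, %{})` (placeholder URL) yields the scraped map on success and an empty map otherwise.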