Scrape v3.1.0
Elixir toolkit for extracting meaningful structured data from common web resources.
This process is often called "web scraping". The normalization and transformation of raw data into a well-known structured form is also known as "data engineering", which in turn is a prerequisite for most data-science and machine-learning algorithms in the wild.
Currently, Scrape supports three types of common web data:
- Feeds: RSS or Atom XML feeds
- Domains: "root" pages of a web presence
- Articles: "content" pages of a web presence
Summary

Functions

article(url, opts \\ [])
Given a valid url, return structured data of the content.

article!(url, opts \\ [])
Same as article/2, but will return the result directly or raise an error if the result is not :ok.

domain(url, opts \\ [])
Given a valid url, return structured data of the domain.

domain!(url, opts \\ [])
Same as domain/2, but will return the result directly or raise an error if the result is not :ok.

feed(url, opts \\ [])
Given a valid url, return structured data of the feed.

feed!(url, opts \\ [])
Same as feed/2, but will return the result directly or raise an error if the result is not :ok.
Functions
article(url, opts \\ [])
Given a valid url, return structured data of the content.
This function is intended for "content" pages.
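A minimal usage sketch, assuming the package is installed and the (placeholder) URL is reachable; the non-bang variant is matched against an :ok tuple, and anything else is treated as a failure:

    # Hedged sketch: match on the :ok result; the exact fields of the
    # returned article data are not listed here, so the result is simply
    # inspected. The URL is a placeholder.
    case Scrape.article("https://example.com/blog/some-post") do
      {:ok, article} -> IO.inspect(article, label: "article data")
      other -> IO.inspect(other, label: "scraping failed")
    end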
article!(url, opts \\ [])
Same as article/2, but will return the result directly or raise an error if the result is not :ok.
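The bang variant lends itself to pipelines, since it returns the result directly and raises when scraping fails. A minimal sketch with a placeholder URL:

    # article!/2 returns the scraped data directly (or raises),
    # so it can be piped straight into further processing.
    "https://example.com/blog/some-post"
    |> Scrape.article!()
    |> IO.inspect(label: "article data")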
domain(url, opts \\ [])
Given a valid url, return structured data of the domain.
This function is intended for "root" pages of a web presence. The most important use case for Scrape is to detect possible feeds for the domain.
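A sketch of that feed-detection use case, again with a placeholder URL; the exact fields of the returned domain data (for example, where detected feed URLs live) are not specified here, so the result is only inspected:

    # Hedged sketch: scrape a "root" page and inspect the result,
    # e.g. to look for feed URLs advertised by the site.
    case Scrape.domain("https://example.com") do
      {:ok, domain} -> IO.inspect(domain, label: "domain data")
      other -> IO.inspect(other, label: "scraping failed")
    end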
domain!(url, opts \\ [])
Same as domain/2, but will return the result directly or raise an error if the result is not :ok.
feed(url, opts \\ [])
Given a valid url, return structured data of the feed.
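A minimal sketch for an RSS or Atom feed, with a placeholder URL and without assumptions about the shape of the returned feed data:

    # Hedged sketch: parse an RSS/Atom feed into structured data.
    case Scrape.feed("https://example.com/rss.xml") do
      {:ok, feed} -> IO.inspect(feed, label: "feed data")
      other -> IO.inspect(other, label: "scraping failed")
    end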
feed!(url, opts \\ [])
Same as feed/2, but will return the result directly or raise an error if the result is not :ok.