Scrape v3.1.0
Elixir toolkit for extracting meaningful structured data from common web resources.
This process is often called "web scraping". The normalization and transformation of raw data into a well-known structured form is also known as "data engineering", which in turn is a prerequisite for most data-science and machine-learning algorithms in the wild.
Currently, Scrape supports three types of common web data:
- Feeds: RSS or Atom XML feeds
- Domains: "root" pages of a web presence
- Articles: "content" pages of a web presence
Summary

Functions

article(url, opts \\ [])
Given a valid url, return structured data of the content.

article!(url, opts \\ [])
Same as article/2, but will return the result directly or raise an error if the result is not :ok.

domain(url, opts \\ [])
Given a valid url, return structured data of the domain.

domain!(url, opts \\ [])
Same as domain/2, but will return the result directly or raise an error if the result is not :ok.

feed(url, opts \\ [])
Given a valid url, return structured data of the feed.

feed!(url, opts \\ [])
Same as feed/2, but will return the result directly or raise an error if the result is not :ok.
Functions
article(url, opts \\ [])
Given a valid url, return structured data of the content.
This function is intended for "content" pages.
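A minimal usage sketch, assuming the package is installed and the (placeholder) URL is reachable; the non-bang variant is matched against an :ok tuple, and anything else is treated as a failure:

    # Hedged sketch: match on the :ok result; the exact fields of the
    # returned article data are not listed here, so the result is simply
    # inspected. The URL is a placeholder.
    case Scrape.article("https://example.com/blog/some-post") do
      {:ok, article} -> IO.inspect(article, label: "article data")
      other -> IO.inspect(other, label: "scraping failed")
    end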
article!(url, opts \\ [])
Same as article/2, but will return the result directly or raise an error if the result is not :ok.
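The bang variant lends itself to pipelines, since it returns the result directly and raises when scraping fails. A minimal sketch with a placeholder URL:

    # article!/2 returns the scraped data directly (or raises),
    # so it can be piped straight into further processing.
    "https://example.com/blog/some-post"
    |> Scrape.article!()
    |> IO.inspect(label: "article data")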
domain(url, opts \\ [])
Given a valid url, return structured data of the domain.
This function is intended for "root" pages of a web presence. The most important use case for Scrape is to detect possible feeds for the domain.
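A sketch of that feed-detection use case, again with a placeholder URL; the exact fields of the returned domain data (for example, where detected feed URLs live) are not specified here, so the result is only inspected:

    # Hedged sketch: scrape a "root" page and inspect the result,
    # e.g. to look for feed URLs advertised by the site.
    case Scrape.domain("https://example.com") do
      {:ok, domain} -> IO.inspect(domain, label: "domain data")
      other -> IO.inspect(other, label: "scraping failed")
    end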
domain!(url, opts \\ [])
Same as domain/2, but will return the result directly or raise an error if the result is not :ok.
feed(url, opts \\ [])
Given a valid url, return structured data of the feed.
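A minimal sketch for an RSS or Atom feed, with a placeholder URL and without assumptions about the shape of the returned feed data:

    # Hedged sketch: parse an RSS/Atom feed into structured data.
    case Scrape.feed("https://example.com/rss.xml") do
      {:ok, feed} -> IO.inspect(feed, label: "feed data")
      other -> IO.inspect(other, label: "scraping failed")
    end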
feed!(url, opts \\ [])
Same as feed/2, but will return the result directly or raise an error if the result is not :ok.