Readability (readability v0.12.1)
Readability library for extracting & curating articles.
Example
@type html :: binary
# Pass a URL to get a summary of the page
%Readability.Summary{title: title, authors: authors, article_html: article} = Readability.summarize(url)
# Extract title
Readability.title(html)
# Extract authors.
Readability.authors(html)
# Extract only text from article
article =
  html
  |> Readability.article()
  |> Readability.readable_text()
# Extract article with transformed html
article =
  html
  |> Readability.article()
  |> Readability.raw_html()
Functions
Using a variety of metrics (content score, classname, element types), find the content that is most likely to be the stuff a user wants to read.
Example
iex> article_tree = Readability.article(html_str)
# returns the article as an html_tree tuple
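To make the idea of combining metrics concrete, here is a minimal sketch of scoring candidate nodes by element type, class name, and text length. The module name `ScoreSketch`, the weights, and the class-name patterns are all hypothetical and chosen for illustration; this is not the library's actual scoring code. Nodes are assumed to be `{tag, attrs, children}` tuples.

```elixir
defmodule ScoreSketch do
  # Hypothetical class-name patterns, for illustration only.
  defp positive, do: ~r/article|body|content|entry|main|post|text/i
  defp negative, do: ~r/comment|footer|footnote|sidebar|sponsor|widget/i

  # Base score from the element type (hypothetical weights).
  def tag_score("article"), do: 10
  def tag_score("div"), do: 5
  def tag_score(tag) when tag in ["form", "ul", "ol", "li"], do: -3
  def tag_score(_tag), do: 0

  # Adjust the score using the class attribute, if present.
  def class_score(attrs) do
    class =
      case List.keyfind(attrs, "class", 0) do
        {"class", value} -> value
        nil -> ""
      end

    cond do
      class =~ negative() -> -25
      class =~ positive() -> 25
      true -> 0
    end
  end

  # Crude content score: one point per 100 characters of text.
  def text_score(children) do
    children |> text() |> String.length() |> div(100)
  end

  # Total score for one {tag, attrs, children} node.
  def score({tag, attrs, children}) do
    tag_score(tag) + class_score(attrs) + text_score(children)
  end

  # Pick the highest-scoring candidate.
  def best(candidates), do: Enum.max_by(candidates, &score/1)

  # Collect the text of a node, a list of nodes, or a text binary.
  defp text(nodes) when is_list(nodes), do: Enum.map_join(nodes, &text/1)
  defp text({_tag, _attrs, children}), do: text(children)
  defp text(binary) when is_binary(binary), do: binary
end
```

Calling `ScoreSketch.best/1` with a list of candidate nodes returns the one with the highest combined score; a real extractor would likely also weigh link density and propagate scores to parent nodes.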
Extract authors.
Example
iex> authors = Readability.authors(html_str)
["José Valim", "chrismccord"]
Returns true if Content-Type in provided headers list is a markup type, else false.
Example
iex> Readability.is_response_markup?([{"Content-Type", "text/html"}])
true
Extract MIME Type from headers.
Example
iex> mime = Readability.mime(headers_list)
"text/html"
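As a rough illustration of the two header helpers above, the following sketch pulls a MIME type out of a list of `{name, value}` header tuples and checks it against a list of markup types. The module name `HeaderSketch`, the `"text/plain"` fallback, and the list of markup MIME types are assumptions made for this example and may differ from what the library does.

```elixir
defmodule HeaderSketch do
  # Assumed set of markup MIME types, for illustration only.
  @markup_mimes ["text/html", "application/xhtml+xml", "application/xml", "text/xml"]

  # Find the Content-Type header (name matched case-insensitively) and
  # strip parameters such as "; charset=utf-8".
  # Falls back to "text/plain" when the header is absent (an assumption).
  def mime(headers) do
    headers
    |> Enum.find_value("text/plain", fn {name, value} ->
      if String.downcase(name) == "content-type", do: value
    end)
    |> String.split(";")
    |> hd()
    |> String.trim()
    |> String.downcase()
  end

  # True when the extracted MIME type is one of the assumed markup types.
  def markup?(headers), do: mime(headers) in @markup_mimes
end
```

Normalizing the header name and value to lowercase keeps the check robust against servers that send `content-type` or `Text/HTML`.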
Returns raw HTML binary from html_tree.
Returns HTML with attributes and tags cleaned.
Returns only text binary from html_tree.
Summarize the primary readable content of a webpage.
Extract title
Example
iex> title = Readability.title(html_str)
"Some title in html"