Readability (readability2 v0.7.0)
Readability library for extracting & curating articles.
Example
@type html :: binary
# Just pass url
%Readability.Summary{title: title, authors: authors, article_html: article} = Readability.summarize(url)
# Extract title
Readability.title(html)
# Extract authors.
Readability.authors(html)
# Extract only text from article
article =
  html
  |> Readability.article()
  |> Readability.readable_text()
# Extract article with transformed html
article =
  html
  |> Readability.article()
  |> Readability.raw_html()
Summary
Functions
article(raw_html, opts \\ [])
Using a variety of metrics (content score, class name, element types), find the content the user most likely wants to read.
authors(html)
Extract authors.
is_response_markup(headers)
Return true if the Content-Type in the provided headers list is a markup type, else false.
mime(headers \\ [])
Extract the MIME type from headers.
raw_html(html_tree)
Return the raw HTML binary from html_tree.
readable_html(html_tree)
Return HTML with attributes and tags cleaned.
readable_text(html_tree)
Return only the text binary from html_tree.
summarize(url, opts \\ [])
Summarize the primary readable content of a webpage.
title(raw_html)
Extract title.
Types
headers()
html_tree()
options()
@type options() :: list()
raw_html()
@type raw_html() :: binary()
url()
@type url() :: binary()
Functions
article(raw_html, opts \\ [])
Using a variety of metrics (content score, class name, element types), find the content the user most likely wants to read.
Example
iex> article_tree = Readability.article(html_str)
# returns the article as an html_tree tuple
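The metrics named above (content score, class name, element type) can be illustrated with a small scoring sketch. Everything here is an assumption made for illustration: the `ScoreSketch` module, its weights, its class-name patterns, and the Floki-style `{tag, attributes, children}` node shape are hypothetical and are not taken from Readability's implementation.

```elixir
defmodule ScoreSketch do
  # Illustrative only: the weights and patterns below are assumptions,
  # not the values Readability uses.
  def score({tag, attrs, _children} = node) do
    text = text(node)
    commas = text |> String.graphemes() |> Enum.count(&(&1 == ","))
    length_bonus = min(div(String.length(text), 100), 3)
    tag_weight(tag) + class_weight(attrs) + commas + length_bonus
  end

  # Pick the highest-scoring candidate from a list of element tuples.
  def best(candidates), do: Enum.max_by(candidates, &score/1)

  # Element type: containers that usually hold prose score higher.
  defp tag_weight(tag) when tag in ["article", "div"], do: 5
  defp tag_weight(tag) when tag in ["form", "ul", "ol", "nav"], do: -3
  defp tag_weight(_tag), do: 0

  # Class name: reward content-like names, penalize page-chrome names.
  defp class_weight(attrs) do
    class =
      case List.keyfind(attrs, "class", 0) do
        {"class", value} -> value
        nil -> ""
      end

    cond do
      class =~ ~r/article|content|entry|post/i -> 25
      class =~ ~r/comment|footer|sidebar|nav/i -> -25
      true -> 0
    end
  end

  # Collect the text of a node, joining child text with spaces.
  defp text(nodes) when is_list(nodes), do: Enum.map_join(nodes, " ", &text/1)
  defp text(node) when is_binary(node), do: node
  defp text({_tag, _attrs, children}), do: text(children)
  defp text(_other), do: ""
end
```

Calling `ScoreSketch.best/1` on a list of candidate elements returns the one whose tag, class name, and text together score highest, which is the general idea behind picking the "article" node.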
authors(html)
Extract authors
Example
iex> authors = Readability.authors(html_str)
["José Valim", "chrismccord"]
default_options()
is_response_markup(headers)
Return true if Content-Type in provided headers list is a markup type, else false
Example
iex> Readability.is_response_markup([{"Content-Type", "text/html"}])
true
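As a rough illustration of what a markup check on headers might involve, the sketch below tests the Content-Type value against a pattern for `*ml` types. The `MarkupSketch` module and its regular expression are hypothetical; which MIME types the library actually treats as markup is an assumption here.

```elixir
defmodule MarkupSketch do
  # Hypothetical helper, not part of Readability.
  # Returns true when a Content-Type header (matched case-insensitively)
  # names a text/* or application/* type ending in "ml", such as
  # text/html or application/xhtml+xml. Parameters like charset are allowed.
  def markup?(headers) do
    pattern = ~r{^(application|text)/[a-z0-9.+_-]*ml(;.*)?$}i

    Enum.any?(headers, fn {key, value} ->
      String.downcase(key) == "content-type" and String.match?(value, pattern)
    end)
  end
end
```

A guard like this is useful before parsing a response body, so that JSON or binary payloads are skipped rather than fed to the HTML parser.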
mime(headers \\ [])
Extract MIME Type from headers
Example
iex> mime = Readability.mime(headers_list)
"text/html"
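For a sense of how a MIME type could be pulled out of a header list, here is a minimal sketch. The `MimeSketch` module is hypothetical, and both the `"text/plain"` fallback and the stripping of parameters such as `charset` are assumptions rather than documented behavior of `Readability.mime/1`.

```elixir
defmodule MimeSketch do
  # Hypothetical helper, not part of Readability.
  # Finds the Content-Type header case-insensitively, drops parameters
  # such as "; charset=utf-8", and lowercases the result.
  # Falls back to "text/plain" when the header is absent (an assumption).
  def mime(headers \\ []) do
    headers
    |> Enum.find_value("text/plain", fn {key, value} ->
      if String.downcase(key) == "content-type", do: value
    end)
    |> String.split(";")
    |> hd()
    |> String.trim()
    |> String.downcase()
  end
end
```

Header names are compared in lowercase because HTTP header names are case-insensitive and clients differ in how they report them.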
parse(raw_html)
raw_html(html_tree)
Return the raw HTML binary from html_tree.
readable_html(html_tree)
Return HTML with attributes and tags cleaned.
readable_text(html_tree)
Return only the text binary from html_tree.
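To show what extracting text from a tree can look like, the sketch below walks a nested structure and concatenates its text nodes. It assumes a Floki-style tree in which a node is either a binary or a `{tag, attributes, children}` tuple; that shape and the `TextSketch` module are assumptions, and the real `readable_text/1` may treat whitespace and block elements differently.

```elixir
defmodule TextSketch do
  # Hypothetical walker, not part of Readability.
  # A node is assumed to be a binary (text) or a
  # {tag, attributes, children} tuple; lists hold sibling nodes.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)
  def text(node) when is_binary(node), do: node
  def text({_tag, _attrs, children}), do: text(children)
  # Comments, doctypes, and anything else contribute no text.
  def text(_other), do: ""
end

tree =
  {"div", [{"class", "post"}],
   [{"h1", [], ["Title"]}, {"p", [], ["Hello ", {"b", [], ["world"]}]}]}

TextSketch.text(tree)
# => "TitleHello world"
```

Note that this naive version inserts no separator between block elements, which is why the heading and paragraph run together in the result.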
regexes(key)
summarize(url, opts \\ [])
Summarize the primary readable content of a webpage.
title(raw_html)
Extract title
Example
iex> title = Readability.title(html_str)
"Some title in html"