readability v0.10.0 Readability

Readability library for extracting & curating articles.

Example

@type html :: binary

# Just pass url
%Readability.Summary{title: title, authors: authors, article_html: article} = Readability.summarize(url)

# Extract title
Readability.title(html)

# Extract authors.
Readability.authors(html)

# Extract only text from article
article = html
          |> Readability.article()
          |> Readability.readable_text()

# Extract article with transformed html
article = html
          |> Readability.article()
          |> Readability.raw_html()

Summary

Functions

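article(raw_html, opts \\ [])
Using a variety of metrics (content score, classname, element types), find the content that is most likely to be the stuff a user wants to read

authors(html)
Extract authors

default_options()

is_response_markup(headers)
Return true if Content-Type in provided headers list is a markup type, else false

mime(headers \\ [])
Extract MIME Type from headers

parse(raw_html)

raw_html(html_tree)
Return the raw HTML binary from html_tree

readable_html(html_tree)
Return HTML with attributes and tags cleaned

readable_text(html_tree)
Return only the text binary from html_tree

summarize(url, opts \\ [])
Summarize the primary readable content of a webpage

title(raw_html)
Extract title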

Types

headers() :: list[tuple]
html_tree() :: tuple | list
options() :: list
raw_html() :: binary
url() :: binary
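
The html_tree type is given only as a tuple or list. As a rough illustration, assuming it follows the Floki-style shape {tag, attributes, children} with plain binaries as text nodes (an assumption, not stated on this page), a tree could be walked like this. TreeSketch and text/1 are hypothetical names and not part of Readability.

```elixir
defmodule TreeSketch do
  @moduledoc "Hypothetical helper for illustration; not part of Readability."

  # A text node is a plain binary.
  def text(node) when is_binary(node), do: node

  # A list of nodes: concatenate the text of each node in order.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)

  # An element node: ignore tag and attributes, descend into the children.
  def text({_tag, _attrs, children}), do: text(children)

  # Anything else (for example a comment node) contributes no text.
  def text(_other), do: ""
end

# Assumed tree shape: {tag, attributes, children}
tree = {"div", [{"class", "post"}], ["Hello, ", {"b", [], ["world"]}, {:comment, "ignored"}]}
IO.puts(TreeSketch.text(tree))
# prints "Hello, world"
```

This only shows the assumed data shape; for real extraction, the library's own readable_text/1 is the function to call.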

Functions

article(raw_html, opts \\ [])
article(binary, options) :: html_tree

Using a variety of metrics (content score, classname, element types), find the content that is most likely to be the stuff a user wants to read

Example

iex> article_tree = Readability.article(html_str)
# returns the article as an html_tree tuple

authors(html)
authors(binary | html_tree) :: list[binary]

Extract authors

Example

iex> authors = Readability.authors(html_str)
["José Valim", "chrismccord"]

default_options()

is_response_markup(headers)
is_response_markup(headers) :: boolean

Return true if Content-Type in provided headers list is a markup type, else false

Example

iex> Readability.is_response_markup([{"Content-Type", "text/html"}])
true
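
To make the check concrete, here is a minimal sketch of how a Content-Type test of this kind could be written in plain Elixir. It is not the library's implementation: the module name MarkupSketch, the function markup?/1, and the list of media types counted as markup are all assumptions made for illustration.

```elixir
defmodule MarkupSketch do
  @moduledoc "Hypothetical helper for illustration; not part of Readability."

  # Media types treated as markup in this sketch (an assumption).
  @markup_types ["text/html", "application/xhtml+xml", "application/xml", "text/xml"]

  # True if any Content-Type header carries one of the markup media types.
  # Header names are compared case-insensitively.
  @spec markup?([{binary, binary}]) :: boolean
  def markup?(headers) do
    Enum.any?(headers, fn {name, value} ->
      String.downcase(name) == "content-type" and (media_type(value) in @markup_types)
    end)
  end

  # Drop parameters such as "; charset=utf-8" and normalise case.
  defp media_type(value) do
    value |> String.split(";") |> hd() |> String.trim() |> String.downcase()
  end
end

IO.inspect(MarkupSketch.markup?([{"Content-Type", "text/html; charset=utf-8"}]))
IO.inspect(MarkupSketch.markup?([{"Content-Type", "application/json"}]))
```

A caller might use a check like this to skip extraction for responses that are not HTML or XML.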

mime(headers \\ [])
mime(headers) :: String.t

Extract MIME Type from headers
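
As a sketch of what extracting a MIME type from a headers list involves, the following plain-Elixir version finds the Content-Type header and strips its parameters. It is an illustration under assumptions, not the library's code: MimeSketch is a hypothetical module, and the "text/plain" fallback for a missing header is an assumption of this sketch.

```elixir
defmodule MimeSketch do
  @moduledoc "Hypothetical helper for illustration; not part of Readability."

  # Return the media type of the Content-Type header, without parameters.
  # Falls back to "text/plain" when no such header exists (assumed default).
  @spec mime([{binary, binary}]) :: binary
  def mime(headers \\ []) do
    headers
    |> Enum.find_value("text/plain", fn {name, value} ->
      # Header names are case-insensitive; return the value on a match.
      if String.downcase(name) == "content-type", do: value
    end)
    |> String.split(";")
    |> hd()
    |> String.trim()
    |> String.downcase()
  end
end

IO.puts(MimeSketch.mime([{"Content-Type", "text/html; charset=utf-8"}]))
# prints "text/html"
```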

Example

iex> mime = Readability.mime(headers_list)
"text/html"

parse(raw_html)

raw_html(html_tree)
raw_html(html_tree) :: binary

Return the raw HTML binary from html_tree

readable_html(html_tree)
readable_html(html_tree) :: binary

Return HTML with attributes and tags cleaned

readable_text(html_tree)
readable_text(html_tree) :: binary

Return only the text binary from html_tree

summarize(url, opts \\ [])

Summarize the primary readable content of a webpage.

title(raw_html)
title(binary | html_tree) :: binary

Extract title

Example

iex> title = Readability.title(html_str)
"Some title in html"