Readability (readability2 v0.7.0)

Readability library for extracting & curating articles.

Example

@type html :: binary

# Just pass url
%Readability.Summary{title: title, authors: authors, article_html: article} = Readability.summarize(url)

# Extract title
Readability.title(html)

# Extract authors.
Readability.authors(html)

# Extract only text from article
article = html
          |> Readability.article()
          |> Readability.readable_text()

# Extract article with transformed html
article = html
          |> Readability.article()
          |> Readability.raw_html()

Summary

Types

headers()

html_tree()

options()

raw_html()

url()

Functions

article(raw_html, opts \\ [])

Uses a variety of metrics (content score, class name, element types) to find the content a user most likely wants to read.

authors(html)

Extract authors

default_options()

is_response_markup(headers)

Return true if Content-Type in provided headers list is a markup type, else false

mime(headers \\ [])

Extract MIME Type from headers

parse(raw_html)

raw_html(html_tree)

Returns the raw HTML binary from html_tree.

readable_html(html_tree)

Returns HTML with attributes and tags cleaned.

readable_text(html_tree)

Returns only the text binary from html_tree.

regexes(key)

summarize(url, opts \\ [])

Summarizes the primary readable content of a webpage.

title(raw_html)

Extract title

Types

headers()

@type headers() :: list()[tuple()]

html_tree()

@type html_tree() :: tuple() | list()

options()

@type options() :: list()

raw_html()

@type raw_html() :: binary()

url()

@type url() :: binary()

Functions

article(raw_html, opts \\ [])

@spec article(binary(), options()) :: html_tree()

Uses a variety of metrics (content score, class name, element types) to find the content a user most likely wants to read.

Example

iex> article_tree = Readability.article(html_str)
# returns the article as an html_tree (a tuple)

authors(html)

@spec authors(binary() | html_tree()) :: list()[binary()]

Extract authors

Example

iex> authors = Readability.authors(html_str)
["José Valim", "chrismccord"]

default_options()

is_response_markup(headers)

@spec is_response_markup(headers()) :: boolean()

Return true if Content-Type in provided headers list is a markup type, else false

Example

iex> Readability.is_response_markup([{"Content-Type", "text/html"}])
true

mime(headers \\ [])

@spec mime(headers()) :: String.t()

Extract MIME Type from headers

Example

iex> mime = Readability.mime(headers_list)
"text/html"
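
To make the two header helpers above more concrete, here is a minimal sketch of how a MIME type could be pulled out of a headers list and checked against markup types. The module name HeaderSketch, the "text/plain" fallback, and the regular expression are assumptions made for illustration only; they are not taken from Readability's source, and the library's actual behaviour may differ.

```elixir
defmodule HeaderSketch do
  # Hypothetical helper module, NOT part of Readability.
  # The fallback MIME type is an assumption for this sketch.
  @default_mime "text/plain"

  @doc "Returns the bare MIME type from a list of {name, value} header tuples."
  @spec mime([{String.t(), String.t()}]) :: String.t()
  def mime(headers \\ []) do
    headers
    # Header names are case-insensitive, so compare in lowercase.
    |> Enum.find_value(@default_mime, fn {key, value} ->
      if String.downcase(key) == "content-type", do: value
    end)
    # Drop parameters such as "; charset=utf-8".
    |> String.split(";")
    |> hd()
    |> String.trim()
    |> String.downcase()
  end

  @doc "Returns true when the Content-Type looks like HTML or XML markup."
  @spec markup?([{String.t(), String.t()}]) :: boolean()
  def markup?(headers) do
    # Assumed pattern: text/html, text/xml, application/xml, application/xhtml+xml, ...
    mime(headers) =~ ~r/^(text|application)\/([a-z0-9.+-]+\+)?(html|xml)$/
  end
end
```

With this sketch, HeaderSketch.mime([{"content-type", "text/html; charset=utf-8"}]) returns "text/html", and HeaderSketch.markup?([{"Content-Type", "application/json"}]) returns false.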

parse(raw_html)

raw_html(html_tree)

@spec raw_html(html_tree()) :: binary()

Returns the raw HTML binary from html_tree.

readable_html(html_tree)

@spec readable_html(html_tree()) :: binary()

Returns HTML with attributes and tags cleaned.

readable_text(html_tree)

@spec readable_text(html_tree()) :: binary()

Returns only the text binary from html_tree.

regexes(key)

summarize(url, opts \\ [])

@spec summarize(url(), options()) :: Readability.Summary.t()

Summarizes the primary readable content of a webpage.

title(raw_html)

@spec title(binary() | html_tree()) :: binary()

Extract title

Example

iex> title = Readability.title(html_str)
"Some title in html"
