presentable_soup

Types

A representation of an HTML document or fragment.

pub type ElementTree {
  ElementNode(
    tag: String,
    attributes: List(#(String, String)),
    children: List(ElementTree),
  )
  TextNode(String)
}

Constructors

  • ElementNode(
      tag: String,
      attributes: List(#(String, String)),
      children: List(ElementTree),
    )

    An HTML element

  • TextNode(String)

    Some text

pub opaque type Matcher

pub type Namespace {
  Html
  Svg
  MathMl
}

Constructors

  • Html
  • Svg
  • MathMl

Queries are used to scope scrapers to specific elements.

pub opaque type Query(in, out)

Errors that can occur when scraping HTML.

pub type ScrapeError {
  ParsingFailed
  ScrapingFailed
}

Constructors

  • ParsingFailed

    The HTML document was malformed in a way that makes it unparsable.

  • ScrapingFailed

    The document did not have the structure the scraper expected, so it was unable to extract the desired data.

A scraper matches elements and extracts data from them.

pub opaque type Scraper(value)

Values

pub fn attributes() -> Scraper(List(#(String, String)))

Get the attributes of the element.

pub fn descendant(
  query: Query(in, out),
  matchers: List(Matcher),
) -> Query(in, out)

Narrow a query to find the first descendant matching the given matchers.

pub fn descendants(
  query: Query(List(in), out),
  matchers: List(Matcher),
) -> Query(in, out)

Narrow a query to find all descendants matching the given matchers.

pub fn element(matchers: List(Matcher)) -> Query(value, value)

Start a query to find the first element matching the given matchers.

Chain with descendant to narrow down the search, then finish with return to specify what data to extract.
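
As a rough sketch of that chain, the following looks for the first h1 inside the first main element and extracts its text. The module alias soup (borrowed from the example under elements_to_string), the function name page_heading, and the choice of tags are assumptions made for illustration only.

```gleam
import presentable_soup as soup

// Hypothetical helper: scrape the text of the first <h1> inside <main>.
pub fn page_heading(
  html: String,
) -> Result(List(String), soup.ScrapeError) {
  // Start at the first <main> element...
  soup.element([soup.with_tag("main")])
  // ...narrow to its first <h1> descendant...
  |> soup.descendant([soup.with_tag("h1")])
  // ...say what to extract from the matched element...
  |> soup.return(soup.text_content())
  // ...and run the finished scraper against the document.
  |> soup.scrape(html)
}
```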

pub fn element_tree() -> Scraper(ElementTree)

Get the element and its descendants as an ElementTree. This may be useful for snapshot testing when combined with elements_to_string.

pub fn elements(
  matchers: List(Matcher),
) -> Query(value, List(value))

Start a query to find all elements matching the given matchers.

This is not recursive, so if you search for div elements it won’t match any divs that are children of other matched divs.
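
A minimal sketch of a multi-element query, assuming the module is imported as soup and using a hypothetical helper name, list_items. Because elements produces a list and text_content itself yields a List(String), the result is a list with one list of text per matched li element.

```gleam
import presentable_soup as soup

// Hypothetical helper: collect the text of every matching <li> element.
pub fn list_items(
  html: String,
) -> Result(List(List(String)), soup.ScrapeError) {
  soup.elements([soup.with_tag("li")])
  // One List(String) of text is extracted per matched element.
  |> soup.return(soup.text_content())
  |> soup.scrape(html)
}
```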

pub fn elements_to_string(html: List(ElementTree)) -> String

Convert elements into a pretty-printed HTML string.

Examples

let elements = [
  soup.ElementNode("h1", [], [soup.TextNode("Hello, Joe! <3")])
]
assert soup.elements_to_string(elements)
  == "<h1>Hello, Joe! &lt;3</h1>"

pub fn map(
  scraper: Scraper(a),
  transform: fn(a) -> b,
) -> Scraper(b)

Transform the data returned by a scraper by running a function on it after it has been extracted from the HTML.

pub fn merge2(
  scraper1: Scraper(t1),
  scraper2: Scraper(t2),
  transform: fn(t1, t2) -> out,
) -> Scraper(out)

Take two scrapers and combine them into one. The final result from both is combined using a function to make the new final result.
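
To illustrate, here is a small sketch that reads both the tag and the attributes of the first a element and combines them into one value. The Link record, the first_link function, and the soup alias are hypothetical names introduced for this example.

```gleam
import presentable_soup as soup

// Hypothetical record used only for this example.
pub type Link {
  Link(tag: String, attributes: List(#(String, String)))
}

pub fn first_link(html: String) -> Result(Link, soup.ScrapeError) {
  // Run two scrapers against the same element and combine their results
  // with the Link constructor, which has the shape fn(t1, t2) -> out.
  let link = soup.merge2(soup.tag(), soup.attributes(), Link)

  soup.element([soup.with_tag("a")])
  |> soup.return(link)
  |> soup.scrape(html)
}
```

merge3 and merge4 follow the same pattern with a three- or four-argument combining function.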

pub fn merge3(
  scraper0: Scraper(t0),
  scraper1: Scraper(t1),
  scraper2: Scraper(t2),
  transform: fn(t0, t1, t2) -> out,
) -> Scraper(out)

Take three scrapers and combine them into one. The final result from each is combined using a function to make the new final result.

pub fn merge4(
  scraper0: Scraper(t0),
  scraper1: Scraper(t1),
  scraper2: Scraper(t2),
  scraper3: Scraper(t3),
  transform: fn(t0, t1, t2, t3) -> out,
) -> Scraper(out)

Take four scrapers and combine them into one. The final result from each is combined using a function to make the new final result.

pub fn namespace() -> Scraper(Namespace)

Get the namespace of the element.

pub fn return(
  query: Query(in, out),
  scraper: Scraper(in),
) -> Scraper(out)

Finish a query by specifying what data to extract from matched elements.

pub fn scrape(
  scraper: Scraper(out),
  html: String,
) -> Result(out, ScrapeError)

Run a scraper, returning the scraped data, or an error if the scraper failed to find its data.
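
Since scrape returns a Result, callers will typically pattern match on the two ScrapeError variants. The sketch below shows one way to do so; the describe function and its messages are illustrative, and the soup alias is assumed.

```gleam
import presentable_soup as soup

// Hypothetical helper: turn the outcome of soup.scrape into a message.
pub fn describe(result: Result(a, soup.ScrapeError)) -> String {
  case result {
    Ok(_) -> "scraped"
    // The document itself could not be parsed.
    Error(soup.ParsingFailed) -> "the HTML could not be parsed"
    // The document parsed, but did not have the expected structure.
    Error(soup.ScrapingFailed) -> "the expected elements were not found"
  }
}
```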

pub fn tag() -> Scraper(String)

Get the tag of the element.

pub fn text_content() -> Scraper(List(String))

Get all the text contained by the element and its descendants.

pub fn try_map(
  scraper: Scraper(a),
  transform: fn(a) -> Result(b, error),
) -> Scraper(b)

Transform the data returned by a scraper by running a function on it after it has been extracted from the HTML.

If the transformer returns an error then the scraper returns nothing.
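
One plausible use is parsing scraped text into a number. The sketch below assumes an element with the id "count" whose text is an integer; the item_count name, that id, and the soup alias are made up for illustration. string.concat, string.trim, and int.parse come from the Gleam standard library.

```gleam
import gleam/int
import gleam/string
import presentable_soup as soup

// Hypothetical helper: read an integer from the element with id "count".
pub fn item_count(html: String) -> Result(Int, soup.ScrapeError) {
  let count =
    soup.text_content()
    // Join the text fragments into one string and strip whitespace.
    |> soup.map(string.concat)
    |> soup.map(string.trim)
    // int.parse returns a Result, so a non-numeric string makes the
    // transform return an error rather than a value.
    |> soup.try_map(int.parse)

  soup.element([soup.with_id("count")])
  |> soup.return(count)
  |> soup.scrape(html)
}
```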

pub fn with_aria(name: String, value: String) -> Matcher

Matches elements that have the given aria-* attribute.

pub fn with_attribute(name: String, value: String) -> Matcher

Matches elements that have the specified attribute with the given value. If the value is left blank, this matcher will match any element that has the attribute, regardless of its value.

pub fn with_class(name: String) -> Matcher

Matches elements that include the given space-separated class name(s).

If you need to match the class attribute exactly, you can use the with_attribute matcher instead.

pub fn with_data(name: String, value: String) -> Matcher

Matches elements that have the given data-* attribute.

pub fn with_id(name: String) -> Matcher

Matches an element based on its id attribute. In well-formed HTML, at most one element has a given id.

pub fn with_math_ml_tag(value: String) -> Matcher

Matches MathML elements based on their tag name.

pub fn with_svg_tag(value: String) -> Matcher

Matches SVG elements based on their tag name.

pub fn with_tag(value: String) -> Matcher

Matches elements based on their tag name, like "div", "span", or "a".

pub fn with_test_id(value: String) -> Matcher

It is a common convention to use the data-test-id attribute to mark elements for easy querying in tests. This function is shorthand for with_data("test-id", value).
