presentable_soup
Types
A representation of an HTML document or fragment.
pub type ElementTree {
ElementNode(
tag: String,
attributes: List(#(String, String)),
children: List(ElementTree),
)
TextNode(String)
}
Constructors
- ElementNode(tag: String, attributes: List(#(String, String)), children: List(ElementTree))
  An HTML element.
- TextNode(String)
  Some text.
Queries are used to scope scrapers to specific elements.
pub opaque type Query(in, out)
Errors that can occur when scraping HTML.
pub type ScrapeError {
ParsingFailed
ScrapingFailed
}
Constructors
- ParsingFailed
  The HTML document was malformed in a way that makes it unparsable.
- ScrapingFailed
  The document did not have the structure the scraper expected, so it was unable to extract the desired data.
Values
pub fn attributes() -> Scraper(List(#(String, String)))
Get the attributes of the element.
pub fn descendant(
query: Query(in, out),
matchers: List(Matcher),
) -> Query(in, out)
Narrow a query to find the first descendant matching the given matchers.
pub fn descendants(
query: Query(List(in), out),
matchers: List(Matcher),
) -> Query(in, out)
Narrow a query to find all descendants matching the given matchers.
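A minimal sketch of how `descendants` might be used, going only by the signatures listed on this page and assuming the package is imported as `soup`. The function name `list_items` is hypothetical. Note how the result type becomes a list with one entry per matched descendant.

```gleam
import presentable_soup as soup

// Hypothetical helper: collect the text of every <li> inside the first <ul>.
pub fn list_items(
  html: String,
) -> Result(List(List(String)), soup.ScrapeError) {
  let scraper =
    soup.element([soup.with_tag("ul")])
    |> soup.descendants([soup.with_tag("li")])
    |> soup.return(soup.text_content())

  soup.scrape(scraper, html)
}
```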
pub fn element(matchers: List(Matcher)) -> Query(value, value)
Start a query to find the first element matching the given matchers.
Chain with descendant to narrow down the search, then finish with return
to specify what data to extract.
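The chain described above can be sketched as follows, assuming the package is imported as `soup`. The function name `page_title` and the choice of tags are illustrative only.

```gleam
import presentable_soup as soup

// Hypothetical helper: text of the first <h1> inside the first <article>.
pub fn page_title(html: String) -> Result(List(String), soup.ScrapeError) {
  // Start at the first <article>, narrow to its first <h1> descendant,
  // then state what to extract from the matched element.
  let scraper =
    soup.element([soup.with_tag("article")])
    |> soup.descendant([soup.with_tag("h1")])
    |> soup.return(soup.text_content())

  soup.scrape(scraper, html)
}
```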
pub fn element_tree() -> Scraper(ElementTree)
Get the element and its descendants as an ElementTree. This may be useful
for snapshot testing when combined with elements_to_string.
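One way the snapshot idea could look, as a sketch built only from functions listed on this page. The function name `main_snapshot` is hypothetical, and the resulting string would be handed to whatever snapshot tool you use.

```gleam
import presentable_soup as soup

// Hypothetical helper: render the first <main> element back to an HTML string.
pub fn main_snapshot(html: String) -> Result(String, soup.ScrapeError) {
  let scraper =
    soup.element([soup.with_tag("main")])
    |> soup.return(soup.element_tree())
    // elements_to_string takes a list, so wrap the single tree.
    |> soup.map(fn(tree) { soup.elements_to_string([tree]) })

  soup.scrape(scraper, html)
}
```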
pub fn elements(
matchers: List(Matcher),
) -> Query(value, List(value))
Start a query to find all elements matching the given matchers.
This is not recursive, so if you search for div elements it won’t match
any divs that are children of other matched divs.
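A short sketch of an `elements` query, assuming the package is imported as `soup`; the function name `link_attributes` is hypothetical. Each matched element contributes one entry to the resulting list.

```gleam
import presentable_soup as soup

// Hypothetical helper: the attribute list of every <a> element in the page.
pub fn link_attributes(
  html: String,
) -> Result(List(List(#(String, String))), soup.ScrapeError) {
  let scraper =
    soup.elements([soup.with_tag("a")])
    |> soup.return(soup.attributes())

  soup.scrape(scraper, html)
}
```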
pub fn elements_to_string(html: List(ElementTree)) -> String
Convert elements into a pretty-printed HTML string.
Examples
let elements = [
  soup.ElementNode("h1", [], [soup.TextNode("Hello, Joe! <3")])
]
assert soup.elements_to_string(elements)
== "<h1>Hello, Joe! <3</h1>"
pub fn map(
scraper: Scraper(a),
transform: fn(a) -> b,
) -> Scraper(b)
Transform the data returned by a scraper by running a function on it after it has been extracted from the HTML.
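As an illustration, `map` could be used to join the text fragments returned by `text_content` into a single string. This is a sketch; the function name `heading_text` is hypothetical, and `string.concat` comes from the Gleam standard library.

```gleam
import gleam/string
import presentable_soup as soup

// Hypothetical helper: the first <h1>'s text as one String.
pub fn heading_text(html: String) -> Result(String, soup.ScrapeError) {
  // text_content yields List(String); map concatenates the pieces.
  let joined = soup.text_content() |> soup.map(string.concat)

  let scraper =
    soup.element([soup.with_tag("h1")])
    |> soup.return(joined)

  soup.scrape(scraper, html)
}
```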
pub fn merge2(
scraper1: Scraper(t1),
scraper2: Scraper(t2),
transform: fn(t1, t2) -> out,
) -> Scraper(out)
Take two scrapers and combine them into one. The final result from both is combined using a function to make the new final result.
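A sketch of combining two scrapers into a record, assuming the package is imported as `soup`. The `Link` type and the `links` function are hypothetical names introduced for this example; `merge3` and `merge4` would follow the same pattern with more fields.

```gleam
import presentable_soup as soup

// Hypothetical record holding both pieces of data for one element.
pub type Link {
  Link(attributes: List(#(String, String)), text: List(String))
}

pub fn links(html: String) -> Result(List(Link), soup.ScrapeError) {
  // The record constructor serves as the combining function.
  let link = soup.merge2(soup.attributes(), soup.text_content(), Link)

  let scraper =
    soup.elements([soup.with_tag("a")])
    |> soup.return(link)

  soup.scrape(scraper, html)
}
```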
pub fn merge3(
scraper0: Scraper(t0),
scraper1: Scraper(t1),
scraper2: Scraper(t2),
transform: fn(t0, t1, t2) -> out,
) -> Scraper(out)
Take three scrapers and combine them into one. The final result from each is combined using a function to make the new final result.
pub fn merge4(
scraper0: Scraper(t0),
scraper1: Scraper(t1),
scraper2: Scraper(t2),
scraper3: Scraper(t3),
transform: fn(t0, t1, t2, t3) -> out,
) -> Scraper(out)
Take four scrapers and combine them into one. The final result from each is combined using a function to make the new final result.
pub fn return(
query: Query(in, out),
scraper: Scraper(in),
) -> Scraper(out)
Finish a query by specifying what data to extract from matched elements.
pub fn scrape(
scraper: Scraper(out),
html: String,
) -> Result(out, ScrapeError)
Run a scraper, returning the scraped data, or an error if the scraper failed to find its data.
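The two `ScrapeError` constructors can be handled separately when running a scraper. The sketch below assumes the package is imported as `soup`; the function name `describe` and the messages are illustrative only.

```gleam
import presentable_soup as soup

// Hypothetical helper: report in words what happened when looking for <title>.
pub fn describe(html: String) -> String {
  let scraper =
    soup.element([soup.with_tag("title")])
    |> soup.return(soup.text_content())

  case soup.scrape(scraper, html) {
    Ok(_parts) -> "found a title"
    Error(soup.ParsingFailed) -> "the HTML could not be parsed"
    Error(soup.ScrapingFailed) -> "the expected structure was not found"
  }
}
```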
pub fn text_content() -> Scraper(List(String))
Get all the text contained by the element and its descendants.
pub fn try_map(
scraper: Scraper(a),
transform: fn(a) -> Result(b, error),
) -> Scraper(b)
Transform the data returned by a scraper by running a function on it after it has been extracted from the HTML.
If the transformer returns an error then the scraper returns nothing.
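For example, `try_map` could turn scraped text into a number. This is a sketch: the function name `item_count` and the `count` id are hypothetical, and `int.parse`, `string.concat` and `string.trim` come from the Gleam standard library.

```gleam
import gleam/int
import gleam/string
import presentable_soup as soup

// Hypothetical helper: read an integer from the element with id "count".
pub fn item_count(html: String) -> Result(Int, soup.ScrapeError) {
  // int.parse returns an Error for non-numeric text.
  let count =
    soup.text_content()
    |> soup.try_map(fn(parts) {
      parts |> string.concat |> string.trim |> int.parse
    })

  let scraper =
    soup.element([soup.with_id("count")])
    |> soup.return(count)

  soup.scrape(scraper, html)
}
```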
pub fn with_aria(name: String, value: String) -> Matcher
Matches elements that have the given aria-* attribute.
pub fn with_attribute(name: String, value: String) -> Matcher
Matches elements that have the specified attribute with the given value. If the value is left blank, this matcher will match any element that has the attribute, regardless of its value.
pub fn with_class(name: String) -> Matcher
Matches elements that include the given space-separated class name(s).
If you need to match the class attribute exactly, you can use the attribute
matcher instead.
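The difference between the two approaches might be sketched like this, assuming the package is imported as `soup`. Both function names and the class values are hypothetical.

```gleam
import presentable_soup as soup

// Hypothetical: match elements whose class list includes "primary",
// whatever other classes are present.
pub fn by_class(html: String) -> Result(List(List(String)), soup.ScrapeError) {
  soup.elements([soup.with_class("primary")])
  |> soup.return(soup.text_content())
  |> soup.scrape(html)
}

// Hypothetical: match only elements whose class attribute is exactly "btn primary".
pub fn by_exact_class(
  html: String,
) -> Result(List(List(String)), soup.ScrapeError) {
  soup.elements([soup.with_attribute("class", "btn primary")])
  |> soup.return(soup.text_content())
  |> soup.scrape(html)
}
```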
pub fn with_data(name: String, value: String) -> Matcher
Matches elements that have the given data-* attribute.
pub fn with_id(name: String) -> Matcher
Matches an element based on its id attribute. In well-formed HTML, at most
one element has a given id.
pub fn with_math_ml_tag(value: String) -> Matcher
Matches MathML elements based on their tag name.
pub fn with_svg_tag(value: String) -> Matcher
Matches SVG elements based on their tag name.
pub fn with_tag(value: String) -> Matcher
Matches elements based on their tag name, like "div", "span", or "a".
pub fn with_test_id(value: String) -> Matcher
It is a common convention to use the data-test-id attribute to mark elements
for easy querying in tests. This function is shorthand for writing
with_data("test-id", value)
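A brief sketch of querying by test id, assuming the package is imported as `soup`. The function name `submit_label` and the `submit-button` id are hypothetical.

```gleam
import presentable_soup as soup

// Hypothetical helper: text of the element marked data-test-id="submit-button".
pub fn submit_label(html: String) -> Result(List(String), soup.ScrapeError) {
  let scraper =
    soup.element([soup.with_test_id("submit-button")])
    |> soup.return(soup.text_content())

  soup.scrape(scraper, html)
}
```

Per the description above, replacing the matcher with `soup.with_data("test-id", "submit-button")` should select the same element.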