Crawler.Parser (Crawler v1.5.0)
Parses pages and calls a link handler to handle the detected links.
Summary
Functions
Parses the links and returns the page.
There are two hooks:
link_handler is useful when a custom parser calls this default parser and uses a different link handler for processing links.
scraper is useful for scraping content immediately as the parser parses the page. Alternatively, you can access the crawled data asynchronously; refer to the README.
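As an illustration of the scraper hook, here is a minimal sketch of a custom scraper module. It assumes the hook is a module exposing scrape/1 that receives the page and returns {:ok, page}, which is what the default Crawler.Scraper used in the examples appears to do. The module name TitleScraper and its title/1 helper are hypothetical and not part of Crawler.

```elixir
defmodule TitleScraper do
  # Hypothetical scraper module, not part of Crawler.
  # Assumption: the parser calls scrape/1 with the page and expects {:ok, page} back.

  # Matches any map or struct with a :body key, so it runs without Crawler loaded.
  def scrape(%{body: body} = page) do
    case title(body) do
      nil -> :ok
      title -> IO.puts("Scraped title: " <> title)
    end

    # Return the page unchanged so the rest of the pipeline can continue.
    {:ok, page}
  end

  # Returns the text of the first <title> element, or nil if there is none.
  def title(body) when is_binary(body) do
    case Regex.run(~r{<title>(.*?)</title>}s, body, capture: :all_but_first) do
      [title] -> title
      nil -> nil
    end
  end
end
```

Under the same assumption, the module would be passed through the page options in place of the default, for example opts: %{scraper: TitleScraper, html_tag: "a", content_type: "text/html"}.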
Examples
iex> {:ok, page} = Parser.parse(%Page{
iex> body: "Body",
iex> opts: %{scraper: Crawler.Scraper, html_tag: "a", content_type: "text/html"}
iex> })
iex> page.body
"Body"
iex> {:ok, page} = Parser.parse(%Page{
iex> body: "<a href='http://parser/1'>Link</a>",
iex> opts: %{scraper: Crawler.Scraper, html_tag: "a", content_type: "text/html"}
iex> })
iex> page.body
"<a href='http://parser/1'>Link</a>"
iex> {:ok, page} = Parser.parse(%Page{
iex> body: "<a name='hello'>Link</a>",
iex> opts: %{scraper: Crawler.Scraper, html_tag: "a", content_type: "text/html"}
iex> })
iex> page.body
"<a name='hello'>Link</a>"
iex> {:ok, page} = Parser.parse(%Page{
iex> body: "<a href='http://parser/2' target='_blank'>Link</a>",
iex> opts: %{scraper: Crawler.Scraper, html_tag: "a", content_type: "text/html"}
iex> })
iex> page.body
"<a href='http://parser/2' target='_blank'>Link</a>"
iex> {:ok, page} = Parser.parse(%Page{
iex> body: "<a href='parser/2'>Link</a>",
iex> opts: %{scraper: Crawler.Scraper, html_tag: "a", content_type: "text/html", referrer_url: "http://hello"}
iex> })
iex> page.body
"<a href='parser/2'>Link</a>"
iex> {:ok, page} = Parser.parse(%Page{
iex> body: "<a href='../parser/2'>Link</a>",
iex> opts: %{scraper: Crawler.Scraper, html_tag: "a", content_type: "text/html", referrer_url: "http://hello"}
iex> })
iex> page.body
"<a href='../parser/2'>Link</a>"
iex> {:ok, page} = Parser.parse(%Page{
iex> body: image_file(),
iex> opts: %{scraper: Crawler.Scraper, html_tag: "img", content_type: "image/png"}
iex> })
iex> page.body
"#{image_file()}"