Crawler.Parser (Crawler v0.2.0)

Parses pages and calls a link handler to handle the detected links.
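
The handler pattern described above can be sketched without any dependencies. Everything in this block is illustrative: `ParserSketch` is a hypothetical stand-in rather than the library's implementation, the page is a plain map instead of the `%Page{}` struct, and the handler's two arguments (the link URL and the options) are an assumption based only on the two-argument default handler in the `parse` signature.

```elixir
defmodule ParserSketch do
  # Hypothetical stand-in for Crawler.Parser; not the library's code.

  # Calls `link_handler` once per detected link, then returns the page
  # unchanged, mirroring the return values in the examples on this page.
  def parse(%{page: page, opts: opts}, link_handler \\ &default_handler/2) do
    # A regex keeps the sketch dependency-free; a real parser would
    # more likely rely on an HTML parsing library.
    ~r/<a\s[^>]*href=['"]([^'"]*)['"]/i
    |> Regex.scan(page.body, capture: :all_but_first)
    |> Enum.each(fn [href] -> link_handler.(href, opts) end)

    page
  end

  # Assumed default: ignore the link.
  defp default_handler(_link, _opts), do: :ok
end

# Usage: pass a custom handler that reports each link to the caller.
page = %{body: "<a href='http://parser/1'>Link</a>"}

ParserSketch.parse(%{page: page, opts: []}, fn link, _opts ->
  send(self(), {:link, link})
end)
```

Passing the handler as an argument keeps the parsing step separate from whatever is done with the links, which is also what makes it easy to substitute a test double for the default dispatcher.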

Functions

mark_processed(arg1)

parse(page, link_handler \\ &(Dispatcher.dispatch(&1, &2)))

Examples

iex> Parser.parse(%{page: %Page{body: "Body"}, opts: []})
%Page{body: "Body"}

iex> Parser.parse(%{page: %Page{
...>   body: "<a href='http://parser/1'>Link</a>"
...> }, opts: []})
%Page{body: "<a href='http://parser/1'>Link</a>"}

iex> Parser.parse(%{page: %Page{
...>   body: "<a name='hello'>Link</a>"
...> }, opts: []})
%Page{body: "<a name='hello'>Link</a>"}

iex> Parser.parse(%{page: %Page{
...>   body: "<a href='http://parser/2' target='_blank'>Link</a>"
...> }, opts: []})
%Page{body: "<a href='http://parser/2' target='_blank'>Link</a>"}

iex> Parser.parse(%{page: %Page{
...>   body: "<a href='parser/2'>Link</a>"
...> }, opts: [referrer_url: "http://hello/"]})
%Page{body: "<a href='parser/2'>Link</a>"}

iex> Parser.parse(%{page: %Page{
...>   body: "<a href='../parser/2'>Link</a>"
...> }, opts: [referrer_url: "http://hello/"]})
%Page{body: "<a href='../parser/2'>Link</a>"}
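
The last two examples pass a `referrer_url` alongside a relative `href`. This page does not say how the link is resolved, so the following is only a sketch of one plausible approach using the standard library's `URI.merge/2`. `LinkSketch.expand/2` is a hypothetical helper and is not part of Crawler.

```elixir
defmodule LinkSketch do
  # Hypothetical helper, not part of Crawler: resolves `link` against
  # `opts[:referrer_url]` when one is given, otherwise returns it as is.
  def expand(link, opts) do
    case Keyword.get(opts, :referrer_url) do
      nil -> link
      referrer -> referrer |> URI.merge(link) |> URI.to_string()
    end
  end
end

LinkSketch.expand("parser/2", referrer_url: "http://hello/")
# => "http://hello/parser/2"

# An absolute link is returned unchanged by URI.merge/2.
LinkSketch.expand("http://parser/1", referrer_url: "http://hello/")
# => "http://parser/1"
```

`URI.merge/2` follows RFC 3986 reference resolution, so `..` segments such as the one in `../parser/2` are resolved against the referrer's path during merging.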

parse_links(body, opts, link_handler)