SitemapXml.SitemapUrlTree (sitemap_xml v0.1.2)

A module to fetch and parse sitemap XML concurrently and return a nested data structure.
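The concurrent fetching described above can be pictured as one task per sitemap URL, with results gathered back in order. The following is a minimal sketch, not the library's internals: the module `ConcurrentSketch` and the `fetch` function it receives are hypothetical stand-ins for "download and parse one sitemap", and the concurrency and timeout defaults are arbitrary.

```elixir
defmodule ConcurrentSketch do
  # Hypothetical sketch, not sitemap_xml's implementation. `fetch` stands in for
  # "download and parse one sitemap" and must return {:ok, entries} or {:error, reason}.
  def fetch_all(urls, fetch, opts \\ []) do
    urls
    |> Task.async_stream(
      fn url -> {sitemap_name(url), fetch.(url)} end,
      # Bound parallelism and per-task time; these defaults are arbitrary.
      max_concurrency: Keyword.get(opts, :max_concurrency, 4),
      timeout: Keyword.get(opts, :timeout, 15_000)
    )
    # async_stream preserves input order; a task exceeding :timeout exits the caller.
    |> Enum.map(fn {:ok, {name, result}} -> %{name => unwrap(result)} end)
  end

  defp unwrap({:ok, entries}), do: entries
  defp unwrap({:error, _reason} = error), do: error

  # "https://web.site/sitemap.xml" -> "sitemap.xml", like the keys in the examples.
  defp sitemap_name(url),
    do: url |> URI.parse() |> Map.get(:path) |> to_string() |> Path.basename()
end
```

Using `Task.async_stream/3` keeps the number of simultaneous requests bounded, which matters when a sitemap index lists many child sitemaps.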

Summary

Functions

Fetches the raw sitemap XML from the given URL.

Fetches and parses the sitemap from the provided URL and returns a nested structure.

Parses the sitemap XML to extract URLs with their attributes, or processes nested sitemaps.

Functions

fetch_sitemap(url)

Fetches the raw sitemap XML from the given URL.

Examples

iex> SitemapXml.SitemapUrlTree.fetch_sitemap("https://web.site/sitemap.xml")
{:ok, "<?xml version=\"1.0\" encoding=\"UTF-8\"?><?xml-styleshee..."}

iex> SitemapXml.SitemapUrlTree.fetch_sitemap("https://web.site/404.xml")
{:error, "HTTP error with status 404"}
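The two examples show the return contract: `{:ok, body}` on success and `{:error, "HTTP error with status N"}` otherwise. As a sketch of that mapping only, assuming (not confirmed by the source) that any 2xx status counts as success, a hypothetical helper could look like this:

```elixir
defmodule FetchResultSketch do
  # Hypothetical helper: maps an HTTP status and body onto the
  # {:ok, body} | {:error, message} shape used by fetch_sitemap/1.
  # Treating every 2xx status as success is an assumption.
  def to_result(status, body) when status in 200..299, do: {:ok, body}
  def to_result(status, _body), do: {:error, "HTTP error with status #{status}"}
end
```

Callers can then branch with a plain `case` on `{:ok, xml}` versus `{:error, message}`.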

fetch_url_tree(url)

Fetches and parses the sitemap from the provided URL and returns a nested structure.

Examples

iex> SitemapXml.SitemapUrlTree.fetch_url_tree("https://web.site/sitemap.xml")
{:ok, [%{"sitemap.xml" => [%{url: "https://web.site/page1", lastmod: ..., priority: ...}, ...]}]}
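A common next step is flattening the nested result into a plain list of page URLs. This sketch assumes the shape shown in the example: a list of `%{sitemap_name => children}` maps whose children are entry maps with a `:url` key. How nested sitemaps appear inside that structure is not specified here, so the recursive clause for nested maps is an assumption, and `UrlTreeSketch` is a hypothetical name.

```elixir
defmodule UrlTreeSketch do
  # Assumed shape (from the fetch_url_tree/1 example): a list of
  # %{sitemap_name => children}; each child is a URL entry map with a :url key
  # or, hypothetically for nested sitemaps, another %{sitemap_name => children} map.
  def urls(tree) when is_list(tree), do: Enum.flat_map(tree, &urls/1)
  def urls(%{url: url}), do: [url]
  def urls(%{} = nested), do: nested |> Map.values() |> Enum.flat_map(&urls/1)
  # Skip anything else, such as {:error, reason} tuples.
  def urls(_other), do: []
end
```

With a successful result, `{:ok, tree} = SitemapXml.SitemapUrlTree.fetch_url_tree(url)` followed by `UrlTreeSketch.urls(tree)` would yield the page URLs in document order.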

parse_sitemap(url, body)

Parses the sitemap XML to extract URLs with their attributes, or processes nested sitemaps.

Examples

iex> SitemapXml.SitemapUrlTree.parse_sitemap("https://web.site/sitemap.xml", "<urlset>...</urlset>")
{:ok, [%{"sitemap.xml" => [%{url: "https://web.site/page1", lastmod: ..., priority: ...}, ...]}]}

iex> SitemapXml.SitemapUrlTree.parse_sitemap("https://web.site/nested_sitemap.xml", "<sitemapindex>...</sitemapindex>")
...
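`parse_sitemap/2` has to tell a `<urlset>` (a list of pages) apart from a `<sitemapindex>` (a list of further sitemaps to fetch). The sketch below shows one way to make that distinction with OTP's `:xmerl`; it is an illustration, not how sitemap_xml parses XML. The module name and the `{:urls, ...}` / `{:sitemaps, ...}` tags are hypothetical, and `lastmod` and `priority` are returned as raw strings (or `nil` when absent).

```elixir
defmodule SitemapSketch do
  # Hypothetical sketch using OTP's :xmerl; not the sitemap_xml implementation.
  require Record
  Record.defrecord(:xmlElement, Record.extract(:xmlElement, from_lib: "xmerl/include/xmerl.hrl"))
  Record.defrecord(:xmlText, Record.extract(:xmlText, from_lib: "xmerl/include/xmerl.hrl"))

  def parse(xml) when is_binary(xml) do
    # Hand xmerl the raw bytes of the document; it returns the root element.
    {doc, _rest} = :xmerl_scan.string(:erlang.binary_to_list(xml), quiet: true)

    case xmlElement(doc, :name) do
      # A <urlset> lists pages: collect loc/lastmod/priority per <url>.
      :urlset -> {:urls, Enum.map(:xmerl_xpath.string(~c"//url", doc), &url_entry/1)}
      # A <sitemapindex> lists child sitemaps that would be fetched next.
      :sitemapindex -> {:sitemaps, texts(~c"//sitemap/loc/text()", doc)}
      other -> {:error, {:unexpected_root, other}}
    end
  end

  defp url_entry(node) do
    %{
      url: first_text(~c"loc/text()", node),
      lastmod: first_text(~c"lastmod/text()", node),
      priority: first_text(~c"priority/text()", node)
    }
  end

  # Evaluate an XPath returning text nodes and convert each to a trimmed string.
  defp texts(path, node) do
    path
    |> :xmerl_xpath.string(node)
    |> Enum.map(fn text -> text |> xmlText(:value) |> List.to_string() |> String.trim() end)
  end

  defp first_text(path, node), do: path |> texts(node) |> List.first()
end
```

Returning a tagged tuple keeps the caller's decision explicit: page entries go straight into the result, while child sitemap URLs are fed back into fetching.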