ExWikipedia.PageParser (ex_wikipedia v0.1.0)

Parses Wikipedia's JSON response for a page.

The response returned from the Wikipedia API should be valid JSON, but it still needs to be sanitized before being returned to the user. Any HTML tags are sanitized during this stage.
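As a sketch of where this module sits in the pipeline, fetching and decoding might look like the following. The HTTP client and JSON decoder shown here (`Req` and `Jason`) are illustrative assumptions, not dependencies stated by this documentation; any client and decoder that yield a decoded map would do.

```elixir
# Hypothetical pipeline: fetch a page from the MediaWiki API, decode the
# JSON body, then hand the resulting map to PageParser.parse/2.
url =
  "https://en.wikipedia.org/w/api.php?action=parse&page=Pulp+Fiction&format=json&redirects=true"

with {:ok, %{body: body}} <- Req.get(url),
     {:ok, decoded} <- Jason.decode(body, keys: :atoms) do
  ExWikipedia.PageParser.parse(decoded)
end
```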

Summary

Functions

Parses a decoded JSON map, sanitizes the HTML contained in it, and returns a map suitable for marshalling into a struct.

Functions

parse(json, opts \\ [])

Parses a decoded JSON map, sanitizes the HTML contained in it, and returns a map suitable for marshalling into a struct. The response should be JSON-decoded before being passed to this function.

Options:

  • :html_parser: Parser used to parse HTML. Default: Floki
  • :follow_redirect: indicates whether the content from a redirected page constitutes a valid response. Default: true
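For instance, opting out of redirected content might look like this. The variable name `decoded_json` is a placeholder, and the exact error shape on a rejected redirect is not specified by this documentation:

```elixir
# With `follow_redirect: false`, content served via a redirect is not
# accepted as a valid response.
ExWikipedia.PageParser.parse(decoded_json, follow_redirect: false)

# A custom HTML parser module can also be supplied; it should expose the
# same interface as the default parser, Floki.
ExWikipedia.PageParser.parse(decoded_json, html_parser: Floki)
```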

Examples

iex> ExWikipedia.PageParser.parse(%{
  parse: %{
    categories: [
      %{"*": "Webarchive_template_wayback_links", hidden: "", sortkey: ""}
    ],
    headhtml: %{"*": "headhtml in here"},
    images: ["Semi-protection-shackle.svg", "End_of_Ezekiel.ogg"],
    links: [
      %{"*": "Pulp fiction (disambiguation)", exists: "", ns: 0}
    ],
    pageid: 54173,
    redirects: [],
    revid: 1063115250,
    text: %{"*": "text in here"},
    title: "Pulp Fiction"
  }
})
{:ok,
 %{
   categories: ["Webarchive template wayback links"],
   content: "",
   external_links: nil,
   images: [],
   is_redirect?: false,
   page_id: 54173,
   revision_id: 1063115250,
   summary: "",
   title: "Pulp Fiction",
   url: ""
 }}