mediawiki-client notebook

Mix.install([
  {:mediawiki_client, path: "#{__DIR__}/.."}
])

Site matrix

Wiki domains can be organized into a farm such as the default Wikimedia cluster, and information about all member sites can be retrieved from the SiteMatrix extension API.

This library provides the Wiki.SiteMatrix module to fetch and filter the site list:

site_matrix = Wiki.SiteMatrix.new()

{:ok, sites} = Wiki.SiteMatrix.get_all(site_matrix)

Once you know the dbname or base_url for a site, you can query it directly. You can also filter the site list by project, e.g. to list all Wiktionaries.

dewiki = Wiki.SiteMatrix.get!(site_matrix, :dewiki)
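Filtering by project can be sketched on plain data. The maps below are hypothetical stand-ins for the entries returned by get_all; only the dbname and base_url fields named above are assumed, and the real site spec may carry more fields or use a struct. The filter relies on the Wikimedia convention that Wiktionary database names end in "wiktionary".

```elixir
# Hypothetical stand-ins for site specs. Only the two fields named in the
# text (dbname and base_url) are assumed here.
sites = [
  %{dbname: "dewiki", base_url: "https://de.wikipedia.org"},
  %{dbname: "dewiktionary", base_url: "https://de.wiktionary.org"},
  %{dbname: "enwiktionary", base_url: "https://en.wiktionary.org"},
  %{dbname: "wikidatawiki", base_url: "https://www.wikidata.org"}
]

# Keep only the sites whose dbname ends in "wiktionary".
wiktionaries =
  sites
  |> Enum.filter(fn site -> String.ends_with?(site.dbname, "wiktionary") end)
  |> Enum.map(fn site -> site.dbname end)
```

With a live site list, the same Enum.filter call would be applied to the sites value bound earlier, assuming each entry exposes a dbname string.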

Working with a wiki

You can pass a site spec to the initializer for other modules, or you can supply the base API URL if this is already known.

Wiki.Action.new(dewiki) == Wiki.Action.new("https://de.wikipedia.org/w/api.php")

Free-form Action API

MediaWiki's main external interface is called the Action API, and the Wiki.Action module gives access to any of its commands using keyword syntax:

{:ok, %{result: %{"query" => %{"babel" => babel}}}} =
  Wiki.Action.new(:mediawikiwiki)
  |> Wiki.Action.get(
    action: :query,
    meta: :babel,
    babuser: "Adamw"
  )

babel

Streaming continuation

Wiki.Action transparently handles continuation and can provide its output as a stream. This example returns 50 results by making 10 calls with 5 results per call:

Wiki.Action.new(:dewiki)
|> Wiki.Action.stream(
  action: :query,
  list: :recentchanges,
  rclimit: 5
)
|> Stream.take(10)
|> Enum.flat_map(fn response -> response["query"]["recentchanges"] end)
|> Enum.map(fn rc -> rc["timestamp"] <> " " <> rc["title"] end)
|> Enum.to_list()
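To show what continuation involves, here is a minimal sketch that does not touch the network. It is not the library's implementation: ContinuationSketch and fake_fetch are hypothetical names, and the canned responses only imitate the Action API convention of returning a "continue" map whose entries are merged into the next request.

```elixir
defmodule ContinuationSketch do
  # Hypothetical stand-in for an HTTP call: returns canned responses keyed by
  # the continuation token found in the request parameters.
  def fake_fetch(params) do
    case Map.get(params, "rccontinue") do
      nil ->
        %{
          "query" => %{"recentchanges" => [1, 2]},
          "continue" => %{"rccontinue" => "a", "continue" => "-||"}
        }

      "a" ->
        %{
          "query" => %{"recentchanges" => [3, 4]},
          "continue" => %{"rccontinue" => "b", "continue" => "-||"}
        }

      "b" ->
        # The last page carries no "continue" key.
        %{"query" => %{"recentchanges" => [5]}}
    end
  end

  # Lazily emit one response per request, merging each "continue" map into
  # the next request until a response arrives without one.
  def stream(params, fetch \\ &fake_fetch/1) do
    Stream.unfold(params, fn
      :done ->
        nil

      current ->
        response = fetch.(current)

        next =
          case response do
            %{"continue" => continue} -> Map.merge(current, continue)
            _ -> :done
          end

        {response, next}
    end)
  end
end

changes =
  %{"action" => "query", "list" => "recentchanges"}
  |> ContinuationSketch.stream()
  |> Enum.flat_map(fn response -> response["query"]["recentchanges"] end)
```

Because the stream is lazy, adding Stream.take(2) before the flat_map would issue only two of the three fake requests, which is the same reason Stream.take(10) bounds the number of calls in the example above.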

Site statistics

The siteinfo API returns summary statistics for a wiki.

Wiki.Action.new(:dewiki)
|> Wiki.Action.get!(
  action: :query,
  meta: :siteinfo,
  siprop: :statistics
)
|> Map.get(:result)

Event Streams

You can monitor many different events in real time using the EventStreams interface. These snippets sample from the most recent edits,

{:ok, pid} = Wiki.EventStreams.start_link(streams: "recentchange")

pid
|> Wiki.EventStreams.stream()
|> Stream.take(6)
|> Enum.each(fn event -> IO.inspect(event) end)

Process.exit(pid, :normal)

and from the feed of newly created revisions,

{:ok, pid} = Wiki.EventStreams.start_link(streams: "revision-create")

pid
|> Wiki.EventStreams.stream()
|> Stream.take(6)
|> Enum.each(fn event -> IO.inspect(event) end)

Process.exit(pid, :normal)
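Event streams are usually filtered before sampling. The sketch below assumes each event arrives as a map with string keys, and uses the "wiki", "type" and "title" fields found in recentchange events; the sample events themselves are hypothetical and heavily trimmed.

```elixir
# Hypothetical, heavily trimmed events. Real recentchange events carry many
# more fields.
events = [
  %{"wiki" => "dewiki", "type" => "edit", "title" => "Beispiel"},
  %{"wiki" => "enwiki", "type" => "edit", "title" => "Example"},
  %{"wiki" => "dewiki", "type" => "log", "title" => "Spezial:Log"},
  %{"wiki" => "dewiki", "type" => "new", "title" => "Neue Seite"}
]

# Keep only edits made on dewiki, then take at most 6 titles.
german_edits =
  events
  |> Stream.filter(fn event ->
    event["wiki"] == "dewiki" and event["type"] == "edit"
  end)
  |> Stream.map(fn event -> event["title"] end)
  |> Enum.take(6)
```

In a live pipeline the Stream.filter step would sit between Wiki.EventStreams.stream() and Stream.take(6), so that the six sampled events all match the condition.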

Machine learning scores

To query ORES directly, use the Wiki.Ores module. List the models available for your target wiki (FIXME: eliminate the dummy request parameter):

Wiki.Ores.new("enwiki")
|> Wiki.Ores.request!(dummy: [])

Retrieve scores for multiple revisions by ID:

Wiki.Ores.new("enwiki")
|> Wiki.Ores.request!(
  models: [:damaging, :wp10],
  revids: [456_789, 123_456]
)

Wikidata

Wikidata runs on Wikibase, which adds its own modules to the Action API, so you can query it with Wiki.Action as below.

This snippet runs a search:

Wiki.Action.new(:wikidatawiki)
|> Wiki.Action.get!(
  action: :wbsearchentities,
  search: "alphabet",
  language: :en
)
|> Map.get(:result)

This returns the detailed, structured data for a single Wikidata item:

Wiki.Action.new(:wikidatawiki)
|> Wiki.Action.get!(
  action: :wbgetentities,
  ids: "Q42"
)
|> Map.get(:result)
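The structured data is deeply nested, so a path lookup is often the simplest way to read one value. This sketch uses a trimmed sample shaped like a wbgetentities result rather than a live response; the real result also includes descriptions, aliases, claims and sitelinks.

```elixir
# Trimmed sample shaped like a wbgetentities result for Q42.
result = %{
  "entities" => %{
    "Q42" => %{
      "id" => "Q42",
      "labels" => %{
        "en" => %{"language" => "en", "value" => "Douglas Adams"}
      }
    }
  }
}

# get_in walks the nested maps and returns nil if any key is missing.
label = get_in(result, ["entities", "Q42", "labels", "en", "value"])
missing = get_in(result, ["entities", "Q42", "labels", "xx", "value"])
```

Returning nil for a missing language avoids a crash when an item has no label in the requested language; the "xx" language code here is only a placeholder for such a case.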