LastfmArchive (lastfm_archive v0.8.0)

lastfm_archive is a tool for creating a local Last.fm scrobble file archive, a Solr archive and analytics.

The software is currently experimental and in preliminary development. It should eventually provide the capability to perform ETL and analytic tasks on Lastfm scrobble data.

Current usage:

Summary

Functions

archive/0: Download all scrobbled tracks and create an archive on local filesystem for the default user.

archive/2: Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user.

archive/3: Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user.

load_archive/2: Load all TSV data from the archive into Solr for a Lastfm user.

sync/0: Sync scrobbled tracks for the default user.

sync/1: Sync scrobbled tracks for a Lastfm user.

transform_archive/2: Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user.

Types

Specs

date_range() ::
  :all | :today | :yesterday | integer() | Date.t() | Date.Range.t()

Specs

solr_url() :: atom() | Hui.URL.t()

Functions

Specs

archive() :: :ok | {:error, :file.posix()}

Download all scrobbled tracks and create an archive on local filesystem for the default user.

Example

  LastfmArchive.archive

The archive belongs to a default user specified in configuration, for example user_a (in config/config.exs):

  config :lastfm_archive,
    user: "user_a",
    ... # other archiving options

See archive/2 for further details on archive format, file location and archiving options.

Specs

archive(binary(), keyword()) :: :ok | {:error, :file.posix()}

Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user.

Example

  LastfmArchive.archive("a_lastfm_user")

  # with archiving option
  LastfmArchive.archive("a_lastfm_user", interval: 300) # 300ms interval between Lastfm API requests
  LastfmArchive.archive("a_lastfm_user", overwrite: true) # re-fetch / overwrite downloaded data

Older scrobbles are archived on a yearly basis, whereas the latest (current year) scrobbles are extracted on a daily basis to ensure data immutability and updatability.

The data is currently in raw Lastfm recenttracks JSON format, chunked into 200-track (max) gzip compressed pages and stored within directories corresponding to the years and days when tracks were scrobbled.

Options:

  • :interval - default 500 (ms), the duration between successive Lastfm API requests. This controls the request rate. The default interval gives a safe rate of 2 requests per second, which is within the limit in Lastfm's terms of service: no more than 5 requests per second.

  • :overwrite - default false. If set to true, the system (re)fetches and overwrites any previously downloaded data. Use this option to refresh the file archive. Otherwise (false), the system does not call Lastfm to check and re-fetch data when existing data chunks / pages are found, which speeds up archive updating.

  • :per_page - default 200, the number of scrobbles per page in the archive. The default is the maximum number of tracks per request permitted by Lastfm.

  • :daily - default false, an option for archiving at daily granularity. This produces smaller, immutable archive files that are suitable for updating the latest scrobble data.
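
The arithmetic behind :interval and :per_page can be sketched as below. ArchiveMath and its functions are hypothetical names used only for illustration and are not part of lastfm_archive; the sketch just restates the 5 requests per second limit and the 200 tracks per page default mentioned above.

```elixir
defmodule ArchiveMath do
  # Hypothetical helper for illustration; not part of lastfm_archive.

  # Rate limit quoted in the :interval option description.
  @max_requests_per_second 5

  # Requests per second implied by an interval in milliseconds.
  def requests_per_second(interval_ms) when interval_ms > 0, do: 1000 / interval_ms

  # True when the interval keeps the rate at or below the limit.
  def within_rate_limit?(interval_ms) when interval_ms > 0 do
    requests_per_second(interval_ms) <= @max_requests_per_second
  end

  # Number of pages needed for a scrobble count (ceiling division).
  def page_count(scrobbles, per_page \\ 200) when scrobbles >= 0 and per_page > 0 do
    div(scrobbles + per_page - 1, per_page)
  end
end
```

Under these assumptions the default 500 ms interval and the 300 ms interval from the example above both stay within the limit, while any interval below 200 ms would exceed it.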

The data is written to a main directory, e.g. ./lastfm_data/a_lastfm_user/ as configured in config/config.exs:

  config :lastfm_archive,
    ...
    data_dir: "./lastfm_data/"
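
Putting the two configuration fragments from this page together, a minimal config/config.exs might look like the sketch below. It uses only the :user and :data_dir keys shown in this document, and assumes Elixir 1.9 or later for import Config (older projects would use Mix.Config instead).

```elixir
# config/config.exs
import Config

config :lastfm_archive,
  # default user for archive/0 and sync/0
  user: "user_a",
  # main directory; this user's data goes to ./lastfm_data/user_a/
  data_dir: "./lastfm_data/"
```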

See archive/3 for archiving data within a date range.

Reruns and refresh archive

Lastfm API calls can time out occasionally. When this happens, the function continues archiving and moves on to the next data chunk (page). It logs the missing page event(s) in an error directory.

Rerun the function to download any missing data chunks. The function skips all existing archived pages by default so that it will not make repeated calls to Lastfm. Use the overwrite: true option to re-fetch existing data.

To create a fresh archive or refresh part of it: delete all or some files in the archive and re-run the function, or use the overwrite: true option.
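
When scripting reruns, it may help to turn the documented return values (:ok or {:error, posix}) into a readable message. The sketch below is one way to do that; ArchiveResult is a hypothetical module that is not part of lastfm_archive. Note that :ok only reflects the function's return value, since timed-out pages are logged separately as described above.

```elixir
defmodule ArchiveResult do
  # Hypothetical helper for illustration; not part of lastfm_archive.

  # Turn the documented return values into a human-readable message.
  def describe(:ok), do: "archive run finished"

  def describe({:error, reason}) do
    # :file.format_error/1 returns a charlist describing a posix error atom.
    "archive failed: " <> List.to_string(:file.format_error(reason))
  end
end

# Possible usage, assuming lastfm_archive is available in the project:
#   "a_lastfm_user" |> LastfmArchive.archive(overwrite: true) |> ArchiveResult.describe()
```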

archive(user, date_range \\ :all, options \\ [])

Specs

archive(binary(), date_range(), keyword()) :: :ok | {:error, :file.posix()}

Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user.

Example

  LastfmArchive.archive("a_lastfm_user", :past_month)

  # data from year 2016
  LastfmArchive.archive("a_lastfm_user", 2016)

  # with Date struct
  LastfmArchive.archive("a_lastfm_user", ~D[2018-10-31])

  # with Date.Range struct
  d1 = ~D[2018-01-01]
  d2 = d1 |> Date.add(7)
  LastfmArchive.archive("a_lastfm_user", Date.range(d1, d2), daily: true, overwrite: true)

Supported date range:

  • :all: archive all scrobble data between Lastfm registration date and now
  • :today, :yesterday, :past_week, :past_month - other convenience date ranges
  • yyyy (integer): data for a single year
  • Date: data for a specific date - single day
  • Date.Range: data for a specific date range
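
To make the date range forms more concrete, the sketch below expresses a year and a single day as Date.Range values using only the Elixir standard library. DateRangeSketch is a hypothetical module, and mapping an integer year to 1 January through 31 December is an assumption about how the year form is interpreted, not a description of the library's internals.

```elixir
defmodule DateRangeSketch do
  # Hypothetical helper for illustration; not part of lastfm_archive.

  # A whole calendar year as a Date.Range (assumed meaning of an integer year).
  def year_range(year) when is_integer(year) do
    Date.range(Date.new!(year, 1, 1), Date.new!(year, 12, 31))
  end

  # A single Date viewed as a one-day range.
  def day_range(%Date{} = date), do: Date.range(date, date)

  # Number of days covered; Date.Range includes both end dates.
  def days(%Date.Range{} = range), do: Enum.count(range)
end
```

Because Date.Range is inclusive, the Date.range(d1, d2) call in the example above, where d2 is d1 plus 7 days, covers 8 days rather than 7.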

See archive/2 for more details on archiving options.

Specs

load_archive(binary(), solr_url()) :: :ok | {:error, Hui.Error.t()}

Load all TSV data from the archive into Solr for a Lastfm user.

The function finds TSV files in the archive and sends them to Solr for ingestion one at a time. It uses the Hui client to interact with Solr and the Hui.URL.t/0 struct to specify the Solr endpoint.

Example

  # define a Solr endpoint with %Hui.URL{} struct
  headers = [{"Content-type", "application/json"}]
  url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}

  LastfmArchive.load_archive("a_lastfm_user", url)

TSV files must be pre-created before the loading - see transform_archive/2.
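
The ordering constraint (download, then transform, then load) can be sketched as a small pipeline that stops at the first error. ArchivePipeline is a hypothetical wrapper, not part of lastfm_archive; the module to call is passed as an argument (defaulting to LastfmArchive) so the sketch can be exercised with a stub, and it relies only on the function names, arities and return values documented on this page.

```elixir
defmodule ArchivePipeline do
  # Hypothetical wrapper for illustration; not part of lastfm_archive.

  # Run archive/3, transform_archive/2 and load_archive/2 in order.
  # Returns :ok, or the first non-:ok result (for example {:error, reason}).
  def run(user, solr_url, archive_mod \\ LastfmArchive) do
    with :ok <- archive_mod.archive(user, :all, []),
         :ok <- archive_mod.transform_archive(user, :tsv),
         :ok <- archive_mod.load_archive(user, solr_url) do
      :ok
    end
  end
end
```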

Specs

sync() :: :ok | {:error, :file.posix()}

Sync scrobbled tracks for the default user.

Example

  LastfmArchive.sync

The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls download the latest scrobbles starting from the previous date of sync.

See archive/0 for further details on how to configure a default user.

Specs

sync(binary()) :: :ok | {:error, :file.posix()}

Sync scrobbled tracks for a Lastfm user.

Example

  LastfmArchive.sync("a_lastfm_user")

The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls download only the latest scrobbles starting from the previous date of sync. The date of sync is logged in a .lastfm_archive file in the user archive data directory.

transform_archive(user, mode \\ :tsv)

Specs

transform_archive(binary(), :tsv) :: :ok

Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user.

Example

  LastfmArchive.transform_archive("a_lastfm_user")

The function only transforms archive data that has already been downloaded to the local filesystem. It does not fetch data from Lastfm; use archive/2 or archive/3 for that.

The TSV files are created on a yearly basis and stored in gzip compressed format. They are stored in a tsv directory within either the default ./lastfm_data/ or the directory specified in config/config.exs (:lastfm_archive, :data_dir).
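
Since the TSV files are gzip compressed, reading one back takes a decompression step. The sketch below shows one way to do this with the Erlang :zlib module. TsvSketch is a hypothetical helper, and the file name and column layout inside the tsv directory are not specified here, so the function simply takes a path and returns raw fields.

```elixir
defmodule TsvSketch do
  # Hypothetical helper for illustration; not part of lastfm_archive.

  # Read a gzip-compressed TSV file and return its rows as lists of fields.
  def read_rows(path) do
    path
    |> File.read!()
    |> :zlib.gunzip()
    |> String.split("\n", trim: true)
    |> Enum.map(&String.split(&1, "\t"))
  end
end
```

This reads the whole file into memory, which should be acceptable for a single yearly file but may need a streaming approach for very large archives.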