Lastfm Archive

A tool for extracting and archiving Last.fm music listening data - scrobbles.

Usage

Download scrobbles and create a file archive from within an Elixir application, or interactively by running iex -S mix from the software home directory.

  # archive all data of a default user specified in configuration
  LastfmArchive.sync # subsequent calls download only latest scrobbles

  # archive all data of any Lastfm user
  # the data is stored in directory named after the user
  LastfmArchive.sync("a_lastfm_user")

You can also deploy and use the tool in Livebook, as shown in various Livebook guides.
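
For a notebook session, a setup cell along the following lines may be enough. This is a minimal sketch rather than the setup used in the guides: it assumes the package and its configuration can be supplied through Mix.install/2 with the :config option, and the user, directory and key values are placeholders.

```elixir
# Hypothetical Livebook setup cell; all config values are placeholders.
Mix.install(
  [{:lastfm_archive, "~> 1.2"}],
  config: [
    lastfm_archive: [
      user: "default_user",
      data_dir: "./lastfm_data/",
      lastfm_api_key: "api_key_provided_by_lastfm"
    ]
  ]
)

# archive the scrobbles of the configured default user
LastfmArchive.sync()
```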

Scrobbles are downloaded via the Last.fm API and stored in the file archive on demand, organised by day. The software has a built-in cache that remembers previous downloads and resumes from them: scrobbles that are already downloaded are skipped and not requested from the API again.

The stored data is in the raw Last.fm recenttracks JSON format, chunked into gzip compressed pages of at most 200 tracks and stored within directories corresponding to the days when tracks were scrobbled. The file archive is kept in a main directory specified in configuration - see below.
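
To illustrate the storage scheme described above, here is a minimal sketch that builds a per-day directory path and reads one gzip compressed page back as raw JSON text. The module name, the exact year/month/day directory layout and the page file name are assumptions made for illustration; only the gzip compression and the per-day directories come from the description above.

```elixir
defmodule ArchivePageSketch do
  @moduledoc false

  # Hypothetical layout: <data_dir>/<user>/<yyyy>/<mm>/<dd>.
  # The actual directory naming used by the archive may differ.
  def day_dir(data_dir, user, %Date{} = date) do
    Path.join([data_dir, user, Calendar.strftime(date, "%Y/%m/%d")])
  end

  # Reads one gzip compressed page and returns the raw JSON text.
  def read_page(path) do
    path
    |> File.read!()
    |> :zlib.gunzip()
  end
end
```

The returned binary can then be passed to any JSON decoder.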

See the Creating a file archive guide and sync/2 for various archiving options such as overwrite, year and date.

Transform into columnar storage formats

You can transform the file archive into other common storage formats such as CSV, and into columnar formats such as Apache Parquet. These formats facilitate data interoperability, as well as OLAP and analytics use cases.

# transform the file archive into columnar Apache Parquet files
LastfmArchive.transform("a_lastfm_user", format: :parquet)

# to columnar Apache Arrow IPC files
LastfmArchive.transform("a_lastfm_user", format: :ipc_stream)

# CSV format also available
LastfmArchive.transform("a_lastfm_user", format: :csv)

Available formats:

See Columnar data transforms guide and transform/2.

Transform into faceted columnar datasets

You can also transform the file archive into faceted (artists, albums, tracks) datasets.

LastfmArchive.transform("a_lastfm_user", format: :ipc_stream, facet: :artists)

See Facets archiving guide and transform/2.

Read archive

The tool provides a read/2 function for retrieving data from the archive. It relies mainly on Elixir Explorer data frames to underpin further data I/O, manipulation, analytics and visualisation.

The function returns a lazy Explorer.DataFrame.t/0.

From raw data file archive

Scrobbles stored in the file archive can be read with a day or month option:

# read a single day of scrobbles for the configured default user
LastfmArchive.read(day: ~D[2022-12-31])

# read a single month of scrobbles for a user; any day within the month can be given
LastfmArchive.read("a_lastfm_user", month: ~D[2022-12-01])
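
The month option above takes an arbitrary day of the month. As an illustration of why any day identifies the same month, the sketch below expands a date into all the days of its month using the standard Date module. The module and function names are hypothetical, and this is not necessarily how read/2 resolves the option internally.

```elixir
defmodule MonthRangeSketch do
  @moduledoc false

  # Expands any date into the list of all days in its month.
  def days_in_month(%Date{} = date) do
    first = Date.beginning_of_month(date)
    last = Date.end_of_month(date)

    first
    |> Date.range(last)
    |> Enum.to_list()
  end
end
```

Under this reading, ~D[2022-12-01] and ~D[2022-12-15] cover the same set of days.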

From columnar archive for analytics

read/2 can return a single year of scrobbles or all scrobbles, i.e. the entire dataset, from a columnar archive. A columns option is available to retrieve only a subset of columns.

# load all 2023 data from a Parquet archive
LastfmArchive.read("a_lastfm_user", format: :parquet, year: 2023)

# load all data from an Arrow IPC archive
LastfmArchive.read("a_lastfm_user", format: :ipc_stream)

# load data from specific columns
LastfmArchive.read("a_lastfm_user", format: :parquet, columns: [:id, :artist, :album])

From faceted datasets for analytics

read/2 can also return the faceted datasets, e.g. all artists from a columnar archive.

LastfmArchive.read("a_lastfm_user", format: :ipc_stream, facet: :artists)

Livebook guides

LastfmArchive also provides the following Livebook interactive and step-by-step guides.

Creating a file archive

The Creating a file archive guide shows how to create a local file archive consisting of data fetched from the Last.fm API. It provides a heatmap and count visualisation for checking ongoing archiving status.

archiving progress visualisation

Columnar data transforms

The Columnar data transforms guide shows how to transform the local file archive to columnar data formats (Arrow, Parquet). It demonstrates how read/2 can be used to load single-year, single-column data, as well as an entire dataset, into a data frame for various analytics.

See a sample output of this guide, showing top tracks analytics.

Facets archiving

The Facets archiving guide shows how the local file archive can be used to generate faceted (artists, albums, tracks) columnar datasets. It also demonstrates how the datasets may be used, for example to find the new artists discovered on a particular date,

new artists discovered on this day

and to visualise all artists by when they were first listened to and their overall popularity.

all artists first played and popularity

Other usage

To load all transformed CSV data from the archive into Solr:

  # define a Solr endpoint with %Hui.URL{} struct
  headers = [{"Content-type", "application/json"}]
  url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}

  LastfmArchive.load_archive("a_lastfm_user", url)

The function finds CSV files in the archive and sends them to Solr for ingestion one at a time. It uses the Hui client to interact with Solr and the Hui.URL.t/0 struct to specify the Solr endpoint.

Requirement

This tool requires Elixir and Erlang; see the installation details for various operating systems, or use Livebook.

Installation

lastfm_archive is available in Hex. The package can be installed by adding lastfm_archive to your list of dependencies in mix.exs:

  def deps do
    [
      {:lastfm_archive, "~> 1.2"}
    ]
  end

Documentation can be found at https://hexdocs.pm/lastfm_archive.

Configuration

Add the following entries to your config, config/config.exs. For example, the following specifies a Lastfm user and a main file location for multiple user archives, ./lastfm_data/, relative to the software home directory.

You also need to specify a lastfm_api_key in the config, so that the application can access the Lastfm API.

  config :lastfm_archive,
    user: "default_user", # the default user
    data_dir: "./lastfm_data/", # main directory for multiple archives
    lastfm_api_key: "api_key_provided_by_lastfm",
    per_page: 200, # 200 is max no. of tracks per call permitted by Lastfm API 
    interval: 1000 # milliseconds between requests cf. Lastfm's max 5 reqs/s rate limit


  # optional: Solr endpoint for Lastfm data loading
  config :hui, :lastfm_archive,
    url: "http://localhost:8983/solr/lastfm_archive",
    handler: "update",
    headers: [{"Content-type", "application/json"}]

See sync/2 for other configurable archiving options, e.g. interval, per_page.
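
The relationship between the interval setting and the rate limit mentioned in the config comments is simple arithmetic. A minimal sketch, with hypothetical helper names that are not part of lastfm_archive, and ignoring the time each request itself takes:

```elixir
defmodule RateLimitSketch do
  @moduledoc false

  # Requests per second implied by pausing `interval_ms` between requests.
  def requests_per_second(interval_ms) when interval_ms > 0 do
    1000 / interval_ms
  end

  # Shortest whole-millisecond interval that stays within `max_rps`.
  def min_interval_ms(max_rps) when max_rps > 0 do
    ceil(1000 / max_rps)
  end
end
```

With interval: 1000 this gives at most 1 request per second, well within the limit of 5 requests per second quoted above; 200 milliseconds would be the shortest interval that still respects that limit.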

See Hui for more details on Solr configuration.

An API key must be configured to enable Lastfm API requests; see https://www.last.fm/api ("Get an API account").