LastfmArchive (lastfm_archive v0.8.0)
lastfm_archive is a tool for creating a local Last.fm scrobble file archive and a Solr archive, and for analytics.
The software is currently experimental and in preliminary development. It should eventually provide the capability to perform ETL and analytic tasks on Lastfm scrobble data.
Current usage:
- archive/0, archive/2: download all raw Lastfm scrobble data to local filesystem
- archive/3: download a data subset within a date range
- sync/0, sync/1: sync Lastfm scrobble data to local filesystem
- transform_archive/2: transform downloaded raw data and create a TSV file archive
- load_archive/2: load all (TSV) data from the archive into Solr
Summary
Functions
archive/0: Download all scrobbled tracks and create an archive on local filesystem for the default user.
archive/2: Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user.
archive/3: Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user.
load_archive/2: Load all TSV data from the archive into Solr for a Lastfm user.
sync/0: Sync scrobbled tracks for the default user.
sync/1: Sync scrobbled tracks for a Lastfm user.
transform_archive/2: Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user.
Types
Specs
date_range() :: :all | :today | :yesterday | integer() | Date.t() | Date.Range.t()
Functions
Specs
archive() :: :ok | {:error, :file.posix()}
Download all scrobbled tracks and create an archive on local filesystem for the default user.
Example
LastfmArchive.archive()
The archive belongs to a default user specified in configuration, for example user_a (in config/config.exs):
config :lastfm_archive,
  user: "user_a",
  ... # other archiving options
See archive/2 for further details on archive format, file location and archiving options.
Specs
archive(binary(), keyword()) :: :ok | {:error, :file.posix()}
Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user.
Example
LastfmArchive.archive("a_lastfm_user")
# with archiving option
LastfmArchive.archive("a_lastfm_user", interval: 300) # 300ms interval between Lastfm API requests
LastfmArchive.archive("a_lastfm_user", overwrite: true) # re-fetch / overwrite downloaded data
Older scrobbles are archived on a yearly basis, whereas the latest (current year) scrobbles are extracted on a daily basis to ensure data immutability and updatability.
The data is currently in raw Lastfm recenttracks JSON format, chunked into
200-track (max) gzip compressed pages and stored within directories corresponding
to the years and days when tracks were scrobbled.
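Because each page is a gzip-compressed JSON file, it can be inspected with nothing more than the Erlang standard library. The following is a minimal sketch, not part of lastfm_archive: the module name PagePeek and the function read_page/1 are hypothetical, and the path of a page file is left to the caller, since exact file names are not described here.

```elixir
defmodule PagePeek do
  # Hypothetical helper, not part of lastfm_archive.
  # Reads one gzip-compressed archive page and returns the raw JSON
  # as a binary. Decoding the JSON is left to whichever JSON library
  # the caller prefers.
  def read_page(path) when is_binary(path) do
    path
    |> File.read!()
    |> :zlib.gunzip()
  end
end
```

The function raises if the file is missing or is not valid gzip data, which is usually acceptable for ad hoc inspection.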
Options:
- :interval - default 500 (ms), the duration between successive Lastfm API requests. This controls the request rate. The default interval ensures a safe rate that is within Lastfm's terms of service: no more than 5 requests per second
- :overwrite - default false. If set to true, the system will (re)fetch and overwrite any previously downloaded data. Use this option to refresh the file archive. Otherwise (false), the system will not call Lastfm to check and re-fetch data if existing data chunks / pages are found, which speeds up archive updates
- :per_page - default 200, number of scrobbles per page in archive. The default is the max number of tracks per request permissible by Lastfm
- :daily - default false, an option for archiving at daily granularity, entailing smaller and immutable archive files suitable for latest scrobbles data update
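The relationship between :interval, :per_page and the quoted limit of 5 requests per second is simple arithmetic. The sketch below is illustrative only: the module ArchiveEstimate and its functions are hypothetical and not part of lastfm_archive. It assumes requests are issued strictly one after another, so the computed rate is an upper bound, and the page count is a lower bound on requests because pages are split per year or day.

```elixir
defmodule ArchiveEstimate do
  # Hypothetical helper, not part of lastfm_archive.

  # Request limit quoted in the :interval option description.
  @max_requests_per_second 5

  # Upper bound on the request rate for a given interval in milliseconds.
  def requests_per_second(interval_ms) when is_integer(interval_ms) and interval_ms > 0 do
    1000 / interval_ms
  end

  # True if the interval keeps the rate at or below the quoted limit.
  def within_limit?(interval_ms) do
    requests_per_second(interval_ms) <= @max_requests_per_second
  end

  # Minimum number of pages needed to hold `scrobbles` tracks (ceiling division).
  def pages_needed(scrobbles, per_page \\ 200)
      when is_integer(scrobbles) and scrobbles >= 0 and is_integer(per_page) and per_page > 0 do
    div(scrobbles + per_page - 1, per_page)
  end
end
```

Under these assumptions, the default 500 ms interval corresponds to at most 2 requests per second, and any interval of 200 ms or more stays within the limit.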
The data is written to a main directory,
e.g. ./lastfm_data/a_lastfm_user/ as configured in
config/config.exs:
config :lastfm_archive,
  ...
  data_dir: "./lastfm_data/"
See archive/3 for archiving data within a date range.
Reruns and refresh archive
Lastfm API calls can occasionally time out. When this happens,
the function continues archiving and moves on to the next data chunk (page).
It logs the missing page event(s) in an error directory.
Rerun the function
to download any missing data chunks. The function skips all existing
archived pages by default so that it will not make repeated calls to Lastfm.
Use the overwrite: true option to re-fetch existing data.
To create a fresh archive or to refresh part of it, delete all or some
files in the archive and re-run the function, or use the overwrite: true
option.
Specs
archive(binary(), date_range(), keyword()) :: :ok | {:error, :file.posix()}
Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user.
Example
LastfmArchive.archive("a_lastfm_user", :past_month)
# data from year 2016
LastfmArchive.archive("a_lastfm_user", 2016)
# with Date struct
LastfmArchive.archive("a_lastfm_user", ~D[2018-10-31])
# with Date.Range struct
d1 = ~D[2018-01-01]
d2 = d1 |> Date.add(7)
LastfmArchive.archive("a_lastfm_user", Date.range(d1, d2), daily: true, overwrite: true)
Supported date range:
- :all: archive all scrobble data between Lastfm registration date and now
- :today, :yesterday, :past_week, :past_month - other convenience date ranges
- yyyy (integer): data for a single year
- Date: data for a specific date - single day
- Date.Range: data for a specific date range
See archive/2 for more details on archiving options.
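To make the date range options concrete, here is a sketch of how some of them could map onto a Date.Range. It is an illustration under stated assumptions, not the library's implementation: the module DateRangeSketch and resolve/2 are hypothetical, and the current date is passed in as an argument. :all is omitted because it depends on the user's Lastfm registration date, and :past_week and :past_month are omitted because their exact windows are not stated here.

```elixir
defmodule DateRangeSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # `today` is passed in explicitly so the mapping stays deterministic.

  def resolve(:today, %Date{} = today), do: Date.range(today, today)

  def resolve(:yesterday, %Date{} = today) do
    yesterday = Date.add(today, -1)
    Date.range(yesterday, yesterday)
  end

  # An integer is read as a calendar year: 1 January to 31 December.
  def resolve(year, %Date{}) when is_integer(year) do
    Date.range(Date.new!(year, 1, 1), Date.new!(year, 12, 31))
  end

  # A single date becomes a one-day range.
  def resolve(%Date{} = date, %Date{}), do: Date.range(date, date)

  # A Date.Range is already in the target form.
  def resolve(%Date.Range{} = range, %Date{}), do: range
end
```

For example, DateRangeSketch.resolve(2016, Date.utc_today()) yields the range from 2016-01-01 to 2016-12-31 regardless of the current date.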
Specs
load_archive(binary(), solr_url()) :: :ok | {:error, Hui.Error.t()}
Load all TSV data from the archive into Solr for a Lastfm user.
The function finds TSV files from the archive and sends them to
Solr for ingestion one at a time. It uses the Hui client to interact
with Solr and the Hui.URL.t/0 struct
for the Solr endpoint specification.
Example
# define a Solr endpoint with %Hui.URL{} struct
headers = [{"Content-type", "application/json"}]
url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}
LastfmArchive.load_archive("a_lastfm_user", url)
TSV files must be created before loading; see transform_archive/2.
Specs
sync() :: :ok | {:error, :file.posix()}
Sync scrobbled tracks for the default user.
Example
LastfmArchive.sync()
The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls download only the latest scrobbles, starting from the previous date of sync.
See archive/0 for further details on how to configure a default user.
Specs
sync(binary()) :: :ok | {:error, :file.posix()}
Sync scrobbled tracks for a Lastfm user.
Example
LastfmArchive.sync("a_lastfm_user")
The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls
download only the latest scrobbles starting from the previous date of sync. The date of sync is logged in
a .lastfm_archive file in the user archive data directory.
Specs
transform_archive(binary(), :tsv) :: :ok
Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user.
Example
LastfmArchive.transform_archive("a_lastfm_user")
The function only transforms downloaded archive data on local filesystem. It does not fetch data from Lastfm;
use archive/2 or archive/3 for that.
The TSV files are created on a yearly basis and stored in gzip compressed format.
They are stored in a tsv directory within either the default ./lastfm_data/
or the directory specified in config/config.exs (:lastfm_archive, :data_dir).
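Taken together, the functions on this page form a download, transform, load sequence in which each step depends on the previous one succeeding. The sketch below shows one way to chain such steps and stop at the first error. PipelineSketch and run/1 are hypothetical names, not part of lastfm_archive, and the wiring shown in the trailing comments assumes the return values given in the specs above.

```elixir
defmodule PipelineSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # Runs named zero-arity steps in order and halts at the first error,
  # returning :ok or {:error, {step_name, reason}}.
  # A step returning anything other than :ok or {:error, reason} raises.
  def run(steps) when is_list(steps) do
    Enum.reduce_while(steps, :ok, fn {name, step}, :ok ->
      case step.() do
        :ok -> {:cont, :ok}
        {:error, reason} -> {:halt, {:error, {name, reason}}}
      end
    end)
  end
end

# Possible wiring, assuming `user` is a Lastfm user name and `url` is a
# %Hui.URL{} struct as in the load_archive/2 example:
#
#   PipelineSketch.run(
#     sync: fn -> LastfmArchive.sync(user) end,
#     transform: fn -> LastfmArchive.transform_archive(user) end,
#     load: fn -> LastfmArchive.load_archive(user, url) end
#   )
```

Tagging the error with the step name makes it clear whether a failure came from the download, the TSV transform or the Solr load.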