Dux.Datasets (Dux v0.2.0)

Embedded datasets for learning and testing.

All datasets are CC0 (public domain) unless noted. The one exception listed here is karate_club/0, which is CC BY 4.0.

Available datasets

| Dataset | Rows | Description |
| --- | --- | --- |
| `penguins/0` | 344 | Palmer penguins: species, measurements, island |
| `gapminder/0` | 1,704 | Country-level life expectancy, population, GDP |
| `flights/0` | 6,099 | NYC flights (Jan 1-7 2013), the fact table |
| `airlines/0` | 16 | Carrier code → name lookup |
| `airports/0` | 1,458 | Airport code → name, lat/lon |
| `planes/0` | 3,322 | Tail number → manufacturer, model, seats |

Examples

```elixir
require Dux

# Average body mass of Gentoo penguins per island
Dux.Datasets.penguins()
|> Dux.filter(species == "Gentoo")
|> Dux.group_by(:island)
|> Dux.summarise(avg_mass: avg(body_mass_g))
|> Dux.to_rows()

# Star schema join: number of flights per airline name
Dux.Datasets.flights()
|> Dux.join(Dux.Datasets.airlines(), on: :carrier)
|> Dux.group_by(:name)
|> Dux.summarise_with(n: "COUNT(*)")
|> Dux.sort_by(desc: :n)
|> Dux.to_rows()
```

datasets

airlines()

Airline carrier codes and names (16 rows). CC0.

airports()

US airport codes, names, and coordinates (1,458 rows). CC0.

flights()

NYC flights, Jan 1-7 2013 (6,099 rows). CC0.

gapminder()

Gapminder excerpt — country, continent, year, lifeExp, pop, gdpPercap (1,704 rows). CC0.
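
The columns listed above lend themselves to a grouped aggregate. The following is a minimal sketch that reuses only the calls shown in the module examples (`Dux.filter`, `Dux.group_by`, `Dux.summarise`, `Dux.to_rows`). It assumes that the excerpt contains rows for the year 2007 and that the column is literally named `lifeExp`; the `avg_life_exp` name is chosen here for illustration.

```elixir
require Dux

# Average life expectancy per continent for a single year.
# Assumption: the excerpt includes year == 2007.
rows =
  Dux.Datasets.gapminder()
  |> Dux.filter(year == 2007)
  |> Dux.group_by(:continent)
  |> Dux.summarise(avg_life_exp: avg(lifeExp))
  |> Dux.to_rows()

IO.inspect(rows)
```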

karate_club()

Zachary's Karate Club graph (34 nodes, 78 undirected edges). CC BY 4.0.

Returns a Dux.Graph in which each undirected edge is stored in both directions (156 directed edges). This is the classic social network dataset from a 1977 study of friendships in a university karate club.
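
The relationship between the 78 undirected edges and the 156 directed edges can be illustrated in plain Elixir. This sketch does not touch the Dux.Graph API; the three-edge list is a hypothetical toy graph, not the real dataset, and it only shows the doubling that results from storing each edge in both directions.

```elixir
# Toy undirected edge list (hypothetical, not the karate club data).
undirected = [{1, 2}, {1, 3}, {2, 3}]

# Store every undirected edge {a, b} as two directed edges.
directed = Enum.flat_map(undirected, fn {a, b} -> [{a, b}, {b, a}] end)

IO.inspect(length(undirected)) # prints 3
IO.inspect(length(directed))   # prints 6

# The same doubling applied to the karate club counts.
IO.inspect(78 * 2)             # prints 156
```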

penguins()

Palmer penguins dataset (344 rows). CC0.

planes()

Aircraft tail numbers, manufacturers, models (3,322 rows). CC0.