Tucan.Datasets (tucan v0.3.1)

Common datasets for Tucan demos and docs.

Supported datasets

Currently the following datasets are supported:

barley

Yield data from a Minnesota barley trial. Includes total yield in bushels per acre for 10 varieties at 6 sites in each of two years. [Data].

Columns: yield, variety, year, site

cars

The 1983 ASA Data Exposition dataset, collected by Ernesto Ramos and David Donoho. It contains data on mpg, cylinders, displacement, etc. (8 variables) for 406 different cars. [Source] [Data].

Columns: Name, Miles_per_Gallon, Cylinders, Displacement, Horsepower, Weight_in_lbs, Acceleration, Year, Origin

corruption

Corruption Perceptions Index (CPI) and Human Development Index (HDI) for 176 countries, from 2012 to 2015. [Data].

Columns: country, region, year, cpi, iso3c, hdi

flights

Monthly airline passengers from 1949 to 1960. [Data].

Columns: year, month, passengers

gapminder

Gapminder health and income data by country. [Data].

Columns: country, income, health, population, region

glue

Data from the GLUE (General Language Understanding Evaluation) benchmark leaderboard. [Data].

Columns: Model, Year, Encoder, Task, Score

iris

This is one of the earliest datasets used in the literature on classification methods and is widely used in statistics and machine learning. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. [Data].

Columns: sepal_length, sepal_width, petal_length, petal_width, species

movies

Movies dataset including IMDB scores. The dataset contains well-known, intentionally included errors and is used for instructional purposes, such as practicing how to deal with dirty data. [Data].

Columns: Creative Type, Director, Distributor, IMDB Rating, IMDB Votes, MPAA Rating, Major Genre, Production Budget, Release Date, Rotten Tomatoes Rating, Running Time min, Source, Title, US DVD Sales, US Gross, Worldwide Gross

penguins

The dataset contains data for 344 penguins. There are 3 different species of penguins in this dataset, collected from 3 islands in the Palmer Archipelago, Antarctica. This is an excellent dataset for data exploration & visualization, as an alternative to :iris.

Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. [Data].

Columns: Beak Depth (mm), Beak Length (mm), Body Mass (g), Flipper Length (mm), Island, Sex, Species

stocks

Daily closing prices of various stocks. [Data].

Columns: symbol, date, price

tips

Tipping data as collected by one waiter. Information about each tip he received over a period of a few months working in one restaurant is included. [Data].

Columns: total_bill, tip, sex, smoker, day, time, size

titanic

Titanic survival data from the legendary Kaggle competition. [Data].

Columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked

unemployment

US unemployment data across various industries from 2000 to 2009. [Data].

Columns: count, date, month, rate, series, year

weather

Daily weather records from Seattle with metric units. Data from NOAA. [Data].

Columns: date, precipitation, temp_max, temp_min, wind, weather

Summary

Functions

dataset/1

Returns the URL of the given dataset.

Types

@type t() ::
  :barley
  | :cars
  | :corruption
  | :flights
  | :gapminder
  | :glue
  | :iris
  | :movies
  | :penguins
  | :stocks
  | :tips
  | :titanic
  | :unemployment
  | :weather

Functions

@spec dataset(atom()) :: String.t()

Returns the URL of the given dataset.

Raises an error if the dataset is invalid.
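As a minimal sketch of how this function might be called, the script below looks up the URL for `:iris` and then shows one way to handle an unsupported name. It assumes a standalone script that pulls in tucan through `Mix.install/1`. The atom `:not_a_dataset` is a made-up name used only for illustration, and because the exact exception type is not stated here, the example rescues any exception instead of matching a specific one.

```elixir
# Assumption: run as a standalone script, e.g. `elixir datasets_example.exs`.
# Inside an existing Mix project that already depends on tucan, drop this line.
Mix.install([{:tucan, "~> 0.3.1"}])

# Look up the URL of a supported dataset (any atom from the t() type above).
url = Tucan.Datasets.dataset(:iris)
IO.puts("iris dataset URL: #{url}")

# :not_a_dataset is a hypothetical, unsupported name used for illustration.
# The exception type is not documented here, so any exception is rescued.
result =
  try do
    {:ok, Tucan.Datasets.dataset(:not_a_dataset)}
  rescue
    error -> {:error, Exception.message(error)}
  end

IO.inspect(result, label: "unsupported dataset")
```

Wrapping the call in `try/rescue` is only needed when the dataset name comes from untrusted input; with a literal atom from the supported list, the plain call is enough.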