Tucan.Datasets (tucan v0.4.1)
Common datasets for Tucan demos and docs.
Supported datasets
Currently the following datasets are supported:
barley
Yield data from a Minnesota barley trial. Includes total yield in bushels per acre for 10 varieties at 6 sites in each of two years. [Data].
Columns: yield, variety, year, site
cars
This dataset was used in the 1983 ASA Data Exposition. It was collected by Ernesto Ramos and David Donoho and deals with automobiles: mpg, cylinders, displacement, etc. (8 variables) for 406 different cars. [Source] [Data].
Columns: Name, Miles_per_Gallon, Cylinders, Displacement, Horsepower, Weight_in_lbs, Acceleration, Year, Origin
corruption
Corruption Perceptions Index (CPI) and Human Development Index (HDI) for 176 countries, from 2012 to 2015. [Data].
Columns: country, region, year, cpi, iso3c, hdi
flights
Monthly airline passengers from 1949 to 1960. [Data].
Columns: year, month, passengers
gapminder
Gapminder health and income data by country. [Data].
Columns: country, income, health, population, region
glue
Data from the General Language Understanding Evaluation (GLUE) benchmark leaderboard. [Data].
Columns: Model, Year, Encoder, Task, Score
iris
This is one of the earliest datasets used in the literature on classification methods, and it is widely used in statistics and machine learning. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. [Data].
Columns: sepal_length, sepal_width, petal_length, petal_width, species
movies
Movies dataset including IMDB scores. The dataset contains well-known, intentionally included errors. It is used for instructional purposes, including practice in dealing with dirty data. [Data].
Columns: Creative Type, Director, Distributor, IMDB Rating, IMDB Votes, MPAA Rating, Major Genre, Production Budget, Release Date, Rotten Tomatoes Rating, Running Time min, Source, Title, US DVD Sales, US Gross, Worldwide Gross
ohlc
The dataset contains the performance of the Chicago Board Options Exchange Volatility Index (VIX) in the summer of 2009. [Data].
Columns: date, open, high, low, close, signal, short
penguins
The dataset contains data for 344 penguins. There are 3 different species of penguins in this dataset, collected from 3 islands in the Palmer Archipelago, Antarctica. This is an excellent dataset for data exploration and visualization, as an alternative to :iris.
Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. [Data].
Columns: Beak Depth (mm), Beak Length (mm), Body Mass (g), Flipper Length (mm), Island, Sex, Species
stocks
Daily closing prices of various stocks. [Data].
Columns: symbol, date, price
tips
Tipping data collected by a single waiter, covering every tip he received over a few months of work at one restaurant. [Data].
Columns: total_bill, tip, sex, smoker, day, time, size
titanic
Titanic survival data from the well-known Kaggle competition. [Data].
Columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
unemployment
US unemployment data across various industries from 2000 to 2009. [Data].
Columns: count, date, month, rate, series, year
weather
Daily weather records for Seattle, in metric units. Data from NOAA. [Data].
Columns: date, precipitation, temp_max, temp_min, wind, weather
Summary
Functions
Returns the URL of the given dataset.
Types
@type t() ::
:barley
| :cars
| :corruption
| :flights
| :gapminder
| :glue
| :iris
| :movies
| :ohlc
| :penguins
| :stocks
| :tips
| :titanic
| :unemployment
| :weather
Functions
Returns the URL of the given dataset.
Raises an error if the dataset is invalid.
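The documented contract takes a dataset atom from t(), returns its URL as a string, and raises for anything else. It can be sketched as below. This is not Tucan's implementation. The function's real name is not shown on this page, so the module MyDatasets and its url/1 function are hypothetical. The base URL is a placeholder rather than the location Tucan actually uses. Raising ArgumentError specifically is also an assumption.

```elixir
defmodule MyDatasets do
  # Hypothetical sketch of the documented contract: dataset atom -> URL string,
  # raising on an invalid dataset. Not Tucan's code; the base URL is a placeholder.
  @base_url "https://example.com/datasets"

  # The same atoms as the t() type union.
  @datasets [
    :barley, :cars, :corruption, :flights, :gapminder, :glue, :iris, :movies,
    :ohlc, :penguins, :stocks, :tips, :titanic, :unemployment, :weather
  ]

  @spec url(atom()) :: String.t()
  # Known dataset: build its URL from the (placeholder) base URL and the atom's name.
  def url(name) when name in @datasets, do: "#{@base_url}/#{name}"

  # Anything else is rejected; ArgumentError is an assumed choice of exception.
  def url(name), do: raise(ArgumentError, "invalid dataset: #{inspect(name)}")
end
```

With this sketch, MyDatasets.url(:iris) returns "https://example.com/datasets/iris", and MyDatasets.url(:unknown) raises ArgumentError. The guard uses a compile-time list, so the accepted atoms stay in one place that mirrors the type.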