# `Tucan.Datasets`
[🔗](https://github.com/pnezis/tucan/blob/v0.6.0/lib/tucan/datasets.ex#L21)

Common datasets for `Tucan` demos and docs.

## Supported datasets

Currently the following datasets are supported:

#### barley

[Yield data](https://stat.ethz.ch/R-manual/R-devel/library/lattice/html/barley.html) from a
Minnesota barley trial. Includes total yield in bushels per acre for 10 varieties at 6 sites
in each of two years.
 [[Data]](https://vega.github.io/editor/data/barley.json).

**Columns:** `yield`, `variety`, `year`, `site`

#### cars

The 1983 ASA Data Exposition dataset, collected by Ernesto Ramos and David Donoho, covering
automobiles. It contains mpg, cylinders, displacement, and other measurements (8 variables) for
406 different cars. [[Source]](http://lib.stat.cmu.edu/datasets/)
 [[Data]](https://vega.github.io/editor/data/cars.json).

**Columns:** `Name`, `Miles_per_Gallon`, `Cylinders`, `Displacement`, `Horsepower`, `Weight_in_lbs`, `Acceleration`, `Year`, `Origin`

#### corruption

Corruption Perceptions Index (CPI) and Human Development Index (HDI) for 176 countries,
from 2012 to 2015.
 [[Data]](https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/corruption.csv).

**Columns:** `country`, `region`, `year`, `cpi`, `iso3c`, `hdi`

#### flights

Monthly airline passengers from 1949 to 1960.
 [[Data]](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv).

**Columns:** `year`, `month`, `passengers`

#### gapminder

[Gapminder](https://www.gapminder.org/) health and income data by country.
 [[Data]](https://vega.github.io/vega-datasets/data/gapminder-health-income.csv).

**Columns:** `country`, `income`, `health`, `population`, `region`

#### glue

Data from the [GLUE (General Language Understanding Evaluation) benchmark
leaderboard](https://gluebenchmark.com/leaderboard).
 [[Data]](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/glue.csv).

**Columns:** `Model`, `Year`, `Encoder`, `Task`, `Score`

#### iris

One of the earliest datasets used in the literature on classification methods, and widely used
in statistics and machine learning. The dataset contains 3 classes of 50 instances each, where
each class refers to a type of iris plant.
 [[Data]](https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv).

**Columns:** `sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species`

#### movies

Movies dataset including IMDB scores. The dataset contains well-known, intentionally included
errors and is used for instructional purposes, including learning how to deal with dirty data.
 [[Data]](https://vega.github.io/editor/data/movies.json).

**Columns:** `Creative Type`, `Director`, `Distributor`, `IMDB Rating`, `IMDB Votes`, `MPAA Rating`, `Major Genre`, `Production Budget`, `Release Date`, `Rotten Tomatoes Rating`, `Running Time min`, `Source`, `Title`, `US DVD Sales`, `US Gross`, `Worldwide Gross`

#### ohlc

The dataset contains the performance of the Chicago Board Options Exchange Volatility Index (VIX) in
the summer of 2009.
 [[Data]](https://vega.github.io/editor/data/ohlc.json).

**Columns:** `date`, `open`, `high`, `low`, `close`, `signal`, `short`

#### penguins

The [dataset](https://github.com/allisonhorst/palmerpenguins) contains data for 344 penguins.
There are 3 different species of penguins in this dataset, collected from 3 islands in the
Palmer Archipelago, Antarctica. This is an excellent dataset for data exploration
and visualization, as an alternative to `:iris`.

Data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php)
and the [Palmer Station, Antarctica LTER](https://pallter.marine.rutgers.edu/), a member of
the [Long Term Ecological Research Network](https://lternet.edu/).
 [[Data]](https://raw.githubusercontent.com/vega/vega-datasets/next/data/penguins.json).

**Columns:** `Beak Depth (mm)`, `Beak Length (mm)`, `Body Mass (g)`, `Flipper Length (mm)`, `Island`, `Sex`, `Species`

#### stocks

Daily closing prices of various stocks. [[Data]](https://vega.github.io/editor/data/stocks.csv).

**Columns:** `symbol`, `date`, `price`

#### tips

[Tipping data](https://rdrr.io/cran/reshape2/man/tips.html) as collected by one waiter.
It includes information about each tip he received over a period of a few months working in
one restaurant.
 [[Data]](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv).

**Columns:** `total_bill`, `tip`, `sex`, `smoker`, `day`, `time`, `size`

#### titanic

Titanic survival data from the [legendary Kaggle competition](https://www.kaggle.com/competitions/titanic/data).
 [[Data]](https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv).

**Columns:** `PassengerId`, `Survived`, `Pclass`, `Name`, `Sex`, `Age`, `SibSp`, `Parch`, `Ticket`, `Fare`, `Cabin`, `Embarked`

#### unemployment

US unemployment data across various industries from 2000 to 2009.
 [[Data]](https://vega.github.io/editor/data/unemployment-across-industries.json).

**Columns:** `count`, `date`, `month`, `rate`, `series`, `year`

#### weather

Daily weather records from Seattle with metric units. [Data from NOAA](https://www.ncdc.noaa.gov/cdo-web/datatools/records).
 [[Data]](https://vega.github.io/editor/data/weather.csv).

**Columns:** `date`, `precipitation`, `temp_max`, `temp_min`, `wind`, `weather`

# `t`

```elixir
@type t() ::
  :barley
  | :cars
  | :corruption
  | :flights
  | :gapminder
  | :glue
  | :iris
  | :movies
  | :ohlc
  | :penguins
  | :stocks
  | :tips
  | :titanic
  | :unemployment
  | :weather
```

# `dataset`

```elixir
@spec dataset(atom()) :: String.t()
```

Returns the URL of the given dataset.

Raises an error if the dataset is invalid.
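The behavior described above (return a URL for a known dataset atom, raise otherwise) can be pictured as a map lookup. The module below is a hypothetical, minimal sketch under that assumption only: the module name `DatasetsSketch`, the choice of `ArgumentError`, and the small subset of datasets are illustrative and are not Tucan's actual implementation. The URLs are the `[[Data]]` links listed on this page.

```elixir
defmodule DatasetsSketch do
  # Hypothetical sketch, not Tucan's implementation: a subset of the datasets
  # documented above, mapped to the data URLs listed on this page.
  @datasets %{
    barley: "https://vega.github.io/editor/data/barley.json",
    cars: "https://vega.github.io/editor/data/cars.json",
    stocks: "https://vega.github.io/editor/data/stocks.csv"
  }

  @doc "Returns the URL of `name`, raising `ArgumentError` for unknown datasets."
  @spec dataset(atom()) :: String.t()
  def dataset(name) when is_atom(name) do
    case Map.fetch(@datasets, name) do
      {:ok, url} ->
        url

      :error ->
        # Assumption: the real library may raise a different exception type.
        raise ArgumentError,
              "invalid dataset #{inspect(name)}, expected one of: " <>
                inspect(Map.keys(@datasets))
    end
  end
end
```

In this sketch, `DatasetsSketch.dataset(:cars)` returns `"https://vega.github.io/editor/data/cars.json"`, while `DatasetsSketch.dataset(:unknown)` raises. With the real library, the equivalent call is `Tucan.Datasets.dataset(:cars)`, whose return value is documented above as the dataset URL.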

