Dataset v0.5.0

Datasets represent labeled tabular data.

Datasets are enumerable:

iex> Dataset.new([{:a, :b, :c},
...>              {:A, :B, :C},
...>              {:i, :ii, :iii},
...>              {:I, :II, :III}],
...>             {"one", "two", "three"})
...> |> Enum.map(&elem(&1, 2))
[:c, :C, :iii, :III]

Datasets are also collectable:

iex> for x <- 0..10, into: Dataset.empty({:n}), do: x
%Dataset{labels: {:n}, rows: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
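
The `into:` example above works for any struct that implements the Collectable protocol. As a rough sketch of how that can be done (using a hypothetical `Table` struct, not Dataset's actual source), the protocol needs only an `into/1` function that returns an initial accumulator and a collector function:

```elixir
defmodule Table do
  # Hypothetical stand-in for a labeled table; not the Dataset struct itself.
  defstruct labels: {}, rows: []
end

defimpl Collectable, for: Table do
  def into(table) do
    collector = fn
      # Each collected element is pushed onto the accumulator.
      acc, {:cont, row} -> [row | acc]
      # When collection finishes, append the new rows in their original order.
      acc, :done -> %{table | rows: table.rows ++ Enum.reverse(acc)}
      # Collection was aborted; the return value is ignored.
      _acc, :halt -> :ok
    end

    {[], collector}
  end
end

for x <- 0..2, into: %Table{labels: {:n}}, do: x
# => %Table{labels: {:n}, rows: [0, 1, 2]}
```

Accumulating in reverse and reversing once at `:done` avoids repeated list appends while collecting.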

Summary

Functions

Return a tuple of lists containing columnar data from ds, one list for each passed element of the column_labels list. Lists are returned in the tuple in the same order in which they appear in column_labels. Labels may appear more than once.

Return a dataset with no rows and labels specified by the tuple passed as label. If label is not specified, return an empty dataset with zero columns.

Return the result of performing an inner join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

Return the result of performing a left join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

Construct a new dataset. A dataset is a list of tuples. With no arguments, an empty dataset with zero columns is constructed. With one argument, the passed object is interpreted as the rows, and integer labels beginning with 0 are generated; the number of labels is determined by the size of the first tuple in the data.

Return the result of performing an outer join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

Return the result of performing a right join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

Return a dataset with each value in row i and column j transposed into row j and column i. The dataset is labelled with integer indices beginning with zero.

Return a new dataset with columns chosen from the input dataset ds.

Return the contents of _ds as a list of maps.

Functions

columns(ds, column_labels)

Return a tuple of lists containing columnar data from ds, one list for each passed element of the column_labels list. Lists are returned in the tuple in the same order in which they appear in column_labels. Labels may appear more than once.

iex> iso_countries = %Dataset{
...>   labels: {:iso_country, :country_name},
...>   rows: [
...>     {"us", "United States"},
...>     {"uk", "United Kingdom"},
...>     {"ca", "Canada"},
...>     {"de", "Germany"},
...>     {"nl", "Netherlands"},
...>     {"sg", "Singapore"}
...>   ]
...> }
...>  Dataset.columns(iso_countries, [:iso_country, :iso_country])
{["us", "uk", "ca", "de", "nl", "sg"],
 ["us", "uk", "ca", "de", "nl", "sg"]}
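
To make the column-extraction behaviour concrete, here is a minimal sketch in plain Elixir. `ColumnsSketch.columns/3` is a hypothetical helper that works on bare row and label tuples; it is not the library's implementation, and it assumes every requested label exists.

```elixir
defmodule ColumnsSketch do
  # Hypothetical helper, not part of Dataset: return one list per requested
  # label, in the order requested. Repeated labels yield repeated lists.
  # Raises ArgumentError if a label is not present in `labels`.
  def columns(rows, labels, wanted) do
    label_list = Tuple.to_list(labels)

    wanted
    |> Enum.map(fn label ->
      # Position of the label within the labels tuple.
      idx = Enum.find_index(label_list, &(&1 == label))
      Enum.map(rows, &elem(&1, idx))
    end)
    |> List.to_tuple()
  end
end

rows = [{"us", "United States"}, {"uk", "United Kingdom"}]
labels = {:iso_country, :country_name}

ColumnsSketch.columns(rows, labels, [:country_name, :iso_country, :iso_country])
# => {["United States", "United Kingdom"], ["us", "uk"], ["us", "uk"]}
```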

Return a dataset with no rows and labels specified by the tuple passed as label. If label is not specified, return an empty dataset with zero columns.

inner_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing an inner join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...> Dataset.inner_join(country_clicks, iso_countries, :country_name,
...>   right: :iso_country,
...>   left: :clicks
...> )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [{"ca", "4"}, {"de", "4"}, {"uk", "11"}, {"us", "13"}]
}
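
The semantics of an inner join can be sketched in plain Elixir without the library. `InnerJoinSketch.inner_join/2` below is a hypothetical helper over fixed-shape tuples, and it assumes keys are unique on the right side. It keeps the left side's row order, so it does not reproduce the row ordering shown in the example above.

```elixir
defmodule InnerJoinSketch do
  # Hypothetical helper, not part of Dataset.
  # Left rows are {key, left_value}; right rows are {right_value, key}.
  # Only keys present on both sides produce an output row.
  def inner_join(left, right) do
    # Index the right side by key; a duplicate key would overwrite the
    # earlier entry.
    right_by_key = Map.new(right, fn {value, key} -> {key, value} end)

    for {key, left_value} <- left, Map.has_key?(right_by_key, key) do
      {Map.fetch!(right_by_key, key), left_value}
    end
  end
end

country_clicks = [{"United States", "13"}, {"Canada", "4"}, {"France", "2"}]
iso_countries = [{"us", "United States"}, {"ca", "Canada"}, {"sg", "Singapore"}]

InnerJoinSketch.inner_join(country_clicks, iso_countries)
# => [{"us", "13"}, {"ca", "4"}]
```

"France" has no match on the right and "Singapore" has none on the left, so neither appears in the result.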

left_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing a left join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...>  Dataset.left_join(country_clicks, iso_countries, :country_name,
...>    right: :iso_country,
...>    left: :clicks
...>  )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [{"ca", "4"}, {nil, "2"}, {"de", "4"}, {"uk", "11"}, {"us", "13"}]
}

new(rows \\ [], labels \\ nil)

Construct a new dataset. A dataset is a list of tuples. With no arguments, an empty dataset with zero columns is constructed. With one argument, the passed object is interpreted as the rows, and integer labels beginning with 0 are generated; the number of labels is determined by the size of the first tuple in the data. With two arguments, the second is used as the tuple of labels.

iex> Dataset.new()
%Dataset{rows: [], labels: {}}

iex> Dataset.new([{:foo, :bar}, {:eggs, :ham}])
%Dataset{rows: [foo: :bar, eggs: :ham], labels: {0, 1}}

iex> Dataset.new([{0,0}, {1, 1}, {2, 4}, {3, 9}],
...>             {:x, :x_squared})
%Dataset{labels: {:x, :x_squared}, rows: [{0, 0}, {1, 1}, {2, 4}, {3, 9}]}
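
The default-label rule described above (integer labels starting at 0, sized from the first row) can be sketched as follows. `LabelsSketch.default_labels/1` is a hypothetical name used only for illustration; the library's own code may differ.

```elixir
defmodule LabelsSketch do
  # Hypothetical helper, not part of Dataset.
  # No rows means no columns, hence an empty labels tuple.
  def default_labels([]), do: {}

  # Otherwise generate one integer label per element of the first row.
  def default_labels([first | _rest]) do
    first
    |> Tuple.to_list()
    |> Enum.with_index()
    |> Enum.map(fn {_value, index} -> index end)
    |> List.to_tuple()
  end
end

LabelsSketch.default_labels([{:foo, :bar}, {:eggs, :ham}])
# => {0, 1}
```

Only the first row is inspected, so later rows of a different size would not change the generated labels.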

outer_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing an outer join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...>  Dataset.outer_join(country_clicks, iso_countries, :country_name,
...>    right: :iso_country,
...>    left: :clicks
...>  )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [
    {"ca", "4"},
    {nil, "2"},
    {"de", "4"},
    {"nl", nil},
    {"sg", nil},
    {"uk", "11"},
    {"us", "13"}
  ]
}
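
As the `nil` entries in the result suggest, an outer join keeps every key from either side and fills the missing side with `nil`. A minimal plain-Elixir sketch of that rule, using a hypothetical `OuterJoinSketch.outer_join/2` over key-to-value maps rather than datasets:

```elixir
defmodule OuterJoinSketch do
  # Hypothetical helper, not part of Dataset.
  # Both inputs are maps of key => value. Every key from either side
  # yields a row; a side without that key contributes nil.
  def outer_join(left, right) do
    keys = Enum.uniq(Map.keys(left) ++ Map.keys(right))

    # Sorting by key is a choice made here to keep the output deterministic.
    for key <- Enum.sort(keys) do
      {key, Map.get(left, key), Map.get(right, key)}
    end
  end
end

clicks = %{"Canada" => "4", "France" => "2"}
iso = %{"Canada" => "ca", "Singapore" => "sg"}

OuterJoinSketch.outer_join(clicks, iso)
# => [{"Canada", "4", "ca"}, {"France", "2", nil}, {"Singapore", nil, "sg"}]
```

Restricting `keys` to `Map.keys(left)` or to `Map.keys(right)` would give the left-join and right-join variants of the same sketch.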

right_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing a right join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...>  Dataset.right_join(country_clicks, iso_countries, :country_name,
...>    right: :iso_country,
...>    left: :clicks
...>  )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [
    {"ca", "4"},
    {"de", "4"},
    {"nl", nil},
    {"sg", nil},
    {"uk", "11"},
    {"us", "13"}
  ]
}

Return a dataset with each value in row i and column j transposed into row j and column i. The dataset is labelled with integer indices beginning with zero.

iex> Dataset.new([{:a,:b,:c},
...>              {:A, :B, :C},
...>              {:i, :ii, :iii},
...>              {:I, :II, :III}])
...> |> Dataset.rotate()
%Dataset{
  labels: {0, 1, 2, 3},
  rows: [{:a, :A, :i, :I},
         {:b, :B, :ii, :II},
         {:c, :C, :iii, :III}]
}
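
The transposition can be sketched with `Enum.zip/1`, which turns a list of row lists into a list of column tuples. `RotateSketch.rotate/1` is a hypothetical helper that returns a plain map instead of a `%Dataset{}`; it illustrates the idea and is not the library's code.

```elixir
defmodule RotateSketch do
  # Hypothetical helper, not part of Dataset.
  def rotate(rows) do
    # Each original column becomes one output row.
    rotated =
      rows
      |> Enum.map(&Tuple.to_list/1)
      |> Enum.zip()

    # One integer label per original row, starting at zero.
    labels =
      rows
      |> Enum.with_index()
      |> Enum.map(fn {_row, index} -> index end)
      |> List.to_tuple()

    %{labels: labels, rows: rotated}
  end
end

RotateSketch.rotate([{:a, :b, :c}, {:A, :B, :C}])
# => %{labels: {0, 1}, rows: [{:a, :A}, {:b, :B}, {:c, :C}]}
```

Note that `Enum.zip/1` stops at the shortest input, so rows of unequal size would be silently truncated in this sketch.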

Return a new dataset with columns chosen from the input dataset ds.

iex> Dataset.new([{:a,:b,:c},
...>              {:A, :B, :C},
...>              {:i, :ii, :iii},
...>              {:I, :II, :III}],
...>             {"first", "second", "third"})
...> |> Dataset.select(["second"])
%Dataset{rows: [{:b}, {:B}, {:ii}, {:II}], labels: {"second"}}
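
Column selection amounts to projecting each row onto the positions of the chosen labels. The sketch below uses a hypothetical `SelectSketch.select/3` on bare tuples and returns a plain map; it assumes all requested labels exist and is not taken from the library.

```elixir
defmodule SelectSketch do
  # Hypothetical helper, not part of Dataset.
  def select(rows, labels, wanted) do
    label_list = Tuple.to_list(labels)

    # Resolve each wanted label to its column position once, up front.
    indexes =
      Enum.map(wanted, fn label ->
        Enum.find_index(label_list, &(&1 == label))
      end)

    # Rebuild every row from the chosen positions, in the requested order.
    new_rows =
      Enum.map(rows, fn row ->
        indexes |> Enum.map(&elem(row, &1)) |> List.to_tuple()
      end)

    %{labels: List.to_tuple(wanted), rows: new_rows}
  end
end

SelectSketch.select(
  [{:a, :b, :c}, {:A, :B, :C}],
  {"first", "second", "third"},
  ["third", "first"]
)
# => %{labels: {"third", "first"}, rows: [{:c, :a}, {:C, :A}]}
```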

Return the contents of _ds as a list of maps.
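
One plausible reading of "a list of maps" is one map per row, keyed by label. That keying is an assumption, since the text above does not spell it out. A sketch under that assumption, with a hypothetical `MapListSketch.to_maps/2` helper:

```elixir
defmodule MapListSketch do
  # Hypothetical helper, not part of Dataset.
  # Assumes each row becomes a map from label to the value in that column.
  def to_maps(rows, labels) do
    label_list = Tuple.to_list(labels)

    Enum.map(rows, fn row ->
      label_list
      |> Enum.zip(Tuple.to_list(row))
      |> Map.new()
    end)
  end
end

MapListSketch.to_maps([{0, 0}, {2, 4}], {:x, :x_squared})
# => [%{x: 0, x_squared: 0}, %{x: 2, x_squared: 4}]
```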