Ten Minutes to Explorer

Mix.install([
  {:explorer, "~> 0.7.0"},
  {:kino, "~> 0.10.0"}
])
:ok

Introduction

Explorer is a dataframe library for Elixir. A dataframe is a common data structure used in data analysis. It is a two-dimensional table composed of columns and rows similar to a SQL table or a spreadsheet.

Explorer's aim is to provide a simple and powerful API for manipulating dataframes. It is influenced mainly by the tidyverse, but if you've used other dataframe libraries like pandas, you shouldn't have too much trouble working with Explorer.

This document is meant to give you a crash course in using Explorer. More in-depth documentation can be found in the relevant sections of the docs.

We strongly recommend you run this livebook locally so you can see the outputs and play with the inputs!

Reading and writing data

Data can be read from delimited files (like CSV), NDJSON, Parquet, and the Arrow IPC (feather) format. You can also load in data from a map or keyword list of columns with Explorer.DataFrame.new/1.

For CSV, your 'usual suspects' of options are available:

  • delimiter - A single character used to separate fields within a record. (default: ",")
  • dtypes - A keyword list of [column_name: dtype]. If a type is not specified for a column, it is imputed from the first 1000 rows. (default: [])
  • header - Does the file have a header of column names as the first row or not? (default: true)
  • max_rows - Maximum number of lines to read. (default: nil)
  • nil_values - A list of strings that should be interpreted as nil values. (default: [])
  • skip_rows - The number of lines to skip at the beginning of the file. (default: 0)
  • columns - A list of column names to keep. If present, only these columns are read into the dataframe. (default: nil)
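
As a rough sketch of how a couple of these options combine, the cell below parses a small CSV held in memory with Explorer.DataFrame.load_csv!/2, which is assumed here to accept the same options as the file-based reader. The sample data and the "NA" marker are made up for illustration, and the cell assumes the Mix.install cell at the top has already run.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
alias Explorer.DataFrame

# Hypothetical sample data: semicolon-delimited, with "NA" marking a missing value.
csv = """
name;score
ana;10
bob;NA
"""

# `delimiter` and `nil_values` are the options described in the list above.
small = DataFrame.load_csv!(csv, delimiter: ";", nil_values: ["NA"])

# Convert to a plain map of lists to inspect the parsed values.
columns = DataFrame.to_columns(small)
```

The "NA" entry should come back as nil in the "score" column rather than forcing the whole column to be read as strings.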

Explorer also has multiple example datasets built in, which you can load from the Explorer.Datasets module like so:

df = Explorer.Datasets.fossil_fuels()
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

You'll notice that the output looks slightly different than many dataframe libraries. Explorer takes inspiration on this front from glimpse in R. A benefit to this approach is that you will rarely need to elide columns.

If you'd like to see a table with your data, take a look at Kino Explorer, which provides a rich table with filtering and sorting.

Writing files is very similar to reading them. The options are a little more limited:

  • header - Should the column names be written as the first line of the file? (default: true)
  • delimiter - A single character used to separate fields within a record. (default: ",")

First, let's add some useful aliases:

alias Explorer.DataFrame
alias Explorer.Series
Explorer.Series

And then write to a file of your choosing:

input = Kino.Input.text("Filename")
filename = Kino.Input.read(input)
DataFrame.to_csv(df, filename)
:ok
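
If you are not running this in Livebook, the same idea can be sketched without the Kino input. The file name below is hypothetical, and the example simply writes a tiny dataframe with a non-default delimiter and reads it back with the matching option.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
alias Explorer.DataFrame

# Hypothetical file name inside the system temporary directory.
path = Path.join(System.tmp_dir!(), "explorer_ten_minutes_example.csv")

small = DataFrame.new(a: [1, 2], b: ["x", "y"])

# Write with a custom delimiter; the header is written by default.
:ok = DataFrame.to_csv(small, path, delimiter: ";")

# Read it back using the same delimiter.
round_trip = DataFrame.from_csv!(path, delimiter: ";")
round_trip_columns = DataFrame.to_columns(round_trip)

File.rm(path)
```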

Working with Series

Explorer, like Polars, works up from the concept of a Series. In many ways, you can think of a dataframe as a row-aligned map of Series. These are like vectors in R or series in Pandas.

Explorer supports the following Series dtypes:

  • :binary - Binaries (sequences of bytes)
  • :boolean - Boolean
  • :category - Strings but represented internally as integers
  • :date - Date type that unwraps to Elixir.Date
  • {:datetime, precision} - DateTime type with millisecond/microsecond/nanosecond precision that unwraps to Elixir.NaiveDateTime
  • {:duration, precision} - Duration type with millisecond/microsecond/nanosecond precision that unwraps to Explorer.Duration
  • {:f, 32} - 32-bit floating point number
  • :float or {:f, 64} - 64-bit floating point number.
  • :integer - 64-bit signed integer
  • :string - UTF-8 encoded binary
  • :time - Time type that unwraps to Elixir.Time

Series can be constructed from Elixir basic types. For example:

s1 = Series.from_list([1, 2, 3])
#Explorer.Series<
  Polars[3]
  integer [1, 2, 3]
>
s2 = Series.from_list(["a", "b", "c"])
#Explorer.Series<
  Polars[3]
  string ["a", "b", "c"]
>
s3 = Series.from_list([~D[2011-01-01], ~D[1965-01-21]])
#Explorer.Series<
  Polars[2]
  date [2011-01-01, 1965-01-21]
>

You'll notice that the dtype and size of the Series are at the top of the printed value. You can get those programmatically as well.

Series.dtype(s3)
:date
Series.size(s3)
2

And the printed values max out at 50:

1..100 |> Enum.to_list() |> Series.from_list()
#Explorer.Series<
  Polars[100]
  integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
   25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
   49, 50, ...]
>

Series are also nullable.

s = Series.from_list([1.0, 2.0, nil, nil, 5.0])
#Explorer.Series<
  Polars[5]
  f64 [1.0, 2.0, nil, nil, 5.0]
>

And you can fill in those missing values using one of the following strategies:

  • :forward - replace nil with the previous value
  • :backward - replace nil with the next value
  • :max - replace nil with the series maximum
  • :min - replace nil with the series minimum
  • :mean - replace nil with the series mean

Series.fill_missing(s, :forward)
#Explorer.Series<
  Polars[5]
  f64 [1.0, 2.0, 2.0, 2.0, 5.0]
>
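
To compare the strategies side by side, here is a small sketch that applies each one to the same series and converts the results to lists. It assumes the Mix.install cell at the top has already run.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
alias Explorer.Series

s = Series.from_list([1.0, 2.0, nil, nil, 5.0])

# Build a keyword list of {strategy, filled values} pairs.
filled =
  for strategy <- [:forward, :backward, :max, :min, :mean] do
    {strategy, s |> Series.fill_missing(strategy) |> Series.to_list()}
  end

filled[:backward]
# [1.0, 2.0, 5.0, 5.0, 5.0]
filled[:min]
# [1.0, 2.0, 1.0, 1.0, 5.0]
```

With :mean, the two nil values are replaced by the mean of the non-nil values (1.0, 2.0 and 5.0), which is about 2.67.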

In the case of mixed numeric types (i.e. integers and floats), the integers are cast to floats and the resulting Series is a float Series:

Series.from_list([1, 2.0])
#Explorer.Series<
  Polars[2]
  f64 [1.0, 2.0]
>

In all other cases, all values in a Series must be of the same dtype or else you'll get an ArgumentError.

Series.from_list([1, 2, 3, "a"])

One of the goals of Explorer is useful error messages. If you look at the error above, you get:

the value "a" does not match the inferred series dtype :integer

Hopefully this makes abundantly clear what's going on.
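
If you want to handle this error in your own code rather than let the cell crash, one possible approach is to rescue the ArgumentError and keep its message. This is only a sketch; the exact wording of the message may differ between Explorer versions.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
alias Explorer.Series

result =
  try do
    {:ok, Series.from_list([1, 2, 3, "a"])}
  rescue
    # Mixed integers and strings cannot share a dtype, so this branch runs.
    e in ArgumentError -> {:error, Exception.message(e)}
  end
```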

Series also implements the Access protocol. You can slice and dice in many ways:

s = 1..10 |> Enum.to_list() |> Series.from_list()
#Explorer.Series<
  Polars[10]
  integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>
s[1]
2
s[-1]
10
s[0..4]
#Explorer.Series<
  Polars[5]
  integer [1, 2, 3, 4, 5]
>
s[[0, 4, 4]]
#Explorer.Series<
  Polars[3]
  integer [1, 5, 5]
>

And of course, you can convert back to an Elixir list.

Series.to_list(s)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Explorer comparisons return boolean series. We will talk more about boolean series later.

s = 1..11 |> Enum.to_list() |> Series.from_list()
#Explorer.Series<
  Polars[11]
  integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>
s1 = 11..1 |> Enum.to_list() |> Series.from_list()
#Explorer.Series<
  Polars[11]
  integer [11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
>
Series.equal(s, s1)
#Explorer.Series<
  Polars[11]
  boolean [false, false, false, false, false, true, false, false, false, false, false]
>
Series.equal(s, 5)
#Explorer.Series<
  Polars[11]
  boolean [false, false, false, false, true, false, false, false, false, false, false]
>
Series.not_equal(s, 10)
#Explorer.Series<
  Polars[11]
  boolean [true, true, true, true, true, true, true, true, true, false, true]
>
Series.greater_equal(s, 4)
#Explorer.Series<
  Polars[11]
  boolean [false, false, false, true, true, true, true, true, true, true, true]
>

Explorer supports arithmetic.

Series.add(s, s1)
#Explorer.Series<
  Polars[11]
  integer [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12]
>
Series.subtract(s, 4)
#Explorer.Series<
  Polars[11]
  integer [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
>
Series.multiply(s, s1)
#Explorer.Series<
  Polars[11]
  integer [11, 20, 27, 32, 35, 36, 35, 32, 27, 20, 11]
>

Remember those helpful errors? We've tried to add those throughout. So if you try to do arithmetic with mismatching dtypes:

s = Series.from_list([1, 2, 3])
s1 = Series.from_list([1.0, 2.0, 3.0])
Series.add(s, s1)
#Explorer.Series<
  Polars[3]
  f64 [2.0, 4.0, 6.0]
>

Just kidding! When integers and floats are mixed, the integers are cast to floats. Let's try again:

s = Series.from_list([1, 2, 3])
s1 = Series.from_list(["a", "b", "c"])
Series.add(s, s1)

You can flip them around.

s = Series.from_list([1, 2, 3, 4])
Series.reverse(s)
#Explorer.Series<
  Polars[4]
  integer [4, 3, 2, 1]
>

And sort.

1..100 |> Enum.to_list() |> Enum.shuffle() |> Series.from_list() |> Series.sort()
#Explorer.Series<
  Polars[100]
  integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
   25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
   49, 50, ...]
>

Or argsort.

s = 1..100 |> Enum.to_list() |> Enum.shuffle() |> Series.from_list()
ids = Series.argsort(s) |> Series.to_list()
[14, 30, 15, 16, 76, 44, 38, 0, 1, 20, 47, 8, 31, 55, 32, 49, 39, 24, 19, 50, 88, 57, 40, 75, 68,
 18, 46, 37, 36, 5, 77, 48, 97, 98, 79, 42, 83, 73, 82, 96, 60, 2, 66, 90, 13, 43, 74, 99, 17, 12,
 ...]

Which you can pass to Explorer.Series.slice/2 if you want the sorted values.

Series.slice(s, ids)
#Explorer.Series<
  Polars[100]
  integer [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
   25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
   49, 50, ...]
>

You can calculate cumulative values.

s = 1..100 |> Enum.to_list() |> Series.from_list()
Series.cumulative_sum(s)
#Explorer.Series<
  Polars[100]
  integer [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153, 171, 190, 210, 231,
   253, 276, 300, 325, 351, 378, 406, 435, 465, 496, 528, 561, 595, 630, 666, 703, 741, 780, 820,
   861, 903, 946, 990, 1035, 1081, 1128, 1176, 1225, 1275, ...]
>

Or rolling ones.

Series.window_sum(s, 4)
#Explorer.Series<
  Polars[100]
  integer [1, 3, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62, 66, 70, 74, 78, 82, 86,
   90, 94, 98, 102, 106, 110, 114, 118, 122, 126, 130, 134, 138, 142, 146, 150, 154, 158, 162, 166,
   170, 174, 178, 182, 186, 190, 194, ...]
>
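
The rolling output above is easier to follow on a short series. The sketch below uses a window of 2, so each value is the sum (or mean) of the current element and the one before it; the first element has no predecessor, so its window contains only itself, which matches how the first values in the output above are partial sums.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
alias Explorer.Series

short = Series.from_list([1, 2, 3, 4, 5])

# Window of size 2: [1], [1, 2], [2, 3], [3, 4], [4, 5]
rolling_sum = short |> Series.window_sum(2) |> Series.to_list()
# [1, 3, 5, 7, 9]

rolling_mean = short |> Series.window_mean(2) |> Series.to_list()
# [1.0, 1.5, 2.5, 3.5, 4.5]
```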

You can count and list unique values.

s = Series.from_list(["a", "b", "b", "c", "c", "c"])
Series.distinct(s)
#Explorer.Series<
  Polars[3]
  string ["a", "b", "c"]
>
Series.n_distinct(s)
3

And you can even get a dataframe showing the frequencies for each distinct value.

Series.frequencies(s)
#Explorer.DataFrame<
  Polars[3 x 2]
  values string ["c", "b", "a"]
  counts integer [3, 2, 1]
>

Back to those boolean series returned by comparison functions like equal and not_equal.

These boolean series can be combined with other functions to perform conditional operations.

s1 = Series.from_list(["It", "was", "the", "best", "of", "times"])
s1 |> Series.equal("best") |> Series.select("worst", s1)
#Explorer.Series<
  Polars[6]
  string ["It", "was", "the", "worst", "of", "times"]
>
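
Boolean series can also be combined with each other and used as masks. As a small sketch, the cell below builds an "in range" condition from two comparisons, keeps only the matching values with Series.mask/2, and then uses the same condition with Series.select/3 in the same way as the example above.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
alias Explorer.Series

numbers = Series.from_list([1, 2, 3, 4, 5, 6])

# true where 2 < value <= 5
in_range = Series.and(Series.greater(numbers, 2), Series.less_equal(numbers, 5))

# Keep only the values where the mask is true.
kept = numbers |> Series.mask(in_range) |> Series.to_list()
# [3, 4, 5]

# Replace the values where the mask is true with 0, keep the rest.
zeroed = in_range |> Series.select(0, numbers) |> Series.to_list()
# [1, 2, 0, 0, 0, 6]
```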

Working with DataFrames

A DataFrame is really just a collection of Series of the same size, which is why you can create a DataFrame from a keyword list.

DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
#Explorer.DataFrame<
  Polars[3 x 2]
  a integer [1, 2, 3]
  b string ["a", "b", "c"]
>
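
To make the "collection of Series" idea concrete, here is a small sketch that builds a dataframe from a map of existing Series and then gets the columns back out, both with Access syntax and with DataFrame.pull/2. The column names are made up for illustration.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
alias Explorer.{DataFrame, Series}

ids = Series.from_list([1, 2, 3])
labels = Series.from_list(["a", "b", "c"])

# A map with string keys works as well as a keyword list.
from_series = DataFrame.new(%{"id" => ids, "label" => labels})

# Each column comes back as a Series.
label_values = from_series["label"] |> Series.to_list()
# ["a", "b", "c"]

id_values = from_series |> DataFrame.pull("id") |> Series.to_list()
# [1, 2, 3]
```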

Similarly to Series, the Inspect implementation prints some info at the top and to the left. At the top we see the shape of the dataframe (rows and columns) and then for each column we see the name, dtype, and first five values. We can see a bit more from that built-in dataset we loaded in earlier.

df
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

You will also see grouping information there, but we'll get to that later. You can get the info yourself directly:

DataFrame.names(df)
["year", "country", "total", "solid_fuel", "liquid_fuel", "gas_fuel", "cement", "gas_flaring",
 "per_capita", "bunker_fuels"]
DataFrame.dtypes(df)
%{
  "bunker_fuels" => :integer,
  "cement" => :integer,
  "country" => :string,
  "gas_flaring" => :integer,
  "gas_fuel" => :integer,
  "liquid_fuel" => :integer,
  "per_capita" => {:f, 64},
  "solid_fuel" => :integer,
  "total" => :integer,
  "year" => :integer
}
DataFrame.shape(df)
{1094, 10}
{DataFrame.n_rows(df), DataFrame.n_columns(df)}
{1094, 10}

We can grab the head.

DataFrame.head(df)
#Explorer.DataFrame<
  Polars[5 x 10]
  year integer [2010, 2010, 2010, 2010, 2010]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA"]
  total integer [2308, 1254, 32500, 141, 7924]
  solid_fuel integer [627, 117, 332, 0, 0]
  liquid_fuel integer [1601, 953, 12381, 141, 3649]
  gas_fuel integer [74, 7, 14565, 0, 374]
  cement integer [5, 177, 2598, 0, 204]
  gas_flaring integer [0, 0, 2623, 0, 3697]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37]
  bunker_fuels integer [9, 7, 663, 0, 321]
>

Or the tail. Let's get a few more values from the tail.

DataFrame.tail(df, 10)
#Explorer.DataFrame<
  Polars[10 x 10]
  year integer [2014, 2014, 2014, 2014, 2014, ...]
  country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA", ...]
  total integer [1432855, 1840, 28692, 42, 50510, ...]
  solid_fuel integer [450047, 2, 1677, 0, 204, ...]
  liquid_fuel integer [576531, 1700, 2086, 42, 28445, ...]
  gas_fuel integer [390719, 25, 23929, 0, 12731, ...]
  cement integer [11314, 112, 1000, 0, 1088, ...]
  gas_flaring integer [4244, 0, 0, 0, 8042, ...]
  per_capita f64 [4.43, 0.54, 0.97, 0.16, 1.65, ...]
  bunker_fuels integer [30722, 251, 0, 10, 1256, ...]
>

Verbs and macros

In Explorer, like in dplyr, we have five main verbs to work with dataframes:

  • select
  • filter
  • mutate
  • arrange
  • summarise

We are going to explore them in this notebook, but first we need to "require" the Explorer.DataFrame module in order to load the macros needed for these verbs.

I want to take the opportunity to create a shorter alias for the DataFrame module, called DF:

require DataFrame, as: DF
Explorer.DataFrame

From now on we are using the shorter version, DF, to refer to the required Explorer.DataFrame module.

Select

Let's jump right into it. We can select columns pretty simply.

DF.select(df, ["year", "country"])
#Explorer.DataFrame<
  Polars[1094 x 2]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
>

But Elixir gives us some superpowers. In R there's tidy-select. I don't think we need that in Elixir. Anywhere in Explorer where you need to pass a list of column names, you can also execute a filtering callback on the column names. It's just an anonymous function passed to df |> DataFrame.names() |> Enum.filter(callback_here).

DF.select(df, &String.ends_with?(&1, "fuel"))
#Explorer.DataFrame<
  Polars[1094 x 3]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
>
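
The callback form can be read as shorthand for filtering the column names yourself. The following sketch spells that out and checks that both forms select the same columns; the variable names are only for illustration.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
require Explorer.DataFrame, as: DF

fossil = Explorer.Datasets.fossil_fuels()

# Filter the column names by hand...
fuel_columns = fossil |> DF.names() |> Enum.filter(&String.ends_with?(&1, "fuel"))
# ["solid_fuel", "liquid_fuel", "gas_fuel"]

# ...and compare with passing the callback directly.
by_list = DF.select(fossil, fuel_columns)
by_callback = DF.select(fossil, &String.ends_with?(&1, "fuel"))

same_columns? = DF.names(by_list) == DF.names(by_callback)
```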

Want all but some columns? discard/2 performs the opposite of select/2.

DF.discard(df, &String.ends_with?(&1, "fuel"))
#Explorer.DataFrame<
  Polars[1094 x 7]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

Filter

The next verb we'll look at is filter.

This is implemented using a macro, so it's possible to use expressions like you would if comparing variables in Elixir:

DF.filter(df, country == "BRAZIL")
#Explorer.DataFrame<
  Polars[5 x 10]
  year integer [2010, 2011, 2012, 2013, 2014]
  country string ["BRAZIL", "BRAZIL", "BRAZIL", "BRAZIL", "BRAZIL"]
  total integer [114468, 119829, 128178, 137354, 144480]
  solid_fuel integer [15965, 17498, 17165, 18773, 20089]
  liquid_fuel integer [74689, 78849, 84409, 88898, 92454]
  gas_fuel integer [14372, 13778, 16328, 19399, 21297]
  cement integer [8040, 8717, 9428, 9517, 9691]
  gas_flaring integer [1402, 987, 848, 767, 949]
  per_capita f64 [0.58, 0.6, 0.63, 0.67, 0.7]
  bunker_fuels integer [5101, 5516, 5168, 4895, 4895]
>

Using complex filters is also possible:

DF.filter(df, country == "ALGERIA" and year > 2012)
#Explorer.DataFrame<
  Polars[2 x 10]
  year integer [2013, 2014]
  country string ["ALGERIA", "ALGERIA"]
  total integer [36669, 39651]
  solid_fuel integer [198, 149]
  liquid_fuel integer [14170, 14422]
  gas_fuel integer [17863, 20151]
  cement integer [2516, 2856]
  gas_flaring integer [1922, 2073]
  per_capita f64 [0.96, 1.02]
  bunker_fuels integer [687, 581]
>

You can also write the same filter without the macro by using the callback version of the function, filter_with/2:

DF.filter_with(df, fn ldf ->
  ldf["country"]
  |> Series.equal("ALGERIA")
  |> Series.and(Series.greater(ldf["year"], 2012))
end)
#Explorer.DataFrame<
  Polars[2 x 10]
  year integer [2013, 2014]
  country string ["ALGERIA", "ALGERIA"]
  total integer [36669, 39651]
  solid_fuel integer [198, 149]
  liquid_fuel integer [14170, 14422]
  gas_fuel integer [17863, 20151]
  cement integer [2516, 2856]
  gas_flaring integer [1922, 2073]
  per_capita f64 [0.96, 1.02]
  bunker_fuels integer [687, 581]
>

By the way, all the Explorer.DataFrame macros have a corresponding function that accepts a callback. In fact, our macros are implemented using those functions.

The filter_with/2 function uses a virtual representation of the dataframe that we call a "lazy frame". With lazy frames you can't access the series contents, but every operation will be optimized and run only once.
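
One situation where the callback version can be convenient is when the filter values come from ordinary Elixir variables, since the callback is a plain closure. As a sketch, the hypothetical helper below wraps filter_with/2 in an anonymous function that takes the country and year as arguments.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
require Explorer.DataFrame, as: DF
alias Explorer.Series

# Hypothetical helper: rows for `country` with a year greater than `year`.
emissions_since = fn df, country, year ->
  DF.filter_with(df, fn ldf ->
    ldf["country"]
    |> Series.equal(country)
    |> Series.and(Series.greater(ldf["year"], year))
  end)
end

fossil = Explorer.Datasets.fossil_fuels()

# Same filter as the ALGERIA example above, so two rows are expected.
algeria_rows = fossil |> emissions_since.("ALGERIA", 2012) |> DF.n_rows()
```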

Remember those helpful error messages?

DF.filter(df, cuontry == "BRAZIL")

Mutate

A common task in data analysis is to add columns or change existing ones. Mutate is a handy verb.

DF.mutate(df, new_column: solid_fuel + cement)
#Explorer.DataFrame<
  Polars[1094 x 11]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
  new_column integer [632, 294, 2930, 0, 204, ...]
>

Did you catch that? You can pass in new columns as keyword arguments. It also works to transform existing columns.

DF.mutate(df,
  gas_fuel: Series.cast(gas_fuel, :float),
  gas_and_liquid_fuel: gas_fuel + liquid_fuel
)
#Explorer.DataFrame<
  Polars[1094 x 11]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel f64 [74.0, 7.0, 14565.0, 0.0, 374.0, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
  gas_and_liquid_fuel integer [1675, 960, 26946, 141, 4023, ...]
>

DataFrame.mutate/2 is flexible though. You may not always want to use keyword arguments. Given that column names are String.t(), it may make more sense to use a map.

DF.mutate(df, %{"gas_fuel" => gas_fuel - 10})
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [64, -3, 14555, -10, 364, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>
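
As with filter, mutate has a callback counterpart, mutate_with/2. The sketch below runs the same column addition both ways on a tiny made-up dataframe and checks that the results agree.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
require Explorer.DataFrame, as: DF
alias Explorer.Series

small = DF.new(a: [1, 2, 3], b: [10, 20, 30])

# Macro version: column names are used like variables.
with_macro = DF.mutate(small, total: a + b)

# Callback version: columns are accessed on the lazy frame.
with_callback =
  DF.mutate_with(small, fn ldf ->
    [total: Series.add(ldf["a"], ldf["b"])]
  end)

totals = with_callback["total"] |> Series.to_list()
# [11, 22, 33]
```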

DF.transmute/2, which is DF.mutate/2 that only retains the specified columns, is forthcoming.

Arrange

Sorting the dataframe is pretty straightforward.

DF.arrange(df, year)
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

But it comes with some tricks up its sleeve.

DF.arrange(df, asc: total, desc: year)
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2013, 2012, 2011, 2011, ...]
  country string ["NIUE", "NIUE", "NIUE", "NIUE", "TUVALU", ...]
  total integer [1, 2, 2, 2, 2, ...]
  solid_fuel integer [0, 0, 0, 0, 0, ...]
  liquid_fuel integer [1, 2, 2, 2, 2, ...]
  gas_fuel integer [0, 0, 0, 0, 0, ...]
  cement integer [0, 0, 0, 0, 0, ...]
  gas_flaring integer [0, 0, 0, 0, 0, ...]
  per_capita f64 [0.52, 1.04, 1.04, 1.04, 0.0, ...]
  bunker_fuels integer [0, 0, 0, 0, 0, ...]
>

As the examples show, arrange/2 is a macro, and therefore you can use some functions to arrange your dataframe:

DF.arrange(df, asc: Series.window_sum(total, 2))
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2011, 2012, 2010, 2011, ...]
  country string ["FEDERATED STATES OF MICRONESIA", "FEDERATED STATES OF MICRONESIA",
   "FEDERATED STATES OF MICRONESIA", "TUVALU", "TUVALU", ...]
  total integer [31, 33, 37, 2, 2, ...]
  solid_fuel integer [0, 0, 0, 0, 0, ...]
  liquid_fuel integer [31, 33, 37, 2, 2, ...]
  gas_fuel integer [0, 0, 0, 0, 0, ...]
  cement integer [0, 0, 0, 0, 0, ...]
  gas_flaring integer [0, 0, 0, 0, 0, ...]
  per_capita f64 [0.3, 0.32, 0.36, 0.0, 0.0, ...]
  bunker_fuels integer [1, 1, 1, 0, 0, ...]
>

Sort operations happen left to right. And keyword list args permit specifying the direction.
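
The left-to-right rule is easiest to see on a tiny dataframe. In this sketch (with made-up data), swapping the order of the two sort keys changes the result, because the first key decides the order and the second only breaks ties.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
require Explorer.DataFrame, as: DF

scores = DF.new(team: ["b", "a", "a"], points: [3, 1, 2])

# Sort by team first, then by points (descending) within each team.
team_first = scores |> DF.arrange(asc: team, desc: points) |> DF.to_columns()
# %{"points" => [2, 1, 3], "team" => ["a", "a", "b"]}

# Sort by points first; team would only break ties between equal points.
points_first = scores |> DF.arrange(desc: points, asc: team) |> DF.to_columns()
# %{"points" => [3, 2, 1], "team" => ["b", "a", "a"]}
```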

Distinct

This works as you would expect, too. Very straightforward.

DF.distinct(df, ["year", "country"])
#Explorer.DataFrame<
  Polars[1094 x 2]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
>

You can specify whether to keep the other columns as well, so the first row of each distinct value is kept:

DF.distinct(df, ["country"], keep_all: true)
#Explorer.DataFrame<
  Polars[222 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

Rename

Rename can take either a list of new names or a callback that is passed to Enum.map/2 against the names. You can also use a map or keyword args to rename specific columns.

DF.rename(df, year: "year_test")
#Explorer.DataFrame<
  Polars[1094 x 10]
  year_test integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>
DF.rename_with(df, &(&1 <> "_test"))
#Explorer.DataFrame<
  Polars[1094 x 10]
  year_test integer [2010, 2010, 2010, 2010, 2010, ...]
  country_test string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total_test integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel_test integer [627, 117, 332, 0, 0, ...]
  liquid_fuel_test integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel_test integer [74, 7, 14565, 0, 374, ...]
  cement_test integer [5, 177, 2598, 0, 204, ...]
  gas_flaring_test integer [0, 0, 2623, 0, 3697, ...]
  per_capita_test f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels_test integer [9, 7, 663, 0, 321, ...]
>
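
The other forms mentioned above, a full list of new names and a map of specific columns, can be sketched on a small made-up dataframe like this:

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
require Explorer.DataFrame, as: DF

small = DF.new(a: [1, 2], b: ["x", "y"])

# A list renames every column, in order.
renamed_all = small |> DF.rename(["first", "second"]) |> DF.names()
# ["first", "second"]

# A map renames only the columns it mentions.
renamed_one = small |> DF.rename(%{"a" => "id"}) |> DF.names()
# ["id", "b"]

# A callback is applied to every column name.
upcased = small |> DF.rename_with(&String.upcase/1) |> DF.names()
# ["A", "B"]
```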

Dummies

This is fun! We can get dummy variables for unique values.

DF.dummies(df, ["year"])
#Explorer.DataFrame<
  Polars[1094 x 5]
  year_2010 integer [1, 1, 1, 1, 1, ...]
  year_2011 integer [0, 0, 0, 0, 0, ...]
  year_2012 integer [0, 0, 0, 0, 0, ...]
  year_2013 integer [0, 0, 0, 0, 0, ...]
  year_2014 integer [0, 0, 0, 0, 0, ...]
>
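
On a small made-up dataframe it is easier to see what is going on: each distinct value becomes its own column, named after the original column and the value, holding 1 where the row has that value and 0 otherwise. This is a sketch assuming the same behaviour as the output above.

```elixir
# Assumes Explorer is already installed via the Mix.install cell above.
require Explorer.DataFrame, as: DF

shirts = DF.new(size: ["s", "m", "s"])

dummy_columns = shirts |> DF.dummies(["size"]) |> DF.to_columns()
# %{"size_m" => [0, 1, 0], "size_s" => [1, 0, 1]}
```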
DF.dummies(df, ["country"])
#Explorer.DataFrame<
  Polars[1094 x 222]
  country_AFGHANISTAN integer [1, 0, 0, 0, 0, ...]
  country_ALBANIA integer [0, 1, 0, 0, 0, ...]
  country_ALGERIA integer [0, 0, 1, 0, 0, ...]
  country_ANDORRA integer [0, 0, 0, 1, 0, ...]
  country_ANGOLA integer [0, 0, 0, 0, 1, ...]
  country_ANGUILLA integer [0, 0, 0, 0, 0, ...]
  country_ANTIGUA & BARBUDA integer [0, 0, 0, 0, 0, ...]
  country_ARGENTINA integer [0, 0, 0, 0, 0, ...]
  country_ARMENIA integer [0, 0, 0, 0, 0, ...]
  country_ARUBA integer [0, 0, 0, 0, 0, ...]
  country_AUSTRALIA integer [0, 0, 0, 0, 0, ...]
  country_AUSTRIA integer [0, 0, 0, 0, 0, ...]
  country_AZERBAIJAN integer [0, 0, 0, 0, 0, ...]
  country_BAHAMAS integer [0, 0, 0, 0, 0, ...]
  country_BAHRAIN integer [0, 0, 0, 0, 0, ...]
  country_BANGLADESH integer [0, 0, 0, 0, 0, ...]
  country_BARBADOS integer [0, 0, 0, 0, 0, ...]
  country_BELARUS integer [0, 0, 0, 0, 0, ...]
  country_BELGIUM integer [0, 0, 0, 0, 0, ...]
  country_BELIZE integer [0, 0, 0, 0, 0, ...]
  country_BENIN integer [0, 0, 0, 0, 0, ...]
  country_BERMUDA integer [0, 0, 0, 0, 0, ...]
  country_BHUTAN integer [0, 0, 0, 0, 0, ...]
  country_BOSNIA & HERZEGOVINA integer [0, 0, 0, 0, 0, ...]
  country_BOTSWANA integer [0, 0, 0, 0, 0, ...]
  country_BRAZIL integer [0, 0, 0, 0, 0, ...]
  country_BRITISH VIRGIN ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_BRUNEI (DARUSSALAM) integer [0, 0, 0, 0, 0, ...]
  country_BULGARIA integer [0, 0, 0, 0, 0, ...]
  country_BURKINA FASO integer [0, 0, 0, 0, 0, ...]
  country_BURUNDI integer [0, 0, 0, 0, 0, ...]
  country_CAMBODIA integer [0, 0, 0, 0, 0, ...]
  country_CANADA integer [0, 0, 0, 0, 0, ...]
  country_CAPE VERDE integer [0, 0, 0, 0, 0, ...]
  country_CAYMAN ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_CENTRAL AFRICAN REPUBLIC integer [0, 0, 0, 0, 0, ...]
  country_CHAD integer [0, 0, 0, 0, 0, ...]
  country_CHILE integer [0, 0, 0, 0, 0, ...]
  country_CHINA (MAINLAND) integer [0, 0, 0, 0, 0, ...]
  country_COLOMBIA integer [0, 0, 0, 0, 0, ...]
  country_COMOROS integer [0, 0, 0, 0, 0, ...]
  country_CONGO integer [0, 0, 0, 0, 0, ...]
  country_COOK ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_COSTA RICA integer [0, 0, 0, 0, 0, ...]
  country_COTE D IVOIRE integer [0, 0, 0, 0, 0, ...]
  country_CROATIA integer [0, 0, 0, 0, 0, ...]
  country_CUBA integer [0, 0, 0, 0, 0, ...]
  country_CYPRUS integer [0, 0, 0, 0, 0, ...]
  country_CZECH REPUBLIC integer [0, 0, 0, 0, 0, ...]
  country_DEMOCRATIC PEOPLE S REPUBLIC OF KOREA integer [0, 0, 0, 0, 0, ...]
  country_DEMOCRATIC REPUBLIC OF THE CONGO (FORMERLY ZAIRE) integer [0, 0, 0, 0, 0, ...]
  country_DENMARK integer [0, 0, 0, 0, 0, ...]
  country_DJIBOUTI integer [0, 0, 0, 0, 0, ...]
  country_DOMINICA integer [0, 0, 0, 0, 0, ...]
  country_DOMINICAN REPUBLIC integer [0, 0, 0, 0, 0, ...]
  country_ECUADOR integer [0, 0, 0, 0, 0, ...]
  country_EGYPT integer [0, 0, 0, 0, 0, ...]
  country_EL SALVADOR integer [0, 0, 0, 0, 0, ...]
  country_EQUATORIAL GUINEA integer [0, 0, 0, 0, 0, ...]
  country_ERITREA integer [0, 0, 0, 0, 0, ...]
  country_ESTONIA integer [0, 0, 0, 0, 0, ...]
  country_ETHIOPIA integer [0, 0, 0, 0, 0, ...]
  country_FAEROE ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_FALKLAND ISLANDS (MALVINAS) integer [0, 0, 0, 0, 0, ...]
  country_FEDERATED STATES OF MICRONESIA integer [0, 0, 0, 0, 0, ...]
  country_FIJI integer [0, 0, 0, 0, 0, ...]
  country_FINLAND integer [0, 0, 0, 0, 0, ...]
  country_FRANCE (INCLUDING MONACO) integer [0, 0, 0, 0, 0, ...]
  country_FRENCH GUIANA integer [0, 0, 0, 0, 0, ...]
  country_FRENCH POLYNESIA integer [0, 0, 0, 0, 0, ...]
  country_GABON integer [0, 0, 0, 0, 0, ...]
  country_GAMBIA integer [0, 0, 0, 0, 0, ...]
  country_GEORGIA integer [0, 0, 0, 0, 0, ...]
  country_GERMANY integer [0, 0, 0, 0, 0, ...]
  country_GHANA integer [0, 0, 0, 0, 0, ...]
  country_GIBRALTAR integer [0, 0, 0, 0, 0, ...]
  country_GREECE integer [0, 0, 0, 0, 0, ...]
  country_GREENLAND integer [0, 0, 0, 0, 0, ...]
  country_GRENADA integer [0, 0, 0, 0, 0, ...]
  country_GUADELOUPE integer [0, 0, 0, 0, 0, ...]
  country_GUATEMALA integer [0, 0, 0, 0, 0, ...]
  country_GUINEA integer [0, 0, 0, 0, 0, ...]
  country_GUINEA BISSAU integer [0, 0, 0, 0, 0, ...]
  country_GUYANA integer [0, 0, 0, 0, 0, ...]
  country_HAITI integer [0, 0, 0, 0, 0, ...]
  country_HONDURAS integer [0, 0, 0, 0, 0, ...]
  country_HONG KONG SPECIAL ADMINSTRATIVE REGION OF CHINA integer [0, 0, 0, 0, 0, ...]
  country_HUNGARY integer [0, 0, 0, 0, 0, ...]
  country_ICELAND integer [0, 0, 0, 0, 0, ...]
  country_INDIA integer [0, 0, 0, 0, 0, ...]
  country_INDONESIA integer [0, 0, 0, 0, 0, ...]
  country_IRAQ integer [0, 0, 0, 0, 0, ...]
  country_IRELAND integer [0, 0, 0, 0, 0, ...]
  country_ISLAMIC REPUBLIC OF IRAN integer [0, 0, 0, 0, 0, ...]
  country_ISRAEL integer [0, 0, 0, 0, 0, ...]
  country_ITALY (INCLUDING SAN MARINO) integer [0, 0, 0, 0, 0, ...]
  country_JAMAICA integer [0, 0, 0, 0, 0, ...]
  country_JAPAN integer [0, 0, 0, 0, 0, ...]
  country_JORDAN integer [0, 0, 0, 0, 0, ...]
  country_KAZAKHSTAN integer [0, 0, 0, 0, 0, ...]
  country_KENYA integer [0, 0, 0, 0, 0, ...]
  country_KIRIBATI integer [0, 0, 0, 0, 0, ...]
  country_KUWAIT integer [0, 0, 0, 0, 0, ...]
  country_KYRGYZSTAN integer [0, 0, 0, 0, 0, ...]
  country_LAO PEOPLE S DEMOCRATIC REPUBLIC integer [0, 0, 0, 0, 0, ...]
  country_LATVIA integer [0, 0, 0, 0, 0, ...]
  country_LEBANON integer [0, 0, 0, 0, 0, ...]
  country_LESOTHO integer [0, 0, 0, 0, 0, ...]
  country_LIBERIA integer [0, 0, 0, 0, 0, ...]
  country_LIBYAN ARAB JAMAHIRIYAH integer [0, 0, 0, 0, 0, ...]
  country_LIECHTENSTEIN integer [0, 0, 0, 0, 0, ...]
  country_LITHUANIA integer [0, 0, 0, 0, 0, ...]
  country_LUXEMBOURG integer [0, 0, 0, 0, 0, ...]
  country_MACAU SPECIAL ADMINSTRATIVE REGION OF CHINA integer [0, 0, 0, 0, 0, ...]
  country_MACEDONIA integer [0, 0, 0, 0, 0, ...]
  country_MADAGASCAR integer [0, 0, 0, 0, 0, ...]
  country_MALAWI integer [0, 0, 0, 0, 0, ...]
  country_MALAYSIA integer [0, 0, 0, 0, 0, ...]
  country_MALDIVES integer [0, 0, 0, 0, 0, ...]
  country_MALI integer [0, 0, 0, 0, 0, ...]
  country_MALTA integer [0, 0, 0, 0, 0, ...]
  country_MARSHALL ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_MARTINIQUE integer [0, 0, 0, 0, 0, ...]
  country_MAURITANIA integer [0, 0, 0, 0, 0, ...]
  country_MAURITIUS integer [0, 0, 0, 0, 0, ...]
  country_MEXICO integer [0, 0, 0, 0, 0, ...]
  country_MONGOLIA integer [0, 0, 0, 0, 0, ...]
  country_MONTENEGRO integer [0, 0, 0, 0, 0, ...]
  country_MONTSERRAT integer [0, 0, 0, 0, 0, ...]
  country_MOROCCO integer [0, 0, 0, 0, 0, ...]
  country_MOZAMBIQUE integer [0, 0, 0, 0, 0, ...]
  country_MYANMAR (FORMERLY BURMA) integer [0, 0, 0, 0, 0, ...]
  country_NAMIBIA integer [0, 0, 0, 0, 0, ...]
  country_NAURU integer [0, 0, 0, 0, 0, ...]
  country_NEPAL integer [0, 0, 0, 0, 0, ...]
  country_NETHERLAND ANTILLES integer [0, 0, 0, 0, 0, ...]
  country_NETHERLANDS integer [0, 0, 0, 0, 0, ...]
  country_NEW CALEDONIA integer [0, 0, 0, 0, 0, ...]
  country_NEW ZEALAND integer [0, 0, 0, 0, 0, ...]
  country_NICARAGUA integer [0, 0, 0, 0, 0, ...]
  country_NIGER integer [0, 0, 0, 0, 0, ...]
  country_NIGERIA integer [0, 0, 0, 0, 0, ...]
  country_NIUE integer [0, 0, 0, 0, 0, ...]
  country_NORWAY integer [0, 0, 0, 0, 0, ...]
  country_OCCUPIED PALESTINIAN TERRITORY integer [0, 0, 0, 0, 0, ...]
  country_OMAN integer [0, 0, 0, 0, 0, ...]
  country_PAKISTAN integer [0, 0, 0, 0, 0, ...]
  country_PALAU integer [0, 0, 0, 0, 0, ...]
  country_PANAMA integer [0, 0, 0, 0, 0, ...]
  country_PAPUA NEW GUINEA integer [0, 0, 0, 0, 0, ...]
  country_PARAGUAY integer [0, 0, 0, 0, 0, ...]
  country_PERU integer [0, 0, 0, 0, 0, ...]
  country_PHILIPPINES integer [0, 0, 0, 0, 0, ...]
  country_PLURINATIONAL STATE OF BOLIVIA integer [0, 0, 0, 0, 0, ...]
  country_POLAND integer [0, 0, 0, 0, 0, ...]
  country_PORTUGAL integer [0, 0, 0, 0, 0, ...]
  country_QATAR integer [0, 0, 0, 0, 0, ...]
  country_REPUBLIC OF CAMEROON integer [0, 0, 0, 0, 0, ...]
  country_REPUBLIC OF KOREA integer [0, 0, 0, 0, 0, ...]
  country_REPUBLIC OF MOLDOVA integer [0, 0, 0, 0, 0, ...]
  country_REUNION integer [0, 0, 0, 0, 0, ...]
  country_ROMANIA integer [0, 0, 0, 0, 0, ...]
  country_RUSSIAN FEDERATION integer [0, 0, 0, 0, 0, ...]
  country_RWANDA integer [0, 0, 0, 0, 0, ...]
  country_SAINT HELENA integer [0, 0, 0, 0, 0, ...]
  country_SAINT LUCIA integer [0, 0, 0, 0, 0, ...]
  country_SAMOA integer [0, 0, 0, 0, 0, ...]
  country_SAO TOME & PRINCIPE integer [0, 0, 0, 0, 0, ...]
  country_SAUDI ARABIA integer [0, 0, 0, 0, 0, ...]
  country_SENEGAL integer [0, 0, 0, 0, 0, ...]
  country_SERBIA integer [0, 0, 0, 0, 0, ...]
  country_SEYCHELLES integer [0, 0, 0, 0, 0, ...]
  country_SIERRA LEONE integer [0, 0, 0, 0, 0, ...]
  country_SINGAPORE integer [0, 0, 0, 0, 0, ...]
  country_SLOVAKIA integer [0, 0, 0, 0, 0, ...]
  country_SLOVENIA integer [0, 0, 0, 0, 0, ...]
  country_SOLOMON ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_SOMALIA integer [0, 0, 0, 0, 0, ...]
  country_SOUTH AFRICA integer [0, 0, 0, 0, 0, ...]
  country_SPAIN integer [0, 0, 0, 0, 0, ...]
  country_SRI LANKA integer [0, 0, 0, 0, 0, ...]
  country_ST. KITTS-NEVIS integer [0, 0, 0, 0, 0, ...]
  country_ST. PIERRE & MIQUELON integer [0, 0, 0, 0, 0, ...]
  country_ST. VINCENT & THE GRENADINES integer [0, 0, 0, 0, 0, ...]
  country_SUDAN integer [0, 0, 0, 0, 0, ...]
  country_SURINAME integer [0, 0, 0, 0, 0, ...]
  country_SWAZILAND integer [0, 0, 0, 0, 0, ...]
  country_SWEDEN integer [0, 0, 0, 0, 0, ...]
  country_SWITZERLAND integer [0, 0, 0, 0, 0, ...]
  country_SYRIAN ARAB REPUBLIC integer [0, 0, 0, 0, 0, ...]
  country_TAIWAN integer [0, 0, 0, 0, 0, ...]
  country_TAJIKISTAN integer [0, 0, 0, 0, 0, ...]
  country_THAILAND integer [0, 0, 0, 0, 0, ...]
  country_TIMOR-LESTE (FORMERLY EAST TIMOR) integer [0, 0, 0, 0, 0, ...]
  country_TOGO integer [0, 0, 0, 0, 0, ...]
  country_TONGA integer [0, 0, 0, 0, 0, ...]
  country_TRINIDAD AND TOBAGO integer [0, 0, 0, 0, 0, ...]
  country_TUNISIA integer [0, 0, 0, 0, 0, ...]
  country_TURKEY integer [0, 0, 0, 0, 0, ...]
  country_TURKMENISTAN integer [0, 0, 0, 0, 0, ...]
  country_TURKS AND CAICOS ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_TUVALU integer [0, 0, 0, 0, 0, ...]
  country_UGANDA integer [0, 0, 0, 0, 0, ...]
  country_UKRAINE integer [0, 0, 0, 0, 0, ...]
  country_UNITED ARAB EMIRATES integer [0, 0, 0, 0, 0, ...]
  country_UNITED KINGDOM integer [0, 0, 0, 0, 0, ...]
  country_UNITED REPUBLIC OF TANZANIA integer [0, 0, 0, 0, 0, ...]
  country_UNITED STATES OF AMERICA integer [0, 0, 0, 0, 0, ...]
  country_URUGUAY integer [0, 0, 0, 0, 0, ...]
  country_UZBEKISTAN integer [0, 0, 0, 0, 0, ...]
  country_VANUATU integer [0, 0, 0, 0, 0, ...]
  country_VENEZUELA integer [0, 0, 0, 0, 0, ...]
  country_VIET NAM integer [0, 0, 0, 0, 0, ...]
  country_WALLIS AND FUTUNA ISLANDS integer [0, 0, 0, 0, 0, ...]
  country_YEMEN integer [0, 0, 0, 0, 0, ...]
  country_ZAMBIA integer [0, 0, 0, 0, 0, ...]
  country_ZIMBABWE integer [0, 0, 0, 0, 0, ...]
  country_BONAIRE, SAINT EUSTATIUS, AND SABA integer [0, 0, 0, 0, 0, ...]
  country_CURACAO integer [0, 0, 0, 0, 0, ...]
  country_REPUBLIC OF SOUTH SUDAN integer [0, 0, 0, 0, 0, ...]
  country_REPUBLIC OF SUDAN integer [0, 0, 0, 0, 0, ...]
  country_SAINT MARTIN (DUTCH PORTION) integer [0, 0, 0, 0, 0, ...]
>

Sampling

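sample/3 draws random rows from a dataframe. Pass an integer to get an exact number of rows, or a float to get that fraction of the rows. Sampling can be done with or without replacement, and a seed can be given to make the result reproducible.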

DF.sample(df, 10)
#Explorer.DataFrame<
  Polars[10 x 10]
  year integer [2010, 2010, 2014, 2014, 2012, ...]
  country string ["REPUBLIC OF MOLDOVA", "SINGAPORE", "MONTSERRAT", "FRENCH POLYNESIA", "GREECE",
   ...]
  total integer [1345, 15174, 13, 219, 21828, ...]
  solid_fuel integer [89, 7, 0, 0, 8760, ...]
  liquid_fuel integer [593, 10661, 13, 219, 10099, ...]
  gas_fuel integer [541, 4506, 0, 0, 2287, ...]
  cement integer [122, 0, 0, 0, 681, ...]
  gas_flaring integer [0, 0, 0, 0, 0, ...]
  per_capita f64 [0.33, 2.99, 2.63, 0.78, 1.96, ...]
  bunker_fuels integer [11, 39109, 1, 45, 2545, ...]
>
DF.sample(df, 0.4)
#Explorer.DataFrame<
  Polars[437 x 10]
  year integer [2010, 2010, 2014, 2012, 2014, ...]
  country string ["AFGHANISTAN", "ALBANIA", "MEXICO", "MALAYSIA", "DOMINICAN REPUBLIC", ...]
  total integer [2308, 1254, 130971, 59642, 5874, ...]
  solid_fuel integer [627, 117, 13358, 16340, 765, ...]
  liquid_fuel integer [1601, 953, 73374, 20544, 3938, ...]
  gas_fuel integer [74, 7, 37801, 18092, 518, ...]
  cement integer [5, 177, 4760, 2955, 653, ...]
  gas_flaring integer [0, 0, 1678, 1712, 0, ...]
  per_capita f64 [0.08, 0.43, 1.04, 2.06, 0.56, ...]
  bunker_fuels integer [9, 7, 3300, 1509, 414, ...]
>
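
The seeding mentioned above could be checked with a sketch like the following. It assumes the notebook's setup cell has already installed Explorer; the names first and second are illustrative only.

```elixir
alias Explorer.DataFrame, as: DF

df = Explorer.Datasets.fossil_fuels()

# Two draws with the same seed are expected to select the same rows.
first = DF.sample(df, 10, seed: 100)
second = DF.sample(df, 10, seed: 100)

DF.n_rows(first)
DF.to_columns(first) == DF.to_columns(second)
```

Without the seed option, each call draws a fresh random sample, so the two comparisons would usually differ.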

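Trying for those helpful error messages again: asking for more rows than the dataframe contains raises an error, unless replace: true allows rows to repeat.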

DF.sample(df, 10000)
DF.sample(df, 10000, replace: true)
#Explorer.DataFrame<
  Polars[10000 x 10]
  year integer [2012, 2012, 2010, 2013, 2013, ...]
  country string ["DOMINICA", "TIMOR-LESTE (FORMERLY EAST TIMOR)",
   "LAO PEOPLE S DEMOCRATIC REPUBLIC", "GRENADA", "VANUATU", ...]
  total integer [37, 80, 447, 83, 29, ...]
  solid_fuel integer [0, 0, 181, 0, 0, ...]
  liquid_fuel integer [37, 80, 104, 83, 29, ...]
  gas_fuel integer [0, 0, 0, 0, 0, ...]
  cement integer [0, 0, 163, 0, 0, ...]
  gas_flaring integer [0, 0, 0, 0, 0, ...]
  per_capita f64 [0.51, 0.07, 0.07, 0.78, 0.12, ...]
  bunker_fuels integer [2, 3, 9, 3, 9, ...]
>
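
The two calls above can be contrasted in a single cell. This is a minimal sketch, assuming Explorer is installed by the setup cell; it rescues any exception rather than naming a specific exception module, and result and oversampled are illustrative names.

```elixir
alias Explorer.DataFrame, as: DF

df = Explorer.Datasets.fossil_fuels()

# The dataframe has fewer than 10_000 rows, so sampling without
# replacement is expected to raise.
result =
  try do
    DF.sample(df, 10_000)
  rescue
    error -> {:error, Exception.message(error)}
  end

# With replacement, rows may repeat, so a larger sample is allowed.
oversampled = DF.sample(df, 10_000, replace: true)
DF.n_rows(oversampled)
```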

Pull and slice

Columns and rows can be picked out with the Access protocol or with the explicit pull and slice functions.

df["year"]
#Explorer.Series<
  Polars[1094]
  integer [2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
   2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
   2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
   2010, 2010, 2010, ...]
>
DF.pull(df, "year")
#Explorer.Series<
  Polars[1094]
  integer [2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
   2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
   2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
   2010, 2010, 2010, ...]
>
df[["year", "country"]]
#Explorer.DataFrame<
  Polars[1094 x 2]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
>
DF.slice(df, [1, 20, 50])
#Explorer.DataFrame<
  Polars[3 x 10]
  year integer [2010, 2010, 2010]
  country string ["ALBANIA", "BENIN", "DEMOCRATIC REPUBLIC OF THE CONGO (FORMERLY ZAIRE)"]
  total integer [1254, 1388, 551]
  solid_fuel integer [117, 0, 0]
  liquid_fuel integer [953, 1211, 471]
  gas_fuel integer [7, 0, 12]
  cement integer [177, 177, 67]
  gas_flaring integer [0, 0, 0]
  per_capita f64 [0.43, 0.15, 0.01]
  bunker_fuels integer [7, 127, 126]
>
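
As a quick check that the Access syntax and the explicit functions agree, here is a small sketch. It assumes Explorer is installed by the setup cell, and the variable rows is illustrative.

```elixir
alias Explorer.DataFrame, as: DF
alias Explorer.Series

df = Explorer.Datasets.fossil_fuels()

# A single column name returns a series, whichever syntax is used.
Series.to_list(df["year"]) == Series.to_list(DF.pull(df, "year"))

# A list of names returns a dataframe with just those columns.
DF.names(df[["year", "country"]])

# A list of row indices picks exactly those rows, in the order given.
rows = DF.slice(df, [1, 20, 50])
Series.to_list(rows["country"])
```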

Negative offsets work for slice too: they count back from the end of the dataframe.

DF.slice(df, -10, 5)
#Explorer.DataFrame<
  Polars[5 x 10]
  year integer [2014, 2014, 2014, 2014, 2014]
  country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA"]
  total integer [1432855, 1840, 28692, 42, 50510]
  solid_fuel integer [450047, 2, 1677, 0, 204]
  liquid_fuel integer [576531, 1700, 2086, 42, 28445]
  gas_fuel integer [390719, 25, 23929, 0, 12731]
  cement integer [11314, 112, 1000, 0, 1088]
  gas_flaring integer [4244, 0, 0, 0, 8042]
  per_capita f64 [4.43, 0.54, 0.97, 0.16, 1.65]
  bunker_fuels integer [30722, 251, 0, 10, 1256]
>
DF.slice(df, 10, 5)
#Explorer.DataFrame<
  Polars[5 x 10]
  year integer [2010, 2010, 2010, 2010, 2010]
  country string ["AUSTRALIA", "AUSTRIA", "AZERBAIJAN", "BAHAMAS", "BAHRAIN"]
  total integer [106589, 18408, 8366, 451, 7981]
  solid_fuel integer [56257, 3537, 6, 1, 0]
  liquid_fuel integer [31308, 9218, 2373, 450, 1123]
  gas_fuel integer [17763, 5073, 4904, 0, 6696]
  cement integer [1129, 579, 174, 0, 163]
  gas_flaring integer [132, 0, 909, 0, 0]
  per_capita f64 [4.81, 2.19, 0.92, 1.25, 6.33]
  bunker_fuels integer [3307, 575, 398, 179, 545]
>

Slice also works with ranges, which include both endpoints:

DF.slice(df, 12..42)
#Explorer.DataFrame<
  Polars[31 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AZERBAIJAN", "BAHAMAS", "BAHRAIN", "BANGLADESH", "BARBADOS", ...]
  total integer [8366, 451, 7981, 16345, 403, ...]
  solid_fuel integer [6, 1, 0, 839, 0, ...]
  liquid_fuel integer [2373, 450, 1123, 2881, 363, ...]
  gas_fuel integer [4904, 0, 6696, 10753, 8, ...]
  cement integer [174, 0, 163, 1873, 31, ...]
  gas_flaring integer [909, 0, 0, 0, 1, ...]
  per_capita f64 [0.92, 1.25, 6.33, 0.11, 1.44, ...]
  bunker_fuels integer [398, 179, 545, 313, 108, ...]
>
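
The offset and range forms of slice can be related to each other as follows. This is only a sketch, assuming Explorer is installed by the setup cell; the variable names are illustrative.

```elixir
alias Explorer.DataFrame, as: DF

df = Explorer.Datasets.fossil_fuels()
n = DF.n_rows(df)

# An offset of -10 starts ten rows before the end.
from_end = DF.slice(df, -10, 5)
from_start = DF.slice(df, n - 10, 5)
DF.to_columns(from_end) == DF.to_columns(from_start)

# The range 12..42 includes both endpoints, so it covers 31 rows.
by_range = DF.slice(df, 12..42)
by_offset = DF.slice(df, 12, 31)
DF.to_columns(by_range) == DF.to_columns(by_offset)
```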

Pivot

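We can reshape dataframes with pivot_longer/3, which stacks several columns into name/value rows, and pivot_wider/4, which spreads the values of one column out into new columns. Both are inspired by tidyr.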

There are some shortcomings in pivot_wider/4 related to polars. The select option must select only columns of numeric type.

DF.pivot_longer(df, &String.ends_with?(&1, "fuel"), select: ["year", "country"])
DF.pivot_wider(df, "country", "total", id_columns: ["year"])
#Explorer.DataFrame<
  Polars[5 x 223]
  year integer [2010, 2011, 2012, 2013, 2014]
  AFGHANISTAN integer [2308, 3338, 2933, 2731, 2675]
  ALBANIA integer [1254, 1429, 1339, 1381, 1559]
  ALGERIA integer [32500, 33048, 35448, 36669, 39651]
  ANDORRA integer [141, 134, 133, 130, 126]
  ANGOLA integer [7924, 8274, 9108, 8895, 9480]
  ANGUILLA integer [41, 39, 39, 37, 39]
  ANTIGUA & BARBUDA integer [143, 140, 143, 143, 145]
  ARGENTINA integer [51246, 52259, 52456, 51773, 55638]
  ARMENIA integer [1150, 1341, 1553, 1499, 1508]
  ARUBA integer [684, 682, 368, 235, 238]
  AUSTRALIA integer [106589, 106850, 105843, 101518, 98517]
  AUSTRIA integer [18408, 17731, 16982, 17040, 16011]
  AZERBAIJAN integer [8366, 9121, 9696, 9720, 10223]
  BAHAMAS integer [451, 509, 537, 764, 659]
  BAHRAIN integer [7981, 7813, 7274, 8539, 8546]
  BANGLADESH integer [16345, 17293, 18409, 19010, 19959]
  BARBADOS integer [403, 417, 401, 395, 347]
  BELARUS integer [17192, 17470, 17241, 17390, 17316]
  BELGIUM integer [30222, 27255, 25936, 26444, 25457]
  BELIZE integer [147, 164, 130, 140, 135]
  BENIN integer [1388, 1444, 1492, 1585, 1723]
  BERMUDA integer [166, 121, 130, 125, 157]
  BHUTAN integer [133, 200, 223, 251, 273]
  BOSNIA & HERZEGOVINA integer [5802, 6514, 6070, 5978, 6063]
  BOTSWANA integer [1278, 1139, 1154, 1426, 1918]
  BRAZIL integer [114468, 119829, 128178, 137354, 144480]
  BRITISH VIRGIN ISLANDS integer [47, 48, 48, 48, 49]
  BRUNEI (DARUSSALAM) integer [2237, 2644, 2636, 2128, 2484]
  BULGARIA integer [12030, 13457, 12192, 10799, 11567]
  BURKINA FASO integer [535, 603, 717, 834, 777]
  BURUNDI integer [58, 66, 77, 79, 120]
  CAMBODIA integer [1367, 1420, 1488, 1528, 1823]
  CANADA integer [145806, 146472, 141112, 141031, 146494]
  CAPE VERDE integer [152, 168, 138, 136, 134]
  CAYMAN ISLANDS integer [152, 160, 146, 146, 148]
  CENTRAL AFRICAN REPUBLIC integer [72, 76, 80, 81, 82]
  CHAD integer [141, 147, 167, 191, 199]
  CHILE integer [19703, 21610, 22082, 22696, 22515]
  CHINA (MAINLAND) integer [2393248, 2654360, 2734817, 2797384, 2806634]
  COLOMBIA integer [20773, 20870, 21803, 24441, 22932]
  COMOROS integer [44, 37, 39, 48, 42]
  CONGO integer [540, 616, 810, 842, 844]
  COOK ISLANDS integer [19, 19, 19, 19, 19]
  COSTA RICA integer [2064, 2111, 2118, 2072, 2116]
  COTE D IVOIRE integer [1900, 1977, 2535, 2914, 3012]
  CROATIA integer [5501, 5402, 4907, 4786, 4593]
  CUBA integer [10465, 9814, 9860, 9490, 9500]
  CYPRUS integer [2102, 2025, 1887, 1622, 1653]
  CZECH REPUBLIC integer [30428, 29154, 27551, 26909, 26309]
  DEMOCRATIC PEOPLE S REPUBLIC OF KOREA integer [18122, 13099, 13378, 9820, 11052]
  DEMOCRATIC REPUBLIC OF THE CONGO (FORMERLY ZAIRE) integer [551, 680, 655, 979, 1274]
  DENMARK integer [12719, 11084, 9934, 10508, 9135]
  DJIBOUTI integer [141, 129, 141, 166, 197]
  DOMINICA integer [38, 35, 37, 36, 37]
  DOMINICAN REPUBLIC integer [5733, 5789, 6033, 5847, 5874]
  ECUADOR integer [9943, 10529, 10401, 11180, 11977]
  EGYPT integer [55281, 59221, 59195, 58198, 55057]
  EL SALVADOR integer [1761, 1813, 1817, 1699, 1714]
  EQUATORIAL GUINEA integer [1276, 1671, 1395, 1408, 1458]
  ERITREA integer [140, 162, 180, 182, 190]
  ESTONIA integer [4938, 5074, 4806, 5425, 5323]
  ETHIOPIA integer [1796, 2107, 2335, 2900, 3163]
  FAEROE ISLANDS integer [172, 155, 161, 185, 163]
  FALKLAND ISLANDS (MALVINAS) integer [15, 15, 15, 15, 15]
  FEDERATED STATES OF MICRONESIA integer [31, 33, 37, 39, 41]
  FIJI integer [333, 297, 289, 314, 319]
  FINLAND integer [16930, 15494, 13399, 12877, 12899]
  FRANCE (INCLUDING MONACO) integer [96273, 90484, 90872, 91109, 82704]
  FRENCH GUIANA integer [174, 175, 161, 172, 200]
  FRENCH POLYNESIA integer [234, 227, 222, 224, 219]
  GABON integer [1312, 1356, 1392, 1440, 1416]
  GAMBIA integer [118, 122, 124, 118, 140]
  GEORGIA integer [1722, 2174, 2302, 2143, 2451]
  GERMANY integer [206943, 199754, 201762, 206521, 196314]
  GHANA integer [2715, 2681, 3239, 3987, 3945]
  GIBRALTAR integer [127, 123, 126, 134, 144]
  GREECE integer [22868, 21773, 21828, 18948, 18358]
  GREENLAND integer [181, 193, 155, 151, 138]
  GRENADA integer [71, 69, 74, 83, 66]
  GUADELOUPE integer [627, 663, 694, 697, 700]
  GUATEMALA integer [3181, 3228, 3265, 3718, 4998]
  GUINEA integer [710, 758, 704, 627, 668]
  GUINEA BISSAU integer [65, 67, 69, 70, 74]
  GUYANA integer [469, 486, 544, 528, 548]
  HAITI integer [580, 605, 631, 656, 780]
  HONDURAS integer [2175, 2442, 2450, 2472, 2583]
  HONG KONG SPECIAL ADMINSTRATIVE REGION OF CHINA integer [11093, 11943, 11842, 12273, 12605]
  HUNGARY integer [13696, 13047, 12158, 11492, 11477]
  ICELAND integer [535, 513, 491, 518, 541]
  INDIA integer [468964, 502257, 550451, 554882, 610411]
  INDONESIA integer [116924, 164621, 173733, 133686, 126582]
  IRAQ integer [30596, 36647, 41648, 45134, 45935]
  IRELAND integer [10923, 9717, 9706, 9505, 9290]
  ISLAMIC REPUBLIC OF IRAN integer [156267, 160637, 166828, 169015, 177115]
  ISRAEL integer [18784, 18852, 20597, 18290, 17617]
  ITALY (INCLUDING SAN MARINO) integer [110543, 108534, 100755, 94169, 87377]
  JAMAICA integer [1990, 2143, 2035, 2207, 2024]
  JAPAN integer [319505, 324809, 335470, 339928, 331074]
  JORDAN integer [5776, 5909, 6666, 6651, 7213]
  KAZAKHSTAN integer [67780, 70646, 66259, 71679, 67716]
  KENYA integer [3320, 3670, 3413, 3636, 3896]
  KIRIBATI integer [17, 17, 17, 17, 17]
  KUWAIT integer [24441, 24824, 27907, 26819, 26018]
  KYRGYZSTAN integer [1741, 2088, 2763, 2684, 2620]
  LAO PEOPLE S DEMOCRATIC REPUBLIC integer [447, 443, 463, 430, 533]
  LATVIA integer [2202, 1989, 1926, 1931, 1902]
  LEBANON integer [5467, 5575, 6172, 6158, 6564]
  LESOTHO integer [621, 636, 656, 664, 673]
  LIBERIA integer [216, 243, 280, 261, 255]
  LIBYAN ARAB JAMAHIRIYAH integer [16897, 10827, 14367, 15344, 15543]
  LIECHTENSTEIN integer [15, 13, 13, 14, 12]
  LITHUANIA integer [3673, 3760, 3772, 3447, 3501]
  LUXEMBOURG integer [2991, 2983, 2908, 2741, 2634]
  MACAU SPECIAL ADMINSTRATIVE REGION OF CHINA integer [384, 395, 358, 322, 350]
  MACEDONIA integer [2346, 2563, 2445, 2140, 2048]
  MADAGASCAR integer [534, 638, 738, 850, 839]
  MALAWI integer [312, 322, 299, 334, 348]
  MALAYSIA integer [59579, 60105, 59642, 64497, 66218]
  MALDIVES integer [255, 269, 303, 298, 364]
  MALI integer [263, 285, 271, 280, 385]
  MALTA integer [698, 693, 731, 638, 640]
  MARSHALL ISLANDS integer [28, 28, 28, 28, 28]
  MARTINIQUE integer [548, 605, 604, 603, 627]
  MAURITANIA integer [610, 653, 724, 728, 739]
  MAURITIUS integer [1068, 1069, 1082, 1110, 1153]
  MEXICO integer [126618, 132105, 135349, 133717, 130971]
  MONGOLIA integer [3769, 5863, 7152, 10568, 5683]
  MONTENEGRO integer [704, 701, 637, 613, 603]
  MONTSERRAT integer [18, 11, 12, 14, 13]
  MOROCCO integer [15260, 15731, 17107, 16112, 16325]
  MOZAMBIQUE integer [746, 879, 851, 1096, 2298]
  MYANMAR (FORMERLY BURMA) integer [3413, 3899, 3019, 3507, 5899]
  NAMIBIA integer [846, 772, 923, 717, 1024]
  NAURU integer [12, 11, 11, 12, 13]
  NEPAL integer [1379, 1509, 1597, 1810, 2190]
  NETHERLAND ANTILLES integer [1244, 1587, nil, nil, nil]
  NETHERLANDS integer [49919, 47496, 46444, 47247, 45624]
  NEW CALEDONIA integer [966, 995, 990, 1157, 1170]
  NEW ZEALAND integer [8667, 8591, 9313, 9124, 9453]
  NICARAGUA integer [1237, 1331, 1260, 1241, 1326]
  NIGER integer [320, 362, 509, 529, 580]
  NIGERIA integer [24957, 26096, 26862, 26762, 26256]
  NIUE integer [1, 2, 2, 2, 3]
  NORWAY integer [16391, 12325, 13605, 15861, 12988]
  OCCUPIED PALESTINIAN TERRITORY integer [555, 613, 600, 665, 774]
  OMAN integer [12931, 14734, 16133, 16738, 16681]
  PAKISTAN integer [44013, 44166, 44586, 44812, 45350]
  PALAU integer [69, 69, 69, 70, 71]
  PANAMA integer [2499, 2754, 2758, 2923, 2400]
  PAPUA NEW GUINEA integer [1299, 1453, 1385, 1687, 1723]
  PARAGUAY integer [1390, 1451, 1441, 1482, 1555]
  PERU integer [15706, 13535, 15018, 15586, 16838]
  PHILIPPINES integer [23144, 23315, 24872, 26760, 28812]
  PLURINATIONAL STATE OF BOLIVIA integer [4146, 4403, 5125, 5159, 5566]
  POLAND integer [86246, 86446, 81792, 82432, 77922]
  PORTUGAL integer [13127, 12987, 12548, 12388, 12286]
  QATAR integer [19773, 21935, 25668, 23186, 29412]
  REPUBLIC OF CAMEROON integer [1849, 1573, 1671, 1847, 1910]
  REPUBLIC OF KOREA integer [154545, 160731, 159249, 161576, 160119]
  REPUBLIC OF MOLDOVA integer [1345, 1374, 1343, 1363, 1345]
  REUNION integer [1137, 1165, 1159, 1118, 1138]
  ROMANIA integer [21656, 23147, 22286, 19347, 19090]
  RUSSIAN FEDERATION integer [455558, 480885, 499272, 485018, 465052]
  RWANDA integer [161, 181, 201, 219, 229]
  SAINT HELENA integer [3, 3, 3, 3, 3]
  SAINT LUCIA integer [110, 111, 111, 111, 111]
  SAMOA integer [51, 55, 54, 54, 54]
  SAO TOME & PRINCIPE integer [27, 28, 31, 31, 31]
  SAUDI ARABIA integer [141394, 136318, 154034, 147545, 163907]
  SENEGAL integer [2112, 2282, 2158, 2297, 2415]
  SERBIA integer [12532, 13422, 12016, 12240, 10272]
  SEYCHELLES integer [121, 93, 120, 110, 135]
  SIERRA LEONE integer [198, 245, 281, 325, 357]
  SINGAPORE integer [15174, 12332, 9919, 15183, 15373]
  SLOVAKIA integer [9883, 9415, 8935, 9024, 8366]
  SLOVENIA integer [4182, 4115, 4031, 3859, 3494]
  SOLOMON ISLANDS integer [54, 54, 54, 55, 55]
  SOMALIA integer [167, 165, 166, 166, 166]
  SOUTH AFRICA integer [129288, 128329, 127835, 127182, 133562]
  SPAIN integer [73878, 73779, 72206, 64640, 63806]
  SRI LANKA integer [3617, 4128, 4372, 4224, 5016]
  ST. KITTS-NEVIS integer [60, 63, 60, 61, 63]
  ST. PIERRE & MIQUELON integer [19, 19, 19, 20, 21]
  ST. VINCENT & THE GRENADINES integer [60, 54, 69, 57, 57]
  SUDAN integer [4347, 4270, nil, nil, nil]
  SURINAME integer [655, 537, 616, 523, 543]
  SWAZILAND integer [283, 286, 329, 297, 328]
  SWEDEN integer [14187, 14108, 12830, 12230, 11841]
  SWITZERLAND integer [10634, 10081, 10301, 10970, 9628]
  SYRIAN ARAB REPUBLIC integer [16800, 15519, 12198, 9937, 8373]
  TAIWAN integer [73629, 73406, 70393, 71022, 72013]
  TAJIKISTAN integer [694, 641, 800, 949, 1415]
  THAILAND integer [76882, 75898, 80883, 81835, 86232]
  TIMOR-LESTE (FORMERLY EAST TIMOR) integer [64, 67, 80, 120, 128]
  TOGO integer [720, 672, 678, 725, 715]
  TONGA integer [32, 28, 29, 31, 33]
  TRINIDAD AND TOBAGO integer [13072, 12799, 12386, 12692, 12619]
  TUNISIA integer [7543, 7096, 7364, 7545, 7862]
  TURKEY integer [81266, 87494, 89872, 88566, 94350]
  TURKMENISTAN integer [15623, 17035, 17691, 18199, 18659]
  TURKS AND CAICOS ISLANDS integer [52, 52, 54, 54, 56]
  TUVALU integer [2, 2, 3, 3, 3]
  UGANDA integer [1069, 1163, 1110, 1328, 1426]
  UKRAINE integer [83077, 78100, 80663, 74141, 61985]
  UNITED ARAB EMIRATES integer [43854, 45116, 48101, 46552, 57641]
  UNITED KINGDOM integer [134499, 122124, 127781, 124966, 114486]
  UNITED REPUBLIC OF TANZANIA integer [1938, 2207, 2603, 3048, 3153]
  UNITED STATES OF AMERICA integer [1471375, 1442509, 1396083, 1406916, 1432855]
  URUGUAY integer [1742, 2117, 2371, 2069, 1840]
  UZBEKISTAN integer [28407, 31002, 31583, 28185, 28692]
  VANUATU integer [33, 36, 31, 29, 42]
  VENEZUELA integer [51560, 48220, 54204, 50156, 50510]
  VIET NAM integer [38925, 41497, 38784, 40150, 45517]
  WALLIS AND FUTUNA ISLANDS integer [8, 7, 7, 6, 6]
  YEMEN integer [6390, 5363, 5091, 6953, 6190]
  ZAMBIA integer [734, 801, 1000, 1079, 1228]
  ZIMBABWE integer [2121, 2608, 2125, 3184, 3278]
  BONAIRE, SAINT EUSTATIUS, AND SABA integer [nil, nil, 85, 88, 88]
  CURACAO integer [nil, nil, 1636, 1422, 1604]
  REPUBLIC OF SOUTH SUDAN integer [nil, nil, 363, 395, 408]
  REPUBLIC OF SUDAN integer [nil, nil, 3993, 4220, 4190]
  SAINT MARTIN (DUTCH PORTION) integer [nil, nil, 190, 195, 200]
>
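
Going the other way, pivot_longer/3 takes the columns to pivot as its second argument and, through the select option, the columns to keep alongside them. The following is a minimal sketch, assuming Explorer is installed by the setup cell; the variable long is illustrative.

```elixir
alias Explorer.DataFrame, as: DF

df = Explorer.Datasets.fossil_fuels()

# Stack the three *_fuel columns into name/value pairs,
# keeping year and country as identifying columns.
long = DF.pivot_longer(df, &String.ends_with?(&1, "fuel"), select: ["year", "country"])

DF.names(long)
DF.n_rows(long) == 3 * DF.n_rows(df)
```

Each original row contributes one output row per pivoted column, which is why the long dataframe has three times as many rows. The columns passed as the second argument need a common dtype so they can share a single value column.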

Let's make those names look nicer!

tidy_names = fn name ->
  name
  |> String.downcase()
  |> String.replace(~r/\s/, " ")
  |> String.replace(~r/[^A-Za-z\s]/, "")
  |> String.replace(" ", "_")
end
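
To see what each step of tidy_names does, it can be applied to a few raw names on its own. The function is repeated here so the snippet runs independently. tidy_names_collapsed is a hypothetical variant, not part of the original notebook, that collapses runs of whitespace into a single underscore.

```elixir
# Same steps as tidy_names above, repeated so the snippet runs on its own.
tidy_names = fn name ->
  name
  |> String.downcase()
  |> String.replace(~r/\s/, " ")
  |> String.replace(~r/[^A-Za-z\s]/, "")
  |> String.replace(" ", "_")
end

# Hypothetical variant: strip punctuation, then turn each run of
# whitespace into one underscore.
tidy_names_collapsed = fn name ->
  name
  |> String.downcase()
  |> String.replace(~r/[^a-z\s]/, "")
  |> String.replace(~r/\s+/, "_")
end

tidy_names.("CHINA (MAINLAND)")
# "china_mainland"

tidy_names.("ANTIGUA & BARBUDA")
# "antigua__barbuda" (the removed "&" leaves two adjacent spaces)

tidy_names_collapsed.("ANTIGUA & BARBUDA")
# "antigua_barbuda"
```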

df
|> DF.pivot_wider("country", "total", id_columns: ["year"])
|> DF.rename_with(tidy_names)
#Explorer.DataFrame<
  Polars[5 x 223]
  year integer [2010, 2011, 2012, 2013, 2014]
  afghanistan integer [2308, 3338, 2933, 2731, 2675]
  albania integer [1254, 1429, 1339, 1381, 1559]
  algeria integer [32500, 33048, 35448, 36669, 39651]
  andorra integer [141, 134, 133, 130, 126]
  angola integer [7924, 8274, 9108, 8895, 9480]
  anguilla integer [41, 39, 39, 37, 39]
  antigua__barbuda integer [143, 140, 143, 143, 145]
  argentina integer [51246, 52259, 52456, 51773, 55638]
  armenia integer [1150, 1341, 1553, 1499, 1508]
  aruba integer [684, 682, 368, 235, 238]
  australia integer [106589, 106850, 105843, 101518, 98517]
  austria integer [18408, 17731, 16982, 17040, 16011]
  azerbaijan integer [8366, 9121, 9696, 9720, 10223]
  bahamas integer [451, 509, 537, 764, 659]
  bahrain integer [7981, 7813, 7274, 8539, 8546]
  bangladesh integer [16345, 17293, 18409, 19010, 19959]
  barbados integer [403, 417, 401, 395, 347]
  belarus integer [17192, 17470, 17241, 17390, 17316]
  belgium integer [30222, 27255, 25936, 26444, 25457]
  belize integer [147, 164, 130, 140, 135]
  benin integer [1388, 1444, 1492, 1585, 1723]
  bermuda integer [166, 121, 130, 125, 157]
  bhutan integer [133, 200, 223, 251, 273]
  bosnia__herzegovina integer [5802, 6514, 6070, 5978, 6063]
  botswana integer [1278, 1139, 1154, 1426, 1918]
  brazil integer [114468, 119829, 128178, 137354, 144480]
  british_virgin_islands integer [47, 48, 48, 48, 49]
  brunei_darussalam integer [2237, 2644, 2636, 2128, 2484]
  bulgaria integer [12030, 13457, 12192, 10799, 11567]
  burkina_faso integer [535, 603, 717, 834, 777]
  burundi integer [58, 66, 77, 79, 120]
  cambodia integer [1367, 1420, 1488, 1528, 1823]
  canada integer [145806, 146472, 141112, 141031, 146494]
  cape_verde integer [152, 168, 138, 136, 134]
  cayman_islands integer [152, 160, 146, 146, 148]
  central_african_republic integer [72, 76, 80, 81, 82]
  chad integer [141, 147, 167, 191, 199]
  chile integer [19703, 21610, 22082, 22696, 22515]
  china_mainland integer [2393248, 2654360, 2734817, 2797384, 2806634]
  colombia integer [20773, 20870, 21803, 24441, 22932]
  comoros integer [44, 37, 39, 48, 42]
  congo integer [540, 616, 810, 842, 844]
  cook_islands integer [19, 19, 19, 19, 19]
  costa_rica integer [2064, 2111, 2118, 2072, 2116]
  cote_d_ivoire integer [1900, 1977, 2535, 2914, 3012]
  croatia integer [5501, 5402, 4907, 4786, 4593]
  cuba integer [10465, 9814, 9860, 9490, 9500]
  cyprus integer [2102, 2025, 1887, 1622, 1653]
  czech_republic integer [30428, 29154, 27551, 26909, 26309]
  democratic_people_s_republic_of_korea integer [18122, 13099, 13378, 9820, 11052]
  democratic_republic_of_the_congo_formerly_zaire integer [551, 680, 655, 979, 1274]
  denmark integer [12719, 11084, 9934, 10508, 9135]
  djibouti integer [141, 129, 141, 166, 197]
  dominica integer [38, 35, 37, 36, 37]
  dominican_republic integer [5733, 5789, 6033, 5847, 5874]
  ecuador integer [9943, 10529, 10401, 11180, 11977]
  egypt integer [55281, 59221, 59195, 58198, 55057]
  el_salvador integer [1761, 1813, 1817, 1699, 1714]
  equatorial_guinea integer [1276, 1671, 1395, 1408, 1458]
  eritrea integer [140, 162, 180, 182, 190]
  estonia integer [4938, 5074, 4806, 5425, 5323]
  ethiopia integer [1796, 2107, 2335, 2900, 3163]
  faeroe_islands integer [172, 155, 161, 185, 163]
  falkland_islands_malvinas integer [15, 15, 15, 15, 15]
  federated_states_of_micronesia integer [31, 33, 37, 39, 41]
  fiji integer [333, 297, 289, 314, 319]
  finland integer [16930, 15494, 13399, 12877, 12899]
  france_including_monaco integer [96273, 90484, 90872, 91109, 82704]
  french_guiana integer [174, 175, 161, 172, 200]
  french_polynesia integer [234, 227, 222, 224, 219]
  gabon integer [1312, 1356, 1392, 1440, 1416]
  gambia integer [118, 122, 124, 118, 140]
  georgia integer [1722, 2174, 2302, 2143, 2451]
  germany integer [206943, 199754, 201762, 206521, 196314]
  ghana integer [2715, 2681, 3239, 3987, 3945]
  gibraltar integer [127, 123, 126, 134, 144]
  greece integer [22868, 21773, 21828, 18948, 18358]
  greenland integer [181, 193, 155, 151, 138]
  grenada integer [71, 69, 74, 83, 66]
  guadeloupe integer [627, 663, 694, 697, 700]
  guatemala integer [3181, 3228, 3265, 3718, 4998]
  guinea integer [710, 758, 704, 627, 668]
  guinea_bissau integer [65, 67, 69, 70, 74]
  guyana integer [469, 486, 544, 528, 548]
  haiti integer [580, 605, 631, 656, 780]
  honduras integer [2175, 2442, 2450, 2472, 2583]
  hong_kong_special_adminstrative_region_of_china integer [11093, 11943, 11842, 12273, 12605]
  hungary integer [13696, 13047, 12158, 11492, 11477]
  iceland integer [535, 513, 491, 518, 541]
  india integer [468964, 502257, 550451, 554882, 610411]
  indonesia integer [116924, 164621, 173733, 133686, 126582]
  iraq integer [30596, 36647, 41648, 45134, 45935]
  ireland integer [10923, 9717, 9706, 9505, 9290]
  islamic_republic_of_iran integer [156267, 160637, 166828, 169015, 177115]
  israel integer [18784, 18852, 20597, 18290, 17617]
  italy_including_san_marino integer [110543, 108534, 100755, 94169, 87377]
  jamaica integer [1990, 2143, 2035, 2207, 2024]
  japan integer [319505, 324809, 335470, 339928, 331074]
  jordan integer [5776, 5909, 6666, 6651, 7213]
  kazakhstan integer [67780, 70646, 66259, 71679, 67716]
  kenya integer [3320, 3670, 3413, 3636, 3896]
  kiribati integer [17, 17, 17, 17, 17]
  kuwait integer [24441, 24824, 27907, 26819, 26018]
  kyrgyzstan integer [1741, 2088, 2763, 2684, 2620]
  lao_people_s_democratic_republic integer [447, 443, 463, 430, 533]
  latvia integer [2202, 1989, 1926, 1931, 1902]
  lebanon integer [5467, 5575, 6172, 6158, 6564]
  lesotho integer [621, 636, 656, 664, 673]
  liberia integer [216, 243, 280, 261, 255]
  libyan_arab_jamahiriyah integer [16897, 10827, 14367, 15344, 15543]
  liechtenstein integer [15, 13, 13, 14, 12]
  lithuania integer [3673, 3760, 3772, 3447, 3501]
  luxembourg integer [2991, 2983, 2908, 2741, 2634]
  macau_special_adminstrative_region_of_china integer [384, 395, 358, 322, 350]
  macedonia integer [2346, 2563, 2445, 2140, 2048]
  madagascar integer [534, 638, 738, 850, 839]
  malawi integer [312, 322, 299, 334, 348]
  malaysia integer [59579, 60105, 59642, 64497, 66218]
  maldives integer [255, 269, 303, 298, 364]
  mali integer [263, 285, 271, 280, 385]
  malta integer [698, 693, 731, 638, 640]
  marshall_islands integer [28, 28, 28, 28, 28]
  martinique integer [548, 605, 604, 603, 627]
  mauritania integer [610, 653, 724, 728, 739]
  mauritius integer [1068, 1069, 1082, 1110, 1153]
  mexico integer [126618, 132105, 135349, 133717, 130971]
  mongolia integer [3769, 5863, 7152, 10568, 5683]
  montenegro integer [704, 701, 637, 613, 603]
  montserrat integer [18, 11, 12, 14, 13]
  morocco integer [15260, 15731, 17107, 16112, 16325]
  mozambique integer [746, 879, 851, 1096, 2298]
  myanmar_formerly_burma integer [3413, 3899, 3019, 3507, 5899]
  namibia integer [846, 772, 923, 717, 1024]
  nauru integer [12, 11, 11, 12, 13]
  nepal integer [1379, 1509, 1597, 1810, 2190]
  netherland_antilles integer [1244, 1587, nil, nil, nil]
  netherlands integer [49919, 47496, 46444, 47247, 45624]
  new_caledonia integer [966, 995, 990, 1157, 1170]
  new_zealand integer [8667, 8591, 9313, 9124, 9453]
  nicaragua integer [1237, 1331, 1260, 1241, 1326]
  niger integer [320, 362, 509, 529, 580]
  nigeria integer [24957, 26096, 26862, 26762, 26256]
  niue integer [1, 2, 2, 2, 3]
  norway integer [16391, 12325, 13605, 15861, 12988]
  occupied_palestinian_territory integer [555, 613, 600, 665, 774]
  oman integer [12931, 14734, 16133, 16738, 16681]
  pakistan integer [44013, 44166, 44586, 44812, 45350]
  palau integer [69, 69, 69, 70, 71]
  panama integer [2499, 2754, 2758, 2923, 2400]
  papua_new_guinea integer [1299, 1453, 1385, 1687, 1723]
  paraguay integer [1390, 1451, 1441, 1482, 1555]
  peru integer [15706, 13535, 15018, 15586, 16838]
  philippines integer [23144, 23315, 24872, 26760, 28812]
  plurinational_state_of_bolivia integer [4146, 4403, 5125, 5159, 5566]
  poland integer [86246, 86446, 81792, 82432, 77922]
  portugal integer [13127, 12987, 12548, 12388, 12286]
  qatar integer [19773, 21935, 25668, 23186, 29412]
  republic_of_cameroon integer [1849, 1573, 1671, 1847, 1910]
  republic_of_korea integer [154545, 160731, 159249, 161576, 160119]
  republic_of_moldova integer [1345, 1374, 1343, 1363, 1345]
  reunion integer [1137, 1165, 1159, 1118, 1138]
  romania integer [21656, 23147, 22286, 19347, 19090]
  russian_federation integer [455558, 480885, 499272, 485018, 465052]
  rwanda integer [161, 181, 201, 219, 229]
  saint_helena integer [3, 3, 3, 3, 3]
  saint_lucia integer [110, 111, 111, 111, 111]
  samoa integer [51, 55, 54, 54, 54]
  sao_tome__principe integer [27, 28, 31, 31, 31]
  saudi_arabia integer [141394, 136318, 154034, 147545, 163907]
  senegal integer [2112, 2282, 2158, 2297, 2415]
  serbia integer [12532, 13422, 12016, 12240, 10272]
  seychelles integer [121, 93, 120, 110, 135]
  sierra_leone integer [198, 245, 281, 325, 357]
  singapore integer [15174, 12332, 9919, 15183, 15373]
  slovakia integer [9883, 9415, 8935, 9024, 8366]
  slovenia integer [4182, 4115, 4031, 3859, 3494]
  solomon_islands integer [54, 54, 54, 55, 55]
  somalia integer [167, 165, 166, 166, 166]
  south_africa integer [129288, 128329, 127835, 127182, 133562]
  spain integer [73878, 73779, 72206, 64640, 63806]
  sri_lanka integer [3617, 4128, 4372, 4224, 5016]
  st_kittsnevis integer [60, 63, 60, 61, 63]
  st_pierre__miquelon integer [19, 19, 19, 20, 21]
  st_vincent__the_grenadines integer [60, 54, 69, 57, 57]
  sudan integer [4347, 4270, nil, nil, nil]
  suriname integer [655, 537, 616, 523, 543]
  swaziland integer [283, 286, 329, 297, 328]
  sweden integer [14187, 14108, 12830, 12230, 11841]
  switzerland integer [10634, 10081, 10301, 10970, 9628]
  syrian_arab_republic integer [16800, 15519, 12198, 9937, 8373]
  taiwan integer [73629, 73406, 70393, 71022, 72013]
  tajikistan integer [694, 641, 800, 949, 1415]
  thailand integer [76882, 75898, 80883, 81835, 86232]
  timorleste_formerly_east_timor integer [64, 67, 80, 120, 128]
  togo integer [720, 672, 678, 725, 715]
  tonga integer [32, 28, 29, 31, 33]
  trinidad_and_tobago integer [13072, 12799, 12386, 12692, 12619]
  tunisia integer [7543, 7096, 7364, 7545, 7862]
  turkey integer [81266, 87494, 89872, 88566, 94350]
  turkmenistan integer [15623, 17035, 17691, 18199, 18659]
  turks_and_caicos_islands integer [52, 52, 54, 54, 56]
  tuvalu integer [2, 2, 3, 3, 3]
  uganda integer [1069, 1163, 1110, 1328, 1426]
  ukraine integer [83077, 78100, 80663, 74141, 61985]
  united_arab_emirates integer [43854, 45116, 48101, 46552, 57641]
  united_kingdom integer [134499, 122124, 127781, 124966, 114486]
  united_republic_of_tanzania integer [1938, 2207, 2603, 3048, 3153]
  united_states_of_america integer [1471375, 1442509, 1396083, 1406916, 1432855]
  uruguay integer [1742, 2117, 2371, 2069, 1840]
  uzbekistan integer [28407, 31002, 31583, 28185, 28692]
  vanuatu integer [33, 36, 31, 29, 42]
  venezuela integer [51560, 48220, 54204, 50156, 50510]
  viet_nam integer [38925, 41497, 38784, 40150, 45517]
  wallis_and_futuna_islands integer [8, 7, 7, 6, 6]
  yemen integer [6390, 5363, 5091, 6953, 6190]
  zambia integer [734, 801, 1000, 1079, 1228]
  zimbabwe integer [2121, 2608, 2125, 3184, 3278]
  bonaire_saint_eustatius_and_saba integer [nil, nil, 85, 88, 88]
  curacao integer [nil, nil, 1636, 1422, 1604]
  republic_of_south_sudan integer [nil, nil, 363, 395, 408]
  republic_of_sudan integer [nil, nil, 3993, 4220, 4190]
  saint_martin_dutch_portion integer [nil, nil, 190, 195, 200]
>

Joins

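Joining is fast and easy. You can specify the columns to join on with the :on option and the kind of join with the :how option. If you leave out :on, the columns that share a name in both dataframes are used, and :how defaults to :inner. Polars even supports cartesian (cross) joins, so Explorer does too.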

df1 = DF.select(df, ["year", "country", "total"])
df2 = DF.select(df, ["year", "country", "cement"])

DF.join(df1, df2)
#Explorer.DataFrame<
  Polars[1094 x 4]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
>
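# df3 shares only "year" with df1, so the join below matches on year alone
# and pairs each row of df1 with every df3 row of the same year.
df3 = df |> DF.select(["year", "cement"]) |> DF.slice(0, 500)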

DF.join(df1, df3, how: :left)
#Explorer.DataFrame<
  Polars[109138 x 4]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "AFGHANISTAN", "AFGHANISTAN", "AFGHANISTAN", "AFGHANISTAN", ...]
  total integer [2308, 2308, 2308, 2308, 2308, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
>
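
To make the :on and :how options a little more concrete, here is a minimal sketch on tiny dataframes. The people, scores, sizes and colours dataframes are invented for illustration and are not part of the fossil fuels dataset. The snippet assumes the notebook's Mix.install setup cell has already run.

```elixir
# Assumes the Mix.install setup cell at the top of this notebook has run.
require Explorer.DataFrame, as: DF

# Hypothetical example data, not taken from the dataset above.
people = DF.new(id: [1, 2, 3], name: ["Ana", "Ben", "Cy"])
scores = DF.new(id: [1, 2, 4], score: [10, 20, 40])

# Inner join: keeps only the ids present in both dataframes.
inner = DF.join(people, scores, on: ["id"], how: :inner)

# Left join: keeps every row of `people`; unmatched rows get nil for score.
left = DF.join(people, scores, on: ["id"], how: :left)

# Cross join: every row of one side paired with every row of the other.
sizes = DF.new(size: ["S", "M"])
colours = DF.new(colour: ["red", "blue", "green"])
cross = DF.join(sizes, colours, how: :cross)

{DF.n_rows(inner), DF.n_rows(left), DF.n_rows(cross)}
```

Naming the join columns explicitly with :on is a good habit whenever the two sides might share more (or fewer) column names than you expect.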

Grouping

Explorer supports groupby operations. They're limited to what's possible in Polars, but they cover most of what you'll need.

grouped = DF.group_by(df, ["country"])
#Explorer.DataFrame<
  Polars[1094 x 10]
  Groups: ["country"]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

Notice that the Inspect call now shows groups as well as rows and columns. You can, of course, get them explicitly.

DF.groups(grouped)
["country"]

And you can ungroup explicitly.

DF.ungroup(grouped)
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita f64 [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>
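
Grouping is not limited to a single column. The sketch below groups a small, made-up dataframe by two columns and then removes only one of the groups by passing a list of names to ungroup. The emissions_sample dataframe and its values are hypothetical.

```elixir
# Assumes the Mix.install setup cell at the top of this notebook has run.
require Explorer.DataFrame, as: DF

# Hypothetical example data.
emissions_sample =
  DF.new(
    country: ["A", "A", "B", "B"],
    year: [2010, 2011, 2010, 2011],
    total: [1, 2, 3, 4]
  )

by_country_and_year = DF.group_by(emissions_sample, ["country", "year"])
DF.groups(by_country_and_year)

# Drop only the "year" group; "country" stays in place.
by_country = DF.ungroup(by_country_and_year, ["year"])
DF.groups(by_country)
```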

But what we care about the most is aggregating! Let's see which country has the max per_capita value.

grouped
|> DF.summarise(max_per_capita: max(per_capita))
|> DF.arrange(desc: max_per_capita)
#Explorer.DataFrame<
  Polars[222 x 2]
  country string ["QATAR", "CURACAO", "TRINIDAD AND TOBAGO", "KUWAIT", "NETHERLAND ANTILLES", ...]
  max_per_capita f64 [13.54, 10.72, 9.84, 8.16, 7.45, ...]
>

Qatar it is.

You may have noticed that we are using max/1 inside the summarise macro. This is possible because we expose all functions from the Series module. Aggregations you can use inside summarise include min/1, max/1, sum/1, mean/1, median/1, first/1, last/1, and count/1.

The API is similar to mutate: you can use keyword args or a map and specify aggregations to use.

DF.summarise(grouped, min_per_capita: min(per_capita), min_total: min(total))
#Explorer.DataFrame<
  Polars[222 x 3]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  min_per_capita f64 [0.08, 0.43, 0.9, 1.63, 0.37, ...]
  min_total integer [2308, 1254, 32500, 126, 7924, ...]
>
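
As a smaller illustration of combining several aggregations in one summarise call, here is a sketch on an invented sales dataframe (the data and column names are assumptions made for this example). The result is sorted by region so the row order does not depend on how groups are emitted.

```elixir
# Assumes the Mix.install setup cell at the top of this notebook has run.
require Explorer.DataFrame, as: DF

# Hypothetical example data.
sales =
  DF.new(
    region: ["north", "north", "south", "south", "south"],
    amount: [10, 20, 5, 15, 25]
  )

summary =
  sales
  |> DF.group_by(["region"])
  |> DF.summarise(total: sum(amount), largest: max(amount), n: count(amount))
  |> DF.arrange(region)

# One row per region, one column per aggregation.
DF.to_columns(summary, atom_keys: true)
```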

Speaking of mutate, it's 'group-aware', as are arrange, distinct, and n_rows.

DF.mutate(grouped, total_window_sum: window_sum(total, 3), rows_in_group: count(country))
#Explorer.DataFrame<
  Polars[1094 x 12]
  Groups: ["country"]
  year integer [2010, 2011, 2012, 2013, 2014, ...]
  country string ["AFGHANISTAN", "AFGHANISTAN", "AFGHANISTAN", "AFGHANISTAN", "AFGHANISTAN", ...]
  total integer [2308, 3338, 2933, 2731, 2675, ...]
  solid_fuel integer [627, 1174, 1000, 1075, 1194, ...]
  liquid_fuel integer [1601, 2075, 1844, 1568, 1393, ...]
  gas_fuel integer [74, 84, 84, 81, 74, ...]
  cement integer [5, 5, 5, 7, 14, ...]
  gas_flaring integer [0, 0, 0, 0, 0, ...]
  per_capita f64 [0.08, 0.12, 0.1, 0.09, 0.08, ...]
  bunker_fuels integer [9, 9, 9, 9, 9, ...]
  total_window_sum integer [2308, 5646, 8579, 9002, 8339, ...]
  rows_in_group integer [5, 5, 5, 5, 5, ...]
>
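
To see what 'group-aware' means on data you can check by hand, the sketch below runs mutate on a grouped, made-up dataframe. Aggregations such as sum and max are computed per group and repeated on every row of that group. The sales_by_region data is hypothetical, and its rows are already sorted by region so the output order is easy to follow.

```elixir
# Assumes the Mix.install setup cell at the top of this notebook has run.
require Explorer.DataFrame, as: DF

# Hypothetical example data, already sorted by region.
sales_by_region =
  DF.new(
    region: ["north", "north", "south", "south", "south"],
    amount: [10, 20, 5, 15, 25]
  )

with_group_stats =
  sales_by_region
  |> DF.group_by(["region"])
  |> DF.mutate(
    region_total: sum(amount),
    gap_to_largest: max(amount) - amount
  )
  |> DF.ungroup()

# The row count is unchanged; the new columns are computed within each region.
DF.to_columns(with_group_stats, atom_keys: true)
```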

It's also possible to use aggregations inside other functions:

grouped
|> DF.summarise(greater_than_9: greater(max(per_capita), 9.0), per_capita_max: max(per_capita))
|> DF.arrange(desc: per_capita_max)
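
The same pattern can be checked on a tiny dataframe. In this sketch the shop_sales data and the threshold of 20 are made up for illustration; greater/2 wraps the max/1 aggregation to produce one boolean per group.

```elixir
# Assumes the Mix.install setup cell at the top of this notebook has run.
require Explorer.DataFrame, as: DF

# Hypothetical example data.
shop_sales =
  DF.new(
    region: ["north", "north", "south", "south", "south"],
    amount: [10, 20, 5, 15, 25]
  )

flagged =
  shop_sales
  |> DF.group_by(["region"])
  |> DF.summarise(largest: max(amount), has_big_sale: greater(max(amount), 20))
  |> DF.arrange(desc: largest)

DF.to_columns(flagged, atom_keys: true)
```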

That's it!

Well, not quite. This is certainly not exhaustive, but I hope it gives you a good idea of what can be done and what the 'flavour' of the API is like. I'd love contributions, and please raise issues where you find them!