View Source Explorer.DataFrame (Explorer v0.3.1)

The DataFrame struct and API.

Dataframes are two-dimensional tabular data structures similar to a spreadsheet. For example, the Iris dataset:

iex> Explorer.Datasets.iris()
#Explorer.DataFrame<
  Polars[150 x 5]
  sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, ...]
  sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, ...]
  petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, ...]
  petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, ...]
  species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", ...]
>

This dataframe has 150 rows and five columns. Each column is an Explorer.Series of the same size (150):

iex> df = Explorer.Datasets.iris()
iex> df["sepal_length"]
#Explorer.Series<
  float[150]
  [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, ...]
>

creating-dataframes
Creating dataframes

Dataframes can be created from normal Elixir objects. The main way you might do this is with the new/1 function. For example:

iex> Explorer.DataFrame.new(a: ["a", "b"], b: [1, 2])
#Explorer.DataFrame<
  Polars[2 x 2]
  a string ["a", "b"]
  b integer [1, 2]
>

Or with a list of maps:

iex> Explorer.DataFrame.new([%{"col1" => "a", "col2" => 1}, %{"col1" => "b", "col2" => 2}])
#Explorer.DataFrame<
  Polars[2 x 2]
  col1 string ["a", "b"]
  col2 integer [1, 2]
>

verbs
Verbs

Explorer uses the idea of a consistent set of SQL-like verbs like dplyr which can help solve common data manipulation challenges. These are split into single table verbs, multiple table verbs, and row-based verbs:

single-table-verbs
Single table verbs

Single table verbs are (unsurprisingly) used for manipulating a single dataframe. Those operations typically driven by column names. These are:

select/3 for picking variables
filter/2 for picking rows based on predicates
mutate/2 for adding or replacing columns that are functions of existing columns
arrange/2 for changing the ordering of rows
distinct/2 for picking unique rows
summarise/2 for reducing multiple rows down to a single summary
pivot_longer/3 and pivot_wider/4 for massaging dataframes into longer or wider forms, respectively

Each of these combine with Explorer.DataFrame.group_by/2 for operating by group.

For more flexibility we also have functions that accept callback functions:

filter_with/2
summarise_with/2
mutate_with/2
arrange_with/2

Those functions work by having a "lazy" representation of the dataframe and series, which adds the possibility to perform complex operations that are optimized by the backend.

multiple-table-verbs
Multiple table verbs

Multiple table verbs are used for combining tables. These are:

join/3 for performing SQL-like joins
concat_rows/1 for vertically "stacking" dataframes
concat_columns/1 for horizontally "stacking" dataframes

row-based-verbs
Row-based verbs

Those operations are driven by the row index. These are:

head/2 for picking the first rows
tail/2 for picking the last rows
slice/2 for slicing the dataframe by row indexes or a range
slice/3 for slicing a section by an offset
sample/2 for sampling the data-frame by row

io
IO

Explorer supports reading and writing of:

delimited files (such as CSV)
Parquet
Arrow IPC
Newline Delimited JSON

The convention Explorer uses is to have from_* and to_* functions to read and write to files in the formats above. load_* and dump_* versions are also available to read and write those formats directly in memory.

access
Access

In addition to this "grammar" of data manipulation, you'll find useful functions for slicing and dicing dataframes such as pull/2, head/2, sample/3, slice/2, and slice/3.

Explorer.DataFrame also implements the Access behaviour (also known as the brackets syntax). This should be familiar for users coming from other language with dataframes such as R or Python. For example:

iex> df = Explorer.Datasets.wine()
iex> df["class"]
#Explorer.Series<
  integer[178]
  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
>

Link to this section Summary

Types

column()

Represents a column name or its index.

column_name()

Represents a column name as atom or string.

column_names()

Represents multiple column names as atoms or strings.

column_pairs(value)

Represents a column pair where the value is a column name or a column index, and the value is of type value.

columns()

Represents multiple columns.

t()

Represents a dataframe.

Functions: Single-table

arrange(df, columns_or_column)

Arranges/sorts rows by columns.

arrange_with(df, fun)

Arranges/sorts rows by columns using a callback function.

collect(df)

This collects the lazy data frame into an eager one, computing the query.

distinct(df, columns \\ 0..-1//1, opts \\ [])

Takes distinct rows by a selection of columns.

drop_nil(df, columns_or_column \\ 0..-1)

Drop nil values.

dummies(df, columns_or_column)

Turns a set of columns to dummy variables.

filter_with(df, fun)

Picks rows based on a callback function.

group_by(df, groups)

Group the dataframe by one or more variables.

mask(df, mask)

Picks rows based on a list or series of values.

mutate(df, columns)

Creates and modifies columns.

mutate_with(df, fun)

Creates or modifies columns using a callback function.

new(data, opts \\ [])

Creates a new dataframe.

pivot_longer(df, columns_to_pivot, opts \\ [])

Pivot data from wide to long.

pivot_wider(df, names_from, values_from, opts \\ [])

Pivot data from long to wide.

pull(df, column)

Extracts a single column as a series.

rename(df, names)

Renames columns.

rename_with(df, columns \\ 0..-1//1, callback)

Renames columns with a function.

select(df, columns_or_column, keep_or_drop \\ :keep)

Selects a subset of columns by name.

summarise(df, columns)

Summarise each group to a single row.

summarise_with(df, fun)

Summarise each group to a single row using a callback function.

table(df, opts \\ [])

Display the DataFrame in a tabular fashion.

to_columns(df, opts \\ [])

Converts a dataframe to a list of columns with lists as values.

to_lazy(df)

Converts the dataframe to the lazy version of the current backend.

to_rows(df, opts \\ [])

Converts a dataframe to a list of maps (rows).

to_series(df, opts \\ [])

Converts a dataframe to a list of columns with series as values.

ungroup(df, groups \\ :all)

Removes grouping variables.

Functions: Multi-table

concat_columns(dfs)

Combine two or more dataframes column-wise.

concat_columns(df1, df2)

Combine two dataframes column-wise.

concat_rows(dfs)

Combine two or more dataframes row-wise (stack).

concat_rows(df1, df2)

Combine two dataframes row-wise.

join(left, right, opts \\ [])

Join two tables.

Functions: Row-based

head(df, nrows \\ 5)

Returns the first n rows of the dataframe.

sample(df, n_or_frac, opts \\ [])

Sample rows from a dataframe.

slice(df, row_indices)

Subset rows with a list of indices or a range.

slice(df, offset, length)

Subset a continuous set of rows.

tail(df, nrows \\ 5)

Returns the last n rows of the dataframe.

Functions: Introspection

dtypes(df)

Gets the dtypes of the dataframe columns.

groups(data_frame)

Returns the groups of a dataframe.

n_columns(df)

Returns the number of columns in the dataframe.

n_rows(df)

Returns the number of rows in the dataframe.

names(df)

Gets the names of the dataframe columns.

shape(df)

Gets the shape of the dataframe as a {height, width} tuple.

Functions: IO

dump_csv(df, opts \\ [])

Writes a dataframe to a binary representation of a delimited file.

from_csv!(filename, opts \\ [])

Similar to from_csv/2 but raises if there is a problem reading the CSV.

from_csv(filename, opts \\ [])

Reads a delimited file into a dataframe.

from_ipc!(filename, opts \\ [])

Similar to from_ipc/2 but raises if there is a problem reading the IPC file.

from_ipc(filename, opts \\ [])

Reads an IPC file into a dataframe.

from_ndjson(filename, opts \\ [])

Read a file of JSON objects or lists separated by new lines

from_parquet(filename, opts \\ [])

Reads a parquet file into a dataframe.

to_csv!(df, filename, opts \\ [])

Similar to to_csv/3 but raises if there is a problem reading the CSV.

to_csv(df, filename, opts \\ [])

Writes a dataframe to a delimited file.

to_ipc(df, filename, opts \\ [])

Writes a dataframe to a IPC file.

to_ndjson(df, filename)

Writes a dataframe to a ndjson file.

to_parquet(df, filename, opts \\ [])

Writes a dataframe to a parquet file.

Functions

filter(df, callback)

Link to this section Types

column()

@type column() :: column_name() | non_neg_integer()

Represents a column name or its index.

column_name()

@type column_name() :: atom() | String.t()

Represents a column name as atom or string.

column_names()

@type column_names() :: [column_name()]

Represents multiple column names as atoms or strings.

column_pairs(value)

@type column_pairs(value) :: [{column(), value}] | %{required(column()) => value}

Represents a column pair where the value is a column name or a column index, and the value is of type value.

columns()

@type columns() ::
  [column()]
  | Range.t()
  | (String.t() -> boolean())
  | (String.t(), Explorer.Series.dtype() -> boolean())

Represents multiple columns.

The columns may be specified as one of:

a list of columns indexes or names as atoms and strings
a range
a one-arity function that receives column names and returns true for column names to keep
a two-arity function that receives column names and types and returns true for column names to keep

t()

@type t() :: %Explorer.DataFrame{
  data: Explorer.Backend.DataFrame.t(),
  dtypes: %{required(String.t()) => Explorer.Series.dtype()},
  groups: [String.t()],
  names: [String.t()]
}

Represents a dataframe.

Link to this section Functions: Single-table

arrange(df, columns_or_column)

@spec arrange(
  df :: t(),
  column_or_columns :: column() | columns() | [{:asc | :desc, column()}]
) :: t()

Arranges/sorts rows by columns.

examples
Examples

A single column name will sort ascending by that column:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, "a")
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["a", "b", "c"]
  b integer [3, 1, 2]
>

You can also sort descending:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, desc: "a")
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["c", "b", "a"]
  b integer [2, 1, 3]
>

Sorting by more than one column sorts them in the order they are entered:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.arrange(df, asc: "total", desc: "country")
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2011, 2011, 2012, ...]
  country string ["NIUE", "TUVALU", "TUVALU", "NIUE", "NIUE", ...]
  total integer [1, 2, 2, 2, 2, ...]
  solid_fuel integer [0, 0, 0, 0, 0, ...]
  liquid_fuel integer [1, 2, 2, 2, 2, ...]
  gas_fuel integer [0, 0, 0, 0, 0, ...]
  cement integer [0, 0, 0, 0, 0, ...]
  gas_flaring integer [0, 0, 0, 0, 0, ...]
  per_capita float [0.52, 0.0, 0.0, 1.04, 1.04, ...]
  bunker_fuels integer [0, 0, 0, 0, 0, ...]
>

Alternatively you can pass a callback to sort by the given columns:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, &String.starts_with?(&1, "a"))
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["a", "b", "c"]
  b integer [3, 1, 2]
>

Or a callback to sort the columns of a given type:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, fn _name, type -> type == :string end)
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["a", "b", "c"]
  b integer [3, 1, 2]
>

arrange_with(df, fun)

@spec arrange_with(
  df :: t(),
  (Explorer.Backend.LazyFrame.t() ->
     Explorer.Series.lazy_t()
     | [Explorer.Series.lazy_t()]
     | [{:asc | :desc, Explorer.Series.lazy_t()}])
) :: t()

Arranges/sorts rows by columns using a callback function.

examples
Examples

A single column name will sort ascending by that column:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange_with(df, &(&1["a"]))
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["a", "b", "c"]
  b integer [3, 1, 2]
>

You can also sort descending:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"]])
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["c", "b", "a"]
  b integer [2, 1, 3]
>

Sorting by more than one column sorts them in the order they are entered:

iex> df = Explorer.DataFrame.new(a: [3, 1, 3], b: [2, 1, 3])
iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"], asc: &1["b"]])
#Explorer.DataFrame<
  Polars[3 x 2]
  a integer [3, 3, 1]
  b integer [2, 3, 1]
>

collect(df)

@spec collect(df :: t()) :: {:ok, t()} | {:error, term()}

This collects the lazy data frame into an eager one, computing the query.

If already eager, this is a noop.

distinct(df, columns \\ 0..-1//1, opts \\ [])

@spec distinct(df :: t(), columns :: columns(), opts :: Keyword.t()) :: t()

Takes distinct rows by a selection of columns.

Distinct is not affected by groups, although groups are kept in the columns selection if keep_all option is false (the default).

options
Options

keep_all - If set to true, keep all columns. Default is false.

examples
Examples

By default will return unique values of the requested columns:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.distinct(df, ["year", "country"])
#Explorer.DataFrame<
  Polars[1094 x 2]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
>

If keep_all is set to true, then the first value of each column not in the requested columns will be returned:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.distinct(df, ["year", "country"], keep_all: true)
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

A callback on the dataframe's names can be passed instead of a list (like select/3):

iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
iex> Explorer.DataFrame.distinct(df, &String.starts_with?(&1, "x"))
#Explorer.DataFrame<
  Polars[2 x 2]
  x1 integer [1, 3]
  x2 string ["a", "c"]
>

If the dataframe has groups, then the columns of each group will be added to the distinct columns:

iex> df = Explorer.DataFrame.new(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
iex> df = Explorer.DataFrame.group_by(df, "x1")
iex> Explorer.DataFrame.distinct(df, ["x2"])
#Explorer.DataFrame<
  Polars[2 x 2]
  Groups: ["x1"]
  x1 integer [1, 3]
  x2 string ["a", "c"]
>

drop_nil(df, columns_or_column \\ 0..-1)

@spec drop_nil(df :: t(), column() | columns()) :: t()

Drop nil values.

Optionally accepts a subset of columns.

examples
Examples

To drop nils on all columns:

iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3])
iex> Explorer.DataFrame.drop_nil(df)
#Explorer.DataFrame<
  Polars[1 x 2]
  a integer [1]
  b integer [1]
>

To select some columns:

iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])
iex> Explorer.DataFrame.drop_nil(df, [:a, :c])
#Explorer.DataFrame<
  Polars[1 x 3]
  a integer [2]
  b integer [nil]
  c integer [5]
>

To select some columns by range:

iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])
iex> Explorer.DataFrame.drop_nil(df, 0..1)
#Explorer.DataFrame<
  Polars[1 x 3]
  a integer [1]
  b integer [1]
  c integer [nil]
>

Or to select columns by a callback on the names:

iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5, 6])
iex> Explorer.DataFrame.drop_nil(df, fn name -> name == "a" or name == "b" end)
#Explorer.DataFrame<
  Polars[1 x 3]
  a integer [1]
  b integer [1]
  c integer [nil]
>

Or to select columns by a callback on the names and types:

iex> df = Explorer.DataFrame.new(a: [1, 2, nil], b: [1, nil, 3], c: [nil, 5.0, 6.0])
iex> Explorer.DataFrame.drop_nil(df, fn _name, type -> type == :float end)
#Explorer.DataFrame<
  Polars[2 x 3]
  a integer [2, nil]
  b integer [nil, 3]
  c float [5.0, 6.0]
>

dummies(df, columns_or_column)

@spec dummies(df :: t(), column() | columns()) :: t()

Turns a set of columns to dummy variables.

examples
Examples

To mark a single column as dummy:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "a", "c"], b: ["b", "a", "b", "d"])
iex> Explorer.DataFrame.dummies(df, "a")
#Explorer.DataFrame<
  Polars[4 x 3]
  a_a integer [1, 0, 1, 0]
  a_b integer [0, 1, 0, 0]
  a_c integer [0, 0, 0, 1]
>

Or multiple columns:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "a", "c"], b: ["b", "a", "b", "d"])
iex> Explorer.DataFrame.dummies(df, ["a", "b"])
#Explorer.DataFrame<
  Polars[4 x 6]
  a_a integer [1, 0, 1, 0]
  a_b integer [0, 1, 0, 0]
  a_c integer [0, 0, 0, 1]
  b_a integer [0, 1, 0, 0]
  b_b integer [1, 0, 1, 0]
  b_d integer [0, 0, 0, 1]
>

Or all string columns:

iex> df = Explorer.DataFrame.new(num: [1, 2, 3, 4], b: ["b", "a", "b", "d"])
iex> Explorer.DataFrame.dummies(df, fn _name, type -> type == :string end)
#Explorer.DataFrame<
  Polars[4 x 3]
  b_a integer [0, 1, 0, 0]
  b_b integer [1, 0, 1, 0]
  b_d integer [0, 0, 0, 1]
>

filter_with(df, fun)

@spec filter_with(
  df :: t(),
  callback :: (Explorer.Backend.LazyFrame.t() -> Explorer.Series.lazy_t())
) :: t()

Picks rows based on a callback function.

This function is efficient because it uses a representation of the series without pulling them. The only restriction is that you need to use a function that returns boolean.

But you can also use window functions and aggregations inside comparisons.

examples
Examples

iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
iex> Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col2"], 2))
#Explorer.DataFrame<
  Polars[1 x 2]
  col1 string ["c"]
  col2 integer [3]
>

iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.equal(df["col1"], "b") end)
#Explorer.DataFrame<
  Polars[1 x 2]
  col1 string ["b"]
  col2 integer [2]
>

iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.cumulative_max(df["col2"]) end)
** (ArgumentError) expecting the function to return a boolean LazySeries, but instead it returned a LazySeries of type :integer

But it's possible to use a boolean operation based on another function:

iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.equal(Explorer.Series.cumulative_max(df["col2"]), 1) end)
#Explorer.DataFrame<
  Polars[1 x 2]
  col1 string ["a"]
  col2 integer [1]
>

group_by(df, groups)

@spec group_by(df :: t(), groups_or_group :: column_names() | column_name()) :: t()

Group the dataframe by one or more variables.

When the dataframe has grouping variables, operations are performed per group. Explorer.DataFrame.ungroup/2 removes grouping.

examples
Examples

You can group by a single variable:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.group_by(df, "country")
#Explorer.DataFrame<
  Polars[1094 x 10]
  Groups: ["country"]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

Or you can group by multiple:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.group_by(df, ["country", "year"])
#Explorer.DataFrame<
  Polars[1094 x 10]
  Groups: ["country", "year"]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

mask(df, mask)

@spec mask(df :: t(), mask :: Explorer.Series.t() | [boolean()]) :: t()

Picks rows based on a list or series of values.

examples
Examples

This function must only be used when you need to select rows based on external values that are not available to the dataframe. For example, you can pass a list:

iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
iex> Explorer.DataFrame.mask(df, [false, true, false])
#Explorer.DataFrame<
  Polars[1 x 2]
  col1 string ["b"]
  col2 integer [2]
>

You must avoid using masks when the masks themselves are computed from other columns. For example, DO NOT do this:

iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
iex> Explorer.DataFrame.mask(df, Explorer.Series.greater(df["col2"], 1))
#Explorer.DataFrame<
  Polars[2 x 2]
  col1 string ["b", "c"]
  col2 integer [2, 3]
>

Instead, do this:

iex> df = Explorer.DataFrame.new(col1: ["a", "b", "c"], col2: [1, 2, 3])
iex> Explorer.DataFrame.filter_with(df, fn df -> Explorer.Series.greater(df["col2"], 1) end)
#Explorer.DataFrame<
  Polars[2 x 2]
  col1 string ["b", "c"]
  col2 integer [2, 3]
>

The filter_with/2 version is much more efficient because it doesn't need to create intermediate series representations to apply the mask.

mutate(df, columns)

@spec mutate(df :: t(), columns :: column_pairs(any())) :: t()

Creates and modifies columns.

Columns are added with keyword list or maps. New variables overwrite existing variables of the same name. Column names are coerced from atoms to strings.

examples
Examples

You can pass in a list directly as a new column:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, c: [4, 5, 6])
#Explorer.DataFrame<
  Polars[3 x 3]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
  c integer [4, 5, 6]
>

Or you can pass in a series:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> s = Explorer.Series.from_list([4, 5, 6])
iex> Explorer.DataFrame.mutate(df, c: s)
#Explorer.DataFrame<
  Polars[3 x 3]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
  c integer [4, 5, 6]
>

You can overwrite existing columns:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, a: [4, 5, 6])
#Explorer.DataFrame<
  Polars[3 x 2]
  a integer [4, 5, 6]
  b integer [1, 2, 3]
>

Scalar values are repeated to fill the series:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, a: 4)
#Explorer.DataFrame<
  Polars[3 x 2]
  a integer [4, 4, 4]
  b integer [1, 2, 3]
>

Alternatively, all of the above works with a map instead of a keyword list:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, %{"c" => [4, 5, 6]})
#Explorer.DataFrame<
  Polars[3 x 3]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
  c integer [4, 5, 6]
>

mutate_with(df, fun)

@spec mutate_with(
  df :: t(),
  callback ::
    (Explorer.Backend.LazyFrame.t() -> column_pairs(Explorer.Series.lazy_t()))
) :: t()

Creates or modifies columns using a callback function.

This function is similar to mutate/2, but allows complex operations to be performed, since it uses a virtual representation of the dataframe. The only requirement is that a series operation is returned.

New variables overwrite existing variables of the same name. Column names are coerced from atoms to strings.

examples
Examples

iex> df = Explorer.DataFrame.new(a: [4, 5, 6], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate_with(df, &[c: Explorer.Series.add(&1["a"], &1["b"])])
#Explorer.DataFrame<
  Polars[3 x 3]
  a integer [4, 5, 6]
  b integer [1, 2, 3]
  c integer [5, 7, 9]
>

You can overwrite existing columns:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate_with(df, &[b: Explorer.Series.pow(&1["b"], 2)])
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["a", "b", "c"]
  b float [1.0, 4.0, 9.0]
>

Scalar values are repeated to fill the series:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate_with(df, &[a: Explorer.Series.max(&1["b"])])
#Explorer.DataFrame<
  Polars[3 x 2]
  a integer [3, 3, 3]
  b integer [1, 2, 3]
>

Alternatively, all of the above works with a map instead of a keyword list:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate_with(df, fn df -> %{"c" => Explorer.Series.window_mean(df["b"], 2)} end)
#Explorer.DataFrame<
  Polars[3 x 3]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
  c float [1.0, 1.5, 2.5]
>

Mutations in grouped dataframes are also possible:

iex> df = Explorer.DataFrame.new(id: ["a", "a", "b"], b: [1, 2, 3])
iex> grouped = Explorer.DataFrame.group_by(df, :id)
iex> Explorer.DataFrame.mutate_with(grouped, &[count: Explorer.Series.count(&1["b"])])
#Explorer.DataFrame<
  Polars[3 x 3]
  Groups: ["id"]
  id string ["a", "a", "b"]
  b integer [1, 2, 3]
  count integer [2, 2, 1]
>

new(data, opts \\ [])

@spec new(
  Table.Reader.t() | series_pairs,
  opts :: Keyword.t()
) :: t()
when series_pairs:
       %{required(column_name()) => Explorer.Series.t()}
       | [{column_name(), Explorer.Series.t()}]

Creates a new dataframe.

Accepts any tabular data adhering to the Table.Reader protocol, as well as a map or a keyword list with series.

options
Options

backend - The Explorer backend to use. Defaults to the value returned by Explorer.Backend.get/0.

examples
Examples

From series:

iex> Explorer.DataFrame.new(%{floats: Explorer.Series.from_list([1.0, 2.0]), ints: Explorer.Series.from_list([1, nil])})
#Explorer.DataFrame<
  Polars[2 x 2]
  floats float [1.0, 2.0]
  ints integer [1, nil]
>

From columnar data:

iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, nil]})
#Explorer.DataFrame<
  Polars[2 x 2]
  floats float [1.0, 2.0]
  ints integer [1, nil]
>

iex> Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
#Explorer.DataFrame<
  Polars[2 x 2]
  floats float [1.0, 2.0]
  ints integer [1, nil]
>

iex> Explorer.DataFrame.new(%{floats: [1.0, 2.0], ints: [1, "wrong"]})
** (ArgumentError) cannot create series "ints": the value "wrong" does not match the inferred series dtype :integer

From row data:

iex> rows = [%{id: 1, name: "José"}, %{id: 2, name: "Christopher"}, %{id: 3, name: "Cristine"}]
iex> Explorer.DataFrame.new(rows)
#Explorer.DataFrame<
  Polars[3 x 2]
  id integer [1, 2, 3]
  name string ["José", "Christopher", "Cristine"]
>

iex> rows = [[id: 1, name: "José"], [id: 2, name: "Christopher"], [id: 3, name: "Cristine"]]
iex> Explorer.DataFrame.new(rows)
#Explorer.DataFrame<
  Polars[3 x 2]
  id integer [1, 2, 3]
  name string ["José", "Christopher", "Cristine"]
>

pivot_longer(df, columns_to_pivot, opts \\ [])

@spec pivot_longer(
  df :: t(),
  columns_to_pivot :: columns(),
  opts :: Keyword.t()
) :: t()

Pivot data from wide to long.

Explorer.DataFrame.pivot_longer/3 "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is Explorer.DataFrame.pivot_wider/4.

The second argument (columns_to_pivot) can be either an array of column names to pivot or a filter callback on the dataframe's names. These columns must always have the same data type.

options
Options

keep - Columns that are not in the list of pivot and should be kept in the dataframe. May be a filter callback on the dataframe's column names. Defaults to all columns except the ones to pivot.
drop - Columns that are not in the list of pivot and should be dropped from the dataframe. May be a filter callback on the dataframe's column names. This list of columns is going to be subtracted from the list of keep. Defaults to an empty list.
names_to - A string specifying the name of the column to create from the data stored in the column names of the dataframe. Defaults to "variable".
values_to - A string specifying the name of the column to create from the data stored in series element values. Defaults to "value".

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"))
#Explorer.DataFrame<
  Polars[3282 x 9]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
  variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
  value integer [627, 117, 332, 0, 0, ...]
>

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pivot_longer(df, &String.ends_with?(&1, "fuel"), keep: ["year", "country"])
#Explorer.DataFrame<
  Polars[3282 x 4]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", ...]
  value integer [627, 117, 332, 0, 0, ...]
>

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pivot_longer(df, ["total"], keep: ["year", "country"], drop: ["country"])
#Explorer.DataFrame<
  Polars[1094 x 3]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  variable string ["total", "total", "total", "total", "total", ...]
  value integer [2308, 1254, 32500, 141, 7924, ...]
>

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pivot_longer(df, ["total"], keep: [], names_to: "my_var", values_to: "my_value")
#Explorer.DataFrame<
  Polars[1094 x 2]
  my_var string ["total", "total", "total", "total", "total", ...]
  my_value integer [2308, 1254, 32500, 141, 7924, ...]
>

pivot_wider(df, names_from, values_from, opts \\ [])

@spec pivot_wider(
  df :: t(),
  names_from :: column(),
  values_from :: column(),
  opts :: Keyword.t()
) :: t()

Pivot data from long to wide.

Explorer.DataFrame.pivot_wider/4 "widens" data, increasing the number of columns and decreasing the number of rows. The inverse transformation is Explorer.DataFrame.pivot_longer/3.

Due to a restriction upstream, values_from must be a numeric type.

options
Options

id_columns - A set of columns that uniquely identifies each observation. Defaults to all columns in data except for the columns specified in names_from and values_from. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables. May accept a filter callback, a list or a range of column names. Default value is 0..-1. If an empty list is passed, or a range that results in a empty list of column names, it raises an error.
ID columns cannot be of the float type and attempting so will raise an error. If you need to use float columns as IDs, you must carefully consider rounding or truncating the column and converting it to integer, as long as doing so preserves the properties of the column.
names_prefix - String added to the start of every variable name. This is particularly useful if names_from is a numeric vector and you want to create syntactic variable names.

examples
Examples

iex> df = Explorer.DataFrame.new(id: [1, 1], variable: ["a", "b"], value: [1, 2])
iex> Explorer.DataFrame.pivot_wider(df, "variable", "value")
#Explorer.DataFrame<
  Polars[1 x 3]
  id integer [1]
  a integer [1]
  b integer [2]
>

pull(df, column)

@spec pull(df :: t(), column :: column()) :: Explorer.Series.t()

Extracts a single column as a series.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pull(df, "total")
#Explorer.Series<
  integer[1094]
  [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
>

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pull(df, 2)
#Explorer.Series<
  integer[1094]
  [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
>

rename(df, names)

@spec rename(
  df :: t(),
  names :: column_names() | column_pairs(column_name())
) :: t()

Renames columns.

To apply a function to a subset of columns, see rename_with/3.

examples
Examples

You can pass in a list of new names:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
iex> Explorer.DataFrame.rename(df, ["c", "d"])
#Explorer.DataFrame<
  Polars[3 x 2]
  c string ["a", "b", "a"]
  d integer [1, 3, 1]
>

Or you can rename individual columns using keyword args:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
iex> Explorer.DataFrame.rename(df, a: "first")
#Explorer.DataFrame<
  Polars[3 x 2]
  first string ["a", "b", "a"]
  b integer [1, 3, 1]
>

Or you can rename individual columns using a map:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "a"], b: [1, 3, 1])
iex> Explorer.DataFrame.rename(df, %{"a" => "first"})
#Explorer.DataFrame<
  Polars[3 x 2]
  first string ["a", "b", "a"]
  b integer [1, 3, 1]
>

rename_with(df, columns \\ 0..-1//1, callback)

@spec rename_with(df :: t(), columns :: columns(), callback :: function()) :: t()

Renames columns with a function.

examples
Examples

If no columns are specified, it will apply the function to all column names:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.rename_with(df, &String.upcase/1)
#Explorer.DataFrame<
  Polars[1094 x 10]
  YEAR integer [2010, 2010, 2010, 2010, 2010, ...]
  COUNTRY string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
  SOLID_FUEL integer [627, 117, 332, 0, 0, ...]
  LIQUID_FUEL integer [1601, 953, 12381, 141, 3649, ...]
  GAS_FUEL integer [74, 7, 14565, 0, 374, ...]
  CEMENT integer [5, 177, 2598, 0, 204, ...]
  GAS_FLARING integer [0, 0, 2623, 0, 3697, ...]
  PER_CAPITA float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  BUNKER_FUELS integer [9, 7, 663, 0, 321, ...]
>

A callback can be used to filter the column names that will be renamed, similarly to select/3:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.rename_with(df, &String.ends_with?(&1, "_fuel"), &String.trim_trailing(&1, "_fuel"))
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid integer [627, 117, 332, 0, 0, ...]
  liquid integer [1601, 953, 12381, 141, 3649, ...]
  gas integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

Or you can just pass in the list of column names you'd like to apply the function to:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.rename_with(df, ["total", "cement"], &String.upcase/1)
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  TOTAL integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  CEMENT integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

select(df, columns_or_column, keep_or_drop \\ :keep)

@spec select(
  df :: t(),
  column() | columns(),
  keep_or_drop :: :keep | :drop
) :: t()

Selects a subset of columns by name.

Can optionally return all but the named columns if :drop is passed as the last argument.

examples
Examples

You can select a single column:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, "a")
#Explorer.DataFrame<
  Polars[3 x 1]
  a string ["a", "b", "c"]
>

Or a list of names:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, ["a"])
#Explorer.DataFrame<
  Polars[3 x 1]
  a string ["a", "b", "c"]
>

You can also use a range or a list of integers:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
iex> Explorer.DataFrame.select(df, [0, 1])
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
>

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
iex> Explorer.DataFrame.select(df, 0..1)
#Explorer.DataFrame<
  Polars[3 x 2]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
>

Or you can use a callback function that takes the dataframe's names as its first argument:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, &String.starts_with?(&1, "b"))
#Explorer.DataFrame<
  Polars[3 x 1]
  b integer [1, 2, 3]
>

Or a callback function that takes names and types:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, fn _name, type -> type == :integer end)
#Explorer.DataFrame<
  Polars[3 x 1]
  b integer [1, 2, 3]
>

If you pass :drop as the third argument, it will return all but the named columns:

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, ["b"], :drop)
#Explorer.DataFrame<
  Polars[3 x 1]
  a string ["a", "b", "c"]
>

iex> df = Explorer.DataFrame.new(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
iex> Explorer.DataFrame.select(df, ["a", "b"], :drop)
#Explorer.DataFrame<
  Polars[3 x 1]
  c integer [4, 5, 6]
>

summarise(df, columns)

@spec summarise(df :: t(), columns :: Keyword.t() | map()) :: t()

Summarise each group to a single row.

Implicitly ungroups.

supported-operations
Supported operations

The following aggregations may be performed:

:min - Take the minimum value within the group. See Explorer.Series.min/1.
:max - Take the maximum value within the group. See Explorer.Series.max/1.
:sum - Take the sum of the series within the group. See Explorer.Series.sum/1.
:mean - Take the mean of the series within the group. See Explorer.Series.mean/1.
:median - Take the median of the series within the group. See Explorer.Series.median/1.
:first - Take the first value within the group. See Explorer.Series.first/1.
:last - Take the last value within the group. See Explorer.Series.last/1.
:count - Count the number of rows per group.
:n_distinct - Count the number of unique rows per group.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df |> Explorer.DataFrame.group_by("year") |> Explorer.DataFrame.summarise(total: [:max, :min], country: [:n_distinct])
#Explorer.DataFrame<
  Polars[5 x 4]
  year integer [2010, 2011, 2012, 2013, 2014]
  total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
  total_min integer [1, 2, 2, 2, 3]
  country_n_distinct integer [217, 217, 220, 220, 220]
>

summarise_with(df, fun)

@spec summarise_with(
  df :: t(),
  callback ::
    (Explorer.Backend.LazyFrame.t() -> column_pairs(Explorer.Series.lazy_t()))
) :: t()

Summarise each group to a single row using a callback function.

Implicitly ungroups. The main difference between summarise/2 and summarise_with/2 is that the later accepts a function that can be used to perform complex operations. This is efficient because it doesn't need to create intermediate series representations to summarise.

supported-operations
Supported operations

The function callback should be in the form of [name_of_col: "operation"], where "operation" is one of the Explorer.Series functions. It's required that at least one of the following functions is used for summarisation:

Explorer.Series.min/1 - Take the minimum value within the group.
Explorer.Series.max/1 - Take the maximum value within the group.
Explorer.Series.sum/1 - Take the sum of the series within the group.
Explorer.Series.mean/1 - Take the mean of the series within the group.
Explorer.Series.median/1 - Take the median of the series within the group.
Explorer.Series.first/1 - Take the first value within the group.
Explorer.Series.last/1 - Take the last value within the group.
Explorer.Series.count/1 - Count the number of rows per group.
Explorer.Series.n_distinct/1 - Count the number of unique rows per group.

examples
Examples

iex> alias Explorer.{DataFrame, Series}
iex> df = Explorer.Datasets.fossil_fuels() |> DataFrame.group_by("year")
iex> DataFrame.summarise_with(df, &[total_max: Series.max(&1["total"]), countries: Series.n_distinct(&1["country"])])
#Explorer.DataFrame<
  Polars[5 x 3]
  year integer [2010, 2011, 2012, 2013, 2014]
  total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
  countries integer [217, 217, 220, 220, 220]
>

table(df, opts \\ [])

@spec table(df :: t(), opts :: Keyword.t()) :: :ok

Display the DataFrame in a tabular fashion.

examples
Examples

df = Explorer.Datasets.iris() Explorer.DataFrame.table(df) Explorer.DataFrame.table(df, limit: 1) Explorer.DataFrame.table(df, limit: :infinity)

to_columns(df, opts \\ [])

@spec to_columns(df :: t(), Keyword.t()) :: map()

Converts a dataframe to a list of columns with lists as values.

See to_series/2 if you want a list of columns with series as values.

options
Options

:atom_keys - Configure if the resultant map should have atom keys. (default: false)

examples
Examples

iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
iex> Explorer.DataFrame.to_columns(df)
%{"floats" => [1.0, 2.0], "ints" => [1, nil]}

iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
iex> Explorer.DataFrame.to_columns(df, atom_keys: true)
%{floats: [1.0, 2.0], ints: [1, nil]}

to_lazy(df)

@spec to_lazy(df :: t()) :: t()

Converts the dataframe to the lazy version of the current backend.

If already lazy, this is a noop.

to_rows(df, opts \\ [])

@spec to_rows(df :: t(), Keyword.t()) :: [map()]

Converts a dataframe to a list of maps (rows).

Warning
This may be an expensive operation because polars stores data in columnar format.

options
Options

:atom_keys - Configure if the resultant maps should have atom keys. (default: false)

examples
Examples

iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
iex> Explorer.DataFrame.to_rows(df)
[%{"floats" => 1.0, "ints" => 1}, %{"floats" => 2.0 ,"ints" => nil}]

iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, nil])
iex> Explorer.DataFrame.to_rows(df, atom_keys: true)
[%{floats: 1.0, ints: 1}, %{floats: 2.0, ints: nil}]

to_series(df, opts \\ [])

@spec to_series(df :: t(), Keyword.t()) :: map()

Converts a dataframe to a list of columns with series as values.

See to_columns/2 if you want a list of columns with lists as values.

options
Options

:atom_keys - Configure if the resultant map should have atom keys. (default: false)

examples
Examples

iex> df = Explorer.DataFrame.new(ints: [1, nil], floats: [1.0, 2.0])
iex> map = Explorer.DataFrame.to_series(df)
iex> Explorer.Series.to_list(map["floats"])
[1.0, 2.0]
iex> Explorer.Series.to_list(map["ints"])
[1, nil]

ungroup(df, groups \\ :all)

@spec ungroup(df :: t(), groups_or_group :: column_names() | column_name() | :all) ::
  t()

Removes grouping variables.

Accepts a list of group names. If groups is not specified, then all groups are removed.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
iex> Explorer.DataFrame.ungroup(df, ["country"])
#Explorer.DataFrame<
  Polars[1094 x 10]
  Groups: ["year"]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

iex> df = Explorer.Datasets.fossil_fuels()
iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
iex> Explorer.DataFrame.ungroup(df)
#Explorer.DataFrame<
  Polars[1094 x 10]
  year integer [2010, 2010, 2010, 2010, 2010, ...]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
  total integer [2308, 1254, 32500, 141, 7924, ...]
  solid_fuel integer [627, 117, 332, 0, 0, ...]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, ...]
  gas_fuel integer [74, 7, 14565, 0, 374, ...]
  cement integer [5, 177, 2598, 0, 204, ...]
  gas_flaring integer [0, 0, 2623, 0, 3697, ...]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, ...]
  bunker_fuels integer [9, 7, 663, 0, 321, ...]
>

Link to this section Functions: Multi-table

concat_columns(dfs)

@spec concat_columns([t()]) :: t()

Combine two or more dataframes column-wise.

Dataframes must have the same number of rows.

examples
Examples

iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
iex> df2 = Explorer.DataFrame.new(z: [4, 5, 6], a: ["d", "e", "f"])
iex> Explorer.DataFrame.concat_columns([df1, df2])
#Explorer.DataFrame<
  Polars[3 x 4]
  x integer [1, 2, 3]
  y string ["a", "b", "c"]
  z integer [4, 5, 6]
  a string ["d", "e", "f"]
>

Conflicting names are suffixed with the index of the dataframe in the array:

iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], a: ["d", "e", "f"])
iex> Explorer.DataFrame.concat_columns([df1, df2])
#Explorer.DataFrame<
  Polars[3 x 4]
  x integer [1, 2, 3]
  y string ["a", "b", "c"]
  x_1 integer [4, 5, 6]
  a string ["d", "e", "f"]
>

concat_columns(df1, df2)

@spec concat_columns(t(), t()) :: t()

Combine two dataframes column-wise.

concat_columns(df1, df2) is equivalent to concat_columns([df1, df2]).

concat_rows(dfs)

@spec concat_rows([t()]) :: t()

Combine two or more dataframes row-wise (stack).

Column names and dtypes must match. The only exception is for numeric columns that can be mixed together, and casted automatically to float columns.

examples
Examples

iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
iex> df2 = Explorer.DataFrame.new(x: [4, 5, 6], y: ["d", "e", "f"])
iex> Explorer.DataFrame.concat_rows([df1, df2])
#Explorer.DataFrame<
  Polars[6 x 2]
  x integer [1, 2, 3, 4, 5, ...]
  y string ["a", "b", "c", "d", "e", ...]
>

iex> df1 = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
iex> df2 = Explorer.DataFrame.new(x: [4.2, 5.3, 6.4], y: ["d", "e", "f"])
iex> Explorer.DataFrame.concat_rows([df1, df2])
#Explorer.DataFrame<
  Polars[6 x 2]
  x float [1.0, 2.0, 3.0, 4.2, 5.3, ...]
  y string ["a", "b", "c", "d", "e", ...]
>

concat_rows(df1, df2)

@spec concat_rows(t(), t()) :: t()

Combine two dataframes row-wise.

concat_rows(df1, df2) is equivalent to concat_rows([df1, df2]).

join(left, right, opts \\ [])

@spec join(left :: t(), right :: t(), opts :: Keyword.t()) :: t()

Join two tables.

join-types
Join types

inner - Returns all rows from left where there are matching values in right, and all columns from left and right.
left - Returns all rows from left and all columns from left and right. Rows in left with no match in right will have nil values in the new columns.
right - Returns all rows from right and all columns from left and right. Rows in right with no match in left will have nil values in the new columns.
outer - Returns all rows and all columns from both left and right. Where there are not matching values, returns nil for the one missing.
cross - Also known as a cartesian join. Returns all combinations of left and right. Can be very computationally expensive.

options
Options

on - The columns to join on. Defaults to overlapping columns. Does not apply to cross join.
how - One of the join types (as an atom) described above. Defaults to :inner.

examples
Examples

Inner join:

iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right)
#Explorer.DataFrame<
  Polars[3 x 3]
  a integer [1, 2, 2]
  b string ["a", "b", "b"]
  c string ["d", "e", "f"]
>

Left join:

iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.new(a: [1, 2, 2], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :left)
#Explorer.DataFrame<
  Polars[4 x 3]
  a integer [1, 2, 2, 3]
  b string ["a", "b", "b", "c"]
  c string ["d", "e", "f", nil]
>

Right join:

iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :right)
#Explorer.DataFrame<
  Polars[3 x 3]
  a integer [1, 2, 4]
  c string ["d", "e", "f"]
  b string ["a", "b", nil]
>

Outer join:

iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :outer)
#Explorer.DataFrame<
  Polars[4 x 3]
  a integer [1, 2, 4, 3]
  b string ["a", "b", nil, "c"]
  c string ["d", "e", "f", nil]
>

Cross join:

iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.new(a: [1, 2, 4], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :cross)
#Explorer.DataFrame<
  Polars[9 x 4]
  a integer [1, 1, 1, 2, 2, ...]
  b string ["a", "a", "a", "b", "b", ...]
  a_right integer [1, 2, 4, 1, 2, ...]
  c string ["d", "e", "f", "d", "e", ...]
>

Inner join with different names:

iex> left = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.new(d: [1, 2, 2], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, on: [{"a", "d"}])
#Explorer.DataFrame<
  Polars[3 x 3]
  a integer [1, 2, 2]
  b string ["a", "b", "b"]
  c string ["d", "e", "f"]
>

Link to this section Functions: Row-based

head(df, nrows \\ 5)

@spec head(df :: t(), nrows :: integer()) :: t()

Returns the first n rows of the dataframe.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.head(df)
#Explorer.DataFrame<
  Polars[5 x 10]
  year integer [2010, 2010, 2010, 2010, 2010]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA"]
  total integer [2308, 1254, 32500, 141, 7924]
  solid_fuel integer [627, 117, 332, 0, 0]
  liquid_fuel integer [1601, 953, 12381, 141, 3649]
  gas_fuel integer [74, 7, 14565, 0, 374]
  cement integer [5, 177, 2598, 0, 204]
  gas_flaring integer [0, 0, 2623, 0, 3697]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37]
  bunker_fuels integer [9, 7, 663, 0, 321]
>

sample(df, n_or_frac, opts \\ [])

@spec sample(df :: t(), n_or_frac :: number(), opts :: Keyword.t()) :: t()

Sample rows from a dataframe.

If given an integer as the second argument, it will return N samples. If given a float, it will return that proportion of the series.

Can sample with or without replacement.

options
Options

replacement - If set to true, each sample will be independent and therefore values may repeat. Required to be true for n greater then the number of rows in the dataframe or frac > 1.0. (default: false)
seed - An integer to be used as a random seed. If nil, a random value between 1 and 1e12 will be used. (default: nil)

examples
Examples

You can sample N rows:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.sample(df, 3, seed: 100)
#Explorer.DataFrame<
  Polars[3 x 10]
  year integer [2012, 2012, 2013]
  country string ["ZIMBABWE", "NICARAGUA", "NIGER"]
  total integer [2125, 1260, 529]
  solid_fuel integer [917, 0, 93]
  liquid_fuel integer [1006, 1176, 432]
  gas_fuel integer [0, 0, 0]
  cement integer [201, 84, 4]
  gas_flaring integer [0, 0, 0]
  per_capita float [0.15, 0.21, 0.03]
  bunker_fuels integer [9, 18, 19]
>

Or you can sample a proportion of rows:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.sample(df, 0.03, seed: 100)
#Explorer.DataFrame<
  Polars[33 x 10]
  year integer [2013, 2012, 2013, 2012, 2010, ...]
  country string ["BAHAMAS", "POLAND", "SLOVAKIA", "MOZAMBIQUE", "OMAN", ...]
  total integer [764, 81792, 9024, 851, 12931, ...]
  solid_fuel integer [1, 53724, 3657, 11, 0, ...]
  liquid_fuel integer [763, 17353, 2090, 632, 2331, ...]
  gas_fuel integer [0, 8544, 2847, 47, 9309, ...]
  cement integer [0, 2165, 424, 161, 612, ...]
  gas_flaring integer [0, 6, 7, 0, 679, ...]
  per_capita float [2.02, 2.12, 1.67, 0.03, 4.39, ...]
  bunker_fuels integer [167, 573, 34, 56, 1342, ...]
>

slice(df, row_indices)

Subset rows with a list of indices or a range.

examples
Examples

iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> Explorer.DataFrame.slice(df, [0, 2])
#Explorer.DataFrame<
  Polars[2 x 2]
  a integer [1, 3]
  b string ["a", "c"]
>

iex> df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["a", "b", "c"])
iex> Explorer.DataFrame.slice(df, 1..2)
#Explorer.DataFrame<
  Polars[2 x 2]
  a integer [2, 3]
  b string ["b", "c"]
>

slice(df, offset, length)

Subset a continuous set of rows.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.slice(df, 1, 2)
#Explorer.DataFrame<
  Polars[2 x 10]
  year integer [2010, 2010]
  country string ["ALBANIA", "ALGERIA"]
  total integer [1254, 32500]
  solid_fuel integer [117, 332]
  liquid_fuel integer [953, 12381]
  gas_fuel integer [7, 14565]
  cement integer [177, 2598]
  gas_flaring integer [0, 2623]
  per_capita float [0.43, 0.9]
  bunker_fuels integer [7, 663]
>

Negative offsets count from the end of the series:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.slice(df, -10, 2)
#Explorer.DataFrame<
  Polars[2 x 10]
  year integer [2014, 2014]
  country string ["UNITED STATES OF AMERICA", "URUGUAY"]
  total integer [1432855, 1840]
  solid_fuel integer [450047, 2]
  liquid_fuel integer [576531, 1700]
  gas_fuel integer [390719, 25]
  cement integer [11314, 112]
  gas_flaring integer [4244, 0]
  per_capita float [4.43, 0.54]
  bunker_fuels integer [30722, 251]
>

If the length would run past the end of the dataframe, the result may be shorter than the length:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.slice(df, -10, 20)
#Explorer.DataFrame<
  Polars[10 x 10]
  year integer [2014, 2014, 2014, 2014, 2014, ...]
  country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA", ...]
  total integer [1432855, 1840, 28692, 42, 50510, ...]
  solid_fuel integer [450047, 2, 1677, 0, 204, ...]
  liquid_fuel integer [576531, 1700, 2086, 42, 28445, ...]
  gas_fuel integer [390719, 25, 23929, 0, 12731, ...]
  cement integer [11314, 112, 1000, 0, 1088, ...]
  gas_flaring integer [4244, 0, 0, 0, 8042, ...]
  per_capita float [4.43, 0.54, 0.97, 0.16, 1.65, ...]
  bunker_fuels integer [30722, 251, 0, 10, 1256, ...]
>

tail(df, nrows \\ 5)

@spec tail(df :: t(), nrows :: integer()) :: t()

Returns the last n rows of the dataframe.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.tail(df)
#Explorer.DataFrame<
  Polars[5 x 10]
  year integer [2014, 2014, 2014, 2014, 2014]
  country string ["VIET NAM", "WALLIS AND FUTUNA ISLANDS", "YEMEN", "ZAMBIA", "ZIMBABWE"]
  total integer [45517, 6, 6190, 1228, 3278]
  solid_fuel integer [19246, 0, 137, 132, 2097]
  liquid_fuel integer [12694, 6, 5090, 797, 1005]
  gas_fuel integer [5349, 0, 581, 0, 0]
  cement integer [8229, 0, 381, 299, 177]
  gas_flaring integer [0, 0, 0, 0, 0]
  per_capita float [0.49, 0.44, 0.24, 0.08, 0.22]
  bunker_fuels integer [761, 1, 153, 33, 9]
>

Link to this section Functions: Introspection

dtypes(df)

@spec dtypes(df :: t()) :: %{required(String.t()) => atom()}

Gets the dtypes of the dataframe columns.

examples
Examples

iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
iex> Explorer.DataFrame.dtypes(df)
%{"floats" => :float, "ints" => :integer}

groups(data_frame)

@spec groups(df :: t()) :: [String.t()]

Returns the groups of a dataframe.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df = Explorer.DataFrame.group_by(df, "country")
iex> Explorer.DataFrame.groups(df)
["country"]

n_columns(df)

@spec n_columns(df :: t()) :: integer()

Returns the number of columns in the dataframe.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.n_columns(df)
10

n_rows(df)

@spec n_rows(df :: t()) :: integer()

Returns the number of rows in the dataframe.

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.n_rows(df)
1094

names(df)

@spec names(df :: t()) :: [String.t()]

Gets the names of the dataframe columns.

examples
Examples

iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0], ints: [1, 2])
iex> Explorer.DataFrame.names(df)
["floats", "ints"]

shape(df)

@spec shape(df :: t()) :: {integer(), integer()}

Gets the shape of the dataframe as a {height, width} tuple.

examples
Examples

iex> df = Explorer.DataFrame.new(floats: [1.0, 2.0, 3.0], ints: [1, 2, 3])
iex> Explorer.DataFrame.shape(df)
{3, 2}

Link to this section Functions: IO

dump_csv(df, opts \\ [])

@spec dump_csv(df :: t(), opts :: Keyword.t()) :: String.t()

Writes a dataframe to a binary representation of a delimited file.

options
Options

header - Should the column names be written as the first line of the file? (default: true)
delimiter - A single character used to separate fields within a record. (default: ",")

examples
Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df |> Explorer.DataFrame.head(2) |> Explorer.DataFrame.dump_csv()
"year,country,total,solid_fuel,liquid_fuel,gas_fuel,cement,gas_flaring,per_capita,bunker_fuels\n2010,AFGHANISTAN,2308,627,1601,74,5,0,0.08,9\n2010,ALBANIA,1254,117,953,7,177,0,0.43,7\n"

from_csv!(filename, opts \\ [])

@spec from_csv!(filename :: String.t(), opts :: Keyword.t()) :: t()

Similar to from_csv/2 but raises if there is a problem reading the CSV.

from_csv(filename, opts \\ [])

@spec from_csv(filename :: String.t(), opts :: Keyword.t()) ::
  {:ok, t()} | {:error, term()}

Reads a delimited file into a dataframe.

If the CSV is compressed, it is automatically decompressed.

options
Options

delimiter - A single character used to separate fields within a record. (default: ",")
dtypes - A list/map of {"column_name", dtype} tuples. Any non-specified column has its type imputed from the first 1000 rows. (default: [])
header - Does the file have a header of column names as the first row or not? (default: true)
max_rows - Maximum number of lines to read. (default: nil)
null_character - The string that should be interpreted as a nil value. (default: "NA")
skip_rows - The number of lines to skip at the beginning of the file. (default: 0)
columns - A list of column names or indexes to keep. If present, only these columns are read into the dataframe. (default: nil)
infer_schema_length Maximum number of rows read for schema inference. Setting this to nil will do a full table scan and will be slow (default: 1000).
parse_dates - Automatically try to parse dates/ datetimes and time. If parsing fails, columns remain of dtype string

from_ipc!(filename, opts \\ [])

@spec from_ipc!(filename :: String.t(), opts :: Keyword.t()) :: t()

Similar to from_ipc/2 but raises if there is a problem reading the IPC file.

from_ipc(filename, opts \\ [])

Reads an IPC file into a dataframe.

options
Options

columns - List with the name or index of columns to be selected. Defaults to all columns.

from_ndjson(filename, opts \\ [])

@spec from_ndjson(filename :: String.t(), opts :: Keyword.t()) ::
  {:ok, t()} | {:error, term()}

Read a file of JSON objects or lists separated by new lines

options
Options

batch_size - Sets the batch size for reading rows. This value may have significant impact in performance, so adjust it for your needs (default: 1000).
infer_schema_length - Maximum number of rows read for schema inference. Setting this to nil will do a full table scan and will be slow (default: 1000).

from_parquet(filename, opts \\ [])

@spec from_parquet(filename :: String.t(), opts :: Keyword.t()) ::
  {:ok, t()} | {:error, term()}

Reads a parquet file into a dataframe.

to_csv!(df, filename, opts \\ [])

@spec to_csv!(df :: t(), filename :: String.t(), opts :: Keyword.t()) :: String.t()

Similar to to_csv/3 but raises if there is a problem reading the CSV.

to_csv(df, filename, opts \\ [])

@spec to_csv(df :: t(), filename :: String.t(), opts :: Keyword.t()) ::
  {:ok, String.t()} | {:error, term()}

Writes a dataframe to a delimited file.

options
Options

header - Should the column names be written as the first line of the file? (default: true)
delimiter - A single character used to separate fields within a record. (default: ",")

to_ipc(df, filename, opts \\ [])

Writes a dataframe to a IPC file.

Apache IPC is a language-agnostic columnar data structure that can be used to store data frames. It excels as a format for quickly exchange data between different programming languages.

options
Options

compression - Sets the algorithm used to compress the IPC file. It accepts :zstd or :lz4 compression. (default: nil)

to_ndjson(df, filename)

@spec to_ndjson(df :: t(), filename :: String.t()) ::
  {:ok, String.t()} | {:error, term()}

Writes a dataframe to a ndjson file.

to_parquet(df, filename, opts \\ [])

Writes a dataframe to a parquet file.

options
Options

compression - The compression algorithm to use when writing files. Where a compression level is available, this can be passed as a tuple, such as {:zstd, 3}. Supported options are:
- nil (uncompressed, default)
- :snappy
- :gzip (with levels 1-9)
- :brotli (with levels 1-11)
- :zstd (with levels -7-22)
- :lz4raw.

Link to this section Functions

filter(df, callback)

@spec filter(df :: t(), callback :: function()) :: t()

Settings View Source Explorer.DataFrame (Explorer v0.3.1)

creating-dataframes Creating dataframes

verbs Verbs

single-table-verbs Single table verbs

multiple-table-verbs Multiple table verbs

row-based-verbs Row-based verbs

io IO

access Access

Link to this section Summary

Types

Functions: Single-table

Functions: Multi-table

Functions: Row-based

Functions: Introspection

Functions: IO

Functions

Link to this section Types

column()

column_name()

column_names()

column_pairs(value)

columns()

t()

Link to this section Functions: Single-table

arrange(df, columns_or_column)

examples Examples

arrange_with(df, fun)

examples Examples

collect(df)

distinct(df, columns \\ 0..-1//1, opts \\ [])

options Options

examples Examples

drop_nil(df, columns_or_column \\ 0..-1)

examples Examples

dummies(df, columns_or_column)

examples Examples

filter_with(df, fun)

examples Examples

group_by(df, groups)

examples Examples

mask(df, mask)

examples Examples

mutate(df, columns)

examples Examples

mutate_with(df, fun)

examples Examples

new(data, opts \\ [])

options Options

examples Examples

pivot_longer(df, columns_to_pivot, opts \\ [])

options Options

examples Examples

pivot_wider(df, names_from, values_from, opts \\ [])

options Options

examples Examples

pull(df, column)

examples Examples

rename(df, names)

examples Examples

rename_with(df, columns \\ 0..-1//1, callback)

examples Examples

select(df, columns_or_column, keep_or_drop \\ :keep)

examples Examples

summarise(df, columns)

supported-operations Supported operations

examples Examples

summarise_with(df, fun)

supported-operations Supported operations

examples Examples

table(df, opts \\ [])

examples Examples

to_columns(df, opts \\ [])

options Options

examples Examples

to_lazy(df)

to_rows(df, opts \\ [])

Warning

options Options

examples Examples

to_series(df, opts \\ [])

View Source Explorer.DataFrame (Explorer v0.3.1)

creating-dataframes
Creating dataframes

verbs
Verbs

single-table-verbs
Single table verbs

multiple-table-verbs
Multiple table verbs

row-based-verbs
Row-based verbs

io
IO

access
Access

examples
Examples

examples
Examples

options
Options

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

options
Options

examples
Examples

options
Options

examples
Examples

options
Options

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

supported-operations
Supported operations

examples
Examples

supported-operations
Supported operations

examples
Examples

examples
Examples

options
Options

examples
Examples

options
Options

examples
Examples

options
Options

examples
Examples

examples
Examples

examples
Examples

examples
Examples

join-types
Join types

options
Options

examples
Examples

examples
Examples

options
Options

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

examples
Examples

options
Options

examples
Examples

options
Options

options
Options

options
Options

options
Options

options
Options

options
Options