Represents a grouped DataFrame, created by SparkEx.DataFrame.group_by/2.
Aggregation functions like agg/2 can be applied to produce a new DataFrame.
Convenience methods count/1, min/2, max/2, sum/2, avg/2, mean/2
apply common aggregations. pivot/3 enables pivot-style aggregation.
Note: convenience numeric aggregation methods (sum/1, avg/1, min/1, max/1, mean/1)
called without explicit columns make an eager schema RPC to discover numeric columns.
The schema is cached per process so repeated calls on the same GroupedData avoid
redundant RPCs. Pass explicit column names to skip the schema lookup entirely.
Summary
Functions
Applies aggregate expressions to the grouped data, returning a new DataFrame.
Computes the average for each group.
Counts the number of records for each group.
Computes the maximum value for each group.
Computes the mean (alias for avg) for each group.
Computes the minimum value for each group.
Pivots on a column, enabling pivot-style aggregation.
Computes the sum for each group.
Types
@type t() :: %SparkEx.GroupedData{ cached_schema: term() | nil, group_type: atom(), grouping_exprs: [SparkEx.Column.expr()], grouping_sets: [[SparkEx.Column.expr()]] | nil, pivot_col: SparkEx.Column.expr() | nil, pivot_values: [term()] | nil, plan: term(), session: GenServer.server() }
Functions
@spec agg(t(), [SparkEx.Column.t()] | map()) :: SparkEx.DataFrame.t()
Applies aggregate expressions to the grouped data, returning a new DataFrame.
Accepts a list of SparkEx.Column structs representing aggregate expressions
(e.g. Functions.sum(col("amount")), Functions.count(col("id"))).
Examples
import SparkEx.Functions
df
|> DataFrame.group_by(["department"])
|> SparkEx.GroupedData.agg([sum(col("salary")), avg(col("age"))])
@spec avg(t(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]) :: SparkEx.DataFrame.t()
Computes the average for each group.
@spec count(t(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]) :: SparkEx.DataFrame.t()
Counts the number of records for each group.
@spec max(t(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]) :: SparkEx.DataFrame.t()
Computes the maximum value for each group.
@spec mean(t(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]) :: SparkEx.DataFrame.t()
Computes the mean (alias for avg) for each group.
@spec min(t(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]) :: SparkEx.DataFrame.t()
Computes the minimum value for each group.
@spec pivot(t(), SparkEx.Column.t() | String.t(), [term()] | nil) :: t()
Pivots on a column, enabling pivot-style aggregation.
After calling pivot/3, use agg/2 to specify the aggregation.
Examples
df
|> DataFrame.group_by(["year"])
|> SparkEx.GroupedData.pivot("course", ["dotNET", "Java"])
|> SparkEx.GroupedData.agg([sum(col("earnings"))])
@spec sum(t(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]) :: SparkEx.DataFrame.t()
Computes the sum for each group.