Expression wrapper for Spark DataFrame columns.
A Column wraps an internal expression representation that gets encoded
into Spark Connect protobuf Expression messages by the PlanEncoder.
Columns are created via SparkEx.Functions constructors (col/1, lit/1, expr/1)
and combined using the operations defined here.
Examples
import SparkEx.Functions, only: [col: 1, lit: 1]
col("age") |> SparkEx.Column.gt(lit(18))
col("name") |> SparkEx.Column.alias_("user_name")
col("score") |> SparkEx.Column.desc()
Summary
Functions
Assigns an alias (name) to this column expression.
Logical AND.
Sort ascending (nulls first by default).
Sort ascending with nulls first.
Sort ascending with nulls last.
Alias for cast/2.
Returns true if the column value is between lower and upper (inclusive).
Bitwise AND.
Bitwise NOT.
Bitwise OR.
Bitwise XOR.
Casts the column to the given type.
String contains.
Sort descending (nulls last by default).
Sort descending with nulls first.
Sort descending with nulls last.
Division: col / other.
Drops fields from a struct column.
String ends with.
Alias for ends_with/2.
Equality: col == other.
Null-safe equality.
Extracts a field from a struct column by name.
Extracts a value from an array by index or from a map by key.
Greater than: col > other.
Greater than or equal: col >= other.
Case-insensitive LIKE.
Returns true if the column is NaN.
Returns true if the column is not null.
Returns true if the column is null.
Returns true if the column value is in the given list of values or subquery DataFrame.
SQL LIKE pattern match.
Less than: col < other.
Less than or equal: col <= other.
Subtraction: col - other.
Modulo.
Multiplication: col * other.
Alias for alias_/2.
Unary negation.
Not equal: col != other. Encodes as not(==(a, b)), matching PySpark.
Logical NOT.
Logical OR.
Adds a fallback value to a when/2 expression chain.
Marks this column for lateral join / generator context.
Defines a window specification for this column expression.
Addition: col + other.
Computes col raised to the given power.
Alias for pow/2.
Regex pattern match.
String starts with.
Alias for starts_with/2.
Returns a substring starting at pos (1-based) for len characters.
Applies a transformation function to this column.
Try-casts the column to the given type. Returns null on cast failure instead of raising an error.
Creates an initial when branch from a condition column.
Appends another condition/value branch to an existing when chain.
Adds or replaces a field in a struct column.
Types
@type expr() ::
        {:col, String.t()}
        | {:col, String.t(), term()}
        | {:lit, term()}
        | {:expr, String.t()}
        | {:col_regex, String.t()}
        | {:col_regex, String.t(), term()}
        | {:metadata_col, String.t()}
        | {:metadata_col, String.t(), term()}
        | {:fn, String.t(), [expr()], boolean()}
        | {:alias, expr(), String.t()}
        | {:alias, expr(), String.t(), String.t()}
        | {:sort_order, expr(), :asc | :desc, :nulls_first | :nulls_last | nil}
        | {:cast, expr(), String.t() | SparkEx.Types.data_type_proto()}
        | {:cast, expr(), String.t() | SparkEx.Types.data_type_proto(), :try}
        | {:star}
        | {:star, String.t()}
        | {:star, String.t() | nil, term()}
        | {:outer, expr()}
        | {:window, expr(), [expr()], [expr()], term()}
        | {:unresolved_extract_value, expr(), expr()}
        | {:update_fields, expr(), String.t(), expr() | nil}
        | {:lambda, expr(), [{:lambda_var, String.t()}]}
        | {:lambda_var, String.t()}
        | {:named_arg, String.t(), expr()}
        | {:call_function, String.t(), [expr()]}
        | {:subquery, atom(), term(), keyword()}
@type t() :: %SparkEx.Column{expr: expr()}
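For orientation, here is how a simple comparison could be represented with the expr() variants above. This is a sketch: the ">" function name string and the exact shape produced by gt/2 are assumptions, not confirmed internals.

```elixir
# Hand-built expression tree roughly equivalent to
# col("age") |> SparkEx.Column.gt(lit(18)), assuming gt/2 lowers to the
# {:fn, name, args, distinct?} variant with the ">" function name:
age_gt_18 = {:fn, ">", [{:col, "age"}, {:lit, 18}], false}

# Wrapped in the struct form described by t():
column = %SparkEx.Column{expr: age_gt_18}
```

The PlanEncoder would then encode this tuple tree into a Spark Connect protobuf Expression message, per the module description above.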
Functions
Assigns an alias (name) to this column expression.
Optionally accepts a :metadata keyword option whose value is a JSON-serializable map.
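A sketch of aliasing with metadata, assuming the option is passed as a :metadata keyword as described above:

```elixir
import SparkEx.Functions, only: [col: 1]

# Rename the column and attach JSON-serializable metadata (option name
# follows the description above; treat it as an assumption):
col("age")
|> SparkEx.Column.alias_("years", metadata: %{"comment" => "age in whole years"})
```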
Logical AND.
Sort ascending (nulls first by default).
Sort ascending with nulls first.
Sort ascending with nulls last.
@spec astype(t(), String.t() | SparkEx.Types.data_type_proto()) :: t()
Alias for cast/2.
Returns true if the column value is between lower and upper (inclusive).
Bitwise AND.
Bitwise NOT.
Bitwise OR.
Bitwise XOR.
@spec cast(t(), String.t() | SparkEx.Types.data_type_proto()) :: t()
Casts the column to the given type.
The type can be a Spark SQL type string (e.g. "int", "string", "double")
or a Spark Connect DataType protobuf struct.
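A minimal sketch of both cast forms, using type strings from the examples above:

```elixir
import SparkEx.Functions, only: [col: 1]

# Cast using a Spark SQL type string:
col("age") |> SparkEx.Column.cast("double")

# astype/2 is the documented alias for cast/2:
col("age") |> SparkEx.Column.astype("string")
```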
String contains.
Sort descending (nulls last by default).
Sort descending with nulls first.
Sort descending with nulls last.
Division: col / other.
Drops fields from a struct column.
String ends with.
Alias for ends_with/2.
Equality: col == other.
Null-safe equality.
Extracts a field from a struct column by name.
Extracts a value from an array by index or from a map by key.
Greater than: col > other.
Greater than or equal: col >= other.
Case-insensitive LIKE.
Returns true if the column is NaN.
Returns true if the column is not null.
Returns true if the column is null.
@spec isin(t(), [term()] | SparkEx.DataFrame.t()) :: t()
Returns true if the column value is in the given list of values or subquery DataFrame.
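A sketch of both accepted argument shapes (`df` below is a hypothetical, already-built single-column SparkEx.DataFrame, not something defined here):

```elixir
import SparkEx.Functions, only: [col: 1]

# Membership in a literal list of values:
col("dept") |> SparkEx.Column.isin(["sales", "hr"])

# Membership in a subquery DataFrame (df is an assumed pre-existing
# SparkEx.DataFrame holding the candidate values):
col("dept") |> SparkEx.Column.isin(df)
```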
SQL LIKE pattern match.
Less than: col < other.
Less than or equal: col <= other.
Subtraction: col - other.
Modulo.
Multiplication: col * other.
Alias for alias_/2.
Unary negation.
Not equal: col != other. Encodes as not(==(a, b)), matching PySpark.
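Given the encoding rule above, a not-equal comparison lowers to a nested function-call tree. A sketch using the {:fn, ...} variant from expr() (the "not" and "==" name strings are assumptions inferred from the not(==(a, b)) description):

```elixir
# col("a") != lit(1) encodes roughly as not(==(a, 1)):
{:fn, "not", [{:fn, "==", [{:col, "a"}, {:lit, 1}], false}], false}
```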
Logical NOT.
Logical OR.
Adds a fallback value to a when/2 expression chain.
Marks this column for lateral join / generator context.
@spec over(t(), SparkEx.WindowSpec.t()) :: t()
Defines a window specification for this column expression.
Examples
import SparkEx.Functions, only: [col: 1]
w = SparkEx.Window.partition_by(["dept"]) |> SparkEx.WindowSpec.order_by(["salary"])
col("salary") |> SparkEx.Functions.row_number() |> SparkEx.Column.over(w)
Addition: col + other.
Computes col raised to the given power.
Alias for pow/2.
Regex pattern match.
String starts with.
Alias for starts_with/2.
Returns a substring starting at pos (1-based) for len characters.
Applies a transformation function to this column.
Equivalent to PySpark's Column.transform(f) which delegates to
the transform SQL function.
Examples
col("arr") |> Column.transform(fn x -> Column.plus(x, lit(1)) end)
@spec try_cast(t(), String.t() | SparkEx.Types.data_type_proto()) :: t()
Try-casts the column to the given type. Returns null on cast failure instead of raising an error.
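A sketch contrasting try_cast/2 with cast/2 on data that may not parse:

```elixir
import SparkEx.Functions, only: [col: 1]

# If raw_value holds e.g. "abc", try_cast yields null for that row
# instead of failing the query, per the description above:
col("raw_value") |> SparkEx.Column.try_cast("int")
```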
Creates an initial when branch from a condition column.
Appends another condition/value branch to an existing when chain.
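A sketch of a full conditional chain with a fallback via otherwise (the documented fallback from earlier in this page). Because `when` is a reserved word in Elixir, the qualified `Column.when` call form shown here is an assumption; the library may expose the entry point differently.

```elixir
import SparkEx.Functions, only: [col: 1, lit: 1]
alias SparkEx.Column

# Initial branch, one appended branch, then a fallback value:
Column.when(col("score") |> Column.gt(lit(89)), lit("A"))
|> Column.when(col("score") |> Column.gt(lit(79)), lit("B"))
|> Column.otherwise(lit("C"))
```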
Adds or replaces a field in a struct column.