SparkEx.Column (SparkEx v0.1.0)

Expression wrapper for Spark DataFrame columns.

A Column wraps an internal expression representation that gets encoded into Spark Connect protobuf Expression messages by the PlanEncoder.

Columns are created via SparkEx.Functions constructors (col/1, lit/1, expr/1) and combined using the operations defined here.

Examples

import SparkEx.Functions, only: [col: 1, lit: 1]

col("age") |> SparkEx.Column.gt(lit(18))
col("name") |> SparkEx.Column.alias_("user_name")
col("score") |> SparkEx.Column.desc()

Summary

Functions

alias_/3 - Assigns an alias (name) to this column expression.

and_/2 - Logical AND.

asc/1 - Sort ascending (nulls first by default).

asc_nulls_first/1 - Sort ascending with nulls first.

asc_nulls_last/1 - Sort ascending with nulls last.

between/3 - Returns true if the column value is between lower and upper (inclusive).

bitwise_not/1 - Bitwise NOT.

cast/2 - Casts the column to the given type.

contains/2 - String contains.

desc/1 - Sort descending (nulls last by default).

desc_nulls_first/1 - Sort descending with nulls first.

desc_nulls_last/1 - Sort descending with nulls last.

divide/2 - Division: col / other.

drop_fields/2 - Drops fields from a struct column.

ends_with/2 - String ends with.

eq/2 - Equality: col == other.

eq_null_safe/2 - Null-safe equality.

get_field/2 - Extracts a field from a struct column by name.

get_item/2 - Extracts a value from an array by index or from a map by key.

gt/2 - Greater than: col > other.

gte/2 - Greater than or equal: col >= other.

ilike/2 - Case-insensitive LIKE.

is_nan/1 - Returns true if the column is NaN.

is_not_null/1 - Returns true if the column is not null.

is_null/1 - Returns true if the column is null.

isin/2 - Returns true if the column value is in the given list of values or subquery DataFrame.

like/2 - SQL LIKE pattern match.

lt/2 - Less than: col < other.

lte/2 - Less than or equal: col <= other.

minus/2 - Subtraction: col - other.

multiply/2 - Multiplication: col * other.

negate/1 - Unary negation.

neq/2 - Not equal: col != other. Encodes as not(==(a, b)), matching PySpark.

not_/1 - Logical NOT.

or_/2 - Logical OR.

otherwise/2 - Adds a fallback value to a when_/2 expression chain.

outer/1 - Marks this column for lateral join / generator context.

over/2 - Defines a window specification for this column expression.

plus/2 - Addition: col + other.

pow/2 - Computes col raised to the given power.

rlike/2 - Regex pattern match.

starts_with/2 - String starts with.

substr/3 - Returns a substring starting at pos for len characters.

transform/2 - Applies a transformation function to this column.

try_cast/2 - Try-casts the column to the given type. Returns null on cast failure instead of raising an error.

when_/2 - Creates an initial when branch from a condition column and a value.

when_/3 - Appends another condition/value branch to an existing when chain.

with_field/3 - Adds or replaces a field in a struct column.

Types

expr()

@type expr() ::
  {:col, String.t()}
  | {:col, String.t(), term()}
  | {:lit, term()}
  | {:expr, String.t()}
  | {:col_regex, String.t()}
  | {:col_regex, String.t(), term()}
  | {:metadata_col, String.t()}
  | {:metadata_col, String.t(), term()}
  | {:fn, String.t(), [expr()], boolean()}
  | {:alias, expr(), String.t()}
  | {:alias, expr(), String.t(), String.t()}
  | {:sort_order, expr(), :asc | :desc, :nulls_first | :nulls_last | nil}
  | {:cast, expr(), String.t() | SparkEx.Types.data_type_proto()}
  | {:cast, expr(), String.t() | SparkEx.Types.data_type_proto(), :try}
  | {:star}
  | {:star, String.t()}
  | {:star, String.t() | nil, term()}
  | {:outer, expr()}
  | {:window, expr(), [expr()], [expr()], term()}
  | {:unresolved_extract_value, expr(), expr()}
  | {:update_fields, expr(), String.t(), expr() | nil}
  | {:lambda, expr(), [{:lambda_var, String.t()}]}
  | {:lambda_var, String.t()}
  | {:named_arg, String.t(), expr()}
  | {:call_function, String.t(), [expr()]}
  | {:subquery, atom(), term(), keyword()}

t()

@type t() :: %SparkEx.Column{expr: expr()}

Functions

alias_(col, name, opts \\ [])

@spec alias_(t(), String.t(), keyword()) :: t()

Assigns an alias (name) to this column expression.

Optionally accepts a :metadata option with a JSON-serializable map.
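A brief sketch of both forms, assuming the :metadata option accepts any JSON-serializable map as described above:

```elixir
import SparkEx.Functions, only: [col: 1]

# Plain alias
col("name") |> SparkEx.Column.alias_("user_name")

# Alias with attached metadata (JSON-serializable map)
col("name") |> SparkEx.Column.alias_("user_name", metadata: %{"comment" => "display name"})
```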

and_(left, right)

@spec and_(t(), t() | term()) :: t()

Logical AND.

asc(col)

@spec asc(t()) :: t()

Sort ascending (nulls first by default).

asc_nulls_first(col)

@spec asc_nulls_first(t()) :: t()

Sort ascending with nulls first.

asc_nulls_last(col)

@spec asc_nulls_last(t()) :: t()

Sort ascending with nulls last.

astype(col, type_str)

@spec astype(t(), String.t() | SparkEx.Types.data_type_proto()) :: t()

Alias for cast/2.

between(col, lower, upper)

@spec between(t(), term(), term()) :: t()

Returns true if the column value is between lower and upper (inclusive).

bitwise_and(left, right)

@spec bitwise_and(t(), t() | term()) :: t()

Bitwise AND.

bitwise_not(col)

@spec bitwise_not(t()) :: t()

Bitwise NOT.

bitwise_or(left, right)

@spec bitwise_or(t(), t() | term()) :: t()

Bitwise OR.

bitwise_xor(left, right)

@spec bitwise_xor(t(), t() | term()) :: t()

Bitwise XOR.

cast(col, type_str)

@spec cast(t(), String.t() | SparkEx.Types.data_type_proto()) :: t()

Casts the column to the given type.

The type can be a Spark SQL type string (e.g. "int", "string", "double") or a Spark Connect DataType protobuf struct.
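For the type-string form, a minimal sketch using standard Spark SQL type names:

```elixir
import SparkEx.Functions, only: [col: 1]

# Cast using Spark SQL type strings
col("age") |> SparkEx.Column.cast("int")
col("price") |> SparkEx.Column.cast("double")
```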

contains(left, right)

@spec contains(t(), t() | term()) :: t()

String contains.

desc(col)

@spec desc(t()) :: t()

Sort descending (nulls last by default).

desc_nulls_first(col)

@spec desc_nulls_first(t()) :: t()

Sort descending with nulls first.

desc_nulls_last(col)

@spec desc_nulls_last(t()) :: t()

Sort descending with nulls last.

divide(left, right)

@spec divide(t(), t() | term()) :: t()

Division: col / other.

drop_fields(col, field_names)

@spec drop_fields(t(), [String.t()]) :: t()

Drops fields from a struct column.

ends_with(left, right)

@spec ends_with(t(), t() | term()) :: t()

String ends with.

endswith(left, right)

@spec endswith(t(), t() | term()) :: t()

Alias for ends_with/2.

eq(left, right)

@spec eq(t(), t() | term()) :: t()

Equality: col == other.

eq_null_safe(left, right)

@spec eq_null_safe(t(), t() | term()) :: t()

Null-safe equality.

get_field(col, field_name)

@spec get_field(t(), String.t()) :: t()

Extracts a field from a struct column by name.

get_item(col, key)

@spec get_item(t(), t() | term()) :: t()

Extracts a value from an array by index or from a map by key.
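Assuming indices and keys are passed as plain terms (per the t() | term() spec), usage might look like:

```elixir
import SparkEx.Functions, only: [col: 1]

# Array element by index
col("tags") |> SparkEx.Column.get_item(0)

# Map value by key
col("attrs") |> SparkEx.Column.get_item("color")

# Struct field by name (see get_field/2)
col("address") |> SparkEx.Column.get_field("city")
```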

gt(left, right)

@spec gt(t(), t() | term()) :: t()

Greater than: col > other.

gte(left, right)

@spec gte(t(), t() | term()) :: t()

Greater than or equal: col >= other.

ilike(left, right)

@spec ilike(t(), t() | term()) :: t()

Case-insensitive LIKE.

is_nan(col)

@spec is_nan(t()) :: t()

Returns true if the column is NaN.

is_not_null(col)

@spec is_not_null(t()) :: t()

Returns true if the column is not null.

is_null(col)

@spec is_null(t()) :: t()

Returns true if the column is null.

isin(col, df)

@spec isin(t(), [term()] | SparkEx.DataFrame.t()) :: t()

Returns true if the column value is in the given list of values or subquery DataFrame.
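A sketch of both accepted argument shapes; allowed_df is a hypothetical single-column SparkEx.DataFrame:

```elixir
import SparkEx.Functions, only: [col: 1]

# Membership in a literal list of values
col("status") |> SparkEx.Column.isin(["active", "pending"])

# Membership in a subquery DataFrame (assumes allowed_df is bound elsewhere)
col("id") |> SparkEx.Column.isin(allowed_df)
```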

like(left, right)

@spec like(t(), t() | term()) :: t()

SQL LIKE pattern match.

lt(left, right)

@spec lt(t(), t() | term()) :: t()

Less than: col < other.

lte(left, right)

@spec lte(t(), t() | term()) :: t()

Less than or equal: col <= other.

minus(left, right)

@spec minus(t(), t() | term()) :: t()

Subtraction: col - other.

mod(left, right)

@spec mod(t(), t() | term()) :: t()

Modulo.

multiply(left, right)

@spec multiply(t(), t() | term()) :: t()

Multiplication: col * other.

name(col, alias_name)

@spec name(t(), String.t()) :: t()

Alias for alias_/2.

negate(col)

@spec negate(t()) :: t()

Unary negation.

neq(left, right)

@spec neq(t(), t() | term()) :: t()

Not equal: col != other. Encodes as not(==(a, b)), matching PySpark.

not_(col)

@spec not_(t()) :: t()

Logical NOT.

or_(left, right)

@spec or_(t(), t() | term()) :: t()

Logical OR.

otherwise(when_col, value)

@spec otherwise(t(), t() | term()) :: t()

Adds a fallback value to a when_/2 expression chain.
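Combining when_/2, when_/3, and otherwise/2 gives the usual CASE WHEN pattern; a sketch based on the specs above:

```elixir
import SparkEx.Functions, only: [col: 1, lit: 1]
alias SparkEx.Column

# CASE WHEN age < 13 THEN 'child' WHEN age < 20 THEN 'teen' ELSE 'adult' END
col("age")
|> Column.lt(lit(13))
|> Column.when_(lit("child"))
|> Column.when_(col("age") |> Column.lt(lit(20)), lit("teen"))
|> Column.otherwise(lit("adult"))
```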

outer(col)

@spec outer(t()) :: t()

Marks this column for lateral join / generator context.

over(col, spec)

@spec over(t(), SparkEx.WindowSpec.t()) :: t()

Defines a window specification for this column expression.

Examples

w = SparkEx.Window.partition_by(["dept"]) |> SparkEx.WindowSpec.order_by(["salary"])
SparkEx.Functions.row_number() |> SparkEx.Column.over(w)

plus(left, right)

@spec plus(t(), t() | term()) :: t()

Addition: col + other.

pow(col, other)

@spec pow(t(), t() | term()) :: t()

Computes col raised to the given power.

power(col, other)

@spec power(t(), t() | term()) :: t()

Alias for pow/2.

rlike(left, right)

@spec rlike(t(), t() | term()) :: t()

Regex pattern match.

starts_with(left, right)

@spec starts_with(t(), t() | term()) :: t()

String starts with.

startswith(left, right)

@spec startswith(t(), t() | term()) :: t()

Alias for starts_with/2.

substr(col, pos, len)

@spec substr(t(), t() | integer(), t() | integer()) :: t()

Returns a substring starting at pos for len characters.
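For example (note that Spark string positions are 1-based, as in Spark SQL's substr):

```elixir
import SparkEx.Functions, only: [col: 1]

# First three characters of the name column
col("name") |> SparkEx.Column.substr(1, 3)
```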

transform(col, func)

@spec transform(t(), (t() -> t())) :: t()

Applies a transformation function to this column.

Equivalent to PySpark's Column.transform(f), which delegates to the transform SQL function.

Examples

col("arr") |> Column.transform(fn x -> Column.plus(x, lit(1)) end)

try_cast(col, type_str)

@spec try_cast(t(), String.t() | SparkEx.Types.data_type_proto()) :: t()

Try-casts the column to the given type. Returns null on cast failure instead of raising an error.
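A minimal sketch contrasting it with cast/2:

```elixir
import SparkEx.Functions, only: [col: 1]

# For rows where raw_value is not a valid int (e.g. "abc"),
# cast/2 errors at runtime while try_cast/2 yields null instead
col("raw_value") |> SparkEx.Column.try_cast("int")
```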

when_(col, value)

@spec when_(t(), t() | term()) :: t()

Creates an initial when branch from a condition column and a value.

when_(column, condition, value)

@spec when_(t(), t(), t() | term()) :: t()

Appends another condition/value branch to an existing when chain.

with_field(col, field_name, value)

@spec with_field(t(), String.t(), t() | term()) :: t()

Adds or replaces a field in a struct column.
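Struct manipulation with with_field/3 and drop_fields/2 might look like this sketch, assuming address is a struct column:

```elixir
import SparkEx.Functions, only: [col: 1, lit: 1]
alias SparkEx.Column

# Add or replace a field inside a struct column
col("address") |> Column.with_field("country", lit("US"))

# Drop fields from a struct column
col("address") |> Column.drop_fields(["zip", "phone"])
```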