SparkEx.Functions (SparkEx v0.1.0)

Expression constructors for Spark DataFrame operations.

Provides core constructors (col/1, lit/1, expr/1) and a comprehensive set of Spark SQL functions generated from a declarative registry.

These functions create SparkEx.Column structs that can be used in DataFrame transforms like select/2, filter/2, with_column/3, etc.

Examples

import SparkEx.Functions

df
|> SparkEx.DataFrame.select([col("name"), col("age")])
|> SparkEx.DataFrame.filter(col("age") |> SparkEx.Column.gt(lit(18)))

Summary

Functions

Computes the absolute value.

Computes inverse cosine.

Computes inverse hyperbolic cosine.

Adds months to date.

AES decrypts binary data.

AES encrypts binary data.

Aggregates elements in an array column using an initial value and a merge function.

Returns any value from the group. Optionally ignores null values.

Approximate count of distinct values.

Approximate percentile with accuracy parameter.

Creates array from columns.

Appends element to array.

Removes null values from array.

Checks if array contains value.

Removes duplicates from array.

Returns elements in first but not second array.

Inserts element at position in array.

Returns intersection of two arrays.

Joins array elements with delimiter.

Returns max element of array.

Returns min element of array.

Locates element in array (1-based).

Prepends element to array.

Removes all occurrences of element from array.

Creates array with element repeated n times.

Returns array size.

Sorts array in ascending order. Optional comparator function.

Returns union of two arrays.

Returns true if arrays have common elements.

Zips arrays into array of structs.

Sort ascending by the given column.

Sort ascending with nulls first.

Sort ascending with nulls last.

ASCII value of first character.

Computes inverse sine.

Computes inverse hyperbolic sine.

Raises error if condition is false.

Computes atan2(y, x). Both arguments can be columns or numeric values.

Computes inverse tangent.

Computes inverse hyperbolic tangent.

Computes average.

Base64 encodes binary.

Binary string representation of integer.

Bitwise AND aggregate.

Counts number of set bits.

Returns the value of the bit at the given position.

Returns bit length of string.

Bitwise OR aggregate.

Bitwise XOR aggregate.

Aggregate AND of bitmaps.

Returns bit position within a bitmap bucket.

Returns bitmap bucket number.

Constructs a bitmap from bit positions.

Counts set bits in a bitmap.

Aggregate OR of bitmaps.

Bitwise NOT (standalone function).

True if all values are true.

True if any value is true.

Returns a DataFrame with a broadcast hint for join optimization.

Banker's rounding to scale decimal places.

Trims characters from both sides.

Returns the bucket number for a value and number of buckets.

Calls a function with positional and named arguments.

Calls a registered UDF by name with the given column arguments.

Computes cube root.

Computes ceiling.

Alias for ceil/1.

Returns character from ASCII code.

Character length of string.

Alias for char_/1.

Returns first non-null value.

Creates a column reference by name.

Applies collation to string.

Returns collation of string column.

Collects values into list.

Collects distinct values into set.

Concatenates columns.

Concatenates with separator.

Returns true if string contains substring.

Converts number between bases.

Converts timestamp between timezones. 2-arg form uses session timezone as source.

Pearson correlation.

Computes cosine.

Computes hyperbolic cosine.

Computes cotangent.

Counts non-null values.

Counts distinct non-null values.

Counts rows where condition is true.

Creates a count-min sketch of a column with given eps, confidence, and seed.

Population covariance.

Sample covariance.

CRC32 hash.

Creates map from key-value column pairs.

Computes cosecant.

Cumulative distribution within partition.

Returns current catalog name.

Returns current database name.

Returns current date.

Returns current time.

Returns current timestamp.

Returns current timezone string.

Returns current user name.

Adds days to date.

Formats date/timestamp with pattern.

Creates date from days since epoch.

Subtracts days from date.

Truncates date to specified unit.

Difference in days between dates.

Extracts day.

Returns day name.

Alias for day/1.

Day of week (1=Sun).

Day of year.

Extracts days from an interval expression.

Decodes binary with charset.

Converts radians to degrees.

Dense rank within partition.

Sort descending by the given column.

Sort descending with nulls first.

Sort descending with nulls last.

e()

Returns Euler's number.

Returns element at index/key.

Returns the n-th input string.

Encodes string with charset.

Returns true if string ends with suffix.

Null-safe equality.

Alias for bool_and/1.

Returns true if any element in the array satisfies the predicate.

Computes exponential.

Creates a row for each array/map element.

Like explode but preserves nulls.

Computes exp(x) - 1.

Creates an expression from a SQL expression string.

Extracts date/time field.

Computes factorial.

Filters an array column using a predicate function.

Returns position of string in comma-delimited list.

Returns first value.

Flattens nested array.

Computes floor.

Returns true if all elements in the array satisfy the predicate.

Formats number with d decimal places.

printf-style formatting.

Decodes Avro binary using the provided JSON schema.

Parses a CSV string column into a struct column using the given schema.

Parses a JSON string column into a struct/array/map column using the given schema.

Decodes Protobuf binary using the provided message name and descriptor.

Converts unix timestamp to string using the given format (default "yyyy-MM-dd HH:mm:ss").

Converts UTC timestamp to timezone.

Parses an XML string column into a struct column using the given schema.

Returns element at index from array.

Extracts JSON object from path expression.

Returns greatest value.

Indicates whether column is aggregated in grouping set.

Grouping ID for grouping set.

Murmur3 hash of columns.

Hex string of integer/binary.

Computes histogram of column.

Aggregates values into an HLL sketch.

Estimates distinct count from an HLL sketch.

Unions two HLL sketches.

Aggregate union of HLL sketches.

Extracts hour.

Extracts hours from an interval expression.

Computes sqrt(a^2 + b^2).

Returns second value if first is null.

Case-insensitive LIKE. Optional escape character.

Title-cases string.

Explodes array of structs into columns.

Like inline but preserves nulls.

Length of current file block.

Start offset of current file block.

Name of file being read.

Position of first occurrence of substr.

Returns true if string is valid UTF-8.

Spark 3.5-compatible fallback for variant null checks.

True if NaN.

True if not null.

True if null.

Calls a JVM method.

Returns length of outermost JSON array.

Returns keys of outermost JSON object.

Extracts fields from a JSON string column.

Aggregates bigint values into a KLL sketch.

Aggregates double values into a KLL sketch.

Aggregates float values into a KLL sketch.

Returns n (number of items) from a KLL sketch (bigint).

Returns n (number of items) from a KLL sketch (double).

Returns n (number of items) from a KLL sketch (float).

Gets quantile from a KLL sketch (bigint).

Gets quantile from a KLL sketch (double).

Gets quantile from a KLL sketch (float).

Gets rank from a KLL sketch (bigint).

Gets rank from a KLL sketch (double).

Gets rank from a KLL sketch (float).

Merges KLL sketches (bigint).

Merges KLL sketches (double).

Merges KLL sketches (float).

Converts a KLL sketch (bigint) to a string.

Converts a KLL sketch (double) to a string.

Converts a KLL sketch (float) to a string.

Kurtosis.

Value at offset rows before current.

Returns last value.

Last day of month for date.

Alias for lower/1.

Value at offset rows after current.

Returns least value.

Returns leftmost n characters.

Returns length of string or binary.

Levenshtein edit distance between strings.

SQL LIKE pattern match. Optional escape character.

Concatenates values as string.

Concatenates distinct values as string.

Creates a literal value expression.

Alias for log/1.

Returns current local timestamp.

Locates position of substring in a string column. Optional pos start position (default 1).

Computes ln(1 + x).

Computes base-2 logarithm.

Computes base-10 logarithm.

Computes natural logarithm.

Computes logarithm with the specified base.

Converts to lowercase.

Left-pads string to length with pad string.

Left-trims whitespace or specified characters.

Creates date from year, month, day.

Creates a day-time interval from optional components.

Creates an interval from optional components.

Creates time from hour, minute, second.

Creates a timestamp from individual components or from date+time columns.

Creates a timestamp with local timezone from components.

Creates a timestamp without timezone from components.

Replaces invalid UTF-8 with replacement char.

Creates a year-month interval from optional components.

Concatenates maps.

Returns true if map contains the given key.

Returns map entries as array of structs.

Filters entries in a map column using a predicate on key and value.

Creates map from key and value arrays.

Creates map from array of entries.

Returns map keys.

Returns map values.

Merges two maps using a function on overlapping keys.

Masks string characters.

Computes maximum.

Value of first col at max of second.

MD5 hash.

Alias for avg/1.

Median value.

Computes minimum.

Value of first col at min of second.

Extracts minute.

Most frequent value in group.

Most frequent value in group. Optional deterministic parameter (Spark 4.x+).

Globally unique monotonically increasing ID.

Extracts month.

Returns month name.

Extracts months from an interval expression.

Returns the number of months between two dates.

Builds a named argument expression.

Creates struct with named fields.

Returns second value if first is NaN.

Returns negation.

Next day of week after date.

Returns the nth value in a window frame. Optionally ignores null values.

N-tile bucket number within partition.

Returns null if both values are equal.

Returns null if value is zero.

Returns second if first is not null, else third.

Returns second value if first is null.

Returns byte length of string.

Adds a fallback value to a when_/2 expression chain.

Overlays replace over src starting at pos for len characters.

Spark 3.5-compatible fallback: parse JSON text as generic JSON string value.

Extracts a part of a URL. Optional key for query string extraction.

Percent rank within partition.

Exact percentile. Supports single percentage or list/array of percentages.

Approximate percentile.

Returns pi.

Positive modulo.

Like explode but includes position.

Like posexplode but preserves nulls.

Returns position of substring.

Returns positive value.

Computes x raised to the power of y. Both arguments can be columns or numeric values.

Computes product of all values.

Extracts quarter.

Quotes a string for use in SQL.

Converts degrees to radians.

Raises a user-specified error message.

Random value in [0, 1). Auto-generates a random seed when none given. Pass an explicit seed for reproducible results.

Random value from standard normal distribution. Auto-generates a random seed when none given. Pass an explicit seed for reproducible results.

Generates random string of given length. Auto-generates seed when none given.

Rank within partition.

Calls a JVM method via reflection.

Counts regex pattern occurrences.

Extracts regex group.

Extracts all matches for regex group.

Returns position of first regex match.

Returns true if column matches regex.

Replaces regex matches.

Returns first substring matching regex.

Average of independent variable.

Average of dependent variable.

Count of non-null pairs.

Y-intercept of regression line.

Coefficient of determination.

Slope of regression line.

Sum of squares of independent variable.

Sum of products of deviations.

Sum of squares of dependent variable.

Repeats string n times.

Replaces occurrences of search string. When replacement is omitted, uses empty string.

Reverses string or array.

Returns rightmost n characters.

Rounds to nearest integer.

Regex pattern match.

Rounds to scale decimal places.

Row number within partition.

Right-pads string to length with pad string.

Right-trims whitespace or specified characters.

Returns DDL schema string of CSV string. Accepts optional options map.

Returns DDL schema string of JSON string. Accepts optional options map.

Spark 3.5-compatible fallback for schema_of_variant/1.

Spark 3.5-compatible fallback for schema_of_variant_agg/1.

Returns DDL schema string of XML string. Accepts optional options map.

Computes secant.

Extracts second.

Splits text into array of sentences.

Creates array of values from start to stop with optional step.

Returns session user name.

Generates session window for streaming aggregations.

SHA-1 hash.

SHA-2 hash with bit length.

Alias for sha1/1.

Bitwise left shift.

Bitwise right shift.

Bitwise unsigned right shift.

Returns randomly shuffled array. Optional seed parameter.

Alias for signum/1.

Computes sign.

Computes sine.

Computes hyperbolic sine.

Returns size of array or map.

Skewness.

Returns slice of array from start for length.

Alias for bool_or/1.

Soundex code.

Partition ID of each row.

Splits string by regex pattern.

Splits string and returns the field at index.

Computes square root.

Converts geometry/geography to WKB binary.

Creates geography from WKB binary.

Creates geometry from WKB binary.

Sets the SRID of a geometry.

Returns the SRID of a geometry.

Separates column into n rows.

Creates an unresolved star (*) expression for selecting all columns.

Returns true if string starts with prefix.

Alias for stddev/1.

Sample standard deviation.

Population standard deviation.

Creates map from delimited string.

Creates struct from columns.

Returns substring from pos. Optional len parameter.

Returns substring from pos for len.

Returns substring before count occurrences of delimiter.

Computes sum.

Computes sum of distinct values.

Computes tangent.

Computes hyperbolic tangent.

Computes difference of two theta sketches.

Intersects two theta sketches.

Aggregate intersection of theta sketches.

Aggregates values into a theta sketch.

Estimates distinct count from a theta sketch.

Unions two theta sketches.

Aggregate union of theta sketches.

Returns the difference between two times measured in the specified units.

Spark 3.5-compatible fallback for time_trunc/2.

Adds interval to timestamp.

Returns difference between timestamps in given unit.

Creates timestamp from microseconds.

Creates timestamp from milliseconds.

Creates timestamp from seconds.

Encodes a column to Avro binary using an optional JSON schema.

Converts to binary.

Converts to character string with format.

Converts a struct column to a CSV string.

Converts to date, optionally with format.

Converts a struct/array/map column to a JSON string.

Converts string to number with format.

Encodes a column to Protobuf binary using the provided message name and descriptor.

Spark 3.5-compatible fallback for to_time/1,2 via timestamp parsing and formatting.

Converts to timestamp, optionally with format.

Converts to timestamp with local timezone.

Converts to timestamp without timezone.

Converts timestamp to unix seconds.

Converts timestamp from timezone to UTC.

Spark 3.5-compatible fallback for to_variant_object/1.

Converts a struct column to an XML string.

Transforms each element in an array column using a function.

Transforms keys of a map column using a function on key and value.

Transforms values of a map column using a function on key and value.

Translates characters.

Trims whitespace or specified characters from both ends.

Truncates date to specified format.

Try addition, returns null on overflow.

Try AES decrypt, returns null on failure.

Try average, returns null on overflow.

Try division, returns null on division by zero.

Returns element at index/key, null on out of bounds.

Try version of make_interval/1, returns null on invalid input.

Try version of make_timestamp/1, returns null on invalid input.

Try version of make_timestamp_ltz/1, returns null on invalid input.

Try version of make_timestamp_ntz/1, returns null on invalid input.

Try modulo, returns null on division by zero.

Try multiplication, returns null on overflow.

Spark 3.5-compatible fallback for try_parse_json/1.

Try to extract a part of a URL, returns null on failure. Optional key for query string.

Try to call a JVM method, returns null on failure.

Try subtraction, returns null on overflow.

Try sum, returns null on overflow.

Try to convert to binary, returns null on failure.

Try to convert to date, returns null on failure.

Try to convert to number, returns null on failure.

Spark 3.5-compatible fallback for try_to_time/1,2 via try_to_timestamp.

Try to convert to timestamp, returns null on failure.

Try URL-decode, returns null on failure.

Validates UTF-8 and returns null on invalid.

Spark 3.5-compatible fallback for try_variant_get/3 using JSON path extraction.

Runtime data type string.

Alias for upper/1.

Decodes base64 string.

Decodes hex string to binary.

Random value uniformly distributed in [min, max). Auto-generates seed when none given.

Returns days since epoch for date.

Returns microseconds since epoch.

Returns milliseconds since epoch.

Returns seconds since epoch.

Converts timestamp to unix seconds. Can be called with no args for current timestamp.

Returns the value of a user-defined type (UDT) as its underlying SQL representation.

Converts to uppercase.

URL-decodes string.

URL-encodes string.

Generates a random UUID string.

Generates a random UUID string with deterministic seed (Spark 4.x+).

Validates UTF-8 and raises on invalid.

Population variance.

Sample variance.

Spark 3.5-compatible fallback for variant_get/3 using JSON path extraction.

Returns Spark version string.

Day of week (0=Mon, 6=Sun).

Week of year.

Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise/2 is not used, nil is returned for unmatched conditions.

Returns bucket number for value in equi-width histogram.

Generates tumbling or sliding time window column for streaming aggregations.

Extracts the time column from a window column.

Evaluates XPath expression returning array of strings.

Evaluates XPath expression returning boolean.

Evaluates XPath expression returning double.

Evaluates XPath expression returning float.

Evaluates XPath expression returning integer.

Evaluates XPath expression returning long.

Evaluates XPath expression returning short.

Evaluates XPath expression returning string.

xxHash64 of columns.

Extracts year.

Extracts years from an interval expression.

Returns zero if value is null.

Merges two arrays element-wise using a function.

Functions

abs(col)

Computes the absolute value.

Spark SQL function: abs

acos(col)

@spec acos(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes inverse cosine.

Spark SQL function: acos

acosh(col)

@spec acosh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes inverse hyperbolic cosine.

Spark SQL function: acosh

add_months(col, arg1)

@spec add_months(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Adds months to date.

Spark SQL function: add_months

aes_decrypt(cols)

@spec aes_decrypt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

AES decrypts binary data.

Spark SQL function: aes_decrypt

aes_encrypt(cols)

@spec aes_encrypt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

AES encrypts binary data.

Spark SQL function: aes_encrypt
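
A usage sketch of the list-based signature, assuming the standard Spark argument order (input column first, then the key); the 16-byte key below is a hypothetical placeholder:

```elixir
import SparkEx.Functions

# Encrypt with a (hypothetical) 16-byte key, then decrypt the result.
encrypted = aes_encrypt([col("payload"), lit("0000111122223333")])
decrypted = aes_decrypt([col("encrypted"), lit("0000111122223333")])
```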

aggregate(col, zero, func, finish \\ nil)

Aggregates elements in an array column using an initial value and a merge function.

The merge function receives two lambda variables: accumulator and element. An optional finish function can be applied to the final accumulator value.

Examples

aggregate(col("arr"), lit(0), fn acc, x -> Column.plus(acc, x) end)
aggregate(col("arr"), lit(0), fn acc, x -> Column.plus(acc, x) end, fn acc -> Column.cast(acc, "string") end)

any_value(col, ignore_nulls \\ false)

@spec any_value(SparkEx.Column.t() | String.t(), boolean()) :: SparkEx.Column.t()

Returns any value from the group. Optionally ignores null values.

approx_count_distinct(col, rsd \\ nil)

@spec approx_count_distinct(SparkEx.Column.t() | String.t(), float() | nil) ::
  SparkEx.Column.t()

Approximate count of distinct values.

Optionally accepts a relative standard deviation parameter.

Examples

approx_count_distinct(col("x"))
approx_count_distinct(col("x"), 0.05)

approx_percentile(col, arg1, arg2)

@spec approx_percentile(SparkEx.Column.t() | String.t(), term(), term()) ::
  SparkEx.Column.t()

Approximate percentile with accuracy parameter.

Spark SQL function: approx_percentile
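
A usage sketch, assuming the standard Spark argument order (percentage, then accuracy; higher accuracy trades memory for precision):

```elixir
import SparkEx.Functions

# Approximate median of latency_ms with accuracy 10_000.
approx_percentile(col("latency_ms"), lit(0.5), lit(10_000))
```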

array(cols)

@spec array([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Creates array from columns.

Spark SQL function: array

array_agg(col)

@spec array_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for collect_list/1.

array_append(col1, col2)

@spec array_append(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Appends element to array.

Spark SQL function: array_append

array_compact(col)

@spec array_compact(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Removes null values from array.

Spark SQL function: array_compact

array_contains(col1, col2)

@spec array_contains(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Checks if array contains value.

Spark SQL function: array_contains

array_distinct(col)

@spec array_distinct(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Removes duplicates from array.

Spark SQL function: array_distinct

array_except(col1, col2)

@spec array_except(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Returns elements in first but not second array.

Spark SQL function: array_except

array_insert(col1, col2, col3)

Inserts element at position in array.

Spark SQL function: array_insert
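
A usage sketch (positions are 1-based in Spark SQL; wrapping the position and element in lit/1 is an assumption):

```elixir
# Insert "z" at position 2 of the "letters" array column.
array_insert(col("letters"), lit(2), lit("z"))
```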

array_intersect(col1, col2)

@spec array_intersect(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) ::
  SparkEx.Column.t()

Returns intersection of two arrays.

Spark SQL function: array_intersect

array_join(col, delimiter, null_replacement \\ nil)

@spec array_join(SparkEx.Column.t() | String.t(), String.t(), String.t() | nil) ::
  SparkEx.Column.t()

Joins array elements with delimiter.

Optionally accepts a null_replacement string.

Examples

array_join(col("arr"), ",")
array_join(col("arr"), ",", "NULL")

array_max(col)

@spec array_max(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns max element of array.

Spark SQL function: array_max

array_min(col)

@spec array_min(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns min element of array.

Spark SQL function: array_min

array_position(col1, col2)

@spec array_position(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Locates element in array (1-based).

Spark SQL function: array_position

array_prepend(col1, col2)

@spec array_prepend(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Prepends element to array.

Spark SQL function: array_prepend

array_remove(col1, col2)

@spec array_remove(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Removes all occurrences of element from array.

Spark SQL function: array_remove

array_repeat(col, arg1)

@spec array_repeat(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Creates array with element repeated n times.

Spark SQL function: array_repeat
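
A usage sketch (wrapping the count in lit/1 is an assumption):

```elixir
# Build an array containing the value of "x" three times.
array_repeat(col("x"), lit(3))
```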

array_size(col)

@spec array_size(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns array size.

Spark SQL function: array_size

array_sort(col)

@spec array_sort(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Sorts array in ascending order. Optional comparator function.

array_sort(col, func)
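
The comparator receives two lambda variables and should return a negative, zero, or positive value, as in Spark's array_sort. A sketch for a descending numeric sort, assuming SparkEx.Column.minus/2 exists (mirroring Column.plus used in the aggregate/4 examples):

```elixir
# (b - a) flips the default ascending order for numeric arrays.
array_sort(col("xs"), fn a, b -> SparkEx.Column.minus(b, a) end)
```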

array_union(col1, col2)

@spec array_union(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Returns union of two arrays.

Spark SQL function: array_union

arrays_overlap(col1, col2)

@spec arrays_overlap(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Returns true if arrays have common elements.

Spark SQL function: arrays_overlap

arrays_zip(cols)

@spec arrays_zip([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Zips arrays into array of structs.

Spark SQL function: arrays_zip

asc(col)

Sort ascending by the given column.

asc_nulls_first(col)

@spec asc_nulls_first(SparkEx.Column.t()) :: SparkEx.Column.t()

Sort ascending with nulls first.

asc_nulls_last(col)

@spec asc_nulls_last(SparkEx.Column.t()) :: SparkEx.Column.t()

Sort ascending with nulls last.

ascii(col)

@spec ascii(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

ASCII value of first character.

Spark SQL function: ascii

asin(col)

@spec asin(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes inverse sine.

Spark SQL function: asin

asinh(col)

@spec asinh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes inverse hyperbolic sine.

Spark SQL function: asinh

assert_true(col, err_msg \\ nil)

@spec assert_true(
  SparkEx.Column.t() | String.t(),
  String.t() | SparkEx.Column.t() | nil
) ::
  SparkEx.Column.t()

Raises error if condition is false.

Optionally accepts an error message.

Examples

assert_true(col("cond"))
assert_true(col("cond"), "Assertion failed!")

atan2(col1, col2)

Computes atan2(y, x). Both arguments can be columns or numeric values.

atan(col)

@spec atan(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes inverse tangent.

Spark SQL function: atan

atanh(col)

@spec atanh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes inverse hyperbolic tangent.

Spark SQL function: atanh

avg(col)

Computes average.

Spark SQL function: avg

base64(col)

@spec base64(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Base64 encodes binary.

Spark SQL function: base64

bin(col)

Binary string representation of integer.

Spark SQL function: bin

bit_and(col)

@spec bit_and(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Bitwise AND aggregate.

Spark SQL function: bit_and

bit_count(col)

@spec bit_count(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Counts number of set bits.

Spark SQL function: bit_count

bit_get(col, arg1)

@spec bit_get(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Returns the value of the bit at the given position.

Spark SQL function: bit_get

bit_length(col)

@spec bit_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns bit length of string.

Spark SQL function: bit_length

bit_or(col)

@spec bit_or(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Bitwise OR aggregate.

Spark SQL function: bit_or

bit_xor(col)

@spec bit_xor(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Bitwise XOR aggregate.

Spark SQL function: bit_xor

bitmap_and_agg(col)

@spec bitmap_and_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Aggregate AND of bitmaps.

Spark SQL function: bitmap_and_agg

bitmap_bit_position(col)

@spec bitmap_bit_position(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns bit position within a bitmap bucket.

Spark SQL function: bitmap_bit_position

bitmap_bucket_number(col)

@spec bitmap_bucket_number(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns bitmap bucket number.

Spark SQL function: bitmap_bucket_number

bitmap_construct_agg(col)

@spec bitmap_construct_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Constructs a bitmap from bit positions.

Spark SQL function: bitmap_construct_agg

bitmap_count(col)

@spec bitmap_count(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Counts set bits in a bitmap.

Spark SQL function: bitmap_count

bitmap_or_agg(col)

@spec bitmap_or_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Aggregate OR of bitmaps.

Spark SQL function: bitmap_or_agg

bitwise_not_(col)

@spec bitwise_not_(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Bitwise NOT (standalone function).

Spark SQL function: ~
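
A usage sketch (the trailing underscore follows this module's convention for names that would otherwise clash, as with char_/1 and when_/2):

```elixir
# Flip every bit of an integer column.
bitwise_not_(col("flags"))
```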

bool_and(col)

@spec bool_and(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

True if all values are true.

Spark SQL function: bool_and

bool_or(col)

@spec bool_or(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

True if any value is true.

Spark SQL function: bool_or

broadcast(df)

@spec broadcast(SparkEx.DataFrame.t()) :: SparkEx.DataFrame.t()

Returns a DataFrame with a broadcast hint for join optimization.

Examples

broadcast(df)

bround(col, opts \\ [])

@spec bround(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Banker's rounding to scale decimal places.

Spark SQL function: bround

btrim(col, opts \\ [])

@spec btrim(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Trims characters from both sides.

Spark SQL function: btrim

bucket(num_buckets, col)

Returns the bucket number for a value and number of buckets.

call_function(name, args \\ [], named_args \\ [])

@spec call_function(String.t(), list(), list()) :: SparkEx.Column.t()

Calls a function with positional and named arguments.
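
A usage sketch with positional arguments only; the exact shape of the named-argument list is not documented here, so it is omitted:

```elixir
# Call the built-in btrim by name instead of through btrim/2.
call_function("btrim", [col("s")])
```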

call_udf(name, cols)

@spec call_udf(String.t(), [SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Calls a registered UDF by name with the given column arguments.

Equivalent to PySpark's call_udf.

cardinality(col)

@spec cardinality(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for size/1.

cbrt(col)

@spec cbrt(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes cube root.

Spark SQL function: cbrt

ceil(col)

@spec ceil(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes ceiling.

Spark SQL function: ceil

ceiling(col)

@spec ceiling(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for ceil/1.

char_(arg1)

@spec char_(term()) :: SparkEx.Column.t()

Returns character from ASCII code.

Spark SQL function: char

char_length(col)

@spec char_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Character length of string.

Spark SQL function: char_length

character_length(col)

@spec character_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for char_length/1.

chr(a)

@spec chr(term()) :: SparkEx.Column.t()

Alias for char_/1.

coalesce(cols)

@spec coalesce([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Returns first non-null value.

Spark SQL function: coalesce

col(name)

@spec col(String.t()) :: SparkEx.Column.t()

Creates a column reference by name.

Examples

col("age")
col("users.name")

collate(col, arg1)

@spec collate(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Applies collation to string.

Spark SQL function: collate

collation(col)

@spec collation(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns collation of string column.

Spark SQL function: collation

collect_list(col)

@spec collect_list(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Collects values into list.

Spark SQL function: collect_list

collect_set(col)

@spec collect_set(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Collects distinct values into set.

Spark SQL function: collect_set

concat(cols)

@spec concat([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Concatenates columns.

Spark SQL function: concat

concat_ws(lit_arg, cols)

@spec concat_ws(
  term(),
  SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]
) ::
  SparkEx.Column.t()

Concatenates with separator.

Spark SQL function: concat_ws
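
Examples

concat_ws(", ", [col("first"), col("last")])  # separator literal first; column names illustrative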

contains_(col1, col2)

Returns true if string contains substring.

Spark SQL function: contains

conv(col, arg1, arg2)

@spec conv(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()

Converts number between bases.

Spark SQL function: conv
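
Examples

conv(col("n"), 10, 16)  # base 10 to base 16; column name illustrative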

convert_timezone(target_tz, source_ts)

@spec convert_timezone(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) ::
  SparkEx.Column.t()

Converts timestamp between timezones. 2-arg form uses session timezone as source.

convert_timezone(source_tz, target_tz, source_ts)

@spec convert_timezone(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) :: SparkEx.Column.t()

Converts timestamp from the given source timezone to the target timezone.

corr(col1, col2)

Pearson correlation.

Spark SQL function: corr

cos(col)

Computes cosine.

Spark SQL function: cos

cosh(col)

@spec cosh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes hyperbolic cosine.

Spark SQL function: cosh

cot(col)

Computes cotangent.

Spark SQL function: cot

count(col)

@spec count(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Counts non-null values.

Spark SQL function: count

count_distinct(cols)

@spec count_distinct(
  SparkEx.Column.t()
  | String.t()
  | [SparkEx.Column.t() | String.t()]
) ::
  SparkEx.Column.t()

Counts distinct non-null values.

Accepts a single column or a list of columns for multi-column distinct count.

Examples

count_distinct(col("x"))
count_distinct(["x", "y", "z"])

count_if(col)

@spec count_if(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Counts rows where condition is true.

Spark SQL function: count_if
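
Examples

count_if(col("age") |> SparkEx.Column.gt(lit(18)))  # column name illustrative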

count_min_sketch(col, arg1, arg2, arg3)

@spec count_min_sketch(SparkEx.Column.t() | String.t(), term(), term(), term()) ::
  SparkEx.Column.t()

Creates a count-min sketch of a column with given eps, confidence, and seed.

Spark SQL function: count_min_sketch

covar_pop(col1, col2)

Population covariance.

Spark SQL function: covar_pop

covar_samp(col1, col2)

Sample covariance.

Spark SQL function: covar_samp

crc32(col)

@spec crc32(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

CRC32 hash.

Spark SQL function: crc32

create_map(cols)

@spec create_map([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Creates map from key-value column pairs.

Spark SQL function: map
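
Examples

create_map([lit("k1"), col("v1"), lit("k2"), col("v2")])  # alternating keys and values; names illustrative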

csc(col)

Computes cosecant.

Spark SQL function: csc

cume_dist()

@spec cume_dist() :: SparkEx.Column.t()

Cumulative distribution within partition.

Spark SQL function: cume_dist

curdate()

@spec curdate() :: SparkEx.Column.t()

Alias for current_date/0.

current_catalog()

@spec current_catalog() :: SparkEx.Column.t()

Returns current catalog name.

Spark SQL function: current_catalog

current_database()

@spec current_database() :: SparkEx.Column.t()

Returns current database name.

Spark SQL function: current_database

current_date()

@spec current_date() :: SparkEx.Column.t()

Returns current date.

Spark SQL function: current_date

current_schema()

@spec current_schema() :: SparkEx.Column.t()

Alias for current_database/0.

current_time()

@spec current_time() :: SparkEx.Column.t()

Returns current time.

Spark SQL function: current_time

current_timestamp()

@spec current_timestamp() :: SparkEx.Column.t()

Returns current timestamp.

Spark SQL function: current_timestamp

current_timezone()

@spec current_timezone() :: SparkEx.Column.t()

Returns current timezone string.

Spark SQL function: current_timezone

current_user_()

@spec current_user_() :: SparkEx.Column.t()

Returns current user name.

Spark SQL function: current_user

date_add(col, arg1)

@spec date_add(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Adds days to date.

Spark SQL function: date_add
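
Examples

date_add(col("d"), 7)  # 7 days after d; column name illustrative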

date_diff(a, b)

Alias for datediff/2.

date_format(col, arg1)

@spec date_format(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Formats date/timestamp with pattern.

Spark SQL function: date_format

date_from_unix_date(col)

@spec date_from_unix_date(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates date from days since epoch.

Spark SQL function: date_from_unix_date

date_part(a, b)

Alias for extract/2.

date_sub(col, arg1)

@spec date_sub(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Subtracts days from date.

Spark SQL function: date_sub

date_trunc(lit_arg, cols)

@spec date_trunc(
  term(),
  SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]
) ::
  SparkEx.Column.t()

Truncates date to specified unit.

Spark SQL function: date_trunc
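
Examples

date_trunc("month", col("ts"))  # truncation unit first; column name illustrative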

dateadd(col, a)

@spec dateadd(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Alias for date_add/2.

datediff(col1, col2)

Difference in days between dates.

Spark SQL function: datediff

datepart(a, b)

Alias for extract/2.

day(col)

Extracts day.

Spark SQL function: day

dayname(col)

@spec dayname(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns day name.

Spark SQL function: dayname

dayofmonth(col)

@spec dayofmonth(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for day/1.

dayofweek(col)

@spec dayofweek(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Day of week (1 = Sunday, 7 = Saturday).

Spark SQL function: dayofweek

dayofyear(col)

@spec dayofyear(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Day of year.

Spark SQL function: dayofyear

days(col)

@spec days(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts days from an interval expression.

decode(col, arg1)

@spec decode(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Decodes binary with charset.

Spark SQL function: decode

degrees(col)

@spec degrees(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Converts radians to degrees.

Spark SQL function: degrees

dense_rank()

@spec dense_rank() :: SparkEx.Column.t()

Dense rank within partition.

Spark SQL function: dense_rank

desc(col)

@spec desc(SparkEx.Column.t()) :: SparkEx.Column.t()

Sort descending by the given column

desc_nulls_first(col)

@spec desc_nulls_first(SparkEx.Column.t()) :: SparkEx.Column.t()

Sort descending with nulls first

desc_nulls_last(col)

@spec desc_nulls_last(SparkEx.Column.t()) :: SparkEx.Column.t()

Sort descending with nulls last

e()

@spec e() :: SparkEx.Column.t()

Returns Euler's number.

Spark SQL function: e

element_at(col1, col2)

Returns element at index/key.

Spark SQL function: element_at
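
Examples

element_at(col("arr"), lit(1))      # 1-based index for arrays
element_at(col("m"), lit("key"))    # key lookup for maps; names illustrative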

elt(cols)

@spec elt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Returns the n-th input string; the first element of the list supplies the index n.

Spark SQL function: elt

encode(col, arg1)

@spec encode(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Encodes string with charset.

Spark SQL function: encode

endswith(col1, col2)

Returns true if string ends with suffix.

Spark SQL function: endsWith

equal_null(col1, col2)

Null-safe equality.

Spark SQL function: equal_null

every(col)

@spec every(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for bool_and/1.

exists(col, func)

Returns true if any element in the array satisfies the predicate.

Examples

exists(col("arr"), fn x -> Column.gt(x, lit(0)) end)

exp(col)

Computes exponential.

Spark SQL function: exp

explode(col)

@spec explode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates a row for each array/map element.

Spark SQL function: explode

explode_outer(col)

@spec explode_outer(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Like explode but preserves nulls.

Spark SQL function: explode_outer

expm1(col)

@spec expm1(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes exp(x) - 1.

Spark SQL function: expm1

expr(expression)

@spec expr(String.t()) :: SparkEx.Column.t()

Creates an expression from a SQL expression string.

This is a convenient escape hatch for expressions that are easier to write in SQL syntax.

Examples

expr("age + 1")
expr("CASE WHEN age > 18 THEN 'adult' ELSE 'minor' END")

extract(col1, col2)

Extracts date/time field.

Spark SQL function: extract

factorial(col)

@spec factorial(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes factorial.

Spark SQL function: factorial

filter(col, func)

Filters an array column using a predicate function.

Examples

filter(col("arr"), fn x -> Column.gt(x, lit(0)) end)

find_in_set(col1, col2)

@spec find_in_set(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Returns position of string in comma-delimited list.

Spark SQL function: find_in_set

first(col, opts \\ [])

@spec first(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Returns first value.

Spark SQL function: first

first_value(col, opts \\ [])

@spec first_value(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Alias for first/2.

flatten(col)

@spec flatten(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Flattens nested array.

Spark SQL function: flatten

floor(col)

@spec floor(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes floor.

Spark SQL function: floor

forall(col, func)

Returns true if all elements in the array satisfy the predicate.

Examples

forall(col("arr"), fn x -> Column.gt(x, lit(0)) end)

format_number(col, arg1)

@spec format_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Formats number with d decimal places.

Spark SQL function: format_number

format_string(lit_arg, cols)

@spec format_string(
  term(),
  SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]
) ::
  SparkEx.Column.t()

Printf-style string formatting.

Spark SQL function: format_string
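
Examples

format_string("%s scored %d", [col("name"), col("score")])  # format literal first; names illustrative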

from_avro(col, json_schema, options \\ nil)

@spec from_avro(SparkEx.Column.t() | String.t(), String.t(), map() | nil) ::
  SparkEx.Column.t()

Decodes Avro binary using the provided JSON schema.

from_csv(col, schema, options \\ nil)

@spec from_csv(SparkEx.Column.t() | String.t(), String.t(), map() | nil) ::
  SparkEx.Column.t()

Parses a CSV string column into a struct column using the given schema.

Examples

from_csv(col("csv_str"), "a INT, b STRING")
from_csv(col("csv_str"), "a INT, b STRING", %{"sep" => "|"})

from_json(col, schema, options \\ nil)

Parses a JSON string column into a struct/array/map column using the given schema.

The schema can be a DDL string or a Spark DataType protobuf struct.

Examples

from_json(col("json_str"), "a INT, b STRING")
from_json(col("json_str"), "a INT", %{"mode" => "FAILFAST"})

from_protobuf(col, message_name, opts \\ [])

@spec from_protobuf(SparkEx.Column.t() | String.t(), String.t(), keyword()) ::
  SparkEx.Column.t()

Decodes Protobuf binary using the provided message name and descriptor.

Provide either desc_file_path or binary_descriptor_set, but not both.

from_unixtime(col, format \\ "yyyy-MM-dd HH:mm:ss")

@spec from_unixtime(SparkEx.Column.t() | String.t(), String.t()) :: SparkEx.Column.t()

Converts unix timestamp to string. Always sends format (default "yyyy-MM-dd HH:mm:ss").

from_utc_timestamp(col, arg1)

@spec from_utc_timestamp(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Converts UTC timestamp to timezone.

Spark SQL function: from_utc_timestamp
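
Examples

from_utc_timestamp(col("ts"), "America/Los_Angeles")  # names illustrative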

from_xml(col, schema, options \\ nil)

@spec from_xml(SparkEx.Column.t() | String.t(), String.t(), map() | nil) ::
  SparkEx.Column.t()

Parses an XML string column into a struct column using the given schema.

Examples

from_xml(col("xml_str"), "a INT, b STRING")
from_xml(col("xml_str"), "a INT, b STRING", %{"rowTag" => "item"})

get(col, arg1)

@spec get(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Returns element at index from array.

Spark SQL function: get

get_json_object(col, arg1)

@spec get_json_object(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Extracts JSON object from path expression.

Spark SQL function: get_json_object
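
Examples

get_json_object(col("json"), "$.user.name")  # JSONPath-style expression; names illustrative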

getbit(col, a)

@spec getbit(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Alias for bit_get/2.

greatest(cols)

@spec greatest([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Returns greatest value.

Spark SQL function: greatest

grouping(col)

@spec grouping(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Indicates whether column is aggregated in grouping set.

Spark SQL function: grouping

grouping_id(cols)

@spec grouping_id([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Grouping ID for grouping set.

Spark SQL function: grouping_id

hash(cols)

@spec hash([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Murmur3 hash of columns.

Spark SQL function: hash

hex(col)

Hex string of integer/binary.

Spark SQL function: hex

histogram_numeric(col, arg1)

@spec histogram_numeric(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Computes histogram of column.

Spark SQL function: histogram_numeric

hll_sketch_agg(col, opts \\ [])

@spec hll_sketch_agg(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregates values into an HLL sketch.

Spark SQL function: hll_sketch_agg

hll_sketch_estimate(col)

@spec hll_sketch_estimate(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Estimates distinct count from an HLL sketch.

Spark SQL function: hll_sketch_estimate

hll_union(col, opts \\ [])

@spec hll_union(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Unions two HLL sketches.

Spark SQL function: hll_union

hll_union_agg(col, opts \\ [])

@spec hll_union_agg(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregate union of HLL sketches.

Spark SQL function: hll_union_agg

hour(col)

@spec hour(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts hour.

Spark SQL function: hour

hours(col)

@spec hours(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts hours from an interval expression.

hypot(col1, col2)

Computes sqrt(a^2 + b^2).

Spark SQL function: hypot

ifnull(col1, col2)

Returns second value if first is null.

Spark SQL function: ifnull

ilike_(col, pattern, escape \\ nil)

Case-insensitive LIKE. Optional escape character.

initcap(col)

@spec initcap(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Title-cases string.

Spark SQL function: initcap

inline(col)

@spec inline(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Explodes array of structs into columns.

Spark SQL function: inline

inline_outer(col)

@spec inline_outer(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Like inline but preserves nulls.

Spark SQL function: inline_outer

input_file_block_length()

@spec input_file_block_length() :: SparkEx.Column.t()

Length of current file block.

Spark SQL function: input_file_block_length

input_file_block_start()

@spec input_file_block_start() :: SparkEx.Column.t()

Start offset of current file block.

Spark SQL function: input_file_block_start

input_file_name()

@spec input_file_name() :: SparkEx.Column.t()

Name of file being read.

Spark SQL function: input_file_name

instr(col, arg1)

@spec instr(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Position of first occurrence of substr.

Spark SQL function: instr

is_valid_utf8(col)

@spec is_valid_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns true if string is valid UTF-8.

Spark SQL function: is_valid_utf8

is_variant_null(col)

@spec is_variant_null(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Spark 3.5-compatible fallback for variant null checks.

isnan(col)

@spec isnan(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

True if NaN.

Spark SQL function: isNaN

isnotnull(col)

@spec isnotnull(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

True if not null.

Spark SQL function: isNotNull

isnull(col)

@spec isnull(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

True if null.

Spark SQL function: isNull

java_method(cols)

@spec java_method([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Calls a JVM method.

Spark SQL function: java_method

json_array_length(col)

@spec json_array_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns length of outermost JSON array.

Spark SQL function: json_array_length

json_object_keys(col)

@spec json_object_keys(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns keys of outermost JSON object.

Spark SQL function: json_object_keys

json_tuple(col, fields)

@spec json_tuple(SparkEx.Column.t() | String.t(), [String.t()]) :: SparkEx.Column.t()

Extracts fields from a JSON string column.

The first argument is the JSON column; the second is a list of field name strings.

Examples

json_tuple(col("json_str"), ["name", "age"])

kll_sketch_agg_bigint(col, opts \\ [])

@spec kll_sketch_agg_bigint(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregates bigint values into a KLL sketch.

Spark SQL function: kll_sketch_agg_bigint

kll_sketch_agg_double(col, opts \\ [])

@spec kll_sketch_agg_double(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregates double values into a KLL sketch.

Spark SQL function: kll_sketch_agg_double

kll_sketch_agg_float(col, opts \\ [])

@spec kll_sketch_agg_float(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregates float values into a KLL sketch.

Spark SQL function: kll_sketch_agg_float

kll_sketch_get_n_bigint(col)

@spec kll_sketch_get_n_bigint(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns n (number of items) from a KLL sketch (bigint).

Spark SQL function: kll_sketch_get_n_bigint

kll_sketch_get_n_double(col)

@spec kll_sketch_get_n_double(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns n (number of items) from a KLL sketch (double).

Spark SQL function: kll_sketch_get_n_double

kll_sketch_get_n_float(col)

@spec kll_sketch_get_n_float(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns n (number of items) from a KLL sketch (float).

Spark SQL function: kll_sketch_get_n_float

kll_sketch_get_quantile_bigint(col, arg1)

@spec kll_sketch_get_quantile_bigint(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Gets quantile from a KLL sketch (bigint).

Spark SQL function: kll_sketch_get_quantile_bigint

kll_sketch_get_quantile_double(col, arg1)

@spec kll_sketch_get_quantile_double(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Gets quantile from a KLL sketch (double).

Spark SQL function: kll_sketch_get_quantile_double

kll_sketch_get_quantile_float(col, arg1)

@spec kll_sketch_get_quantile_float(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Gets quantile from a KLL sketch (float).

Spark SQL function: kll_sketch_get_quantile_float

kll_sketch_get_rank_bigint(col, arg1)

@spec kll_sketch_get_rank_bigint(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Gets rank from a KLL sketch (bigint).

Spark SQL function: kll_sketch_get_rank_bigint

kll_sketch_get_rank_double(col, arg1)

@spec kll_sketch_get_rank_double(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Gets rank from a KLL sketch (double).

Spark SQL function: kll_sketch_get_rank_double

kll_sketch_get_rank_float(col, arg1)

@spec kll_sketch_get_rank_float(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Gets rank from a KLL sketch (float).

Spark SQL function: kll_sketch_get_rank_float

kll_sketch_merge_bigint(col1, col2)

@spec kll_sketch_merge_bigint(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) ::
  SparkEx.Column.t()

Merges KLL sketches (bigint).

Spark SQL function: kll_sketch_merge_bigint

kll_sketch_merge_double(col1, col2)

@spec kll_sketch_merge_double(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) ::
  SparkEx.Column.t()

Merges KLL sketches (double).

Spark SQL function: kll_sketch_merge_double

kll_sketch_merge_float(col1, col2)

@spec kll_sketch_merge_float(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) ::
  SparkEx.Column.t()

Merges KLL sketches (float).

Spark SQL function: kll_sketch_merge_float

kll_sketch_to_string_bigint(col)

@spec kll_sketch_to_string_bigint(SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Converts a KLL sketch (bigint) to a string.

Spark SQL function: kll_sketch_to_string_bigint

kll_sketch_to_string_double(col)

@spec kll_sketch_to_string_double(SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Converts a KLL sketch (double) to a string.

Spark SQL function: kll_sketch_to_string_double

kll_sketch_to_string_float(col)

@spec kll_sketch_to_string_float(SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Converts a KLL sketch (float) to a string.

Spark SQL function: kll_sketch_to_string_float

kurtosis(col)

@spec kurtosis(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Kurtosis.

Spark SQL function: kurtosis

lag(col, opts \\ [])

@spec lag(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Value at offset rows before current.

Spark SQL function: lag

last(col, opts \\ [])

@spec last(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Returns last value.

Spark SQL function: last

last_day(col)

@spec last_day(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Last day of month for date.

Spark SQL function: last_day

last_value(col, opts \\ [])

@spec last_value(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Alias for last/2.

lcase(col)

@spec lcase(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for lower/1.

lead(col, opts \\ [])

@spec lead(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Value at offset rows after current.

Spark SQL function: lead

least(cols)

@spec least([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Returns least value.

Spark SQL function: least

left_(col1, col2)

Returns leftmost n characters.

Spark SQL function: left

length(col)

@spec length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns length of string or binary.

Spark SQL function: length

levenshtein(left, right, threshold \\ nil)

@spec levenshtein(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t(),
  integer() | nil
) ::
  SparkEx.Column.t()

Levenshtein edit distance between strings.

Optionally accepts a threshold parameter.

Examples

levenshtein(col("s1"), col("s2"))
levenshtein(col("s1"), col("s2"), 5)

like_(col, pattern, escape \\ nil)

SQL LIKE pattern match. Optional escape character.

listagg(col, opts \\ [])

@spec listagg(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Concatenates values as string.

Spark SQL function: listagg

listagg_distinct(col, opts \\ [])

@spec listagg_distinct(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Concatenates distinct values as string.

Spark SQL function: listagg

lit(col)

@spec lit(term()) :: SparkEx.Column.t()

Creates a literal value expression.

If a Column is passed, it is returned as-is (pass-through). Supports nil, booleans, integers, floats, and strings.

Examples

lit(42)
lit("hello")
lit(true)
lit(col("age"))  # returns the Column unchanged

ln(col)

Alias for log/1.

localtimestamp_()

@spec localtimestamp_() :: SparkEx.Column.t()

Returns current local timestamp.

Spark SQL function: localtimestamp

locate(substr, col, pos \\ 1)

@spec locate(String.t(), SparkEx.Column.t() | String.t(), integer()) ::
  SparkEx.Column.t()

Locates the position of a substring in a string column. The optional pos sets the start position (default 1).
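
Examples

locate("ar", col("s"))     # search from position 1
locate("ar", col("s"), 3)  # search from position 3; names illustrative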

log1p(col)

@spec log1p(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes ln(1 + x).

Spark SQL function: log1p

log2(col)

@spec log2(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes base-2 logarithm.

Spark SQL function: log2

log10(col)

@spec log10(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes base-10 logarithm.

Spark SQL function: log10

log(col)

Computes natural logarithm.

Spark SQL function: ln

log(base, col)

Computes logarithm with the specified base.

log(col) is defined in the registry as natural log (ln). log(base, col) computes log_base(col).

Examples

log(2, col("x"))
log(10, col("x"))

lower(col)

@spec lower(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Converts to lowercase.

Spark SQL function: lower

lpad(col, arg1, arg2)

@spec lpad(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()

Left-pads string to length with pad string.

Spark SQL function: lpad
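
Examples

lpad(col("s"), 8, "0")  # pad to length 8 with "0"; names illustrative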

ltrim(col, trim_string \\ nil)

@spec ltrim(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()

Left-trims whitespace or specified characters.
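
Examples

ltrim(col("s"))       # trims leading whitespace
ltrim(col("s"), "x")  # trims leading "x" characters; names illustrative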

make_date(col1, col2, col3)

Creates date from year, month, day.

Spark SQL function: make_date

make_dt_interval(opts \\ [])

@spec make_dt_interval(keyword()) :: SparkEx.Column.t()

Creates a day-time interval from optional components.

Options

  • :days — days column (default: lit(0))
  • :hours — hours column (default: lit(0))
  • :mins — minutes column (default: lit(0))
  • :secs — seconds column (default: lit(0))
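
Examples

make_dt_interval(days: lit(1), hours: lit(12))  # values illustrative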

make_interval(opts \\ [])

@spec make_interval(keyword()) :: SparkEx.Column.t()

Creates an interval from optional components.

Options

  • :years, :months, :weeks, :days, :hours, :mins, :secs — all default to lit(0)

make_time(col1, col2, col3)

Creates time from hour, minute, second.

Spark SQL function: make_time

make_timestamp(cols_or_opts)

@spec make_timestamp([SparkEx.Column.t() | String.t()] | keyword()) ::
  SparkEx.Column.t()

Creates a timestamp from individual components or from date+time columns.

Examples

make_timestamp([col("y"), col("m"), col("d"), col("h"), col("min"), col("sec")])
make_timestamp([col("y"), col("m"), col("d"), col("h"), col("min"), col("sec"), col("tz")])
make_timestamp(date: col("d"), time: col("t"))
make_timestamp(date: col("d"), time: col("t"), timezone: col("tz"))

make_timestamp_ltz(cols)

@spec make_timestamp_ltz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Creates a timestamp with local timezone from components.

Examples

make_timestamp_ltz([col("y"), col("m"), col("d"), col("h"), col("min"), col("sec")])

make_timestamp_ntz(cols)

@spec make_timestamp_ntz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Creates a timestamp without timezone from components.

Examples

make_timestamp_ntz([col("y"), col("m"), col("d"), col("h"), col("min"), col("sec")])

make_valid_utf8(col)

@spec make_valid_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Replaces invalid UTF-8 with replacement char.

Spark SQL function: make_valid_utf8

make_ym_interval(opts \\ [])

@spec make_ym_interval(keyword()) :: SparkEx.Column.t()

Creates a year-month interval from optional components.

Options

  • :years — years column (default: lit(0))
  • :months — months column (default: lit(0))

map_concat(cols)

@spec map_concat([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Concatenates maps.

Spark SQL function: map_concat

map_contains_key(col1, col2)

@spec map_contains_key(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) ::
  SparkEx.Column.t()

Returns true if map contains the given key.

Spark SQL function: map_contains_key

map_entries(col)

@spec map_entries(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns map entries as array of structs.

Spark SQL function: map_entries

map_filter(col, func)

Filters entries in a map column using a predicate on key and value.

Examples

map_filter(col("m"), fn _k, v -> Column.gt(v, lit(0)) end)

map_from_arrays(col1, col2)

@spec map_from_arrays(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t()
) ::
  SparkEx.Column.t()

Creates map from key and value arrays.

Spark SQL function: map_from_arrays

map_from_entries(col)

@spec map_from_entries(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates map from array of entries.

Spark SQL function: map_from_entries

map_keys(col)

@spec map_keys(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns map keys.

Spark SQL function: map_keys

map_values(col)

@spec map_values(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns map values.

Spark SQL function: map_values

map_zip_with(col1, col2, func)

Merges two maps using a function on overlapping keys.

The function receives three lambda variables: key, value1, value2.

Examples

map_zip_with(col("m1"), col("m2"), fn _k, v1, v2 -> Column.plus(v1, v2) end)

mask(col, opts \\ [])

@spec mask(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Masks string characters.

Spark SQL function: mask

max(col)

Computes maximum.

Spark SQL function: max

max_by(col1, col2)

Returns the value of the first column at the row where the second column is maximal.

Spark SQL function: max_by

md5(col)

MD5 hash.

Spark SQL function: md5

mean(col)

@spec mean(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for avg/1.

median(col)

@spec median(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Median value.

Spark SQL function: median

min(col)

Computes minimum.

Spark SQL function: min

min_by(col1, col2)

Returns the value of the first column at the row where the second column is minimal.

Spark SQL function: min_by

minute(col)

@spec minute(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts minute.

Spark SQL function: minute

mode(col)

@spec mode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Most frequent value in group.

mode(col, deterministic)

Most frequent value in group. Optional deterministic parameter (Spark 4.x+).

monotonically_increasing_id()

@spec monotonically_increasing_id() :: SparkEx.Column.t()

Monotonically increasing ID, unique across rows but not consecutive.

Spark SQL function: monotonically_increasing_id

month(col)

@spec month(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts month.

Spark SQL function: month

monthname(col)

@spec monthname(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns month name.

Spark SQL function: monthname

months(col)

@spec months(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts months from an interval expression.

months_between(date1, date2, round_off \\ true)

@spec months_between(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t(),
  boolean()
) ::
  SparkEx.Column.t()

Returns the number of months between two dates.

Always sends 3 arguments with roundOff defaulting to true.

Examples

months_between(col("d1"), col("d2"))
months_between(col("d1"), col("d2"), false)

named_arg(key, value)

@spec named_arg(String.t(), term()) :: SparkEx.Column.t()

Builds a named argument expression.

named_struct(cols)

@spec named_struct([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Creates struct with named fields.

Spark SQL function: named_struct

nanvl(col1, col2)

Returns second value if first is NaN.

Spark SQL function: nanvl

negative(col)

@spec negative(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns negation.

Spark SQL function: negative

next_day(col, arg1)

@spec next_day(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Next day of week after date.

Spark SQL function: next_day

now()

@spec now() :: SparkEx.Column.t()

Alias for current_timestamp/0.

nth_value(col, offset, ignore_nulls \\ false)

@spec nth_value(
  SparkEx.Column.t() | String.t(),
  integer() | SparkEx.Column.t(),
  boolean()
) ::
  SparkEx.Column.t()

Returns the nth value in a window frame. Optionally ignores null values.
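
Examples

nth_value(col("price"), 2)        # second value in the frame
nth_value(col("price"), 2, true)  # ignore nulls; column name illustrative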

ntile(arg1)

@spec ntile(term()) :: SparkEx.Column.t()

N-tile bucket number within partition.

Spark SQL function: ntile

nullif(col1, col2)

Returns null if both values are equal.

Spark SQL function: nullif

nullifzero(col)

@spec nullifzero(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns null if value is zero.

Spark SQL function: nullifzero

nvl2(col1, col2, col3)

Returns second if first is not null, else third.

Spark SQL function: nvl2

nvl(col1, col2)

Returns second value if first is null.

Spark SQL function: nvl

octet_length(col)

@spec octet_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns byte length of string.

Spark SQL function: octet_length

otherwise(when_col, value)

@spec otherwise(SparkEx.Column.t(), SparkEx.Column.t() | term()) :: SparkEx.Column.t()

Adds a fallback value to a when_/2 expression chain.

Examples

when_(col("score") |> Column.gt(90), lit("A"))
|> otherwise(lit("B"))

overlay(src, replace, pos, len \\ -1)

Overlays replace over src starting at pos for len characters.

All arguments accept Column or string column names. len defaults to -1, meaning the full length of replace is used.

parse_json(col)

@spec parse_json(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Spark 3.5-compatible fallback: parses JSON text as a generic JSON string value.

parse_url(url, part, key \\ nil)

Extracts a part of a URL. Optional key for query string extraction.

percent_rank()

@spec percent_rank() :: SparkEx.Column.t()

Percent rank within partition.

Spark SQL function: percent_rank

percentile(col, percentage, frequency \\ 1)

@spec percentile(
  SparkEx.Column.t() | String.t(),
  number() | [number()],
  SparkEx.Column.t() | integer()
) ::
  SparkEx.Column.t()

Exact percentile. Supports single percentage or list/array of percentages.

Optional frequency parameter (default 1).
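
Illustrative usage (assumes import SparkEx.Functions; the list form follows the spec above):

percentile(col("latency"), 0.5)
percentile(col("latency"), [0.25, 0.5, 0.75])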

percentile_approx(col, arg1, arg2)

@spec percentile_approx(SparkEx.Column.t() | String.t(), term(), term()) ::
  SparkEx.Column.t()

Approximate percentile.

Spark SQL function: percentile_approx

pi()

@spec pi() :: SparkEx.Column.t()

Returns pi.

Spark SQL function: pi

pmod(col1, col2)

Positive modulo.

Spark SQL function: pmod

posexplode(col)

@spec posexplode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Like explode but includes position.

Spark SQL function: posexplode

posexplode_outer(col)

@spec posexplode_outer(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Like posexplode but preserves nulls.

Spark SQL function: posexplode_outer

position(col, arg1)

@spec position(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Returns position of substring.

Spark SQL function: position

positive(col)

@spec positive(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns positive value.

Spark SQL function: positive

pow(col1, col2)

Computes x raised to the power of y. Both arguments can be columns or numeric values.

power(col1, col2)

Alias for pow/2.

printf(a, cols)

@spec printf(
  term(),
  SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]
) ::
  SparkEx.Column.t()

Alias for format_string/2.

product(col)

@spec product(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes product of all values.

Spark SQL function: product

quarter(col)

@spec quarter(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts quarter.

Spark SQL function: quarter

quote_(col)

@spec quote_(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Quotes a string for use in SQL.

Spark SQL function: quote

radians(col)

@spec radians(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Converts degrees to radians.

Spark SQL function: radians

raise_error(col)

@spec raise_error(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Raises a user-specified error message.

Spark SQL function: raise_error

rand(seed \\ nil)

@spec rand(integer() | nil | keyword()) :: SparkEx.Column.t()

Random value in [0, 1). Auto-generates a random seed when none given. Pass an explicit seed for reproducible results.

randn(seed \\ nil)

@spec randn(integer() | nil | keyword()) :: SparkEx.Column.t()

Random value from standard normal distribution. Auto-generates a random seed when none given. Pass an explicit seed for reproducible results.

randstr(length, charset_or_seed \\ nil, seed \\ nil)

@spec randstr(SparkEx.Column.t() | String.t(), term(), integer() | nil) ::
  SparkEx.Column.t()

Generates random string of given length. Auto-generates seed when none given.

rank()

@spec rank() :: SparkEx.Column.t()

Rank within partition.

Spark SQL function: rank

reduce(col, zero, func, finish \\ nil)

Alias for aggregate/3,4.

reflect_(cols)

@spec reflect_([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Calls a JVM method via reflection.

Spark SQL function: reflect

regexp(col, a)

@spec regexp(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Alias for regexp_like/2.

regexp_count(col1, col2)

@spec regexp_count(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Counts regex pattern occurrences.

Spark SQL function: regexp_count

regexp_extract(col, arg1, arg2)

@spec regexp_extract(SparkEx.Column.t() | String.t(), term(), term()) ::
  SparkEx.Column.t()

Extracts regex group.

Spark SQL function: regexp_extract
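
Illustrative usage (assumes import SparkEx.Functions; group index 0 returns the whole match, as in Spark SQL):

regexp_extract(col("s"), "(\\d+)-(\\d+)", 1)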

regexp_extract_all(col, arg1, arg2)

@spec regexp_extract_all(SparkEx.Column.t() | String.t(), term(), term()) ::
  SparkEx.Column.t()

Extracts all matches for regex group.

Spark SQL function: regexp_extract_all

regexp_instr(col, arg1)

@spec regexp_instr(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Returns position of first regex match.

Spark SQL function: regexp_instr

regexp_like(col, arg1)

@spec regexp_like(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Returns true if column matches regex.

Spark SQL function: regexp_like

regexp_replace(col, arg1, arg2)

@spec regexp_replace(SparkEx.Column.t() | String.t(), term(), term()) ::
  SparkEx.Column.t()

Replaces regex matches.

Spark SQL function: regexp_replace

regexp_substr(col, arg1)

@spec regexp_substr(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Returns first substring matching regex.

Spark SQL function: regexp_substr

regr_avgx(col1, col2)

Average of independent variable.

Spark SQL function: regr_avgx

regr_avgy(col1, col2)

Average of dependent variable.

Spark SQL function: regr_avgy

regr_count(col1, col2)

Count of non-null pairs.

Spark SQL function: regr_count

regr_intercept(col1, col2)

@spec regr_intercept(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Y-intercept of regression line.

Spark SQL function: regr_intercept

regr_r2(col1, col2)

Coefficient of determination.

Spark SQL function: regr_r2

regr_slope(col1, col2)

Slope of regression line.

Spark SQL function: regr_slope

regr_sxx(col1, col2)

Sum of squares of independent variable.

Spark SQL function: regr_sxx

regr_sxy(col1, col2)

Sum of products of deviations.

Spark SQL function: regr_sxy

regr_syy(col1, col2)

Sum of squares of dependent variable.

Spark SQL function: regr_syy

repeat(col, arg1)

@spec repeat(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Repeats string n times.

Spark SQL function: repeat

replace(src, search, replacement \\ "")

Replaces occurrences of search string. When replacement is omitted, the empty string is used, so matches are deleted.

reverse(col)

@spec reverse(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Reverses string or array.

Spark SQL function: reverse

right_(col1, col2)

Returns rightmost n characters.

Spark SQL function: right

rint(col)

@spec rint(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Rounds to nearest integer.

Spark SQL function: rint

rlike_(col1, col2)

Regex pattern match.

Spark SQL function: rlike

round(col, opts \\ [])

@spec round(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Rounds to scale decimal places.

Spark SQL function: round
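
Illustrative usage (assumes import SparkEx.Functions; the scale: option name is an assumption based on this module's keyword conventions):

round(col("price"))
round(col("price"), scale: 2)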

row_number()

@spec row_number() :: SparkEx.Column.t()

Row number within partition.

Spark SQL function: row_number

rpad(col, arg1, arg2)

@spec rpad(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()

Right-pads string to length with pad string.

Spark SQL function: rpad

rtrim(col, trim_string \\ nil)

@spec rtrim(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()

Right-trims whitespace or specified characters.

schema_of_csv(col, options \\ nil)

@spec schema_of_csv(SparkEx.Column.t() | String.t(), map() | nil) ::
  SparkEx.Column.t()

Returns the schema of a CSV string in DDL format. Accepts an optional options map.

schema_of_json(col, options \\ nil)

@spec schema_of_json(SparkEx.Column.t() | String.t(), map() | nil) ::
  SparkEx.Column.t()

Returns the schema of a JSON string in DDL format. Accepts an optional options map.

schema_of_variant(col)

@spec schema_of_variant(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Spark 3.5-compatible fallback for schema_of_variant/1.

schema_of_variant_agg(col)

@spec schema_of_variant_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Spark 3.5-compatible fallback for schema_of_variant_agg/1.

schema_of_xml(col, options \\ nil)

@spec schema_of_xml(SparkEx.Column.t() | String.t(), map() | nil) ::
  SparkEx.Column.t()

Returns the schema of an XML string in DDL format. Accepts an optional options map.

sec(col)

Computes secant.

Spark SQL function: sec

second(col)

@spec second(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts second.

Spark SQL function: second

sentences(col, language \\ nil, country \\ nil)

@spec sentences(SparkEx.Column.t() | String.t(), String.t() | nil, String.t() | nil) ::
  SparkEx.Column.t()

Splits text into array of sentences.

Optionally accepts language and country parameters.

Examples

sentences(col("text"))
sentences(col("text"), "en", "US")

sequence(start, stop, step \\ nil)

Creates array of values from start to stop with optional step.

Examples

sequence(col("start"), col("stop"))
sequence(col("start"), col("stop"), col("step"))

session_user_()

@spec session_user_() :: SparkEx.Column.t()

Returns session user name.

Spark SQL function: session_user

session_window(col1, col2)

@spec session_window(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Generates session window for streaming aggregations.

Spark SQL function: session_window

sha1(col)

@spec sha1(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

SHA-1 hash.

Spark SQL function: sha1

sha2(col, arg1)

@spec sha2(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

SHA-2 hash with bit length.

Spark SQL function: sha2
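
The bit length must be one of 224, 256, 384, or 512 (0 is also accepted and is equivalent to 256). Illustrative usage, assuming import SparkEx.Functions:

sha2(col("data"), 256)
sha2(col("data"), 512)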

sha(col)

Alias for sha1/1.

shiftleft(col, arg1)

@spec shiftleft(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Bitwise left shift.

Spark SQL function: shiftleft

shiftright(col, arg1)

@spec shiftright(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Bitwise right shift.

Spark SQL function: shiftright

shiftrightunsigned(col, arg1)

@spec shiftrightunsigned(SparkEx.Column.t() | String.t(), term()) ::
  SparkEx.Column.t()

Bitwise unsigned right shift.

Spark SQL function: shiftrightunsigned

shuffle(col)

@spec shuffle(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns randomly shuffled array. Optional seed parameter.

shuffle(col, seed)

@spec shuffle(SparkEx.Column.t() | String.t(), integer()) :: SparkEx.Column.t()

sign(col)

@spec sign(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for signum/1.

signum(col)

@spec signum(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes sign.

Spark SQL function: signum

sin(col)

Computes sine.

Spark SQL function: sin

sinh(col)

@spec sinh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes hyperbolic sine.

Spark SQL function: sinh

size(col)

@spec size(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns size of array or map.

Spark SQL function: size

skewness(col)

@spec skewness(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Skewness.

Spark SQL function: skewness

slice(col, arg1, arg2)

@spec slice(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()

Returns slice of array from start for length.

Spark SQL function: slice

some(col)

@spec some(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for bool_or/1.

sort_array(col, opts \\ [])

@spec sort_array(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Sorts array.

Spark SQL function: sort_array
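
Illustrative usage (assumes import SparkEx.Functions; the asc: option name is an assumption based on this module's keyword conventions):

sort_array(col("xs"))
sort_array(col("xs"), asc: false)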

soundex(col)

@spec soundex(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Soundex code.

Spark SQL function: soundex

spark_partition_id()

@spec spark_partition_id() :: SparkEx.Column.t()

Partition ID of each row.

Spark SQL function: spark_partition_id

split(col, pattern, limit \\ nil)

@spec split(SparkEx.Column.t() | String.t(), String.t(), integer() | nil) ::
  SparkEx.Column.t()

Splits string by regex pattern.

Examples

split(col("s"), "\\.")
split(col("s"), "\\.", 3)

split_part(col, arg1, arg2)

@spec split_part(SparkEx.Column.t() | String.t(), term(), term()) ::
  SparkEx.Column.t()

Splits string and returns the field at index.

Spark SQL function: split_part
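
Illustrative usage (field indexes are 1-based and negative indexes count from the end, as in Spark SQL; the literal argument forms are an assumption):

split_part(col("path"), lit("/"), lit(2))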

sqrt(col)

@spec sqrt(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes square root.

Spark SQL function: sqrt

st_asbinary(col)

@spec st_asbinary(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Converts geometry/geography to WKB binary.

Spark SQL function: ST_AsBinary

st_geogfromwkb(col)

@spec st_geogfromwkb(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates geography from WKB binary.

Spark SQL function: ST_GeogFromWKB

st_geomfromwkb(col)

@spec st_geomfromwkb(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates geometry from WKB binary.

Spark SQL function: ST_GeomFromWKB

st_setsrid(col, arg1)

@spec st_setsrid(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Sets the SRID of a geometry.

Spark SQL function: ST_SetSRID

st_srid(col)

@spec st_srid(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns the SRID of a geometry.

Spark SQL function: ST_SRID

stack(cols)

@spec stack([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Separates the input values into n rows.

Spark SQL function: stack

star()

@spec star() :: SparkEx.Column.t()

Creates an unresolved star (*) expression for selecting all columns.

startswith(col1, col2)

Returns true if string starts with prefix.

Spark SQL function: startswith

std(col)

Alias for stddev/1.

stddev(col)

@spec stddev(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Sample standard deviation.

Spark SQL function: stddev

stddev_pop(col)

@spec stddev_pop(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Population standard deviation.

Spark SQL function: stddev_pop

stddev_samp(col)

@spec stddev_samp(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for stddev/1.

str_to_map(col, opts \\ [])

@spec str_to_map(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Creates map from delimited string.

Spark SQL function: str_to_map
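
Illustrative usage (assumes import SparkEx.Functions; the pair_delim:/key_value_delim: option names are an assumption based on this module's keyword conventions):

str_to_map(col("s"), pair_delim: ",", key_value_delim: ":")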

string_agg(col, opts \\ [])

@spec string_agg(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Alias for listagg/2.

string_agg_distinct(col, opts \\ [])

@spec string_agg_distinct(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Alias for listagg_distinct/2.

struct(cols)

@spec struct([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Creates struct from columns.

Spark SQL function: struct

substr_(str, pos, len \\ nil)

Returns substring from pos. Optional len parameter.

substring(col, arg1, arg2)

@spec substring(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()

Returns substring from pos for len.

Spark SQL function: substring
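
Illustrative usage (pos is 1-based, as in Spark SQL):

substring(col("s"), 2, 3)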

substring_index(col, arg1, arg2)

@spec substring_index(SparkEx.Column.t() | String.t(), term(), term()) ::
  SparkEx.Column.t()

Returns substring before count occurrences of delimiter.

Spark SQL function: substring_index

sum(col)

Computes sum.

Spark SQL function: sum

sum_distinct(col)

@spec sum_distinct(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes sum of distinct values.

Spark SQL function: sum

tan(col)

Computes tangent.

Spark SQL function: tan

tanh(col)

@spec tanh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Computes hyperbolic tangent.

Spark SQL function: tanh

theta_difference(col, opts \\ [])

@spec theta_difference(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Computes difference of two theta sketches.

Spark SQL function: theta_difference

theta_intersection(col, opts \\ [])

@spec theta_intersection(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Intersects two theta sketches.

Spark SQL function: theta_intersection

theta_intersection_agg(col, opts \\ [])

@spec theta_intersection_agg(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregate intersection of theta sketches.

Spark SQL function: theta_intersection_agg

theta_sketch_agg(col, opts \\ [])

@spec theta_sketch_agg(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregates values into a theta sketch.

Spark SQL function: theta_sketch_agg

theta_sketch_estimate(col)

@spec theta_sketch_estimate(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Estimates distinct count from a theta sketch.

Spark SQL function: theta_sketch_estimate

theta_union(col, opts \\ [])

@spec theta_union(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Unions two theta sketches.

Spark SQL function: theta_union

theta_union_agg(col, opts \\ [])

@spec theta_union_agg(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Aggregate union of theta sketches.

Spark SQL function: theta_union_agg

time_diff(unit, start_time, end_time)

Returns the difference between two times measured in the specified units.

Spark 4.1+. Unit is passed as a column expression (use lit/1 for string literals). Supported units: "HOUR", "MINUTE", "SECOND", "MILLISECOND", "MICROSECOND".

Examples

time_diff(lit("HOUR"), col("start_time"), col("end_time"))

time_trunc(unit, time_col)

Spark 3.5-compatible fallback for time_trunc/2.

timestamp_add(lit_arg, cols)

@spec timestamp_add(
  term(),
  SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]
) ::
  SparkEx.Column.t()

Adds interval to timestamp.

Spark SQL function: timestampadd

timestamp_diff(lit_arg, cols)

@spec timestamp_diff(
  term(),
  SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()]
) ::
  SparkEx.Column.t()

Returns difference between timestamps in given unit.

Spark SQL function: timestampdiff
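
Illustrative usage; passing the unit as the literal argument and the two timestamps as the column list is an assumption based on the spec above:

timestamp_diff("DAY", [col("start_ts"), col("end_ts")])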

timestamp_micros(col)

@spec timestamp_micros(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates timestamp from microseconds.

Spark SQL function: timestamp_micros

timestamp_millis(col)

@spec timestamp_millis(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates timestamp from milliseconds.

Spark SQL function: timestamp_millis

timestamp_seconds(col)

@spec timestamp_seconds(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Creates timestamp from seconds.

Spark SQL function: timestamp_seconds

to_avro(col, json_schema \\ nil)

@spec to_avro(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()

Encodes a column to Avro binary using an optional JSON schema.

to_binary(col, opts \\ [])

@spec to_binary(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Converts to binary.

Spark SQL function: to_binary

to_char_(col, arg1)

@spec to_char_(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Converts to character string with format.

Spark SQL function: to_char

to_csv(col, options \\ nil)

@spec to_csv(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()

Converts a struct column to a CSV string.

Examples

to_csv(col("struct_col"))
to_csv(col("struct_col"), %{"sep" => "|"})

to_date(col, opts \\ [])

@spec to_date(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Converts to date, optionally with format.

Spark SQL function: to_date

to_degrees(col)

@spec to_degrees(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for degrees/1.

to_json(col, options \\ nil)

@spec to_json(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()

Converts a struct/array/map column to a JSON string.

Examples

to_json(col("struct_col"))
to_json(col("struct_col"), %{"pretty" => "true"})

to_number(col, arg1)

@spec to_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Converts string to number with format.

Spark SQL function: to_number

to_protobuf(col, message_name, opts \\ [])

@spec to_protobuf(SparkEx.Column.t() | String.t(), String.t(), keyword()) ::
  SparkEx.Column.t()

Encodes a column to Protobuf binary using the provided message name and descriptor.

Provide either desc_file_path or binary_descriptor_set, but not both.

to_radians(col)

@spec to_radians(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for radians/1.

to_time(col, opts \\ [])

@spec to_time(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Spark 3.5-compatible fallback for to_time/1,2 via timestamp parsing and formatting.

to_timestamp(col, opts \\ [])

@spec to_timestamp(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Converts to timestamp, optionally with format.

Spark SQL function: to_timestamp

to_timestamp_ltz(col, opts \\ [])

@spec to_timestamp_ltz(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Converts to timestamp with local timezone.

Spark SQL function: to_timestamp_ltz

to_timestamp_ntz(col, opts \\ [])

@spec to_timestamp_ntz(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Converts to timestamp without timezone.

Spark SQL function: to_timestamp_ntz

to_unix_timestamp(col, opts \\ [])

@spec to_unix_timestamp(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Converts timestamp to unix seconds.

Spark SQL function: to_unix_timestamp

to_utc_timestamp(col, arg1)

@spec to_utc_timestamp(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Converts timestamp from timezone to UTC.

Spark SQL function: to_utc_timestamp

to_varchar(col, a)

@spec to_varchar(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Alias for to_char_/2.

to_variant_object(col)

@spec to_variant_object(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Spark 3.5-compatible fallback for to_variant_object/1.

to_xml(col, options \\ nil)

@spec to_xml(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()

Converts a struct column to an XML string.

Examples

to_xml(col("struct_col"))
to_xml(col("struct_col"), %{"rowTag" => "item"})

transform(col, func)

Transforms each element in an array column using a function.

The function receives a lambda variable x representing each element.

Examples

transform(col("arr"), fn x -> Column.plus(x, lit(1)) end)

transform_keys(col, func)

Transforms keys of a map column using a function on key and value.

Examples

transform_keys(col("m"), fn k, v -> Column.plus(k, lit(1)) end)

transform_values(col, func)

Transforms values of a map column using a function on key and value.

Examples

transform_values(col("m"), fn k, v -> Column.plus(v, lit(1)) end)

translate(col, arg1, arg2)

@spec translate(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()

Translates characters.

Spark SQL function: translate

trim(col, trim_string \\ nil)

@spec trim(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()

Trims whitespace or specified characters from both ends.

trunc(col, arg1)

@spec trunc(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Truncates date to specified format.

Spark SQL function: trunc

try_add(col1, col2)

Try addition, returns null on overflow.

Spark SQL function: try_add

try_aes_decrypt(cols)

@spec try_aes_decrypt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Try AES decrypt, returns null on failure.

Spark SQL function: try_aes_decrypt

try_avg(col)

@spec try_avg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Try average, returns null on overflow.

Spark SQL function: try_avg

try_divide(col1, col2)

Try division, returns null on division by zero.

Spark SQL function: try_divide
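
Illustrative usage (a plain division can raise under ANSI mode; try_divide yields null instead):

try_divide(col("total"), col("count"))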

try_element_at(col1, col2)

@spec try_element_at(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Returns element at array index or map key; returns null when the index is out of bounds or the key is absent.

Spark SQL function: try_element_at

try_make_interval(opts \\ [])

@spec try_make_interval(keyword()) :: SparkEx.Column.t()

Try version of make_interval/1 — returns null on invalid input.

try_make_timestamp(cols)

@spec try_make_timestamp([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Try version of make_timestamp/1 — returns null on invalid input.

try_make_timestamp_ltz(cols)

@spec try_make_timestamp_ltz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Try version of make_timestamp_ltz/1 — returns null on invalid input.

try_make_timestamp_ntz(cols)

@spec try_make_timestamp_ntz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Try version of make_timestamp_ntz/1 — returns null on invalid input.

try_mod(col1, col2)

Try modulo, returns null on division by zero.

Spark SQL function: try_mod

try_multiply(col1, col2)

@spec try_multiply(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Try multiplication, returns null on overflow.

Spark SQL function: try_multiply

try_parse_json(col)

@spec try_parse_json(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Spark 3.5-compatible fallback for try_parse_json/1.

try_parse_url(url, part, key \\ nil)

@spec try_parse_url(
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t(),
  SparkEx.Column.t() | String.t() | nil
) :: SparkEx.Column.t()

Try to extract a part of a URL, returns null on failure. Optional key for query string.

try_reflect(cols)

@spec try_reflect([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

Try to call a JVM method, returns null on failure.

Spark SQL function: try_reflect

try_subtract(col1, col2)

@spec try_subtract(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) ::
  SparkEx.Column.t()

Try subtraction, returns null on overflow.

Spark SQL function: try_subtract

try_sum(col)

@spec try_sum(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Try sum, returns null on overflow.

Spark SQL function: try_sum

try_to_binary(col, opts \\ [])

@spec try_to_binary(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Try to convert to binary, returns null on failure.

Spark SQL function: try_to_binary

try_to_date(col, opts \\ [])

@spec try_to_date(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Try to convert to date, returns null on failure.

Spark SQL function: try_to_date

try_to_number(col, arg1)

@spec try_to_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Try to convert to number, returns null on failure.

Spark SQL function: try_to_number

try_to_time(col, opts \\ [])

@spec try_to_time(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Spark 3.5-compatible fallback for try_to_time/1,2 via try_to_timestamp.

try_to_timestamp(col, opts \\ [])

@spec try_to_timestamp(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

Try to convert to timestamp, returns null on failure.

Spark SQL function: try_to_timestamp

try_url_decode(col)

@spec try_url_decode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Try URL-decode, returns null on failure.

Spark SQL function: try_url_decode

try_validate_utf8(col)

@spec try_validate_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Validates UTF-8 and returns null on invalid.

Spark SQL function: try_validate_utf8

try_variant_get(col, path, target_type)

Spark 3.5-compatible fallback for try_variant_get/3 using JSON path extraction.

typeof(col)

@spec typeof(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Runtime data type string.

Spark SQL function: typeof

ucase(col)

@spec ucase(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for upper/1.

unbase64(col)

@spec unbase64(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Decodes base64 string.

Spark SQL function: unbase64

unhex(col)

@spec unhex(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Decodes hex string to binary.

Spark SQL function: unhex

uniform(min, max, seed \\ nil)

@spec uniform(SparkEx.Column.t() | String.t(), term(), integer() | nil) ::
  SparkEx.Column.t()

Random value uniformly distributed in [min, max). Auto-generates seed when none given.
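
Illustrative usage (the literal bound forms are an assumption based on this module's conventions):

uniform(lit(0), lit(10))
uniform(lit(0), lit(10), 42)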

unix_date(col)

@spec unix_date(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns days since epoch for date.

Spark SQL function: unix_date

unix_micros(col)

@spec unix_micros(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns microseconds since epoch.

Spark SQL function: unix_micros

unix_millis(col)

@spec unix_millis(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns milliseconds since epoch.

Spark SQL function: unix_millis

unix_seconds(col)

@spec unix_seconds(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns seconds since epoch.

Spark SQL function: unix_seconds

unix_timestamp()

@spec unix_timestamp() :: SparkEx.Column.t()

Converts timestamp to unix seconds. With no arguments, returns the current time as unix seconds.

unix_timestamp(col, opts \\ [])

@spec unix_timestamp(
  SparkEx.Column.t() | String.t(),
  keyword()
) :: SparkEx.Column.t()

unwrap_udt(col)

@spec unwrap_udt(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns the value of a user-defined type (UDT) as its underlying SQL representation.

upper(col)

@spec upper(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Converts to uppercase.

Spark SQL function: upper

url_decode(col)

@spec url_decode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

URL-decodes string.

Spark SQL function: url_decode

url_encode(col)

@spec url_encode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

URL-encodes string.

Spark SQL function: url_encode

user_()

@spec user_() :: SparkEx.Column.t()

Alias for current_user_/0.

uuid()

@spec uuid() :: SparkEx.Column.t()

Generates a random UUID string.

uuid(seed)

@spec uuid(integer()) :: SparkEx.Column.t()

Generates a UUID string using the given seed for reproducible results (Spark 4.x+).

validate_utf8(col)

@spec validate_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Validates UTF-8 and raises on invalid.

Spark SQL function: validate_utf8

var_pop(col)

@spec var_pop(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Population variance.

Spark SQL function: var_pop

var_samp(col)

@spec var_samp(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Alias for variance/1.

variance(col)

@spec variance(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Sample variance.

Spark SQL function: variance

variant_get(col, path, target_type)

Spark 3.5-compatible fallback for variant_get/3 using JSON path extraction.

version_()

@spec version_() :: SparkEx.Column.t()

Returns Spark version string.

Spark SQL function: version

weekday(col)

@spec weekday(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Day of week (0=Mon, 6=Sun).

Spark SQL function: weekday

weekofyear(col)

@spec weekofyear(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Week of year.

Spark SQL function: weekofyear

when_(condition, value)

Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise/2 is not used, null is returned for unmatched conditions.

Equivalent to CASE WHEN condition THEN value END in SQL.

Examples

import SparkEx.Functions

when_(col("age") |> Column.lt(13), lit("child"))
|> otherwise(lit("adult"))

width_bucket(col, arg1, arg2, arg3)

@spec width_bucket(SparkEx.Column.t() | String.t(), term(), term(), term()) ::
  SparkEx.Column.t()

Returns the 1-based bucket number into which the value falls in an equi-width histogram defined by a minimum, a maximum, and a bucket count. Values below the minimum map to bucket 0; values above the maximum map to bucket count + 1.

Spark SQL function: width_bucket
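
Examples

# Illustrative column and bounds: 10 equal-width buckets over [0, 100]
width_bucket(col("score"), lit(0), lit(100), lit(10))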

window(time_col, window_duration, slide_duration \\ nil, start_time \\ nil)

@spec window(
  SparkEx.Column.t() | String.t(),
  String.t(),
  String.t() | nil,
  String.t() | nil
) ::
  SparkEx.Column.t()

Generates tumbling or sliding time window column for streaming aggregations.

Examples

window(col("timestamp"), "10 minutes")
window(col("timestamp"), "10 minutes", "5 minutes")
window(col("timestamp"), "10 minutes", "5 minutes", "2 minutes")

window_time(col)

@spec window_time(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts the time column from a window column.

Spark SQL function: window_time

xpath(col, arg1)

@spec xpath(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning array of strings.

Spark SQL function: xpath
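
Examples

# Illustrative: collect all <b> text nodes under <a> as an array of strings
xpath(col("xml"), lit("a/b/text()"))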

xpath_boolean(col, arg1)

@spec xpath_boolean(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning boolean.

Spark SQL function: xpath_boolean

xpath_double(col, arg1)

@spec xpath_double(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning double.

Spark SQL function: xpath_double

xpath_float(col, arg1)

@spec xpath_float(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning float.

Spark SQL function: xpath_float

xpath_int(col, arg1)

@spec xpath_int(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning integer.

Spark SQL function: xpath_int

xpath_long(col, arg1)

@spec xpath_long(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning long.

Spark SQL function: xpath_long

xpath_number(col, a)

@spec xpath_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Alias for xpath_double/2.

xpath_short(col, arg1)

@spec xpath_short(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning short.

Spark SQL function: xpath_short

xpath_string(col, arg1)

@spec xpath_string(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()

Evaluates XPath expression returning string.

Spark SQL function: xpath_string

xxhash64(cols)

@spec xxhash64([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()

xxHash64 of columns.

Spark SQL function: xxhash64

year(col)

@spec year(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts year.

Spark SQL function: year

years(col)

@spec years(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Extracts years from an interval expression.

zeroifnull(col)

@spec zeroifnull(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()

Returns zero if value is null.

Spark SQL function: zeroifnull

zip_with(col1, col2, func)

Merges two arrays element-wise using a function.

Examples

zip_with(col("a1"), col("a2"), fn x, y -> Column.plus(x, y) end)