Expression constructors for Spark DataFrame operations.
Provides core constructors (col/1, lit/1, expr/1) and a comprehensive
set of Spark SQL functions generated from a declarative registry.
These functions create SparkEx.Column structs that can be used in
DataFrame transforms like select/2, filter/2, with_column/3, etc.
Examples
import SparkEx.Functions
df
|> SparkEx.DataFrame.select([col("name"), col("age")])
|> SparkEx.DataFrame.filter(col("age") |> SparkEx.Column.gt(lit(18)))
Summary
Functions
Computes the absolute value.
Computes inverse cosine.
Computes inverse hyperbolic cosine.
Adds months to date.
AES decrypts binary data.
AES encrypts binary data.
Aggregates elements in an array column using an initial value and a merge function.
Returns any value from the group. Optionally ignores null values.
Approximate count of distinct values.
Approximate percentile with accuracy parameter.
Creates array from columns.
Alias for collect_list/1.
Appends element to array.
Removes null values from array.
Checks if array contains value.
Removes duplicates from array.
Returns elements in first but not second array.
Inserts element at position in array.
Returns intersection of two arrays.
Joins array elements with delimiter.
Returns max element of array.
Returns min element of array.
Locates element in array (1-based).
Prepends element to array.
Removes all occurrences of element from array.
Creates array with element repeated n times.
Returns array size.
Sorts array in ascending order. Optional comparator function.
Returns union of two arrays.
Returns true if arrays have common elements.
Zips arrays into array of structs.
Sort ascending by the given column.
Sort ascending with nulls first.
Sort ascending with nulls last.
ASCII value of first character.
Computes inverse sine.
Computes inverse hyperbolic sine.
Raises error if condition is false.
Computes atan2(y, x). Both arguments can be columns or numeric values.
Computes inverse tangent.
Computes inverse hyperbolic tangent.
Computes average.
Base64 encodes binary.
Binary string representation of integer.
Bitwise AND aggregate.
Counts number of set bits.
Returns the value of the bit at the given position.
Returns bit length of string.
Bitwise OR aggregate.
Bitwise XOR aggregate.
Aggregate AND of bitmaps.
Returns bit position within a bitmap bucket.
Returns bitmap bucket number.
Constructs a bitmap from bit positions.
Counts set bits in a bitmap.
Aggregate OR of bitmaps.
Bitwise NOT (standalone function).
True if all values are true.
True if any value is true.
Returns a DataFrame with a broadcast hint for join optimization.
Banker's rounding to scale decimal places.
Trims characters from both sides.
Returns the bucket number for a value and number of buckets.
Calls a function with positional and named arguments.
Calls a registered UDF by name with the given column arguments.
Alias for size/1.
Computes cube root.
Computes ceiling.
Alias for ceil/1.
Returns character from ASCII code.
Character length of string.
Alias for char_length/1.
Returns first non-null value.
Creates a column reference by name.
Applies collation to string.
Returns collation of string column.
Collects values into list.
Collects distinct values into set.
Concatenates columns.
Concatenates with separator.
Returns true if string contains substring.
Converts number between bases.
Converts timestamp between timezones. The 2-argument form uses the session timezone as the source.
Pearson correlation.
Computes cosine.
Computes hyperbolic cosine.
Computes cotangent.
Counts non-null values.
Counts distinct non-null values.
Counts rows where condition is true.
Creates a count-min sketch of a column with given eps, confidence, and seed.
Population covariance.
Sample covariance.
CRC32 hash.
Creates map from key-value column pairs.
Computes cosecant.
Cumulative distribution within partition.
Alias for current_date/0.
Returns current catalog name.
Returns current database name.
Returns current date.
Alias for current_database/0.
Returns current time.
Returns current timestamp.
Returns current timezone string.
Returns current user name.
Adds days to date.
Alias for datediff/2.
Formats date/timestamp with pattern.
Creates date from days since epoch.
Alias for extract/2.
Subtracts days from date.
Truncates date to specified unit.
Alias for date_add/2.
Difference in days between dates.
Alias for extract/2.
Extracts day.
Returns day name.
Alias for day/1.
Day of week (1=Sun).
Day of year.
Extracts days from an interval expression.
Decodes binary with charset.
Converts radians to degrees.
Dense rank within partition.
Sort descending by the given column.
Sort descending with nulls first.
Sort descending with nulls last.
Returns Euler's number.
Returns element at index/key.
Returns the n-th input string.
Encodes string with charset.
Returns true if string ends with suffix.
Null-safe equality.
Alias for bool_and/1.
Returns true if any element in the array satisfies the predicate.
Computes exponential.
Creates a row for each array/map element.
Like explode but preserves nulls.
Computes exp(x) - 1.
Creates an expression from a SQL expression string.
Extracts date/time field.
Computes factorial.
Filters an array column using a predicate function.
Returns position of string in comma-delimited list.
Returns first value.
Alias for first/2.
Flattens nested array.
Computes floor.
Returns true if all elements in the array satisfy the predicate.
Formats number with d decimal places.
printf-style formatting.
Decodes Avro binary using the provided JSON schema.
Parses a CSV string column into a struct column using the given schema.
Parses a JSON string column into a struct/array/map column using the given schema.
Decodes Protobuf binary using the provided message name and descriptor.
Converts a Unix timestamp to a string. A format pattern is always sent (default "yyyy-MM-dd HH:mm:ss").
Converts UTC timestamp to timezone.
Parses an XML string column into a struct column using the given schema.
Returns element at index from array.
Extracts JSON object from path expression.
Alias for bit_get/2.
Returns greatest value.
Indicates whether column is aggregated in grouping set.
Grouping ID for grouping set.
Murmur3 hash of columns.
Hex string of integer/binary.
Computes histogram of column.
Aggregates values into an HLL sketch.
Estimates distinct count from an HLL sketch.
Unions two HLL sketches.
Aggregate union of HLL sketches.
Extracts hour.
Extracts hours from an interval expression.
Computes sqrt(a^2 + b^2).
Returns second value if first is null.
Case-insensitive LIKE. Optional escape character.
Title-cases string.
Explodes array of structs into columns.
Like inline but preserves nulls.
Length of current file block.
Start offset of current file block.
Name of file being read.
Position of first occurrence of substr.
Returns true if string is valid UTF-8.
Spark 3.5-compatible fallback for variant null checks.
True if NaN.
True if not null.
True if null.
Calls a JVM method.
Returns length of outermost JSON array.
Returns keys of outermost JSON object.
Extracts fields from a JSON string column.
Aggregates bigint values into a KLL sketch.
Aggregates double values into a KLL sketch.
Aggregates float values into a KLL sketch.
Returns n (number of items) from a KLL sketch (bigint).
Returns n (number of items) from a KLL sketch (double).
Returns n (number of items) from a KLL sketch (float).
Gets quantile from a KLL sketch (bigint).
Gets quantile from a KLL sketch (double).
Gets quantile from a KLL sketch (float).
Gets rank from a KLL sketch (bigint).
Gets rank from a KLL sketch (double).
Gets rank from a KLL sketch (float).
Merges KLL sketches (bigint).
Merges KLL sketches (double).
Merges KLL sketches (float).
Converts a KLL sketch (bigint) to a string.
Converts a KLL sketch (double) to a string.
Converts a KLL sketch (float) to a string.
Kurtosis.
Value at offset rows before current.
Returns last value.
Last day of month for date.
Alias for last/2.
Alias for lower/1.
Value at offset rows after current.
Returns least value.
Returns leftmost n characters.
Returns length of string or binary.
Levenshtein edit distance between strings.
SQL LIKE pattern match. Optional escape character.
Concatenates values as string.
Concatenates distinct values as string.
Creates a literal value expression.
Returns current local timestamp.
Locates the position of a substring in a string column. The optional pos argument gives the start position (default 1).
Computes ln(1 + x).
Computes base-2 logarithm.
Computes base-10 logarithm.
Computes natural logarithm.
Computes logarithm with the specified base.
Converts to lowercase.
Left-pads string to length with pad string.
Left-trims whitespace or specified characters.
Creates date from year, month, day.
Creates a day-time interval from optional components.
Creates an interval from optional components.
Creates time from hour, minute, second.
Creates a timestamp from individual components or from date+time columns.
Creates a timestamp with local timezone from components.
Creates a timestamp without timezone from components.
Replaces invalid UTF-8 with replacement char.
Creates a year-month interval from optional components.
Concatenates maps.
Returns true if map contains the given key.
Returns map entries as array of structs.
Filters entries in a map column using a predicate on key and value.
Creates map from key and value arrays.
Creates map from array of entries.
Returns map keys.
Returns map values.
Merges two maps using a function on overlapping keys.
Masks string characters.
Computes maximum.
Value of first col at max of second.
MD5 hash.
Median value.
Computes minimum.
Value of first col at min of second.
Extracts minute.
Most frequent value in group.
Most frequent value in group. Optional deterministic parameter (Spark 4.x+).
Globally unique monotonically increasing ID.
Extracts month.
Returns month name.
Extracts months from an interval expression.
Returns the number of months between two dates.
Builds a named argument expression.
Creates struct with named fields.
Returns second value if first is NaN.
Returns negation.
Next day of week after date.
Alias for current_timestamp/0.
Returns the nth value in a window frame. Optionally ignores null values.
N-tile bucket number within partition.
Returns null if both values are equal.
Returns null if value is zero.
Returns second if first is not null, else third.
Returns second value if first is null.
Returns byte length of string.
Adds a fallback value to a when_/2 expression chain.
Overlays the replacement string over the source string starting at pos for len characters.
Spark 3.5-compatible fallback: parse JSON text as generic JSON string value.
Extracts a part of a URL. Optional key for query string extraction.
Percent rank within partition.
Exact percentile. Supports single percentage or list/array of percentages.
Approximate percentile.
Returns pi.
Positive modulo.
Like explode but includes position.
Like posexplode but preserves nulls.
Returns position of substring.
Returns positive value.
Computes x raised to the power of y. Both arguments can be columns or numeric values.
Alias for pow/2.
Alias for format_string/2.
Computes product of all values.
Extracts quarter.
Quotes a string for use in SQL.
Converts degrees to radians.
Raises a user-specified error message.
Random value in [0, 1). Auto-generates a random seed when none is given. Pass an explicit seed for reproducible results.
Random value from the standard normal distribution. Auto-generates a random seed when none is given. Pass an explicit seed for reproducible results.
Generates a random string of the given length. Auto-generates a seed when none is given.
Rank within partition.
Calls a JVM method via reflection.
Alias for regexp_like/2.
Counts regex pattern occurrences.
Extracts regex group.
Extracts all matches for regex group.
Returns position of first regex match.
Returns true if column matches regex.
Replaces regex matches.
Returns first substring matching regex.
Average of independent variable.
Average of dependent variable.
Count of non-null pairs.
Y-intercept of regression line.
Coefficient of determination.
Slope of regression line.
Sum of squares of independent variable.
Sum of products of deviations.
Sum of squares of dependent variable.
Repeats string n times.
Replaces occurrences of a search string. When the replacement is omitted, an empty string is used.
Reverses string or array.
Returns rightmost n characters.
Rounds to nearest integer.
Regex pattern match.
Rounds to scale decimal places.
Row number within partition.
Right-pads string to length with pad string.
Right-trims whitespace or specified characters.
Returns DDL schema string of CSV string. Accepts optional options map.
Returns DDL schema string of JSON string. Accepts optional options map.
Spark 3.5-compatible fallback for schema_of_variant/1.
Spark 3.5-compatible fallback for schema_of_variant_agg/1.
Returns DDL schema string of XML string. Accepts optional options map.
Computes secant.
Extracts second.
Splits text into array of sentences.
Creates array of values from start to stop with optional step.
Returns session user name.
Generates session window for streaming aggregations.
SHA-1 hash.
SHA-2 hash with bit length.
Bitwise left shift.
Bitwise right shift.
Bitwise unsigned right shift.
Returns randomly shuffled array. Optional seed parameter.
Computes sign.
Computes sine.
Computes hyperbolic sine.
Returns size of array or map.
Skewness.
Returns slice of array from start for length.
Sorts array.
Soundex code.
Partition ID of each row.
Splits string by regex pattern.
Splits string and returns the field at index.
Computes square root.
Converts geometry/geography to WKB binary.
Creates geography from WKB binary.
Creates geometry from WKB binary.
Sets the SRID of a geometry.
Returns the SRID of a geometry.
Separates column into n rows.
Creates an unresolved star (*) expression for selecting all columns.
Returns true if string starts with prefix.
Sample standard deviation.
Population standard deviation.
Alias for stddev/1.
Creates map from delimited string.
Alias for listagg/2.
Creates struct from columns.
Returns substring from pos. Optional len parameter.
Returns substring from pos for len.
Returns substring before count occurrences of delimiter.
Computes sum.
Computes sum of distinct values.
Computes tangent.
Computes hyperbolic tangent.
Computes difference of two theta sketches.
Intersects two theta sketches.
Aggregate intersection of theta sketches.
Aggregates values into a theta sketch.
Estimates distinct count from a theta sketch.
Unions two theta sketches.
Aggregate union of theta sketches.
Returns the difference between two times measured in the specified units.
Spark 3.5-compatible fallback for time_trunc/2.
Adds interval to timestamp.
Returns difference between timestamps in given unit.
Creates timestamp from microseconds.
Creates timestamp from milliseconds.
Creates timestamp from seconds.
Encodes a column to Avro binary using an optional JSON schema.
Converts to binary.
Converts to character string with format.
Converts a struct column to a CSV string.
Converts to date, optionally with format.
Alias for degrees/1.
Converts a struct/array/map column to a JSON string.
Converts string to number with format.
Encodes a column to Protobuf binary using the provided message name and descriptor.
Alias for radians/1.
Spark 3.5-compatible fallback for to_time/1,2 via timestamp parsing and formatting.
Converts to timestamp, optionally with format.
Converts to timestamp with local timezone.
Converts to timestamp without timezone.
Converts timestamp to unix seconds.
Converts timestamp from timezone to UTC.
Alias for to_char_/2.
Spark 3.5-compatible fallback for to_variant_object/1.
Converts a struct column to an XML string.
Transforms each element in an array column using a function.
Transforms keys of a map column using a function on key and value.
Transforms values of a map column using a function on key and value.
Translates characters.
Trims whitespace or specified characters from both ends.
Truncates date to specified format.
Try addition, returns null on overflow.
Try AES decrypt, returns null on failure.
Try average, returns null on overflow.
Try division, returns null on division by zero.
Returns element at index/key, null on out of bounds.
Try version of make_interval/1 — returns null on invalid input.
Try version of make_timestamp/1 — returns null on invalid input.
Try version of make_timestamp_ltz/1 — returns null on invalid input.
Try version of make_timestamp_ntz/1 — returns null on invalid input.
Try modulo, returns null on division by zero.
Try multiplication, returns null on overflow.
Spark 3.5-compatible fallback for try_parse_json/1.
Try to extract a part of a URL, returns null on failure. Optional key for query string.
Try to call a JVM method, returns null on failure.
Try subtraction, returns null on overflow.
Try sum, returns null on overflow.
Try to convert to binary, returns null on failure.
Try to convert to date, returns null on failure.
Try to convert to number, returns null on failure.
Spark 3.5-compatible fallback for try_to_time/1,2 via try_to_timestamp.
Try to convert to timestamp, returns null on failure.
Try URL-decode, returns null on failure.
Validates UTF-8 and returns null on invalid.
Spark 3.5-compatible fallback for try_variant_get/3 using JSON path extraction.
Runtime data type string.
Alias for upper/1.
Decodes base64 string.
Decodes hex string to binary.
Random value uniformly distributed in [min, max). Auto-generates a seed when none is given.
Returns days since epoch for date.
Returns microseconds since epoch.
Returns milliseconds since epoch.
Returns seconds since epoch.
Converts timestamp to unix seconds. Can be called with no args for current timestamp.
Returns the value of a user-defined type (UDT) as its underlying SQL representation.
Converts to uppercase.
URL-decodes string.
URL-encodes string.
Alias for current_user_/0.
Generates a random UUID string.
Generates a random UUID string with deterministic seed (Spark 4.x+).
Validates UTF-8 and raises on invalid.
Population variance.
Alias for variance/1.
Sample variance.
Spark 3.5-compatible fallback for variant_get/3 using JSON path extraction.
Returns Spark version string.
Day of week (0=Mon, 6=Sun).
Week of year.
Evaluates a list of conditions and returns one of multiple possible result expressions.
If otherwise/2 is not used, nil is returned for unmatched conditions.
Returns bucket number for value in equi-width histogram.
Generates tumbling or sliding time window column for streaming aggregations.
Extracts the time column from a window column.
Evaluates XPath expression returning array of strings.
Evaluates XPath expression returning boolean.
Evaluates XPath expression returning double.
Evaluates XPath expression returning float.
Evaluates XPath expression returning integer.
Evaluates XPath expression returning long.
Alias for xpath_double/2.
Evaluates XPath expression returning short.
Evaluates XPath expression returning string.
xxHash64 of columns.
Extracts year.
Extracts years from an interval expression.
Returns zero if value is null.
Merges two arrays element-wise using a function.
Functions
@spec abs(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes the absolute value.
Spark SQL function: abs
@spec acos(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes inverse cosine.
Spark SQL function: acos
@spec acosh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes inverse hyperbolic cosine.
Spark SQL function: acosh
@spec add_months(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Adds months to date.
Spark SQL function: add_months
@spec aes_decrypt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
AES decrypts binary data.
Spark SQL function: aes_decrypt
@spec aes_encrypt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
AES encrypts binary data.
Spark SQL function: aes_encrypt
@spec aggregate( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | term(), (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t()), (SparkEx.Column.t() -> SparkEx.Column.t()) | nil ) :: SparkEx.Column.t()
Aggregates elements in an array column using an initial value and a merge function.
The merge function receives two lambda variables: accumulator and element. An optional finish function can be applied to the final accumulator value.
Examples
aggregate(col("arr"), lit(0), fn acc, x -> Column.plus(acc, x) end)
aggregate(col("arr"), lit(0), fn acc, x -> Column.plus(acc, x) end, fn acc -> Column.cast(acc, "string") end)
@spec any_value(SparkEx.Column.t() | String.t(), boolean()) :: SparkEx.Column.t()
Returns any value from the group. Optionally ignores null values.
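For example, ignoring nulls when picking a value (the column name is illustrative):
any_value(col("status"), true)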
@spec approx_count_distinct(SparkEx.Column.t() | String.t(), float() | nil) :: SparkEx.Column.t()
Approximate count of distinct values.
Optionally accepts a relative standard deviation parameter.
Examples
approx_count_distinct(col("x"))
approx_count_distinct(col("x"), 0.05)
@spec approx_percentile(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Approximate percentile with accuracy parameter.
Spark SQL function: approx_percentile
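For example, an approximate 95th percentile with an accuracy of 10000 (column name illustrative):
approx_percentile(col("latency_ms"), lit(0.95), lit(10000))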
@spec array([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Creates array from columns.
Spark SQL function: array
@spec array_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for collect_list/1.
@spec array_append(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Appends element to array.
Spark SQL function: array_append
@spec array_compact(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Removes null values from array.
Spark SQL function: array_compact
@spec array_contains(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Checks if array contains value.
Spark SQL function: array_contains
@spec array_distinct(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Removes duplicates from array.
Spark SQL function: array_distinct
@spec array_except(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns elements in first but not second array.
Spark SQL function: array_except
@spec array_insert( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Inserts element at position in array.
Spark SQL function: array_insert
@spec array_intersect( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Returns intersection of two arrays.
Spark SQL function: array_intersect
@spec array_join(SparkEx.Column.t() | String.t(), String.t(), String.t() | nil) :: SparkEx.Column.t()
Joins array elements with delimiter.
Optionally accepts a null_replacement string.
Examples
array_join(col("arr"), ",")
array_join(col("arr"), ",", "NULL")
@spec array_max(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns max element of array.
Spark SQL function: array_max
@spec array_min(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns min element of array.
Spark SQL function: array_min
@spec array_position(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Locates element in array (1-based).
Spark SQL function: array_position
@spec array_prepend(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Prepends element to array.
Spark SQL function: array_prepend
@spec array_remove(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Removes all occurrences of element from array.
Spark SQL function: array_remove
@spec array_repeat(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Creates array with element repeated n times.
Spark SQL function: array_repeat
@spec array_size(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns array size.
Spark SQL function: array_size
@spec array_sort(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Sorts array in ascending order. Optional comparator function.
@spec array_sort(SparkEx.Column.t() | String.t(), (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t())) :: SparkEx.Column.t()
@spec array_union(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns union of two arrays.
Spark SQL function: array_union
@spec arrays_overlap(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns true if arrays have common elements.
Spark SQL function: arrays_overlap
@spec arrays_zip([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Zips arrays into array of structs.
Spark SQL function: arrays_zip
@spec asc(SparkEx.Column.t()) :: SparkEx.Column.t()
Sort ascending by the given column.
@spec asc_nulls_first(SparkEx.Column.t()) :: SparkEx.Column.t()
Sort ascending with nulls first.
@spec asc_nulls_last(SparkEx.Column.t()) :: SparkEx.Column.t()
Sort ascending with nulls last.
@spec ascii(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
ASCII value of first character.
Spark SQL function: ascii
@spec asin(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes inverse sine.
Spark SQL function: asin
@spec asinh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes inverse hyperbolic sine.
Spark SQL function: asinh
@spec assert_true( SparkEx.Column.t() | String.t(), String.t() | SparkEx.Column.t() | nil ) :: SparkEx.Column.t()
Raises error if condition is false.
Optionally accepts an error message.
Examples
assert_true(col("cond"))
assert_true(col("cond"), "Assertion failed!")
@spec atan2( SparkEx.Column.t() | String.t() | number(), SparkEx.Column.t() | String.t() | number() ) :: SparkEx.Column.t()
Computes atan2(y, x). Both arguments can be columns or numeric values.
@spec atan(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes inverse tangent.
Spark SQL function: atan
@spec atanh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes inverse hyperbolic tangent.
Spark SQL function: atanh
@spec avg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes average.
Spark SQL function: avg
@spec base64(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Base64 encodes binary.
Spark SQL function: base64
@spec bin(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Binary string representation of integer.
Spark SQL function: bin
@spec bit_and(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Bitwise AND aggregate.
Spark SQL function: bit_and
@spec bit_count(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Counts number of set bits.
Spark SQL function: bit_count
@spec bit_get(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Returns the value of the bit at the given position.
Spark SQL function: bit_get
@spec bit_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns bit length of string.
Spark SQL function: bit_length
@spec bit_or(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Bitwise OR aggregate.
Spark SQL function: bit_or
@spec bit_xor(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Bitwise XOR aggregate.
Spark SQL function: bit_xor
@spec bitmap_and_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Aggregate AND of bitmaps.
Spark SQL function: bitmap_and_agg
@spec bitmap_bit_position(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns bit position within a bitmap bucket.
Spark SQL function: bitmap_bit_position
@spec bitmap_bucket_number(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns bitmap bucket number.
Spark SQL function: bitmap_bucket_number
@spec bitmap_construct_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Constructs a bitmap from bit positions.
Spark SQL function: bitmap_construct_agg
@spec bitmap_count(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Counts set bits in a bitmap.
Spark SQL function: bitmap_count
@spec bitmap_or_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Aggregate OR of bitmaps.
Spark SQL function: bitmap_or_agg
@spec bitwise_not_(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Bitwise NOT (standalone function).
Spark SQL function: ~
@spec bool_and(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
True if all values are true.
Spark SQL function: bool_and
@spec bool_or(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
True if any value is true.
Spark SQL function: bool_or
@spec broadcast(SparkEx.DataFrame.t()) :: SparkEx.DataFrame.t()
Returns a DataFrame with a broadcast hint for join optimization.
Examples
broadcast(df)
@spec bround( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Banker's rounding to scale decimal places.
Spark SQL function: bround
@spec btrim( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Trims characters from both sides.
Spark SQL function: btrim
@spec bucket(SparkEx.Column.t() | integer(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns the bucket number for a value and number of buckets.
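For example, hashing ids into 100 buckets (column name illustrative):
bucket(100, col("user_id"))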
@spec call_function(String.t(), list(), list()) :: SparkEx.Column.t()
Calls a function with positional and named arguments.
@spec call_udf(String.t(), [SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Calls a registered UDF by name with the given column arguments.
Equivalent to PySpark's call_udf.
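For example, invoking a previously registered UDF (the UDF name and column names are hypothetical):
call_udf("normalize_name", [col("first_name"), col("last_name")])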
@spec cardinality(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for size/1.
@spec cbrt(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes cube root.
Spark SQL function: cbrt
@spec ceil(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes ceiling.
Spark SQL function: ceil
@spec ceiling(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for ceil/1.
@spec char_(term()) :: SparkEx.Column.t()
Returns character from ASCII code.
Spark SQL function: char
@spec char_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Character length of string.
Spark SQL function: char_length
@spec character_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for char_length/1.
@spec chr(term()) :: SparkEx.Column.t()
Alias for char_/1.
@spec coalesce([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Returns first non-null value.
Spark SQL function: coalesce
@spec col(String.t()) :: SparkEx.Column.t()
Creates a column reference by name.
Examples
col("age")
col("users.name")
@spec collate(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Applies collation to string.
Spark SQL function: collate
@spec collation(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns collation of string column.
Spark SQL function: collation
@spec collect_list(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Collects values into list.
Spark SQL function: collect_list
@spec collect_set(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Collects distinct values into set.
Spark SQL function: collect_set
@spec concat([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Concatenates columns.
Spark SQL function: concat
@spec concat_ws( term(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()] ) :: SparkEx.Column.t()
Concatenates with separator.
Spark SQL function: concat_ws
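For example, joining several columns with a dash (column names illustrative):
concat_ws("-", [col("year"), col("month"), col("day")])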
@spec contains_(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns true if string contains substring.
Spark SQL function: contains
@spec conv(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Converts number between bases.
Spark SQL function: conv
@spec convert_timezone( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Converts timestamp between timezones. The 2-argument form uses the session timezone as the source.
@spec convert_timezone( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
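For example (zone and column values illustrative; in the 3-argument form the source timezone comes first, as in Spark's convert_timezone):
convert_timezone(lit("America/New_York"), col("ts"))
convert_timezone(lit("UTC"), lit("Asia/Tokyo"), col("ts"))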
@spec corr(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Pearson correlation.
Spark SQL function: corr
@spec cos(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes cosine.
Spark SQL function: cos
@spec cosh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes hyperbolic cosine.
Spark SQL function: cosh
@spec cot(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes cotangent.
Spark SQL function: cot
@spec count(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Counts non-null values.
Spark SQL function: count
@spec count_distinct( SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()] ) :: SparkEx.Column.t()
Counts distinct non-null values.
Accepts a single column or a list of columns for multi-column distinct count.
Examples
count_distinct(col("x"))
count_distinct(["x", "y", "z"])
@spec count_if(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Counts rows where condition is true.
Spark SQL function: count_if
@spec count_min_sketch(SparkEx.Column.t() | String.t(), term(), term(), term()) :: SparkEx.Column.t()
Creates a count-min sketch of a column with given eps, confidence, and seed.
Spark SQL function: count_min_sketch
@spec covar_pop(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Population covariance.
Spark SQL function: covar_pop
@spec covar_samp(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Sample covariance.
Spark SQL function: covar_samp
@spec crc32(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
CRC32 hash.
Spark SQL function: crc32
@spec create_map([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Creates map from key-value column pairs.
Spark SQL function: map
@spec csc(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes cosecant.
Spark SQL function: csc
@spec cume_dist() :: SparkEx.Column.t()
Cumulative distribution within partition.
Spark SQL function: cume_dist
@spec curdate() :: SparkEx.Column.t()
Alias for current_date/0.
@spec current_catalog() :: SparkEx.Column.t()
Returns current catalog name.
Spark SQL function: current_catalog
@spec current_database() :: SparkEx.Column.t()
Returns current database name.
Spark SQL function: current_database
@spec current_date() :: SparkEx.Column.t()
Returns current date.
Spark SQL function: current_date
@spec current_schema() :: SparkEx.Column.t()
Alias for current_database/0.
@spec current_time() :: SparkEx.Column.t()
Returns current time.
Spark SQL function: current_time
@spec current_timestamp() :: SparkEx.Column.t()
Returns current timestamp.
Spark SQL function: current_timestamp
@spec current_timezone() :: SparkEx.Column.t()
Returns current timezone string.
Spark SQL function: current_timezone
@spec current_user_() :: SparkEx.Column.t()
Returns current user name.
Spark SQL function: current_user
@spec date_add(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Adds days to date.
Spark SQL function: date_add
@spec date_diff(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for datediff/2.
@spec date_format(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Formats date/timestamp with pattern.
Spark SQL function: date_format
@spec date_from_unix_date(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates date from days since epoch.
Spark SQL function: date_from_unix_date
@spec date_part(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for extract/2.
@spec date_sub(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Subtracts days from date.
Spark SQL function: date_sub
@spec date_trunc( term(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()] ) :: SparkEx.Column.t()
Truncates date to specified unit.
Spark SQL function: date_trunc
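For example, truncating a timestamp to month precision (column name illustrative):
date_trunc("month", col("created_at"))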
@spec dateadd(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Alias for date_add/2.
@spec datediff(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Difference in days between dates.
Spark SQL function: datediff
@spec datepart(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for extract/2.
@spec day(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts day.
Spark SQL function: day
@spec dayname(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns day name.
Spark SQL function: dayname
@spec dayofmonth(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for day/1.
@spec dayofweek(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Day of week (1=Sun).
Spark SQL function: dayofweek
@spec dayofyear(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Day of year.
Spark SQL function: dayofyear
@spec days(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts days from an interval expression.
@spec decode(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Decodes binary with charset.
Spark SQL function: decode
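For example, decoding binary data as UTF-8 (column name illustrative):
decode(col("raw_bytes"), "UTF-8")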
@spec degrees(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts radians to degrees.
Spark SQL function: degrees
@spec dense_rank() :: SparkEx.Column.t()
Dense rank within partition.
Spark SQL function: dense_rank
@spec desc(SparkEx.Column.t()) :: SparkEx.Column.t()
Sort descending by the given column.
@spec desc_nulls_first(SparkEx.Column.t()) :: SparkEx.Column.t()
Sort descending with nulls first.
@spec desc_nulls_last(SparkEx.Column.t()) :: SparkEx.Column.t()
Sort descending with nulls last.
@spec e() :: SparkEx.Column.t()
Returns Euler's number.
Spark SQL function: e
@spec element_at(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns element at index/key.
Spark SQL function: element_at
@spec elt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Returns the n-th input string.
Spark SQL function: elt
@spec encode(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Encodes string with charset.
Spark SQL function: encode
@spec endswith(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns true if string ends with suffix.
Spark SQL function: endsWith
@spec equal_null(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Null-safe equality.
Spark SQL function: equal_null
@spec every(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for bool_and/1.
@spec exists(SparkEx.Column.t() | String.t(), (SparkEx.Column.t() -> SparkEx.Column.t())) :: SparkEx.Column.t()
Returns true if any element in the array satisfies the predicate.
Examples
exists(col("arr"), fn x -> Column.gt(x, lit(0)) end)
@spec exp(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes exponential.
Spark SQL function: exp
@spec explode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates a row for each array/map element.
Spark SQL function: explode
@spec explode_outer(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Like explode but preserves nulls.
Spark SQL function: explode_outer
@spec expm1(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes exp(x) - 1.
Spark SQL function: expm1
@spec expr(String.t()) :: SparkEx.Column.t()
Creates an expression from a SQL expression string.
This is a convenient escape hatch for expressions that are easier to write in SQL syntax.
Examples
expr("age + 1")
expr("CASE WHEN age > 18 THEN 'adult' ELSE 'minor' END")
@spec extract(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts date/time field.
Spark SQL function: extract
@spec factorial(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes factorial.
Spark SQL function: factorial
@spec filter( SparkEx.Column.t() | String.t(), (SparkEx.Column.t() -> SparkEx.Column.t()) | (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t()) ) :: SparkEx.Column.t()
Filters an array column using a predicate function.
Examples
filter(col("arr"), fn x -> Column.gt(x, lit(0)) end)
@spec find_in_set(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns position of string in comma-delimited list.
Spark SQL function: find_in_set
@spec first( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Returns first value.
Spark SQL function: first
@spec first_value( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Alias for first/2.
@spec flatten(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Flattens nested array.
Spark SQL function: flatten
@spec floor(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes floor.
Spark SQL function: floor
@spec forall(SparkEx.Column.t() | String.t(), (SparkEx.Column.t() -> SparkEx.Column.t())) :: SparkEx.Column.t()
Returns true if all elements in the array satisfy the predicate.
Examples
forall(col("arr"), fn x -> Column.gt(x, lit(0)) end)
@spec format_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Formats number with d decimal places.
Spark SQL function: format_number
@spec format_string( term(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()] ) :: SparkEx.Column.t()
printf-style formatting.
Spark SQL function: format_string
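For example (column names illustrative):
format_string("%s scored %d points", [col("name"), col("score")])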
@spec from_avro(SparkEx.Column.t() | String.t(), String.t(), map() | nil) :: SparkEx.Column.t()
Decodes Avro binary using the provided JSON schema.
@spec from_csv(SparkEx.Column.t() | String.t(), String.t(), map() | nil) :: SparkEx.Column.t()
Parses a CSV string column into a struct column using the given schema.
Examples
from_csv(col("csv_str"), "a INT, b STRING")
from_csv(col("csv_str"), "a INT, b STRING", %{"sep" => "|"})
@spec from_json( SparkEx.Column.t() | String.t(), String.t() | SparkEx.Types.data_type_proto(), map() | nil ) :: SparkEx.Column.t()
Parses a JSON string column into a struct/array/map column using the given schema.
The schema can be a DDL string or a Spark DataType protobuf struct.
Examples
from_json(col("json_str"), "a INT, b STRING")
from_json(col("json_str"), "a INT", %{"mode" => "FAILFAST"})
@spec from_protobuf(SparkEx.Column.t() | String.t(), String.t(), keyword()) :: SparkEx.Column.t()
Decodes Protobuf binary using the provided message name and descriptor.
Either desc_file_path or binary_descriptor_set may be provided, but not both.
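For example, with a hypothetical message name and descriptor file path:
from_protobuf(col("payload"), "com.example.Event", desc_file_path: "/tmp/event.desc")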
@spec from_unixtime(SparkEx.Column.t() | String.t(), String.t()) :: SparkEx.Column.t()
Converts a Unix timestamp to a string. A format pattern is always sent (default "yyyy-MM-dd HH:mm:ss").
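For example (column name illustrative):
from_unixtime(col("epoch_seconds"), "yyyy-MM-dd")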
@spec from_utc_timestamp(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Converts UTC timestamp to timezone.
Spark SQL function: from_utc_timestamp
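For example, shifting a UTC timestamp into a named zone (column name illustrative):
from_utc_timestamp(col("utc_ts"), "America/Los_Angeles")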
@spec from_xml(SparkEx.Column.t() | String.t(), String.t(), map() | nil) :: SparkEx.Column.t()
Parses an XML string column into a struct column using the given schema.
Examples
from_xml(col("xml_str"), "a INT, b STRING")
from_xml(col("xml_str"), "a INT, b STRING", %{"rowTag" => "item"})
@spec get(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Returns element at index from array.
Spark SQL function: get
@spec get_json_object(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Extracts JSON object from path expression.
Spark SQL function: get_json_object
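For example, extracting a nested field with a JSONPath expression (column name illustrative):
get_json_object(col("payload"), "$.user.name")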
@spec getbit(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Alias for bit_get/2.
@spec greatest([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Returns greatest value.
Spark SQL function: greatest
@spec grouping(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Indicates whether column is aggregated in grouping set.
Spark SQL function: grouping
@spec grouping_id([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Grouping ID for grouping set.
Spark SQL function: grouping_id
@spec hash([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Murmur3 hash of columns.
Spark SQL function: hash
@spec hex(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Hex string of integer/binary.
Spark SQL function: hex
@spec histogram_numeric(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Computes histogram of column.
Spark SQL function: histogram_numeric
@spec hll_sketch_agg( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregates values into an HLL sketch.
Spark SQL function: hll_sketch_agg
@spec hll_sketch_estimate(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Estimates distinct count from an HLL sketch.
Spark SQL function: hll_sketch_estimate
@spec hll_union( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Unions two HLL sketches.
Spark SQL function: hll_union
@spec hll_union_agg( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregate union of HLL sketches.
Spark SQL function: hll_union_agg
@spec hour(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts hour.
Spark SQL function: hour
@spec hours(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts hours from an interval expression.
@spec hypot(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes sqrt(a^2 + b^2).
Spark SQL function: hypot
@spec ifnull(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns second value if first is null.
Spark SQL function: ifnull
@spec ilike_( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() | nil ) :: SparkEx.Column.t()
Case-insensitive LIKE. Optional escape character.
@spec initcap(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Title-cases string.
Spark SQL function: initcap
@spec inline(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Explodes array of structs into columns.
Spark SQL function: inline
@spec inline_outer(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Like inline but preserves nulls.
Spark SQL function: inline_outer
@spec input_file_block_length() :: SparkEx.Column.t()
Length of current file block.
Spark SQL function: input_file_block_length
@spec input_file_block_start() :: SparkEx.Column.t()
Start offset of current file block.
Spark SQL function: input_file_block_start
@spec input_file_name() :: SparkEx.Column.t()
Name of file being read.
Spark SQL function: input_file_name
@spec instr(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Position of first occurrence of substr.
Spark SQL function: instr
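For example (column name illustrative):
instr(col("description"), "error")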
@spec is_valid_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns true if string is valid UTF-8.
Spark SQL function: is_valid_utf8
@spec is_variant_null(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for variant null checks.
@spec isnan(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
True if NaN.
Spark SQL function: isNaN
@spec isnotnull(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
True if not null.
Spark SQL function: isNotNull
@spec isnull(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
True if null.
Spark SQL function: isNull
@spec java_method([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Calls a JVM method.
Spark SQL function: java_method
@spec json_array_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns length of outermost JSON array.
Spark SQL function: json_array_length
@spec json_object_keys(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns keys of outermost JSON object.
Spark SQL function: json_object_keys
@spec json_tuple(SparkEx.Column.t() | String.t(), [String.t()]) :: SparkEx.Column.t()
Extracts fields from a JSON string column.
The first argument is the JSON column; the second argument is a list of field name strings.
Examples
json_tuple(col("json_str"), ["name", "age"])
@spec kll_sketch_agg_bigint( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregates bigint values into a KLL sketch.
Spark SQL function: kll_sketch_agg_bigint
@spec kll_sketch_agg_double( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregates double values into a KLL sketch.
Spark SQL function: kll_sketch_agg_double
@spec kll_sketch_agg_float( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregates float values into a KLL sketch.
Spark SQL function: kll_sketch_agg_float
@spec kll_sketch_get_n_bigint(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns n (number of items) from a KLL sketch (bigint).
Spark SQL function: kll_sketch_get_n_bigint
@spec kll_sketch_get_n_double(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns n (number of items) from a KLL sketch (double).
Spark SQL function: kll_sketch_get_n_double
@spec kll_sketch_get_n_float(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns n (number of items) from a KLL sketch (float).
Spark SQL function: kll_sketch_get_n_float
@spec kll_sketch_get_quantile_bigint(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Gets quantile from a KLL sketch (bigint).
Spark SQL function: kll_sketch_get_quantile_bigint
@spec kll_sketch_get_quantile_double(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Gets quantile from a KLL sketch (double).
Spark SQL function: kll_sketch_get_quantile_double
@spec kll_sketch_get_quantile_float(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Gets quantile from a KLL sketch (float).
Spark SQL function: kll_sketch_get_quantile_float
@spec kll_sketch_get_rank_bigint(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Gets rank from a KLL sketch (bigint).
Spark SQL function: kll_sketch_get_rank_bigint
@spec kll_sketch_get_rank_double(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Gets rank from a KLL sketch (double).
Spark SQL function: kll_sketch_get_rank_double
@spec kll_sketch_get_rank_float(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Gets rank from a KLL sketch (float).
Spark SQL function: kll_sketch_get_rank_float
@spec kll_sketch_merge_bigint( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Merges KLL sketches (bigint).
Spark SQL function: kll_sketch_merge_bigint
@spec kll_sketch_merge_double( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Merges KLL sketches (double).
Spark SQL function: kll_sketch_merge_double
@spec kll_sketch_merge_float( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Merges KLL sketches (float).
Spark SQL function: kll_sketch_merge_float
@spec kll_sketch_to_string_bigint(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts a KLL sketch (bigint) to a string.
Spark SQL function: kll_sketch_to_string_bigint
@spec kll_sketch_to_string_double(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts a KLL sketch (double) to a string.
Spark SQL function: kll_sketch_to_string_double
@spec kll_sketch_to_string_float(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts a KLL sketch (float) to a string.
Spark SQL function: kll_sketch_to_string_float
@spec kurtosis(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Kurtosis.
Spark SQL function: kurtosis
@spec lag( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Value at offset rows before current.
Spark SQL function: lag
@spec last( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Returns last value.
Spark SQL function: last
@spec last_day(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Last day of month for date.
Spark SQL function: last_day
@spec last_value( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Alias for last/2.
@spec lcase(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for lower/1.
@spec lead( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Value at offset rows after current.
Spark SQL function: lead
@spec least([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Returns least value.
Spark SQL function: least
@spec left_(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns leftmost n characters.
Spark SQL function: left
@spec length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns length of string or binary.
Spark SQL function: length
@spec levenshtein( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), integer() | nil ) :: SparkEx.Column.t()
Levenshtein edit distance between strings.
Optionally accepts a threshold parameter.
Examples
levenshtein(col("s1"), col("s2"))
levenshtein(col("s1"), col("s2"), 5)
@spec like_( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() | nil ) :: SparkEx.Column.t()
SQL LIKE pattern match. Optional escape character.
@spec listagg( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Concatenates values as string.
Spark SQL function: listagg
@spec listagg_distinct( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Concatenates distinct values as string.
Spark SQL function: listagg
@spec lit(term()) :: SparkEx.Column.t()
Creates a literal value expression.
If a Column is passed, it is returned as-is (pass-through).
Supports nil, booleans, integers, floats, and strings.
Examples
lit(42)
lit("hello")
lit(true)
lit(col("age")) # returns the Column unchanged
@spec ln(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for log/1.
@spec localtimestamp_() :: SparkEx.Column.t()
Returns current local timestamp.
Spark SQL function: localtimestamp
@spec locate(String.t(), SparkEx.Column.t() | String.t(), integer()) :: SparkEx.Column.t()
Locates the position of a substring in a string column. The optional pos argument gives the start position (default 1).
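For example, with and without the optional start position (column name illustrative):
locate("@", col("email"))
locate("@", col("email"), 5)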
@spec log1p(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes ln(1 + x).
Spark SQL function: log1p
@spec log2(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes base-2 logarithm.
Spark SQL function: log2
@spec log10(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes base-10 logarithm.
Spark SQL function: log10
@spec log(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes natural logarithm.
Spark SQL function: ln
@spec log(number(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes logarithm with the specified base.
log(col) is defined in the registry as natural log (ln).
log(base, col) computes log_base(col).
Examples
log(2, col("x"))
log(10, col("x"))
@spec lower(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts to lowercase.
Spark SQL function: lower
@spec lpad(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Left-pads string to length with pad string.
Spark SQL function: lpad
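For example, zero-padding to a width of 8 (column name illustrative):
lpad(col("invoice_no"), 8, "0")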
@spec ltrim(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()
Left-trims whitespace or specified characters.
@spec make_date( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Creates date from year, month, day.
Spark SQL function: make_date
@spec make_dt_interval(keyword()) :: SparkEx.Column.t()
Creates a day-time interval from optional components.
Options
:days — days column (default: lit(0))
:hours — hours column (default: lit(0))
:mins — minutes column (default: lit(0))
:secs — seconds column (default: lit(0))
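For example, a two-and-a-half-day interval:
make_dt_interval(days: lit(2), hours: lit(12))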
@spec make_interval(keyword()) :: SparkEx.Column.t()
Creates an interval from optional components.
Options
:years, :months, :weeks, :days, :hours, :mins, :secs
All default to lit(0).
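For example, an interval of one year and six months:
make_interval(years: lit(1), months: lit(6))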
@spec make_time( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Creates time from hour, minute, second.
Spark SQL function: make_time
@spec make_timestamp([SparkEx.Column.t() | String.t()] | keyword()) :: SparkEx.Column.t()
Creates a timestamp from individual components or from date+time columns.
Examples
make_timestamp(col("y"), col("m"), col("d"), col("h"), col("min"), col("sec"))
make_timestamp(col("y"), col("m"), col("d"), col("h"), col("min"), col("sec"), col("tz"))
make_timestamp(date: col("d"), time: col("t"))
make_timestamp(date: col("d"), time: col("t"), timezone: col("tz"))
@spec make_timestamp_ltz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Creates a timestamp with local timezone from components.
Examples
make_timestamp_ltz([col("y"), col("m"), col("d"), col("h"), col("min"), col("sec")])
@spec make_timestamp_ntz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Creates a timestamp without timezone from components.
Examples
make_timestamp_ntz([col("y"), col("m"), col("d"), col("h"), col("min"), col("sec")])
@spec make_valid_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Replaces invalid UTF-8 with replacement char.
Spark SQL function: make_valid_utf8
@spec make_ym_interval(keyword()) :: SparkEx.Column.t()
Creates a year-month interval from optional components.
Options
:years — years column (default: lit(0))
:months — months column (default: lit(0))
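For example, a two-year, three-month interval:
make_ym_interval(years: lit(2), months: lit(3))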
@spec map_concat([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Concatenates maps.
Spark SQL function: map_concat
@spec map_contains_key( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Returns true if map contains the given key.
Spark SQL function: map_contains_key
@spec map_entries(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns map entries as array of structs.
Spark SQL function: map_entries
@spec map_filter(SparkEx.Column.t() | String.t(), (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t())) :: SparkEx.Column.t()
Filters entries in a map column using a predicate on key and value.
Examples
map_filter(col("m"), fn _k, v -> Column.gt(v, lit(0)) end)
@spec map_from_arrays( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Creates map from key and value arrays.
Spark SQL function: map_from_arrays
@spec map_from_entries(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates map from array of entries.
Spark SQL function: map_from_entries
@spec map_keys(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns map keys.
Spark SQL function: map_keys
@spec map_values(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns map values.
Spark SQL function: map_values
@spec map_zip_with( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), (SparkEx.Column.t(), SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t()) ) :: SparkEx.Column.t()
Merges two maps using a function on overlapping keys.
The function receives three lambda variables: key, value1, value2.
Examples
map_zip_with(col("m1"), col("m2"), fn _k, v1, v2 -> Column.plus(v1, v2) end)
@spec mask( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Masks string characters.
Spark SQL function: mask
@spec max(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes maximum.
Spark SQL function: max
@spec max_by(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Value of first col at max of second.
Spark SQL function: max_by
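Examples (illustrative; returns the "name" value on the row with the maximum "age")
max_by(col("name"), col("age"))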
@spec md5(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
MD5 hash.
Spark SQL function: md5
@spec mean(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for avg/1.
@spec median(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Median value.
Spark SQL function: median
@spec min(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes minimum.
Spark SQL function: min
@spec min_by(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Value of first col at min of second.
Spark SQL function: min_by
@spec minute(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts minute.
Spark SQL function: minute
@spec mode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Most frequent value in group.
@spec mode(SparkEx.Column.t() | String.t(), boolean()) :: SparkEx.Column.t()
Most frequent value in group. Optional deterministic parameter (Spark 4.x+).
@spec monotonically_increasing_id() :: SparkEx.Column.t()
Globally unique monotonically increasing ID.
Spark SQL function: monotonically_increasing_id
@spec month(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts month.
Spark SQL function: month
@spec monthname(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns month name.
Spark SQL function: monthname
@spec months(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts months from an interval expression.
@spec months_between( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), boolean() ) :: SparkEx.Column.t()
Returns the number of months between two dates.
Always sends 3 arguments with roundOff defaulting to true.
Examples
months_between(col("d1"), col("d2"))
months_between(col("d1"), col("d2"), false)
@spec named_arg(String.t(), term()) :: SparkEx.Column.t()
Builds a named argument expression.
@spec named_struct([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Creates struct with named fields.
Spark SQL function: named_struct
@spec nanvl(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns second value if first is NaN.
Spark SQL function: nanvl
@spec negative(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns negation.
Spark SQL function: negative
@spec next_day(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Next day of week after date.
Spark SQL function: next_day
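Examples (illustrative; the day of week is passed as a literal)
next_day(col("d"), lit("Mon"))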
@spec now() :: SparkEx.Column.t()
Alias for current_timestamp/0.
@spec nth_value( SparkEx.Column.t() | String.t(), integer() | SparkEx.Column.t(), boolean() ) :: SparkEx.Column.t()
Returns the nth value in a window frame. Optionally ignores null values.
@spec ntile(term()) :: SparkEx.Column.t()
N-tile bucket number within partition.
Spark SQL function: ntile
@spec nullif(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns null if both values are equal.
Spark SQL function: nullif
@spec nullifzero(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns null if value is zero.
Spark SQL function: nullifzero
@spec nvl2( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Returns second if first is not null, else third.
Spark SQL function: nvl2
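Examples (illustrative; yields col("b") when col("a") is not null, otherwise col("c"))
nvl2(col("a"), col("b"), col("c"))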
@spec nvl(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns second value if first is null.
Spark SQL function: nvl
@spec octet_length(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns byte length of string.
Spark SQL function: octet_length
@spec otherwise(SparkEx.Column.t(), SparkEx.Column.t() | term()) :: SparkEx.Column.t()
Adds a fallback value to a when_/2 expression chain.
Examples
when_(col("score") |> Column.gt(90), lit("A"))
|> otherwise(lit("B"))
@spec overlay( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() | integer() ) :: SparkEx.Column.t()
Overlays the replace string over src starting at pos for len characters.
All arguments accept a Column or a string column name.
len defaults to -1 (use the full length of the replacement string).
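Examples (illustrative; "src" and "repl" are placeholder column names)
overlay(col("src"), col("repl"), lit(7))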
@spec parse_json(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Spark 3.5-compatible fallback: parse JSON text as generic JSON string value.
@spec parse_url( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() | nil ) :: SparkEx.Column.t()
Extracts a part of a URL. Optional key for query string extraction.
@spec percent_rank() :: SparkEx.Column.t()
Percent rank within partition.
Spark SQL function: percent_rank
@spec percentile( SparkEx.Column.t() | String.t(), number() | [number()], SparkEx.Column.t() | integer() ) :: SparkEx.Column.t()
Exact percentile. Supports single percentage or list/array of percentages.
Optional frequency parameter (default 1).
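Examples (illustrative)
percentile(col("x"), 0.5)
percentile(col("x"), [0.25, 0.5, 0.75])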
@spec percentile_approx(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Approximate percentile.
Spark SQL function: percentile_approx
@spec pi() :: SparkEx.Column.t()
Returns pi.
Spark SQL function: pi
@spec pmod(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Positive modulo.
Spark SQL function: pmod
@spec posexplode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Like explode but includes position.
Spark SQL function: posexplode
@spec posexplode_outer(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Like posexplode but preserves nulls.
Spark SQL function: posexplode_outer
@spec position(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Returns position of substring.
Spark SQL function: position
@spec positive(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns positive value.
Spark SQL function: positive
@spec pow( SparkEx.Column.t() | String.t() | number(), SparkEx.Column.t() | String.t() | number() ) :: SparkEx.Column.t()
Computes x raised to the power of y. Both arguments can be columns or numeric values.
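Examples (illustrative)
pow(col("x"), 2)
pow(2, col("x"))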
@spec power( SparkEx.Column.t() | String.t() | number(), SparkEx.Column.t() | String.t() | number() ) :: SparkEx.Column.t()
Alias for pow/2.
@spec printf( term(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()] ) :: SparkEx.Column.t()
Alias for format_string/2.
@spec product(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes product of all values.
Spark SQL function: product
@spec quarter(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts quarter.
Spark SQL function: quarter
@spec quote_(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Quotes a string for use in SQL.
Spark SQL function: quote
@spec radians(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts degrees to radians.
Spark SQL function: radians
@spec raise_error(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Raises a user-specified error message.
Spark SQL function: raise_error
@spec rand(integer() | nil | keyword()) :: SparkEx.Column.t()
Random value in [0, 1). Auto-generates a random seed when none given. Pass an explicit seed for reproducible results.
@spec randn(integer() | nil | keyword()) :: SparkEx.Column.t()
Random value from standard normal distribution. Auto-generates a random seed when none given. Pass an explicit seed for reproducible results.
@spec randstr(SparkEx.Column.t() | String.t(), term(), integer() | nil) :: SparkEx.Column.t()
Generates random string of given length. Auto-generates seed when none given.
@spec rank() :: SparkEx.Column.t()
Rank within partition.
Spark SQL function: rank
@spec reduce( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | term(), (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t()), (SparkEx.Column.t() -> SparkEx.Column.t()) | nil ) :: SparkEx.Column.t()
Alias for aggregate/3.
@spec reflect_([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Calls a JVM method via reflection.
Spark SQL function: reflect
@spec regexp(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Alias for regexp_like/2.
@spec regexp_count(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Counts regex pattern occurrences.
Spark SQL function: regexp_count
@spec regexp_extract(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Extracts regex group.
Spark SQL function: regexp_extract
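Examples (illustrative; extracts the first capture group)
regexp_extract(col("s"), "(\\d+)-(\\d+)", 1)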
@spec regexp_extract_all(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Extracts all matches for regex group.
Spark SQL function: regexp_extract_all
@spec regexp_instr(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Returns position of first regex match.
Spark SQL function: regexp_instr
@spec regexp_like(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Returns true if column matches regex.
Spark SQL function: regexp_like
@spec regexp_replace(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Replaces regex matches.
Spark SQL function: regexp_replace
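Examples (illustrative; collapses runs of whitespace to a single space)
regexp_replace(col("s"), "\\s+", " ")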
@spec regexp_substr(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Returns first substring matching regex.
Spark SQL function: regexp_substr
@spec regr_avgx(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Average of independent variable.
Spark SQL function: regr_avgx
@spec regr_avgy(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Average of dependent variable.
Spark SQL function: regr_avgy
@spec regr_count(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Count of non-null pairs.
Spark SQL function: regr_count
@spec regr_intercept(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Y-intercept of regression line.
Spark SQL function: regr_intercept
@spec regr_r2(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Coefficient of determination.
Spark SQL function: regr_r2
@spec regr_slope(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Slope of regression line.
Spark SQL function: regr_slope
@spec regr_sxx(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Sum of squares of independent variable.
Spark SQL function: regr_sxx
@spec regr_sxy(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Sum of products of deviations.
Spark SQL function: regr_sxy
@spec regr_syy(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Sum of squares of dependent variable.
Spark SQL function: regr_syy
@spec repeat(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Repeats string n times.
Spark SQL function: repeat
@spec replace( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Replaces occurrences of search string. When replacement is omitted, uses empty string.
@spec reverse(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Reverses string or array.
Spark SQL function: reverse
@spec right_(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns rightmost n characters.
Spark SQL function: right
@spec rint(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Rounds to nearest integer.
Spark SQL function: rint
@spec rlike_(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Regex pattern match.
Spark SQL function: rlike
@spec round( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Rounds to scale decimal places.
Spark SQL function: round
@spec row_number() :: SparkEx.Column.t()
Row number within partition.
Spark SQL function: row_number
@spec rpad(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Right-pads string to length with pad string.
Spark SQL function: rpad
@spec rtrim(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()
Right-trims whitespace or specified characters.
@spec schema_of_csv(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()
Returns DDL schema string of CSV string. Accepts optional options map.
@spec schema_of_json(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()
Returns DDL schema string of JSON string. Accepts optional options map.
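Examples (illustrative)
schema_of_json(lit("{\"id\": 1}"))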
@spec schema_of_variant(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for schema_of_variant/1.
@spec schema_of_variant_agg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for schema_of_variant_agg/1.
@spec schema_of_xml(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()
Returns DDL schema string of XML string. Accepts optional options map.
@spec sec(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes secant.
Spark SQL function: sec
@spec second(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts second.
Spark SQL function: second
@spec sentences(SparkEx.Column.t() | String.t(), String.t() | nil, String.t() | nil) :: SparkEx.Column.t()
Splits text into array of sentences.
Optionally accepts language and country parameters.
Examples
sentences(col("text"))
sentences(col("text"), "en", "US")
@spec sequence( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() | nil ) :: SparkEx.Column.t()
Creates array of values from start to stop with optional step.
Examples
sequence(col("start"), col("stop"))
sequence(col("start"), col("stop"), col("step"))
@spec session_user_() :: SparkEx.Column.t()
Returns session user name.
Spark SQL function: session_user
@spec session_window(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Generates session window for streaming aggregations.
Spark SQL function: session_window
@spec sha1(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
SHA-1 hash.
Spark SQL function: sha1
@spec sha2(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
SHA-2 hash with bit length.
Spark SQL function: sha2
@spec sha(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for sha1/1.
@spec shiftleft(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Bitwise left shift.
Spark SQL function: shiftleft
@spec shiftright(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Bitwise right shift.
Spark SQL function: shiftright
@spec shiftrightunsigned(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Bitwise unsigned right shift.
Spark SQL function: shiftrightunsigned
@spec shuffle(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns randomly shuffled array. Optional seed parameter.
@spec shuffle(SparkEx.Column.t() | String.t(), integer()) :: SparkEx.Column.t()
@spec sign(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for signum/1.
@spec signum(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes sign.
Spark SQL function: signum
@spec sin(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes sine.
Spark SQL function: sin
@spec sinh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes hyperbolic sine.
Spark SQL function: sinh
@spec size(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns size of array or map.
Spark SQL function: size
@spec skewness(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Skewness.
Spark SQL function: skewness
@spec slice(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Returns slice of array from start for length.
Spark SQL function: slice
@spec some(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for bool_or/1.
@spec sort_array( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Sorts array.
Spark SQL function: sort_array
@spec soundex(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Soundex code.
Spark SQL function: soundex
@spec spark_partition_id() :: SparkEx.Column.t()
Partition ID of each row.
Spark SQL function: spark_partition_id
@spec split(SparkEx.Column.t() | String.t(), String.t(), integer() | nil) :: SparkEx.Column.t()
Splits string by regex pattern.
Examples
split(col("s"), "\\.")
split(col("s"), "\\.", 3)
@spec split_part(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Splits string and returns the field at index.
Spark SQL function: split_part
@spec sqrt(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes square root.
Spark SQL function: sqrt
@spec st_asbinary(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts geometry/geography to WKB binary.
Spark SQL function: ST_AsBinary
@spec st_geogfromwkb(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates geography from WKB binary.
Spark SQL function: ST_GeogFromWKB
@spec st_geomfromwkb(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates geometry from WKB binary.
Spark SQL function: ST_GeomFromWKB
@spec st_setsrid(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Sets the SRID of a geometry.
Spark SQL function: ST_SetSRID
@spec st_srid(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns the SRID of a geometry.
Spark SQL function: ST_SRID
@spec stack([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Separates column into n rows.
Spark SQL function: stack
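Examples (illustrative sketch; assumes the first argument gives the number of rows, as in Spark SQL stack(n, ...))
stack([lit(2), col("a"), col("b"), col("c"), col("d")])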
@spec star() :: SparkEx.Column.t()
Creates an unresolved star (*) expression for selecting all columns.
@spec startswith(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns true if string starts with prefix.
Spark SQL function: startsWith
@spec std(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for stddev/1.
@spec stddev(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Sample standard deviation.
Spark SQL function: stddev
@spec stddev_pop(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Population standard deviation.
Spark SQL function: stddev_pop
@spec stddev_samp(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for stddev/1.
@spec str_to_map( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Creates map from delimited string.
Spark SQL function: str_to_map
@spec string_agg( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Alias for listagg/2.
@spec string_agg_distinct( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Alias for listagg_distinct/2.
@spec struct([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Creates struct from columns.
Spark SQL function: struct
@spec substr_( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() | nil ) :: SparkEx.Column.t()
Returns substring from pos. Optional len parameter.
@spec substring(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Returns substring from pos for len.
Spark SQL function: substring
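Examples (illustrative; pos is 1-based)
substring(col("s"), 2, 3)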
@spec substring_index(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Returns substring before count occurrences of delimiter.
Spark SQL function: substring_index
@spec sum(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes sum.
Spark SQL function: sum
@spec sum_distinct(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes sum of distinct values.
Spark SQL function: sum
@spec tan(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes tangent.
Spark SQL function: tan
@spec tanh(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Computes hyperbolic tangent.
Spark SQL function: tanh
@spec theta_difference( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Computes difference of two theta sketches.
Spark SQL function: theta_difference
@spec theta_intersection( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Intersects two theta sketches.
Spark SQL function: theta_intersection
@spec theta_intersection_agg( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregate intersection of theta sketches.
Spark SQL function: theta_intersection_agg
@spec theta_sketch_agg( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregates values into a theta sketch.
Spark SQL function: theta_sketch_agg
@spec theta_sketch_estimate(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Estimates distinct count from a theta sketch.
Spark SQL function: theta_sketch_estimate
@spec theta_union( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Unions two theta sketches.
Spark SQL function: theta_union
@spec theta_union_agg( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Aggregate union of theta sketches.
Spark SQL function: theta_union_agg
@spec time_diff( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Returns the difference between two times measured in the specified units.
Spark 4.1+. Unit is passed as a column expression (use lit/1 for string literals).
Supported units: "HOUR", "MINUTE", "SECOND", "MILLISECOND", "MICROSECOND".
Examples
time_diff(lit("HOUR"), col("start_time"), col("end_time"))
@spec time_trunc(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for time_trunc/2.
@spec timestamp_add( term(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()] ) :: SparkEx.Column.t()
Adds interval to timestamp.
Spark SQL function: timestampadd
@spec timestamp_diff( term(), SparkEx.Column.t() | String.t() | [SparkEx.Column.t() | String.t()] ) :: SparkEx.Column.t()
Returns difference between timestamps in given unit.
Spark SQL function: timestampdiff
@spec timestamp_micros(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates timestamp from microseconds.
Spark SQL function: timestamp_micros
@spec timestamp_millis(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates timestamp from milliseconds.
Spark SQL function: timestamp_millis
@spec timestamp_seconds(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Creates timestamp from seconds.
Spark SQL function: timestamp_seconds
@spec to_avro(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()
Encodes a column to Avro binary using an optional JSON schema.
@spec to_binary( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Converts to binary.
Spark SQL function: to_binary
@spec to_char_(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Converts to character string with format.
Spark SQL function: to_char
@spec to_csv(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()
Converts a struct column to a CSV string.
Examples
to_csv(col("struct_col"))
to_csv(col("struct_col"), %{"sep" => "|"})
@spec to_date( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Converts to date, optionally with format.
Spark SQL function: to_date
@spec to_degrees(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for degrees/1.
@spec to_json(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()
Converts a struct/array/map column to a JSON string.
Examples
to_json(col("struct_col"))
to_json(col("struct_col"), %{"pretty" => "true"})
@spec to_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Converts string to number with format.
Spark SQL function: to_number
@spec to_protobuf(SparkEx.Column.t() | String.t(), String.t(), keyword()) :: SparkEx.Column.t()
Encodes a column to Protobuf binary using the provided message name and descriptor.
Provide either desc_file_path or binary_descriptor_set, but not both.
@spec to_radians(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for radians/1.
@spec to_time( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for to_time/1,2 via timestamp parsing and formatting.
@spec to_timestamp( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Converts to timestamp, optionally with format.
Spark SQL function: to_timestamp
@spec to_timestamp_ltz( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Converts to timestamp with local timezone.
Spark SQL function: to_timestamp_ltz
@spec to_timestamp_ntz( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Converts to timestamp without timezone.
Spark SQL function: to_timestamp_ntz
@spec to_unix_timestamp( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Converts timestamp to unix seconds.
Spark SQL function: to_unix_timestamp
@spec to_utc_timestamp(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Converts timestamp from timezone to UTC.
Spark SQL function: to_utc_timestamp
@spec to_varchar(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Alias for to_char_/2.
@spec to_variant_object(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for to_variant_object/1.
@spec to_xml(SparkEx.Column.t() | String.t(), map() | nil) :: SparkEx.Column.t()
Converts a struct column to an XML string.
Examples
to_xml(col("struct_col"))
to_xml(col("struct_col"), %{"rowTag" => "item"})
@spec transform( SparkEx.Column.t() | String.t(), (SparkEx.Column.t() -> SparkEx.Column.t()) | (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t()) ) :: SparkEx.Column.t()
Transforms each element in an array column using a function.
The function receives a lambda variable x representing each element.
Examples
transform(col("arr"), fn x -> Column.plus(x, lit(1)) end)
@spec transform_keys(SparkEx.Column.t() | String.t(), (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t())) :: SparkEx.Column.t()
Transforms keys of a map column using a function on key and value.
Examples
transform_keys(col("m"), fn k, _v -> Column.plus(k, lit(1)) end)
@spec transform_values(SparkEx.Column.t() | String.t(), (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t())) :: SparkEx.Column.t()
Transforms values of a map column using a function on key and value.
Examples
transform_values(col("m"), fn _k, v -> Column.plus(v, lit(1)) end)
@spec translate(SparkEx.Column.t() | String.t(), term(), term()) :: SparkEx.Column.t()
Translates characters.
Spark SQL function: translate
@spec trim(SparkEx.Column.t() | String.t(), String.t() | nil) :: SparkEx.Column.t()
Trims whitespace or specified characters from both ends.
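Examples (illustrative)
trim(col("s"))
trim(col("s"), "*")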
@spec trunc(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Truncates date to specified format.
Spark SQL function: trunc
@spec try_add(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try addition, returns null on overflow.
Spark SQL function: try_add
@spec try_aes_decrypt([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Try AES decrypt, returns null on failure.
Spark SQL function: try_aes_decrypt
@spec try_avg(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try average, returns null on overflow.
Spark SQL function: try_avg
@spec try_divide(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try division, returns null on division by zero.
Spark SQL function: try_divide
@spec try_element_at(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns element at index/key, null on out of bounds.
Spark SQL function: try_element_at
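Examples (illustrative; array indexes are 1-based)
try_element_at(col("arr"), lit(5))
try_element_at(col("m"), lit("key"))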
@spec try_make_interval(keyword()) :: SparkEx.Column.t()
Try version of make_interval/1 — returns null on invalid input.
@spec try_make_timestamp([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Try version of make_timestamp/1 — returns null on invalid input.
@spec try_make_timestamp_ltz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Try version of make_timestamp_ltz/1 — returns null on invalid input.
@spec try_make_timestamp_ntz([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Try version of make_timestamp_ntz/1 — returns null on invalid input.
@spec try_mod(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try modulo, returns null on division by zero.
Spark SQL function: try_mod
@spec try_multiply(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try multiplication, returns null on overflow.
Spark SQL function: try_multiply
@spec try_parse_json(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for try_parse_json/1.
@spec try_parse_url( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() | nil ) :: SparkEx.Column.t()
Try to extract a part of a URL, returns null on failure. Optional key for query string.
@spec try_reflect([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
Try to call a JVM method, returns null on failure.
Spark SQL function: try_reflect
@spec try_subtract(SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try subtraction, returns null on overflow.
Spark SQL function: try_subtract
@spec try_sum(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try sum, returns null on overflow.
Spark SQL function: try_sum
@spec try_to_binary( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Try to convert to binary, returns null on failure.
Spark SQL function: try_to_binary
@spec try_to_date( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Try to convert to date, returns null on failure.
Spark SQL function: try_to_date
@spec try_to_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Try to convert to number, returns null on failure.
Spark SQL function: try_to_number
@spec try_to_time( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for try_to_time/1,2 via try_to_timestamp.
@spec try_to_timestamp( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
Try to convert to timestamp, returns null on failure.
Spark SQL function: try_to_timestamp
@spec try_url_decode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Try URL-decode, returns null on failure.
Spark SQL function: try_url_decode
@spec try_validate_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Validates UTF-8 and returns null on invalid.
Spark SQL function: try_validate_utf8
@spec try_variant_get( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for try_variant_get/3 using JSON path extraction.
@spec typeof(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Runtime data type string.
Spark SQL function: typeof
@spec ucase(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for upper/1.
@spec unbase64(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Decodes base64 string.
Spark SQL function: unbase64
@spec unhex(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Decodes hex string to binary.
Spark SQL function: unhex
@spec uniform(SparkEx.Column.t() | String.t(), term(), integer() | nil) :: SparkEx.Column.t()
Random value uniformly distributed in [min, max). Auto-generates seed when none given.
@spec unix_date(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns days since epoch for date.
Spark SQL function: unix_date
@spec unix_micros(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns microseconds since epoch.
Spark SQL function: unix_micros
@spec unix_millis(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns milliseconds since epoch.
Spark SQL function: unix_millis
@spec unix_seconds(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns seconds since epoch.
Spark SQL function: unix_seconds
@spec unix_timestamp() :: SparkEx.Column.t()
Converts timestamp to unix seconds. Can be called with no args for current timestamp.
@spec unix_timestamp( SparkEx.Column.t() | String.t(), keyword() ) :: SparkEx.Column.t()
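Examples (illustrative)
unix_timestamp()
unix_timestamp(col("ts"))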
@spec unwrap_udt(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns the value of a user-defined type (UDT) as its underlying SQL representation.
@spec upper(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Converts to uppercase.
Spark SQL function: upper
@spec url_decode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
URL-decodes string.
Spark SQL function: url_decode
@spec url_encode(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
URL-encodes string.
Spark SQL function: url_encode
@spec user_() :: SparkEx.Column.t()
Alias for current_user_/0.
@spec uuid() :: SparkEx.Column.t()
Generates a random UUID string.
@spec uuid(integer()) :: SparkEx.Column.t()
Generates a random UUID string with deterministic seed (Spark 4.x+).
@spec validate_utf8(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Validates UTF-8 and raises on invalid.
Spark SQL function: validate_utf8
@spec var_pop(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Population variance.
Spark SQL function: var_pop
@spec var_samp(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Alias for variance/1.
@spec variance(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Sample variance.
Spark SQL function: variance
@spec variant_get( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t() ) :: SparkEx.Column.t()
Spark 3.5-compatible fallback for variant_get/3 using JSON path extraction.
@spec version_() :: SparkEx.Column.t()
Returns Spark version string.
Spark SQL function: version
@spec weekday(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Day of week (0=Mon, 6=Sun).
Spark SQL function: weekday
@spec weekofyear(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Week of year.
Spark SQL function: weekofyear
@spec when_(SparkEx.Column.t(), SparkEx.Column.t() | term()) :: SparkEx.Column.t()
Evaluates a list of conditions and returns one of multiple possible result expressions.
If otherwise/2 is not used, nil is returned for unmatched conditions.
Equivalent to CASE WHEN condition THEN value END in SQL.
Examples
import SparkEx.Functions
when_(col("age") |> Column.lt(13), lit("child"))
|> otherwise(lit("adult"))
@spec width_bucket(SparkEx.Column.t() | String.t(), term(), term(), term()) :: SparkEx.Column.t()
Returns bucket number for value in equi-width histogram.
Spark SQL function: width_bucket
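Examples (illustrative; buckets col("x") into 10 equal-width buckets over [0, 100])
width_bucket(col("x"), lit(0), lit(100), lit(10))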
@spec window( SparkEx.Column.t() | String.t(), String.t(), String.t() | nil, String.t() | nil ) :: SparkEx.Column.t()
Generates tumbling or sliding time window column for streaming aggregations.
Examples
window(col("timestamp"), "10 minutes")
window(col("timestamp"), "10 minutes", "5 minutes")
window(col("timestamp"), "10 minutes", "5 minutes", "2 minutes")
@spec window_time(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts the time column from a window column.
Spark SQL function: window_time
@spec xpath(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning array of strings.
Spark SQL function: xpath
@spec xpath_boolean(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning boolean.
Spark SQL function: xpath_boolean
@spec xpath_double(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning double.
Spark SQL function: xpath_double
@spec xpath_float(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning float.
Spark SQL function: xpath_float
@spec xpath_int(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning integer.
Spark SQL function: xpath_int
@spec xpath_long(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning long.
Spark SQL function: xpath_long
@spec xpath_number(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Alias for xpath_double/2.
@spec xpath_short(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning short.
Spark SQL function: xpath_short
@spec xpath_string(SparkEx.Column.t() | String.t(), term()) :: SparkEx.Column.t()
Evaluates XPath expression returning string.
Spark SQL function: xpath_string
@spec xxhash64([SparkEx.Column.t() | String.t()]) :: SparkEx.Column.t()
xxHash64 of columns.
Spark SQL function: xxhash64
@spec year(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts year.
Spark SQL function: year
@spec years(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Extracts years from an interval expression.
@spec zeroifnull(SparkEx.Column.t() | String.t()) :: SparkEx.Column.t()
Returns zero if value is null.
Spark SQL function: zeroifnull
@spec zip_with( SparkEx.Column.t() | String.t(), SparkEx.Column.t() | String.t(), (SparkEx.Column.t(), SparkEx.Column.t() -> SparkEx.Column.t()) ) :: SparkEx.Column.t()
Merges two arrays element-wise using a function.
Examples
zip_with(col("a1"), col("a2"), fn x, y -> Column.plus(x, y) end)