SparkEx.Types (SparkEx v0.1.0)


Elixir-friendly type construction for Spark schemas.

Provides helpers to build structured schema types that can be passed to SparkEx.Reader.schema/2 and SparkEx.StreamReader.schema/2.

Examples

import SparkEx.Types

schema = struct_type([
  struct_field("id", :long),
  struct_field("name", :string),
  struct_field("score", :double, nullable: false)
])

reader |> SparkEx.Reader.schema(schema) |> SparkEx.Reader.load("/data")

Types

data_type_proto()

@type data_type_proto() :: Spark.Connect.DataType.t()

A Spark Connect DataType protobuf struct.

field()

@type field() :: %{
  name: String.t(),
  type: term(),
  nullable: boolean(),
  metadata: term()
}

foreach_function()

@type foreach_function() :: Spark.Connect.StreamingForeachFunction.t()

A Spark Connect StreamingForeachFunction protobuf struct.

spark_type()

@type spark_type() ::
  :null
  | :boolean
  | :byte
  | :short
  | :integer
  | :long
  | :float
  | :double
  | :string
  | {:string, String.t()}
  | {:char, non_neg_integer()}
  | {:varchar, non_neg_integer()}
  | :binary
  | :date
  | :time
  | :timestamp
  | :timestamp_ntz
  | :day_time_interval
  | :year_month_interval
  | :calendar_interval
  | {:decimal, non_neg_integer(), non_neg_integer()}
  | {:array, spark_type()}
  | {:map, spark_type(), spark_type()}
  | {:struct, [field()]}
  | :variant
  | :geometry
  | :geography

storage_level()

@type storage_level() :: Spark.Connect.StorageLevel.t()

A Spark Connect StorageLevel protobuf struct.

struct_type()

@type struct_type() :: {:struct, [field()]}
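
Because these types are plain tagged tuples and maps, they can be inspected with ordinary pattern matching. The sketch below is illustrative only: `TypeWalkSketch` and `leaf_paths/1` are hypothetical names, not part of SparkEx, and the schema is written out by hand following the `field()` and `spark_type()` shapes above. The extra clauses for 3- and 4-element `:array` and `:map` tuples are an assumption based on the return specs of `array_type/2` and `map_type/3`.

```elixir
defmodule TypeWalkSketch do
  # Hypothetical helper, not part of SparkEx: lists the dotted path of every
  # leaf field in a schema built from the documented tuple and map shapes.

  def leaf_paths({:struct, fields}), do: Enum.flat_map(fields, &field_paths(&1, nil))

  defp field_paths(%{name: name, type: type}, prefix) do
    path = if prefix, do: prefix <> "." <> name, else: name
    walk(type, path)
  end

  # Structs recurse into their fields.
  defp walk({:struct, fields}, path), do: Enum.flat_map(fields, &field_paths(&1, path))
  # Arrays recurse into the element type; the 3-tuple form carries an extra boolean.
  defp walk({:array, element}, path), do: walk(element, path <> "[]")
  defp walk({:array, element, _flag}, path), do: walk(element, path <> "[]")
  # Maps: only the value side is followed in this sketch.
  defp walk({:map, _key, value}, path), do: walk(value, path <> "{}")
  defp walk({:map, _key, value, _flag}, path), do: walk(value, path <> "{}")
  # Everything else (atoms, {:decimal, p, s}, and so on) is treated as a leaf.
  defp walk(_leaf, path), do: [path]
end

schema =
  {:struct,
   [
     %{name: "id", type: :long, nullable: false, metadata: %{}},
     %{name: "price", type: {:decimal, 10, 2}, nullable: true, metadata: %{}},
     %{name: "tags", type: {:array, :string}, nullable: true, metadata: %{}},
     %{
       name: "address",
       type: {:struct, [%{name: "city", type: :string, nullable: true, metadata: %{}}]},
       nullable: true,
       metadata: %{}
     }
   ]}

paths = TypeWalkSketch.leaf_paths(schema)
IO.inspect(paths)
# prints ["id", "price", "tags[]", "address.city"]
```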

Functions

array_type(element_type, opts \\ [])

@spec array_type(
  spark_type(),
  keyword()
) :: {:array, spark_type()} | {:array, spark_type(), boolean()}

Creates an array type.

Examples

array_type(:string)
array_type(struct_type([struct_field("id", :long)]))

data_type_to_json(data_type)

@spec data_type_to_json(data_type_proto()) :: String.t()

Converts a Spark Connect DataType protobuf value to a Spark JSON schema string.

This mirrors PySpark's DataType.json() output.

map_type(key_type, value_type, opts \\ [])

@spec map_type(spark_type(), spark_type(), keyword()) ::
  {:map, spark_type(), spark_type()}
  | {:map, spark_type(), spark_type(), boolean()}

Creates a map type.

Examples

map_type(:string, :long)

struct_field(name, type, opts \\ [])

@spec struct_field(String.t(), term(), keyword()) :: field()

Creates a struct field.

Options

  • :nullable — whether the field can be null (default: true)
  • :metadata — metadata map (default: %{})

Examples

struct_field("id", :long)
struct_field("name", :string, nullable: false)
struct_field("tags", :string, metadata: %{"comment" => "user tags"})
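
Going by the `field()` type and the defaults listed under Options, each call presumably yields a four-key map. The sketch below is a stand-in that mirrors those documented defaults; `FieldSketch.build/3` is a hypothetical name and not the library's implementation, which may validate or normalise its arguments differently.

```elixir
defmodule FieldSketch do
  # Hypothetical stand-in for struct_field/3, written only to show the
  # documented defaults; the library's real implementation may differ.
  def build(name, type, opts \\ []) when is_binary(name) and is_list(opts) do
    %{
      name: name,
      type: type,
      # Documented default: fields are nullable unless stated otherwise.
      nullable: Keyword.get(opts, :nullable, true),
      # Documented default: empty metadata map.
      metadata: Keyword.get(opts, :metadata, %{})
    }
  end
end

id = FieldSketch.build("id", :long)
name = FieldSketch.build("name", :string, nullable: false)
tags = FieldSketch.build("tags", :string, metadata: %{"comment" => "user tags"})

IO.inspect([id, name, tags])
```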

struct_type(fields)

@spec struct_type([field()]) :: struct_type()

Creates a struct type (schema) from a list of fields.

Examples

struct_type([
  struct_field("id", :long),
  struct_field("name", :string)
])

to_ddl(arg)

@spec to_ddl(struct_type()) :: String.t()

Converts a struct type to a DDL schema string.

Examples

iex> schema = struct_type([struct_field("id", :long), struct_field("name", :string)])
iex> SparkEx.Types.to_ddl(schema)
"id LONG, name STRING"
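
To make the mapping from fields to DDL concrete, here is a minimal sketch that reproduces the output above for a flat schema. `DdlSketch` is a hypothetical module, not SparkEx's implementation: it assumes every field type is a bare atom and ignores nullability, metadata, and nested or parameterised types, all of which the real `to_ddl/1` may handle differently.

```elixir
defmodule DdlSketch do
  # Hypothetical renderer, not SparkEx's implementation. It only handles
  # fields whose type is a bare atom such as :long or :string.
  def render({:struct, fields}) do
    Enum.map_join(fields, ", ", &render_field/1)
  end

  defp render_field(%{name: name, type: type}) when is_atom(type) do
    # Upper-case the atom name, e.g. :long becomes "LONG".
    type_name = type |> Atom.to_string() |> String.upcase()
    name <> " " <> type_name
  end
end

# Field maps written by hand, following the field() type.
schema =
  {:struct,
   [
     %{name: "id", type: :long, nullable: true, metadata: %{}},
     %{name: "name", type: :string, nullable: true, metadata: %{}}
   ]}

IO.puts(DdlSketch.render(schema))
# prints id LONG, name STRING
```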

to_json(arg)

@spec to_json(struct_type()) :: String.t()

Converts a struct type to a JSON schema string (Spark JSON format).

This produces the same JSON that PySpark's StructType.json() generates.

Examples

iex> schema = struct_type([struct_field("id", :long), struct_field("name", :string)])
iex> SparkEx.Types.to_json(schema)
~s({"type":"struct","fields":[{"name":"id","type":"long","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}}]})
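
The structure of that JSON string can be seen more easily in a toy renderer. The sketch below is an illustration under narrow assumptions, not SparkEx's implementation: `JsonSketch` is a hypothetical module that accepts only atom types, empty metadata, and field names that need no JSON escaping. A real encoder would have to cover nested types, metadata values, and string escaping.

```elixir
defmodule JsonSketch do
  # Hypothetical renderer, not SparkEx's implementation. Assumes atom types,
  # empty metadata, and field names that need no JSON escaping.
  def render({:struct, fields}) do
    ~s({"type":"struct","fields":[) <> Enum.map_join(fields, ",", &render_field/1) <> "]}"
  end

  defp render_field(%{name: name, type: type, nullable: nullable, metadata: metadata})
       when is_atom(type) and is_boolean(nullable) and map_size(metadata) == 0 do
    # Keys are emitted in the same order as the documented output.
    ~s({"name":"#{name}","type":"#{type}","nullable":#{nullable},"metadata":{}})
  end
end

schema =
  {:struct,
   [
     %{name: "id", type: :long, nullable: true, metadata: %{}},
     %{name: "name", type: :string, nullable: true, metadata: %{}}
   ]}

json = JsonSketch.render(schema)
IO.puts(json)
```

For this two-field schema the sketch yields the same string as the `to_json/1` example above.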

to_proto(schema)

@spec to_proto(struct_type()) :: data_type_proto()

Converts a struct type to a Spark Connect DataType protobuf struct.

Nested types keep the same field nullability and metadata that the JSON representation carries.