# SparkEx.Types

Elixir-friendly type construction for Spark schemas.

Provides helpers to build structured schema types that can be passed to
`SparkEx.Reader.schema/2` and `SparkEx.StreamReader.schema/2`.

## Examples

```elixir
import SparkEx.Types

schema =
  struct_type([
    struct_field("id", :long),
    struct_field("name", :string),
    struct_field("score", :double, nullable: false)
  ])

reader |> SparkEx.Reader.schema(schema) |> SparkEx.Reader.load("/data")
```
## Types

```elixir
@type data_type_proto() :: Spark.Connect.DataType.t()
```

A Spark Connect `DataType` protobuf struct.

```elixir
@type foreach_function() :: Spark.Connect.StreamingForeachFunction.t()
```

A Spark Connect `StreamingForeachFunction` protobuf struct.

```elixir
@type spark_type() ::
        :null
        | :boolean
        | :byte
        | :short
        | :integer
        | :long
        | :float
        | :double
        | :string
        | {:string, String.t()}
        | {:char, non_neg_integer()}
        | {:varchar, non_neg_integer()}
        | :binary
        | :date
        | :time
        | :timestamp
        | :timestamp_ntz
        | :day_time_interval
        | :year_month_interval
        | :calendar_interval
        | {:decimal, non_neg_integer(), non_neg_integer()}
        | {:array, spark_type()}
        | {:map, spark_type(), spark_type()}
        | {:struct, [field()]}
        | :variant
        | :geometry
        | :geography
```

```elixir
@type storage_level() :: Spark.Connect.StorageLevel.t()
```

A Spark Connect `StorageLevel` protobuf struct.

```elixir
@type struct_type() :: {:struct, [field()]}
```
## Functions
```elixir
@spec array_type(spark_type(), keyword()) ::
        {:array, spark_type()} | {:array, spark_type(), boolean()}
```

Creates an array type.

### Examples

```elixir
array_type(:string)
array_type({:struct, fields})
```
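A short sketch of how `array_type/1` composes with the other constructors in this module; the field names here are illustrative, and the `fields` placeholder above is assumed to be a list built with `struct_field`:

```elixir
import SparkEx.Types

# An array of strings, and an array of structs, used as schema fields.
schema =
  struct_type([
    struct_field("tags", array_type(:string)),
    struct_field("events", array_type({:struct, [
      struct_field("name", :string),
      struct_field("at", :timestamp)
    ]}))
  ])
```

Per the spec, a keyword option can also produce a three-element tuple carrying a boolean (presumably element nullability); the sketch sticks to the documented one-argument form.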
```elixir
@spec data_type_to_json(data_type_proto()) :: String.t()
```

Converts a Spark Connect `DataType` protobuf value to a Spark JSON schema string.

This mirrors PySpark's `DataType.json()` output.
```elixir
@spec map_type(spark_type(), spark_type(), keyword()) ::
        {:map, spark_type(), spark_type()}
        | {:map, spark_type(), spark_type(), boolean()}
```

Creates a map type.

### Examples

```elixir
map_type(:string, :long)
```
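A minimal sketch using the documented two-argument form of `map_type/2` inside a schema (field names are illustrative; the keyword option in the spec, which can yield a four-element tuple with a boolean, is left out since its name is not shown here):

```elixir
import SparkEx.Types

# A map from string keys to long values as a schema field,
# plus a map whose values are themselves arrays of strings.
schema =
  struct_type([
    struct_field("word_counts", map_type(:string, :long)),
    struct_field("tags_by_user", map_type(:string, array_type(:string)))
  ])
```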
Creates a struct field.

### Options

- `:nullable` - whether the field can be null (default: `true`)
- `:metadata` - metadata map (default: `%{}`)

### Examples

```elixir
struct_field("id", :long)
struct_field("name", :string, nullable: false)
struct_field("tags", :string, metadata: %{"comment" => "user tags"})
```
```elixir
@spec struct_type([field()]) :: struct_type()
```

Creates a struct type (schema) from a list of fields.

### Examples

```elixir
struct_type([
  struct_field("id", :long),
  struct_field("name", :string)
])
```
```elixir
@spec to_ddl(struct_type()) :: String.t()
```

Converts a struct type to a DDL schema string.

### Examples

```elixir
iex> schema = struct_type([struct_field("id", :long), struct_field("name", :string)])
iex> SparkEx.Types.to_ddl(schema)
"id LONG, name STRING"
```
```elixir
@spec to_json(struct_type()) :: String.t()
```

Converts a struct type to a JSON schema string (Spark JSON format).

This produces the same JSON that PySpark's `StructType.json()` generates.

### Examples

```elixir
iex> schema = struct_type([struct_field("id", :long), struct_field("name", :string)])
iex> SparkEx.Types.to_json(schema)
~s({"type":"struct","fields":[{"name":"id","type":"long","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}}]})
```
```elixir
@spec to_proto(struct_type()) :: data_type_proto()
```

Converts a struct type to a Spark Connect `DataType` protobuf.

Preserves JSON-level fidelity (field nullability and metadata) for nested types.
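A sketch of the round trip implied above, combining `to_proto/1` with `data_type_to_json/1`; the nested schema and its field names are illustrative:

```elixir
import SparkEx.Types

# Inner field nullability and metadata should survive the protobuf
# conversion, per the fidelity note above.
schema =
  struct_type([
    struct_field("user", {:struct, [
      struct_field("id", :long, nullable: false),
      struct_field("email", :string, metadata: %{"pii" => "true"})
    ]})
  ])

proto = SparkEx.Types.to_proto(schema)
json = SparkEx.Types.data_type_to_json(proto)
```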