Builder API for writing DataFrames to external storage.
Mirrors PySpark's DataFrameWriter with a builder pattern.
Examples
```elixir
import SparkEx.Writer

# Write to Parquet
df
|> SparkEx.DataFrame.write()
|> format("parquet")
|> mode(:overwrite)
|> option("compression", "snappy")
|> save("/data/output.parquet")

# Save as table
df
|> SparkEx.DataFrame.write()
|> format("parquet")
|> mode(:append)
|> save_as_table("my_database.my_table")

# Insert into existing table
df
|> SparkEx.DataFrame.write()
|> mode(:append)
|> insert_into("my_table")

# Shorthand: write Parquet
SparkEx.Writer.parquet(df, "/data/output.parquet", mode: :overwrite)
```
Summary
Functions
Writes the DataFrame as Avro.
Sets bucketing for the write.
Sets clustering columns for the write.
Writes the DataFrame as CSV.
Sets the output data source format (e.g. "parquet", "csv", "json", "orc").
Inserts the DataFrame into an existing table.
Writes the DataFrame via JDBC.
Writes the DataFrame as JSON.
Sets the save mode.
Sets a single writer option.
Merges a map of options into the writer.
Writes the DataFrame as ORC.
Writes the DataFrame as Parquet.
Sets the partitioning columns for the write.
Saves the DataFrame to the given path.
Saves the DataFrame as a named table.
Sets the sort columns for the write.
Writes the DataFrame as text (single column).
Writes the DataFrame as XML.
Types
```elixir
@type t() :: %SparkEx.Writer{
        bucket_by: {pos_integer(), [String.t()]} | nil,
        cluster_by: [String.t()],
        df: SparkEx.DataFrame.t(),
        mode: atom(),
        options: %{required(String.t()) => String.t()},
        partition_by: [String.t()],
        sort_by: [String.t()],
        source: String.t() | nil
      }
```
Functions
@spec avro(SparkEx.DataFrame.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame as Avro.
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:options` — map of Avro writer options
- `:partition_by` — partitioning columns
@spec bucket_by(t(), pos_integer(), [String.t()]) :: t()
Sets bucketing for the write.
Sets clustering columns for the write.
@spec csv(SparkEx.DataFrame.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame as CSV.
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:header` — whether to include a header row
- `:separator` — field separator
- `:options` — map of CSV writer options
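As a sketch of the `csv/3` shorthand under the options listed above (the path and the `quoteAll` writer option are illustrative, and `df` is assumed to be bound to an active Spark Connect session):

```elixir
# Write df as semicolon-separated CSV with a header row,
# replacing any existing output at the path.
:ok =
  SparkEx.Writer.csv(df, "/data/output.csv",
    mode: :overwrite,
    header: true,
    separator: ";",
    options: %{"quoteAll" => "true"}
  )
```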
Sets the output data source format (e.g. "parquet", "csv", "json", "orc").
Inserts the DataFrame into an existing table.
When `overwrite: true` is passed, the mode is set to `:overwrite`.
When `overwrite: false` is passed, the mode is set to `:append`.
When no `:overwrite` option is given, the writer's current mode is used unchanged
(matching PySpark's `insertInto` behavior, where the mode defaults to server-side handling).
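The behavior above can be sketched as follows (the table name is illustrative; the keyword-option form of `insert_into` is assumed from the `:overwrite` description above):

```elixir
# Writer's current mode applies when no :overwrite option is given
df
|> SparkEx.DataFrame.write()
|> mode(:append)
|> insert_into("sales.daily_totals")

# Roughly PySpark's insertInto(..., overwrite=True):
# forces mode :overwrite regardless of the writer's current mode
df
|> SparkEx.DataFrame.write()
|> insert_into("sales.daily_totals", overwrite: true)
```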
@spec jdbc(SparkEx.DataFrame.t(), keyword()) :: :ok | {:error, term()}
@spec jdbc(SparkEx.DataFrame.t(), String.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame via JDBC.
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:options` — map of JDBC writer options (e.g. `url`, `dbtable`)
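A minimal sketch of the keyword-only arity, assuming illustrative connection details (the matching JDBC driver must be available on the Spark server's classpath):

```elixir
# Append df's rows to a Postgres table over JDBC.
# url, dbtable, user, and password are standard Spark JDBC options.
:ok =
  SparkEx.Writer.jdbc(df,
    mode: :append,
    options: %{
      "url" => "jdbc:postgresql://db.example.com:5432/analytics",
      "dbtable" => "public.events",
      "user" => "writer",
      "password" => "secret"
    }
  )
```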
@spec json(SparkEx.DataFrame.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame as JSON.
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:options` — map of JSON writer options
Sets the save mode.
- `:append` — append to existing data
- `:overwrite` — overwrite existing data
- `:error_if_exists` — error if data already exists (default)
- `:ignore` — silently ignore if data already exists
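For example, `:ignore` makes a write a no-op when output already exists, rather than raising as the default `:error_if_exists` would (the path is illustrative):

```elixir
# If /data/snapshot already holds data, this write silently does nothing;
# otherwise it writes normally.
df
|> SparkEx.DataFrame.write()
|> format("parquet")
|> mode(:ignore)
|> save("/data/snapshot")
```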
Sets a single writer option.
Merges a map of options into the writer.
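Assuming merge semantics where later calls win (consistent with "merges a map of options" above; the option keys shown are standard Spark CSV writer options), `option/3` and `options/2` can be combined:

```elixir
# options/2 merges a whole map; a later option/3 call
# overrides the earlier value for the same key.
df
|> SparkEx.DataFrame.write()
|> format("csv")
|> options(%{"header" => "true", "sep" => ","})
|> option("sep", "|")
|> save("/data/pipe-delimited")
```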
@spec orc(SparkEx.DataFrame.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame as ORC.
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:options` — map of ORC writer options
@spec parquet(SparkEx.DataFrame.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame as Parquet.
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:options` — map of Parquet writer options
- `:partition_by` — partitioning columns
Sets the partitioning columns for the write.
Saves the DataFrame to the given path.
Executes the write operation on the Spark server.
Saves the DataFrame as a named table.
Sets the sort columns for the write.
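Bucketing and sorting typically go together; note that Spark only supports bucketed writes through `saveAsTable`, not plain path-based saves (the column and table names below are illustrative):

```elixir
# Write df into 8 buckets by user_id, sorted within each bucket
# by event_time. Bucketed output must go through save_as_table.
df
|> SparkEx.DataFrame.write()
|> format("parquet")
|> mode(:overwrite)
|> bucket_by(8, ["user_id"])
|> sort_by(["event_time"])
|> save_as_table("analytics.events_bucketed")
```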
@spec text(SparkEx.DataFrame.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame as text (single column).
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:options` — map of text writer options
@spec xml(SparkEx.DataFrame.t(), String.t(), keyword()) :: :ok | {:error, term()}
Writes the DataFrame as XML.
Options
- `:mode` — save mode (default: `:error_if_exists`)
- `:options` — map of XML writer options
- `:partition_by` — partitioning columns