ExArrow.Parquet.Writer (ex_arrow v0.4.0)

View Source

Parquet file writer: serialise Arrow record batches to a .parquet file or to an in-memory binary.

Accepts an ExArrow.Schema handle and a list of ExArrow.RecordBatch handles produced by any ExArrow source (IPC reader, ADBC execute, Flight do_get, or compute kernels).

Examples

# Write a query result to Parquet
{:ok, db}   = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite",
                uri: ":memory:")
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT 1 AS n, 'hello' AS s")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
{:ok, schema} = ExArrow.Stream.schema(stream)
batches       = ExArrow.Stream.to_list(stream)

:ok = ExArrow.Parquet.Writer.to_file("/out/result.parquet", schema, batches)

# Or serialise to an in-memory binary (e.g. to upload to S3)
{:ok, parquet_bytes} = ExArrow.Parquet.Writer.to_binary(schema, batches)

# Round-trip: write then read back
{:ok, rt_stream} = ExArrow.Parquet.Reader.from_binary(parquet_bytes)
rt_batch = ExArrow.Stream.next(rt_stream)

Summary

Functions

Serialise schema and batches to a Parquet binary in memory.

Write schema and batches to a Parquet file at path.

Functions

to_binary(schema, batches)

@spec to_binary(ExArrow.Schema.t(), [ExArrow.RecordBatch.t()]) ::
  {:ok, binary()} | {:error, String.t()}

Serialise schema and batches to a Parquet binary in memory.

Returns {:ok, binary} or {:error, message}. The binary can be uploaded to object storage, sent over HTTP, or passed to ExArrow.Parquet.Reader.from_binary/1 for a round-trip.

Example

{:ok, schema} = ExArrow.Stream.schema(stream)
batches       = ExArrow.Stream.to_list(stream)
{:ok, bytes}  = ExArrow.Parquet.Writer.to_binary(schema, batches)

# Upload to S3, write to a socket, etc.
byte_size(bytes)  #=> e.g. 2048

to_file(path, schema, batches)

@spec to_file(Path.t(), ExArrow.Schema.t(), [ExArrow.RecordBatch.t()]) ::
  :ok | {:error, String.t()}

Write schema and batches to a Parquet file at path.

Creates or overwrites the file. Returns :ok or {:error, message}.

Example

{:ok, schema}  = ExArrow.Stream.schema(stream)
batches        = ExArrow.Stream.to_list(stream)
:ok = ExArrow.Parquet.Writer.to_file("/data/output.parquet", schema, batches)