ExArrow.Explorer
(ex_arrow v0.4.0)
Bridge between ExArrow and Explorer DataFrames.
Converts between ExArrow.Stream / ExArrow.RecordBatch and
Explorer.DataFrame via an in-memory Arrow IPC round-trip. No CSV or
row-by-row conversion is performed; data always moves as columnar Arrow binary.
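To make the "in-memory Arrow IPC round-trip" concrete, here is a minimal sketch of that hop using Explorer alone. It assumes Explorer is installed; the module name `MyApp.IpcRoundTrip` is hypothetical and this is not ExArrow's internal code, only an illustration of the same columnar path built from Explorer's public `dump_ipc_stream!/1` and `load_ipc_stream!/1`.

```elixir
defmodule MyApp.IpcRoundTrip do
  # Hypothetical illustration, not part of ExArrow.
  # Encodes a dataframe as an Arrow IPC stream binary and decodes it again,
  # without ever producing CSV or per-row values.
  def round_trip(df) do
    # dump_ipc_stream!/1 returns the Arrow IPC stream as an Elixir binary
    ipc_binary = Explorer.DataFrame.dump_ipc_stream!(df)

    # load_ipc_stream!/1 rebuilds a dataframe from that columnar binary
    Explorer.DataFrame.load_ipc_stream!(ipc_binary)
  end
end
```

For example, `MyApp.IpcRoundTrip.round_trip(Explorer.DataFrame.new(x: [1, 2, 3]))` yields a dataframe with the same column and values.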
Requires {:explorer, "~> 0.11"} in your mix.exs dependencies. When
Explorer is absent, every function returns {:error, "Explorer is not available..."}.
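Because the missing-dependency case surfaces as an ordinary error tuple, callers can handle it like any other conversion failure. The sketch below is a hypothetical caller-side helper (the module `MyApp.ArrowExport` and the `:arrow_conversion_failed` tag are assumptions, not part of ExArrow); it uses the standard `Code.ensure_loaded?/1` to check for Explorer up front and wraps the documented `to_stream/1` result.

```elixir
defmodule MyApp.ArrowExport do
  # Hypothetical helper, not part of ExArrow.

  # True when the Explorer.DataFrame module can be loaded in this runtime.
  def explorer_available?, do: Code.ensure_loaded?(Explorer.DataFrame)

  # Passes a successful stream through and tags any error message,
  # including the "Explorer is not available..." case.
  def to_arrow(df) do
    case ExArrow.Explorer.to_stream(df) do
      {:ok, stream} -> {:ok, stream}
      {:error, message} -> {:error, {:arrow_conversion_failed, message}}
    end
  end
end
```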
Typical usage
ExArrow → Explorer (e.g. after a Flight or ADBC query):
# Explorer.DataFrame.filter/2 is a macro, so the module must be required
require Explorer.DataFrame
{:ok, stream} = ExArrow.Flight.Client.do_get(client, "sales_2024")
{:ok, df} = ExArrow.Explorer.from_stream(stream)
Explorer.DataFrame.filter(df, score > 0.9)
Explorer → ExArrow (e.g. to write to Parquet or send via Flight):
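df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, stream} = ExArrow.Explorer.to_stream(df)
# Schema comes from the stream; the batches come from to_record_batches/1
{:ok, schema} = ExArrow.Stream.schema(stream)
{:ok, batches} = ExArrow.Explorer.to_record_batches(df)
:ok = ExArrow.Flight.Client.do_put(client, schema, batches,
  descriptor: {:cmd, "enriched"})
C Data Interface (CDI): future zero-copy path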
The current implementation serialises through an in-memory IPC binary. ExArrow.CDI
provides CDI export/import, which bypasses serialisation entirely. Once
Explorer exposes a CDI import API, the bridge here will use it automatically,
making from_record_batch/1 and from_stream/1 truly zero-copy. See
ExArrow.CDI for the low-level interface.
Summary
Functions
from_record_batch/1: Convert a single ExArrow.RecordBatch to an Explorer.DataFrame.
from_stream/1: Convert an ExArrow.Stream to an Explorer.DataFrame.
to_record_batches/1: Convert an Explorer.DataFrame to a list of ExArrow.RecordBatch handles.
to_stream/1: Convert an Explorer.DataFrame to an ExArrow.Stream.
Functions
@spec from_record_batch(ExArrow.RecordBatch.t()) :: {:ok, Explorer.DataFrame.t()} | {:error, String.t()}
Convert a single ExArrow.RecordBatch to an Explorer.DataFrame.
Returns {:ok, dataframe} or {:error, message}.
Example
{:ok, stream} = ExArrow.IPC.Reader.from_file("/data/chunk.arrow")
batch = ExArrow.Stream.next(stream)
{:ok, df} = ExArrow.Explorer.from_record_batch(batch)
Explorer.DataFrame.names(df)
#=> ["id", "name", "score"]
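When you already hold several batches (for example, from to_record_batches/1), one option is to convert each with from_record_batch/1 and stack the results. This is a hedged sketch: the module `MyApp.BatchConcat` is hypothetical, and it relies on Explorer's `concat_rows/1`, which expects the dataframes to share compatible columns. It stops at the first `{:error, message}`.

```elixir
defmodule MyApp.BatchConcat do
  # Hypothetical helper, not part of ExArrow.
  # Converts each ExArrow.RecordBatch to a dataframe, then concatenates rows.
  def to_dataframe(batches) when is_list(batches) and batches != [] do
    result =
      Enum.reduce_while(batches, {:ok, []}, fn batch, {:ok, acc} ->
        case ExArrow.Explorer.from_record_batch(batch) do
          {:ok, df} -> {:cont, {:ok, [df | acc]}}
          # Halt on the first failed batch and return its error tuple
          {:error, _message} = error -> {:halt, error}
        end
      end)

    case result do
      # Dataframes were accumulated in reverse, so restore batch order first
      {:ok, dfs} -> {:ok, dfs |> Enum.reverse() |> Explorer.DataFrame.concat_rows()}
      {:error, _message} = error -> error
    end
  end
end
```

If the data is already an ExArrow.Stream, from_stream/1 is the simpler choice, since it handles every batch in one call.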
@spec from_stream(ExArrow.Stream.t()) :: {:ok, Explorer.DataFrame.t()} | {:error, String.t()}
Convert an ExArrow.Stream to an Explorer.DataFrame.
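Collects every batch from the stream, serialises them into a single Arrow IPC
binary, then loads that binary with Explorer.DataFrame.load_ipc_stream!/1.
Because all batches are collected first, the whole stream is held in memory
during the conversion.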
Returns {:ok, dataframe} or {:error, message}.
Example
{:ok, stream} = ExArrow.IPC.Reader.from_file("/data/events.arrow")
{:ok, df} = ExArrow.Explorer.from_stream(stream)
Explorer.DataFrame.n_rows(df)
#=> 1_000_000
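Since both the IPC reader and from_stream/1 return tagged tuples, they compose naturally with `with`. The following is a sketch under the assumption that `ExArrow.IPC.Reader.from_file/1` returns `{:ok, stream}` on success, as in the example above; the module `MyApp.ArrowFile` is hypothetical. Any non-matching result (such as an `{:error, message}` tuple) is returned unchanged.

```elixir
defmodule MyApp.ArrowFile do
  # Hypothetical helper, not part of ExArrow.
  # Reads an Arrow IPC file and converts it to an Explorer.DataFrame.
  def load(path) do
    with {:ok, stream} <- ExArrow.IPC.Reader.from_file(path),
         {:ok, df} <- ExArrow.Explorer.from_stream(stream) do
      {:ok, df}
    end
  end
end
```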
@spec to_record_batches(Explorer.DataFrame.t()) :: {:ok, [ExArrow.RecordBatch.t()]} | {:error, String.t()}
Convert an Explorer.DataFrame to a list of ExArrow.RecordBatch handles.
Returns {:ok, batches}, where batches is a list of ExArrow.RecordBatch handles, or {:error, message}.
Example
df = Explorer.DataFrame.new(a: [10, 20], b: [1.0, 2.0])
{:ok, batches} = ExArrow.Explorer.to_record_batches(df)
total_rows = Enum.sum(Enum.map(batches, &ExArrow.RecordBatch.num_rows/1))
#=> 2
@spec to_stream(Explorer.DataFrame.t()) :: {:ok, ExArrow.Stream.t()} | {:error, String.t()}
Convert an Explorer.DataFrame to an ExArrow.Stream.
Serialises the dataframe to Arrow IPC via Explorer.DataFrame.dump_ipc_stream!/1,
then opens an ExArrow.Stream from the resulting binary.
Returns {:ok, stream} or {:error, message}.
Example
df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
{:ok, stream} = ExArrow.Explorer.to_stream(df)
{:ok, schema} = ExArrow.Stream.schema(stream)
ExArrow.Schema.field_names(schema)
#=> ["x", "y"]
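to_stream/1 and from_stream/1 are inverses over the same IPC path, so a round trip is a quick way to check that a dataframe survives conversion. This sketch uses only the two documented functions; the module `MyApp.RoundTrip` is hypothetical, and comparing column names afterwards (for example with `Explorer.DataFrame.names/1`) is left to the caller.

```elixir
defmodule MyApp.RoundTrip do
  # Hypothetical helper, not part of ExArrow.
  # Explorer.DataFrame -> ExArrow.Stream -> Explorer.DataFrame.
  def round_trip(df) do
    with {:ok, stream} <- ExArrow.Explorer.to_stream(df) do
      ExArrow.Explorer.from_stream(stream)
    end
  end
end
```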