ExArrow.Compute
(ex_arrow v0.4.0)
View Source
Arrow compute kernels: filter rows, project (select) columns, and sort — all entirely in native memory.
Column buffers are never copied into BEAM terms. Every function takes one or
more ExArrow.RecordBatch handles and returns a new handle. The result can
be passed directly to ExArrow.IPC.Writer, ExArrow.Flight.Client.do_put/4,
or further compute operations without any intermediate serialisation.
Quick example
Given a batch from an ADBC query or IPC file:
# Select only the columns you need
{:ok, slim} = ExArrow.Compute.project(batch, ["user_id", "score"])
# Sort by score descending
{:ok, sorted} = ExArrow.Compute.sort(slim, "score", ascending: false)Building a boolean predicate for filter/2
filter/2 expects the first column of a second record batch to be a
boolean Arrow array. The most common source is a query result that already
contains a boolean column:
# e.g. "SELECT id, score, is_active FROM users"
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
batch = ExArrow.Stream.next(stream)
# Project the boolean column into its own batch
{:ok, mask} = ExArrow.Compute.project(batch, ["is_active"])
{:ok, filtered} = ExArrow.Compute.filter(batch, mask)You can also write a Parquet/IPC file that contains a pre-computed boolean column and read it back as the predicate.
Summary
Functions
Filter rows from batch using the first (boolean) column of predicate_batch.
Project (select) a subset of columns from batch by name.
Sort batch by column_name.
Functions
@spec filter(ExArrow.RecordBatch.t(), ExArrow.RecordBatch.t()) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Filter rows from batch using the first (boolean) column of predicate_batch.
predicate_batch must have at least one column and its first column must be
a boolean Arrow array with the same row count as batch. Rows where the
predicate is true are kept; rows where it is false or null are dropped.
Returns {:ok, filtered_batch} or {:error, message}.
Example
# Keep only rows where "is_active" is true.
# batch has columns [id, score, is_active]; extract the bool column first.
{:ok, mask} = ExArrow.Compute.project(batch, ["is_active"])
{:ok, filtered} = ExArrow.Compute.filter(batch, mask)
# filtered has the same columns as batch but only the rows where is_active = true
@spec project(ExArrow.RecordBatch.t(), [String.t()]) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Project (select) a subset of columns from batch by name.
Columns appear in the result in the order given by column_names. Requesting
a name that does not exist returns {:error, "column 'x' not found"}.
Returns {:ok, projected_batch} or {:error, message}.
Examples
# Select two columns; result schema has only [user_id, score]
{:ok, slim} = ExArrow.Compute.project(batch, ["user_id", "score"])
# Reorder: result schema is [score, user_id]
{:ok, reordered} = ExArrow.Compute.project(batch, ["score", "user_id"])
# Unknown column
{:error, "column 'missing' not found"} =
ExArrow.Compute.project(batch, ["missing"])
@spec sort(ExArrow.RecordBatch.t(), String.t(), keyword()) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Sort batch by column_name.
All columns in the batch are reordered together — the sort is applied to the full batch, not just the key column. Nulls are always placed first regardless of sort direction.
Options
:ascending—true(default) for A→Z / small→large order;falsefor descending.
Returns {:ok, sorted_batch} or {:error, message}.
Examples
# Sort by score, lowest first (default)
{:ok, sorted} = ExArrow.Compute.sort(batch, "score")
# Sort by score, highest first
{:ok, sorted} = ExArrow.Compute.sort(batch, "score", ascending: false)
# Sort by a string column alphabetically
{:ok, sorted} = ExArrow.Compute.sort(batch, "name")
# Unknown column
{:error, msg} = ExArrow.Compute.sort(batch, "nonexistent")