ExArrow.Nx
(ex_arrow v0.4.0)
Bridge between ExArrow and Nx tensors.
Converts numeric Arrow columns to Nx.Tensor values (and back) by copying
the raw byte buffer once from native Arrow memory into an Elixir binary, then
handing it directly to Nx.from_binary/2. No intermediate list
materialisation occurs.
Requires {:nx, "~> 0.9"} in your mix.exs dependencies. When Nx is
absent, every function returns {:error, "Nx is not available..."}.
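The fallback described above resembles a common guard around an optional dependency. The following is a minimal sketch using only `Code.ensure_loaded?/1` from the standard library. The `OptionalDep` module and its error text are hypothetical, and the check ExArrow actually performs may differ.

```elixir
defmodule OptionalDep do
  # Hypothetical helper, not part of ExArrow: runs `fun` only when `mod`
  # can be loaded, otherwise returns an error tuple instead of raising.
  def with_module(mod, fun) when is_atom(mod) and is_function(fun, 0) do
    if Code.ensure_loaded?(mod) do
      {:ok, fun.()}
    else
      {:error, "#{inspect(mod)} is not available"}
    end
  end
end

# Enum ships with Elixir, so the function runs.
IO.inspect(OptionalDep.with_module(Enum, fn -> Enum.sum([1, 2, 3]) end))

# A module that cannot be loaded yields the error tuple.
IO.inspect(OptionalDep.with_module(NoSuchModule.Anywhere, fn -> :unreachable end))
```

Callers can then pattern match on `{:error, message}` once, rather than rescuing an `UndefinedFunctionError` at every call site.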
Supported column types
| Arrow type | Nx dtype |
|---|---|
| Int8 | {:s, 8} |
| Int16 | {:s, 16} |
| Int32 | {:s, 32} |
| Int64 | {:s, 64} |
| UInt8 | {:u, 8} |
| UInt16 | {:u, 16} |
| UInt32 | {:u, 32} |
| UInt64 | {:u, 64} |
| Float32 | {:f, 32} |
| Float64 | {:f, 64} |
Columns of other types (Utf8, Boolean, Timestamp, …) are not supported for
direct buffer extraction and return {:error, "unsupported column type…"}.
to_tensors/1 silently skips non-numeric columns.
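The table and the two behaviours for unsupported types (an error for a single column, silent skipping for a whole batch) can be read as a lookup plus a filter. The sketch below is illustrative only: the `DtypeTable` module and the atom names for Arrow types are assumptions, not ExArrow's internal representation.

```elixir
defmodule DtypeTable do
  # Hypothetical lookup mirroring the table above.
  @mapping %{
    int8: {:s, 8}, int16: {:s, 16}, int32: {:s, 32}, int64: {:s, 64},
    uint8: {:u, 8}, uint16: {:u, 16}, uint32: {:u, 32}, uint64: {:u, 64},
    float32: {:f, 32}, float64: {:f, 64}
  }

  # Single-column style: an unsupported type is reported as an error.
  def nx_dtype(arrow_type) do
    case Map.fetch(@mapping, arrow_type) do
      {:ok, dtype} -> {:ok, dtype}
      :error -> {:error, "unsupported column type for Nx: #{inspect(arrow_type)}"}
    end
  end

  # Whole-batch style: unsupported columns are dropped without an error.
  def numeric_only(schema) do
    for {name, type} <- schema, {:ok, dtype} <- [nx_dtype(type)], into: %{} do
      {name, dtype}
    end
  end
end

schema = [{"price", :float64}, {"name", :utf8}, {"qty", :int32}]
IO.inspect(DtypeTable.nx_dtype(:utf8))
IO.inspect(DtypeTable.numeric_only(schema))
```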
Null handling
Arrow null positions are treated as zero bytes in the extracted buffer, so a null is indistinguishable from a genuine zero in the resulting tensor. If a column contains nulls and you need to tell them apart, inspect the original batch. Null support may be added in a future release.
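To see why a zero-filled null cannot be told apart from a real zero, consider the following sketch, which uses plain binaries and no ExArrow calls. The buffer contents, the little-endian layout, and the hand-written validity list are assumptions for illustration. How to obtain validity information from a real batch is outside this sketch.

```elixir
# Hypothetical Int32 column [7, nil, 0]. With nulls written as zero bytes,
# the extracted buffer holds the values 7, 0, 0.
buffer =
  <<7::little-signed-integer-size(32), 0::little-signed-integer-size(32),
    0::little-signed-integer-size(32)>>

# Decode the buffer the way a tensor would see it: the null slot and the
# genuine zero both come out as 0.
values = for <<v::little-signed-integer-size(32) <- buffer>>, do: v
IO.inspect(values)

# A validity list tracked separately restores the distinction.
validity = [true, false, true]

with_nulls =
  Enum.zip_with(values, validity, fn v, valid? -> if valid?, do: v, else: nil end)

IO.inspect(with_nulls)
```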
Public API
| Function | Direction | Description |
|---|---|---|
| column_to_tensor/2 | Arrow → Nx | Extract one named numeric column as an Nx.Tensor |
| to_tensors/1 | Arrow → Nx | Extract all numeric columns as %{name => Nx.Tensor} |
| from_tensor/2 | Nx → Arrow | Single tensor → single-column RecordBatch |
| from_tensors/1 | Nx → Arrow | Map of tensors → multi-column RecordBatch (single NIF call) |
Quick example
# Read a batch, extract one column as a tensor
{:ok, stream} = ExArrow.Parquet.Reader.from_file("/data/trades.parquet")
batch = ExArrow.Stream.next(stream)
{:ok, tensor} = ExArrow.Nx.column_to_tensor(batch, "price")
mean_price = tensor |> Nx.mean() |> Nx.to_number()
# Build a multi-column batch from tensors (v0.4+)
tensors = %{
"price" => Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64}),
"volume" => Nx.tensor([10, 20, 30], type: {:s, 64})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)
Summary
Functions
column_to_tensor/2
Convert a named numeric column from batch to an Nx.Tensor.
from_tensor/2
Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.
from_tensors/1
Convert a map of {column_name => Nx.Tensor} to a multi-column
ExArrow.RecordBatch in a single call.
to_tensors/1
Convert all numeric columns from batch to a map of Nx.Tensor values.
Functions
@spec column_to_tensor(ExArrow.RecordBatch.t(), String.t()) :: {:ok, Nx.Tensor.t()} | {:error, String.t()}
Convert a named numeric column from batch to an Nx.Tensor.
The column's raw byte buffer is copied once from native Arrow memory into an
Elixir binary, then passed to Nx.from_binary/2. No list materialisation
occurs.
Returns {:ok, tensor} or {:error, message}.
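The binary hand-off can be tried in isolation. The sketch below assumes Nx is installable via `Mix.install/1` and builds the byte buffer by hand as a stand-in for the bytes copied out of Arrow memory. It shows only the Nx side of the conversion, not ExArrow's native code.

```elixir
Mix.install([{:nx, "~> 0.9"}])

# Stand-in for the copied column buffer: three float64 values in native
# byte order (built by hand, no ExArrow involved).
bytes =
  <<1.5::float-native-size(64), 2.5::float-native-size(64),
    3.5::float-native-size(64)>>

# Nx wraps the binary directly; no Elixir list of floats is built first.
tensor = Nx.from_binary(bytes, {:f, 64})

IO.inspect(Nx.shape(tensor))   # {3}
IO.inspect(Nx.to_list(tensor)) # [1.5, 2.5, 3.5]

# The reverse direction returns the same raw bytes.
IO.inspect(Nx.to_binary(tensor) == bytes) # true
```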
Examples
# Extract an int64 column
{:ok, ids} = ExArrow.Nx.column_to_tensor(batch, "id")
Nx.type(ids) #=> {:s, 64}
Nx.shape(ids) #=> {1000}
# Extract a float64 column and compute the mean
{:ok, prices} = ExArrow.Nx.column_to_tensor(batch, "price")
Nx.mean(prices) |> Nx.to_number()
# Non-numeric column returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "name")
msg #=> "unsupported column type for Nx: Utf8"
# Unknown column returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "no_such_col")
@spec from_tensor(Nx.Tensor.t(), String.t()) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.
The tensor's raw bytes are extracted via Nx.to_binary/1 and written into
a native Arrow array. For rank-2 or higher-rank tensors, all elements are
flattened into a single 1-D column (Nx.size(tensor) elements).
Supported Nx dtypes: {:s, 8|16|32|64}, {:u, 8|16|32|64},
{:f, 32|64}. Other dtypes (e.g. {:bf, 16}, {:c, 64}) return
{:error, "unsupported Nx dtype…"}.
Returns {:ok, batch} or {:error, message}.
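The flattening of higher-rank tensors follows from how `Nx.to_binary/1` lays out elements. The sketch below assumes Nx is installable via `Mix.install/1` and uses only Nx calls, so it shows what a single column would hold without going through ExArrow.

```elixir
Mix.install([{:nx, "~> 0.9"}])

matrix = Nx.tensor([[1, 2, 3], [4, 5, 6]], type: {:s, 64})

# Nx.to_binary/1 emits the elements in row-major order; the shape is not
# part of the bytes.
bytes = Nx.to_binary(matrix)

# Reading the bytes back as a flat vector mirrors a 1-D column.
column = Nx.from_binary(bytes, {:s, 64})

IO.inspect(Nx.size(matrix))    # 6
IO.inspect(Nx.shape(column))   # {6}
IO.inspect(Nx.to_list(column)) # [1, 2, 3, 4, 5, 6]

# Restoring the original shape is up to the caller.
restored = Nx.reshape(column, {2, 3})
IO.inspect(Nx.to_list(restored) == Nx.to_list(matrix)) # true
```

Because the shape is lost in the flat column, callers who need it after a round-trip would have to store it separately and reshape.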
Examples
# Float64 tensor → RecordBatch
tensor = Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64})
{:ok, batch} = ExArrow.Nx.from_tensor(tensor, "weights")
ExArrow.RecordBatch.num_rows(batch) #=> 3
# Round-trip: tensor → batch → tensor
original = Nx.tensor([10, 20, 30], type: {:s, 64})
{:ok, batch} = ExArrow.Nx.from_tensor(original, "vals")
{:ok, recovered} = ExArrow.Nx.column_to_tensor(batch, "vals")
Nx.to_list(recovered) #=> [10, 20, 30]
# Unsupported dtype
{:error, msg} = ExArrow.Nx.from_tensor(Nx.tensor([1, 2], type: {:bf, 16}), "x")
@spec from_tensors(%{required(String.t()) => Nx.Tensor.t()}) :: {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}
Convert a map of {column_name => Nx.Tensor} to a multi-column
ExArrow.RecordBatch in a single call.
All tensors must have the same number of elements (Nx.size/1). For
rank-2 or higher-rank tensors the elements are flattened into a 1-D column.
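Column order in the resulting batch follows Map.to_list/1 ordering, which is
sorted by key for small maps. Elixir does not guarantee map ordering in
general, so look columns up by name rather than by position. Supported
dtypes are the same as from_tensor/2.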
Returns {:ok, batch} or {:error, message}.
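Both constraints can be checked before calling the function. The sketch below uses plain lists as stand-ins for tensors and a hypothetical `SizeCheck` module, neither of which is part of ExArrow. With real tensors, `Nx.size/1` would replace `length/1`.

```elixir
# Stand-in for a tensor map: column names mapped to plain lists. The keys
# are deliberately written out of alphabetical order.
columns = %{"volume" => [10, 20, 30], "price" => [1.0, 2.0, 3.0], "id" => [1, 2, 3]}

# For a small map like this one, Map.to_list/1 yields the keys in sorted order.
names = columns |> Map.to_list() |> Enum.map(fn {name, _values} -> name end)
IO.inspect(names)

defmodule SizeCheck do
  # Hypothetical pre-flight check: true when every column has the same
  # number of elements (an empty map counts as consistent).
  def same_length?(cols) do
    distinct_lengths = cols |> Map.values() |> Enum.map(&length/1) |> Enum.uniq()
    length(distinct_lengths) <= 1
  end
end

IO.inspect(SizeCheck.same_length?(columns))
IO.inspect(SizeCheck.same_length?(%{"a" => [1, 2], "b" => [1, 2, 3]}))
```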
Examples
tensors = %{
"price" => Nx.tensor([1.5, 2.5, 3.5], type: {:f, 64}),
"qty" => Nx.tensor([10, 20, 30], type: {:s, 32})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)
ExArrow.RecordBatch.num_rows(batch) #=> 3
# Round-trip: all columns
{:ok, recovered} = ExArrow.Nx.to_tensors(batch)
Nx.to_list(recovered["price"]) #=> [1.5, 2.5, 3.5]
# Mismatched sizes return an error
bad = %{"a" => Nx.tensor([1, 2]), "b" => Nx.tensor([1, 2, 3])}
{:error, _} = ExArrow.Nx.from_tensors(bad)
@spec to_tensors(ExArrow.RecordBatch.t()) :: {:ok, %{required(String.t()) => Nx.Tensor.t()}} | {:error, String.t()}
Convert all numeric columns from batch to a map of Nx.Tensor values.
Non-numeric columns (Utf8, Boolean, Timestamp, etc.) are silently skipped.
Returns {:ok, %{column_name => tensor}} or {:error, message}.
Example
{:ok, tensors} = ExArrow.Nx.to_tensors(batch)
# tensors is a map: %{"price" => #Nx.Tensor<...>, "qty" => #Nx.Tensor<...>}
tensors["price"] |> Nx.sort()
Map.keys(tensors) # only numeric columns are present