ExArrow.Nx (ex_arrow v0.4.0)

Bridge between ExArrow and Nx tensors.

Converts numeric Arrow columns to Nx.Tensor values (and back) by copying the raw byte buffer once from native Arrow memory into an Elixir binary, then handing it directly to Nx.from_binary/2. No intermediate list materialisation occurs.
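To make the "copy once, then reinterpret" idea concrete, the sketch below uses only the Elixir standard library. It builds a binary laid out like a Float64 column buffer (eight bytes per value, native byte order, nothing in between) and decodes it with a binary comprehension. It illustrates the data layout only and is not ExArrow's implementation. With Nx installed, the equivalent reinterpretation into a tensor is a single Nx.from_binary(buffer, {:f, 64}) call.

```elixir
defmodule BufferSketch do
  # Encode floats the way a Float64 column buffer stores them:
  # 8 bytes per element, native byte order, no separators.
  def encode_f64(values) do
    for v <- values, into: <<>>, do: <<v::float-64-native>>
  end

  # Reinterpret the raw bytes as a list of floats. Nx.from_binary/2
  # performs the same reinterpretation, but produces a tensor.
  def decode_f64(buffer) do
    for <<v::float-64-native <- buffer>>, do: v
  end
end

buffer = BufferSketch.encode_f64([1.5, 2.5, 3.5])
byte_size(buffer)                #=> 24 (3 elements x 8 bytes)
BufferSketch.decode_f64(buffer)  #=> [1.5, 2.5, 3.5]
```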

Requires {:nx, "~> 0.9"} in your mix.exs dependencies. When Nx is absent, every function returns {:error, "Nx is not available..."}.
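A minimal mix.exs deps fragment, assuming the package is published as :ex_arrow (the name shown in this page's title). The "~> 0.4" requirement for ex_arrow is an assumption matched to the version documented here; the Nx requirement is the one stated above.

```elixir
# Fragment of mix.exs: list Nx alongside ex_arrow.
defp deps do
  [
    {:ex_arrow, "~> 0.4"},
    {:nx, "~> 0.9"}
  ]
end
```

To check at runtime whether Nx is available before calling into ExArrow.Nx, Code.ensure_loaded?(Nx) returns true or false.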

Supported column types

Arrow type    Nx dtype
Int8          {:s, 8}
Int16         {:s, 16}
Int32         {:s, 32}
Int64         {:s, 64}
UInt8         {:u, 8}
UInt16        {:u, 16}
UInt32        {:u, 32}
UInt64        {:u, 64}
Float32       {:f, 32}
Float64       {:f, 64}

Columns of other types (Utf8, Boolean, Timestamp, …) are not supported for direct buffer extraction: column_to_tensor/2 returns {:error, "unsupported column type…"} for them, while to_tensors/1 silently skips them.
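The type table can be read as a lookup with an error fallback. The module below is a hypothetical sketch of that lookup, not ExArrow's internal code: Arrow type names are plain strings here, and the error text copies the format shown in the column_to_tensor/2 examples.

```elixir
defmodule ArrowNxTypes do
  # Hypothetical mapping mirroring the supported-types table.
  @mapping %{
    "Int8" => {:s, 8},
    "Int16" => {:s, 16},
    "Int32" => {:s, 32},
    "Int64" => {:s, 64},
    "UInt8" => {:u, 8},
    "UInt16" => {:u, 16},
    "UInt32" => {:u, 32},
    "UInt64" => {:u, 64},
    "Float32" => {:f, 32},
    "Float64" => {:f, 64}
  }

  # {:ok, nx_type} for a numeric Arrow type, an error tuple otherwise.
  def nx_type(arrow_type) do
    case Map.fetch(@mapping, arrow_type) do
      {:ok, type} -> {:ok, type}
      :error -> {:error, "unsupported column type for Nx: " <> arrow_type}
    end
  end
end

ArrowNxTypes.nx_type("UInt16")  #=> {:ok, {:u, 16}}
ArrowNxTypes.nx_type("Utf8")    #=> {:error, "unsupported column type for Nx: Utf8"}
```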

Null handling

Arrow null slots are read as zero bytes in the extracted buffer, so nulls show up as 0 (or 0.0) in the tensor and cannot be told apart from genuine zeros. If your column contains nulls and you need to distinguish them, inspect the original batch (null support may be added in a future release).
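Because nulls arrive as zeros, aggregates such as a mean are skewed. The plain-Elixir sketch below shows the correction, assuming you have obtained a per-row validity list (true where the value is present) from the original batch by some other means; ExArrow.Nx itself does not return one. The values list stands in for Nx.to_list(tensor), and MaskedStats is a hypothetical name.

```elixir
defmodule MaskedStats do
  # values:   the column as extracted (null rows already read as 0)
  # validity: one boolean per row, true where the original value was not null
  def masked_mean(values, validity) when length(values) == length(validity) do
    kept =
      values
      |> Enum.zip(validity)
      |> Enum.filter(fn {_v, present} -> present end)
      |> Enum.map(fn {v, _present} -> v end)

    case kept do
      [] -> nil
      _ -> Enum.sum(kept) / length(kept)
    end
  end
end

values = [10.0, 0.0, 30.0]      # middle row was null, read back as 0.0
validity = [true, false, true]

Enum.sum(values) / length(values)          #=> 13.333333333333334 (skewed by the null)
MaskedStats.masked_mean(values, validity)  #=> 20.0
```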

Public API

Function              Direction    Description
column_to_tensor/2    Arrow → Nx   Extract one named numeric column as an Nx.Tensor
to_tensors/1          Arrow → Nx   Extract all numeric columns as %{name => Nx.Tensor}
from_tensor/2         Nx → Arrow   Single tensor → single-column RecordBatch
from_tensors/1        Nx → Arrow   Map of tensors → multi-column RecordBatch (single NIF call)

Quick example

# Read a batch, extract one column as a tensor
{:ok, stream}  = ExArrow.Parquet.Reader.from_file("/data/trades.parquet")
batch          = ExArrow.Stream.next(stream)
{:ok, tensor}  = ExArrow.Nx.column_to_tensor(batch, "price")
mean_price     = tensor |> Nx.mean() |> Nx.to_number()

# Build a multi-column batch from tensors (v0.4+)
tensors = %{
  "price"  => Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64}),
  "volume" => Nx.tensor([10, 20, 30],     type: {:s, 64})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)

Summary

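Functions

column_to_tensor(batch, col_name)
Convert a named numeric column from batch to an Nx.Tensor.

from_tensor(tensor, col_name)
Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.

from_tensors(tensors)
Convert a map of {column_name => Nx.Tensor} to a multi-column ExArrow.RecordBatch in a single call.

to_tensors(batch)
Convert all numeric columns from batch to a map of Nx.Tensor values.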

Functions

column_to_tensor(batch, col_name)

@spec column_to_tensor(ExArrow.RecordBatch.t(), String.t()) ::
  {:ok, Nx.Tensor.t()} | {:error, String.t()}

Convert a named numeric column from batch to an Nx.Tensor.

The column's raw byte buffer is copied once from native Arrow memory into an Elixir binary, then passed to Nx.from_binary/2. No list materialisation occurs.

Returns {:ok, tensor} or {:error, message}.

Examples

# Extract an int64 column
{:ok, ids} = ExArrow.Nx.column_to_tensor(batch, "id")
Nx.type(ids)   #=> {:s, 64}
Nx.shape(ids)  #=> {1000}

# Extract a float64 column and compute the mean
{:ok, prices} = ExArrow.Nx.column_to_tensor(batch, "price")
Nx.mean(prices) |> Nx.to_number()

# Non-numeric column returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "name")
msg #=> "unsupported column type for Nx: Utf8"

# Unknown column returns an error
{:error, msg} = ExArrow.Nx.column_to_tensor(batch, "no_such_col")

from_tensor(tensor, col_name)

@spec from_tensor(Nx.Tensor.t(), String.t()) ::
  {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}

Convert an Nx.Tensor to a single-column ExArrow.RecordBatch.

The tensor's raw bytes are extracted via Nx.to_binary/1 and written into a native Arrow array. A tensor of rank 2 or higher is flattened into a single 1-D column of Nx.size(tensor) elements, so its original shape is not kept in the batch.
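Since the batch keeps only the flat column, you have to restore the shape yourself after a round trip. The sketch below uses nested lists in place of a tensor and assumes the elements are laid out row by row (row-major), the order in which Nx.to_binary/1 emits a dense tensor. With Nx, the regrouping step is Nx.reshape(tensor, {n_rows, n_cols}); FlattenSketch is a hypothetical name.

```elixir
defmodule FlattenSketch do
  # Flatten a nested-list "matrix" row by row, mirroring how a
  # {2, 3} tensor becomes a 6-element column.
  def flatten(rows), do: List.flatten(rows)

  # Regroup a flat column into rows of n_cols elements; with Nx the
  # equivalent is Nx.reshape(tensor, {n_rows, n_cols}).
  def restore(column, n_cols), do: Enum.chunk_every(column, n_cols)
end

column = FlattenSketch.flatten([[1, 2, 3], [4, 5, 6]])  #=> [1, 2, 3, 4, 5, 6]
FlattenSketch.restore(column, 3)                        #=> [[1, 2, 3], [4, 5, 6]]
```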

Supported Nx dtypes: {:s, 8|16|32|64}, {:u, 8|16|32|64}, {:f, 32|64}. Tensors of any other dtype (e.g. {:bf, 16}, {:c, 64}) return {:error, "unsupported Nx dtype…"}.
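If you would rather check a dtype before calling from_tensor/2, the supported set fits in a few guard clauses. NxDtypeCheck is a hypothetical helper, not part of ExArrow; it takes the tuple returned by Nx.type/1. A tensor that fails the check could be converted first with Nx.as_type(tensor, {:f, 32}).

```elixir
defmodule NxDtypeCheck do
  @int_sizes [8, 16, 32, 64]

  # Accepts exactly the dtypes listed for from_tensor/2.
  def supported?({kind, size}) when kind in [:s, :u] and size in @int_sizes, do: true
  def supported?({:f, size}) when size in [32, 64], do: true
  def supported?(_other), do: false
end

NxDtypeCheck.supported?({:u, 16})   #=> true
NxDtypeCheck.supported?({:bf, 16})  #=> false
NxDtypeCheck.supported?({:f, 16})   #=> false
```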

Returns {:ok, batch} or {:error, message}.

Examples

# Float64 tensor → RecordBatch
tensor = Nx.tensor([1.0, 2.0, 3.0], type: {:f, 64})
{:ok, batch} = ExArrow.Nx.from_tensor(tensor, "weights")
ExArrow.RecordBatch.num_rows(batch)  #=> 3

# Round-trip: tensor → batch → tensor
original = Nx.tensor([10, 20, 30], type: {:s, 64})
{:ok, batch}     = ExArrow.Nx.from_tensor(original, "vals")
{:ok, recovered} = ExArrow.Nx.column_to_tensor(batch, "vals")
Nx.to_list(recovered)  #=> [10, 20, 30]

# Unsupported dtype
{:error, msg} = ExArrow.Nx.from_tensor(Nx.tensor([1, 2], type: {:bf, 16}), "x")

from_tensors(tensors)

@spec from_tensors(%{required(String.t()) => Nx.Tensor.t()}) ::
  {:ok, ExArrow.RecordBatch.t()} | {:error, String.t()}

Convert a map of {column_name => Nx.Tensor} to a multi-column ExArrow.RecordBatch in a single call.

All tensors must have the same number of elements (Nx.size/1). Tensors of rank 2 or higher are flattened into a 1-D column.

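Column order in the resulting batch follows Map.to_list/1 ordering. For small maps (up to 32 keys) that is sorted by key, but Elixir does not guarantee map ordering in general, so larger maps may produce columns in any order. Supported dtypes are the same as for from_tensor/2.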
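Both constraints can be checked before making the call. ColumnPlan is a hypothetical helper, not part of ExArrow: plain lists stand in for tensors and length/1 for Nx.size/1. It also returns the column names sorted explicitly with Enum.sort/1, which you can use when reading columns back in a fixed order instead of relying on map iteration order.

```elixir
defmodule ColumnPlan do
  # columns: %{name => list}; each list stands in for a tensor.
  def check(columns) when map_size(columns) == 0, do: {:error, "no columns"}

  def check(columns) do
    sizes =
      columns
      |> Map.values()
      |> Enum.map(&length/1)
      |> Enum.uniq()
      |> Enum.sort()

    case sizes do
      # Every column has the same element count: return the names in an
      # explicit, sorted order.
      [_one_size] -> {:ok, columns |> Map.keys() |> Enum.sort()}
      _ -> {:error, "column sizes differ: #{inspect(sizes)}"}
    end
  end
end

ColumnPlan.check(%{"qty" => [10, 20, 30], "price" => [1.5, 2.5, 3.5]})
#=> {:ok, ["price", "qty"]}
ColumnPlan.check(%{"a" => [1, 2], "b" => [1, 2, 3]})
#=> {:error, "column sizes differ: [2, 3]"}
```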

Returns {:ok, batch} or {:error, message}.

Examples

tensors = %{
  "price" => Nx.tensor([1.5, 2.5, 3.5], type: {:f, 64}),
  "qty"   => Nx.tensor([10, 20, 30],     type: {:s, 32})
}
{:ok, batch} = ExArrow.Nx.from_tensors(tensors)
ExArrow.RecordBatch.num_rows(batch)  #=> 3

# Round-trip: all columns
{:ok, recovered} = ExArrow.Nx.to_tensors(batch)
Nx.to_list(recovered["price"])  #=> [1.5, 2.5, 3.5]

# Mismatched sizes return an error
bad = %{"a" => Nx.tensor([1, 2]), "b" => Nx.tensor([1, 2, 3])}
{:error, _} = ExArrow.Nx.from_tensors(bad)

to_tensors(batch)

@spec to_tensors(ExArrow.RecordBatch.t()) ::
  {:ok, %{required(String.t()) => Nx.Tensor.t()}} | {:error, String.t()}

Convert all numeric columns from batch to a map of Nx.Tensor values.

Non-numeric columns (Utf8, Boolean, Timestamp, etc.) are silently skipped.
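Because skipped columns produce no error, it can be worth checking which names actually came back. The sketch below is hypothetical: a list of {column_name, arrow_type_name} pairs stands in for a batch schema, and select/1 keeps the names whose type appears in the supported-types table. missing/2 compares the names you expected against the extracted ones, for example Map.keys(tensors).

```elixir
defmodule NumericColumns do
  @numeric ~w(Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64 Float32 Float64)

  # schema: [{column_name, arrow_type_name}], a stand-in for a batch schema.
  # Keeps only the columns whose type is in the supported-types table.
  def select(schema) do
    for {name, type} <- schema, type in @numeric, do: name
  end

  # Names you expected but that were skipped or absent.
  def missing(expected, extracted), do: expected -- extracted
end

NumericColumns.select([{"id", "Int64"}, {"name", "Utf8"}, {"price", "Float64"}, {"ok", "Boolean"}])
#=> ["id", "price"]
NumericColumns.missing(["id", "name"], ["id", "price"])
#=> ["name"]
```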

Returns {:ok, %{column_name => tensor}} or {:error, message}.

Example

{:ok, tensors} = ExArrow.Nx.to_tensors(batch)
# tensors is a map: %{"price" => #Nx.Tensor<...>, "qty" => #Nx.Tensor<...>}
tensors["price"] |> Nx.sort()
Map.keys(tensors)  # only numeric columns are present