C Data Interface Guide

The Arrow C Data Interface (CDI) is a standardised C ABI that lets Arrow implementations transfer data without serialisation — no IPC bytes, no copies, just raw C struct pointers shared between runtimes that both live in the same OS process.

ExArrow v0.4 introduces ExArrow.CDI, which exposes the full CDI export/import cycle from Elixir.

How it works

ExArrow RecordBatch
        │
        ▼  cdi_export  (arrow-rs to_ffi)
FFI_ArrowSchema + FFI_ArrowArray  (heap-allocated C structs)
        │
        ▼  schema_ptr / array_ptr
   integer addresses (uintptr_t cast to u64)
        │
        ▼  external CDI consumer (future Explorer, Polars, DuckDB, …)
   zero-copy import into the consumer's Arrow runtime

When both ExArrow and the consuming library are loaded into the same OS process (the same BEAM VM instance), the C structs live in memory that both can address directly: no network, no file, and no binary copy is needed.
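The address hand-off in the diagram can be illustrated with a minimal Rust sketch. This is an illustration of the idea only, not ExArrow's actual code: `DemoStruct`, `export_addr`, and `import_addr` are hypothetical names, and `DemoStruct` stands in for the real FFI_ArrowSchema / FFI_ArrowArray structs. It shows a struct being moved to the heap, its address travelling as a plain `u64`, and the owner being rebuilt from that integer inside the same process.

```rust
/// Hypothetical stand-in for a heap-allocated CDI struct.
#[repr(C)]
pub struct DemoStruct {
    pub length: i64,
}

/// Move the struct to the heap and return its address as a plain integer.
/// Ownership is leaked on purpose: whoever receives the address must
/// eventually pass it to `import_addr`, or the allocation is never freed.
pub fn export_addr(value: DemoStruct) -> u64 {
    Box::into_raw(Box::new(value)) as usize as u64
}

/// Rebuild the owning Box from the integer address.
///
/// # Safety
/// `addr` must come from `export_addr`, must not have been imported before,
/// and must be used in the same OS process that produced it.
pub unsafe fn import_addr(addr: u64) -> Box<DemoStruct> {
    unsafe { Box::from_raw(addr as usize as *mut DemoStruct) }
}
```

The integer is only meaningful inside the process that produced it, which is why the consumer has to be loaded alongside ExArrow rather than reached over a socket or a port.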

Within ExArrow (round-trip)

The simplest use is a within-ExArrow round-trip, which is also a useful correctness test:

{:ok, batch}  = ExArrow.IPC.Reader.from_file("trades.arrow") |> then(&ExArrow.Stream.next/1)

{:ok, handle} = ExArrow.CDI.export(batch)
{:ok, batch2} = ExArrow.CDI.import(handle)

ExArrow.RecordBatch.num_rows(batch2)  #=> same as batch

export/1 allocates FFI_ArrowArray and FFI_ArrowSchema on the heap and wraps them in a BEAM-managed resource handle. import/1 consumes the handle, rebuilds the RecordBatch, and safely releases all native memory.

With an external CDI consumer

Any CDI-compatible library loaded in the same OS process as the BEAM VM can import the raw C struct pointers:

{:ok, handle}           = ExArrow.CDI.export(batch)
{schema_ptr, array_ptr} = ExArrow.CDI.pointers(handle)

# Hand the integer addresses to the external consumer.
# Keep `handle` alive (in scope) until the consumer has finished importing!
SomeLib.import_arrow_cdi(schema_ptr, array_ptr)

# Tell ExArrow the consumer has taken ownership (the consumer calls release itself).
:ok = ExArrow.CDI.mark_consumed(handle)

After mark_consumed/1 the BEAM GC will drop the handle without calling the Arrow release callback a second time, preventing a double-free.
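For context, the Arrow C Data Interface specification lets a consumer take ownership by moving the struct: it copies the struct bitwise and then marks the source as released by setting its release callback to null, without calling it. The Rust sketch below illustrates that rule under stated assumptions: `CdiStruct`, `move_out`, and `noop_release` are hypothetical names, the struct models only two fields, and whether a given consumer imports this way depends on that consumer.

```rust
use std::ptr;

/// Hypothetical stand-in for ArrowArray / ArrowSchema; only the fields
/// relevant to ownership transfer are modelled.
#[repr(C)]
pub struct CdiStruct {
    pub length: i64,
    pub release: Option<unsafe extern "C" fn(*mut CdiStruct)>,
}

/// Consumer-side move: bitwise-copy the struct, then mark the source as
/// released by nulling its callback *without* calling it.
///
/// # Safety
/// `src` must point to a valid, not-yet-released `CdiStruct`.
pub unsafe fn move_out(src: *mut CdiStruct) -> CdiStruct {
    unsafe {
        let moved = ptr::read(src);
        (*src).release = None;
        moved
    }
}

/// Example release callback: a real producer would free its buffers here;
/// this one only marks the struct as released.
pub unsafe extern "C" fn noop_release(s: *mut CdiStruct) {
    unsafe { (*s).release = None };
}
```

After such a move the source struct carries a null release callback, so the exporting side has nothing left to release, and the consumer's copy is the only one that will ever invoke the callback.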

Memory safety guarantees

| Scenario | What happens |
| --- | --- |
| import/1 called (ExArrow consumes the handle) | Pointers atomically swapped to null; Drop is a no-op |
| mark_consumed/1 called (external consumer took the data) | Same as above |
| Handle GC'd without import or mark_consumed | Drop calls Arrow release callbacks; underlying data freed |
| External consumer already called release (null'd the callback) | Drop sees null release; no double-free |
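The scenarios above can be modelled with a small Rust sketch. It is a simplified illustration and not ExArrow's implementation: `DemoArray`, `Handle`, `take`, and the `RELEASES` counter are hypothetical names, a single struct stands in for the schema/array pair, and `take` plays the role of both import/1 and mark_consumed/1.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

/// Hypothetical stand-in for a CDI struct: only the release callback matters here.
#[repr(C)]
pub struct DemoArray {
    pub release: Option<unsafe extern "C" fn(*mut DemoArray)>,
}

/// Counts release-callback invocations so each scenario can be checked.
pub static RELEASES: AtomicUsize = AtomicUsize::new(0);

/// Release callback: a real one frees the producer's buffers; this one
/// counts the call and marks the struct released by nulling the callback.
unsafe extern "C" fn demo_release(array: *mut DemoArray) {
    RELEASES.fetch_add(1, Ordering::SeqCst);
    unsafe { (*array).release = None };
}

/// Resource handle that owns the heap-allocated struct until it is consumed.
pub struct Handle {
    array: AtomicPtr<DemoArray>,
}

impl Handle {
    pub fn export() -> Handle {
        let boxed = Box::new(DemoArray { release: Some(demo_release) });
        Handle { array: AtomicPtr::new(Box::into_raw(boxed)) }
    }

    /// Address handed to an external consumer; the handle still owns the struct.
    pub fn pointer(&self) -> u64 {
        self.array.load(Ordering::SeqCst) as usize as u64
    }

    /// Atomically swap the pointer to null and return the struct, if any.
    /// Only the first caller gets `Some`; later calls and Drop see null.
    pub fn take(&self) -> Option<Box<DemoArray>> {
        let p = self.array.swap(ptr::null_mut(), Ordering::SeqCst);
        if p.is_null() {
            None
        } else {
            // SAFETY: `p` came from Box::into_raw and the swap hands it out once.
            Some(unsafe { Box::from_raw(p) })
        }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Already consumed: the pointer is null and there is nothing to do.
        if let Some(mut array) = self.take() {
            // A null callback means someone else already released the data.
            if let Some(release) = array.release {
                unsafe { release(&mut *array) };
            }
        }
    }
}
```

The atomic swap is what makes the first two rows equivalent: whichever path consumes the handle first leaves a null pointer behind, so a later Drop has nothing to release.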

Explorer CDI path (roadmap)

ExArrow.Explorer currently uses an IPC binary round-trip. The CDI module lays the groundwork for a zero-copy path that will activate automatically once Explorer exposes a CDI import API. No code changes in user applications will be required — the bridge will detect CDI availability at compile time and choose the fastest available path.

See also