DuckDB Elixir Port - Technical Design Document

Overview

This document provides the comprehensive technical design for porting the DuckDB Python client to Elixir. This is a 100% exact port of the Python client functionality to Elixir, maintaining API parity while following Elixir conventions and idioms.

Source Reference

Primary Reference: duckdb-python/ directory in this repository

All implementation decisions should reference the Python implementation for:

API surface area and function signatures
Behavior and semantics
Error handling patterns
Type conversions
Edge cases and special handling

Architecture Overview

Current Implementation (CLI)

DuckdbEx currently uses the DuckDB CLI in JSON mode managed via erlexec. The DuckdbEx.Port GenServer owns the CLI process and parses ordered JSON results into tuple rows. Connections are PIDs; cursors are lightweight wrappers around the connection PID to avoid extra file locks in the CLI backend.

Environment configuration:

DUCKDB_PATH to locate the CLI binary.
DUCKDB_EX_EXEC_AS_ROOT to force root execution when required in containers.
mix duckdb_ex.install installs the CLI into the project priv/duckdb/duckdb and is picked up automatically.

Core Components

The DuckDB Elixir client mirrors the Python architecture with the following main components:

Connection (DuckdbEx.Connection)
- Maps to DuckDBPyConnection (pyconnection.hpp)
- Handles database connection lifecycle
- Manages transactions
- Executes queries and statements
Relation (DuckdbEx.Relation)
- Maps to DuckDBPyRelation (pyrelation.hpp)
- Lazy query builder pattern
- Chainable query operations
- Result materialization
Result (DuckdbEx.Result)
- Maps to DuckDBPyResult (pyresult.hpp)
- Query result container
- Multiple fetch modes
- Type conversions
Type System (DuckdbEx.Type)
- Maps to DuckDBPyType and related type classes
- Type creation and introspection
- Custom type support (enum, decimal, struct, etc.)
Exceptions (DuckdbEx.Exceptions)
- All DuckDB exception types
- Maps Python exception hierarchy to Elixir

Future NIF Layer Design (Not Implemented)

The sections below describe a potential Rustler-based NIF architecture. The current implementation does not use NIFs; it uses the CLI-based port described above.

Technology Stack

Rustler: Elixir NIF framework using Rust
DuckDB Rust bindings: Interface to DuckDB C++ library
Architecture: Rust NIF layer between Elixir and DuckDB

Why Rustler?

Safety: Memory-safe bindings to C++
Performance: Near-native performance
Ergonomics: Better developer experience than C NIFs
Compatibility: Excellent DuckDB Rust bindings exist

NIF Module Structure

native/duckdb_nif/
├── src/
│   ├── lib.rs              # NIF entry point
│   ├── connection.rs       # Connection resource and methods
│   ├── relation.rs         # Relation resource and methods
│   ├── result.rs           # Result handling
│   ├── types.rs            # Type system
│   ├── exceptions.rs       # Exception mapping
│   ├── conversions.rs      # Elixir ↔ DuckDB type conversions
│   ├── arrow.rs            # Arrow integration
│   └── filesystem.rs       # Filesystem integration
└── Cargo.toml

Module Structure

Elixir Module Hierarchy

lib/duckdb_ex/
├── connection.ex           # Main connection module
├── relation.ex             # Relation/query builder
├── result.ex               # Result handling
├── exceptions.ex           # Exception definitions
├── port.ex                 # DuckDB CLI process management
├── default_connection.ex   # Default connection holder
├── cursor.ex               # Cursor wrapper
├── parameters.ex           # SQL parameter interpolation
├── statement.ex            # Statement metadata
├── statement_type.ex       # Statement type enum
└── expected_result_type.ex # Expected result enum

# Planned modules (not yet implemented)
# - type.ex, expression/*, value/*, arrow/*, etc.

API Surface

Connection API

The Connection module provides the primary interface to DuckDB. Reference: duckdb-python/src/duckdb_py/include/duckdb_python/pyconnection/pyconnection.hpp

defmodule DuckdbEx.Connection do
  # Connection Management
  @spec connect(String.t() | :memory, keyword()) :: {:ok, t()} | {:error, term()}
  @spec close(t()) :: :ok
  @spec interrupt(t()) :: :ok

  # Transactions
  @spec begin(t()) :: {:ok, t()} | {:error, term()}
  @spec commit(t()) :: {:ok, t()} | {:error, term()}
  @spec rollback(t()) :: {:ok, t()} | {:error, term()}
  @spec checkpoint(t()) :: {:ok, t()} | {:error, term()}

  # Query Execution
  @spec execute(t(), String.t(), list()) :: {:ok, t()} | {:error, term()}
  @spec execute_many(t(), String.t(), list()) :: {:ok, t()} | {:error, term()}
  @spec query(t(), String.t(), String.t(), list()) :: {:ok, Relation.t()} | {:error, term()}
  @spec sql(t(), String.t(), list()) :: {:ok, Relation.t()} | {:error, term()}

  # Data Sources
  @spec read_csv(t(), String.t() | list(String.t()), keyword()) :: {:ok, Relation.t()}
  @spec read_json(t(), String.t() | list(String.t()), keyword()) :: {:ok, Relation.t()}
  @spec read_parquet(t(), String.t() | list(String.t()), keyword()) :: {:ok, Relation.t()}
  @spec from_arrow(t(), term()) :: {:ok, Relation.t()}
  @spec from_df(t(), term()) :: {:ok, Relation.t()}
  @spec from_query(t(), String.t()) :: {:ok, Relation.t()}

  # Table/View Access
  @spec table(t(), String.t()) :: {:ok, Relation.t()}
  @spec view(t(), String.t()) :: {:ok, Relation.t()}
  @spec values(t(), list()) :: {:ok, Relation.t()}
  @spec table_function(t(), String.t(), list()) :: {:ok, Relation.t()}

  # Schema Operations
  @spec get_table_names(t(), String.t(), boolean()) :: {:ok, list(String.t())}

  # UDF Registration
  @spec create_function(t(), String.t(), (... -> term()), keyword()) :: {:ok, t()}
  @spec remove_function(t(), String.t()) :: {:ok, t()}

  # Type Creation
  @spec map_type(t(), Type.t(), Type.t()) :: Type.t()
  @spec struct_type(t(), keyword()) :: Type.t()
  @spec list_type(t(), Type.t()) :: Type.t()
  @spec array_type(t(), Type.t(), non_neg_integer()) :: Type.t()
  @spec union_type(t(), keyword()) :: Type.t()
  @spec enum_type(t(), String.t(), Type.t(), list()) :: Type.t()
  @spec decimal_type(t(), integer(), integer()) :: Type.t()
  @spec string_type(t(), String.t()) :: Type.t()
  @spec type(t(), String.t()) :: Type.t()

  # Extensions
  @spec install_extension(t(), String.t(), keyword()) :: :ok
  @spec load_extension(t(), String.t()) :: :ok

  # Object Registration
  @spec register(t(), String.t(), term()) :: {:ok, t()}
  @spec unregister(t(), String.t()) :: {:ok, t()}
  @spec append(t(), String.t(), term(), keyword()) :: {:ok, t()}

  # Filesystem
  @spec register_filesystem(t(), term()) :: :ok
  @spec unregister_filesystem(t(), String.t()) :: :ok
  @spec list_filesystems(t()) :: list(String.t())
  @spec filesystem_is_registered(t(), String.t()) :: boolean()

  # Result Fetching (for DBAPI compatibility)
  @spec fetch_one(t()) :: {:ok, tuple() | nil}
  @spec fetch_many(t(), non_neg_integer()) :: {:ok, list(tuple())}
  @spec fetch_all(t()) :: {:ok, list(tuple())}
  @spec fetch_df(t(), keyword()) :: {:ok, term()}
  @spec fetch_arrow(t(), keyword()) :: {:ok, term()}

  # Progress Tracking
  @spec query_progress(t()) :: float()

  # Statement Extraction
  @spec extract_statements(t(), String.t()) :: list(term())
end

Relation API

The Relation module provides the lazy query builder interface. Reference: duckdb-python/src/duckdb_py/include/duckdb_python/pyrelation.hpp

defmodule DuckdbEx.Relation do
  # Basic Operations
  @spec project(t(), list(String.t())) :: t()
  @spec filter(t(), String.t() | Expression.t()) :: t()
  @spec limit(t(), integer(), integer()) :: t()
  @spec order(t(), String.t()) :: t()
  @spec sort(t(), list(String.t())) :: t()
  @spec distinct(t()) :: t()
  @spec unique(t(), String.t()) :: t()

  # Aggregations
  @spec aggregate(t(), String.t() | list(String.t()), String.t()) :: t()
  @spec count(t(), String.t(), keyword()) :: t()
  @spec sum(t(), String.t(), keyword()) :: t()
  @spec avg(t(), String.t(), keyword()) :: t()
  @spec min(t(), String.t(), keyword()) :: t()
  @spec max(t(), String.t(), keyword()) :: t()
  @spec median(t(), String.t(), keyword()) :: t()
  @spec mode(t(), String.t(), keyword()) :: t()
  @spec stddev(t(), String.t(), keyword()) :: t()
  @spec variance(t(), String.t(), keyword()) :: t()
  @spec first(t(), String.t(), keyword()) :: t()
  @spec last(t(), String.t(), keyword()) :: t()
  @spec list(t(), String.t(), keyword()) :: t()
  @spec string_agg(t(), String.t(), String.t(), keyword()) :: t()

  # Window Functions
  @spec row_number(t(), String.t(), String.t()) :: t()
  @spec rank(t(), String.t(), String.t()) :: t()
  @spec dense_rank(t(), String.t(), String.t()) :: t()
  @spec percent_rank(t(), String.t(), String.t()) :: t()
  @spec cume_dist(t(), String.t(), String.t()) :: t()
  @spec ntile(t(), String.t(), integer(), String.t()) :: t()
  @spec lag(t(), String.t(), String.t(), keyword()) :: t()
  @spec lead(t(), String.t(), String.t(), keyword()) :: t()
  @spec first_value(t(), String.t(), keyword()) :: t()
  @spec last_value(t(), String.t(), keyword()) :: t()
  @spec nth_value(t(), String.t(), String.t(), integer(), keyword()) :: t()

  # Set Operations
  @spec union(t(), t()) :: t()
  @spec except(t(), t()) :: t()
  @spec intersect(t(), t()) :: t()

  # Joins
  @spec join(t(), t(), String.t() | Expression.t(), String.t()) :: t()
  @spec cross(t(), t()) :: t()

  # Transformations
  @spec map(t(), (term() -> term()), keyword()) :: t()

  # Execution & Fetching
  @spec execute(t()) :: t()
  @spec fetch_one(t()) :: {:ok, tuple() | nil}
  @spec fetch_many(t(), non_neg_integer()) :: {:ok, list(tuple())}
  @spec fetch_all(t()) :: {:ok, list(tuple())}
  @spec fetch_df(t(), keyword()) :: {:ok, term()}
  @spec fetch_arrow(t(), keyword()) :: {:ok, term()}
  @spec fetch_record_batch(t(), keyword()) :: {:ok, term()}

  # Export
  @spec to_csv(t(), String.t(), keyword()) :: :ok
  @spec to_parquet(t(), String.t(), keyword()) :: :ok
  @spec to_arrow_table(t(), keyword()) :: {:ok, term()}
  @spec to_arrow_capsule(t(), keyword()) :: {:ok, term()}

  # Table/View Creation
  @spec create_view(t(), String.t(), keyword()) :: t()
  @spec insert_into(t(), String.t()) :: :ok
  @spec insert(t(), list()) :: :ok
  @spec update(t(), keyword(), String.t()) :: :ok
  @spec create(t(), String.t()) :: :ok

  # Introspection
  @spec describe(t()) :: t()
  @spec length(t()) :: non_neg_integer()
  @spec shape(t()) :: {non_neg_integer(), non_neg_integer()}
  @spec columns(t()) :: list(String.t())
  @spec column_types(t()) :: list(String.t())
  @spec description(t()) :: list(tuple())
  @spec type(t()) :: String.t()
  @spec alias(t()) :: String.t()
  @spec set_alias(t(), String.t()) :: t()

  # SQL Generation
  @spec to_sql(t()) :: String.t()
  @spec explain(t(), atom()) :: String.t()

  # Display
  @spec to_string(t()) :: String.t()
  @spec print(t(), keyword()) :: :ok
end

Type System

Reference: duckdb-python/duckdb/typing/__init__.py and type-related C++ code

defmodule DuckdbEx.Type do
  # Type struct
  defstruct [:id, :internal_type, :metadata]

  # Standard Types (constants)
  @type standard_type ::
    :boolean | :tinyint | :smallint | :integer | :bigint |
    :hugeint | :utinyint | :usmallint | :uinteger | :ubigint |
    :float | :double | :decimal | :varchar | :blob |
    :timestamp | :timestamp_s | :timestamp_ms | :timestamp_ns |
    :date | :time | :interval | :uuid | :json

  # Composite type constructors
  @spec list(t()) :: t()
  @spec array(t(), non_neg_integer()) :: t()
  @spec map(t(), t()) :: t()
  @spec struct(keyword()) :: t()
  @spec union(keyword()) :: t()
  @spec enum(String.t(), t(), list()) :: t()
  @spec decimal(integer(), integer()) :: t()
end

Exception Hierarchy

Reference: duckdb-python/duckdb/__init__.py exception imports

defmodule DuckdbEx.Exceptions do
  # Base exceptions
  defmodule Error, do: defexception [:message]
  defmodule Warning, do: defexception [:message]

  # Specific exception types
  defmodule DatabaseError, do: defexception [:message]
  defmodule DataError, do: defexception [:message]
  defmodule OperationalError, do: defexception [:message]
  defmodule IntegrityError, do: defexception [:message]
  defmodule InternalError, do: defexception [:message]
  defmodule ProgrammingError, do: defexception [:message]
  defmodule NotSupportedError, do: defexception [:message]

  # DuckDB-specific exceptions
  defmodule BinderException, do: defexception [:message]
  defmodule CatalogException, do: defexception [:message]
  defmodule ConnectionException, do: defexception [:message]
  defmodule ConstraintException, do: defexception [:message]
  defmodule ConversionException, do: defexception [:message]
  defmodule DependencyException, do: defexception [:message]
  defmodule FatalException, do: defexception [:message]
  defmodule HTTPException, do: defexception [:message]
  defmodule InternalException, do: defexception [:message]
  defmodule InterruptException, do: defexception [:message]
  defmodule InvalidInputException, do: defexception [:message]
  defmodule InvalidTypeException, do: defexception [:message]
  defmodule IOException, do: defexception [:message]
  defmodule NotImplementedException, do: defexception [:message]
  defmodule OutOfMemoryException, do: defexception [:message]
  defmodule OutOfRangeException, do: defexception [:message]
  defmodule ParserException, do: defexception [:message]
  defmodule PermissionException, do: defexception [:message]
  defmodule SequenceException, do: defexception [:message]
  defmodule SerializationException, do: defexception [:message]
  defmodule SyntaxException, do: defexception [:message]
  defmodule TransactionException, do: defexception [:message]
  defmodule TypeMismatchException, do: defexception [:message]
end

Data Type Mapping

DuckDB → Elixir Type Conversions

DuckDB Type	Elixir Type	Notes
BOOLEAN	boolean()	Direct mapping
TINYINT/SMALLINT/INTEGER	integer()	Arbitrary precision
BIGINT	integer()	Arbitrary precision
HUGEINT	integer()	Arbitrary precision
FLOAT/DOUBLE	float()	IEEE 754
DECIMAL	Decimal.t()	Use Decimal library
VARCHAR/TEXT	String.t()	UTF-8 strings
BLOB	binary()	Raw bytes
DATE	Date.t()	Elixir Date
TIME	Time.t()	Elixir Time
TIMESTAMP	DateTime.t() / NaiveDateTime.t()	With/without timezone
INTERVAL	DuckdbEx.Interval.t()	Custom struct
UUID	String.t()	String representation
JSON	term()	Decoded JSON via Jason
LIST	list()	Elixir lists
STRUCT	map()	Elixir maps
MAP	%{}	Elixir maps
UNION	tagged tuple	{tag, value}
ENUM	atom() or String.t()	Configurable

Parameter Binding

Support both positional and named parameters:

# Positional
DuckdbEx.Connection.execute(conn, "SELECT * FROM users WHERE id = ?", [42])

# Named (similar to Python)
DuckdbEx.Connection.execute(conn, "SELECT * FROM users WHERE id = :id", [id: 42])

Module-Level API (Default Connection)

Reference: duckdb-python/duckdb/__init__.py for module-level functions

defmodule DuckdbEx do
  # Default connection management
  @spec default_connection() :: Connection.t()
  @spec set_default_connection(Connection.t()) :: :ok

  # Convenience functions using default connection
  @spec connect(String.t() | :memory, keyword()) :: {:ok, Connection.t()}
  @spec close() :: :ok
  @spec execute(String.t(), list()) :: {:ok, Connection.t()}
  @spec query(String.t(), list()) :: {:ok, Relation.t()}
  @spec sql(String.t(), list()) :: {:ok, Relation.t()}
  @spec read_csv(String.t(), keyword()) :: {:ok, Relation.t()}
  @spec read_json(String.t(), keyword()) :: {:ok, Relation.t()}
  @spec read_parquet(String.t(), keyword()) :: {:ok, Relation.t()}
  @spec table(String.t()) :: {:ok, Relation.t()}
  @spec values(list()) :: {:ok, Relation.t()}
end

Resource Management

NIF Resources

Use Rustler resources for:

Connection Resource: Wraps DuckDB connection handle
Relation Resource: Wraps DuckDB relation handle
Result Resource: Wraps query results

// Example resource definition
#[derive(NifStruct)]
#[module = "DuckdbEx.Native.Connection"]
pub struct ConnectionResource {
    // Internal DuckDB connection
    inner: Arc<Mutex<duckdb::Connection>>,
}

#[derive(NifStruct)]
#[module = "DuckdbEx.Native.Relation"]
pub struct RelationResource {
    inner: Arc<Mutex<duckdb::Relation>>,
    connection: ConnectionResource,
}

Resource Lifecycle

Resources cleaned up by BEAM GC
Explicit close() methods for deterministic cleanup
Connection pool support for multiple connections

Error Handling Strategy

NIF Error Propagation

// In Rust NIF
fn execute_query(conn: ResourceArc<ConnectionResource>, query: String)
    -> Result<RelationResource, Error> {
    conn.inner.lock()
        .execute(&query)
        .map(|rel| RelationResource::new(rel, conn.clone()))
        .map_err(|e| Error::DatabaseError(e.to_string()))
}

Elixir Error Handling

# Pattern 1: {:ok, result} | {:error, exception}
case DuckdbEx.Connection.execute(conn, "SELECT * FROM invalid") do
  {:ok, result} -> process(result)
  {:error, %DuckdbEx.Exceptions.CatalogException{} = e} ->
    Logger.error("Table not found: #{e.message}")
end

# Pattern 2: Raise on error (bang methods)
result = DuckdbEx.Connection.execute!(conn, "SELECT * FROM users")

Testing Strategy

Test Structure

Reference: duckdb-python/tests/ for comprehensive test coverage

test/
├── duckdb_ex_test.exs              # Module-level API tests
├── connection_test.exs             # Connection tests
├── relation_test.exs               # Relation/query builder tests
├── result_test.exs                 # Result handling tests
├── type_test.exs                   # Type system tests
├── exception_test.exs              # Exception handling tests
├── integration/
│   ├── csv_test.exs               # CSV reading/writing
│   ├── parquet_test.exs           # Parquet integration
│   ├── arrow_test.exs             # Arrow integration
│   ├── transaction_test.exs       # Transaction tests
│   ├── udf_test.exs               # User-defined functions
│   └── filesystem_test.exs        # Filesystem integration
└── support/
    ├── test_helper.exs
    └── fixtures/
        ├── test.csv
        ├── test.parquet
        └── test.json

Test Patterns

Property-based testing (StreamData)
Comparison testing against Python client
Integration tests with real DuckDB operations
Concurrent access tests
Memory leak detection

Integration Points

Arrow Integration

Support Apache Arrow for zero-copy data exchange:

defmodule DuckdbEx.Arrow do
  @spec to_arrow_table(Relation.t(), keyword()) :: {:ok, term()}
  @spec from_arrow(Connection.t(), term()) :: {:ok, Relation.t()}
  @spec to_arrow_capsule(Relation.t()) :: {:ok, term()}
end

Explorer Integration

Provide first-class integration with Elixir's Explorer library:

defmodule DuckdbEx.Explorer do
  @spec to_dataframe(Relation.t()) :: Explorer.DataFrame.t()
  @spec from_dataframe(Connection.t(), Explorer.DataFrame.t()) :: Relation.t()
end

Nx Integration

Support Nx tensors for numerical computing:

defmodule DuckdbEx.Nx do
  @spec to_tensor(Relation.t()) :: Nx.Tensor.t()
  @spec from_tensor(Connection.t(), Nx.Tensor.t(), keyword()) :: Relation.t()
end

Performance Considerations

Optimization Strategies

Lazy Evaluation: Relations build query plans without execution
Streaming Results: Support for chunked result fetching
Zero-Copy: Use Arrow for data transfer where possible
Connection Pooling: DBConnection-compatible pool
Prepared Statements: Cache compiled queries
Batch Operations: Efficient bulk inserts

Benchmarking

Compare performance against:

Python duckdb client
PostgreSQL (Postgrex)
SQLite (Exqlite)

Security Considerations

SQL Injection: Use parameterized queries exclusively
Resource Limits: Configurable memory limits
Path Traversal: Validate file paths for read_csv, etc.
Extension Loading: Optional restrictions on loading extensions

Configuration

# config/config.exs
config :duckdb_ex,
  default_connection: [
    database: ":memory:",
    config: [
      threads: 4,
      max_memory: "1GB",
      temp_directory: "/tmp/duckdb"
    ]
  ],
  pool: [
    size: 10,
    max_overflow: 5
  ]

Migration from Python

API Compatibility Table

Python API	Elixir API	Notes
`duckdb.connect()`	`DuckdbEx.connect()`	Returns {:ok, conn} tuple
`con.execute()`	`Connection.execute()`	Same parameters
`con.sql()`	`Connection.sql()`	Returns Relation
`rel.filter()`	`Relation.filter()`	Chainable
`rel.fetchall()`	`Relation.fetch_all()`	Returns {:ok, list}
`duckdb.read_csv()`	`DuckdbEx.read_csv()`	Same options

Migration Guide

Document to be created showing side-by-side examples of Python vs Elixir code.

Documentation Requirements

ExDoc: Complete module and function documentation
Guides: Getting started, cookbook, migration guide
Examples: Real-world usage examples
Changelog: Track changes and version compatibility
Type Specs: Complete @spec for all public functions

Future Enhancements

Ecto Adapter: Full Ecto integration
Phoenix Integration: LiveView components
Telemetry: Instrumentation for observability
Distributed Queries: Multi-node query execution
Custom Extensions: Elixir-based DuckDB extensions

References

DuckDB Python source: duckdb-python/ directory
DuckDB Documentation: https://duckdb.org/docs
DuckDB Rust: https://github.com/duckdb/duckdb-rs
Rustler: https://github.com/rusterlium/rustler
DB Connection: https://github.com/elixir-ecto/db_connection

← Previous Page DuckDB Elixir Port - Project Summary

Next Page → DuckDB Elixir Implementation Roadmap