DuckDB Elixir Port - Technical Design Document
View SourceOverview
This document provides the comprehensive technical design for porting the DuckDB Python client to Elixir. This is a 100% exact port of the Python client functionality to Elixir, maintaining API parity while following Elixir conventions and idioms.
Source Reference
Primary Reference: duckdb-python/ directory in this repository
All implementation decisions should reference the Python implementation for:
- API surface area and function signatures
- Behavior and semantics
- Error handling patterns
- Type conversions
- Edge cases and special handling
Architecture Overview
Current Implementation (CLI)
DuckdbEx currently uses the DuckDB CLI in JSON mode managed via erlexec. The
DuckdbEx.Port GenServer owns the CLI process and parses ordered JSON results
into tuple rows. Connections are PIDs; cursors are lightweight wrappers around
the connection PID to avoid extra file locks in the CLI backend.
Environment configuration:
DUCKDB_PATHto locate the CLI binary.DUCKDB_EX_EXEC_AS_ROOTto force root execution when required in containers.mix duckdb_ex.installinstalls the CLI into the projectpriv/duckdb/duckdband is picked up automatically.
Core Components
The DuckDB Elixir client mirrors the Python architecture with the following main components:
Connection (
DuckdbEx.Connection)- Maps to
DuckDBPyConnection(pyconnection.hpp) - Handles database connection lifecycle
- Manages transactions
- Executes queries and statements
- Maps to
Relation (
DuckdbEx.Relation)- Maps to
DuckDBPyRelation(pyrelation.hpp) - Lazy query builder pattern
- Chainable query operations
- Result materialization
- Maps to
Result (
DuckdbEx.Result)- Maps to
DuckDBPyResult(pyresult.hpp) - Query result container
- Multiple fetch modes
- Type conversions
- Maps to
Type System (
DuckdbEx.Type)- Maps to
DuckDBPyTypeand related type classes - Type creation and introspection
- Custom type support (enum, decimal, struct, etc.)
- Maps to
Exceptions (
DuckdbEx.Exceptions)- All DuckDB exception types
- Maps Python exception hierarchy to Elixir
Future NIF Layer Design (Not Implemented)
The sections below describe a potential Rustler-based NIF architecture. The current implementation does not use NIFs; it uses the CLI-based port described above.
Technology Stack
- Rustler: Elixir NIF framework using Rust
- DuckDB Rust bindings: Interface to DuckDB C++ library
- Architecture: Rust NIF layer between Elixir and DuckDB
Why Rustler?
- Safety: Memory-safe bindings to C++
- Performance: Near-native performance
- Ergonomics: Better developer experience than C NIFs
- Compatibility: Excellent DuckDB Rust bindings exist
NIF Module Structure
native/duckdb_nif/
├── src/
│ ├── lib.rs # NIF entry point
│ ├── connection.rs # Connection resource and methods
│ ├── relation.rs # Relation resource and methods
│ ├── result.rs # Result handling
│ ├── types.rs # Type system
│ ├── exceptions.rs # Exception mapping
│ ├── conversions.rs # Elixir ↔ DuckDB type conversions
│ ├── arrow.rs # Arrow integration
│ └── filesystem.rs # Filesystem integration
└── Cargo.tomlModule Structure
Elixir Module Hierarchy
lib/duckdb_ex/
├── connection.ex # Main connection module
├── relation.ex # Relation/query builder
├── result.ex # Result handling
├── exceptions.ex # Exception definitions
├── port.ex # DuckDB CLI process management
├── default_connection.ex # Default connection holder
├── cursor.ex # Cursor wrapper
├── parameters.ex # SQL parameter interpolation
├── statement.ex # Statement metadata
├── statement_type.ex # Statement type enum
└── expected_result_type.ex # Expected result enum
# Planned modules (not yet implemented)
# - type.ex, expression/*, value/*, arrow/*, etc.API Surface
Connection API
The Connection module provides the primary interface to DuckDB. Reference: duckdb-python/src/duckdb_py/include/duckdb_python/pyconnection/pyconnection.hpp
defmodule DuckdbEx.Connection do
# Connection Management
@spec connect(String.t() | :memory, keyword()) :: {:ok, t()} | {:error, term()}
@spec close(t()) :: :ok
@spec interrupt(t()) :: :ok
# Transactions
@spec begin(t()) :: {:ok, t()} | {:error, term()}
@spec commit(t()) :: {:ok, t()} | {:error, term()}
@spec rollback(t()) :: {:ok, t()} | {:error, term()}
@spec checkpoint(t()) :: {:ok, t()} | {:error, term()}
# Query Execution
@spec execute(t(), String.t(), list()) :: {:ok, t()} | {:error, term()}
@spec execute_many(t(), String.t(), list()) :: {:ok, t()} | {:error, term()}
@spec query(t(), String.t(), String.t(), list()) :: {:ok, Relation.t()} | {:error, term()}
@spec sql(t(), String.t(), list()) :: {:ok, Relation.t()} | {:error, term()}
# Data Sources
@spec read_csv(t(), String.t() | list(String.t()), keyword()) :: {:ok, Relation.t()}
@spec read_json(t(), String.t() | list(String.t()), keyword()) :: {:ok, Relation.t()}
@spec read_parquet(t(), String.t() | list(String.t()), keyword()) :: {:ok, Relation.t()}
@spec from_arrow(t(), term()) :: {:ok, Relation.t()}
@spec from_df(t(), term()) :: {:ok, Relation.t()}
@spec from_query(t(), String.t()) :: {:ok, Relation.t()}
# Table/View Access
@spec table(t(), String.t()) :: {:ok, Relation.t()}
@spec view(t(), String.t()) :: {:ok, Relation.t()}
@spec values(t(), list()) :: {:ok, Relation.t()}
@spec table_function(t(), String.t(), list()) :: {:ok, Relation.t()}
# Schema Operations
@spec get_table_names(t(), String.t(), boolean()) :: {:ok, list(String.t())}
# UDF Registration
@spec create_function(t(), String.t(), (... -> term()), keyword()) :: {:ok, t()}
@spec remove_function(t(), String.t()) :: {:ok, t()}
# Type Creation
@spec map_type(t(), Type.t(), Type.t()) :: Type.t()
@spec struct_type(t(), keyword()) :: Type.t()
@spec list_type(t(), Type.t()) :: Type.t()
@spec array_type(t(), Type.t(), non_neg_integer()) :: Type.t()
@spec union_type(t(), keyword()) :: Type.t()
@spec enum_type(t(), String.t(), Type.t(), list()) :: Type.t()
@spec decimal_type(t(), integer(), integer()) :: Type.t()
@spec string_type(t(), String.t()) :: Type.t()
@spec type(t(), String.t()) :: Type.t()
# Extensions
@spec install_extension(t(), String.t(), keyword()) :: :ok
@spec load_extension(t(), String.t()) :: :ok
# Object Registration
@spec register(t(), String.t(), term()) :: {:ok, t()}
@spec unregister(t(), String.t()) :: {:ok, t()}
@spec append(t(), String.t(), term(), keyword()) :: {:ok, t()}
# Filesystem
@spec register_filesystem(t(), term()) :: :ok
@spec unregister_filesystem(t(), String.t()) :: :ok
@spec list_filesystems(t()) :: list(String.t())
@spec filesystem_is_registered(t(), String.t()) :: boolean()
# Result Fetching (for DBAPI compatibility)
@spec fetch_one(t()) :: {:ok, tuple() | nil}
@spec fetch_many(t(), non_neg_integer()) :: {:ok, list(tuple())}
@spec fetch_all(t()) :: {:ok, list(tuple())}
@spec fetch_df(t(), keyword()) :: {:ok, term()}
@spec fetch_arrow(t(), keyword()) :: {:ok, term()}
# Progress Tracking
@spec query_progress(t()) :: float()
# Statement Extraction
@spec extract_statements(t(), String.t()) :: list(term())
endRelation API
The Relation module provides the lazy query builder interface. Reference: duckdb-python/src/duckdb_py/include/duckdb_python/pyrelation.hpp
defmodule DuckdbEx.Relation do
# Basic Operations
@spec project(t(), list(String.t())) :: t()
@spec filter(t(), String.t() | Expression.t()) :: t()
@spec limit(t(), integer(), integer()) :: t()
@spec order(t(), String.t()) :: t()
@spec sort(t(), list(String.t())) :: t()
@spec distinct(t()) :: t()
@spec unique(t(), String.t()) :: t()
# Aggregations
@spec aggregate(t(), String.t() | list(String.t()), String.t()) :: t()
@spec count(t(), String.t(), keyword()) :: t()
@spec sum(t(), String.t(), keyword()) :: t()
@spec avg(t(), String.t(), keyword()) :: t()
@spec min(t(), String.t(), keyword()) :: t()
@spec max(t(), String.t(), keyword()) :: t()
@spec median(t(), String.t(), keyword()) :: t()
@spec mode(t(), String.t(), keyword()) :: t()
@spec stddev(t(), String.t(), keyword()) :: t()
@spec variance(t(), String.t(), keyword()) :: t()
@spec first(t(), String.t(), keyword()) :: t()
@spec last(t(), String.t(), keyword()) :: t()
@spec list(t(), String.t(), keyword()) :: t()
@spec string_agg(t(), String.t(), String.t(), keyword()) :: t()
# Window Functions
@spec row_number(t(), String.t(), String.t()) :: t()
@spec rank(t(), String.t(), String.t()) :: t()
@spec dense_rank(t(), String.t(), String.t()) :: t()
@spec percent_rank(t(), String.t(), String.t()) :: t()
@spec cume_dist(t(), String.t(), String.t()) :: t()
@spec ntile(t(), String.t(), integer(), String.t()) :: t()
@spec lag(t(), String.t(), String.t(), keyword()) :: t()
@spec lead(t(), String.t(), String.t(), keyword()) :: t()
@spec first_value(t(), String.t(), keyword()) :: t()
@spec last_value(t(), String.t(), keyword()) :: t()
@spec nth_value(t(), String.t(), String.t(), integer(), keyword()) :: t()
# Set Operations
@spec union(t(), t()) :: t()
@spec except(t(), t()) :: t()
@spec intersect(t(), t()) :: t()
# Joins
@spec join(t(), t(), String.t() | Expression.t(), String.t()) :: t()
@spec cross(t(), t()) :: t()
# Transformations
@spec map(t(), (term() -> term()), keyword()) :: t()
# Execution & Fetching
@spec execute(t()) :: t()
@spec fetch_one(t()) :: {:ok, tuple() | nil}
@spec fetch_many(t(), non_neg_integer()) :: {:ok, list(tuple())}
@spec fetch_all(t()) :: {:ok, list(tuple())}
@spec fetch_df(t(), keyword()) :: {:ok, term()}
@spec fetch_arrow(t(), keyword()) :: {:ok, term()}
@spec fetch_record_batch(t(), keyword()) :: {:ok, term()}
# Export
@spec to_csv(t(), String.t(), keyword()) :: :ok
@spec to_parquet(t(), String.t(), keyword()) :: :ok
@spec to_arrow_table(t(), keyword()) :: {:ok, term()}
@spec to_arrow_capsule(t(), keyword()) :: {:ok, term()}
# Table/View Creation
@spec create_view(t(), String.t(), keyword()) :: t()
@spec insert_into(t(), String.t()) :: :ok
@spec insert(t(), list()) :: :ok
@spec update(t(), keyword(), String.t()) :: :ok
@spec create(t(), String.t()) :: :ok
# Introspection
@spec describe(t()) :: t()
@spec length(t()) :: non_neg_integer()
@spec shape(t()) :: {non_neg_integer(), non_neg_integer()}
@spec columns(t()) :: list(String.t())
@spec column_types(t()) :: list(String.t())
@spec description(t()) :: list(tuple())
@spec type(t()) :: String.t()
@spec alias(t()) :: String.t()
@spec set_alias(t(), String.t()) :: t()
# SQL Generation
@spec to_sql(t()) :: String.t()
@spec explain(t(), atom()) :: String.t()
# Display
@spec to_string(t()) :: String.t()
@spec print(t(), keyword()) :: :ok
endType System
Reference: duckdb-python/duckdb/typing/__init__.py and type-related C++ code
defmodule DuckdbEx.Type do
# Type struct
defstruct [:id, :internal_type, :metadata]
# Standard Types (constants)
@type standard_type ::
:boolean | :tinyint | :smallint | :integer | :bigint |
:hugeint | :utinyint | :usmallint | :uinteger | :ubigint |
:float | :double | :decimal | :varchar | :blob |
:timestamp | :timestamp_s | :timestamp_ms | :timestamp_ns |
:date | :time | :interval | :uuid | :json
# Composite type constructors
@spec list(t()) :: t()
@spec array(t(), non_neg_integer()) :: t()
@spec map(t(), t()) :: t()
@spec struct(keyword()) :: t()
@spec union(keyword()) :: t()
@spec enum(String.t(), t(), list()) :: t()
@spec decimal(integer(), integer()) :: t()
endException Hierarchy
Reference: duckdb-python/duckdb/__init__.py exception imports
defmodule DuckdbEx.Exceptions do
# Base exceptions
defmodule Error, do: defexception [:message]
defmodule Warning, do: defexception [:message]
# Specific exception types
defmodule DatabaseError, do: defexception [:message]
defmodule DataError, do: defexception [:message]
defmodule OperationalError, do: defexception [:message]
defmodule IntegrityError, do: defexception [:message]
defmodule InternalError, do: defexception [:message]
defmodule ProgrammingError, do: defexception [:message]
defmodule NotSupportedError, do: defexception [:message]
# DuckDB-specific exceptions
defmodule BinderException, do: defexception [:message]
defmodule CatalogException, do: defexception [:message]
defmodule ConnectionException, do: defexception [:message]
defmodule ConstraintException, do: defexception [:message]
defmodule ConversionException, do: defexception [:message]
defmodule DependencyException, do: defexception [:message]
defmodule FatalException, do: defexception [:message]
defmodule HTTPException, do: defexception [:message]
defmodule InternalException, do: defexception [:message]
defmodule InterruptException, do: defexception [:message]
defmodule InvalidInputException, do: defexception [:message]
defmodule InvalidTypeException, do: defexception [:message]
defmodule IOException, do: defexception [:message]
defmodule NotImplementedException, do: defexception [:message]
defmodule OutOfMemoryException, do: defexception [:message]
defmodule OutOfRangeException, do: defexception [:message]
defmodule ParserException, do: defexception [:message]
defmodule PermissionException, do: defexception [:message]
defmodule SequenceException, do: defexception [:message]
defmodule SerializationException, do: defexception [:message]
defmodule SyntaxException, do: defexception [:message]
defmodule TransactionException, do: defexception [:message]
defmodule TypeMismatchException, do: defexception [:message]
endData Type Mapping
DuckDB → Elixir Type Conversions
| DuckDB Type | Elixir Type | Notes |
|---|---|---|
| BOOLEAN | boolean() | Direct mapping |
| TINYINT/SMALLINT/INTEGER | integer() | Arbitrary precision |
| BIGINT | integer() | Arbitrary precision |
| HUGEINT | integer() | Arbitrary precision |
| FLOAT/DOUBLE | float() | IEEE 754 |
| DECIMAL | Decimal.t() | Use Decimal library |
| VARCHAR/TEXT | String.t() | UTF-8 strings |
| BLOB | binary() | Raw bytes |
| DATE | Date.t() | Elixir Date |
| TIME | Time.t() | Elixir Time |
| TIMESTAMP | DateTime.t() / NaiveDateTime.t() | With/without timezone |
| INTERVAL | DuckdbEx.Interval.t() | Custom struct |
| UUID | String.t() | String representation |
| JSON | term() | Decoded JSON via Jason |
| LIST | list() | Elixir lists |
| STRUCT | map() | Elixir maps |
| MAP | %{} | Elixir maps |
| UNION | tagged tuple | {tag, value} |
| ENUM | atom() or String.t() | Configurable |
Parameter Binding
Support both positional and named parameters:
# Positional
DuckdbEx.Connection.execute(conn, "SELECT * FROM users WHERE id = ?", [42])
# Named (similar to Python)
DuckdbEx.Connection.execute(conn, "SELECT * FROM users WHERE id = :id", [id: 42])Module-Level API (Default Connection)
Reference: duckdb-python/duckdb/__init__.py for module-level functions
defmodule DuckdbEx do
# Default connection management
@spec default_connection() :: Connection.t()
@spec set_default_connection(Connection.t()) :: :ok
# Convenience functions using default connection
@spec connect(String.t() | :memory, keyword()) :: {:ok, Connection.t()}
@spec close() :: :ok
@spec execute(String.t(), list()) :: {:ok, Connection.t()}
@spec query(String.t(), list()) :: {:ok, Relation.t()}
@spec sql(String.t(), list()) :: {:ok, Relation.t()}
@spec read_csv(String.t(), keyword()) :: {:ok, Relation.t()}
@spec read_json(String.t(), keyword()) :: {:ok, Relation.t()}
@spec read_parquet(String.t(), keyword()) :: {:ok, Relation.t()}
@spec table(String.t()) :: {:ok, Relation.t()}
@spec values(list()) :: {:ok, Relation.t()}
endResource Management
NIF Resources
Use Rustler resources for:
- Connection Resource: Wraps DuckDB connection handle
- Relation Resource: Wraps DuckDB relation handle
- Result Resource: Wraps query results
// Example resource definition
#[derive(NifStruct)]
#[module = "DuckdbEx.Native.Connection"]
pub struct ConnectionResource {
// Internal DuckDB connection
inner: Arc<Mutex<duckdb::Connection>>,
}
#[derive(NifStruct)]
#[module = "DuckdbEx.Native.Relation"]
pub struct RelationResource {
inner: Arc<Mutex<duckdb::Relation>>,
connection: ConnectionResource,
}Resource Lifecycle
- Resources cleaned up by BEAM GC
- Explicit close() methods for deterministic cleanup
- Connection pool support for multiple connections
Error Handling Strategy
NIF Error Propagation
// In Rust NIF
fn execute_query(conn: ResourceArc<ConnectionResource>, query: String)
-> Result<RelationResource, Error> {
conn.inner.lock()
.execute(&query)
.map(|rel| RelationResource::new(rel, conn.clone()))
.map_err(|e| Error::DatabaseError(e.to_string()))
}Elixir Error Handling
# Pattern 1: {:ok, result} | {:error, exception}
case DuckdbEx.Connection.execute(conn, "SELECT * FROM invalid") do
{:ok, result} -> process(result)
{:error, %DuckdbEx.Exceptions.CatalogException{} = e} ->
Logger.error("Table not found: #{e.message}")
end
# Pattern 2: Raise on error (bang methods)
result = DuckdbEx.Connection.execute!(conn, "SELECT * FROM users")Testing Strategy
Test Structure
Reference: duckdb-python/tests/ for comprehensive test coverage
test/
├── duckdb_ex_test.exs # Module-level API tests
├── connection_test.exs # Connection tests
├── relation_test.exs # Relation/query builder tests
├── result_test.exs # Result handling tests
├── type_test.exs # Type system tests
├── exception_test.exs # Exception handling tests
├── integration/
│ ├── csv_test.exs # CSV reading/writing
│ ├── parquet_test.exs # Parquet integration
│ ├── arrow_test.exs # Arrow integration
│ ├── transaction_test.exs # Transaction tests
│ ├── udf_test.exs # User-defined functions
│ └── filesystem_test.exs # Filesystem integration
└── support/
├── test_helper.exs
└── fixtures/
├── test.csv
├── test.parquet
└── test.jsonTest Patterns
- Property-based testing (StreamData)
- Comparison testing against Python client
- Integration tests with real DuckDB operations
- Concurrent access tests
- Memory leak detection
Integration Points
Arrow Integration
Support Apache Arrow for zero-copy data exchange:
defmodule DuckdbEx.Arrow do
@spec to_arrow_table(Relation.t(), keyword()) :: {:ok, term()}
@spec from_arrow(Connection.t(), term()) :: {:ok, Relation.t()}
@spec to_arrow_capsule(Relation.t()) :: {:ok, term()}
endExplorer Integration
Provide first-class integration with Elixir's Explorer library:
defmodule DuckdbEx.Explorer do
@spec to_dataframe(Relation.t()) :: Explorer.DataFrame.t()
@spec from_dataframe(Connection.t(), Explorer.DataFrame.t()) :: Relation.t()
endNx Integration
Support Nx tensors for numerical computing:
defmodule DuckdbEx.Nx do
@spec to_tensor(Relation.t()) :: Nx.Tensor.t()
@spec from_tensor(Connection.t(), Nx.Tensor.t(), keyword()) :: Relation.t()
endPerformance Considerations
Optimization Strategies
- Lazy Evaluation: Relations build query plans without execution
- Streaming Results: Support for chunked result fetching
- Zero-Copy: Use Arrow for data transfer where possible
- Connection Pooling: DBConnection-compatible pool
- Prepared Statements: Cache compiled queries
- Batch Operations: Efficient bulk inserts
Benchmarking
Compare performance against:
- Python duckdb client
- PostgreSQL (Postgrex)
- SQLite (Exqlite)
Security Considerations
- SQL Injection: Use parameterized queries exclusively
- Resource Limits: Configurable memory limits
- Path Traversal: Validate file paths for read_csv, etc.
- Extension Loading: Optional restrictions on loading extensions
Configuration
# config/config.exs
config :duckdb_ex,
default_connection: [
database: ":memory:",
config: [
threads: 4,
max_memory: "1GB",
temp_directory: "/tmp/duckdb"
]
],
pool: [
size: 10,
max_overflow: 5
]Migration from Python
API Compatibility Table
| Python API | Elixir API | Notes |
|---|---|---|
duckdb.connect() | DuckdbEx.connect() | Returns {:ok, conn} tuple |
con.execute() | Connection.execute() | Same parameters |
con.sql() | Connection.sql() | Returns Relation |
rel.filter() | Relation.filter() | Chainable |
rel.fetchall() | Relation.fetch_all() | Returns {:ok, list} |
duckdb.read_csv() | DuckdbEx.read_csv() | Same options |
Migration Guide
Document to be created showing side-by-side examples of Python vs Elixir code.
Documentation Requirements
- ExDoc: Complete module and function documentation
- Guides: Getting started, cookbook, migration guide
- Examples: Real-world usage examples
- Changelog: Track changes and version compatibility
- Type Specs: Complete @spec for all public functions
Future Enhancements
- Ecto Adapter: Full Ecto integration
- Phoenix Integration: LiveView components
- Telemetry: Instrumentation for observability
- Distributed Queries: Multi-node query execution
- Custom Extensions: Elixir-based DuckDB extensions
References
- DuckDB Python source:
duckdb-python/directory - DuckDB Documentation: https://duckdb.org/docs
- DuckDB Rust: https://github.com/duckdb/duckdb-rs
- Rustler: https://github.com/rusterlium/rustler
- DB Connection: https://github.com/elixir-ecto/db_connection