EctoFDB stores each entity as a single FDB value using Erlang's External Term
Format (:erlang.term_to_binary/1). The input is a keyword list where each key
is an atom corresponding to a column name and each value is the column's Elixir
term. For example:
[id: "abc-123", name: "Alice", notes: nil, inserted_at: ~N[...], updated_at: ~N[...]]

This document describes why this encoding was chosen and the tradeoffs involved.
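
As an illustration of the encoding described above, the following is a minimal sketch of the round trip. The timestamp values are arbitrary placeholders, and storing NaiveDateTime structs as-is is an assumption made for this example, not a statement about EctoFDB's internals.

```elixir
# A row as a keyword list; the timestamps are arbitrary example values.
row = [
  id: "abc-123",
  name: "Alice",
  notes: nil,
  inserted_at: ~N[2024-01-01 00:00:00],
  updated_at: ~N[2024-01-01 00:00:00]
]

# Encode the whole keyword list into one binary, suitable for a single FDB value.
value = :erlang.term_to_binary(row)

# Decode it back. The column names travel with the data as atom keys.
decoded = :erlang.binary_to_term(value)

true = decoded == row
true = Keyword.keys(decoded) == [:id, :name, :notes, :inserted_at, :updated_at]
```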
Ecto.Schema as the Source of Truth
Because every row carries its own column names, EctoFDB does not need to store a
schema definition as a separate metadata entry in FoundationDB, and does not
require a create table migration statement. Instead, the
Ecto.Schema module in the application code is the sole source of truth for
the shape of the data.
This is a natural fit for FoundationDB's client-driven Layer architecture: the client is always in control of how data is interpreted. Avoiding a stored schema eliminates an entire class of coordination problems:
- No schema metadata key to read on every transaction.
- No cache invalidation surface for schema changes.
- No versioned schema mappings to maintain.
The Metadata system (see metadata.md) already handles index metadata and
cache invalidation. Keeping the value encoding free of metadata dependencies
keeps the common read/write path simple.
Column Evolution
Adding a column requires only a code change to the Ecto.Schema. Existing
rows that predate the new column will decode without it, and the missing key
resolves to nil at the Ecto layer. No backfill migration is needed.
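
The behaviour for added columns can be sketched as follows. This is not EctoFDB's actual loading code: the fields list is a hypothetical stand-in for what the Ecto.Schema reports, and the lookup only illustrates why a key missing from an old row reads as nil.

```elixir
# An old row, written before a :notes column was added to the schema.
old_value = :erlang.term_to_binary([id: "abc-123", name: "Alice"])

# Hypothetical field list standing in for the current Ecto.Schema
# (for a real schema module, something like MySchema.__schema__(:fields)).
fields = [:id, :name, :notes]

decoded = :erlang.binary_to_term(old_value)

# Keyword.get/2 returns nil for a missing key, so the new column reads as nil.
loaded = Map.new(fields, fn field -> {field, Keyword.get(decoded, field)} end)
# loaded is %{id: "abc-123", name: "Alice", notes: nil}
```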
Renaming a column requires rewriting all existing rows, since the old atom name is embedded in each value. This is consistent with the general expectation that column renames are rare and expensive operations.
In practice, the tenant model eases this during development: rather than migrating a tenant's data through a rename, it is often simpler to delete the tenant's data and let it repopulate from scratch. This works well when tenants are cheap to recreate and the authoritative data lives elsewhere or can be reseeded.
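
For cases where a rename cannot be avoided, the per-row rewrite might look like the sketch below. RenameSketch and rename_key/3 are hypothetical names; a real migration would also need to read and write every affected row inside FDB transactions, which is omitted here.

```elixir
defmodule RenameSketch do
  # Rewrites one encoded row, replacing the key `old` with `new`.
  # Hypothetical helper: only the value transformation is shown,
  # not the FDB reads and writes around it.
  def rename_key(value, old, new) when is_binary(value) do
    value
    |> :erlang.binary_to_term()
    |> Enum.map(fn
      {^old, v} -> {new, v}
      other -> other
    end)
    |> :erlang.term_to_binary()
  end
end

old_value = :erlang.term_to_binary([id: "abc-123", name: "Alice"])
new_value = RenameSketch.rename_key(old_value, :name, :full_name)
# :erlang.binary_to_term(new_value) is [id: "abc-123", full_name: "Alice"]
```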
Why term_to_binary?
:erlang.term_to_binary/1 is implemented as a single C-level BIF in the BEAM.
It handles the entire keyword list (atoms, strings, integers, floats,
timestamps, nested structures) in one call with no Elixir-level dispatch per
field. It's well optimized, portable, simple, reliable, and well-trodden.
We benchmarked an alternative encoding that strips atom keys out of the binary
and stores them as a NUL-delimited string header, with a single
:erlang.term_to_binary call for the values list:

<<atom1, 0x00, atom2, 0x00, ..., atomN, 0x00, 0x00, term_to_binary(values_list)>>

Results across representative row shapes (narrow 2-field rows, typical 5-field rows, wide 12-field rows, and content-heavy 9-field rows):
| Metric | Custom vs. Baseline |
|---|---|
| Size savings | 1-7% |
| Encode speed | 1.2x slower |
| Decode speed | 1.6-2.7x slower |
The size savings are modest because Erlang's External Term Format already
encodes atoms compactly (a small tag followed by the atom bytes). The
performance cost comes from the Elixir-level header parsing and per-field
String.to_existing_atom calls on decode, which cannot compete with the
BEAM's native binary_to_term.
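
To make the comparison concrete, here is a minimal sketch of what such a header-based codec could look like. It is a reconstruction from the layout described above, not the benchmarked code; the module name HeaderEncoding is hypothetical, and the sketch assumes a non-empty row whose atom names are non-empty and contain no NUL bytes.

```elixir
defmodule HeaderEncoding do
  # Sketch of the alternative layout: each atom name followed by 0x00,
  # then a 0x00 terminator, then one term_to_binary call for the values.
  def encode(row) do
    header = for {key, _} <- row, into: <<>>, do: <<Atom.to_string(key)::binary, 0>>
    <<header::binary, 0, :erlang.term_to_binary(Keyword.values(row))::binary>>
  end

  def decode(bin) do
    # The first <<0, 0>> is the last atom's delimiter plus the terminator.
    [header, values] = :binary.split(bin, <<0, 0>>)

    # This per-field atom lookup is the Elixir-level work on the decode path.
    keys = header |> String.split(<<0>>) |> Enum.map(&String.to_existing_atom/1)
    Enum.zip(keys, :erlang.binary_to_term(values))
  end
end

row = [id: "abc-123", name: "Alice", notes: nil]
true = HeaderEncoding.decode(HeaderEncoding.encode(row)) == row
```

The decode function shows where the extra cost comes from: the header has to be split and each name resolved to an atom in Elixir before the values can be zipped back into a keyword list.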
Storage Overhead
Each atom key costs roughly 10-15 bytes per row in the serialized form. For a typical schema with 5-8 fields, this is 50-120 bytes of overhead per entity. The overhead as a percentage of total row size depends on the data:
- Narrow rows (2-3 small fields): ~40-60% overhead
- Typical rows (5-8 fields, moderate values): ~15-30%
- Content-heavy rows (large string fields): <5%
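
One way to estimate this overhead for a particular schema is to compare the encoded size of a sample row with the encoded size of its values alone. The sketch below uses a hypothetical OverheadSketch module; note that the difference also includes the 2-byte tuple wrapper around each key-value pair.

```elixir
defmodule OverheadSketch do
  # Bytes attributable to keys: encoded size of the keyword list minus the
  # encoded size of the bare values list. The difference includes the tuple
  # wrapper around each pair as well as the atom names themselves.
  def key_overhead(row) do
    total = byte_size(:erlang.term_to_binary(row))
    values_only = byte_size(:erlang.term_to_binary(Keyword.values(row)))
    overhead = total - values_only

    %{total: total, overhead: overhead, percent: Float.round(100 * overhead / total, 1)}
  end
end

stats = OverheadSketch.key_overhead([id: "abc-123", name: "Alice"])
```

Running this against sample rows from a real schema gives a more reliable figure than the ranges above, since the exact byte counts depend on atom lengths and the OTP version's atom encoding.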
This overhead is not free. Larger values reduce the number of rows that fit in a single FDB transaction (10 MB limit), which matters for bulk data loading scenarios. For narrow rows with high atom overhead, the effective throughput per transaction can be noticeably lower than it would be with a more compact encoding.
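
A rough back-of-the-envelope calculation shows the effect. All sizes below are hypothetical, and the estimate ignores index writes and other data that also counts toward the transaction size.

```elixir
# FoundationDB's transaction size limit, in bytes.
txn_limit = 10_000_000

# Hypothetical sizes for one narrow row: a 40-byte key and a 60-byte value,
# of which 30 bytes are atom-key overhead.
key_size = 40
value_size = 60
key_overhead = 30

# Rows per transaction with the current encoding.
rows_now = div(txn_limit, key_size + value_size)
# => 100_000

# Rows per transaction if the key overhead were removed entirely.
rows_compact = div(txn_limit, key_size + value_size - key_overhead)
# => 142_857
```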
Comparison with Other Systems
The schema-per-row approach is similar to DynamoDB, which stores attribute names with every item for the same reasons: schemaless flexibility and no coordination on schema changes.
Some alternative approaches include:
- FoundationDB Record Layer (Java) uses Protocol Buffers, which replace field names with small integer tags. This is more space-efficient but requires a compiled schema definition and careful field-number evolution rules.
- CockroachDB uses small integer column-ID deltas in a compact TUPLE encoding within each column family's value. Column names are not stored per row; the schema maps IDs to names.
- Spanner uses an internal columnar/PAX storage format (Ressi) where column identity is part of the schema metadata, not repeated per row.
EctoFDB's approach trades a moderate, predictable storage overhead for a significantly simpler architecture with no stored schema coordination.
Future Direction: Integer-Keyed Encoding
This strategy is not yet implemented.
A potential optimization is to replace atom keys with user-assigned integers, similar to Protocol Buffer field numbers. Instead of:

:erlang.term_to_binary([id: "abc-123", name: "Alice"])

The encoding would be:

:erlang.term_to_binary([{0, "abc-123"}, {1, "Alice"}])

Benchmarks show this yields 8-15% size savings on typical rows, with encode performance roughly on par with the current approach and decode at break-even.
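
The size difference between the two forms can be checked directly for any sample row. The sketch below does not reproduce the benchmark figures; it only shows how the comparison might be made.

```elixir
atom_keyed = :erlang.term_to_binary([id: "abc-123", name: "Alice"])
int_keyed = :erlang.term_to_binary([{0, "abc-123"}, {1, "Alice"}])

# Small integers encode in 2 bytes each, while each atom costs a tag,
# a length field, and the bytes of its name.
saved = byte_size(atom_keyed) - byte_size(int_keyed)
true = saved > 0
```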
This would be opt-in via a schema annotation, for example:

schema "users" do
  field :name, :string, fdb_key: 1
  field :notes, :string, fdb_key: 2
  timestamps()
end

If no fdb_key annotations are present, the current keyword-list encoding would be
used. The decoder can disambiguate the two formats by inspecting the
deserialized term: a keyword list (atom-keyed tuples) vs integer-keyed tuples.
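
Since this strategy is not implemented, the following is only a sketch of how a decoder might tell the two formats apart. The module name FormatSketch and the integer-to-field mapping are hypothetical, and empty rows are not handled.

```elixir
defmodule FormatSketch do
  # Hypothetical mapping, as it might be derived from fdb_key annotations.
  @fields %{0 => :id, 1 => :name, 2 => :notes}

  # Atom-keyed pairs: the current keyword-list encoding, returned as-is.
  def decode_row([{key, _} | _] = row) when is_atom(key), do: row

  # Integer-keyed pairs: map each integer back to its field name.
  def decode_row([{key, _} | _] = row) when is_integer(key) do
    for {int, value} <- row, do: {Map.fetch!(@fields, int), value}
  end

  def decode(value), do: value |> :erlang.binary_to_term() |> decode_row()
end

# Both encodings decode to the same keyword list.
old_format = :erlang.term_to_binary([id: "abc-123", name: "Alice"])
new_format = :erlang.term_to_binary([{0, "abc-123"}, {1, "Alice"}])
true = FormatSketch.decode(old_format) == FormatSketch.decode(new_format)
```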
The user would own the stability of the integer mapping, the same contract as protobuf field numbers: once assigned, an integer must never be reassigned to a different field. This is a reasonable burden because it enables two benefits beyond size savings:
- Column renames become free. Renaming :name to :full_name requires only a schema change; the integer key 1 stays the same, so existing data needs no migration, assuming all records were stored with this scheme.
- No stored metadata. The Ecto.Schema module remains the sole source of truth for the field-to-integer mapping, preserving the architecture described in this document.
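
The rename case can be illustrated with a small sketch: the same stored bytes are decoded under two versions of a hypothetical field mapping, one from before the rename and one from after. The mappings and the decode function are assumptions for illustration only.

```elixir
# A stored value written with integer keys; it is never rewritten.
stored = :erlang.term_to_binary([{1, "Alice"}])

# Hypothetical field mappings before and after renaming :name to :full_name.
before_rename = %{1 => :name}
after_rename = %{1 => :full_name}

decode = fn value, fields ->
  for {int, v} <- :erlang.binary_to_term(value), do: {Map.fetch!(fields, int), v}
end

decode.(stored, before_rename)
# => [name: "Alice"]

decode.(stored, after_rename)
# => [full_name: "Alice"]
```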