Grove Roadmap


Implementation plan for the Grove CRDT library, organized into phases based on dependencies and complexity.


Phase 1: Foundation

Goal: Establish core abstractions and implement basic CRDT types.

1.1 Core Abstractions

  • [x] Define Grove.CRDT behaviour with callbacks:
    • new/1, merge/2, value/1
    • Optional: apply_operation/2, delta/1, reset_delta/1
  • [x] Define protocols:
    • Grove.Mergeable for recursive merge dispatch (used by nested CRDTs)
  • [x] Implement Grove.VectorClock:
    • new/0, increment/2, merge/2
    • compare/2 returning :before | :after | :equal | :concurrent
    • descends?/2 for causal ordering

1.2 Counter CRDTs

  • [x] Grove.GCounter (Grow-only Counter)

    • State: %{actor => count}
    • Operations: increment/1, increment/2 (with amount)
    • Merge: pointwise maximum
    • Value: sum of all counts
  • [x] Grove.PNCounter (Positive-Negative Counter)

    • State: %{positive: GCounter, negative: GCounter}
    • Operations: increment/1, decrement/1
    • Value: positive.value - negative.value
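The counter semantics above can be sketched in a few lines. This is an illustrative sketch, not Grove's actual API: the `SketchGCounter`/`SketchPNCounter` module names and function shapes are assumptions made for the example.

```elixir
# Hypothetical sketch of the G-Counter and PN-Counter semantics described above.
defmodule SketchGCounter do
  # State: %{actor => count}
  def new, do: %{}

  def increment(state, actor, amount \\ 1) do
    Map.update(state, actor, amount, &(&1 + amount))
  end

  # Merge: pointwise maximum per actor.
  def merge(a, b), do: Map.merge(a, b, fn _actor, x, y -> max(x, y) end)

  # Value: sum of all per-actor counts.
  def value(state), do: state |> Map.values() |> Enum.sum()
end

defmodule SketchPNCounter do
  # State: a pair of grow-only counters.
  def new, do: %{positive: SketchGCounter.new(), negative: SketchGCounter.new()}

  def increment(%{positive: p} = s, actor), do: %{s | positive: SketchGCounter.increment(p, actor)}
  def decrement(%{negative: n} = s, actor), do: %{s | negative: SketchGCounter.increment(n, actor)}

  def merge(a, b) do
    %{positive: SketchGCounter.merge(a.positive, b.positive),
      negative: SketchGCounter.merge(a.negative, b.negative)}
  end

  def value(s), do: SketchGCounter.value(s.positive) - SketchGCounter.value(s.negative)
end
```

Pointwise maximum is what makes the merge commutative, associative, and idempotent: each actor's entry only ever grows, so taking the max per key loses nothing regardless of merge order.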

1.3 Register CRDTs

  • [x] Grove.LWWRegister (Last-Write-Wins Register)

    • State: %{value: term, timestamp: Lamport.t, actor: actor}
    • Use Lamport timestamps (NOT wall-clock)
    • Tiebreaker: actor ID comparison
  • [x] Grove.MVRegister (Multi-Value Register)

    • State: %{values: %{{value, vclock} => true}}
    • Preserve all concurrent values
    • value/1 returns list of concurrent values
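The LWW merge rule (higher Lamport timestamp wins, actor ID breaks ties) can be sketched as below. Module and field names are illustrative assumptions, not Grove's real implementation.

```elixir
# Hypothetical sketch of LWW-Register merge with Lamport timestamps.
defmodule SketchLWWRegister do
  defstruct [:value, :timestamp, :actor]

  def new(value, actor), do: %__MODULE__{value: value, timestamp: 1, actor: actor}

  def set(%__MODULE__{timestamp: t} = reg, value) do
    %{reg | value: value, timestamp: t + 1}
  end

  # Tuple comparison gives "timestamp first, then actor" ordering, so
  # concurrent writes resolve identically on every replica.
  def merge(a, b) do
    if {a.timestamp, a.actor} >= {b.timestamp, b.actor}, do: a, else: b
  end
end
```

The actor tiebreaker is what keeps the register deterministic: two writes with equal Lamport timestamps are concurrent, and without a total order on actors different replicas could pick different winners.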

1.4 Basic Tests

  • [x] Unit tests for each CRDT type
  • [x] Property-based tests for CRDT laws:
    • Commutativity: merge(a, b) == merge(b, a)
    • Associativity: merge(merge(a, b), c) == merge(a, merge(b, c))
    • Idempotence: merge(a, a) == a
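The three laws can be checked mechanically against randomly generated states. The sketch below uses plain `:rand` instead of StreamData so it stays self-contained; the real suite would use property-based generators, and `LawCheck` is a name invented for this example.

```elixir
# Hypothetical law check over random G-Counter-shaped states.
defmodule LawCheck do
  def random_state do
    for actor <- Enum.take_random([:a, :b, :c, :d], :rand.uniform(4)),
        into: %{},
        do: {actor, :rand.uniform(100)}
  end

  def merge(x, y), do: Map.merge(x, y, fn _k, a, b -> max(a, b) end)

  def laws_hold?(a, b, c) do
    merge(a, b) == merge(b, a) and                       # commutativity
      merge(merge(a, b), c) == merge(a, merge(b, c)) and # associativity
      merge(a, a) == a                                   # idempotence
  end
end

all_pass? =
  Enum.all?(1..100, fn _ ->
    LawCheck.laws_hold?(LawCheck.random_state(), LawCheck.random_state(), LawCheck.random_state())
  end)
```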

Phase 2: Sets and Maps

Goal: Implement set CRDTs with proper conflict resolution.

2.1 Set CRDTs

  • [x] Grove.GSet (Grow-only Set)

    • State: MapSet.t
    • Operations: add/2
    • Merge: union
  • [x] Grove.TwoPSet (Two-Phase Set)

    • State: %{added: GSet, removed: GSet}
    • Operations: add/2, remove/2
    • Constraint: cannot re-add after remove
    • Value: added - removed
  • [x] Grove.ORSet (Observed-Remove Set / AWSet)

    • State: %{entries: %{elem => MapSet.t(tag)}}
    • Tag: {actor, counter} - unique per add operation
    • Add-wins semantics (concurrent add + remove → element present)
    • Operations: add/2, remove/2, member?/2
    • Merge: union of tags per element
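The add-wins behaviour falls out of the tags: a remove only deletes the tags it *observed*, so a concurrent add's fresh tag survives. A minimal sketch of that mechanic, with an invented module name and simplified shapes:

```elixir
# Hypothetical sketch of OR-Set tag mechanics (add-wins).
defmodule SketchORSet do
  # State: %{elem => MapSet of {actor, counter} tags}
  def new, do: %{}

  # Each add mints a unique tag, so concurrent adds never collide.
  def add(entries, elem, tag) do
    Map.update(entries, elem, MapSet.new([tag]), &MapSet.put(&1, tag))
  end

  # A remove captures the tags observed at remove time; the returned op
  # only ever deletes those tags when applied on another replica.
  def remove(entries, elem) do
    observed = Map.get(entries, elem, MapSet.new())
    {apply_remove(entries, elem, observed), {elem, observed}}
  end

  def apply_remove(entries, elem, observed_tags) do
    case Map.get(entries, elem) do
      nil -> entries
      tags ->
        rest = MapSet.difference(tags, observed_tags)
        if MapSet.size(rest) == 0, do: Map.delete(entries, elem), else: Map.put(entries, elem, rest)
    end
  end

  def member?(entries, elem), do: Map.has_key?(entries, elem)
end
```

In the concurrent add + remove scenario, the remove op carries only the old tags; applying it on the replica that added concurrently leaves the new tag in place, so the element stays present.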

2.2 Map CRDT

  • [x] Grove.ORMap (Observed-Remove Map)
    • State: %{keys: ORSet, values: %{key => CRDT}}
    • Keys managed by OR-Set semantics
    • Values can be any CRDT (nested composition)
    • Operations: put/3, delete/2, get/2, update/3
    • Merge: merge keys via OR-Set, recursively merge values via Grove.Mergeable

2.3 Composition Support

  • [x] Implement recursive merge for nested CRDTs
  • [x] Type specification for valid CRDT values in maps
  • [ ] Helper macros for defining composite CRDTs

2.4 Tests

  • [x] Property tests for set convergence
  • [x] Concurrent add/remove scenarios
  • [x] Nested CRDT merge correctness

Phase 3: Delta-State CRDTs

Goal: Implement bandwidth-efficient delta synchronization.

3.1 Delta Infrastructure

  • [x] Extend Grove.CRDT behaviour with delta callbacks:

    • delta/1 - extract accumulated delta
    • reset_delta/1 - clear delta after sync
    • merge_delta/2 - not needed; regular merge/2 works for deltas
  • [ ] Implement Grove.DeltaCRDT wrapper:

    defstruct [:full_state, :delta, :version]
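The core delta idea is that every mutation lands in both the full state and a small accumulated delta, and the receiver merges the delta with the same `merge/2` used for full states. A sketch for a G-Counter-shaped state (module name and field layout are illustrative, not the real wrapper):

```elixir
# Hypothetical delta-state wrapper around a G-Counter-shaped state.
defmodule SketchDelta do
  defstruct full_state: %{}, delta: %{}, version: 0

  def increment(%__MODULE__{} = d, actor) do
    full = Map.update(d.full_state, actor, 1, &(&1 + 1))
    # The delta carries the actor's *current total*, so max-merge at the
    # receiver is equivalent to merging the whole state.
    %{d | full_state: full, delta: Map.put(d.delta, actor, Map.fetch!(full, actor))}
  end

  def delta(%__MODULE__{delta: delta}), do: delta
  def reset_delta(%__MODULE__{} = d), do: %{d | delta: %{}, version: d.version + 1}

  # Regular pointwise-max merge works for deltas and full states alike.
  def merge(a, b), do: Map.merge(a, b, fn _k, x, y -> max(x, y) end)
end
```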

3.2 Delta Implementations

3.3 Delta Serialization

  • [ ] Implement Grove.Serialization.Delta:
    • encode/2 - encode delta since version
    • decode/1 - decode delta
    • Compression for wire transmission

3.4 Tests

  • [x] Verify delta merge produces same result as full-state merge
  • [ ] Bandwidth measurement tests
  • [x] Delta accumulation across multiple operations

Phase 4: Replication Layer

Goal: Provide GenServer-based managed replicas with automatic sync.

4.1 Replication Server

  • [x] Grove.Replication.Server GenServer:
    • Manages single CRDT instance
    • Periodic sync via configurable interval
    • Broadcast delta to all peers
    • Merge incoming deltas automatically

4.2 Cluster Membership

Note: libcluster handles node-level discovery (connecting Erlang nodes via strategies like Kubernetes, DNS, multicast). Grove should use libcluster as an optional dependency for node discovery, then layer :pg on top for replica-level process group management.

  • [x] Grove.Cluster.Membership:
    • Integration with :pg (process groups) for replica tracking
    • Peer discovery (join/leave/peers)
    • Broadcast and random peer selection
    • Auto-starts in application supervision tree
  • [ ] Optional libcluster integration for node discovery
  • [ ] Node up/down handling and partition detection
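Replica tracking on top of `:pg` (OTP 23+) might look like the following. The group naming scheme and message shape are assumptions for illustration; the `:pg` calls themselves are standard OTP.

```elixir
# Hypothetical replica-group membership via :pg (default scope, OTP 23+).
{:ok, _pg} = :pg.start_link()            # normally started by the supervision tree
group = {:grove_replicas, "doc-1"}       # one process group per document (assumed scheme)
:ok = :pg.join(group, self())            # register this replica process
peers = :pg.get_members(group)           # all replicas, across connected nodes
true = self() in peers

# Broadcast a delta to every peer except ourselves:
for pid <- peers, pid != self(), do: send(pid, {:grove_delta, :payload})
```

Because `:pg` monitors joined processes, a crashed replica drops out of the group automatically, which is what makes it a good fit under node-level discovery from libcluster.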

4.3 Sync Protocols

  • [x] Broadcast protocol (send delta to all peers)

  • [ ] Grove.Replication.Gossip:

    • Random peer selection with fanout
    • Anti-entropy protocol
  • [ ] Grove.Replication.Sync:

    • Full state sync for new replicas
    • Version vector exchange
    • Demand-based pull sync

4.4 Tests

  • [x] Multi-process convergence tests
  • [x] Concurrent update tests
  • [ ] Multi-node tests using :peer
  • [ ] Partition and heal scenarios

Phase 5: Storage and Persistence

Goal: Provide durable storage backends.

5.1 Storage Behaviour

  • [x] Define Grove.Storage behaviour:
    • init/1, save/3, load/2, delete/2, list_groups/1, close/1
  • [x] Implement Grove.Storage.Serializer:
    • term_to_binary with compression
    • Safe decoding with :safe option
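The two bullets above amount to a thin wrapper over the BEAM's built-in term serialization; a sketch (the anonymous functions stand in for the real module):

```elixir
# Compressed encoding out, :safe decoding in. :safe refuses to create
# new atoms or external fun references, guarding against atom-table
# exhaustion from untrusted input.
encode = fn term -> :erlang.term_to_binary(term, [:compressed]) end
decode = fn bin -> :erlang.binary_to_term(bin, [:safe]) end

state = %{a: 1, b: MapSet.new([1, 2, 3])}
^state = state |> encode.() |> decode.()
```

Note that `:safe` decoding fails for terms containing atoms the receiving node has never seen, so replicas must already know the atoms used in CRDT state (true here, since the schema lives in code).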

5.2 ETS Backend

  • [x] Grove.Storage.ETS:
    • In-memory storage with read/write concurrency
    • Named tables with :public access
    • Full CRUD operations
  • [ ] Delta storage with pruning (future)
  • [ ] Materialized view support for fast reads (future)

5.3 DETS Backend

  • [x] Grove.Storage.DETS:
    • Disk-based persistence
    • Auto-save at configurable intervals
    • Recovery on restart

5.4 Server Integration

  • [x] Integrate storage into Grove.Replication.Server:
    • State restoration on startup via merge
    • Periodic snapshot timer (configurable)
    • Save on terminate for graceful shutdown

5.5 Tests

  • [x] Persistence and recovery tests
  • [x] Concurrent access tests
  • [x] Integration tests with Replication.Server
  • [ ] Large dataset tests

Phase 6: Garbage Collection

Goal: Prevent unbounded growth of tombstones and metadata.

6.1 GC Infrastructure

  • [x] Grove.GC behaviour:
    • collect/2 - run GC on CRDT
    • needs_gc?/2 - check if GC is recommended
    • gc_info/1 - get GC metadata

6.2 GC Strategies

  • [x] Grove.GC.Structural:

    • Remove entries with empty tag sets (ORSet)
    • Remove orphaned values (ORMap)
    • Safe without replica coordination (based on "Causality to Stability" paper)
  • [ ] Grove.GC.Causal (future):

    • Track replica vector clocks via gossip
    • Compute causal stability (minimum observed version)
    • Only GC tombstones in stable past
  • [ ] Grove.GC.TimeBased (future):

    • Configurable retention period
    • Requires timestamps in tags
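For the structural strategy, the key observation is that an entry whose tag set has become empty carries no information, so dropping it on a single replica cannot affect convergence. A sketch against the OR-Set state shape (`%{elem => MapSet of tags}`); the module name is invented:

```elixir
# Hypothetical structural GC for OR-Set-shaped entries.
defmodule SketchGC do
  # Drop elements whose tag set is empty; no coordination needed.
  def collect(entries) do
    Map.filter(entries, fn {_elem, tags} -> MapSet.size(tags) > 0 end)
  end

  def needs_gc?(entries) do
    Enum.any?(entries, fn {_elem, tags} -> MapSet.size(tags) == 0 end)
  end
end
```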

6.3 Integration

6.4 Tests

  • [x] Verify GC doesn't affect convergence
  • [x] ORSet and ORMap cleanup tests
  • [ ] Memory usage tests over time
  • [ ] Edge cases (offline replicas, clock skew)

Phase 7: Advanced Causality

Goal: Optimize causality tracking for production workloads.

7.1 Dotted Version Vectors

  • [x] Grove.DottedVersionVector:
    • Dot (single event) + context (observed events)
    • event/2, sync/2, merge/2, descends?/2, dominates?/2
    • Precise causality for OR-Set semantics

7.2 Hybrid Logical Clocks

  • [x] Grove.HybridLogicalClock:
    • Combines physical time + logical counter
    • tick/1, update/2, compare/2, merge/2
    • Stays close to wall-clock, handles drift
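The standard HLC rules (Kulkarni et al.) can be sketched with an injectable physical clock so the logic is testable; struct and field names here are illustrative assumptions.

```elixir
# Hypothetical HLC sketch: l = logical/physical component, c = counter.
defmodule SketchHLC do
  defstruct l: 0, c: 0

  # Local event: take the wall clock if it moved past l, else bump c.
  def tick(%__MODULE__{l: l, c: c}, pt) do
    if pt > l, do: %__MODULE__{l: pt, c: 0}, else: %__MODULE__{l: l, c: c + 1}
  end

  # Receive: move to the max of local, remote, and wall clock, adjusting
  # the counter so causality is preserved even when clocks drift.
  def update(%__MODULE__{} = local, %__MODULE__{} = remote, pt) do
    cond do
      pt > local.l and pt > remote.l -> %__MODULE__{l: pt, c: 0}
      local.l == remote.l -> %__MODULE__{l: local.l, c: max(local.c, remote.c) + 1}
      local.l > remote.l -> %__MODULE__{l: local.l, c: local.c + 1}
      true -> %__MODULE__{l: remote.l, c: remote.c + 1}
    end
  end

  # Total order: compare {l, c} lexicographically.
  def compare(a, b), do: do_compare({a.l, a.c}, {b.l, b.c})
  defp do_compare(x, y) when x < y, do: :lt
  defp do_compare(x, y) when x > y, do: :gt
  defp do_compare(_, _), do: :eq
end
```

Because `l` never runs ahead of the largest observed clock by more than the counter allows, timestamps stay close to wall-clock time while remaining monotonic, which is exactly what LWW conflict resolution wants.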

7.3 Optimized OR-Set

  • [x] Grove.Set.DVVSet:
    • O(replicas) metadata instead of O(operations)
    • One dot per actor per element
    • Same add-wins semantics as ORSet
  • [ ] Benchmark against basic implementation (future)
  • [ ] Memory usage comparison (future)

Phase 8: Developer Experience (P0 Features)

Goal: Essential APIs for practical usage.

8.1 Plain Data Conversion (Critical)

  • [x] Grove.from_data/2 - Convert plain data to CRDT tree

    • Accept replica_id option
    • Generate node IDs if not present
    • Initialize vector clocks
  • [x] Grove.to_data/1 - Convert CRDT tree to plain data

    • Strip all CRDT metadata (vclocks, tombstones, deltas)
    • Preserve node IDs and structure
    • Round-trip guarantee: to_data(from_data(data)) == data

8.2 Node Query API

8.3 Batch Operations

  • [x] Grove.Tree.batch/2 - Atomic multi-operation wrapper
    {tree, delta} = Grove.Tree.batch(tree, fn t ->
      t
      |> Grove.Tree.put_node(Node.new("step_2", :step))
      |> Grove.Tree.put_node(Node.new("field_1", :text))
    end)
    
    # Integrates with Session.mutate/2
    Session.mutate(pid, fn tree ->
      Grove.Tree.batch(tree, fn t ->
        t |> Grove.Tree.put_node(...) |> Grove.Tree.update_node(...)
      end)
    end)
  • [x] Single compound operation for undo (batch pushed as one entry to undo_stack)
  • [x] Single broadcast for all changes (returns {tree, {:batch, ops}} tuple)
  • [x] Grove.Tree.reset_pending_ops/1 - Clear pending ops after broadcast
  • [x] Operation tracking in put_node/2 and update_node/3

8.4 Operation Metadata

  • [x] Extend operations with meta field via an optional trailing keyword list
    Grove.Tree.put_node(tree, node, meta: %{user_id: "u1", user_name: "Alice"})
    Grove.Tree.update_node(tree, node_id, update_fn, meta: %{reason: "fix"})
    Grove.Tree.batch(tree, fn t -> ... end, meta: %{source: "import"})
  • [x] Grove.Tree.history/2 - Get operation history with metadata
    Tree.history(tree)
    Tree.history(tree, since: timestamp, node_id: "field_1", where: [user_id: "u1"])
  • [x] Grove.Tree.operation_meta/1 - Extract metadata from operations
  • [x] Grove.Session.history/2 - Get history via Session API
  • [x] Grove.Tree.HistoryEntry struct for history entries
  • [x] Metadata preserved but doesn't affect merge semantics

8.5 Move Operations

  • [x] Grove.Tree.move_node/4 - Move node to new parent with LWW semantics
    Tree.move_node(tree, "field_1", "step_2")
    Tree.move_node(tree, "field_1", "step_2", meta: %{user_id: "u1"})
    Tree.move_node(tree, "field_1", nil)  # Move to root
  • [x] Cycle prevention - Prevent moving a node into its own descendants
  • [x] LWW conflict resolution using HLC timestamps
  • [x] Grove.Tree.apply_remote_move/2 - Apply remote move with conflict resolution
  • [x] parent_timestamp field on Node for tracking LWW state
  • [x] Silent skip of cycle-creating moves (Kleppmann's approach for convergence)
  • [x] Full Kleppmann undo-do-redo algorithm for guaranteed convergence:
    • Operation log (operation_log) stores all moves in descending timestamp order
    • Grove.Tree.LogMove module for internal operation tracking with old_parent
    • Grove.Tree.Move module for move operation representation
    • Undo-do-redo: when new operation arrives, undo later ops, apply new, redo
    • Validity recalculation during redo (handles cycles, deleted nodes, LWW)
    • Property-based tests for convergence, acyclicity, unique parents
    • Concurrent scenario tests (parent-child swap, three-way cycles)

8.6 Undo/Redo Operations

  • [x] Grove.Tree.undo/1 - Undo last local operation
    {:ok, tree} = Tree.undo(tree)
    {:error, :empty_stack} = Tree.undo(empty_tree)
  • [x] Grove.Tree.redo/1 - Redo last undone operation
  • [x] Grove.Tree.can_undo?/1 - Check if undo is available
  • [x] Grove.Tree.can_redo?/1 - Check if redo is available
  • [x] Local undo semantics (each replica has its own undo stack)
  • [x] Inverse operation generation for all operation types
  • [x] New operations clear redo stack
  • [x] Batch operations undo as a single unit
  • [x] Move operation format extended with old_parent_id for undo support

8.7 Testing Utilities


Phase 9: Phoenix Integration

Goal: First-class Phoenix and LiveView integration.

9.1 LiveView Integration

  • [x] Grove.LiveView module:
    • mount_tree/3 - Mount tree into socket, subscribe to PubSub
    • apply_and_broadcast/2 - Apply local op and broadcast delta
    • apply_remote/3 - Apply remote delta from PubSub
    • get_tree/1, get_replica_id/1, get_doc_id/1 - Accessors
    • update_tree/2 - Local-only updates without broadcast
defmodule MyAppWeb.EditorLive do
  use Phoenix.LiveView

  def mount(%{"id" => id}, _session, socket) do
    socket = Grove.LiveView.mount_tree(socket, id,
      replica_id: socket.id,
      pubsub: MyApp.PubSub
    )
    {:ok, socket}
  end

  def handle_info({:grove_delta, delta, from_pid}, socket) when from_pid != self() do
    socket = Grove.LiveView.apply_remote(socket, delta, &apply_delta/2)
    {:noreply, socket}
  end
end

9.2 Session/Document Lifecycle

  • [x] Grove.Session GenServer:
    • get_or_start/2 - Start or get existing session for document
    • join/2 - Join session, returns current tree state
    • leave/2 - Leave session
    • mutate/2 - Apply tree mutation and broadcast delta
    • get_state/1 - Get current tree state
    • info/1 - Get session info (subscriber count, tree size)
  • [x] Registry + DynamicSupervisor for session management
  • [x] Grace period before termination (configurable)
  • [x] Periodic persistence (configurable interval)
  • [x] Subscriber crash detection via Process.link

9.3 Presence Integration

9.4 Phoenix.PubSub Integration

  • [x] Grove.PubSub module with topic-based namespacing
  • [x] Delta broadcast via PubSub (alternative to :pg)
  • [x] Configurable PubSub server in Replication.Server

Phase 10: Performance Features (P2)

Goal: Optimizations for production workloads.

10.1 Debounced Operations ✅

  • [x] Grove.buffer_update/3 - Buffer updates locally with immediate tree application
  • [x] Grove.flush_buffer/1 - Flush buffered deltas as {:batch, [deltas]}
  • [x] Per-socket buffer via :grove_update_buffer assign
  • [x] Configurable debounce via debounce: ms option (default 100ms)
  • [x] Auto-flush on apply_and_broadcast/2 to maintain operation order
  • [x] Receiver-side batch handling in apply_remote/2

10.2 Schema DSL ✅

  • [x] Schema DSL for per-field CRDT types via compile-time macros:
    defmodule MyApp.FormSchema do
      use Grove.Schema
    
      version 1
    
      node :dropdown do
        field :label, :lww_register, default: "Untitled"
        field :options, :rga_list, default: []
        field :tags, :or_set, default: MapSet.new()
      end
    end
  • [x] Grove.Schema.Types - Type registry for CRDT dispatch
    • Supported types: :lww_register, :mv_register, :or_set, :rga_list, :counter
    • init/3, extract/1, update/4 for type-specific operations
  • [x] Grove.Schema - Compile-time macro DSL
    • node/2 macro for defining node types with fields
    • field/2,3 macro for field definitions with CRDT types
    • Generated functions: new_node/3, read_node/1, update_field/4
  • [x] Schema versioning with migration DSL
    • version/1, migrate/2, add_field/4, remove_field/2,3, transform_field/3
  • [x] Grove.Node.attrs accepts Grove.Map.ORMap for schema-based nodes
  • [x] Schema-agnostic merge (preserves CRDT convergence properties)

Phase 11: Sequences and Trees (Future)

Goal: Support ordered collections and hierarchical data.

11.1 Research

  • [x] Evaluate RGA vs Logoot vs LSEQ (chose RGA for fixed-size IDs and simplicity)
  • [x] Study tree move semantics (implemented in Phase 8.5)
  • [x] Review existing implementations (Yjs, Automerge) - informed RGA design

11.2 Sequence CRDT

  • [x] Grove.Sequence.RGA (Replicated Growable Array):
    • Ordered list with insert/delete
    • Unique position identifiers {replica_id, counter}
    • Tombstone-based deletion
    • Delta-state support
    • HLC timestamps for concurrent insert ordering
    • Full property-based testing (commutativity, associativity, idempotence)

11.3 Tree CRDT

  • [x] Grove.Tree - Hierarchical structure (implemented in Phase 8)
  • [x] Move operation support with LWW conflict resolution (Phase 8.5)
  • [x] Cycle detection and prevention (Phase 8.5)
  • [x] Position-based sibling ordering (Fugue-inspired) for concurrent moves (Phase 11.3)
  • [x] Undo/redo support for move operations (Phase 8.6)

Phase 12: Eg-walker Integration (Future)

Goal: Optimize for sequential editing using pure operation-based CRDT approach.

Design document: guides/future/eg-walker-integration.md
Paper: Eg-walker: Better, Faster, Smaller (EuroSys 2025 Best Paper)

12.0 Prerequisites (Foundation)

  • [x] Grove.Tree.Event - Event structure with parent references (DAG edges)
  • [x] Grove.Tree.Version - Frontier-based version tracking
  • [x] Operation deduplication via event IDs
  • [x] Migrate HistoryEntry to Event with parent refs

12.1 Critical Version Detection

  • [x] Implement Grove.Tree.Version.critical?/1 (frontier.size == 1)
  • [x] Fast-path for sequential operations (skip transient merge state rebuild)
  • [x] Auto-prune seen_events at critical versions (with threshold)
  • [x] Track parent_was_critical in Events for future optimization
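The frontier/critical-version test is small enough to sketch: the frontier of an event DAG is the set of events nobody references as a parent, and a version is critical when that set has exactly one element, so transient merge state can be pruned. Module name and event-map shape are assumptions for this example.

```elixir
# Hypothetical frontier computation over an event DAG.
defmodule SketchFrontier do
  # events: %{event_id => [parent_ids]}
  def frontier(events) do
    referenced = events |> Map.values() |> List.flatten() |> MapSet.new()
    events |> Map.keys() |> Enum.reject(&MapSet.member?(referenced, &1))
  end

  # Critical version: every replica agrees on a single head.
  def critical?(events), do: length(frontier(events)) == 1
end
```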

12.2 Event Graph

  • [x] Grove.Tree.EventGraph module
    • events_since/2, find_common_ancestor/3
    • topological_sort/2, concurrent?/3
    • ancestors/2, descendants/2
  • [x] DAG traversal and querying (BFS-based)
  • [x] Critical version fast paths in graph operations

12.3 Partial Replay

  • [x] diff/3 - Find events to retreat/advance between versions
  • [x] find_last_critical/1 - Find last critical event for replay optimization
  • [x] events_between/3 - Linear forward movement helper

12.4 Retreat/Advance for Trees

State rebuilding approach (recommended by CRDT expert over inverse operations):

  • [x] navigate_to_version/2 - Navigate to specific version via state rebuilding
  • [x] state_at_event/2 - Navigate to state at specific event
  • [x] rebuild_tree_state/2 - Core algorithm replaying events in topological order
  • [x] Transient views (history immutable, returns computed state)
  • [x] Reuse do_apply_remote/2 for operation application
  • [x] Handles all operation types (put_node, update_node, move_node, batch)

12.5 Storage & Persistence

  • [x] Event graph serialization (columnar format) - Grove.Tree.EventSerializer
  • [x] Snapshot + event log storage strategy - Grove.Tree.DocumentStorage
  • [x] Event compaction at critical versions - DocumentStorage.compact/2
  • [x] Fast document loading (snapshot + event replay)

12.6 Optimization

  • [x] Columnar serialization with RLE - EventSerializer with RLE for event IDs
  • [x] Benchmarking with Benchee - benchmarks/eg_walker_bench.exs
  • [x] Persistent event_index for O(1) lookups (+15% collaborative throughput)
  • [ ] B-tree index for O(log n) history insertion (future)
  • [ ] Memory profiling (steady state vs peak)

12.7 Tests

  • [x] Critical version detection tests
  • [x] Event graph DAG operations
  • [x] Partial replay correctness
  • [x] Retreat/advance for all operation types
  • [x] Convergence with Eg-walker approach (property-based tests added)
  • [ ] Performance benchmarks vs traditional CRDT

Testing Strategy (Ongoing)

Unit Tests

  • Each CRDT type
  • Each protocol implementation
  • Edge cases and error handling

Property-Based Tests

  • CRDT algebraic laws (commutative, associative, idempotent)
  • Convergence under arbitrary operation sequences
  • Delta equivalence to full-state merge

Integration Tests

  • Multi-process replication
  • Storage persistence and recovery
  • GC correctness

Distributed Tests

  • Multi-node using :peer module
  • Network partition scenarios
  • Clock skew simulation

Dependencies

# mix.exs
defp deps do
  [
    # Testing
    {:stream_data, "~> 0.6", only: [:test, :dev]},

    # Optional: Phoenix integration
    {:phoenix_pubsub, "~> 2.1", optional: true},
    {:phoenix_live_view, "~> 0.20", optional: true}
  ]
end

Milestones

Milestone   Phases   Description
v0.1.0      1-2      Core CRDTs (counters, registers, sets, maps)
v0.2.0      3        Delta-state support
v0.3.0      4-5      Replication and storage
v0.4.0      6-7      GC and advanced causality
v0.5.0      8        Developer experience (data conversion, query API, batch ops)
v1.0.0      9        Phoenix/LiveView integration
v1.1.0      10       Performance features (debouncing, schema types)
v2.0.0      11       Sequences and trees
v3.0.0      12       Eg-walker integration (pure operation-based CRDT)