Core Concepts

View Source

This guide explains the ideas behind Grove. You don't need to understand all of this to use Grove, but it helps when things get complex.

The Problem: Sync is Hard

When multiple users (or devices) edit the same data, conflicts happen:

sequenceDiagram
    participant A as Alice
    participant S as Server
    participant B as Bob

    A->>S: title = "Meeting Notes"
    B->>S: title = "Project Plan"
    Note over S: Which one wins?

Traditional solutions all have drawbacks:

ApproachProblem
Last-write-winsAlice's change is silently lost
Lock the documentBob has to wait, can't work offline
Manual conflict resolutionUser sees scary merge dialogs
Operational Transform (Google Docs)Complex, needs central server

CRDTs: A Better Way

CRDT stands for "Conflict-free Replicated Data Type."

The key insight: Design your data structure so that any order of operations produces the same result.

flowchart TB
    subgraph "Same Result, Any Order"
        A1[Operation A] --> R1[Result: XYZ]
        B1[Operation B] --> R1

        A2[Operation B] --> R2[Result: XYZ]
        B2[Operation A] --> R2
    end

This is called commutativity — A + B = B + A.

How Grove Achieves This

  1. Unique IDs everywhere — Every node, every operation has a globally unique ID
  2. Timestamps for ordering — When two operations conflict, timestamps decide
  3. Merge by design — Data structures are designed to merge, not conflict

Grove's Data Model

Trees and Nodes

Grove stores data as a tree of nodes:

graph TD
    R[Root] --> A[Form]
    A --> B[Step 1]
    A --> C[Step 2]
    B --> D[Field: Name]
    B --> E[Field: Email]
    C --> F[Field: Address]

Each node has:

FieldDescription
idUnique identifier
typeWhat kind of node (atom)
attrsNode data (map)
parent_idWho is my parent?
childrenOrdered list of child IDs

Operations

Every change creates an operation:

# These create operations internally
Grove.Tree.put_node(tree, node)     # => {:put_node, id, node}
Grove.Tree.update_node(tree, id, f) # => {:update_node, id, changes}
Grove.Tree.delete_node(tree, id)    # => {:delete_node, id}
Grove.Tree.move_node(tree, id, p)   # => {:move_node, id, new_parent}

Operations are wrapped in events with metadata:

%Grove.Tree.Event{
  id: {"laptop", 42},           # {replica_id, sequence}
  operation: {:put_node, ...},  # The actual change
  parents: MapSet.new([...]),   # Causal dependencies
  timestamp: 1704067200000      # When it happened
}

Replica IDs

Every device/user needs a unique replica ID:

flowchart LR
    subgraph "Replica IDs"
        L[Laptop: abc123]
        P[Phone: def456]
        T[Tablet: ghi789]
    end

    L --> S[Same Document]
    P --> S
    T --> S

Replica IDs ensure:

  • Node IDs are globally unique ({replica_id}_{sequence})
  • Operations can be attributed to their source
  • Conflicts can be resolved deterministically

Eg-walker: Why Grove is Fast

Grove implements Eg-walker, an optimization from recent research (EuroSys 2025).

The Key Insight

Most collaborative editing is sequential — one person typing, then another. True concurrency (two people typing at the exact same moment) is rare.

pie title Typical Editing Patterns
    "Sequential (fast-path)" : 85
    "Concurrent (slow-path)" : 15

Fast-Path vs Slow-Path

flowchart TD
    E[Incoming Event] --> C{Is history linear?}
    C -->|Yes| F[Fast-path: Just apply it]
    C -->|No| S[Slow-path: Merge with CRDT rules]

    F --> R[Result]
    S --> R

Fast-path (sequential operations):

  • Direct application
  • No merge overhead
  • O(1) complexity

Slow-path (concurrent operations):

  • Build merge state
  • Apply CRDT rules
  • Still automatic, just slower

Performance Impact

ScenarioThroughputMemory
Sequential (fast-path)617 ops/sec1.4 MB
Concurrent (slow-path)63 ops/sec21.8 MB

The fast-path is 10x faster and uses 15x less memory.

Key Guarantees

Grove provides these guarantees:

1. Eventual Consistency

All replicas that have seen the same operations will have the same state.

flowchart LR
    subgraph "Eventually..."
        A[Replica A] -.->|sync| B[Replica B]
        B -.->|sync| C[Replica C]
        C -.->|sync| A
    end

    A --> S[Same State]
    B --> S
    C --> S

2. No Conflicts

Operations always merge. There's no "conflict" state that requires user intervention.

3. Causality Preserved

If operation B happened after seeing operation A, then B will always be applied after A.

4. Intent Preserved

Operations do what they intended:

  • Adds stay added (unless explicitly deleted)
  • Deletes stay deleted
  • Moves go to intended parent (unless it would create a cycle)

Move Operations

Moving nodes in a distributed tree is the hardest problem. Grove implements Kleppmann's algorithm:

flowchart TD
    subgraph "Concurrent Moves"
        A1[Alice: Move X under Y]
        B1[Bob: Move Y under X]
    end

    A1 --> M{Would create cycle?}
    B1 --> M
    M -->|Yes| R[One move wins by timestamp]
    M -->|No| OK[Both moves apply]

Guarantees:

  • No cycles ever created
  • Deterministic winner (by Lamport timestamp)
  • Both replicas converge to same tree

Summary

ConceptWhat It Means
CRDTData structure designed to merge automatically
Replica IDUnique identifier for each device/user
OperationA single change (add, update, delete, move)
EventOperation + metadata (ID, timestamp, parents)
Fast-pathOptimized path for sequential operations
Slow-pathFull merge for concurrent operations
ConvergenceAll replicas end up with same state

Next Steps