Macula Roadmap


Last Updated: 2025-12-25
Current Version: v0.16.0
Status: Registry System complete with Ed25519 signing, security scanning, and Cluster Controller (60 registry tests passing).


Executive Summary

This roadmap reflects a significant architectural refinement (December 2025) that introduces:

  1. Macula Cluster - Deployment-agnostic logical grouping (replaces "MuC")
  2. CRDTs + Gossip - Replaces Raft consensus for coordination
  3. Bridge Nodes - Cross-Cluster federation for SuperMesh
  4. Federated Registry - Secure application distribution
  5. Protocol Gateway - HTTP/3 API for non-BEAM clients

Key Change: The reckon_db dependency is removed. CRDTs provide distributed state without an external event store.


Architecture Overview

Fractal Mesh Hierarchy

SuperMesh is fractal - it nests at any geographic scale:

Cluster (Home)            Smallest unit (1-10 nodes)
     Street Mesh           Neighbors
             Neighborhood Mesh
                     City Mesh
                             Province Mesh
                                     Country Mesh
                                             Region Mesh (EU, NA, APAC)
                                                     Global Mesh

Example: City-Level View


CITY MESH: Amsterdam
     NEIGHBORHOOD: Centrum
             Street Mesh
             Street Mesh
     NEIGHBORHOOD: Zuid
             Street Mesh
             Street Mesh
     (levels connected via Bridge Nodes)

Single Cluster (Smallest Unit)

Nodes within a Cluster form their own intra-cluster mesh (Erlang distribution, or QUIC distribution in future).


CLUSTER (e.g., Home Server)
     Node 1, Node 2, Node 3
     Intra-cluster: Erlang distributed mesh
     Local DHT (Kademlia) + CRDT State
     Bridge Node → Inter-cluster: QUIC/HTTP3

Two transport layers:

  • Intra-cluster: Erlang distribution (or QUIC distribution when ready)
  • Inter-cluster: Macula QUIC/HTTP3 via Bridge Nodes

Terminology

Term            Definition
--------------  ---------------------------------------------------------------------------------
Macula Cluster  Smallest unit: local deployment (home, office, edge). 1-10 nodes.
Seed Node       DHT entry point for new peers. No special software, just a well-known address.
Bridge Node     Connects to the next mesh level. Bridges form their own mesh + DHT at each level.
SuperMesh       Any federation level above Cluster (street, city, country, global). Fractal.
Realm           Virtual namespace spanning the entire hierarchy (like a DNS domain).

Hierarchical DHT

Each level of the mesh has its own DHT. Bridge Nodes form meshes at each level:


CITY MESH (Bridge layer): Bridge, Bridge, Bridge      → City-level DHT
     NEIGHBORHOOD meshes (Bridges)                     → Neighborhood-level DHTs
             STREET meshes (Bridges)                   → Street-level DHTs
                     Clusters                          → Cluster-level DHTs

DHT query escalation (locality-first):

  1. Query local Cluster DHT
  2. If miss → escalate to Street Mesh DHT
  3. If miss → escalate to Neighborhood Mesh DHT
  4. Continue until found or top level reached
  5. Cache results at lower levels
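The escalation steps above can be sketched as a pure function over an ordered list of per-level DHT tables, lowest level first. Each level is modeled here as a plain map from key to value. That is an assumption made for illustration: the real lookup goes through find_value_with_escalation/5, whose signature is not reproduced here. On a hit, the value is written back into every lower level that missed, mirroring step 5.

```erlang
%% Locality-first lookup sketch; module and function names are hypothetical.
%% Levels are ordered lowest first: [Cluster, Street, Neighborhood, City, ...].
-module(dht_escalation_sketch).
-export([lookup/2]).

-spec lookup(term(), [map()]) -> {found, term(), [map()]} | {not_found, [map()]}.
lookup(Key, Levels) ->
    lookup(Key, Levels, []).

%% Missed holds the lower levels already queried, nearest level first.
lookup(_Key, [], Missed) ->
    {not_found, lists:reverse(Missed)};
lookup(Key, [Level | Higher], Missed) ->
    case maps:find(Key, Level) of
        {ok, Value} ->
            %% Hit: cache the value in every lower level that missed (step 5).
            Cached = [L#{Key => Value} || L <- lists:reverse(Missed)],
            {found, Value, Cached ++ [Level | Higher]};
        error ->
            %% Miss: escalate to the next level up.
            lookup(Key, Higher, [Level | Missed])
    end.
```

The updated level list is returned instead of being mutated, so a caller can decide how long cached entries live. In practice that would be a TTL cache such as macula_bridge_cache.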

Baseline State (v0.12.6)

Completed Features

Component          Status    Notes
-----------------  --------  ----------------------------------------
QUIC Transport     Complete  Full gen_server, quicer integration
PubSub (local)     Complete  Topic-based, wildcard support
RPC (local)        Complete  NATS-style async, direct P2P
DHT Kademlia       Complete  k-bucket routing, service discovery
Gateway System     Complete  Message routing, client management
Bootstrap System   Complete  DHT bootstrap, service registry
TLS Security       Complete  Two-mode (production/development)
Hybrid Trust       Complete  Realm auth + TOFU + rate limiting
NAT Traversal      Complete  Hole punching, connection pooling, relay
Memory Management  Complete  Bounded pools, TTL cleanup

What's Incomplete (Being Replaced)

Component              Old Plan               New Approach
---------------------  ---------------------  -----------------------------
Platform Layer (Raft)  reckon_db integration  CRDTs + Gossip (no Raft)
Distributed CRDTs      Local ETS only         Gossip-replicated CRDTs
Cross-realm            Not planned            Bridge Nodes + Federation
App distribution       Not planned            Federated Registry
Non-BEAM clients       Not planned            Protocol Gateway (HTTP/3 API)

Revised Version Plan

Version  Focus                  Status
-------  ---------------------  --------------------------------------------------------------------------------
v0.13.0  Bridge System          ✅ COMPLETED - Hierarchical DHT, Bridge Nodes, Cache
v0.14.0  CRDT Foundation        ✅ COMPLETED - Ra/Raft removed, OR-Set, G-Counter, PN-Counter
v0.14.1  Pub/Sub Fixes          ✅ COMPLETED - Remove message amplification, DHT routing fixes
v0.15.0  Gossip Protocol        ✅ COMPLETED - CRDT state replication, push-pull-push anti-entropy
v0.15.1  Cross-Gateway Pub/Sub  ✅ COMPLETED - Physical node validation, race condition fixes
v0.16.0  Registry System        ✅ COMPLETED - Ed25519 signing, Cluster Controller, security scanning (60 tests)
v0.17.0  Protocol Gateway       HTTP/3 API, WebTransport, OpenAPI spec
v1.0.0   Production Ready       Full Cluster + Bridge + Registry
v1.1.0+  Ecosystem              QUIC Distribution, macula_crdt hex package

v0.13.0 - Bridge System (COMPLETED)

Implemented hierarchical DHT with Bridge mesh support:

  • macula_bridge_system.erl - Supervisor for bridge subsystem
  • macula_bridge_node.erl - Parent mesh connection and query escalation
  • macula_bridge_mesh.erl - Peer-to-peer mesh between bridges
  • macula_bridge_cache.erl - TTL-based caching with LRU eviction
  • ✅ Routing integration with find_value_with_escalation/5
  • ✅ 40 tests for bridge system
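macula_bridge_cache.erl combines TTL expiry with LRU eviction, but its internals are not described here. Below is a minimal, purely functional sketch of that combination under assumed semantics: entries expire after a fixed TTL, and when the cache is full the least recently used entry is evicted. Time is passed in explicitly (milliseconds) so the behavior is deterministic. All module and function names are hypothetical.

```erlang
%% TTL + LRU cache sketch (hypothetical module, not macula_bridge_cache).
%% entries: Key => {Value, ExpiresAtMs, LastUsedMs}
-module(ttl_lru_cache_sketch).
-export([new/2, insert/4, lookup/3]).

new(MaxSize, TtlMs) ->
    #{max => MaxSize, ttl => TtlMs, entries => #{}}.

insert(Key, Value, NowMs, #{ttl := Ttl, entries := E0} = C) ->
    E1 = evict_expired(NowMs, E0),
    E2 = case maps:is_key(Key, E1) of
             true -> E1;                               %% overwrite in place
             false -> make_room(maps:get(max, C), E1)  %% may evict the LRU entry
         end,
    C#{entries := E2#{Key => {Value, NowMs + Ttl, NowMs}}}.

lookup(Key, NowMs, #{entries := E} = C) ->
    case maps:find(Key, E) of
        {ok, {Value, Exp, _}} when Exp > NowMs ->
            %% Hit: refresh the last-used timestamp for LRU ordering.
            {ok, Value, C#{entries := E#{Key => {Value, Exp, NowMs}}}};
        {ok, _Expired} ->
            {miss, C#{entries := maps:remove(Key, E)}};
        error ->
            {miss, C}
    end.

evict_expired(NowMs, E) ->
    maps:filter(fun(_K, {_V, Exp, _Used}) -> Exp > NowMs end, E).

%% At capacity: drop the entry with the oldest last-used timestamp.
make_room(Max, E) when map_size(E) < Max -> E;
make_room(_Max, E) ->
    [{_Used, Oldest} | _] = lists:sort([{Used, K} || {K, {_, _, Used}} <- maps:to_list(E)]),
    maps:remove(Oldest, E).
```

Eviction here is O(n) per insert, which is acceptable for a sketch. A production cache would more likely keep an ordered index (for example an ETS ordered_set keyed on last-used time).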

v0.14.0 - CRDT Foundation (COMPLETED - December 2025)

Goal: Replace ETS-based registries with CRDT-backed versions using gossip replication.

Status: ✅ Ra/Raft REMOVED, CRDT Foundation IMPLEMENTED (48 tests)

What was delivered:

  • macula_crdt.erl - Core types and operations
  • macula_crdt_orset.erl - OR-Set implementation (17 tests)
  • macula_crdt_lww.erl - LWW-Register implementation (14 tests)
  • macula_crdt_gcounter.erl - G-Counter implementation (9 tests)
  • macula_crdt_pncounter.erl - PN-Counter implementation (8 tests)
  • ✅ Removed macula_leader_election.erl (deleted)
  • ✅ Removed macula_leader_machine.erl (deleted)
  • ✅ Removed ra dependency from rebar.config
  • ✅ Updated macula_platform_system.erl (now masterless)

Deliverables

1. Core CRDT Module

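%% apps/macula_crdt/src/macula_crdt.erl (type excerpt)
%% Supporting types: illustrative placeholders, not necessarily the shipped definitions.
-type node_id() :: binary().
-type element() :: term().
-type vector_clock() :: #{node_id() => non_neg_integer()}.

%% OR-Set: per-element causal history of adds and removes.
-type or_set() :: #{
    adds => #{element() => vector_clock()},
    removes => #{element() => vector_clock()}
}.

%% LWW-Register: the write with the highest timestamp wins.
-type lww_register() :: #{
    value => term(),
    timestamp => integer(),
    node_id => node_id()
}.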

Operations:

  • or_set_add/2, or_set_remove/2, or_set_merge/2, or_set_value/1
  • lww_register_set/2, lww_register_merge/2, lww_register_value/1
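The four OR-Set operations above can be sketched over the vector-clock representation in a standalone module. Module and function names are hypothetical. add/3 takes the acting node explicitly, whereas the real or_set_add/2 presumably derives it from the local node. Treat this as an illustration of add-wins semantics, not as the macula_crdt_orset implementation. An element is present when its add clock is not dominated by its remove clock, so an add that is concurrent with a remove survives the merge.

```erlang
%% Add-wins OR-Set over per-element vector clocks (hypothetical module).
-module(orset_sketch).
-export([new/0, add/3, remove/2, merge/2, value/1]).

new() -> #{adds => #{}, removes => #{}}.

%% Add Elem on behalf of Node: advance Node's counter beyond every clock
%% seen for Elem, so this add is not covered by any earlier remove.
add(Node, Elem, #{adds := A, removes := R} = S) ->
    Seen = vc_merge(maps:get(Elem, A, #{}), maps:get(Elem, R, #{})),
    Clock = Seen#{Node => maps:get(Node, Seen, 0) + 1},
    S#{adds := A#{Elem => Clock}}.

%% Remove Elem: tombstone exactly the add clock observed locally.
remove(Elem, #{adds := A, removes := R} = S) ->
    case maps:find(Elem, A) of
        {ok, Clock} ->
            S#{removes := R#{Elem => vc_merge(Clock, maps:get(Elem, R, #{}))}};
        error ->
            S
    end.

%% Merge is a pointwise max of clocks: commutative, associative, idempotent.
merge(#{adds := A1, removes := R1}, #{adds := A2, removes := R2}) ->
    #{adds => merge_map(A1, A2), removes => merge_map(R1, R2)}.

%% Present unless the remove clock dominates (is >= everywhere) the add clock.
value(#{adds := A, removes := R}) ->
    lists:sort([E || {E, C} <- maps:to_list(A),
                     not vc_leq(C, maps:get(E, R, #{}))]).

merge_map(M1, M2) ->
    maps:fold(fun(K, V, Acc) -> Acc#{K => vc_merge(V, maps:get(K, Acc, #{}))} end,
              M1, M2).

vc_merge(C1, C2) ->
    maps:fold(fun(N, T, Acc) -> Acc#{N => max(T, maps:get(N, Acc, 0))} end, C1, C2).

vc_leq(C1, C2) ->
    lists:all(fun({N, T}) -> T =< maps:get(N, C2, 0) end, maps:to_list(C1)).
```

Tombstones in `removes` are never collected here. That matches the roadmap listing CRDT garbage collection as deferred work.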

2. Gossip Protocol

%% apps/macula_crdt/src/macula_gossip.erl
-spec start_gossip(cell_id()) -> ok.
-spec push_state(node_id()) -> ok.   %% Push local state to peer
-spec pull_state(node_id()) -> ok.   %% Request state from peer
-spec anti_entropy() -> ok.          %% Periodic full state sync

Parameters:

  • Push interval: 1 second
  • Anti-entropy: 30 seconds
  • Fanout: 3 peers per round
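Each push round contacts a random subset of peers whose size is the fanout. Below is a minimal sketch of that selection plus the timer arming, assuming peers are held in a plain list and the owning process re-arms its timer on every tick. The module and function names are hypothetical. erlang:send_after/3 is the standard OTP timer call.

```erlang
%% Gossip round helpers (hypothetical module).
-module(gossip_fanout_sketch).
-export([select_peers/2, schedule/2]).

%% Choose up to Fanout distinct peers uniformly at random for one round.
select_peers(Peers, Fanout) ->
    %% Shuffle by sorting on random keys, then keep the first Fanout.
    Keyed = lists:sort([{rand:uniform(), P} || P <- Peers]),
    lists:sublist([P || {_, P} <- Keyed], Fanout).

%% Arm a one-shot timer that sends Msg to the calling process,
%% e.g. schedule(1000, push_round) or schedule(30000, anti_entropy).
schedule(IntervalMs, Msg) ->
    erlang:send_after(IntervalMs, self(), Msg).
```

Random selection rather than round-robin keeps rounds independent across nodes, which is what gives gossip its probabilistic convergence bounds.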

3. CRDT-backed Registry

Wrap existing macula_bootstrap_registry with CRDT frontend:

  • Service registrations use OR-Set
  • Node metadata uses LWW-Register
  • Gossip syncs state across Cluster
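Node metadata uses an LWW-Register. Below is a minimal sketch of set/merge/value over the lww_register() map shape shown earlier. As an assumption, ties on timestamp are broken by comparing node_id, so every replica picks the same winner. Names are hypothetical and timestamps are passed in explicitly.

```erlang
%% LWW-Register sketch (hypothetical module).
-module(lww_sketch).
-export([set/3, merge/2, value/1]).

%% Same shape as lww_register(): value, timestamp, node_id.
set(Value, Timestamp, NodeId) ->
    #{value => Value, timestamp => Timestamp, node_id => NodeId}.

%% Keep the register with the larger {timestamp, node_id}. The deterministic
%% tiebreak keeps merge commutative, associative and idempotent.
merge(#{timestamp := T1, node_id := N1} = R1, #{timestamp := T2, node_id := N2} = R2) ->
    case {T1, N1} >= {T2, N2} of
        true -> R1;
        false -> R2
    end.

value(#{value := V}) -> V.
```

Wall-clock timestamps make LWW sensitive to clock skew between nodes. A write from a node whose clock runs fast can shadow later writes until that clock is overtaken.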

Files to Create

apps/macula_crdt/
 src/
    macula_crdt.erl           # Core types and operations
    macula_crdt_orset.erl     # OR-Set implementation
    macula_crdt_lww.erl       # LWW-Register implementation
    macula_gossip.erl         # Gossip protocol
 test/
    macula_crdt_tests.erl
    macula_gossip_tests.erl
 rebar.config

Acceptance Criteria

  • [x] OR-Set correctly handles concurrent add/remove (17 tests)
  • [x] LWW-Register resolves conflicts by timestamp (14 tests)
  • [x] Gossip achieves convergence (implemented in v0.15.0)
  • [x] Existing tests pass with CRDT backend
  • [x] 48 new tests for CRDT operations (originally targeted 50+)

v0.15.0 - Gossip Protocol (COMPLETED - December 2025)

Goal: Implement gossip-based state synchronization for CRDT replication across nodes.

Status: ✅ COMPLETE (29 tests)

What was delivered:

  • macula_gossip.erl - Complete gossip gen_server with push-pull-push anti-entropy
  • ✅ Protocol message types (0x70-0x7F) for gossip messages
  • ✅ Vector clock implementation for causal ordering
  • ✅ CRDT-aware merging for all types (LWW, OR-Set, G-Counter, PN-Counter)
  • macula_platform_system updated to start gossip as supervised child
  • ✅ Configurable intervals and fanout parameters
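Vector clocks give the causal ordering used when merging. Below is a minimal sketch of increment, merge and comparison, with clocks as maps from node id to counter (an assumed representation; names are hypothetical). The `concurrent` result is exactly the case where a CRDT merge, rather than a plain overwrite, is required.

```erlang
%% Vector clock sketch (hypothetical module).
-module(vclock_sketch).
-export([increment/2, merge/2, compare/2]).

increment(Node, Clock) ->
    maps:update_with(Node, fun(N) -> N + 1 end, 1, Clock).

%% Pointwise maximum of two clocks.
merge(C1, C2) ->
    maps:fold(fun(Node, N, Acc) ->
                      maps:update_with(Node, fun(M) -> max(M, N) end, N, Acc)
              end, C1, C2).

%% Ordering of C1 relative to C2: equal | before | 'after' | concurrent.
compare(C1, C2) ->
    case {leq(C1, C2), leq(C2, C1)} of
        {true, true} -> equal;
        {true, false} -> before;
        {false, true} -> 'after';
        {false, false} -> concurrent
    end.

%% C1 =< C2 when every counter in C1 is at most the matching one in C2.
leq(C1, C2) ->
    lists:all(fun({Node, N}) -> N =< maps:get(Node, C2, 0) end, maps:to_list(C1)).
```

`after` is a reserved word in Erlang, so the atom must be quoted as 'after'.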

Key Features

Gossip API:

%% Store/retrieve CRDT state
macula_gossip:put(Pid, Key, Type, Value).
macula_gossip:get(Pid, Key).
macula_gossip:delete(Pid, Key).

%% Explicit gossip operations
macula_gossip:push_state(Pid, PeerNodeId).
macula_gossip:pull_state(Pid, PeerNodeId).
macula_gossip:anti_entropy(Pid).

%% Peer management
macula_gossip:add_peer(Pid, PeerNodeId).
macula_gossip:remove_peer(Pid, PeerNodeId).

Protocol Messages:

  • gossip_push (0x70) - Push local CRDT state to peer
  • gossip_pull (0x71) - Request CRDT state from peer
  • gossip_pull_reply (0x72) - Reply with CRDT state
  • gossip_sync (0x73) - Full anti-entropy sync request
  • gossip_sync_reply (0x74) - Full anti-entropy sync response

Configuration:

  • gossip_enabled: Enable/disable (default: true)
  • gossip_push_interval: Push interval in ms (default: 1000)
  • gossip_anti_entropy_interval: Anti-entropy interval in ms (default: 30000)
  • gossip_fanout: Peers per round (default: 3)

Acceptance Criteria

  • [x] Gossip server starts as supervised child of platform system
  • [x] Push/pull/anti-entropy operations work correctly
  • [x] Vector clocks track causal ordering
  • [x] CRDT merging handles concurrent updates
  • [x] 29 new tests for gossip operations

v0.15.1 - Cross-Gateway Pub/Sub (COMPLETED - December 2025)

Goal: Validate and fix cross-gateway message routing for production use.

Status: ✅ COMPLETE - Validated on physical beam cluster (beam00-03)

What was delivered:

  • ✅ Fixed PUBLISH message validation (missing qos, retain, message_id fields)
  • ✅ Fixed race condition in macula_peer where subscribe was called before the QUIC connection was established
  • ✅ Fixed duplicate message delivery by removing HYBRID routing mode
  • ✅ Fixed realm mismatch in gateway_subscriber/gateway_publisher node types
  • ✅ Added wait_for_connection/2 to ensure QUIC ready before returning from connect
  • ✅ Fixed edoc XML parse error in macula_rpc_handler

Physical Validation

Tested on beam cluster (4 Intel Celeron J4105 nodes):

  • beam00 (192.168.1.10): Registry/Seed node
  • beam01 (192.168.1.11): Subscriber gateway
  • beam02 (192.168.1.12): Publisher gateway
  • beam03 (192.168.1.13): Observer gateway (additional subscriber)

Results:

  • 800+ messages delivered successfully over 1+ hour
  • No message loss or duplication
  • Multicast to multiple subscribers working

New Files

File                                            Purpose
----------------------------------------------  ------------------------------
docker/docker-compose.cross-gateway-pubsub.yml  Integration test topology
scripts/deploy-beam-cluster.sh                  Beam cluster deployment script

Files Modified

File                                                 Change
---------------------------------------------------  -------------------------------------------------------------
src/macula_peer.erl                                  Added wait_for_connection/2
src/macula_pubsub_system/macula_pubsub_handler.erl   Removed HYBRID routing mode
src/macula_gateway_system/macula_gateway_pubsub.erl  Fixed PUBLISH message fields
docker/entrypoint.sh                                 Added gateway_subscriber/gateway_publisher types, fixed realm

Acceptance Criteria

  • [x] Cross-gateway pub/sub works in Docker (3-node test)
  • [x] Cross-gateway pub/sub works on physical hardware (4-node beam cluster)
  • [x] No duplicate message delivery
  • [x] Multiple subscribers receive same messages
  • [x] Stable operation for 1+ hour
  • [x] Published to hex.pm

v0.14.1 - Pub/Sub Routing Fixes (COMPLETED)

Goal: Fix message amplification issues in DHT-routed pub/sub and improve routing reliability.

Status: ✅ COMPLETED (December 2025)

Changes

Bug Fixes:

  • ✅ Removed relay_to_mesh_peers/4 from macula_gateway.erl - caused exponential message amplification
  • ✅ Added build_gateway_endpoint/1 for proper PONG response endpoint construction
  • ✅ Fixed macula_protocol_types_tests.erl - updated for new pubsub_route (0x13) message type

DHT Routing Improvements:

  • ✅ Enhanced DHT routing in macula_pubsub_dht.erl
  • ✅ Improved topic subscription handling

Files Modified

File                                            Change
----------------------------------------------  ---------------------------------------------------------
src/macula_gateway_system/macula_gateway.erl    Removed relay_to_mesh_peers, added build_gateway_endpoint
src/macula_pubsub_system/macula_pubsub_dht.erl  DHT routing enhancements
test/macula_protocol_types_tests.erl            Fixed unassigned ID tests (0x13→0x14, 0x24→0x26)

Acceptance Criteria

  • [x] No message amplification in pub/sub routing
  • [x] Protocol type tests pass (0x13 is pubsub_route, not unassigned)
  • [x] Version bumped to v0.14.1
  • [x] Documentation updated (CHANGELOG.md, CLAUDE.md, ROADMAP.md)
  • [ ] All unit tests pass (20 infrastructure-related failures remain; they require running QUIC/TLS services)
  • [ ] Published to hex.pm (requires manual rebar3 hex publish)

Bridge Nodes - COMPLETED in v0.13.0

Note: The Bridge Node functionality was implemented in v0.13.0 (December 2025). See the "v0.13.0 - Bridge System (COMPLETED)" section above.

What was delivered:

  • macula_bridge_system.erl - Supervisor
  • macula_bridge_node.erl - Parent mesh connection
  • macula_bridge_mesh.erl - Peer-to-peer mesh
  • macula_bridge_cache.erl - TTL-based caching
  • ✅ DHT query escalation via find_value_with_escalation/5
  • ✅ 40 tests

Future enhancements (v0.15.0+):

  • [ ] Federation policy enforcement
  • [ ] DNS SRV bridge discovery
  • [ ] boot.macula.io directory integration

v0.16.0 - Registry System (COMPLETED - December 2025)

Goal: Secure application distribution with federated registries.

Status: ✅ COMPLETE (60 tests)

What was delivered:

  • macula_registry_system.erl - Supervisor for registry subsystem
  • macula_registry_server.erl - Publish/fetch API with DHT integration
  • macula_registry_store.erl - ETS + disk storage with TTL cleanup
  • macula_registry_verify.erl - Ed25519 digital signatures
  • macula_registry_manifest.erl - SemVer manifest parsing/validation
  • macula_security_scanner.erl - Static analysis for dangerous BIFs
  • macula_app_monitor.erl - Runtime defense (memory, queue, crash monitoring)
  • macula_cluster_controller.erl - Deploy/upgrade/stop app lifecycle
  • ✅ Protocol messages 0x80-0x89 for registry operations
  • ✅ Integrated into macula_root.erl supervision tree
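macula_registry_manifest.erl handles SemVer parsing and comparison. Below is a minimal sketch of the core of that, assuming plain MAJOR.MINOR.PATCH binaries. Pre-release and build metadata, which SemVer 2.0.0 also allows, are deliberately not handled. The parser is also more lenient than the spec (for example, it accepts leading zeros). Module and function names are hypothetical.

```erlang
%% Simplified SemVer sketch (hypothetical module).
-module(semver_sketch).
-export([parse/1, compare/2]).

%% <<"1.4.2">> -> {ok, {1, 4, 2}}; anything malformed -> {error, invalid_version}.
parse(Bin) when is_binary(Bin) ->
    try
        [Ma, Mi, Pa] = binary:split(Bin, <<".">>, [global]),
        {Major, Minor, Patch} =
            {binary_to_integer(Ma), binary_to_integer(Mi), binary_to_integer(Pa)},
        true = Major >= 0 andalso Minor >= 0 andalso Patch >= 0,
        {ok, {Major, Minor, Patch}}
    catch
        error:_ -> {error, invalid_version}
    end.

%% Same-size tuples compare element by element, i.e. major, then minor, then patch.
compare(A, B) when A < B -> lt;
compare(A, B) when A > B -> gt;
compare(_, _) -> eq.
```

Comparing parsed tuples matters because a byte-wise comparison would order <<"1.10.0">> before <<"1.9.0">>.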

Module Structure

src/macula_registry_system/
 macula_registry_system.erl       # Supervisor (one_for_one)
 macula_registry_server.erl       # Package API (gen_server)
 macula_registry_store.erl        # Local storage (gen_server)
 macula_registry_verify.erl       # Ed25519 signatures (stateless)
 macula_registry_manifest.erl     # Manifest parsing (stateless)
 macula_cluster_controller.erl    # App lifecycle (gen_server)
 macula_security_scanner.erl      # Static analysis (stateless)
 macula_app_monitor.erl           # Runtime defense (gen_server)

Protocol Messages (0x80-0x89)

Type                   ID    Purpose
---------------------  ----  --------------------
registry_publish       0x80  Publish package
registry_publish_ack   0x81  Publish confirmation
registry_fetch         0x82  Fetch package
registry_fetch_reply   0x83  Package data
registry_query         0x84  Query metadata
registry_query_reply   0x85  Metadata response
registry_verify        0x86  Verify signature
registry_verify_reply  0x87  Verification result
registry_sync          0x88  Sync index
registry_sync_reply    0x89  Index response

Key Features

Ed25519 Signing:

{PubKey, PrivKey} = macula_registry_verify:generate_keypair().
{ok, Signature} = macula_registry_verify:sign_package(ManifestBin, BeamArchive, PrivKey).
ok = macula_registry_verify:verify_package(ManifestBin, BeamArchive, Signature, PubKey).
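The same sign/verify round trip can be reproduced with OTP's crypto module, which supports Ed25519 through the eddsa algorithm. This sketch is not macula_registry_verify's implementation. In particular, how the manifest and archive are combined into the signed message is an assumption: here the manifest gets a 32-bit length prefix, so the boundary between the two parts cannot be shifted without invalidating the signature.

```erlang
%% Ed25519 package signing sketch using OTP crypto (hypothetical module).
-module(ed25519_sketch).
-export([generate_keypair/0, sign_package/3, verify_package/4]).

%% Returns {PublicKey, PrivateKey} as binaries.
generate_keypair() ->
    crypto:generate_key(eddsa, ed25519).

sign_package(Manifest, Archive, PrivKey) ->
    {ok, crypto:sign(eddsa, none, signed_message(Manifest, Archive), [PrivKey, ed25519])}.

verify_package(Manifest, Archive, Signature, PubKey) ->
    Msg = signed_message(Manifest, Archive),
    case crypto:verify(eddsa, none, Msg, Signature, [PubKey, ed25519]) of
        true -> ok;
        false -> {error, invalid_signature}
    end.

%% Assumed encoding: length-prefixed manifest followed by the archive bytes.
signed_message(Manifest, Archive) ->
    <<(byte_size(Manifest)):32, Manifest/binary, Archive/binary>>.
```

Ed25519 hashes the message internally, which is why the digest type argument is `none`.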

Static Analysis:

  • Detects dangerous calls (BIFs and library functions): os:cmd, erlang:open_port, erlang:load_nif, file:delete, etc.
  • Audits NIF usage
  • Flags undeclared capabilities
  • Calculates security score (0-100)
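One way to detect dangerous calls without running any code is to read a compiled module's import table with beam_lib, because remote calls such as os:cmd/1 appear there. The sketch below does that and derives a 0-100 score. The deny-list is a subset of the examples above. The scoring rule (100 minus 25 per distinct dangerous call, floored at 0) is an illustrative assumption, not macula_security_scanner's actual weighting. Module and function names are hypothetical.

```erlang
%% Import-table scanner sketch (hypothetical module).
-module(bif_scan_sketch).
-export([scan_beam/1, scan_imports/1]).

%% Illustrative deny-list of {Module, Function} pairs.
-define(DANGEROUS, [{os, cmd}, {erlang, open_port}, {erlang, load_nif}, {file, delete}]).

%% Beam is a .beam filename or a binary holding a compiled module.
scan_beam(Beam) ->
    {ok, {_Module, [{imports, Imports}]}} = beam_lib:chunks(Beam, [imports]),
    scan_imports(Imports).

%% Imports is a list of {Module, Function, Arity} tuples.
scan_imports(Imports) ->
    Found = lists:usort([{M, F, A} || {M, F, A} <- Imports,
                                      lists:member({M, F}, ?DANGEROUS)]),
    #{dangerous_calls => Found, score => max(0, 100 - 25 * length(Found))}.
```

Import scanning misses dynamic calls such as apply(M, F, Args) with computed atoms. That is one reason a runtime defense layer is still needed.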

Runtime Defense:

  • Memory limit enforcement
  • Message queue monitoring
  • Crash rate detection with sliding window
  • Automatic throttle → kill → quarantine escalation
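The crash-rate detector and the throttle → kill → quarantine ladder can be sketched as a few pure functions:

- a sliding window that keeps only the crash timestamps inside the last WindowMs;
- a threshold check on the window;
- a step function that moves an app one rung up the ladder per violation.

The window size, threshold, and one-rung-per-violation rule are assumptions for illustration. macula_app_monitor's actual policy is not described here, and the names are hypothetical.

```erlang
%% Runtime defense sketch (hypothetical module).
-module(crash_window_sketch).
-export([record_crash/3, over_limit/2, escalate/1]).

%% Record a crash at NowMs and drop timestamps older than WindowMs.
record_crash(NowMs, WindowMs, Crashes) ->
    [NowMs | [T || T <- Crashes, NowMs - T < WindowMs]].

%% True when more than MaxCrashes fall inside the current window.
over_limit(Crashes, MaxCrashes) ->
    length(Crashes) > MaxCrashes.

%% One step up the defense ladder per violation; quarantine is terminal.
escalate(running) -> throttled;
escalate(throttled) -> killed;
escalate(killed) -> quarantined;
escalate(quarantined) -> quarantined.
```

Keeping these decisions pure leaves the monitor process responsible only for applying them (for example, via process flags or by stopping the app's supervisor), which makes the policy easy to unit-test.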

Cluster Controller:

  • Deploy/upgrade/stop/remove applications
  • Auto-update policy per app (always, major, minor, never)
  • Signature verification before deploy
  • Supervisor monitoring with automatic status updates

Test Coverage (60 tests)

Module                           Tests  Description
-------------------------------  -----  --------------------------------------------
macula_registry_verify_tests     10     Ed25519 keypair, signing, verification
macula_registry_manifest_tests   8      Validation, SemVer comparison, serialization
macula_registry_store_tests      8      Storage, retrieval, version management
macula_security_scanner_tests    8      Dangerous BIF detection, score calculation
macula_app_monitor_tests         6      Start/stop, limits, quarantine
macula_cluster_controller_tests  10     Deploy, upgrade, policies
macula_registry_system_tests     6      Supervisor, child processes
macula_protocol_types_tests      4      Message IDs 0x80-0x89

Acceptance Criteria

  • [x] Package signing with Ed25519
  • [x] Package verification before deployment
  • [x] Cluster Controller deploys from registry
  • [x] Static analysis detects dangerous BIFs
  • [x] Runtime monitor enforces limits
  • [x] 60 tests passing (target was 50+)
  • [x] Integrated into supervision tree

v0.17.0 - Protocol Gateway

Goal: HTTP/3 API for non-BEAM clients.

Deliverables

1. HTTP/3 Endpoints

Method  Path                            Purpose
------  ------------------------------  -----------------
POST    /v1/session                     Establish session
POST    /v1/rpc/{realm}/{procedure}     Call RPC
POST    /v1/publish/{realm}/{topic}     Publish event
GET     /v1/subscribe/{realm}/{topic}   Subscribe (SSE)
GET     /v1/discover/{realm}/{pattern}  Find services

2. Message Encoding

  • Primary: MessagePack
  • Fallback: JSON
  • Content negotiation via Accept header
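Content negotiation via the Accept header could look like the following sketch. The MessagePack media type strings are assumptions, since the roadmap does not name one. q-values are ignored for brevity: the first supported type listed wins, and JSON is the fallback. Names are hypothetical.

```erlang
%% Accept-header codec selection sketch (hypothetical module).
-module(negotiate_sketch).
-export([codec/1]).

%% Pick msgpack | json from an Accept header value (binary).
codec(Accept) when is_binary(Accept) ->
    Types = [media_type(Part) || Part <- binary:split(Accept, <<",">>, [global])],
    pick(Types).

%% First supported type in header order wins; JSON is the fallback.
pick([]) -> json;
pick([<<"application/msgpack">> | _]) -> msgpack;
pick([<<"application/x-msgpack">> | _]) -> msgpack;
pick([<<"application/json">> | _]) -> json;
pick([_ | Rest]) -> pick(Rest).

%% <<" Application/MsgPack;q=0.9">> -> <<"application/msgpack">>
media_type(Part) ->
    [Type | _] = binary:split(Part, <<";">>),
    unicode:characters_to_binary(string:lowercase(string:trim(Type))).
```

A full implementation would honor q-values as described in RFC 9110. Header order is only a reasonable approximation for clients that list their preference first.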

3. WebTransport Support

  • Browser-to-mesh direct communication
  • Bidirectional streams
  • WebSocket fallback for older browsers

4. Protocol Documentation

  • docs/PROTOCOL.md - Wire protocol spec
  • docs/AUTH.md - Authentication flows
  • docs/ERRORS.md - Error taxonomy
  • docs/openapi.yaml - OpenAPI spec

Files to Create

apps/macula_protocol/
 src/
    macula_protocol_http3.erl    # HTTP/3 handler
    macula_protocol_codec.erl    # MessagePack/JSON
    macula_protocol_auth.erl     # Session management
 test/

Acceptance Criteria

  • [ ] curl can call RPC endpoints
  • [ ] SSE subscription works
  • [ ] MessagePack encoding/decoding
  • [ ] OpenAPI spec validates
  • [ ] 30+ tests for protocol

v1.0.0 - Production Ready

Goal: Complete Cluster + Bridge + Registry for production use.

Checklist

  • [ ] All v0.13-v0.16 items complete
  • [ ] E2E test suite with multi-Cluster scenarios
  • [ ] Production deployment guide
  • [ ] Monitoring and alerting documentation
  • [ ] Performance benchmarks (target: 10k msg/sec/Cluster)
  • [ ] Security audit of registry and federation

Documentation

  • docs/ARCHITECTURE.md - Updated conceptual model
  • docs/GLOSSARY.md - Cluster, Seed, Bridge, SuperMesh
  • docs/FEDERATION.md - How to federate Clusters
  • docs/REGISTRY.md - Registry operation
  • docs/DEPLOYMENT.md - Production guide

v1.1.0+ - Ecosystem Contributions

QUIC Distribution

  • Integrate with Erlang distribution
  • libcluster strategy
  • Test with Horde, Mnesia, :pg
  • Publish as separate hex package

macula_crdt Hex Package

  • Extract CRDT module as standalone library
  • Publish to hex.pm
  • Community contribution

Design Decisions

1. No Raft Consensus

Rationale: Raft adds operational complexity for consistency guarantees Macula doesn't need.

Impact:

  • No quorum management
  • No leader election
  • Nodes operate during partitions (AP in CAP)
  • State converges eventually

2. CRDT Selection

State Type        CRDT              Rationale
----------------  ----------------  ---------------------
Service Registry  OR-Set            Concurrent add/remove
Peer Membership   OR-Set            Peers join/leave
Node Metadata     LWW-Register      Last update wins
Subscriptions     OR-Set per topic  Topic subscribers

3. Hierarchical DHT

  • Local DHT per Cluster
  • Bridge Nodes forward cross-Cluster queries
  • No global DHT state
  • Locality-first (most queries resolve locally)

4. Federated Registries

  • Organizations control their own registry
  • boot.macula.io is one option among many
  • Trust is configurable per Cluster
  • Capability-based security model

5. Protocol-First Multi-Language

  • Define HTTP/3 protocol thoroughly
  • Let community build SDKs
  • BEAM-native remains first-class
  • Non-BEAM via Protocol Gateway

Gap Analysis

Resolved

Gap                     Resolution
----------------------  -------------------------------
Cluster membership      Config overrides discovery
Bridge discovery        Layered (env → DNS → directory)
Capability enforcement  Layered defense model

Deferred to v1.1+

Gap                              Notes
-------------------------------  ----------------------------
CRDT garbage collection          Merkle tree + compaction
Federation PKI                   Manual key exchange for v1.0
Cross-Cluster addressing format  TBD

Risk Analysis

Risk                           Likelihood  Impact  Mitigation
-----------------------------  ----------  ------  ----------------------------
CRDT divergence                Medium      High    Anti-entropy protocol
Gossip storms                  Low         High    Adaptive fanout
Bridge bottleneck              Medium      Medium  Multiple Bridges per Cluster
Security scan false negatives  High        High    Runtime defense layer
Scope creep                    High        High    Strict phase gating

Opportunity Analysis

Opportunity             Potential  Notes
----------------------  ---------  -----------------------------
Edge-first federation   High       Unique differentiator
GitOps without K8s      High       Registry + Cluster Controller
Nerves integration      High       Cluster on embedded
Hosted registry (SaaS)  High       boot.macula.io revenue
Enterprise Bridge       High       Managed federation

Archived Plans

Previous planning documents preserved in architecture/archive/:

  • ROADMAP-pre-supermesh.md - Previous roadmap (pre-December 2025)
  • reckon_db-integration.md - Superseded by CRDT approach

References

  • Plan document: ~/.claude/plans/snuggly-finding-simon.md (session artifact)
  • Source analysis: 102 .erl files, 52 test files
  • Current tests: 200+ passing

Document Version: 3.1 (Cross-Gateway Validation)
Last Updated: 2025-12-25