Changelog

View Source

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.7.9] - 2025-11-16

Added

  • Gateway Supervision Refactoring: Implemented proper OTP supervision tree
    • New 3-tier architecture: macula_gateway_sup (root) supervises macula_gateway_quic_server, macula_gateway, macula_gateway_workers_sup
    • Added macula_gateway_quic_server.erl - Dedicated QUIC transport layer (248 LOC, 17 tests)
    • Added macula_gateway_workers_sup.erl - Supervises business logic workers (152 LOC, 24 tests)
    • Added macula_gateway_clients.erl - Renamed from macula_gateway_client_manager (clearer naming)
    • Circular dependency resolution via set_gateway/2 callback pattern
    • rest_for_one supervision strategy for controlled fault isolation

Changed

  • Gateway Architecture: Refactored from manual process management to supervised architecture
    • Gateway now finds siblings via supervisor instead of starting them manually
    • Simplified macula_gateway init/1 - uses find_parent_supervisor/0 and find_sibling/2
    • Removed manual lifecycle management - supervisor handles cleanup
    • Updated macula_gateway_sup.erl to be root supervisor (was workers supervisor)
    • All gateway tests updated for new supervision tree (106 tests, 0 failures)

Fixed

  • CRITICAL: Gateway now actually USES DHT-routed pub/sub (v0.7.8 had the code but wasn't calling it!)
    • Bug: Gateway's handle_publish was still using v0.7.7 endpoint-based routing
    • Impact: v0.7.8 protocol infrastructure existed but gateway bypassed it entirely
    • Root cause: handle_publish (macula_gateway.erl:885-943) never called macula_pubsub_routing
    • Solution: Rewrote handle_publish to use macula_pubsub_routing:wrap_publish and send via pubsub_route messages
    • Flow: Gateway now queries DHT for node_id (not endpoint), wraps PUBLISH in pubsub_route, sends via mesh connection manager
    • Result: Messages now actually route via multi-hop Kademlia DHT to remote subscribers
  • Fixed test failures in macula_connection_tests - replaced invalid connected message type with subscribe
  • Fixed edoc warning in macula_gateway_sup.erl - replaced markdown code fence with HTML pre tags for proper documentation generation

Improved

  • Fault Tolerance: Automatic recovery from gateway/QUIC/worker crashes
  • Production Stability: Proper OTP supervision with configurable restart strategies
  • Code Organization: Clean separation between transport (QUIC), coordination (gateway), and business logic (workers)
  • Testability: Each module tested independently with comprehensive coverage

Technical Details

  • v0.7.8 added pubsub_route protocol + routing modules but gateway never used them
  • v0.7.9 integrates the v0.7.8 infrastructure into gateway's publish flow
  • This completes the DHT-routed pub/sub implementation started in v0.7.8
  • Supervision refactoring provides +2/10 scalability improvement (foundational infrastructure)
  • Enables future optimizations: process pools, connection pooling, horizontal scaling

[0.7.8] - 2025-11-16

Fixed

  • CRITICAL: Implemented multi-hop DHT routing for pub/sub to fix matchmaking
    • Bug: v0.7.7 gateway queried DHT but routed to endpoints, which failed for NAT peers
    • Impact: Matchmaking still broken - messages couldn't reach subscribers behind NAT
    • Root cause: Split-brain architecture - subscribers register locally but routing via gateway
    • Solution: Multi-hop Kademlia DHT routing (same pattern as RPC routing)

Added

  • Protocol Layer: New pubsub_route message type (0x13)

    • Wraps PUBLISH messages for multi-hop routing through mesh
    • Fields: destination_node_id, source_node_id, hop_count, max_hops, topic, payload
    • Protocol encoder/decoder support with validation
    • 8 encoder tests + 3 decoder tests added
  • Routing Module: macula_pubsub_routing.erl (NEW - 115 LOC)

    • Stateless routing logic for pub/sub messages
    • wrap_publish/4 - Wraps PUBLISH in routing envelope
    • route_or_deliver/3 - Routes to next hop or delivers locally
    • should_deliver_locally/2 - Checks if destination matches
    • TTL protection via max_hops (default: 10)
    • 14 comprehensive tests (all passing)
  • Gateway Integration: Enhanced macula_gateway.erl

    • Added handle_decoded_message clause for pubsub_route messages
    • Routes via XOR distance to next hop OR delivers locally
    • handle_pubsub_route_deliver/2 - Unwraps and delivers to local subscribers
    • forward_pubsub_route/3 - Forwards to next hop through mesh
  • Pub/Sub Handler: Updated macula_pubsub_dht.erl

    • route_to_subscribers/5 now uses actual DHT routing (was TODO stub)
    • Extracts subscriber node_id (not endpoint) from DHT results
    • Wraps PUBLISH in pubsub_route envelope
    • Sends via connection manager which routes through gateway

Technical Details

v0.7.7 Architecture (BROKEN):

  • ❌ Publisher queries DHT for subscriber endpoints
  • ❌ Tries to route directly to endpoints
  • ❌ Fails for NAT peers (can't accept connections)
  • ❌ Matchmaking stuck on "Looking for opponent..."

v0.7.8 Architecture (FIXED):

  • ✅ Publisher queries DHT for subscriber node IDs
  • ✅ Wraps PUBLISH in pubsub_route envelope
  • ✅ Routes via multi-hop Kademlia (same as RPC)
  • ✅ Works with relay OR direct connections
  • ✅ Matchmaking succeeds across NAT peers

Message Flow:

Publisher                Gateway              Node A               Subscriber
  |                         |                    |                      |
  |--pubsub_route---------->|                    |                      |
  |  dest: Subscriber       |--pubsub_route----->|                      |
  |  topic: "matchmaking"   |  (forward closer)  |--pubsub_route------->|
  |  payload: {msg}         |                    |                      |
  |                         |                    |                      | Deliver locally

Tests

  • Protocol encoder: 49 tests (8 new for pubsub_route)
  • Protocol decoder: 35 tests (3 new for pubsub_route)
  • Pub/sub routing: 14 tests (all passing)
    • wrap_publish envelope creation
    • should_deliver_locally checks
    • route_or_deliver decision logic
    • TTL exhaustion handling
    • No-route error handling

Architecture Documentation

  • Added architecture/dht_routed_pubsub.md with complete design
  • Future refactoring note: Consider unifying RPC and pub/sub routing modules (nearly identical logic)

This completes the DHT-routed pub/sub implementation and should enable working matchmaking.


[0.7.7] - 2025-11-15

Fixed

  • CRITICAL: Gateway pub/sub now queries DHT for remote subscribers
    • Bug: Gateway only checked local subscriptions, never queried DHT for remote subscribers
    • Impact: Distributed pub/sub and matchmaking completely broken - remote peers couldn't receive messages
    • Root cause: handle_publish only called macula_gateway_pubsub:get_subscribers (local streams only)
    • Fix Phase 1: Added endpoint → stream PID tracking in macula_gateway_client_manager
      • New state field: endpoint_to_stream :: #{binary() => pid()}
      • New API: get_stream_by_endpoint/2
      • Updated store_client_stream/4 to track endpoints
      • Updated remove_client/2 to clean up endpoint mappings
    • Fix Phase 2: Modified handle_publish to query DHT
      • Queries local subscribers (existing behavior)
      • Queries DHT for remote subscribers via crypto:hash(sha256, Topic)
      • Converts remote endpoints to stream PIDs using client_manager
      • Combines local + remote and delivers to all
    • Fix Phase 3: Added macula_gateway_dht:lookup_value/1
    • Tests: 90 tests passing (39 client_manager + 49 gateway + 7 endpoint + 5 pub/sub DHT)

This completes the distributed pub/sub fix and enables working matchmaking across multiple peer containers.

Technical Details

Before v0.7.7:

  • ❌ Gateway only queried macula_gateway_pubsub (local subscriptions)
  • ❌ Remote subscribers stored in DHT but never looked up
  • ❌ Pub/sub messages only delivered to local streams
  • ❌ Multi-peer matchmaking broken

After v0.7.7:

  • ✅ Gateway queries both local + DHT for subscribers
  • ✅ Remote endpoints resolved to stream PIDs via endpoint tracking
  • ✅ Messages delivered to all subscribers (local + remote)
  • ✅ Multi-peer matchmaking works correctly

The architecture remains hub-and-spoke (v0.7.x):

  • All peers connect to gateway
  • Gateway routes all pub/sub messages
  • Subscriptions stored in DHT for discovery
  • Gateway has stream PIDs for all connected peers

[0.8.0] - TBD (Q2 2025)

Planned - True Mesh Architecture

  • BREAKING: Opportunistic NAT hole punching for direct peer-to-peer connections
    • 80% direct P2P connections (cone NAT, no firewall)
    • 20% gateway relay fallback (symmetric NAT, strict firewalls)
    • True mesh topology (no single point of failure)
    • New modules: macula_nat_discovery, macula_hole_punch, macula_connection_upgrade
    • Backward compatible with v0.7.x gateway relay architecture

This will transform Macula from hub-and-spoke (star topology) to true decentralized mesh.

See architecture/NAT_TRAVERSAL_ROADMAP.md for complete design.


[0.7.6] - 2025-11-15

Fixed

  • CRITICAL: Disabled QUIC transport-layer idle timeout causing connection closures
    • Root cause: MsQuic default idle timeout of 30 seconds (2x = 60s to closure)
    • v0.7.4-0.7.5 application-level PING/PONG worked but didn't reset QUIC transport timer
    • Added idle_timeout_ms => 0 to both client connection and gateway listener options
    • Setting to 0 disables QUIC idle timeout entirely
    • Connections now stay alive indefinitely (application PING/PONG provides health checks)
    • Modified: macula_quic:connect/4 and macula_quic:listen/2

This completes the connection stability fix started in v0.7.4-0.7.5.

Tests

  • Added test/macula_quic_idle_timeout_tests.erl with 7 tests
    • Client connection idle timeout configuration
    • Gateway listener idle timeout configuration
    • Option structure and value validation
    • Defense-in-depth architecture documentation

Technical Details

Defense in Depth approach:

  1. Transport Layer (v0.7.6): QUIC idle timeout disabled (idle_timeout_ms => 0)
  2. Application Layer (v0.7.4-0.7.5): PING/PONG keep-alive every 30 seconds
  3. Result: Connections stay alive + health monitoring

Previous versions had application keep-alive but QUIC transport still enforced 30s idle timeout independently.


[0.7.5] - 2025-11-15

Fixed

  • CRITICAL: Gateway PING message handler missing, preventing keep-alive from working
    • v0.7.4 implemented keep-alive on edge peer side only
    • Gateway had no handler for incoming PING messages
    • Result: PINGs sent but never acknowledged, connections still timed out after 2 minutes
    • Added handle_decoded_message({ok, {ping, PingMsg}}, ...) to gateway
    • Gateway now responds with PONG to all incoming PING messages
    • Keep-alive now works bidirectionally (edge peer ↔ gateway)
    • Also added PONG message handler to gateway for completeness

This completes the keep-alive implementation started in v0.7.4.

Technical Details

The keep-alive flow now works correctly:

  1. Edge peer timer fires every 30 seconds (configurable)
  2. Edge peer sends PING to gateway
  3. Gateway receives PING and responds with PONG (new in v0.7.5)
  4. Edge peer receives PONG confirmation
  5. QUIC connection stays alive (no idle timeout)

Without this fix, PINGs were sent but ignored, causing connections to timeout despite v0.7.4's implementation.


[0.7.4] - 2025-11-15

Fixed

  • CRITICAL: Configurable keep-alive mechanism to prevent QUIC connection timeouts
    • PING/PONG message support in macula_connection
    • Default keep-alive interval: 30 seconds (configurable)
    • Keep-alive enabled by default (can be disabled via options)
    • Automatic PONG response to incoming PING messages
    • Configuration via macula_connection:default_config/0
    • Prevents 2-minute connection timeout that broke distributed matchmaking
    • Added 6 tests for keep-alive functionality (all passing)

This is a critical fix for production deployments where QUIC connections timeout after ~2 minutes of inactivity, breaking pub/sub and matchmaking.

Configuration

Enable/disable keep-alive:

%% Enable with custom interval (milliseconds)
Opts = #{
    keepalive_enabled => true,
    keepalive_interval => 30000  %% 30 seconds
}.

%% Disable keep-alive
Opts = #{
    keepalive_enabled => false
}.

%% Use defaults (enabled, 30 second interval)
DefaultConfig = macula_connection:default_config().

Architecture Note

v0.7.4 maintains hub-and-spoke (star) topology:

  • Edge peers connect to gateway (not each other)
  • Gateway routes all messages (relay architecture)
  • Gateway is single point of failure (by design for now)
  • DHT routing table exists but routing happens at gateway
  • True peer-to-peer mesh deferred to v0.8.0 (NAT traversal required)

[0.7.3] - 2025-11-15

Fixed

  • CRITICAL: Fixed DHT routing table address serialization crash in macula_gateway_dht
    • Bug: Gateway stored parsed address tuples {{127,0,0,1}, 9443} in DHT instead of binary strings
    • Impact: When FIND_VALUE replies tried to serialize node addresses, msgpack returned error {:error, {:badarg, {{127,0,0,1}, 9443}}}
    • Root cause: macula_gateway.erl:522 used Address (tuple from parse_endpoint/1) instead of Endpoint (binary string)
    • Error chain: DHT stored tuples → encode_node_info extracted tuples → msgpack:pack failed → byte_size crashed
    • Symptoms: Gateway crashed with "ArgumentError: 1st argument not a bitstring" when peers queried DHT
    • Fix: Store original Endpoint binary string in DHT routing table instead of parsed tuple
    • Added test: dht_address_serialization_test documents bug and validates fix

This is a critical fix for distributed matchmaking and service discovery. Without it, DHT queries crash the gateway.

[0.7.2] - 2025-11-15

Fixed

  • CRITICAL: Fixed gateway crash in parse_endpoint/1 when DNS resolution fails
    • Bug: inet:getaddr/2 error tuple was not handled, causing ArgumentError when passed to byte_size/1
    • Impact: Gateway crashed repeatedly, closing all client connections and preventing pub/sub communication
    • Symptoms: "Failed to publish to topic: :closed", "Failed to send STORE for subscription: :closed"
    • Fix: Added proper error handling with localhost fallback when DNS resolution fails
    • Now returns {{127,0,0,1}, Port} fallback instead of crashing

This is a critical fix for production deployments where endpoint DNS resolution may fail.

[0.7.1] - 2025-11-15

Fixed

  • CRITICAL: Fixed ArithmeticError in macula_pubsub_handler message ID handling
    • Bug: Was assigning binary MsgId to counter instead of integer NewCounter
    • Impact: Caused pub/sub to crash on second publish attempt with "bad argument in arithmetic expression"
    • Fix: Corrected destructuring in line 300 to use {_MsgId, NewCounter} instead of {MsgIdCounter, _}
    • Now properly increments integer counter instead of trying to do arithmetic on binary

This is a critical fix for anyone using pub/sub functionality in v0.7.0.

0.7.0 - 2025-11-15

Changed

  • BREAKING: Major nomenclature refactoring for clarity and industry alignment
    • Renamed macula_connectionmacula_peer (mesh participant facade - high-level API)
    • Renamed macula_connection_managermacula_connection (QUIC transport layer - low-level)
    • Follows industry standards used by libp2p, IPFS, and BitTorrent
    • Clear separation: macula_peer = mesh participant, macula_connection = transport

Added

  • Comprehensive transport layer test coverage (36 tests total)
    • 11 new tests for message decoding, buffering, URL parsing, and realm normalization
    • All tests passing with zero regressions
  • Complete v0.7.0 documentation in CLAUDE.md
    • Migration guide with specific API examples
    • Architecture rationale and benefits
    • Status tracking for implementation phases

Migration Guide (0.6.x → 0.7.0)

API Changes:

All high-level mesh operations now use macula_peer instead of macula_connection:

%% Before (0.6.x)
{ok, Client} = macula_connection:start_link(Url, Opts).
ok = macula_connection:publish(Client, Topic, Data).
{ok, SubRef} = macula_connection:subscribe(Client, Topic, Callback).
{ok, Result} = macula_connection:call(Client, Procedure, Args).

%% After (0.7.0)
{ok, Client} = macula_peer:start_link(Url, Opts).
ok = macula_peer:publish(Client, Topic, Data).
{ok, SubRef} = macula_peer:subscribe(Client, Topic, Callback).
{ok, Result} = macula_peer:call(Client, Procedure, Args).

Why This Change?

The original naming was confusing:

  • macula_connection served both facade AND transport roles
  • ❌ Mixed high-level mesh operations with low-level QUIC handling
  • ❌ Not aligned with P2P industry standards

After v0.7.0:

  • macula_peer = mesh participant (clear high-level API for pub/sub, RPC, DHT)
  • macula_connection = QUIC transport (clear low-level transport layer)
  • ✅ Follows libp2p/IPFS/BitTorrent naming conventions

Note: The macula_client wrapper module has been updated to use macula_peer internally, so if you're using macula_client, no changes are required.

0.6.7 - 2025-11-15

Fixed

  • CRITICAL: Fixed all installation examples to use Hex package references instead of git dependencies
    • README.md: Changed from git-based to {:macula, "~> 0.6"} (Elixir) and {macula, "0.6.7"} (Erlang)
    • HELLO_WORLD.md: Updated to use proper Hex package format
    • architecture/macula_http3_mesh_hello_world.md: Fixed tutorial installation examples
    • architecture/macula_http3_mesh_rpc_guide.md: Fixed migration guide examples
    • All code examples now show proper Hex.pm installation for published package

[0.6.6] - 2025-11-15

Fixed

  • Fixed navigation links in documentation guides to use ex_doc HTML filenames
    • Changed GitHub-style relative paths (../README.md) to ex_doc HTML references (readme.html)
    • Fixed all navigation links in EXECUTIVE_SUMMARY.md, COMPARISONS.md, USE_CASES.md, and DEVELOPMENT.md
    • Links now work correctly in published Hexdocs without "page not found" errors

[0.6.5] - 2025-11-15

Changed

  • Updated to modern alternative logo (macula-alt-logo.svg) in both README.md and ex_doc
  • Changed tutorial greeting to brand-specific "Hello, Macula!" instead of generic greeting

Fixed

  • Replaced old color logo with cleaner, more modern alternative logo for better visual appeal

[0.6.4] - 2025-11-15

Changed

  • Documentation restructuring - Split README.md into focused landing page with table of contents
    • Created docs/EXECUTIVE_SUMMARY.md - Why Macula and the case for decentralization
    • Created docs/COMPARISONS.md - How Macula compares to libp2p, Distributed Erlang, Akka, etc.
    • Created docs/USE_CASES.md - Real-world applications across business, IoT, and AI domains
    • Created docs/DEVELOPMENT.md - Complete development guide and coding standards
    • README.md now serves as concise landing page (119 lines vs 372 lines)
    • All detailed content accessible via clear table of contents
    • Removed Mermaid diagram from README.md (ex_doc doesn't support Mermaid - works on GitHub)

Fixed

  • ex_doc landing page uses HELLO_WORLD.md (tutorial-first approach, no multi-page split)
  • Documentation properly links to all new guide documents
  • Better first impression for Hex.pm users (logo, quick navigation)

[0.6.3] - 2025-11-15

Fixed

  • Removed README.md from ex_doc extras to prevent multi-page split and broken landing page
  • Documentation now correctly redirects to API reference page

[0.6.2] - 2025-11-15

Fixed

  • ex_doc landing page configuration ({main, "api-reference"}) - resolved "readme.html not found" error

[0.6.1] - 2025-11-15

Added

  • Professional documentation structure for Hex publication
    • Architecture diagram in README.md (Mermaid format) showing mesh topology
    • Organized documentation: moved 50+ files from root to docs/archive/, docs/development/, docs/planning/
    • Created docs/README.md navigation index
    • Logo and assets configuration for ex_doc
    • Comprehensive Hex package file list (artwork/, docs/, architecture/)

Fixed

  • README.md badge rendering (moved badges outside <div> tag for proper GitHub display)
  • ex_doc assets configuration (deprecated warning resolved)
  • ex_doc landing page configuration (changed {main, "readme"} to {main, "api-reference"} to fix "readme.html not found" error)
  • Hex package configuration to include all necessary assets and documentation
  • Documentation organization for professional first impression

0.6.0 - 2025-11-15

Changed

  • BREAKING: Renamed environment variable from GATEWAY_REALM to MACULA_REALM for better API consistency
    • All MACULA_* environment variables now follow consistent naming
    • Applies to both gateway mode and edge peer mode
    • Update your deployment configurations to use MACULA_REALM instead of GATEWAY_REALM

Added

  • Comprehensive Kademlia DHT architecture documentation (docs/KADEMLIA_DHT_ARCHITECTURE.md)
    • XOR distance metric explanation
    • K-bucket routing table details
    • DHT operations (PING, STORE, FIND_NODE, FIND_VALUE)
    • Iterative lookup algorithm
    • Macula-specific adaptations (realm-scoped DHT, HTTP/3 transport)
    • Performance characteristics and comparisons

Fixed

  • Updated documentation to reflect MACULA_REALM environment variable usage
  • Updated entrypoint.sh, Dockerfile.gateway, and config/sys.config to use MACULA_REALM

Upcoming in v0.7.0

  • Architecture improvement: Separation of macula_connection into macula_peer (high-level mesh API) and macula_connection (low-level QUIC transport)
  • See docs/NOMENCLATURE_PROPOSAL_CONNECTION_TO_PEER.md and docs/PEER_CONNECTION_SEPARATION_PLAN.md for details
  • Expected timeline: 4-5 weeks after v0.6.0 release

Migration Guide (0.5.0 → 0.6.0)

If you're using Macula in gateway mode or configuring realm multi-tenancy:

Before (0.5.0):

export GATEWAY_REALM=my-app

After (0.6.0):

export MACULA_REALM=my-app

Elixir/Phoenix runtime.exs:

# Before (0.5.0)
System.put_env("GATEWAY_REALM", realm)

# After (0.6.0)
System.put_env("MACULA_REALM", realm)

0.5.0 - 2025-11-14

Added

  • Initial public release
  • HTTP/3 (QUIC) mesh networking platform
  • Gateway mode for accepting incoming connections
  • Edge peer mode for mesh participation
  • Multi-tenancy via realm isolation
  • Pub/Sub messaging with wildcard support
  • RPC (request/response) patterns
  • Service discovery and advertisement
  • mDNS local discovery support
  • Process registry via gproc
  • Comprehensive documentation

Known Issues

  • Gateway mode requires proper TLS certificate configuration
  • Certificates must have Subject Alternative Name (SAN) extension
  • Docker deployments require proper file ownership (--chown=app:app)