# Parrhesia Relay Sync

## 1. Purpose

This document defines the Parrhesia proposal for **relay-to-relay event synchronization**.

It is intentionally transport-focused:

- manage remote relay peers,
- catch up on matching events,
- keep a live stream open,
- expose health and basic stats.

It does **not** define application data semantics.

Parrhesia syncs Nostr events. Callers decide which events matter and how to apply them.

---

## 2. Boundary

### Parrhesia is responsible for

- storing and validating events,
- querying and streaming events,
- running outbound sync workers against remote relays,
- tracking peer configuration, worker health, and sync counters,
- exposing peer management through `Parrhesia.API.Sync`.

### Parrhesia is not responsible for

- resource mapping,
- trusted node allowlists for an app profile,
- mutation payload validation beyond normal event validation,
- conflict resolution,
- replay winner selection,
- database upsert/delete semantics.

For Tribes, those remain in `TRIBES-NOSTRSYNC` and `AshNostrSync`.

---

## 3. Security Foundation

### Default posture

The baseline posture for sync traffic is:

- no access to sync events by default,
- no implicit trust from ordinary relay usage,
- no reliance on plaintext confidentiality from public relays.

For the first implementation, Parrhesia should protect sync data primarily with:

- authenticated server identities,
- ACL-gated read and write access,
- TLS with certificate pinning for outbound peers.

### Server identity

Parrhesia owns a low-level server identity used for relay-to-relay authentication.

This identity is separate from:

- TLS endpoint identity,
- application event author pubkeys.

Recommended model:

- Parrhesia has one local server-auth pubkey,
- sync peers authenticate as server-auth pubkeys,
- ACL grants are bound to those authenticated server-auth pubkeys,
- application-level writer trust remains outside Parrhesia.

Identity lifecycle:

1. use configured/imported key if provided,
2. otherwise use persisted local identity,
3. otherwise generate once during initial startup and persist it.

Private key export should not be supported.
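
The lifecycle order above can be sketched as a single resolution function. This is only an illustration: `IdentitySketch`, its arguments, and the `persist_fun` callback are hypothetical names, and the random bytes stand in for real key generation, which would also have to produce a valid secp256k1 key.

```elixir
defmodule IdentitySketch do
  @moduledoc "Hypothetical sketch of the identity resolution order; not Parrhesia's actual code."

  # `configured` and `persisted` are either nil or a secret key binary.
  # `persist_fun` is a hypothetical callback that stores a freshly generated key
  # and returns :ok.
  def resolve(configured, persisted, persist_fun) do
    cond do
      # 1. A configured/imported key always wins.
      is_binary(configured) ->
        {:configured, configured}

      # 2. Otherwise reuse the identity persisted by an earlier run.
      is_binary(persisted) ->
        {:persisted, persisted}

      # 3. Otherwise generate once and persist before returning.
      true ->
        # Placeholder: random bytes only, no curve validation.
        key = :crypto.strong_rand_bytes(32)
        :ok = persist_fun.(key)
        {:generated, key}
    end
  end
end
```

Persisting before returning means a crash between generation and storage cannot leave the node using a key it will not find again on the next start.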

### ACLs

Sync traffic should use a real ACL layer, not moderation allowlists.

Current implementation note:

- Parrhesia already has storage-backed moderation state such as `allowed_pubkeys` and `blocked_ips`,
- that is not the sync ACL model,
- sync protection must be enforced in the active websocket/query/count/negentropy/write path, not inferred from management tables alone.

Initial ACL model:

- principal: authenticated pubkey,
- capabilities: `sync_read`, `sync_write`,
- match: event/filter shape such as `kinds: [5000]` and namespace tags.

This is enough for now. We do **not** need separate user and server ACL models yet.

A sync peer is simply an authenticated principal with sync capabilities.
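
A minimal sketch of how such a rule could be evaluated, assuming a rule is a map with `principal`, `capability`, and `match` keys and that a request is allowed only when its filter stays inside what the rule lists. The rule shape and the `AclSketch` module are assumptions made for illustration, not Parrhesia's stored format.

```elixir
defmodule AclSketch do
  # Assumed rule shape: %{principal: pubkey, capability: atom, match: filter-like map}.
  def allowed?(rules, pubkey, capability, filter) do
    Enum.any?(rules, fn rule ->
      rule.principal == pubkey and
        rule.capability == capability and
        covers?(rule.match, filter)
    end)
  end

  # A rule covers a request when, for every key the rule constrains, the
  # request names that key and asks only for values the rule lists.
  defp covers?(match, filter) do
    Enum.all?(match, fn {key, allowed} ->
      case Map.fetch(filter, key) do
        {:ok, requested} when is_list(requested) and requested != [] ->
          Enum.all?(requested, &(&1 in allowed))

        _ ->
          # A missing or empty constraint would widen the request, so deny it.
          false
      end
    end)
  end
end
```

With a rule matching `%{"kinds" => [5000], "#r" => ["tribes.accounts.user"]}`, a filter that omits `"#r"` is denied in this sketch, because it would read beyond the granted namespace.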

### TLS pinning

Each outbound sync peer must include pinned TLS material.

Recommended pin type:

- SPKI SHA-256 pins

Multiple pins should be allowed to support certificate rotation.
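
As an illustration of the pin check, the sketch below hashes a DER-encoded SubjectPublicKeyInfo and accepts it if any configured pin matches. It assumes pins are stored as Base64 strings, which this document does not specify, and it leaves extraction of the SPKI bytes from the peer certificate out of scope. `PinSketch` is a hypothetical module name.

```elixir
defmodule PinSketch do
  # `spki_der` is the DER-encoded SubjectPublicKeyInfo of the peer certificate.
  # Assumption: pins are Base64-encoded SHA-256 digests.
  def pin_for(spki_der) when is_binary(spki_der) do
    :sha256
    |> :crypto.hash(spki_der)
    |> Base.encode64()
  end

  # Accept when any configured pin matches, so the current and the next key
  # can both be listed during certificate rotation.
  def pinned?(spki_der, pins) when is_list(pins) do
    actual = pin_for(spki_der)

    Enum.any?(pins, fn
      %{type: :spki_sha256, value: value} -> value == actual
      _other -> false
    end)
  end
end
```

An empty pin list never matches in this sketch, so a peer without pins would have to be handled by a separate, explicit code path.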

---

## 4. Sync Model

Each configured sync server represents one outbound worker managed by Parrhesia.

Implementation note:

- Khatru-style relay designs benefit from explicit runtime stages,
- Parrhesia sync should therefore plug into clear internal phases for connection admission, auth, query/count, subscription, negentropy, publish, and fanout,
- this should stay a runtime refactor, not become extra sync semantics.

Minimum behavior:

1. connect to the remote relay,
2. run an initial catch-up query for the configured filters,
3. ingest received events into the local relay through the normal API path,
4. switch to a live subscription for the same filters,
5. reconnect with backoff when disconnected.

The worker treats filters as opaque Nostr filters. It does not interpret app payloads.
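
The reconnect step does not prescribe a backoff policy. One common choice is capped exponential backoff, sketched here with assumed values (1 second base, 60 second cap) and without jitter; `BackoffSketch` and both constants are illustrative only.

```elixir
defmodule BackoffSketch do
  # Assumed values; the document does not define them.
  @base_ms 1_000
  @max_ms 60_000

  # `attempt` is 0 for the first retry after a disconnect.
  def delay_ms(attempt) when is_integer(attempt) and attempt >= 0 do
    # Clamp the exponent so the intermediate value stays small even after
    # many consecutive failures.
    exponent = min(attempt, 16)
    min(@max_ms, @base_ms * Integer.pow(2, exponent))
  end
end
```

A production worker would likely add random jitter so that many peers do not reconnect in lockstep after a shared outage.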

### Sync modes

Parrhesia supports two catch-up modes:

- `:req_stream`: catch-up via `REQ` with an overlap window, then a live `REQ` subscription.
- `:negentropy_first`: attempt NIP-77 negentropy catch-up first, fetch missing event ids via `REQ`, then switch to a live `REQ` subscription. If negentropy is unavailable or fails, the worker falls back to `:req_stream` behavior.

This keeps deployment flexibility while allowing bandwidth-efficient catch-up on trusted links.
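
The fallback rule and the overlap window can be sketched as follows. The callbacks, the return tuples, and the idea that the overlap window rewinds a stored cursor into a `since` value are assumptions for illustration; `CatchUpSketch` is a hypothetical module.

```elixir
defmodule CatchUpSketch do
  # `negentropy_fun` and `req_fun` are hypothetical zero-arity callbacks that
  # return {:ok, result} or {:error, reason}.
  def run(:req_stream, _negentropy_fun, req_fun), do: tag(:req_stream, req_fun.())

  def run(:negentropy_first, negentropy_fun, req_fun) do
    case negentropy_fun.() do
      {:ok, result} -> {:ok, :negentropy, result}
      # Negentropy unavailable or failed: behave like :req_stream.
      {:error, _reason} -> tag(:req_stream, req_fun.())
    end
  end

  # Adds `since` to each filter, rewound by the overlap window so events near
  # the last cursor are requested again. Duplicates are dropped on ingest.
  def with_overlap(filters, last_cursor, overlap_seconds) do
    since = max(0, last_cursor - overlap_seconds)
    Enum.map(filters, &Map.put(&1, "since", since))
  end

  defp tag(mode, {:ok, result}), do: {:ok, mode, result}
  defp tag(_mode, {:error, reason}), do: {:error, reason}
end
```

Returning the mode that actually ran lets the caller record whether a peer keeps falling back, which is useful when choosing the mode per peer.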

### Topology and convergence semantics

Parrhesia sync is intentionally a relay sync foundation, not an application convergence engine.

Operationally:

- use sync peers to define topology (mesh, hub/spoke, or staged rollout),
- keep per-peer filters narrow and explicit,
- treat sync health as transport/control-plane health, not proof of app-level convergence.

Delivery expectations:

- persisted events: practical eventual convergence via reconnect + catch-up,
- ephemeral events: best-effort only,
- no global total ordering guarantee across nodes.

### NIP-77

Parrhesia now includes a reusable relay-side NIP-77 engine with:

- proper `NEG-OPEN` / `NEG-MSG` / `NEG-CLOSE` / `NEG-ERR` framing,
- a reusable negentropy codec and reconciliation engine,
- bounded local `(created_at, id)` snapshot enumeration for matching filters,
- connection/session integration with policy checks and resource limits.

That means NIP-77 can be used for bandwidth-efficient catch-up between trusted nodes.

The sync worker now exposes this as configuration (`mode: :req_stream | :negentropy_first`) so deployments can choose the operational tradeoff per peer.
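
To illustrate the bounded `(created_at, id)` snapshot, the sketch below orders events by timestamp and then id, and refuses to build a snapshot larger than a given limit. Rejecting oversized snapshots rather than truncating them is an assumption, as is the `SnapshotSketch` module itself.

```elixir
defmodule SnapshotSketch do
  # Events are maps with "created_at" (integer) and "id" (lowercase hex string),
  # as in NIP-01. `limit` is an assumed per-session bound.
  def build(events, limit) when is_integer(limit) and limit > 0 do
    items =
      events
      |> Enum.map(fn %{"created_at" => ts, "id" => id} -> {ts, id} end)
      # Tuples sort element-wise: timestamp first, then id as the tie-breaker.
      |> Enum.sort()

    if length(items) > limit do
      # Truncating silently would make both sides reconcile different sets.
      {:error, :snapshot_too_large}
    else
      {:ok, items}
    end
  end
end
```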

---

## 5. API Surface

Primary control plane:

- `Parrhesia.API.Identity.get/1`
- `Parrhesia.API.Identity.ensure/1`
- `Parrhesia.API.Identity.import/2`
- `Parrhesia.API.Identity.rotate/1`
- `Parrhesia.API.ACL.grant/2`
- `Parrhesia.API.ACL.revoke/2`
- `Parrhesia.API.ACL.list/1`
- `Parrhesia.API.Sync.put_server/2`
- `Parrhesia.API.Sync.remove_server/2`
- `Parrhesia.API.Sync.get_server/2`
- `Parrhesia.API.Sync.list_servers/1`
- `Parrhesia.API.Sync.start_server/2`
- `Parrhesia.API.Sync.stop_server/2`
- `Parrhesia.API.Sync.sync_now/2`
- `Parrhesia.API.Sync.server_stats/2`
- `Parrhesia.API.Sync.sync_stats/1`
- `Parrhesia.API.Sync.sync_health/1`

These APIs are in-process. HTTP management may expose them through `Parrhesia.API.Admin` or direct routing to `Parrhesia.API.Sync`.

---

## 6. Server Specification

`put_server/2` is an upsert.

Suggested server shape:

```elixir
%{
  id: "tribes-primary",
  url: "wss://relay-a.example/relay",
  enabled?: true,
  auth_pubkey: "<remote-server-auth-pubkey>",
  mode: :negentropy_first,
  filters: [
    %{
      "kinds" => [5000],
      "#r" => ["tribes.accounts.user", "tribes.chat.tribe"]
    }
  ],
  overlap_window_seconds: 300,
  relay_info_mode: :diagnostic,
  auth: %{
    type: :nip42,
    mode: :on_challenge
  },
  tls: %{
    mode: :required,
    hostname: "relay-a.example",
    ca_certfile: "/etc/tribes/sync-ca.pem",
    client_certfile: "/etc/tribes/node.crt",
    client_keyfile: "/etc/tribes/node.key",
    pins: [
      %{type: :spki_sha256, value: "<pin-a>"}
    ]
  },
  metadata: %{}
}
```

Required fields:

- `id`
- `url`
- `auth_pubkey`
- `filters`
- `tls`

Recommended fields:

- `enabled?`
- `mode`
- `overlap_window_seconds`
- `relay_info_mode`
- `auth`
- `metadata`

Rules:

- `id` must be stable and unique locally.
- `url` is the remote relay websocket URL.
- `auth_pubkey` is the expected remote server-auth pubkey.
- `filters` must be valid NIP-01 filters.
- filters are owned by the caller; Parrhesia only validates filter shape.
- `mode` supports `:req_stream` and `:negentropy_first`; it defaults to `:req_stream`.
- `relay_info_mode` supports `:required`, `:diagnostic`, and `:disabled`; it defaults to `:required`.
- `auth.mode` supports `:on_challenge` and `:disabled`; it defaults to `:on_challenge`.
- `tls.mode` defaults to `:required`.
- `tls.pins` are optional and may be combined with dedicated CA trust and client certs.
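
The required fields and defaults above could be applied by a small normalization step before a server is stored. This is a sketch under the rules listed in this section; `ServerSpecSketch`, its return tuples, and the error shape are hypothetical, and fields whose defaults are not stated here (`enabled?`, `overlap_window_seconds`) are left untouched.

```elixir
defmodule ServerSpecSketch do
  @required [:id, :url, :auth_pubkey, :filters, :tls]

  def normalize(spec) when is_map(spec) do
    case Enum.reject(@required, &Map.has_key?(spec, &1)) do
      [] ->
        normalized =
          spec
          # Defaults taken from the rules in this section.
          |> Map.put_new(:mode, :req_stream)
          |> Map.put_new(:relay_info_mode, :required)
          |> Map.update(:auth, %{mode: :on_challenge}, &Map.put_new(&1, :mode, :on_challenge))
          |> Map.update!(:tls, &Map.put_new(&1, :mode, :required))

        {:ok, normalized}

      missing ->
        {:error, {:missing_fields, missing}}
    end
  end
end
```

Values the caller sets explicitly are never overwritten, so a spec with `mode: :negentropy_first` keeps that mode.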

---

## 7. Runtime State

Each server should have both configuration and runtime status.

Suggested runtime fields:

```elixir
%{
  server_id: "tribes-primary",
  state: :running,
  connected?: true,
  last_connected_at: ~U[2026-03-16 10:00:00Z],
  last_disconnected_at: nil,
  last_sync_started_at: ~U[2026-03-16 10:00:00Z],
  last_sync_completed_at: ~U[2026-03-16 10:00:02Z],
  last_event_received_at: ~U[2026-03-16 10:12:45Z],
  last_eose_at: ~U[2026-03-16 10:00:02Z],
  reconnect_attempts: 0,
  last_error: nil
}
```

Parrhesia should keep this state generic. It is about relay sync health, not app state convergence.
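
A worker might update these fields on connection events roughly as follows. The transition rules, in particular resetting `reconnect_attempts` on connect and incrementing it on disconnect, are assumptions; `RuntimeSketch` is a hypothetical module.

```elixir
defmodule RuntimeSketch do
  # `state` is a map with the runtime fields suggested above.
  def connected(state, now) do
    %{state | connected?: true, last_connected_at: now, reconnect_attempts: 0, last_error: nil}
  end

  def disconnected(state, now, reason) do
    %{
      state
      | connected?: false,
        last_disconnected_at: now,
        last_error: reason,
        reconnect_attempts: state.reconnect_attempts + 1
    }
  end
end
```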

---

## 8. Stats and Health

### Per-server stats

`server_stats/2` should return basic counters and status fields such as:

- `events_received`
- `events_accepted`
- `events_duplicate`
- `events_rejected`
- `query_runs`
- `subscription_restarts`
- `reconnects`
- `last_remote_eose_at`
- `last_error`
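
The event counters can be kept consistent by updating them from a single function, so that `events_received` always equals the sum of the three outcome counters. The outcome atoms below are hypothetical; the actual ingest return values are not defined in this document.

```elixir
defmodule CounterSketch do
  @zero %{events_received: 0, events_accepted: 0, events_duplicate: 0, events_rejected: 0}

  def new, do: @zero

  # `result` is a hypothetical ingest outcome:
  # :accepted, :duplicate, or {:rejected, reason}.
  def record(counters, result) do
    key =
      case result do
        :accepted -> :events_accepted
        :duplicate -> :events_duplicate
        {:rejected, _reason} -> :events_rejected
      end

    counters
    |> Map.update!(:events_received, &(&1 + 1))
    |> Map.update!(key, &(&1 + 1))
  end
end
```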

### Aggregate sync stats

`sync_stats/1` should summarize:

- total configured servers,
- enabled servers,
- running servers,
- connected servers,
- aggregate event counters,
- aggregate reconnect count.

### Health

`sync_health/1` should be operator-oriented, for example:

```elixir
%{
  "status" => "degraded",
  "servers_total" => 3,
  "servers_connected" => 2,
  "servers_failing" => [
    %{"id" => "tribes-secondary", "reason" => "connection_refused"}
  ]
}
```

This is intentionally simple. It should answer “is sync working?” without pretending to prove application convergence.
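
One way to derive such a summary from per-server runtime state is sketched below. Only `"degraded"` appears in this document; the `"ok"` and `"down"` statuses, the input field names, and the `HealthSketch` module are assumptions.

```elixir
defmodule HealthSketch do
  # Each server is a map with :id, :enabled?, :connected?, and :last_error
  # (a string or nil). Disabled servers are counted but never reported as failing.
  def summarize(servers) do
    enabled = Enum.filter(servers, fn s -> s.enabled? end)
    failing = Enum.reject(enabled, fn s -> s.connected? end)

    status =
      cond do
        failing == [] -> "ok"
        length(failing) == length(enabled) -> "down"
        true -> "degraded"
      end

    %{
      "status" => status,
      "servers_total" => length(servers),
      "servers_connected" => length(enabled) - length(failing),
      "servers_failing" =>
        Enum.map(failing, fn s ->
          %{"id" => s.id, "reason" => s.last_error || "unknown"}
        end)
    }
  end
end
```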

---

## 9. Event Ingest Path

Events that a sync worker receives from a remote relay should enter Parrhesia through the same ingest path as any other accepted event.

That means:

1. validate the event,
2. run normal write policy,
3. persist or reject,
4. fan out locally,
5. rely on duplicate-event behavior for idempotency.

This avoids a second ingest path with divergent behavior.

Before normal event acceptance, the sync worker should enforce:

1. pinned TLS validation for the remote endpoint,
2. remote server-auth identity match,
3. local ACL grant permitting the peer to perform sync reads and/or writes.
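
These three checks can be chained so that the first failure stops the sequence. The sketch assumes the peer's observed SPKI pin and authenticated pubkey are already available and that at least one pin is configured; all field names, error atoms, and the `acl_allows?` callback are hypothetical.

```elixir
defmodule GateSketch do
  # `peer` holds what was observed on the connection; `server` is the
  # configured server spec. `acl_allows?` is a hypothetical one-arity callback.
  def check(peer, server, acl_allows?) do
    with :ok <- check_pin(peer.spki_pin, server.tls.pins),
         :ok <- check_identity(peer.auth_pubkey, server.auth_pubkey),
         :ok <- check_acl(acl_allows?.(peer.auth_pubkey)) do
      :ok
    end
  end

  defp check_pin(pin, pins) do
    if Enum.any?(pins, &(&1.value == pin)), do: :ok, else: {:error, :tls_pin_mismatch}
  end

  # The repeated variable makes this clause match only when both values are equal.
  defp check_identity(pubkey, pubkey), do: :ok
  defp check_identity(_actual, _expected), do: {:error, :auth_pubkey_mismatch}

  defp check_acl(true), do: :ok
  defp check_acl(_other), do: {:error, :acl_denied}
end
```

Because `with` returns the first non-matching value unchanged, the caller sees exactly which gate rejected the peer and can record it in `last_error`.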

The sync worker may attach request-context metadata such as:

```elixir
%Parrhesia.API.RequestContext{
  caller: :sync,
  peer_id: "tribes-primary",
  metadata: %{sync_server_id: "tribes-primary"}
}
```

Recommended additional context when available:

- `remote_ip`
- `subscription_id`

This context is for telemetry, policy, and audit only. It must not become app sync semantics.

---

## 10. Persistence

Parrhesia persists enough sync control-plane state to survive restart:

- local server identity reference,
- configured ACL rules for sync principals,
- configured sync servers (`sync_servers` table),
- per-server sync runtime snapshot (`sync_server_runtime` table), including cursor/watermark and basic health counters.

This persistence is controlled by `:sync.persist_state?` (`PARRHESIA_SYNC_PERSIST_STATE`) and is enabled by default.

Parrhesia does not persist application replay heads or winner state. That remains in the embedding application.

---

## 11. Relationship to Runtime Features

### Cross-node sync data plane

Parrhesia provides cross-node relay sync primitives through `Parrhesia.API.Sync` workers.

Local in-node fanout remains process-local (`Parrhesia.Fanout.Dispatcher` + subscription index).
Cross-node event convergence is handled by authenticated relay-to-relay sync.

### Management stats

The current admin `stats` output is relay-global and minimal.

Sync adds a new dimension:

- peer config,
- worker state,
- per-peer counters,
- sync health summary.

That should be exposed without coupling it to app-specific sync semantics.

---

## 12. Tribes Usage

For Tribes, `AshNostrSync` should be able to:

1. rely on Parrhesia’s local server identity,
2. register one or more remote relays with `Parrhesia.API.Sync.put_server/2`,
3. grant sync ACLs for trusted server-auth pubkeys,
4. provide narrow Nostr filters for `kind: 5000`,
5. observe sync health and counters,
6. consume events via the normal local Parrhesia ingest/query/stream surface.

Tribes should not need Parrhesia to know:

- what a resource namespace means,
- which node pubkeys are trusted for Tribes,
- how to resolve conflicts,
- how to apply an upsert or delete.

That is the key boundary.