erlang_quic Design Document
This document describes the architecture and design of the erlang_quic implementation.
Architecture Overview
The implementation is organized into the following module groups:
Public API Layer
| Module | Responsibility |
|---|---|
| quic | Main public API for client connections |
| quic_listener | Server-side listener for accepting connections |
| quic_listener_sup | Supervisor for pooled listeners with SO_REUSEPORT |
Connection Layer
| Module | Responsibility |
|---|---|
| quic_connection | Connection state machine (gen_statem) |
| quic_stream | Stream state management |
Protocol Layer
| Module | Responsibility |
|---|---|
| quic_packet | Packet encoding/decoding |
| quic_frame | Frame encoding/decoding |
| quic_varint | Variable-length integer encoding (RFC 9000 Section 16) |
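The varint scheme in RFC 9000 Section 16 stores the length class in the two most significant bits of the first byte, which maps naturally onto Erlang's bit syntax. A minimal sketch of what `quic_varint`-style encoding and decoding can look like (module and function names here are illustrative, not the module's actual exports):

```erlang
-module(varint_sketch).
-export([encode/1, decode/1]).

%% RFC 9000 Section 16: the two MSBs of the first byte give the
%% total encoded length (00 = 1, 01 = 2, 10 = 4, 11 = 8 bytes).
encode(N) when N < 16#40               -> <<0:2, N:6>>;
encode(N) when N < 16#4000             -> <<1:2, N:14>>;
encode(N) when N < 16#40000000         -> <<2:2, N:30>>;
encode(N) when N < 16#4000000000000000 -> <<3:2, N:62>>.

%% Returns {Value, RemainingBytes}.
decode(<<0:2, N:6,  Rest/binary>>) -> {N, Rest};
decode(<<1:2, N:14, Rest/binary>>) -> {N, Rest};
decode(<<2:2, N:30, Rest/binary>>) -> {N, Rest};
decode(<<3:2, N:62, Rest/binary>>) -> {N, Rest}.
```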
Cryptography Layer
| Module | Responsibility |
|---|---|
| quic_crypto | Key derivation and transcript hashing |
| quic_tls | TLS 1.3 message building and parsing |
| quic_keys | Traffic key derivation |
| quic_aead | AEAD encryption/decryption and header protection |
| quic_hkdf | HKDF-based key expansion |
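For the Initial encryption level, RFC 9001 Section 5.2 derives the initial secret from the client's Destination Connection ID using HKDF-Extract with a fixed, version-specific salt. A hedged sketch using OTP's `crypto` HMAC (the module name is illustrative; the real code lives in `quic_hkdf`/`quic_crypto`):

```erlang
-module(initial_secret_sketch).
-export([initial_secret/1]).

%% QUIC v1 initial salt from RFC 9001 Section 5.2.
-define(INITIAL_SALT,
        <<16#38,16#76,16#2c,16#f7,16#f5,16#59,16#34,16#b3,16#4d,16#17,
          16#9a,16#e6,16#a4,16#c8,16#0c,16#ad,16#cc,16#bb,16#7f,16#0a>>).

%% HKDF-Extract(salt, IKM) is HMAC(salt, IKM); Initial keys use SHA-256.
initial_secret(DCID) when is_binary(DCID) ->
    crypto:mac(hmac, sha256, ?INITIAL_SALT, DCID).
```

Client and server Initial secrets are then expanded from this value with HKDF-Expand-Label using the "client in" and "server in" labels.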
Flow Control Layer
| Module | Responsibility |
|---|---|
| quic_flow | Connection and stream flow control |
| quic_cc | Congestion control (NewReno) |
| quic_loss | Loss detection and recovery |
| quic_ack | ACK frame processing and generation |
Session Layer
| Module | Responsibility |
|---|---|
| quic_ticket | Session ticket storage and PSK derivation |
Connection Lifecycle
┌─────────┐
│ idle │
└────┬────┘
│ send ClientHello
▼
┌──────────────┐
│ handshaking │
└──────┬───────┘
│ receive server Finished
│ send client Finished
▼
┌──────────────┐
│ connected │◄────────┐
└──────┬───────┘ │
│ │ key update
│ │ migration
└─────────────────┘
│ close/error
▼
┌──────────────┐
│ draining │
└──────┬───────┘
│ drain timeout
▼
┌─────────┐
│ closed │
└─────────┘

State Descriptions
- idle: Initial state, preparing to connect
- handshaking: TLS 1.3 handshake in progress
- connected: Connection established, data transfer active
- draining: Connection closing, processing remaining packets
- closed: Connection terminated
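Since `quic_connection` is a gen_statem, the lifecycle above maps directly onto state functions. A simplified sketch of that mapping (state and event names follow the diagram; the real callbacks carry far more state and handle many more events):

```erlang
-module(conn_sketch).
-behaviour(gen_statem).
-export([callback_mode/0, init/1,
         idle/3, handshaking/3, connected/3, draining/3]).

callback_mode() -> state_functions.

init(Opts) -> {ok, idle, Opts}.

%% idle -> handshaking once the ClientHello is sent.
idle({call, From}, connect, Data) ->
    {next_state, handshaking, Data, [{reply, From, ok}]}.

%% handshaking -> connected after Finished is exchanged.
handshaking(info, handshake_done, Data) ->
    {next_state, connected, Data}.

%% Key update and migration stay within connected.
connected(info, key_update, Data) ->
    {keep_state, Data};
connected(info, close, Data) ->
    %% Arm the drain timeout (3x PTO is typical; 1000 ms here for the sketch).
    {next_state, draining, Data, [{state_timeout, 1000, drain}]}.

%% draining -> closed after the drain timeout.
draining(state_timeout, drain, Data) ->
    {stop, normal, Data}.
```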
Packet Processing
Encryption Levels
QUIC uses four encryption levels, each with its own keys:
| Level | Usage |
|---|---|
| Initial | ClientHello, ServerHello (derived from DCID) |
| Handshake | EncryptedExtensions through Finished |
| 0-RTT | Early data (optional, from session resumption) |
| 1-RTT | Application data after handshake |
Packet Types
Long Header Packets:
- Initial (type 0x00)
- 0-RTT (type 0x01)
- Handshake (type 0x02)
- Retry (type 0x03)
Short Header Packets:
- 1-RTT (application data)
Frame Processing
Frames are processed in order within a packet. Key frame types:
| Frame Type | Description |
|---|---|
| PADDING (0x00) | Padding for packet size |
| PING (0x01) | Keep-alive |
| ACK (0x02-0x03) | Acknowledgment |
| CRYPTO (0x06) | TLS handshake data |
| STREAM (0x08-0x0f) | Stream data |
| MAX_DATA (0x10) | Connection flow control |
| MAX_STREAM_DATA (0x11) | Stream flow control |
| NEW_CONNECTION_ID (0x18) | Issue new CID |
| RETIRE_CONNECTION_ID (0x19) | Retire old CID |
| PATH_CHALLENGE (0x1a) | Path validation |
| PATH_RESPONSE (0x1b) | Path validation response |
| CONNECTION_CLOSE (0x1c-0x1d) | Close connection |
| DATAGRAM (0x30-0x31) | Unreliable datagram (RFC 9221) |
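The STREAM range 0x08-0x0f exists because the three low bits of the frame type are flags: OFF (offset present), LEN (length present), and FIN (RFC 9000 Section 19.8). A sketch of reading them with bit syntax (illustrative, not `quic_frame`'s actual code):

```erlang
-module(stream_type_sketch).
-export([stream_flags/1]).

%% STREAM frame type byte is 0b00001OLF:
%% O = offset field present, L = length field present, F = FIN.
stream_flags(Type) when Type >= 16#08, Type =< 16#0f ->
    <<_:5, Off:1, Len:1, Fin:1>> = <<Type>>,
    #{offset_present => Off =:= 1,
      length_present => Len =:= 1,
      fin            => Fin =:= 1}.
```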
TLS Integration
Handshake Flow (Client)
Client Server
│ │
│───── Initial[ClientHello] ──────────►│
│ │
│◄──── Initial[ServerHello] ───────────│
│◄──── Handshake[EncryptedExtensions] ─│
│◄──── Handshake[Certificate] ─────────│
│◄──── Handshake[CertificateVerify] ───│
│◄──── Handshake[Finished] ────────────│
│ │
│───── Handshake[Finished] ───────────►│
│ │
│◄════ 1-RTT[Application Data] ════════│

Key Derivation
Keys are derived using HKDF with the following hierarchy:
PSK (or 0)
│
▼
┌─────────────┐
│ Early Secret│
└──────┬──────┘
│
┌─────────────┼─────────────┐
▼ │ ▼
client_early binder_key res_secret
traffic_secret │
│
▼ │
┌───────────────┐ │
(EC)DHE──►│Handshake Secret│ │
└───────┬───────┘ │
│ │
┌─────────────┼─────────────┐ │
▼ │ ▼ │
client_hs server_hs derived│
traffic traffic │ │
secret secret │ │
▼ │
┌─────────────┐
│Master Secret│
└──────┬──────┘
│
┌────────────────┼────────────────┐
▼ │ ▼
client_app server_app resumption
traffic traffic master
secret secret secret

Stream Management
Stream IDs
Stream IDs encode initiator and directionality:
| Bits 1-0 | Stream Type |
|---|---|
| 0x00 | Client-initiated, bidirectional |
| 0x01 | Server-initiated, bidirectional |
| 0x02 | Client-initiated, unidirectional |
| 0x03 | Server-initiated, unidirectional |
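The two low bits can be read off directly with bitwise operations. A sketch (a hypothetical helper, not part of `quic_stream`'s API):

```erlang
-module(stream_id_sketch).
-export([classify/1]).

%% RFC 9000 Section 2.1: bit 0 selects the initiator,
%% bit 1 selects the directionality.
classify(StreamId) ->
    Initiator = case StreamId band 16#01 of 0 -> client; _ -> server end,
    Direction = case StreamId band 16#02 of 0 -> bidi;   _ -> uni    end,
    {Initiator, Direction}.
```

For example, stream ID 0 is the first client-initiated bidirectional stream, and stream ID 3 is the first server-initiated unidirectional stream.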
Stream States
Sending side:
- Ready → Send → Data Sent → Data Recvd (terminal)
- Ready → Reset Sent → Reset Recvd (terminal)
Receiving side:
- Recv → Size Known → Data Recvd → Data Read (terminal)
- Recv → Reset Recvd → Reset Read (terminal)
Flow Control
Connection-Level
- `MAX_DATA` frame advertises connection-level receive window
- Sender tracks `data_sent` against peer's `max_data`
- Receiver sends `MAX_DATA` updates when buffer space is freed
Stream-Level
- `MAX_STREAM_DATA` frame advertises per-stream receive window
- Similar tracking at stream granularity
- `DATA_BLOCKED` and `STREAM_DATA_BLOCKED` frames signal flow control limits
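Before sending stream data, both limits must hold; the sendable amount is the minimum of the remaining connection credit and stream credit. A sketch under assumed variable names (the real bookkeeping lives in `quic_flow`):

```erlang
-module(flow_sketch).
-export([sendable/5]).

%% Hypothetical helper: how many of Len bytes may be sent right now,
%% given connection-level and stream-level flow-control credit.
sendable(Len, ConnSent, ConnMaxData, StreamSent, StreamMaxData) ->
    ConnCredit   = ConnMaxData - ConnSent,
    StreamCredit = StreamMaxData - StreamSent,
    max(0, lists:min([Len, ConnCredit, StreamCredit])).
```

When the result is less than `Len`, the sender is blocked on one of the limits and should emit the corresponding `DATA_BLOCKED` or `STREAM_DATA_BLOCKED` frame.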
Congestion Control
The implementation supports pluggable congestion control algorithms:
Algorithms
| Algorithm | Module | Description |
|---|---|---|
| NewReno | quic_cc | RFC 9002 default, simple and robust |
| CUBIC | quic_cc_cubic | RFC 9438, better high-bandwidth utilization |
| BBR | quic_cc_bbr | Bottleneck Bandwidth and RTT-based |
All algorithms support HyStart++ (RFC 9406) for improved slow start exit.
States
- Slow Start: Exponential growth until `ssthresh` (with HyStart++ exit)
- Congestion Avoidance: Linear growth (NewReno), cubic function (CUBIC), or pacing-based (BBR)
- Recovery: After packet loss detection
Key Variables
| Variable | Description |
|---|---|
| cwnd | Congestion window (bytes) |
| ssthresh | Slow start threshold |
| bytes_in_flight | Unacknowledged bytes |
| pacing_rate | Send rate limit (BBR) |
NewReno (Default)
On ACK:

    if bytes_acked > 0:
        if cwnd < ssthresh:
            cwnd += bytes_acked                              # Slow start
        else:
            cwnd += (bytes_acked * max_datagram_size) / cwnd # Congestion avoidance

On Loss:

    ssthresh = max(cwnd / 2, 2 * max_datagram_size)
    cwnd = ssthresh

CUBIC (RFC 9438)
Uses a cubic function for window growth that is more aggressive in high-bandwidth networks:
W(t) = C * (t - K)^3 + W_max

where K = cbrt(W_max * (1 - beta) / C), with C = 0.4 and beta = 0.7 (RFC 9438).
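For illustration, the cubic window function and K can be computed directly; a sketch (units and fixed-point scaling in the real `quic_cc_cubic` may differ):

```erlang
-module(cubic_sketch).
-export([w/2, k/1]).

-define(C, 0.4).
-define(BETA, 0.7).

%% K = cbrt(W_max * (1 - beta) / C): time until the window
%% grows back to W_max after a loss (RFC 9438).
k(Wmax) -> math:pow(Wmax * (1 - ?BETA) / ?C, 1 / 3).

%% W(t) = C * (t - K)^3 + W_max
w(T, Wmax) -> ?C * math:pow(T - k(Wmax), 3) + Wmax.
```

The window is concave before K (fast recovery toward W_max) and convex after it (probing for new bandwidth).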
BBR
BBR probes for maximum bandwidth and minimum RTT:
- Startup: Doubles sending rate each RTT until bandwidth plateaus
- Drain: Reduces inflight to match BDP
- ProbeBW: Cycles through pacing gain phases
- ProbeRTT: Periodically reduces cwnd to measure min RTT
Packet Pacing
All algorithms use packet pacing (RFC 9002 Section 7.7) to prevent bursts:
- Packets are spaced based on pacing rate
- Prevents buffer bloat at bottleneck links
- Improves throughput on lossy networks
Loss Detection
Packet Loss Detection
Two mechanisms detect packet loss:
Packet Threshold: A packet is lost if a later packet in the same number space has been acknowledged and the gap exceeds the threshold (default: 3).
Time Threshold: A packet is lost if it was sent more than max(9/8 * max(smoothed_rtt, latest_rtt), 1ms) before the largest acknowledged packet (RFC 9002 Section 6.1.2).
Probe Timeout (PTO)
When no ACKs are received, PTO triggers retransmission:
PTO = smoothed_rtt + max(4 * rtt_var, 1ms) + max_ack_delay

After PTO expires:
- Send 1-2 ack-eliciting packets
- Double PTO for next timeout
- After persistent congestion, reset to slow start
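The PTO computation and its exponential backoff can be sketched as follows (function and argument names are illustrative; the real logic lives in `quic_loss`):

```erlang
-module(pto_sketch).
-export([pto/4]).

%% PTO per RFC 9002 Section 6.2, doubled on each consecutive expiry.
%% Times are in milliseconds; kGranularity = 1 ms.
pto(SmoothedRtt, RttVar, MaxAckDelay, PtoCount) ->
    Base = SmoothedRtt + max(4 * RttVar, 1) + MaxAckDelay,
    Base * (1 bsl PtoCount).
```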
Connection Migration
Path Validation
When the peer's address changes:
- Send `PATH_CHALLENGE` frame with random 8-byte data
- Peer responds with `PATH_RESPONSE` containing same data
- Path is validated upon receiving correct response
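Generating and checking the challenge data is straightforward; a sketch (hypothetical helper names):

```erlang
-module(path_sketch).
-export([new_challenge/0, validate/2]).

%% PATH_CHALLENGE carries 8 unpredictable bytes; PATH_RESPONSE
%% must echo them exactly (RFC 9000 Section 8.2).
new_challenge() ->
    crypto:strong_rand_bytes(8).

validate(Sent, Received) ->
    Sent =:= Received.
```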
Migration Process
- Detect new remote address
- Initiate path validation
- Continue using old path until validation completes
- Switch to new path upon successful validation
- Reset congestion controller for new path
Key Update
RFC 9001 Section 6 defines key update mechanism:
- Sender increments key phase bit in packet header
- Derives new keys from current application secrets
- Receiver detects phase change and derives matching keys
- Old keys retained briefly for reordered packets
next_app_secret = HKDF-Expand-Label(current_app_secret, "quic ku", "", hash_len)

Connection ID Management
Local CID Pool
- Generate and issue CIDs via `NEW_CONNECTION_ID` frame
- Track sequence numbers and stateless reset tokens
- Retire old CIDs via `RETIRE_CONNECTION_ID`
Peer CID Pool
- Store CIDs received from peer
- Select appropriate CID for path
- Respect `active_connection_id_limit` transport parameter
Session Resumption and 0-RTT
Session Tickets
After a successful handshake, the server may issue a NewSessionTicket:
- Server sends ticket after handshake completion
- Client receives via `{session_ticket, Ticket}` message
- Client stores ticket for future connections
- Ticket contains PSK identity, resumption secret, and max_early_data
Resumption Flow
Client Server
│ │
│───── Initial[ClientHello+PSK] ───────►│
│───── 0-RTT[Early Data] ──────────────►│ (optional)
│ │
│◄──── Initial[ServerHello+PSK] ────────│
│◄──── Handshake[EncryptedExtensions] ──│
│◄──── Handshake[Finished] ─────────────│
│ │
│───── Handshake[Finished] ────────────►│
│ │
│◄════ 1-RTT[Application Data] ═════════│

0-RTT Early Data
When resuming with a stored ticket:
- Client derives `early_keys` from PSK
- Client sends 0-RTT packets (type 0x01) immediately
- Server validates PSK and derives matching keys
- Server processes early data or rejects with `early_data_rejected`
- Client falls back to 1-RTT if early data is rejected
Limitations:
- 0-RTT data is not forward-secret
- Max early data size is limited by ticket's `max_early_data_size`
- Replay protection is an application-layer responsibility
DATAGRAM Extension (RFC 9221)
Overview
DATAGRAM frames provide unreliable message delivery:
- Not retransmitted on loss
- No flow control (use connection-level limits)
- Useful for latency-sensitive data (gaming, real-time media)
Transport Parameter Negotiation
Datagram support is negotiated via the max_datagram_frame_size transport parameter:
| Value | Meaning |
|---|---|
| 0 or absent | Datagrams not supported (default) |
| 1-65535 | Maximum datagram payload size accepted |
Both endpoints must advertise support for datagrams to work. The effective maximum size is the minimum of what the peer advertises and what fits in a QUIC packet (path MTU minus overhead).
Configuration
%% Enable datagram support (client)
quic:connect(Host, Port, #{
max_datagram_frame_size => 65535 % Recommended: accept any size
}, Owner).
%% Enable datagram support (server)
quic:start_server(my_server, Port, #{
max_datagram_frame_size => 65535,
cert => Cert,
key => Key
}).

API
%% Check if datagrams are supported and get max size
MaxSize = quic:datagram_max_size(ConnRef).
%% Returns 0 if datagrams not supported, otherwise peer's max size
%% Send a datagram (unreliable)
ok = quic:send_datagram(ConnRef, Data).
%% Returns {error, datagrams_not_supported} if peer didn't advertise
%% Returns {error, datagram_too_large} if Data exceeds peer's limit
%% Returns {error, congestion_limited} if cwnd is full (datagram dropped)
%% Receive datagrams (owner process)
receive
{quic, ConnRef, {datagram, Data}} -> handle_datagram(Data)
end.

Error Handling
| Error | Cause |
|---|---|
| {error, datagrams_not_supported} | Peer didn't advertise max_datagram_frame_size |
| {error, datagram_too_large} | Data exceeds peer's advertised limit |
| {error, congestion_limited} | Congestion window full (datagram dropped) |
Protocol Violations
Per RFC 9221, the following result in PROTOCOL_VIOLATION connection close:
- Receiving a DATAGRAM frame when we didn't advertise support
- Receiving a DATAGRAM frame larger than our advertised limit
Active Migration
Triggering Migration
The quic:migrate/1 API triggers active connection migration:
- Application calls `quic:migrate(ConnRef)`
- Connection rebinds to a new local socket
- PATH_CHALLENGE sent to peer on new path
- On PATH_RESPONSE: migration complete
- Congestion controller reset for new path
Use Cases
- Network handover (WiFi to cellular)
- NAT rebinding recovery
- Load balancing