erlang_quic Design Document


This document describes the architecture and design of the erlang_quic implementation.

Architecture Overview

The implementation is organized into the following module groups:

Public API Layer

Module              Responsibility
quic                Main public API for client connections
quic_listener       Server-side listener for accepting connections
quic_listener_sup   Supervisor for pooled listeners with SO_REUSEPORT

Connection Layer

Module            Responsibility
quic_connection   Connection state machine (gen_statem)
quic_stream       Stream state management

Protocol Layer

Module        Responsibility
quic_packet   Packet encoding/decoding
quic_frame    Frame encoding/decoding
quic_varint   Variable-length integer encoding (RFC 9000 Section 16)

Cryptography Layer

Module        Responsibility
quic_crypto   Key derivation and transcript hashing
quic_tls      TLS 1.3 message building and parsing
quic_keys     Traffic key derivation
quic_aead     AEAD encryption/decryption and header protection
quic_hkdf     HKDF-based key expansion

Flow Control Layer

Module      Responsibility
quic_flow   Connection and stream flow control
quic_cc     Congestion control (NewReno)
quic_loss   Loss detection and recovery
quic_ack    ACK frame processing and generation

Session Layer

Module        Responsibility
quic_ticket   Session ticket storage and PSK derivation

Connection Lifecycle

                    
          +--------+
          |  idle  |
          +--------+
               |  send ClientHello
               v
        +-------------+
        | handshaking |
        +-------------+
               |  receive server Finished
               |  send client Finished
               v
        +-------------+      key update,
        |  connected  |<--+  migration
        +-------------+---+  (remain connected)
               |
               |  close / error
               v
        +-------------+
        |  draining   |
        +-------------+
               |  drain timeout
               v
          +--------+
          | closed |
          +--------+

State Descriptions

  • idle: Initial state, preparing to connect
  • handshaking: TLS 1.3 handshake in progress
  • connected: Connection established, data transfer active
  • draining: Connection closing, processing remaining packets
  • closed: Connection terminated

Packet Processing

Encryption Levels

QUIC uses four encryption levels, each with its own keys:

Level       Usage
Initial     ClientHello, ServerHello (keys derived from the client's DCID)
Handshake   EncryptedExtensions through Finished
0-RTT       Early data (optional, from session resumption)
1-RTT       Application data after handshake
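The Initial secrets provide no confidentiality on their own: they are derived from the client's Destination Connection ID with a version-specific salt, so any on-path observer can compute them. A minimal Python sketch of the derivation from RFC 9001 Section 5.2, for illustration only (the implementation lives in quic_keys and quic_hkdf):

```python
import hashlib, hmac

# QUIC v1 Initial salt (RFC 9001 Section 5.2)
INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")

def hkdf_extract(salt, ikm):
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length):
    # RFC 5869 expand step: chained HMAC blocks until enough output
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

def hkdf_expand_label(secret, label, context, length):
    # HkdfLabel structure from RFC 8446 Section 7.1
    full_label = b"tls13 " + label
    info = (length.to_bytes(2, "big") + bytes([len(full_label)]) + full_label
            + bytes([len(context)]) + context)
    return hkdf_expand(secret, info, length)

def initial_secrets(dcid):
    # Both endpoints compute these from the first packet's DCID alone
    initial = hkdf_extract(INITIAL_SALT_V1, dcid)
    client = hkdf_expand_label(initial, b"client in", b"", 32)
    server = hkdf_expand_label(initial, b"server in", b"", 32)
    return client, server
```

The DCID 0x8394c8f03e515708 from RFC 9001 Appendix A can be used to check this derivation against the published test vectors.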

Packet Types

Long Header Packets:

  • Initial (type 0x00)
  • 0-RTT (type 0x01)
  • Handshake (type 0x02)
  • Retry (type 0x03)

Short Header Packets:

  • 1-RTT (application data)

Frame Processing

Frames are processed in order within a packet. Key frame types:

Frame Type                     Description
PADDING (0x00)                 Padding for packet size
PING (0x01)                    Keep-alive
ACK (0x02-0x03)                Acknowledgment
CRYPTO (0x06)                  TLS handshake data
STREAM (0x08-0x0f)             Stream data
MAX_DATA (0x10)                Connection flow control
MAX_STREAM_DATA (0x11)         Stream flow control
NEW_CONNECTION_ID (0x18)       Issue new CID
RETIRE_CONNECTION_ID (0x19)    Retire old CID
PATH_CHALLENGE (0x1a)          Path validation
PATH_RESPONSE (0x1b)           Path validation response
CONNECTION_CLOSE (0x1c-0x1d)   Close connection
DATAGRAM (0x30-0x31)           Unreliable datagram (RFC 9221)
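Most integer fields in these frames (stream IDs, offsets, lengths, error codes) use the variable-length encoding of RFC 9000 Section 16, which quic_varint implements: the top two bits of the first byte select a 1-, 2-, 4-, or 8-byte encoding. A Python sketch of the scheme, for illustration:

```python
def encode_varint(value):
    # 2-bit prefix selects the length: 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8 bytes
    for prefix, nbytes in ((0x00, 1), (0x40, 2), (0x80, 4), (0xC0, 8)):
        if value < (1 << (8 * nbytes - 2)):
            encoded = bytearray(value.to_bytes(nbytes, "big"))
            encoded[0] |= prefix
            return bytes(encoded)
    raise ValueError("value exceeds 62 bits")

def decode_varint(data):
    # Returns (value, bytes consumed)
    nbytes = 1 << (data[0] >> 6)
    value = int.from_bytes(data[:nbytes], "big") & ((1 << (8 * nbytes - 2)) - 1)
    return value, nbytes
```

The examples in RFC 9000 Appendix A.1 (e.g. 0x7bbd decoding to 15293) exercise each of the four lengths.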

TLS Integration

Handshake Flow (Client)

Client                                      Server

Initial[ClientHello]           -------->

                               <--------   Initial[ServerHello]
                               <--------   Handshake[EncryptedExtensions]
                               <--------   Handshake[Certificate]
                               <--------   Handshake[CertificateVerify]
                               <--------   Handshake[Finished]

Handshake[Finished]            -------->

1-RTT[Application Data]        <------->   1-RTT[Application Data]

Key Derivation

Keys are derived using HKDF with the following hierarchy:

                  PSK (or 0)
                      |
                      v
                Early Secret
               /      |      \
              v       v       v
      client_early  binder_key  derived
      traffic_secret               |
                                   v
      (EC)DHE  ------------>  Handshake Secret
                             /      |      \
                            v       v       v
                    client_hs   server_hs   derived
                    traffic     traffic        |
                    secret      secret         v
                                        Master Secret
                                       /      |      \
                                      v       v       v
                            client_app  server_app  resumption
                            traffic     traffic     master
                            secret      secret      secret

Stream Management

Stream IDs

Stream IDs encode initiator and directionality:

Bits 1-0   Stream Type
0x00       Client-initiated, bidirectional
0x01       Server-initiated, bidirectional
0x02       Client-initiated, unidirectional
0x03       Server-initiated, unidirectional
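The two low bits can be tested and composed directly. A small Python sketch (the helper names are illustrative, not the quic_stream API):

```python
def stream_initiator(stream_id):
    # Bit 0: 0 = client-initiated, 1 = server-initiated
    return "server" if stream_id & 0x1 else "client"

def stream_is_unidirectional(stream_id):
    # Bit 1: 0 = bidirectional, 1 = unidirectional
    return bool(stream_id & 0x2)

def nth_stream_id(initiator, unidirectional, n):
    # The n-th stream of a given type (n starts at 0) is 4*n plus the type bits
    return 4 * n + (0x1 if initiator == "server" else 0x0) + (0x2 if unidirectional else 0x0)
```

For example, the first client-initiated bidirectional stream is ID 0 and the second server-initiated unidirectional stream is ID 7.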

Stream States

Sending side:

  • Ready → Send → Data Sent → Data Recvd (terminal)
  • Ready → Reset Sent → Reset Recvd (terminal)

Receiving side:

  • Recv → Size Known → Data Recvd → Data Read (terminal)
  • Recv → Reset Recvd → Reset Read (terminal)

Flow Control

Connection-Level

  • MAX_DATA frame advertises connection-level receive window
  • Sender tracks data_sent against peer's max_data
  • Receiver sends MAX_DATA updates when buffer space is freed

Stream-Level

  • MAX_STREAM_DATA frame advertises per-stream receive window
  • Similar tracking at stream granularity
  • DATA_BLOCKED and STREAM_DATA_BLOCKED frames signal when a sender is blocked by these limits
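Sender-side accounting at either level reduces to the same small window structure. A Python sketch under that assumption (the FlowWindow class and its method names are hypothetical, not the quic_flow API):

```python
class FlowWindow:
    """Sender-side view of one flow control limit (connection- or stream-level)."""

    def __init__(self, initial_max):
        self.max_data = initial_max  # peer's advertised limit (MAX_DATA / MAX_STREAM_DATA)
        self.sent = 0                # total bytes sent on this flow

    def available(self):
        return self.max_data - self.sent

    def send(self, nbytes):
        # Returns how many bytes may actually be sent; 0 means we are blocked
        allowed = min(nbytes, self.available())
        self.sent += allowed
        return allowed

    def update_max(self, new_max):
        # MAX_DATA / MAX_STREAM_DATA only ever raises the limit; ignore stale frames
        self.max_data = max(self.max_data, new_max)
```

A connection carries one such window for the connection total plus one per stream; a write must pass both.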

Congestion Control

The implementation supports pluggable congestion control algorithms:

Algorithms

Algorithm   Module          Description
NewReno     quic_cc         RFC 9002 default, simple and robust
CUBIC       quic_cc_cubic   RFC 9438, better high-bandwidth utilization
BBR         quic_cc_bbr     Bottleneck Bandwidth and RTT-based

All algorithms support HyStart++ (RFC 9406) for improved slow start exit.

States

  • Slow Start: Exponential growth until ssthresh (with HyStart++ exit)
  • Congestion Avoidance: Linear growth (NewReno), cubic function (CUBIC), or pacing-based (BBR)
  • Recovery: After packet loss detection

Key Variables

Variable          Description
cwnd              Congestion window (bytes)
ssthresh          Slow start threshold (bytes)
bytes_in_flight   Unacknowledged bytes in flight
pacing_rate       Send rate limit (BBR)

NewReno (Default)

On ACK:

if bytes_acked > 0:
    if cwnd < ssthresh:
        cwnd += bytes_acked  # Slow start
    else:
        cwnd += (bytes_acked * max_datagram_size) / cwnd  # Congestion avoidance

On Loss:

ssthresh = max(cwnd / 2, 2 * max_datagram_size)
cwnd = ssthresh
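The pseudocode above can be written as two runnable functions. A Python sketch (the function names and the 1200-byte default datagram size are illustrative, not the quic_cc API):

```python
def on_ack(cwnd, ssthresh, bytes_acked, max_datagram_size=1200):
    # Grow the window on each ACK, per RFC 9002 NewReno rules
    if cwnd < ssthresh:
        cwnd += bytes_acked                                # slow start: ~doubles per RTT
    else:
        cwnd += max_datagram_size * bytes_acked // cwnd    # congestion avoidance: ~1 MDS per RTT
    return cwnd

def on_loss(cwnd, max_datagram_size=1200):
    # Halve the window on loss, but never below 2 datagrams
    ssthresh = max(cwnd // 2, 2 * max_datagram_size)
    return ssthresh, ssthresh  # (new ssthresh, new cwnd)
```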

CUBIC (RFC 9438)

Uses a cubic function for window growth that is more aggressive in high-bandwidth networks:

W(t) = C * (t - K)^3 + W_max

Where K = cbrt(W_max * (1 - beta) / C), with C = 0.4 and beta = 0.7. W_max is the window size at the last congestion event; the curve starts at beta * W_max, flattens as it approaches W_max at time K, then probes beyond it.
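The curve's two anchor points follow directly from the formula: W(0) = beta * W_max and W(K) = W_max. A Python sketch with window sizes in MSS units (illustrative only; quic_cc_cubic holds the real state machine):

```python
def cubic_k(w_max, beta=0.7, c=0.4):
    # Time (seconds) for the cubic curve to climb back to W_max after a loss
    return (w_max * (1 - beta) / c) ** (1 / 3)

def cubic_window(t, w_max, beta=0.7, c=0.4):
    # Target congestion window t seconds after the congestion event
    k = cubic_k(w_max, beta, c)
    return c * (t - k) ** 3 + w_max
```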

BBR

BBR probes for maximum bandwidth and minimum RTT:

  1. Startup: Doubles sending rate each RTT until bandwidth plateaus
  2. Drain: Reduces inflight to match BDP
  3. ProbeBW: Cycles through pacing gain phases
  4. ProbeRTT: Periodically reduces cwnd to measure min RTT

Packet Pacing

All algorithms use packet pacing (RFC 9002 Section 7.7) to prevent bursts:

  • Packets are spaced based on pacing rate
  • Prevents buffer bloat at bottleneck links
  • Improves throughput on lossy networks
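The inter-packet spacing follows from RFC 9002 Section 7.7, which suggests spreading a full congestion window over slightly less than one smoothed RTT. A Python sketch (N = 1.25 is the RFC's suggested speed-up factor; names are illustrative):

```python
def pacing_interval(smoothed_rtt, packet_size, cwnd, n=1.25):
    # Seconds to wait between sends so that cwnd bytes take ~(1/N) * smoothed_rtt
    return smoothed_rtt * packet_size / (cwnd * n)
```

For example, with a 100 ms RTT, 1200-byte packets, and a 12000-byte window, unscaled pacing (n = 1.0) yields one packet every 10 ms.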

Loss Detection

Packet Loss Detection

Two mechanisms detect packet loss:

  1. Packet Threshold: A packet is declared lost if a later packet in the same packet number space has been acknowledged and the packet trails the largest acknowledged packet number by at least the reordering threshold (default: 3).

  2. Time Threshold: A packet is declared lost if it was sent more than 9/8 * max(smoothed_rtt, latest_rtt) before an acknowledged packet, subject to a 1 ms minimum timer granularity.
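Both checks can be combined into a single scan over unacknowledged packets. A simplified Python sketch (loss_delay stands for the time threshold computed above; the data structures are illustrative, not the quic_loss internals):

```python
K_PACKET_THRESHOLD = 3  # RFC 9002 default reordering threshold

def detect_lost(sent_packets, largest_acked, loss_delay, now):
    """sent_packets: {packet_number: send_time} of unacknowledged packets."""
    lost = []
    for pn, sent_time in sent_packets.items():
        if pn > largest_acked:
            continue  # nothing newer acknowledged; cannot be declared lost yet
        if largest_acked - pn >= K_PACKET_THRESHOLD or now - sent_time > loss_delay:
            lost.append(pn)
    return sorted(lost)
```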

Probe Timeout (PTO)

When no ACKs are received, PTO triggers retransmission:

PTO = smoothed_rtt + max(4 * rtt_var, 1ms) + max_ack_delay

After PTO expires:

  1. Send 1-2 ack-eliciting packets
  2. Double PTO for next timeout
  3. After persistent congestion, reset to slow start
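The PTO computation and its backoff are simple arithmetic. A Python sketch (times in seconds; names are illustrative):

```python
K_GRANULARITY = 0.001  # 1 ms minimum timer granularity (RFC 9002)

def pto(smoothed_rtt, rtt_var, max_ack_delay, pto_count=0):
    # Base PTO plus exponential backoff for each consecutive expiry
    base = smoothed_rtt + max(4 * rtt_var, K_GRANULARITY) + max_ack_delay
    return base * (2 ** pto_count)
```

With a 100 ms smoothed RTT, 10 ms RTT variance, and 25 ms max_ack_delay, the first PTO fires at 165 ms and doubles on each consecutive expiry.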

Connection Migration

Path Validation

When the peer's address changes:

  1. Send PATH_CHALLENGE frame with random 8-byte data
  2. Peer responds with PATH_RESPONSE containing same data
  3. Path is validated upon receiving correct response
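The challenge/response exchange reduces to generating and comparing 8 random bytes. A Python sketch, for illustration only:

```python
import os

def make_path_challenge():
    # PATH_CHALLENGE carries 8 unpredictable bytes
    return os.urandom(8)

def validate_path_response(challenge, response):
    # The path is validated only if the peer echoed the data exactly
    return response == challenge
```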

Migration Process

  1. Detect new remote address
  2. Initiate path validation
  3. Continue using old path until validation completes
  4. Switch to new path upon successful validation
  5. Reset congestion controller for new path

Key Update

RFC 9001 Section 6 defines the key update mechanism:

  1. Sender flips the key phase bit in the packet header
  2. Derives new keys from the current application traffic secrets
  3. Receiver detects the phase change and derives matching keys
  4. Old keys are retained briefly to decrypt reordered packets

The next generation of application secrets is derived as:

next_app_secret = HKDF-Expand-Label(current_app_secret, "quic ku", "", hash_len)

Connection ID Management

Local CID Pool

  • Generate and issue CIDs via NEW_CONNECTION_ID frame
  • Track sequence numbers and stateless reset tokens
  • Retire old CIDs via RETIRE_CONNECTION_ID

Peer CID Pool

  • Store CIDs received from peer
  • Select appropriate CID for path
  • Respect active_connection_id_limit transport parameter

Session Resumption and 0-RTT

Session Tickets

After a successful handshake, the server may issue a NewSessionTicket:

  1. Server sends ticket after handshake completion
  2. Client receives via {session_ticket, Ticket} message
  3. Client stores ticket for future connections
  4. Ticket contains PSK identity, resumption secret, and max_early_data

Resumption Flow

Client                                      Server

Initial[ClientHello+PSK]       -------->
0-RTT[Early Data]              -------->   (optional)

                               <--------   Initial[ServerHello+PSK]
                               <--------   Handshake[EncryptedExtensions]
                               <--------   Handshake[Finished]

Handshake[Finished]            -------->

1-RTT[Application Data]        <------->   1-RTT[Application Data]

0-RTT Early Data

When resuming with a stored ticket:

  1. Client derives early_keys from PSK
  2. Client sends 0-RTT packets (type 0x01) immediately
  3. Server validates PSK and derives matching keys
  4. Server processes early data or rejects with early_data_rejected
  5. Client falls back to 1-RTT if early data rejected

Limitations:

  • 0-RTT data is not forward-secret
  • Max early data size is limited by ticket's max_early_data_size
  • Replay protection is application-layer responsibility

DATAGRAM Extension (RFC 9221)

Overview

DATAGRAM frames provide unreliable message delivery:

  • Not retransmitted on loss
  • No flow control (use connection-level limits)
  • Useful for latency-sensitive data (gaming, real-time media)

Transport Parameter Negotiation

Datagram support is negotiated via the max_datagram_frame_size transport parameter:

Value         Meaning
0 or absent   Datagrams not supported (default)
1-65535       Maximum DATAGRAM frame size accepted (bytes)

Both endpoints must advertise support for datagrams to work. The effective maximum size is the minimum of what the peer advertises and what fits in a QUIC packet (path MTU minus overhead).
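The effective limit can be computed as described. A Python sketch; the 39-byte per-packet overhead is a rough assumption (short header plus AEAD tag), not a value taken from the implementation:

```python
def effective_datagram_limit(peer_max_frame_size, path_mtu, packet_overhead=39):
    # 0 (or an absent transport parameter) means the peer accepts no DATAGRAM frames
    if peer_max_frame_size == 0:
        return 0
    # Otherwise: the smaller of what the peer accepts and what fits in one packet
    return min(peer_max_frame_size, path_mtu - packet_overhead)
```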

Configuration

%% Enable datagram support (client)
quic:connect(Host, Port, #{
    max_datagram_frame_size => 65535  % Recommended: accept any size
}, Owner).

%% Enable datagram support (server)
quic:start_server(my_server, Port, #{
    max_datagram_frame_size => 65535,
    cert => Cert,
    key => Key
}).

API

%% Check if datagrams are supported and get max size
MaxSize = quic:datagram_max_size(ConnRef).
%% Returns 0 if datagrams not supported, otherwise peer's max size

%% Send a datagram (unreliable)
ok = quic:send_datagram(ConnRef, Data).
%% Returns {error, datagrams_not_supported} if peer didn't advertise
%% Returns {error, datagram_too_large} if Data exceeds peer's limit
%% Returns {error, congestion_limited} if cwnd is full (datagram dropped)

%% Receive datagrams (owner process)
receive
    {quic, ConnRef, {datagram, Data}} -> handle_datagram(Data)
end.

Error Handling

Error                              Cause
{error, datagrams_not_supported}   Peer didn't advertise max_datagram_frame_size
{error, datagram_too_large}        Data exceeds peer's advertised limit
{error, congestion_limited}        Congestion window full (datagram dropped)

Protocol Violations

Per RFC 9221, the following result in PROTOCOL_VIOLATION connection close:

  • Receiving a DATAGRAM frame when we didn't advertise support
  • Receiving a DATAGRAM frame larger than our advertised limit

Active Migration

Triggering Migration

The quic:migrate/1 API triggers active connection migration:

  1. Application calls quic:migrate(ConnRef)
  2. Connection rebinds to new local socket
  3. PATH_CHALLENGE sent to peer on new path
  4. On PATH_RESPONSE: migration complete
  5. Congestion controller reset for new path

Use Cases

  • Network handover (WiFi to cellular)
  • NAT rebinding recovery
  • Load balancing