distribute/retry
Retry policy with exponential backoff and jitter.
This module provides type-safe retry logic for distributed operations. It implements exponential backoff with a delay cap and configurable jitter for handling transient failures in distributed systems.
Jitter Strategies
Jitter is crucial to prevent the “thundering herd” problem where many clients retry simultaneously after a failure. This module supports multiple jitter strategies as recommended by AWS and Google Cloud.
- NoJitter: Deterministic exponential backoff (not recommended)
- FullJitter: random(0, delay) - Best for reducing contention
- EqualJitter: delay/2 + random(0, delay/2) - Balanced approach
- DecorrelatedJitter: random(base, prev_delay * 3) - Good for APIs
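The three randomized formulas above can be sketched in a few lines. This is an illustration only, not this module's implementation: the helper `random_between` and the function names are hypothetical, and the sketch assumes a recent gleam_stdlib in which `int.random` takes a single exclusive upper bound, so every range here is half-open.

```gleam
import gleam/int

/// Hypothetical helper: a random integer in [low, high).
/// Falls back to `low` when the range is empty.
fn random_between(low: Int, high: Int) -> Int {
  case high > low {
    True -> low + int.random(high - low)
    False -> low
  }
}

/// Full jitter: anywhere between zero and the capped delay.
pub fn full_jitter(delay: Int) -> Int {
  random_between(0, delay)
}

/// Equal jitter: keep half of the delay, randomize the other half.
pub fn equal_jitter(delay: Int) -> Int {
  let half = delay / 2
  half + random_between(0, half)
}

/// Decorrelated jitter: draw between the base delay and three times
/// the previous delay.
pub fn decorrelated_jitter(base: Int, previous_delay: Int) -> Int {
  random_between(base, previous_delay * 3)
}
```

Equal jitter never drops below half of the capped delay, whereas full jitter can return a delay close to zero.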
References
- AWS: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
- Google Cloud: https://cloud.google.com/storage/docs/retry-strategy
Example
import distribute/retry
// Create a policy with full jitter (recommended)
let policy = retry.default_with_jitter()
// Calculate delay for attempt 3
let delay = retry.calculate_delay(policy, 3)
// delay will be random(0, min(5000, 100 * 2^2)) = random(0, 400)
Types
Result of a delay calculation, includes metadata for observability.
pub type DelayResult {
DelayResult(
delay_ms: Int,
base_delay_ms: Int,
attempt: Int,
is_final_attempt: Bool,
)
}
Constructors
- DelayResult(delay_ms: Int, base_delay_ms: Int, attempt: Int, is_final_attempt: Bool)
Arguments
- delay_ms: The actual delay to use (in milliseconds)
- base_delay_ms: The base delay before jitter was applied
- attempt: Current attempt number (1-indexed)
- is_final_attempt: Whether this is the last attempt
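As a sketch of how these fields might feed a log line, the snippet below reads each field from a DelayResult. It uses only functions documented on this page; the message format is an arbitrary choice.

```gleam
import distribute/retry
import gleam/int
import gleam/io

pub fn main() -> Nil {
  let policy = retry.default_with_jitter()
  let result = retry.calculate_delay(policy, 2)

  // Report both the jittered delay and the base delay it was derived from.
  io.println(
    "attempt "
    <> int.to_string(result.attempt)
    <> ": sleeping "
    <> int.to_string(result.delay_ms)
    <> "ms (base "
    <> int.to_string(result.base_delay_ms)
    <> "ms)",
  )
}
```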
Jitter strategy for randomizing retry delays.
Jitter helps prevent the “thundering herd” problem where many clients retry simultaneously after a shared failure.
pub type JitterStrategy {
NoJitter
FullJitter
EqualJitter
DecorrelatedJitter
}
Constructors
- NoJitter
  No jitter - deterministic delays. NOT recommended for production. Use only for testing or when deterministic behavior is required.
- FullJitter
  Full jitter: random(0, calculated_delay). Recommended for most use cases. Provides maximum spread of retries. Reference: AWS Architecture Blog
- EqualJitter
  Equal jitter: calculated_delay/2 + random(0, calculated_delay/2). Guarantees minimum delay while still adding randomness. Good when you need both progress guarantee and jitter.
- DecorrelatedJitter
  Decorrelated jitter: random(base_delay, previous_delay * 3). Each delay is based on the previous, creating natural variation. Good for API rate limiting scenarios.
Retry policy configuration.
Controls how failed operations are retried with exponential backoff and optional jitter for distributed systems.
Fields
- max_attempts: Maximum number of attempts (1 = no retry)
- base_delay_ms: Initial delay in milliseconds before jitter
- max_delay_ms: Maximum delay to prevent runaway waits
- multiplier: Backoff multiplier (typically 2.0 for exponential)
- jitter: Jitter strategy for randomizing delays
Example
// Custom policy for critical operations
let policy = RetryPolicy(
max_attempts: 5,
base_delay_ms: 50,
max_delay_ms: 10_000,
multiplier: 2.0,
jitter: FullJitter,
)
pub type RetryPolicy {
RetryPolicy(
max_attempts: Int,
base_delay_ms: Int,
max_delay_ms: Int,
multiplier: Float,
jitter: JitterStrategy,
)
}
Constructors
- RetryPolicy(max_attempts: Int, base_delay_ms: Int, max_delay_ms: Int, multiplier: Float, jitter: JitterStrategy)
Values
pub fn aggressive() -> RetryPolicy
Aggressive retry policy for critical operations.
More retries with shorter base delays:
- 5 attempts total
- 50ms base delay
- 3000ms max delay
- Full jitter enabled
pub fn calculate_delay(
policy: RetryPolicy,
attempt: Int,
) -> DelayResult
Calculate the delay for a given attempt number.
Returns a DelayResult containing the delay in milliseconds and metadata.
The attempt number is 1-indexed: attempt 1 yields the delay to wait before the first retry.
Algorithm
- Calculate base exponential delay: base_delay * (multiplier ^ (attempt - 1))
- Cap at max_delay_ms
- Apply jitter strategy
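The first two steps can be sketched as a standalone function. This is not the module's source: `capped_delay` is a hypothetical name, and rounding to the nearest millisecond is an assumption the page does not specify.

```gleam
import gleam/float
import gleam/int

/// Hypothetical helper: exponential delay for a 1-indexed attempt,
/// capped at max_delay_ms. Jitter would be applied to the result.
pub fn capped_delay(
  base_delay_ms: Int,
  max_delay_ms: Int,
  multiplier: Float,
  attempt: Int,
) -> Int {
  let exponent = int.to_float(attempt - 1)
  // float.power returns an Error for undefined cases (for example a
  // negative base with a fractional exponent); fall back to a factor of 1.0.
  let factor = case float.power(multiplier, exponent) {
    Ok(value) -> value
    Error(Nil) -> 1.0
  }
  let raw = float.round(int.to_float(base_delay_ms) *. factor)
  int.min(raw, max_delay_ms)
}
```

With the default settings listed on this page (100ms base, 5000ms cap, multiplier 2.0), attempt 3 gives 400 and attempt 7 is capped at 5000.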
Example
let policy = retry.default_with_jitter()
let result = retry.calculate_delay(policy, 1)
// result.delay_ms will be random(0, 100)
process.sleep(result.delay_ms)
pub fn conservative() -> RetryPolicy
Conservative retry policy for non-critical operations.
Fewer retries with longer delays:
- 2 attempts total
- 500ms base delay
- 10000ms max delay
- Full jitter enabled
pub fn default() -> RetryPolicy
Default retry policy without jitter.
Conservative settings suitable for general use:
- 3 attempts total
- 100ms base delay
- 5000ms max delay
- 2.0 multiplier (exponential)
- No jitter
For distributed systems, prefer default_with_jitter().
pub fn default_with_jitter() -> RetryPolicy
Default retry policy with full jitter (RECOMMENDED).
Same as default() but with FullJitter enabled.
This is the recommended policy for distributed systems to prevent
thundering herd problems.
Delay progression (example, actual values are randomized):
- Attempt 1: random(0, 100ms)
- Attempt 2: random(0, 200ms)
- Attempt 3: random(0, 400ms)
pub fn delay_ms(policy: RetryPolicy, attempt: Int) -> Int
Get just the delay value in milliseconds.
Convenience function when you don’t need the full DelayResult.
let delay = retry.delay_ms(policy, attempt)
process.sleep(delay)
pub fn is_final_attempt(
policy: RetryPolicy,
attempt: Int,
) -> Bool
Check if this is the final attempt.
case retry.is_final_attempt(policy, attempt) {
True -> log.error("Final attempt, no more retries")
False -> log.warn("Retrying...")
}
pub fn jitter_to_string(jitter: JitterStrategy) -> String
Convert jitter strategy to string for logging.
pub fn no_retry() -> RetryPolicy
No retry policy (single attempt only).
Use when retry is handled at a different layer or for operations that should not be retried (e.g., non-idempotent operations).
pub fn policy_to_string(policy: RetryPolicy) -> String
Convert retry policy to a loggable string representation.
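A minimal logging sketch using both string conversions. The exact text these functions return is not documented here, so the example only prints it.

```gleam
import distribute/retry
import gleam/io

pub fn main() -> Nil {
  let policy = retry.aggressive()

  // Log the whole policy once, for example at startup.
  io.println("retry policy: " <> retry.policy_to_string(policy))

  // Log only the jitter strategy.
  io.println("jitter: " <> retry.jitter_to_string(policy.jitter))
}
```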
pub fn should_retry(policy: RetryPolicy, attempt: Int) -> Bool
Check if we should retry after this attempt.
Returns True if attempt < max_attempts.
case retry.should_retry(policy, attempt) {
True -> {
process.sleep(retry.delay_ms(policy, attempt))
try_operation(attempt + 1)
}
False -> Error(MaxRetriesExceeded)
}
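The fragment above can be grown into a complete retry loop. The following is one possible sketch rather than part of this module: `with_retry` and `do_retry` are hypothetical names, and `process.sleep` is assumed to come from `gleam/erlang/process` in the gleam_erlang package.

```gleam
import distribute/retry
import gleam/erlang/process

/// Hypothetical wrapper: run `operation` until it succeeds or the
/// policy says to stop, sleeping between attempts.
pub fn with_retry(
  policy: retry.RetryPolicy,
  operation: fn() -> Result(a, e),
) -> Result(a, e) {
  do_retry(policy, operation, 1)
}

fn do_retry(
  policy: retry.RetryPolicy,
  operation: fn() -> Result(a, e),
  attempt: Int,
) -> Result(a, e) {
  case operation() {
    Ok(value) -> Ok(value)
    Error(reason) ->
      case retry.should_retry(policy, attempt) {
        True -> {
          // Wait for the backoff delay, then try again.
          process.sleep(retry.delay_ms(policy, attempt))
          do_retry(policy, operation, attempt + 1)
        }
        // Out of attempts: surface the last error to the caller.
        False -> Error(reason)
      }
  }
}
```

Returning the last error instead of a dedicated `MaxRetriesExceeded` value is a design choice of this sketch; it keeps the wrapper generic over the operation's error type.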
pub fn total_attempts(policy: RetryPolicy) -> Int
Get the total number of attempts that will be made.
let total = retry.total_attempts(policy)
log.info("Will try up to " <> int.to_string(total) <> " times")
pub fn with_base_delay_ms(
policy: RetryPolicy,
delay_ms: Int,
) -> RetryPolicy
Set the base delay in milliseconds.
retry.default()
|> retry.with_base_delay_ms(200)
pub fn with_full_jitter(policy: RetryPolicy) -> RetryPolicy
Enable full jitter (convenience function).
retry.default()
|> retry.with_full_jitter()
pub fn with_jitter(
policy: RetryPolicy,
jitter: JitterStrategy,
) -> RetryPolicy
Set the jitter strategy.
retry.default()
|> retry.with_jitter(retry.FullJitter)
pub fn with_max_attempts(
policy: RetryPolicy,
attempts: Int,
) -> RetryPolicy
Set the maximum number of attempts. The first try counts, so 1 means no retry.
retry.default()
|> retry.with_max_attempts(5)
pub fn with_max_delay_ms(
policy: RetryPolicy,
delay_ms: Int,
) -> RetryPolicy
Set the maximum delay cap in milliseconds.
retry.default()
|> retry.with_max_delay_ms(10_000)
pub fn with_multiplier(
policy: RetryPolicy,
multiplier: Float,
) -> RetryPolicy
Set the backoff multiplier.
retry.default()
|> retry.with_multiplier(1.5) // Slower growth
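Because each builder takes a policy and returns a policy, they can be chained in a single pipeline. The function name `upload_policy` and the specific values below are illustrative only.

```gleam
import distribute/retry

/// Hypothetical policy for a slow upstream service.
pub fn upload_policy() -> retry.RetryPolicy {
  retry.default()
  |> retry.with_max_attempts(4)
  |> retry.with_base_delay_ms(250)
  |> retry.with_max_delay_ms(8000)
  |> retry.with_multiplier(1.5)
  |> retry.with_jitter(retry.EqualJitter)
}
```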