ETS-based state management for rate limiting.
Tracks per-model/location/metric state including:
- retry_until timestamps derived from 429 RetryInfo
- Token usage sliding windows for budget estimation
- Concurrency permits for gating
State is keyed by {model, location, metric} tuples for fine-grained tracking.
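As a rough illustration of tuple-keyed state in ETS, the sketch below stores and looks up one entry. The table name, the key values, and the stored map are placeholders chosen for this example; the module's real table layout is not shown on this page.

```elixir
# Hypothetical table; not the table this module actually creates.
table = :ets.new(:rate_limit_state_sketch, [:set, :public])

# Placeholder {model, location, metric} values for illustration only.
key = {"model-a", "location-1", :token_count}
:ets.insert(table, {key, %{retry_until: nil}})

# A lookup matches only the exact tuple, so each combination is tracked separately.
[{^key, state}] = :ets.lookup(table, key)
IO.inspect(state)
```

Because the whole tuple is the key, a 429 on one metric does not touch the entries for other models, locations, or metrics.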
Types
@type reservation_ctx() :: %{
  reserved_tokens: non_neg_integer(),
  estimated_tokens: non_neg_integer(),
  window_start: DateTime.t() | nil,
  window_end: DateTime.t() | nil,
  budget: non_neg_integer() | nil
}
@type retry_state() :: %{
  retry_until: DateTime.t() | nil,
  quota_metric: String.t() | nil,
  quota_id: String.t() | nil,
  quota_dimensions: map() | nil,
  quota_value: term() | nil,
  last_429_at: DateTime.t() | nil
}
@type usage_window() :: %{
  input_tokens: non_neg_integer(),
  output_tokens: non_neg_integer(),
  reserved_tokens: non_neg_integer(),
  window_start: DateTime.t(),
  window_duration_ms: pos_integer()
}
Functions
Build a state key from model, location, and metric.
@spec clear_retry_state(state_key()) :: :ok
Clear the retry state for a key (called after successful request).
@spec get_current_usage(state_key()) :: usage_window() | nil
Get current usage within the sliding window.
@spec get_retry_state(state_key()) :: retry_state() | nil
Get the current retry state details for a key.
@spec get_retry_until(state_key()) :: DateTime.t() | nil
Get the current retry_until timestamp for a given key.
Returns nil if no retry is needed or the timestamp has passed.
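The "nil once the timestamp has passed" behavior can be pictured with a small pure function. This is a sketch only: `RetryUntilSketch` and `active_retry_until/2` are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```elixir
defmodule RetryUntilSketch do
  # Hypothetical helper: returns the stored timestamp only while it is still in the future.
  def active_retry_until(nil, _now), do: nil

  def active_retry_until(%DateTime{} = retry_until, %DateTime{} = now) do
    if DateTime.compare(retry_until, now) == :gt, do: retry_until, else: nil
  end
end
```

A caller would treat a non-nil result as "wait until this time" and nil as "safe to send".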
@spec init() :: :ok
Initialize the ETS table for state storage.
Called automatically when the RateLimitManager starts. The table is also created lazily on first access, so direct calls work without the supervisor running.
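One common way to get this kind of lazy initialization is to check for the named table before each access and create it if it is missing. The sketch below assumes that approach; `LazyTableSketch` and its table name are hypothetical, and the real module may do this differently.

```elixir
defmodule LazyTableSketch do
  # Hypothetical table name used only in this sketch.
  @table :lazy_table_sketch

  def ensure_table do
    case :ets.whereis(@table) do
      :undefined ->
        try do
          :ets.new(@table, [:named_table, :public, :set])
          :ok
        rescue
          # Another process created the table between the check and the call.
          ArgumentError -> :ok
        end

      _ref ->
        :ok
    end
  end
end
```

Note that an ETS table is owned by the process that creates it and is deleted when that process exits, which is one reason to let a long-lived supervised process create it first when one is available.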
@spec reconcile_reservation(state_key(), reservation_ctx(), map() | nil, keyword()) :: usage_window()
Reconcile a reservation with actual usage, returning surplus or charging shortfall.
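The arithmetic behind reconciliation can be sketched on a plain `usage_window()` map. This assumes that reconciling means dropping the reservation and recording the actual counts in its place; `ReconcileSketch.reconcile/4` is a hypothetical helper, not this module's function, and it takes the actual counts directly rather than a usage map.

```elixir
defmodule ReconcileSketch do
  # Hypothetical helper: swap a reservation for the tokens actually consumed.
  def reconcile(window, reserved, actual_input, actual_output) do
    %{
      window
      | # Release the reserved amount, never going below zero.
        reserved_tokens: max(window.reserved_tokens - reserved, 0),
        input_tokens: window.input_tokens + actual_input,
        output_tokens: window.output_tokens + actual_output
    }
  end
end
```

If the request used fewer tokens than reserved, the window's committed total drops (the surplus is returned); if it used more, the total rises (the shortfall is charged).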
@spec record_usage(state_key(), non_neg_integer(), non_neg_integer(), keyword()) :: :ok
Record token usage in the sliding window.
Parameters
- key - State key tuple
- input_tokens - Number of input tokens used
- output_tokens - Number of output tokens used
- opts - Options including:
  - :window_duration_ms - Custom window duration (default: 60_000)
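To make the window bookkeeping concrete, here is a simplified sketch that accumulates usage in a `usage_window()`-shaped map and starts a fresh window once the old one has expired. It is an assumption-laden simplification: `UsageWindowSketch.record/5` is hypothetical, it takes the current time as an argument, and it uses plain window rollover, which may differ from how the module actually slides its window.

```elixir
defmodule UsageWindowSketch do
  @default_window_ms 60_000

  # Hypothetical helper: add usage to the window, rolling over when it has expired.
  def record(window, input_tokens, output_tokens, now, opts \\ []) do
    duration = Keyword.get(opts, :window_duration_ms, @default_window_ms)
    window = if expired?(window, now), do: fresh(now, duration), else: window

    %{
      window
      | input_tokens: window.input_tokens + input_tokens,
        output_tokens: window.output_tokens + output_tokens
    }
  end

  defp expired?(nil, _now), do: true

  defp expired?(window, now) do
    DateTime.diff(now, window.window_start, :millisecond) >= window.window_duration_ms
  end

  defp fresh(now, duration) do
    %{
      input_tokens: 0,
      output_tokens: 0,
      reserved_tokens: 0,
      window_start: now,
      window_duration_ms: duration
    }
  end
end
```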
@spec release_reservation(state_key(), reservation_ctx(), keyword()) :: usage_window()
Remove a reservation without adding usage (e.g., when the request never executed).
@spec reset_all() :: :ok
Reset all state (useful for testing).
Update the retry_until state from a 429 response with RetryInfo.
Parameters
- key - State key tuple
- retry_info - Map containing retry delay and quota information
RetryInfo format from Gemini API
%{
"retryDelay" => "60s",
"quotaMetric" => "...",
"quotaId" => "...",
"quotaDimensions" => %{...}
}
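A delay string such as "60s" has to be turned into an absolute timestamp before it can be stored as retry_until. The sketch below shows one way to do that, assuming the delay is always a number of seconds followed by `s`; `RetryInfoSketch.retry_until/2` is a hypothetical helper, and the module's own parsing is not shown on this page.

```elixir
defmodule RetryInfoSketch do
  # Hypothetical helper: convert a "retryDelay" string into an absolute DateTime.
  def retry_until(%{"retryDelay" => delay}, %DateTime{} = now) when is_binary(delay) do
    case Float.parse(delay) do
      # Accepts whole and fractional seconds, e.g. "60s" or "1.5s".
      {seconds, "s"} -> {:ok, DateTime.add(now, round(seconds * 1000), :millisecond)}
      _ -> :error
    end
  end

  def retry_until(_retry_info, _now), do: :error
end
```

Passing the current time in explicitly keeps the function pure; a caller would typically supply `DateTime.utc_now()`.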
@spec try_reserve_budget(state_key(), non_neg_integer(), non_neg_integer() | nil, keyword()) ::
  {:ok, reservation_ctx()} | {:error, {:over_budget, map()}}
Atomically reserve tokens in the current window.
Returns {:ok, reservation_ctx} when the reservation fits, or
{:error, {:over_budget, details}} when it would exceed the configured budget.
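The budget check itself can be sketched as a pure function over a `usage_window()`-shaped map. Several things here are assumptions: `ReserveSketch.try_reserve/3` is hypothetical, a nil budget is treated as "no limit", the success value is the updated window rather than a `reservation_ctx()`, and the keys in the error details map are invented for illustration. The sketch also does nothing to make the check-and-update atomic, which the real function is documented to do.

```elixir
defmodule ReserveSketch do
  # Tokens already committed in the window: recorded usage plus open reservations.
  defp committed(window) do
    window.input_tokens + window.output_tokens + window.reserved_tokens
  end

  # Hypothetical helper: reserve tokens only if they fit within the budget.
  def try_reserve(window, estimated_tokens, budget) do
    used = committed(window)

    if is_nil(budget) or used + estimated_tokens <= budget do
      {:ok, %{window | reserved_tokens: window.reserved_tokens + estimated_tokens}}
    else
      {:error, {:over_budget, %{budget: budget, committed: used, requested: estimated_tokens}}}
    end
  end
end
```

Counting open reservations as committed is what stops several concurrent requests from each seeing the same free capacity and collectively overshooting the budget.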