Nostrum.Api.Ratelimiter (Nostrum v0.9.0)

Handles REST calls to the Discord API while respecting ratelimits.

Purpose

Discord's API returns information about ratelimits that we must respect. This module performs serialization of these requests through a single process, thus preventing concurrency issues from arising if two processes make a remote API call at the same time.

Internal module

This module is intended for exclusive use inside of nostrum; it is documented for completeness and for people curious to look behind the covers.

Asynchronous requests

The ratelimiter is fully asynchronous internally. In theory, it also supports queueing requests in an asynchronous manner. However, support for this is currently not implemented in Nostrum.Api.

If you want to make one or multiple asynchronous requests manually, you can use the following pattern:

req = :gen_statem.send_request(Nostrum.Api.Ratelimiter, {:queue, request})
# ...
response = :gen_statem.receive_response(req, timeout)

where request is a map describing the request to run - see Nostrum.Api for more information. You can also send multiple requests at the same time and wait for their responses: see :gen_statem.reqids_add/3 and :gen_statem.wait_response/3 for more information.
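
For example, to queue two requests concurrently and collect both responses, a request ID collection can be used. The following is a sketch; request_a and request_b are assumed to be request maps as described above:

reqids = :gen_statem.reqids_new()

req_a = :gen_statem.send_request(Nostrum.Api.Ratelimiter, {:queue, request_a})
reqids = :gen_statem.reqids_add(req_a, :request_a, reqids)

req_b = :gen_statem.send_request(Nostrum.Api.Ratelimiter, {:queue, request_b})
reqids = :gen_statem.reqids_add(req_b, :request_b, reqids)

# Wait for both replies, in whichever order they arrive. The final
# argument deletes each completed request from the collection.
{{:reply, first_response}, first_label, reqids} =
  :gen_statem.wait_response(reqids, 5_000, true)

{{:reply, second_response}, second_label, _reqids} =
  :gen_statem.wait_response(reqids, 5_000, true)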

Multi-node

If a single global process is desired to handle all ratelimiting, the ratelimiter can theoretically be adjusted to start registered via :global. In practice, it may be more beneficial to run a local ratelimiter process on each node and either use the local one for any API calls, or use a consistent hashing mechanism to distribute API requests across the cluster as needed. Do note that the API enforces a global user ratelimit across all requests. With a single process, the ratelimiter can track this without hitting 429s at all; with multiple ratelimiters, the built-in requeue functionality may or may not help.

Inner workings

When a client process wants to perform some request on the Discord API, it sends a request to the :gen_statem behind this module to ask it to :queue the incoming request.

Connection setup

If the state machine is not connected to the HTTP endpoint, it will transition to the :connecting state and try to open the connection. If this succeeds, it transitions to the :connected state.
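
For illustration, the connection step amounts to something like the following :gun calls (a simplified sketch; the options shown are assumptions, not nostrum's exact configuration):

{:ok, conn} = :gun.open(~c"discord.com", 443, %{protocols: [:http2]})
{:ok, :http2} = :gun.await_up(conn)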

Queueing requests

The state machine associates a :queue.queue/1 of queued_request/0 with each individual bucket, together with an internal count of remaining calls. When queueing requests, the following cases occur:

  • If there are no remaining calls in the bot's global ratelimit bucket or there are no remaining calls in the bucket, the request is put into the bucket's queue.

  • If there is an :initial running request to the bucket, the request is put into the bucket's queue.

  • If there are more than 0 remaining calls on both the request-specific bucket and the global bucket, the request is started right away. This allows nostrum to dispatch multiple requests to the same endpoint as soon as possible as long as calls remain.

  • If no ratelimit information is known for the bucket and there are remaining calls on the global bucket, the request is sent out as the "pioneer" request that will retrieve how many calls we have for this bucket (:initial, see above).

  • If none of the above applies, a new queue is created and the pending request is marked as the :initial request. It will be run as soon as the bot's global limit expires (see the sketch below).
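
The decision logic above can be sketched as follows. This is an illustrative sketch in the shape of the real module, not nostrum's literal implementation: the helpers enqueue/4, start_request/4, start_pioneer/4, and create_initial_queue/4 are hypothetical, and data is the ratelimiter state described under state/0.

defp on_queue(request, from, data) do
  bucket = get_endpoint(request.route, to_string(request.method))
  globally_free? = data.remaining_in_window > 0

  case {Map.get(data.outstanding, bucket), globally_free?} do
    # A pioneer request is still in flight: wait for its headers.
    {{:initial, queue}, _} ->
      enqueue(bucket, queue, {request, from}, data)

    # The bucket is exhausted: queue the request.
    {{0, queue}, _} ->
      enqueue(bucket, queue, {request, from}, data)

    # The bot's global limit is exhausted: queue the request.
    {{_remaining, queue}, false} ->
      enqueue(bucket, queue, {request, from}, data)

    # Calls remain on both the bucket and the global limit: start it now.
    {{remaining, _queue}, true} when remaining > 0 ->
      start_request(bucket, request, from, data)

    # Unknown bucket and global calls remain: send the pioneer request.
    {nil, true} ->
      start_pioneer(bucket, request, from, data)

    # Unknown bucket and the global limit is hit: create a fresh queue
    # with this request marked :initial; it runs once the window resets.
    {nil, false} ->
      create_initial_queue(bucket, request, from, data)
  end
end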

The request starting function, :next, will start new requests from the queue as long as more calls are possible in the timeframe. Any requests are then started asynchronously. Bookkeeping is set up to associate the resulting :gun.stream_ref/0 with the original client along with its request and the ratelimiter bucket.

Results from the HTTP connection are delivered in a non-blocking manner: simple responses with only a status code and no body (code 204) are sent in a single message, while other responses are delivered to us incrementally. To deliver the full response body to the client with the final package, an internal buffer of the body is kept. A possible future optimization would be a way for :gun to send the ratelimiter state machine only the initial :gun_response and to forward each part of the body directly to the client.
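
As an illustration, the buffering in the :connected state handles :gun messages roughly like this (a hypothetical sketch, not the literal handler; the final :fin part would additionally look up the client in :running and reply):

def connected(:info, {:gun_response, _conn, stream, :nofin, status, headers}, data) do
  # The response has a body: record status and headers and start an
  # empty buffer for the incoming body parts.
  {:keep_state, put_in(data, [:inflight, stream], {status, headers, ""})}
end

def connected(:info, {:gun_data, _conn, stream, :nofin, chunk}, data) do
  # A body part arrived: append it to the buffer.
  {:keep_state,
   update_in(data, [:inflight, stream], fn {status, headers, buffer} ->
     {status, headers, buffer <> chunk}
   end)}
end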

When the headers for a request have been received, the ratelimiter parses the ratelimit information and starts off an internal timer expiring when the ratelimits expire. It will also reschedule calls with the :next internal event for as many remaining calls as it knows about. Once the timer expires for the current bucket, two cases can happen:

  • The queue has items: Schedule all items and repeat this later.

  • The queue is empty: Delete the queue and remaining calls from the outstanding buckets.

In practice, this means that we never store more information than we need, and it removes the regular bucket sweeping functionality that the previous ratelimit buckets required.
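
Sketched as a generic timeout handler (a hypothetical shape; the event names are illustrative):

def connected({:timeout, bucket}, :expired, data) do
  {_remaining, queue} = Map.fetch!(data.outstanding, bucket)

  if :queue.is_empty(queue) do
    # Nothing is waiting: forget the bucket entirely.
    {:keep_state, %{data | outstanding: Map.delete(data.outstanding, bucket)}}
  else
    # Items are waiting: kick the queue via the :next internal event.
    {:keep_state_and_data, {:next_event, :internal, {:next, bucket}}}
  end
end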

Global ratelimits (note that this is a distinct ratelimit from the bot's "global", per-user ratelimit) are handled with the special global_limit state. This state is entered for exactly the X-Ratelimit-Reset-After time provided in the global ratelimit response. It does nothing apart from postponing any events it receives and returning to the previous state (:connected) once the global timeout has elapsed. Requests that failed because of the global ratelimit are requeued after returning to the regular state; a warning is logged to inform you of this.
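
The global-limit state can be pictured as follows (a hypothetical sketch). It is entered with a state timeout and postpones everything else until that timeout fires:

# Entered via something like:
#   {:next_state, :global_limit, data,
#    {:state_timeout, reset_after_ms, :connected}}

def global_limit(:state_timeout, previous_state, data) do
  # The global window has passed: resume normal operation. Postponed
  # events are replayed automatically by :gen_statem.
  {:next_state, previous_state, data}
end

def global_limit(_event_type, _event, _data) do
  {:keep_state_and_data, :postpone}
end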

Failure modes

HTTP connection death

If the HTTP connection dies, the ratelimiter will inform each affected client by replying with {:error, {:connection_died, reason}}, where reason is the reason as provided by the :gun_down event. It will then transition to the :disconnected state. If no requests were running at the time the connection was shut down - for instance, because we simply reached the maximum idle time on the HTTP/2 connection - we simply move on.
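
A caller can match on this error tuple, for instance like so (a sketch assuming the synchronous queue call; maybe_retry/2 is a hypothetical helper):

case Nostrum.Api.Ratelimiter.queue(request) do
  {:error, {:connection_died, reason}} ->
    # The connection dropped mid-request; decide whether to retry.
    maybe_retry(request, reason)

  response ->
    response
end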

Upstream errors

The ratelimiter queues requests aggressively as soon as it has the ratelimit information to do so. If no ratelimit information is available, for instance because Discord returned a 502 status code, the ratelimiter will not automatically kick the queue to start further requests.

Other internal issues

Any other internal problems that are not handled appropriately in the ratelimiter will crash it, effectively resulting in the complete loss of any queued requests.

Implementation benefits & drawbacks

A history of ratelimiting

First, it is important to give a short history of nostrum's ratelimiting: before 0.8, nostrum used a GenServer that would call out to ETS tables to look up ratelimiting buckets for requests. If it needed to sleep before issuing a request due to the bucket being exhausted, it would do so in the server process and block other callers.

In nostrum 0.8, the existing ratelimiter bucket storage architecture was refactored to be based around the pluggable caching functionality, and buckets with no remaining calls were adjusted to be slept out on the client side by having the GenServer respond with {:error, {:retry_after, millis}} and the client retrying until it could schedule its request. This allowed users to distribute their ratelimit buckets however they wished; out of the box, nostrum shipped with an ETS-based and a Mnesia-based ratelimit bucket store.

Problems we solved

The approach above still came with a few problems:

  • Requests were still being performed synchronously in the ratelimiter, which was blocked from doing anything else while running them, even though we are theoretically free to start requests for other buckets while one is still running.

  • The ratelimiter itself was half working on its own, but half required the external storage mechanisms, which made the code hard to follow and required regular automatic pruning because the store had no idea when a bucket was no longer relevant on its own.

  • Requests would not be pipelined to run as soon as ideally possible.

  • The ratelimiter did not inform clients if their request died in-flight.

  • If the client disconnected before we returned the response, we had to handle this explicitly via handle_info.

The new state machine-based ratelimiter solves these problems.

Summary

Types

A bucket for endpoints under the same ratelimit.

A bucket-specific request waiting to be queued, alongside its client.

Remaining calls on a route, as provided by the API response.

A request to make in the ratelimiter.

The state of the ratelimiter.

Functions

Callback implementation for :gen_statem.callback_mode/0.

Retrieves a proper ratelimit endpoint from a given route and URL.

Callback implementation for :gen_statem.init/1.

Queue the given request and wait for the response synchronously.

Starts the ratelimiter.

Types

bucket() (since 0.9.0)
@type bucket() :: String.t()

A bucket for endpoints under the same ratelimit.

queued_request() (since 0.9.0)
@type queued_request() :: {request(), client :: :gen_statem.from()}

A bucket-specific request waiting to be queued, alongside its client.

remaining() (since 0.9.0)
@type remaining() :: non_neg_integer() | :initial

Remaining calls on a route, as provided by the API response.

The ratelimiter internally counts the remaining calls per route to dispatch new requests as soon as it's capable of doing so, but this is only possible if the API already provided us with ratelimit information for an endpoint.

Therefore, when the initial call on an endpoint is made, the special :initial value is set. This is used by the limit parsing function to set the remaining calls if and only if the response is for the initial call - otherwise, the value would no longer represent the truth.
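
In other words, the limit parsing only adopts the header value while our stored count is still :initial; conceptually (an illustrative sketch, not the actual function):

# Keep our internal count unless this is the pioneer's response:
defp merge_remaining(:initial, from_headers), do: from_headers
defp merge_remaining(internal_count, _from_headers), do: internal_count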

request() (since 0.9.0)
@type request() :: %{
  method: :get | :post | :put | :delete,
  route: String.t(),
  body: iodata(),
  headers: [{String.t(), String.t()}],
  params: Enum.t()
}

A request to make in the ratelimiter.
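
An example request map might look like this (all field values are illustrative):

%{
  method: :post,
  route: "/channels/1234567890/messages",
  body: ~s({"content": "hello"}),
  headers: [{"content-type", "application/json"}],
  params: []
}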

@type state() :: %{
  outstanding: %{
    required(bucket()) => {remaining(), :queue.queue(queued_request())}
  },
  running: %{
    required(:gun.stream_ref()) => {bucket(), request(), :gen_statem.from()}
  },
  inflight: %{
    required(:gun.stream_ref()) =>
      {status :: non_neg_integer(), headers :: [{String.t(), String.t()}],
       body :: String.t()}
  },
  conn: pid() | nil,
  remaining_in_window: non_neg_integer()
}

The state of the ratelimiter.

While this has no public use, it is still documented here to provide help when tracing the ratelimiter via :sys.trace/2 or other means.

Fields

  • :outstanding: Outstanding requests (not yet started) per bucket, alongside the remaining calls that may be made on said bucket.

  • :running: Requests that have been sent off. Used to associate back the client with a request when the response comes in.

  • :inflight: Requests for which we have started getting a response, but we have not fully received it yet. For responses that have a body, this will buffer their body until we can send it back to the client.

  • :conn: The :gun connection backing the server. Used for making new requests, and updated as the state changes.

  • :remaining_in_window: How many calls we may still make to the API during this time window. Reset automatically via timeouts.

Functions

Callback implementation for :gen_statem.callback_mode/0.

code_change(version, state, data, extra)

Callback implementation for :gen_statem.code_change/4.

connected(arg1, request, data)

connecting(arg1, arg2, data)

disconnected(arg, arg2, data)
get_endpoint(route, method)
@spec get_endpoint(String.t(), String.t()) :: String.t()

Retrieves a proper ratelimit endpoint from a given route and URL.

global_limit(arg1, next, data)

Callback implementation for :gen_statem.init/1.

Queue the given request and wait for the response synchronously.

Ratelimits on the endpoint are handled by the ratelimiter. Global ratelimits will cause this to return an error.
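
For example, with request being a request/0 map as shown above (a sketch; see Nostrum.Api for how nostrum builds these internally):

response = Nostrum.Api.Ratelimiter.queue(request)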

@spec start_link([:gen_statem.start_opt()]) :: :gen_statem.start_ret()

Starts the ratelimiter.