Problems with Retry
This document highlights suboptimal decisions made in the Retry library. The analysis is based on version 0.18.0, which is the latest version as of early 2025. Future versions of the library may address some or all of these issues. I will update this document accordingly when they do.
Info
I would like to point out that the Retry library was created almost a decade ago (version 0.1.0 released on May 4, 2016), and there was no clear consensus then on many things that seem obvious today.
Retry conditions
Retry logic is typically implemented around network calls to databases, Redis, or external HTTP services. Possible responses from such calls can be categorized into three types:
- Service received the request, understood it, and executed it
- Service received the request but either didn't understand it or refused to execute it
- Service never received the request
Only the first response can be considered successful, which is usually represented in Elixir as :ok or {:ok, result}. The other two categories are represented as {:error, reason}, where programmers must examine the reason field to determine the specific category.
Here's a real example from the documentation of the command/3 function in the Redix library:
The return value is {:ok, response} if the request is successful and the response is not a Redis error. {:error, reason} is returned in case there's an error in the request (such as losing the connection to Redis in between the request). reason can also be a Redix.Error exception in case Redis is reachable but returns an error (such as a type error).
Here, {:ok, response} is a successful result, {:error, %Redix.Error{}} means that Redis received the request but refused to execute it, and all other {:error, reason} cases indicate that Redis did not receive the request.
In the context of retries, it's important to note that it only makes sense to retry requests in the third category.
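A minimal sketch of this three-way classification, using only the standard library. The ResponseCategory module, its ServerError struct (a stand-in for a server-side error such as Redix.Error), and the category names are hypothetical and exist only for illustration:

```elixir
defmodule ResponseCategory do
  # Hypothetical stand-in for a server-side error struct such as Redix.Error.
  defmodule ServerError do
    defexception [:message]
  end

  # Category 1: the service received and executed the request.
  def classify(:ok), do: :success
  def classify({:ok, _result}), do: :success

  # Category 2: the service received the request but refused to execute it.
  def classify({:error, %ServerError{}}), do: :rejected

  # Category 3: everything else is treated as "the request never arrived".
  def classify({:error, _reason}), do: :not_delivered

  # Only the third category is worth retrying.
  def retryable?(response), do: classify(response) == :not_delivered
end
```

Under these assumptions, ResponseCategory.retryable?({:error, :closed}) is true, while a ServerError wrapped in an :error tuple is not retried.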
Retry doesn't provide an easy way to express this. This code will trigger retries even if the initial command is formed incorrectly:
retry with: delays do
  Redix.command(conn, command)
end
A working approach for Retry requires converting the relevant error category in the do block to a separate one and configuring the library to retry only that category:
retry with: delays, atoms: [:retryable] do
  case Redix.command(conn, command) do
    {:error, reason} when not is_struct(reason, Redix.Error) ->
      {:retryable, reason}

    result ->
      result
  end
else
  {:retryable, reason} -> {:error, reason}
end
With OnceMore, programmers can precisely configure which errors should trigger retries:
OnceMore.retry(
  fn -> Redix.command(conn, command) end,
  &match?({:error, reason} when not is_struct(reason, Redix.Error), &1),
  delays
)
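To make the idea of predicate-driven retries concrete, here is a rough standard-library sketch. RetrySketch.retry/3 is a hypothetical helper written for this document and is not OnceMore's implementation; it only mirrors the argument order used in the call above (function, predicate, delays):

```elixir
defmodule RetrySketch do
  # Hypothetical helper: calls `fun` once, then keeps calling it while
  # `retry?` returns true for the result and `delays` has values left.
  def retry(fun, retry?, delays) do
    result = fun.()

    if retry?.(result) do
      # reduce_while stops at the first non-retryable result,
      # so `delays` may also be a lazy (even infinite) stream.
      Enum.reduce_while(delays, result, fn delay, _last_result ->
        Process.sleep(delay)
        result = fun.()
        if retry?.(result), do: {:cont, result}, else: {:halt, result}
      end)
    else
      result
    end
  end
end

# A fake call that fails twice with a connection error and then succeeds.
counter = :counters.new(1, [])

flaky = fn ->
  :counters.add(counter, 1, 1)
  if :counters.get(counter, 1) < 3, do: {:error, :closed}, else: {:ok, "PONG"}
end

RetrySketch.retry(flaky, &match?({:error, :closed}, &1), [10, 10, 10])
```

Because the predicate is an ordinary function, the caller decides exactly which results count as retryable, and nothing else is retried.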
Exception handling
In addition to atoms and tuples, Retry can trigger retries based on exception types raised by the specified block of code.
... if the block raises any of the exceptions specified in rescue_only, a retry will be attempted. Other exceptions will not be retried. If rescue_only is not specified, it defaults to [RuntimeError].
from Retry docs
Using exceptions to control program flow can be considered an anti-pattern (although the wording of that anti-pattern doesn't definitively classify Retry's behavior as an instance of it).
Retry's exception handling implementation has several specific issues.
Rescue RuntimeError as default
Using RuntimeError as the default is problematic because this exception is typically used as a runtime assertion. For example, code could validate its configuration and raise this exception if it detects an error:
def call_service(opts) do
  if opts[:option_1] && opts[:option_2] do
    raise "Passing :option_1 and :option_2 together is invalid"
  end
end
Retrying this code with the same configuration is pointless because the result won't change.
To avoid this behavior, programmers need to pass an empty list to the rescue_only option.
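The cost of retrying an assertion can be illustrated with a small standard-library sketch. AssertionRetryDemo and its attempt/3 helper are hypothetical; the helper merely imitates the idea of retrying on RuntimeError and counts how many times the function ran:

```elixir
defmodule AssertionRetryDemo do
  # Same validation as above, returning :ok when the options are valid.
  def call_service(opts) do
    if opts[:option_1] && opts[:option_2] do
      raise "Passing :option_1 and :option_2 together is invalid"
    end

    :ok
  end

  # Hypothetical helper: runs `fun` and retries on RuntimeError while
  # `retries` is positive. Returns {result_or_exception, attempts_made}.
  def attempt(fun, retries, attempts \\ 1) do
    {fun.(), attempts}
  rescue
    e in RuntimeError ->
      if retries > 0 do
        attempt(fun, retries - 1, attempts + 1)
      else
        {e, attempts}
      end
  end
end

# The invalid configuration fails identically on every one of the 4 attempts.
{%RuntimeError{}, 4} =
  AssertionRetryDemo.attempt(
    fn -> AssertionRetryDemo.call_service(option_1: true, option_2: true) end,
    3
  )
```

With real delays between attempts, the caller would wait through the whole backoff schedule only to receive the same error it could have seen immediately.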
Loss of stacktrace
Retry "loses" the stacktrace for exceptions listed in rescue_only.
For example, this code:
Mix.install([:retry])

defmodule M do
  use Retry

  def run do
    retry with: [100] do
      call_service()
    end
  end

  defp call_service do
    if :erlang.phash2(1, 1) == 0 do
      raise "oops!"
    end
  end
end

M.run()
produces an incomplete stacktrace:
$ elixir script.exs
** (RuntimeError) oops!
    script.exs:7: M.run/0
    script.exs:19: (file)
Using rescue_only: [] allows getting the full stacktrace:
$ elixir script.exs
** (RuntimeError) oops!
    script.exs:14: M.call_service/0
    script.exs:8: anonymous fn/0 in M.run/0
    (elixir 1.18.1) lib/enum.ex:4964: Enumerable.List.reduce/3
    (elixir 1.18.1) lib/stream.ex:1041: Stream.do_transform_inner_list/7
    (elixir 1.18.1) lib/enum.ex:2600: Enum.reduce_while/3
    script.exs:7: M.run/0
    script.exs:19: (file)
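A likely explanation, judging by the expanded macro shown in the Macro usage section, is that the rescued exception is carried around as a value and later raised again with raise, which starts a fresh stacktrace, whereas reraise with __STACKTRACE__ keeps the original one. The following standard-library sketch illustrates that difference; StacktraceDemo and its function names are hypothetical:

```elixir
defmodule StacktraceDemo do
  # Stand-in for a failing service call; :erlang.phash2(1, 1) is always 0.
  def call_service do
    if :erlang.phash2(1, 1) == 0 do
      raise "oops!"
    end
  end

  # Rescue the exception, pass it around as a value, and raise it later.
  # The second raise builds a new stacktrace that starts in this function.
  def rescue_then_raise do
    outcome =
      try do
        call_service()
      rescue
        e -> {:exception, e}
      end

    case outcome do
      {:exception, e} -> raise e
      other -> other
    end
  end

  # Re-raise with the stacktrace captured at the rescue site.
  def rescue_then_reraise do
    try do
      call_service()
    rescue
      e -> reraise e, __STACKTRACE__
    end
  end

  # Returns the {module, function} pairs of the stacktrace `fun` fails with.
  def failing_frames(fun) do
    try do
      fun.()
      []
    rescue
      _ -> for {mod, name, _arity, _location} <- __STACKTRACE__, do: {mod, name}
    end
  end
end
```

Calling StacktraceDemo.failing_frames(&StacktraceDemo.rescue_then_reraise/0) should include {StacktraceDemo, :call_service}, while the rescue_then_raise variant should not, which matches the two outputs above.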
Macro usage
The Retry library uses macros for its operation, generating unexpectedly large amounts of code at macro call sites.
For example, you might write:
retry with: Stream.take(constant_backoff(), 10) do
  Enum.random([:ok, :error])
end
And after expanding the retry/2 macro, you get:
fun = fn ->
  try do
    case Enum.random([:ok, :error]) do
      {atom, _} = result ->
        if atom in [:error] do
          {:cont, result}
        else
          {:halt, result}
        end
      result ->
        if is_atom(result) and result in [:error] do
          {:cont, result}
        else
          {:halt, result}
        end
    end
  rescue
    e ->
      if e.__struct__ in [RuntimeError] do
        {:cont, {:exception, e}}
      else
        reraise e, __STACKTRACE__
      end
  end
end
(
  delays = Stream.take(constant_backoff(), 10)
  [0] |> Stream.concat(delays)
)
|> Enum.reduce_while(nil, fn delay, _last_result ->
  :timer.sleep(delay)
  fun.()
end)
|> case do
  {:exception, e} ->
    case e do
      e when is_exception(e) -> raise e
      e -> e
    end
  e = {atom, _} when atom in [:error] ->
    case e do
      e when is_exception(e) -> raise e
      e -> e
    end
  e when is_atom(e) and e in [:error] ->
    case e do
      e when is_exception(e) -> raise e
      e -> e
    end
  result ->
    case result do
      result -> result
    end
end
While 50 additional lines of code are not a significant concern on their own, this amount of code is generated at every retry/2 call site, which can hurt project compilation time. Furthermore, each expansion includes every error-handling variant: :error atoms, {:error, reason} tuples, and exceptions. In practice, however, each call likely deals with only one error type.
Also, the retry/2 macro does nothing that a function call cannot do (as OnceMore demonstrates), which makes it an unnecessary macro and therefore an anti-pattern.