Replicas and dynamic repositories
When applications reach a certain scale, a single database may not be enough to sustain the required throughput. In such scenarios, it is very common to introduce read replicas: all write operations are sent to primary database and most of the read operations are performed against the replicas. The credentials of the primary and replica databases are typically known upfront by the time the code is compiled.
In other cases, you may need a single Ecto repository to interact with different database instances which are not known upfront. For instance, you may need to communicate with hundreds of databases very sporadically, so instead of opening up a connection to each of those hundreds of databases when your application starts, you want to quickly start a connection, perform some queries, and then shut down, while still leveraging Ecto's APIs as a whole.
This guide will cover how to tackle both approaches.
Primary and Replicas
Since the credentials of the primary and replicas databases are known upfront, adding support for primary and replica databases in your Ecto application is relatively straightforward. Imagine you have a MyApp.Repo
and you want to add four read replicas. This could be done in three steps.
First, define the primary and replicas repositories in lib/my_app/repo.ex
:
defmodule MyApp.Repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres
@replicas [
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
def replica do
Enum.random(@replicas)
end
for repo <- @replicas do
defmodule repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres,
read_only: true
end
end
end
The code above defines a regular MyApp.Repo
and four replicas, called MyApp.Repo.Replica1
up to MyApp.Repo.Replica4
. We pass the :read_only
option to the replica repositories, so operations such as insert
, update
and friends are not made accessible. We also define a function called replica
with the purpose of returning a random replica.
Next we need to make sure both primary and replicas are configured properly in your config/config.exs
files. In development and test, you can likely use the same database credentials for all repositories, all pointing to the same database address:
replicas = [
MyApp.Repo,
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
for repo <- replicas do
config :my_app, repo,
username: "postgres",
password: "postgres",
database: "my_app_prod",
hostname: "localhost",
pool_size: 10
end
In production, you want each database to connect to a different hostname:
repos = %{
MyApp.Repo => "prod-primary",
MyApp.Repo.Replica1 => "prod-replica-1",
MyApp.Repo.Replica2 => "prod-replica-2",
MyApp.Repo.Replica3 => "prod-replica-3",
MyApp.Repo.Replica4 => "prod-replica-4"
}
for {repo, hostname} <- repos do
config :my_app, repo,
username: "postgres",
password: "postgres",
database: "my_app_prod",
hostname: hostname,
pool_size: 10
end
Finally, make sure to start all repositories in your supervision tree:
children = [
MyApp.Repo,
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
Now that all repositories are configured, we can safely use them in your application code. Every time you are performing a read operation, you can call the replica/0
function that we have added to return a random replica we will send the query to:
MyApp.Repo.replica().all(query)
And now you are ready to work with primary and replicas, no hacks or complex dependencies required!
Testing replicas
While all of the work we have done so far should fully work in development and production, it may not be enough for tests. Most developers testing Ecto applications are using a sandbox, such as the Ecto SQL Sandbox.
When using a sandbox, each of your tests run in an isolated and independent transaction. Once the test is done, the transaction is rolled back. Which means we can trivially revert all of the changes done in a test in a very performant way.
Unfortunately, even if you configure your primary and replicas to have the same credentials and point to the same hostname, each Ecto repository will open up their own pool of database connections. This means that, once you move to a primary + replicas setup, a simple test like this one won't pass:
user = Repo.insert!(%User{name: "jane doe"})
assert Repo.replica().get!(User, user.id)
That's because Repo.insert!
will write to one database connection and the repository returned by Repo.replica()
will perform the read in another connection. Since the write is done in transaction, its contents won't be available to other connections until the transaction commits, which will never happen for test connections.
There are two options to tackle this problem: one is to change replicas and the other is to use dynamic repos.
A custom replica
definition
One simple solution to the problem above is to use a custom replica
implementation during tests that always return the primary repository, like this:
if Mix.env() == :test do
def replica, do: __MODULE__
else
def replica, do: Enum.random(@replicas)
end
Now during tests, the replica will always return the repository primary repository itself. While this approach works fine, it has the downside that, if you accidentally invoke a write function in a replica, the test will pass, since the replica
function is returning the primary repo, while the code will fail in production.
Using :default_dynamic_repo
Another approach to testing is to set the :default_dynamic_repo
option when defining the repository. Let's see what we mean by that.
When you list a repository in your supervision tree, such as MyApp.Repo
, behind the scenes it will start a supervision tree with a process named MyApp.Repo
. By default, the process has the same name as the repository module itself. Now every time you invoke a function in MyApp.Repo
, such as MyApp.Repo.insert/2
, Ecto will use the connection pool from the process named MyApp.Repo
.
From v3.0, Ecto has the ability to start multiple processes from the same repository. The only requirement is that they must have different process names, like this:
children = [
MyApp.Repo,
{MyApp.Repo, name: :another_instance_of_repo}
]
While the particular example doesn't make much sense (we will cover an actual use case for this feature next), the idea is that now you have two repositories running: one is named MyApp.Repo
and the other one is named :another_instance_of_repo
. Each of those processes have their own connection pool. You can tell Ecto which process you want to use in your repo operations by calling:
MyApp.Repo.put_dynamic_repo(MyApp.Repo)
MyApp.Repo.put_dynamic_repo(:another_instance_of_repo)
Once you call MyApp.Repo.put_dynamic_repo(name)
, all invocations made on MyApp.Repo
will use the connection pool denoted by name
.
How can this help with our replica tests? If we look back to the supervision tree we defined earlier in this guide, you will find this:
children = [
MyApp.Repo,
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
We are starting five different repositories and five different connection pools. Since we want the replica repositories to use the MyApp.Repo
, we can achieve this by doing the following on the setup of each test:
@replicas [
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
setup do
for replica <- @replicas do
replica.put_dynamic_repo(MyApp.Repo)
end
:ok
end
Note put_dynamic_repo
is per process. So every time you spawn a new process, the dynamic_repo
value will reset to its default until you call put_dynamic_repo
again.
Luckily, there is even a better way! We can pass a :default_dynamic_repo
option when we define the repository. In this case, we want to set the :default_dynamic_repo
to MyApp.Repo
only during the test environment. In your lib/my_app/repo.ex
, do this:
for repo <- @replicas do
default_dynamic_repo =
if Mix.env() == :test do
MyApp.Repo
else
repo
end
defmodule repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres,
read_only: true,
default_dynamic_repo: default_dynamic_repo
end
end
And now your tests should work as before, while still being able to detect if you accidentally perform a write operation in a replica.
Dynamic repositories
At this point, we have learned that Ecto allows you to start multiple connections based on the same repository. This is typically useful when you have to connect multiple databases or perform short-lived database connections.
For example, you can start a repository with a given set of credentials dynamically, like this:
MyApp.Repo.start_link(
name: :some_client,
hostname: "client.example.com",
username: "...",
password: "...",
pool_size: 1
)
In other words, start_link
accepts the same options as the database configuration. Now let's do a query on the dynamically started repository. If you attempt to simply perform MyApp.Repo.all(Post)
, it may fail, as by default it will try to use a process named MyApp.Repo
, which may or may not be running. So don't forget to call put_dynamic_repo/1
before:
MyApp.Repo.put_dynamic_repo(:some_client)
MyApp.Repo.all(Post)
Ecto also allows you to start a repository with no name (just like that famous horse). In such cases, you need to explicitly pass name: nil
and match on the result of MyApp.Repo.start_link/1
to retrieve the PID, which should be given to put_dynamic_repo
. Let's also use this opportunity and perform proper database clean-up, by shutting up the new repository and reverting the value of put_dynamic_repo
:
default_dynamic_repo = MyApp.Repo.get_dynamic_repo()
{:ok, repo} =
MyApp.Repo.start_link(
name: nil,
hostname: "client.example.com",
username: "...",
password: "...",
pool_size: 1
)
try do
MyApp.Repo.put_dynamic_repo(repo)
MyApp.Repo.all(Post)
after
MyApp.Repo.put_dynamic_repo(default_dynamic_repo)
Supervisor.stop(repo)
end
We can encapsulate all of this in a function too, which you could define in your repository:
defmodule MyApp.Repo do
use Ecto.Repo, ...
def with_dynamic_repo(credentials, callback) do
default_dynamic_repo = get_dynamic_repo()
start_opts = [name: nil, pool_size: 1] ++ credentials
{:ok, repo} = MyApp.Repo.start_link(start_opts)
try do
MyApp.Repo.put_dynamic_repo(repo)
callback.()
after
MyApp.Repo.put_dynamic_repo(default_dynamic_repo)
Supervisor.stop(repo)
end
end
end
And now use it as:
credentials = [
hostname: "client.example.com",
username: "...",
password: "..."
]
MyApp.Repo.with_dynamic_repo(credentials, fn ->
MyApp.Repo.all(Post)
end)
And that's it! Now you can have dynamic connections, all properly encapsulated in a single function and built on top of the dynamic repo API.