Multi tenancy with foreign keys
In Multi tenancy with query prefixes, we have learned how to set up multi tenant applications by using separate query prefixes, known as DDL Schemas in PostgreSQL and MSSQL and simply a separate database in MySQL.
Each query prefix is isolated, having their own tables and data, which provides the security guarantees we need. On the other hand, such approach for multi tenancy may be too expensive, as each schema needs to be created, migrated, and versioned separately.
Therefore, some applications may prefer a cheaper mechanism for multi tenancy, by relying on foreign keys. The idea here is that most - if not all - resources in the system belong to a tenant. The tenant is typically an organization or a user and all resources have an org_id
(or user_id
) foreign key pointing directly to it.
In this guide, we will show how to leverage Ecto constructs to guarantee that all queries in your application are properly scoped to a chosen org_id
.
Adding org_id to read operations
The first step in our implementation is to make the repository aware of org_id
. We want to allow commands such as:
MyApp.Repo.all Post, org_id: 13
Where the repository will automatically scope all posts to the organization with ID=13
. We can achieve this with the Ecto.Repo.prepare_query/3
repository callback:
defmodule MyApp.Repo do
use Ecto.Repo, otp_app: :my_app
require Ecto.Query
def prepare_query(operation, query, opts) do
cond do
opts[:skip_org_id] || opts[:schema_migration] ->
{query, opts}
org_id = opts[:org_id] ->
{Ecto.Query.where(query, org_id: ^org_id), opts}
true ->
raise "expected org_id or skip_org_id to be set"
end
end
end
Now we can pass :org_id
to all
, update_all
, get
, preload
, etc. Generally speaking, to all READ/SELECT operations. Note we have intentionally made the :org_id
required, with the exception of two scenarios:
if you explicitly set
:skip_org_id
to true, it won't require and:org_id
. This reduces the odds of a developer forgetting to scope their queries, which can accidentally expose private data to other usersif the
:schema_migration
option is set. This means the repository operation was issued by Ecto itself when migrating our database and we don't want to apply anorg_id
to them
Still, setting the org_id
for every operation is cumbersome and error prone. We will be better served if all operations attempt to set an org_id
.
Setting org_id
by default
To make sure our read operations use the org_id
by default, we will make two additional changes to the repository.
First, we will store the org_id
in the process dictionary. The process dictionary is a storage that is exclusive to each process. For example, each test in your project runs in a separate process. Each request in a web application runs in a separate process too. Each of these processes have their own dictionary which we will store and read from. Let's add these functions:
defmodule MyApp.Repo do
...
@tenant_key {__MODULE__, :org_id}
def put_org_id(org_id) do
Process.put(@tenant_key, org_id)
end
def get_org_id() do
Process.get(@tenant_key)
end
end
We added two new functions. The first, put_org_id
, stores the organization id in the process dictionary. get_org_id
reads the value in the process dictionary.
You will want to call put_org_id
on every process before you use the repository. For example, on every request in a web application, as soon as you read the current organization from the request parameter or the session, you should call MyApp.Repo.put_org_id(params_org_id)
. In tests, you want to explicitly set the put_org_id
or pass the :org_id
option as in the previous section.
The second change we need to do is to set the org_id
as a default option on all repository operations. The value of org_id
will be precisely the value in the process dictionary. We can do so trivially by implementing the default_options
callback:
defmodule MyApp.Repo do
...
def default_options(_operation) do
[org_id: get_org_id()]
end
end
With these changes, we will always set the org_id
fields in our queries, unless we explicitly set skip_org_id: true
when calling the repository. However, we are not done! There are some techniques that we can use to tighten up our multi tenant support, especially in regards to associations.
Working with multi tenant associations
Let's expand our data domain little a bit.
So far we have assumed there is an organization schema. However, instead of naming its primary key id
, we will name it org_id
, so Repo.get(Org, org_id: 13)
just works:
defmodule MyApp.Organization do
use Ecto.Schema
@primary_key {:org_id, :id, autogenerate: true}
schema "orgs" do
field :name
timestamps()
end
end
Let's also say that you may have multiple posts in an organization and the posts themselves may have multiple comments:
defmodule MyApp.Post do
use Ecto.Schema
schema "posts" do
field :title
field :org_id, :integer
has_many :comments, MyApp.Comment
timestamps()
end
end
defmodule MyApp.Comment do
use Ecto.Schema
schema "comments" do
field :body
field :org_id, :integer
belongs_to :post, MyApp.Post
timestamps()
end
end
One thing to have in mind is that, our prepare_query
callback will apply to all queries, but it won't apply to joins inside the same query. Therefore, if you write this query:
MyApp.Repo.put_org_id(some_org_id)
MyApp.Repo.all(
from p in Post, join: c in assoc(p, :comments)
)
prepare_query
will apply the org_id
only to posts but not to the join
. While this may seem problematic, in practice it is not an issue, because when you insert posts and comments in the database, they will always have the same org_id
. If posts and comments do not have the same org_id
, then this is actually a bug in our data: the data either got corrupted or there is a bug in our software when inserting data.
Luckily, we can leverage database's foreign keys to guarantee that the org_id
s always match between posts and comments. Our first stab at defining these schema migrations would look like this:
create table(:orgs, primary_key: false) do
add :org_id, :bigserial, primary_key: true
add :name, :string
timestamps()
end
create table(:posts) do
add :title, :string
add :org_id,
references(:orgs, column: :org_id),
null: false
timestamps()
end
create table(:comments) do
add :body, :string
add :org_id, references(:orgs), null: false
add :post_id, references(:posts), null: false
timestamps()
end
So far the only noteworthy change compared to a regular migration is the primary_key: false
option to the :orgs
table, as we want to mirror the primary key of org_id
given to the schema. While the schema above works and guarantees that posts references an existing organization and that comments references existing posts and organizations, it does not guarantee that all posts and their related comments belong to the same organization.
We can tighten up this requirement by using composite foreign keys with the following changes:
create unique_index(:posts, [:id, :org_id])
create table(:comments) do
add :body, :string
# There is no need to define a reference for org_id
add :org_id, :integer, null: false
# Instead define a composite foreign key
add :post_id,
references(:posts, with: [org_id: :org_id]),
null: false
timestamps()
end
Instead of defining both post_id
and org_id
as individual foreign keys, we define org_id
as a regular integer and then we define post_id+org_id
as a composite foreign key by passing the :with
option to Ecto.Migration.references/2
. This makes sure comments point to posts which point to orgs, where all org_id
s match.
Given composite foreign keys require the references keys to be unique, we also defined a unique index on the posts table before we defined the composite foreign key.
If you are using PostgreSQL and you want to tighten these guarantees even further, you can pass the match: :full
option to references
:
references(:posts, with: [org_id: :org_id], match: :full)
which will help enforce none of the columns in the foreign key can be nil.
Summary
With these changes, we are enforcing our data is valid at the database level, while we have changed our repository interface to guarantee our queries are always scoped to an org_id
, unless we explicitly opt out.
When it comes to associations, you will want to apply composite foreign keys whenever possible. For example, imagine comments belongs to posts (which belong to an organization) and also to user (which belong to an organization). The comments schema migration should be defined like this:
create table(:comments) do
add :body, :string
add :org_id, :integer, null: false
add :post_id,
references(:posts, with: [org_id: :org_id]),
null: false
add :user_id,
references(:users, with: [org_id: :org_id]),
null: false
timestamps()
end
As long as all schemas have an org_id
, all operations will be safely contained by the current tenant.
If by any chance you have schemas that are not tied to an org_id
, you can even consider keeping them in a separate query prefix or in a separate database altogether, so you keep non-tenant data completely separated from tenant-specific data.