Batching Resolution

Avoiding N+1 Queries

In general, when accessing Ecto associations you want to preload the data in your top-level resolver functions to avoid N+1 queries.

Imagine this scenario: you have posts and users. A post has an author field, which returns a user. You want to list all posts and get the name of each post's author:

{
  posts {
    author {
      name
    }
  }
}

If you write your schema like this, you're going to have a bad time due to N+1 queries:

object :post do
  @desc "Author of the post"
  field :author, :user do
    resolve fn post, _, _ ->
      author =
        post
        |> Ecto.assoc(:author)
        |> Repo.one

      {:ok, author}
    end
  end
end

query do
  field :posts, list_of(:post) do
    resolve fn _, _ ->
      {:ok, Post |> Repo.all}
    end
  end
end

When presented with that GraphQL document, this schema runs Post |> Repo.all, which retrieves N posts. Then, for each post, it resolves the child fields, running our Repo.one query function each time, for N+1 calls to the database in total.

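Concretely, listing three posts costs four round trips to the database, one for the posts and then one per author (an illustration of the calls above, not new code):

Post |> Repo.all                           # 1 query: load all posts
post1 |> Ecto.assoc(:author) |> Repo.one   # then 1 query per post...
post2 |> Ecto.assoc(:author) |> Repo.one
post3 |> Ecto.assoc(:author) |> Repo.one   # ...N author queries in total
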
One way to handle this issue is Absinthe's support for batching. The idea with batching is that we aggregate the author_id from every post, and then make a single database call to fetch all of the users at once.

Let's first write a function that loads a model's records by their IDs:

defmodule MyAppWeb.Schema.Helpers do
  import Ecto.Query

  alias MyApp.Repo

  def by_id(model, ids) do
    ids = ids |> Enum.uniq()

    model
    |> where([m], m.id in ^ids)
    |> Repo.all()
    |> Map.new(&{&1.id, &1})
  end
end

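For example, in IEx (hypothetical records, with struct fields elided):

iex> MyAppWeb.Schema.Helpers.by_id(User, [1, 2, 2])
%{1 => %User{id: 1, ...}, 2 => %User{id: 2, ...}}

Note that the duplicate ID is collapsed by Enum.uniq/1, so each user is fetched only once.
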
Now we can use this function to batch our author lookups:

object :post do
  @desc "Author of the post"
  field :author, :user do
    resolve fn post, _, _ ->
      batch({MyAppWeb.Schema.Helpers, :by_id, User}, post.author_id, fn batch_results ->
        {:ok, Map.get(batch_results, post.author_id)}
      end)
    end
  end
end

Now we make just two calls to the database. The first call loads all of the posts. Then, as Absinthe walks through each post and tries to get its author, it is instead told to aggregate its information. That aggregated information is passed on to our by_id/2 function from earlier, which grabs ALL of the users in just one database call and creates a map of user IDs to users.

Absinthe then does a second pass and calls the batch_results function with that map, letting us retrieve the individual author for each post.

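Under the hood, Absinthe collects the second argument of every batch call that shares the same batch key, so for three posts whose authors have IDs 1, 2, and 3 it ends up making a single call equivalent to this sketch:

MyAppWeb.Schema.Helpers.by_id(User, [1, 2, 3])
#=> %{1 => %User{...}, 2 => %User{...}, 3 => %User{...}}
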
Not only is this a very efficient way to query the data, it's also 100% dynamic. If a query document asks for authors, they're loaded efficiently. If it does not, they aren't loaded at all.

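For example, assuming the :post object also exposes a :title field, this document never touches author, so no user query runs at all:

{
  posts {
    title
  }
}
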
See the documentation for Absinthe.Middleware.Batch for more information.

Absinthe.Middleware.Batch achieves a lot and, with some helpers, was the standard way to solve this problem for a long time. While batching still has a place, it has a few limitations that have driven the development of Dataloader. There are small scale annoyances like the limitation of only being able to batch one thing at a time in a field, or the fact that the API can get very verbose.

There are also some larger-scale issues, however. Ecto has a fair number of quirks that make it a difficult library to abstract access to. If you want the concurrent test system to work, you need to add self() to all the batch keys and do Repo.all(caller: pid) in every batch function so that Ecto knows which sandbox to use. It also becomes very easy for your GraphQL functions to fill up with direct database access, inevitably going around important data access rules you may want to enforce in your contexts. Alternatively, your context functions can end up with dozens of little functions that only exist to support batching items by ID.

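To give a flavor of that workaround, here is a sketch (not a recommendation) of a sandbox-aware variant of by_id/2, where the caller pid travels inside the batch key:

# Sketch only: assumes the same imports and aliases as the helper module above.
def by_id({model, caller}, ids) do
  ids = ids |> Enum.uniq()

  model
  |> where([m], m.id in ^ids)
  |> Repo.all(caller: caller)   # tell Ecto's sandbox which test process owns the connection
  |> Map.new(&{&1.id, &1})
end

# ...and every call site must pass self() along:
batch({MyAppWeb.Schema.Helpers, :by_id, {User, self()}}, post.author_id, fn batch_results ->
  {:ok, Map.get(batch_results, post.author_id)}
end)
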
In time, people involved in larger projects built abstractions, helpers, and conventions around the Absinthe.Middleware.Batch plugin that did a good job of addressing these issues. That effort has been extracted into the Dataloader project, which also draws inspiration from similar projects in the GraphQL world, and which makes batched loading both easier and more flexible.

Dataloader

Let's jump straight into getting Dataloader working, and then we'll expand on what's actually happening behind the scenes.

Using Dataloader is as simple as doing:

alias MyApp.Blog # Dataloader source, see below
import Absinthe.Resolution.Helpers, only: [dataloader: 1]

object :user do
  field :posts, list_of(:post), resolve: dataloader(Blog)
end

object :post do
  @desc "Author of the post"
  field :author, :user do
    resolve dataloader(Blog)
  end
end

To make this work we need to set up a dataloader, add the Blog source to it, and make sure our schema knows it needs to run the dataloader.

The latest install instructions can be found here: https://github.com/absinthe-graphql/dataloader

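At the time of writing that typically means adding a dataloader dependency to your mix.exs, something like the following (check the repository above for the current version):

def deps do
  [
    {:dataloader, "~> 2.0"}
  ]
end
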
Let's start with a data source. Dataloader data sources are just structs that encode a way of retrieving data in batches. In a Phoenix application you'll generally have one source per context, so that each context can control how its data is loaded.

Here is a hypothetical Blog context and a Dataloader Ecto source:

defmodule MyApp.Blog do
  def data() do
    Dataloader.Ecto.new(MyApp.Repo, query: &query/2)
  end

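  # Dataloader calls query/2 for every batch it runs; returning the
  # queryable unchanged applies no extra filtering (see below).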
  def query(queryable, _params) do
    queryable
  end
end

In this example, the query returned by query/2 is used as a starting point by Dataloader to build the final query, which it does by traversing schema associations. In other words, Dataloader can determine that an author has many posts, and that to retrieve posts it needs to get those with the relevant author_id. If that's sufficient for your needs, query/2 need not modify the query it's given. But if you only want to load published posts, query/2 can narrow the query accordingly.

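For instance, a query/2 that only ever loads published posts might look like this sketch, assuming the posts schema has a :published boolean and the module imports Ecto.Query:

def query(Post, _params) do
  from p in Post, where: p.published
end

def query(queryable, _params) do
  queryable
end
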
When integrating Dataloader with GraphQL, we want to place it in our context so that we can access it in our resolvers. In your schema module add:

alias MyApp.Blog

def context(ctx) do
  loader =
    Dataloader.new
    |> Dataloader.add_source(Blog, Blog.data())

  Map.put(ctx, :loader, loader)
end

def plugins do
  [Absinthe.Middleware.Dataloader] ++ Absinthe.Plugin.defaults()
end

That's it! If you run a GraphQL query that hits that field, it will be loaded efficiently without N+1.

See the documentation for Dataloader for more information.