View Source Ecto.Adapters.FoundationDB (Ecto.Adapters.FoundationDB v0.2.0)

(Hex.pm | GitHub)

Adapter module for FoundationDB.

It uses :erlfdb for communicating with the database.

Installation

Install the latest stable release of FoundationDB from the official FoundationDB Releases.

The foundationdb-server package is required on any system that will be running a FoundationDB server instance. For example, it's common to run the foundationdb-server on your development machine and on managed instances running a FoundationDB cluster, but not for your stateless Elixir application server in production.

foundationdb-clients is always required.

Include :ecto_foundationdb in your list of dependencies in mix.exs:

defp deps do
  [
    {:ecto_foundationdb, "~> 0.1"}
  ]
end

Usage

Define this module in your MyApp application:

defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.FoundationDB

  use EctoFoundationDB.Migrator
  def migrations(), do: []
end

Edit your config.exs file to include the following:

config :my_app, MyApp.Repo,
  cluster_file: "/etc/foundationdb/fdb.cluster"

In your app's supervisor, you can start the repo as follows:

children = [
  ...
  MyApp.Repo,
  ...
]

Supervisor.start_link(...)

Tenants

EctoFoundationDB requires the use of FoundationDB Tenants.

Each tenant you create has a separate keyspace from all others, and a given FoundationDB Transaction is guaranteed to be isolated to a particular tenant's keyspace.

You'll use the Ecto :prefix option to specify the relevant tenant for each Ecto operation in your application.

Creating a schema to be used in a tenant.

defmodule User do
  use Ecto.Schema
  @schema_context usetenant: true
  @primary_key {:id, :binary_id, autogenerate: true}
  schema "users" do
    field(:name, :string)
    field(:department, :string)
    timestamps()
  end
  # ...
end

Setting up a tenant.

alias EctoFoundationDB.Tenant

tenant = Tenant.open!(MyApp.Repo, "some-org")

Inserting a new struct.

user = %User{name: "John", department: "Engineering"}
       |> FoundationDB.usetenant(tenant)

user = MyApp.Repo.insert!(user)

Querying for a struct using the primary key.

MyApp.Repo.get!(User, user.id, prefix: tenant)

When a struct is retrieved from a tenant (using :prefix), that struct's metadata holds onto the tenant reference. This helps to protect your application from a struct accidentally crossing tenant boundaries due to some unforeseen bug.

Indexes

The implication of the FoundationDB Layer Concept is that the manner in which data can be stored and accessed is the responsibility of the client application. To achieve general purpose data storage and access patterns commonly desired in web applications, EctoFoundationDB provides support for indexes out of the box, and also allows your application to define custom indexes.

When you define a migration that calls create index(User, [:department]), a Default index is created on the :department field. The following two indexes are equivalent:

# ...
create index(User, [:department])
# is equivalent to
create index(User, [:department], options: [indexer: EctoFoundationDB.Indexer.Default]))]
# ...

Timestamp indexes and multi-field indexes are supported:

create index(User, [:inserted_at])
create index(User, [:birthyear, :department, :birthdate])

In fact, the index value can be any Elixir term. All types support Equal queries. However, certain Ecto Schema types support Between queries. See the Data Types section for the list.

For example, When an index is created on timestamp-like fields, it is an effective time series index. See the query examples below.

See the Migrations section for further details about managing indexes in your application.

Index Query Examples

Retrieve all users in the Engineering department (with an Equal constraint):

iex> query = from(u in User, where: u.department == ^"Engineering")
iex> MyApp.Repo.all(query, prefix: tenant)

Retrieve all users with a birthyear from 1992 to 1994 (with a Between constraint):

iex> query = from(u in User,
...>   where: u.birthyear >= ^1992 and u.birthyear <= ^1994
...> )
iex> MyApp.Repo.all(query, prefix: tenant)

Retrieve all Engineers born in August 1992:

iex> query = from(u in User,
...>   where: u.birthyear == ^1992 and
...>          u.department == ^"Engineering" and
...>          u.birthdate >= ^~D[1992-08-01] and u.birthdate < ^~D[1992-09-01]
...> )
iex> MyApp.Repo.all(query, prefix: tenant)

Order matters!: When you create an index using multiple fields, the FDB key that stores the index will be extended with all the values in the order of your defined index fields. Because FoundationDB stores keys in a well-defined order, the order of the fields in your index determines the Between queries you can perform.

There can be 0 or 1 Between clauses, and if one exists, it must correspond to the final constraint in the where clause when compared against the order of the index fields.

Custom Indexes

As data is inserted, updated, deleted, and queried, the Indexer callbacks (via behaviour EctoFoundationDB.Indexer) are executed. Your Indexer module does all the work necessary to maintain the index within the transaction that the data itself is being manipulated. Many different types of indexes are possible. You are in control!

For an example, please see the MaxValue implementation which is used internally by EctoFoundationDB to keep track of the maximum value of a field in a Schema.

Transactions

The Repo functions will automatically use FoundationDB transactions behind the scenes. EctoFoundationDB also exposes the Ecto.Repo transaction function, allowing you to group operations together with transactional isolation:

MyApp.Repo.transaction(fn ->
    # Ecto work
  end, prefix: tenant)

It also exposes a FoundationDB-specific transaction:

alias Ecto.Adapters.FoundationDB

FoundationDB.transactional(tenant,
  fn tx ->
    # :erlfdb work
  end)

Both of these calling conventions create a transaction on FDB. Typically you will use Repo.transaction/1 when operating on your Ecto.Schema structs. If you wish to do anything using the lower-level :erlfdb API (rare), you will use FoundationDB.transactional/2.

Please read the FoundationDB Developer Guid on transactions for more information about how to develop with transactions.

It's important to remember that even though the database gives ACID guarantees about the keys updated in a transaction, the Elixir function passed into Repo.transaction/1 or FoundationDB.transactional/2 will be executed more than once when any conflicts are encountered. For this reason, your function must not have any side effects other than the database operations. For example, do not publish any messages to other processes (such as Phoenix.PubSub) from within the transaction.

# Do not do this!
MyApp.Repo.transaction(fn ->
  MyApp.Repo.insert(%User{name: "John"})
  ...
  Phoenix.PubSub.broadcast("my_topic", "new_user") # Not safe! :(
end, prefix: tenant)
# Instead, do this:
MyApp.Repo.transaction(fn ->
  MyApp.Repo.insert(%User{name: "John"})
  ...
end, prefix: tenant)
Phoenix.PubSub.broadcast("my_topic", "new_user") # Safe :)

Migrations

At first glance, EctoFoundationDB migrations may look similar to that of :ecto_sql, but the actual execution of migrations and how your app must be configured are very different, so please read this section in full.

If your app uses indexes on any of your schemas, you must include a module that implements the EctoFoundationDB.Migrator behaviour. By default, this is assumed to be your Repo module, as we have demonstrated above.

Migrations are only used for index management. There are no tables to add, drop, or modify. When you add a field to your Schema, any read requests on old objects will return nil in that field.

As tenants are opened during your application runtime, migrations will be executed automatically. This distributes the migration across a potentially long period of time, as migrations will not be executed unless the tenant is opened.

The Migrator is a module in your application runtime that provides the full list of ordered migrations. These are the migrations that will be executed when a tenant is opened. If you leave out a migration from the list, it will not be applied.

The result of migrations/0 will change over time to include new migrations, like so:

defmodule MyApp.Repo do
  # ...
  def migrations() do
    [
      {0, MyApp.IndexUserByDepartment}
      # At a later date, we my add a new index with a corresponding addition
      # tho the migrations list:
      #    {1, MyApp.IndexUserByRole}
    ]
  end
end

Each migration is contained in a separate module, much like EctoSQL's. However, the operations to be carried out must be returned as a list. For example, the creation an index may look like this:

defmodule MyApp.IndexUserByDepartment do
  use EctoFoundationDB.Migration
  def change() do
    [
      create index(User, [:department])
    ]
  end
end

Your migrator and MyApp.<Migration> modules are part of your codebase. They must be included in your application release.

Migrations can be completed in full at any time with a call to EctoFoundationDB.Migrator.up_all/1. Depending on how many tenants you have in your database, and the size of the data for each tenant, this operation might take a long time and make heavy use of system resources. It is safe to interrupt this operation, as migrations are performed transactionally. However, if you interrupt a migration, the next attempt for that tenant may have a brief delay (~5 sec); the migrator must ensure that the previous migration has indeed stopped executing.

Data Types

EctoFoundationDB stores your struct's data using :erlang.term_to_binary/1, and retrieves it with :erlang.binary_to_term/1. As such, there is no data type conversion between Elixir types and Database Types. Any term you put in your struct will be stored and later retrieved.

Data types are used by EctoFoundationDB for the creation and querying of indexes. Certain Ecto types are encoded into the FDB key, which allows you to formulate Between queries on indexes of these types:

  • :id
  • :binary_id
  • :integer
  • :float
  • :boolean
  • :string
  • :binary
  • :date
  • :time
  • :time_usec
  • :naive_datetime
  • :naive_datetime_usec
  • :utc_datetime
  • :utc_datetime_usec

See Queries for more information.

Upserts

The FoundationDB Adapter supports the following upsert approaches.

Upserts with on_conflict

The following choices for on_conflict are supported, and they work in the manner described by the offical Ecto documentation.

  • :raise: The default.
  • :nothing
  • :replace_all
  • {:replace_all_except, fields}
  • {:replace, fields}

Upserts with conflict_target

If you provide conflict_target: [], the FoundationDB Adapter will make no effort to inspect the existence of objects in the database before insert. When an existing object is blindly overwritten, your indexes may become inconsistent, resulting in undefined behavior. You should only consider its use if one of the following is true:

  • (a) your schema has no indexes
  • (b) you know apriori that any indexed fields are unchanged at the time of upsert, or
  • (c) you know apirori that the primary key you're inserting doesn't already exist.

For these use cases, this option may speed up the loading of initial data at scale.

Pipelining

Pipelining is the technique of sending multiple queries to a database at the same time, and then receiving the results at a later time. In doing so, you can often avoid waiting for multiple network round trips. EctoFDB uses a async/await syntax to implement pipelining.

All the "queryable" repo functions have async versions:

  • Repo.async_get/2 / Repo.async_get/3
  • Repo.async_get!/2 / Repo.async_get!/3
  • Repo.async_get_by/2 / Repo.async_get_by/3
  • Repo.async_get_by!/2 / Repo.async_get_by!/3
  • Repo.async_one/1 / Repo.async_one/2
  • Repo.async_one!/1 / Repo.async_one!/2
  • Repo.async_all/1 / Repo.async_all/2

The results of these functions are provided to Repo.await/1 inside a Repo.transaction/2.

Consider this example. Our transaction below has 2 network round trips in serial. The "John" User is retrieved, and then the "James" users is retrieved. Finally, the list is constructed and the transaction ends.

result = MyApp.Repo.transaction(fn ->
  [
    MyApp.Repo.get_by(User, name: "John"),
    MyApp.Repo.get_by(User, name: "James")
  ]
end, prefix: tenant)

[john, james] = result

With pipelining, we can issue 2 async_get_by queries without waiting for their result, and then only when we need the data, we await the result. Notice that the list of futures is constructed, and then we await on them.

result = MyApp.Repo.transaction(fn ->
  futures = [
    MyApp.Repo.async_get_by(User, name: "John"),
    MyApp.Repo.async_get_by(User, name: "James")
  ]

  MyApp.Repo.await(futures)
end, prefix: tenant)

[john, james] = result

Conceptually, only 1 network round-trip is required, so this will be faster than waiting on each in sequence.

Pipelining on range queries (interleaving waits)

The functions Repo.async_all/1 and Repo.async_all/2 work a little bit different on the await. When the DB must return a list of results, if that list ends up being larger than :erlfdb's configured :target_bytes, then the result set is split into multiple round-trips to the database. If your application awaits multiple such queries, then :erlfdb's "interleaving waits" feature is used.

Consider this example, where we're retrieving all Users and all Posts in the same transaction.

MyApp.Repo.transaction(fn ->
  futures = [
    MyApp.Repo.async_all(User),
    MyApp.Repo.async_all(Post)
  ]

  MyApp.Repo.await(futures)
end, prefix: tenant)

Each query individually executed would be multiple round-trips to the database as pages of results are retrieved.

sequenceDiagram
    Client->>Server: get_range(User)
    Server-->>Client: {100 users, continuation_token}
    Client->>Server: get_range(User, continuation_token)
    Server-->>Client: {10 users, done}

With :erlfdb's interleaving waits, we send as many get_range requests as possible with pipelining until all are exhausted.

sequenceDiagram
    Client->>Server: get_range(User)
    Client->>Server: get_range(Post)
    Server-->>Client: {100 users, continuation_token}
    Server-->>Client: {100 posts, continuation_token}
    Client->>Server: get_range(User, continuation_token)
    Client->>Server: get_range(Post, continuation_token)
    Server-->>Client: {10 users, done}
    Server-->>Client: {100 posts, continuation_token}
    Client->>Server: get_range(Post, continuation_token)
    Server-->>Client: {2 posts, done}

This happens automatically when you include the list of futures to your Repo.await/1.

Optimizing for throughput and latency

Pipelining operations inside your transactions can offer significant speed-ups, and you should also consider parallelizing multiple transactions. Both techniques are useful in different circumstances.

  • Pipelining: The transaction can contain arbitrary logic and remain ACID compliant. If you make smart use of pipelining, the latency of execution of that logic is kept to a minimum.
  • Parallel transactions: Each transaction is internally ACID compliant, but any 2 given transactions may conflict if they're accessing the same keys. When transactions do not conflict, throughput goes up.

If you're attempting to get the maximum throughput on data loading, you'll probably want to make use of both!

Standard Options

  • :cluster_file - The path to the fdb.cluster file. The default is "/etc/foundationdb/fdb.cluster".
  • :migrator - A module that implements the EctoFoundationDB.Migrator behaviour. Required when using any indexes. Defaults to nil. When nil, your Repo module is assumed to be the Migrator.

Advanced Options

  • :storage_id - All tenants created by this adapter are prefixed with this string. This allows multiple configurations to operate on the same FoundationDB cluster indepedently. Defaults to "Ecto.Adapters.FoundationDB".
  • :open_db - 0-arity function used for opening a reference to the FoundationDB cluster. Defaults to :erlfdb.open(cluster_file). When using EctoFoundationDB.Sandbox, you should consider setting this option to Sandbox.open_db/0.
  • :migration_step - The maximum number of keys to process in a single transaction when running migrations. Defaults to 1000. If you use a number that is too large, the FDB transactions run by the Migrator will fail.
  • :idx_cache - When set to :enabled, the Ecto ets cache is used to store the available indexes per tenant. This speeds up all database operations. Defaults to :enabled.

Limitations and Caveats

FoundationDB

Please read FoundationDB Known Limitations.

Layer

EctoFoundationDB implements a specific Layer on the FoundationDB key-value store. This Layer is intended to be generally useful, but you may not find it suitable for your workloads. The Layer is documented in detail at EctoFoundationDB.Layer. This project does not support plugging in other Layers.

Tenants

  • The use of a tenant implies an ownership relationship between the tenant and the data. It's up to you how you use this relationship in your application.

  • Failure to specify a tenant will result in a raised exception at runtime.

Migrations

The following are not supported:

  • Renaming a table
  • Renaming a field
  • Dropping an index
  • Moving down in migration bersions (rollback)

Finally, the Ecto Mix tasks are known to be unsupported:

# These commands are not supported. Do not use them with :ecto_foundationdb!
#    mix ecto.create
#    mix ecto.drop
#    mix ecto.migrate
#    mix ecto.gen.migration
#    mix ecto.rollback
#    mix ecto.migrations

Key and Value Size

FoundationDB has limits on key and value size Please be aware of these limitations as you develop. EctoFoundationDB doesn't make any attempt to protect you from these errors.

Ecto Queries

You'll find that most queries outside of those detailed in the documentation fail with an EctoFoundationDB.Exception.Unsupported exception. The FoundationDB Layer concept precludes complex queries from being executed within the database. Therefore, it only makes sense to implement a limited set of query types -- specifically those that do have optimized database query semantics. All other filtering, aggregation, and grouping must be done by your Elixir code.

In other words, you'll be writing less SQL, and more Elixir.

Refer to the integration tests for a collection of queries that is known to work.

Other Ecto Features

As you test the boundaries of EctoFoundationDB, you'll find that many other Ecto features just plain don't work with this adapter. Ecto has a lot of features, and this adapter is in the early stage. For any given unsupported feature, the underlying reason will be one of the following.

  1. It makes sense for EctoFoundationDB, but it just hasn't been implemented yet.
  2. The fundamental differences between FDB and SQL backends mean that there is no practical reason to implement it in EctoFoundationDB.

Please feel free to request features, but also please be aware that your request might be closed if it falls into that second category.

Summary

Functions

Executes the given function in a transaction on the database.

Functions

Link to this function

transactional(db_or_tenant, fun)

View Source
@spec transactional(
  EctoFoundationDB.Database.t() | EctoFoundationDB.Tenant.t() | nil,
  function()
) ::
  any()

Executes the given function in a transaction on the database.

If you provide an arity-0 function, your function will be executed in a newly spawned process. This is to ensure that EctoFoundationDB can safely manage the process dictionary.

Please be aware of the limitations that FoundationDB imposes on transactions.

For example, a transaction must complete within 5 seconds.

Link to this function

usetenant(struct, tenant)

View Source
@spec usetenant(Ecto.Schema.schema(), any()) :: Ecto.Schema.schema()