Usage
This guide explains how to use an installed instance of IntegrateDB. It covers the following topics:
- Users
- Stakeholders
- Claims
- Notifications
- Migration control
- Schema evolution
Users
IntegrateDB is a web application with a JSON API. In order to interact with it, you need to create and authenticate as a user.
Bootstrapping the root user
When IntegrateDB is first installed, it doesn't have any users. The first step is to create a root user. This does not require authentication.
Make a POST request to /api/v1/users with a payload like:
{
  "user": {
    "username": "example", 
    "password": "..."
  }
}
For example:
export ENDPOINT="https://your-integrate-db-instance"
curl -X POST -H "Content-Type: application/json" \
  -d '{"user": {"username": "example", "password": "<your password>"}}' \
  $ENDPOINT/api/v1/users
This will return a response like:
{
  "data": {
    "id": 1,
    "token": "<bearer token>",
    "refreshToken": "<refresh token>",
  }
}Where:
- id is the user id
- token is a bearer token for authentication
- refreshToken is a refresh token you can use to get a new bearer token
Once the root user is created, access is locked down and all API endpoints require user authentication.
Authenticating requests
You can authenticate requests to the API using your bearer token. Test this by listing all users using a GET request to /api/v1/users:
export AUTH_HEADER="Authorization: Bearer <your bearer token>"
export JSON_HEADER="Content-Type: application/json"
curl -H "$AUTH_HEADER" -H "$JSON_HEADER" $ENDPOINT/api/v1/users
Which in this case will return:
{
  "data": [
    {
      "id": 1, 
      "username": "example"
    }
  ]
}
Generating new tokens
You can log in and renew your token using:
- POST /api/v1/auth/login with a payload like {"data": {"username": "...", "password": "..."}}
- POST /api/v1/auth/renew with a payload like {"data": "<refresh token>"}
Both return the same response data as create user:
{
  "data": {
    "id": 1,
    "token": "<bearer token>",
    "refreshToken": "<refresh token>",
  }
}Managing users
You have standard CRUD resources on users:
- GET /api/v1/users to list all users
- GET /api/v1/users/:id to view a user
- POST {data} /api/v1/users to create a new user
- PUT {data} /api/v1/users/:id to update a user
- DELETE /api/v1/users/:id to delete a user
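For example, to update a user's password (a sketch that assumes the update endpoint accepts the same {"user": {...}} payload shape as create):
curl -X PUT -H "$AUTH_HEADER" -H "$JSON_HEADER" \
  -d '{"user": {"username": "example", "password": "<new password>"}}' \
  $ENDPOINT/api/v1/users/1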
Once you have a user, the next step is to create one or more stakeholder applications.
Stakeholders
Stakeholders represent applications you want to integrate via a Postgres database. You can create stakeholders for every application that needs access to the database, or you can have a primary application that manages your database and then only create stakeholders for the secondary applications that you want to integrate via the database.
Stakeholders have a unique name that is used to create, and is kept in sync with, a Postgres database user and a Postgres DDL schema. The name must conform to this regex: ~r/^[a-zA-Z_]{1}\w{0,31}$/.
Create stakeholder
Make a POST request to /api/v1/stakeholders with a payload like:
{
  "stakeholder": {
    "name": "example"
  }
}
For example:
curl -X POST -H "$AUTH_HEADER" -H "$JSON_HEADER" \
  -d '{"stakeholder": {"name": "example"}}' \
  $ENDPOINT/api/v1/stakeholders
This creates a Postgres database user (example) with access scoped to its own private DDL schema (example.*) and returns the database user credentials in the response data:
{
  "data": {
    "id": 1,
    "name": "example",
    "credentials": {
      "username": "example",
      "password": "<randomly generated password>"
    }
  }
}
The generated credentials contain the database user and password that your example application should use when connecting directly to the shared Postgres database. For example, it could connect using a connection string like postgres://example:<your randomly generated password>@host:port/db.
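For instance, you could verify the generated credentials from a shell with psql (a sketch; substitute your actual host, port and database name):
psql "postgres://example:<your randomly generated password>@host:port/db" -c "SELECT current_user;"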
Note that this is the only time you will see the credentials. They are not stored by IntegrateDB, nor are they available to query from either IntegrateDB or Postgres. As a result, you must save the credentials from the create stakeholder response (somewhere safe, like in a password manager or a cluster secret).
Managing stakeholders
You also have standard CRUD resources on stakeholders:
- GET /api/v1/stakeholders to list all stakeholders
- GET /api/v1/stakeholders/:id to view a stakeholder
- POST {data} /api/v1/stakeholders to create a new stakeholder
- PUT {data} /api/v1/stakeholders/:id to update a stakeholder
- DELETE /api/v1/stakeholders/:id to delete a stakeholder
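For example, to list all stakeholders and then delete one by id, reusing the headers defined earlier:
curl -H "$AUTH_HEADER" -H "$JSON_HEADER" $ENDPOINT/api/v1/stakeholders
curl -X DELETE -H "$AUTH_HEADER" $ENDPOINT/api/v1/stakeholders/1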
The next section walks through how to configure data access and declare data dependencies for a Stakeholder application using Claims.
Claims
Stakeholder applications have full access to resources in their own DDL schema by default. For example, this allows a stakeholder called example to create tables and read and write data in the example.* schema. This in itself can be useful as a mechanism for persistence and integration.
However, the real power of IntegrateDB is to facilitate sharing of data between applications. Typically, this involves accessing data in a public or other private schema that may be created and managed by another application. For example, you may have a primary application writing data to public.orders and you may want a reporting or fulfillment service to be able to read and respond to the data written there.
To enable this, IntegrateDB allows Stakeholder applications to claim a subset of the data in the Postgres database. This does two things:
- it declares that the Stakeholder depends on the claimed data shape
- it grants the Stakeholder application access to it
As expanded on in the Migration control and Schema evolution sections, declaring a dependency ensures that the claimed data shape (i.e.: tables and columns) exists — both at the time when the claim is established and ongoing as the database schema evolves.
Note that data access is currently read only, i.e.: IntegrateDB grants the Stakeholder's database user "SELECT" permission on the relevant tables and columns. Other permissions may be added in future but at the moment they are regarded as an anti-pattern. Instead, we recommend writing data to tables within the Stakeholder's DDL schema and granting access to that data to other applications that need to access it.
Specifying claims
You create and overwrite claims by PUTing a specification document to /api/v1/stakeholders/:stakeholder_id/claims.
For example, the following will claim all of the columns in the public.orders table and the id and user_id columns in the public.customers table:
{
  "data": {
    "match": [
      {
        "path": "public.orders",
        "fields": ["*"]
      },
      {
        "path": "public.customers",
        "fields": ["id", "user_id"]
      }
    ]
  }
}
As you can see from the "*" specifying "all columns" of the public.orders table, specification documents can contain values that are expanded by looking at the actual structure of the database (querying the information_schema). This structure can evolve over time, so IntegrateDB works by storing the original spec document and then expanding and re-syncing actual database claims every time the DDL changes (i.e.: after a migration).
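For reference, you could submit the spec above with a PUT request like the following (a sketch assuming a stakeholder with id 1 and the document saved locally as claims.json):
curl -X PUT -H "$AUTH_HEADER" -H "$JSON_HEADER" \
  -d @claims.json \
  $ENDPOINT/api/v1/stakeholders/1/claims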
Top-level match array
Specification documents must have a top-level match array where each match object specifies a data claim.
Match objects
Match objects must have a path and can optionally have fields, alternatives and an optional flag.
Path syntax
Path values must be fully qualified strings referring to tables in the schema_name.table_name format.
One special case is that you are allowed to claim all of the tables in a schema using schema_name.* if, and only if, you also claim all fields. So this match object is valid:
{
  "path": "public.*",
  "fields": ["*"] 
}
This is not:
{
  "path": "public.*",
  "fields": ["id", "inserted_at"] 
}
Field syntax
Fields refer to database columns. A match object's fields array can contain column name strings, e.g.:
{
  "path": "public.foos",
  "fields": ["id", "inserted_at"] 
}
Or field objects that must have a name and can have type, min_length, is_nullable, alternatives and optional. For example, the following are all valid:
{
  "path": "public.foos",
  "fields": [
    {"name": "id"},
    {"name": "uid", "type": "uuid"},
    {"name": "info", "min_length": 255},
    {"name": "inserted_at", "is_nullable": "false"},
    {"name": "extra_info", "optional": true}
  ]
}
In this example, the public.foos table must have:
- an id column of any type
- a uid column that is a uuid type
- an info column that has a minimum maximum length (see below!) of 255
- an inserted_at column that must not be nullable
- and optionally an extra_info column of any type
Currently, if you specify it, the type value must be the literal string stored in the information_schema.columns data_type for the column, such as:
bigint
boolean
character varying
integer
numeric
text
timestamp with time zone
timestamp without time zone
uuid
You can see the values in your database using e.g.:
SELECT distinct(data_type)
  FROM information_schema.columns 
  ORDER BY data_type;
Or introspect the columns in a specific table with a query like:
SELECT table_schema, table_name, column_name, data_type,
  case when character_maximum_length is not null
    then character_maximum_length
    else numeric_precision
  end as max_length
  FROM information_schema.columns 
  WHERE table_schema = 'public'
    AND table_name = 'foos'
  ORDER BY table_schema, table_name, column_name;
Minimum maximum length
Field objects that specify a min_length are essentially saying "this column's character length or numeric precision must be at least this value". This allows you to protect against column changes that truncate data. For example, say you have a varchar(40) column and a field specifying min_length: 40. If a migration tries to truncate the column to a varchar(30) then it will fail.
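As a concrete sketch of that scenario (assuming a public.foos table with an info varchar(40) column claimed with min_length: 40, and a $DATABASE_URL pointing at the shared database), a migration that narrows the column would be rejected by the validation function described under Migration control below:
psql "$DATABASE_URL" <<'SQL'
BEGIN;
-- attempt to truncate the column claimed with min_length: 40
ALTER TABLE public.foos ALTER COLUMN info TYPE varchar(30);
-- raises an exception because the claim is no longer met, so the migration is not applied
SELECT integratedb_validate_migration();
COMMIT;
SQL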
What happens when a field is not fully specified?
Fields can be specified with as little as a name:
"fields": ["id"]Or as a fully fleshed out column spec:
"fields": [
  {"name": "id", "type": "bigint", "min_length": 64, "is_nullable": false}
]
When storing claims for fields that are not fully specified, IntegrateDB populates the field properties from the current state of the database (i.e.: from the information_schema.columns table). This means that a configuration like "fields": ["id"] declares a dependency that snapshots the current type, precision and nullability of the id column in the database. If the column is altered, for example to increase the size of a varchar or to set it to non-nullable, then the claim will automatically adjust to track the new reality and prevent any data regressions.
Validating specification documents
The IntegrateDB application processes and validates the spec in three stages.
- JSON schema validation
- Ecto.Changeset validation
- Database validation
The full syntax and structure for specification documents is defined and validated by the JSON schema at priv/spec/spec.schema.json. This schema is applied first and you will see errors like:
{
  "errors": {
    "detail": [
      ["Schema does not allow additional properties.", "#/name"],
      ["Required property match was not present.","#"]
    ]
  }
}
IntegrateDB then casts the document to an Integrate.Specification.Spec, which applies some changeset validation that should not normally be triggered. Lastly, the spec is expanded and compared against the database. To pass and be stored, all of the paths and fields claimed in the spec must exist in the database and the database must match any column properties (type, min_length, is_nullable) specified.
At this point, you may see errors like:
{
  "errors": {
    "claims": [{
      "alternatives": [{
        "columns": [{
          "alternatives": [{
            "type": [
              "path: `public.foos`, field: `id`: specified value `int` does not match existing column value `bigint`."
            ]
          }]
        }]
      }]
    }]
  }
}
These error messages need to be improved but hopefully there's enough in there at the moment for you to figure out what's going on.
Notifications
Note: Notifications are not currently implemented. This is top of the Known Issues list.
Notifications are envisaged as being specified just like claims but with additional match attributes for events and notification channels. In addition, the specification data should have an additional channels section for notification channel configuration and the data should be PUT to /api/v1/stakeholders/:stakeholder_id/notifications rather than .../claims.
The configuration would look something like this:
{
  "data": {
    "match": [
      {
        "path": "public.orders",
        "events": ["*"],
        "fields": ["*"],
        "channels": ["*"]
      },
      {
        "path": "public.customers",
        "events": ["INSERT", "UPDATE", "DELETE"],
        "fields": ["id", "user_id"],
        "channels": ["SOCKET", "WEBHOOK", "REDIS"]
      }
    ],
    "channels": [
      // ... notification channel configuration
    ]
  }
}
The implementation will use the existing Broadway-based replication pipeline at Integrate.Replication and will remove the need for SUPERUSER permissions by dynamically adding tables to the logical replication publication.
Migration control
So far, we've covered bootstrapping a root user, creating a stakeholder application and declaring data dependencies and access requirements as claims. This machinery now allows us to validate migrations in order to ensure data dependencies continue to be met when the database structure (DDL) is changing.
IntegrateDB adds three functions to your Postgres database:
SELECT integratedb_sync() tells IntegrateDB to re-expand and re-sync all of the claims from the stored specifications. It's useful when you want to manually sync IntegrateDB with the state of the database, for example following a migration rollback.
SELECT * FROM integratedb_unmet_claims() runs a query looking for all claims that are unmet by the current state of the database. You can call this yourself for debugging purposes.
SELECT integratedb_validate_migration() validates the current state of the database by calling integratedb_unmet_claims(). If there are unmet claims, it raises an exception with logging. If not, it calls integratedb_sync() to update claims in line with the new state of the database. 
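For ad-hoc debugging from a shell, a minimal sketch assuming a $DATABASE_URL environment variable pointing at the shared database:
psql "$DATABASE_URL" -c 'SELECT * FROM integratedb_unmet_claims();'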
Calling integratedb_validate_migration() at the end of your migration (i.e.: as the last statement within the transaction) prevents the migration from being applied if the resulting DDL schema doesn't provide the declared data dependencies. This works with whichever language or migration tool you prefer.
An example using straight SQL:
BEGIN;
ALTER TABLE foos DROP COLUMN name;
SELECT integratedb_validate_migration();
COMMIT;
-- Or to rollback
BEGIN;
ALTER TABLE foos ADD COLUMN name varchar(255);
SELECT integratedb_sync();
COMMIT;
An example using Ecto.Migration:
defmodule ExampleMigration.AlterFoos do
  use Ecto.Migration
  def up do
    alter table(:foos) do
      remove :name
    end
  end
  def down do
    alter table(:foos) do
      add :name, :string
    end
  end
  def before_commit do
    execute "SELECT integratedb_validate_migration()",
            "SELECT integratedb_sync()"
  end
end
Including an integratedb_validate_migration() at the end of your migration ensures that the database fulfils the claimed data dependencies. It essentially applies constraints to your database that reduce regressions caused by schema migrations. The next section walks through how these constraints can be relaxed in order to explicitly enable schema evolution.
Schema evolution
With API-based integration, the theory is that you can handle changes to data structure using versioning. I.e.: you publish multiple versions of an API endpoint or your business logic handles variations of a data structure. With IntegrateDB, you can update your data dependency configuration (claims and notifications both work the same way) to:
- allow alternatives, enabling changes / alterations
- make tables and columns optional, enabling deletions
Alternatives work at the table and column level. For example:
// from
"match": {
  "path": "public.orders",
  "fields": ["product_id", "quantity"]
}
// to
"match": {
  "alternatives": [
    {
      "path": "public.orders",
      "fields": ["product_uuid", "quantity"]
    },
    {
      "path": "public.legacy_orders",
      "fields": [
        {
          "alternatives": ["product_id", "product_uuid"]
        },
        "quantity"
      ]
    }
  ]
}
As do optionals:
// from
"match": {
  "path": "public.foos",
  "fields": ["*"]
}
// to
"match": {
  "path": "public.foos",
  "fields": ["*"],
  "optional": true
}
// and from
"fields": ["user_id"]
// to
"fields": [
  {"name": "user_id", "optional": true}
]
Alternatives and optionals are designed as tools to help navigate a migration whilst still being as explicit and static / defined as possible about data dependencies. Once the database has been migrated and the new data is flowing, it's recommended to remove the alternatives and optionals from your integration config — as you would remove the code handling the legacy orders from your app.
Next steps
Get started with Installation and find out about the Support available for your project.