ExFdbmonitor (ex_fdbmonitor v0.2.1)


ExFdbmonitor starts and supervises fdbmonitor (the FoundationDB management process), bootstraps new clusters, and handles scaling operations — all coordinated across nodes via Erlang distribution.

How it works

  1. First node — detects that no FDB peers exist, creates the cluster file, writes a foundationdb.conf, and runs configure new single <storage_engine>.
  2. Subsequent nodes — discover existing peers via :erlang.nodes(), copy the cluster file, and join the cluster.
  3. Redundancy — once enough nodes are registered, scale_up configures coordinators and the declared redundancy mode ("double", "triple").
  4. Restarts — on restart the bootstrap config is ignored (data files already exist). The node re-includes itself if necessary and re-evaluates redundancy automatically.
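The decision in the steps above can be sketched as a pure function (illustrative only — `bootstrap_action/2` is not part of ExFdbmonitor's API):

```elixir
# Illustrative sketch only: bootstrap_action/2 is NOT part of
# ExFdbmonitor's API; it mirrors the decision flow described above.
defmodule BootstrapSketch do
  def bootstrap_action(peers, data_dir_empty? \\ true)

  # Restart: data files already exist, so bootstrap config is ignored.
  def bootstrap_action(_peers, false), do: :reuse_existing_files

  # First boot, no visible FDB peers: create the cluster file and
  # run `configure new single <storage_engine>`.
  def bootstrap_action([], true), do: :create_new_cluster

  # First boot with peers: copy their cluster file and join.
  def bootstrap_action([_ | _], true), do: :copy_cluster_file_and_join
end
```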

All mutating FDB operations are serialized through ExFdbmonitor.MgmtServer, a DGenServer backed by FDB itself. This prevents concurrent fdbcli commands from interleaving across nodes.
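The serialization idea can be sketched with a plain GenServer. This is a simplified, single-node stand-in: the real MgmtServer is a DGenServer whose lock lives in FDB itself, so it serializes commands across all nodes, not just within one.

```elixir
# Simplified stand-in for the cross-node serialization that
# ExFdbmonitor.MgmtServer provides. Because every mutating command
# funnels through one process, two callers can never interleave
# their fdbcli invocations.
defmodule SerializedMgmt do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Run a mutating operation under the (local) lock.
  def run(fun) when is_function(fun, 0), do: GenServer.call(__MODULE__, {:run, fun})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:run, fun}, _from, state), do: {:reply, fun.(), state}
end
```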

Requirements

  • Elixir ~> 1.18
  • FoundationDB client and server packages (releases)

Usage

See examples/example_app/README.md for a tutorial on using ExFdbmonitor in your application.

Configuration

FDB executable paths

If your FoundationDB installation is not in the default location, set the following application environment keys for :ex_fdbmonitor. The paths shown here are the defaults.

config :ex_fdbmonitor,
  fdbmonitor: "/usr/local/libexec/fdbmonitor",
  fdbcli: "/usr/local/bin/fdbcli",
  fdbserver: "/usr/local/libexec/fdbserver",
  fdbdr: "/usr/local/bin/fdbdr",
  backup_agent: "/usr/local/foundationdb/backup_agent/backup_agent",
  dr_agent: "/usr/local/bin/dr_agent"

Minimal (single-node dev)

# config/dev.exs
import Config

config :ex_fdbmonitor,
  etc_dir: ".my_app/dev/fdb/etc",
  run_dir: ".my_app/dev/fdb/run"

config :ex_fdbmonitor,
  bootstrap: [
    conf: [
      data_dir: ".my_app/dev/fdb/data",
      log_dir: ".my_app/dev/fdb/log",
      fdbservers: [[port: 5000]]
    ]
  ]

Multi-node production

# config/runtime.exs
import Config

addr = fn interface ->
  {:ok, addrs} = :inet.getifaddrs()
  :proplists.get_value(to_charlist(interface), addrs)[:addr]
  |> :inet.ntoa()
  |> to_string()
end

config :ex_fdbmonitor,
  etc_dir: "/var/lib/my_app/fdb/etc",
  run_dir: "/var/lib/my_app/fdb/run"

config :ex_fdbmonitor,
  bootstrap: [
  
    # nodes must communicate with coordinators over the
    # network interface
    cluster: [coordinator_addr: addr.("eth0")],
    
    conf: [
      data_dir: "/var/lib/my_app/fdb/data",
      log_dir: "/var/lib/my_app/fdb/log",
      storage_engine: "ssd-2",
      
      # We're defining 2 fdbservers per node
      fdbservers: [[port: 4500], [port: 4501]],
      
      # When safe to do so, ex_fdbmonitor will upgrade
      # to 'double' redundancy automatically
      redundancy_mode: "double"
    ]
  ]

Configuration reference

| Key | Required | Description |
|---|---|---|
| :etc_dir | yes | Directory for fdb.cluster and foundationdb.conf |
| :run_dir | yes | Directory for the fdbmonitor pid file |
| :bootstrap | no | Bootstrap config (ignored after first successful start) |

Bootstrap keys:

| Key | Description |
|---|---|
| cluster: [coordinator_addr:] | IP address for the initial coordinator (default "127.0.0.1") |
| conf: [data_dir:] | FDB data directory |
| conf: [log_dir:] | FDB log directory |
| conf: [storage_engine:] | Storage engine (default "ssd-2") |
| conf: [fdbservers:] | List of [port: N] keyword lists, one per fdbserver process |
| conf: [redundancy_mode:] | "single", "double", or "triple" (default: nil / single) |
| fdbcli: | Extra fdbcli args to run at bootstrap (optional, repeatable) |
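For example, a bootstrap section might carry extra fdbcli: entries. This is a hypothetical sketch: the commands shown are placeholders, and the value shape (one command string per key, repeated) is an assumption — check fdbcli's own documentation for valid arguments.

```elixir
# Hypothetical example; the fdbcli commands below are placeholders.
config :ex_fdbmonitor,
  bootstrap: [
    conf: [
      data_dir: ".my_app/dev/fdb/data",
      log_dir: ".my_app/dev/fdb/log",
      fdbservers: [[port: 5000]]
    ],
    # Repeat the key once per extra command to run at bootstrap.
    fdbcli: "status",
    fdbcli: "status details"
  ]
```

Elixir keyword lists permit duplicate keys, which is what makes the fdbcli: entry repeatable.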

Bootstrap flow

On application start, ExFdbmonitor runs two phases:

Phase 1 (before any processes start):

  • If the conf file and data dir are empty (first boot), write config files. If FDB peers exist on :erlang.nodes(), copy their cluster file. Otherwise, create a new cluster file and run configure new single <engine>.
  • If files already exist (restart), skip — use existing cluster file.

Phase 2 (after fdbmonitor / fdbserver are running):

  • Start ExFdbmonitor.MgmtServer (connects to FDB for distributed coordination).
  • Register this node's machine_id.
  • Call scale_up(redundancy_mode, [node()]) — includes the node back into FDB and configures redundancy when enough nodes are present.

Public API

ExFdbmonitor.leave/0

Gracefully remove the current node from the cluster. Downgrades redundancy if needed, reassigns coordinators, excludes the node (blocks until data is moved), and stops the local fdbmonitor. To rejoin, restart the :ex_fdbmonitor application.

Redundancy modes

| Mode | Min nodes | Min coordinators |
|---|---|---|
| "single" | 1 | 1 |
| "double" | 3 | 3 |
| "triple" | 5 | 5 |

scale_up stores the declared mode as a ceiling. scale_down auto-determines the highest mode the surviving nodes can support, capped at that ceiling. This prevents a scale-down/scale-up cycle from accidentally exceeding the operator's intent.
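The capping rule might look like the following (illustrative only — this is not ExFdbmonitor's actual implementation, just the node-count table above turned into a function):

```elixir
# Illustrative sketch of the scale-down capping rule described above;
# NOT ExFdbmonitor's actual implementation.
defmodule RedundancySketch do
  # Minimum node counts from the table above.
  @min_nodes %{"single" => 1, "double" => 3, "triple" => 5}
  @order ["single", "double", "triple"]

  # Highest mode the surviving nodes can support, capped at the
  # mode the operator declared via scale_up.
  def best_mode(node_count, declared_ceiling) do
    @order
    |> Enum.take_while(&(&1 != declared_ceiling))
    |> Kernel.++([declared_ceiling])
    |> Enum.filter(fn mode -> node_count >= @min_nodes[mode] end)
    |> List.last()
  end
end
```

Note how a 5-node cluster declared as "double" stays at "double": the declared mode acts as a ceiling even when the node count could support more.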

Scaling example

When a node is gracefully shutting down,

# On the departing node:
ExFdbmonitor.leave()

When a node that was previously shut down gracefully is rejoining,

# Later, restart the :ex_fdbmonitor application to rejoin:
Application.stop(:ex_fdbmonitor)
Application.ensure_all_started(:ex_fdbmonitor)

Testing

ExFdbmonitor provides sandbox modules for integration testing:

# Single-node sandbox
sandbox = ExFdbmonitor.Sandbox.Single.checkout("my-test", starting_port: 5000)
# ... run tests ...
ExFdbmonitor.Sandbox.Single.checkin(sandbox, drop?: true)

# 3-node double-redundancy sandbox
sandbox = ExFdbmonitor.Sandbox.Double.checkout("my-test", starting_port: 5500)
# ... run tests ...
ExFdbmonitor.Sandbox.Double.checkin(sandbox, drop?: true)

Sandboxes start isolated local_cluster nodes with their own FDB processes. Pass drop?: true to delete all data on checkin.
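In an ExUnit suite, this might be wired up as follows. This is a sketch: it assumes the checkout/checkin signatures shown above and requires a local FoundationDB installation, so it is not runnable standalone.

```elixir
# Sketch: assumes the Sandbox.Single API shown above and a local
# FoundationDB installation.
defmodule MyApp.FdbIntegrationTest do
  use ExUnit.Case, async: false

  setup do
    # Check out an isolated single-node cluster for this test run.
    sandbox = ExFdbmonitor.Sandbox.Single.checkout("my-test", starting_port: 5000)

    # Drop all sandbox data when the test process exits.
    on_exit(fn -> ExFdbmonitor.Sandbox.Single.checkin(sandbox, drop?: true) end)

    {:ok, sandbox: sandbox}
  end

  test "round-trips a key" do
    db = ExFdbmonitor.open_db()
    :erlfdb.set(db, "hello", "world")
    assert "world" == :erlfdb.get(db, "hello")
  end
end
```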

Functions

leave()

Gracefully remove the current node from the cluster.

Executes MgmtServer.scale_down/1 which, under the DGenServer lock:

  1. Downgrades the redundancy mode if the remaining nodes can no longer sustain it (e.g. triple → double when dropping below 5 nodes).
  2. Reassigns coordinators to surviving nodes.
  3. Excludes this node's FDB processes (blocks until data is fully moved).

If the exclude succeeds the local worker is terminated so that fdbmonitor and its fdbserver processes are stopped.

Returns :ok on success or {:error, reason} if the scale-down fails (in which case the worker is left running).

Rejoining after leave

Restart the :ex_fdbmonitor application. The bootstrap flow will detect that data files already exist, skip the initial configure, and call MgmtServer.scale_up/2 which includes the node back into FDB and reconfigures the redundancy mode if the cluster now has enough nodes.

open_db(input \\ nil)

Opens a database connection.

Arguments

  • input: Ignored; present for compatibility with other libraries.

Examples

Use this whenever you need a t:erlfdb.database/0 handle:

db = ExFdbmonitor.open_db()
"world" = :erlfdb.get(db, "hello")

Use in conjunction with Ecto.Adapters.FoundationDB:

config :my_app, MyApp.Repo,
  open_db: &ExFdbmonitor.open_db/1