This guide describes the complete lifecycle of a durable object, from startup through execution, hibernation, and shutdown.
Overview
```mermaid
flowchart LR
subgraph Startup
direction TB
S1["call / ensure_started"] --> S2[Load State from DB]
S2 --> S3["after_load callback"]
end
subgraph Running
direction TB
Idle -->|"call or alarm"| Handle[Handle Call]
Handle -->|"state changed"| Persist[Persist State]
Handle -->|"no change"| Idle
Persist -->|"success"| Idle
Idle -->|"inactivity"| Hibernate[Hibernated]
Hibernate -->|"message received"| Idle
end
subgraph Shutdown
direction TB
T1["shutdown_after timeout<br/>or node failure"] --> T2["Process stopped<br/>(state already in DB)"]
end
Startup --> Running
Running --> Shutdown
Shutdown -->|"next call"| Startup
```

Phases
1. Starting
A durable object process is started on demand when you call it:
```elixir
MyApp.Counter.increment("user:123", 1)
# or
DurableObject.call(MyApp.Counter, "user:123", :increment, [1])
```

The system checks whether a process for the (module, object_id) pair is already running. If not, it starts a new process under a DynamicSupervisor with a :temporary restart strategy, meaning the process is not restarted automatically if it stops.
In distributed mode (Horde), the registry ensures only one process exists for a given (module, object_id) across the entire cluster.
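The lookup-or-start step can be sketched on a single node with Elixir's standard Registry and DynamicSupervisor. This is a minimal sketch under assumptions, not the library's implementation. The names Sketch, Sketch.Object, Sketch.Registry and Sketch.ObjectSup are hypothetical.

```elixir
defmodule Sketch.Object do
  # restart: :temporary means the supervisor never restarts a stopped object.
  use GenServer, restart: :temporary

  def start_link({module, object_id}) do
    # Register under the (module, object_id) pair so there is one process per object.
    name = {:via, Registry, {Sketch.Registry, {module, object_id}}}
    GenServer.start_link(__MODULE__, {module, object_id}, name: name)
  end

  @impl true
  def init(key), do: {:ok, %{key: key}}
end

defmodule Sketch do
  # Returns the pid for (module, object_id), starting the process on demand.
  def ensure_started(module, object_id) do
    case Registry.lookup(Sketch.Registry, {module, object_id}) do
      [{pid, _value}] ->
        {:ok, pid}

      [] ->
        child = {Sketch.Object, {module, object_id}}

        case DynamicSupervisor.start_child(Sketch.ObjectSup, child) do
          {:ok, pid} -> {:ok, pid}
          # Another caller won the race and registered first: reuse its process.
          {:error, {:already_started, pid}} -> {:ok, pid}
        end
    end
  end
end
```

Both modules must be running before the first call, for example via Registry.start_link(keys: :unique, name: Sketch.Registry) and DynamicSupervisor.start_link(strategy: :one_for_one, name: Sketch.ObjectSup). In distributed mode, Horde provides its own registry and dynamic supervisor for this role.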
2. Loading State
```mermaid
flowchart TD
A[init] --> B{Repo configured?}
B -- Yes --> C{Record in DB?}
B -- No --> F[Use default state]
C -- Yes --> D[Load state from DB]
C -- No --> E[Save default state to DB]
D --> G[Merge with defaults]
E --> F
G --> F
F --> H[after_load callback]
```

During init/1, the server loads the persisted state from the database. If a repo is configured but no record exists yet, the default state (derived from the field definitions in the DSL) is saved. Without a repo, the object starts from the default state. Loaded state is merged with the defaults, so fields added to the DSL after a record was saved receive their default values.
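The decision tree above can be written as a pure function. This is a minimal sketch: LoadSketch and its result tags are hypothetical, and the @defaults map stands in for state derived from DSL field definitions. It assumes stored state comes back as an atom-keyed map.

```elixir
defmodule LoadSketch do
  # Hypothetical defaults standing in for the DSL field definitions.
  @defaults %{count: 0, label: "untitled"}

  # repo_configured?: whether a repo is configured
  # stored: the persisted state map, or nil when no record exists
  def initial_state(false, _stored), do: {:memory_only, @defaults}
  def initial_state(true, nil), do: {:save_defaults, @defaults}

  def initial_state(true, stored) when is_map(stored) do
    # Stored values win; fields missing from an older record fall back to defaults.
    {:loaded, Map.merge(@defaults, stored)}
  end
end
```

A record saved before the label field existed, such as %{count: 7}, loads as %{count: 7, label: "untitled"}.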
3. After Load
The optional after_load/1 callback runs once after state is loaded. This is useful for scheduling initial alarms or performing one-time setup:
```elixir
def after_load(state) do
  {:ok, state, {:schedule_alarm, :cleanup, :timer.minutes(30)}}
end
```

If after_load modifies state, the new state is persisted before the object begins accepting calls.
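The "persist only if changed" rule for after_load can be sketched as follows. This is a hypothetical sketch. AfterLoadSketch is not part of the library, and the persist argument is an assumed one-arity write function returning :ok. It accepts both return shapes shown above and collects the optional effect into a list.

```elixir
defmodule AfterLoadSketch do
  # Runs a hypothetical after_load callback and persists only on change.
  def run(state, after_load, persist) do
    {new_state, effects} =
      case after_load.(state) do
        {:ok, updated} -> {updated, []}
        {:ok, updated, effect} -> {updated, [effect]}
      end

    # Write before accepting calls, but skip the write if nothing changed.
    if new_state != state do
      :ok = persist.(new_state)
    end

    {new_state, effects}
  end
end
```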
4. Handling Calls
When a call arrives, the server invokes the corresponding handle_<name>/N function. Handlers return a result tuple:
```elixir
def handle_increment(amount, state) do
  new_count = state.count + amount
  {:reply, new_count, %{state | count: new_count}}
end
```

State persistence is transactional. If the state changed, it is written to the database. If the write fails, the state is rolled back to its previous value and the caller receives an error. This guarantees that in-memory state and database state stay in sync.
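The commit-or-roll-back rule can be sketched as a pure function over a plain reply tuple. This is a minimal sketch under assumptions. CommitSketch is hypothetical, persist stands in for the database write and returns :ok or {:error, reason}, and the {:error, reason} shape the caller sees is also assumed.

```elixir
defmodule CommitSketch do
  # Returns {reply_for_caller, state_to_keep}.
  def commit({:reply, reply, new_state}, old_state, persist) do
    if new_state == old_state do
      # Nothing changed: no write needed.
      {reply, old_state}
    else
      case persist.(new_state) do
        :ok -> {reply, new_state}
        # Write failed: keep the previous state and surface the error.
        {:error, reason} -> {{:error, reason}, old_state}
      end
    end
  end
end
```

Keeping old_state on failure is what keeps memory and database from diverging. The process never holds a state the database has not accepted.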
Handlers can also schedule alarms as part of their return value:
```elixir
{:reply, :ok, new_state, {:schedule_alarm, :expire, :timer.hours(1)}}
```

5. Alarms
```mermaid
sequenceDiagram
participant Handler
participant Scheduler
participant DB
participant Object
Handler->>DB: Write alarm (upsert)
Note over DB: scheduled_at = now + delay
loop Polling interval (default 30s)
Scheduler->>DB: Query overdue, unclaimed alarms
DB-->>Scheduler: Alarm records
end
Scheduler->>DB: Claim alarm (set claimed_at)
Scheduler->>Object: call(:__fire_alarm__, [alarm_name])
Object->>Object: handle_alarm(name, state)
alt Success
Scheduler->>DB: Delete alarm (if still claimed)
else Handler reschedules same alarm
Note over DB: Upsert clears claimed_at
Note over Scheduler: Delete is no-op
else Failure/crash
Note over DB: Alarm stays claimed
Note over Scheduler: Retries after claim TTL
end
```

Alarms are persisted in the durable_object_alarms table and survive process restarts. The polling scheduler uses claim-based execution for crash recovery:
- Claim: Before firing, the scheduler atomically sets claimed_at on the alarm
- Fire: The object's handle_alarm/2 callback is invoked
- Delete: On success, the alarm is deleted only if it is still claimed
If a handler reschedules the same alarm, the upsert clears claimed_at, so the delete becomes a no-op and the new alarm persists. If the handler fails or the server crashes, the alarm remains claimed and will be retried after the claim_ttl expires (default: 60 seconds).
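The claim, fire and delete steps can be modeled with a plain map standing in for the durable_object_alarms table. The reschedule case fits the same model. This is a minimal in-memory sketch under assumptions:

- AlarmClaimSketch and its functions are hypothetical.
- Timestamps are integer milliseconds.
- The handler is modeled as a function that receives the table, so it can reschedule.

In the database the claim would be an atomic conditional update. A single-process map gets that atomicity for free.

```elixir
defmodule AlarmClaimSketch do
  # Keys are {object_type, object_id, alarm_name}.

  # Upsert: scheduling replaces any existing row and clears claimed_at.
  def schedule(table, key, now, delay) do
    Map.put(table, key, %{scheduled_at: now + delay, claimed_at: nil})
  end

  # Claim: mark the alarm as taken before firing it.
  def claim(table, key, now) do
    Map.update!(table, key, &%{&1 | claimed_at: now})
  end

  # Delete only if still claimed; a reschedule inside the handler cleared the claim.
  def delete_if_claimed(table, key) do
    case Map.fetch(table, key) do
      {:ok, %{claimed_at: claimed}} when not is_nil(claimed) -> Map.delete(table, key)
      _ -> table
    end
  end

  # One claim -> fire -> conditional delete cycle.
  def fire(table, key, now, handler) do
    table
    |> claim(key, now)
    |> handler.()
    |> delete_if_claimed(key)
  end
end
```

Because a reschedule goes through the same upsert, the conditional delete finds claimed_at cleared and leaves the new row in place.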
Alarms with the same (object_type, object_id, alarm_name) are upserted, so scheduling an alarm that already exists replaces it.
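The polling query and the claim_ttl retry rule combine into one predicate. An alarm is eligible when it is overdue and either unclaimed or claimed longer ago than claim_ttl. The sketch below is hypothetical: AlarmPollSketch is not part of the library, and treating the TTL boundary as inclusive is an assumption.

```elixir
defmodule AlarmPollSketch do
  # Due: time has passed, and the alarm is unclaimed or its claim has expired
  # (the previous attempt is presumed dead).
  def due?(%{scheduled_at: at, claimed_at: claimed}, now, claim_ttl) do
    at <= now and (is_nil(claimed) or now - claimed >= claim_ttl)
  end

  # Keys of all alarms the scheduler should try to claim on this poll.
  def select_due(table, now, claim_ttl) do
    for {key, alarm} <- table, due?(alarm, now, claim_ttl), do: key
  end
end
```

With claim_ttl = 60_000 (the documented 60-second default, in milliseconds), an alarm claimed 20 seconds ago is skipped, while one claimed 70 seconds ago is offered again.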
6. Hibernation
After a configurable period of inactivity (default: 5 minutes), the GenServer hibernates automatically. Hibernation garbage-collects the process and discards its call stack, shrinking its memory footprint while keeping it alive and registered. The next incoming message wakes the process transparently.
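Elixir's GenServer supports idle hibernation through its standard hibernate_after start option. Whether the DSL setting maps directly onto that option is an assumption. A plain GenServer still shows the observable behavior: after the idle period the process hibernates, and a later call wakes it without the caller noticing. IdleCounter is a hypothetical module.

```elixir
defmodule IdleCounter do
  use GenServer

  def start_link(hibernate_after) do
    # hibernate_after is a standard GenServer option (ms of inactivity).
    GenServer.start_link(__MODULE__, 0, hibernate_after: hibernate_after)
  end

  def increment(pid, amount), do: GenServer.call(pid, {:increment, amount})

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call({:increment, amount}, _from, count) do
    {:reply, count + amount, count + amount}
  end
end
```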
Configure via the DSL:
```elixir
options do
  hibernate_after :timer.minutes(10)
end
```

7. Shutdown
Optionally, objects can shut down entirely after extended inactivity. Unlike hibernation, shutdown terminates the process. The next call restarts the object, loading its state from the database.
```elixir
options do
  shutdown_after :timer.hours(1)
end
```

The shutdown timer resets on every handler call, so only truly idle objects are stopped. State is already persisted (it was saved after the last handler call), so no data is lost.
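One common way to build a resetting inactivity timer is the GenServer timeout. A callback returns a timeout, and if no message arrives within it, handle_info(:timeout, state) runs. This is a hypothetical sketch: ShutdownSketch is not the library's code, which may use a different mechanism such as Process.send_after. One difference to note is that a GenServer timeout is reset by any incoming message, not only by handler calls.

```elixir
defmodule ShutdownSketch do
  use GenServer

  # shutdown_after: ms of inactivity before the process stops itself.
  def start(shutdown_after), do: GenServer.start(__MODULE__, shutdown_after)

  def increment(pid, amount), do: GenServer.call(pid, {:increment, amount})

  @impl true
  def init(shutdown_after) do
    # The returned timeout arms the inactivity timer.
    {:ok, %{count: 0, shutdown_after: shutdown_after}, shutdown_after}
  end

  @impl true
  def handle_call({:increment, amount}, _from, state) do
    state = %{state | count: state.count + amount}
    # Returning the timeout again restarts the countdown after every call.
    {:reply, state.count, state, state.shutdown_after}
  end

  @impl true
  def handle_info(:timeout, state) do
    # Idle for shutdown_after ms: stop normally (state would already be persisted).
    {:stop, :normal, state}
  end
end
```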
8. Recovery
Because state is persisted after every mutation and alarms are stored in the database, recovery is automatic:
- Process crash: The next call starts a fresh process that loads state from the database. Alarms continue to fire because they are stored in the database, not in the process.
- Node failure (Horde): Horde detects the failure, and the object is restarted on another node on its next access. The polling scheduler also runs as a cluster singleton and migrates to a surviving node automatically.
- Application restart: All objects start on demand. Pending alarms are picked up once the scheduler starts polling.
- Crash during alarm handler: If the server crashes while executing an alarm handler, the alarm remains in the database with its claimed_at timestamp. After the claim_ttl expires (default: 60 seconds), the alarm becomes available for retry.
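Crash recovery falls out of "persist before reply, load in init". The minimal sketch below uses an Agent as a stand-in database. RecoverySketch and the Agent-backed store are hypothetical. The sketch kills the process and shows that a fresh process resumes from the persisted value.

```elixir
defmodule RecoverySketch do
  use GenServer

  # db is an Agent pid standing in for the database.
  def start(db, id), do: GenServer.start(__MODULE__, {db, id})

  def increment(pid, amount), do: GenServer.call(pid, {:increment, amount})

  @impl true
  def init({db, id}) do
    # Load persisted state, falling back to the default of 0.
    count = Agent.get(db, &Map.get(&1, id, 0))
    {:ok, %{db: db, id: id, count: count}}
  end

  @impl true
  def handle_call({:increment, amount}, _from, state) do
    count = state.count + amount
    # Persist before replying, so a crash after the reply loses nothing.
    :ok = Agent.update(state.db, &Map.put(&1, state.id, count))
    {:reply, count, %{state | count: count}}
  end
end
```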
At-Least-Once Semantics
The polling scheduler provides at-least-once delivery for alarms. An alarm may fire more than once if:
- The handler completes, but the process crashes before the alarm is deleted
- The claim_ttl expires while the handler is still running, and another poller retries the alarm

Design your handle_alarm/2 callbacks to be idempotent: running one several times must have the same effect as running it once.
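The difference between an unsafe and an idempotent alarm effect is easy to see in isolation. This guide does not show the exact return value of handle_alarm/2, so the hypothetical helpers below only map old state to new state. IdempotentAlarmSketch and the :expire fields are illustrative.

```elixir
defmodule IdempotentAlarmSketch do
  # Not idempotent: a duplicate delivery counts the expiry twice.
  def expire_unsafe(state) do
    %{state | expirations: state.expirations + 1}
  end

  # Idempotent: assigns final values, so a repeat delivery changes nothing.
  def expire(state) do
    %{state | status: :expired, session: nil}
  end
end
```

Assigning a target state rather than applying an increment makes a duplicate delivery harmless. For external side effects such as sending a notification, a stored marker that the work was already done serves the same purpose.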