Chip as part of a Supervision tree
A supervision tree is a strategy used across the wider Erlang ecosystem to keep long-running processes alive. When a process in any part of the tree terminates, its supervisor attempts to restart it along with any processes started after it, giving the whole system self-healing properties.
Let's assume we have the simplest possible “counter” actor:
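```gleam
import gleam/otp/actor

pub opaque type Message {
  Inc
  Oops
}

fn loop(message: Message, count: Int) {
  case message {
    // Increment the count and keep running
    Inc -> actor.continue(count + 1)
    // Simulate an unexpected crash
    Oops -> panic as "unexpected error"
  }
}
```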
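As written, the counter has no way to report its count. A minimal sketch of how you might observe it, assuming a hypothetical `Get` message (not part of the original example) and the same pre-1.0 `gleam_otp`/`gleam_erlang` APIs used throughout this guide (`actor.start(state, loop)` and `process.call(subject, make_request, timeout)`):

```gleam
import gleam/erlang/process
import gleam/otp/actor

pub opaque type Message {
  Inc
  // Hypothetical: added only so the count can be read back
  Get(reply_to: process.Subject(Int))
  Oops
}

fn loop(message: Message, count: Int) {
  case message {
    Inc -> actor.continue(count + 1)
    Get(reply_to) -> {
      // Reply with the current count without changing it
      process.send(reply_to, count)
      actor.continue(count)
    }
    Oops -> panic as "unexpected error"
  }
}

pub fn main() {
  let assert Ok(counter) = actor.start(0, loop)
  process.send(counter, Inc)
  process.send(counter, Inc)
  // Messages from one process are handled in send order,
  // so Get observes both increments
  process.call(counter, Get, 100)
}
```

`process.call` creates the reply subject, passes it to the `Get` constructor, and blocks for up to 100 ms waiting for the answer.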
Integrating with a supervisor is simple enough if we create a childspec for our counter:
```gleam
import gleam/erlang/process
import gleam/io
import gleam/otp/supervisor
import gleam/result.{try}

pub fn main() {
  let self = process.new_subject()

  let childspec = fn(_param) {
    // start the counter
    use counter <- try(actor.start(0, loop))
    // on success, send the counter back to the caller
    process.send(self, counter)
    Ok(counter)
  }

  // start all processes under a supervision tree
  let assert Ok(_supervisor) =
    supervisor.start(fn(children) {
      children
      |> supervisor.add(supervisor.worker(childspec))
    })

  do_work(self)
}
```
With the implementation above, we have ensured that our supervisor keeps this specific counter alive. In case of unexpected termination, the supervisor re-runs the childspec to restore the counter and any processes started after it.
For example, let's terminate the counter inside the `do_work` function:
```gleam
fn do_work(self) {
  // wait to receive the counter's subject, and operate on it
  let assert Ok(counter) = process.receive(self, 50)
  io.debug(counter)

  // crash the counter; the supervisor restarts it and the childspec
  // sends the new counter's subject back to us
  process.send(counter, Oops)
  let assert Ok(counter) = process.receive(self, 50)
  io.debug(counter)
}
```
The two counter subjects printed to the terminal are:
```
Subject(//erl(<0.90.0>), //erl(#Ref<0.3359404744.2770337801.190515>))
Subject(//erl(<0.91.0>), //erl(#Ref<0.3359404744.2770337801.190559>))
```
We can confirm that the process ids `0.90.0` and `0.91.0` are different: when the first counter was terminated, the supervisor restarted it and created `0.91.0` in its place.
All of this requires carrying the `self` reference around the program and knowing when to receive the new subject after a failure, which may happen somewhere completely out of scope. This is where a registry can help.
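To make the idea concrete, here is a toy registry built as a plain actor over a `Dict` from id tags to subjects. It is an illustration only, not chip's implementation; the names `RegistryMessage`, `Register`, `Find` and `start_toy_registry` are hypothetical, and it assumes the same pre-1.0 actor API as above plus `gleam/dict`. The key property is that a restarted process re-registers under the same tag, overwriting the stale subject:

```gleam
import gleam/dict
import gleam/erlang/process
import gleam/otp/actor

// Hypothetical toy registry messages, NOT chip's API
pub type RegistryMessage(msg) {
  Register(tag: Int, subject: process.Subject(msg))
  Find(tag: Int, reply_to: process.Subject(Result(process.Subject(msg), Nil)))
}

fn handle(
  message: RegistryMessage(msg),
  subjects: dict.Dict(Int, process.Subject(msg)),
) {
  case message {
    // A restarted process registers again under the same tag,
    // replacing the subject of the process that died
    Register(tag, subject) ->
      actor.continue(dict.insert(subjects, tag, subject))
    // Reply with the latest subject for the tag, or Error(Nil)
    Find(tag, reply_to) -> {
      process.send(reply_to, dict.get(subjects, tag))
      actor.continue(subjects)
    }
  }
}

pub fn start_toy_registry() {
  actor.start(dict.new(), handle)
}
```

Callers look a subject up with `process.call(registry, Find(1, _), 100)`. A production registry would also need to handle processes that die without re-registering; see chip's own documentation for how it approaches that.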
Let's build a `start` specification for our counter actor:
```gleam
import chip
import gleam/function

pub fn start(registry, tag) {
  let init = fn() { init(registry, tag) }
  actor.start_spec(actor.Spec(init: init, init_timeout: 10, loop: loop))
}

fn init(registry, id) {
  // Create a subject owned by the new actor process
  let self = process.new_subject()

  // Register the counter under an id tag on initialization
  chip.register(
    registry,
    self
      |> chip.new()
      |> chip.tag(id),
  )

  // Adding self to the selector lets the actor receive the messages
  // (Inc, Oops) that other processes send to the registered subject
  actor.Ready(
    0,
    process.new_selector()
      |> process.selecting(self, function.identity),
  )
}
```
Then integrate chip and our new `start` function into our supervisor:
```gleam
pub fn main() {
  let self = process.new_subject()

  let childspec_registry = fn(_param) {
    use registry <- try(chip.start())
    // on success, send the registry back to the caller
    process.send(self, registry)
    Ok(registry)
  }

  // Transform the initial child parameter into the registry and an id tag
  let updater_registry = fn(_param, registry) { #(registry, 1) }

  let childspec_counter = fn(param) {
    // We now receive the registry and the initial id
    let #(registry, id) = param
    start(registry, id)
  }

  // Subsequent child counters will increment their id tag
  let updater_counter = fn(param, _counter) {
    let #(registry, id) = param
    #(registry, id + 1)
  }

  // start all processes under a supervision tree
  let assert Ok(_supervisor) =
    supervisor.start(fn(children) {
      children
      |> supervisor.add(
        supervisor.worker(childspec_registry)
        |> supervisor.returning(updater_registry),
      )
      |> supervisor.add(
        supervisor.worker(childspec_counter)
        |> supervisor.returning(updater_counter),
      )
    })

  // wait to receive the registry's subject, and operate on it
  let assert Ok(registry) = process.receive(self, 50)
  do_work(registry)
}
```
It's quite a bit of extra code and specification, but keep in mind you can move most of it into helpers. Now we can look up the counter from the registry in our `do_work` function:
```gleam
fn do_work(registry) {
  // retrieve the counter's subject, and operate on it
  let assert Ok(counter) = chip.find(registry, 1)
  io.debug(counter)

  // crash the counter, then give the supervisor time to restart it
  // and the new counter time to register itself
  process.send(counter, Oops)
  process.sleep(50)
  let assert Ok(counter) = chip.find(registry, 1)
  io.debug(counter)
}
```
This prints the following subjects:
```
Subject(//erl(<0.91.0>), //erl(#Ref<0.211861637.3582984195.76640>))
Subject(//erl(<0.92.0>), //erl(#Ref<0.211861637.3582984195.76689>))
```
Granted, the registry doesn't solve all of our issues. We now need to pass a copy of the registry's subject around our app, and the registry may not have registered the new subject yet (we had to wait a few milliseconds for the counter to restart and register itself).
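Rather than sleeping for a fixed 50 milliseconds, one option is to poll until the registry returns a subject different from the one we just crashed. A minimal sketch, assuming `chip.find` returns a `Result` as in `do_work` above; `wait_for_new` is a hypothetical helper, not part of chip:

```gleam
import gleam/erlang/process

// Hypothetical helper: calls `find` until it returns something other
// than `stale`, sleeping `interval` ms between tries, at most `attempts` times
pub fn wait_for_new(
  find: fn() -> Result(a, e),
  stale: a,
  attempts: Int,
  interval: Int,
) -> Result(a, Nil) {
  case find() {
    // A different subject means the restarted process has re-registered
    Ok(found) if found != stale -> Ok(found)
    // Out of attempts: only the stale subject (or nothing) was found
    _ if attempts <= 1 -> Error(Nil)
    _ -> {
      process.sleep(interval)
      wait_for_new(find, stale, attempts - 1, interval)
    }
  }
}
```

In `do_work`, the sleep and second lookup could then become `let assert Ok(counter) = wait_for_new(fn() { chip.find(registry, 1) }, counter, 10, 10)`, which waits only as long as needed, up to roughly 100 ms.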
These issues are outside the scope of chip but can be solved with other techniques. For example, top-level processes may use an “app configuration” library that keeps track of singleton processes. If you'd like to repurpose chip for this, please check the wrapping up chip guideline for more.
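Most of the supervisor wiring in this guide can be moved into helpers. As one sketch, every childspec that sends its subject back to the caller has the same shape, which a hypothetical `reporting_to` helper (not part of chip or `gleam_otp`) could capture, assuming the childspec signature used above, `fn(arg) -> Result(Subject(msg), actor.StartError)`:

```gleam
import gleam/erlang/process
import gleam/otp/actor
import gleam/result

// Hypothetical helper: wraps a start function so that every successful
// (re)start also sends the new subject to `caller`
pub fn reporting_to(
  caller: process.Subject(process.Subject(msg)),
  start: fn(arg) -> Result(process.Subject(msg), actor.StartError),
) -> fn(arg) -> Result(process.Subject(msg), actor.StartError) {
  fn(arg) {
    use subject <- result.try(start(arg))
    process.send(caller, subject)
    Ok(subject)
  }
}
```

With it, the first example's childspec becomes `supervisor.worker(reporting_to(self, fn(_) { actor.start(0, loop) }))`, and the registry's becomes `reporting_to(self, fn(_) { chip.start() })`, assuming `chip.start()` returns the same kind of `Result` that the guide's `try` call implies.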