Monitor system startup and validate firmware
This module provides a default for handling devices that fail to complete initialization: they either revert to an earlier firmware or reboot to try again. Enough time is given so that a device doesn't get into an undebuggable boot loop, but also doesn't wait forever in a state that may be just as impossible to debug.
This is a generic default that is intended to be a reasonable starting point for most use cases.
However, you will eventually find that you can do better, and you are
encouraged to replace it when ready. For example, you may want to confirm
connectivity to a firmware update server before validating a new image just
in case a change broke networking. Please investigate using alarms (via
:alarm_handler or alarmist) for aggregating these checks.
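As a rough illustration of gating validation on extra checks, the sketch below runs a list of named check functions and only calls the validation function when every check passes. `MyApp.FirmwareChecks`, `run/2`, and the check names are hypothetical and not part of nerves_runtime. On a device, the `validate` argument would typically be `&Nerves.Runtime.validate_firmware/0`, and a connectivity check would be specific to your update server.

```elixir
defmodule MyApp.FirmwareChecks do
  @moduledoc false
  # Hypothetical helper for illustration only; not part of nerves_runtime.

  # `checks` is a list of {name, zero-arity fun} tuples. Each fun is expected
  # to return :ok or {:error, reason}. `validate` is only called when all pass.
  def run(checks, validate) when is_list(checks) and is_function(validate, 0) do
    failures =
      for {name, check} <- checks,
          {:error, reason} <- [safe_call(check)],
          do: {name, reason}

    case failures do
      [] ->
        validate.()
        :ok

      _ ->
        {:error, failures}
    end
  end

  # A check that raises or returns something unexpected counts as a failure.
  defp safe_call(check) do
    case check.() do
      :ok -> :ok
      {:error, _reason} = error -> error
      other -> {:error, {:unexpected_result, other}}
    end
  rescue
    exception -> {:error, exception}
  end
end
```

A call might look like `MyApp.FirmwareChecks.run([update_server: &MyApp.check_update_server/0], &Nerves.Runtime.validate_firmware/0)`, where `MyApp.check_update_server/0` is an assumed function you would write yourself.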
If your Nerves system requires that new firmware images are validated, you
will need this. In other words, if you have to run
Nerves.Runtime.validate_firmware/0 every time you upload new firmware, then
your Nerves system requires validation.
Setup
Add the following to your project's target.exs or config.exs:
config :nerves_runtime, startup_guard_enabled: true
To handle a case where Erlang starts fine, but somehow hangs before StartupGuard can
register itself with Erlang's heart feature, there's a handshake that needs to occur.
The handshake needs to be enabled in Nerves Heart (which integrates with
Erlang heart), though. To do this, add the following to your project's
rel/vm.args.eex:
## Require an initialization handshake within 10 minutes
-env HEART_INIT_TIMEOUT 600
Further discussion
Here's the high level summary of how this works:
- On init, OTP starts up all applications. When it starts up :nerves_runtime, StartupGuard gets run.
- StartupGuard registers a :heart callback. The callback is a time bomb that starts failing after 15 minutes.
- StartupGuard gets the list of OTP applications that should be started. Applications marked in the Mix release to only :load aren't counted.
- StartupGuard waits for all expected applications to start.
- Once everything starts, StartupGuard validates the firmware and removes the :heart callback.
- If anything went wrong, log the errors. Since the :heart callback is still registered, the system will be available for debugging, but it will eventually reboot.
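The steps above can be sketched roughly as follows. This is not the StartupGuard source: `MyApp.StartupSketch` and its function names are hypothetical, and keeping the deadline in `:persistent_term` is an assumption made to keep the callback cheap. `:heart.set_callback/2` and `:heart.clear_callback/0` belong to Erlang's `:heart` module and only work when the VM was started with heart enabled, so they are isolated in `register/1` and `unregister/0`.

```elixir
defmodule MyApp.StartupSketch do
  @moduledoc false
  # Hypothetical sketch of a "time bomb" :heart callback; not the real StartupGuard.

  @key {__MODULE__, :deadline}
  @default_timeout_ms 15 * 60 * 1000

  # Record a deadline `timeout_ms` from now using the monotonic clock.
  def arm(timeout_ms \\ @default_timeout_ms) when is_integer(timeout_ms) and timeout_ms >= 0 do
    :persistent_term.put(@key, System.monotonic_time(:millisecond) + timeout_ms)
  end

  # Forget the deadline so that check/0 always passes.
  def disarm do
    :persistent_term.erase(@key)
    :ok
  end

  # Zero-arity callback: :ok before the deadline, an error tuple afterwards.
  def check do
    case :persistent_term.get(@key, :disarmed) do
      :disarmed ->
        :ok

      deadline ->
        if System.monotonic_time(:millisecond) < deadline,
          do: :ok,
          else: {:error, :startup_timeout}
    end
  end

  # True once every application in `expected_apps` is reported as started.
  def all_started?(expected_apps) do
    started = for {app, _desc, _vsn} <- Application.started_applications(), do: app
    Enum.all?(expected_apps, &(&1 in started))
  end

  # Only call these on a VM that was started with heart enabled.
  def register(timeout_ms \\ @default_timeout_ms) do
    arm(timeout_ms)
    :heart.set_callback(__MODULE__, :check)
  end

  def unregister do
    :heart.clear_callback()
    disarm()
  end
end
```

A startup task built on this sketch would call `register/1`, poll `all_started?/1` with its list of expected applications, then validate the firmware and call `unregister/0` once everything is up.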
One nice alteration to this is to leave the :heart callback in place, but
have it check some kind of "system ok" flag. If you do this, keep in mind
that the callback is totally unforgiving to errors and function calls taking
too long. Making it too complicated can backfire and cause inadvertent
reboots. Rebooting too quickly on errors can impact your ability to debug
partial failures. If using this code as a template, try to keep your code in
a Task, or change it to a GenServer or anything else that can be
supervised. Decoupling the checks into alarms is another nice pattern.
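One way to keep such a callback small is to have it read a single flag and do nothing else. The sketch below is an assumption about how that could look, not code from nerves_runtime: `MyApp.SystemOk` is a hypothetical module that stores the flag in an `:atomics` array so that reads are fast and involve no message passing. Whatever aggregates your health checks (for example, an alarm handler) would call `set/1`.

```elixir
defmodule MyApp.SystemOk do
  @moduledoc false
  # Hypothetical "system ok" flag intended to be read from a :heart callback.

  @key {__MODULE__, :flag}

  # Create the flag and mark the system as ok. Call once during startup.
  def init do
    ref = :atomics.new(1, signed: false)
    :atomics.put(ref, 1, 1)
    :persistent_term.put(@key, ref)
  end

  # Update the flag from supervised code. Raises if init/0 has not been called.
  def set(ok?) when is_boolean(ok?) do
    :atomics.put(:persistent_term.get(@key), 1, if(ok?, do: 1, else: 0))
  end

  # Callback body: a couple of lookups, no process calls, no I/O.
  def check do
    case :persistent_term.get(@key, nil) do
      nil ->
        {:error, :system_not_ok}

      ref ->
        if :atomics.get(ref, 1) == 1, do: :ok, else: {:error, :system_not_ok}
    end
  end
end
```

On a VM started with heart, the callback could be installed with `:heart.set_callback(MyApp.SystemOk, :check)`. Keeping the slower checks in supervised processes and having them only flip the flag avoids doing real work inside the callback.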
Troubleshooting
- If getting the log message about exceeding the number of retries for getting firmware validation status, then Nerves.Runtime.firmware_validation_status/0 is returning :unknown. This is probably due to the Nerves system's fwup.conf not initializing <slot>.nerves_fw_validated to 0 (or 1 if always valid).
- If falling back without logs, try installing ramoops_logger to capture log messages that don't make it to disk.
Functions
Returns a specification to start this module under a supervisor.
arg is passed as the argument to Task.start_link/1 in the :start field
of the spec.
For more information, see the Supervisor module,
the Supervisor.child_spec/2 function and the Supervisor.child_spec/0 type.
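To show how such a specification is used, here is a small sketch that assumes the function described above is the `child_spec/1` that `use Task` generates. `MyApp.ExampleGuard` and its `:notify` option are hypothetical stand-ins, used so the example runs without nerves_runtime.

```elixir
defmodule MyApp.ExampleGuard do
  @moduledoc false
  # Hypothetical Task-based module; `use Task` defines child_spec/1 for it.
  use Task

  def start_link(arg) do
    Task.start_link(__MODULE__, :run, [arg])
  end

  # Stand-in for real startup work: tell the :notify process that we ran.
  def run(opts) do
    send(Keyword.fetch!(opts, :notify), {:guard_ran, self()})
  end
end

# The argument ends up in the :start field of the returned spec.
spec = MyApp.ExampleGuard.child_spec(notify: self())
{MyApp.ExampleGuard, :start_link, [[notify: _pid]]} = spec.start
```

In a supervision tree, the same module would normally be listed as `{MyApp.ExampleGuard, notify: self()}` in the children passed to `Supervisor.start_link/2`, which calls `child_spec/1` for you.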