Open Gleametry
Basic usage:
import opengleametry/span

{
  use ctx <- span.with("example span", [])
  todo
}
This results in an Open Telemetry span covering the duration of the code in the use block. Such a span can have multiple properties, which you can set on construction (an empty list here) or during code execution; see the documentation. One important property is ‘error’.
With extra infrastructure, these spans can then be stored and queried for monitoring, debugging, performance analysis and more.
It is quite common to have spans within spans: say, an actor that creates a span when receiving a message and then calls a DB layer, where the function creates its own span. These spans are automatically bound together; Open Telemetry calls this a trace.
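A minimal sketch of such nesting, using only the span.with call from the basic usage example. The function names handle_message and load_user are hypothetical, chosen purely for illustration:

```gleam
import opengleametry/span

// Hypothetical message handler: opens the outer span.
pub fn handle_message() {
  use _ctx <- span.with("handle message", [])
  // Called while the outer span is still open.
  load_user()
}

// Hypothetical DB-layer function: creates its own span, which is
// opened inside the caller's span and so belongs to the same trace.
fn load_user() {
  use _ctx <- span.with("db: load user", [])
  Nil
}
```

Neither function passes a context around explicitly; the inner span is created while the outer one is still active in the same process.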
Serious work
The example above produces a noop trace, since it depends on nothing more than opentelemetry_api.
See example/ for the simplest example app that will emit a trace;
it contains all of the instructions of this section.
Be sure to run it with the instructions of the next section.
Exporter and Collector
The ‘noop’ bit is serious. The intent here is that the telemetry declarations (called ‘instrumentation’) impact your application as little as possible. That means ‘noop’ when there is no need to store the telemetry, and it means offloading as much work as possible when there is a need to store it.
In the latter case, Open Telemetry needs an SDK to pick up the
telemetry and an exporter to throw it over the fence to a system that
stores it, a Collector. For this, your gleam application needs to
depend on opengleametry, and also on opentelemetry (the SDK) and
opentelemetry_exporter, or another exporter.
For the mentioned exporter, it is best to wait 1 second before launching
your application, because early spans are lost. You should also have your
app run for at least 5 seconds (demo effect), otherwise that exporter will
not even initialise completely. Use the gleam logging package with its
default logging.configure() for the “INFO” messages to show up.
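The timing advice above could look roughly like this in a demo main function. This is a sketch, not the contents of example/: it assumes the gleam_erlang and logging packages are dependencies, and do_work is a hypothetical stand-in for your own code.

```gleam
import gleam/erlang/process
import logging
import opengleametry/span

pub fn main() {
  // Default logger configuration, so that "INFO" messages show up.
  logging.configure()

  // Give the exporter a second to start; spans emitted earlier are lost.
  process.sleep(1000)

  do_work()

  // Keep the demo alive long enough (at least 5 seconds in total)
  // for the exporter to initialise completely.
  process.sleep(5000)
}

// Hypothetical unit of work wrapped in a span.
fn do_work() {
  use _ctx <- span.with("example span", [])
  Nil
}
```

A long-running application does not need the final sleep; it only matters for short demo programs that would otherwise exit right away.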
It is a good idea to set up inets and ssl as extra applications in your gleam.toml (this also gets rid of an error when booting),
like this:
[erlang]
extra_applications = ["inets", "ssl"]
Gleam will pick up the opentelemetry_exporter and opentelemetry
applications automatically.
Collecting
With all that set, the exporter still does not do anything: you need to point it to a collector and set another value or two when running your gleam application, e.g.:
For bash and zsh:
export OTEL_SERVICE_NAME="your_application"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_HEADERS="x-service-api-key=12345"
gleam run
For fish:
set -x OTEL_SERVICE_NAME "your_application"
set -x OTEL_EXPORTER_OTLP_ENDPOINT "http://localhost:4318"
set -x OTEL_EXPORTER_OTLP_HEADERS "x-service-api-key=12345"
gleam run
You also need to run a collector, e.g. the Jaeger all-in-one Docker image:
docker run --rm --name jaeger --network host \
cr.jaegertracing.io/jaegertracing/jaeger:2.10.0
Jaeger all-in-one serves a [UI](http://localhost:16686) for your browser.
Alternatively, Grafana:
docker run --rm --network host -ti grafana/otel-lgtm
Asynchronous work
When you pass a link to a process or an actor (for example in the simple way shown here), a span can be connected to its source.
// a link needs a current context; we create one here
use ctx <- span.with("Main", [])
let link = link.current()
let _pid = process.spawn(fn() {
  use ctx <- span.with_links("nested", [link], [])
  echo ctx
})
Crash Reports
By adding
opengleametry.remove_default_handler()
opengleametry.set_sasl_handler()
at the start of your program, you first stop the default crash reports (and more) from being shown and, second, redirect them to opengleametry, which creates a span named crash_report with some useful information from the report. A report includes the process dictionary of the crashed process, which in turn includes the Open Telemetry trace and span id if and only if the process crashed within a span.with block. That is, the crash report contains a link (a reference) to the span in which the crash happened.
This assists with debugging unexpected problems in production.
To Do
A lot
- Enable basic OTEL instrumentation: use ctx <- span.with("name", [])
- This package will not force libraries to depend on exporters (this means that applications that do not include sdk and exporter will have noop traces)
- Enable persistent links by exposing the opentelemetry Link type
- Provide values from opentelemetry semantic conventions (sidestep the erlang package, as wrapping atoms makes no sense)
- wrap ?add_event
- do for supervisor reports what we did for crash reports
Other
There are no tests, since we are only manipulating typed FFI data.
Check out Erlang's “trace” concept: https://www.erlang.org/doc/apps/kernel/trace