TelemetryMetricsPrometheus.Core (telemetry_metrics_prometheus_core v1.2.1)
Prometheus Reporter for Telemetry.Metrics definitions.
Provide a list of metric definitions to the child_spec/1 function. It's recommended to add this to your supervision tree.
def start(_type, _args) do
  # List all child processes to be supervised
  children = [
    {TelemetryMetricsPrometheus.Core, [
      metrics: [
        counter("http.request.count"),
        sum("http.request.payload_size", unit: :byte),
        sum("websocket.connection.count", reporter_options: [prometheus_type: :gauge]),
        last_value("vm.memory.total", unit: :byte)
      ]
    ]}
  ]

  opts = [strategy: :one_for_one, name: ExampleApp.Supervisor]
  Supervisor.start_link(children, opts)
end
Note that aggregations for distributions (histograms) occur only at scrape time. Each aggregation only has to process the events that have occurred since the last scrape, but it is still recommended to keep an eye on scrape durations if you are reporting a large number of distributions or have high tag cardinality.
Telemetry.Metrics to Prometheus Equivalents
Metric types:
- Counter - Counter
- Distribution - Histogram
- LastValue - Gauge
- Sum - Counter/Gauge
- Summary - Summary (Not supported)
Units
Prometheus recommends the usage of base units for compatibility (see Base Units). This is simple to do with :telemetry and Telemetry.Metrics, as all memory-related measurements in the BEAM are reported in bytes and Metrics provides automatic time unit conversions.
Note that the measurement unit should be used as part of the name reported to Prometheus in the case of histograms and gauges. As such, it is important to explicitly define the unit of measure for these types when the unit is time or memory related.
It is suggested to not mix units, e.g. seconds with milliseconds.
You must define your buckets in terms of the converted (end) unit, since measurements are converted at the time of handling the event, prior to bucketing.
Memory
Report memory as :byte.
Time
Report durations as :second. The BEAM and :telemetry events use :native time units. Converting to seconds is as simple as adding the conversion tuple for the unit: {:native, :second}.
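To make the {:native, :second} conversion concrete, the sketch below does the equivalent arithmetic by hand using only the Elixir standard library. The module and function names (NativeTime.to_seconds/1) are hypothetical and exist only for illustration; Telemetry.Metrics performs this conversion for you when the unit tuple is given.

```elixir
defmodule NativeTime do
  # Hypothetical helper, for illustration only (not part of any library).
  # Converts a duration in :native time units to fractional seconds.
  def to_seconds(native_duration) do
    # Number of native time units in one second on this runtime.
    units_per_second = System.convert_time_unit(1, :second, :native)
    native_duration / units_per_second
  end
end

# Measure something in native units, the way :telemetry events usually do.
start = System.monotonic_time()
Process.sleep(10)
duration = System.monotonic_time() - start

IO.inspect(NativeTime.to_seconds(duration), label: "seconds")
```

Because the reported value is a fractional number of seconds, bucket boundaries such as 0.01 or 0.5 should be read as seconds as well.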
Naming
Telemetry.Metrics definition names do not translate easily to Prometheus naming conventions. By default, parts of the name provided when creating your definition determine which event to listen to and which event measurement to use.
For example, "http.request.duration" results in listening for [:http, :request] events and using :duration from the event measurements. Prometheus would recommend a name such as http_request_duration_seconds.
It is therefore recommended that the name in your definition reflect the name you wish to see reported, e.g. http.request.duration.seconds or [:http, :request, :duration, :seconds], and that you use the :event_name override and :measurement options in your definition.
Example:
Metrics.distribution(
  "http.request.duration.seconds",
  event_name: [:http, :request, :complete],
  measurement: :duration,
  unit: {:native, :second},
  reporter_options: [
    buckets: [0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1]
  ]
)
The exporter sanitizes names to Prometheus' requirements (Metric Naming) and joins the event name parts with an underscore.
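As a rough illustration of that transformation, the sketch below joins the name parts with underscores and replaces characters outside the set Prometheus allows in metric names ([a-zA-Z0-9_:]). The module name NameSketch is hypothetical, and this is an approximation for illustration, not the exporter's actual code.

```elixir
defmodule NameSketch do
  # Hypothetical helper, not part of the library.
  # Builds a Prometheus-style metric name from a list of name parts.
  def prometheus_name(parts) when is_list(parts) do
    parts
    |> Enum.map(&to_string/1)
    |> Enum.join("_")
    # Replace anything outside the allowed character set with an underscore.
    |> String.replace(~r/[^a-zA-Z0-9_:]/, "_")
  end
end

IO.puts(NameSketch.prometheus_name([:http, :request, :duration, :seconds]))
# prints "http_request_duration_seconds"
```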
Labels
Labels in Prometheus are referred to as :tags in Telemetry.Metrics. See the docs for more information on tag usage.
Important: Each tag + value results in a separate time series. For distributions, this is further complicated as a time series is created for each bucket, plus one (+Inf) for measurements exceeding the limit of the last bucket.
It is recommended, but not required, to abide by Prometheus' best practices regarding labels (see Label Best Practices).
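To get a feel for how quickly series counts grow, the following back-of-the-envelope sketch multiplies the number of tag value combinations by the number of bucket series. The module name and the example numbers are hypothetical. The estimate covers only the bucket series described above and ignores any additional series emitted alongside them, such as a histogram's sum and count.

```elixir
defmodule CardinalitySketch do
  # Hypothetical helper: estimates bucket series for one distribution.
  # tag_value_counts lists the number of distinct values for each tag.
  def bucket_series(buckets, tag_value_counts) do
    # Every combination of tag values is a separate label set.
    label_sets = Enum.reduce(tag_value_counts, 1, &(&1 * &2))
    # One series per bucket boundary, plus one for +Inf.
    label_sets * (length(buckets) + 1)
  end
end

buckets = [0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1]

# Made-up example: 5 HTTP methods x 20 routes x 10 status codes.
IO.inspect(CardinalitySketch.bucket_series(buckets, [5, 20, 10]))
# => 8000
```

Dropping a single high-cardinality tag, or trimming a few buckets, reduces the total multiplicatively, which is why tags on distributions deserve extra care.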
Reporter Options
In some cases you may want to configure aspects of a metric definition's underlying Prometheus metric, such as the bucket boundaries for a distribution. This can be achieved by passing :reporter_options to the metric definition.
The supported :reporter_options are:
:buckets - a list of bucket boundaries for distributions. This reporter option is mandatory for distributions.
Example:
Metrics.distribution(
  "http.request.duration.seconds",
  event_name: [:http, :request, :complete],
  measurement: :duration,
  unit: {:native, :second},
  reporter_options: [
    buckets: [0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1]
  ]
)
:prometheus_type - the Prometheus type that should be used for a sum, either :counter or :gauge. Prometheus counters are monotonic, so a gauge should be used when a sum can increase and decrease. Defaults to :counter.
Example:
Metrics.sum("websocket.connection.count", reporter_options: [prometheus_type: :gauge])
Missing or Invalid Measurements and Tags
If a measurement value is missing or non-numeric, the error is logged at the debug level and the event is not recorded. Events with missing tags are also logged and skipped.
Functions
@spec child_spec(prometheus_options()) :: Supervisor.child_spec()
Reporter's child spec.
This function allows you to start the reporter under a supervisor like this:
children = [
  {TelemetryMetricsPrometheus.Core, options}
]
See start_link/1 for options.
Returns a metrics scrape in Prometheus exposition format for the given reporter name, which defaults to :prometheus_metrics.
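A minimal script sketch of taking a scrape follows. It assumes the function described above is TelemetryMetricsPrometheus.Core.scrape/1 (its name does not appear in this text), that Mix.install/1 can fetch the package over the network, and that a counter named "http.request.count" is a reasonable stand-in for your own metrics. Treat it as an illustration rather than a recommended setup.

```elixir
# Assumes network access so Mix.install/1 can fetch the dependency.
Mix.install([
  {:telemetry_metrics_prometheus_core, "~> 1.2"}
])

# Start a reporter under the default name :prometheus_metrics.
# start_async: false makes sure handlers are attached before we emit events.
{:ok, _pid} =
  TelemetryMetricsPrometheus.Core.start_link(
    metrics: [Telemetry.Metrics.counter("http.request.count")],
    start_async: false
  )

# Emit one event; the counter above listens for [:http, :request].
:telemetry.execute([:http, :request], %{count: 1}, %{})

# Fetch the current metrics as Prometheus exposition text.
scrape = TelemetryMetricsPrometheus.Core.scrape()
IO.puts(scrape)
```

In an application, the same call would typically sit behind whatever HTTP endpoint your Prometheus server scrapes.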
@spec start_link(prometheus_options()) :: GenServer.on_start()
Start the TelemetryMetricsPrometheus.Core.Supervisor
Available options:
:name - name of the reporter instance. Defaults to :prometheus_metrics.
:metrics - a list of metrics to track.
:start_async - used to configure how the TelemetryMetricsPrometheus.Core.Supervisor GenServer starts. When set to false, all of the metrics defined in :metrics are initialized in the GenServer's init/1 callback, effectively blocking the supervision tree from proceeding until all Telemetry event handlers are initialized. This is useful if subsequent supervision tree children emit events on start up and you don't want to miss those events due to an async start. Defaults to true.
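As a configuration sketch of the :start_async option, the fragment below places the reporter ahead of a child that emits events while starting. MyApp.Worker and the metric name are hypothetical placeholders; only the option names come from the list above.

```elixir
# Supervision tree fragment (not a complete application).
# MyApp.Worker is a hypothetical child that emits telemetry events on start.
children = [
  # Block until all event handlers are attached, so events emitted by
  # children started afterwards are not missed.
  {TelemetryMetricsPrometheus.Core,
   metrics: [Telemetry.Metrics.counter("my_app.worker.start.count")],
   start_async: false},
  MyApp.Worker
]
```

Because children start in list order, the reporter must come before any child whose startup events you want recorded.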