telemetry_metrics v0.1.0 Telemetry.Metrics

Data model and specifications for aggregating Telemetry events.

Metrics are responsible for aggregating Telemetry events with the same name in order to gain any useful knowledge about the events.

Please note that the Telemetry.Metrics package itself doesn't provide any functionality for aggregating metrics. This library only defines the data model and the specifications for aggregations, which should be implemented by reporters - libraries exporting metrics to external systems. You can read more about reporters in the "Reporters" section below.

Data model

Telemetry.Metrics imposes a multi-dimensional data model - a single metric may generate multiple aggregations, each aggregation being bound to a unique set of tag values. Tags are pairs of key-values derived from event metadata (in the simplest case, tags are a subset of the metadata). Based on the tag values, the value of the event will be used to generate one of the aggregations.

For example, imagine that you want to count how many requests are being made against your web application. On each request, you might emit an event with the name of the controller and action handling that request, e.g.:

:telemetry.execute([:http, :request], 1, %{controller: "user_controller", action: "index"})
:telemetry.execute([:http, :request], 1, %{controller: "user_controller", action: "index"})
:telemetry.execute([:http, :request], 1, %{controller: "user_controller", action: "create"})
:telemetry.execute([:http, :request], 1, %{controller: "product_controller", action: "get"})

With a multi-dimensional data model, the result of aggregating those events by the :controller and :action tags would look like this:

controller         | action | count
user_controller    | index  | 2
user_controller    | create | 1
product_controller | get    | 1

You can see that the request count is broken down by each unique set of tag values.
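As a rough illustration of this breakdown, the sketch below counts the four example events per unique set of tag values. The module TagAggregationSketch and the function count_by_tags/2 are hypothetical names used only for this example; they are not part of Telemetry.Metrics, and a real reporter may aggregate differently.

```elixir
defmodule TagAggregationSketch do
  # Hypothetical helper, not part of Telemetry.Metrics: counts events per
  # unique combination of tag values.
  def count_by_tags(events, tags) do
    Enum.reduce(events, %{}, fn {_value, metadata}, acc ->
      # Map.take/2 keeps only the metadata keys listed in `tags`.
      tag_values = Map.take(metadata, tags)
      Map.update(acc, tag_values, 1, &(&1 + 1))
    end)
  end
end

# The same events as in the example above, as {value, metadata} tuples.
events = [
  {1, %{controller: "user_controller", action: "index"}},
  {1, %{controller: "user_controller", action: "index"}},
  {1, %{controller: "user_controller", action: "create"}},
  {1, %{controller: "product_controller", action: "get"}}
]

counts = TagAggregationSketch.count_by_tags(events, [:controller, :action])
IO.inspect(counts)
```

Each key of the resulting map is one set of tag values, which corresponds to one row of the table.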

Metric types

Metric type specifies how the event values are aggregated. Telemetry.Metrics aims to define a set of metric types covering the most common instrumentation patterns.

Metric types below are heavily inspired by OpenCensus.

Counter

Value of the counter metric is the number of emitted events, regardless of event value. It's monotonically increasing and its value is never reset.

Sum

Value of the sum metric is the sum of event values.

LastValue

Value of this metric is the value of the most recent event.
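The three metric types described so far can be thought of as simple functions over the list of event values seen so far. The sketch below is only an illustration under that assumption; MetricTypeSketch is a hypothetical module, and reporters are free to compute these values incrementally or in an external system.

```elixir
defmodule MetricTypeSketch do
  # Hypothetical illustration only; not part of Telemetry.Metrics.

  # Counter: the number of events; event values are ignored.
  def counter(values), do: length(values)

  # Sum: the total of all event values.
  def sum(values), do: Enum.sum(values)

  # LastValue: the value of the most recent event (nil if there was none).
  def last_value(values), do: List.last(values)
end

# Event values in the order they were emitted.
values = [5, 3, 10]

IO.inspect(MetricTypeSketch.counter(values))
IO.inspect(MetricTypeSketch.sum(values))
IO.inspect(MetricTypeSketch.last_value(values))
```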

Distribution

The value of this metric is a histogram distribution of event values, i.e. how many events were emitted with values falling into defined buckets. Histogram values can be used to compute approximations of useful statistics about the data, like quantiles, minimum or maximum.

For example, given boundaries [0, 100, 200], the distribution metric produces four values:

  • number of event values less than or equal to 0
  • number of event values greater than 0 and less than or equal to 100
  • number of event values greater than 100 and less than or equal to 200
  • number of event values greater than 200
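The bucketing rule listed above can be sketched as follows. DistributionSketch and bucket_counts/2 are hypothetical names, and the sketch assumes the boundaries are sorted in ascending order; it is one possible reading of the rule, not the implementation used by any particular reporter.

```elixir
defmodule DistributionSketch do
  # Hypothetical helper: returns one count per bucket for sorted `boundaries`.
  # A value lands in the first bucket whose boundary is greater than or equal
  # to it; values above every boundary land in the extra last bucket.
  def bucket_counts(values, boundaries) do
    initial = List.duplicate(0, length(boundaries) + 1)

    Enum.reduce(values, initial, fn value, counts ->
      index = Enum.find_index(boundaries, &(value <= &1)) || length(boundaries)
      List.update_at(counts, index, &(&1 + 1))
    end)
  end
end

result = DistributionSketch.bucket_counts([-5, 0, 50, 100, 150, 250, 300], [0, 100, 200])
IO.inspect(result)
# → [2, 2, 1, 2]
```

Here -5 and 0 fall into the first bucket, 50 and 100 into the second, 150 into the third, and 250 and 300 into the last one.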

Metric specifications

A metric specification is a data structure describing the metric - its name, type, name of the events aggregated by the metric, etc. The structure of a metric specification is relevant only to authors of reporters.

Metric specifications are created using one of four functions: counter/2, sum/2, last_value/2 and distribution/2. Each of those functions returns a specification of a metric of the corresponding type. The first argument to all these functions is the name of the events which are aggregated by the metric. The event name might be represented as in Telemetry, i.e. as a list of atoms ([:http, :request]), or as a string of words joined by dots ("http.request").

Note: do not use data from external sources as metric or event names! Since they are converted to atoms, your application becomes vulnerable to atom leakage and might run out of memory.
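To make the two representations and the warning above more concrete, here is a minimal sketch of how a dotted string could be turned into a list of atoms. EventNameSketch.normalize/1 is a hypothetical helper, not the library's own code. Atoms are not garbage collected by the BEAM, so every new string passed through a conversion like this permanently grows the atom table.

```elixir
defmodule EventNameSketch do
  # Hypothetical helper, not part of Telemetry.Metrics.

  # "http.request" -> [:http, :request]
  def normalize(name) when is_binary(name) do
    name
    |> String.split(".")
    # String.to_atom/1 creates atoms that are never garbage collected,
    # which is why names must not come from external sources.
    |> Enum.map(&String.to_atom/1)
  end

  # A list of atoms is already in the Telemetry representation.
  def normalize(name) when is_list(name), do: name
end

IO.inspect(EventNameSketch.normalize("http.request"))
IO.inspect(EventNameSketch.normalize([:http, :request]))
```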

The second argument is a list of options. Below is the description of the options common to all metric types:

  • :name - the metric name. Metric name can be represented in the same way as event name. Defaults to event name given as first argument;
  • :tags - tags by which aggregations will be broken down. Defaults to an empty list;
  • :metadata - determines what part of event metadata is used as the source of tag values. Default value is the value of :tags or empty list if :tags are not set. There are three possible values of this option:

    • :all - all event metadata is used;
    • list of terms, e.g. [:table, :kind] - only these keys from the event metadata are used;
    • one argument function taking the event metadata and returning the metadata which should be used to generate tag values
  • :description - human-readable description of the metric. Might be used by reporters for documentation purposes. Defaults to nil;
  • :unit - an atom describing the unit of event values. Might be used by reporters for documentation purposes. Defaults to :unit.
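The interplay between :tags and :metadata can be sketched as below. MetadataOptionSketch, metadata_fun/1 and tag_values/2 are hypothetical names introduced for illustration; the sketch assumes that the metadata is filtered first and that tag values are then read from the filtered result, which is one plausible reading of the option descriptions above.

```elixir
defmodule MetadataOptionSketch do
  # Hypothetical helper: turns each accepted form of the :metadata option
  # into a one-argument function over the event metadata.
  def metadata_fun(:all), do: fn metadata -> metadata end
  def metadata_fun(keys) when is_list(keys), do: fn metadata -> Map.take(metadata, keys) end
  def metadata_fun(fun) when is_function(fun, 1), do: fun

  # Filters the event metadata, then keeps only the tag keys.
  def tag_values(event_metadata, options) do
    tags = Keyword.get(options, :tags, [])
    # :metadata defaults to the value of :tags (or to an empty list).
    fun = metadata_fun(Keyword.get(options, :metadata, tags))

    event_metadata |> fun.() |> Map.take(tags)
  end
end

metadata = %{controller: "user_controller", action: "index", user: %{role: :admin}}

by_default = MetadataOptionSketch.tag_values(metadata, tags: [:controller])
with_all = MetadataOptionSketch.tag_values(metadata, metadata: :all, tags: [:action])

with_fun =
  MetadataOptionSketch.tag_values(metadata,
    metadata: fn m -> %{role: m.user.role} end,
    tags: [:role]
  )

IO.inspect({by_default, with_all, with_fun})
```

The function form is useful when the tag value is nested inside the metadata, as with the :role example here.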

Reporters

Reporters take metric definitions as an input, subscribe to relevant events and update the metrics when the events are emitted. Updating the metric might involve publishing the metrics periodically, or on demand, to external systems. Telemetry.Metrics defines only specification for metric types, and reporters should provide actual implementation for these aggregations.

Rationale

The design proposed by Telemetry.Metrics might look controversial - unlike most of the libraries available on the BEAM, it doesn't aggregate metrics itself; it merely defines what users should expect when using the reporters. There are two arguments for this solution.

First, if Telemetry.Metrics aggregated metrics itself, the way those aggregations work would be imposed on the system the metrics are published to. For example, counters in StatsD are reset on every flush and can be decremented, whereas counters in Prometheus are monotonically increasing. Telemetry.Metrics doesn't focus on those details - instead, it describes what the end user, the operator, expects to see when using a metric of a particular type. This implies that in most cases aggregated metrics won't be visible inside the BEAM, but in exchange aggregations can be implemented in the way that makes the most sense for a particular system. One could also implement an in-VM "reporter" which would aggregate the metrics and expose them inside the BEAM.

Second, when there is a need to swap reporters, and both reporters follow the metric types specification, the end result of aggregation is the same, regardless of the backend system in use.

Requirements for reporters

Reporters should accept metric specifications and subscribe to relevant events. When those events are emitted, the metric should be updated (either in memory or by contacting an external system) in such a way that the user is able to view metric values as described in the "Metric types" section.

If the reporter does not support the metric given to it, it should log a warning.
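The requirements above can be sketched with a deliberately simplified in-memory reporter. Everything here is an assumption made for illustration: the module InMemoryReporterSketch, the plain-map specification shape with :type, :name and :event_name keys, and the decision to ignore tags. A real reporter would receive the specifications built by this library and attach handlers through Telemetry instead of being called directly. Logger.warning/1 assumes Elixir 1.11 or later.

```elixir
defmodule InMemoryReporterSketch do
  require Logger

  # Metric types this hypothetical reporter knows how to aggregate.
  @supported [:counter, :sum, :last_value]

  # Accepts metric specifications and logs a warning for unsupported ones.
  def init(specs) do
    {supported, unsupported} = Enum.split_with(specs, &(&1.type in @supported))

    Enum.each(unsupported, fn spec ->
      Logger.warning("Metric #{inspect(spec.name)} has unsupported type #{inspect(spec.type)}")
    end)

    %{specs: supported, values: %{}}
  end

  # Updates every metric bound to `event_name` with the new event value.
  def handle_event(state, event_name, value) do
    state.specs
    |> Enum.filter(&(&1.event_name == event_name))
    |> Enum.reduce(state, fn spec, acc ->
      current = Map.get(acc.values, spec.name)
      %{acc | values: Map.put(acc.values, spec.name, aggregate(spec.type, current, value))}
    end)
  end

  defp aggregate(:counter, current, _value), do: (current || 0) + 1
  defp aggregate(:sum, current, value), do: (current || 0) + value
  defp aggregate(:last_value, _current, value), do: value
end

# Hypothetical specifications as plain maps; the last one is unsupported.
specs = [
  %{type: :counter, name: [:http, :request, :count], event_name: [:http, :request]},
  %{type: :sum, name: [:http, :request, :total], event_name: [:http, :request]},
  %{type: :distribution, name: [:http, :request, :histogram], event_name: [:http, :request]}
]

state =
  specs
  |> InMemoryReporterSketch.init()
  |> InMemoryReporterSketch.handle_event([:http, :request], 120)
  |> InMemoryReporterSketch.handle_event([:http, :request], 80)

IO.inspect(state.values)
```

After the two events, the counter holds 2 and the sum holds 200, while the distribution metric was skipped with a warning.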

Reporters should also document how Telemetry.Metrics metric types, names, and tags are translated to metric types and identifiers in the system they publish metrics to.

Summary

Functions

counter/2 - Returns a specification of counter metric

distribution/2 - Returns a specification of distribution metric

last_value/2 - Returns a specification of last value metric

sum/2 - Returns a specification of sum metric

Types

counter_options() :: [metric_option()]

description() :: nil | String.t()

distribution_options() :: [
  metric_option() | {:buckets, Telemetry.Metrics.Distribution.buckets()}
]

last_value_options() :: [metric_option()]

metric_option() ::
  {:name, metric_name()}
  | {:metadata, metadata()}
  | {:tags, tags()}
  | {:description, description()}
  | {:unit, unit()}

metric_type() :: :counter | :sum | :last_value | :distribution

normalized_metric_name() :: [atom(), ...]

sum_options() :: [metric_option()]

t() :: %module(){
  name: normalized_metric_name(),
  event_name: :telemetry.event_name(),
  metadata: (:telemetry.event_metadata() -> :telemetry.event_metadata()),
  tags: tags(),
  description: description(),
  unit: unit()
}

Common fields for metric specifications

Reporters should assume that these fields are present in all metric specifications.

Functions

Returns a specification of counter metric.

See "Metric specifications" section in the top-level documentation of this module for more information.

Example

counter(
  "http.request",
  metadata: [:controller, :action],
  tags: [:controller, :action]
)

Returns a specification of distribution metric.

For a distribution metric, it is required that you include a :buckets field in the options keyword list.

See "Metric specifications" section in the top-level documentation of this module for more information.

Example

distribution(
  "http.request",
  buckets: [100, 200, 300],
  tags: [:controller, :action]
)

last_value(event_name, options \\ [])

Returns a specification of last value metric.

See "Metric specifications" section in the top-level documentation of this module for more information.

Example

last_value(
  "vm.memory.total",
  description: "Total amount of memory allocated by the Erlang VM", unit: :byte
)

Returns a specification of sum metric.

See "Metric specifications" section in the top-level documentation of this module for more information.

Example

sum("user.session_count.change", name: "user.session_count", metadata: [:role], tags: [:role])