View Source Telemetry
This guide is not focused on explaining :telemetry
itself, how it works, how
to configure it, and so on. Instead, we will show you how to instrument and
report on Cache Telemetry events in your application when using Nebulex.
For more information about :telemetry
, you can check the
documentation, or the Phoenix Telemetry
guide is also recommended.
Telemetry Events
Many Elixir libraries (including Nebulex) are already using the :telemetry
package as a way to give users more insight into the behavior of their
applications, by emitting events at key moments in the application lifecycle.
Nebulex built-in events
The following events are emitted by all Nebulex caches:
[:nebulex, :cache, :init]
- it is dispatched whenever a cache starts.- Measurement:
%{system_time: System.monotonic_time()}
- Metadata:
%{cache: atom, opts: [term]}
- Measurement:
Adapter-specific events
Nebulex currently suggests the adapters to dispatch the following Telemetry events:
[:my_app, :cache, :command, :start]
- Dispatched by the underlying cache adapter before an adapter callback is executed.- Measurement:
%{system_time: System.monotonic_time()}
- Metadata:
%{adapter_meta: map, function_name: atom, args: [term]}
- Measurement:
[:my_app, :cache, :command, :stop]
- Dispatched by the underlying cache adapter after an adapter callback has been successfully executed.Measurement:
%{duration: native_time}
Metadata:
%{ adapter_meta: map, function_name: atom, args: [term], result: term }
[:my_app, :cache, :command, :exception]
- Dispatched by the underlying cache adapter when an exception is raised while the adapter callback is executed.Measurement:
%{duration: native_time}
Metadata:
%{ adapter_meta: map, function_name: atom, args: [term], kind: :error | :exit | :throw, reason: term, stacktrace: term }
Nebulex Metrics
Assuming you have defined the cache MyApp.Cache
with the default
:telemetry_prefix
, using Telemetry.Metrics
, you can define a
counter metric, which counts how many cache commands were completed:
Telemetry.Metrics.counter("my_app.cache.command.stop.duration")
or you could use a distribution metric to see how many commands were completed in particular time buckets:
Telemetry.Metrics.distribution(
"my_app.cache.command.stop.duration",
buckets: [100, 200, 300]
)
So far, these metrics are only helpful to be able to see just the total number of executed cache commands. What if you wanted to see the average command duration, minimum and maximum, or percentiles, but aggregated per command or callback name? In this case, one could define a summary metric like so:
Telemetry.Metrics.summary(
"my_app.cache.command.stop.duration",
unit: {:native, :millisecond},
tags: [:function_name]
)
As it is described above in the "Adapter-specific events" section, the event
includes the invoked callback name into the metadata as :function_name
, then
we can add it to the metric's tags.
Extracting tag values from adapter's metadata
Let's add another metric for the command event, this time to group by cache, adapter, and function_name (adapter's callback):
Telemetry.Metrics.summary(
"my_app.cache.command.stop.duration",
unit: {:native, :millisecond},
tags: [:cache, :adapter, :function_name],
tag_values:
&Map.merge(&1, %{
cache: &1.adapter_meta.cache,
adapter: &1.adapter_meta.cache.__adapter__()
})
)
We've introduced the :tag_values
option here, because we need to perform a
transformation on the event metadata in order to get to the values we need.
Cache Stats
Each adapter is responsible for providing stats by implementing
Nebulex.Adapter.Stats
behaviour. However, Nebulex provides a simple default
implementation using Erlang counters, which is used by
the built-in local adapter. The local adapter uses
Nebulex.Telemetry.StatsHandler
to aggregate the stats and keep
them updated, therefore, it requires the Telemetry events are dispatched
by the adapter, otherwise, it won't work properly.
Furthermore, when the :stats
option is enabled, we can use Telemetry for
emitting the current stat values.
First of all, make sure you have added :telemetry
, :telemetry_metrics
, and
:telemetry_poller
packages as dependencies to your mix.exs
file.
Let's define out cache module:
defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
end
Make sure the :stats
option is set to true
, for example in the
configuration:
config :my_app, MyApp.Cache,
stats: true,
backend: :shards,
gc_interval: :timer.hours(12),
max_size: 1_000_000,
gc_cleanup_min_timeout: :timer.seconds(10),
gc_cleanup_max_timeout: :timer.minutes(10)
Create your Telemetry supervisor at lib/my_app/telemetry.ex
:
# lib/my_app/telemetry.ex
defmodule MyApp.Telemetry do
use Supervisor
import Telemetry.Metrics
def start_link(arg) do
Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
end
def init(_arg) do
children = [
# Configure `:telemetry_poller` for reporting the cache stats
{:telemetry_poller, measurements: periodic_measurements(), period: 10_000},
# For example, we use the console reporter, but you can change it.
# See `:telemetry_metrics` for for information.
{Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
]
Supervisor.init(children, strategy: :one_for_one)
end
defp metrics do
[
# Nebulex Stats Metrics
last_value("my_app.cache.stats.hits", tags: [:cache]),
last_value("my_app.cache.stats.misses", tags: [:cache]),
last_value("my_app.cache.stats.writes", tags: [:cache]),
last_value("my_app.cache.stats.updates", tags: [:cache]),
last_value("my_app.cache.stats.evictions", tags: [:cache]),
last_value("my_app.cache.stats.expirations", tags: [:cache])
]
end
defp periodic_measurements do
[
{MyApp.Cache, :dispatch_stats, []}
]
end
end
Then add to your main application's supervision tree
(usually in lib/my_app/application.ex
):
children = [
MyApp.Cache,
MyApp.Telemetry,
...
]
Now start an IEx session and call the server:
iex(1)> MyApp.Cache.get 1
nil
iex(2)> MyApp.Cache.put 1, 1, ttl: 10
:ok
iex(3)> MyApp.Cache.get 1
1
iex(4)> MyApp.Cache.put 2, 2
:ok
iex(5)> MyApp.Cache.delete 2
:ok
iex(6)> Process.sleep(20)
:ok
iex(7)> MyApp.Cache.get 1
nil
iex(2)> MyApp.Cache.replace 1, 11
true
and you should see something like the following output:
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: my_app.cache.stats
All measurements: %{evictions: 2, expirations: 1, hits: 1, misses: 2, updates: 1, writes: 2}
All metadata: %{cache: MyApp.Cache}
Metric measurement: :hits (last_value)
With value: 1
Tag values: %{cache: MyApp.Cache}
Metric measurement: :misses (last_value)
With value: 2
Tag values: %{cache: MyApp.Cache}
Metric measurement: :writes (last_value)
With value: 2
Tag values: %{cache: MyApp.Cache}
Metric measurement: :updates (last_value)
With value: 1
Tag values: %{cache: MyApp.Cache}
Metric measurement: :evictions (last_value)
With value: 2
Tag values: %{cache: MyApp.Cache}
Metric measurement: :expirations (last_value)
With value: 1
Tag values: %{cache: MyApp.Cache}
Custom metrics
In the same way, for instance, you can add another periodic measurement for reporting the cache size:
defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
def dispatch_cache_size do
:telemetry.execute(
[:my_app, :cache, :size],
%{value: size()},
%{cache: __MODULE__, node: node()}
)
end
end
Now let's add a new periodic measurement to invoke dispatch_cache_size()
through :telemetry_poller
:
defp periodic_measurements do
[
{MyApp.Cache, :dispatch_stats, [[metadata: %{node: node()}]]},
{MyApp.Cache, :dispatch_cache_size, []}
]
end
Notice the node name was added to the metadata so we can use it in the metric's tags.
Metrics:
defp metrics do
[
# Nebulex Stats Metrics
last_value("my_app.cache.stats.hits", tags: [:cache, :node]),
last_value("my_app.cache.stats.misses", tags: [:cache, :node]),
last_value("my_app.cache.stats.writes", tags: [:cache, :node]),
last_value("my_app.cache.stats.updates", tags: [:cache, :node]),
last_value("my_app.cache.stats.evictions", tags: [:cache, :node]),
last_value("my_app.cache.stats.expirations", tags: [:cache, :node]),
# Nebulex custom Metrics
last_value("my_app.cache.size.value", tags: [:cache, :node])
]
end
If you start an IEx session like previously, you should see the new metric too:
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: my_app.cache.stats
All measurements: %{evictions: 0, expirations: 0, hits: 0, misses: 0, updates: 0, writes: 0}
All metadata: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :hits (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :misses (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :writes (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :updates (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :evictions (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :expirations (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: my_app.cache.size
All measurements: %{value: 0}
All metadata: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :value (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Multi-level Cache Stats
When using the multi-level adapter the returned stats measurements may look a bit different, because the multi-level adapter works as a wrapper for the configured cache levels, so the returned measurements are grouped and consolidated by level. For example, suppose you have the cache:
defmodule MyApp.Multilevel do
use Nebulex.Cache,
otp_app: :nebulex,
adapter: Nebulex.Adapters.Multilevel
defmodule L1 do
use Nebulex.Cache,
otp_app: :nebulex,
adapter: Nebulex.Adapters.Local
end
defmodule L2 do
use Nebulex.Cache,
otp_app: :nebulex,
adapter: Nebulex.Adapters.Partitioned
end
end
Then, when you run MyApp.Multilevel.stats()
you get something like:
%Nebulex.Stats{
measurements: %{
l1: %{evictions: 0, expirations: 0, hits: 0, misses: 0, updates: 0, writes: 0},
l2: %{evictions: 0, expirations: 0, hits: 0, misses: 0, updates: 0, writes: 0}
},
metadata: %{
l1: %{
cache: NMyApp.Multilevel.L1,
started_at: ~U[2021-01-10 13:06:04.075084Z]
},
l2: %{
cache: MyApp.Multilevel.L2.Primary,
started_at: ~U[2021-01-10 13:06:04.089888Z]
},
cache: MyApp.Multilevel,
started_at: ~U[2021-01-10 13:06:04.066750Z]
}
}
As you can see, the measurements map has the stats grouped by level, every key is an atom specifying the level and the value is a map with the stats and/or measurements for that level. Based on that, you could define the Telemetry metrics in this way:
[
# L1 metrics
last_value("my_app.cache.stats.l1.hits",
event_name: "my_app.cache.stats",
measurement: &get_in(&1, [:l1, :hits]),
tags: [:cache]
),
last_value("my_app.cache.stats.l1.misses",
event_name: "my_app.cache.stats",
measurement: &get_in(&1, [:l1, :misses]),
tags: [:cache]
),
last_value("my_app.cache.stats.l1.writes",
event_name: "my_app.cache.stats",
measurement: &get_in(&1, [:l1, :writes]),
tags: [:cache]
),
last_value("my_app.cache.stats.l1.updates",
event_name: "my_app.cache.stats",
measurement: &get_in(&1, [:l1, :updates]),
tags: [:cache]
),
last_value("my_app.cache.stats.l1.evictions",
event_name: "my_app.cache.stats",
measurement: &get_in(&1, [:l1, :evictions]),
tags: [:cache]
),
last_value("my_app.cache.stats.l1.expirations",
event_name: "my_app.cache.stats",
measurement: &get_in(&1, [:l1, :expirations]),
tags: [:cache]
),
# L2 metrics
last_value("my_app.cache.stats.l2.hits",
event_name: "my_app.cache.stats",
measurement: &get_in(&1, [:l2, :hits]),
tags: [:cache]
),
...
]
If what you need is the aggregated stats for all levels, you can always define
your own function to emit the Telemetry events. You just need to call
MyApp.Multilevel.stats()
and then you add the logic to process the results
in the way you need. On the other hand, if you are using Datadog through the
StatsD reporter, you could do the aggregation directly in Datadog.