Telemetry
This guide aims to show you how to instrument and report on :telemetry
events
in your application for cache statistics.
This guide is not focused on explaining :telemetry
, how it works, configure
it, and so on. Instead, it is focused on how we can use :telemetry
for
reporting cache stats. For more information about :telemetry
you can check
the docs, or the Phoenix Telemetry
guide is also recommended.
Instrumenting Nebulex Caches
Each adapter is responsible for providing stats by implementing
Nebulex.Adapter.Stats
behaviour. Nevertheless, Nebulex provides a default
implementation using Erlang counters, which is supported by
the built-in adapters (with all callbacks overridable).
See Nebulex.Adapter.Stats
for more information.
Furthermore, when the :stats
option is enabled, we can use Telemetry for
emitting the current stat values.
First of all, let's configure the dependencies adding :telemetry_metrics
and :telemetry_poller
packages:
def deps do
[
{:nebulex, "2.0"},
{:shards, "~> 1.0"},
{:decorator, "~> 1.3"},
{:telemetry, "~> 0.4"},
{:telemetry_metrics, "~> 0.6"},
{:telemetry_poller, "~> 0.5"}
]
end
Then define the cache and add the configuration:
defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
end
Could be configured:
config :my_app, MyApp.Cache,
stats: true,
backend: :shards,
gc_interval: 86_400_000,
max_size: 1_000_000,
gc_cleanup_min_timeout: 10_000,
gc_cleanup_max_timeout: 900_000
Create your Telemetry supervisor at lib/my_app/telemetry.ex
:
# lib/my_app/telemetry.ex
defmodule MyApp.Telemetry do
use Supervisor
import Telemetry.Metrics
def start_link(arg) do
Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
end
def init(_arg) do
children = [
# Configure `:telemetry_poller` for reporting the cache stats
{:telemetry_poller, measurements: periodic_measurements(), period: 10_000},
# For example, we use the console reporter, but you can change it.
# See `:telemetry_metrics` for for information.
{Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
]
Supervisor.init(children, strategy: :one_for_one)
end
defp metrics do
[
# Nebulex Stats Metrics
last_value("nebulex.cache.stats.hits", tags: [:cache]),
last_value("nebulex.cache.stats.misses", tags: [:cache]),
last_value("nebulex.cache.stats.writes", tags: [:cache]),
last_value("nebulex.cache.stats.updates", tags: [:cache]),
last_value("nebulex.cache.stats.evictions", tags: [:cache]),
last_value("nebulex.cache.stats.expirations", tags: [:cache]),
# VM Metrics
summary("vm.memory.total", unit: {:byte, :kilobyte}),
summary("vm.total_run_queue_lengths.total"),
summary("vm.total_run_queue_lengths.cpu"),
summary("vm.total_run_queue_lengths.io")
]
end
defp periodic_measurements do
[
{MyApp.Cache, :dispatch_stats, []}
]
end
end
Make sure to replace
MyApp
by your actual application name.
Then add to your main application's supervision tree
(usually in lib/my_app/application.ex
):
children = [
MyApp.Cache,
MyApp.Telemetry,
...
]
Now start an IEx session and call the server:
iex(1)> MyApp.Cache.get 1
nil
iex(2)> MyApp.Cache.put 1, 1, ttl: 10
:ok
iex(3)> MyApp.Cache.get 1
1
iex(4)> MyApp.Cache.put 2, 2
:ok
iex(5)> MyApp.Cache.delete 2
:ok
iex(6)> Process.sleep(20)
:ok
iex(7)> MyApp.Cache.get 1
nil
iex(2)> MyApp.Cache.replace 1, 11
true
and you should see something like the following output:
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: nebulex.cache.stats
All measurements: %{evictions: 2, expirations: 1, hits: 1, misses: 2, updates: 1, writes: 2}
All metadata: %{cache: MyApp.Cache}
Metric measurement: :hits (last_value)
With value: 1
Tag values: %{cache: MyApp.Cache}
Metric measurement: :misses (last_value)
With value: 2
Tag values: %{cache: MyApp.Cache}
Metric measurement: :writes (last_value)
With value: 2
Tag values: %{cache: MyApp.Cache}
Metric measurement: :updates (last_value)
With value: 1
Tag values: %{cache: MyApp.Cache}
Metric measurement: :evictions (last_value)
With value: 2
Tag values: %{cache: MyApp.Cache}
Metric measurement: :expirations (last_value)
With value: 1
Tag values: %{cache: MyApp.Cache}
Adding other custom metrics
In the same way, you can, for instance, add another periodic measurement for reporting the cache size.
Using our previous cache:
defmodule MyApp.Cache do
use Nebulex.Cache,
otp_app: :my_app,
adapter: Nebulex.Adapters.Local
def dispatch_cache_size do
:telemetry.execute(
[:nebulex, :cache, :size],
%{value: size()},
%{cache: __MODULE__, node: node()}
)
end
end
And add it to the list of periodic measurements in our previously defined supervisor:
defp periodic_measurements do
[
{MyApp.Cache, :dispatch_stats, [[metadata: %{node: node()}]]},
{MyApp.Cache, :dispatch_cache_size, []}
]
end
NOTE: Notice we added in the metadata the node name so that we can use it in the tags of the metrics.
Metrics:
defp metrics do
[
# Nebulex Stats Metrics
last_value("nebulex.cache.stats.hits", tags: [:cache, :node]),
last_value("nebulex.cache.stats.misses", tags: [:cache, :node]),
last_value("nebulex.cache.stats.writes", tags: [:cache, :node]),
last_value("nebulex.cache.stats.updates", tags: [:cache, :node]),
last_value("nebulex.cache.stats.evictions", tags: [:cache, :node]),
last_value("nebulex.cache.stats.expirations", tags: [:cache, :node]),
# Nebulex custom Metrics
last_value("nebulex.cache.size.value", tags: [:cache, :node]),
# VM Metrics
summary("vm.memory.total", unit: {:byte, :kilobyte}),
summary("vm.total_run_queue_lengths.total"),
summary("vm.total_run_queue_lengths.cpu"),
summary("vm.total_run_queue_lengths.io")
]
end
If you start an IEx session like previously, you should see the new metric too:
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: nebulex.cache.stats
All measurements: %{evictions: 0, expirations: 0, hits: 0, misses: 0, updates: 0, writes: 0}
All metadata: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :hits (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :misses (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :writes (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :updates (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :evictions (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :expirations (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: nebulex.cache.size
All measurements: %{value: 0}
All metadata: %{cache: MyApp.Cache, node: :nonode@nohost}
Metric measurement: :value (last_value)
With value: 0
Tag values: %{cache: MyApp.Cache, node: :nonode@nohost}
Instrumenting Multi-level caches
When using the multi-level adapter the returned stats measurements may look a bit different, because the multi-level adapter works as a wrapper for the configured cache levels, so the returned measurements are grouped and consolidated by level. For example, suppose you have the cache:
defmodule MyApp.Multilevel do
use Nebulex.Cache,
otp_app: :nebulex,
adapter: Nebulex.Adapters.Multilevel
defmodule L1 do
use Nebulex.Cache,
otp_app: :nebulex,
adapter: Nebulex.Adapters.Local
end
defmodule L2 do
use Nebulex.Cache,
otp_app: :nebulex,
adapter: Nebulex.Adapters.Partitioned
end
end
Then, when you run MyApp.Multilevel.stats()
you get something like:
%Nebulex.Stats{
measurements: %{
l1: %{evictions: 0, expirations: 0, hits: 0, misses: 0, updates: 0, writes: 0},
l2: %{evictions: 0, expirations: 0, hits: 0, misses: 0, updates: 0, writes: 0},
l3: %{evictions: 0, expirations: 0, hits: 0, misses: 0, updates: 0, writes: 0}
},
metadata: %{
l1: %{
cache: NMyApp.Multilevel.L1,
started_at: ~U[2021-01-10 13:06:04.075084Z]
},
l2: %{
cache: MyApp.Multilevel.L2.Primary,
started_at: ~U[2021-01-10 13:06:04.089888Z]
},
cache: MyApp.Multilevel,
started_at: ~U[2021-01-10 13:06:04.066750Z]
}
}
As you can see, the measurements map has the stats grouped by level, every key is an atom specifying the level and the value is a map with the stats and/or measurements for that level. Based on that, you could define the Telemetry metrics in this way:
[
# L1 metrics
last_value("nebulex.cache.stats.l1.hits",
event_name: "nebulex.cache.stats",
measurement: &get_in(&1, [:l1, :hits]),
tags: [:cache]
),
last_value("nebulex.cache.stats.l1.misses",
event_name: "nebulex.cache.stats",
measurement: &get_in(&1, [:l1, :misses]),
tags: [:cache]
),
last_value("nebulex.cache.stats.l1.writes",
event_name: "nebulex.cache.stats",
measurement: &get_in(&1, [:l1, :writes]),
tags: [:cache]
),
last_value("nebulex.cache.stats.l1.updates",
event_name: "nebulex.cache.stats",
measurement: &get_in(&1, [:l1, :updates]),
tags: [:cache]
),
last_value("nebulex.cache.stats.l1.evictions",
event_name: "nebulex.cache.stats",
measurement: &get_in(&1, [:l1, :evictions]),
tags: [:cache]
),
last_value("nebulex.cache.stats.l1.expirations",
event_name: "nebulex.cache.stats",
measurement: &get_in(&1, [:l1, :expirations]),
tags: [:cache]
),
# L2 metrics
last_value("nebulex.cache.stats.l2.hits",
event_name: "nebulex.cache.stats",
measurement: &get_in(&1, [:l2, :hits]),
tags: [:cache]
),
...
]
If what you need is the aggregated stats for all levels, you can always define
your own function to emit the Telemetry events. You just need to call
MyApp.Multilevel.stats()
and then you add the logic to process the results
in the way you need. On the other hand, if you are using Datadog through the
StatsD reporter, you could do the aggregation directly in Datadog.