Recon
Recon, as a module, provides access to the high-level functionality contained in the ReconEx application.
It has functions in five main categories:
- State information
  - Process information is everything that has to do with the general state of the node. Functions such as info/1 and info/3 are wrappers that provide more details than :erlang.process_info/1, while doing so in a production-safe manner. Their equivalents for :erlang.process_info/2 are info/2 and info/4, respectively.
  - proc_count/2 and proc_window/3 are to be used when you require information about processes in a larger sense: the biggest consumers of a given process attribute (say memory or reductions), either absolutely or over a sliding time window, respectively.
  - bin_leak/1 can be used to check whether your Erlang node is leaking refc binaries. See the function itself for more details.
  - Functions to access node statistics, in a manner somewhat similar to what vmstats provides as a library. There are 3 of them: node_stats_print/2, which displays them, node_stats_list/2, which returns them in a list, and node_stats/4, which provides a fold-like interface for stats gathering. For CPU usage specifically, see scheduler_usage/1.
- OTP tools
  - This category provides tools to interact with pieces of OTP more easily. At this point, the only functionality included is get_state/1, a shorthand for get_state/2. get_state/2 wraps :sys.get_state/2 on OTP R16B01 and later, and provides equivalent functionality on older versions of Erlang.
- Code Handling
  - Recon also contains functions whose sole purpose is to interact with source and compiled code. remote_load/1 and remote_load/2 let you take a local module and load it remotely (in a diskless manner) on another Erlang node you’re connected to. source/1 lets you print the source of a loaded module, in case it’s not available in the currently running node.
- Ports and Sockets
  - To make it simpler to debug some network-related issues, Recon contains functions to deal with Erlang ports (raw, file handles, or inet). The functions tcp/0, udp/0, sctp/0, and files/0 list all the Erlang ports of a given type. port_types/0 returns the count of each individual port type.
  - Port state information can be useful to figure out why certain parts of the system misbehave. Functions such as port_info/1 and port_info/2 are wrappers that provide similar or more detailed information than :erlang.port_info/1 and :erlang.port_info/2. For inet ports, they also include statistics and options for each socket.
  - Finally, the functions inet_count/2 and inet_window/3 bring the absolute and sliding-window functionality of proc_count/2 and proc_window/3 to the inet ports and connections currently on the node.
- RPC
  - These are wrappers to make RPC work simpler with clusters of Erlang nodes. Default RPC mechanisms (from the :rpc module) make it somewhat painful to call shell-defined funs over node boundaries. The functions rpc/1, rpc/2, and rpc/3 do it with a simpler interface.
  - Additionally, you may be running diagnostic code on remote nodes and want to know which node evaluated what result. In that case, named_rpc/1, named_rpc/2, and named_rpc/3 wrap each result in a tuple that tells you which node it came from, making it easier to identify bad nodes.
Summary
Functions
bin_leak/1
Refc binaries can leak when barely-busy processes route them around and do little else, or when extremely busy processes reach a stable amount of allocated memory and do the vast majority of their work with refc binaries. When this happens, it may take a very long time before references get deallocated and refc binaries get garbage collected, leading to out-of-memory crashes. This function fetches the number of refc binary references in each process of the node, garbage collects each process, and compares the resulting number of references in each of them. The function then returns the n processes that freed the biggest amount of binaries, potentially highlighting leaks
files/0
Returns a list of all file handles open on the node
get_state/1
Shorthand for get_state(pid_term, 5000)
get_state/2
Fetch the internal state of an OTP process. Calls :sys.get_state/2 directly in OTP R16B01+, and fetches it dynamically on older versions of OTP
inet_count/2
Fetches a given attribute from all inet ports (TCP, UDP, SCTP) and returns the biggest num consumers
inet_window/3
Fetches a given attribute from all inet ports (TCP, UDP, SCTP) and returns the biggest entries, over a sliding time window
info/1
Similar to :erlang.process_info/1, but excludes fields such as the mailbox, which tend to grow and can be unsafe to fetch in production systems. Also includes a few more fields than are usually given (monitors, monitored_by, etc.), and groups the fields into a more readable format based on the type of information they contain
info/2
Similar to :erlang.process_info/2, but allows sorting fields by safe categories and pre-selections, avoiding items such as the mailbox, which may tend to grow and be unsafe to fetch in production systems
info/3
Equivalent to info(<a.b.c>), where a, b, and c are the integer parts of a pid
info/4
Equivalent to info(<a.b.c>, key), where a, b, and c are the integer parts of a pid
named_rpc/1
Shorthand for named_rpc([node()|nodes()], fun)
named_rpc/2
Shorthand for named_rpc(nodes, fun, :infinity)
named_rpc/3
Runs an arbitrary fun (of arity 0) over one or more nodes, and returns the name of the node that computed a given result along with it, in a tuple
node_stats/4
Gathers statistics n times, waiting interval milliseconds between each run, and accumulates results using a folding function fold_fun. The function will gather statistics in two forms: Absolutes and Increments
node_stats_list/2
Shorthand for node_stats(n, interval, fn(x, acc) -> [x | acc] end, []) with the results reversed to be in the right temporal order
node_stats_print/2
Shorthand for node_stats(n, interval, fn(x, _) -> IO.inspect(x, pretty: true) end, :ok)
port_info/1
Similar to :erlang.port_info/1, but allows more flexible port usage: usual ports, ports that were registered locally (an atom), ports represented as strings ("#Port<0.2013>"), or through an index lookup (2013, for the same result as "#Port<0.2013>")
port_info/2
Similar to :erlang.port_info/2, but allows more flexible port usage: usual ports, ports that were registered locally (an atom), ports represented as strings ("#Port<0.2013>"), or through an index lookup (2013, for the same result as "#Port<0.2013>")
port_types/0
Shows a list of all different ports on the node with their respective types
proc_count/2
Fetches a given attribute from all processes (except the caller) and returns the biggest num consumers
proc_window/3
Fetches a given attribute from all processes (except the caller) and returns the biggest entries, over a sliding time window
remote_load/1
Equivalent to remote_load(nodes(), mod)
remote_load/2
Loads one or more modules remotely, in a diskless manner. This lets you share code loaded locally with a remote node that doesn’t have it
rpc/1
Shorthand for rpc([node()|nodes()], fun)
rpc/2
Shorthand for rpc(nodes, fun, :infinity)
rpc/3
Runs an arbitrary fun (of arity 0) over one or more nodes
scheduler_usage/1
Because Erlang CPU usage as reported from top isn’t the most reliable value (due to schedulers doing idle spinning to avoid going to sleep and impacting latency), a metric exists that is based on scheduler wall time
sctp/0
Returns a list of all SCTP ports (the data type) open on the node
source/1
Obtain the source code of a module compiled with debug_info. The returned list unfortunately cannot format types and typed records the way they look in the original module. Instead, they appear in an intermediary form used in the AST. They will still be placed in the right module attributes, however
tcp/0
Returns a list of all TCP ports (the data type) open on the node
udp/0
Returns a list of all UDP ports (the data type) open on the node
Types
inet_attr_name ::
:recv_cnt |
:recv_oct |
:send_cnt |
:send_oct |
:cnt |
:oct
inet_attrs :: {port, attr :: term, [{atom, term}]}
info_location_key :: :initial_call | :current_stacktrace
info_memory_key ::
:memory |
:message_queue_len |
:heap_size |
:total_heap_size |
:garbage_collection
info_meta_key ::
:registered_name |
:dictionary |
:group_leader |
:status
info_signals_key ::
:links |
:monitors |
:monitored_by |
:trap_exit
info_type ::
:meta |
:signals |
:location |
:memory_used |
:work
info_work_key :: :reductions
interval_ms :: pos_integer
nodes :: node | [node, ...]
pid_term ::
pid |
atom |
char_list |
{:global, term} |
{:via, module, term} |
{non_neg_integer, non_neg_integer, non_neg_integer}
port_info_io_key :: :input | :output
port_info_key ::
port_info_meta_key |
port_info_signals_key |
port_info_io_key |
port_info_memory_key |
port_info_specific_key
port_info_memory_key :: :memory | :queue_size
port_info_meta_key ::
:registered_name |
:id |
:name |
:os_pid
port_info_signals_key :: :connected | :links | :monitors
port_info_specific_key :: atom
port_info_type ::
:meta |
:signals |
:io |
:memory_used |
:specific
port_term ::
port |
char_list |
atom |
pos_integer
proc_attrs :: {pid, attr :: term, [name :: atom | {:current_function, mfa} | {:initial_call, mfa}, ...]}
rpc_result :: {[success :: term], [fail :: term]}
stats :: {[absolutes :: {atom, term}], [increments :: {atom, term}]}
time_ms :: pos_integer
timeout_ms :: non_neg_integer | :infinity
Functions
Specs
bin_leak(pos_integer) :: [proc_attrs]
Refc binaries can leak when barely-busy processes route them around and do little else, or when extremely busy processes reach a stable amount of allocated memory and do the vast majority of their work with refc binaries. When this happens, it may take a very long time before references get deallocated and refc binaries get garbage collected, leading to out-of-memory crashes. This function fetches the number of refc binary references in each process of the node, garbage collects each process, and compares the resulting number of references in each of them. The function then returns the n processes that freed the biggest amount of binaries, potentially highlighting leaks.
See the Erlang/OTP Efficiency Guide for more details on refc binaries.
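The comparison described above can be sketched in plain Elixir with Process.info/2 and :erlang.garbage_collect/1. This is a rough illustration under our own assumptions, not ReconEx's implementation. The module name BinLeakSketch is hypothetical, and it returns plain {pid, freed} pairs rather than the proc_attrs shape from the spec.

```elixir
defmodule BinLeakSketch do
  # Hypothetical module: a rough sketch of the refc-binary leak check,
  # not ReconEx's actual implementation.

  # Number of refc binary references a process holds, or nil if it exited.
  defp bin_refs(pid) do
    case Process.info(pid, :binary) do
      {:binary, bins} -> length(bins)
      nil -> nil
    end
  end

  # Returns up to n {pid, freed} pairs, sorted by how many refc binary
  # references each process dropped after a forced garbage collection.
  def top(n) do
    before =
      for pid <- Process.list(),
          pid != self(),
          count when is_integer(count) <- [bin_refs(pid)],
          do: {pid, count}

    # Force a full garbage collection of every sampled process.
    Enum.each(before, fn {pid, _count} -> :erlang.garbage_collect(pid) end)

    # Processes that exited in the meantime are skipped.
    freed =
      for {pid, count} <- before,
          after_gc when is_integer(after_gc) <- [bin_refs(pid)],
          do: {pid, count - after_gc}

    freed
    |> Enum.sort_by(fn {_pid, freed_refs} -> freed_refs end, :desc)
    |> Enum.take(n)
  end
end
```

Processes with a large positive difference held many references that only a full collection released, which is the leak pattern described above.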
Specs
get_state(pid_term, timeout_ms) :: term
Fetch the internal state of an OTP process. Calls :sys.get_state/2 directly in OTP R16B01+, and fetches it dynamically on older versions of OTP.
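On a modern OTP release, the underlying call is :sys.get_state/2, which takes the process and a timeout in milliseconds. The sketch below uses an Agent only because it is a convenient OTP process whose internal state is simply the value it holds. It shows the raw OTP call, not ReconEx itself.

```elixir
# Start a throwaway Agent; its internal state is the value it holds.
{:ok, agent} = Agent.start_link(fn -> %{hits: 0} end)
:ok = Agent.update(agent, fn state -> %{state | hits: state.hits + 1} end)

# The OTP call that get_state/2 is documented to wrap on R16B01+,
# here with the same 5000 ms timeout that get_state/1 uses.
state = :sys.get_state(agent, 5000)
IO.inspect(state)  # prints %{hits: 1}
```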
Specs
inet_count(inet_attr_name, non_neg_integer) :: [inet_attrs]
Fetches a given attribute from all inet ports (TCP, UDP, SCTP) and returns the biggest num consumers.
The values to be used can be the number of octets (bytes) sent, received, or both (:send_oct, :recv_oct, :oct, respectively), or the number of packets sent, received, or both (:send_cnt, :recv_cnt, :cnt, respectively). Individual absolute values for each metric will be returned in the 3rd position of the resulting tuple.
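As a rough sketch of the idea, the per-socket counters can be read with :inet.getstat/2 and sorted. The module InetCountSketch is hypothetical and makes two simplifying assumptions. First, it only looks at TCP sockets, identified by the "tcp_inet" port name used by the default inet backend. Second, it only accepts the four counters that :inet.getstat/2 reports directly. The combined :oct and :cnt values would be sums of the send and receive counters and are left out.

```elixir
defmodule InetCountSketch do
  # Hypothetical sketch covering TCP ports only; ReconEx also covers UDP
  # and SCTP and supports the combined :oct / :cnt attributes.

  # TCP sockets on the default inet backend are ports named 'tcp_inet'.
  def tcp_ports do
    Enum.filter(Port.list(), fn port ->
      :erlang.port_info(port, :name) == {:name, ~c"tcp_inet"}
    end)
  end

  # attr must be one of :recv_oct, :send_oct, :recv_cnt, :send_cnt.
  # Returns the num biggest {port, value} pairs.
  def inet_count(attr, num) do
    tcp_ports()
    |> Enum.flat_map(fn port ->
      case :inet.getstat(port, [attr]) do
        {:ok, [{^attr, value}]} -> [{port, value}]
        # The socket may have closed between listing and reading.
        {:error, _reason} -> []
      end
    end)
    |> Enum.sort_by(fn {_port, value} -> value end, :desc)
    |> Enum.take(num)
  end
end
```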
Specs
inet_window(inet_attr_name, non_neg_integer, time_ms) :: [inet_attrs]
Fetches a given attribute from all inet ports (TCP, UDP, SCTP) and returns the biggest entries, over a sliding time window.
Warning: this function depends on data gathered at two snapshots, and then builds a dictionary with entries to differentiate them. This can take a heavy toll on memory when you have tens of thousands of ports open.
The values to be used can be the number of octets (bytes) sent, received, or both (:send_oct, :recv_oct, :oct, respectively), or the number of packets sent, received, or both (:send_cnt, :recv_cnt, :cnt, respectively). Individual absolute values for each metric will be returned in the 3rd position of the resulting tuple.
info/1 is similar to :erlang.process_info/1, but excludes fields such as the mailbox, which tend to grow and can be unsafe to fetch in production systems. It also includes a few more fields than are usually given (monitors, monitored_by, etc.), and groups the fields into a more readable format based on the type of information they contain.
Moreover, it will fetch and read information on local processes that were registered locally (an atom), globally ({:global, name}), or through another registry supported in the {:via, module, name} syntax (must have a module.whereis_name/1 function). Pids can also be passed in as a string ("#PID<0.39.0>", "<0.39.0>") or a triple ({0, 39, 0}) and will be converted accordingly.
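The accepted pid_term shapes can be resolved to a pid with a handful of function clauses. PidTermSketch is a hypothetical helper written for illustration, not ReconEx's code. It assumes Elixir strings rather than charlists, and it lets an unknown name resolve to nil (local) or :undefined (global, via) instead of raising.

```elixir
defmodule PidTermSketch do
  # Hypothetical helper: resolves the pid_term shapes described above.
  def to_pid(pid) when is_pid(pid), do: pid
  # Locally registered name.
  def to_pid(name) when is_atom(name), do: Process.whereis(name)
  # Globally registered name.
  def to_pid({:global, name}), do: :global.whereis_name(name)
  # Any registry that implements whereis_name/1.
  def to_pid({:via, module, name}), do: module.whereis_name(name)
  # Triple form, e.g. {0, 39, 0}.
  def to_pid({a, b, c}) when is_integer(a) and is_integer(b) and is_integer(c),
    do: :c.pid(a, b, c)
  # String forms "#PID<0.39.0>" and "<0.39.0>".
  def to_pid("#PID" <> rest), do: to_pid(rest)
  def to_pid("<" <> _ = string), do: :erlang.list_to_pid(String.to_charlist(string))
end
```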
Specs
info/2 is similar to :erlang.process_info/2, but allows sorting fields by safe categories and pre-selections, avoiding items such as the mailbox, which may tend to grow and be unsafe to fetch in production systems.
Moreover, it will fetch and read information on local processes that were registered locally (an atom), globally ({:global, name}), or through another registry supported in the {:via, module, name} syntax (must have a module.whereis_name/1 function). Pids can also be passed in as a string ("#PID<0.39.0>", "<0.39.0>") or a triple ({0, 39, 0}) and will be converted accordingly.
Although the type signature doesn’t show it in the generated documentation, individual arguments accepted by :erlang.process_info/2, or lists of them, are also accepted, and are returned as that function would return them. A fake attribute, :binary_memory, is also available to return the amount of memory used by refc binaries for a process.
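A category lookup like this can be sketched by mapping each info_type to the keys listed in the Types section and passing only those keys to Process.info/2, so the mailbox is never requested. InfoCategorySketch and its binary_memory/1 are hypothetical. In particular, binary_memory/1 is our own approximation of the :binary_memory attribute: it sums the sizes of the refc binaries the process currently references.

```elixir
defmodule InfoCategorySketch do
  # Hypothetical sketch: category -> keys, taken from the info_*_key types.
  @keys %{
    meta: [:registered_name, :dictionary, :group_leader, :status],
    signals: [:links, :monitors, :monitored_by, :trap_exit],
    location: [:initial_call, :current_stacktrace],
    memory_used: [:memory, :message_queue_len, :heap_size, :total_heap_size, :garbage_collection],
    work: [:reductions]
  }

  # Returns {category, keyword_list}, or {category, nil} if the process exited.
  def info(pid, category) do
    keys = Map.fetch!(@keys, category)
    {category, Process.info(pid, keys)}
  end

  # Approximation of :binary_memory: each :binary entry is {id, size, refcount}.
  def binary_memory(pid) do
    case Process.info(pid, :binary) do
      {:binary, bins} -> Enum.reduce(bins, 0, fn {_id, size, _refs}, acc -> acc + size end)
      nil -> nil
    end
  end
end
```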
Specs
Equivalent to info(<a.b.c>), where a, b, and c are the integer parts of a pid.
Specs
info(non_neg_integer, non_neg_integer, non_neg_integer, key :: info_type | [atom] | atom) :: term
Equivalent to info(<a.b.c>, key), where a, b, and c are the integer parts of a pid.
Specs
named_rpc((() -> term)) :: rpc_result
Shorthand for named_rpc([node()|nodes()], fun)
Specs
named_rpc(nodes, (() -> term)) :: rpc_result
Shorthand for named_rpc(nodes, fun, :infinity)
Specs
named_rpc(nodes, (() -> term), timeout_ms) :: rpc_result
Runs an arbitrary fun (of arity 0) over one or more nodes, and returns the name of the node that computed a given result along with it, in a tuple.
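Both named_rpc/3 and the plain rpc/3 described further down can be approximated on top of :rpc.multicall/5 by shipping the zero-arity fun and applying it remotely with :erlang.apply/2. RpcSketch is a hypothetical module, not ReconEx's implementation. One caveat applies to this sketch: a fun defined in a compiled module only runs on a remote node if that exact module version is loaded there too, which is the kind of friction the RPC wrappers exist to smooth over.

```elixir
defmodule RpcSketch do
  # Hypothetical sketch; results follow the rpc_result shape
  # {[success], [failed_nodes]} returned by :rpc.multicall/5.
  def rpc(nodes, fun, timeout \\ :infinity) do
    :rpc.multicall(List.wrap(nodes), :erlang, :apply, [fun, []], timeout)
  end

  # Same as rpc/3, but node() is evaluated on the remote side, so each
  # result is tagged with the node that computed it.
  def named_rpc(nodes, fun, timeout \\ :infinity) do
    rpc(nodes, fn -> {node(), fun.()} end, timeout)
  end
end
```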
Specs
node_stats(non_neg_integer, interval_ms, fold_fun :: (stats, acc :: term -> term), acc0 :: term) :: acc1 :: term
Gathers statistics n times, waiting interval milliseconds between each run, and accumulates results using a folding function fold_fun.
The function will gather statistics in two forms: Absolutes and Increments.
Absolutes are values that keep changing with time, and are useful to know about as a datapoint: process count, size of the run queue, error_logger queue length, and the memory of the node (total, processes, atoms, binaries, and ets tables).
Increments are values that are mostly useful when compared to a previous one to have an idea what they’re doing, because otherwise they’d never stop increasing: bytes in and out of the node, number of garbage collector runs, words of memory that were garbage collected, and the global reductions count for the node.
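The fold over absolutes and increments can be sketched with :erlang.statistics/1, :erlang.system_info/1 and :erlang.memory/1. NodeStatsSketch is hypothetical and reports only a small subset of the values listed above, under key names of our own choosing. Increments are computed as the difference between consecutive readings of monotonically increasing counters. node_stats_list/2 then follows the shorthand documented below it.

```elixir
defmodule NodeStatsSketch do
  # Hypothetical sketch; ReconEx reports more values than this.
  def node_stats(n, interval_ms, fold_fun, acc0) do
    loop(n, interval_ms, fold_fun, acc0, counters())
  end

  # Collect into a list, then reverse to restore temporal order.
  def node_stats_list(n, interval_ms) do
    n |> node_stats(interval_ms, fn stats, acc -> [stats | acc] end, []) |> Enum.reverse()
  end

  defp loop(0, _interval_ms, _fold_fun, acc, _previous), do: acc

  defp loop(n, interval_ms, fold_fun, acc, previous) do
    Process.sleep(interval_ms)
    current = counters()

    # Absolutes: useful as-is, as a datapoint.
    absolutes = [
      process_count: :erlang.system_info(:process_count),
      run_queue: :erlang.statistics(:run_queue),
      memory_total: :erlang.memory(:total)
    ]

    # Increments: change since the previous run.
    increments = Enum.map(current, fn {key, value} -> {key, value - previous[key]} end)

    loop(n - 1, interval_ms, fold_fun, fold_fun.({absolutes, increments}, acc), current)
  end

  # Counters that only ever grow, as a keyword list.
  defp counters do
    {{:input, bytes_in}, {:output, bytes_out}} = :erlang.statistics(:io)
    {gc_count, words_reclaimed, _} = :erlang.statistics(:garbage_collection)
    {reductions, _since_last_call} = :erlang.statistics(:reductions)

    [
      bytes_in: bytes_in,
      bytes_out: bytes_out,
      gc_count: gc_count,
      words_reclaimed: words_reclaimed,
      reductions: reductions
    ]
  end
end
```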
Specs
node_stats_list(repeat :: non_neg_integer, interval_ms) :: [stats]
Shorthand for node_stats(n, interval, fn(x, acc) -> [x | acc] end, []) with the results reversed to be in the right temporal order.
Specs
node_stats_print(repeat :: non_neg_integer, interval_ms) :: term
Shorthand for node_stats(n, interval, fn(x, _) -> IO.inspect(x, pretty: true) end, :ok)
Specs
port_info(port_term) :: [{port_info_type, [{port_info_key, term}]}, ...]
Similar to :erlang.port_info/1, but allows more flexible port usage: usual ports, ports that were registered locally (an atom), ports represented as strings ("#Port<0.2013>"), or through an index lookup (2013, for the same result as "#Port<0.2013>").
Moreover, the function will try to fetch implementation-specific details based on the port type (only inet ports have this feature so far). For example, TCP ports will include information about the remote peer, transfer statistics, and socket options being used.
The implementation-specific details and the basic port info are sorted and grouped into broader categories (port_info_type()).
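The flexible port_term handling can be sketched as a few clauses that normalize each accepted shape to a port. PortTermSketch is a hypothetical helper, not ReconEx's code. It assumes OTP 20 or later for :erlang.list_to_port/1, and it rebuilds an index lookup as "#Port<0.index>", which is the equivalence stated above.

```elixir
defmodule PortTermSketch do
  # Hypothetical helper: normalizes the port_term shapes described above.
  def to_port(port) when is_port(port), do: port
  # Locally registered name; Process.whereis/1 also resolves ports.
  def to_port(name) when is_atom(name), do: Process.whereis(name)
  # Index lookup: 2013 is treated as "#Port<0.2013>".
  def to_port(index) when is_integer(index), do: to_port("#Port<0.#{index}>")
  # String and charlist forms such as "#Port<0.2013>".
  def to_port(string) when is_binary(string), do: to_port(String.to_charlist(string))
  def to_port(chars) when is_list(chars), do: :erlang.list_to_port(chars)
end
```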
Specs
port_info(port_term, atom) :: {atom, term}
port_info(port_term, [atom]) :: [{atom, term}]
port_info(port_term, port_info_type) :: {port_info_type, [{port_info_key, term}]}
Similar to :erlang.port_info/2, but allows more flexible port usage: usual ports, ports that were registered locally (an atom), ports represented as strings ("#Port<0.2013>"), or through an index lookup (2013, for the same result as "#Port<0.2013>").
Moreover, the function allows fetching information by category as defined in port_info_type(). Although the type signature doesn't show it in the generated documentation, individual items accepted by :erlang.port_info/2 (http://www.erlang.org/doc/man/erlang.html#port_info-2) are also accepted, as are lists of them.
Specs
port_types :: [{type :: char_list, count :: pos_integer}]
Shows a list of all different ports on the node with their respective types.
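A count like this can be sketched by reading each port's :name with :erlang.port_info/2 and tallying the results. PortTypesSketch is hypothetical. It assumes the driver name is what "type" means here, for example "tcp_inet" or "efile", and it returns the [{type, count}] shape from the spec, largest count first.

```elixir
defmodule PortTypesSketch do
  # Hypothetical sketch: count open ports by driver name.
  def port_types do
    Port.list()
    |> Enum.flat_map(fn port ->
      case :erlang.port_info(port, :name) do
        {:name, name} -> [name]
        # The port closed between listing and inspecting it.
        :undefined -> []
      end
    end)
    |> Enum.frequencies()
    |> Enum.sort_by(fn {_type, count} -> count end, :desc)
  end
end
```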
Specs
proc_count(attribute_name :: atom, non_neg_integer) :: [proc_attrs]
Fetches a given attribute from all processes (except the caller) and returns the biggest num consumers.
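A single-snapshot version can be sketched with Process.info/2 over Process.list/0. ProcCountSketch is hypothetical, returns plain {pid, value} pairs rather than proc_attrs, and assumes a numeric attribute such as :memory, :reductions, or :message_queue_len.

```elixir
defmodule ProcCountSketch do
  # Hypothetical sketch: one snapshot, keep the num biggest values.
  def proc_count(attr, num) do
    samples =
      for pid <- Process.list(),
          pid != self(),
          # Exited processes yield nil, which this pattern skips.
          {^attr, value} <- [Process.info(pid, attr)],
          do: {pid, value}

    samples
    |> Enum.sort_by(fn {_pid, value} -> value end, :desc)
    |> Enum.take(num)
  end
end
```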
Specs
proc_window(attribute_name :: atom, non_neg_integer, milliseconds :: pos_integer) :: [proc_attrs]
Fetches a given attribute from all processes (except the caller) and returns the biggest entries, over a sliding time window.
This function is particularly useful when processes on the node are mostly short-lived, usually too short to inspect through other tools, in order to figure out what kind of processes are eating through a lot of resources on a given node.
It is important to see this function as a snapshot over a sliding window. A program’s timeline during sampling might look like this:
--w---- [Sample1] ---x-------------y----- [Sample2] ---z--->
Some processes will live from w until they die at x, some from y to z, and some from x to y. These samples will not be too significant, as they’re incomplete. If the majority of your processes run during a time interval x…y (in absolute terms), you should make sure that your sampling time is smaller than this. That way, for many processes, their lifetime spans both samples, as a process living from w to z would. Not doing this can skew the results: long-lived processes, which have 10 times the time to accumulate data (say reductions), will look like bottlenecks when they’re not.
Warning: this function depends on data gathered at two snapshots, and then builds a dictionary with entries to differentiate them. This can take a heavy toll on memory when you have tens of thousands of processes.
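The two-snapshot comparison can be sketched by holding both samples in maps keyed by pid and subtracting. Keeping both maps in memory at once is where the memory cost in the warning comes from. ProcWindowSketch is hypothetical and makes its own choices: processes that died before the second sample are dropped, and processes born after the first sample are compared against 0.

```elixir
defmodule ProcWindowSketch do
  # Hypothetical sketch of a sliding-window comparison.
  defp snapshot(attr) do
    for pid <- Process.list(),
        pid != self(),
        {^attr, value} <- [Process.info(pid, attr)],
        into: %{},
        do: {pid, value}
  end

  def proc_window(attr, num, ms) do
    first = snapshot(attr)
    Process.sleep(ms)
    second = snapshot(attr)

    # Delta per process that is still alive at the second snapshot.
    second
    |> Enum.map(fn {pid, value} -> {pid, value - Map.get(first, pid, 0)} end)
    |> Enum.sort_by(fn {_pid, delta} -> delta end, :desc)
    |> Enum.take(num)
  end
end
```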
Specs
remote_load(nodes, module) :: term
Loads one or more modules remotely, in a diskless manner. This lets you share code loaded locally with a remote node that doesn’t have it.
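Diskless loading can be sketched with two standard calls. :code.get_object_code/1 reads the module's object code from the local code path, and :code.load_binary/3 is then run on each remote node through :rpc.multicall/4. RemoteLoadSketch is hypothetical. It returns {:error, :no_object_code} when no .beam file is found for the module on the local code path, for example for modules defined only in memory.

```elixir
defmodule RemoteLoadSketch do
  # Hypothetical sketch: ship locally available object code to other nodes.
  def remote_load(nodes, module) do
    case :code.get_object_code(module) do
      {^module, binary, filename} ->
        # Each remote node loads the binary without touching its disk.
        :rpc.multicall(List.wrap(nodes), :code, :load_binary, [module, filename, binary])

      :error ->
        {:error, :no_object_code}
    end
  end
end
```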
Specs
rpc(nodes, (() -> term)) :: rpc_result
Shorthand for rpc(nodes, fun, :infinity)
Specs
rpc(nodes, (() -> term), timeout_ms) :: rpc_result
Runs an arbitrary fun (of arity 0) over one or more nodes.
Specs
scheduler_usage(interval_ms) :: [{scheduler_id :: pos_integer, usage :: number}]
Because Erlang CPU usage as reported from top isn’t the most reliable value (due to schedulers doing idle spinning to avoid going to sleep and impacting latency), a metric exists that is based on scheduler wall time.
For any time interval, scheduler wall time can be used as a measure of how busy a scheduler is. A scheduler is busy when:
- executing process code
- executing driver code
- executing NIF code
- executing BIFs
- garbage collecting
- doing memory management
A scheduler isn’t busy when doing anything else.
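Per-scheduler busyness over an interval can be sketched from two readings of :erlang.statistics(:scheduler_wall_time). Each reading lists {scheduler_id, active_time, total_time} tuples, and the usage for an interval is Δactive / Δtotal, a value between 0 and 1. SchedulerUsageSketch is hypothetical. It turns the scheduler_wall_time system flag on for the measurement and restores the previous value afterwards.

```elixir
defmodule SchedulerUsageSketch do
  # Hypothetical sketch: usage per scheduler as a ratio from 0.0 to 1.0.
  def scheduler_usage(interval_ms) do
    previous_flag = :erlang.system_flag(:scheduler_wall_time, true)

    try do
      first = Enum.sort(:erlang.statistics(:scheduler_wall_time))
      Process.sleep(interval_ms)
      second = Enum.sort(:erlang.statistics(:scheduler_wall_time))

      # Pair up readings for the same scheduler id and divide the deltas.
      for {{id, active0, total0}, {id, active1, total1}} <- Enum.zip(first, second) do
        {id, ratio(active1 - active0, total1 - total0)}
      end
    after
      # Restore whatever the flag was before this call.
      :erlang.system_flag(:scheduler_wall_time, previous_flag)
    end
  end

  defp ratio(_active, 0), do: 0.0
  defp ratio(active, total), do: active / total
end
```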
Specs
source(module) :: iolist
Obtain the source code of a module compiled with debug_info. The returned list unfortunately cannot format types and typed records the way they look in the original module. Instead, they appear in an intermediary form used in the AST. They will still be placed in the right module attributes, however.
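The general approach can be sketched in two steps. First, read the abstract code chunk with :beam_lib.chunks/2. Second, pretty-print each form with :erl_pp.form/1, which yields an iolist in Erlang syntax, even for Elixir modules. SourceSketch and its function names are hypothetical, and the output format is not necessarily what ReconEx produces. source/1 accepts either a .beam file path (charlist) or a .beam binary. source_of/1 looks the path up with :code.which/1.

```elixir
defmodule SourceSketch do
  # Hypothetical sketch: rebuild readable source from debug_info.
  def source(beam) do
    case :beam_lib.chunks(beam, [:abstract_code]) do
      {:ok, {_module, [abstract_code: {:raw_abstract_v1, forms}]}} ->
        # One pretty-printed chunk of Erlang syntax per form.
        Enum.map(forms, &:erl_pp.form/1)

      {:ok, {_module, [abstract_code: :no_abstract_code]}} ->
        {:error, :no_debug_info}

      {:error, :beam_lib, reason} ->
        {:error, reason}
    end
  end

  # Finds the module's .beam file on the code path, if any.
  def source_of(module) when is_atom(module) do
    case :code.which(module) do
      path when is_list(path) -> source(path)
      other -> {:error, other}
    end
  end
end
```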