Explorer.Remote (Explorer v0.10.0)
A module responsible for placing remote dataframes and garbage collecting them.
The functions in Explorer.DataFrame and Explorer.Series will automatically move operations on remote dataframes to the nodes they belong to. Explorer also integrates with FLAME and automatically tracks remote dataframes and series returned from FLAME calls when the :track_resources option is enabled.
This module provides additional conveniences for manual placement.
Implementation details
In order to understand what this module does, we need to understand the challenges in working with remote series and dataframes.
Series and dataframes are actually NIF resources: they are pointers to blobs of memory operated by low-level libraries. Those are represented in Erlang/Elixir as references (the same as the one returned by make_ref/0). Once the reference is garbage collected (based on refcounting), those NIF resources are garbage collected and the memory is reclaimed.
When using Distributed Erlang, you may write this code:
remote_series = :erpc.call(node, Explorer.Series, :from_list, [[1, 2, 3]])
However, the code above will not work, because the series will be allocated in the remote node and the remote node won't hold a reference to said series! This means the series is garbage collected and if we attempt to read it later on, from the caller node, it will no longer exist. Therefore, we must explicitly place these resources in remote nodes by spawning processes to hold these references. That's what the place/2 function in this module does.
We also need to guarantee these resources are not kept forever by these remote nodes, so place/2 creates a local NIF resource that notifies the remote resources they have been GCed, effectively implementing a remote garbage collector.
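The idea of a process that exists only to keep a reference alive can be illustrated with a minimal sketch in plain Elixir. This is not Explorer's implementation: the HolderSketch module and its messages are hypothetical, it runs on a single node, and it swaps the NIF-resource notification described above for an explicit :release message plus a process monitor.

```elixir
defmodule HolderSketch do
  # Hypothetical helper, not part of Explorer: spawns a process that keeps
  # `term` in its loop state, so the term stays referenced until released.
  def hold(term, owner \\ self()) do
    spawn(fn ->
      # Monitor the owner so the holder also stops if the owner exits.
      ref = Process.monitor(owner)
      loop(term, ref)
    end)
  end

  defp loop(term, ref) do
    receive do
      {:get, from} ->
        send(from, {:held, term})
        loop(term, ref)

      :release ->
        # Returning ends the process, which drops its reference to `term`.
        :ok

      {:DOWN, ^ref, :process, _pid, _reason} ->
        :ok
    end
  end
end

# Hold a reference, read it back, then release it explicitly.
holder = HolderSketch.hold(make_ref())
send(holder, {:get, self()})

receive do
  {:held, term} -> IO.inspect(is_reference(term))
after
  1_000 -> IO.puts("holder did not reply")
end

send(holder, :release)
```

In a distributed setup the holder would presumably be started on the node that owns the resource (for example with Node.spawn/2), and the release would be triggered automatically when the caller's local handle is garbage collected rather than by an explicit message.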
Functions
Receives a data structure and traverses it looking for remote dataframes and series.
If any is found, it spawns a process on the remote node and sets up a distributed garbage collector. This function only traverses maps, lists, and tuples; it does not support arbitrary structs (such as map sets).
It returns the updated term and a list of remote PIDs spawned.
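The traversal described above (walk maps, lists, and tuples, return the updated term plus an accumulated list) can be sketched in plain Elixir. The TraverseSketch module, the callback shape, and the {:placed, leaf} tag are all hypothetical and only illustrate the pattern; the sketch collects plain references instead of spawning remote processes.

```elixir
defmodule TraverseSketch do
  # Hypothetical helper: walks maps, lists, and tuples. `fun` receives each
  # leaf and the accumulator and returns {new_leaf, new_acc}.

  # Structs are treated as leaves and are not traversed.
  def traverse(%_{} = struct, acc, fun), do: fun.(struct, acc)

  def traverse(map, acc, fun) when is_map(map) do
    {pairs, acc} =
      Enum.map_reduce(map, acc, fn {key, value}, acc ->
        {value, acc} = traverse(value, acc, fun)
        {{key, value}, acc}
      end)

    {Map.new(pairs), acc}
  end

  # Assumes proper lists.
  def traverse(list, acc, fun) when is_list(list) do
    Enum.map_reduce(list, acc, &traverse(&1, &2, fun))
  end

  def traverse(tuple, acc, fun) when is_tuple(tuple) do
    {items, acc} = tuple |> Tuple.to_list() |> traverse(acc, fun)
    {List.to_tuple(items), acc}
  end

  def traverse(leaf, acc, fun), do: fun.(leaf, acc)
end

# Tag every reference found and collect it in the accumulator.
ref = make_ref()

tag = fn
  leaf, acc when is_reference(leaf) -> {{:placed, leaf}, [leaf | acc]}
  leaf, acc -> {leaf, acc}
end

{updated, found} =
  TraverseSketch.traverse(%{a: [1, {ref, 2}], b: MapSet.new([ref])}, [], tag)

IO.inspect(updated)
IO.inspect(found)
```

The reference nested inside the list and tuple is replaced and collected, while the one inside the MapSet is left untouched because the struct is handed to the callback as a single leaf. This mirrors the limitation noted above for arbitrary structs.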