Copyright © 2021-2023 VMware, Inc. or its affiliates. All rights reserved.
Version: 0.2.4
Authors: Jean-Sébastien Pédron (jean-sebastien@rabbitmq.com), Michael Davis (mcarsondavis@gmail.com), The RabbitMQ team (info@rabbitmq.com).
Horus is a library that extracts an anonymous function's code as well as the code of the all the functions it calls, and creates a standalone version of it in a new module at runtime.
The goal is to have a storable and transferable function which does not depend on the availability of the modules that defined it or were called.
To achieve that goal, Horus extracts the assembly code of the anonymous function, watches all calls it does and recursively extracts the assembly code of other called functions. When complete, it creates a standalone Erlang module based on it. This module can be stored, transfered to another Erlang node and executed anywhere without the presence of the initial anonymous function's module.
If the extracted function calls directly or indirectly modules from the erts
,
kernel
or stdlib
applications, the called functions are not extracted.
That's ok because the behavior of Erlang/OTP modules rarely changes and they
will be available. Therefore, there is little value in extracting that code.
While processing the assembly instructions and watching function calls, Horus can use callbacks provided by the caller to determine if instructions and calls are allowed or denied.
Horus caches assembly code and generated modules. This avoids the need to call the code server or the compiler again and again.
Note that at this point, the cache is never invalidated and purged. Memory management of the cache will be a future improvement.
In the process, Horus pays attention to modules' checksums. Therefore, if a module is reloaded during a code upgrade, a new standalone function will be generated from the apparently same anonymous function. This applies to all modules called by that anonymous function.
It's possible for an anonymous function to take inputs from arguments of course, but also from the scope it is defined in:
Pid = self(),
Fun = fun() ->
Pid ! stop
end,
It is even possible to take another anonymous function from the scope:
InnerFun = fun(Ret) ->
{ok, Ret}
end,
OuterFun = fun(Ret) ->
InnerFun(Ret)
end,
These variables are "stored" in the anonymous function's environment by Erlang. They are also taken into account by Horus during the extraction. In particular, if the environment references another anonymous function, it will be extracted too.
The returned standalone function contains a standalone copy of the environment. Thus you don't have to worry about.
However, that environment is not part of the generated module. Therefore, a
single module will generated for each of InnerFun
and OuterFun
from the
example above.
This avoids the multiplication of generated modules which are loaded at execution time. With this distinction, a single module is generated and loaded. The environment stored in the returned standalone function is then passed to that generated module during execution.
Let's take the simplest anonymous function:
fun() ->
ok
end
Horus generates the following assembly form:
{
%% This is the generated module name. It is created from the name and
%% origin of the extraction function and a hash. The hash permits
%% several copies of the same function after code reloading.
'horus__erl_eval__-expr/6-fun-2-__97040876',
%% This is the list of exported functions. `run/0' is the entry point
%% corresponding to the extracted function and has the same arity (or
%% perhaps a greater arity if the anonymous function took variables
%% from the scope).
%%
%% There could be other non-exported functions in the module corresponding
%% to the functions called by the top-level anonymous function.
[{run,0},
{module_info,0},
{module_info,1}],
%% These are module attributes. Things like `-dialyzer(...).'. There are
%% none in the generated module.
[],
%% The actual functions present in the module at last!
[
{function,
run, %% The name of the function.
0, %% Its arity.
2, %% The label where the code starts.
[
%% Some metadata for this function:
{label,1},
{func_info,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
{atom,run},
0},
%% The function body for all its clauses.
{label,2},
{move,{atom,ok},{x,0}},
return]},
%% The `module/{0,1}' functions are added so the generated module is
%% compatible with debuggers.
{function,module_info,0,4,
[{label,3},
{line,[{location,"horus.erl",0}]},
{func_info,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
{atom,module_info},
0},
{label,4},
{move,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
{x,0}},
{call_ext_only,1,{extfunc,erlang,get_module_info,1}}]},
{function,module_info,1,6,
[{label,5},
{line,[{location,"horus.erl",0}]},
{func_info,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
{atom,module_info},
1},
{label,6},
{move,{x,0},{x,1}},
{move,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
{x,0}},
{call_ext_only,2,{extfunc,erlang,get_module_info,2}}]}],
7}}}
Here is a description of the usecase that started it all.
Horus doesn't extract the anonymous function code only, but all the functions it calls, whether they sit in the same module or another module. At the same time, it will "analyse" the code and give the caller (through callbacks) the opportunity to deny specific operations or function calls.
For instance, Khepri — which Horus was created for initially — needs to do that as part of the transaction functions feature. In case you don't know, Khepri is a key/value store where keys are organized in a tree. The replication of data relies on the Raft algorithm. Raft is based on state machines where a leader state machine sends a journal of commands to followers. The leader and follower state machines modify their state after applying comands and they must all reach the exact same state. They also must reach the same state again if the journal of commands needs to be replayed. You can learn more from the Khepri documentation.
For transaction functions to fullfill this "reach same state" constraint no matter the node running the transaction function, no matter the date/time or the number of times the function is executed, we need to deny any operations with side effects or taking input from or having output to something external to the state machine. For example:
persistent_term
, self()
, nodes()
, the current time,
etc.We also need to ensure that the transaction function remains the same if it is executed again in the future, even after an upgrade.
This is where Horus comes into play. Its role is to collect the entire code of the transaction function even if it is split into multiple Erlang functions, accross multiple Erlang modules. This is to prevent that an upgrade changes the behavior of a transaction function.
While Horus collects the code, it uses callbacks provided by the caller to let
it say if an operation is allowed or denied. Khepri will deny messages being
sent or received, calls to functions such as self()
or node()
and calls to
any functions Khepri doesn't approve.
erts
, kernel
or stdlib
applications. The
collected code will keep a regular external function call in this case. This is
to avoid the extraction of code we know have no side effects and the behavior
will not change between upgrades. Also because some functions can't be
extracted because they are simple placeholders replaced by NIFs at runtime.
Generated by EDoc