The Horus library

Version: 0.2.4

Authors: Jean-Sébastien Pédron (jean-sebastien@rabbitmq.com), Michael Davis (mcarsondavis@gmail.com), The RabbitMQ team (info@rabbitmq.com).

Horus is a library that extracts an anonymous function's code as well as the code of the all the functions it calls, and creates a standalone version of it in a new module at runtime.

The goal is to have a storable and transferable function which does not depend on the availability of the modules that defined it or were called.

Fork me on GitHub

How does it work?

To achieve that goal, Horus extracts the assembly code of the anonymous function, watches all calls it does and recursively extracts the assembly code of other called functions. When complete, it creates a standalone Erlang module based on it. This module can be stored, transfered to another Erlang node and executed anywhere without the presence of the initial anonymous function's module.

If the extracted function calls directly or indirectly modules from the erts, kernel or stdlib applications, the called functions are not extracted. That's ok because the behavior of Erlang/OTP modules rarely changes and they will be available. Therefore, there is little value in extracting that code.

While processing the assembly instructions and watching function calls, Horus can use callbacks provided by the caller to determine if instructions and calls are allowed or denied.

The extraction process

Here is what it does in more details:

The assembly code of the module hosting the anonymous function is extracted.
The anonymous function code is located inside that assembly code.
Optionaly, the code is analyzed to determine if it matches constraints defined by the caller. For instance, it does not perform any forbidden operations like:
- sending or receiving inter-process messages
- accessing files or network connections
- calling forbidden functions
Based on the listed function calls, the same steps are repeated for all of them (extract, verify, list calls).
Optionaly, once all the assembly code to have a standalone anonymous function is collected, it checks if it is still needed by the caller.
Finally, an Erlang module is compiled.

Caching

Horus caches assembly code and generated modules. This avoids the need to call the code server or the compiler again and again.

Note that at this point, the cache is never invalidated and purged. Memory management of the cache will be a future improvement.

Extraction and code reloading

In the process, Horus pays attention to modules' checksums. Therefore, if a module is reloaded during a code upgrade, a new standalone function will be generated from the apparently same anonymous function. This applies to all modules called by that anonymous function.

The anonymous function environment

It's possible for an anonymous function to take inputs from arguments of course, but also from the scope it is defined in:

Pid = self(),
Fun = fun() ->
          Pid ! stop
      end,

It is even possible to take another anonymous function from the scope:

InnerFun = fun(Ret) ->
               {ok, Ret}
           end,
OuterFun = fun(Ret) ->
               InnerFun(Ret)
           end,

These variables are "stored" in the anonymous function's environment by Erlang. They are also taken into account by Horus during the extraction. In particular, if the environment references another anonymous function, it will be extracted too.

The returned standalone function contains a standalone copy of the environment. Thus you don't have to worry about.

However, that environment is not part of the generated module. Therefore, a single module will generated for each of InnerFun and OuterFun from the example above.

This avoids the multiplication of generated modules which are loaded at execution time. With this distinction, a single module is generated and loaded. The environment stored in the returned standalone function is then passed to that generated module during execution.

Example of a generated module

Let's take the simplest anonymous function:

fun() ->
    ok
end

Horus generates the following assembly form:

{
 %% This is the generated module name. It is created from the name and
 %% origin of the extraction function and a hash. The hash permits
 %% several copies of the same function after code reloading.
 'horus__erl_eval__-expr/6-fun-2-__97040876',

 %% This is the list of exported functions. `run/0' is the entry point
 %% corresponding to the extracted function and has the same arity (or
 %% perhaps a greater arity if the anonymous function took variables
 %% from the scope).
 %%
 %% There could be other non-exported functions in the module corresponding
 %% to the functions called by the top-level anonymous function.
 [{run,0},
  {module_info,0},
  {module_info,1}],

 %% These are module attributes. Things like `-dialyzer(...).'. There are
 %% none in the generated module.
 [],

 %% The actual functions present in the module at last!
 [
  {function,
   run, %% The name of the function.
   0,   %% Its arity.
   2,   %% The label where the code starts.
   [
    %% Some metadata for this function:
    {label,1},
    {func_info,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
     {atom,run},
     0},
    %% The function body for all its clauses.
    {label,2},
    {move,{atom,ok},{x,0}},
    return]},

  %% The `module/{0,1}' functions are added so the generated module is
  %% compatible with debuggers.
  {function,module_info,0,4,
   [{label,3},
    {line,[{location,"horus.erl",0}]},
    {func_info,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
     {atom,module_info},
     0},
    {label,4},
    {move,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
     {x,0}},
    {call_ext_only,1,{extfunc,erlang,get_module_info,1}}]},

  {function,module_info,1,6,
   [{label,5},
    {line,[{location,"horus.erl",0}]},
    {func_info,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
     {atom,module_info},
     1},
    {label,6},
    {move,{x,0},{x,1}},
    {move,{atom,'horus__erl_eval__-expr/6-fun-2-__97040876'},
     {x,0}},
    {call_ext_only,2,{extfunc,erlang,get_module_info,2}}]}],
 7}}}

Why not store/send the module defining the function?

Here is a description of the usecase that started it all.

Horus doesn't extract the anonymous function code only, but all the functions it calls, whether they sit in the same module or another module. At the same time, it will "analyse" the code and give the caller (through callbacks) the opportunity to deny specific operations or function calls.

For instance, Khepri — which Horus was created for initially — needs to do that as part of the transaction functions feature. In case you don't know, Khepri is a key/value store where keys are organized in a tree. The replication of data relies on the Raft algorithm. Raft is based on state machines where a leader state machine sends a journal of commands to followers. The leader and follower state machines modify their state after applying comands and they must all reach the exact same state. They also must reach the same state again if the journal of commands needs to be replayed. You can learn more from the Khepri documentation.

For transaction functions to fullfill this "reach same state" constraint no matter the node running the transaction function, no matter the date/time or the number of times the function is executed, we need to deny any operations with side effects or taking input from or having output to something external to the state machine. For example:

the transaction function can't send or receive messages to/from other processes
it can't perform any disk I/O
it can't use e.g. persistent_term, self(), nodes(), the current time, etc.

We also need to ensure that the transaction function remains the same if it is executed again in the future, even after an upgrade.

This is where Horus comes into play. Its role is to collect the entire code of the transaction function even if it is split into multiple Erlang functions, accross multiple Erlang modules. This is to prevent that an upgrade changes the behavior of a transaction function.

While Horus collects the code, it uses callbacks provided by the caller to let it say if an operation is allowed or denied. Khepri will deny messages being sent or received, calls to functions such as self() or node() and calls to any functions Khepri doesn't approve.

By default, Horus will stop following calls (for code extraction) when the code calls a module provided by the erts, kernel or stdlib applications. The collected code will keep a regular external function call in this case. This is to avoid the extraction of code we know have no side effects and the behavior will not change between upgrades. Also because some functions can't be extracted because they are simple placeholders replaced by NIFs at runtime.

Generated by EDoc