Your first compiler with Beaver!

Mix.install([
  {:beaver, "~> 0.2"}
])
:ok

14:06:55.047 [debug] [Beaver] all passes registered

What makes a compiler

Before start coding anything. Let's get some intuition about what a typical compiler is made of.

  • IR
  • Pass
  • CodeGen

A taste of MLIR

Most of LLVM tutorial starts with AST and parser. Guess what we are building it with Elixir and it rocks. So we can start with something fun, generating the IR.

use Beaver
alias Beaver.MLIR.Dialect.{Func, Arith}
require Func
ctx = MLIR.Context.create()

ir =
  mlir ctx: ctx do
    module do
      Func.func some_func(function_type: Type.function([Type.i32()], [Type.i32()])) do
        region do
          block _bb_entry(a >>> Type.i32()) do
            b = Arith.constant(value: Attribute.integer(Type.i32(), 1024)) >>> Type.i32()
            c = Arith.addi(a, b) >>> Type.i32()
            Func.return(c) >>> []
          end
        end
      end
    end
  end

ir |> Beaver.MLIR.to_string() |> IO.puts()
module {
  func.func @some_func(%arg0: i32) -> i32 {
    %c1024_i32 = arith.constant 1024 : i32
    %0 = arith.addi %arg0, %c1024_i32 : i32
    return %0 : i32
  }
}
:ok

As you can see, with MLIR we are not working with those low level LLVM instructions you might have seen somewhere else. Here we have this Func.func and Func.return thing looks like a function and return statement in a programming language. And the Arith stuff seems to be doing some arithmetic for us. Now you might take some time to look at the IR printed by IO.puts and think about it.

If you find the text full of %1, %2 not so interesting, don't worry. The only thing you need to know is that this is called MLIR's "textual form". Let's carry on.

Entering the LLVM realm

The Next Big Thing™ we are going to do is to convert what we got so far to LLVM IR. Why? Long story short, MLIR is like a cinematic universe. Instead of having different hero franchises, we got a bunch of "dialects". The Func, Arith are both dialects. There is a dialect called LLVM we haven't seen. LLVM is a dialect kind of magical and kind of different from others. It is magical that if we can convert IR to LLVM, we can generate native machine code and run it.

This is how it is done with Beaver:

import MLIR.Conversion

llvm_ir =
  ir
  |> convert_arith_to_llvm
  |> MLIR.Pass.Composer.nested("func.func", "llvm-request-c-wrappers")
  |> convert_func_to_llvm
  |> MLIR.Pass.Composer.run!()

llvm_ir |> Beaver.MLIR.to_string() |> IO.puts()
module attributes {llvm.data_layout = ""} {
  llvm.func @some_func(%arg0: i32) -> i32 attributes {llvm.emit_c_interface} {
    %0 = llvm.mlir.constant(1024 : i32) : i32
    %1 = llvm.add %arg0, %0  : i32
    llvm.return %1 : i32
  }
  llvm.func @_mlir_ciface_some_func(%arg0: i32) -> i32 attributes {llvm.emit_c_interface} {
    %0 = llvm.call @some_func(%arg0) : (i32) -> i32
    llvm.return %0 : i32
  }
}
:ok

Now you can see the IR seems to "grow" a little. It is called "lowering". What we just did is running two passes convert_func_to_llvm and convert_arith_to_llvm on our IR. So that the higher level abstraction like Func and Arith are lowered to LLVM IR, which is a representation closer to the hardware instruction.

Remember what I told you, if we get to LLVM Dialect, basically we should get native machine code to run right? Let's do it!

Run it!

jit = MLIR.ExecutionEngine.create!(llvm_ir)

return = Beaver.Native.I32.make(0)
arguments = [Beaver.Native.I32.make(1024)]

MLIR.ExecutionEngine.invoke!(jit, "some_func", arguments, return)
|> Beaver.Native.to_term()
2048

Recap

Congratulations! You have just built your first compiler with Beaver. There could be a lot to unpack and here is the summary:

  • First we generate some MLIR with the mlir/1 macro.
  • Then we convert it to LLVM IR with convert_func_to_llvm and convert_arith_to_llvm passes.
  • Later we create a JIT engine from the generated LLVM IR with MLIR.ExecutionEngine.create! and run the native function.

These three steps are corespondent to what makes a compiler

  • IR
  • Pass
  • CodeGen

In next tutorial we are taking a more programmable approach to generate the IR, converting Elixir AST to MLIR. By doing it we will pick up concepts like region, block, operations appear here but didn't receive much attention for now. You might want to look at the full list of MLIR dialects here.