Axon (Axon v0.1.0)

A high-level interface for creating neural network models.

Axon is built entirely on top of Nx numerical definitions, so every neural network can be JIT or AOT compiled using any Nx compiler, or even transformed into high-level neural network formats like TensorFlow Lite and ONNX.

Model Creation

All Axon models start with an input layer, specifying the expected input shape of the training data:

input = Axon.input({nil, 784}, "input")

Notice you can specify some dimensions as nil, indicating that the dimension size will be filled in at model runtime. You can then compose inputs with other layers:

model =
  input
  |> Axon.dense(128, activation: :relu)
  |> Axon.batch_norm()
  |> Axon.dropout(rate: 0.8)
  |> Axon.dense(64)
  |> Axon.tanh()
  |> Axon.dense(10)
  |> Axon.activation(:softmax)

You can inspect the model for a nice summary:

IO.inspect(model)

---------------------------------------------------------------------------------------------------------
                                                  Model
=========================================================================================================
 Layer                                   Shape        Policy              Parameters   Parameters Memory
=========================================================================================================
 input ( input )                         {nil, 784}   p=f32 c=f32 o=f32   0            0 bytes
 dense_0 ( dense["input"] )              {nil, 128}   p=f32 c=f32 o=f32   100480       401920 bytes
 relu_0 ( relu["dense_0"] )              {nil, 128}   p=f32 c=f32 o=f32   0            0 bytes
 batch_norm_0 ( batch_norm["relu_0"] )   {nil, 128}   p=f32 c=f32 o=f32   512          2048 bytes
 dropout_0 ( dropout["batch_norm_0"] )   {nil, 128}   p=f32 c=f32 o=f32   0            0 bytes
 dense_1 ( dense["dropout_0"] )          {nil, 64}    p=f32 c=f32 o=f32   8256         33024 bytes
 tanh_0 ( tanh["dense_1"] )              {nil, 64}    p=f32 c=f32 o=f32   0            0 bytes
 dense_2 ( dense["tanh_0"] )             {nil, 10}    p=f32 c=f32 o=f32   650          2600 bytes
 softmax_0 ( softmax["dense_2"] )        {nil, 10}    p=f32 c=f32 o=f32   0            0 bytes
---------------------------------------------------------------------------------------------------------
Total Parameters: 109898
Total Parameters Memory: 439592 bytes
Inputs: %{"input" => {nil, 784}}

Multiple Inputs

Creating a model with multiple inputs is as easy as declaring an additional input in your Axon graph. Every input layer present in the final Axon graph must be provided as an input at model execution time.

inp1 = Axon.input({nil, 1}, "input_0")
inp2 = Axon.input({nil, 1}, "input_1")

# Both inputs will be used
model1 = Axon.add(inp1, inp2)

# Only inp2 will be used
model2 = Axon.add(inp2, inp2)

Axon graphs are immutable, which means composing and manipulating an Axon graph creates an entirely new graph. Additionally, layer names are lazily generated at model execution time. To avoid non-deterministic input orderings and names, Axon requires each input to have a unique binary identifier. You can then reference inputs by name when passing to models at execution time:

inp1 = Axon.input({nil, 1}, "input_0")
inp2 = Axon.input({nil, 1}, "input_1")

model1 = Axon.add(inp1, inp2)
params1 = Axon.init(model1)
# Inputs are referenced by name
Axon.predict(model1, params1, %{"input_0" => x, "input_1" => y})

Multiple Outputs

Nx offers robust container support which is extended to Axon. Axon allows you to wrap any valid Nx container in a layer. Containers are most commonly used to structure outputs:

inp1 = Axon.input({nil, 1}, "input_0")
inp2 = Axon.input({nil, 1}, "input_1")
model = Axon.container(%{foo: inp1, bar: inp2})

Containers can be arbitrarily nested:

inp1 = Axon.input({nil, 1}, "input_0")
inp2 = Axon.input({nil, 1}, "input_1")
model = Axon.container({%{foo: {inp1, %{bar: inp2}}}})

You can even use custom structs which implement the container protocol:

inp1 = Axon.input({nil, 1}, "input_0")
inp2 = Axon.input({nil, 1}, "input_1")
model = Axon.container(%MyStruct{foo: inp1, bar: inp2})

Custom Layers

If you find that Axon's built-in layers are insufficient for your needs, you can create your own using the custom layer API. All of Axon's built-in layers (aside from special ones such as input, constant, and container) make use of this same API.

Axon layers are really just placeholders for Nx computations with trainable parameters and possibly state. To define a custom layer, you only need a defn implementation:

defn my_layer(x, weight, _opts \\ []) do
  Nx.atan2(x, weight)
end

Notice the only stipulation is that your custom layer implementation must accept at least 1 input and a list of options. At execution time, every layer will be passed a :mode option which can be used to control behavior at training and inference time.
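As a sketch, a layer can branch on :mode at build time to behave differently during training and inference. The noisy_layer below is hypothetical (its name and noise scale are illustrative), and it is written as a plain function rather than a defn so it can branch on the atom option directly:

```elixir
# Illustrative only: adds Gaussian noise in training mode, is a no-op otherwise.
def noisy_layer(x, opts) do
  case opts[:mode] do
    :train -> Nx.add(x, Nx.random_normal(Nx.shape(x), 0.0, 1.0e-2))
    _ -> x
  end
end
```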

Inputs to your custom layer can be either Axon graph inputs or trainable parameters. You can pass Axon graph inputs as-is to a custom layer. To declare trainable parameters, use Axon.param/3:

weight = Axon.param(input_shape, "weight")

To create a custom layer, you "wrap" your implementation and inputs into a layer using Axon.layer. You'll notice the API mirrors Elixir's apply:

def atan2_layer(%Axon{output_shape: shape} = input) do
  weight = Axon.param(shape, "weight")
  Axon.layer(&my_layer/3, [input, weight])
end
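The custom layer then composes like any built-in layer; the input shape here is illustrative:

```elixir
model = Axon.input({nil, 8}, "input") |> atan2_layer()
params = Axon.init(model)
Axon.predict(model, params, %{"input" => Nx.random_uniform({1, 8})})
```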

Model Execution

Under the hood, Axon models are represented as Elixir structs. You can initialize and apply models using the macros Axon.init/3 and Axon.predict/4:

params = Axon.init(model, compiler: EXLA)

Axon.predict(model, params, inputs, compiler: EXLA, mode: :train)

It is suggested that you set compiler options globally rather than pass them as options to execution macros:

EXLA.set_as_nx_default([:tpu, :cuda, :rocm, :host])

params = Axon.init(model)
Axon.predict(model, params, inputs, mode: :train)

Axon.predict/4 by default runs in inference mode, which performs certain optimizations and removes layers such as dropout layers. If constructing a training step using Axon.predict/4, be sure to specify mode: :train.

Model Training

Combining the Axon model creation API with the optimization and training APIs, you can create and train neural networks with ease:

model =
  Axon.input({nil, 784}, "input_0")
  |> Axon.dense(128, activation: :relu)
  |> Axon.layer_norm()
  |> Axon.dropout()
  |> Axon.dense(10, activation: :softmax)

IO.inspect model

model_state =
  model
  |> Axon.Loop.trainer(:categorical_cross_entropy, Axon.Optimizers.adamw(0.005))
  |> Axon.Loop.run(train_data, epochs: 10, compiler: EXLA)

See Axon.Updates and Axon.Loop for a more in-depth treatment of model optimization and model training.

Summary

Layers: Special

Adds a constant layer to the network.

Adds a container layer to the network.

Adds an input layer to the network.

Custom Axon layer with given inputs.

Applies the given Nx expression to the input.

Layers: Activation

Adds an activation layer to the network.

Adds a Continuously-differentiable exponential linear unit activation layer to the network.

Adds an Exponential linear unit activation layer to the network.

Adds an Exponential activation layer to the network.

Adds a Gaussian error linear unit activation layer to the network.

Adds a Hard sigmoid activation layer to the network.

Adds a Hard sigmoid weighted linear unit activation layer to the network.

Adds a Hard hyperbolic tangent activation layer to the network.

Adds a Leaky rectified linear unit activation layer to the network.

Adds a Linear activation layer to the network.

Adds a Log-sigmoid activation layer to the network.

Adds a Log-softmax activation layer to the network.

Adds a Mish activation layer to the network.

Adds a Rectified linear unit 6 activation layer to the network.

Adds a Rectified linear unit activation layer to the network.

Adds a Scaled exponential linear unit activation layer to the network.

Adds a Sigmoid activation layer to the network.

Adds a Sigmoid weighted linear unit activation layer to the network.

Adds a Softmax activation layer to the network.

Adds a Softplus activation layer to the network.

Adds a Softsign activation layer to the network.

Adds a Hyperbolic tangent activation layer to the network.

Layers: Linear

Adds a bias layer to the network.

Adds a bilinear layer to the network.

Adds a dense layer to the network.

Adds an embedding layer to the network.

Layers: Convolution

Adds a convolution layer to the network.

Adds a transposed convolution layer to the network.

Adds a depthwise convolution layer to the network.

Adds a depthwise separable 2-dimensional convolution to the network.

Adds a depthwise separable 3-dimensional convolution to the network.

Layers: Dropout

Adds an Alpha dropout layer to the network.

Adds a Dropout layer to the network.

Adds a Feature alpha dropout layer to the network.

Adds a Spatial dropout layer to the network.

Layers: Pooling

Adds an Adaptive average pool layer to the network.

Adds an Adaptive power average pool layer to the network.

Adds an Adaptive max pool layer to the network.

Adds an Average pool layer to the network.

Adds a Global average pool layer to the network.

Adds a Global LP pool layer to the network.

Adds a Global max pool layer to the network.

Adds a Power average pool layer to the network.

Adds a Max pool layer to the network.

Layers: Normalization

Adds a Batch normalization layer to the network.

Adds a group normalization layer to the network.

Adds an Instance normalization layer to the network.

Adds a Layer normalization layer to the network.

Layers: Recurrent

Adds a convolutional long short-term memory (LSTM) layer to the network with a random initial hidden state.

Adds a convolutional long short-term memory (LSTM) layer to the network with the given initial hidden state.

Adds a gated recurrent unit (GRU) layer to the network with a random initial hidden state.

Adds a gated recurrent unit (GRU) layer to the network with the given initial hidden state.

Adds a long short-term memory (LSTM) layer to the network with a random initial hidden state.

Adds a long short-term memory (LSTM) layer to the network with the given initial hidden state.

Layers: Shape

Adds a flatten layer to the network.

Adds a pad layer to the network.

Adds a reshape layer to the network.

Adds a resize layer to the network.

Adds a transpose layer to the network.

Model: Execution

Compiles and runs the given model's initialization function with the given compiler options.

Compiles and runs the given Axon model with params on input with the given compiler options.

Functions

Adds an add layer to the network.

Attaches a hook to the given Axon model.

Compiles the given model to {init_fn, predict_fn}.

Adds a concatenate layer to the network.

Adds a conditional layer which conditionally executes true_graph or false_graph based on the condition cond_fn at runtime.

Deserializes serialized model and parameters into a {model, params} tuple.

Freezes parameters returned from fun in the given model. fun takes the model's parameter list and returns the list of parameters it wishes to freeze. fun defaults to the identity function, freezing all of the parameters in model.

Returns the model's signature as a tuple of {input_shape, output_shape}.

Adds a multiply layer to the network.

Wraps an Axon model into a namespace.

Trainable Axon parameter used to create custom layers.

Serializes a model and its parameters for persisting models to disk or elsewhere.

Splits input graph into a container of n input graphs along the given axis.

Adds a subtract layer to the network.

Traverses a model tree applying fun to each layer.

Traverses a model applying fun with an accumulator.

Types

@type t() :: %Axon{
  args: term(),
  hooks: term(),
  id: term(),
  name: term(),
  op: term(),
  op_name: term(),
  opts: term(),
  output_shape: term(),
  parameters: term(),
  parent: term(),
  policy: term()
}

Layers: Special

constant(tensor, opts \\ [])

Adds a constant layer to the network.

Constant layers encapsulate Nx tensors in an Axon layer for ease of use with other Axon layers. They can be used interchangeably with other Axon layers:

inp = Axon.input({nil, 32}, "input")
my_constant = Axon.constant(Nx.iota({1, 32}))
model = Axon.add(inp, my_constant)

Constant layers will be cast according to the mixed precision policy. If it's important for your constant to retain its type during the computation, you will need to set the mixed precision policy to ignore constant layers.

Options

  • :name - layer name.
container(container, opts \\ [])

Adds a container layer to the network.

In certain cases you may want your model to have multiple outputs. In order to make this work, you must "join" the outputs into an Axon layer using this function for use in initialization and inference later on.

The given container can be any valid Axon Nx container.

Options

  • :name - layer name.

Examples

iex> inp1 = Axon.input({nil, 1}, "input_0")
iex> inp2 = Axon.input({nil, 2}, "input_1")
iex> model = Axon.container(%{a: inp1, b: inp2})
iex> %{a: a, b: b} = Axon.predict(model, %{}, %{
...>    "input_0" => Nx.tensor([[1.0]]),
...>    "input_1" => Nx.tensor([[1.0, 2.0]])
...> })
iex> a
#Nx.Tensor<
  f32[1][1]
  [
    [1.0]
  ]
>
iex> b
#Nx.Tensor<
  f32[1][2]
  [
    [1.0, 2.0]
  ]
>
input(input_shape, name)

Adds an input layer to the network.

Input layers specify a model's inputs. Input layers are always the root layers of the neural network.

You must specify the input layer's name, which will be used to uniquely identify it in the case of multiple inputs.

layer(op, inputs, opts \\ [])

Custom Axon layer with given inputs.

Inputs may be other Axon layers or trainable parameters created with Axon.param. At inference time, op will be applied to the inputs in the specified order, along with an additional opts parameter specifying inference options. All options passed to layer are forwarded to the inference function except:

  • :shape - specify layer output shape to bypass shape inference.
  • :name - layer name.
  • :op_name - layer operation for inspection and building parameter map.

Note this means your layer should not use these as input options, as they will always be dropped during inference compilation.

Axon's compiler will additionally forward the following options to every layer at inference time:

  • :mode - either :inference or :train, used to control layer behavior at inference or training time.

op is a function of the form:

fun = fn input, weight, bias, _opts ->
  input * weight + bias
end
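For example, a function like the one above could be wired into a graph along with trainable parameters as in the following sketch (the shapes and the :affine op name are illustrative):

```elixir
input = Axon.input({nil, 16}, "input")
weight = Axon.param({16}, "weight")
bias = Axon.param({16}, "bias")

# fun is the scale-and-shift function shown above
out = Axon.layer(fun, [input, weight, bias], op_name: :affine)
```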
nx(input, fun, opts \\ [])

Applies the given Nx expression to the input.

Nx layers are meant for quick applications of functions without trainable parameters. For example, they are useful for applying functions which apply accessors to containers:

model = Axon.container({foo, bar})
Axon.nx(model, &elem(&1, 0))

Options

  • :name - layer name.

Layers: Activation

activation(x, activation, opts \\ [])

Adds an activation layer to the network.

Activation layers are element-wise functions typically called after the output of another layer.

Options

  • :name - layer name.

Adds a Continuously-differentiable exponential linear unit activation layer to the network.

See Axon.Activations.celu/1 for more details.

Options

  • :name - layer name.

Adds an Exponential linear unit activation layer to the network.

See Axon.Activations.elu/1 for more details.

Options

  • :name - layer name.

Adds an Exponential activation layer to the network.

See Axon.Activations.exp/1 for more details.

Options

  • :name - layer name.

Adds a Gaussian error linear unit activation layer to the network.

See Axon.Activations.gelu/1 for more details.

Options

  • :name - layer name.
hard_sigmoid(x, opts \\ [])

Adds a Hard sigmoid activation layer to the network.

See Axon.Activations.hard_sigmoid/1 for more details.

Options

  • :name - layer name.
hard_silu(x, opts \\ [])

Adds a Hard sigmoid weighted linear unit activation layer to the network.

See Axon.Activations.hard_silu/1 for more details.

Options

  • :name - layer name.
hard_tanh(x, opts \\ [])

Adds a Hard hyperbolic tangent activation layer to the network.

See Axon.Activations.hard_tanh/1 for more details.

Options

  • :name - layer name.
leaky_relu(x, opts \\ [])

Adds a Leaky rectified linear unit activation layer to the network.

See Axon.Activations.leaky_relu/1 for more details.

Options

  • :name - layer name.

Adds a Linear activation layer to the network.

See Axon.Activations.linear/1 for more details.

Options

  • :name - layer name.
log_sigmoid(x, opts \\ [])

Adds a Log-sigmoid activation layer to the network.

See Axon.Activations.log_sigmoid/1 for more details.

Options

  • :name - layer name.
log_softmax(x, opts \\ [])

Adds a Log-softmax activation layer to the network.

See Axon.Activations.log_softmax/1 for more details.

Options

  • :name - layer name.

Adds a Mish activation layer to the network.

See Axon.Activations.mish/1 for more details.

Options

  • :name - layer name.

Adds a Rectified linear unit 6 activation layer to the network.

See Axon.Activations.relu6/1 for more details.

Options

  • :name - layer name.

Adds a Rectified linear unit activation layer to the network.

See Axon.Activations.relu/1 for more details.

Options

  • :name - layer name.

Adds a Scaled exponential linear unit activation layer to the network.

See Axon.Activations.selu/1 for more details.

Options

  • :name - layer name.

Adds a Sigmoid activation layer to the network.

See Axon.Activations.sigmoid/1 for more details.

Options

  • :name - layer name.

Adds a Sigmoid weighted linear unit activation layer to the network.

See Axon.Activations.silu/1 for more details.

Options

  • :name - layer name.

Adds a Softmax activation layer to the network.

See Axon.Activations.softmax/1 for more details.

Options

  • :name - layer name.

Adds a Softplus activation layer to the network.

See Axon.Activations.softplus/1 for more details.

Options

  • :name - layer name.

Adds a Softsign activation layer to the network.

See Axon.Activations.softsign/1 for more details.

Options

  • :name - layer name.

Adds a Hyperbolic tangent activation layer to the network.

See Axon.Activations.tanh/1 for more details.

Options

  • :name - layer name.

Layers: Linear

Adds a bias layer to the network.

A bias layer simply adds a trainable bias to an input.

Options

  • :name - layer name.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

bilinear(input1, input2, units, opts \\ [])

Adds a bilinear layer to the network.

The bilinear layer implements:

output = activation(dot(dot(input1, kernel), input2) + bias)

where activation is given by the :activation option and both kernel and bias are layer parameters. units specifies the number of output units.

All dimensions but the last of input1 and input2 must match. The batch sizes of both inputs must also match, or at least one must be nil. The inferred output batch size coerces to the strictest input batch size.

Compiles to Axon.Layers.bilinear/5.
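For example, combining two inputs into 32 output units (the shapes here are illustrative):

```elixir
inp1 = Axon.input({nil, 8}, "input_0")
inp2 = Axon.input({nil, 8}, "input_1")
model = Axon.bilinear(inp1, inp2, 32, activation: :tanh)
```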

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :activation - element-wise activation function.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

dense(x, units, opts \\ [])

Adds a dense layer to the network.

The dense layer implements:

output = activation(dot(input, kernel) + bias)

where activation is given by the :activation option and both kernel and bias are layer parameters. units specifies the number of output units.

Compiles to Axon.Layers.dense/4.
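For example, a small stack of dense layers (the shapes here are illustrative):

```elixir
model =
  Axon.input({nil, 16}, "features")
  |> Axon.dense(32, activation: :relu)
  |> Axon.dense(1)
```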

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :activation - element-wise activation function.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

embedding(x, vocab_size, embedding_size, opts \\ [])

Adds an embedding layer to the network.

An embedding layer initializes a kernel of shape {vocab_size, embedding_size} which acts as a lookup table for sequences of discrete tokens (e.g. sentences). Embeddings are typically used to obtain a dense representation of a sparse input space.
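For example, mapping sequences of 10 token ids drawn from a 1,000-token vocabulary to 64-dimensional vectors (the sizes here are illustrative):

```elixir
model =
  Axon.input({nil, 10}, "tokens")
  |> Axon.embedding(1_000, 64)
```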

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :uniform.

Layers: Convolution

conv(x, units, opts \\ [])

Adds a convolution layer to the network.

The convolution layer implements a general dimensional convolutional layer - which convolves a kernel over the input to produce an output.

Compiles to Axon.Layers.conv/4.

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :activation - element-wise activation function.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride during convolution. Defaults to 1.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :input_dilation - dilation to apply to input. Defaults to 1.

  • :kernel_dilation - dilation to apply to kernel. Defaults to 1.

  • :feature_group_size - feature group size for convolution. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.

conv_transpose(x, units, opts \\ [])

Adds a transposed convolution layer to the network.

The transposed convolution layer is sometimes referred to as a fractionally strided convolution or (incorrectly) as a deconvolution.

Compiles to Axon.Layers.conv_transpose/4.

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :activation - element-wise activation function.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride during convolution. Defaults to 1.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :kernel_dilation - dilation to apply to kernel. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.

depthwise_conv(x, channel_multiplier, opts \\ [])

Adds a depthwise convolution layer to the network.

The depthwise convolution layer implements a general dimensional depthwise convolution - which is a convolution where the feature group size is equal to the number of input channels.

The channel multiplier grows the input channels by the given factor. A multiplier of 1 means the output channels are the same as the input channels.

Compiles to Axon.Layers.depthwise_conv/4.
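For example, a channel multiplier of 2 doubles the 3 input channels to 6 output channels (the shapes here are illustrative):

```elixir
model =
  Axon.input({nil, 3, 28, 28}, "images")
  |> Axon.depthwise_conv(2, kernel_size: 3)
```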

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :activation - element-wise activation function.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride during convolution. Defaults to 1.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :input_dilation - dilation to apply to input. Defaults to 1.

  • :kernel_dilation - dilation to apply to kernel. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.

separable_conv2d(x, channel_multiplier, opts \\ [])

Adds a depthwise separable 2-dimensional convolution to the network.

Depthwise separable convolutions break the kernel into kernels for each dimension of the input and perform a depthwise conv over the input with each kernel.

Compiles to Axon.Layers.separable_conv2d/6.

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :activation - element-wise activation function.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride during convolution. Defaults to 1.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :input_dilation - dilation to apply to input. Defaults to 1.

  • :kernel_dilation - dilation to apply to kernel. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.

separable_conv3d(x, channel_multiplier, opts \\ [])

Adds a depthwise separable 3-dimensional convolution to the network.

Depthwise separable convolutions break the kernel into kernels for each dimension of the input and perform a depthwise conv over the input with each kernel.

Compiles to Axon.Layers.separable_conv3d/8.

Options

  • :name - layer name.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :activation - element-wise activation function.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride during convolution. Defaults to 1.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :input_dilation - dilation to apply to input. Defaults to 1.

  • :kernel_dilation - dilation to apply to kernel. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.

Layers: Dropout

alpha_dropout(x, opts \\ [])

Adds an Alpha dropout layer to the network.

See Axon.Layers.alpha_dropout/2 for more details.

Options

  • :name - layer name.

  • :rate - dropout rate. Defaults to 0.5.

Adds a Dropout layer to the network.

See Axon.Layers.dropout/2 for more details.

Options

  • :name - layer name.

  • :rate - dropout rate. Defaults to 0.5.

feature_alpha_dropout(x, opts \\ [])

Adds a Feature alpha dropout layer to the network.

See Axon.Layers.feature_alpha_dropout/2 for more details.

Options

  • :name - layer name.

  • :rate - dropout rate. Defaults to 0.5.

spatial_dropout(x, opts \\ [])

Adds a Spatial dropout layer to the network.

See Axon.Layers.spatial_dropout/2 for more details.

Options

  • :name - layer name.

  • :rate - dropout rate. Defaults to 0.5.

Layers: Pooling

adaptive_avg_pool(x, opts \\ [])

Adds an Adaptive average pool layer to the network.

See Axon.Layers.adaptive_avg_pool/2 for more details.

Options

  • :name - layer name.

  • :output_size - layer output size.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.

adaptive_lp_pool(x, opts \\ [])

Adds an Adaptive power average pool layer to the network.

See Axon.Layers.adaptive_lp_pool/2 for more details.

Options

  • :name - layer name.

  • :output_size - layer output size.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.

adaptive_max_pool(x, opts \\ [])

Adds an Adaptive max pool layer to the network.

See Axon.Layers.adaptive_max_pool/2 for more details.

Options

  • :name - layer name.

  • :output_size - layer output size.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.

Adds an Average pool layer to the network.

See Axon.Layers.avg_pool/2 for more details.

Options

  • :name - layer name.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride during convolution. Defaults to size of kernel.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :dilations - window dilations. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.

global_avg_pool(x, opts \\ [])

Adds a Global average pool layer to the network.

See Axon.Layers.global_avg_pool/2 for more details.

Typically used to connect feature extractors such as those in convolutional neural networks to fully-connected models by reducing inputs along spatial dimensions to only feature and batch dimensions.
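For example, bridging a convolutional feature extractor to a dense classifier (the shapes here are illustrative):

```elixir
model =
  Axon.input({nil, 3, 32, 32}, "images")
  |> Axon.conv(16, kernel_size: 3, activation: :relu)
  |> Axon.global_avg_pool()
  |> Axon.dense(10, activation: :softmax)
```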

Options

  • :name - layer name.

  • :keep_axes - option to keep reduced axes. If true, keeps reduced axes with a dimension size of 1.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.

global_lp_pool(x, opts \\ [])

Adds a Global LP pool layer to the network.

See Axon.Layers.global_lp_pool/2 for more details.

Typically used to connect feature extractors such as those in convolutional neural networks to fully-connected models by reducing inputs along spatial dimensions to only feature and batch dimensions.

Options

  • :name - layer name.

  • :keep_axes - option to keep reduced axes. If true, keeps reduced axes with a dimension size of 1.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.

global_max_pool(x, opts \\ [])

Adds a Global max pool layer to the network.

See Axon.Layers.global_max_pool/2 for more details.

Typically used to connect feature extractors such as those in convolutional neural networks to fully-connected models by reducing inputs along spatial dimensions to only feature and batch dimensions.

Options

  • :name - layer name.

  • :keep_axes - option to keep reduced axes. If true, keeps reduced axes with a dimension size of 1.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.

Adds a Power average pool layer to the network.

See Axon.Layers.lp_pool/2 for more details.

Options

  • :name - layer name.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride of the pooling window. Defaults to the kernel size.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :dilations - window dilations. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.

Adds a Max pool layer to the network.

See Axon.Layers.max_pool/2 for more details.

Options

  • :name - layer name.

  • :kernel_size - size of the kernel spatial dimensions. Defaults to 1.

  • :strides - stride of the pooling window. Defaults to the kernel size.

  • :padding - padding to the spatial dimensions of the input. Defaults to :valid.

  • :dilations - window dilations. Defaults to 1.

  • :channels - channels location. One of :first or :last. Defaults to :first.
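A sketch of a typical use (layer sizes are illustrative): 2x2 max pooling with stride 2 halves each spatial dimension:

```elixir
model =
  Axon.input({nil, 1, 28, 28}, "input")
  |> Axon.conv(8, kernel_size: 3, activation: :relu)
  # {nil, 8, 26, 26} -> pooled down along both spatial dimensions
  |> Axon.max_pool(kernel_size: 2, strides: 2)
```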

Link to this section Layers: Normalization

Link to this function

batch_norm(x, opts \\ [])

View Source

Adds a Batch normalization layer to the network.

See Axon.Layers.batch_norm/6 for more details.

Options

  • :name - layer name.

  • :gamma_initializer - gamma parameter initializer. Defaults to :glorot_uniform.

  • :beta_initializer - beta parameter initializer. Defaults to :zeros.

  • :channel_index - input feature index used for calculating mean and variance. Defaults to 1.

  • :epsilon - numerical stability term.

Link to this function

group_norm(x, group_size, opts \\ [])

View Source

Adds a group normalization layer to the network.

See Axon.Layers.group_norm/4 for more details.

Options

  • :name - layer name.

  • :gamma_initializer - gamma parameter initializer. Defaults to :glorot_uniform.

  • :beta_initializer - beta parameter initializer. Defaults to :zeros.

  • :channel_index - input feature index used for calculating mean and variance. Defaults to 1.

  • :epsilon - numerical stability term.

Link to this function

instance_norm(x, opts \\ [])

View Source

Adds an Instance normalization layer to the network.

See Axon.Layers.instance_norm/6 for more details.

Options

  • :name - layer name.

  • :gamma_initializer - gamma parameter initializer. Defaults to :glorot_uniform.

  • :beta_initializer - beta parameter initializer. Defaults to :zeros.

  • :channel_index - input feature index used for calculating mean and variance. Defaults to 1.

  • :epsilon - numerical stability term.

Link to this function

layer_norm(x, opts \\ [])

View Source

Adds a Layer normalization layer to the network.

See Axon.Layers.layer_norm/4 for more details.

Options

  • :name - layer name.

  • :gamma_initializer - gamma parameter initializer. Defaults to :glorot_uniform.

  • :beta_initializer - beta parameter initializer. Defaults to :zeros.

  • :channel_index - input feature index used for calculating mean and variance. Defaults to 1.

  • :epsilon - numerical stability term.

Link to this section Layers: Recurrent

See conv_lstm/3.

Link to this function

conv_lstm(x, units, opts)

View Source

Adds a convolutional long short-term memory (LSTM) layer to the network with a random initial hidden state.

See conv_lstm/4 for more details.

Additional options

  • :recurrent_initializer - initializer for hidden state. Defaults to :glorot_uniform.
Link to this function

conv_lstm(x, hidden_state, units, opts)

View Source

Adds a convolutional long short-term memory (LSTM) layer to the network with the given initial hidden state.

ConvLSTMs apply Axon.Recurrent.conv_lstm_cell/5 over an entire input sequence and return:

{{new_cell, new_hidden}, output_sequence}

You can use the output state as the hidden state of another ConvLSTM layer.

Options

  • :name - layer name.

  • :padding - convolutional padding. Defaults to :same.

  • :kernel_size - convolutional kernel size. Defaults to 1.

  • :strides - convolutional strides. Defaults to 1.

  • :unroll - :dynamic (loop preserving) or :static (compiled) unrolling of RNN.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

See gru/3.

Adds a gated recurrent unit (GRU) layer to the network with a random initial hidden state.

See gru/4 for more details.

Additional options

  • :recurrent_initializer - initializer for hidden state. Defaults to :glorot_uniform.
Link to this function

gru(x, hidden_state, units, opts)

View Source

Adds a gated recurrent unit (GRU) layer to the network with the given initial hidden state.

GRUs apply Axon.Recurrent.gru_cell/7 over an entire input sequence and return:

{{new_hidden}, output_sequence}

You can use the output state as the hidden state of another GRU layer.

Options

  • :name - layer name.

  • :activation - recurrent activation. Defaults to :tanh.

  • :gate - recurrent gate function. Defaults to :sigmoid.

  • :unroll - :dynamic (loop preserving) or :static (compiled) unrolling of RNN.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.

See lstm/3.

Adds a long short-term memory (LSTM) layer to the network with a random initial hidden state.

See lstm/4 for more details.

Additional options

  • :recurrent_initializer - initializer for hidden state. Defaults to :glorot_uniform.
Link to this function

lstm(x, hidden_state, units, opts \\ [])

View Source

Adds a long short-term memory (LSTM) layer to the network with the given initial hidden state.

LSTMs apply Axon.Recurrent.lstm_cell/7 over an entire input sequence and return:

{{new_cell, new_hidden}, output_sequence}

You can use the output state as the hidden state of another LSTM layer.

Options

  • :name - layer name.

  • :activation - recurrent activation. Defaults to :tanh.

  • :gate - recurrent gate function. Defaults to :sigmoid.

  • :unroll - :dynamic (loop preserving) or :static (compiled) unrolling of RNN.

  • :kernel_initializer - initializer for kernel weights. Defaults to :glorot_uniform.

  • :bias_initializer - initializer for bias weights. Defaults to :zeros.

  • :use_bias - whether the layer should add bias to the output. Defaults to true.
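The state-threading pattern described above can be sketched as follows; it assumes the state tuple returned by one LSTM is accepted directly as the hidden_state argument of lstm/4:

```elixir
input = Axon.input({nil, 32, 10}, "sequence")

# First LSTM starts from a random initial hidden state
{state, out} = Axon.lstm(input, 64)

# Second LSTM is seeded with the first layer's final state
{_state, out} = Axon.lstm(out, state, 64)
```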

Link to this section Layers: Shape

Adds a flatten layer to the network.

This layer flattens all but the batch dimension of the input into a single dimension. Typically used to flatten the output of a convolution for use with a dense layer.

Options

  • :name - layer name.

  • :ignore_batch? - whether to ignore the batch dimension in the flatten operation. Defaults to true.

Link to this function

pad(x, config, value \\ 0.0, opts \\ [])

View Source

Adds a pad layer to the network.

This layer will pad the spatial dimensions of the input. Padding configuration is a list of tuples for each spatial dimension.

Options

  • :name - layer name.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.
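For example, with two spatial dimensions the configuration takes one {low, high} tuple per dimension (shapes here are illustrative):

```elixir
model =
  Axon.input({nil, 3, 28, 28}, "input")
  # Pads the first spatial dimension with 1 before and 2 after,
  # and leaves the second spatial dimension untouched
  |> Axon.pad([{1, 2}, {0, 0}])
```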

Link to this function

reshape(x, new_shape, opts \\ [])

View Source

Adds a reshape layer to the network.

This layer implements a special case of Nx.reshape which accounts for possible batch dimensions in the input tensor. If the input contains batch dimensions, the reshape operation is performed on all non-batch dimensions of the input - preserving the original batch size.

If the input is an Axon constant, the reshape behavior matches that of Nx.reshape.

Options

  • :name - layer name.

  • :ignore_batch? - whether to ignore the batch dimension in the reshape operation. Defaults to true.
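For example, a flat input can be reshaped while the nil batch dimension is preserved:

```elixir
model =
  Axon.input({nil, 784}, "input")
  # Only non-batch dimensions are reshaped; the result shape is {nil, 28, 28}
  |> Axon.reshape({28, 28})
```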

Link to this function

resize(x, resize_shape, opts \\ [])

View Source

Adds a resize layer to the network.

Resizing can be used for interpolation or upsampling input values in a neural network. For example, you can use this layer as an upsampling layer within a GAN.

Resize shape must be a tuple representing the resized spatial dimensions of the input tensor.

Compiles to Axon.Layers.resize/2.

Options

  • :name - layer name.

  • :method - resize method. Defaults to :nearest.

  • :channels - channel configuration. One of :first or :last. Defaults to :first.

Link to this function

transpose(x, permutation, opts \\ [])

View Source

Adds a transpose layer to the network.

Options

  • :name - layer name.

  • :ignore_batch? - whether to ignore batch dimension in transpose operation. Defaults to true.

Link to this section Model: Execution

Link to this function

init(model, params \\ %{}, opts \\ [])

View Source

Compiles and runs the given model's initialization function with the given compiler options.

You may optionally specify initial parameters for some layers or namespaces by passing a partial parameter map:

Axon.init(model, %{"dense_0" => dense_params})

The parameter map will be merged with the initialized model parameters.

Link to this function

predict(model, params, input, opts \\ [])

View Source

Compiles and runs the given Axon model with params on input with the given compiler options.

Link to this section Functions

Adds an add layer to the network.

This layer performs an element-wise add operation on input layers. All input layers must be capable of being broadcast together.

If one shape has a static batch size, all other shapes must have a static batch size as well.

Options

  • :name - layer name.
Link to this function

attach_hook(axon, fun, opts \\ [])

View Source

Attaches a hook to the given Axon model.

Hooks compile down to Nx.Defn.Kernel.hook/3 and provide the same functionality for adding side-effecting operations to a compiled model. For example, you can use hooks to inspect intermediate activations, send data to an external service, and more.

Hooks can be configured to be invoked on the following events:

  • :initialize - on model initialization.
  • :pre_forward - before layer forward pass is invoked.
  • :forward - after layer forward pass is invoked.
  • :backward - after layer backward pass is invoked.

To invoke a hook on every event, pass :all to the :on option:

Axon.input({nil, 1}, "input") |> Axon.attach_hook(&IO.inspect/1, on: :all)

The default event is :forward, assuming you want a hook invoked on the layer's forward pass.

You may configure hooks to run only in training mode or only in inference mode using the :mode option. The default mode is :both, invoking the hook during both training and inference.

Axon.input({nil, 1}, "input") |> Axon.attach_hook(&IO.inspect/1, on: :forward, mode: :train)

You can also attach multiple hooks to a single layer. Hooks are invoked in the order in which they are declared. If order is important, you should attach hooks in the order you want them to be executed:

Axon.input({nil, 1}, "input")
# I will be executed first
|> Axon.attach_hook(&IO.inspect/1)
# I will be executed second
|> Axon.attach_hook(fn _ -> IO.write("HERE") end)

Hooks are executed at their point of attachment. You must insert hooks at each point you want a hook to execute during model execution.

Axon.input({nil, 1}, "input")
|> Axon.attach_hook(&IO.inspect/1)
|> Axon.relu()
|> Axon.attach_hook(&IO.inspect/1)
Link to this function

compile(model, opts \\ [])

View Source

Compiles the given model to {init_fn, predict_fn}.

Once compiled, the resulting functions can be passed as arguments to Nx.Defn functions.
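A minimal sketch of the compile workflow; the exact arities of init_fn and predict_fn here are an assumption based on init/3 and predict/4 above:

```elixir
model = Axon.input({nil, 784}, "input") |> Axon.dense(10)

# Assumed arities: init_fn takes no arguments,
# predict_fn takes params and input
{init_fn, predict_fn} = Axon.compile(model)

params = init_fn.()
predict_fn.(params, Nx.iota({1, 784}, type: {:f, 32}))
```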

Adds a concatenate layer to the network.

This layer will concatenate inputs along the last dimension unless specified otherwise.

Options

  • :name - layer name.

  • :axis - concatenate axis. Defaults to -1.
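For example, two {nil, 32} inputs concatenate along the default axis to produce a {nil, 64} output:

```elixir
inp1 = Axon.input({nil, 32}, "input_0")
inp2 = Axon.input({nil, 32}, "input_1")

# Concatenates along the last axis (-1), producing shape {nil, 64}
model = Axon.concatenate(inp1, inp2)
```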

Link to this function

cond(parent, cond_fn, true_graph, false_graph, opts \\ [])

View Source

Adds a conditional layer which conditionally executes true_graph or false_graph based on the condition cond_fn at runtime.

cond_fn is an arity-1 function executed on the output of the parent graph. It must return a boolean scalar tensor (e.g. 1 or 0).

The shapes of true_graph and false_graph must be equal.
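A hedged sketch of cond usage; the Nx.all/Nx.greater predicate here stands in for any function producing a scalar boolean tensor, and both branches share the input's shape:

```elixir
input = Axon.input({nil, 8}, "input")

model =
  Axon.cond(
    input,
    # Scalar predicate: 1 when every element is positive, 0 otherwise
    &Nx.all(Nx.greater(&1, 0)),
    Axon.relu(input),
    Axon.tanh(input)
  )
```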

Link to this function

deep_reduce(map, acc, fun)

View Source
Link to this function

deserialize(serialized, opts \\ [])

View Source

Deserializes serialized model and parameters into a {model, params} tuple.

It is the opposite of Axon.serialize/3.

Examples

iex> model = Axon.input({nil, 2}, "input") |> Axon.dense(1, kernel_initializer: :zeros, activation: :relu)
iex> params = Axon.init(model)
iex> serialized = Axon.serialize(model, params)
iex> {saved_model, saved_params} = Axon.deserialize(serialized)
iex> Axon.predict(saved_model, saved_params, Nx.tensor([[1.0, 1.0]]))
#Nx.Tensor<
  f32[1][1]
  [
    [0.0]
  ]
>
Link to this function

freeze(model, fun \\ & &1)

View Source

Freezes parameters returned from fun in the given model. fun takes the model's parameter list and returns the list of parameters it wishes to freeze. fun defaults to the identity function, freezing all of the parameters in model.

Freezing parameters is useful when performing transfer learning to leverage features learned from another problem in a new problem. For example, it's common to combine the convolutional base from larger models trained on ImageNet with fresh fully-connected classifiers. The combined model is then trained on fresh data, with the convolutional base frozen so as not to lose information. You can see this example in code here:

cnn_base = get_pretrained_cnn_base()
model =
  cnn_base
  |> Axon.freeze()
  |> Axon.flatten()
  |> Axon.dense(1024, activation: :relu)
  |> Axon.dropout()
  |> Axon.dense(1000, activation: :softmax)

model
|> Axon.Loop.trainer(:categorical_cross_entropy, Axon.Optimizers.adam(0.005))
|> Axon.Loop.run(data, epochs: 10)

When compiled, frozen parameters are wrapped in Nx.Defn.Kernel.stop_grad/1, which zeros out the gradient with respect to the frozen parameter. Gradients of frozen parameters will return 0.0, meaning they won't be changed during the update process.

Link to this function

get_model_signature(axon)

View Source

Returns the model's signature as a tuple of {input_shape, output_shape}.

Examples

iex> model = Axon.input({nil, 32}, "input") |> Axon.dense(10)
iex> {inp, out} = Axon.get_model_signature(model)
iex> inp
{nil, 32}
iex> out
{nil, 10}

iex> inp1 = Axon.input({nil, 32}, "input_0")
iex> inp2 = Axon.input({nil, 32}, "input_1")
iex> model = Axon.concatenate(inp1, inp2)
iex> {{inp1_shape, inp2_shape}, out} = Axon.get_model_signature(model)
iex> inp1_shape
{nil, 32}
iex> inp2_shape
{nil, 32}
iex> out
{nil, 64}

Adds a multiply layer to the network.

This layer performs an element-wise multiply operation on input layers. All input layers must be capable of being broadcast together.

If one shape has a static batch size, all other shapes must have a static batch size as well.

Options

  • :name - layer name.

Wraps an Axon model into a namespace.

A namespace is a part of an Axon model which is meant to be a self-contained collection of Axon layers. Namespaces are guaranteed to always generate with the same internal layer names and can be re-used universally across models.

Namespaces are most useful for containing large collections of layers and offering a straightforward means for accessing the parameters of individual model components. A common application of namespaces is to use them with a pre-trained model for fine-tuning:

{base, resnet_params} = resnet()
base = base |> Axon.namespace("resnet")

model = base |> Axon.dense(1)
Axon.init(model, %{"resnet" => resnet_params})

Notice you can use Axon.init in conjunction with namespaces to specify which portion of a model you'd like to initialize from a fixed starting point.

Namespaces have fixed names, which means it's easy to run into namespace collisions. Re-using namespaces, re-using inner parts of a namespace, and attempting to share layers between namespaces are still sharp edges in namespace usage.

Link to this function

param(name, shape, opts \\ [])

View Source

Trainable Axon parameter used to create custom layers.

Parameters are specified in usages of Axon.layer and will be automatically initialized and used in subsequent applications of Axon models.

Parameters must be specified in order of their usage.

Options

  • :initializer - parameter initializer. Defaults to :glorot_uniform.
Link to this function

serialize(model, params, opts \\ [])

View Source

Serializes a model and its parameters for persisting models to disk or elsewhere.

Model and parameters are serialized as a tuple, where the model is converted to a recursive map to ensure compatibility with future Axon versions and the parameters are serialized using Nx.serialize/2. There is some additional metadata included such as current serialization version for compatibility.

Serialization opts are forwarded to Nx.serialize/2 and :erlang.term_to_binary/2 for controlling compression options.

Examples

iex> model = Axon.input({nil, 2}, "input") |> Axon.dense(1, kernel_initializer: :zeros, activation: :relu)
iex> params = Axon.init(model)
iex> serialized = Axon.serialize(model, params)
iex> {saved_model, saved_params} = Axon.deserialize(serialized)
iex> Axon.predict(saved_model, saved_params, Nx.tensor([[1.0, 1.0]]))
#Nx.Tensor<
  f32[1][1]
  [
    [0.0]
  ]
>
Link to this function

split(parent, splits, opts \\ [])

View Source

Splits the input graph into a container of n graphs along the given axis.

Options

  • :name - layer name.

  • :axis - split axis. Defaults to -1.
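A sketch of split usage, assuming the returned container for two splits can be destructured as a tuple:

```elixir
input = Axon.input({nil, 10}, "input")

# Two {nil, 5} graphs, split along the default axis (-1)
{left, right} = Axon.split(input, 2)

model = Axon.concatenate(Axon.relu(left), Axon.tanh(right))
```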

Adds a subtract layer to the network.

This layer performs an element-wise subtract operation on input layers. All input layers must be capable of being broadcast together.

If one shape has a static batch size, all other shapes must have a static batch size as well.

Options

  • :name - layer name.

Traverses a model tree applying fun to each layer.

Link to this function

tree_reduce(axon, acc, fun)

View Source

Traverses a model applying fun with an accumulator.
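As an illustration (the reducer's argument order is an assumption), tree_reduce can count a model's layers:

```elixir
model = Axon.input({nil, 784}, "input") |> Axon.dense(128) |> Axon.relu()

# Assumes fun receives each layer and the accumulator, in that order
layer_count = Axon.tree_reduce(model, 0, fn _layer, acc -> acc + 1 end)
```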