Axon.Activations (Axon v0.7.0)
Activation functions.
Activation functions are element-wise, (typically) non-linear functions called on the output of another layer, such as a dense layer:
x
|> dense(weight, bias)
|> relu()
Activation functions output the "activation" or how active a given layer's neurons are in learning a representation of the data-generating distribution.
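For instance, the pattern above can be written directly with Nx and the functions in this module. A minimal sketch (the weight and bias values are purely illustrative):
x = Nx.tensor([[1.0, 2.0]])
weight = Nx.tensor([[0.5, -0.5], [0.3, 0.8]])
bias = Nx.tensor([0.1, -0.1])
x
|> Nx.dot(weight)           # dense: x · weight
|> Nx.add(bias)             # ... + bias
|> Axon.Activations.relu()  # element-wise activation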
Some activations are commonly used as output activations. For example, softmax is often used as the output in multiclass classification problems because it returns a categorical probability distribution:
iex> Axon.Activations.softmax(Nx.tensor([[1, 2, 3]], type: {:f, 32}))
#Nx.Tensor<
f32[1][3]
[
[0.09003057330846786, 0.2447284758090973, 0.6652409434318542]
]
>
Other activations, such as tanh or sigmoid, are used because they have desirable properties, such as keeping the output tensor constrained within a certain range.
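For example, sigmoid squashes any real input into the open interval (0, 1):
Axon.Activations.sigmoid(Nx.tensor([-100.0, 0.0, 100.0]))
# => approximately [0.0, 0.5, 1.0]; every output lies strictly between 0 and 1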
Generally, the choice of activation function is somewhat arbitrary, although some activations work better than others in certain problem domains. For example, ReLU (rectified linear unit) activation is a widely accepted default. You can see a list of activation functions and implementations here.
All of the functions in this module are implemented as numerical functions and can be JIT or AOT compiled with any supported Nx compiler.
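For example, an activation can be wrapped with Nx.Defn.jit/2 and executed by whichever compiler is configured. A minimal sketch (EXLA is mentioned only as an illustrative optional dependency):
jit_relu = Nx.Defn.jit(&Axon.Activations.relu/1)
# or, with the :exla dependency: Nx.Defn.jit(&Axon.Activations.relu/1, compiler: EXLA)
jit_relu.(Nx.tensor([-1.0, 0.0, 2.0]))
# => a tensor with values [0.0, 0.0, 2.0]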
Summary
Functions
celu - Continuously-differentiable exponential linear unit activation.
elu - Exponential linear unit activation.
exp - Exponential activation.
gelu - Gaussian error linear unit activation.
hard_sigmoid - Hard sigmoid activation.
hard_silu - Hard sigmoid weighted linear unit activation.
hard_tanh - Hard hyperbolic tangent activation.
leaky_relu - Leaky rectified linear unit activation.
linear - Linear activation.
log_sigmoid - Log-sigmoid activation.
log_softmax - Log-softmax activation.
log_sumexp - Logsumexp activation.
mish - Mish activation.
relu6 - Rectified linear unit 6 activation.
relu - Rectified linear unit activation.
selu - Scaled exponential linear unit activation.
sigmoid - Sigmoid activation.
silu - Sigmoid weighted linear unit activation.
softmax - Softmax activation along an axis.
softplus - Softplus activation.
softsign - Softsign activation.
tanh - Hyperbolic tangent activation.
Functions
celu
Continuously-differentiable exponential linear unit activation.
$$ f(x_i) = \max(0, x_i) + \min(0, \alpha * (e^{\frac{x_i}{\alpha}} - 1)) $$
Options
:alpha - $\alpha$ in CELU formulation. Must be non-zero. Defaults to 1.0.
Examples
iex> Axon.Activations.celu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.celu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
bf16[2][3]
[
[-0.62890625, -0.86328125, -0.94921875],
[1.0, 2.0, 3.0]
]
>
Error cases
iex> Axon.Activations.celu(Nx.tensor([0.0, 1.0, 2.0], type: {:f, 32}), alpha: 0.0)
** (ArgumentError) :alpha must be non-zero in CELU activation
elu
Exponential linear unit activation.
Equivalent to celu for $\alpha = 1$.
$$ f(x_i) = \begin{cases}x_i & x_i > 0 \newline \alpha * (e^{x_i} - 1) & x_i \leq 0 \end{cases} $$
Options
:alpha - $\alpha$ in ELU formulation. Defaults to 1.0.
Examples
iex> Axon.Activations.elu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.elu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
bf16[2][3]
[
[-0.62890625, -0.86328125, -0.94921875],
[1.0, 2.0, 3.0]
]
>
exp
Exponential activation.
$$ f(x_i) = e^{x_i} $$
Examples
iex> Axon.Activations.exp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.049787066876888275, 0.1353352814912796, 0.3678794503211975, 1.0, 2.7182817459106445, 7.389056205749512, 20.08553695678711]
>
iex> Axon.Activations.exp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.3671875, 0.134765625, 0.049560546875],
[2.703125, 7.375, 20.0]
]
>
gelu
Gaussian error linear unit activation.
$$ f(x_i) = \frac{x_i}{2}\left(1 + \mathrm{erf}\left(\frac{x_i}{\sqrt{2}}\right)\right) $$
Examples
iex> Axon.Activations.gelu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.0040496885776519775, -0.04550027847290039, -0.15865525603294373, 0.0, 0.8413447141647339, 1.9544997215270996, 2.995950222015381]
>
iex> Axon.Activations.gelu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.16015625, -0.046875, -0.005859375],
[0.83984375, 1.953125, 2.984375]
]
>
hard_sigmoid
Hard sigmoid activation.
Examples
iex> Axon.Activations.hard_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.0, 0.0, 0.0, 0.20000000298023224, 0.4000000059604645, 0.6000000238418579, 0.800000011920929]
>
iex> Axon.Activations.hard_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[7.781982421875e-4, 0.0, 0.0],
[0.3984375, 0.59765625, 0.796875]
]
>
hard_silu
Hard sigmoid weighted linear unit activation.
$$ f(x_i) = x_i * \mathrm{hard\_sigmoid}(x_i) $$
Examples
iex> Axon.Activations.hard_silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.0, -0.0, -0.0, 0.0, 0.4000000059604645, 1.2000000476837158, 2.4000000953674316]
>
iex> Axon.Activations.hard_silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-7.781982421875e-4, -0.0, -0.0],
[0.3984375, 1.1953125, 2.390625]
]
>
hard_tanh
Hard hyperbolic tangent activation.
$$ f(x_i) = \begin{cases} 1 & x_i > 1 \newline -1 & x_i < -1 \newline x_i & otherwise \end{cases} $$
Examples
iex> Axon.Activations.hard_tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-1.0, -1.0, -1.0, 0.0, 1.0, 1.0, 1.0]
>
iex> Axon.Activations.hard_tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.0, -1.0, -1.0],
[1.0, 1.0, 1.0]
]
>
leaky_relu
Leaky rectified linear unit activation.
$$ f(x_i) = \begin{cases} x_i & x_i \geq 0 \newline \alpha * x_i & otherwise \end{cases} $$
Options
:alpha - $\alpha$ in Leaky ReLU formulation. Defaults to 1.0e-2.
Examples
iex> Axon.Activations.leaky_relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]), alpha: 0.5)
#Nx.Tensor<
f32[data: 7]
[-1.5, -1.0, -0.5, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.leaky_relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], names: [:batch, :data]), alpha: 0.5)
#Nx.Tensor<
f32[batch: 2][data: 3]
[
[-0.5, -1.0, -1.5],
[1.0, 2.0, 3.0]
]
>
linear
Linear activation.
$$ f(x_i) = x_i $$
Examples
iex> Axon.Activations.linear(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.linear(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.0, -2.0, -3.0],
[1.0, 2.0, 3.0]
]
>
log_sigmoid
Log-sigmoid activation.
$$ f(x_i) = \log(\mathrm{sigmoid}(x_i)) $$
Examples
iex> Axon.Activations.log_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-3.0485873222351074, -2.1269280910491943, -1.3132617473602295, -0.6931471824645996, -0.3132616877555847, -0.12692801654338837, -0.04858734831213951]
>
iex> Axon.Activations.log_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.3125, -2.125, -3.046875],
[-0.3125, -0.1259765625, -0.04833984375]
]
>
log_softmax
Log-softmax activation.
$$ f(x_i) = x_i - \log\left(\sum_j e^{x_j}\right) $$
Examples
iex> Axon.Activations.log_softmax(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-6.457762718200684, -5.457762718200684, -4.457762718200684, -3.4577627182006836, -2.4577627182006836, -1.4577628374099731, -0.45776283740997314]
>
iex> Axon.Activations.log_softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.404296875, -1.3984375, -2.390625],
[-2.390625, -1.3984375, -0.404296875]
]
>
log_sumexp
Logsumexp activation.
$$ \log\left(\sum_i e^{x_i}\right) $$
Examples
iex> Axon.Activations.log_sumexp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 1]
[3.4577627182006836]
>
iex> Axon.Activations.log_sumexp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 1]
[
[-0.59375],
[3.390625]
]
>
mish
Mish activation.
$$ f(x_i) = x_i * \tanh(\log(1 + e^{x_i})) $$
Examples
iex> Axon.Activations.mish(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.14564745128154755, -0.2525014877319336, -0.30340147018432617, 0.0, 0.8650984168052673, 1.9439589977264404, 2.98653507232666]
>
iex> Axon.Activations.mish(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.30078125, -0.25, -0.1435546875],
[0.86328125, 1.9375, 2.96875]
]
>
relu6
Rectified linear unit 6 activation.
$$ f(x_i) = \min(\max(x_i, 0), 6) $$
Examples
iex> Axon.Activations.relu6(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.relu6(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.0, 0.0, 0.0],
[1.0, 2.0, 3.0]
]
>
relu
Rectified linear unit activation.
$$ f(x_i) = \max(x_i, 0) $$
Examples
iex> Axon.Activations.relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.0, 0.0, 0.0],
[1.0, 2.0, 3.0]
]
>
selu
Scaled exponential linear unit activation.
$$ f(x_i) = \begin{cases} \lambda x_i & x_i \geq 0 \newline \lambda \alpha (e^{x_i} - 1) & x_i < 0 \end{cases} $$
where $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$.
Examples
iex> Axon.Activations.selu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-1.670568823814392, -1.5201665163040161, -1.1113307476043701, 0.0, 1.0507010221481323, 2.1014020442962646, 3.1521029472351074]
>
iex> Axon.Activations.selu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.09375, -1.5078125, -1.6640625],
[1.046875, 2.09375, 3.140625]
]
>
sigmoid
Sigmoid activation.
$$ f(x_i) = \frac{1}{1 + e^{-x_i}} $$
Implementation Note: Sigmoid logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.
Examples
iex> Axon.Activations.sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.04742587357759476, 0.11920291930437088, 0.2689414322376251, 0.5, 0.7310585975646973, 0.8807970881462097, 0.9525741338729858]
>
iex> Axon.Activations.sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.267578125, 0.119140625, 0.04736328125],
[0.73046875, 0.87890625, 0.94921875]
]
>
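As the implementation note above mentions, the cached logits can be reused by downstream cross-entropy calculations when both calls live in the same defn. A minimal sketch (the module name MyLoss is purely illustrative, and Axon.Losses.binary_cross_entropy/2 is used with its defaults):
defmodule MyLoss do
  import Nx.Defn

  defn loss(y_true, x) do
    # sigmoid caches its logits in the expression, so the cross-entropy
    # below can be computed against the logits for better stability
    y_pred = Axon.Activations.sigmoid(x)
    Axon.Losses.binary_cross_entropy(y_true, y_pred)
  end
end

MyLoss.loss(Nx.tensor([0.0, 1.0, 1.0]), Nx.tensor([-2.0, 0.0, 2.0]))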
silu
Sigmoid weighted linear unit activation.
$$ f(x_i) = x_i * \mathrm{sigmoid}(x_i) $$
Examples
iex> Axon.Activations.silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.14227762818336487, -0.23840583860874176, -0.2689414322376251, 0.0, 0.7310585975646973, 1.7615941762924194, 2.857722282409668]
>
iex> Axon.Activations.silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.267578125, -0.23828125, -0.1416015625],
[0.73046875, 1.7578125, 2.84375]
]
>
softmax
Softmax activation along an axis.
$$ f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $$
Implementation Note: Softmax logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.
Options
:axis - softmax axis along which to calculate distribution. Defaults to 1. A non-default axis is sketched after the examples below.
Examples
iex> Axon.Activations.softmax(Nx.tensor([[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]], names: [:batch, :data]))
#Nx.Tensor<
f32[batch: 1][data: 7]
[
[0.0015683004166930914, 0.004263082519173622, 0.011588259600102901, 0.03150015324354172, 0.08562629669904709, 0.23275642096996307, 0.6326975226402283]
]
>
iex> Axon.Activations.softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.6640625, 0.2431640625, 0.08935546875],
[0.08935546875, 0.2431640625, 0.6640625]
]
>
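The :axis option controls which axis is normalized. A minimal sketch of a non-default axis (the values in the comment are approximate, not doctest output):
t = Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]])
Axon.Activations.softmax(t, axis: 0)
# each column now sums to 1; the first column is approximately [0.1192, 0.8808]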
softplus
Softplus activation.
$$ \log(1 + e^{x_i}) $$
Examples
iex> Axon.Activations.softplus(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.04858734831213951, 0.12692801654338837, 0.3132616877555847, 0.6931471824645996, 1.3132617473602295, 2.1269280910491943, 3.0485873222351074]
>
iex> Axon.Activations.softplus(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.3125, 0.1259765625, 0.04833984375],
[1.3125, 2.125, 3.046875]
]
>
softsign
Softsign activation.
$$ f(x_i) = \frac{x_i}{|x_i| + 1} $$
Examples
iex> Axon.Activations.softsign(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.75, -0.6666666865348816, -0.5, 0.0, 0.5, 0.6666666865348816, 0.75]
>
iex> Axon.Activations.softsign(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.5, -0.6640625, -0.75],
[0.5, 0.6640625, 0.75]
]
>
tanh
Hyperbolic tangent activation.
$$ f(x_i) = \tanh(x_i) $$
Examples
iex> Axon.Activations.tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.9950547814369202, -0.9640275835990906, -0.7615941762924194, 0.0, 0.7615941762924194, 0.9640275835990906, 0.9950547814369202]
>
iex> Axon.Activations.tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.7578125, -0.9609375, -0.9921875],
[0.7578125, 0.9609375, 0.9921875]
]
>