Axon.Activations (Axon v0.6.1)

Activation functions.

Activation functions are element-wise, (typically) non-linear functions called on the output of another layer, such as a dense layer:

x
|> dense(weight, bias)
|> relu()
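
For instance, the same idea written out with plain Nx calls (the weight and bias values here are made up purely for illustration):

x = Nx.tensor([[1.0, 2.0]])
weight = Nx.tensor([[0.5], [-0.5]])
bias = Nx.tensor([0.1])

x
|> Nx.dot(weight)           # dense: x * weight
|> Nx.add(bias)             # ... + bias
|> Axon.Activations.relu()
# result: [[0.0]], since the pre-activation value -0.4 is clipped to 0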

Activation functions output the "activation", a measure of how active a given layer's neurons are in learning a representation of the data-generating distribution.

Some activations are commonly used as output activations. For example, softmax is often used as the output in multiclass classification problems because it returns a categorical probability distribution:

iex> Axon.Activations.softmax(Nx.tensor([[1, 2, 3]], type: {:f, 32}))
#Nx.Tensor<
  f32[1][3]
  [
    [0.09003057330846786, 0.2447284758090973, 0.6652409434318542]
  ]
>

Other activations such as tanh or sigmoid are used because they have desirable properties, such as keeping the output tensor constrained within a certain range.
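
As a quick illustration (outputs approximate), sigmoid stays within (0, 1) and tanh within (-1, 1) even for large inputs:

Axon.Activations.sigmoid(Nx.tensor([-10.0, 10.0]))  # roughly [0.0000454, 0.9999546]
Axon.Activations.tanh(Nx.tensor([-10.0, 10.0]))     # roughly [-1.0, 1.0]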

Generally, the choice of activation function is somewhat arbitrary, although some activations work better than others in certain problem domains. For example, ReLU (rectified linear unit) is a widely accepted default. The summary below lists the activation functions implemented in this module.

All of the functions in this module are implemented as numerical functions and can be JIT or AOT compiled with any supported Nx compiler.
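
For example, a minimal sketch of JIT compiling one of these functions with Nx.Defn.jit (assuming the EXLA compiler is available in your project):

jitted_relu = Nx.Defn.jit(&Axon.Activations.relu/1, compiler: EXLA)
jitted_relu.(Nx.tensor([-1.0, 0.0, 2.0]))
# => [0.0, 0.0, 2.0]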

Summary

Functions

  • celu - Continuously-differentiable exponential linear unit activation.
  • elu - Exponential linear unit activation.
  • exp - Exponential activation.
  • gelu - Gaussian error linear unit activation.
  • hard_sigmoid - Hard sigmoid activation.
  • hard_silu - Hard sigmoid weighted linear unit activation.
  • hard_tanh - Hard hyperbolic tangent activation.
  • leaky_relu - Leaky rectified linear unit activation.
  • linear - Linear activation.
  • log_sigmoid - Log-sigmoid activation.
  • log_softmax - Log-softmax activation.
  • log_sumexp - Logsumexp activation.
  • mish - Mish activation.
  • relu6 - Rectified linear unit 6 activation.
  • relu - Rectified linear unit activation.
  • selu - Scaled exponential linear unit activation.
  • sigmoid - Sigmoid activation.
  • silu - Sigmoid weighted linear unit activation.
  • softmax - Softmax activation along an axis.
  • softplus - Softplus activation.
  • softsign - Softsign activation.
  • tanh - Hyperbolic tangent activation.

Functions

Continuously-differentiable exponential linear unit activation.

$$ f(x_i) = \max(0, x_i) + \min(0, \alpha * (e^{\frac{x_i}{\alpha}} - 1)) $$

Options

  • :alpha - $\alpha$ in CELU formulation. Must be non-zero. Defaults to 1.0

Examples

iex> Axon.Activations.celu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
  f32[7]
  [-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>

iex> Axon.Activations.celu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
  bf16[2][3]
  [
    [-0.62890625, -0.86328125, -0.94921875],
    [1.0, 2.0, 3.0]
  ]
>

Error cases

iex> Axon.Activations.celu(Nx.tensor([0.0, 1.0, 2.0], type: {:f, 32}), alpha: 0.0)
** (ArgumentError) :alpha must be non-zero in CELU activation


Exponential linear unit activation.

Equivalent to celu for $\alpha = 1$

$$ f(x_i) = \begin{cases} x_i & x_i > 0 \newline \alpha * (e^{x_i} - 1) & x_i \leq 0 \end{cases} $$

Options

  • :alpha - $\alpha$ in ELU formulation. Defaults to 1.0 (see the sketch after the examples below)

Examples

iex> Axon.Activations.elu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
  f32[7]
  [-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>

iex> Axon.Activations.elu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
  bf16[2][3]
  [
    [-0.62890625, -0.86328125, -0.94921875],
    [1.0, 2.0, 3.0]
  ]
>
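
As a brief sketch of the :alpha option mentioned above (output values approximate): a smaller alpha shrinks the negative saturation value, since the negative branch is alpha * (e^x - 1).

Axon.Activations.elu(Nx.tensor([-1.0, 1.0]), alpha: 0.5)
# roughly [-0.316, 1.0], because 0.5 * (e^-1 - 1) ≈ -0.316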


Exponential activation.

$$ f(x_i) = e^{x_i} $$

Examples

iex> Axon.Activations.exp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [0.049787066876888275, 0.1353352814912796, 0.3678794503211975, 1.0, 2.7182817459106445, 7.389056205749512, 20.08553695678711]
>

iex> Axon.Activations.exp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [0.3671875, 0.134765625, 0.049560546875],
    [2.703125, 7.375, 20.0]
  ]
>

Gaussian error linear unit activation.

$$ f(x_i) = \frac{x_i}{2}\left(1 + \mathrm{erf}\left(\frac{x_i}{\sqrt{2}}\right)\right) $$

Examples

iex> Axon.Activations.gelu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-0.0040496885776519775, -0.04550027847290039, -0.15865525603294373, 0.0, 0.8413447141647339, 1.9544997215270996, 2.995950222015381]
>

iex> Axon.Activations.gelu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-0.16015625, -0.046875, -0.005859375],
    [0.83984375, 1.953125, 2.984375]
  ]
>


hard_sigmoid(x, opts \\ [])

Hard sigmoid activation.

Examples

iex> Axon.Activations.hard_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [0.0, 0.0, 0.0, 0.20000000298023224, 0.4000000059604645, 0.6000000238418579, 0.800000011920929]
>

iex> Axon.Activations.hard_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [7.781982421875e-4, 0.0, 0.0],
    [0.3984375, 0.59765625, 0.796875]
  ]
>

hard_silu(x, opts \\ [])

Hard sigmoid weighted linear unit activation.

$$ f(x_i) = \begin{cases} 0 & x_i \leq -3 \newline x_i & x_i \geq 3 \newline \frac{x_i^2}{6} + \frac{x_i}{2} & \text{otherwise} \end{cases} $$

Examples

iex> Axon.Activations.hard_silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-0.0, -0.0, -0.0, 0.0, 0.4000000059604645, 1.2000000476837158, 2.4000000953674316]
>

iex> Axon.Activations.hard_silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-7.781982421875e-4, -0.0, -0.0],
    [0.3984375, 1.1953125, 2.390625]
  ]
>

Hard hyperbolic tangent activation.

$$ f(x_i) = \begin{cases} 1 & x_i > 1 \newline -1 & x_i < -1 \newline x_i & \text{otherwise} \end{cases} $$

Examples

iex> Axon.Activations.hard_tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-1.0, -1.0, -1.0, 0.0, 1.0, 1.0, 1.0]
>

iex> Axon.Activations.hard_tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0]
  ]
>

leaky_relu(x, opts \\ [])

Leaky rectified linear unit activation.

$$ f(x_i) = \begin{cases} x_i & x_i \geq 0 \newline \alpha * x_i & \text{otherwise} \end{cases} $$

Options

  • :alpha - $\alpha$ in Leaky ReLU formulation. Defaults to 1.0e-2

Examples

iex> Axon.Activations.leaky_relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]), alpha: 0.5)
#Nx.Tensor<
  f32[data: 7]
  [-1.5, -1.0, -0.5, 0.0, 1.0, 2.0, 3.0]
>

iex> Axon.Activations.leaky_relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], names: [:batch, :data]), alpha: 0.5)
#Nx.Tensor<
  f32[batch: 2][data: 3]
  [
    [-0.5, -1.0, -1.5],
    [1.0, 2.0, 3.0]
  ]
>

Linear activation.

$$ f(x_i) = x_i $$

Examples

iex> Axon.Activations.linear(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
>

iex> Axon.Activations.linear(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-1.0, -2.0, -3.0],
    [1.0, 2.0, 3.0]
  ]
>

Log-sigmoid activation.

$$ f(x_i) = \log(\mathrm{sigmoid}(x_i)) $$

Examples

iex> Axon.Activations.log_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-3.0485873222351074, -2.1269280910491943, -1.3132617473602295, -0.6931471824645996, -0.3132616877555847, -0.12692801654338837, -0.04858734831213951]
>

iex> Axon.Activations.log_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-1.3125, -2.125, -3.046875],
    [-0.3125, -0.1259765625, -0.04833984375]
  ]
>

log_softmax(x, opts \\ [])

Log-softmax activation.

$$ f(x_i) = x_i - \log\left(\sum_j e^{x_j}\right) $$

Examples

iex> Axon.Activations.log_softmax(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-6.457762718200684, -5.457762718200684, -4.457762718200684, -3.4577627182006836, -2.4577627182006836, -1.4577628374099731, -0.45776283740997314]
>

iex> Axon.Activations.log_softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-0.404296875, -1.3984375, -2.390625],
    [-2.390625, -1.3984375, -0.404296875]
  ]
>

log_sumexp(x, opts \\ [])

Logsumexp activation.

$$ f(x) = \log\left(\sum_i e^{x_i}\right) $$

Examples

iex> Axon.Activations.log_sumexp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 1]
  [3.4577627182006836]
>

iex> Axon.Activations.log_sumexp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 1]
  [
    [-0.59375],
    [3.390625]
  ]
>

Mish activation.

$$ f(x_i) = x_i * \tanh(\log(1 + e^{x_i})) $$

Examples

iex> Axon.Activations.mish(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-0.14564745128154755, -0.2525014877319336, -0.30340147018432617, 0.0, 0.8650984168052673, 1.9439589977264404, 2.98653507232666]
>

iex> Axon.Activations.mish(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-0.30078125, -0.25, -0.1435546875],
    [0.86328125, 1.9375, 2.96875]
  ]
>

Rectified linear unit 6 activation.

$$ f(x_i) = \min(\max(x_i, 0), 6) $$

Examples

iex> Axon.Activations.relu6(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
  f32[7]
  [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>

iex> Axon.Activations.relu6(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0]
  ]
>


Rectified linear unit activation.

$$ f(x_i) = \max(x_i, 0) $$

Examples

iex> Axon.Activations.relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>

iex> Axon.Activations.relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0]
  ]
>

Scaled exponential linear unit activation.

$$ f(x_i) = \begin{cases} \lambda x_i & x_i \geq 0 \newline \lambda \alpha (e^{x_i} - 1) & x_i < 0 \end{cases} $$

$$ \alpha \approx 1.6733 $$ $$ \lambda \approx 1.0507 $$

Examples

iex> Axon.Activations.selu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-1.670568823814392, -1.5201665163040161, -1.1113307476043701, 0.0, 1.0507010221481323, 2.1014020442962646, 3.1521029472351074]
>

iex> Axon.Activations.selu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-1.09375, -1.5078125, -1.6640625],
    [1.046875, 2.09375, 3.140625]
  ]
>


Sigmoid activation.

$$ f(x_i) = \frac{1}{1 + e^{-x_i}} $$

Implementation Note: Sigmoid logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.

Examples

iex> Axon.Activations.sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [0.04742587357759476, 0.11920291930437088, 0.2689414322376251, 0.5, 0.7310585975646973, 0.8807970881462097, 0.9525741338729858]
>

iex> Axon.Activations.sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [0.267578125, 0.119140625, 0.04736328125],
    [0.73046875, 0.87890625, 0.94921875]
  ]
>

Sigmoid weighted linear unit activation.

$$ f(x_i) = x_i * \mathrm{sigmoid}(x_i) $$

Examples

iex> Axon.Activations.silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-0.14227762818336487, -0.23840583860874176, -0.2689414322376251, 0.0, 0.7310585975646973, 1.7615941762924194, 2.857722282409668]
>

iex> Axon.Activations.silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-0.267578125, -0.23828125, -0.1416015625],
    [0.73046875, 1.7578125, 2.84375]
  ]
>


Softmax activation along an axis.

$$ f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $$

Implementation Note: Softmax logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.

Options

  • :axis - axis along which to calculate the softmax distribution. Defaults to 1 (see the sketch after the examples below)

Examples

iex> Axon.Activations.softmax(Nx.tensor([[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]], names: [:batch, :data]))
#Nx.Tensor<
  f32[batch: 1][data: 7]
  [
    [0.0015683004166930914, 0.004263082519173622, 0.011588259600102901, 0.03150015324354172, 0.08562629669904709, 0.23275642096996307, 0.6326975226402283]
  ]
>

iex> Axon.Activations.softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [0.6640625, 0.2431640625, 0.08935546875],
    [0.08935546875, 0.2431640625, 0.6640625]
  ]
>
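
As a brief sketch of the :axis option mentioned above (output values approximate), the distribution can instead be computed down the batch axis, so that each column sums to 1:

Axon.Activations.softmax(Nx.tensor([[1.0, 2.0], [3.0, 4.0]]), axis: 0)
# roughly [[0.119, 0.119], [0.881, 0.881]]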

Softplus activation.

$$ f(x_i) = \log(1 + e^{x_i}) $$

Examples

iex> Axon.Activations.softplus(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [0.04858734831213951, 0.12692801654338837, 0.3132616877555847, 0.6931471824645996, 1.3132617473602295, 2.1269280910491943, 3.0485873222351074]
>

iex> Axon.Activations.softplus(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [0.3125, 0.1259765625, 0.04833984375],
    [1.3125, 2.125, 3.046875]
  ]
>

Softsign activation.

$$ f(x_i) = \frac{x_i}{|x_i| + 1} $$

Examples

iex> Axon.Activations.softsign(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-0.75, -0.6666666865348816, -0.5, 0.0, 0.5, 0.6666666865348816, 0.75]
>

iex> Axon.Activations.softsign(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-0.5, -0.6640625, -0.75],
    [0.5, 0.6640625, 0.75]
  ]
>

Hyperbolic tangent activation.

$$ f(x_i) = \tanh(x_i) $$

Examples

iex> Axon.Activations.tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
  f32[data: 7]
  [-0.9950547814369202, -0.9640275835990906, -0.7615941762924194, 0.0, 0.7615941762924194, 0.9640275835990906, 0.9950547814369202]
>

iex> Axon.Activations.tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
  bf16[batch: 2][data: 3]
  [
    [-0.7578125, -0.9609375, -0.9921875],
    [0.7578125, 0.9609375, 0.9921875]
  ]
>