Axon.Activations (Axon v0.6.1)
Activation functions.
Activation functions are element-wise, (typically) non-linear functions called on the output of another layer, such as a dense layer:
x
|> dense(weight, bias)
|> relu()
Activation functions output the "activation" or how active a given layer's neurons are in learning a representation of the data-generating distribution.
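In an Axon model, this composition is typically expressed with layer functions. A minimal, hypothetical sketch (the input name, shape, and layer size are illustrative, not taken from this documentation):
model =
  Axon.input("features", shape: {nil, 784})
  |> Axon.dense(128)
  |> Axon.activation(:relu)
Many layers also accept an :activation option, so Axon.dense(model, 128, activation: :relu) expresses the same thing.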
Some activations are commonly used as output activations. For
example, softmax
is often used as the output in multiclass
classification problems because it returns a categorical
probability distribution:
iex> Axon.Activations.softmax(Nx.tensor([[1, 2, 3]], type: {:f, 32}))
#Nx.Tensor<
f32[1][3]
[
[0.09003057330846786, 0.2447284758090973, 0.6652409434318542]
]
>
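As an output activation in a model, softmax is usually attached to the final dense layer. A hypothetical sketch (model and num_classes are illustrative names, not part of this API):
model
|> Axon.dense(num_classes, activation: :softmax)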
Other activations such as tanh
or sigmoid
are used because
they have desirable properties, such as keeping the output
tensor constrained within a certain range.
Generally, the choice of activation function is arbitrary, although some activations work better than others in certain problem domains. For example, ReLU (rectified linear unit) activation is a widely accepted default.
All of the functions in this module are implemented as
numerical functions and can be JIT or AOT compiled with
any supported Nx
compiler.
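For example, an activation can be JIT compiled on its own with Nx.Defn.jit/2. The EXLA compiler below is only one choice and assumes the :exla dependency is available; any supported Nx compiler works:
# Assumes EXLA is added as a dependency; swap in any supported Nx compiler.
relu = Nx.Defn.jit(&Axon.Activations.relu/1, compiler: EXLA)
relu.(Nx.tensor([-1.0, 0.0, 1.0]))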
Summary
Functions
celu - Continuously-differentiable exponential linear unit activation.
elu - Exponential linear unit activation.
exp - Exponential activation.
gelu - Gaussian error linear unit activation.
hard_sigmoid - Hard sigmoid activation.
hard_silu - Hard sigmoid weighted linear unit activation.
hard_tanh - Hard hyperbolic tangent activation.
leaky_relu - Leaky rectified linear unit activation.
linear - Linear activation.
log_sigmoid - Log-sigmoid activation.
log_softmax - Log-softmax activation.
log_sumexp - Logsumexp activation.
mish - Mish activation.
relu6 - Rectified linear unit 6 activation.
relu - Rectified linear unit activation.
selu - Scaled exponential linear unit activation.
sigmoid - Sigmoid activation.
silu - Sigmoid weighted linear unit activation.
softmax - Softmax activation along an axis.
softplus - Softplus activation.
softsign - Softsign activation.
tanh - Hyperbolic tangent activation.
Functions
Continuously-differentiable exponential linear unit activation.
$$ f(x_i) = \max(0, x_i) + \min(0, \alpha * (e^{\frac{x_i}{\alpha}} - 1)) $$
Options
:alpha
- $\alpha$ in CELU formulation. Must be non-zero. Defaults to 1.0
Examples
iex> Axon.Activations.celu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.celu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
bf16[2][3]
[
[-0.62890625, -0.86328125, -0.94921875],
[1.0, 2.0, 3.0]
]
>
Error cases
iex> Axon.Activations.celu(Nx.tensor([0.0, 1.0, 2.0], type: {:f, 32}), alpha: 0.0)
** (ArgumentError) :alpha must be non-zero in CELU activation
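For illustration only, the formula above can be written directly with Nx operations inside defn. This is a sketch of the math with the default option handling shown here, not Axon's implementation (which additionally validates that :alpha is non-zero):
defmodule CeluSketch do
  import Nx.Defn

  # max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
  defn celu(x, opts \\ []) do
    opts = keyword!(opts, alpha: 1.0)
    alpha = opts[:alpha]
    Nx.max(x, 0) + Nx.min(alpha * (Nx.exp(x / alpha) - 1), 0)
  end
end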
References
Exponential linear unit activation.
Equivalent to celu
for $\alpha = 1$
$$ f(x_i) = \begin{cases} x_i & x_i > 0 \newline \alpha * (e^{x_i} - 1) & x_i \leq 0 \end{cases} $$
Options
:alpha
- $\alpha$ in ELU formulation. Defaults to 1.0
Examples
iex> Axon.Activations.elu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.elu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
bf16[2][3]
[
[-0.62890625, -0.86328125, -0.94921875],
[1.0, 2.0, 3.0]
]
>
References
Exponential activation.
$$ f(x_i) = e^{x_i} $$
Examples
iex> Axon.Activations.exp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.049787066876888275, 0.1353352814912796, 0.3678794503211975, 1.0, 2.7182817459106445, 7.389056205749512, 20.08553695678711]
>
iex> Axon.Activations.exp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.3671875, 0.134765625, 0.049560546875],
[2.703125, 7.375, 20.0]
]
>
Gaussian error linear unit activation.
$$ f(x_i) = \frac{x_i}{2} \left(1 + \mathrm{erf}\left(\frac{x_i}{\sqrt{2}}\right)\right) $$
Examples
iex> Axon.Activations.gelu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.0040496885776519775, -0.04550027847290039, -0.15865525603294373, 0.0, 0.8413447141647339, 1.9544997215270996, 2.995950222015381]
>
iex> Axon.Activations.gelu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.16015625, -0.046875, -0.005859375],
[0.83984375, 1.953125, 2.984375]
]
>
References
Hard sigmoid activation.
Examples
iex> Axon.Activations.hard_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.0, 0.0, 0.0, 0.20000000298023224, 0.4000000059604645, 0.6000000238418579, 0.800000011920929]
>
iex> Axon.Activations.hard_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[7.781982421875e-4, 0.0, 0.0],
[0.3984375, 0.59765625, 0.796875]
]
>
Hard sigmoid weighted linear unit activation.
$$ f(x_i) = x_i * \sigma_{\text{hard}}(x_i) $$
where $\sigma_{\text{hard}}$ denotes the hard sigmoid activation above.
Examples
iex> Axon.Activations.hard_silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.0, -0.0, -0.0, 0.0, 0.4000000059604645, 1.2000000476837158, 2.4000000953674316]
>
iex> Axon.Activations.hard_silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-7.781982421875e-4, -0.0, -0.0],
[0.3984375, 1.1953125, 2.390625]
]
>
Hard hyperbolic tangent activation.
$$ f(x_i) = \begin{cases} 1 & x_i > 1 \newline -1 & x_i < -1 \newline x_i & otherwise \end{cases} $$
Examples
iex> Axon.Activations.hard_tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-1.0, -1.0, -1.0, 0.0, 1.0, 1.0, 1.0]
>
iex> Axon.Activations.hard_tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.0, -1.0, -1.0],
[1.0, 1.0, 1.0]
]
>
Leaky rectified linear unit activation.
$$ f(x_i) = \begin{cases} x_i & x_i \geq 0 \newline \alpha * x_i & otherwise \end{cases} $$
Options
:alpha
- $\alpha$ in Leaky ReLU formulation. Defaults to 1.0e-2
Examples
iex> Axon.Activations.leaky_relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]), alpha: 0.5)
#Nx.Tensor<
f32[data: 7]
[-1.5, -1.0, -0.5, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.leaky_relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], names: [:batch, :data]), alpha: 0.5)
#Nx.Tensor<
f32[batch: 2][data: 3]
[
[-0.5, -1.0, -1.5],
[1.0, 2.0, 3.0]
]
>
Linear activation.
$$ f(x_i) = x_i $$
Examples
iex> Axon.Activations.linear(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.linear(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.0, -2.0, -3.0],
[1.0, 2.0, 3.0]
]
>
Log-sigmoid activation.
$$ f(x_i) = \log(\mathrm{sigmoid}(x_i)) $$
Examples
iex> Axon.Activations.log_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-3.0485873222351074, -2.1269280910491943, -1.3132617473602295, -0.6931471824645996, -0.3132616877555847, -0.12692801654338837, -0.04858734831213951]
>
iex> Axon.Activations.log_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.3125, -2.125, -3.046875],
[-0.3125, -0.1259765625, -0.04833984375]
]
>
Log-softmax activation.
$$ f(x_i) = x_i - \log\left(\sum_j e^{x_j}\right) $$
Examples
iex> Axon.Activations.log_softmax(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-6.457762718200684, -5.457762718200684, -4.457762718200684, -3.4577627182006836, -2.4577627182006836, -1.4577628374099731, -0.45776283740997314]
>
iex> Axon.Activations.log_softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.404296875, -1.3984375, -2.390625],
[-2.390625, -1.3984375, -0.404296875]
]
>
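The log-softmax above is simply the input shifted by its logsumexp. A hedged sketch with plain Nx operations, assuming reduction along the last axis and subtracting the maximum first for numerical stability (illustrative, not Axon's implementation):
defmodule LogSoftmaxSketch do
  import Nx.Defn

  # log_softmax(x) = x - logsumexp(x), computed stably along the last axis
  defn log_softmax(x) do
    shifted = x - Nx.reduce_max(x, axes: [-1], keep_axes: true)
    shifted - Nx.log(Nx.sum(Nx.exp(shifted), axes: [-1], keep_axes: true))
  end
end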
Logsumexp activation.
$$ f(x) = \log\left(\sum_i e^{x_i}\right) $$
Examples
iex> Axon.Activations.log_sumexp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 1]
[3.4577627182006836]
>
iex> Axon.Activations.log_sumexp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 1]
[
[-0.59375],
[3.390625]
]
>
Mish activation.
$$ f(x_i) = x_i * \tanh(\log(1 + e^{x_i})) $$
Examples
iex> Axon.Activations.mish(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.14564745128154755, -0.2525014877319336, -0.30340147018432617, 0.0, 0.8650984168052673, 1.9439589977264404, 2.98653507232666]
>
iex> Axon.Activations.mish(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.30078125, -0.25, -0.1435546875],
[0.86328125, 1.9375, 2.96875]
]
>
Rectified linear unit 6 activation.
$$ f(x_i) = \min(\max(x_i, 0), 6) $$
Examples
iex> Axon.Activations.relu6(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.relu6(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.0, 0.0, 0.0],
[1.0, 2.0, 3.0]
]
>
References
Rectified linear unit activation.
$$ f(x_i) = \max(x_i, 0) $$
Examples
iex> Axon.Activations.relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.0, 0.0, 0.0],
[1.0, 2.0, 3.0]
]
>
Scaled exponential linear unit activation.
$$ f(x_i) = \begin{cases} \lambda x_i & x_i \geq 0 \newline \lambda \alpha (e^{x_i} - 1) & x_i < 0 \end{cases} $$
$$ \alpha \approx 1.6733 $$ $$ \lambda \approx 1.0507 $$
Examples
iex> Axon.Activations.selu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-1.670568823814392, -1.5201665163040161, -1.1113307476043701, 0.0, 1.0507010221481323, 2.1014020442962646, 3.1521029472351074]
>
iex> Axon.Activations.selu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.09375, -1.5078125, -1.6640625],
[1.046875, 2.09375, 3.140625]
]
>
References
Sigmoid activation.
$$ f(x_i) = \frac{1}{1 + e^{-x_i}} $$
Implementation Note: Sigmoid logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.
Examples
iex> Axon.Activations.sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.04742587357759476, 0.11920291930437088, 0.2689414322376251, 0.5, 0.7310585975646973, 0.8807970881462097, 0.9525741338729858]
>
iex> Axon.Activations.sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.267578125, 0.119140625, 0.04736328125],
[0.73046875, 0.87890625, 0.94921875]
]
>
Sigmoid weighted linear unit activation.
$$ f(x_i) = x_i * \mathrm{sigmoid}(x_i) $$
Examples
iex> Axon.Activations.silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.14227762818336487, -0.23840583860874176, -0.2689414322376251, 0.0, 0.7310585975646973, 1.7615941762924194, 2.857722282409668]
>
iex> Axon.Activations.silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.267578125, -0.23828125, -0.1416015625],
[0.73046875, 1.7578125, 2.84375]
]
>
References
Softmax activation along an axis.
$$ f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $$
Implementation Note: Softmax logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.
Options
:axis
- softmax axis along which to calculate distribution. Defaults to 1.
Examples
iex> Axon.Activations.softmax(Nx.tensor([[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]], names: [:batch, :data]))
#Nx.Tensor<
f32[batch: 1][data: 7]
[
[0.0015683004166930914, 0.004263082519173622, 0.011588259600102901, 0.03150015324354172, 0.08562629669904709, 0.23275642096996307, 0.6326975226402283]
]
>
iex> Axon.Activations.softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.6640625, 0.2431640625, 0.08935546875],
[0.08935546875, 0.2431640625, 0.6640625]
]
>
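The :axis option selects which dimension is normalized. For instance, passing axis: 0 to the batched call above would normalize each column across the :batch dimension instead of across :data (output omitted, since it is not a verified doctest):
Axon.Activations.softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], names: [:batch, :data]), axis: 0)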
Softplus activation.
$$ f(x_i) = \log(1 + e^{x_i}) $$
Examples
iex> Axon.Activations.softplus(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.04858734831213951, 0.12692801654338837, 0.3132616877555847, 0.6931471824645996, 1.3132617473602295, 2.1269280910491943, 3.0485873222351074]
>
iex> Axon.Activations.softplus(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.3125, 0.1259765625, 0.04833984375],
[1.3125, 2.125, 3.046875]
]
>
Softsign activation.
$$ f(x_i) = \frac{x_i}{|x_i| + 1} $$
Examples
iex> Axon.Activations.softsign(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.75, -0.6666666865348816, -0.5, 0.0, 0.5, 0.6666666865348816, 0.75]
>
iex> Axon.Activations.softsign(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.5, -0.6640625, -0.75],
[0.5, 0.6640625, 0.75]
]
>
Hyperbolic tangent activation.
$$ f(x_i) = \tanh(x_i) $$
Examples
iex> Axon.Activations.tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.9950547814369202, -0.9640275835990906, -0.7615941762924194, 0.0, 0.7615941762924194, 0.9640275835990906, 0.9950547814369202]
>
iex> Axon.Activations.tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.7578125, -0.9609375, -0.9921875],
[0.7578125, 0.9609375, 0.9921875]
]
>