Axon.Activations (Axon v0.6.0)
Activation functions.
Activation functions are element-wise, (typically) non-linear functions called on the output of another layer, such as a dense layer:
x
|> dense(weight, bias)
|> relu()
Activation functions output the "activation": a measure of how active a given layer's neurons are in learning a representation of the data-generating distribution.
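As a minimal functional sketch of the pipeline above (the tensors and shapes here are arbitrary illustrations, with Axon.Layers.dense/3 standing in for a dense layer):

# Input, kernel, and bias are small arbitrary tensors for illustration.
x = Nx.tensor([[1.0, 2.0]])
kernel = Nx.tensor([[0.5, -0.5], [0.25, 0.25]])
bias = Nx.tensor([0.0, 0.0])

# Dense layer followed by an element-wise ReLU activation.
x
|> Axon.Layers.dense(kernel, bias)
|> Axon.Activations.relu()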
Some activations are commonly used as output activations. For example, softmax is often used as the output in multiclass classification problems because it returns a categorical probability distribution:
iex> Axon.Activations.softmax(Nx.tensor([[1, 2, 3]], type: {:f, 32}))
#Nx.Tensor<
f32[1][3]
[
[0.09003057330846786, 0.2447284758090973, 0.6652409434318542]
]
>
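In model terms, this often means attaching softmax to the final dense layer; a hedged sketch (the layer sizes and the three-class output below are illustrative assumptions):

model =
  Axon.input("features", shape: {nil, 4})
  |> Axon.dense(8, activation: :relu)
  |> Axon.dense(3, activation: :softmax)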
Other activations such as tanh or sigmoid are used because they have desirable properties, such as keeping the output tensor constrained within a certain range.
Generally, the choice of activation function is arbitrary, although some activations work better than others in certain problem domains. For example, ReLU (rectified linear unit) activation is a widely accepted default. The Functions section below lists the available activation functions and their implementations.
All of the functions in this module are implemented as numerical functions and can be JIT or AOT compiled with any supported Nx compiler.
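For example, an activation can be wrapped and compiled with Nx.Defn.jit/2. This is a minimal sketch; the EXLA compiler named below is an assumption, and any supported Nx compiler may be substituted:

# Assumes EXLA is available; swap in any supported Nx compiler.
relu_jit = Nx.Defn.jit(&Axon.Activations.relu/1, compiler: EXLA)
relu_jit.(Nx.tensor([-1.0, 0.0, 2.0]))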
Summary
Functions
celu - Continuously-differentiable exponential linear unit activation.
elu - Exponential linear unit activation.
exp - Exponential activation.
gelu - Gaussian error linear unit activation.
hard_sigmoid - Hard sigmoid activation.
hard_silu - Hard sigmoid weighted linear unit activation.
hard_tanh - Hard hyperbolic tangent activation.
leaky_relu - Leaky rectified linear unit activation.
linear - Linear activation.
log_sigmoid - Log-sigmoid activation.
log_softmax - Log-softmax activation.
log_sumexp - Logsumexp activation.
mish - Mish activation.
relu6 - Rectified linear unit 6 activation.
relu - Rectified linear unit activation.
selu - Scaled exponential linear unit activation.
sigmoid - Sigmoid activation.
silu - Sigmoid weighted linear unit activation.
softmax - Softmax activation along an axis.
softplus - Softplus activation.
softsign - Softsign activation.
tanh - Hyperbolic tangent activation.
Functions
celu
Continuously-differentiable exponential linear unit activation.
$$f(x_i) = \max(0, x_i) + \min(0, \alpha * (e^{\frac{x_i}{\alpha}} - 1))$$
Options
:alpha - $\alpha$ in CELU formulation. Must be non-zero. Defaults to 1.0.
Examples
iex> Axon.Activations.celu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.celu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
bf16[2][3]
[
[-0.62890625, -0.86328125, -0.94921875],
[1.0, 2.0, 3.0]
]
>
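The :alpha option can also take non-default values; a brief, hedged sketch (output omitted, the value 0.5 is arbitrary):

Axon.Activations.celu(Nx.tensor([-2.0, 0.0, 2.0]), alpha: 0.5)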
Error cases
iex> Axon.Activations.celu(Nx.tensor([0.0, 1.0, 2.0], type: {:f, 32}), alpha: 0.0)
** (ArgumentError) :alpha must be non-zero in CELU activation
elu
Exponential linear unit activation.
Equivalent to celu for $\alpha = 1$.
$$f(x_i) = \begin{cases}x_i & x_i > 0 \newline \alpha * (e^{x_i} - 1) & x_i \leq 0 \end{cases}$$
Options
:alpha - $\alpha$ in ELU formulation. Defaults to 1.0.
Examples
iex> Axon.Activations.elu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[-0.9502129554748535, -0.8646647334098816, -0.6321205496788025, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.elu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}))
#Nx.Tensor<
bf16[2][3]
[
[-0.62890625, -0.86328125, -0.94921875],
[1.0, 2.0, 3.0]
]
>
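As with celu, the :alpha option can be adjusted; a brief sketch (output omitted, the value 0.5 is arbitrary):

Axon.Activations.elu(Nx.tensor([-2.0, 0.0, 2.0]), alpha: 0.5)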
exp
Exponential activation.
$$f(x_i) = e^{x_i}$$
Examples
iex> Axon.Activations.exp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.049787066876888275, 0.1353352814912796, 0.3678794503211975, 1.0, 2.7182817459106445, 7.389056205749512, 20.08553695678711]
>
iex> Axon.Activations.exp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.3671875, 0.134765625, 0.049560546875],
[2.703125, 7.375, 20.0]
]
>
gelu
Gaussian error linear unit activation.
$$f(x_i) = \frac{x_i}{2}(1 + \mathrm{erf}(\frac{x_i}{\sqrt{2}}))$$
Examples
iex> Axon.Activations.gelu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.0040496885776519775, -0.04550027847290039, -0.15865525603294373, 0.0, 0.8413447141647339, 1.9544997215270996, 2.995950222015381]
>
iex> Axon.Activations.gelu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.16015625, -0.046875, -0.005859375],
[0.83984375, 1.953125, 2.984375]
]
>
hard_sigmoid
Hard sigmoid activation.
Examples
iex> Axon.Activations.hard_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.0, 0.0, 0.0, 0.20000000298023224, 0.4000000059604645, 0.6000000238418579, 0.800000011920929]
>
iex> Axon.Activations.hard_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[7.781982421875e-4, 0.0, 0.0],
[0.3984375, 0.59765625, 0.796875]
]
>
hard_silu
Hard sigmoid weighted linear unit activation.
$$f(x_i) = \begin{cases} 0 & x_i \leq -3 \newline x_i & x_i \geq 3 \newline \frac{x_i^2}{6} + \frac{x_i}{2} & otherwise \end{cases}$$
Examples
iex> Axon.Activations.hard_silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.0, -0.0, -0.0, 0.0, 0.4000000059604645, 1.2000000476837158, 2.4000000953674316]
>
iex> Axon.Activations.hard_silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-7.781982421875e-4, -0.0, -0.0],
[0.3984375, 1.1953125, 2.390625]
]
>
hard_tanh
Hard hyperbolic tangent activation.
$$f(x_i) = \begin{cases} 1 & x_i > 1 \newline -1 & x_i < -1 \newline x_i & otherwise \end{cases}$$
Examples
iex> Axon.Activations.hard_tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-1.0, -1.0, -1.0, 0.0, 1.0, 1.0, 1.0]
>
iex> Axon.Activations.hard_tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.0, -1.0, -1.0],
[1.0, 1.0, 1.0]
]
>
leaky_relu
Leaky rectified linear unit activation.
$$f(x_i) = \begin{cases} x_i & x_i \geq 0 \newline \alpha * x_i & otherwise \end{cases}$$
Options
:alpha - $\alpha$ in Leaky ReLU formulation. Defaults to 1.0e-2.
Examples
iex> Axon.Activations.leaky_relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]), alpha: 0.5)
#Nx.Tensor<
f32[data: 7]
[-1.5, -1.0, -0.5, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.leaky_relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], names: [:batch, :data]), alpha: 0.5)
#Nx.Tensor<
f32[batch: 2][data: 3]
[
[-0.5, -1.0, -1.5],
[1.0, 2.0, 3.0]
]
>
linear
Linear activation.
$$f(x_i) = x_i$$
Examples
iex> Axon.Activations.linear(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.linear(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.0, -2.0, -3.0],
[1.0, 2.0, 3.0]
]
>
log_sigmoid
Log-sigmoid activation.
$$f(x_i) = \log(\mathrm{sigmoid}(x_i))$$
Examples
iex> Axon.Activations.log_sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-3.0485873222351074, -2.1269280910491943, -1.3132617473602295, -0.6931471824645996, -0.3132616877555847, -0.12692801654338837, -0.04858734831213951]
>
iex> Axon.Activations.log_sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.3125, -2.125, -3.046875],
[-0.3125, -0.1259765625, -0.04833984375]
]
>
log_softmax
Log-softmax activation.
$$f(x_i) = x_i - \log\left(\sum_j e^{x_j}\right)$$
Examples
iex> Axon.Activations.log_softmax(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-6.457762718200684, -5.457762718200684, -4.457762718200684, -3.4577627182006836, -2.4577627182006836, -1.4577628374099731, -0.45776283740997314]
>
iex> Axon.Activations.log_softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.404296875, -1.3984375, -2.390625],
[-2.390625, -1.3984375, -0.404296875]
]
>
log_sumexp
Logsumexp activation.
$$f(x) = \log\left(\sum_i e^{x_i}\right)$$
Examples
iex> Axon.Activations.log_sumexp(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 1]
[0.45776283740997314]
>
iex> Axon.Activations.log_sumexp(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 1]
[
[0.404296875],
[0.404296875]
]
>
mish
Mish activation.
$$f(x_i) = x_i * \tanh(\log(1 + e^{x_i}))$$
Examples
iex> Axon.Activations.mish(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], type: {:f, 32}, names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.14564745128154755, -0.2525014877319336, -0.30340147018432617, 0.0, 0.8650984168052673, 1.9439589977264404, 2.98653507232666]
>
iex> Axon.Activations.mish(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.30078125, -0.25, -0.1435546875],
[0.86328125, 1.9375, 2.96875]
]
>
relu6
Rectified linear unit 6 activation.
$$f(x_i) = \min(\max(x_i, 0), 6)$$
Examples
iex> Axon.Activations.relu6(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]))
#Nx.Tensor<
f32[7]
[0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.relu6(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.0, 0.0, 0.0],
[1.0, 2.0, 3.0]
]
>
relu
Rectified linear unit activation.
$$f(x_i) = \max(x_i, 0)$$
Examples
iex> Axon.Activations.relu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
>
iex> Axon.Activations.relu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.0, 0.0, 0.0],
[1.0, 2.0, 3.0]
]
>
selu
Scaled exponential linear unit activation.
$$f(x_i) = \begin{cases} \lambda x_i & x_i \geq 0 \newline \lambda \alpha(e^{x_i} - 1) & x_i < 0 \end{cases}$$
$$\alpha \approx 1.6733$$ $$\lambda \approx 1.0507$$
Examples
iex> Axon.Activations.selu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-1.670568823814392, -1.5201665163040161, -1.1113307476043701, 0.0, 1.0507010221481323, 2.1014020442962646, 3.1521029472351074]
>
iex> Axon.Activations.selu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-1.09375, -1.5078125, -1.6640625],
[1.046875, 2.09375, 3.140625]
]
>
sigmoid
Sigmoid activation.
$$f(x_i) = \frac{1}{1 + e^{-x_i}}$$
Implementation Note: Sigmoid logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.
Examples
iex> Axon.Activations.sigmoid(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.04742587357759476, 0.11920291930437088, 0.2689414322376251, 0.5, 0.7310585975646973, 0.8807970881462097, 0.9525741338729858]
>
iex> Axon.Activations.sigmoid(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.267578125, 0.119140625, 0.04736328125],
[0.73046875, 0.87890625, 0.94921875]
]
>
silu
Sigmoid weighted linear unit activation.
$$f(x_i) = x_i * \mathrm{sigmoid}(x_i)$$
Examples
iex> Axon.Activations.silu(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.14227762818336487, -0.23840583860874176, -0.2689414322376251, 0.0, 0.7310585975646973, 1.7615941762924194, 2.857722282409668]
>
iex> Axon.Activations.silu(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.267578125, -0.23828125, -0.1416015625],
[0.73046875, 1.7578125, 2.84375]
]
>
softmax
Softmax activation along an axis.
$$f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
Implementation Note: Softmax logits are cached as metadata in the expression and can be used in calculations later on. For example, they are used in cross-entropy calculations for better stability.
Options
:axis - softmax axis along which to calculate distribution. Defaults to 1.
Examples
iex> Axon.Activations.softmax(Nx.tensor([[-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]], names: [:batch, :data]))
#Nx.Tensor<
f32[batch: 1][data: 7]
[
[0.0015683004166930914, 0.004263082519173622, 0.011588259600102901, 0.03150015324354172, 0.08562629669904709, 0.23275642096996307, 0.6326975226402283]
]
>
iex> Axon.Activations.softmax(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.6640625, 0.2431640625, 0.08935546875],
[0.08935546875, 0.2431640625, 0.6640625]
]
>
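The :axis option selects the axis along which the distribution is computed; a brief sketch (output omitted) that normalizes along axis 0 instead of the default:

Axon.Activations.softmax(Nx.tensor([[1.0, 2.0], [3.0, 4.0]]), axis: 0)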
softplus
Softplus activation.
$$f(x_i) = \log(1 + e^{x_i})$$
Examples
iex> Axon.Activations.softplus(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[0.04858734831213951, 0.12692801654338837, 0.3132616877555847, 0.6931471824645996, 1.3132617473602295, 2.1269280910491943, 3.0485873222351074]
>
iex> Axon.Activations.softplus(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[0.3125, 0.1259765625, 0.04833984375],
[1.3125, 2.125, 3.046875]
]
>
softsign
Softsign activation.
$$f(x_i) = \frac{x_i}{|x_i| + 1}$$
Examples
iex> Axon.Activations.softsign(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.75, -0.6666666865348816, -0.5, 0.0, 0.5, 0.6666666865348816, 0.75]
>
iex> Axon.Activations.softsign(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.5, -0.6640625, -0.75],
[0.5, 0.6640625, 0.75]
]
>
tanh
Hyperbolic tangent activation.
$$f(x_i) = \tanh(x_i)$$
Examples
iex> Axon.Activations.tanh(Nx.tensor([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], names: [:data]))
#Nx.Tensor<
f32[data: 7]
[-0.9950547814369202, -0.9640275835990906, -0.7615941762924194, 0.0, 0.7615941762924194, 0.9640275835990906, 0.9950547814369202]
>
iex> Axon.Activations.tanh(Nx.tensor([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0]], type: {:bf, 16}, names: [:batch, :data]))
#Nx.Tensor<
bf16[batch: 2][data: 3]
[
[-0.7578125, -0.9609375, -0.9921875],
[0.7578125, 0.9609375, 0.9921875]
]
>