Axon.Schedules (Axon v0.3.1)

Parameter Schedules.

Parameter schedules are often used to anneal hyperparameters such as the learning rate during the training process. Schedules provide a mapping from the current time step to a learning rate or another hyperparameter.

Choosing a good learning rate, and consequently a good learning rate schedule, is typically a process of trial and error. The learning rate should be small enough that the learning curve does not oscillate violently during training, but not so small that learning proceeds too slowly. A decaying schedule gradually damps those oscillations, so training becomes more stable as the model converges.

All of the functions in this module are implemented as numerical functions and can be JIT or AOT compiled with any supported Nx compiler.
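
To make the idea of a schedule as a mapping from time step to hyperparameter concrete, here is a minimal sketch in plain Elixir. It does not call Axon or Nx: the `schedule` closure is a hypothetical stand-in written only for illustration, and the toy objective `f(w) = w * w` is an assumption chosen to keep the example short.

```elixir
# Hypothetical schedule: any function from a step to a value works for this sketch.
# Here the learning rate starts at 1.0e-2 and shrinks by a factor of 0.95 every 10 steps.
schedule = fn step -> 1.0e-2 * :math.pow(0.95, step / 10) end

# Plain gradient descent on f(w) = w * w, asking the schedule for the
# learning rate at every step instead of using a fixed value.
final_w =
  Enum.reduce(0..99, 1.0, fn step, w ->
    learning_rate = schedule.(step)
    gradient = 2 * w
    w - learning_rate * gradient
  end)

IO.inspect(final_w, label: "w after 100 steps")
```

Because the learning rate is looked up per step, swapping in a different decay shape only means changing the closure, not the training loop.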

Summary

Functions

constant(opts \\ [])

Constant schedule.

cosine_decay(opts \\ [])

Cosine decay schedule.

exponential_decay(opts \\ [])

Exponential decay schedule.

polynomial_decay(opts \\ [])

Polynomial schedule.

Functions

constant(opts \\ [])

Constant schedule.

$\gamma(t) = \gamma_0$

Options

:init_value - initial value. $\gamma_0$ in above formulation. Defaults to 1.0e-2

cosine_decay(opts \\ [])

Cosine decay schedule.

$$\gamma(t) = \gamma_0 \left( (1 - \alpha) \cdot \frac{1}{2}\left(1 + \cos\left(\pi \frac{t}{k}\right)\right) + \alpha \right)$$

Options

:init_value - initial value. $\gamma_0$ in above formulation. Defaults to 1.0e-2
:decay_steps - number of steps to apply decay for. $k$ in above formulation. Defaults to 10
:alpha - minimum value of multiplier adjusting learning rate. $\alpha$ in above formulation. Defaults to 0.0
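
The cosine formula can be checked by hand with a small plain-Elixir sketch. `CosineDecaySketch` is a hypothetical module, not part of Axon, and it works on plain floats rather than Nx tensors. Two assumptions are made: $\alpha$ is read as the floor of the multiplier applied to $\gamma_0$ (as the `:alpha` description suggests), and the step is clamped to `:decay_steps` so the value stays at its floor afterwards.

```elixir
defmodule CosineDecaySketch do
  # Hypothetical helper, not Axon's implementation: evaluates the
  # cosine decay formula with plain floats.
  def value(step, opts \\ []) do
    init_value = Keyword.get(opts, :init_value, 1.0e-2)
    decay_steps = Keyword.get(opts, :decay_steps, 10)
    alpha = Keyword.get(opts, :alpha, 0.0)

    # Assumption: steps outside 0..decay_steps are clamped into that range.
    t = step |> max(0) |> min(decay_steps)

    # Half-cosine that falls from 1.0 at t = 0 to 0.0 at t = decay_steps.
    cosine = 0.5 * (1 + :math.cos(:math.pi() * t / decay_steps))

    # Assumption: alpha is the minimum multiplier, so it sits inside the
    # factor that scales init_value.
    init_value * ((1 - alpha) * cosine + alpha)
  end
end

Enum.each([0, 5, 10], fn step ->
  IO.puts("step #{step}: #{CosineDecaySketch.value(step)}")
end)
```

With the default options the value starts at `:init_value`, passes through roughly half of it at the midpoint, and reaches approximately zero after `:decay_steps` steps.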

References

SGDR: Stochastic Gradient Descent with Warm Restarts

exponential_decay(opts \\ [])

Exponential decay schedule.

$\gamma(t) = \gamma_0 * r^{\frac{t}{k}}$

Options

:init_value - initial value. $\gamma_0$ in above formulation. Defaults to 1.0e-2
:decay_rate - rate of decay. $r$ in above formulation. Defaults to 0.95
:transition_steps - steps per transition. $k$ in above formulation. Defaults to 10
:transition_begin - step to begin transition. Defaults to 0
:staircase - discretize outputs. Defaults to false
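
As a rough illustration of how these options interact, the following plain-Elixir sketch evaluates $\gamma_0 \cdot r^{t/k}$ on floats. `ExponentialDecaySketch` is a hypothetical module, not Axon's implementation. The handling of `:transition_begin` (steps before it count as $t = 0$) and of `:staircase` (flooring the exponent) are assumptions inferred from the option descriptions.

```elixir
defmodule ExponentialDecaySketch do
  # Hypothetical helper, not part of Axon: evaluates gamma_0 * r^(t / k).
  def value(step, opts \\ []) do
    init_value = Keyword.get(opts, :init_value, 1.0e-2)
    decay_rate = Keyword.get(opts, :decay_rate, 0.95)
    transition_steps = Keyword.get(opts, :transition_steps, 10)
    transition_begin = Keyword.get(opts, :transition_begin, 0)
    staircase = Keyword.get(opts, :staircase, false)

    # Assumption: decay only starts once the step passes transition_begin.
    t = max(step - transition_begin, 0)
    exponent = t / transition_steps

    # Assumption: staircase floors the exponent, so the value drops once
    # per completed transition instead of changing at every step.
    exponent = if staircase, do: Float.floor(exponent), else: exponent

    init_value * :math.pow(decay_rate, exponent)
  end
end

Enum.each([0, 10, 15], fn step ->
  smooth = ExponentialDecaySketch.value(step)
  stepped = ExponentialDecaySketch.value(step, staircase: true)
  IO.puts("step #{step}: smooth=#{smooth} staircase=#{stepped}")
end)
```

Under these assumptions, the staircase variant at step 15 matches the smooth value at step 10, because only one full transition has completed.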

polynomial_decay(opts \\ [])

Polynomial schedule.

$\gamma(t) = (\gamma_0 - \gamma_n) * (1 - \frac{t}{k})^p + \gamma_n$

Options

:init_value - initial value. $\gamma_0$ in above formulation. Defaults to 1.0e-2
:end_value - end value of annealed scalar. $\gamma_n$ in above formulation. Defaults to 1.0e-3
:power - power of polynomial. $p$ in above formulation. Defaults to 2
:transition_steps - number of steps over which annealing takes place. $k$ in above formulation. Defaults to 10
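
A small plain-Elixir sketch can help check the polynomial schedule by hand. `PolynomialDecaySketch` is a hypothetical module, not Axon's implementation, and it uses plain floats instead of Nx tensors. It includes a trailing `+ end_value` term so that the value reaches `:end_value` after `:transition_steps` steps; treat that, and the clamping of the step to `0..transition_steps`, as assumptions based on the option descriptions.

```elixir
defmodule PolynomialDecaySketch do
  # Hypothetical helper, not part of Axon: anneals from init_value to
  # end_value along a polynomial of the given power.
  def value(step, opts \\ []) do
    init_value = Keyword.get(opts, :init_value, 1.0e-2)
    end_value = Keyword.get(opts, :end_value, 1.0e-3)
    power = Keyword.get(opts, :power, 2)
    transition_steps = Keyword.get(opts, :transition_steps, 10)

    # Assumption: the step is clamped so the value holds at end_value
    # once annealing has finished.
    t = step |> max(0) |> min(transition_steps)

    # Fraction of the annealing that remains: 1.0 at the start, 0.0 at the end.
    remaining = 1 - t / transition_steps

    # Assumption: end_value is added back so the schedule ends at end_value.
    (init_value - end_value) * :math.pow(remaining, power) + end_value
  end
end

Enum.each([0, 5, 10], fn step ->
  IO.puts("step #{step}: #{PolynomialDecaySketch.value(step)}")
end)
```

With `power: 1` the same sketch reduces to linear interpolation between `:init_value` and `:end_value`.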
