Axon.Optimizers (Axon v0.3.1)
Implementations of common gradient-based optimization algorithms.
All of the methods in this module are written in terms of the update methods defined in Axon.Updates. Axon treats optimizers as the tuple:
{init_fn, update_fn}
where init_fn returns an initial optimizer state and update_fn scales input gradients. init_fn accepts a model's parameters and attaches state to each parameter. update_fn accepts gradients, optimizer state, and current model parameters and returns the scaled updates together with the new optimizer state.
Custom optimizers are often created via the Axon.Updates API.
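To make the {init_fn, update_fn} contract concrete, here is a minimal sketch in plain Elixir with no Nx or Axon dependency. It is not how Axon implements any optimizer: ToyOptimizer, sgd_momentum/2, and apply_updates/2 are hypothetical names, and parameters are a map of scalars rather than tensors. It only shows the shape of the two functions and how state flows between them.

```elixir
defmodule ToyOptimizer do
  # Hypothetical helper, not part of Axon: SGD with momentum over a map of
  # scalar parameters, expressed as an {init_fn, update_fn} tuple.
  def sgd_momentum(learning_rate, momentum) do
    # init_fn attaches state (here, a zero velocity) to each parameter.
    init_fn = fn params ->
      Map.new(params, fn {name, _value} -> {name, 0.0} end)
    end

    # update_fn takes gradients, optimizer state, and current parameters and
    # returns {scaled_updates, new_state}. This rule ignores the parameters.
    update_fn = fn gradients, state, _params ->
      new_state =
        Map.new(gradients, fn {name, grad} ->
          {name, momentum * Map.fetch!(state, name) + grad}
        end)

      updates =
        Map.new(new_state, fn {name, velocity} ->
          {name, -learning_rate * velocity}
        end)

      {updates, new_state}
    end

    {init_fn, update_fn}
  end

  # Adds each update to its parameter.
  def apply_updates(params, updates) do
    Map.new(params, fn {name, value} -> {name, value + Map.fetch!(updates, name)} end)
  end
end

params = %{w: 1.0, b: 0.5}
gradients = %{w: 2.0, b: -1.0}

{init_fn, update_fn} = ToyOptimizer.sgd_momentum(0.1, 0.9)
state = init_fn.(params)
{updates, new_state} = update_fn.(gradients, state, params)
new_params = ToyOptimizer.apply_updates(params, updates)
```

The learning rate is folded into the updates with a negative sign, so applying them is a plain addition. The new state is returned alongside the updates and would be passed to the next call of update_fn.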
Example
Consider the following usage of the Adam optimizer in a basic update function (assuming objective and the dataset are defined elsewhere):
defmodule Learning do
  import Nx.Defn

  # Build the initial optimizer state from the model parameters.
  defn init(params, init_fn) do
    init_fn.(params)
  end

  # Run one optimization step and return the new parameters, state, and loss.
  defn update(params, optimizer_state, inputs, targets, update_fn) do
    {loss, gradient} = value_and_grad(params, &objective(&1, inputs, targets))
    {scaled_updates, new_optimizer_state} = update_fn.(gradient, optimizer_state, params)
    {Axon.Updates.apply_updates(params, scaled_updates), new_optimizer_state, loss}
  end
end

params = Nx.random_uniform({784, 10})
{init_fn, update_fn} = Axon.Optimizers.adam(0.005)

optimizer_state =
  Learning.init(params, init_fn)

{new_params, new_optimizer_state, loss} =
  Learning.update(params, optimizer_state, inputs, targets, update_fn)
For a simpler approach, you can also use optimizers with the training API:
model
|> Axon.Loop.trainer(:categorical_cross_entropy, Axon.Optimizers.adam(0.005))
|> Axon.Loop.run(data, epochs: 10, compiler: EXLA)
Summary
Functions
Adabelief optimizer.
Adagrad optimizer.
Adam optimizer.
Adam with weight decay optimizer.
Lamb optimizer.
Noisy SGD optimizer.
Rectified Adam optimizer.
RMSProp optimizer.
SGD optimizer.
Yogi optimizer.
Functions
Adabelief optimizer.
Options
:b1 - first moment decay. Defaults to 0.9
:b2 - second moment decay. Defaults to 0.999
:eps - numerical stability term. Defaults to 0.0
:eps_root - numerical stability term. Defaults to 1.0e-16
Adagrad optimizer.
Options
:eps - numerical stability term. Defaults to 1.0e-7
Adam optimizer.
Options
:b1 - first moment decay. Defaults to 0.9
:b2 - second moment decay. Defaults to 0.999
:eps - numerical stability term. Defaults to 1.0e-8
:eps_root - numerical stability term. Defaults to 1.0e-15
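As a rough illustration of where these four options enter the computation, the sketch below performs one Adam step on a single scalar gradient in plain Elixir. It assumes the commonly used bias-corrected Adam rule, with :eps_root added inside the square root and :eps added outside it. AdamSketch and step/3 are hypothetical names, not Axon functions, and the placement of the two stability terms is an assumption rather than something this page states.

```elixir
defmodule AdamSketch do
  # Hypothetical scalar helper, not part of Axon. Assumes the usual Adam
  # update with bias correction; :eps_root goes inside the square root and
  # :eps outside it.
  def step(grad, {mu, nu, count}, opts \\ []) do
    b1 = Keyword.get(opts, :b1, 0.9)
    b2 = Keyword.get(opts, :b2, 0.999)
    eps = Keyword.get(opts, :eps, 1.0e-8)
    eps_root = Keyword.get(opts, :eps_root, 1.0e-15)

    count = count + 1

    # Exponential moving averages of the gradient and its square.
    mu = b1 * mu + (1 - b1) * grad
    nu = b2 * nu + (1 - b2) * grad * grad

    # Bias correction compensates for the zero initialization.
    mu_hat = mu / (1 - :math.pow(b1, count))
    nu_hat = nu / (1 - :math.pow(b2, count))

    update = mu_hat / (:math.sqrt(nu_hat + eps_root) + eps)
    {update, {mu, nu, count}}
  end
end

# First step from a zero state with a gradient of 0.5.
{update, state} = AdamSketch.step(0.5, {0.0, 0.0, 0})

# The optimizer then scales the result by the negative learning rate.
scaled_update = -0.005 * update
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the scaled update has a magnitude close to the learning rate regardless of the size of the gradient.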
Adam with weight decay optimizer.
Options
:b1 - first moment decay. Defaults to 0.9
:b2 - second moment decay. Defaults to 0.999
:eps - numerical stability term. Defaults to 1.0e-8
:eps_root - numerical stability term. Defaults to 0.0
:decay - weight decay. Defaults to 0.0
Lamb optimizer.
Options
:b1 - first moment decay. Defaults to 0.9
:b2 - second moment decay. Defaults to 0.999
:eps - numerical stability term. Defaults to 1.0e-8
:eps_root - numerical stability term. Defaults to 0.0
:decay - weight decay. Defaults to 0.0
:min_norm - minimum norm value. Defaults to 0.0
Noisy SGD optimizer.
Options
:eta - used to compute variance of noise distribution. Defaults to 0.1
:gamma - used to compute variance of noise distribution. Defaults to 0.55
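The page does not spell out how :eta and :gamma combine. A common choice for gradient noise is an annealed variance of the form eta / (1 + step)^gamma, and the sketch below assumes that schedule. NoiseSchedule and variance/3 are hypothetical names, not Axon functions, and the formula should be checked against Axon.Updates before relying on it.

```elixir
defmodule NoiseSchedule do
  # Hypothetical helper, not part of Axon: variance of the added noise at
  # a given step, assuming the schedule eta / (1 + step)^gamma.
  def variance(step, eta \\ 0.1, gamma \\ 0.55) do
    eta / :math.pow(1 + step, gamma)
  end
end

# Variance at a few steps, using the default eta and gamma listed above.
variances = Enum.map([0, 10, 100, 1000], &NoiseSchedule.variance/1)
```

Under this assumption the noise starts with variance eta at step 0 and decays toward zero as training proceeds, with gamma controlling how quickly.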
Rectified Adam optimizer.
Options
:b1 - first moment decay. Defaults to 0.9
:b2 - second moment decay. Defaults to 0.999
:eps - numerical stability term. Defaults to 1.0e-8
:eps_root - numerical stability term. Defaults to 0.0
:threshold - threshold term. Defaults to 5.0
RMSProp optimizer.
Options
:centered - whether to scale by centered root of EMA of squares. Defaults to false
:momentum - momentum term. If set, uses SGD with momentum and decay set to value of this term.
:nesterov - whether or not to use Nesterov momentum. Defaults to false
:initial_scale - initial value of EMA. Defaults to 0.0
:decay - EMA decay rate. Defaults to 0.9
:eps - numerical stability term. Defaults to 1.0e-8
SGD optimizer.
Options
:momentum - momentum term. If set, uses SGD with momentum and decay set to value of this term.
:nesterov - whether or not to use Nesterov momentum. Defaults to false
Yogi optimizer.
Options
:initial_accumulator_value - initial value for first and second moment. Defaults to 0.0
:b1 - first moment decay. Defaults to 0.9
:b2 - second moment decay. Defaults to 0.999
:eps - numerical stability term. Defaults to 1.0e-8
:eps_root - numerical stability term. Defaults to 0.0