Polaris.Optimizers (polaris v0.1.0)

Implementations of common gradient-based optimization algorithms.

All of the functions in this module are written in terms of the update functions defined in Polaris.Updates. Polaris treats an optimizer as the tuple:

{init_fn, update_fn}

where init_fn returns an initial optimizer state and update_fn scales input gradients. init_fn accepts a model's parameters and attaches state to each parameter. update_fn accepts gradients, optimizer state, and current model parameters, and returns the scaled updates together with the updated optimizer state.
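
For example, the two functions can be driven by hand (a minimal sketch, assuming the sgd constructor; the parameter and gradient values are illustrative):

params = %{"w" => Nx.tensor([1.0, 2.0, 3.0])}
gradients = %{"w" => Nx.tensor([0.1, 0.1, 0.1])}

{init_fn, update_fn} = Polaris.Optimizers.sgd(learning_rate: 0.01)
optimizer_state = init_fn.(params)
{updates, new_optimizer_state} = update_fn.(gradients, optimizer_state, params)
new_params = Polaris.Updates.apply_updates(params, updates)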

Custom optimizers are often created via the Polaris.Updates API.
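
For instance, an Adam-like optimizer can be assembled by piping update combinators together (a sketch, assuming the scale_by_adam and scale combinators in Polaris.Updates; the scale is negative because apply_updates adds the updates to the parameters):

{init_fn, update_fn} =
  Polaris.Updates.scale_by_adam()
  |> Polaris.Updates.scale(-1.0e-3)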

Example

Consider the following usage of the Adam optimizer in a basic update function (assuming objective and the dataset are defined elsewhere):

defmodule Learning do

  import Nx.Defn

  defn init(params, init_fn) do
    init_fn.(params)
  end

  defn update(params, optimizer_state, inputs, targets, update_fn) do
    # Compute the loss and its gradient with respect to the current parameters
    {loss, gradient} = value_and_grad(params, &objective(&1, inputs, targets))
    # Scale the raw gradient into an update and advance the optimizer state
    {scaled_updates, new_optimizer_state} = update_fn.(gradient, optimizer_state, params)
    {Polaris.Updates.apply_updates(params, scaled_updates), new_optimizer_state, loss}
  end
end

key = Nx.Random.key(42)
{model_params, _key} = Nx.Random.uniform(key, shape: {784, 10})
{init_fn, update_fn} = Polaris.Optimizers.adam(learning_rate: 0.005)

optimizer_state =
  Learning.init(model_params, init_fn)

{new_params, new_optimizer_state, loss} =
  Learning.update(model_params, optimizer_state, inputs, targets, update_fn)
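
From here, a full training pass just threads the parameters and optimizer state through repeated updates (a sketch, assuming dataset is an enumerable of {inputs, targets} batches):

{trained_params, _final_optimizer_state} =
  Enum.reduce(dataset, {model_params, optimizer_state}, fn {inputs, targets}, {params, state} ->
    {params, state, _loss} = Learning.update(params, state, inputs, targets, update_fn)
    {params, state}
  end)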

Summary

Functions

adabelief(opts \\ [])
  Adabelief optimizer.

adagrad(opts \\ [])
  Adagrad optimizer.

adam(opts \\ [])
  Adam optimizer.

adamw(opts \\ [])
  Adam with weight decay optimizer.

lamb(opts \\ [])
  Lamb optimizer.

noisy_sgd(opts \\ [])
  Noisy SGD optimizer.

radam(opts \\ [])
  Rectified Adam optimizer.

rmsprop(opts \\ [])
  RMSProp optimizer.

sgd(opts \\ [])
  SGD optimizer.

yogi(opts \\ [])
  Yogi optimizer.

Functions

adabelief(opts \\ [])

Adabelief optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-3
  • :b1 - first moment decay. Defaults to 0.9
  • :b2 - second moment decay. Defaults to 0.999
  • :eps - numerical stability term. Defaults to 0.0
  • :eps_root - numerical stability term. Defaults to 1.0e-16

References

  • AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients (https://arxiv.org/abs/2010.07468)
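
Any of these constructors slots into the Example above; for AdaBelief (a sketch, assuming the adabelief constructor):

{init_fn, update_fn} = Polaris.Optimizers.adabelief(learning_rate: 1.0e-3)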

adagrad(opts \\ [])

Adagrad optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-3
  • :eps - numerical stability term. Defaults to 1.0e-7

References

  • Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (https://jmlr.org/papers/v12/duchi11a.html)
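
A construction sketch (assuming the adagrad constructor; the eps override is illustrative):

{init_fn, update_fn} = Polaris.Optimizers.adagrad(learning_rate: 1.0e-2, eps: 1.0e-8)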

adam(opts \\ [])

Adam optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-3
  • :b1 - first moment decay. Defaults to 0.9
  • :b2 - second moment decay. Defaults to 0.999
  • :eps - numerical stability term. Defaults to 1.0e-8
  • :eps_root - numerical stability term. Defaults to 1.0e-15

References

  • Adam: A Method for Stochastic Optimization (https://arxiv.org/abs/1412.6980)
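
Since every option above has a default, the following two calls are equivalent (the adam constructor is the one used in the Example):

{init_fn, update_fn} = Polaris.Optimizers.adam()
{init_fn, update_fn} = Polaris.Optimizers.adam(learning_rate: 1.0e-3, b1: 0.9, b2: 0.999, eps: 1.0e-8, eps_root: 1.0e-15)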

adamw(opts \\ [])

Adam with weight decay optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-3
  • :b1 - first moment decay. Defaults to 0.9
  • :b2 - second moment decay. Defaults to 0.999
  • :eps - numerical stability term. Defaults to 1.0e-8
  • :eps_root - numerical stability term. Defaults to 0.0
  • :decay - weight decay. Defaults to 0.0
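
The :decay option is what distinguishes this from plain Adam; a construction sketch (assuming the adamw constructor, with an illustrative decay value):

{init_fn, update_fn} = Polaris.Optimizers.adamw(learning_rate: 1.0e-3, decay: 0.01)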

lamb(opts \\ [])

Lamb optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-2
  • :b1 - first moment decay. Defaults to 0.9
  • :b2 - second moment decay. Defaults to 0.999
  • :eps - numerical stability term. Defaults to 1.0e-8
  • :eps_root - numerical stability term. Defaults to 0.0
  • :decay - weight decay. Defaults to 0.0
  • :min_norm - minimum norm value. Defaults to 0.0

References

  • Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (https://arxiv.org/abs/1904.00962)
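
A construction sketch (assuming the lamb constructor; the decay and min_norm values are illustrative):

{init_fn, update_fn} = Polaris.Optimizers.lamb(learning_rate: 1.0e-2, decay: 0.01, min_norm: 1.0e-6)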

noisy_sgd(opts \\ [])

Noisy SGD optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-2
  • :eta - used to compute variance of noise distribution. Defaults to 0.1
  • :gamma - used to compute variance of noise distribution. Defaults to 0.55
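
In the gradient-noise scheme these options come from, :eta and :gamma set a decaying noise variance of roughly eta / (1 + step)^gamma. A construction sketch (assuming the noisy_sgd constructor):

{init_fn, update_fn} = Polaris.Optimizers.noisy_sgd(learning_rate: 1.0e-2, eta: 0.1, gamma: 0.55)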

radam(opts \\ [])

Rectified Adam optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-3
  • :b1 - first moment decay. Defaults to 0.9
  • :b2 - second moment decay. Defaults to 0.999
  • :eps - numerical stability term. Defaults to 1.0e-8
  • :eps_root - numerical stability term. Defaults to 0.0
  • :threshold - threshold term. Defaults to 5.0

References

  • On the Variance of the Adaptive Learning Rate and Beyond (https://arxiv.org/abs/1908.03265)
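
A construction sketch (assuming the radam constructor; :threshold bounds the variance-rectification term):

{init_fn, update_fn} = Polaris.Optimizers.radam(learning_rate: 1.0e-3, threshold: 5.0)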

rmsprop(opts \\ [])

RMSProp optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-2
  • :centered - whether to scale by the centered root of the EMA of squares. Defaults to false
  • :momentum - momentum term. If set, uses SGD with momentum, with the decay set to the value of this term.
  • :nesterov - whether to use Nesterov momentum. Defaults to false
  • :initial_scale - initial value of EMA. Defaults to 0.0
  • :decay - EMA decay rate. Defaults to 0.9
  • :eps - numerical stability term. Defaults to 1.0e-8
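
For example, the centered variant with momentum (a sketch, assuming the rmsprop constructor):

{init_fn, update_fn} = Polaris.Optimizers.rmsprop(learning_rate: 1.0e-2, centered: true, momentum: 0.9)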

sgd(opts \\ [])

SGD optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-2
  • :momentum - momentum term. If set, uses SGD with momentum, with the decay set to the value of this term.
  • :nesterov - whether to use Nesterov momentum. Defaults to false
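
Plain SGD versus SGD with Nesterov momentum (a sketch, assuming the sgd constructor):

{init_fn, update_fn} = Polaris.Optimizers.sgd(learning_rate: 1.0e-2)
{init_fn, update_fn} = Polaris.Optimizers.sgd(learning_rate: 1.0e-2, momentum: 0.9, nesterov: true)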

yogi(opts \\ [])

Yogi optimizer.

Options

  • :learning_rate - the learning rate for the optimizer. Defaults to 1.0e-2
  • :initial_accumulator_value - initial value for first and second moment. Defaults to 0.0
  • :b1 - first moment decay. Defaults to 0.9
  • :b2 - second moment decay. Defaults to 0.999
  • :eps - numerical stability term. Defaults to 1.0e-8
  • :eps_root - numerical stability term. Defaults to 0.0

References

  • Adaptive Methods for Nonconvex Optimization (NeurIPS 2018)
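
A construction sketch (assuming the yogi constructor; the nonzero initial accumulator value is illustrative):

{init_fn, update_fn} = Polaris.Optimizers.yogi(learning_rate: 1.0e-2, initial_accumulator_value: 1.0e-6)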