Nasty.Statistics.SequenceLabeling.Optimizer (Nasty v0.3.0)

Gradient-based optimization for CRF training.

Implements gradient descent with momentum and L2 regularization for training linear-chain CRFs.

Optimization Methods

  • SGD with Momentum: Stochastic gradient descent with momentum term
  • AdaGrad: Adaptive learning rates per parameter
  • L-BFGS (simplified): Limited-memory quasi-Newton method

Regularization

L2 regularization (ridge) to prevent overfitting:

loss = -log_likelihood + (λ/2) * ||weights||²
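
As a hedged sketch (not this module's actual code), the objective can be computed over weights stored as a map of feature keys to floats, matching the `weights()` type below; the `RegularizedLoss` module name is illustrative:

```elixir
defmodule RegularizedLoss do
  # Sketch of the objective: loss = -log_likelihood + (λ/2) * ||w||²
  # `weights` is assumed to be a map of feature key => float.
  def compute(neg_log_likelihood, weights, lambda) do
    penalty =
      weights
      |> Map.values()
      |> Enum.reduce(0.0, fn w, acc -> acc + w * w end)

    neg_log_likelihood + lambda / 2.0 * penalty
  end
end
```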

Summary

Functions

add_weights(weights1, weights2)

Adds weight values element-wise.

clip_gradient(gradient, opts \\ [])

Clips gradient values to prevent exploding gradients.

converged?(gradient, prev_loss, curr_loss, opts \\ [])

Checks if optimization has converged.

gradient_norm(gradient)

Computes gradient norm (L2 norm of gradient vector).

initialize_weights(keys, opts \\ [])

Initializes weights with small random values.

learning_rate_schedule(initial_lr, iteration, opts \\ [])

Computes the decayed learning rate for a given iteration.

new(opts \\ [])

Creates a new optimizer with specified configuration.

regularization_gradient(weights, lambda)

Computes the gradient of the L2 regularization term.

regularize_weights(weights, lambda)

Computes the L2 regularization penalty for weights.

scale_weights(weights, scale)

Scales all weight values by a constant.

step(weights, gradient, state)

Performs one optimization step.

Types

gradient()

@type gradient() :: map()

optimizer_state()

@type optimizer_state() :: %{
  method: atom(),
  learning_rate: float(),
  momentum: float(),
  regularization: float(),
  velocity: map(),
  iteration: non_neg_integer()
}

weights()

@type weights() :: map()

Functions

add_weights(weights1, weights2)

@spec add_weights(weights(), weights()) :: weights()

Adds weight values element-wise.

Used for accumulating gradients.
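
A minimal sketch of element-wise addition over weight maps, assuming keys missing from one map count as zero (the `WeightOps` module name is illustrative, not part of this library):

```elixir
defmodule WeightOps do
  # Element-wise sum over the union of keys; a key present in only
  # one map keeps its value (i.e., the missing entry counts as 0.0).
  def add(w1, w2), do: Map.merge(w1, w2, fn _k, a, b -> a + b end)
end
```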

clip_gradient(gradient, opts \\ [])

@spec clip_gradient(
  gradient(),
  keyword()
) :: gradient()

Clips gradient values to prevent exploding gradients.

Options

  • :max_norm - Maximum gradient norm (default: 5.0)
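
Norm-based clipping can be sketched as follows; this is an illustrative reimplementation under the assumption that gradients are maps of feature key => float, not the module's actual source:

```elixir
defmodule ClipSketch do
  # If the gradient's L2 norm exceeds max_norm, rescale every entry
  # so the clipped gradient has norm exactly max_norm.
  def clip(gradient, max_norm \\ 5.0) do
    norm =
      gradient
      |> Map.values()
      |> Enum.reduce(0.0, fn g, acc -> acc + g * g end)
      |> :math.sqrt()

    if norm > max_norm do
      scale = max_norm / norm
      Map.new(gradient, fn {k, g} -> {k, g * scale} end)
    else
      gradient
    end
  end
end
```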

converged?(gradient, prev_loss, curr_loss, opts \\ [])

@spec converged?(gradient(), float(), float(), keyword()) :: boolean()

Checks if optimization has converged.

Convergence Criteria

  • Gradient norm < threshold
  • Relative improvement < threshold
  • Maximum iterations reached
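
The first two criteria might be combined as in this hedged sketch (the real function takes the gradient itself and reads thresholds from `opts`; here a precomputed norm and a single tolerance are assumed for brevity):

```elixir
defmodule ConvergenceSketch do
  # Converged when the gradient norm is tiny or the loss has
  # stopped improving in relative terms.
  def converged?(grad_norm, prev_loss, curr_loss, tol \\ 1.0e-4) do
    rel_improvement = abs(prev_loss - curr_loss) / max(abs(prev_loss), 1.0e-10)
    grad_norm < tol or rel_improvement < tol
  end
end
```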

gradient_norm(gradient)

@spec gradient_norm(gradient()) :: float()

Computes gradient norm (L2 norm of gradient vector).

Used for convergence checking.

initialize_weights(keys, opts \\ [])

@spec initialize_weights(
  [term()],
  keyword()
) :: weights()

Initializes weights with small random values.

Helps break symmetry and improve convergence.
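
An illustrative way to do this (the `scale` parameter and its default are assumptions, not the module's documented option):

```elixir
defmodule InitSketch do
  # Draw each weight uniformly from [-scale, scale] so no two
  # features start identical (breaks symmetry).
  def initialize(keys, scale \\ 0.01) do
    Map.new(keys, fn k -> {k, (2.0 * :rand.uniform() - 1.0) * scale} end)
  end
end
```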

learning_rate_schedule(initial_lr, iteration, opts \\ [])

@spec learning_rate_schedule(float(), non_neg_integer(), keyword()) :: float()

Computes learning rate decay.

Schedules

  • :constant - No decay
  • :step - Decay by factor every N steps
  • :exponential - Exponential decay
  • :inverse - 1 / (1 + decay * iteration)
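
The four schedules can be sketched as below; the option names (`:decay`, `:every`, `:factor`) and their defaults are illustrative assumptions, not this function's documented interface:

```elixir
defmodule LRSketch do
  def schedule(lr, _iter, :constant, _opts), do: lr

  # 1 / (1 + decay * iteration) scaling
  def schedule(lr, iter, :inverse, opts),
    do: lr / (1.0 + Keyword.get(opts, :decay, 0.01) * iter)

  # Multiply by decay^iteration
  def schedule(lr, iter, :exponential, opts),
    do: lr * :math.pow(Keyword.get(opts, :decay, 0.99), iter)

  # Multiply by `factor` once every `every` iterations
  def schedule(lr, iter, :step, opts) do
    every = Keyword.get(opts, :every, 100)
    factor = Keyword.get(opts, :factor, 0.5)
    lr * :math.pow(factor, div(iter, every))
  end
end
```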

new(opts \\ [])

@spec new(keyword()) :: optimizer_state()

Creates a new optimizer with specified configuration.

Options

  • :method - Optimization method (:sgd, :momentum, :adagrad) (default: :momentum)
  • :learning_rate - Initial learning rate (default: 0.1)
  • :momentum - Momentum coefficient (default: 0.9)
  • :regularization - L2 regularization strength (default: 1.0)

regularization_gradient(weights, lambda)

@spec regularization_gradient(weights(), float()) :: gradient()

Computes the gradient of the L2 regularization term.

Gradient of the penalty (λ/2) * ||w||² with respect to w: λ * w
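
Since the penalty's gradient is λ·w per weight, the computation reduces to a scale over the weight map; an illustrative sketch:

```elixir
defmodule RegGradSketch do
  # Per-key gradient of the L2 penalty: d/dw (λ/2)·w² = λ·w
  def regularization_gradient(weights, lambda) do
    Map.new(weights, fn {k, w} -> {k, lambda * w} end)
  end
end
```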

regularize_weights(weights, lambda)

@spec regularize_weights(weights(), float()) :: float()

Computes the L2 regularization penalty for weights.

Returns the penalty term: (λ/2) * ||w||²

scale_weights(weights, scale)

@spec scale_weights(weights(), float()) :: weights()

Scales all weight values by a constant.

step(weights, gradient, state)

@spec step(weights(), gradient(), optimizer_state()) :: {weights(), optimizer_state()}

Performs one optimization step.

Updates weights based on computed gradient.

Parameters

  • weights - Current model weights
  • gradient - Gradient of loss function
  • state - Optimizer state

Returns

{updated_weights, updated_state}
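
For the default `:momentum` method, one step can be sketched as the classical update v ← μ·v − η·g, w ← w + v, applied per key; this is a hedged reimplementation under the documented `optimizer_state()` shape, not the module's actual code:

```elixir
defmodule StepSketch do
  # Momentum update: velocity accumulates a decayed history of
  # gradients; weights move along the velocity. Keys absent from
  # the velocity map start at 0.0.
  def step(weights, gradient, %{learning_rate: lr, momentum: mu, velocity: vel} = state) do
    new_vel =
      Map.new(gradient, fn {k, g} ->
        {k, mu * Map.get(vel, k, 0.0) - lr * g}
      end)

    new_weights =
      Map.new(weights, fn {k, w} ->
        {k, w + Map.get(new_vel, k, 0.0)}
      end)

    {new_weights, %{state | velocity: new_vel, iteration: state.iteration + 1}}
  end
end
```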