Nasty.Statistics.SequenceLabeling.Optimizer (Nasty v0.3.0)

Gradient-based optimization for CRF training.

Implements gradient descent with momentum and L2 regularization for training linear-chain CRFs.

Optimization Methods

  • SGD with Momentum: Stochastic gradient descent with momentum term
  • AdaGrad: Adaptive learning rates per parameter
  • L-BFGS (simplified): Limited-memory quasi-Newton method

Regularization

L2 regularization (ridge) to prevent overfitting:

loss = -log_likelihood + (λ/2) * ||weights||²
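
As a hedged sketch (not this module's actual code), the objective can be computed over weights stored as a map of feature keys to floats, matching the `weights()` type below; the `RegularizedLoss` module name is illustrative:

```elixir
defmodule RegularizedLoss do
  # Sketch of the objective: loss = -log_likelihood + (λ/2) * ||w||²
  # `weights` is assumed to be a map of feature key => float.
  def compute(neg_log_likelihood, weights, lambda) do
    penalty =
      weights
      |> Map.values()
      |> Enum.reduce(0.0, fn w, acc -> acc + w * w end)

    neg_log_likelihood + lambda / 2.0 * penalty
  end
end
```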

Summary

Functions

add_weights(weights1, weights2)

Adds weight values element-wise.

clip_gradient(gradient, opts \\ [])

Clips gradient values to prevent exploding gradients.

converged?(gradient, prev_loss, curr_loss, opts \\ [])

Checks if optimization has converged.

gradient_norm(gradient)

Computes gradient norm (L2 norm of gradient vector).

initialize_weights(keys, opts \\ [])

Initializes weights with small random values.

learning_rate_schedule(initial_lr, iteration, opts \\ [])

Computes the decayed learning rate for a given iteration.

new(opts \\ [])

Creates a new optimizer with specified configuration.

regularization_gradient(weights, lambda)

Computes the gradient of the L2 regularization term.

regularize_weights(weights, lambda)

Computes the L2 regularization penalty for weights.

scale_weights(weights, scale)

Scales all weight values by a constant.

step(weights, gradient, state)

Performs one optimization step.

Types

gradient()

@type gradient() :: map()

optimizer_state()

@type optimizer_state() :: %{
  method: atom(),
  learning_rate: float(),
  momentum: float(),
  regularization: float(),
  velocity: map(),
  iteration: non_neg_integer()
}

weights()

@type weights() :: map()

Functions

add_weights(weights1, weights2)

@spec add_weights(weights(), weights()) :: weights()

Adds weight values element-wise.

Used for accumulating gradients.
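
A minimal sketch of element-wise addition over weight maps, assuming keys missing from one map count as zero (the `WeightOps` module name is illustrative, not part of this library):

```elixir
defmodule WeightOps do
  # Element-wise sum over the union of keys; a key present in only
  # one map keeps its value (i.e., the missing entry counts as 0.0).
  def add(w1, w2), do: Map.merge(w1, w2, fn _k, a, b -> a + b end)
end
```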

clip_gradient(gradient, opts \\ [])

@spec clip_gradient(
  gradient(),
  keyword()
) :: gradient()

Clips gradient values to prevent exploding gradients.

Options

  • :max_norm - Maximum gradient norm (default: 5.0)
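
Norm-based clipping can be sketched as follows; this is an illustrative reimplementation under the assumption that gradients are maps of feature key => float, not the module's actual source:

```elixir
defmodule ClipSketch do
  # If the gradient's L2 norm exceeds max_norm, rescale every entry
  # so the clipped gradient has norm exactly max_norm.
  def clip(gradient, max_norm \\ 5.0) do
    norm =
      gradient
      |> Map.values()
      |> Enum.reduce(0.0, fn g, acc -> acc + g * g end)
      |> :math.sqrt()

    if norm > max_norm do
      scale = max_norm / norm
      Map.new(gradient, fn {k, g} -> {k, g * scale} end)
    else
      gradient
    end
  end
end
```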

converged?(gradient, prev_loss, curr_loss, opts \\ [])

@spec converged?(gradient(), float(), float(), keyword()) :: boolean()

Checks if optimization has converged.

Convergence Criteria

  • Gradient norm < threshold
  • Relative improvement < threshold
  • Maximum iterations reached
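
The first two criteria might be combined as in this hedged sketch (the real function takes the gradient itself and reads thresholds from `opts`; here a precomputed norm and a single tolerance are assumed for brevity):

```elixir
defmodule ConvergenceSketch do
  # Converged when the gradient norm is tiny or the loss has
  # stopped improving in relative terms.
  def converged?(grad_norm, prev_loss, curr_loss, tol \\ 1.0e-4) do
    rel_improvement = abs(prev_loss - curr_loss) / max(abs(prev_loss), 1.0e-10)
    grad_norm < tol or rel_improvement < tol
  end
end
```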

gradient_norm(gradient)

@spec gradient_norm(gradient()) :: float()

Computes gradient norm (L2 norm of gradient vector).

Used for convergence checking.

initialize_weights(keys, opts \\ [])

@spec initialize_weights(
  [term()],
  keyword()
) :: weights()

Initializes weights with small random values.

Helps break symmetry and improve convergence.
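
An illustrative way to do this (the `scale` parameter and its default are assumptions, not the module's documented option):

```elixir
defmodule InitSketch do
  # Draw each weight uniformly from [-scale, scale] so no two
  # features start identical (breaks symmetry).
  def initialize(keys, scale \\ 0.01) do
    Map.new(keys, fn k -> {k, (2.0 * :rand.uniform() - 1.0) * scale} end)
  end
end
```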

learning_rate_schedule(initial_lr, iteration, opts \\ [])

@spec learning_rate_schedule(float(), non_neg_integer(), keyword()) :: float()

Computes learning rate decay.

Schedules

  • :constant - No decay
  • :step - Decay by factor every N steps
  • :exponential - Exponential decay
  • :inverse - 1 / (1 + decay * iteration)
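
The four schedules can be sketched as below; the option names (`:decay`, `:every`, `:factor`) and their defaults are illustrative assumptions, not this function's documented interface:

```elixir
defmodule LRSketch do
  def schedule(lr, _iter, :constant, _opts), do: lr

  # 1 / (1 + decay * iteration) scaling
  def schedule(lr, iter, :inverse, opts),
    do: lr / (1.0 + Keyword.get(opts, :decay, 0.01) * iter)

  # Multiply by decay^iteration
  def schedule(lr, iter, :exponential, opts),
    do: lr * :math.pow(Keyword.get(opts, :decay, 0.99), iter)

  # Multiply by `factor` once every `every` iterations
  def schedule(lr, iter, :step, opts) do
    every = Keyword.get(opts, :every, 100)
    factor = Keyword.get(opts, :factor, 0.5)
    lr * :math.pow(factor, div(iter, every))
  end
end
```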

new(opts \\ [])

@spec new(keyword()) :: optimizer_state()

Creates a new optimizer with specified configuration.

Options

  • :method - Optimization method (:sgd, :momentum, :adagrad) (default: :momentum)
  • :learning_rate - Initial learning rate (default: 0.1)
  • :momentum - Momentum coefficient (default: 0.9)
  • :regularization - L2 regularization strength (default: 1.0)

regularization_gradient(weights, lambda)

@spec regularization_gradient(weights(), float()) :: gradient()

Computes the gradient of the L2 regularization term.

Gradient of the penalty (λ/2) * ||w||² with respect to w: λ * w
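
Since the penalty's gradient is λ·w per weight, the computation reduces to a scale over the weight map; an illustrative sketch:

```elixir
defmodule RegGradSketch do
  # Per-key gradient of the L2 penalty: d/dw (λ/2)·w² = λ·w
  def regularization_gradient(weights, lambda) do
    Map.new(weights, fn {k, w} -> {k, lambda * w} end)
  end
end
```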

regularize_weights(weights, lambda)

@spec regularize_weights(weights(), float()) :: float()

Computes the L2 regularization penalty for weights.

Returns the penalty term: (λ/2) * ||w||²

scale_weights(weights, scale)

@spec scale_weights(weights(), float()) :: weights()

Scales all weight values by a constant.

step(weights, gradient, state)

@spec step(weights(), gradient(), optimizer_state()) :: {weights(), optimizer_state()}

Performs one optimization step.

Updates weights based on computed gradient.

Parameters

  • weights - Current model weights
  • gradient - Gradient of loss function
  • state - Optimizer state

Returns

{updated_weights, updated_state}
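
For the default `:momentum` method, one step can be sketched as the classical update v ← μ·v − η·g, w ← w + v, applied per key; this is a hedged reimplementation under the documented `optimizer_state()` shape, not the module's actual code:

```elixir
defmodule StepSketch do
  # Momentum update: velocity accumulates a decayed history of
  # gradients; weights move along the velocity. Keys absent from
  # the velocity map start at 0.0.
  def step(weights, gradient, %{learning_rate: lr, momentum: mu, velocity: vel} = state) do
    new_vel =
      Map.new(gradient, fn {k, g} ->
        {k, mu * Map.get(vel, k, 0.0) - lr * g}
      end)

    new_weights =
      Map.new(weights, fn {k, w} ->
        {k, w + Map.get(new_vel, k, 0.0)}
      end)

    {new_weights, %{state | velocity: new_vel, iteration: state.iteration + 1}}
  end
end
```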