Nasty.Statistics.SequenceLabeling.Optimizer (Nasty v0.3.0)
Gradient-based optimization for CRF training.
Implements gradient descent with momentum and L2 regularization for training linear-chain CRFs.
Optimization Methods
- SGD with Momentum: Stochastic gradient descent with a momentum term
- AdaGrad: Adaptive learning rates per parameter
- L-BFGS (simplified): Limited-memory quasi-Newton method
Regularization
L2 regularization (ridge) to prevent overfitting:
loss = -log_likelihood + (λ/2) * ||weights||²
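As a rough illustration, the regularized objective could be computed as below, assuming weights are stored as a map from feature keys to floats (the concrete weights() type is not shown on this page):

```elixir
# Illustrative only: combine the negative log-likelihood with the L2 penalty.
# `neg_log_likelihood` is assumed to come from the CRF's forward pass.
regularized_loss = fn neg_log_likelihood, weights, lambda ->
  sum_sq =
    weights
    |> Map.values()
    |> Enum.reduce(0.0, fn w, acc -> acc + w * w end)

  neg_log_likelihood + lambda / 2 * sum_sq
end

regularized_loss.(12.4, %{"f1" => 0.5, "f2" => -0.25}, 1.0)
```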
Summary
Functions
- Adds weight values element-wise.
- Clips gradient values to prevent exploding gradients.
- Checks if optimization has converged.
- Computes gradient norm (L2 norm of gradient vector).
- Initializes weights with small random values.
- Computes learning rate decay.
- Creates a new optimizer with specified configuration.
- Applies L2 regularization gradient.
- Applies L2 regularization to weights.
- Scales all weight values by a constant.
- Performs one optimization step.
Functions
Adds weight values element-wise.
Used for accumulating gradients.
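A minimal sketch of what element-wise addition over map-based weights could look like (hypothetical; the module's actual weights() representation isn't shown here):

```elixir
# Merge two weight maps, summing values for shared keys; keys present in
# only one map are kept as-is (i.e. treated as 0.0 in the other).
add_weights = fn w1, w2 ->
  Map.merge(w1, w2, fn _key, a, b -> a + b end)
end

add_weights.(%{"f1" => 0.5, "f2" => -0.25}, %{"f2" => 0.5, "f3" => 1.0})
# => %{"f1" => 0.5, "f2" => 0.25, "f3" => 1.0}
```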
Clips gradient values to prevent exploding gradients.
Options
- :max_norm - Maximum gradient norm (default: 5.0)
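A sketch of clipping by global L2 norm under the :max_norm option above, assuming map-based gradients:

```elixir
# If the gradient's L2 norm exceeds max_norm, rescale it so the norm equals
# max_norm; otherwise return it unchanged.
clip_gradient = fn gradient, max_norm ->
  norm =
    gradient
    |> Map.values()
    |> Enum.reduce(0.0, fn g, acc -> acc + g * g end)
    |> :math.sqrt()

  if norm > max_norm do
    scale = max_norm / norm
    Map.new(gradient, fn {k, g} -> {k, g * scale} end)
  else
    gradient
  end
end

clip_gradient.(%{"f1" => 30.0, "f2" => -40.0}, 5.0)
# original norm is 50.0; the result is rescaled to have norm 5.0
```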
Checks if optimization has converged.
Convergence Criteria
- Gradient norm < threshold
- Relative improvement < threshold
- Maximum iterations reached
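A sketch of how these criteria might be combined; the threshold names and defaults here are illustrative, not the module's actual options:

```elixir
# Converged when any single criterion is satisfied.
converged? = fn grad_norm, prev_loss, curr_loss, iteration, opts ->
  tolerance = Keyword.get(opts, :tolerance, 1.0e-4)
  max_iterations = Keyword.get(opts, :max_iterations, 100)

  relative_improvement = abs(prev_loss - curr_loss) / max(abs(prev_loss), 1.0e-10)

  grad_norm < tolerance or relative_improvement < tolerance or iteration >= max_iterations
end

converged?.(3.0e-5, 120.0, 119.999, 42, [])
# => true (gradient norm is below the tolerance)
```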
Computes gradient norm (L2 norm of gradient vector).
Used for convergence checking.
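The norm itself is straightforward to compute over a map-based gradient (assumed representation):

```elixir
# Square root of the sum of squared gradient components.
gradient_norm = fn gradient ->
  gradient
  |> Map.values()
  |> Enum.reduce(0.0, fn g, acc -> acc + g * g end)
  |> :math.sqrt()
end

gradient_norm.(%{"f1" => 3.0, "f2" => -4.0})
# => 5.0
```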
Initializes weights with small random values.
Helps break symmetry and improve convergence.
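A sketch of symmetric small-value initialization, assuming the feature keys are known up front (hypothetical helper; the module's actual signature isn't shown):

```elixir
# Draw each initial weight uniformly from [-scale, scale).
init_weights = fn feature_keys, scale ->
  Map.new(feature_keys, fn key ->
    {key, (:rand.uniform() - 0.5) * 2 * scale}
  end)
end

init_weights.(["bigram:the", "shape:Xx", "suffix:ing"], 0.01)
```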
@spec learning_rate_schedule(float(), non_neg_integer(), keyword()) :: float()
Computes learning rate decay.
Schedules
- :constant - No decay
- :step - Decay by factor every N steps
- :exponential - Exponential decay
- :inverse - 1 / (1 + decay * iteration)
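A sketch of the four schedules; the decay factor 0.01 and step size 10 are illustrative, not the module's defaults:

```elixir
# Each clause maps (schedule, initial learning rate, iteration) to the
# decayed learning rate.
schedule = fn
  :constant, lr, _iteration -> lr
  :step, lr, iteration -> lr * :math.pow(0.5, div(iteration, 10))
  :exponential, lr, iteration -> lr * :math.exp(-0.01 * iteration)
  :inverse, lr, iteration -> lr / (1.0 + 0.01 * iteration)
end

schedule.(:inverse, 0.1, 100)
# => 0.1 / 2.0 = 0.05
```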
@spec new(keyword()) :: optimizer_state()
Creates a new optimizer with specified configuration.
Options
- :method - Optimization method (:sgd, :momentum, :adagrad) (default: :momentum)
- :learning_rate - Initial learning rate (default: 0.1)
- :momentum - Momentum coefficient (default: 0.9)
- :regularization - L2 regularization strength (default: 1.0)
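For example, per the options above (the returned optimizer_state() is treated as opaque here):

```elixir
alias Nasty.Statistics.SequenceLabeling.Optimizer

# Momentum SGD with the documented defaults spelled out explicitly.
state =
  Optimizer.new(
    method: :momentum,
    learning_rate: 0.1,
    momentum: 0.9,
    regularization: 1.0
  )
```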
Applies L2 regularization gradient.
Gradient of regularization term: λ * w
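In map form (assumed representation), this contribution is simply λ times each weight:

```elixir
# Gradient of (λ/2) * ||w||² with respect to each weight is λ * w.
regularization_gradient = fn weights, lambda ->
  Map.new(weights, fn {key, w} -> {key, lambda * w} end)
end

regularization_gradient.(%{"f1" => 0.5, "f2" => -0.25}, 2.0)
# => %{"f1" => 1.0, "f2" => -0.5}
```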
Applies L2 regularization to weights.
Adds penalty term: λ/2 * ||w||²
Scales all weight values by a constant.
@spec step(weights(), gradient(), optimizer_state()) :: {weights(), optimizer_state()}
Performs one optimization step.
Updates weights based on computed gradient.
Parameters
- weights - Current model weights
- gradient - Gradient of loss function
- state - Optimizer state
Returns
{updated_weights, updated_state}
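Putting new/1 and step/3 together, a training loop might look like the sketch below. The weight map and dummy_gradient function are stand-ins; in real use the gradient would come from the CRF's forward-backward computation.

```elixir
alias Nasty.Statistics.SequenceLabeling.Optimizer

# Toy gradient corresponding to the objective 0.5 * (w - 1.0)² per weight;
# it stands in for the real CRF log-likelihood gradient.
dummy_gradient = fn weights ->
  Map.new(weights, fn {key, w} -> {key, w - 1.0} end)
end

state = Optimizer.new(method: :momentum, learning_rate: 0.1)
weights = %{"feature_a" => 0.0, "feature_b" => 0.0}

{final_weights, _final_state} =
  Enum.reduce(1..100, {weights, state}, fn _iteration, {w, s} ->
    Optimizer.step(w, dummy_gradient.(w), s)
  end)
```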