Nasty.Statistics.Neural.Transformers.FineTuner (Nasty v0.3.0)
Fine-tuning pipeline for pre-trained transformer models.
Supports fine-tuning on:
- Part-of-speech tagging datasets
- Named entity recognition datasets
- Custom token classification tasks
Uses AdamW optimizer with linear warmup and weight decay.
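The linear-warmup schedule mentioned above can be sketched as a plain function of the training step: the learning rate ramps up linearly over the warmup phase, then decays linearly to zero. This is an illustrative sketch (the module name `WarmupSchedule` and function `lr/4` are hypothetical, not part of this library's API):

```elixir
defmodule WarmupSchedule do
  # Learning rate at `step`, with linear warmup over the first
  # `warmup_ratio * total_steps` steps, then linear decay to zero.
  def lr(step, total_steps, base_lr, warmup_ratio \\ 0.1) do
    warmup_steps = max(round(warmup_ratio * total_steps), 1)

    if step < warmup_steps do
      base_lr * step / warmup_steps
    else
      base_lr * max(total_steps - step, 0) / max(total_steps - warmup_steps, 1)
    end
  end
end

WarmupSchedule.lr(50, 1000, 3.0e-5)    # halfway through warmup => 1.5e-5
WarmupSchedule.lr(100, 1000, 3.0e-5)   # warmup complete => 3.0e-5
WarmupSchedule.lr(1000, 1000, 3.0e-5)  # fully decayed => 0.0
```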
Summary
Functions
Evaluates a fine-tuned model on test data.
Fine-tunes with minimal examples using few-shot learning techniques.
Fine-tunes a pre-trained model on a token classification task.
Types
@type training_example() :: {tokens :: [Nasty.AST.Token.t()], labels :: [integer()]}
Functions
@spec evaluate(map(), [training_example()]) :: {:ok, map()} | {:error, term()}
Evaluates a fine-tuned model on test data.
Returns metrics including accuracy, precision, recall, and F1 score.
Examples
{:ok, metrics} = FineTuner.evaluate(model, test_data)
# => %{
# accuracy: 0.95,
# precision: 0.94,
# recall: 0.93,
# f1_score: 0.935
# }
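The `f1_score` in the metrics map above is the harmonic mean of precision and recall, which can be checked directly (the anonymous `f1` helper below is illustrative, not part of this module):

```elixir
# F1 as the harmonic mean of precision and recall.
f1 = fn %{precision: p, recall: r} -> 2 * p * r / (p + r) end

metrics = %{accuracy: 0.95, precision: 0.94, recall: 0.93}
Float.round(f1.(metrics), 3)
# => 0.935
```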
@spec few_shot_fine_tune(map(), [training_example()], atom(), keyword()) :: {:ok, map()} | {:error, term()}
Fine-tunes with minimal examples using few-shot learning techniques.
Applies data augmentation and longer training to work with small datasets.
Examples
{:ok, model} = FineTuner.few_shot_fine_tune(
base_model,
small_dataset,
:ner,
epochs: 10,
data_augmentation: true
)
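One simple label-preserving augmentation for token classification is "word dropout": randomly removing aligned `{token, label}` pairs to generate extra training variants. The sketch below illustrates the idea only; `FewShotAugment`, `word_dropout/2`, and `augment/3` are hypothetical names, and the library's actual augmentation may differ:

```elixir
defmodule FewShotAugment do
  # Randomly drop aligned {token, label} pairs with probability `p`.
  # Dropping token and label together keeps the example well-formed.
  def word_dropout({tokens, labels}, p \\ 0.1) do
    kept =
      tokens
      |> Enum.zip(labels)
      |> Enum.reject(fn _pair -> :rand.uniform() < p end)

    # Never emit an empty example; fall back to the original.
    kept = if kept == [], do: Enum.zip(tokens, labels), else: kept
    {Enum.map(kept, &elem(&1, 0)), Enum.map(kept, &elem(&1, 1))}
  end

  # Original dataset plus `copies` dropout-perturbed copies of each example.
  def augment(dataset, copies \\ 3, p \\ 0.1) do
    dataset ++ for example <- dataset, _ <- 1..copies, do: word_dropout(example, p)
  end
end
```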
@spec fine_tune(map(), [training_example()], atom(), keyword()) :: {:ok, map()} | {:error, term()}
Fine-tunes a pre-trained model on a token classification task.
Arguments
- base_model - Pre-trained transformer model from Loader
- training_data - List of {tokens, labels} tuples
- task - Classification task (:pos_tagging, :ner)
- opts - Training configuration options
Options
- :epochs - Number of training epochs (default: 3)
- :batch_size - Training batch size (default: 16)
- :learning_rate - Learning rate (default: 3.0e-5)
- :warmup_ratio - Warmup ratio for learning rate scheduler (default: 0.1)
- :weight_decay - Weight decay for AdamW (default: 0.01)
- :max_grad_norm - Gradient clipping threshold (default: 1.0)
- :eval_steps - Evaluate every N steps (default: 500)
- :save_steps - Save checkpoint every N steps (default: 1000)
- :validation_data - Optional validation dataset
- :num_labels - Number of classification labels
- :label_map - Map from label IDs to names
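Defaults like these are conventionally applied in Elixir with Keyword.validate!/2, which fills in missing options and rejects unknown keys. A minimal sketch of that pattern (illustrative; the module's internal option handling may differ):

```elixir
defaults = [
  epochs: 3,
  batch_size: 16,
  learning_rate: 3.0e-5,
  warmup_ratio: 0.1,
  weight_decay: 0.01,
  max_grad_norm: 1.0,
  eval_steps: 500,
  save_steps: 1000,
  validation_data: nil,
  num_labels: nil,
  label_map: nil
]

# Caller-supplied options override the defaults; unknown keys raise.
opts = Keyword.validate!([epochs: 5, num_labels: 17], defaults)
opts[:epochs]      # => 5
opts[:batch_size]  # => 16
```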
Examples
training_data = [
{[token1, token2], [0, 1]},
{[token3, token4], [2, 0]},
...
]
{:ok, finetuned_model} = FineTuner.fine_tune(
base_model,
training_data,
:pos_tagging,
epochs: 3,
num_labels: 17,
label_map: upos_label_map
)
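For the :pos_tagging example above, num_labels: 17 matches the 17 Universal Dependencies UPOS tags, so one plausible `upos_label_map` (ID => name, per the :label_map option) can be built as follows. The tag-to-ID ordering here is alphabetical and illustrative; the model's actual ordering may differ:

```elixir
# The 17 UD UPOS tags, indexed 0..16.
upos_label_map =
  ~w(ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ SYM VERB X)
  |> Enum.with_index()
  |> Map.new(fn {tag, id} -> {id, tag} end)

map_size(upos_label_map)  # => 17
upos_label_map[7]         # => "NOUN"
```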