Nasty.Statistics.Neural.Pretrained (Nasty v0.3.0)

Integration with pre-trained transformer models via Bumblebee.

Provides access to state-of-the-art pre-trained models from HuggingFace for tasks like POS tagging, NER, and text classification.

Supported Models

  • BERT (bert-base-uncased, bert-base-cased)
  • RoBERTa (roberta-base, roberta-large)
  • DistilBERT (distilbert-base-uncased)
  • Custom fine-tuned models

Usage

alias Nasty.Statistics.Neural.Pretrained

# Load a pre-trained BERT model for POS tagging
{:ok, model} = Pretrained.load_model("bert-base-uncased", task: :pos_tagging)

# Fine-tune on your own task-specific data (training_data is a placeholder)
{:ok, fine_tuned} = Pretrained.fine_tune(model, training_data, epochs: 3)

# Predict tags for a list of words (words is a placeholder)
{:ok, tags} = Pretrained.predict(fine_tuned, words)

Note

This module requires downloading models from HuggingFace. Models are cached locally after the first download.

A full implementation requires:

  • Model downloading and caching
  • Tokenization with Bumblebee tokenizers
  • Fine-tuning interface
  • Integration with existing pipeline

Future Enhancements

  • Support for multilingual models (mBERT, XLM-R)
  • Zero-shot classification
  • Model quantization for efficiency
  • Custom model registration

Summary

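Functions

fine_tune(model, training_data, opts \\ [])
Fine-tunes a pre-trained model on task-specific data.

list_models()
Lists available pre-trained models.

load_model(model_name, opts \\ [])
Loads a pre-trained model from Bumblebee/HuggingFace.

predict(model, input, opts \\ [])
Makes predictions using a pre-trained or fine-tuned model.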

Functions

fine_tune(model, training_data, opts \\ [])

@spec fine_tune(map(), list(), keyword()) :: {:ok, map()} | {:error, term()}

Fine-tunes a pre-trained model on task-specific data.

Parameters

  • model - Pre-trained model, as returned by load_model/2
  • training_data - Task-specific training data
  • opts - Fine-tuning options

Options

  • :epochs - Number of epochs (default: 3)
  • :learning_rate - Learning rate (default: 2e-5)
  • :batch_size - Batch size (default: 16)
  • :warmup_ratio - Fraction of the total training steps used for learning-rate warmup (default: 0.1)
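
These options interact: the number of optimizer steps depends on the dataset size, :batch_size, and :epochs, and :warmup_ratio is a fraction of that total. The sketch below is hypothetical and not part of Nasty. It fills in the documented defaults with Keyword.validate!/2 (Elixir 1.13+) and assumes a linear warmup followed by linear decay, a common schedule for fine-tuning BERT-style models that this module does not confirm it uses.

```elixir
defmodule FineTuneOptsSketch do
  # Hypothetical helper, not part of Nasty: applies the documented
  # fine_tune/3 defaults and derives an assumed learning-rate schedule.
  @defaults [epochs: 3, learning_rate: 2.0e-5, batch_size: 16, warmup_ratio: 0.1]

  # Keyword.validate!/2 raises ArgumentError on unknown keys and
  # fills in defaults for any key the caller omitted.
  def normalize(opts), do: Keyword.validate!(opts, @defaults)

  # One optimizer step per batch; the last partial batch still counts.
  def total_steps(num_examples, opts) do
    opts = normalize(opts)
    batch = opts[:batch_size]
    steps_per_epoch = div(num_examples + batch - 1, batch)
    steps_per_epoch * opts[:epochs]
  end

  # Assumed schedule: ramp linearly from 0 to :learning_rate over the
  # first :warmup_ratio fraction of steps, then decay linearly to 0.
  def learning_rate_at(step, total, opts) do
    opts = normalize(opts)
    peak = opts[:learning_rate]
    warmup = max(round(total * opts[:warmup_ratio]), 1)

    if step < warmup do
      peak * step / warmup
    else
      peak * max(total - step, 0) / max(total - warmup, 1)
    end
  end
end
```

With 1,000 training examples and the defaults, this gives 63 steps per epoch, 189 steps in total, and round(0.1 × 189) = 19 warmup steps before the learning rate reaches 2.0e-5.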

Returns

  • {:ok, fine_tuned_model} - Fine-tuned model
  • {:error, reason} - Fine-tuning failed

list_models()

@spec list_models() :: [map()]

Lists available pre-trained models.

Returns

A list of maps, one per available model, each holding the model name and its metadata.
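
Because list_models/0 returns plain maps, the usual Enum functions apply to its result. The sketch below assumes each map has a :name key (the metadata fields are not documented, so this is an assumption) and uses the names listed under Supported Models to group models by architecture family. ModelCatalogSketch is a hypothetical module, not part of Nasty.

```elixir
defmodule ModelCatalogSketch do
  # Hypothetical stand-in for the [map()] returned by list_models/0.
  # The :name key is an assumption; the names come from "Supported Models".
  @catalog [
    %{name: "bert-base-uncased"},
    %{name: "bert-base-cased"},
    %{name: "roberta-base"},
    %{name: "roberta-large"},
    %{name: "distilbert-base-uncased"}
  ]

  def catalog, do: @catalog

  # Groups model names by family, taken here as the prefix before the
  # first "-" (e.g. "roberta-large" -> "roberta"). Enum.group_by/2
  # keeps the original order within each group.
  def by_family(models) do
    models
    |> Enum.map(& &1.name)
    |> Enum.group_by(fn name -> name |> String.split("-") |> hd() end)
  end
end
```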

load_model(model_name, opts \\ [])

@spec load_model(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, term()}

Loads a pre-trained model from Bumblebee/HuggingFace.

Parameters

  • model_name - Model identifier (e.g., "bert-base-uncased")
  • opts - Loading options

Options

  • :task - Task type: :pos_tagging, :ner, :classification
  • :cache_dir - Model cache directory (default: ~/.cache/nasty/models)
  • :device - Device to load on: :cpu or :cuda (default: :cpu)
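
A minimal sketch of how these options could be validated and how a cache location could be resolved, assuming one directory per model name under :cache_dir. LoadOptsSketch and its functions are hypothetical, not Nasty's API; only the option names, accepted values, and defaults come from the list above.

```elixir
defmodule LoadOptsSketch do
  # Hypothetical helper, not part of Nasty: checks the documented
  # load_model/2 options and resolves an assumed cache directory layout.
  @tasks [:pos_tagging, :ner, :classification]
  @devices [:cpu, :cuda]

  def normalize(opts) do
    # :task has no documented default; the others do.
    opts = Keyword.validate!(opts, [:task, cache_dir: "~/.cache/nasty/models", device: :cpu])

    if opts[:task] not in [nil | @tasks] do
      raise ArgumentError, "unsupported :task #{inspect(opts[:task])}"
    end

    if opts[:device] not in @devices do
      raise ArgumentError, "unsupported :device #{inspect(opts[:device])}"
    end

    opts
  end

  # Assumed layout: <cache_dir>/<model_name>, with "~" expanded.
  def cache_path(model_name, opts) do
    opts = normalize(opts)
    Path.join(Path.expand(opts[:cache_dir]), model_name)
  end

  # Models are cached after the first download, so a loader could skip
  # downloading when this directory already exists (assumption).
  def cached?(model_name, opts), do: File.dir?(cache_path(model_name, opts))
end
```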

Returns

  • {:ok, model} - Loaded model
  • {:error, reason} - Loading failed

Examples

alias Nasty.Statistics.Neural.Pretrained

{:ok, model} = Pretrained.load_model("bert-base-uncased", task: :pos_tagging)

predict(model, input, opts \\ [])

@spec predict(map(), term(), keyword()) :: {:ok, term()} | {:error, term()}

Makes predictions using a pre-trained or fine-tuned model.

Parameters

  • model - Model (pre-trained or fine-tuned)
  • input - Input text or tokens
  • opts - Prediction options

Returns

  • {:ok, predictions} - Model predictions
  • {:error, reason} - Prediction failed
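
When input is a list of words, a transformer tagger such as BERT first splits words into subword tokens and predicts one label per subword. A common way to recover one label per word is to keep the label of each word's first subword. The sketch below assumes the tokenizer supplies a word index for every subword (nil for special tokens such as [CLS] and [SEP]); AlignSketch is hypothetical, and this module does not document which alignment strategy it uses.

```elixir
defmodule AlignSketch do
  # Hypothetical post-processing step, not confirmed Nasty behaviour.
  # subword_labels: one predicted label per subword token.
  # word_ids: for each subword, the index of its source word, or nil
  # for special tokens. Subwords of the same word are adjacent, so
  # dropping consecutive repeats keeps each word's first subword.
  def first_subword_labels(subword_labels, word_ids) do
    subword_labels
    |> Enum.zip(word_ids)
    |> Enum.reject(fn {_label, word_id} -> is_nil(word_id) end)
    |> Enum.dedup_by(fn {_label, word_id} -> word_id end)
    |> Enum.map(fn {label, _word_id} -> label end)
  end
end
```

For an illustrative split of two words into [CLS], two subwords, one subword, [SEP], the word ids are [nil, 0, 0, 1, nil], and the result holds exactly two labels, one per input word.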