Nasty.Statistics.Neural.Transformers.ZeroShot (Nasty v0.3.0)


Zero-shot classification using pre-trained models.

Classifies text into arbitrary categories without any task-specific training, by reframing classification as a task for Natural Language Inference (NLI) models trained on MNLI.

How it works

The model treats classification as a textual entailment problem:

  • Premise: the input text
  • Hypothesis: "This text is about {label}"
  • The model predicts the entailment probability for each label
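The steps above can be sketched in plain Elixir. This is an assumed illustration of the mechanism, not the library's actual code: one premise/hypothesis pair is built per candidate label, and the per-label entailment logits (hand-written here, since no model is loaded) are normalized with a softmax to produce scores.

```elixir
template = "This text is about {}"
labels = ["politics", "sports"]
premise = "The match went to penalties."

# One NLI input pair per candidate label:
pairs =
  Enum.map(labels, fn label ->
    {premise, String.replace(template, "{}", label)}
  end)

# Hypothetical entailment logits an NLI model might emit for each pair:
logits = [0.4, 2.1]

# Softmax over the entailment logits gives the per-label scores:
exps = Enum.map(logits, &:math.exp/1)
total = Enum.sum(exps)

scores =
  labels
  |> Enum.zip(Enum.map(exps, &(&1 / total)))
  |> Map.new()
```

The scores sum to 1 in single-label mode, so the highest-scoring label wins.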

Supported Models

Best models for zero-shot classification:

  • :roberta_large_mnli - RoBERTa fine-tuned on MNLI (best accuracy)
  • :bart_large_mnli - BART fine-tuned on MNLI
  • :xlm_roberta_base - Multilingual zero-shot (63 languages)

Examples

# Sentiment analysis
{:ok, result} = ZeroShot.classify("I love this product!",
  candidate_labels: ["positive", "negative", "neutral"]
)
# => %{label: "positive", scores: %{"positive" => 0.95, ...}}

# Topic classification
{:ok, result} = ZeroShot.classify(article_text,
  candidate_labels: ["politics", "sports", "technology", "business"]
)

# Multi-label classification
{:ok, results} = ZeroShot.classify(text,
  candidate_labels: ["urgent", "action_required", "informational"],
  multi_label: true
)

Summary

Functions

  • classify(text, opts) - Classifies text into one of the candidate labels using zero-shot learning.
  • classify_batch(texts, opts) - Classifies multiple texts in batch for efficiency.
  • Gets recommended models for zero-shot classification.

Types

classification_result()

@type classification_result() :: %{
  label: String.t(),
  scores: %{required(String.t()) => float()},
  sequence: String.t()
}

multi_label_result()

@type multi_label_result() :: %{
  labels: [String.t()],
  scores: %{required(String.t()) => float()},
  sequence: String.t()
}
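The two result shapes can be told apart by pattern matching on the `:label` vs. `:labels` key. A minimal sketch over hand-written maps in the shapes the types above describe (the values are illustrative, not real model output):

```elixir
# Hypothetical results in the classification_result() and
# multi_label_result() shapes:
single = %{
  label: "positive",
  scores: %{"positive" => 0.95, "negative" => 0.05},
  sequence: "I love this product!"
}

multi = %{
  labels: ["urgent", "action_required"],
  scores: %{"urgent" => 0.81, "action_required" => 0.62},
  sequence: "Server is down"
}

describe = fn
  %{label: label} -> "single-label: #{label}"
  %{labels: labels} -> "multi-label: #{Enum.join(labels, ", ")}"
end

describe.(single)
# => "single-label: positive"
describe.(multi)
# => "multi-label: urgent, action_required"
```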

Functions

classify(text, opts)

@spec classify(
  String.t(),
  keyword()
) :: {:ok, classification_result() | multi_label_result()} | {:error, term()}

Classifies text into one of the candidate labels using zero-shot learning.

Options

  • :candidate_labels - List of possible labels (required)
  • :model - Model to use (default: :roberta_large_mnli)
  • :multi_label - Allow multiple labels (default: false)
  • :hypothesis_template - Template for hypothesis (default: "This text is about {}")
  • :threshold - Minimum score for multi-label (default: 0.5)

Examples

{:ok, result} = ZeroShot.classify("Python is a programming language",
  candidate_labels: ["technology", "biology", "geography"]
)

{:ok, result} = ZeroShot.classify(text,
  candidate_labels: ["urgent", "normal"],
  hypothesis_template: "This message is {}"
)
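The interaction between `:multi_label` and `:threshold` can be illustrated in isolation. This filters a hypothetical scores map the way a threshold of 0.5 would; the semantics (keep every label whose score meets the threshold) are an assumption about the option, not the library's code:

```elixir
# Hypothetical independent per-label scores from multi-label mode:
scores = %{"urgent" => 0.81, "action_required" => 0.62, "informational" => 0.12}
threshold = 0.5

# Keep only labels whose score meets the threshold:
labels =
  scores
  |> Enum.filter(fn {_label, score} -> score >= threshold end)
  |> Enum.map(fn {label, _score} -> label end)
  |> Enum.sort()
# => ["action_required", "urgent"]
```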

classify_batch(texts, opts)

@spec classify_batch(
  [String.t()],
  keyword()
) :: {:ok, [classification_result() | multi_label_result()]} | {:error, term()}

Classifies multiple texts in batch for efficiency.

Examples

texts = ["text1", "text2", "text3"]
{:ok, results} = ZeroShot.classify_batch(texts,
  candidate_labels: ["positive", "negative"]
)
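Assuming results come back in input order, the batch output pairs naturally with its inputs and can be grouped by predicted label. A sketch over hand-written result maps (illustrative values, not real model output):

```elixir
# Hypothetical results in the shape classify_batch/2 returns on success:
results = [
  %{label: "positive", scores: %{"positive" => 0.9, "negative" => 0.1}, sequence: "text1"},
  %{label: "negative", scores: %{"positive" => 0.2, "negative" => 0.8}, sequence: "text2"},
  %{label: "positive", scores: %{"positive" => 0.7, "negative" => 0.3}, sequence: "text3"}
]

# Group the input sequences by their predicted label:
by_label = Enum.group_by(results, & &1.label, & &1.sequence)
# => %{"negative" => ["text2"], "positive" => ["text1", "text3"]}
```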