Nasty.Language.English.TextClassifier (Nasty v0.3.0)


English text classification using Naive Bayes.

A thin wrapper around the generic Naive Bayes classifier that adds English-specific feature extraction.

Summary

Functions

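evaluate(model, test_data, opts \\ [])

Evaluates a model on test data.

predict(model, document, opts \\ [])

Predicts the class for a document using a trained model.

train(training_data, opts \\ [])

Trains a Naive Bayes classifier on labeled documents.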

Functions

evaluate(model, test_data, opts \\ [])

@spec evaluate(
  Nasty.AST.ClassificationModel.t(),
  [{Nasty.AST.Document.t(), atom()}],
  keyword()
) ::
  map()

Evaluates a model on test data.

Returns a map with the overall accuracy and per-class precision, recall, and F1 scores.

Examples

iex> test_data = [{doc1, :spam}, {doc2, :ham}, ...]
iex> metrics = TextClassifier.evaluate(model, test_data)
%{
  accuracy: 0.85,
  precision: %{spam: 0.9, ham: 0.8},
  recall: %{spam: 0.8, ham: 0.9},
  f1: %{spam: 0.85, ham: 0.85}
}
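The fields in the map above follow the standard definitions of accuracy, precision, recall, and F1. The sketch below shows how they can be computed from {predicted, actual} label pairs, assuming predictions have already been made. MetricsSketch is a hypothetical module, not part of Nasty, and a class that is never predicted gets a precision of 0.0 here by convention.

```elixir
defmodule MetricsSketch do
  # Hypothetical sketch (not Nasty's implementation): computes accuracy and
  # per-class precision, recall, and F1 from {predicted, actual} pairs.
  def evaluate(pairs) do
    classes =
      pairs
      |> Enum.flat_map(fn {pred, actual} -> [pred, actual] end)
      |> Enum.uniq()

    correct = Enum.count(pairs, fn {pred, actual} -> pred == actual end)
    accuracy = correct / max(length(pairs), 1)

    per_class =
      Map.new(classes, fn class ->
        # Count true positives, false positives, and false negatives for this class
        true_pos = Enum.count(pairs, fn {p, a} -> p == class and a == class end)
        false_pos = Enum.count(pairs, fn {p, a} -> p == class and a != class end)
        false_neg = Enum.count(pairs, fn {p, a} -> p != class and a == class end)

        precision = safe_div(true_pos, true_pos + false_pos)
        recall = safe_div(true_pos, true_pos + false_neg)
        f1 = safe_div(2 * precision * recall, precision + recall)
        {class, {precision, recall, f1}}
      end)

    %{
      accuracy: accuracy,
      precision: Map.new(per_class, fn {c, {p, _r, _f}} -> {c, p} end),
      recall: Map.new(per_class, fn {c, {_p, r, _f}} -> {c, r} end),
      f1: Map.new(per_class, fn {c, {_p, _r, f}} -> {c, f} end)
    }
  end

  # Returns 0.0 instead of raising when the denominator is zero
  defp safe_div(_num, den) when den == 0, do: 0.0
  defp safe_div(num, den), do: num / den
end
```

For example, MetricsSketch.evaluate([{:spam, :spam}, {:spam, :ham}, {:ham, :ham}, {:ham, :ham}]) reports an accuracy of 0.75, with spam precision 0.5 and spam recall 1.0.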

predict(model, document, opts \\ [])

Predicts the class for a document using a trained model.

Returns {:ok, results}, where results is a list of classification results sorted by confidence, highest first.

Examples

iex> {:ok, predictions} = TextClassifier.predict(model, document)
{:ok, [
  %Classification{class: :spam, confidence: 0.85, ...},
  %Classification{class: :ham, confidence: 0.15, ...}
]}
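Naive Bayes scores each class as log P(class) plus the sum of log P(token | class) over the document's tokens. One common way to turn those scores into confidences like the ones above is log-sum-exp normalization. The sketch below assumes that approach; PredictSketch, its plain-map model shape, and the fixed probability for unseen tokens are hypothetical and do not describe the internals of ClassificationModel.

```elixir
defmodule PredictSketch do
  # Hypothetical model shape (NOT Nasty's ClassificationModel struct):
  #   %{priors: %{class => p}, likelihoods: %{class => %{token => p}}, unseen: p}
  def predict(model, tokens) do
    log_scores =
      Map.new(model.priors, fn {class, prior} ->
        token_probs = Map.fetch!(model.likelihoods, class)

        # log P(class) + sum of log P(token | class); unknown tokens use model.unseen
        score =
          Enum.reduce(tokens, :math.log(prior), fn token, acc ->
            acc + :math.log(Map.get(token_probs, token, model.unseen))
          end)

        {class, score}
      end)

    # Subtract the max score before exponentiating to avoid underflow,
    # then normalize so the confidences sum to 1
    max_score = log_scores |> Map.values() |> Enum.max()
    total = log_scores |> Map.values() |> Enum.map(&:math.exp(&1 - max_score)) |> Enum.sum()

    results =
      log_scores
      |> Enum.map(fn {class, score} ->
        %{class: class, confidence: :math.exp(score - max_score) / total}
      end)
      |> Enum.sort_by(& &1.confidence, :desc)

    {:ok, results}
  end
end
```

Working in log space keeps long documents from multiplying many small probabilities down to zero.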

train(training_data, opts \\ [])

Trains a Naive Bayes classifier on labeled documents and returns the trained model.

Arguments

  • training_data - List of {document, class} tuples
  • opts - Training options

Options

  • :features - Feature types to extract (default: [:bow])
  • :smoothing - Smoothing parameter alpha (default: 1.0)
  • :min_frequency - Minimum feature frequency (default: 2)
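The :bow and :ngrams feature types and the :min_frequency cutoff can be pictured with the sketch below. It assumes documents have already been reduced to lists of lowercase token strings, treats :ngrams as adjacent word bigrams, and counts a feature's frequency across the whole training corpus. These are assumptions, and FeatureSketch is a hypothetical module rather than Nasty's feature extractor.

```elixir
defmodule FeatureSketch do
  # Hypothetical sketch: :bow yields the tokens themselves,
  # :ngrams yields adjacent bigrams joined with a space.
  # Any other feature type raises FunctionClauseError.
  def extract(tokens, feature_types) do
    Enum.flat_map(feature_types, fn
      :bow ->
        tokens

      :ngrams ->
        tokens
        |> Enum.chunk_every(2, 1, :discard)
        |> Enum.map(&Enum.join(&1, " "))
    end)
  end

  # Keeps only features whose total count across all documents
  # is at least min_frequency, returned in sorted order
  def vocabulary(feature_lists, min_frequency) do
    feature_lists
    |> List.flatten()
    |> Enum.frequencies()
    |> Enum.filter(fn {_feature, count} -> count >= min_frequency end)
    |> Enum.map(fn {feature, _count} -> feature end)
    |> Enum.sort()
  end
end
```

With the default :min_frequency of 2, a feature seen only once in the training data would be dropped from the vocabulary.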

Examples

iex> training_data = [
...>   {spam_doc1, :spam},
...>   {spam_doc2, :spam},
...>   {ham_doc1, :ham},
...>   {ham_doc2, :ham}
...> ]
iex> model = TextClassifier.train(training_data, features: [:bow, :ngrams])
%ClassificationModel{algorithm: :naive_bayes, classes: [:spam, :ham], ...}
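Training a multinomial Naive Bayes model is mostly counting: class priors come from document counts, and per-class token likelihoods use additive smoothing, P(token | class) = (count + alpha) / (total + alpha * |V|). The sketch below assumes the :smoothing option plays the role of alpha. TrainSketch and its plain {tokens, class} input are hypothetical and stand in for Nasty's Document structs and model struct.

```elixir
defmodule TrainSketch do
  # Hypothetical sketch of multinomial Naive Bayes training with additive
  # smoothing; input is [{tokens, class}], not Nasty Document structs.
  def train(training_data, alpha \\ 1.0) do
    vocab = training_data |> Enum.flat_map(&elem(&1, 0)) |> Enum.uniq()
    vocab_size = length(vocab)
    total_docs = length(training_data)

    # Group token lists by class label
    by_class = Enum.group_by(training_data, &elem(&1, 1), &elem(&1, 0))

    priors = Map.new(by_class, fn {class, docs} -> {class, length(docs) / total_docs} end)

    likelihoods =
      Map.new(by_class, fn {class, docs} ->
        counts = docs |> List.flatten() |> Enum.frequencies()
        total = Enum.sum(Map.values(counts))

        # P(token | class) = (count + alpha) / (total + alpha * |V|)
        probs =
          Map.new(vocab, fn token ->
            {token, (Map.get(counts, token, 0) + alpha) / (total + alpha * vocab_size)}
          end)

        {class, probs}
      end)

    %{classes: Map.keys(by_class), priors: priors, likelihoods: likelihoods}
  end
end
```

Smoothing keeps a token that never appeared in one class from forcing that class's probability to zero; larger alpha values pull the likelihoods toward uniform.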