Nasty.Language.English.TextClassifier (Nasty v0.3.0)
English text classification using Naive Bayes.
A thin wrapper around the generic Naive Bayes classifier that adds English-specific feature extraction.
Summary
Functions
Evaluates a model on test data.
Predicts the class for a document using a trained model.
Trains a Naive Bayes classifier on labeled documents.
Functions
@spec evaluate(Nasty.AST.ClassificationModel.t(), [{Nasty.AST.Document.t(), atom()}], keyword()) :: map()
Evaluates a model on test data.
Returns a map with the overall accuracy and the per-class precision, recall, and F1 scores.
Examples
iex> test_data = [{doc1, :spam}, {doc2, :ham}, ...]
iex> metrics = TextClassifier.evaluate(model, test_data)
%{
  accuracy: 0.85,
  precision: %{spam: 0.9, ham: 0.8},
  recall: %{spam: 0.8, ham: 0.9},
  f1: %{spam: 0.85, ham: 0.85}
}
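The metric map above can be derived from pairs of predicted and true labels. The sketch below is a hypothetical helper (`MetricsSketch` is not part of Nasty) that reproduces the same map shape, assuming each test document has already been reduced to a `{predicted, actual}` pair; Nasty's own `evaluate/3` may compute or round these values differently.

```elixir
defmodule MetricsSketch do
  # Hypothetical helper, not Nasty's implementation: builds the
  # %{accuracy:, precision:, recall:, f1:} map from {predicted, actual} pairs.
  def evaluate(pairs) do
    classes = pairs |> Enum.flat_map(fn {p, a} -> [p, a] end) |> Enum.uniq()
    correct = Enum.count(pairs, fn {p, a} -> p == a end)

    per_class =
      for c <- classes, into: %{} do
        true_pos = Enum.count(pairs, fn {p, a} -> p == c and a == c end)
        false_pos = Enum.count(pairs, fn {p, a} -> p == c and a != c end)
        false_neg = Enum.count(pairs, fn {p, a} -> p != c and a == c end)
        precision = safe_div(true_pos, true_pos + false_pos)
        recall = safe_div(true_pos, true_pos + false_neg)
        # F1 is the harmonic mean of precision and recall
        f1 = safe_div(2 * precision * recall, precision + recall)
        {c, {precision, recall, f1}}
      end

    %{
      accuracy: safe_div(correct, length(pairs)),
      precision: Map.new(per_class, fn {c, {p, _, _}} -> {c, p} end),
      recall: Map.new(per_class, fn {c, {_, r, _}} -> {c, r} end),
      f1: Map.new(per_class, fn {c, {_, _, f}} -> {c, f} end)
    }
  end

  # Avoid division by zero for classes that are never predicted or never present
  defp safe_div(_num, den) when den == 0, do: 0.0
  defp safe_div(num, den), do: num / den
end

pairs = [{:spam, :spam}, {:spam, :ham}, {:ham, :ham}, {:ham, :ham}]
IO.inspect(MetricsSketch.evaluate(pairs))
```

Returning 0.0 for an empty denominator is a design choice of this sketch; other tools raise or return `nil` instead.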
@spec predict(Nasty.AST.ClassificationModel.t(), Nasty.AST.Document.t(), keyword()) :: {:ok, [Nasty.AST.Classification.t()]} | {:error, term()}
Predicts the class for a document using a trained model.
Returns {:ok, classifications}, with the classification results sorted by descending confidence, or {:error, reason} on failure.
Examples
iex> {:ok, predictions} = TextClassifier.predict(model, document)
{:ok, [
  %Classification{class: :spam, confidence: 0.85, ...},
  %Classification{class: :ham, confidence: 0.15, ...}
]}
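Naive Bayes scores are usually computed as log probabilities, so producing confidences that sum to 1.0, as in the example above, needs a normalization step. The following is a minimal sketch under that assumption; `PredictSketch.rank/1` is hypothetical, returns plain maps rather than `%Classification{}` structs, and is not necessarily how Nasty computes confidence.

```elixir
defmodule PredictSketch do
  # Hypothetical: turns a map of class => log score into
  # [%{class: c, confidence: p}, ...] sorted by descending confidence.
  def rank(log_scores) when map_size(log_scores) > 0 do
    # Subtract the largest log score before exponentiating so that very
    # negative log scores do not all underflow to 0.0 (log-sum-exp trick)
    max_score = log_scores |> Map.values() |> Enum.max()
    exp_scores = Map.new(log_scores, fn {c, s} -> {c, :math.exp(s - max_score)} end)
    total = exp_scores |> Map.values() |> Enum.sum()

    exp_scores
    |> Enum.map(fn {c, e} -> %{class: c, confidence: e / total} end)
    |> Enum.sort_by(& &1.confidence, :desc)
  end
end

IO.inspect(PredictSketch.rank(%{spam: :math.log(0.85), ham: :math.log(0.15)}))
```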
@spec train([{Nasty.AST.Document.t(), atom()}], keyword()) :: Nasty.AST.ClassificationModel.t()
Trains a Naive Bayes classifier on labeled documents.
Arguments
training_data - List of {document, class} tuples
opts - Training options
Options
:features - Feature types to extract (default: [:bow])
:smoothing - Smoothing parameter alpha (default: 1.0)
:min_frequency - Minimum feature frequency (default: 2)
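The :features and :min_frequency options can be pictured with the following rough sketch. It assumes that :bow means single-word features, that :ngrams means adjacent word pairs, and that min_frequency counts occurrences across the whole training set. These are all assumptions, since the module does not document its feature definitions. `FeatureSketch` is hypothetical and works on plain token lists, whereas the real module extracts features from `Nasty.AST.Document` structs.

```elixir
defmodule FeatureSketch do
  # Hypothetical stand-in for English feature extraction.

  # Returns the feature terms for one document's tokens.
  # Unknown feature types raise a FunctionClauseError.
  def extract(tokens, features \\ [:bow]) do
    Enum.flat_map(features, fn
      :bow ->
        Enum.map(tokens, &{:word, &1})

      :ngrams ->
        # Bigrams only; a real implementation may support other n
        tokens
        |> Enum.chunk_every(2, 1, :discard)
        |> Enum.map(&{:bigram, List.to_tuple(&1)})
    end)
  end

  # Keeps features seen at least `min_frequency` times across all documents.
  def vocabulary(docs_features, min_frequency \\ 2) do
    docs_features
    |> List.flatten()
    |> Enum.frequencies()
    |> Enum.filter(fn {_feature, count} -> count >= min_frequency end)
    |> Enum.map(fn {feature, _count} -> feature end)
    |> MapSet.new()
  end
end

IO.inspect(FeatureSketch.extract(["win", "cash", "now"], [:bow, :ngrams]))
```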
Examples
iex> training_data = [
...> {spam_doc1, :spam},
...> {spam_doc2, :spam},
...> {ham_doc1, :ham},
...> {ham_doc2, :ham}
...> ]
iex> model = TextClassifier.train(training_data, features: [:bow, :ngrams])
%ClassificationModel{algorithm: :naive_bayes, classes: [:spam, :ham], ...}
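For intuition about what training with :smoothing does, here is a minimal multinomial Naive Bayes sketch. It assumes the smoothing alpha is added to every feature count (additive smoothing) and that documents have already been reduced to feature lists. `NaiveBayesSketch` is hypothetical and returns a plain map rather than a `%ClassificationModel{}`. It is not the generic classifier this module wraps.

```elixir
defmodule NaiveBayesSketch do
  # Hypothetical minimal multinomial Naive Bayes; not Nasty's implementation.

  # training_data: list of {features, class}, where features is a list of terms.
  def train(training_data, opts \\ []) do
    alpha = Keyword.get(opts, :smoothing, 1.0)
    vocab = training_data |> Enum.flat_map(&elem(&1, 0)) |> MapSet.new()
    by_class = Enum.group_by(training_data, &elem(&1, 1), &elem(&1, 0))
    total_docs = length(training_data)

    classes =
      Map.new(by_class, fn {class, docs} ->
        counts = docs |> List.flatten() |> Enum.frequencies()
        total = counts |> Map.values() |> Enum.sum()
        {class, %{prior: length(docs) / total_docs, counts: counts, total: total}}
      end)

    %{alpha: alpha, vocab_size: MapSet.size(vocab), classes: classes}
  end

  # log P(class) + sum of log P(feature | class), where
  # P(f | c) = (count(f, c) + alpha) / (total(c) + alpha * vocab_size)
  def log_scores(model, features) do
    Map.new(model.classes, fn {class, %{prior: prior, counts: counts, total: total}} ->
      denom = total + model.alpha * model.vocab_size

      likelihood =
        Enum.reduce(features, 0.0, fn f, acc ->
          acc + :math.log((Map.get(counts, f, 0) + model.alpha) / denom)
        end)

      {class, :math.log(prior) + likelihood}
    end)
  end
end

model =
  NaiveBayesSketch.train([
    {["win", "cash"], :spam},
    {["cash", "now"], :spam},
    {["meeting", "today"], :ham},
    {["lunch", "today"], :ham}
  ])

IO.inspect(NaiveBayesSketch.log_scores(model, ["cash"]))
```

Because alpha is greater than zero, a feature never seen with a class still gets a small non-zero probability instead of forcing that class's score to negative infinity.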