Scholar.NaiveBayes.Bernoulli (Scholar v0.4.0)

Naive Bayes classifier for multivariate Bernoulli models.

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.
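Concretely, the Bernoulli model penalizes the absence of a feature as well as rewarding its presence. For binary features x_i, the per-class log-likelihood is the standard Bernoulli rule:

```
log P(x | y) = Σ_i [ x_i * log P(i | y)  +  (1 - x_i) * log(1 - P(i | y)) ]
```

where P(i | y) is the probability of feature i being active in class y. MultinomialNB, by contrast, only accumulates terms for features that actually occur.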

Summary

Functions

Fits a naive Bayes model. The function assumes that the targets y are integers between 0 and num_classes - 1 (inclusive); samples with targets outside this range do not contribute to class_count.

Performs classification on a tensor of test vectors x using model. The sorted classes from the training data must be passed as the third argument.

Returns joint log-probability estimates for the test vectors x using model.

Returns log-probability estimates for the test vectors x using model.

Returns probability estimates for the test vectors x using model.

Functions

fit(x, y, opts \\ [])

Fits a naive Bayes model. The function assumes that the targets y are integers between 0 and num_classes - 1 (inclusive); samples with targets outside this range do not contribute to class_count.

Options

  • :num_classes (pos_integer/0) - Required. Number of different classes used in training.

  • :alpha - Additive (Laplace/Lidstone) smoothing parameter. Set alpha to 0.0 and force_alpha to true for no smoothing. The default value is 1.0.

  • :force_alpha (boolean/0) - If false and alpha is less than 1e-10, it will set alpha to 1e-10. If true, alpha will remain unchanged. This may cause numerical errors if alpha is too close to 0. The default value is true.

  • :binarize - Threshold for binarizing (mapping to booleans) sample features. If nil, input is presumed to already consist of binary vectors. The default value is 0.0.

  • :fit_priors (boolean/0) - Whether to learn class prior probabilities or not. If false, a uniform prior will be used. The default value is true.

  • :class_priors - Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.

  • :sample_weights - List of num_samples elements. A list of 1.0 values is used if none is given.
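As a minimal sketch of what :binarize does (assuming a strict greater-than comparison against the threshold, which is consistent with the feature counts in the examples below):

```elixir
# Hypothetical illustration of the :binarize step: with binarize: 1.0, each
# feature maps to 1 when strictly greater than the threshold, and 0 otherwise.
x = Nx.iota({4, 3})
Nx.greater(x, 1.0)
# rows become [0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]
```

With y = Nx.tensor([1, 2, 0, 2]), summing these binary rows per class reproduces the feature_count tensor shown in the first fit example below.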

Return Values

The function returns a struct with the following fields:

  • :class_count - Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

  • :class_log_priors - Smoothed empirical log probability for each class.

  • :feature_count - Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

  • :feature_log_probability - Empirical log probability of features given a class, P(x_i|y).
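These fields are related by the smoothed Bernoulli estimate (the constant 2 in the denominator reflects the two possible outcomes of each binary feature):

```
feature_log_probability[c][i] = log((feature_count[c][i] + alpha) / (class_count[c] + 2 * alpha))
```

For instance, in the first example below, class 0 has class_count 1.0 and feature_count 1.0 for every feature, so with alpha = 1.0 each entry is log(2/3) ≈ -0.4055.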

Examples

iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> Scholar.NaiveBayes.Bernoulli.fit(x, y, num_classes: 3, binarize: 1.0)
%Scholar.NaiveBayes.Bernoulli{
      feature_count: Nx.tensor(
        [
          [1.0, 1.0, 1.0],
          [0.0, 0.0, 1.0],
          [2.0, 2.0, 2.0]
        ]
      ),
      class_count: Nx.tensor(
        [1.0, 1.0, 2.0]
      ),
      class_log_priors: Nx.tensor(
        [-1.3862943649291992, -1.3862943649291992, -0.6931471824645996]
      ),
      feature_log_probability: Nx.tensor(
        [
          [-0.40546512603759766, -0.40546512603759766, -0.40546512603759766],
          [-1.0986123085021973, -1.0986123085021973, -0.40546512603759766],
          [-0.28768205642700195, -0.28768205642700195, -0.28768205642700195]
        ]
      )
    }

iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> Scholar.NaiveBayes.Bernoulli.fit(x, y, num_classes: 3, force_alpha: false, alpha: 0.0)
%Scholar.NaiveBayes.Bernoulli{
      feature_count: Nx.tensor(
        [
          [1.0, 1.0, 1.0],
          [0.0, 1.0, 1.0],
          [2.0, 2.0, 2.0]
        ]
      ),
      class_count: Nx.tensor(
        [1.0, 1.0, 2.0]
      ),
      class_log_priors: Nx.tensor(
        [-1.3862943649291992, -1.3862943649291992, -0.6931471824645996]
      ),
      feature_log_probability: Nx.tensor(
        [
          [0.0, 0.0, 0.0],
          [-23.025850296020508, 0.0, 0.0],
          [0.0, 0.0, 0.0]
        ]
      )
    }

predict(model, x, classes)

Performs classification on a tensor of test vectors x using model. The sorted classes from the training data must be passed as the third argument.

Examples

iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Bernoulli.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Bernoulli.predict(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]), Nx.tensor([0, 1, 2]))
#Nx.Tensor<
  s32[2]
  [2, 2]
>

predict_joint_log_probability(model, x)

Returns joint log-probability estimates for the test vectors x using model.

Examples

iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Bernoulli.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Bernoulli.predict_joint_log_probability(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]))
#Nx.Tensor<
  f32[2][3]
  [
    [3.6356334686279297, -3.988985061645508, 8.331316947937012],
    [10.56710433959961, 0.16989731788635254, 19.317440032958984]
  ]
>

predict_log_probability(model, x)

Returns log-probability estimates for the test vectors x using model.

Examples

iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Bernoulli.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Bernoulli.predict_log_probability(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]))
#Nx.Tensor<
  f32[2][3]
  [
    [-4.704780578613281, -12.329399108886719, -0.009097099304199219],
    [-8.750494003295898, -19.147701263427734, -1.583099365234375e-4]
  ]
>

predict_probability(model, x)

Returns probability estimates for the test vectors x using model.

Examples

iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Bernoulli.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Bernoulli.predict_probability(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]))
#Nx.Tensor<
  f32[2][3]
  [
    [0.00905190035700798, 4.4198750401847064e-6, 0.9909441471099854],
    [1.5838305989746004e-4, 4.833469624543341e-9, 0.9998416900634766]
  ]
>