Scholar.NaiveBayes.Gaussian (Scholar v0.3.1)
Gaussian Naive Bayes algorithm for classification.
The likelihood of the features is assumed to be Gaussian:

$$ P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right) $$

The parameters $\sigma_y$ and $\mu_y$ are estimated using maximum likelihood.
Time complexity is $O(N \cdot K \cdot C)$, where $N$ is the number of samples, $K$ is the number of features, and $C$ is the number of classes.
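The per-feature log-likelihood implied by the formula above can be sketched in plain Elixir (the library itself computes this over Nx tensors; this helper is purely illustrative):

```elixir
defmodule GaussianLikelihoodSketch do
  # log P(x_i | y) for one feature value x, given the class mean mu
  # and class variance var -- the log of the Gaussian density above.
  def log_likelihood(x, mu, var) do
    -0.5 * (:math.log(2 * :math.pi() * var) + :math.pow(x - mu, 2) / var)
  end
end
```

For a standard normal (mu = 0, var = 1) evaluated at 0, this returns $-\frac{1}{2}\ln(2\pi) \approx -0.9189$.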
Summary

Functions

- Gaussian Naive Bayes.
- Perform classification on an array of test vectors x using model.
- Return joint log probability estimates for the test vector x using model.
- Return log-probability estimates for the test vector x using model.
- Return probability estimates for the test vector x using model.
Functions
Gaussian Naive Bayes.
Options

- :var_smoothing (float/0) - Portion of the largest variance of all features that is added to variances for calculation stability. Defaults to 1.0e-9.
- :priors - Prior probabilities of the classes. If specified, the priors are not adjusted according to the data. We assume that the priors are correct and that sum(priors) == 1.
- :sample_weights - List of n_samples elements. A list of 1.0 values is used if none is given.
- :num_classes (pos_integer/0) - Required. Number of different classes used in training.
Return Values

The function returns a struct with the following parameters:

- :theta - Mean of each feature per class.
- :var - Variance of each feature per class.
- :class_count - Number of training samples observed in each class.
- :class_priors - Probability of each class.
- :classes - Class labels known to the classifier.
- :epsilon - Absolute additive value to variances.
Examples
iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> Scholar.NaiveBayes.Gaussian.fit(x, y, num_classes: 3)
%Scholar.NaiveBayes.Gaussian{
theta: Nx.tensor(
[
[6.0, 7.0, 8.0],
[0.0, 1.0, 2.0],
[6.0, 7.0, 8.0]
]
),
var: Nx.tensor(
[
[1.1250000042650754e-8, 1.1250000042650754e-8, 1.1250000042650754e-8],
[1.1250000042650754e-8, 1.1250000042650754e-8, 1.1250000042650754e-8],
[9.0, 9.0, 9.0]
]
),
class_count: Nx.tensor([1.0, 1.0, 2.0]),
class_priors: Nx.tensor([0.25, 0.25, 0.5]),
classes: Nx.tensor([0, 1, 2]),
epsilon: Nx.tensor(1.1250000042650754e-8)
}
iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> Scholar.NaiveBayes.Gaussian.fit(x, y, num_classes: 3, sample_weights: [1, 6, 2, 3])
%Scholar.NaiveBayes.Gaussian{
theta: Nx.tensor(
[
[6.0, 7.0, 8.0],
[0.0, 1.0, 2.0],
[5.0, 6.0, 7.0]
]
),
var: Nx.tensor(
[
[1.1250000042650754e-8, 1.1250000042650754e-8, 1.1250000042650754e-8],
[1.1250000042650754e-8, 1.1250000042650754e-8, 1.1250000042650754e-8],
[8.0, 8.0, 8.0]
]
),
class_count: Nx.tensor([2.0, 1.0, 9.0]),
class_priors: Nx.tensor([0.1666666716337204, 0.0833333358168602, 0.75]),
classes: Nx.tensor([0, 1, 2]),
epsilon: Nx.tensor(1.1250000042650754e-8)
}
Perform classification on an array of test vectors x using model.
Examples
iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Gaussian.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Gaussian.predict(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]))
#Nx.Tensor<
s64[2]
[2, 2]
>
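Conceptually, the prediction for each row is the class label with the highest joint log probability. A plain-Elixir sketch of that argmax step (not the library's actual tensor implementation), using the joint log probabilities for the first test vector [6, 2, 4] from the predict_joint_log_probability example below:

```elixir
# Joint log probabilities over the 3 classes for test vector [6, 2, 4]
joint = [-1822222336.0, -1822222208.0, -9.023576736450195]
classes = [0, 1, 2]

# Predicted class = label whose joint log probability is largest
{_best, idx} =
  joint
  |> Enum.with_index()
  |> Enum.max_by(fn {value, _idx} -> value end)

prediction = Enum.at(classes, idx)
# -> 2, matching the predict output above
```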
Return joint log probability estimates for the test vector x using model.
Examples
iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Gaussian.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Gaussian.predict_joint_log_probability(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]))
#Nx.Tensor<
f32[2][3]
[
[-1822222336.0, -1822222208.0, -9.023576736450195],
[-399999968.0, -5733332992.0, -7.245799541473389]
]
>
Return log-probability estimates for the test vector x using model.
Examples
iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Gaussian.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Gaussian.predict_log_probability(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]))
#Nx.Tensor<
f32[2][3]
[
[-1822222336.0, -1822222208.0, 0.0],
[-399999968.0, -5733332992.0, 0.0]
]
>
Return probability estimates for the test vector x using model.
Examples
iex> x = Nx.iota({4, 3})
iex> y = Nx.tensor([1, 2, 0, 2])
iex> model = Scholar.NaiveBayes.Gaussian.fit(x, y, num_classes: 3)
iex> Scholar.NaiveBayes.Gaussian.predict_probability(model, Nx.tensor([[6, 2, 4], [8, 5, 9]]))
#Nx.Tensor<
f32[2][3]
[
[0.0, 0.0, 1.0],
[0.0, 0.0, 1.0]
]
>
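The probability estimates relate to the joint log probabilities by normalization: subtracting the log-sum-exp over classes gives the log-probabilities, and exponentiating gives the probabilities. A plain-Elixir sketch under that assumption, using the first row of the predict_joint_log_probability example above:

```elixir
# Joint log probabilities for test vector [6, 2, 4]
joint = [-1822222336.0, -1822222208.0, -9.023576736450195]

# Numerically stable log-sum-exp: shift by the max before exponentiating
m = Enum.max(joint)
logsumexp = m + :math.log(Enum.sum(Enum.map(joint, fn v -> :math.exp(v - m) end)))

# log-probabilities, then probabilities
log_probs = Enum.map(joint, fn v -> v - logsumexp end)
probs = Enum.map(log_probs, &:math.exp/1)
# probs -> [0.0, 0.0, 1.0], matching the first row of predict_probability above
```

The shift by the maximum is what keeps the dominant class at log-probability 0.0 even when the other joint terms are around -1.8e9, as in the predict_log_probability example.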