Scholar.Decomposition.PCA (Scholar v0.4.0)

Principal Component Analysis (PCA).

PCA is a method for reducing the dimensionality of data by transforming the original features into a new set of uncorrelated features called principal components, which capture the maximum variance in the data. It can be trained on the entire dataset at once using fit/2, or incrementally using incremental_fit/2 for datasets that are too large to fit in memory.

The time complexity is $O(NP^2 + P^3)$, where $N$ is the number of samples and $P$ is the number of features. The space complexity is $O(P(P + N))$.

Summary

Functions

fit(x, opts \\ [])

Fits a PCA for sample inputs x.

fit_transform(x, opts)

Fit the model with x and apply the dimensionality reduction on x.

incremental_fit(batches, opts)

Fits a PCA model on a stream of batches.

partial_fit(model, x)

Updates the parameters of a PCA model on samples x.

transform(model, x)

For a fitted model performs a decomposition of samples x.

Functions

fit(x, opts \\ [])

Fits a PCA for sample inputs x.

Options

  • :num_components (pos_integer/0) - Required. The number of principal components to keep.

  • :whiten? (boolean/0) - When true the result is multiplied by the square root of :num_samples and then divided by the :singular_values to ensure uncorrelated outputs with unit component-wise variances.

    Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

    The default value is false.
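
To make the effect of :whiten? concrete, the sketch below projects the same data with and without whitening and prints the per-component variance of each result. It is only an illustration: the sample matrix is hand-written, the script assumes Scholar can be installed through Mix.install, and it passes :whiten? to fit_transform/2 as described in that function's options.

```elixir
# Assumption: running as a standalone script with network access for Mix.install.
Mix.install([{:scholar, "~> 0.4.0"}])

alias Scholar.Decomposition.PCA

# Small hand-written sample matrix (6 samples, 4 features); any numeric data works.
x =
  Nx.tensor([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4]
  ])

plain = PCA.fit_transform(x, num_components: 2)
whitened = PCA.fit_transform(x, num_components: 2, whiten?: true)

# Per-component variance of each projection. Without whitening the columns keep
# their own scales; with whitening they should end up on a common scale.
IO.inspect(Nx.variance(plain, axes: [0]), label: "plain variance")
IO.inspect(Nx.variance(whitened, axes: [0]), label: "whitened variance")
```

Comparing the two printed tensors shows which information whitening discards: the relative variance scale of the components.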

Return Values

The function returns a struct with the following parameters:

  • :components - Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by decreasing :explained_variance.

  • :singular_values - The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the :num_components variables in the lower-dimensional space.

  • :num_samples_seen - Number of samples in the training data.

  • :mean - Per-feature empirical mean, estimated from the training set.

  • :variance - Per-feature empirical variance.

  • :explained_variance - The amount of variance explained by each of the selected components. The variance estimation uses :num_samples - 1 degrees of freedom. Equal to :num_components largest eigenvalues of the covariance matrix of x.

  • :explained_variance_ratio - Percentage of variance explained by each of the selected components.

  • :whiten? - Whether to apply whitening.

Examples

iex> x = Scidata.Iris.download() |> elem(0) |> Nx.tensor()
iex> pca = Scholar.Decomposition.PCA.fit(x, num_components: 2)
iex> pca.components
Nx.tensor(
  [
    [0.36182016134262085, -0.08202514797449112, 0.8565111756324768, 0.3588128685951233],
    [0.6585038900375366, 0.7275884747505188, -0.17632202804088593, -0.07679986208677292]
  ]
)
iex> pca.singular_values
Nx.tensor([25.089859008789062, 6.007821559906006])
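
The :singular_values and :explained_variance fields are related by the standard SVD identity explained_variance_i = s_i^2 / (N - 1), where N is the number of samples. As a rough sanity check, the plain Elixir sketch below applies that identity to the singular values printed above. It assumes the Iris dataset has 150 samples and does not call Scholar at all.

```elixir
# Singular values copied from the example output above.
singular_values = [25.089859008789062, 6.007821559906006]

# Assumption: the Iris dataset contains 150 samples.
num_samples = 150

# explained_variance_i = s_i^2 / (num_samples - 1)
explained_variance =
  Enum.map(singular_values, fn s -> s * s / (num_samples - 1) end)

IO.inspect(explained_variance, label: "explained variance")
```

The resulting values can be compared against pca.explained_variance from the fitted model.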

fit_transform(x, opts)

Fit the model with x and apply the dimensionality reduction on x.

This function is equivalent to calling fit/2 and then transform/2, but the result is computed more efficiently.

Options

  • :num_components (pos_integer/0) - Required. The number of principal components to keep.

  • :whiten? (boolean/0) - When true the result is multiplied by the square root of :num_samples and then divided by the :singular_values to ensure uncorrelated outputs with unit component-wise variances.

    Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

    The default value is false.

Return Values

The function returns a tensor with decomposed data.

Examples

iex> x = Scidata.Iris.download() |> elem(0) |> Enum.take(6) |> Nx.tensor()
iex> Scholar.Decomposition.PCA.fit_transform(x, num_components: 2)
Nx.tensor(
  [
    [0.16441848874092102, 0.028548287227749825],
    [-0.32804328203201294, 0.20709986984729767],
    [-0.3284338414669037, -0.08318747580051422],
    [-0.42237386107444763, -0.0735677033662796],
    [0.17480169236660004, -0.11189625412225723],
    [0.7396301627159119, 0.03300142288208008]
  ]
)
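
The equivalence with the two-step route can be checked directly. The sketch below fits and transforms in two calls, then compares the result with fit_transform/2 using Nx.all_close/3. The sample matrix is hand-written and the tolerance is an arbitrary choice for illustration.

```elixir
# Assumption: running as a standalone script with network access for Mix.install.
Mix.install([{:scholar, "~> 0.4.0"}])

alias Scholar.Decomposition.PCA

# Small hand-written sample matrix (6 samples, 4 features).
x =
  Nx.tensor([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4]
  ])

# Two-step route: fit a model, then project the same data.
model = PCA.fit(x, num_components: 2)
two_step = PCA.transform(model, x)

# One-step route.
one_step = PCA.fit_transform(x, num_components: 2)

# Compare both projections up to a small numerical tolerance.
IO.inspect(Nx.all_close(one_step, two_step, atol: 1.0e-3), label: "same projection?")
```

Use the two-step route when the fitted model is needed later, for example to transform new samples.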

incremental_fit(batches, opts)

Fits a PCA model on a stream of batches.

Options

  • :num_components (pos_integer/0) - Required. The number of principal components to keep.

  • :whiten? (boolean/0) - When true the result is multiplied by the square root of :num_samples and then divided by the :singular_values to ensure uncorrelated outputs with unit component-wise variances.

    Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

    The default value is false.

Return Values

The function returns a struct with the following parameters:

  • :num_components - The number of principal components.

  • :components - Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by decreasing :explained_variance.

  • :singular_values - The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the :num_components variables in the lower-dimensional space.

  • :num_samples_seen - The number of data samples processed.

  • :mean - Per-feature empirical mean.

  • :variance - Per-feature empirical variance.

  • :explained_variance - Variance explained by each of the selected components.

  • :explained_variance_ratio - Percentage of variance explained by each of the selected components.

  • :whiten? - Whether to apply whitening.

Examples

iex> {x, _} = Scidata.Iris.download()
iex> batches = x |> Nx.tensor() |> Nx.to_batched(10)
iex> pca = Scholar.Decomposition.PCA.incremental_fit(batches, num_components: 2)
iex> pca.components
Nx.tensor(
  [
    [-0.33354005217552185, 0.1048964187502861, -0.8618107080105579, -0.3674643635749817],
    [-0.5862125754356384, -0.7916879057884216, 0.15874788165092468, -0.06621300429105759]
  ]
)
iex> pca.singular_values
Nx.tensor([77.05782028025969, 10.137848854064941])

partial_fit(model, x)

Updates the parameters of a PCA model on samples x.

Examples

iex> {x, _} = Scidata.Iris.download()
iex> {first_batch, second_batch} = x |> Nx.tensor() |> Nx.split(75)
iex> pca = Scholar.Decomposition.PCA.fit(first_batch, num_components: 2)
iex> pca = Scholar.Decomposition.PCA.partial_fit(pca, second_batch)
iex> pca.components
Nx.tensor(
  [
    [-0.3229745328426361, 0.09587063640356064, -0.8628664612770081, -0.37677285075187683],
    [-0.6786625981330872, -0.7167785167694092, 0.14237160980701447, 0.07332050055265427]
  ]
)
iex> pca.singular_values
Nx.tensor([166.141845703125, 6.078948020935059])
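
When more than two batches are involved, the same fit-then-update pattern can be written as a reduction. The sketch below is one possible way to do it, not the library's own implementation of incremental_fit/2: it fits on the first batch with fit/2 and folds the remaining batches in with partial_fit/2. The sample matrix and the batch size of 3 are arbitrary choices for illustration.

```elixir
# Assumption: running as a standalone script with network access for Mix.install.
Mix.install([{:scholar, "~> 0.4.0"}])

alias Scholar.Decomposition.PCA

# Small hand-written sample matrix (6 samples, 4 features).
x =
  Nx.tensor([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4]
  ])

# Split into batches of three rows each and separate the first batch.
[first | rest] = x |> Nx.to_batched(3) |> Enum.to_list()

# Fit on the first batch, then update the model with every remaining batch.
model =
  Enum.reduce(rest, PCA.fit(first, num_components: 2), fn batch, acc ->
    PCA.partial_fit(acc, batch)
  end)

IO.inspect(model.num_samples_seen, label: "samples seen")
IO.inspect(Nx.shape(model.components), label: "components shape")
```

Each batch here has at least as many samples as :num_components, which keeps the per-batch decomposition well defined.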

transform(model, x)

For a fitted model performs a decomposition of samples x.

Return Values

The function returns a tensor with decomposed data.

Examples

iex> x_fit = Scidata.Iris.download() |> elem(0) |> Nx.tensor()
iex> pca = Scholar.Decomposition.PCA.fit(x_fit, num_components: 2)
iex> x_transform = Nx.tensor([[5.2, 2.6, 2.475, 0.7], [6.1, 3.2, 3.95, 1.3], [7.0, 3.8, 5.425, 1.9]])
iex> Scholar.Decomposition.PCA.transform(pca, x_transform)
Nx.tensor(
  [
    [-1.4739344120025635, -0.48932668566703796],
    [0.28113049268722534, 0.2337251454591751],
    [2.0361955165863037, 0.9567767977714539]
  ]
)
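
For a model fitted without whitening, the decomposition can be understood as centering the samples with the training mean and projecting them onto the principal axes. The sketch below reproduces that by hand from the :mean and :components fields and compares it with transform/2. It assumes the standard PCA projection and a model fitted with the default whiten?: false. The sample values are hand-written.

```elixir
# Assumption: running as a standalone script with network access for Mix.install.
Mix.install([{:scholar, "~> 0.4.0"}])

alias Scholar.Decomposition.PCA

# Small hand-written training matrix (6 samples, 4 features).
x_fit =
  Nx.tensor([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4]
  ])

model = PCA.fit(x_fit, num_components: 2)

# New samples to project into the fitted two-dimensional space.
new_samples = Nx.tensor([[5.0, 3.4, 1.5, 0.2], [4.4, 2.9, 1.4, 0.2]])

# Manual projection: subtract the training mean, then multiply by the
# transposed components matrix (features x components).
manual =
  new_samples
  |> Nx.subtract(model.mean)
  |> Nx.dot(Nx.transpose(model.components))

library = PCA.transform(model, new_samples)

IO.inspect(Nx.all_close(manual, library, atol: 1.0e-3), label: "manual matches transform?")
```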