Scholar.Decomposition.PCA (Scholar v0.4.0)
Principal Component Analysis (PCA).
PCA is a method for reducing the dimensionality of the data by transforming the original features
into a new set of uncorrelated features called principal components, which capture the maximum
variance in the data.
It can be trained on the entirety of the data at once using fit/2, or incrementally, for datasets that are too large to fit in memory, using incremental_fit/2.
The time complexity is O(min(N² * P, N * P²)), where N is the number of samples and P is the number of features; this is the cost of the underlying singular value decomposition. Space complexity is O(N * P).
References:
- [1] Dimensionality Reduction with Principal Component Analysis. Mathematics for Machine Learning, Chapter 10
- [2] Incremental Learning for Robust Visual Tracking
Summary
Functions
- fit/2 - Fits a PCA for sample inputs x.
- fit_transform/2 - Fits the model with x and applies the dimensionality reduction on x.
- incremental_fit/2 - Fits a PCA model on a stream of batches.
- partial_fit/2 - Updates the parameters of a PCA model on samples x.
- transform/2 - For a fitted model, performs a decomposition of samples x.
Functions
fit/2
Fits a PCA for sample inputs x.
Options
- :num_components (pos_integer/0) - Required. The number of principal components to keep.
- :whiten? (boolean/0) - When true, the result is multiplied by the square root of :num_samples and then divided by :singular_values to ensure uncorrelated outputs with unit component-wise variances. Whitening removes some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of downstream estimators by making their data respect some hard-wired assumptions. The default value is false.
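As an illustrative sketch of :whiten? (the whitened projection is omitted, since the exact values depend on the data):

iex> x = Scidata.Iris.download() |> elem(0) |> Nx.tensor()
iex> pca = Scholar.Decomposition.PCA.fit(x, num_components: 2, whiten?: true)
iex> pca.whiten?
true
iex> whitened = Scholar.Decomposition.PCA.transform(pca, x)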
Return Values
The function returns a struct with the following parameters:
- :components - Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by decreasing :explained_variance.
- :singular_values - The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the :num_components variables in the lower-dimensional space.
- :num_samples_seen - Number of samples in the training data.
- :mean - Per-feature empirical mean, estimated from the training set.
- :variance - Per-feature empirical variance.
- :explained_variance - The amount of variance explained by each of the selected components. The variance estimation uses num_samples - 1 degrees of freedom. Equal to the :num_components largest eigenvalues of the covariance matrix of x.
- :explained_variance_ratio - Percentage of variance explained by each of the selected components.
- :whiten? - Whether to apply whitening.
Examples
iex> x = Scidata.Iris.download() |> elem(0) |> Nx.tensor()
iex> pca = Scholar.Decomposition.PCA.fit(x, num_components: 2)
iex> pca.components
Nx.tensor(
[
[0.36182016134262085, -0.08202514797449112, 0.8565111756324768, 0.3588128685951233],
[0.6585038900375366, 0.7275884747505188, -0.17632202804088593, -0.07679986208677292]
]
)
iex> pca.singular_values
Nx.tensor([25.089859008789062, 6.007821559906006])
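The fitted struct also carries the variance bookkeeping described under Return Values; a quick way to check how much of the total variance the kept components retain (the exact value depends on the data, so the output is omitted here):

iex> x = Scidata.Iris.download() |> elem(0) |> Nx.tensor()
iex> pca = Scholar.Decomposition.PCA.fit(x, num_components: 2)
iex> Nx.sum(pca.explained_variance_ratio)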
fit_transform/2
Fits the model with x and applies the dimensionality reduction on x.
This function is equivalent to calling fit/2 and then transform/2, but the result is computed more efficiently.
Options
- :num_components (pos_integer/0) - Required. The number of principal components to keep.
- :whiten? (boolean/0) - When true, the result is multiplied by the square root of :num_samples and then divided by :singular_values to ensure uncorrelated outputs with unit component-wise variances. Whitening removes some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of downstream estimators by making their data respect some hard-wired assumptions. The default value is false.
Return Values
The function returns a tensor with decomposed data.
Examples
iex> x = Scidata.Iris.download() |> elem(0) |> Enum.take(6) |> Nx.tensor()
iex> Scholar.Decomposition.PCA.fit_transform(x, num_components: 2)
Nx.tensor(
[
[0.16441848874092102, 0.028548287227749825],
[-0.32804328203201294, 0.20709986984729767],
[-0.3284338414669037, -0.08318747580051422],
[-0.42237386107444763, -0.0735677033662796],
[0.17480169236660004, -0.11189625412225723],
[0.7396301627159119, 0.03300142288208008]
]
)
incremental_fit/2
Fits a PCA model on a stream of batches.
Options
- :num_components (pos_integer/0) - Required. The number of principal components to keep.
- :whiten? (boolean/0) - When true, the result is multiplied by the square root of :num_samples and then divided by :singular_values to ensure uncorrelated outputs with unit component-wise variances. Whitening removes some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of downstream estimators by making their data respect some hard-wired assumptions. The default value is false.
Return Values
The function returns a struct with the following parameters:
- :num_components - The number of principal components.
- :components - Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by decreasing :explained_variance.
- :singular_values - The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the :num_components variables in the lower-dimensional space.
- :num_samples_seen - The number of data samples processed.
- :mean - Per-feature empirical mean.
- :variance - Per-feature empirical variance.
- :explained_variance - Variance explained by each of the selected components.
- :explained_variance_ratio - Percentage of variance explained by each of the selected components.
- :whiten? - Whether to apply whitening.
Examples
iex> {x, _} = Scidata.Iris.download()
iex> batches = x |> Nx.tensor() |> Nx.to_batched(10)
iex> pca = Scholar.Decomposition.PCA.incremental_fit(batches, num_components: 2)
iex> pca.components
Nx.tensor(
[
[-0.33354005217552185, 0.1048964187502861, -0.8618107080105579, -0.3674643635749817],
[-0.5862125754356384, -0.7916879057884216, 0.15874788165092468, -0.06621300429105759]
]
)
iex> pca.singular_values
Nx.tensor([77.05782028025969, 10.137848854064941])
partial_fit/2
Updates the parameters of a PCA model on samples x.
Examples
iex> {x, _} = Scidata.Iris.download()
iex> {first_batch, second_batch} = x |> Nx.tensor() |> Nx.split(75)
iex> pca = Scholar.Decomposition.PCA.fit(first_batch, num_components: 2)
iex> pca = Scholar.Decomposition.PCA.partial_fit(pca, second_batch)
iex> pca.components
Nx.tensor(
[
[-0.3229745328426361, 0.09587063640356064, -0.8628664612770081, -0.37677285075187683],
[-0.6786625981330872, -0.7167785167694092, 0.14237160980701447, 0.07332050055265427]
]
)
iex> pca.singular_values
Nx.tensor([166.141845703125, 6.078948020935059])
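Since a fitted model can be updated batch by batch, incremental fitting can also be expressed as a reduction over partial_fit/2; a minimal sketch, assuming all batches share the same feature layout (outputs omitted):

iex> {x, _} = Scidata.Iris.download()
iex> [first | rest] = x |> Nx.tensor() |> Nx.to_batched(50) |> Enum.to_list()
iex> pca = Scholar.Decomposition.PCA.fit(first, num_components: 2)
iex> pca = Enum.reduce(rest, pca, fn batch, acc -> Scholar.Decomposition.PCA.partial_fit(acc, batch) end)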
transform/2
For a fitted model, performs a decomposition of samples x.
Return Values
The function returns a tensor with decomposed data.
Examples
iex> x_fit = Scidata.Iris.download() |> elem(0) |> Nx.tensor()
iex> pca = Scholar.Decomposition.PCA.fit(x_fit, num_components: 2)
iex> x_transform = Nx.tensor([[5.2, 2.6, 2.475, 0.7], [6.1, 3.2, 3.95, 1.3], [7.0, 3.8, 5.425, 1.9]])
iex> Scholar.Decomposition.PCA.transform(pca, x_transform)
Nx.tensor(
[
[-1.4739344120025635, -0.48932668566703796],
[0.28113049268722534, 0.2337251454591751],
[2.0361955165863037, 0.9567767977714539]
]
)