Scholar.Decomposition.PCA (Scholar v0.3.0)
Principal Component Analysis (PCA).
The main concept of PCA is to find components (i.e. columns of a matrix) that explain the most variance in the data set [1]. The sample data is decomposed into a linear combination of vectors that lie along the directions of those components.
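In symbols, the decomposition can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the library: $X$ is the $N \times P$ data matrix, $\mu$ the per-feature mean, $X_c$ the centered data, $U \Sigma V^\top$ a thin singular value decomposition of $X_c$, and $V_k$ the first $k$ columns of $V$. This is the standard textbook formulation and may differ in detail from how the module computes its result.

$$
X_c = X - \mathbf{1}\mu^\top, \qquad X_c = U \Sigma V^\top, \qquad Z = X_c V_k
$$

Under this formulation, the rows of $V^\top$ play the role of the components and $Z$ is the data expressed in the lower-dimensional space.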
The time complexity is $O(NP^2 + P^3)$ where $N$ is the number of samples and $P$ is the number of features. Space complexity is $O(P * (P+N))$. Reference:
Functions
fit(x, opts \\ [])
Fits a PCA for sample inputs x.
Options
:num_components - Number of components to keep. If :num_components is not set, all components are kept, that is, the minimum of the number of features and the number of samples. The default value is nil.
Return Values
The function returns a struct with the following parameters:
:components - Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, parallel to its eigenvectors. The components are sorted by :explained_variance in descending order.
:explained_variance - The amount of variance explained by each of the selected components. The variance estimation uses :num_samples - 1 degrees of freedom. Equal to the :num_components largest eigenvalues of the covariance matrix of x.
:explained_variance_ratio - Fraction of the total variance explained by each of the selected components. If :num_components is not set, all components are stored and the ratios sum to 1.0.
:singular_values - The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the :num_components variables in the lower-dimensional space.
:mean - Per-feature empirical mean, estimated from the training set.
:num_components - Equals the :num_components option, or the lesser of :num_features and :num_samples if that option is nil.
:num_features - Number of features in the training data.
:num_samples - Number of samples in the training data.
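The relationships among these fields can be written out as a short worked equation. The symbols are introduced here for illustration: $\sigma_i$ is the $i$-th singular value, $\lambda_i$ the $i$-th explained variance, $r_i$ its ratio, and $N$ is :num_samples. The ratio formula assumes all components are kept, so that the denominator is the total variance.

$$
\lambda_i = \frac{\sigma_i^2}{N - 1}, \qquad r_i = \frac{\lambda_i}{\sum_j \lambda_j}
$$

With the values from the fit example for this function ($N = 6$, $\sigma_1 \approx 6.3006$, $\lambda_2 \approx 0.0605$):

$$
\lambda_1 \approx \frac{6.3006^2}{5} \approx 7.9395, \qquad r_1 \approx \frac{7.9395}{7.9395 + 0.0605} \approx 0.9924
$$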
Examples
iex> x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
iex> Scholar.Decomposition.PCA.fit(x)
%Scholar.Decomposition.PCA{
  components: Nx.tensor(
    [
      [-0.838727593421936, -0.5445511937141418],
      [0.5445511937141418, -0.838727593421936]
    ]
  ),
  explained_variance: Nx.tensor(
    [7.939542293548584, 0.06045711785554886]
  ),
  explained_variance_ratio: Nx.tensor(
    [0.9924428462982178, 0.007557140197604895]
  ),
  singular_values: Nx.tensor(
    [6.300611972808838, 0.5498050451278687]
  ),
  mean: Nx.tensor(
    [0.0, 0.0]
  ),
  num_components: 2,
  num_features: Nx.tensor(
    2
  ),
  num_samples: Nx.tensor(
    6
  )
}
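The :num_components option described above can be exercised with a minimal sketch like the following. It assumes a standalone script that fetches Scholar through Mix.install (inside a Mix project, the dependency would be declared in mix.exs instead), and it prints the fitted fields rather than asserting particular values.

```elixir
# Assumption: run as a standalone script; Mix.install fetches Scholar and Nx.
Mix.install([{:scholar, "~> 0.3.0"}])

x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Keep only the first principal component instead of all of them.
model = Scholar.Decomposition.PCA.fit(x, num_components: 1)

# Inspect how the truncated model differs from the full one shown above.
IO.inspect(model.num_components, label: "num_components")
IO.inspect(Nx.shape(model.components), label: "components shape")
IO.inspect(model.explained_variance_ratio, label: "explained_variance_ratio")
```

Comparing the printed fields with the full fit above shows which of them are truncated to the requested number of components.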
fit_transform(x, opts \\ [])
Fit the model with x and apply the dimensionality reduction on x.
This function is analogous to calling fit/2 and then transform/3, but it is calculated more efficiently.
Options
:whiten (boolean/0) - When true, the result is multiplied by the square root of :num_samples and then divided by the :singular_values to ensure uncorrelated outputs with unit component-wise variances. Whitening removes some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of downstream estimators by making their data respect some hard-wired assumptions. The default value is false.
Return Values
The function returns a tensor with decomposed data.
Examples
iex> x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
iex> Scholar.Decomposition.PCA.fit_transform(x)
Nx.tensor(
  [
    [1.3819537162780762, 0.2936314642429352],
    [2.2231407165527344, -0.25125157833099365],
    [3.6050944328308105, 0.04237968474626541],
    [-1.3819535970687866, -0.29363128542900085],
    [-2.2231407165527344, 0.2512516379356384],
    [-3.6050944328308105, -0.04237968474626541]
  ]
)
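To see the effect of the :whiten option, a minimal sketch such as the one below compares the per-column spread of the plain and whitened outputs. It assumes a standalone script using Mix.install, and it uses Nx.standard_deviation only as a convenient way to look at the scales; no particular output values are claimed here.

```elixir
# Assumption: run as a standalone script; Mix.install fetches Scholar and Nx.
Mix.install([{:scholar, "~> 0.3.0"}])

x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Same input, with and without whitening.
plain = Scholar.Decomposition.PCA.fit_transform(x)
whitened = Scholar.Decomposition.PCA.fit_transform(x, whiten: true)

# Per-column standard deviation of each result (axis 0 runs over samples).
IO.inspect(Nx.standard_deviation(plain, axes: [0]), label: "plain std")
IO.inspect(Nx.standard_deviation(whitened, axes: [0]), label: "whitened std")
```

In the plain output the columns keep their relative variance scales, while the whitened output is rescaled per component as the option description states.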
transform(model, x, opts \\ [])
For a fitted model, performs a decomposition.
Options
:whiten (boolean/0) - When true, the result is multiplied by the square root of :num_samples and then divided by the :singular_values to ensure uncorrelated outputs with unit component-wise variances. Whitening removes some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of downstream estimators by making their data respect some hard-wired assumptions. The default value is false.
Return Values
The function returns a tensor with decomposed data.
Examples
iex> x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
iex> model = Scholar.Decomposition.PCA.fit(x)
iex> Scholar.Decomposition.PCA.transform(model, x)
Nx.tensor(
  [
    [1.3832788467407227, 0.2941763997077942],
    [2.222006320953369, -0.25037479400634766],
    [3.605285167694092, 0.04380160570144653],
    [-1.3832788467407227, -0.2941763997077942],
    [-2.222006320953369, 0.25037479400634766],
    [-3.605285167694092, -0.04380160570144653]
  ]
)
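The output above is consistent with centering x by the fitted :mean and projecting it onto the fitted :components. The sketch below checks that by hand for the non-whitened case. It assumes a standalone script using Mix.install and is an illustration of the relationship, not a description of the library's internal implementation.

```elixir
# Assumption: run as a standalone script; Mix.install fetches Scholar and Nx.
Mix.install([{:scholar, "~> 0.3.0"}])

x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
model = Scholar.Decomposition.PCA.fit(x)

# Projection computed by the library.
projected = Scholar.Decomposition.PCA.transform(model, x)

# Manual projection: subtract the per-feature mean, then take the dot
# product with the transposed components (one column per component).
manual =
  x
  |> Nx.subtract(model.mean)
  |> Nx.dot(Nx.transpose(model.components))

# Nx.all_close returns a scalar tensor holding 1 when all entries agree
# within the tolerance and 0 otherwise.
matches = Nx.all_close(manual, projected, atol: 1.0e-4)
IO.inspect(Nx.to_number(matches), label: "manual projection matches transform (1 = yes)")
```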