Scholar.Decomposition.PCA (Scholar v0.3.1)

Principal Component Analysis (PCA).

The main idea of PCA is to find components (i.e. columns of a matrix) that explain the most variance in the data set [1]. The sample data is then decomposed as a linear combination of vectors that lie along the directions of those components.
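
The idea can be written compactly. This is a sketch in standard notation, assuming the SVD-based formulation that the field descriptions of the fitted struct suggest. The symbols $X_c$, $U$, $S$, $V$, $\lambda_i$ and $Z$ are introduced here for illustration and do not appear in the library.

```latex
% X is the N x P data matrix, \mu the per-feature mean (a P-vector).
X_c = X - \mathbf{1}_N \mu^\top

% Thin SVD of the centered data; the rows of V^\top are the components.
X_c = U S V^\top, \qquad S = \operatorname{diag}(s_1, \dots, s_r), \quad r = \min(N, P)

% Variance explained by component i, using N - 1 degrees of freedom.
\lambda_i = \frac{s_i^2}{N - 1}

% Projection of the data onto the first k components (first k columns of V).
Z = X_c V_k
```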

The time complexity is $O(NP^2 + P^3)$, where $N$ is the number of samples and $P$ is the number of features. The space complexity is $O(P \cdot (P + N))$. Reference:

Summary

Functions

fit(x, opts \\ [])

Fits a PCA for sample inputs x.

fit_transform(x, opts \\ [])

Fit the model with x and apply the dimensionality reduction on x.

transform(model, x, opts \\ [])

Performs the decomposition of x using a fitted model.

Functions

fit(x, opts \\ [])

Fits a PCA for sample inputs x.

Options

  • :num_components - Number of components to keep. If :num_components is not set, all components are kept; their number is the minimum of the number of features and the number of samples. The default value is nil.
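
The default rule above can be illustrated with a small sketch in plain Elixir. `NumComponentsSketch.resolve/3` is a hypothetical helper written for this illustration; it is not part of Scholar and only mirrors the documented behaviour.

```elixir
defmodule NumComponentsSketch do
  # Hypothetical helper, not part of Scholar: mirrors the documented rule
  # for how many components are kept.

  # Option not set (nil): keep min(num_samples, num_features) components.
  def resolve(nil, num_samples, num_features), do: min(num_samples, num_features)

  # Option set: keep exactly the requested number of components.
  def resolve(k, _num_samples, _num_features) when is_integer(k) and k > 0, do: k
end

# A 6x2 input (6 samples, 2 features) keeps min(6, 2) = 2 components by default.
IO.inspect(NumComponentsSketch.resolve(nil, 6, 2))
# prints 2

# Requesting a single component explicitly.
IO.inspect(NumComponentsSketch.resolve(1, 6, 2))
# prints 1
```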

Return Values

The function returns a struct with the following parameters:

  • :components - Principal axes in feature space, representing the directions of maximum variance in the data. Equivalently, the right singular vectors of the centered input data, which are parallel to the eigenvectors of its covariance matrix. The components are sorted by decreasing :explained_variance.

  • :explained_variance - The amount of variance explained by each of the selected components. The variance estimation uses :num_samples - 1 degrees of freedom. Equal to the :num_components largest eigenvalues of the covariance matrix of x.

  • :explained_variance_ratio - Fraction of the total variance explained by each of the selected components. If :num_components is not set, all components are stored and the ratios sum to 1.0.

  • :singular_values - The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the :num_components variables in the lower-dimensional space.

  • :mean - Per-feature empirical mean, estimated from the training set.

  • :num_components - The number of components kept. Equals the :num_components option, or the lesser of :num_features and :num_samples if that option is nil.

  • :num_features - Number of features in the training data.

  • :num_samples - Number of samples in the training data.

Examples

iex> x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
iex> Scholar.Decomposition.PCA.fit(x)
%Scholar.Decomposition.PCA{
  components: Nx.tensor(
    [
      [-0.838727593421936, -0.5445511937141418],
      [0.5445511937141418, -0.838727593421936]
    ]
  ),
  explained_variance: Nx.tensor(
    [7.939542293548584, 0.06045711785554886]
  ),
  explained_variance_ratio: Nx.tensor(
    [0.9924428462982178, 0.007557140197604895]
  ),
  singular_values: Nx.tensor(
    [6.300611972808838, 0.5498050451278687]
  ),
  mean: Nx.tensor(
    [0.0, 0.0]
  ),
  num_components: 2,
  num_features: Nx.tensor(
    2
  ),
  num_samples: Nx.tensor(
    6
  )
}
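
The relationships between :singular_values, :explained_variance and :explained_variance_ratio can be checked against the output above. The following is a minimal sketch in plain Elixir; `PCAFieldCheck` is a hypothetical module written for this check, and the formulas are the standard ones implied by the field descriptions (variance with :num_samples - 1 degrees of freedom), not code taken from the library.

```elixir
defmodule PCAFieldCheck do
  # Hypothetical helper module, not part of Scholar.

  # explained_variance_i = singular_value_i^2 / (num_samples - 1)
  def explained_variance(singular_values, num_samples) do
    Enum.map(singular_values, fn s -> s * s / (num_samples - 1) end)
  end

  # explained_variance_ratio_i = explained_variance_i / total variance.
  # Only valid when all components are kept, so that the sum of the
  # explained variances is the total variance.
  def explained_variance_ratio(explained_variance) do
    total = Enum.sum(explained_variance)
    Enum.map(explained_variance, fn v -> v / total end)
  end
end

# Singular values copied from the example output; num_samples is 6.
singular_values = [6.300611972808838, 0.5498050451278687]

variance = PCAFieldCheck.explained_variance(singular_values, 6)
ratio = PCAFieldCheck.explained_variance_ratio(variance)

IO.inspect(variance, label: "explained_variance")
IO.inspect(ratio, label: "explained_variance_ratio")
```

Up to float32 rounding, the printed values agree with the :explained_variance and :explained_variance_ratio fields shown in the example.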

fit_transform(x, opts \\ [])

Fit the model with x and apply the dimensionality reduction on x.

This function is analogous to calling fit/2 and then transform/3, but it is calculated more efficiently.

Options

  • :whiten (boolean/0) - When true, the result is multiplied by the square root of :num_samples and then divided by the :singular_values to ensure uncorrelated outputs with unit component-wise variances.

    Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

    The default value is false.
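
The scaling described for :whiten can be sketched in plain Elixir. This follows the wording of the option literally (multiply by the square root of :num_samples, divide by the singular value of each component); `WhitenSketch` is a hypothetical module, and the library's internal computation may differ in detail. The input rows are the non-whitened fit_transform output from the example for this function, and the singular values are those of the fit/2 example.

```elixir
defmodule WhitenSketch do
  # Hypothetical helper, not part of Scholar.
  # `projected` is a list of rows (one per sample); `singular_values`
  # holds one value per component (i.e. per column of `projected`).
  def whiten(projected, singular_values, num_samples) do
    scale = :math.sqrt(num_samples)

    Enum.map(projected, fn row ->
      row
      |> Enum.zip(singular_values)
      |> Enum.map(fn {value, s} -> value * scale / s end)
    end)
  end
end

# Non-whitened projection of the 6x2 example input.
projected = [
  [1.3819537162780762, 0.2936314642429352],
  [2.2231407165527344, -0.25125157833099365],
  [3.6050944328308105, 0.04237968474626541],
  [-1.3819535970687866, -0.29363128542900085],
  [-2.2231407165527344, 0.2512516379356384],
  [-3.6050944328308105, -0.04237968474626541]
]

singular_values = [6.300611972808838, 0.5498050451278687]

whitened = WhitenSketch.whiten(projected, singular_values, 6)
IO.inspect(whitened, label: "whitened")
```

Under this scaling, the sum of squares of each whitened column divided by :num_samples is approximately 1, which is the unit component-wise variance the option refers to.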

Return Values

The function returns a tensor with decomposed data.

Examples

iex> x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
iex> Scholar.Decomposition.PCA.fit_transform(x)
Nx.tensor(
  [
    [1.3819537162780762, 0.2936314642429352],
    [2.2231407165527344, -0.25125157833099365],
    [3.6050944328308105, 0.04237968474626541],
    [-1.3819535970687866, -0.29363128542900085],
    [-2.2231407165527344, 0.2512516379356384],
    [-3.6050944328308105, -0.04237968474626541]
  ]
)

transform(model, x, opts \\ [])

Performs the decomposition of x using a fitted model.

Options

  • :whiten (boolean/0) - When true, the result is multiplied by the square root of :num_samples and then divided by the :singular_values to ensure uncorrelated outputs with unit component-wise variances.

    Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

    The default value is false.

Return Values

The function returns a tensor with decomposed data.

Examples

iex> x = Nx.tensor([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
iex> model = Scholar.Decomposition.PCA.fit(x)
iex> Scholar.Decomposition.PCA.transform(model, x)
Nx.tensor(
  [
    [1.3832788467407227, 0.2941763997077942],
    [2.222006320953369, -0.25037479400634766],
    [3.605285167694092, 0.04380160570144653],
    [-1.3832788467407227, -0.2941763997077942],
    [-2.222006320953369, 0.25037479400634766],
    [-3.605285167694092, -0.04380160570144653]
  ]
)
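
The output above can be reproduced by hand from the fitted fields. The sketch below assumes that the non-whitened transform amounts to centering x with :mean and projecting it onto the rows of :components; `ProjectSketch` is a hypothetical module in plain Elixir, not the library's implementation. The :mean and :components values are copied from the fit/2 example.

```elixir
defmodule ProjectSketch do
  # Hypothetical helper, not part of Scholar.
  # Centers each row with `mean`, then takes its dot product with every
  # component (each component is one row of `components`).
  def project(rows, mean, components) do
    Enum.map(rows, fn row ->
      centered = Enum.zip_with(row, mean, fn value, m -> value - m end)

      Enum.map(components, fn component ->
        centered
        |> Enum.zip_with(component, fn c, w -> c * w end)
        |> Enum.sum()
      end)
    end)
  end
end

x = [[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]

# Fields of the fitted model, as shown in the fit/2 example.
mean = [0.0, 0.0]

components = [
  [-0.838727593421936, -0.5445511937141418],
  [0.5445511937141418, -0.838727593421936]
]

projected = ProjectSketch.project(x, mean, components)
IO.inspect(projected, label: "projected")
```

The printed rows agree with the tensor returned by transform/3 above, up to float32 rounding.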