UFPCA

class FDApy.preprocessing.UFPCA(method='covariance', n_components=None, normalize=False)

Univariate functional principal components analysis.

Linear dimensionality reduction of a univariate functional dataset. The data are projected onto a lower-dimensional space by diagonalizing either the covariance operator or the inner-product matrix of the data.

Parameters:
  • method (str) – Method used to estimate the eigencomponents. If method == 'covariance', the estimation is based on an eigendecomposition of the covariance operator. If method == 'inner-product', the estimation is based on an eigendecomposition of the inner-product matrix.

  • n_components (int | float | None) – Number of components to keep. If n_components is None, all components are kept, i.e. n_components == min(n_samples, n_features). If n_components is an integer, that many components are kept. If 0 < n_components < 1, the number of components is chosen so that the proportion of explained variance is greater than n_components.

  • normalize (bool) – Whether to normalize the data.
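
The three cases for n_components can be illustrated with a small NumPy sketch. The helper name select_n_components is hypothetical and is not part of FDApy; it assumes the eigenvalues are non-negative and sorted in decreasing order, and it only mirrors the rule stated above.

```python
import numpy as np


def select_n_components(eigenvalues, n_components=None):
    """Hypothetical helper: number of components kept for a given n_components."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    if n_components is None:
        # Keep every available component.
        return eigenvalues.size
    if 0 < n_components < 1:
        # Cumulative share of variance; dividing by the last cumulative sum
        # guarantees that the final share is exactly 1.0.
        cumulative = np.cumsum(eigenvalues)
        share = cumulative / cumulative[-1]
        # Smallest K whose cumulative share is strictly greater than the threshold.
        return int(np.searchsorted(share, n_components, side="right") + 1)
    # Otherwise n_components is taken as an integer count.
    return int(n_components)


# Cumulative shares are 0.4, 0.7, 0.9, 1.0, so three components exceed 0.8.
print(select_n_components([4.0, 3.0, 2.0, 1.0], 0.8))  # 3
```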

Attributes:
  • mean (DenseFunctionalData) – An estimate of the mean of the training data.

  • covariance (DenseFunctionalData) – An estimate of the covariance of the training data, reconstructed from its eigendecomposition using Mercer’s theorem.

  • eigenvalues (npt.NDArray[np.float64], shape=(n_components,)) – The eigenvalues corresponding to each of the selected components.

  • eigenfunctions (DenseFunctionalData) – Principal axes in feature space, representing the directions of maximum variance in the data.
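
As a rough illustration of how these attributes relate, the sketch below works on curves sampled on a common grid with plain NumPy. It centers the data, eigendecomposes the discretized covariance with trapezoidal quadrature weights, and rebuilds the covariance as \(\sum_k \lambda_k \phi_k(s)\phi_k(t)\) in the spirit of Mercer's theorem. The simulated data, the variable names and the quadrature choice are all assumptions made for the example; this is not FDApy's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)

# Trapezoidal quadrature weights on the grid (assumed integration rule).
w = np.full(t.size, t[1] - t[0])
w[[0, -1]] *= 0.5

# Simulated rank-2 curves built from two orthonormal sine functions.
true_phi = np.vstack([np.sqrt(2) * np.sin(np.pi * t),
                      np.sqrt(2) * np.sin(2 * np.pi * t)])
xi = rng.normal(size=(200, 2)) * np.sqrt([1.0, 0.25])
X = xi @ true_phi                                   # (n_obs, n_points)

# Mean and empirical covariance on the grid.
mean = X.mean(axis=0)
Xc = X - mean
cov = Xc.T @ Xc / X.shape[0]                        # (n_points, n_points)

# Symmetrized weighted eigenproblem: W^{1/2} C W^{1/2} u = lambda u,
# with eigenfunctions recovered as phi = W^{-1/2} u.
sqrt_w = np.sqrt(w)
vals, vecs = np.linalg.eigh(sqrt_w[:, None] * cov * sqrt_w[None, :])
order = np.argsort(vals)[::-1][:2]                  # two largest eigenvalues
eigenvalues = vals[order]
eigenfunctions = (vecs[:, order] / sqrt_w[:, None]).T   # (n_components, n_points)

# Covariance rebuilt from the eigencomponents.
cov_mercer = (eigenfunctions.T * eigenvalues) @ eigenfunctions
```

Because the simulated curves have rank two, cov_mercer matches cov up to rounding; with real data, the truncated sum is only an approximation.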

References

Methods

fit(data[, points, method_smoothing, ...])

Estimate the eigencomponents of the data.

inverse_transform(scores)

Transform the data back to its original space.

transform([data, method, method_smoothing])

Apply dimensionality reduction to the data.

fit(data, points=None, method_smoothing=None, kwargs_mean={}, kwargs_covariance={}, kwargs_innpro={})

Estimate the eigencomponents of the data.

The data are centered before the eigencomponents are estimated. When the covariance operator is used, the estimation follows [1].

Parameters:
  • data (FunctionalData) – Training data used to estimate the eigencomponents.

  • points (DenseArgvals | None) – The sampling points at which the covariance and the eigenfunctions will be estimated.

  • method_smoothing (str | None) – Method used to smooth the mean and covariance estimates, if any.

  • kwargs_mean (Dict[str, object]) – Keyword arguments to be passed to the function FunctionalData.mean().

  • kwargs_covariance (Dict[str, object]) – Keyword arguments to be passed to the function preprocessing.ufpca._fit_covariance().

  • kwargs_innpro (Dict[str, object]) – Keyword arguments to be passed to the function preprocessing.ufpca._fit_inner_product().

Return type:

None

inverse_transform(scores)

Transform the data back to its original space.

Given a set of scores \(c_{ik}\), we reconstruct the observations using a truncation of the Karhunen-Loève expansion,

\[X_{i}(t) = \mu(t) + \sum_{k = 1}^K c_{ik}\phi_k(t).\]

Data can be multidimensional.
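
The truncated expansion above amounts to a single tensor contraction once everything is sampled on a grid. The following NumPy sketch uses a hypothetical function reconstruct and made-up arrays, not FDApy objects, and shows the same formula applied to one-dimensional curves and to two-dimensional surfaces.

```python
import numpy as np


def reconstruct(scores, mean, eigenfunctions):
    """Hypothetical helper for X_i = mu + sum_k c_ik * phi_k on a grid.

    scores: (n_obs, n_components)
    mean: grid-shaped array, e.g. (n_points,) or (n_points_1, n_points_2)
    eigenfunctions: (n_components, *grid_shape)
    """
    # Contract the component axis of the scores with that of the eigenfunctions.
    return mean + np.tensordot(scores, eigenfunctions, axes=(1, 0))


scores = np.array([[1.0, 0.0], [0.5, -2.0]])

# One-dimensional domain.
t = np.linspace(0.0, 1.0, 50)
mean_1d = t ** 2
phi_1d = np.vstack([np.sqrt(2) * np.sin(np.pi * t),
                    np.sqrt(2) * np.sin(2 * np.pi * t)])
curves_1d = reconstruct(scores, mean_1d, phi_1d)      # shape (2, 50)

# Two-dimensional domain: same call, grid-shaped mean and eigenfunctions.
s, u = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8), indexing="ij")
mean_2d = np.zeros((8, 8))
phi_2d = np.stack([np.sin(np.pi * s) * np.sin(np.pi * u),
                   np.sin(2 * np.pi * s) * np.sin(np.pi * u)])
surfaces_2d = reconstruct(scores, mean_2d, phi_2d)    # shape (2, 8, 8)
```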

Parameters:

scores (ndarray[Any, dtype[float64]]) – New data of shape (n_obs, n_components), where n_obs is the number of observations and n_components is the number of components.

Returns:

A DenseFunctionalData object representing the transformation of the scores into the original curve space.

Return type:

DenseFunctionalData

transform(data=None, method='NumInt', method_smoothing='LP', **kwargs)

Apply dimensionality reduction to the data.

The functional principal components scores are defined as the projection of the observation \(X_i\) on the eigenfunction \(\phi_k\). These scores are given by:

\[c_{ik} = \int_{\mathcal{T}} \{X_i(t) - \mu(t)\}\phi_k(t)dt.\]

This integral can be estimated in two ways. First, if the data are sampled on a common fine grid, it is estimated by numerical integration. Second, for sparse functional data, the PACE (Principal Components through Conditional Expectation) algorithm [2] is used. If the eigenfunctions have been estimated using the inner-product matrix, the scores can also be computed using the formula

\[c_{ik} = \sqrt{l_k}v_{ik},\]

where \(l_k\) and \(v_{k}\) are the eigenvalues and eigenvectors of the inner-product matrix, and \(v_{ik}\) is the \(i\)-th entry of \(v_k\).
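
To see why the two formulas agree on densely sampled data, the NumPy sketch below computes the scores both ways. It assumes a trapezoidal rule for the integrals and an inner-product matrix that is not divided by the number of observations; FDApy may use a different normalization, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 51)

# Trapezoidal quadrature weights (assumed integration rule).
w = np.full(t.size, t[1] - t[0])
w[[0, -1]] *= 0.5

# Simulated curves on a common grid, then centered.
X = (rng.normal(size=(20, 1)) * np.sin(np.pi * t)
     + 0.5 * rng.normal(size=(20, 1)) * np.cos(np.pi * t))
Xc = X - X.mean(axis=0)

# Inner-product matrix: G_ij approximates the integral of Xc_i(t) * Xc_j(t).
gram = (Xc * w) @ Xc.T                               # (n_obs, n_obs)
gram_vals, gram_vecs = np.linalg.eigh(gram)
order = np.argsort(gram_vals)[::-1][:2]              # two largest eigenvalues
gram_vals, gram_vecs = gram_vals[order], gram_vecs[:, order]

# Eigenfunctions as combinations of the centered curves:
# phi_k = l_k^{-1/2} * sum_i v_ik * Xc_i.
phi = (gram_vecs / np.sqrt(gram_vals)).T @ Xc        # (n_components, n_points)

# Scores by numerical integration of Xc_i(t) * phi_k(t).
scores_numint = (Xc * w) @ phi.T                     # (n_obs, n_components)

# Scores directly from the eigendecomposition: c_ik = sqrt(l_k) * v_ik.
scores_innpro = np.sqrt(gram_vals) * gram_vecs
```

With the same quadrature rule in both computations, scores_numint and scores_innpro coincide up to rounding.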

Parameters:
  • data (DenseFunctionalData | None) – The data to be transformed. If None, the data passed to the fit method are used.

  • method (str) – Method used to estimate the scores. If method == 'NumInt', numerical integration is performed. If method == 'PACE', the PACE algorithm [2] is used. If method == 'InnPro', the estimation is performed using the inner-product matrix of the data (this can only be used if the eigencomponents have been estimated using the inner-product matrix).

  • method_smoothing (str) – Method used to smooth the mean and covariance estimates.

  • kwargs (Any) – See below

Keyword Arguments:
  • tol (float, default=1e-4) – Tolerance parameter to prevent overflow when inverting a matrix, only used if method == 'PACE'.

  • integration_method (str, {'trapz', 'simpson'}, default='trapz') – Method used to perform numerical integration, only used if method == 'NumInt'.
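
For the 'PACE' option, the usual conditional-expectation estimate of the scores for one sparsely observed curve is \(\hat{c}_{ik} = \lambda_k \phi_{ik}^\top \Sigma_{Y_i}^{-1}(Y_i - \mu_i)\), where \(\Sigma_{Y_i}\) is the covariance of the observations at that curve's sampling points plus a noise variance on the diagonal. The NumPy sketch below is a minimal version of that formula under stated assumptions: the function pace_scores and the noise variance sigma2 are hypothetical names, and the way FDApy uses tol internally is not shown here.

```python
import numpy as np


def pace_scores(y, mean, phi, eigenvalues, sigma2):
    """Hypothetical helper: conditional-expectation scores for one curve.

    y, mean: (n_points,) observations and mean at the curve's sampling points
    phi: (n_points, n_components) eigenfunctions at the same points
    eigenvalues: (n_components,)
    sigma2: assumed measurement-error variance
    """
    lam = np.diag(eigenvalues)
    # Covariance of the noisy observations at the sampling points.
    sigma_y = phi @ lam @ phi.T + sigma2 * np.eye(len(y))
    # Solve the linear system rather than forming an explicit inverse.
    return lam @ phi.T @ np.linalg.solve(sigma_y, y - mean)


# One curve observed at seven irregular points, without noise.
t_obs = np.array([0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95])
phi_obs = np.column_stack([np.sqrt(2) * np.sin(np.pi * t_obs),
                           np.sqrt(2) * np.sin(2 * np.pi * t_obs)])
lam_true = np.array([1.0, 0.25])
xi_true = np.array([1.5, -0.7])
mean_obs = t_obs
y_obs = mean_obs + phi_obs @ xi_true

scores_pace = pace_scores(y_obs, mean_obs, phi_obs, lam_true, sigma2=1e-6)
```

In this noise-free example with a very small sigma2, scores_pace is close to xi_true; with noisy data the estimate is shrunk toward zero.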

Returns:

An array representing the projection of the data onto the basis of functions defined by the eigenfunctions.

Return type:

npt.NDArray[np.float64], shape=(n_obs, n_components)

Examples using FDApy.preprocessing.UFPCA

FPCA of 1-dimensional data

FPCA of 1-dimensional sparse data

FPCA of 2-dimensional data

Canadian weather dataset