Preprocessing#

Univariate Functional Principal Components Analysis#

class FDApy.preprocessing.dim_reduction.ufpca.UFPCA(method: str = 'covariance', n_components: float | int | None = None, normalize: bool = False)#

Bases: object

UFPCA – Univariate Functional Principal Components Analysis.

Linear dimensionality reduction of a univariate functional dataset. The projection of the data in a lower dimensional space is performed using a diagonalization of the covariance operator or of the inner-product matrix of the data.

Parameters:

method: str, {‘covariance’, ‘inner-product’}, default=’covariance’: Method used to estimate the eigencomponents. If method == 'covariance', the estimation is based on an eigendecomposition of the covariance operator. If method == 'inner-product', the estimation is based on an eigendecomposition of the inner-product matrix.
n_components: Optional[Union[int, float]], default=None: Number of components to keep. If n_components is None, all components are kept, n_components == min(n_samples, n_features). If n_components is an integer, n_components are kept. If 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
normalize: bool, default=False: Perform a normalization of the data.

Attributes:

mean: DenseFunctionalData: An estimation of the mean of the training data.
covariance: DenseFunctionalData: An estimation of the covariance of the training data based on their eigendecomposition using the Mercer’s theorem.
eigenvalues: npt.NDArray[np.float64], shape=(n_components,): The singular values corresponding to each of selected components.
eigenfunctions: DenseFunctionalData: Principal axes in feature space, representing the directions of maximum variance in the data.

Methods

`fit`(data[, points, method_smoothing, ...])	Estimate the eigencomponents of the data.
`inverse_transform`(scores)	Transform the data back to its original space.
`transform`([data, method, method_smoothing])	Apply dimensionality reduction to the data.

References

[1]

Ramsey, J. O. and Silverman, B. W. (2005), Functional Data Analysis, Springer Science, Chapter 8.

property method: str#: Getter for method.

property n_components: int#: Getter for n_components.

property normalize: bool#: Getter for normalize.

property mean: DenseFunctionalData#: Getter for mean.

property covariance: DenseFunctionalData#: Getter for covariance.

property eigenvalues: ndarray[Any, dtype[float64]]#: Getter for eigenvalues.

property eigenfunctions: DenseFunctionalData#: Getter for eigenfunctions.

fit(data: FunctionalData, points: DenseArgvals | None = None, method_smoothing: str | None = None, kwargs_mean: Dict[str, object] = {}, kwargs_covariance: Dict[str, object] = {}, kwargs_innpro: Dict[str, object] = {}) → None#

Estimate the eigencomponents of the data.

Before estimating the eigencomponents, the data is centered. Using the covariance operator, the estimation is based on [1].

Parameters:

data: FunctionalData: Training data used to estimate the eigencomponents.
points: DenseArgvals: The sampling points at which the covariance and the eigenfunctions will be estimated.
method_smoothing: str, default=None: Should the mean and covariance be smoothed?
kwargs_mean: Dict[str, object], default={}: Keywords arguments to be passed to the function FunctionalData.mean().
kwargs_covariance: Dict[str, object], default={}: Keywords arguments to be passed to the function preprocessing.fpca._fit_covariance().
kwargs_innpro: Dict[str, object], default={}: Keywords arguments to be passed to the function preprocessing.fpca._fit_inner_product().

References

[1]

Ramsey, J. O. and Silverman, B. W. (2005), Functional Data Analysis, Springer Science, Chapter 8.

transform(data: DenseFunctionalData | None = None, method: str = 'NumInt', method_smoothing: str = 'LP', **kwargs) → ndarray[Any, dtype[float64]]#

Apply dimensionality reduction to the data.

The functional principal components scores are defined as the projection of the observation \(X_i\) on the eigenfunction \(\phi_k\). These scores are given by:

\[c_{ik} = \int_{\mathcal{T}} \{X_i(t) - \mu(t)\}\phi_k(t)dt.\]

This integral can be estimated using two ways. First, if data are sampled on a common fine grid, the estimation is done using numerical integration. Second, the PACE (Principal Components through Conditional Expectation) algorithm [1] is used for sparse functional data. If the eigenfunctions have been estimated using the inner-product matrix, the scores can also be estimated using the formula

\[c_{ik} = \sqrt{l_k}v_{ik},\]

where \(l_k\) and \(v_{k}\) are the eigenvalues and eigenvectors of the inner-product matrix.

Parameters:

data: Optional[DenseFunctionalData], default=None

The data to be transformed. If None, the data are the same than for the fit method.

method: str, {‘NumInt’, ‘PACE’, ‘InnPro’}, default=’NumInt’

Method used to estimate the scores. If method == 'NumInt', numerical integration method is performed. If method == 'PACE', the PACE algorithm [1] is used. If method == 'InnPro', the estimation is performed using the inner product matrix of the data (can only be used if the eigencomponents have been estimated using the inner-product matrix.)

method_smoothing: str = ‘LP’,

Should the mean and covariance be smoothed?

**kwargs:

tol: float, default=1e-4: Tolerance parameter to prevent overflow to inverse a matrix, only used if method == 'PACE'.
integration_method: str, {‘trapz’, ‘simpson’}, default=’trapz’: Method used to perform numerical integration, only used if method == 'NumInt'.

Returns:

npt.NDArray[np.float64], shape=(n_obs, n_components): An array representing the projection of the data onto the basis of functions defined by the eigenfunctions.

References

[1] (1,2)

Yao, Müller and Wang (2005), Functional Data Analysis for Sparse Longitudinal Data. Journal of the American Statistical Association, 100, pp. 577–590.

inverse_transform(scores: ndarray[Any, dtype[float64]]) → DenseFunctionalData#

Transform the data back to its original space.

Given a set of scores \(c_{ik}\), we reconstruct the observations using a truncation of the Karhunen-Loève expansion,

\[X_{i}(t) = \mu(t) + \sum_{k = 1}^K c_{ik}\phi_k(t).\]

Data can be multidimensional.

Parameters:

scores: npt.NDArray[np.float64], shape=(n_obs, n_components): New data, where n_obs is the number of observations and n_components is the number of components.

Returns:

DenseFunctionalData: A DenseFunctionalData object representing the transformation of the scores into the original curve space.

Multivariate Functional Principal Components Analysis#

class FDApy.preprocessing.dim_reduction.mfpca.MFPCA(n_components: int | float = 2, univariate_expansions: List[Dict[str, object]] | None = None, method: str = 'covariance', weights: ndarray[Any, dtype[float64]] | None = None, normalize: bool = False)#

Bases: object

MFPCA – Multivariate Functional Principal Components Analysis.

Linear dimensionality reduction of a multivariate functional dataset. The projection of the data in a lower dimensional space is performed using a diagonalization of the covariance operator of each univariate component or of the inner-product matrix of the data. It is assumed that the data have \(P\) components.

Parameters:

n_components: Union[int, float], default=2: Number of components to keep. If n_components is an integer, n_components are kept. If 0 < n_components < 1, we select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
univariate_expansions: Optional[List[Dict[str, object]]], default=None: List of dictionaries characterizing the univariate expansion computed for each component.
method: str, {‘covariance’, ‘inner-product’}, default=’covariance’: Method used to estimate the eigencomponents. If method == ‘covariance’, the estimation is based on an eigendecomposition of the covariance operator of each univariate components. If method == ‘inner-product’, the estimation is based on an eigendecomposition of the inner-product matrix.
weights: npt.NDArray[np.float_], default=None: A vector of weights of length \(P\). If None, we set the weights to be equal to 1 for each component.
normalize: bool, default=False: Perform a normalization of the data.

Attributes:

mean: MultivariateFunctionalData: An estimation of the mean of the training data.
covariance: MultivariateFunctionalData: An estimation of the covariance of the training data based on their eigendecomposition using the Mercer’s theorem.
eigenvalues: npt.NDArray[np.float_], shape=(n_components,): The singular values corresponding to each of selected components.
eigenfunctions: MultivariateFunctionalData: Principal axes in feature space, representing the directions of maximum variances in the data as a MultivariateFunctionalData.

Methods

`fit`(data[, points, method_smoothing])	Estimate the eigencomponents of the data.
`inverse_transform`(scores)	Transform the data back to its original space.
`transform`([data, method, method_smoothing])	Apply dimensionality reduction to the data.

References

[1]

Happ and Greven (2018), Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains. Journal of the American Statistical Association, 113, pp. 649–659.

property n_components: int | float#: Getter for n_components.

property univariate_expansion: List[Dict[str, object]] | None#: Gettter for univariate_expansion.

property method: str#: Getter for method.

property weights: ndarray[Any, dtype[float64]] | None#: Getter for weights.

property normalize: bool#: Getter for normalize.

property mean: MultivariateFunctionalData#: Getter for mean.

property covariance: MultivariateFunctionalData#: Getter for covariance.

property eigenvalues: ndarray[Any, dtype[float64]]#: Getter for eigenvalues.

property eigenfunctions: MultivariateFunctionalData#: Getter for eigenfunctions.

fit(data: MultivariateFunctionalData, points: List[DenseArgvals] | None = None, method_smoothing: str | None = None, **kwargs) → None#

Estimate the eigencomponents of the data.

Before estimating the eigencomponents, the data is centered. Using the covariance operator, the estimation is based on [2]. Using the Gram matrix, the estimation is based on [1].

Parameters:

data: MultivariateFunctionalData

Training data used to estimate the eigencomponents.

points: Optional[List[DenseArgvals]]

The sampling points at which the covariance and the eigenfunctions will be estimated.

method_smoothing: str, default=None

Should the mean and covariance be smoothed?

**kwargs

Other keyword arguments are passed to the following functions:

FunctionalData.mean() and FunctionalData.center();
preprocessing.dim_reduction.mfpca._fit_inner_product_multivariate().

References

[1]

Golovkine, S., Gunning, E., Simpkin, A.J., Bargary, N. (2023). On the use of the Gram matrix for multivariate functional principal components analysis.

[2]

Happ C. & Greven S. (2018), Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains. Journal of the American Statistical Association, 113, pp. 649–659.

transform(data: MultivariateFunctionalData | None = None, method: str = 'NumInt', method_smoothing: str = 'LP', **kwargs) → ndarray[Any, dtype[float64]]#

Apply dimensionality reduction to the data.

The functional principal components scores are defined as the projection of the observation \(X_i\) on the eigenfunction \(\phi_k\). These scores are given by:

\[c_{ik} = \sum_{p = 1}^P \int_{\mathcal{T}_p} \{X_i^{(p)}(t) - \mu^{(p)}(t)\}\phi_k^{(p)}(t)dt.\]

This integral can be estimated using numerical integration. If the eigenfunctions have been estimated using the inner-product matrix, the scores can also be estimated using the formula

\[c_{ik} = \sqrt{l_k}v_{ik},\]

where \(l_k\) and \(v_{k}\) are the eigenvalues and eigenvectors of the inner-product matrix.

TODO: Test for 2D functional data

Parameters:

data: Optional[MultivariateFunctionalData], default=None

Data

method: str, {‘NumInt’, ‘PACE’, ‘InnPro’}, default=’NumInt’

Method used to estimate the scores. If method == 'NumInt', numerical integration method is performed. If method == 'InnPro', the estimation is performed using the inner product matrix of the data (can only be used if the eigencomponents have been estimated using the inner-product matrix.)

method_smoothing: str = ‘LP’,

Should the mean and covariance be smoothed?

**kwargs:

Other keyword arguments are passed to the following function:

FunctionalData.center().

Returns:

npt.NDArray[np.float64], shape=(n_obs, n_components): An array representing the projection of the data onto the basis of functions defined by the eigenfunctions.

Notes

Concerning the estimation of the scores using numerical integration, we directly estimate the scores using the projection of the data onto the multivariate eigenfunctions and not use the univariate components and the decomposition of the covariance of the univariate scores as Happ and Greven [1] could do.

References

[1]

Happ and Greven (2018), Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains. Journal of the American Statistical Association, 113, pp. 649–659.

inverse_transform(scores: ndarray[Any, dtype[float64]]) → MultivariateFunctionalData#

Transform the data back to its original space.

Given a set of scores \(c_{ik}\), we reconstruct the observations using a truncation of the Karhunen-Loève expansion,

\[X_{i}(t) = \mu(t) + \sum_{k = 1}^K c_{ik}\phi_k(t).\]

Data can be multidimensional. Recall that, here, \(X_{i}\), \(\mu\) and \(\phi_k\) are \(P\)-dimensional functions.

Parameters:

scores: npt.NDArray[np.float64], shape=(n_obs, n_components): New data, where n_obs is the number of observations and n_components is the number of components.

Returns:

MultivariateFunctionalData: A MultivariateFunctionalData object representing the transformation of the scores into the original curve space.

Functional Canonical Polyadic-Tensor Power Algorithm#

class FDApy.preprocessing.dim_reduction.fcp_tpa.FCPTPA(n_components: int = 5, normalize: bool = False)#

Bases: object

Functional Canonical Polyadic - Tensor Power Algorithm (FCP-TPA).

This module implements the Functional CP-TPA algorithm [1]. This method computes an eigendecomposition of image observations, which can be interpreted as functions on a two-dimensional domain. We assume \(N\) observations of 2D images with dimension \(M_1 \times M_2\). The results are given in a CANDECOMP/PARAFRAC (CP) model format

\[X = \sum_{k = 1}^K c_k \cdot u_k \circ v_k \circ w_k\]

where \(\circ\) stands for the outer product, \(c_k\) is a coefficient (scalar) and \(u_k, v_k, w_k\) are eigenvectors for each direction of the tensor. In this representation, the outer product \(v_k \circ w_k\) can be regarded as the \(k\)-th eigenimage, while \(d_k \cdot u_k\) represents the vector of individual scores for this eigenimage and each observation.

The smoothness of the eigenvectors \(v_k, w_k\) is induced by penalty matrices for both image directions, that are weighted by smoothing parameters \(\alpha_{v_k}, \alpha_{w_k}\). The eigenvectors \(u_k\) are not smoothed, hence the algorithm does not induce smoothness along observations.

Optimal smoothing parameters are found via a nested generalized cross validation [4]. In each iteration of the TPA (tensor power algorithm), the GCV criterion is optimized via scipy.optimize on the intervals specified via alpha_range.

The FCP-TPA algorithm is an iterative algorithm. Convergence is assumed if the relative difference between the actual and the previous values are all below the tolerance level tolerance. The tolerance level is increased automatically, if the algorithm has not converged after max_iteration steps and if adapt_tolerance = TRUE. If the algorithm did not converge after max_iteration steps steps, the function throws a warning. The code is adapted from [2] and [3].

Parameters:

n_components: int, default=5: Number of components to be calculated.
normalize: bool, default=False: Should the results be normalied?

Attributes:

eigenvalues: npt.NDArray[np.float64], shape=(n_components,): The singular values corresponding to each of selected components.
eigenfunctions: DenseFunctionalData: Principal axes in feature space, representing the directions of maximum variance in the data.

Methods

`fit`(data, penalty_matrices, alpha_range[, ...])	Fit the model on data.
`inverse_transform`(scores)	Transform the data back to its original space.
`transform`(data[, method])	Apply dimension reduction to the data.

References

[1]

Allen G. (2013) Multi-way Functional Principal Components Analysis, IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing.

[2]

Happ C. and Greven S. (2018) Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains, Journal of the American Statistical Association, 113(522), 649–659, DOI: 10.1080/01621459.2016.1273115.

[3]

Happ-Kurz C. (2020) Object-Oriented Software for Functional Data, Journal of Statistical Software, 93(5): 1–38.

[4]

Huang J. Z., Shen H. and Buja A. (2009) The Analysis of Two-Way Functional Data Using Two-Way Regularized Singular Value Decomposition, Journal of the American Statistical Association, 104(488): 1609–1620.

property n_components: int#: Getter for n_components.

property normalize: bool#: Getter for normalize.

property eigenvalues: ndarray[Any, dtype[float64]]#: Getter for eigenvalues.

property eigenfunctions: DenseFunctionalData#: Getter for eigenfunctions.

fit(data: DenseFunctionalData, penalty_matrices: Dict[str, ndarray[Any, dtype[float64]]], alpha_range: Dict[str, Tuple[float, float]], tolerance: float = 0.0001, max_iteration: int = 15, adapt_tolerance: bool = True, verbose: bool = False) → None#

Fit the model on data.

This function is used to fit a model on the data.

Parameters:

data: DenseFunctionalData: Training data used to estimate the eigencoponents. The dimension of its value parameter is \(N \times M_1 \times M_2\).
penalty_matrices: Dict[str, npt.NDArray[np.float64]]: A dictionary with entries \(v\) and \(w\), containing a roughness penalty matrix for each direction of the image. The algorithm does not induce smoothness along observations.
alpha_range: Dict[str, Tuple[float, float]]: A dictionary with entries \(v\) and \(w\), containing the range of smoothness parameters \(\alpha_{v_k}, \alpha_{w_k}\) as a tuple.
tolerance: float, default=1e-4: A numeric value, giving the tolerance for relative error values in the algorithm. It is automatically multiplyed by 10 after max_iter steps, if adapt_tol = True.
max_iteration: int, default=15: An integer, the maximal iteration steps. Can be doubled, if adapt_tol = True.
adapt_tolerance: bool, default=True: If True, the tolerance is adapted (multiply by 10), if the algorithm has not converged after max_iter steps and another max_iter steps are allowed with the increased tolerance.
verbose: bool, default=False: If True, computational details are given on the standard output during the computation. Here for debug purpose.

Examples

Simulate some data.

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     dimension='2D',
...     argvals={'input_dim_0': np.linspace(0, 1, 101)},
...     random_state=42
... )
>>> kl.new(n_obs=50)
>>> data = kl.data

Define some parameters.

>>> n_points = data.n_points
>>> mat_v = np.diff(np.identity(n_points['input_dim_0']))
>>> mat_w = np.diff(np.identity(n_points['input_dim_1']))

Fit the FCP-TPA algorithm.

>>> fcptpa = FCPTPA(n_components=10)
>>> fcptpa.fit(
...     data,
...     penalty_matrices={
...         'v': np.dot(mat_v, mat_v.T),
...         'w': np.dot(mat_w, mat_w.T)
...     },
...     alpha_range={
...         'v': (1e-2, 1e2),
...         'w': (1e-2, 1e2)
...     },
...     tolerance=1e-4,
...     max_iteration=15,
...     adapt_tolerance=True
... )

transform(data: DenseFunctionalData, method: str = 'NumInt') → ndarray[Any, dtype[float64]]#

Apply dimension reduction to the data.

Parameters:

data: DenseFunctionalData: Functional data object to be transformed. It has to be 2-dimensional data.
method: str, {‘NumInt’, ‘FCPTPA’}: Not used. To be compliant with other methods.

Returns:

npt.NDArray[np.float64], shape=(n_obs, n_components): An array representing the projection of the data onto the basis of functions defined by the eigenimages.

Examples

Using the model fitted using the fit function.

>>> scores = fcptpa.transform(data, 'NumInt')

inverse_transform(scores: ndarray[Any, dtype[float64]]) → DenseFunctionalData#

Transform the data back to its original space.

Return a DenseFunctionalData whose transform would be scores.

Parameters:

scores: npt.NDArray[np.float64], shape=(n_obs, n_components): A set of coefficients to generate new data, where n_obs is the number of observations and n_components is the number of components.

Returns:

DenseFunctionalData: The transformation of the scores into the original space.

Examples

Using the model fitted using the fit function.

>>> data_f = fcptpa.inverse_transform(scores)

Local Polynomials#

class FDApy.preprocessing.smoothing.local_polynomial.LocalPolynomial(kernel_name: str = 'epanechnikov', bandwidth: float = 0.05, degree: int = 1, robust: bool = False, **kwargs)#

Bases: object

Local Polynomial Regression.

This module implements Local Polynomial Regression over different dimensional domain [2]. The idea of local regression is to fit a (simple) different model separetely at each query point \(x_0\). Using only the observations close to \(x_0\), the resulting estimated function is smooth in the definition domain. Selecting observations close to \(x_0\) is achieved via a weighted (kernel) function which assigned a weight to each observation based on its (euclidean) distance from the query point.

Different kernels are defined (gaussian, epanechnikov, tricube, bisquare). Each of them has slightly different properties. Kernels are indexed by a parameter (bandwith) that controls the width of the neighborhood of \(x_0\). Note that the bandwidth can be adaptive and depend on \(x_0\).

The degree of smoothing functions is controled using the degree parameter. A degree of 0 corresponds to locally constant, a degree of 1 to locally linear and a degree of 2 to locally quadratic, etc. High degrees can cause overfitting.

The implementation is adapted from [3].

Parameters:

kernel_name: np.str_, default=”epanechnikov”: Kernel name used as weight (gaussian, epanechnikov, tricube, bisquare).
bandwidth: float, default=0.05: Strictly positive. Control the size of the associated neighborhood.
degree: int, default=1: Degree of the local polynomial to fit. If degree = 0, we fit the local constant estimator (equivalent to the Nadaraya-Watson estimator). If degree = 1, we fit the local linear estimator. If degree = 2, we fit the local quadratic estimator.
robust: bool, default=False: Whether to apply the robustification procedure from [1], page 831.

Attributes:

kernel: Callable: Function associated to the kernel name.
poly_features: PolynomialFeatures: An instance of sklearn.preprocessing.PolynomialFeatures used to create design matrices. It includes an intercept and interactions for multidimensional inputs.

Methods

predict(y, x[, x_new])

Predict using local polynomial regression.

Notes

This methods is memory-based and thus require no training; all the work is performed at evaluation time [2]. For now, no fit function is necessary and only a predict is implemented.

References

[1]

Cleveland W. (1979) Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, 74(368): 829–836.

[2] (1,2)

Hastie, T., Tibshirani, R., Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer Series in Statistics.

[3]

arokem/lowess

property kernel_name: str#: Getter for kernel_name.

property bandwidth: float#: Getter for bandwidth.

property degree: float#: Getter for degree.

property robust: bool#: Getter for robust.

property kernel: Callable#: Getter for kernel.

property poly_features: PolynomialFeatures#: Getter for poly_features.

predict(y: ndarray[Any, dtype[float64]], x: ndarray[Any, dtype[float64]], x_new: ndarray[Any, dtype[float64]] | None = None) → ndarray[Any, dtype[float64]]#

Predict using local polynomial regression.

Parameters:

y: npt.NDArray[np.float64], shape = (n_samples,): Target values.
x: npt.NDArray[np.float64], shape = (n_samples, n_dim): Training data.
x_new: Optional[npt.NDArray[np.float64]], default=None: Query points at which estimates the function. If None, the (unique) training data are used as query points. The shape of the array must be (n_points, n_dim).

Returns:

npt.NDArray[np.float64], shape = (n_samples,): Return predicted values.

Notes

Be careful that, for two-dimensional and higher-dimensional data, not passing a x_new argument may result to something unexpected as for now, the function np.unique re-order the columns of the data. To be sure of the results, please provide a x_new argument.

Examples

For one-dimensional data.

>>> n_points = 101
>>> x = np.linspace(0, 1, n_points)
>>> y = np.sin(x) + np.random.normal(0, 0.05, n_points)
>>> x_new = np.linspace(0, 1, 11)

>>> lp = LocalPolynomial(
...     kernel_name='epanechnikov', bandwidth=0.3, degree=1
... )
>>> lp.predict(y=y, x=x, x_new=x_new)

For two-dimensional data.

>>> n_points = 51
>>> pts = np.linspace(0, 1, n_points)
>>> xx, yy = np.meshgrid(pts, pts, indexing='ij')
>>> x = np.column_stack([xx.flatten(), yy.flatten()])
>>> eps = np.random.normal(0, 0.1, len(x))
>>> y = np.sin(x[:, 0]) * np.cos(x[:, 1]) + eps

>>> lp = LocalPolynomial(
...     kernel_name='epanechnikov', bandwidth=0.3, degree=2
... )
>>> lp.predict(y=y, x=x, x_new=x_new)