Preprocessing
Univariate Functional Principal Components Analysis
- class FDApy.preprocessing.dim_reduction.ufpca.UFPCA(method: str = 'covariance', n_components: float | int | None = None, normalize: bool = False)
Bases: object
UFPCA – Univariate Functional Principal Components Analysis.
Linear dimensionality reduction of a univariate functional dataset. The data are projected onto a lower-dimensional space using a diagonalization of either the covariance operator or the inner-product matrix of the data.
- Parameters:
- method: str, {‘covariance’, ‘inner-product’}, default=’covariance’
Method used to estimate the eigencomponents. If method == 'covariance', the estimation is based on an eigendecomposition of the covariance operator. If method == 'inner-product', the estimation is based on an eigendecomposition of the inner-product matrix.
- n_components: Optional[Union[int, float]], default=None
Number of components to keep. If n_components is None, all components are kept, i.e. n_components == min(n_samples, n_features). If n_components is an integer, n_components components are kept. If 0 < n_components < 1, the number of components is chosen such that the proportion of variance explained is greater than n_components.
- normalize: bool, default=False
Perform a normalization of the data.
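The rule for a fractional n_components can be sketched as follows. This is a minimal NumPy illustration, not FDApy's implementation: the helper select_n_components is hypothetical, and a cumulative ratio exactly equal to the threshold is treated here as not sufficient, since the description above asks for a proportion strictly greater than n_components.

```python
import numpy as np


def select_n_components(eigenvalues, n_components=None):
    """Number of components to keep (hypothetical helper).

    - None: keep all components.
    - int: keep exactly that many components.
    - float in (0, 1): keep the smallest number of components whose
      cumulative proportion of explained variance is greater than n_components.
    """
    # Sort eigenvalues in decreasing order before accumulating them
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    if n_components is None:
        return len(eigenvalues)
    if isinstance(n_components, (int, np.integer)):
        return int(n_components)
    if 0 < n_components < 1:
        explained = np.cumsum(eigenvalues) / np.sum(eigenvalues)
        # First index whose cumulative ratio is strictly above the threshold
        idx = np.searchsorted(explained, n_components, side="right")
        return int(min(idx + 1, len(eigenvalues)))
    raise ValueError("n_components must be None, an int or a float in (0, 1).")


print(select_n_components([5.0, 3.0, 1.0, 1.0], 0.85))  # 3 (90% > 85%)
```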
- Attributes:
- mean: DenseFunctionalData
An estimation of the mean of the training data.
- covariance: DenseFunctionalData
An estimation of the covariance of the training data based on their eigendecomposition using Mercer’s theorem.
- eigenvalues: npt.NDArray[np.float64], shape=(n_components,)
The eigenvalues corresponding to each of the selected components.
- eigenfunctions: DenseFunctionalData
Principal axes in feature space, representing the directions of maximum variance in the data.
Methods
fit(data[, points, method_smoothing, ...]): Estimate the eigencomponents of the data.
inverse_transform(scores): Transform the data back to its original space.
transform([data, method, method_smoothing]): Apply dimensionality reduction to the data.
References
[1] Ramsay, J. O. and Silverman, B. W. (2005), Functional Data Analysis, Springer Science, Chapter 8.
- property method: str
Getter for method.
- property n_components: int
Getter for n_components.
- property normalize: bool
Getter for normalize.
- property mean: DenseFunctionalData
Getter for mean.
- property covariance: DenseFunctionalData
Getter for covariance.
- property eigenvalues: ndarray[Any, dtype[float64]]
Getter for eigenvalues.
- property eigenfunctions: DenseFunctionalData
Getter for eigenfunctions.
- fit(data: FunctionalData, points: DenseArgvals | None = None, method_smoothing: str | None = None, kwargs_mean: Dict[str, object] = {}, kwargs_covariance: Dict[str, object] = {}, kwargs_innpro: Dict[str, object] = {}) -> None
Estimate the eigencomponents of the data.
Before estimating the eigencomponents, the data are centered. When the covariance operator is used, the estimation follows [1].
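As an illustration of the covariance-based route, here is a minimal NumPy sketch for curves observed on a common grid: center the curves, form the empirical covariance, and diagonalize its discretized version with trapezoidal quadrature weights so that the eigenfunctions have unit \(L^2\) norm. The helpers trapezoid_weights and fpca_covariance are hypothetical; FDApy's own estimator (smoothing, normalization by \(N\) or \(N-1\), sparse data) may differ.

```python
import numpy as np


def trapezoid_weights(t):
    """Quadrature weights q such that q @ f approximates the integral of f over t."""
    dt = np.diff(t)
    q = np.zeros(len(t))
    q[:-1] += dt / 2
    q[1:] += dt / 2
    return q


def fpca_covariance(values, t, n_components):
    """Covariance-based FPCA for curves on a common grid (hypothetical helper).

    values: array of shape (n_obs, n_points); t: grid of length n_points.
    Returns the mean curve, the leading eigenvalues and the eigenfunctions
    (shape (n_components, n_points)), normalised so that int phi_k^2 dt = 1.
    """
    mean = values.mean(axis=0)
    centered = values - mean                          # center the data first
    cov = centered.T @ centered / values.shape[0]     # empirical C(s, t)
    sqrt_q = np.sqrt(trapezoid_weights(t))
    # Discretised operator Q^{1/2} C Q^{1/2} is symmetric, so eigh applies
    op = sqrt_q[:, None] * cov * sqrt_q[None, :]
    eigvals, eigvecs = np.linalg.eigh(op)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Undo the Q^{1/2} scaling to get L2-normalised eigenfunctions
    eigenfunctions = (eigvecs[:, order] / sqrt_q[:, None]).T
    return mean, eigvals[order], eigenfunctions


t = np.linspace(0, 1, 201)
psi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
psi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
a = np.array([2.0, -2.0, 2.0, -2.0])  # variance 4 along psi1
b = np.array([1.0, 1.0, -1.0, -1.0])  # variance 1 along psi2, uncorrelated with a
values = t + np.outer(a, psi1) + np.outer(b, psi2)  # mean function mu(t) = t
mean, eigenvalues, eigenfunctions = fpca_covariance(values, t, n_components=2)
print(np.round(eigenvalues, 6))  # [4. 1.]
```

With two orthogonal modes of variance 4 and 1, the sketch recovers those eigenvalues and, up to sign, the two modes as eigenfunctions.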
- Parameters:
- data: FunctionalData
Training data used to estimate the eigencomponents.
- points: Optional[DenseArgvals], default=None
The sampling points at which the covariance and the eigenfunctions will be estimated.
- method_smoothing: Optional[str], default=None
Method used to smooth the mean and covariance estimates, if any.
- kwargs_mean: Dict[str, object], default={}
Keyword arguments to be passed to the function FunctionalData.mean().
- kwargs_covariance: Dict[str, object], default={}
Keyword arguments to be passed to the function preprocessing.fpca._fit_covariance().
- kwargs_innpro: Dict[str, object], default={}
Keyword arguments to be passed to the function preprocessing.fpca._fit_inner_product().
References
[1] Ramsay, J. O. and Silverman, B. W. (2005), Functional Data Analysis, Springer Science, Chapter 8.
- transform(data: DenseFunctionalData | None = None, method: str = 'NumInt', method_smoothing: str = 'LP', **kwargs) -> ndarray[Any, dtype[float64]]
Apply dimensionality reduction to the data.
The functional principal components scores are defined as the projection of the observation \(X_i\) onto the eigenfunction \(\phi_k\). These scores are given by:
\[c_{ik} = \int_{\mathcal{T}} \{X_i(t) - \mu(t)\}\phi_k(t)dt.\]
This integral can be estimated in two ways. First, if the data are sampled on a common fine grid, the estimation is done using numerical integration. Second, for sparse functional data, the PACE (Principal Components through Conditional Expectation) algorithm [1] is used. If the eigenfunctions have been estimated using the inner-product matrix, the scores can also be estimated using the formula
\[c_{ik} = \sqrt{l_k}v_{ik},\]
where \(l_k\) is the \(k\)-th eigenvalue of the inner-product matrix and \(v_{ik}\) is the \(i\)-th entry of the associated eigenvector \(v_k\).
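The two score formulas can be checked against each other numerically. In this sketch (hypothetical helper scores_two_ways, common grid, trapezoidal rule), the inner-product matrix is taken unnormalized, \(M_{ij} = \int \{X_i - \mu\}\{X_j - \mu\}\), and the eigenfunctions are \(\phi_k = l_k^{-1/2}\sum_j v_{jk}\{X_j - \mu\}\). With these conventions the numerical-integration scores equal \(\sqrt{l_k}v_{ik}\) exactly; if the matrix is scaled by \(1/N\), the same identity holds with \(\sqrt{N l_k}\) instead.

```python
import numpy as np


def trapezoid_weights(t):
    """Quadrature weights q such that q @ f approximates the integral of f over t."""
    dt = np.diff(t)
    q = np.zeros(len(t))
    q[:-1] += dt / 2
    q[1:] += dt / 2
    return q


def scores_two_ways(values, t, n_components):
    """Scores by numerical integration and by sqrt(l_k) * v_ik (hypothetical helper)."""
    q = trapezoid_weights(t)
    centered = values - values.mean(axis=0)
    # Unnormalised inner-product matrix M_ij = int (X_i - mu)(X_j - mu) dt
    gram = (centered * q) @ centered.T
    l, v = np.linalg.eigh(gram)
    order = np.argsort(l)[::-1][:n_components]
    l, v = l[order], v[:, order]
    # Eigenfunctions phi_k = sum_j v_jk (X_j - mu) / sqrt(l_k), one per row
    phi = (v.T @ centered) / np.sqrt(l)[:, None]
    # 'NumInt'-style scores: c_ik = int (X_i - mu) phi_k dt
    scores_numint = (centered * q) @ phi.T
    # 'InnPro'-style scores: c_ik = sqrt(l_k) v_ik
    scores_innpro = v * np.sqrt(l)
    return scores_numint, scores_innpro


rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)
values = rng.normal(size=(20, t.size))
s_numint, s_innpro = scores_two_ways(values, t, n_components=3)
print(np.allclose(s_numint, s_innpro))  # True
```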
- Parameters:
- data: Optional[DenseFunctionalData], default=None
The data to be transformed. If None, the data passed to the fit method are used.
- method: str, {‘NumInt’, ‘PACE’, ‘InnPro’}, default=’NumInt’
Method used to estimate the scores. If method == 'NumInt', numerical integration is performed. If method == 'PACE', the PACE algorithm [1] is used. If method == 'InnPro', the estimation is performed using the inner-product matrix of the data (only available if the eigencomponents have been estimated using the inner-product matrix).
- method_smoothing: str, default=’LP’
Method used to smooth the mean and covariance estimates.
- **kwargs:
- tol: float, default=1e-4
Tolerance parameter used to prevent overflow when inverting a matrix; only used if method == 'PACE'.
- integration_method: str, {‘trapz’, ‘simpson’}, default=’trapz’
Method used to perform numerical integration; only used if method == 'NumInt'.
- Returns:
- npt.NDArray[np.float64], shape=(n_obs, n_components)
An array representing the projection of the data onto the basis of functions defined by the eigenfunctions.
References
- inverse_transform(scores: ndarray[Any, dtype[float64]]) -> DenseFunctionalData
Transform the data back to its original space.
Given a set of scores \(c_{ik}\), the observations are reconstructed using a truncation of the Karhunen-Loève expansion,
\[X_{i}(t) = \mu(t) + \sum_{k = 1}^K c_{ik}\phi_k(t).\]
Data can be multidimensional.
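A minimal sketch of this reconstruction for curves on a common grid, assuming the eigenfunctions are stored row-wise as an array of shape (K, n_points). The helper reconstruct_from_scores is hypothetical and is not FDApy's inverse_transform; keeping only the first columns of the score matrix (and the matching eigenfunctions) gives the truncated expansion.

```python
import numpy as np


def reconstruct_from_scores(scores, mean, eigenfunctions):
    """Truncated Karhunen-Loeve reconstruction on a common grid (hypothetical helper).

    scores: (n_obs, K); mean: (n_points,); eigenfunctions: (K, n_points).
    Returns X_i(t) = mu(t) + sum_k c_ik phi_k(t) as an (n_obs, n_points) array.
    """
    return mean[None, :] + scores @ eigenfunctions


t = np.linspace(0, 1, 201)
mean = t**2
phi = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                 np.sqrt(2) * np.cos(2 * np.pi * t)])
scores = np.array([[1.5, -0.5],
                   [-2.0, 0.3]])
curves = reconstruct_from_scores(scores, mean, phi)                # K = 2
curves_k1 = reconstruct_from_scores(scores[:, :1], mean, phi[:1])  # K = 1
```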
- Parameters:
- scores: npt.NDArray[np.float64], shape=(n_obs, n_components)
New data, where n_obs is the number of observations and n_components is the number of components.
- Returns:
- DenseFunctionalData
A DenseFunctionalData object representing the transformation of the scores into the original curve space.
Multivariate Functional Principal Components Analysis
- class FDApy.preprocessing.dim_reduction.mfpca.MFPCA(n_components: int | float = 2, univariate_expansions: List[Dict[str, object]] | None = None, method: str = 'covariance', weights: ndarray[Any, dtype[float64]] | None = None, normalize: bool = False)
Bases: object
MFPCA – Multivariate Functional Principal Components Analysis.
Linear dimensionality reduction of a multivariate functional dataset. The projection of the data in a lower dimensional space is performed using a diagonalization of the covariance operator of each univariate component or of the inner-product matrix of the data. It is assumed that the data have \(P\) components.
- Parameters:
- n_components: Union[int, float], default=2
Number of components to keep. If n_components is an integer, n_components components are kept. If 0 < n_components < 1, the number of components is chosen such that the proportion of variance explained is greater than n_components.
- univariate_expansions: Optional[List[Dict[str, object]]], default=None
List of dictionaries characterizing the univariate expansion computed for each component.
- method: str, {‘covariance’, ‘inner-product’}, default=’covariance’
Method used to estimate the eigencomponents. If method == ‘covariance’, the estimation is based on an eigendecomposition of the covariance operator of each univariate component. If method == ‘inner-product’, the estimation is based on an eigendecomposition of the inner-product matrix.
- weights: npt.NDArray[np.float64], default=None
A vector of weights of length \(P\). If None, the weights are set equal to 1 for each component.
- normalize: bool, default=False
Perform a normalization of the data.
- Attributes:
- mean: MultivariateFunctionalData
An estimation of the mean of the training data.
- covariance: MultivariateFunctionalData
An estimation of the covariance of the training data based on their eigendecomposition using Mercer’s theorem.
- eigenvalues: npt.NDArray[np.float64], shape=(n_components,)
The eigenvalues corresponding to each of the selected components.
- eigenfunctions: MultivariateFunctionalData
Principal axes in feature space, representing the directions of maximum variance in the data as a MultivariateFunctionalData.
Methods
fit(data[, points, method_smoothing]): Estimate the eigencomponents of the data.
inverse_transform(scores): Transform the data back to its original space.
transform([data, method, method_smoothing]): Apply dimensionality reduction to the data.
References
[1] Happ, C. and Greven, S. (2018), Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains, Journal of the American Statistical Association, 113, pp. 649–659.
- property n_components: int | float
Getter for n_components.
- property univariate_expansion: List[Dict[str, object]] | None
Getter for univariate_expansion.
- property method: str
Getter for method.
- property weights: ndarray[Any, dtype[float64]] | None
Getter for weights.
- property normalize: bool
Getter for normalize.
- property mean: MultivariateFunctionalData
Getter for mean.
- property covariance: MultivariateFunctionalData
Getter for covariance.
- property eigenvalues: ndarray[Any, dtype[float64]]
Getter for eigenvalues.
- property eigenfunctions: MultivariateFunctionalData
Getter for eigenfunctions.
- fit(data: MultivariateFunctionalData, points: List[DenseArgvals] | None = None, method_smoothing: str | None = None, **kwargs) -> None
Estimate the eigencomponents of the data.
Before estimating the eigencomponents, the data are centered. When the covariance operator is used, the estimation follows [2]; when the Gram matrix is used, it follows [1].
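The Gram-matrix route can be sketched in NumPy as follows, assuming each component is observed on its own common grid and that the weights enter the inner product as \(\langle\langle f, g\rangle\rangle = \sum_p w_p \int f^{(p)} g^{(p)}\). The helper mfpca_gram is hypothetical, and FDApy may normalize the matrix differently (for instance by \(1/N\)). The point illustrated is that all components share the same eigenvectors \(v_k\), and that the resulting multivariate eigenfunctions are orthonormal for the weighted inner product.

```python
import numpy as np


def trapezoid_weights(t):
    """Quadrature weights q such that q @ f approximates the integral of f over t."""
    dt = np.diff(t)
    q = np.zeros(len(t))
    q[:-1] += dt / 2
    q[1:] += dt / 2
    return q


def mfpca_gram(components, grids, weights=None, n_components=2):
    """Gram-matrix MFPCA sketch for P components on common grids (hypothetical helper).

    components[p] has shape (n_obs, len(grids[p])); weights has length P.
    Returns the eigenvalues l_k, the eigenvectors v_k (columns) and a list of
    P arrays of shape (n_components, len(grids[p])) holding phi_k^(p).
    """
    weights = np.ones(len(components)) if weights is None else np.asarray(weights, float)
    centered = [x - x.mean(axis=0) for x in components]
    quad = [trapezoid_weights(t) for t in grids]
    # M_ij = sum_p w_p * int (X_i^(p) - mu^(p)) (X_j^(p) - mu^(p)) dt
    gram = sum(a * ((c * q) @ c.T) for a, c, q in zip(weights, centered, quad))
    l, v = np.linalg.eigh(gram)
    order = np.argsort(l)[::-1][:n_components]
    l, v = l[order], v[:, order]
    # phi_k^(p) = sum_j v_jk (X_j^(p) - mu^(p)) / sqrt(l_k): same v_k for every p
    eigenfunctions = [(v.T @ c) / np.sqrt(l)[:, None] for c in centered]
    return l, v, eigenfunctions


rng = np.random.default_rng(0)
grids = [np.linspace(0, 1, 51), np.linspace(-1, 1, 31)]
components = [rng.normal(size=(15, 51)), rng.normal(size=(15, 31))]
weights = np.array([1.0, 0.5])
l, v, phi = mfpca_gram(components, grids, weights, n_components=3)
```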
- Parameters:
- data: MultivariateFunctionalData
Training data used to estimate the eigencomponents.
- points: Optional[List[DenseArgvals]], default=None
The sampling points at which the covariance and the eigenfunctions will be estimated.
- method_smoothing: Optional[str], default=None
Method used to smooth the mean and covariance estimates, if any.
- **kwargs
Other keyword arguments are passed to the following functions: FunctionalData.mean() and FunctionalData.center(); preprocessing.dim_reduction.mfpca._fit_inner_product_multivariate().
References
[1] Golovkine, S., Gunning, E., Simpkin, A. J. and Bargary, N. (2023), On the use of the Gram matrix for multivariate functional principal components analysis.
[2] Happ, C. and Greven, S. (2018), Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains, Journal of the American Statistical Association, 113, pp. 649–659.
- transform(data: MultivariateFunctionalData | None = None, method: str = 'NumInt', method_smoothing: str = 'LP', **kwargs) -> ndarray[Any, dtype[float64]]
Apply dimensionality reduction to the data.
The functional principal components scores are defined as the projection of the observation \(X_i\) onto the eigenfunction \(\phi_k\). These scores are given by:
\[c_{ik} = \sum_{p = 1}^P \int_{\mathcal{T}_p} \{X_i^{(p)}(t) - \mu^{(p)}(t)\}\phi_k^{(p)}(t)dt.\]
These integrals can be estimated using numerical integration. If the eigenfunctions have been estimated using the inner-product matrix, the scores can also be estimated using the formula
\[c_{ik} = \sqrt{l_k}v_{ik},\]
where \(l_k\) is the \(k\)-th eigenvalue of the inner-product matrix and \(v_{ik}\) is the \(i\)-th entry of the associated eigenvector \(v_k\).
Note: this method has not yet been tested on two-dimensional functional data.
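With unit weights, the sum of univariate integrals and the formula \(c_{ik} = \sqrt{l_k}v_{ik}\) can be compared directly. This sketch assumes common grids, the trapezoidal rule, an unnormalized Gram matrix and the eigenfunction convention \(\phi_k^{(p)} = l_k^{-1/2}\sum_j v_{jk}\{X_j^{(p)} - \mu^{(p)}\}\); the random data serve only to check the identity and do not come from FDApy.

```python
import numpy as np


def trapezoid_weights(t):
    """Quadrature weights q such that q @ f approximates the integral of f over t."""
    dt = np.diff(t)
    q = np.zeros(len(t))
    q[:-1] += dt / 2
    q[1:] += dt / 2
    return q


rng = np.random.default_rng(1)
grids = [np.linspace(0, 1, 41), np.linspace(0, 2, 61)]
data = [rng.normal(size=(12, 41)), rng.normal(size=(12, 61))]
quad = [trapezoid_weights(t) for t in grids]
centered = [x - x.mean(axis=0) for x in data]

# Unnormalised Gram matrix with unit weights, then its three leading eigenpairs
gram = sum((c * q) @ c.T for c, q in zip(centered, quad))
l, v = np.linalg.eigh(gram)
order = np.argsort(l)[::-1][:3]
l, v = l[order], v[:, order]
phi = [(v.T @ c) / np.sqrt(l)[:, None] for c in centered]

# 'NumInt'-style scores: c_ik = sum_p int (X_i^(p) - mu^(p)) phi_k^(p) dt
scores_numint = sum((c * q) @ f.T for c, q, f in zip(centered, quad, phi))
# 'InnPro'-style scores: c_ik = sqrt(l_k) v_ik
scores_innpro = v * np.sqrt(l)
print(np.allclose(scores_numint, scores_innpro))  # True
```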
- Parameters:
- data: Optional[MultivariateFunctionalData], default=None
The data to be transformed. If None, the data passed to the fit method are used.
- method: str, {‘NumInt’, ‘PACE’, ‘InnPro’}, default=’NumInt’
Method used to estimate the scores. If method == 'NumInt', numerical integration is performed. If method == 'InnPro', the estimation is performed using the inner-product matrix of the data (only available if the eigencomponents have been estimated using the inner-product matrix).
- method_smoothing: str, default=’LP’
Method used to smooth the mean and covariance estimates.
- **kwargs:
Other keyword arguments are passed to the following function:
FunctionalData.center().
- Returns:
- npt.NDArray[np.float64], shape=(n_obs, n_components)
An array representing the projection of the data onto the basis of functions defined by the eigenfunctions.
Notes
Concerning the estimation of the scores using numerical integration, the scores are estimated directly by projecting the data onto the multivariate eigenfunctions, rather than by using the univariate components and the decomposition of the covariance of the univariate scores as in Happ and Greven [1].
References
[1] Happ, C. and Greven, S. (2018), Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains, Journal of the American Statistical Association, 113, pp. 649–659.
- inverse_transform(scores: ndarray[Any, dtype[float64]]) -> MultivariateFunctionalData
Transform the data back to its original space.
Given a set of scores \(c_{ik}\), the observations are reconstructed using a truncation of the Karhunen-Loève expansion,
\[X_{i}(t) = \mu(t) + \sum_{k = 1}^K c_{ik}\phi_k(t).\]
Data can be multidimensional. Recall that, here, \(X_{i}\), \(\mu\) and \(\phi_k\) are \(P\)-dimensional functions.
- Parameters:
- scores: npt.NDArray[np.float64], shape=(n_obs, n_components)
New data, where n_obs is the number of observations and n_components is the number of components.
- Returns:
- MultivariateFunctionalData
A MultivariateFunctionalData object representing the transformation of the scores into the original curve space.
Functional Canonical Polyadic-Tensor Power Algorithm
- class FDApy.preprocessing.dim_reduction.fcp_tpa.FCPTPA(n_components: int = 5, normalize: bool = False)
Bases: object
Functional Canonical Polyadic - Tensor Power Algorithm (FCP-TPA).
This class implements the Functional CP-TPA algorithm [1]. This method computes an eigendecomposition of image observations, which can be interpreted as functions on a two-dimensional domain. We assume \(N\) observations of 2D images with dimension \(M_1 \times M_2\). The results are given in a CANDECOMP/PARAFAC (CP) model format
\[X = \sum_{k = 1}^K c_k \cdot u_k \circ v_k \circ w_k\]
where \(\circ\) stands for the outer product, \(c_k\) is a coefficient (scalar) and \(u_k, v_k, w_k\) are eigenvectors for each direction of the tensor. In this representation, the outer product \(v_k \circ w_k\) can be regarded as the \(k\)-th eigenimage, while \(c_k \cdot u_k\) represents the vector of individual scores for this eigenimage and each observation.
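The CP representation can be made concrete with numpy.einsum. In this sketch, cp_tensor is a hypothetical helper and the factor matrices are stored column-wise (U of shape (N, K), V of shape (M_1, K), W of shape (M_2, K)); the eigenimages and the individual scores are then read off as described above.

```python
import numpy as np


def cp_tensor(c, U, V, W):
    """X = sum_k c_k * u_k o v_k o w_k, returned as an array of shape (N, M1, M2)."""
    return np.einsum("k,ik,jk,lk->ijl", c, U, V, W)


rng = np.random.default_rng(2)
N, M1, M2, K = 4, 5, 6, 3
c = np.array([3.0, 2.0, 1.0])
U = rng.normal(size=(N, K))
V = rng.normal(size=(M1, K))
W = rng.normal(size=(M2, K))
X = cp_tensor(c, U, V, W)

# k-th eigenimage: outer product v_k o w_k, an M1 x M2 image
eigenimages = np.einsum("jk,lk->kjl", V, W)
# Individual scores c_k * u_k: one row per observation, one column per eigenimage
scores = U * c
# Each observation is the score-weighted sum of the eigenimages
assert np.allclose(X, np.einsum("ik,kjl->ijl", scores, eigenimages))
```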
The smoothness of the eigenvectors \(v_k, w_k\) is induced by penalty matrices for both image directions, that are weighted by smoothing parameters \(\alpha_{v_k}, \alpha_{w_k}\). The eigenvectors \(u_k\) are not smoothed, hence the algorithm does not induce smoothness along observations.
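Without the penalty matrices (that is, with all smoothing parameters set to zero), one step of a tensor power algorithm reduces to contracting the tensor with two of the current factors and normalizing the result. The rank-one sketch below illustrates that simplified iteration with a relative-change stopping rule. It omits the smoothing, the GCV search over alpha_range and the deflation needed for several components, so it is not the FCP-TPA update itself; the function rank_one_tpa and its random initialization are assumptions.

```python
import numpy as np


def rank_one_tpa(X, tol=1e-4, max_iteration=15, seed=0):
    """Unpenalised rank-one tensor power iteration on X of shape (N, M1, M2).

    Returns (c, u, v, w, converged) with X approximately c * u o v o w.
    """
    rng = np.random.default_rng(seed)
    # Random unit-norm starting vectors (an assumption of this sketch)
    u, v, w = (rng.normal(size=n) for n in X.shape)
    u, v, w = u / np.linalg.norm(u), v / np.linalg.norm(v), w / np.linalg.norm(w)
    converged = False
    for _ in range(max_iteration):
        old = (u, v, w)
        # Contract X with the two other factors, then normalise
        u = np.einsum("ijl,j,l->i", X, v, w)
        u = u / np.linalg.norm(u)
        v = np.einsum("ijl,i,l->j", X, u, w)
        v = v / np.linalg.norm(v)
        w = np.einsum("ijl,i,j->l", X, u, v)
        w = w / np.linalg.norm(w)
        # Stop when the relative change of every factor is below tol
        changes = [np.linalg.norm(new - prev) / np.linalg.norm(prev)
                   for new, prev in zip((u, v, w), old)]
        if max(changes) < tol:
            converged = True
            break
    c = np.einsum("ijl,i,j,l->", X, u, v, w)
    return c, u, v, w, converged


rng = np.random.default_rng(3)
a, b, d = (rng.normal(size=n) for n in (6, 7, 8))
a, b, d = a / np.linalg.norm(a), b / np.linalg.norm(b), d / np.linalg.norm(d)
X = 3.0 * np.einsum("i,j,l->ijl", a, b, d)  # exact rank-one tensor
c, u, v, w, converged = rank_one_tpa(X)
```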
Optimal smoothing parameters are found via a nested generalized cross-validation [4]. In each iteration of the TPA (tensor power algorithm), the GCV criterion is optimized via scipy.optimize on the intervals specified via alpha_range.
The FCP-TPA algorithm is an iterative algorithm. Convergence is assumed if the relative differences between the current and the previous values are all below the tolerance level tolerance. The tolerance level is increased automatically if the algorithm has not converged after max_iteration steps and if adapt_tolerance = True. If the algorithm did not converge after max_iteration steps, a warning is raised. The code is adapted from [2] and [3].
- Parameters:
- n_components: int, default=5
Number of components to be calculated.
- normalize: bool, default=False
Should the results be normalized?
- Attributes:
- eigenvalues: npt.NDArray[np.float64], shape=(n_components,)
The singular values corresponding to each of selected components.
- eigenfunctions: DenseFunctionalData
Principal axes in feature space, representing the directions of maximum variance in the data.
Methods
fit(data, penalty_matrices, alpha_range[, ...]): Fit the model on data.
inverse_transform(scores): Transform the data back to its original space.
transform(data[, method]): Apply dimension reduction to the data.
References
[1] Allen, G. (2013), Multi-way Functional Principal Components Analysis, IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing.
[2] Happ, C. and Greven, S. (2018), Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains, Journal of the American Statistical Association, 113(522), 649–659, DOI: 10.1080/01621459.2016.1273115.
[3] Happ-Kurz, C. (2020), Object-Oriented Software for Functional Data, Journal of Statistical Software, 93(5), 1–38.
[4] Huang, J. Z., Shen, H. and Buja, A. (2009), The Analysis of Two-Way Functional Data Using Two-Way Regularized Singular Value Decomposition, Journal of the American Statistical Association, 104(488), 1609–1620.
- property n_components: int
Getter for n_components.
- property normalize: bool
Getter for normalize.
- property eigenvalues: ndarray[Any, dtype[float64]]
Getter for eigenvalues.
- property eigenfunctions: DenseFunctionalData
Getter for eigenfunctions.
- fit(data: DenseFunctionalData, penalty_matrices: Dict[str, ndarray[Any, dtype[float64]]], alpha_range: Dict[str, Tuple[float, float]], tolerance: float = 0.0001, max_iteration: int = 15, adapt_tolerance: bool = True, verbose: bool = False) -> None
Fit the model on data.
This function is used to fit a model on the data.
- Parameters:
- data: DenseFunctionalData
Training data used to estimate the eigencomponents. Its values have shape \(N \times M_1 \times M_2\).
- penalty_matrices: Dict[str, npt.NDArray[np.float64]]
A dictionary with keys 'v' and 'w', containing a roughness penalty matrix for each direction of the image. The algorithm does not induce smoothness along observations.
- alpha_range: Dict[str, Tuple[float, float]]
A dictionary with keys 'v' and 'w', containing the range of the smoothness parameters \(\alpha_{v_k}, \alpha_{w_k}\) as a tuple.
- tolerance: float, default=1e-4
A numeric value giving the tolerance for relative error values in the algorithm. It is automatically multiplied by 10 after max_iteration steps if adapt_tolerance = True.
- max_iteration: int, default=15
An integer, the maximal number of iteration steps. Can be doubled if adapt_tolerance = True.
- adapt_tolerance: bool, default=True
If True and the algorithm has not converged after max_iteration steps, the tolerance is multiplied by 10 and another max_iteration steps are allowed with the increased tolerance.
- verbose: bool, default=False
If True, computational details are printed on the standard output during the computation. Intended for debugging.
Examples
Simulate some data.
>>> import numpy as np
>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     dimension='2D',
...     argvals={'input_dim_0': np.linspace(0, 1, 101)},
...     random_state=42
... )
>>> kl.new(n_obs=50)
>>> data = kl.data
Define some parameters.
>>> n_points = data.n_points
>>> mat_v = np.diff(np.identity(n_points['input_dim_0']))
>>> mat_w = np.diff(np.identity(n_points['input_dim_1']))
Fit the FCP-TPA algorithm.
>>> from FDApy.preprocessing.dim_reduction.fcp_tpa import FCPTPA
>>> fcptpa = FCPTPA(n_components=10)
>>> fcptpa.fit(
...     data,
...     penalty_matrices={
...         'v': np.dot(mat_v, mat_v.T),
...         'w': np.dot(mat_w, mat_w.T)
...     },
...     alpha_range={
...         'v': (1e-2, 1e2),
...         'w': (1e-2, 1e2)
...     },
...     tolerance=1e-4,
...     max_iteration=15,
...     adapt_tolerance=True
... )
- transform(data: DenseFunctionalData, method: str = 'NumInt') -> ndarray[Any, dtype[float64]]
Apply dimension reduction to the data.
- Parameters:
- data: DenseFunctionalData
Functional data object to be transformed. It has to be 2-dimensional data.
- method: str, {‘NumInt’, ‘FCPTPA’}, default=’NumInt’
Not used; kept for compatibility with the other dimension-reduction classes.
- Returns:
- npt.NDArray[np.float64], shape=(n_obs, n_components)
An array representing the projection of the data onto the basis of functions defined by the eigenimages.
Examples
Using the model fitted with the fit function:
>>> scores = fcptpa.transform(data, 'NumInt')
- inverse_transform(scores: ndarray[Any, dtype[float64]]) -> DenseFunctionalData
Transform the data back to its original space.
Return a DenseFunctionalData whose transform would be scores.
- Parameters:
- scores: npt.NDArray[np.float64], shape=(n_obs, n_components)
A set of coefficients to generate new data, where n_obs is the number of observations and n_components is the number of components.
- Returns:
- DenseFunctionalData
The transformation of the scores into the original space.
Examples
Using the model fitted with the fit function:
>>> data_f = fcptpa.inverse_transform(scores)
Local Polynomials
- class FDApy.preprocessing.smoothing.local_polynomial.LocalPolynomial(kernel_name: str = 'epanechnikov', bandwidth: float = 0.05, degree: int = 1, robust: bool = False, **kwargs)
Bases: object
Local Polynomial Regression.
This class implements Local Polynomial Regression over domains of different dimensions [2]. The idea of local regression is to fit a (simple) different model separately at each query point \(x_0\). Using only the observations close to \(x_0\), the resulting estimated function is smooth over the definition domain. Observations close to \(x_0\) are selected via a weighting (kernel) function, which assigns a weight to each observation based on its (Euclidean) distance from the query point.
Different kernels are defined (gaussian, epanechnikov, tricube, bisquare). Each of them has slightly different properties. Kernels are indexed by a parameter (bandwidth) that controls the width of the neighborhood of \(x_0\). Note that the bandwidth can be adaptive and depend on \(x_0\).
The degree of the local polynomials is controlled by the degree parameter. A degree of 0 corresponds to locally constant, a degree of 1 to locally linear, a degree of 2 to locally quadratic, etc. High degrees can cause overfitting.
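For reference, the four kernels named above are commonly written as follows, with \(u\) the distance to \(x_0\) divided by the bandwidth. The normalizing constants used here (chosen so that each kernel integrates to one in one dimension) are standard textbook choices and may differ from FDApy's; a constant factor does not change a weighted least-squares fit. The names KERNELS and kernel_weights are hypothetical.

```python
import numpy as np

# Kernels on the scaled distance u = |x - x0| / bandwidth (textbook constants).
KERNELS = {
    "epanechnikov": lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0),
    "tricube": lambda u: np.where(np.abs(u) <= 1, 70 / 81 * (1 - np.abs(u) ** 3) ** 3, 0.0),
    "bisquare": lambda u: np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0),
    "gaussian": lambda u: np.exp(-(u**2) / 2) / np.sqrt(2 * np.pi),
}


def kernel_weights(x, x0, bandwidth, kernel_name="epanechnikov"):
    """Weight of each observation x (shape (n,) or (n, d)) for the query point x0."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    # Euclidean distance to the query point, scaled by the bandwidth
    u = np.linalg.norm(x - x0, axis=1) / bandwidth
    return KERNELS[kernel_name](u)


# Weights decrease with distance and vanish beyond the bandwidth
print(kernel_weights([0.0, 0.05, 0.2], x0=0.0, bandwidth=0.1))
```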
The implementation is adapted from [3].
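A one-dimensional sketch of the local polynomial estimator: at each query point, fit a polynomial of the given degree in \(x - x_0\) by weighted least squares with Epanechnikov weights, and keep the intercept as the estimate at \(x_0\). The helper local_polynomial_1d is hypothetical and is not FDApy's predict; with degree 0 it reduces to the Nadaraya-Watson weighted average.

```python
import numpy as np


def local_polynomial_1d(y, x, x_new, bandwidth=0.3, degree=1):
    """Local polynomial estimates at the query points x_new (1D, Epanechnikov weights)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    estimates = np.empty(len(x_new))
    for j, x0 in enumerate(x_new):
        u = (x - x0) / bandwidth
        weights = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
        # Columns 1, (x - x0), ..., (x - x0)^degree: the intercept is the fit at x0
        design = np.vander(x - x0, N=degree + 1, increasing=True)
        sw = np.sqrt(weights)
        # Weighted least squares via ordinary least squares on rescaled rows
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        estimates[j] = beta[0]
    return estimates


x = np.linspace(0, 1, 101)
x_new = np.linspace(0, 1, 11)
rng = np.random.default_rng(0)
y = np.sin(x) + rng.normal(0, 0.05, x.size)
fit = local_polynomial_1d(y, x, x_new, bandwidth=0.3, degree=1)
```

A local polynomial of degree \(p\) reproduces any polynomial of degree at most \(p\) exactly, which is one reason the local linear fit has less boundary bias than the local constant one.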
- Parameters:
- kernel_name: str, default=’epanechnikov’
Kernel name used as weight (gaussian, epanechnikov, tricube, bisquare).
- bandwidth: float, default=0.05
Strictly positive. Controls the size of the associated neighborhood.
- degree: int, default=1
Degree of the local polynomial to fit. If degree = 0, the local constant estimator (equivalent to the Nadaraya-Watson estimator) is fitted. If degree = 1, the local linear estimator is fitted. If degree = 2, the local quadratic estimator is fitted.
- robust: bool, default=False
Whether to apply the robustification procedure from [1], page 831.
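The robustification of [1] iterates as follows: fit, compute the residuals \(e_i\), set \(s\) to the median of \(|e_i|\), and refit with the kernel weights multiplied by bisquare robustness weights \(\delta_i = B(e_i / 6s)\), where \(B(u) = (1 - u^2)^2\) for \(|u| < 1\) and \(0\) otherwise. The sketch below applies two such iterations to a one-dimensional local linear fit; the helpers local_linear and robust_local_linear are hypothetical, and FDApy's procedure may differ in detail.

```python
import numpy as np


def local_linear(y, x, x_new, bandwidth, robustness):
    """Local linear fit with Epanechnikov weights times robustness weights."""
    est = np.empty(len(x_new))
    for j, x0 in enumerate(x_new):
        u = (x - x0) / bandwidth
        weights = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0) * robustness
        design = np.column_stack([np.ones_like(x), x - x0])
        sw = np.sqrt(weights)
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        est[j] = beta[0]
    return est


def bisquare(u):
    return np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)


def robust_local_linear(y, x, x_new, bandwidth=0.2, n_iter=2):
    """Cleveland-style robustness iterations around a local linear smoother."""
    robustness = np.ones_like(y)
    for _ in range(n_iter):
        residuals = y - local_linear(y, x, x, bandwidth, robustness)
        s = np.median(np.abs(residuals))
        if s == 0:
            break  # perfect fit: nothing to down-weight
        robustness = bisquare(residuals / (6 * s))
    return local_linear(y, x, x_new, bandwidth, robustness)


x = np.linspace(0, 1, 101)
y = np.sin(2 * np.pi * x)
y[50] += 10.0  # a single gross outlier at x = 0.5
plain = local_linear(y, x, [0.5], 0.2, np.ones_like(y))
robust = robust_local_linear(y, x, [0.5], bandwidth=0.2)
```

The plain fit at \(x = 0.5\) is pulled upward by the outlier, whereas the robust fit gives the outlier zero weight and stays close to \(\sin(\pi) = 0\).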
- Attributes:
- kernel: Callable
Function associated with the kernel name.
- poly_features: PolynomialFeatures
An instance of sklearn.preprocessing.PolynomialFeatures used to create design matrices. It includes an intercept and interactions for multidimensional inputs.
Methods
predict(y, x[, x_new]): Predict using local polynomial regression.
Notes
This method is memory-based and thus requires no training; all the work is performed at evaluation time [2]. For now, no fit function is necessary and only predict is implemented.
References
[1] Cleveland, W. (1979), Robust Locally Weighted Regression and Smoothing Scatterplots, Journal of the American Statistical Association, 74(368), 829–836.
- property kernel_name: str
Getter for kernel_name.
- property bandwidth: float
Getter for bandwidth.
- property degree: float
Getter for degree.
- property robust: bool
Getter for robust.
- property kernel: Callable
Getter for kernel.
- property poly_features: PolynomialFeatures
Getter for poly_features.
- predict(y: ndarray[Any, dtype[float64]], x: ndarray[Any, dtype[float64]], x_new: ndarray[Any, dtype[float64]] | None = None) -> ndarray[Any, dtype[float64]]
Predict using local polynomial regression.
- Parameters:
- y: npt.NDArray[np.float64], shape = (n_samples,)
Target values.
- x: npt.NDArray[np.float64], shape = (n_samples, n_dim)
Training data.
- x_new: Optional[npt.NDArray[np.float64]], default=None
Query points at which the function is estimated. If None, the (unique) training points are used as query points. The shape of the array must be (n_points, n_dim).
- Returns:
- npt.NDArray[np.float64], shape = (n_points,)
Predicted values at the query points.
Notes
Be careful that, for two-dimensional and higher-dimensional data, not passing an x_new argument may give unexpected results, because, for now, the function np.unique re-orders the data. To be sure of the results, please provide an x_new argument.
Examples
For one-dimensional data.
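>>> import numpy as np
>>> from FDApy.preprocessing.smoothing.local_polynomial import LocalPolynomial
>>> n_points = 101
>>> x = np.linspace(0, 1, n_points)
>>> y = np.sin(x) + np.random.normal(0, 0.05, n_points)
>>> x_new = np.linspace(0, 1, 11)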
>>> lp = LocalPolynomial(
...     kernel_name='epanechnikov', bandwidth=0.3, degree=1
... )
>>> lp.predict(y=y, x=x, x_new=x_new)
For two-dimensional data.
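>>> n_points = 51
>>> pts = np.linspace(0, 1, n_points)
>>> xx, yy = np.meshgrid(pts, pts, indexing='ij')
>>> x = np.column_stack([xx.flatten(), yy.flatten()])
>>> eps = np.random.normal(0, 0.1, len(x))
>>> y = np.sin(x[:, 0]) * np.cos(x[:, 1]) + eps
>>> pts_new = np.linspace(0, 1, 11)
>>> xx_new, yy_new = np.meshgrid(pts_new, pts_new, indexing='ij')
>>> x_new = np.column_stack([xx_new.flatten(), yy_new.flatten()])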
>>> lp = LocalPolynomial(
...     kernel_name='epanechnikov', bandwidth=0.3, degree=2
... )
>>> lp.predict(y=y, x=x, x_new=x_new)
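The note on x_new stems from NumPy behaviour: np.unique returns sorted, deduplicated values. Assuming the unique query points are extracted row-wise with np.unique(..., axis=0) (an assumption about the internals, not confirmed here), they come back in lexicographic order rather than in the order of x, which is why passing x_new explicitly is safer. This snippet only illustrates NumPy itself.

```python
import numpy as np

# Two-dimensional points in an arbitrary (non-sorted) order, with one duplicate
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 1.0]])
unique_points = np.unique(x, axis=0)
# Rows come back deduplicated and sorted lexicographically: [0, 0], [0, 1], [1, 0]
print(unique_points)
```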