IrregularFunctionalData

class FDApy.representation.IrregularFunctionalData(argvals, values)

Represent irregularly sampled functional data.

Parameters:
  • argvals (IrregularArgvals) – The sampling points of the functional data. Each entry of the dictionary represents an observation, stored as a DenseArgvals whose entries are the input dimensions. So, the sampling points of observation \(i\) for dimension \(j\) are a np.ndarray with shape \((m^i_j,)\) for \(0 \leq i \leq n\) and \(0 \leq j \leq p\).

  • values (IrregularValues) – The values of the functional data. Each entry of the dictionary is an observation of the process. Observation \(i\) is represented by a np.ndarray of shape \((m^i_1, \dots, m^i_p)\), matching its own sampling points. It should not contain any missing values.

Attributes:
  • argvals_stand (IrregularArgvals) – Standardized sampling points of the functional data.

  • n_obs (int) – Number of observations of the functional data.

  • n_dimension (int) – Number of input dimensions of the functional data.

  • n_points (Dict[int, Tuple[int, …]]) – Number of sampling points.
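
As a rough illustration of these attributes, the sketch below computes the same quantities from plain Python dictionaries of NumPy arrays. The dictionaries only stand in for IrregularArgvals and IrregularValues, and the layout of n_points (one tuple per observation) is an assumption read off the type annotation, not the library's implementation.

```python
import numpy as np

# Plain dictionaries standing in for IrregularArgvals / IrregularValues
# (assumption: one entry per observation, one array per input dimension).
argvals = {
    0: {'input_dim_0': np.array([0, 1, 2, 3, 4])},
    1: {'input_dim_0': np.array([0, 2, 4])},
    2: {'input_dim_0': np.array([2, 4])},
}
values = {
    0: np.array([1, 2, 3, 4, 5]),
    1: np.array([2, 5, 6]),
    2: np.array([4, 7]),
}

n_obs = len(values)            # number of observations
n_dimension = len(argvals[0])  # number of input dimensions
n_points = {
    idx: tuple(len(points) for points in obs.values())
    for idx, obs in argvals.items()
}

# Each observation's values must match the shape of its own sampling grid.
shapes_match = all(values[idx].shape == n_points[idx] for idx in values)
```

The check on shapes_match reflects the main constraint of irregular data: grids differ between observations, but each observation must agree with its own grid.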

Examples

For 1-dimensional irregular data:

>>> argvals = IrregularArgvals({
...     0: DenseArgvals({'input_dim_0': np.array([0, 1, 2, 3, 4])}),
...     1: DenseArgvals({'input_dim_0': np.array([0, 2, 4])}),
...     2: DenseArgvals({'input_dim_0': np.array([2, 4])})
... })
>>> values = IrregularValues({
...     0: np.array([1, 2, 3, 4, 5]),
...     1: np.array([2, 5, 6]),
...     2: np.array([4, 7])
... })
>>> IrregularFunctionalData(argvals, values)

For 2-dimensional irregular data:

>>> argvals = IrregularArgvals({
...     0: DenseArgvals({
...         'input_dim_0': np.array([1, 2, 3, 4]),
...         'input_dim_1': np.array([5, 6, 7])
...     }),
...     1: DenseArgvals({
...         'input_dim_0': np.array([2, 4]),
...         'input_dim_1': np.array([1, 2, 3])
...     }),
...     2: DenseArgvals({
...         'input_dim_0': np.array([4, 5, 6]),
...         'input_dim_1': np.array([8, 9])
...     })
... })
>>> values = IrregularValues({
...     0: np.array([[1, 2, 3], [4, 1, 2], [3, 4, 1], [2, 3, 4]]),
...     1: np.array([[1, 2, 3], [1, 2, 3]]),
...     2: np.array([[8, 9], [8, 9], [8, 9]])
... })
>>> IrregularFunctionalData(argvals, values)

References

Methods

  • center([mean, method_smoothing]) – Center the data.

  • concatenate(*fdata) – Concatenate IrregularFunctionalData objects.

  • covariance([points, method_smoothing, ...]) – Compute an estimate of the covariance function.

  • inner_product([method_integration, ...]) – Compute the inner product matrix of the data.

  • mean([points, method_smoothing, approx]) – Compute an estimate of the mean.

  • noise_variance([order]) – Estimate the variance of the noise.

  • norm([squared, method_integration, ...]) – Norm of each observation of the data.

  • normalize(**kwargs) – Normalize the data.

  • rescale([weights, method_integration, ...]) – Rescale the data.

  • smooth([points, method, bandwidth, penalty]) – Smooth the data.

  • standardize([center]) – Standardize the data.

  • to_basis([points, method, penalty]) – Convert the data to basis format.

  • to_long([reindex]) – Convert the data to long format.

center(mean=None, method_smoothing='LP', **kwargs)

Center the data.

Parameters:
Returns:

The centered version of the data.

Return type:

IrregularFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.add_noise_and_sparsify(0.01, 0.95)
>>> kl.sparse_data.center(smooth=True)
Functional data object with 10 observations on a 1-dimensional support.

static concatenate(*fdata)

Concatenate IrregularFunctionalData objects.

Parameters:

fdata (IrregularFunctionalData) – Functional data to concatenate.

Returns:

The concatenated objects.

Return type:

IrregularFunctionalData

covariance(points=None, method_smoothing='LP', center=True, smooth=True, kwargs_center={}, **kwargs)

Compute an estimate of the covariance function.

This function computes an estimate of the covariance surface of an IrregularFunctionalData object. As the curves are not sampled on a common grid, the method in [8] is used.

Parameters:
  • points (DenseArgvals | None) – The sampling points at which the covariance is estimated. If None, the concatenation of the IrregularArgvals of the IrregularFunctionalData is used.

  • method_smoothing (str) – The method used to smooth the mean. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2].

  • center (bool) – Whether to center the data before computing the covariance.

  • smooth (bool) – Whether to smooth the covariance.

  • kwargs_center (Dict[str, object]) – Keyword arguments to be passed to the function FunctionalData.center().

  • kwargs – Other keyword arguments are passed to the following function: FunctionalData._smooth_covariance().

Returns:

An estimate of the covariance as a two-dimensional DenseFunctionalData object.

Return type:

DenseFunctionalData

Raises:

NotImplementedError – Not implemented for higher-dimensional data.

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=100)
>>> kl.sparsify(percentage=0.5, epsilon=0.05)
>>> kl.sparse_data.covariance()
Functional data object with 1 observations on a 2-dimensional support.

inner_product(method_integration='trapz', method_smoothing='LP', noise_variance=None, **kwargs)

Compute the inner product matrix of the data.

The inner product matrix is an n_obs by n_obs matrix where each entry is defined as

\[\langle x, y \rangle = \int_{\mathcal{T}} x(t)y(t)dt, t \in \mathcal{T},\]

where \(\mathcal{T}\) is a one- or multi-dimensional domain [1].
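
The definition above can be sketched numerically with the trapezoidal rule. The example assumes the curves have already been evaluated on one common grid, which is a simplification for illustration; the helper trapezoid is hypothetical and is not part of the library.

```python
import numpy as np

def trapezoid(y, t):
    """Integrate y over t with the trapezoidal rule along the last axis."""
    return np.sum(np.diff(t) * (y[..., 1:] + y[..., :-1]) / 2.0, axis=-1)

# Assumption: every curve is available on the same grid of [0, 1].
t = np.linspace(0.0, 1.0, 101)
curves = np.vstack([np.ones_like(t), t, 1.0 - t])  # shape (n_obs, n_points)

# Entry (i, j) approximates the integral of x_i(t) * x_j(t) over [0, 1].
products = curves[:, None, :] * curves[None, :, :]
gram = trapezoid(products, t)  # shape (n_obs, n_obs)
```

The resulting matrix is symmetric, and its diagonal holds the squared norms of the curves.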

Parameters:
  • method_integration (str) – The method used for integration.

  • method_smoothing (str) – The method used to smooth the mean. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2].

  • noise_variance (float | None) – An estimate of the variance of the noise. If None, an estimate is computed using the methodology in [5].

  • kwargs – Other keyword arguments are passed to the following function: IrregularFunctionalData.center().

Returns:

Inner product matrix of the data.

Return type:

npt.NDArray[np.float64], shape=(n_obs, n_obs)

Raises:

NotImplementedError – Not implemented for higher-dimensional data.

Examples

For one-dimensional functional data:

>>> kl = KarhunenLoeve(
...     basis_name='bsplines', n_functions=5, random_state=5
... )
>>> kl.new(n_obs=3)
>>> kl.sparsify(percentage=0.8, epsilon=0.05)
>>> kl.sparse_data.inner_product(noise_variance=0)
array([
    [ 0.15749721,  0.01983093, -0.09607059],
    [ 0.01983093,  0.17937531, -0.24773228],
    [-0.09607059, -0.24773228,  0.41648575]
])

mean(points=None, method_smoothing='LP', approx=True, **kwargs)

Compute an estimate of the mean.

This function computes an estimate of the mean curve of an IrregularFunctionalData object. The curves are not sampled on a common grid. We implement the methodology from [2].

Parameters:
  • points (DenseArgvals | None) – The sampling points at which the mean is estimated. If None, the concatenation of the argvals of the IrregularFunctionalData is used.

  • method_smoothing (str) – The method used for the smoothing. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2].

  • approx (bool) – Approximation of the estimation.

  • kwargs – Other keyword arguments are passed to the following function: IrregularFunctionalData.smooth().

Returns:

An estimate of the mean as a DenseFunctionalData object.

Return type:

DenseFunctionalData

Examples

For one-dimensional functional data:

>>> argvals = IrregularArgvals({
...     0: DenseArgvals({'input_dim_0': np.array([0, 1, 2, 3, 4])}),
...     1: DenseArgvals({'input_dim_0': np.array([0, 2, 4])}),
...     2: DenseArgvals({'input_dim_0': np.array([2, 4])})
... })
>>> values = IrregularValues({
...     0: np.array([1, 2, 3, 4, 5]),
...     1: np.array([2, 5, 6]),
...     2: np.array([4, 7])
... })
>>> fdata = IrregularFunctionalData(argvals, values)
>>> fdata.mean()
Functional data object with 1 observations on a 1-dimensional support.

noise_variance(order=2)

Estimate the variance of the noise.

This function estimates the variance of the noise. The noise is estimated for each individual curve using the methodology in [3]. As the curves are assumed to be generated by the same process, the estimation of the variance of the noise is the mean over the set of curves.
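
The idea of estimating the noise per curve and then averaging can be sketched with the simplest first-order difference estimator. This is an assumption made for illustration: the estimator in [3] with a general order may use a different difference sequence, and first_order_noise_variance is a hypothetical helper, not a library function.

```python
import numpy as np

def first_order_noise_variance(y):
    """Estimate the noise variance of one curve from successive differences.

    For a smooth signal, y[k + 1] - y[k] is dominated by noise and has
    variance close to twice the noise variance, hence the factor 2.
    """
    diffs = np.diff(y)
    return np.sum(diffs ** 2) / (2.0 * len(diffs))

curves = {
    0: np.array([0.0, 1.0, 0.0, 1.0]),
    1: np.array([2.0, 2.0, 2.0]),
}

# One estimate per curve, then the mean over the set of curves.
per_curve = {idx: first_order_noise_variance(y) for idx, y in curves.items()}
estimate = float(np.mean(list(per_curve.values())))
```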

Parameters:

order (int) – Order of the difference sequence. The order has to be between 1 and 10. See [3] for more information.

Returns:

The estimation of the variance of the noise.

Return type:

float

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=100)
>>> kl.sparsify(0.5)
>>> kl.sparse_data.noise_variance(order=2)
0.006671248206782777

norm(squared=False, method_integration='trapz', use_argvals_stand=False)

Norm of each observation of the data.

For each observation in the data, it computes its norm defined in [6] as

\[\| X \| = \left\{\int_{\mathcal{T}} X(t)^2dt\right\}^{\frac12}.\]
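
As a minimal sketch of this formula for irregularly sampled curves, each observation can be integrated on its own grid with the trapezoidal rule. Plain dictionaries stand in for the library containers, and the functions trapezoid and norm below are illustrative helpers, not the library's code.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule for one curve sampled at the points t."""
    return float(np.sum(np.diff(t) * (y[1:] + y[:-1]) / 2.0))

def norm(argvals, values, squared=False):
    """Norm of each observation, integrating X(t)^2 on its own grid."""
    sq_norms = np.array(
        [trapezoid(values[idx] ** 2, argvals[idx]) for idx in values]
    )
    return sq_norms if squared else np.sqrt(sq_norms)

# Each observation has its own sampling points.
argvals = {
    0: np.array([0.0, 1.0, 2.0, 3.0, 4.0]),
    1: np.array([0.0, 2.0, 4.0]),
    2: np.array([1.0, 3.0]),
}
values = {
    0: np.full(5, 3.0),            # X(t) = 3 on [0, 4]
    1: np.array([0.0, 2.0, 4.0]),  # X(t) = t on a coarse grid
    2: np.array([1.0, 1.0]),       # X(t) = 1 on [1, 3]
}

norms = norm(argvals, values)
sq_norms = norm(argvals, values, squared=True)
```
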
Parameters:
  • squared (bool) – If True, the function calculates the squared norm, otherwise the result is not squared.

  • method_integration (str) – The method used for integration.

  • use_argvals_stand (bool) – Use standardized argvals to compute the normalization of the data.

Returns:

The norm of each observation.

Return type:

npt.NDArray[np.float64], shape=(n_obs,)

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.sparsify(percentage=0.5, epsilon=0.05)
>>> kl.sparse_data.norm()
array([
    0.53419879, 0.40750272, 0.67092435, 0.26762124, 0.27425138,
    0.37419987, 0.65775515, 0.54579643, 0.25830787, 0.49324345
])

normalize(**kwargs)

Normalize the data.

The normalization is performed by dividing each functional datum \(X\) by its norm \(\| X \|\). It results in

\[\widetilde{X} = \frac{X}{\| X \|}.\]
Parameters:

kwargs – Other keyword arguments are passed to the following function: IrregularFunctionalData.norm().

Returns:

The normalized data.

Return type:

IrregularFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.sparsify(percentage=0.5, epsilon=0.05)
>>> kl.sparse_data.normalize()
Functional data object with 10 observations on a 1-dimensional support.

rescale(weights=0.0, method_integration='trapz', method_smoothing='LP', use_argvals_stand=False, **kwargs)

Rescale the data.

The rescaling is performed by first centering the data and then multiplying with a common weight:

\[\widetilde{X}(t) = w\{X(t) - \mu(t)\}.\]

The weights are defined in [6].

Parameters:
  • weights (float) – The weights used to normalize the data. If weights = 0.0, the weights are estimated by integrating the variance function [3].

  • method_integration (str) – The method used for integration.

  • use_argvals_stand (bool) – Use standardized argvals to compute the normalization of the data.

  • kwargs – Other keyword arguments are passed to the following function: IrregularFunctionalData.smooth().

  • method_smoothing (str)

Returns:

The rescaled data and the weight.

Return type:

Tuple[IrregularFunctionalData, float]

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.sparsify(percentage=0.5, epsilon=0.05)
>>> kl.sparse_data.rescale()
(Functional data object with 10 observations on a 1-dimensional
support., DenseValues(0.16802008))

smooth(points=None, method='PS', bandwidth=None, penalty=None, **kwargs)

Smooth the data.

This function smooths each curve individually. With method='LP', it fits a local polynomial smoother to the data, based on [2]. With method='PS', it fits P-splines to the data, based on [4].

Parameters:
  • points (DenseArgvals | None) – Points at which the curves are estimated. The default is None, meaning we use the argvals as estimation points.

  • method (str) – The method used for the smoothing. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2]. Otherwise, it raises an error.

  • bandwidth (float | None) – Strictly positive. Control the size of the associated neighborhood. If bandwidth=None, it is assumed that the curves are twice differentiable and the bandwidth is set to \(n^{-1/5}\) [7] where \(n\) is the number of sampling points per curve. Be careful with the results if the curves are not sampled on \([0, 1]\).

  • penalty (float | None) – Strictly positive. Penalty used in the P-splines fitting of the data.

  • kwargs – Other keyword arguments are passed to one of the following functions: preprocessing.smoothing.PSplines() (method='PS') and preprocessing.smoothing.LocalPolynomial() (method='LP').
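
To illustrate the local polynomial option and the default bandwidth rule, here is a minimal sketch of a local linear smoother for a single curve. The Epanechnikov kernel and the function local_linear are assumptions chosen for the example; they are not taken from preprocessing.smoothing.LocalPolynomial().

```python
import numpy as np

def local_linear(t, y, points, bandwidth):
    """Local linear smoother with an Epanechnikov kernel (illustrative)."""
    estimates = np.empty(len(points))
    for k, t0 in enumerate(points):
        u = (t - t0) / bandwidth
        # Kernel weights: only points within one bandwidth of t0 contribute.
        weights = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
        design = np.column_stack([np.ones_like(t), t - t0])
        weighted = design * weights[:, None]
        # Solve the weighted least-squares normal equations.
        beta = np.linalg.solve(design.T @ weighted, weighted.T @ y)
        estimates[k] = beta[0]  # the intercept is the fitted value at t0
    return estimates

# One curve sampled on [0, 1]; the bandwidth follows the n^(-1/5) rule.
t = np.linspace(0.0, 1.0, 21)
y = 2.0 + 3.0 * t
bandwidth = len(t) ** (-1 / 5)
points = np.array([0.0, 0.25, 0.5, 1.0])
smoothed = local_linear(t, y, points, bandwidth)
```

A local linear fit reproduces a straight line exactly, which makes this a convenient sanity check. Since the rule depends only on the number of points, a bandwidth of about 0.54 is sensible on \([0, 1]\) but would be far too small for a curve sampled on, say, \([0, 100]\).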

Returns:

Smoothed data.

Return type:

DenseFunctionalData

Examples

For one-dimensional functional data:

>>> argvals = IrregularArgvals({
...     0: DenseArgvals({'input_dim_0': np.array([0, 1, 2, 3, 4])}),
...     1: DenseArgvals({'input_dim_0': np.array([0, 2, 4])}),
...     2: DenseArgvals({'input_dim_0': np.array([2, 4])})
... })
>>> values = IrregularValues({
...     0: np.array([1, 2, 3, 4, 5]),
...     1: np.array([2, 5, 6]),
...     2: np.array([4, 7])
... })
>>> fdata = IrregularFunctionalData(argvals, values)
>>> fdata.smooth()
Functional data object with 3 observations on a 1-dimensional support.

standardize(center=True, **kwargs)

Standardize the data.

The standardization is performed by first centering the data and then dividing by the standard deviation curve [3]. It results in

\[\widetilde{X}(t) = C(t, t)^{-\frac12}\{X(t) - \mu(t)\}, \quad t \in \mathcal{T}.\]
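
The formula can be sketched on a common grid with pointwise empirical estimates of \(\mu(t)\) and \(C(t, t)\). This is a simplifying assumption for illustration: for irregularly sampled curves the mean and variance would first have to be estimated by smoothing, which the sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 11)

# Assumption: 50 curves observed on one common grid, shape (n_obs, n_points).
X = np.sin(2 * np.pi * t) + rng.normal(scale=0.5, size=(50, t.size))

mu = X.mean(axis=0)   # pointwise mean, an estimate of mu(t)
var = X.var(axis=0)   # pointwise variance, an estimate of C(t, t)

# Center, then divide by the standard deviation curve.
X_std = (X - mu) / np.sqrt(var)
```

After this transformation, the curves have zero mean and unit variance at every grid point.
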
Parameters:
Returns:

The standardized data.

Return type:

IrregularFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.sparsify(percentage=0.5, epsilon=0.05)
>>> kl.sparse_data.standardize()
Functional data object with 10 observations on a 1-dimensional support.

to_basis(points=None, method='PS', penalty=None, **kwargs)

Convert the data to basis format.

This function transforms an IrregularFunctionalData object into a BasisFunctionalData object using method.

Parameters:
  • points (DenseArgvals | None) – The argvals of the basis.

  • method (str) – The method to get the coefficients.

  • penalty (float | None) – Strictly positive. Penalty used in the P-splines fitting of the data.

  • kwargs – Other keyword arguments are passed to the function: preprocessing.smoothing.PSplines().

Returns:

The expanded data.

Return type:

BasisFunctionalData

to_long(reindex=False)

Convert the data to long format.

This function transforms an IrregularFunctionalData object into a pandas DataFrame. It uses the long format to represent the IrregularFunctionalData object as a dataframe. This is a helper function, as a long format can make some computations easier, e.g., smoothing the mean and covariance functions.
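
The long format itself is simple to sketch without the library: every sampled point becomes one row holding the sampling point, the observation id, and the value. In the sketch below, plain dictionaries and tuples stand in for the library containers and for the rows of the DataFrame; the column order is assumed from the example output of this method.

```python
import numpy as np

argvals = {
    0: np.array([0, 1, 2, 3, 4]),
    1: np.array([0, 2, 4]),
    2: np.array([2, 4]),
}
values = {
    0: np.array([1, 2, 3, 4, 5]),
    1: np.array([2, 5, 6]),
    2: np.array([4, 7]),
}

# One row per sampled point: (input_dim_0, id, values).
rows = [
    (int(t), idx, int(x))
    for idx in values
    for t, x in zip(argvals[idx], values[idx])
]
```

Because every row carries its observation id, curves with different numbers of points fit in a single rectangular table.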

Parameters:

reindex (bool) – Should the observations be reindexed?

Returns:

The data in a long format.

Return type:

pd.DataFrame

Examples

For one-dimensional functional data:

>>> argvals = IrregularArgvals({
...     0: DenseArgvals({'input_dim_0': np.array([0, 1, 2, 3, 4])}),
...     1: DenseArgvals({'input_dim_0': np.array([0, 2, 4])}),
...     2: DenseArgvals({'input_dim_0': np.array([2, 4])})
... })
>>> values = IrregularValues({
...     0: np.array([1, 2, 3, 4, 5]),
...     1: np.array([2, 5, 6]),
...     2: np.array([4, 7])
... })
>>> fdata = IrregularFunctionalData(argvals, values)
>>> fdata.to_long()
   input_dim_0  id  values
0            0   0       1
1            1   0       2
2            2   0       3
3            3   0       4
4            4   0       5
5            0   1       2
6            2   1       5
7            4   1       6
8            2   2       4
9            4   2       7