DenseFunctionalData

class FDApy.representation.DenseFunctionalData(argvals, values)

Represent densely sampled functional data.

A class used to define dense functional data. We denote by \(n\) the number of observations and by \(p\) the number of input dimensions. Here, we are in the case of univariate functional data, so the output space is \(\mathbb{R}\). We denote by \(X\) an observation, and by \(X_1, \dots, X_n\) a particular set of observations. The observations are defined as:

\[X(t): \mathcal{T} \longrightarrow \mathbb{R},\]

where \(\mathcal{T} \subset \mathbb{R}^p\). We denote the mean function by

\[\mu(t): \mathcal{T} \longrightarrow \mathbb{R},\]

and the covariance function by:

\[C(s, t): \mathcal{T} \times \mathcal{T} \longrightarrow \mathbb{R}.\]

We also denote by \(\mathbf{M}\) the Gram matrix of the set of observations.

Parameters:
  • argvals (DenseArgvals) – The sampling points of the functional data. Each entry of the dictionary represents an input dimension. The shape of the \(j\)-th dimension is \((m_j,)\) for \(1 \leq j \leq p\).

  • values (DenseValues) – The values of the functional data. The shape of the array is \((n, m_1, \dots, m_p)\).

Attributes:
  • argvals_stand (DenseArgvals) – Standardized sampling points of the functional data.

  • n_obs (int) – Number of observations of the functional data.

  • n_dimension (int) – Number of input dimensions of the functional data.

  • n_points (Tuple[int, …]) – Number of sampling points in each input dimension.

Examples

For 1-dimensional dense data:

>>> argvals = DenseArgvals({'input_dim_0': np.array([1, 2, 3, 4, 5])})
>>> values = DenseValues(np.array([
...     [1, 2, 3, 4, 5],
...     [6, 7, 8, 9, 10],
...     [11, 12, 13, 14, 15]
... ]))
>>> DenseFunctionalData(argvals, values)

For 2-dimensional dense data:

>>> argvals = DenseArgvals({
...     'input_dim_0': np.array([1, 2, 3, 4]),
...     'input_dim_1': np.array([5, 6, 7])
... })
>>> values = DenseValues(np.array([
...     [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]],
...     [[5, 6, 7], [5, 6, 7], [5, 6, 7], [5, 6, 7]],
...     [[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
... ]))
>>> DenseFunctionalData(argvals, values)

References

Methods

center([mean, method_smoothing])

Center the data.

concatenate(*fdata)

Concatenate DenseFunctionalData objects.

covariance([points, method_smoothing, ...])

Compute an estimate of the covariance function.

inner_product([method_integration, ...])

Compute the inner product matrix of the data.

mean([points, method_smoothing])

Compute an estimate of the mean.

noise_variance([order])

Estimate the variance of the noise.

norm([squared, method_integration, ...])

Norm of each observation of the data.

normalize(**kwargs)

Normalize the data.

rescale([weights, method_integration, ...])

Rescale the data.

smooth([points, method, bandwidth, penalty])

Smooth the data.

standardize([center])

Standardize the data.

to_basis([points, method, penalty])

Convert the data to basis format.

to_long([reindex])

Convert the data to long format.

center(mean=None, method_smoothing=None, **kwargs)

Center the data.

The centering is done by estimating the mean from the data and then subtracting it from the data. This yields

\[\widetilde{X}(t) = X(t) - \mu(t).\]

Parameters:
  • mean (DenseFunctionalData | None) – A precomputed mean as a DenseFunctionalData object.

  • method_smoothing (str | None) – The method used for smoothing the mean. If None, no smoothing is performed. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2].

  • kwargs – Other keyword arguments are passed to DenseFunctionalData.mean() (when mean=None) or to DenseFunctionalData.smooth().

Returns:

The centered version of the data.

Return type:

DenseFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.data.center(method_smoothing='PS')
Functional data object with 10 observations on a 1-dimensional support.
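
With method_smoothing=None and curves on a common grid, the centering \(\widetilde{X}(t) = X(t) - \mu(t)\) amounts to subtracting the pointwise sample mean from every curve. The NumPy sketch below illustrates that case on the values of the one-dimensional class example; holding the data in a plain NumPy array of shape (n_obs, n_points) is a simplification made for illustration, not FDApy's own code.

```python
import numpy as np

# Values of the one-dimensional class example, shape (n_obs, n_points)
values = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],
])

mean_curve = values.mean(axis=0)  # mu_hat(t_k) = (1/n) sum_i X_i(t_k)
centered = values - mean_curve    # X_i(t_k) - mu_hat(t_k), broadcast over the rows
print(mean_curve)   # [ 6.  7.  8.  9. 10.]
print(centered[0])  # [-5. -5. -5. -5. -5.]
```

After centering, the pointwise mean of the curves is zero at every sampling point.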
static concatenate(*fdata)

Concatenate DenseFunctionalData objects.

Parameters:

fdata (DenseFunctionalData) – Functional data to concatenate.

Returns:

The concatenated object.

Return type:

DenseFunctionalData

covariance(points=None, method_smoothing=None, center=True, kwargs_center={}, **kwargs)

Compute an estimate of the covariance function.

This function computes an estimate of the covariance surface of a DenseFunctionalData object. As the curves are sampled on a common grid, we consider the sample covariance [7].
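
For curves observed on a common grid \(t_1, \dots, t_m\), the sample covariance is \(\widehat{C}(t_k, t_l) = \frac{1}{n-1}\sum_{i=1}^{n}\{X_i(t_k) - \widehat{\mu}(t_k)\}\{X_i(t_l) - \widehat{\mu}(t_l)\}\). The NumPy sketch below computes this surface for one-dimensional data stored as an array of shape (n, m). It is an illustration only: the helper sample_covariance is hypothetical, the \(1/(n-1)\) normalization is an assumption (FDApy may use \(1/n\)), and no smoothing is applied.

```python
import numpy as np


def sample_covariance(values: np.ndarray) -> np.ndarray:
    """Sample covariance surface of curves sampled on a common grid.

    `values` has shape (n_obs, n_points); the result has shape
    (n_points, n_points). Hypothetical helper, not part of FDApy.
    """
    centered = values - values.mean(axis=0)  # X_i(t_k) - mu_hat(t_k)
    n_obs = values.shape[0]
    return centered.T @ centered / (n_obs - 1)  # assumed 1 / (n - 1) scaling


rng = np.random.default_rng(0)
values = rng.normal(size=(50, 11))  # 50 curves on an 11-point grid
cov = sample_covariance(values)
print(cov.shape)  # (11, 11)
```

The result agrees with np.cov(values, rowvar=False), which uses the same \(1/(n-1)\) scaling.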

Parameters:
  • points (DenseArgvals | None) – The sampling points at which the covariance is estimated. If None, or if no smoothing is performed (method_smoothing=None), the DenseArgvals of the DenseFunctionalData object are used.

  • method_smoothing (str | None) – The method used for smoothing the covariance. If None, no smoothing is performed. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2].

  • center (bool) – Whether to center the data before computing the covariance.

  • kwargs_center (Dict[str, object]) – Keyword arguments to be passed to the function FunctionalData.center().

  • kwargs – Other keyword arguments are passed to the following function: functional_data._smooth_covariance().

Returns:

An estimate of the covariance as a two-dimensional DenseFunctionalData object.

Return type:

DenseFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=100)
>>> kl.add_noise(0.01)
>>> kl.noisy_data.covariance(method_smoothing='PS')
Functional data object with 1 observations on a 2-dimensional support.

inner_product(method_integration='trapz', method_smoothing=None, noise_variance=None, **kwargs)

Compute the inner product matrix of the data.

The inner product matrix is an n_obs by n_obs matrix whose \((i, j)\) entry is the inner product of the \(i\)-th and \(j\)-th observations, defined as

\[\langle x, y \rangle = \int_{\mathcal{T}} x(t)y(t)dt, t \in \mathcal{T},\]

where \(\mathcal{T}\) is a one- or multi-dimensional domain [1].
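
On a shared one-dimensional grid, each entry can be approximated with the trapezoidal rule, \(\langle X_i, X_j \rangle \approx \sum_k w_k X_i(t_k) X_j(t_k)\), where \(w_k\) are the trapezoidal weights, so the whole matrix is a single weighted matrix product. The helpers below are hypothetical, and the sketch applies the formula to the values as given, leaving out the preprocessing that inner_product() may apply (smoothing, centering through DenseFunctionalData.center(), noise-variance correction).

```python
import numpy as np


def trapezoid_weights(t: np.ndarray) -> np.ndarray:
    """Weights w_k such that sum_k w_k f(t_k) is the trapezoidal rule on grid t."""
    dt = np.diff(t)
    w = np.zeros_like(t, dtype=float)
    w[:-1] += dt / 2  # left endpoint of each interval
    w[1:] += dt / 2   # right endpoint of each interval
    return w


def gram_matrix(t: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Approximate <X_i, X_j> = int X_i(t) X_j(t) dt for every pair of curves.

    `values` has shape (n_obs, n_points). Hypothetical helper, not FDApy's API.
    """
    w = trapezoid_weights(t)
    return (values * w) @ values.T  # entry (i, j) = sum_k w_k X_i(t_k) X_j(t_k)


t = np.linspace(0, 1, 101)
values = np.vstack([np.ones_like(t), t])  # X_1(t) = 1 and X_2(t) = t
print(gram_matrix(t, values))  # approximately [[1, 0.5], [0.5, 0.3333]]
```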

Parameters:
  • method_integration (str) – The method used for numerical integration.

  • method_smoothing (str | None) – The method used for smoothing the mean. If None, no smoothing is performed. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2].

  • noise_variance (float | None) – An estimation of the variance of the noise. If None, an estimation is computed using the methodology in [5].

  • kwargs – Other keyword arguments are passed to the following function: DenseFunctionalData.center().

Returns:

Inner product matrix of the data.

Return type:

npt.NDArray[np.float64], shape=(n_obs, n_obs)

Examples

For one-dimensional functional data:

>>> kl = KarhunenLoeve(
...     basis_name='bsplines', n_functions=5, random_state=42
... )
>>> kl.new(n_obs=3)
>>> kl.data.inner_product(noise_variance=0)
array([
    [ 0.16288536,  0.01958865, -0.10017322],
    [ 0.01958865,  0.17701988, -0.2459348 ],
    [-0.10017322, -0.2459348 ,  0.42008035]
])

For two-dimensional functional data:

>>> kl = KarhunenLoeve(
...     basis_name='bsplines', dimension='2D', n_functions=5,
...     random_state=42, argvals=np.linspace(0, 1, 11)
... )
>>> kl.new(n_obs=3)
>>> kl.data.inner_product(noise_variance=0)
array([
    [ 0.01669878,  0.00349892, -0.00817676],
    [ 0.00349892,  0.03208174, -0.03777796],
    [-0.00817676, -0.03777796,  0.05083159]
])

mean(points=None, method_smoothing=None, **kwargs)

Compute an estimate of the mean.

This function computes an estimate of the mean curve of a DenseFunctionalData object. As the curves are sampled on a common grid, we consider the sample mean, as defined in [7]. The sample mean is rate optimal [2]. Optionally, the estimate can be smoothed using local polynomial estimators [8] or P-splines [4].

Parameters:
  • points (DenseArgvals | None) – The sampling points at which the mean is estimated. If None, the DenseArgvals of the DenseFunctionalData is used.

  • method_smoothing (str | None) – The method used for smoothing. If None, no smoothing is performed. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [8].

  • kwargs – Other keyword arguments are passed to the following function: DenseFunctionalData.smooth().

Returns:

An estimate of the mean as a DenseFunctionalData object.

Return type:

DenseFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=100)
>>> kl.add_noise(0.01)
>>> kl.noisy_data.mean(method_smoothing='PS')
Functional data object with 1 observations on a 1-dimensional support.

noise_variance(order=2)

Estimate the variance of the noise.

This function estimates the variance of the noise. The variance is first estimated for each curve individually using the methodology in [5]. Since all curves are assumed to be generated by the same process, the final estimate is the mean of these per-curve estimates.
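
The sketch below illustrates a difference-based estimator for curves on an equispaced grid: for each curve, normalized difference weights \(d_0, \dots, d_r\) (with \(\sum_j d_j = 0\) and \(\sum_j d_j^2 = 1\)) are applied to consecutive values, the mean of the squared differences estimates \(\sigma^2\), and the per-curve estimates are then averaged. The weights are an assumption: order 1 is the plain first difference and order 2 a normalized second difference, which are not necessarily the weights of [5] or of FDApy; the helper noise_variance_sketch is hypothetical.

```python
import numpy as np

# Normalized difference weights: entries sum to 0 and their squares sum to 1.
# Illustrative choices only, not necessarily the weights used in [5] or by FDApy.
DIFF_WEIGHTS = {
    1: np.array([1.0, -1.0]) / np.sqrt(2.0),
    2: np.array([1.0, -2.0, 1.0]) / np.sqrt(6.0),
}


def noise_variance_sketch(values: np.ndarray, order: int = 2) -> float:
    """Difference-based noise variance, averaged over curves (hypothetical helper).

    `values` has shape (n_obs, n_points), assumed sampled on an equispaced grid.
    """
    d = DIFF_WEIGHTS[order]
    r = len(d) - 1
    n_points = values.shape[1]
    # Weighted differences sum_j d_j * Y(t_{k+j}), for k = 0, ..., n_points - r - 1
    diffs = sum(d[j] * values[:, j:n_points - r + j] for j in range(r + 1))
    per_curve = np.mean(diffs ** 2, axis=1)  # one variance estimate per curve
    return float(per_curve.mean())  # average over the set of curves


rng = np.random.default_rng(42)
t = np.linspace(0, 1, 2001)
noisy = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=(20, t.size))  # true variance 0.04
print(noise_variance_sketch(noisy, order=2))  # close to 0.04
```

Differencing removes most of the smooth signal, so the squared differences mainly reflect the noise; this works best when the grid is fine relative to the variation of the curves.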

Parameters:

order (int) – Order of the difference sequence. The order has to be between 1 and 10. See [5] for more information.

Returns:

The estimation of the variance of the noise.

Return type:

float

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=100)
>>> kl.add_noise(0.05)
>>> kl.noisy_data.noise_variance(order=2)
0.051922438333740877

norm(squared=False, method_integration='trapz', use_argvals_stand=False)

Norm of each observation of the data.

For each observation in the data, this function computes its norm, defined in [6] as

\[\| X \| = \left\{\int_{\mathcal{T}} X(t)^2dt\right\}^{\frac12}.\]

Parameters:
  • squared (bool) – If True, the function calculates the squared norm, otherwise it returns the norm.

  • method_integration (str) – The method used to estimate the integral.

  • use_argvals_stand (bool) – Whether to use the standardized sampling points to compute the norm.

Returns:

The norm of each observation.

Return type:

npt.NDArray[np.float64], shape=(n_obs,)

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.data.norm()
array([
    0.53253351, 0.42212112, 0.6709846 , 0.26672898, 0.27440755,
    0.37906252, 0.65277413, 0.53998411, 0.2872874 , 0.4934973
])
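
The norm above can be approximated on a one-dimensional grid with the trapezoidal rule, which is presumably what method_integration='trapz' refers to. The helper l2_norms below is hypothetical, not FDApy's code; it is checked on two curves whose exact norm is 1.

```python
import numpy as np


def l2_norms(t: np.ndarray, values: np.ndarray, squared: bool = False) -> np.ndarray:
    """Trapezoidal approximation of ||X_i|| for each row of `values` (hypothetical helper)."""
    sq = values ** 2
    # Trapezoidal rule applied row-wise: sum_k (f_k + f_{k+1}) / 2 * (t_{k+1} - t_k)
    sq_norms = np.sum((sq[:, 1:] + sq[:, :-1]) / 2 * np.diff(t), axis=1)
    return sq_norms if squared else np.sqrt(sq_norms)


t = np.linspace(0, 1, 1001)
values = np.vstack([np.ones_like(t), np.sqrt(2) * np.sin(np.pi * t)])
print(l2_norms(t, values))  # approximately [1. 1.]
```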
normalize(**kwargs)

Normalize the data.

The normalization is performed by dividing each functional datum \(X\) by its norm \(\| X \|\). This yields

\[\widetilde{X} = \frac{X}{\| X \|}.\]

Parameters:

kwargs – Other keyword arguments are passed to the following function: DenseFunctionalData.norm().

Returns:

The normalized data.

Return type:

DenseFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.data.normalize()
Functional data object with 10 observations on a 1-dimensional support.
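
Dividing each curve by its norm, \(\widetilde{X} = X / \| X \|\), can be sketched with a trapezoidal approximation of the norm on a one-dimensional grid. The arrays are illustrative and the sketch is not FDApy's implementation; it only shows that each normalized curve has unit norm under the same quadrature rule.

```python
import numpy as np

t = np.linspace(0, 1, 501)
values = np.vstack([2 * np.ones_like(t), 3 * t])  # two curves, shape (n_obs, n_points)

dt = np.diff(t)
sq = values ** 2
norms = np.sqrt(np.sum((sq[:, 1:] + sq[:, :-1]) / 2 * dt, axis=1))  # ||X_i||, trapezoidal rule
normalized = values / norms[:, np.newaxis]  # divide each row by its own norm
```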
rescale(weights=0.0, method_integration='trapz', use_argvals_stand=False, **kwargs)

Rescale the data.

The rescaling is performed by first centering the data and then multiplying by a common weight:

\[\widetilde{X}(t) = w\{X(t) - \mu(t)\}.\]

The weights are defined in [6].

Parameters:
  • weights (float) – The weight used to rescale the data. If weights=0.0, the weight is estimated by integrating the variance function [3].

  • method_integration (str) – The method used to estimate the integral.

  • use_argvals_stand (bool) – Use standardized argvals to compute the normalization of the data.

Returns:

The rescaled data and the weight.

Return type:

Tuple[DenseFunctionalData, float]

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> data_rescaled, weight = kl.data.rescale()
>>> data_rescaled
Functional data object with 10 observations on a 1-dimensional support.

smooth(points=None, method='PS', bandwidth=None, penalty=None, **kwargs)

Smooth the data.

This function smooths each curve individually, either by fitting a local polynomial smoother [2] (method='LP') or by fitting P-splines [4] (method='PS').
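
As an illustration of the local polynomial option, the sketch below implements a local linear smoother (degree 1) with an Epanechnikov kernel, using by default the bandwidth rule given for the bandwidth parameter (the number of sampling points raised to the power \(-1/5\)). The helper local_linear, the kernel and the polynomial degree are assumptions for illustration; FDApy's LocalPolynomial class may use different choices.

```python
from typing import Optional

import numpy as np


def local_linear(t: np.ndarray, y: np.ndarray, points: np.ndarray,
                 bandwidth: Optional[float] = None) -> np.ndarray:
    """Local linear smoother with an Epanechnikov kernel (hypothetical helper)."""
    if bandwidth is None:
        bandwidth = len(t) ** (-1 / 5)  # default rule quoted for `bandwidth`
    fitted = np.empty(len(points))
    for i, x0 in enumerate(points):
        u = (t - x0) / bandwidth
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)  # kernel weights
        design = np.column_stack([np.ones_like(t), t - x0])  # local line centred at x0
        xtw = design.T * w  # X^T W without forming the diagonal matrix
        beta = np.linalg.solve(xtw @ design, xtw @ y)  # weighted least squares
        fitted[i] = beta[0]  # intercept = fitted value at x0
    return fitted


rng = np.random.default_rng(0)
t = np.linspace(0, 1, 201)
truth = np.sin(2 * np.pi * t)
noisy = truth + rng.normal(scale=0.3, size=t.size)
smoothed = local_linear(t, noisy, t, bandwidth=0.05)
```

A local linear fit reproduces straight lines exactly, which makes a convenient sanity check. In the example, a bandwidth of 0.05 is passed explicitly because the default rule gives a window that is wide relative to one period of the sine curve.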

Parameters:
  • points (DenseArgvals | None) – Points at which the curves are estimated. The default is None, meaning we use the argvals as estimation points.

  • method (str) – The method used for smoothing. If ‘PS’, the method is P-splines [4]. If ‘LP’, the method is local polynomials [2]. Otherwise, an error is raised.

  • bandwidth (float | None) – Strictly positive. Controls the size of the associated neighborhood. If bandwidth=None, the curves are assumed to be twice differentiable and the bandwidth is set to \(n^{-1/5}\) [8], where \(n\) is the number of sampling points per curve. Be careful with the results if the curves are not sampled on \([0, 1]\).

  • penalty (float | None) – Strictly positive. Penalty used in the P-spline fitting of the data.

  • kwargs – Other keyword arguments are passed to preprocessing.smoothing.PSplines() (method='PS') or to preprocessing.smoothing.LocalPolynomial() (method='LP').

Returns:

Smoothed data.

Return type:

DenseFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=1)
>>> kl.add_noise(0.05)
>>> kl.noisy_data.smooth()
Functional data object with 1 observations on a 1-dimensional support.

standardize(center=True, **kwargs)

Standardize the data.

The standardization is performed by first centering the data and then dividing by the standard deviation curve [3]. It results in

\[\widetilde{X}(t) = C(t, t)^{-\frac12}\{X(t) - \mu(t)\}, \quad t \in \mathcal{T}.\]

Parameters:

center (bool) – Whether to center the data before dividing by the standard deviation curve.

Returns:

The standardized data.

Return type:

DenseFunctionalData

Examples

>>> kl = KarhunenLoeve(
...     basis_name='bsplines',
...     n_functions=5,
...     random_state=42
... )
>>> kl.new(n_obs=10)
>>> kl.data.standardize()
Functional data object with 10 observations on a 1-dimensional support.
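
A minimal NumPy sketch of the standardization formula for curves on a common grid: the pointwise sample mean is subtracted and the result is divided by the pointwise sample standard deviation, which plays the role of \(C(t, t)^{1/2}\). Using the unbiased estimate (ddof=1) is an assumption; FDApy may normalize differently and may smooth the estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(loc=3.0, scale=2.0, size=(200, 25))  # (n_obs, n_points)

mean_curve = values.mean(axis=0)        # estimate of mu(t) at each grid point
std_curve = values.std(axis=0, ddof=1)  # estimate of C(t, t)^{1/2}, unbiased variance
standardized = (values - mean_curve) / std_curve
```

By construction, every grid point of the standardized data has sample mean 0 and sample standard deviation 1.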
to_basis(points=None, method='PS', penalty=None, **kwargs)

Convert the data to basis format.

This function transforms a DenseFunctionalData object into a BasisFunctionalData object, using the given method to compute the basis coefficients.

Parameters:
  • points (DenseArgvals | None) – The argvals of the basis.

  • method (str) – The method to get the coefficients.

  • penalty (float | None) – Strictly positive. Penalty used in the P-spline fitting of the data.

  • kwargs – Other keyword arguments are passed to the function preprocessing.smoothing.PSplines().

Returns:

The expanded data.

Return type:

BasisFunctionalData

to_long(reindex=False)

Convert the data to long format.

This function transforms a DenseFunctionalData object into a pandas DataFrame in long format, with one row per observation and sampling point. This helper can make some computations easier, e.g., smoothing the mean and covariance functions.
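
The long layout shown in the example of this method can be reproduced for one-dimensional data with plain NumPy and pandas: the sampling points are tiled once per curve, the curve index is repeated once per sampling point, and the values are flattened row by row. This is only a sketch of the layout under those assumptions, not FDApy's implementation.

```python
import numpy as np
import pandas as pd

argvals = np.array([1, 2, 3, 4, 5])
values = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
])
n_obs, n_points = values.shape

long_df = pd.DataFrame({
    "input_dim_0": np.tile(argvals, n_obs),       # sampling points repeated for each curve
    "id": np.repeat(np.arange(n_obs), n_points),  # curve index, constant within a curve
    "values": values.ravel(),                     # row-major flattening keeps curves contiguous
})
print(long_df.head())
```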

Parameters:

reindex (bool) – Not used here.

Returns:

The data in a long format.

Return type:

pd.DataFrame

Examples

>>> argvals = DenseArgvals({'input_dim_0': np.array([1, 2, 3, 4, 5])})
>>> values = DenseValues(np.array([
...     [1, 2, 3, 4, 5],
...     [6, 7, 8, 9, 10],
...     [11, 12, 13, 14, 15]
... ]))
>>> fdata = DenseFunctionalData(argvals, values)
>>> fdata.to_long()
    input_dim_0  id  values
0             1   0       1
1             2   0       2
2             3   0       3
3             4   0       4
4             5   0       5
5             1   1       6
6             2   1       7
7             3   1       8
8             4   1       9
9             5   1      10
10            1   2      11
11            2   2      12
12            3   2      13
13            4   2      14
14            5   2      15