1. An Introduction to Functional Data Analysis (FDA)
Rene Essomba, Sugnet Lubbe
Department of Statistical Sciences, University of Cape Town
franckess48@gmail.com
November 2013
(Rene Essomba & Sugnet Lubbe) Functional Data Analysis November 2013 1 / 20
2. Break-Down
To represent the data in ways that aid further analysis.
To display the data so as to highlight various characteristics.
To study important sources of pattern and variation among the data.
To explain variation in a dependent variable using information from
independent variables.
To compare two or more sets of data with respect to certain types of
variation.
For illustration, the R packages fda and fda.usc will be used.
3. Overview
1 Introduction
2 Basis Representation
Fourier Basis
B-Splines
3 Summary Statistics for functional data
Functional means and variances
Covariance and Correlation functions
4 Functional Principal Component Analysis (fPCA)
5 Functional Linear Regression Model (fLRM)
6 Some References
4. Introduction
The Main Equation
$Z_k(t_i) = X(t_i) + \varepsilon(t_i)$ for $i = 1, \dots, n$ and $k = 1, \dots, N$
$Z_k(t_i)$ is the noisy observation from the k-th cluster.
$X(t_i)$ is the value of a continuous underlying process.
$\varepsilon(t_i)$ is the error term.
N.B.: N denotes the number of curves observed on the discrete grid
$(t_i, i = 1, \dots, n)$.
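As a rough illustration of the main equation, the sketch below simulates N noisy curves on a common grid. The sine-shaped process, the noise level and all object names are assumptions made for the example; they do not come from the slides.

```r
# Simulate Z_k(t_i) = X(t_i) + eps(t_i) on a common grid (illustrative values)
set.seed(1)
n <- 50                                  # number of grid points t_i
N <- 5                                   # number of observed curves
t_grid <- seq(0, 1, length.out = n)

X <- sin(2 * pi * t_grid)                # assumed smooth underlying process X(t_i)
eps <- matrix(rnorm(n * N, sd = 0.2), nrow = n, ncol = N)   # error terms

# X is recycled down each column, so column k holds the k-th noisy curve
Z <- X + eps

matplot(t_grid, Z, type = "p", pch = 1, xlab = "t", ylab = "Z_k(t)")
lines(t_grid, X, lwd = 2)                # the underlying process for reference
```

Each column of Z plays the role of one observed curve, and each row corresponds to one grid point.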
5. Introduction
Example: The Canadian Weather data (temperature and precipitation)
daily observations (i.e. $Z_k(t_i)$);
35 different weather stations (i.e. $k = 1, \dots, 35$);
observed at times $t_i = 0.5, \dots, 364.5$.
Therefore, our observed pairs will be $(t_i, Z_k(t_i))$.
Plot of the raw data for the stations located in Saint Johns and Halifax.
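A minimal sketch of how the raw data for these two stations might be extracted and plotted, assuming the CanadianWeather data set shipped with the fda package (where the first station is labelled "St. Johns"). The object names temp and t_i are chosen here for illustration.

```r
library(fda)   # provides the CanadianWeather data set

# Daily average temperatures: 365 days (rows) x 35 stations (columns)
temp <- CanadianWeather$dailyAv[, , "Temperature.C"]

# Mid-day observation times t_i = 0.5, ..., 364.5
t_i <- seq(0.5, 364.5, by = 1)

# Raw observed pairs (t_i, Z_k(t_i)) for two stations
stations <- c("St. Johns", "Halifax")
matplot(t_i, temp[, stations], type = "p", pch = 1, cex = 0.4,
        xlab = "Day of year", ylab = "Temperature (deg C)")
legend("topleft", legend = stations, col = 1:2, pch = 1)
```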
6. Basis Representation
Example: The Canadian Weather (continued)
$X(t)$ is a continuous process observed at 365 discrete time points $t_i$.
We look for a linear combination of $K$ basis functions ($0 < K < 365$):
$X(t) \approx \sum_{k=1}^{K} \theta_k \phi_k(t)$, with $\phi_k(t)$ as basis functions and $\theta_k$ as the
coefficients.
Types of basis functions:
Fourier Basis
B-Splines
Remark: The optimal number of basis functions can be chosen by minimizing
the generalized cross-validation (GCV) criterion.
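One possible way to choose K by GCV with the fda package is sketched below for the temperature data. The candidate grid K_grid, the use of a Fourier basis and the decision to sum the GCV values over the 35 stations are assumptions made for the example, not the presenters' stated procedure.

```r
library(fda)

temp <- CanadianWeather$dailyAv[, , "Temperature.C"]   # 365 x 35
t_i <- seq(0.5, 364.5, by = 1)

# Candidate numbers of basis functions (Fourier bases use an odd number)
K_grid <- seq(3, 65, by = 2)

gcv_total <- sapply(K_grid, function(K) {
  basis <- create.fourier.basis(rangeval = c(0, 365), nbasis = K)
  fit <- smooth.basis(t_i, temp, basis)   # least-squares fit of all 35 curves
  sum(fit$gcv)                            # one GCV value per curve, summed
})

K_best <- K_grid[which.min(gcv_total)]
plot(K_grid, gcv_total, type = "b", xlab = "K", ylab = "Total GCV")
```

The fda.usc package also offers helpers for this kind of selection; the loop above is only meant to make the criterion explicit.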
7. Fourier Basis
Definition
Useful for periodic data, the Fourier basis expansion is composed of the
following orthonormal functions:
$\phi_0(t) = \dfrac{1}{\sqrt{T}}$, $\phi_{2r-1}(t) = \dfrac{\sin(r\omega t)}{\sqrt{T/2}}$ and $\phi_{2r}(t) = \dfrac{\cos(r\omega t)}{\sqrt{T/2}}$,
with $r = 1, \dots, L/2$, where $L$ is an even integer. The period $T$ is by default
the length of the range of the discretization points $t$, and $\omega = 2\pi/T$.
In R: create.fourier.basis(rangeval, nbasis,...) (fda package).
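To make the definition concrete, the following base-R sketch evaluates the functions above directly from the formulas and checks their orthonormality numerically. The helper fourier_basis and the midpoint-rule check are illustrative assumptions; in practice create.fourier.basis would be used instead.

```r
# Evaluate phi_0, ..., phi_L on a grid, straight from the definition
fourier_basis <- function(t, period, L) {
  omega <- 2 * pi / period
  Phi <- matrix(NA_real_, nrow = length(t), ncol = L + 1)
  Phi[, 1] <- 1 / sqrt(period)                                   # phi_0
  for (r in seq_len(L / 2)) {
    Phi[, 2 * r]     <- sin(r * omega * t) / sqrt(period / 2)    # phi_{2r-1}
    Phi[, 2 * r + 1] <- cos(r * omega * t) / sqrt(period / 2)    # phi_{2r}
  }
  Phi
}

t_i <- seq(0.5, 364.5, by = 1)                   # daily midpoints, spacing dt = 1
Phi <- fourier_basis(t_i, period = 365, L = 6)   # 7 basis functions

# Midpoint-rule approximation of the inner products: should be close to identity
dt <- 1
gram <- crossprod(Phi) * dt
max(abs(gram - diag(7)))

matplot(t_i, Phi, type = "l", lty = 1, xlab = "t", ylab = "phi(t)")
```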
8. Fourier Basis
Figure: Fourier basis plot with 7 basis functions
9. B-Splines
Definition
Appropriate for non-periodic data.
Select a sequence of knots along the t-axis, $\tau_1 < \tau_2 < \dots < \tau_{L+2M}$,
where $M$ is the order of the spline;
$\phi_{k,m}(t) = \dfrac{t - \tau_k}{\tau_{k+m-1} - \tau_k}\,\phi_{k,m-1}(t) + \dfrac{\tau_{k+m} - t}{\tau_{k+m} - \tau_{k+1}}\,\phi_{k+1,m-1}(t)$ for
$k = 1, \dots, L + 2M - m$, and $\phi_{k,1}(t) = I_{[\tau_k, \tau_{k+1})}(t)$.
In R: create.bspline.basis(rangeval,nbasis,norder,...) (fda package)
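The recursion can be sketched in base R as follows. The helper bspline_basis is hypothetical, and two conventions are assumed that the slide does not state: the order-1 functions are indicators of half-open intervals, and any term whose denominator is zero (which happens when boundary knots are repeated) is set to zero.

```r
# B-spline basis of a given order via the recursion on m = 1, ..., order
bspline_basis <- function(t, knots, order) {
  nk <- length(knots)
  # Order 1: indicator functions of the knot intervals [tau_k, tau_{k+1})
  B <- sapply(seq_len(nk - 1), function(k) {
    as.numeric(knots[k] <= t & t < knots[k + 1])
  })
  B <- matrix(B, nrow = length(t))
  if (order > 1) {
    for (m in 2:order) {
      nb <- nk - m                       # number of order-m functions
      Bm <- matrix(0, nrow = length(t), ncol = nb)
      for (k in seq_len(nb)) {
        d1 <- knots[k + m - 1] - knots[k]
        d2 <- knots[k + m] - knots[k + 1]
        left  <- if (d1 > 0) (t - knots[k]) / d1 * B[, k] else 0
        right <- if (d2 > 0) (knots[k + m] - t) / d2 * B[, k + 1] else 0
        Bm[, k] <- left + right
      }
      B <- Bm
    }
  }
  B
}

# Order M = 4 (cubic), three interior knots, boundary knots repeated M times
knots <- c(rep(0, 4), 0.25, 0.5, 0.75, rep(1, 4))
t <- seq(0.01, 0.99, by = 0.01)
B <- bspline_basis(t, knots, order = 4)

range(rowSums(B))   # the basis functions should sum to one at every t
matplot(t, B, type = "l", lty = 1, xlab = "t", ylab = "phi(t)")
```

With L = 3 interior knots and M = 4 this gives L + M = 7 basis functions, which can be compared against splines::splineDesign(knots, t, ord = 4) as a sanity check.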
10. B-Splines
Figure: B-splines of order 4
12. Summary Statistics
The usual tools for summarizing data in a univariate context carry over
to functional data.
Definition
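functional mean: $\bar{X}(t) = N^{-1} \sum_{i=1}^{N} X_i(t)$.
functional variance: $\mathrm{Var}(X(t)) = (N-1)^{-1} \sum_{i=1}^{N} \left(X_i(t) - \bar{X}(t)\right)^2$.
functional covariance:
$\mathrm{Cov}(X(t), X(s)) = (N-1)^{-1} \sum_{i=1}^{N} \left(X_i(s) - \bar{X}(s)\right)\left(X_i(t) - \bar{X}(t)\right)$.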
In R: mean.fd and var.fd (fda package).
Remark: The values returned are themselves functional objects (class fd in
fda, class fdata in fda.usc).
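A minimal sketch of these summaries for the Canadian temperature curves with the fda package. The 65-function Fourier basis and the object names are assumptions made for the example; note that var.fd returns a bivariate functional object, which is evaluated on a grid here to obtain a covariance surface.

```r
library(fda)

temp <- CanadianWeather$dailyAv[, , "Temperature.C"]   # 365 x 35
t_i <- seq(0.5, 364.5, by = 1)

# Represent the 35 curves in a Fourier basis (basis size is an assumption)
basis <- create.fourier.basis(rangeval = c(0, 365), nbasis = 65)
tempfd <- smooth.basis(t_i, temp, basis)$fd

temp_mean <- mean.fd(tempfd)   # functional mean (class "fd")
temp_sd   <- sd.fd(tempfd)     # pointwise standard deviation (class "fd")
temp_cov  <- var.fd(tempfd)    # covariance function (class "bifd")

# Evaluate the covariance surface on the grid and convert it to correlations
cov_mat <- eval.bifd(t_i, t_i, temp_cov)
cor_mat <- cov2cor(cov_mat)

plot(temp_mean, xlab = "Day", ylab = "Mean temperature (deg C)")
contour(t_i, t_i, cor_mat, xlab = "Day", ylab = "Day")
```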
13. Mean Function and Standard Deviation
Figure: Mean temperature and standard deviation
14. Correlation Function
Figure: Temperature correlation function
15. Functional Principal Component Analysis (fPCA)
Primarily used as a tool for dimension reduction, fPCA is designed to
identify the main sources of variation within the functional data.
Algorithm
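1 Find the function $\xi_1(t)$ of norm 1 (i.e. $\int \xi_1^2(t)\,dt = 1$) such that
$N^{-1} \sum_i f_{i1}^2$ is maximized, with $f_{i1} = \int \xi_1(t) X_i^c(t)\,dt$.
2 On the $m$th step ($m > 1$), compute $\xi_m(t)$ in the same way, subject to the
additional orthogonality constraint(s): $\int \xi_m(t)\xi_k(t)\,dt = 0$, for $k < m$.
The (centered) functional data are therefore approximated by $\hat{X}_i(t) = \sum_{k=1}^{M} f_{ik}\hat{\xi}_k(t)$, where
$f_{ik} = \int \xi_k(t) X_i^c(t)\,dt$ with $X_i^c(t) = X_i(t) - \bar{X}(t)$.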
fPCA in R: fdata2pc(fdataobj, ncomp, ...) (fda.usc package)
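The algorithm can be illustrated on discretized curves with base R alone. The sketch below is not what fdata2pc does internally; it assumes curves observed on a regular grid, approximates the integrals by Riemann sums, and uses simulated data with made-up parameters.

```r
set.seed(42)
n <- 101                                   # grid points
N <- 40                                    # curves
t_grid <- seq(0, 1, length.out = n)
dt <- t_grid[2] - t_grid[1]

# Simulated curves (one per row): two smooth modes of variation plus noise
X <- outer(rnorm(N, sd = 2), sin(2 * pi * t_grid)) +
     outer(rnorm(N, sd = 1), cos(2 * pi * t_grid)) +
     matrix(rnorm(N * n, sd = 0.1), nrow = N, ncol = n)

X_bar <- colMeans(X)
Xc <- sweep(X, 2, X_bar)                   # centered curves X_i^c(t)

# Discretized covariance operator; its eigenvectors give the xi_k on the grid
V <- crossprod(Xc) / N
eig <- eigen(V * dt, symmetric = TRUE)

M <- 2
xi <- eig$vectors[, 1:M] / sqrt(dt)        # rescaled so that sum(xi^2) * dt = 1
scores <- Xc %*% xi * dt                   # f_ik = integral of xi_k(t) X_i^c(t) dt

# Rank-M reconstruction, with the mean added back
X_hat <- sweep(scores %*% t(xi), 2, X_bar, "+")

prop_var <- eig$values[1:M] / sum(eig$values)   # share of variance per component
crossprod(xi) * dt                              # approximately the identity matrix
```

The last line checks the unit-norm and orthogonality constraints of steps 1 and 2 numerically.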
16. Functional Principal Component Analysis (fPCA)
Example (Canadian Weather)
R> library(fda.usc)
R> # tempdat.fdata: the temperature curves stored as an fdata object (created beforehand)
R> temp.svd <- fdata2pc(tempdat.fdata, ncomp=3)
R> norm.fdata(temp.svd$rotation[1:2])
          [,1]
[1,] 0.9976567
[2,] 0.9980333
# With 3 components that explained 98.56% of the variability of explicative variables.
# Variability for each component (%): PC1 88.03 PC2 8.47 PC3 2.06
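The slides do not show how tempdat.fdata was built. One plausible construction, assuming it holds the raw daily temperatures from the CanadianWeather data set of the fda package, is sketched here; the results need not match the figures reported above exactly.

```r
library(fda)       # provides the CanadianWeather data set
library(fda.usc)   # provides fdata(), fdata2pc() and norm.fdata()

temp <- CanadianWeather$dailyAv[, , "Temperature.C"]   # 365 days x 35 stations
t_i <- seq(0.5, 364.5, by = 1)

# fdata() expects one curve per row, hence the transpose (assumed construction)
tempdat.fdata <- fdata(t(temp), argvals = t_i)

temp.svd <- fdata2pc(tempdat.fdata, ncomp = 3)
summary(temp.svd)                      # variability explained by each component
norm.fdata(temp.svd$rotation[1:2])     # L2 norms of the first two loadings
```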
17. Functional Principal Component Analysis (fPCA)
Figure: Loadings for PC1 & PC2
18. Functional Linear Regression Model (fLRM)
Consider the following functional linear regression models:
Functional response with multivariate covariates:
$y_i(t) = \beta_1(t)x_{i1} + \dots + \beta_p(t)x_{ip} + \varepsilon_i(t)$; $i = 1, \dots, N$
Scalar response with functional covariates:
$y_i = \alpha + \int_0^T \sum_{j=1}^{p} \beta_j(s)x_{ij}(s)\,ds + \varepsilon_i$; $i = 1, \dots, N$; $s \in [0, T]$.
Functional response with functional covariates:
$y_i(t) = \alpha(t) + \int_0^T \sum_{j=1}^{p} \beta_j(t, s)x_{ij}(s)\,ds + \varepsilon_i(t)$; $i = 1, \dots, N$; $s \in [0, T]$.
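As an illustration of the second model (scalar response, one functional covariate, p = 1), the base-R sketch below expands the coefficient function in a small Fourier basis and reduces the problem to ordinary least squares. The simulated data, the basis size and the midpoint-rule integration are assumptions made for the example; fda.usc provides dedicated functions such as fregre.basis and fregre.pc for real analyses.

```r
set.seed(7)
N <- 100                                  # number of observations
n <- 200                                  # grid points on [0, T] with T = 1
s <- (seq_len(n) - 0.5) / n               # interval midpoints
ds <- 1 / n

# Orthonormal Fourier basis on [0, 1] with 5 functions
Phi <- cbind(1,
             sqrt(2) * sin(2 * pi * s), sqrt(2) * cos(2 * pi * s),
             sqrt(2) * sin(4 * pi * s), sqrt(2) * cos(4 * pi * s))

# Simulated functional covariates x_i(s), one curve per row
A <- matrix(rnorm(N * 5), nrow = N, ncol = 5)
X <- A %*% t(Phi)

# Assumed true model: y_i = alpha + integral of beta(s) x_i(s) ds + eps_i
alpha_true <- 1
beta_true <- sin(2 * pi * s)
y <- alpha_true + as.vector(X %*% beta_true) * ds + rnorm(N, sd = 0.1)

# Writing beta(s) = sum_k b_k phi_k(s) turns the integral into Z %*% b,
# where Z[i, k] approximates the integral of x_i(s) phi_k(s) ds
Z <- X %*% Phi * ds
fit <- lm(y ~ Z)

alpha_hat <- unname(coef(fit)[1])
beta_hat <- as.vector(Phi %*% coef(fit)[-1])

plot(s, beta_true, type = "l", xlab = "s", ylab = "beta(s)")
lines(s, beta_hat, lty = 2)               # estimated coefficient function
```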
19. Useful References
M. Febrero-Bande, M. O. De La Fuente (2012)
Statistical Computing in Functional Data Analysis: The R Package fda.usc.
J. O. Ramsay, G. Hooker and S. Graves (2009)
Functional Data Analysis in R and Matlab.
T. Hastie, R. Tibshirani, J. Friedman (2009)
The Elements of Statistical Learning.
J. O. Ramsay and B. W. Silverman (2005)
Functional Data Analysis.
C. de Boor (1978)
A Practical Guide to Splines. Springer-Verlag, New York Heidelberg Berlin.