Introduction to FDA and linear models
Nathalie Villa-Vialaneix - nathalie.villa@math.univ-toulouse.fr
http://www.nathalievilla.org
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan
France
La Havane, September 15th, 2008
Table of contents
1 Motivations
2 Functional Principal Component Analysis
3 Functional linear regression models
4 References
What is Functional Data Analysis (FDA)?
FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid.
Example 1: Regression case 1 (100 wavelengths). Find the fat content of pieces of meat from their absorbance spectra.
Example 2: Regression case 2 (1 049 wavelengths). Find the disease content in wheat from its absorbance spectra.
Example 3: Classification case 1. Recognize one of five phonemes from its log-periodogram (256 frequencies).
Example 4: Classification case 2. Recognize one of two words from its recording in the frequency domain (8 192 points).
Example 5: Regression on functional data [Azaïs et al., 2008]. Estimate a typical load curve (electricity consumption) from multivariate economic variables.
Example 6: Curve clustering [Bensmail et al., 2005]. Build a typology of sick cells from their "SELDI mass" spectra.
Specific issues of learning with FDA
High dimensional data (the number of discretization points is often as large as, or much larger than, the number of observations);
Highly correlated data (because of the underlying functional structure, the values at two sampling points are correlated).
Consequences: the direct use of classical statistical methods on the discretization leads to ill-posed problems and provides inaccurate solutions.
Theoretical model
A functional random variable is a random variable $X$ taking its values in a functional space $\mathcal{X}$, where $\mathcal{X}$ can be:
an (infinite dimensional) Hilbert space with inner product $\langle \cdot, \cdot \rangle_{\mathcal{X}}$; in particular, $L^2$ is often used;
any (infinite dimensional) Banach space with norm $\|\cdot\|_{\mathcal{X}}$ (less usual); for example, $C^0$.
Hilbertian context
In the Hilbertian context, we are able to define:
the expectation of $X$ as (by Riesz's theorem) the unique element $E(X)$ of $\mathcal{X}$ such that
$$\forall\, u \in \mathcal{X}, \quad \langle E(X), u \rangle_{\mathcal{X}} = E\left(\langle X, u \rangle_{\mathcal{X}}\right);$$
for any $u_1, u_2 \in \mathcal{X}$, the linear operator $u_1 \otimes u_2$:
$$u_1 \otimes u_2 : v \in \mathcal{X} \mapsto \langle u_1, v \rangle_{\mathcal{X}}\, u_2 \in \mathcal{X};$$
as $(X - E(X)) \otimes (X - E(X))$ is an element of the Hilbert space of Hilbert-Schmidt operators from $\mathcal{X}$ to $\mathcal{X}$, denoted $HS(\mathcal{X})$¹, the variance of $X$ as the linear operator $\Gamma_X$:
$$\Gamma_X = E\left((X - E(X)) \otimes (X - E(X))\right) : u \in \mathcal{X} \mapsto E\left(\langle X - E(X), u \rangle_{\mathcal{X}}\, (X - E(X))\right).$$
¹ This Hilbert space is equipped with the inner product $\langle g_1, g_2 \rangle_{HS(\mathcal{X})} = \sum_i \langle g_1 e_i, g_2 e_i \rangle_{\mathcal{X}}$, where $(e_i)_i$ is any orthonormal basis of $\mathcal{X}$.
Case $\mathcal{X} = L^2([0, 1])$
If $\mathcal{X} = L^2([0, 1])$, these expressions simplify to:
norm: $\|X\|^2 = \int_{[0,1]} (X(t))^2\, dt < +\infty$;
expectation: for all $t \in [0, 1]$, $E(X)(t) = E(X(t)) = \int X(t)\, dP_X$;
variance: $\Gamma_X$ is identified with the covariance function $\gamma(t, t') = E(X(t)X(t'))$, for all $t, t' \in [0, 1]$ (assuming $E(X) = 0$ for clarity), because:
1. for all $t \in [0, 1]$, we can define $\Gamma_X^t : u \in \mathcal{X} \mapsto (\Gamma_X u)(t) \in \mathbb{R}$;
2. by Riesz's theorem, there exists $\zeta_t \in \mathcal{X}$ such that $\forall\, u \in \mathcal{X}$, $\Gamma_X^t u = \langle \zeta_t, u \rangle_{\mathcal{X}}$. As
$$(\Gamma_X u)(t) = E\left(\langle X, u \rangle_{\mathcal{X}}\, X(t)\right) = E\left(\langle X(t)X, u \rangle_{\mathcal{X}}\right) = \langle E(X(t)X), u \rangle_{\mathcal{X}},$$
we have $\zeta_t = E(X(t)X)$;
3. we define $\gamma : (t, t') \in [0, 1]^2 \mapsto \zeta_t(t') = E(X(t)X(t'))$.
Properties of $\Gamma_X$
$\Gamma_X$ is Hilbert-Schmidt: (by definition) $\sum_i \|\Gamma_X e_i\|^2 < +\infty$;
there exists a countable eigensystem $((\lambda_i)_{i\geq 1}, (v_i)_{i\geq 1})$ of $\Gamma_X$: $\Gamma_X v_i = \lambda_i v_i$ (for all $i \geq 1$). This eigensystem is such that $\lambda_1 \geq \lambda_2 \geq \ldots \geq 0$, and $0$ is the only possible accumulation point of $(\lambda_i)_i$;
Karhunen-Loève decomposition of $\Gamma_X$:
$$\Gamma_X = \sum_{i \geq 1} \lambda_i\, v_i \otimes v_i.$$
$\Gamma_X$ has no inverse in the space of continuous operators from $\mathcal{X}$ to $\mathcal{X}$ if $\mathcal{X}$ has infinite dimension. More precisely, if $\lambda_i > 0$ for all $i \geq 1$, then
$$\sum_{i \geq 1} \frac{1}{\lambda_i} = +\infty.$$
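A minimal numerical sketch of these two properties (an illustration added here, not part of the original slides): it discretizes the covariance kernel $\gamma(s,t) = \min(s,t)$ of the Brownian motion on $[0,1]$, whose eigenvalues are known to be $\lambda_i = ((i - 1/2)\pi)^{-2}$, and checks both the decay of the $\lambda_i$ and the divergence of $\sum_i 1/\lambda_i$ that prevents a continuous inverse.

```python
import numpy as np

# Covariance operator of the Brownian motion on [0, 1]: gamma(s, t) = min(s, t).
# Its eigenvalues are known in closed form: lambda_i = 1 / ((i - 1/2) * pi)^2.
grid = np.linspace(0, 1, 500)
dt = grid[1] - grid[0]
gamma = np.minimum.outer(grid, grid)

# Discretized operator eigenproblem: (Gamma u)(t) ~ sum_s gamma(t, s) u(s) dt.
lam = np.linalg.eigvalsh(gamma * dt)[::-1]

i = np.arange(1, 11)
print(lam[:10])                          # numerical eigenvalues, decreasing to 0
print(1.0 / ((i - 0.5) * np.pi) ** 2)    # theoretical lambda_i, for comparison
print(np.sum(1.0 / lam[:100]))           # partial sums of 1/lambda_i blow up
```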
Model of the observed data
We focus on:
1. a regression problem: $Y \in \mathbb{R}$ has to be predicted from $X \in \mathcal{X}$,
2. OR a (binary) classification problem: $Y \in \{-1, 1\}$ has to be predicted from $X \in \mathcal{X}$.
Learning set - Version I
$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$.
Remark: if $E\|X\|^2_{\mathcal{X}} < +\infty$ (i.e., if $\Gamma_X$ exists), a functional version of the Central Limit Theorem exists.
Model of the uncertainty on X
In fact, realizations of $X$ are never observed: only a (possibly noisy) discretization of them is given.
Learning set - Version II
$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$ and, for all $i = 1, \ldots, n$, $x_i^{\tau_i} = (x_i(t))_{t \in \tau_i}$ is observed, where $\tau_i$ is a finite set.
Questions:
1. How to obtain $(x_i)_i$ from $(x_i^{\tau_i})_i$?
2. What are the consequences of this uncertainty on the accuracy of the solution of the regression/classification problem? Can we obtain a solution that is as good as the one obtained from the direct observation of $(x_i)_i$?
Noisy data model
Learning set - Version III
$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$ and, for all $i = 1, \ldots, n$, $x_i^{\tau_i} = (x_i(t) + \varepsilon_{it})_{t \in \tau_i}$ is observed, where $\tau_i$ is a finite set and $\varepsilon_{it}$ is a centered random variable independent of $X$.
Again,
1. How to obtain $(x_i)_i$ from $(x_i^{\tau_i})_i$? (this has been studied: function estimation)
2. What are the consequences of this uncertainty on the accuracy of the solution of the regression/classification problem? Can we obtain a solution that is as good as the one obtained from the direct observation of $(x_i)_i$? (related to "errors-in-variables" problems; almost no work in FDA)
These presentations will cover works coming from the three points of view.
2 Functional Principal Component Analysis
Multidimensional PCA: context and notations
Data: a real matrix
$$\mathbf{X} = (x_i^j)_{i=1,\ldots,n,\ j=1,\ldots,p},$$
which is the observation of $p$ variables on $n$ individuals:
$n$ rows, each corresponding to the values of the $p$ variables for one individual: $x_i = (x_i^1, \ldots, x_i^p)^T$;
$p$ columns, each corresponding to the $n$ observations of one variable: $x^j = (x_1^j, \ldots, x_n^j)^T$.
Aim: Find linearly independent variables, that are linear combinations of the original ones, ordered by "importance" in $\mathbf{X}$.
First principal component
Suppose (to simplify) that:
1. the data are centered: for all $j = 1, \ldots, p$, $\bar{x}^j = \frac{1}{n}\sum_{i=1}^n x_i^j = 0$;
2. the empirical variance of $\mathbf{X}$ is $\mathrm{Var}(\mathbf{X}) = \frac{1}{n}\mathbf{X}^T\mathbf{X}$.
Problem: Find $a^* \in \mathbb{R}^p$ such that
$$a^* := \arg\max_{a:\ \|a\|_{\mathbb{R}^p}=1}\ \underbrace{\mathrm{Var}\left(\|P_a(x_i)\|_{\mathbb{R}^p}\right)_i}_{\text{inertia}}.$$
Solution:
$$\mathrm{Var}\left(\|P_a(x_i)\|_{\mathbb{R}^p}\right)_i = \frac{1}{n}\sum_{i=1}^n \left\|(a^T x_i)\, a\right\|^2_{\mathbb{R}^p} = \frac{1}{n}\sum_{i=1}^n (a^T x_i)^2 \underbrace{\|a\|^2_{\mathbb{R}^p}}_{=1} = \frac{1}{n}\sum_{i=1}^n (a^T x_i)(x_i^T a) = a^T \left(\frac{1}{n}\sum_{i=1}^n x_i x_i^T\right) a = a^T\, \mathrm{Var}(\mathbf{X})\, a.$$
$\Rightarrow$ $a^*$ is the eigenvector of $\mathrm{Var}(\mathbf{X})$ associated with the largest (positive) eigenvalue.
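A minimal numerical sketch of this result (an illustration added here, with toy simulated data; all names are those of the example): center the data, form $\mathrm{Var}(\mathbf{X}) = \frac{1}{n}\mathbf{X}^T\mathbf{X}$, and take the leading eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # toy data, n=200, p=5

Xc = X - X.mean(axis=0)                 # 1. center the data
V = Xc.T @ Xc / Xc.shape[0]             # 2. empirical variance (1/n) X^T X
vals, vecs = np.linalg.eigh(V)          # eigenvalues in increasing order
a_star = vecs[:, -1]                    # first factorial axis (largest eigenvalue)

# Check the variational characterization: a* maximizes a^T Var(X) a over ||a|| = 1.
print(a_star @ V @ a_star, vals[-1])    # both equal the largest eigenvalue
```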
An eigenvalue decomposition
Generalization
If $((\lambda_i)_{i=1,\ldots,p}, (a_i)_{i=1,\ldots,p})$ is the eigenvalue decomposition of $\mathrm{Var}(\mathbf{X})$ (with the positive $\lambda_i$ in decreasing order), then the $(a_i)$ are the factorial axes of $\mathbf{X}$. The principal components of $\mathbf{X}$ are the coordinates of the projections of the data onto these axes.
Then, we have:
$$\mathrm{Var}(\mathbf{X}) = \sum_{j=1}^p \lambda_j\, a_j a_j^T \qquad \text{and} \qquad x_i = \sum_{j=1}^p \underbrace{(x_i^T a_j)}_{\text{principal component } c_i^j}\, a_j.$$
Ordinary generalization of PCA in FDA
Data: $x_1, \ldots, x_n$ are $n$ centered observations of a random functional variable $X$ taking its values in $\mathcal{X}$.
Aim: Find $a^* \in \mathcal{X}$ such that
$$a^* := \arg\max_{a:\ \|a\|_{\mathcal{X}}=1}\ \mathrm{Var}\left(\|P_a(x_i)\|_{\mathcal{X}}\right)_i.$$
Solution:
$$\mathrm{Var}\left(\|P_a(x_i)\|_{\mathcal{X}}\right)_i = \frac{1}{n}\sum_{i=1}^n \left\|\langle a, x_i \rangle_{\mathcal{X}}\, a\right\|^2_{\mathcal{X}} = \frac{1}{n}\sum_{i=1}^n \langle a, x_i \rangle^2_{\mathcal{X}} \underbrace{\|a\|^2_{\mathcal{X}}}_{=1} = \frac{1}{n}\sum_{i=1}^n \langle a, \langle a, x_i \rangle_{\mathcal{X}}\, x_i \rangle_{\mathcal{X}} = \langle \Gamma^n_X a, a \rangle_{\mathcal{X}},$$
where $\Gamma^n_X = \frac{1}{n}\sum_{i=1}^n x_i \otimes x_i$ is the empirical estimate of $\Gamma_X$; it is a Hilbert-Schmidt operator of rank at most $n$.
$\Rightarrow$ $a^*$ is the eigenvector of $\Gamma^n_X$ associated with the largest (positive) eigenvalue: $\Gamma^n_X a^* = \lambda^* a^*$.
Eigenvalue decomposition of $\Gamma^n_X$
Factorial axes and principal components
If $((\lambda_i)_{i\geq 1}, (a_i)_{i\geq 1})$ is the eigenvalue decomposition of $\Gamma^n_X$ (with the positive $\lambda_i$ in decreasing order), then the $(a_i)$ are the factorial axes of $x_1, \ldots, x_n$ (note that at most $n$ of the $\lambda_i$ are nonzero). The principal components of $x_1, \ldots, x_n$ are the coordinates of the projections of the data onto these axes.
Then, we have:
$$\Gamma^n_X = \sum_{j=1}^n \lambda_j\, a_j \otimes a_j \qquad \text{and} \qquad x_i = \sum_{j=1}^n \underbrace{\langle x_i, a_j \rangle_{\mathcal{X}}}_{\text{principal component } c_i^j}\, a_j.$$
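In practice the curves are observed on a grid, so the $L^2$ inner product is approximated by a quadrature rule. The following sketch (an illustration added here; the Brownian paths are hypothetical data) computes functional factorial axes, principal components, and the truncated reconstruction this way.

```python
import numpy as np

rng = np.random.default_rng(0)
n, grid = 100, np.linspace(0, 1, 201)
dt = grid[1] - grid[0]
x = np.cumsum(rng.standard_normal((n, grid.size)) * np.sqrt(dt), axis=1)  # n Brownian paths

xc = x - x.mean(axis=0)                          # center the sample of curves
gamma_n = xc.T @ xc / n                          # empirical covariance kernel gamma_n(s, t)
vals, vecs = np.linalg.eigh(gamma_n * dt)        # discretized operator eigenproblem
vals, vecs = vals[::-1], vecs[:, ::-1]           # eigenvalues in decreasing order
axes = vecs / np.sqrt(dt)                        # normalize so that ||a_j||_{L2} = 1
scores = xc @ axes * dt                          # c_i^j = <x_i, a_j>_{L2}

k = 5
x_hat = scores[:, :k] @ axes[:, :k].T            # reconstruction with the first k axes
print(vals[:k] / vals.sum())                     # share of inertia of the first axes
print(np.max(np.abs(x_hat - xc)))                # truncation error with k axes
```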
Example on the Tecator dataset
[Figures: the data; the first two factorial axes; the 3rd and 4th factorial axes.]
Link with the regression problem
[Figures.]
Smoothness of the factorial axes
From a practical point of view, functional PCA in its original version is computed like multivariate PCA (on the discretization, or on the expansion of the curves on a Hilbert basis).
Hence, if the original data are irregular, the factorial axes won't be smooth:
[Figure: irregular factorial axes.]
Smooth functional PCA
Aim: Introduce a penalty in the optimization problem so as to obtain smooth (regular) factorial axes.
Ordinary functional PCA:
$$a^* := \arg\max_{a:\ \|a\|_{\mathcal{X}}=1}\ \mathrm{Var}\left(\|P_a(x_i)\|_{\mathcal{X}}\right)_i,$$
and hence $a^*$ is the eigenvector of $\Gamma^n_X$ associated with the largest eigenvalue.
Penalized functional PCA: if $D^{2m}X \in \mathcal{X} = L^2$,
$$a^* := \arg\max_{a:\ \|a\|_{\mathcal{X}}=1}\ \mathrm{Var}\left(\|P_a(x_i)\|_{\mathcal{X}}\right)_i + \mu\, \|D^m a\|^2_{\mathcal{X}} \quad (\mu > 0),$$
and hence $a^*$ is the eigenvector of $\Gamma^n_X + \mu D^{2m}$ associated with the largest eigenvalue.
Practical implementation of smooth PCA
Let $(e_k)_{k\geq 1}$ be any functional basis. Then:
1. Approximate the observations by $x_i = \sum_{k=1}^K \xi_k^i\, e_k$.
2. Approximate the derivatives of the $(e_k)_k$ by $D^{2m} e_k = \sum_{k'=1}^K \beta_{k'}^k\, e_{k'}$.
3. Then, we can show that
$$\Gamma^n_X a + \mu D^{2m} a = \sum_{k=1}^K \left[\frac{1}{n}\sum_{i=1}^n \left((\xi^i)^T E a\right)\xi^i_k + \mu\, (\beta_k^T a)\right] e_k,$$
where $E$ is the matrix containing $(\langle e_k, e_{k'} \rangle_{\mathcal{X}})_{k,k'=1,\ldots,K}$; in coordinates, this means
$$\Gamma^n_X + \mu D^{2m} : a \in \mathbb{R}^K \mapsto \frac{1}{n}\sum_{i=1}^n \left((\xi^i)^T E a\right)\xi^i + \mu\, (\beta^T a).$$
4. Smooth PCA is performed by an eigendecomposition of $\frac{1}{n}\sum_{i=1}^n \xi^i (\xi^i)^T E + \mu\, \beta^T$; a sketch is given below.
Remark: The decomposition $D^{2m} e_k = \sum_{k'=1}^K \beta_{k'}^k\, e_{k'}$ is easy to obtain when using a spline basis $\Rightarrow$ splines are well designed to represent data with smoothness properties (see Presentation 4 for further details).
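A toy sketch of step 4 (an illustration added here, not from the original slides): it assumes a Fourier basis on $[0,1]$, for which the Gram matrix $E$ and the derivative coefficients $\beta$ are easy to write down (here with $m = 1$); the coefficient data and all names are hypothetical.

```python
import numpy as np

K, m, mu, n = 9, 1, 1e-4, 80
t = np.linspace(0, 1, 401)
dt = t[1] - t[0]

# Fourier basis: e_0 = 1, then cos/sin pairs; D^{2m} is diagonal in this basis.
freq = np.array([(k + 1) // 2 for k in range(K)])
E_basis = np.array([np.sin(2*np.pi*f*t) if k % 2 else np.cos(2*np.pi*f*t)
                    for k, f in enumerate(freq)])
E = E_basis @ E_basis.T * dt                        # Gram matrix <e_k, e_k'>
beta = np.diag((-1.0)**m * (2*np.pi*freq)**(2*m))   # D^{2m} e_k = beta_kk e_k

rng = np.random.default_rng(0)
Xi = rng.standard_normal((n, K)) / (1.0 + freq)     # toy coefficients xi^i

# Step 4: eigendecomposition of (1/n) sum_i xi^i (xi^i)^T E + mu beta^T.
M = Xi.T @ Xi @ E / n + mu * beta.T
vals, vecs = np.linalg.eig(M)                       # M is not symmetric in general
order = np.argsort(np.real(vals))[::-1]
a_star = np.real(vecs[:, order[0]])                 # coordinates of the first smoothed axis
axis = a_star @ E_basis                             # the axis as a function on the grid
```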
To conclude, several references. . .
Theoretical background for functional PCA:
[Deville, 1974]
[Dauxois and Pousse, 1976]
Smooth PCA:
[Besse and Ramsay, 1986]
[Pezzulli and Silverman, 1993]
[Silverman, 1996]
Several examples and discussion:
[Ramsay and Silverman, 2002]
[Ramsay and Silverman, 1997]
3 Functional linear regression models
The model
We are interested here in the functional linear regression model:
$Y$ is a random variable taking its values in $\mathbb{R}$,
$X$ is a random variable taking its values in $\mathcal{X}$,
$X$ and $Y$ satisfy the model
$$Y = \langle X, \alpha \rangle_{\mathcal{X}} + \varepsilon,$$
where $\varepsilon$ is a centered real random variable independent of $X$ and $\alpha$ is the parameter to be estimated.
We are given a training set of size $n$, $(x_i, y_i)_{i=1,\ldots,n}$, of independent realizations of the random pair $(X, Y)$.
Or, alternatively, we are given a training set of size $n$ with errors-in-variables, $(w_i, y_i)_{i=1,\ldots,n}$, with:
$w_i = x_i + \eta_i$ ($\eta_i$ is the realization of a centered random variable independent of $Y$),
$y_i = \langle x_i, \alpha \rangle_{\mathcal{X}} + \varepsilon_i$.
This problem will be investigated in Presentation 4.
Basics about the functional linear regression model
To avoid useless difficulties, we suppose from now on that $E(X) = 0$.
Let us first define the covariance between $X$ and $Y$ as
$$\Delta_{(X,Y)} = E(XY) \in \mathcal{X}.$$
Then, we have:
$$\Gamma_X\, \alpha = \Delta_{(X,Y)}.$$
But, as $\Gamma_X$ is Hilbert-Schmidt, it is not invertible, and thus the empirical estimate of $\alpha$ (using a generalized inverse of $\Gamma^n_X$) does not converge to $\alpha$ when $n$ tends to infinity. This is an ill-posed inverse problem.
$\Rightarrow$ Penalization or regularization is needed to obtain a relevant estimate.
PCA approach
References: [Cardot et al., 1999], building on the works of [Bosq, 1991] on Hilbertian AR models.
PCA decomposition of X: denote by
$((\lambda^n_i, v^n_i))_{i\geq 1}$ the eigenvalue decomposition of $\Gamma^n_X$ (the $(\lambda_i)_i$ are ordered decreasingly and at most $n$ eigenvalues are nonzero; the $(v^n_i)_i$ are orthonormal);
$k_n$ an integer such that $k_n \leq n$ and $\lim_{n\to+\infty} k_n = +\infty$;
$P_{k_n}$ the projector $P_{k_n}(u) = \sum_{i=1}^{k_n} \langle v^n_i, u \rangle_{\mathcal{X}}\, v^n_i$;
$$\Gamma^{n,k_n}_X = P_{k_n} \circ \Gamma^n_X \circ P_{k_n} = \sum_{i=1}^{k_n} \lambda^n_i\, \langle v^n_i, \cdot \rangle_{\mathcal{X}}\, v^n_i;$$
$$\Delta^{n,k_n}_{(X,Y)} = P_{k_n}\left(\frac{1}{n}\sum_{i=1}^n y_i x_i\right) = \frac{1}{n}\sum_{i=1,\ldots,n,\ i'=1,\ldots,k_n} y_i\, \langle x_i, v^n_{i'} \rangle_{\mathcal{X}}\, v^n_{i'}.$$
Definition of a consistent estimate for α
Define:
$$\alpha^n = \left(\Gamma^{n,k_n}_X\right)^{+} \Delta^{n,k_n}_{(X,Y)},$$
where $\left(\Gamma^{n,k_n}_X\right)^{+}$ denotes the (Moore-Penrose) generalized inverse:
$$\left(\Gamma^{n,k_n}_X\right)^{+} = \sum_{i=1}^{k_n} (\lambda^n_i)^{-1}\, \langle v^n_i, \cdot \rangle_{\mathcal{X}}\, v^n_i.$$
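A toy numerical sketch of this estimate (an illustration added here; the Brownian design, the "true" $\alpha$ and all names are assumptions of the example): curves are discretized on a grid and the $L^2$ inner product is replaced by a quadrature rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n, grid = 300, np.linspace(0, 1, 101)
dt = grid[1] - grid[0]
x = np.cumsum(rng.standard_normal((n, grid.size)) * np.sqrt(dt), axis=1)  # Brownian paths
alpha = np.sin(2 * np.pi * grid)                     # "true" parameter, chosen for the demo
y = x @ alpha * dt + 0.05 * rng.standard_normal(n)   # y_i = <x_i, alpha> + eps_i

xc = x - x.mean(axis=0)
vals, vecs = np.linalg.eigh(xc.T @ xc / n * dt)      # Gamma_X^n discretized on the grid
vals, vecs = vals[::-1], vecs[:, ::-1]
v = vecs / np.sqrt(dt)                               # eigenfunctions with ||v_i||_{L2} = 1

k_n = 8
delta = (y[:, None] * xc).mean(axis=0)               # Delta^n = (1/n) sum_i y_i x_i
coef = v[:, :k_n].T @ delta * dt                     # <Delta^n, v_i>_{L2}
alpha_n = v[:, :k_n] @ (coef / vals[:k_n])           # sum_i (lambda_i^n)^{-1} <Delta^n, v_i> v_i

print(np.sqrt(np.sum((alpha_n - alpha) ** 2) * dt))  # L2 error ||alpha^n - alpha||
```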
Assumptions for consistency result
(A1) the $(\lambda^n_i)_{i=1,\ldots,k_n}$ are all distinct and nonzero a.s.;
(A2) the $(\lambda_i)_{i\geq 1}$ are all distinct and nonzero;
(A3) $X$ is a.s. bounded in $\mathcal{X}$ ($\|X\|_{\mathcal{X}} \leq C_1$ a.s.);
(A4) $\varepsilon$ is a.s. bounded ($|\varepsilon| \leq C_2$ a.s.);
(A5) $\lim_{n\to+\infty} \frac{n\lambda_{k_n}^4}{\log n} = +\infty$;
(A6) $\lim_{n\to+\infty} \frac{n\lambda_{k_n}^2}{\left(\sum_{j=1}^{k_n} a_j\right)^2 \log n} = +\infty$, where $a_1 = \frac{2\sqrt{2}}{\lambda_1 - \lambda_2}$ and $a_j = \frac{2\sqrt{2}}{\min(\lambda_{j-1} - \lambda_j;\ \lambda_j - \lambda_{j+1})}$ for $j > 1$.
Example of $\Gamma_X$ satisfying those assumptions: if the eigenvalues of $\Gamma_X$ are geometrically or exponentially decreasing, these assumptions are fulfilled as long as the sequence $(k_n)_n$ tends slowly enough to $+\infty$. For example, $X$ a Brownian motion on $[0, 1]$ and $k_n = o(\log n)$.
Consistency result
Theorem [Cardot et al., 1999]
Under assumptions (A1)-(A6), we have:
$$\|\alpha^n - \alpha\|_{\mathcal{X}} \xrightarrow[n \to +\infty]{} 0.$$
Smoothing approach based on B-splines
References: [Cardot et al., 2003]
Suppose that $X$ takes values in $L^2([0, 1])$.
Basics on B-splines: Let $q$ and $k$ be two integers and denote by $S_{qk}$ the space of functions $s$ satisfying:
$s \in S_{qk}$ is a polynomial of degree $q$ on each interval $\left[\frac{l-1}{k}, \frac{l}{k}\right]$, for all $l = 1, \ldots, k$;
$s \in S_{qk}$ is $q - 1$ times differentiable on $[0, 1]$.
The space $S_{qk}$ has dimension $q + k$, and a normalized basis of $S_{qk}$ is denoted by $\{B^{qk}_j,\ j = 1, \ldots, q + k\}$ (normalized B-splines, see [de Boor, 1978]).
These functions are easy to manipulate and have interesting smoothness properties. They can be used to express both $X$ and the parameter $\alpha$, as well as to impose smoothness constraints on $\alpha$.
Definition of a consistent estimate for α
Note $\mathbf{B}_{qk} := \left(B^{qk}_1, \ldots, B^{qk}_{q+k}\right)^T$ and $\mathbf{B}^{(m)}_{qk}$ the $m$-th derivatives of $\mathbf{B}_{qk}$, for an $m < q + k$.
A penalized mean square estimate: Provided that $\alpha$ is smooth enough, we aim at finding $\alpha^n = \sum_{j=1}^{q+k} a^n_j\, B^{qk}_j = (a^n)^T \mathbf{B}_{qk}$, solution of the optimization problem
$$\arg\min_{a \in \mathbb{R}^{q+k}}\ \underbrace{\frac{1}{n}\sum_{i=1}^n \left(y_i - \langle a^T \mathbf{B}_{qk}, x_i \rangle_{\mathcal{X}}\right)^2}_{\text{mean square criterion}} + \underbrace{\mu\, \left\|a^T \mathbf{B}^{(m)}_{qk}\right\|^2_{\mathcal{X}}}_{\text{smoothness penalization}}.$$
The solution of this problem is given by
$$a^n = \left(C_n + \mu\, G^n_{qk}\right)^{-1} b_n,$$
where $C_n$ is the matrix with components $\langle \Gamma^n_X B^{qk}_j, B^{qk}_{j'} \rangle_{\mathcal{X}}$ ($j, j' = 1, \ldots, q + k$), $b_n$ is the vector with components $\langle \Delta^n_{(X,Y)}, B^{qk}_j \rangle_{\mathcal{X}}$ ($j = 1, \ldots, q + k$) and $G^n_{qk}$ is the matrix with components $\langle B^{qk(m)}_j, B^{qk(m)}_{j'} \rangle_{\mathcal{X}}$ ($j, j' = 1, \ldots, q + k$).
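A self-contained toy sketch of this estimate (an illustration added here, not the authors' code): it builds the B-spline basis of the previous slide with the standard Cox-de Boor recursion, approximates all inner products by a quadrature rule on a grid (with the derivatives taken by finite differences), and solves the penalized system; the simulated design, the "true" $\alpha$ and every name are assumptions of the example (here $q = 3$, $m = 2$).

```python
import numpy as np

def bspline_basis(x, q, k):
    """Normalized B-spline basis of S_qk on [0, 1] via the Cox-de Boor recursion.
    Returns an array of shape (q + k, len(x))."""
    t = np.r_[np.zeros(q), np.linspace(0, 1, k + 1), np.ones(q)]   # clamped knots
    B = [((t[j] <= x) & (x < t[j + 1])).astype(float) for j in range(len(t) - 1)]
    B[q + k - 1][x == 1.0] = 1.0                                   # close the last interval
    for d in range(1, q + 1):
        Bnew = []
        for j in range(len(B) - 1):
            left = (x - t[j]) / (t[j + d] - t[j]) * B[j] if t[j + d] > t[j] else 0.0
            right = ((t[j + d + 1] - x) / (t[j + d + 1] - t[j + 1]) * B[j + 1]
                     if t[j + d + 1] > t[j + 1] else 0.0)
            Bnew.append(left + right)
        B = Bnew
    return np.array(B)

q, k, m, mu = 3, 8, 2, 1e-5
grid = np.linspace(0, 1, 201)
dt = grid[1] - grid[0]
Bmat = bspline_basis(grid, q, k)                             # q + k functions on the grid
Bm = np.gradient(np.gradient(Bmat, dt, axis=1), dt, axis=1)  # m = 2 derivatives

rng = np.random.default_rng(2)
n = 300
x = np.cumsum(rng.standard_normal((n, grid.size)) * np.sqrt(dt), axis=1)
alpha = np.sin(2 * np.pi * grid)                       # "true" parameter for the demo
y = x @ alpha * dt + 0.05 * rng.standard_normal(n)

xc = x - x.mean(axis=0)
gamma_n = xc.T @ xc / n                                # kernel of Gamma_X^n
delta_n = (y[:, None] * xc).mean(axis=0)               # Delta_(X,Y)^n

C = Bmat @ gamma_n @ Bmat.T * dt**2                    # <Gamma^n B_j, B_j'>
b = Bmat @ delta_n * dt                                # <Delta^n, B_j>
G = Bm @ Bm.T * dt                                     # <B_j^(m), B_j'^(m)>

a_n = np.linalg.solve(C + mu * G, b)                   # a^n = (C_n + mu G)^{-1} b_n
alpha_n = a_n @ Bmat                                   # the estimate as a function
print(np.sqrt(np.sum((alpha_n - alpha) ** 2) * dt))    # L2 error
```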
Assumptions for consistency result
(A1) $X$ is a.s. bounded in $\mathcal{X}$;
(A2) $\mathrm{Var}(Y|X = x) \leq C_1$ for all $x \in \mathcal{X}$;
(A3) $|E(Y|X = x)| \leq C_2$ for all $x \in \mathcal{X}$;
(A4) there exist an integer $p'$ and a real $\nu \in [0, 1]$ such that $p' + \nu \leq q$ and $|\alpha^{(p')}(t_1) - \alpha^{(p')}(t_2)| \leq |t_1 - t_2|^{\nu}$;
(A5) $\mu = O\left(n^{-(1-\delta)/2}\right)$ for a $0 < \delta < 1$;
(A6) $\lim_{n\to+\infty} \mu\, k^{2(m-p)} = 0$, for $p = p' + \nu$;
(A7) $k = O\left(n^{1/(4p+1)}\right)$.
Consistency result
Theorem [Cardot et al., 2003]
Under assumptions (A1)-(A7):
$\lim_{n\to+\infty} P(\text{there exists a unique solution to the minimization problem}) = 1$;
$$E\left(\|\alpha^n - \alpha\|^2_{\mathcal{X}}\ \middle|\ x_1, \ldots, x_n\right) = O_P\left(n^{-2p/(4p+1)}\right).$$
Other functional linear methods
Canonical correlation: [Leurgans et al., 1993]
Factorial Discriminant Analysis: [Hastie et al., 1995]
Partial Least Squares: [Preda and Saporta, 2005]
. . .
4 References
References
Further details for the references are given in the joint document.

Introduction to FDA and linear models

  • 1.
    Introduction to FDAand linear models Nathalie Villa-Vialaneix - nathalie.villa@math.univ-toulouse.fr http://www.nathalievilla.org Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan France La Havane, September 15th, 2008 Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 1 / 37
  • 2.
    Table of contents 1Motivations 2 Functional Principal Component Analysis 3 Functional linear regression models 4 References Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 2 / 37
  • 3.
    What is FunctionalData Analysis (FDA)? FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 3 / 37
  • 4.
    What is FunctionalData Analysis (FDA)? FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid Example 1: Regression case 1 100 wavelengths Find the fat content of peaces of meat from their absorbence spectra. Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 3 / 37
  • 5.
    What is FunctionalData Analysis (FDA)? FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid Example 2: Regression case 2 1 049 wavelengths Find the disease content in wheat from its absorbence spectra. Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 3 / 37
  • 6.
    What is FunctionalData Analysis (FDA)? FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid Example 3: Classification case 1 Recognize one of the five phonemes from its log-periodograms (256 wavelengths). Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 3 / 37
  • 7.
    What is FunctionalData Analysis (FDA)? FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid Example 4: Classification case 2 Recognize one of two words from its record in frequency domain (8 192 time steps). Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 3 / 37
  • 8.
    What is FunctionalData Analysis (FDA)? FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid Example 5: Regression on functional data [Azaïs et al., 2008] Estimate a typical load curve (electricity consumption) from economic multivariate variables. Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 3 / 37
  • 9.
    What is FunctionalData Analysis (FDA)? FDA deals with data that are measurements of continuous phenomena on a discrete sampling grid Example 6: Curves clustering [Bensmail et al., 2005] Create a typology of sick cells from their “SELDI mass” spectra. Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 3 / 37
  • 10.
    Specific issues oflearning with FDA High dimensional data (the number of discretization point is often bigger or much more bigger than the number of observations); Highly correlated data (because of the functional structure underlined, the values at two sampling points are correlated). Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 4 / 37
  • 11.
    Specific issues oflearning with FDA High dimensional data (the number of discretization point is often bigger or much more bigger than the number of observations); Highly correlated data (because of the functional structure underlined, the values at two sampling points are correlated). Consequences: Direct use of classical statistical methods on the discretization leads to ill-posed problems and provides inaccurate solutions. Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 4 / 37
  • 12.
    Theoretical model A functionalrandom variable is a random variable X taking its values in a functional space X where X can be: a (infinite dimensional) Hilbert space with inner product ., . X; Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 5 / 37
  • 13.
    Theoretical model A functionalrandom variable is a random variable X taking its values in a functional space X where X can be: a (infinite dimensional) Hilbert space with inner product ., . X; in particular, L2 is often used; Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 5 / 37
  • 14.
    Theoretical model A functionalrandom variable is a random variable X taking its values in a functional space X where X can be: a (infinite dimensional) Hilbert space with inner product ., . X; in particular, L2 is often used; any (infinite dimensional) Banach space with norm . X (less usual); for example, C0 . Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 5 / 37
  • 15.
    Hilbertian context In thehilbertian context, we are able to define the expectation of X as (Theorem of Riesz) the unique element E (X) of X such that ∀ u ∈ X, E (X) , u X = E ( X, u X) , 1 Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 6 / 37
  • 16.
    Hilbertian context In thehilbertian context, we are able to define the expectation of X as (Theorem of Riesz) the unique element E (X) of X such that ∀ u ∈ X, E (X) , u X = E ( X, u X) , for any u1, u2 ∈ X, u1 ⊗ u2 is the linear operator u1 ⊗ u2 : v ∈ X → u1, v Xu2 ∈ X, 1 Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 6 / 37
  • 17.
    Hilbertian context In thehilbertian context, we are able to define the expectation of X as (Theorem of Riesz) the unique element E (X) of X such that ∀ u ∈ X, E (X) , u X = E ( X, u X) , for any u1, u2 ∈ X, u1 ⊗ u2 is the linear operator u1 ⊗ u2 : v ∈ X → u1, v Xu2 ∈ X, as (X − E (X)) ⊗ (X − E (X)) is an element of the Hilbert Space of Hilbert-Schmidt operators from X to X, HS(X)1 1 This Hilbert space is equipped with the inner product ∀ g1, g2 ∈ HS(X), g1, g2 HS(X) = i g1ei, g2ei X where (ei)i is any orthonormal basis of X Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 6 / 37
  • 18.
    Hilbertian context In thehilbertian context, we are able to define the expectation of X as (Theorem of Riesz) the unique element E (X) of X such that ∀ u ∈ X, E (X) , u X = E ( X, u X) , for any u1, u2 ∈ X, u1 ⊗ u2 is the linear operator u1 ⊗ u2 : v ∈ X → u1, v Xu2 ∈ X, as (X − E (X)) ⊗ (X − E (X)) is an element of the Hilbert Space of Hilbert-Schmidt operators from X to X, HS(X)1 the variance of X as the linear operator ΓX ΓX = E ((X − E (X)) ⊗ (X − E (X))) : u ∈ X → E ( X − E (X) , u X(X − E (X))) . 1 This Hilbert space is equipped with the inner product ∀ g1, g2 ∈ HS(X), g1, g2 HS(X) = i g1ei, g2ei X where (ei)i is any orthonormal basis of X Nathalie Villa (IMT & UPVD) Presentation 1 La Havane, Sept. 15th, 2008 6 / 37
Case $\mathcal{X} = L^2([0,1])$

If $\mathcal{X} = L^2([0,1])$, these expressions simplify to:

- norm: $\|X\|^2 = \int_{[0,1]} (X(t))^2\, dt < +\infty$;
- expectation: for all $t \in [0,1]$, $E(X)(t) = E(X(t)) = \int X(t)\, dP_X$;
- variance: $\Gamma_X$ is the integral operator with kernel $\gamma(t,t') = E(X(t)X(t'))$ (assuming $E(X) = 0$ for clarity), because:
  1. for all $t \in [0,1]$, we can define $\Gamma_X^t : u \in \mathcal{X} \mapsto (\Gamma_X u)(t) \in \mathbb{R}$;
  2. by Riesz's theorem, there exists $\zeta_t \in \mathcal{X}$ such that $\forall\, u \in \mathcal{X}$, $\Gamma_X^t u = \langle \zeta_t, u\rangle_{\mathcal{X}}$; as $(\Gamma_X u)(t) = E\left(\langle X, u\rangle_{\mathcal{X}}\, X(t)\right) = \langle E(X(t)X), u\rangle_{\mathcal{X}}$, we have $\zeta_t = E(X(t)X)$;
  3. we then define $\gamma : (t,t') \in [0,1]^2 \mapsto \zeta_t(t') = E(X(t)X(t'))$.
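On a discrete grid these quantities reduce to an ordinary mean vector and covariance matrix. Below is a minimal sketch of the empirical estimates, assuming the curves are sampled on a common grid; the function names and toy data are illustrative, not from the slides.

```python
import numpy as np

def empirical_mean_and_cov(curves):
    """Empirical estimates of E(X)(t) and gamma(t, t') for curves
    sampled on a common grid: `curves` has shape (n, T)."""
    mean = curves.mean(axis=0)                   # estimate of E(X)(t_j)
    centered = curves - mean                     # X_i(t_j) - mean(t_j)
    gamma = centered.T @ centered / len(curves)  # estimate of gamma(t_j, t_k)
    return mean, gamma

# Toy usage: 50 noisy sine curves observed on 100 grid points
grid = np.linspace(0, 1, 100)
curves = np.sin(2 * np.pi * grid) + 0.1 * np.random.randn(50, 100)
mean_hat, gamma_hat = empirical_mean_and_cov(curves)
```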
Properties of $\Gamma_X$

- $\Gamma_X$ is Hilbert-Schmidt: (definition) $\sum_i \|\Gamma_X e_i\|^2 < +\infty$;
- there exists a countable eigensystem $((\lambda_i)_{i\geq 1}, (v_i)_{i\geq 1})$ of $\Gamma_X$: $\Gamma_X v_i = \lambda_i v_i$ for all $i \geq 1$. This eigensystem is such that $\lambda_1 \geq \lambda_2 \geq \ldots \geq 0$ and $0$ is the only possible accumulation point of $(\lambda_i)_i$;
- Karhunen-Loève decomposition of $\Gamma_X$: $\Gamma_X = \sum_{i\geq 1} \lambda_i\, v_i \otimes v_i$;
- $\Gamma_X$ has no inverse in the space of continuous operators from $\mathcal{X}$ to $\mathcal{X}$ if $\mathcal{X}$ has infinite dimension. More precisely, if $\lambda_i > 0$ for all $i \geq 1$, then $\sum_{i\geq 1} \frac{1}{\lambda_i} = +\infty$.
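A standard example (classical, though not spelled out on the slide) makes these properties concrete: for the Brownian motion on $[0,1]$, whose covariance kernel is $\gamma(s,t) = \min(s,t)$, the eigensystem is known in closed form:

```latex
\lambda_i = \frac{1}{\left(i - \tfrac{1}{2}\right)^2 \pi^2},
\qquad
v_i(t) = \sqrt{2}\,\sin\!\left(\left(i - \tfrac{1}{2}\right)\pi t\right),
\qquad i \geq 1,
```

so that $\lambda_i \to 0$ and $\sum_i 1/\lambda_i = \pi^2 \sum_i (i - 1/2)^2 = +\infty$, illustrating why $\Gamma_X$ admits no continuous inverse.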
Model of the observed data

We focus on:
1. a regression problem: $Y \in \mathbb{R}$ has to be predicted from $X \in \mathcal{X}$;
2. OR a (binary) classification problem: $Y \in \{-1, 1\}$ has to be predicted from $X \in \mathcal{X}$.

Learning set - Version I
$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$.

Remark: if $E\|X\|^2_{\mathcal{X}} < +\infty$ (i.e., if $\Gamma_X$ exists), a functional version of the Central Limit Theorem holds.
Model of the uncertainty on X

In practice, realizations of $X$ are never observed: only a (possibly noisy) discretization of each of them is given.

Learning set - Version II
$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$ and, for all $i = 1, \ldots, n$, $x_i^{\tau_i} = (x_i(t))_{t\in\tau_i}$ is observed, where $\tau_i$ is a finite set.

Questions:
1. How to obtain $(x_i)_i$ from $(x_i^{\tau_i})_i$?
2. What are the consequences of this uncertainty on the accuracy of the solution of the regression/classification problem? Can we obtain a solution as good as the one obtained from the direct observation of $(x_i)_i$?
Noisy data model

Learning set - Version III
$(x_1, y_1), \ldots, (x_n, y_n)$ are i.i.d. realizations of the random pair $(X, Y)$ and, for all $i = 1, \ldots, n$, $x_i^{\tau_i} = (x_i(t) + \varepsilon_{it})_{t\in\tau_i}$ is observed, where $\tau_i$ is a finite set and $\varepsilon_{it}$ is a centered random variable independent of $X$.

Again,
1. How to obtain $(x_i)_i$ from $(x_i^{\tau_i})_i$? (much work has been done here: function estimation)
2. What are the consequences of this uncertainty on the accuracy of the solution of the regression/classification problem? Can we obtain a solution as good as the one obtained from the direct observation of $(x_i)_i$? (related to "errors-in-variables" problems; almost no work in FDA)

These presentations gather works coming from these three points of view.
2 Functional Principal Component Analysis
Multidimensional PCA: context and notations

Data: a real matrix $X = (x_i^j)_{i=1,\ldots,n,\ j=1,\ldots,p}$, the observation of $p$ variables on $n$ individuals:
- $n$ rows, each corresponding to the values of the $p$ variables for one individual: $x_i = (x_i^1, \ldots, x_i^p)^T$;
- $p$ columns, each corresponding to the $n$ observations of one variable: $x^j = (x_1^j, \ldots, x_n^j)^T$.

Aim: find linearly independent variables, linear combinations of the original ones, ordered by "importance" in $X$.
First principal component

Suppose (to simplify):
1. the data are centered: for all $j = 1, \ldots, p$, $\bar{x}^j = \frac{1}{n}\sum_{i=1}^n x_i^j = 0$;
2. the empirical variance of $X$ is then $\mathrm{Var}(X) = \frac{1}{n} X^T X$.

Problem: find $a^* \in \mathbb{R}^p$ such that
$$a^* := \arg\max_{a:\ \|a\|_{\mathbb{R}^p}=1} \underbrace{\mathrm{Var}\left(\left(P_a(x_i)\right)_i\right)}_{\text{inertia}}.$$

Solution:
$$\mathrm{Var}\left(\left(P_a(x_i)\right)_i\right) = \frac{1}{n}\sum_{i=1}^n \left\|(a^T x_i)\, a\right\|^2_{\mathbb{R}^p} = \frac{1}{n}\sum_{i=1}^n (a^T x_i)^2 \quad (\text{since } \|a\|_{\mathbb{R}^p} = 1)$$
$$= \frac{1}{n}\sum_{i=1}^n (a^T x_i)(x_i^T a) = a^T \left(\frac{1}{n}\sum_{i=1}^n x_i x_i^T\right) a = a^T\, \mathrm{Var}(X)\, a.$$

$\Rightarrow$ $a^*$ is the eigenvector of $\mathrm{Var}(X)$ associated with the largest (positive) eigenvalue.
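As a quick sketch (illustrative code, not part of the original slides), the first factorial axis is simply the leading eigenvector of the empirical variance matrix:

```python
import numpy as np

def first_principal_component(X):
    """First factorial axis of centered data X (shape (n, p)):
    leading eigenvector of Var(X) = X^T X / n."""
    var = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(var)  # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1]      # a*, largest eigenvalue

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X -= X.mean(axis=0)                         # center the data, as assumed above
a_star, lam_star = first_principal_component(X)
```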
An eigenvalue decomposition

Generalization: if $((\lambda_i)_{i=1,\ldots,p}, (a_i)_{i=1,\ldots,p})$ is the eigenvalue decomposition of $\mathrm{Var}(X)$ (by decreasing order of the positive $\lambda_i$), then the $(a_i)_i$ are the factorial axes of $X$. The principal components of $X$ are the coordinates of the projections of the data onto these axes.

Then we have:
$$\mathrm{Var}(X) = \sum_{j=1}^p \lambda_j\, a_j a_j^T \qquad \text{and} \qquad x_i = \sum_{j=1}^p \underbrace{(x_i^T a_j)}_{\text{principal component } c_i^j}\, a_j.$$
Ordinary generalization of PCA in FDA

Data: $x_1, \ldots, x_n$ are $n$ centered observations of a random functional variable $X$ taking its values in $\mathcal{X}$.

Aim: find $a^* \in \mathcal{X}$ such that
$$a^* := \arg\max_{a:\ \|a\|_{\mathcal{X}}=1} \mathrm{Var}\left(\left(P_a(x_i)\right)_i\right).$$

Solution:
$$\mathrm{Var}\left(\left(P_a(x_i)\right)_i\right) = \frac{1}{n}\sum_{i=1}^n \left\|\langle a, x_i\rangle_{\mathcal{X}}\, a\right\|^2_{\mathcal{X}} = \frac{1}{n}\sum_{i=1}^n \langle a, x_i\rangle^2_{\mathcal{X}} \quad (\text{since } \|a\|_{\mathcal{X}} = 1)$$
$$= \frac{1}{n}\sum_{i=1}^n \langle a, \langle a, x_i\rangle_{\mathcal{X}}\, x_i\rangle_{\mathcal{X}} = \langle \Gamma_X^n a, a\rangle_{\mathcal{X}},$$
where $\Gamma_X^n = \frac{1}{n}\sum_{i=1}^n x_i \otimes x_i$ is the empirical estimate of $\Gamma_X$; it is a Hilbert-Schmidt operator of rank at most $n$.

$\Rightarrow$ $a^*$ is the eigenvector of $\Gamma_X^n$ associated with the largest (positive) eigenvalue: $\Gamma_X^n a^* = \lambda^* a^*$.
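In practice the eigenproblem is solved on the discretization. A minimal sketch, assuming curves on a common grid and a crude quadrature approximation of the $L^2$ inner product (all names illustrative):

```python
import numpy as np

def functional_pca(curves, grid, n_axes=4):
    """Functional PCA for curves (n, T) sampled on a common grid,
    approximating <u, v> by sum_j u(t_j) v(t_j) w_j with quadrature
    weights w. The weighted eigenproblem Gamma W a = lambda a is
    symmetrized through b = W^{1/2} a."""
    w = np.gradient(grid)                        # crude quadrature weights
    centered = curves - curves.mean(axis=0)
    gamma = centered.T @ centered / len(curves)  # empirical covariance kernel
    sw = np.sqrt(w)
    B = sw[:, None] * gamma * sw[None, :]        # W^{1/2} Gamma W^{1/2}, symmetric
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_axes]
    axes = eigvecs[:, order] / sw[:, None]       # back to functions: a = W^{-1/2} b
    scores = (centered * w) @ axes               # principal components <x_i, a_j>
    return eigvals[order], axes, scores
```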
Eigenvalue decomposition of $\Gamma_X^n$

Factorial axes and principal components: if $((\lambda_i)_{i\geq 1}, (a_i)_{i\geq 1})$ is the eigenvalue decomposition of $\Gamma_X^n$ (by decreasing order of the positive $\lambda_i$), then the $(a_i)_i$ are the factorial axes of $x_1, \ldots, x_n$ (note that at most $n$ of the $\lambda_i$ are nonzero). The principal components of $x_1, \ldots, x_n$ are the coordinates of the projections of the data onto these axes.

Then we have:
$$\Gamma_X^n = \sum_{j=1}^n \lambda_j\, a_j \otimes a_j \qquad \text{and} \qquad x_i = \sum_{j=1}^n \underbrace{\langle x_i, a_j\rangle_{\mathcal{X}}}_{\text{principal component } c_i^j}\, a_j.$$
Example on the Tecator dataset
[Figures: the Tecator spectra (data), the two first factorial axes, and the 3rd and 4th factorial axes; not recoverable from the text export.]
Link with the regression problem
[Figures only; not recoverable from the text export.]
Smoothness of the factorial axes

From a practical point of view, functional PCA in its original version is computed like multivariate PCA (on the discretization, or on the decomposition of the curves on a Hilbert basis). Hence, if the original data are irregular, the factorial axes will not be smooth. [Figure: irregular factorial axes; not recoverable from the text export.]
Smooth functional PCA

Aim: introduce a penalty in the optimization problem so as to obtain smooth (regular) factorial axes.

Ordinary functional PCA:
$$a^* := \arg\max_{a:\ \|a\|_{\mathcal{X}}=1} \mathrm{Var}\left(\left(P_a(x_i)\right)_i\right),$$
and hence $a^*$ is the eigenvector of $\Gamma_X^n$ associated with the largest eigenvalue.

Penalized functional PCA: if $D^{2m}a \in \mathcal{X} = L^2$,
$$a^* := \arg\max_{a:\ \|a\|_{\mathcal{X}}=1} \mathrm{Var}\left(\left(P_a(x_i)\right)_i\right) + \mu\, \|D^m a\|^2_{\mathcal{X}} \qquad (\mu > 0),$$
and hence $a^*$ is the eigenvector of $\Gamma_X^n + \mu D^{2m}$ associated with the largest eigenvalue.
Practical implementation of smooth PCA

Let $(e_k)_{k\geq 1}$ be any functional basis. Then (a coefficient-space sketch follows this list):
1. approximate the observations by $x_i = \sum_{k=1}^K \xi_k^i e_k$;
2. approximate the derivatives of the $(e_k)_k$ by $D^{2m} e_k = \sum_{k'=1}^K \beta_{k'}^k e_{k'}$;
3. then we can show that
$$\Gamma_X^n a + \mu D^{2m} a = \sum_{k=1}^K \left[\frac{1}{n}\sum_{i=1}^n \left((\xi^i)^T E a\right)\xi_k^i + \mu\, (\beta_k^T a)\right] e_k,$$
where $E$ is the matrix with entries $(\langle e_k, e_{k'}\rangle_{\mathcal{X}})_{k,k'=1,\ldots,K}$; in coordinates, $\Gamma_X^n + \mu D^{2m}$ acts on $a \in \mathbb{R}^K$ as $a \mapsto \frac{1}{n}\sum_{i=1}^n \left((\xi^i)^T E a\right)\xi^i + \mu\, \beta^T a$;
4. smooth PCA is therefore performed by an eigendecomposition of $\frac{1}{n}\sum_{i=1}^n \xi^i (\xi^i)^T E + \mu\, \beta^T$.

Remark: the decomposition $D^{2m}e_k = \sum_{k'=1}^K \beta_{k'}^k e_{k'}$ is easy to obtain when using a spline basis $\Rightarrow$ splines are well suited to represent data with smoothness properties (see Presentation 4 for further details).
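The sketch below mirrors step 4 in coefficient space; the matrix names follow the slide, everything else (shapes, the eigensolver choice) is an assumption for illustration:

```python
import numpy as np

def smooth_pca_axes(xi, E, beta, mu, n_axes=2):
    """Step 4 above: eigendecomposition of (1/n) sum_i xi^i (xi^i)^T E + mu beta^T.
      xi   : (n, K) basis coefficients xi^i of the observations,
      E    : (K, K) Gram matrix <e_k, e_k'>,
      beta : (K, K) coefficients of D^{2m} e_k on the basis,
      mu   : penalty parameter (> 0)."""
    n = xi.shape[0]
    M = xi.T @ xi @ E / n + mu * beta.T      # the operator in coordinates
    eigvals, eigvecs = np.linalg.eig(M)      # M need not be symmetric
    order = np.argsort(eigvals.real)[::-1][:n_axes]
    return eigvals.real[order], eigvecs[:, order].real
```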
To conclude, several references...

- Theoretical background for functional PCA: [Deville, 1974], [Dauxois and Pousse, 1976]
- Smooth PCA: [Besse and Ramsay, 1986], [Pezzulli and Silverman, 1993], [Silverman, 1996]
- Several examples and discussion: [Ramsay and Silverman, 2002], [Ramsay and Silverman, 1997]
3 Functional linear regression models
The model

We are interested here in the functional linear regression model:
- $Y$ is a random variable taking its values in $\mathbb{R}$,
- $X$ is a random variable taking its values in $\mathcal{X}$,
- $X$ and $Y$ satisfy $Y = \langle X, \alpha\rangle_{\mathcal{X}} + \varepsilon$, where $\varepsilon$ is a centered real random variable independent of $X$ and $\alpha$ is the parameter to be estimated.

We are given a training set of size $n$, $(x_i, y_i)_{i=1,\ldots,n}$, of independent realizations of the random pair $(X, Y)$.

Or, alternatively, we are given a training set of size $n$ with errors-in-variables, $(w_i, y_i)_{i=1,\ldots,n}$, with:
- $w_i = x_i + \eta_i$ ($\eta_i$ is the realization of a centered random variable independent of $Y$),
- $y_i = \langle x_i, \alpha\rangle_{\mathcal{X}} + \varepsilon_i$.

This problem will be investigated in Presentation 4.
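To fix ideas, here is a toy simulation from the model above (all choices — Brownian paths for $X$, a sine parameter $\alpha$, the noise level — are illustrative, not from the slides):

```python
import numpy as np

# Toy simulation of Y = <X, alpha> + eps, with X a Brownian motion on [0, 1]
rng = np.random.default_rng(1)
n, T = 100, 200
grid = np.linspace(0, 1, T)
dt = grid[1] - grid[0]
alpha = np.sin(2 * np.pi * grid)                              # "true" parameter
X = np.cumsum(rng.standard_normal((n, T)) * np.sqrt(dt), axis=1)
y = (X * alpha).sum(axis=1) * dt + 0.05 * rng.standard_normal(n)
```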
Basics about the functional linear regression model

To avoid useless difficulties, we will suppose from now on that $E(X) = 0$. Let us first define the covariance between $X$ and $Y$ as
$$\Delta_{(X,Y)} = E(XY) \in \mathcal{X}.$$
Then we have:
$$\Gamma_X\, \alpha = \Delta_{(X,Y)}.$$
But, as $\Gamma_X$ is Hilbert-Schmidt, it is not invertible; thus the empirical estimate of $\alpha$ (using a generalized inverse of $\Gamma_X^n$) does not converge to $\alpha$ as $n$ tends to infinity. It is an ill-posed inverse problem.

$\Rightarrow$ Penalization or regularization is needed to obtain a relevant estimate.
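The normal equation follows in one line from the model; this short derivation is not on the slide but uses only the definitions above:

```latex
\Gamma_X \alpha
  = E\big(\langle X, \alpha\rangle_{\mathcal{X}}\, X\big)
  = E\big((Y - \varepsilon)\, X\big)
  = E(XY) - E(\varepsilon)\,E(X)
  = \Delta_{(X,Y)},
```

where the cross term vanishes because $\varepsilon$ is centered and independent of $X$.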
PCA approach

References: [Cardot et al., 1999], building on the works of [Bosq, 1991] on Hilbertian AR models.

PCA decomposition of X: denote
- $((\lambda_i^n, v_i^n))_{i\geq 1}$ the eigenvalue decomposition of $\Gamma_X^n$ (the $(\lambda_i^n)_i$ ordered decreasingly, with at most $n$ nonzero eigenvalues; the $(v_i^n)_i$ orthonormal);
- $k_n$ an integer such that $k_n \leq n$ and $\lim_{n\to+\infty} k_n = +\infty$;
- $P_{k_n}$ the projector $P_{k_n}(u) = \sum_{i=1}^{k_n} \langle v_i^n, u\rangle_{\mathcal{X}}\, v_i^n$;
- $\Gamma_X^{n,k_n} = P_{k_n} \circ \Gamma_X^n \circ P_{k_n} = \sum_{i=1}^{k_n} \lambda_i^n\, \langle v_i^n, \cdot\rangle_{\mathcal{X}}\, v_i^n$;
- $\Delta_{(X,Y)}^{n,k_n} = P_{k_n}\left(\frac{1}{n}\sum_{i=1}^n y_i x_i\right) = \frac{1}{n}\sum_{i=1,\ldots,n,\ i'=1,\ldots,k_n} y_i\, \langle x_i, v_{i'}^n\rangle_{\mathcal{X}}\, v_{i'}^n$.
Definition of a consistent estimate for α

Define
$$\alpha_n = \left(\Gamma_X^{n,k_n}\right)^{+} \Delta_{(X,Y)}^{n,k_n},$$
where $\left(\Gamma_X^{n,k_n}\right)^{+}$ denotes the generalized (Moore-Penrose) inverse:
$$\left(\Gamma_X^{n,k_n}\right)^{+} = \sum_{i=1}^{k_n} (\lambda_i^n)^{-1}\, \langle v_i^n, \cdot\rangle_{\mathcal{X}}\, v_i^n.$$
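A discretized sketch of this estimator, assuming a uniform grid (the function and variable names are illustrative, not from [Cardot et al., 1999]):

```python
import numpy as np

def fpca_regression_estimate(curves, y, k, grid):
    """Truncated generalized-inverse estimator: project on the first k
    empirical eigenfunctions of Gamma^n and invert the eigenvalues there.
    curves: (n, T) on a uniform grid, y: (n,), k: truncation dimension."""
    n = curves.shape[0]
    dt = grid[1] - grid[0]
    gamma = curves.T @ curves / n                  # empirical covariance kernel
    eigvals, eigvecs = np.linalg.eigh(gamma * dt)  # integral operator, uniform weights
    order = np.argsort(eigvals)[::-1][:k]
    lam = eigvals[order]
    V = eigvecs[:, order] / np.sqrt(dt)            # eigenfunctions, L2-normalized
    delta = (y[:, None] * curves).mean(axis=0)     # Delta^n_(X,Y) on the grid
    coef = dt * (V.T @ delta)                      # <Delta^n, v_i>
    return V @ (coef / lam)                        # alpha_n on the grid
```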
Assumptions for the consistency result

(A1) the $(\lambda_i^n)_{i=1,\ldots,k_n}$ are all distinct and nonzero a.s.;
(A2) the $(\lambda_i)_{i\geq 1}$ are all distinct and nonzero;
(A3) $X$ is a.s. bounded in $\mathcal{X}$ ($\|X\|_{\mathcal{X}} \leq C_1$ a.s.);
(A4) $\varepsilon$ is a.s. bounded ($|\varepsilon| \leq C_2$ a.s.);
(A5) $\lim_{n\to+\infty} \frac{n\lambda_{k_n}^4}{\log n} = +\infty$;
(A6) $\lim_{n\to+\infty} \frac{n\lambda_{k_n}^2}{\left(\sum_{j=1}^{k_n} a_j\right)^2 \log n} = +\infty$, where $a_1 = \frac{2\sqrt{2}}{\lambda_1 - \lambda_2}$ and $a_j = \frac{2\sqrt{2}}{\min(\lambda_{j-1}-\lambda_j,\ \lambda_j-\lambda_{j+1})}$ for $j > 1$.

Example of $\Gamma_X$ satisfying these assumptions: if the eigenvalues of $\Gamma_X$ decrease geometrically or exponentially, these assumptions are fulfilled as long as the sequence $(k_n)_n$ tends slowly enough to $+\infty$; for example, $X$ a Brownian motion on $[0,1]$ and $k_n = o(\log n)$.
Consistency result

Theorem [Cardot et al., 1999]: under assumptions (A1)-(A6),
$$\|\alpha_n - \alpha\|_{\mathcal{X}} \xrightarrow[n\to+\infty]{} 0.$$
Smoothing approach based on B-splines

References: [Cardot et al., 2003]

Suppose that $X$ takes its values in $L^2([0,1])$.

Basics on B-splines: let $q$ and $k$ be two integers and denote by $S_{qk}$ the space of functions $s$ satisfying:
- $s$ is a polynomial of degree $q$ on each interval $\left[\frac{l-1}{k}, \frac{l}{k}\right]$, for all $l = 1, \ldots, k$;
- $s$ is $q-1$ times differentiable on $[0,1]$.

The space $S_{qk}$ has dimension $q+k$, and a normalized basis of $S_{qk}$ is denoted by $\{B_j^{qk},\ j = 1, \ldots, q+k\}$ (normalized B-splines, see [de Boor, 1978]). These functions are easy to manipulate and have interesting smoothness properties; they can be used to express both $X$ and the parameter $\alpha$, as well as to impose smoothness constraints on $\alpha$.
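For completeness, here is a self-contained sketch that evaluates such a basis by the Cox-de Boor recursion (degree $q$, $k$ equal subintervals of $[0,1]$); in practice a library routine would be used instead, and this implementation is only illustrative:

```python
import numpy as np

def bspline_basis(x, q, k):
    """Evaluate the q+k normalized B-splines spanning S_qk on [0, 1]
    at the points x, via the Cox-de Boor recursion.
    Returns an array of shape (len(x), q+k)."""
    # open knot vector: q+1 repeated boundary knots, k-1 interior knots
    t = np.concatenate([np.zeros(q + 1), np.arange(1, k) / k, np.ones(q + 1)])
    x = np.asarray(x, dtype=float)
    # degree-0 splines: indicators of the knot intervals
    B = np.array([(t[j] <= x) & (x < t[j + 1]) for j in range(len(t) - 1)],
                 dtype=float).T
    B[x == 1.0, np.searchsorted(t, 1.0, side="left") - 1] = 1.0  # right endpoint
    for d in range(1, q + 1):                     # raise the degree step by step
        new = np.zeros((len(x), B.shape[1] - 1))
        for j in range(B.shape[1] - 1):
            left = (x - t[j]) / (t[j + d] - t[j]) if t[j + d] > t[j] else 0.0
            right = ((t[j + d + 1] - x) / (t[j + d + 1] - t[j + 1])
                     if t[j + d + 1] > t[j + 1] else 0.0)
            new[:, j] = left * B[:, j] + right * B[:, j + 1]
        B = new
    return B                                      # exactly q + k columns
```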
Definition of a consistent estimate for α

Denote $B_{qk} := \left(B_1^{qk}, \ldots, B_{q+k}^{qk}\right)^T$ and $B_{qk}^{(m)}$ the vector of $m$-th derivatives of $B_{qk}$, for an $m < q+k$.

A penalized mean square estimate: provided that $\alpha$ is smooth enough, we aim at finding $\alpha_n = \sum_{j=1}^{q+k} a_j^n B_j^{qk} = (a^n)^T B_{qk}$, the solution of the optimization problem
$$\arg\min_{a\in\mathbb{R}^{q+k}} \underbrace{\frac{1}{n}\sum_{i=1}^n \left(y_i - \langle a^T B_{qk}, x_i\rangle_{\mathcal{X}}\right)^2}_{\text{mean square criterion}} + \mu \underbrace{\left\|a^T B_{qk}^{(m)}\right\|^2_{\mathcal{X}}}_{\text{smoothness penalization}}.$$

The solution of this problem is given by
$$a^n = \left(C_n + \mu G_{qk}^n\right)^{-1} b_n,$$
where $C_n$ is the matrix with entries $\langle \Gamma_X^n B_j^{qk}, B_{j'}^{qk}\rangle_{\mathcal{X}}$, $b_n$ is the vector with entries $\langle \Delta_{(X,Y)}^n, B_j^{qk}\rangle_{\mathcal{X}}$, and $G_{qk}^n$ is the matrix with entries $\langle B_j^{qk(m)}, B_{j'}^{qk(m)}\rangle_{\mathcal{X}}$ ($j, j' = 1, \ldots, q+k$).
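In matrix form the computation is a single penalized linear solve; a sketch assuming curves on a uniform grid, with `B` and `Bm` the basis and its $m$-th derivatives evaluated on the grid (names illustrative, e.g. obtained from the recursion above):

```python
import numpy as np

def penalized_spline_estimate(curves, y, grid, B, Bm, mu):
    """a^n = (C_n + mu G_qk)^{-1} b_n for the B-spline estimator.
    curves: (n, T); B, Bm: (T, q+k) basis values and m-th derivatives;
    inner products approximated with quadrature step dt."""
    n = curves.shape[0]
    dt = grid[1] - grid[0]
    S = dt * curves @ B                 # S[i, j] = <x_i, B_j>
    C = S.T @ S / n                     # C_n[j, j'] = <Gamma^n B_j, B_j'>
    b = S.T @ y / n                     # b_n[j] = <Delta^n, B_j>
    G = dt * Bm.T @ Bm                  # G[j, j'] = <B_j^(m), B_j'^(m)>
    a = np.linalg.solve(C + mu * G, b)
    return B @ a                        # alpha_n evaluated on the grid
```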
Assumptions for the consistency result

(A1) $X$ is a.s. bounded in $\mathcal{X}$;
(A2) $\mathrm{Var}(Y|X=x) \leq C_1$ for all $x \in \mathcal{X}$;
(A3) $|E(Y|X=x)| \leq C_2$ for all $x \in \mathcal{X}$;
(A4) there exist an integer $p'$ and a real $\nu \in [0,1]$ such that $p' + \nu \leq q$ and $|\alpha^{(p')}(t_1) - \alpha^{(p')}(t_2)| \leq |t_1 - t_2|^{\nu}$;
(A5) $\mu = O\left(n^{-(1-\delta)/2}\right)$ for some $0 < \delta < 1$;
(A6) $\lim_{n\to+\infty} \mu k^{2(m-p)} = 0$ for $p = p' + \nu$;
(A7) $k = O\left(n^{1/(4p+1)}\right)$.
Consistency result

Theorem [Cardot et al., 2003]: under assumptions (A1)-(A7),
- $\lim_{n\to+\infty} \mathbb{P}\left(\text{there exists a unique solution to the minimization problem}\right) = 1$;
- $E\left(\|\alpha_n - \alpha\|^2_{\mathcal{X}}\ \middle|\ x_1, \ldots, x_n\right) = O_P\left(n^{-2p/(4p+1)}\right)$.
Other functional linear methods

- Canonical correlation analysis: [Leurgans et al., 1993]
- Factorial discriminant analysis: [Hastie et al., 1995]
- Partial least squares: [Preda and Saporta, 2005]
- ...
4 References
References

Further details for the references are given in the accompanying document.