Interpretable Sparse Sliced Inverse Regression for
digitized functional data
Victor Picheny, Rémi Servien & Nathalie Villa-Vialaneix
nathalie.villa@toulouse.inra.fr
http://www.nathalievilla.org
Séminaire Institut de Mathématiques de Bordeaux
April 8, 2016
Nathalie Villa-Vialaneix | IS-SIR 1/26
Outline
1 Background and motivation
2 Presentation of SIR
3 Our proposal
4 Simulations
A typical case study: meta-model in agronomy
climate (daily time series: rain, temperature...) → Agronomic model → plant phenotype predictions (yield, N leaching...)
Agronomic model:
based on biological and chemical knowledge;
computationally expensive to use;
useful for realistic predictions but not for understanding the link between the inputs and the outputs.
Metamodeling: train a simplified, fast and interpretable model that can be used as a proxy for the agronomic model.
A first case study: SUNFLO [Casadebaig et al., 2011]
Inputs: 5 daily time series (length: one year) and 8 phenotypes for different sunflower types
Output: sunflower yield
Data: 1000 sunflower types × 190 climatic series (different places and years), i.e., n = 190 000 observations of variables in $\mathbb{R}^{5 \times 183} \times \mathbb{R}^{8}$
Main facts obtained from a preliminary study
R. Kpekou internship
The study focused on the influence of the climate on the yield: 5 functional variables digitized at 183 points.
Main result: using summaries of the variables (mean, sd, ...) over several weeks, combined with an automatic aggregation procedure in a random forest, led to good prediction accuracy.
Question and mathematical framework
A functional regression problem: X: random (functional) variable & Y: random real variable
E(Y|X)?
Data: n i.i.d. observations $(x_i, y_i)_{i=1,\ldots,n}$.
$x_i$ is not perfectly known but is sampled at (fixed) points: $x_i = (x_i(t_1), \ldots, x_i(t_p))^T \in \mathbb{R}^p$. We denote
$$\mathbf{X} = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}.$$
Question: find a model which is easily interpretable and points out the intervals, within the range of X, that are relevant for the prediction.
Related works (variable selection in FDA)
LASSO / L1 regularization in linear models:
[Ferraty et al., 2010, Aneiros and Vieu, 2014] (isolated evaluation points), [Matsui and Konishi, 2011] (selects elements of an expansion basis), [James et al., 2009] (sparsity on derivatives: piecewise constant predictors)
[Fraiman et al., 2015] (blinding approach usable for various problems: PCA, regression...)
[Gregorutti et al., 2015]: adaptation of the variable importance in random forests to groups of variables
Our proposal: a semi-parametric (not entirely linear) model which selects relevant intervals, combined with an automatic procedure to define the intervals.
Outline
1 Background and motivation
2 Presentation of SIR
3 Our proposal
4 Simulations
SIR in the multidimensional framework
SIR: a semi-parametric regression model for $X \in \mathbb{R}^p$:
$$Y = F(a_1^T X, \ldots, a_d^T X, \epsilon)$$
for $a_1, \ldots, a_d \in \mathbb{R}^p$ (to be estimated), $F : \mathbb{R}^{d+1} \to \mathbb{R}$ unknown, and $\epsilon$ an error independent of X.
Standard assumption for SIR:
$$Y \perp\!\!\!\perp X \mid P_{\mathcal{A}}(X)$$
in which $\mathcal{A}$ is the so-called EDR space, spanned by $(a_k)_{k=1,\ldots,d}$.
Estimation
Equivalence between SIR and an eigendecomposition
$\mathcal{A}$ is included in the space spanned by the first d $\Sigma$-orthogonal eigenvectors of the generalized eigendecomposition problem
$$\Gamma a = \lambda \Sigma a, \quad \text{with } \Sigma = \mathbb{E}\left[(X - \mathbb{E}(X))(X - \mathbb{E}(X))^T\right] \text{ and } \Gamma = \mathbb{E}\left[(\mathbb{E}(X|Y) - \mathbb{E}(X))(\mathbb{E}(X|Y) - \mathbb{E}(X))^T\right].$$
Estimation (when n > p):
compute $\bar{X} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{X})(x_i - \bar{X})^T$;
split the range of Y into H slices $\tau_1, \ldots, \tau_H$ and estimate $\bar{X}_h = \frac{1}{n_h}\sum_{i:\, y_i \in \tau_h} x_i$, with $n_h = |\{i :\, y_i \in \tau_h\}|$, and $\hat{\Gamma} = \sum_{h=1}^H \frac{n_h}{n} (\bar{X}_h - \bar{X})(\bar{X}_h - \bar{X})^T$;
solving the eigendecomposition problem $\hat{\Gamma} a = \lambda \hat{\Sigma} a$ gives the eigenvectors $a_1, \ldots, a_d$ $\Rightarrow$ $\hat{A} = (a_1, \ldots, a_d)$, a $(p \times d)$-matrix.
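The estimation recipe above translates almost line for line into code. A minimal sketch in Python follows; the function name `sir` and the equal-size slicing are illustrative choices, not the authors' implementation:

```python
import numpy as np
from scipy.linalg import eigh

def sir(X, y, H=10, d=2):
    """Minimal SIR sketch: slice y, estimate Sigma = Cov(X) and
    Gamma = Cov(E(X|Y)), then solve Gamma a = lambda Sigma a.
    Assumes n > p so that Sigma is invertible."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                     # center the design matrix
    Sigma = Xc.T @ Xc / n                       # empirical covariance of X
    slices = np.array_split(np.argsort(y), H)   # H slices of ~equal size
    Gamma = np.zeros((p, p))
    for idx in slices:
        p_h = len(idx) / n                      # \hat p_h = n_h / n
        m_h = Xc[idx].mean(axis=0)              # centered within-slice mean
        Gamma += p_h * np.outer(m_h, m_h)
    # generalized eigendecomposition; eigh returns ascending eigenvalues
    eigval, eigvec = eigh(Gamma, Sigma)
    A = eigvec[:, ::-1][:, :d]                  # first d EDR directions (p x d)
    return A, eigval[::-1][:d]
```

On data simulated from a single-index model, the leading direction $\hat{a}_1$ is then expected to be close (up to sign and scale) to the true index.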
Equivalent formulations
SIR as a regression problem: [Li and Yin, 2008] shows that SIR is equivalent to the (double) minimization of
$$E(A, C) = \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma} A C_h \right\|^2$$
for $\bar{X}_h = \frac{1}{n_h}\sum_{i:\, y_i \in \tau_h} x_i$, A a $(p \times d)$-matrix and $C_h$ a vector in $\mathbb{R}^d$.
Rk: given A, C is obtained as the solution of an ordinary least squares problem...
SIR as a canonical correlation problem: [Li and Nachtsheim, 2008] shows that SIR rewrites as the double optimization problem $\max_{a_j, \phi} \mathrm{Cor}(\phi(Y), a_j^T X)$, where $\phi$ is any function $\mathbb{R} \to \mathbb{R}$ and the $(a_j)_j$ are $\Sigma$-orthonormal.
Rk: the solution is shown to satisfy $\phi(y) = a_j^T \mathbb{E}(X | Y = y)$, and $a_j$ is also obtained as the solution of the mean square error problem
$$\min_{a_j} \mathbb{E}\left[ \left( \phi(Y) - a_j^T X \right)^2 \right].$$
SIR in large dimensions: problem
In large dimensions (or in Functional Data Analysis), n < p: $\hat{\Sigma}$ is ill-conditioned and does not have an inverse $\Rightarrow$ $Z = (\mathbf{X} - \mathbb{1}_n \bar{X}^T)\hat{\Sigma}^{-1/2}$ cannot be computed.
Different solutions have been proposed in the literature, based on:
prior dimension reduction (e.g., PCA) [Ferré and Yao, 2003] (in the framework of FDA)
regularization (ridge...) [Li and Yin, 2008, Bernard-Michel et al., 2008]
sparse SIR [Li and Yin, 2008, Li and Nachtsheim, 2008, Ni et al., 2005]
SIR in large dimensions: ridge penalty / L2 regularization of $\hat{\Sigma}$
Following [Li and Yin, 2008], which shows that SIR is equivalent to the minimization of $E(A, C)$, [Bernard-Michel et al., 2008] propose to add a ridge penalty in the high dimensional setting:
$$E_2(A, C) = \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma} A C_h \right\|^2 + \mu_2 \sum_{h=1}^H \hat{p}_h \left\| A C_h \right\|^2.$$
They also show that this problem is equivalent to finding the eigenvectors of the generalized eigenvalue problem
$$\hat{\Gamma} a = \lambda \left( \hat{\Sigma} + \mu_2 I_p \right) a.$$
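In code, the ridge variant is a one-line change to the eigendecomposition step: replace $\hat{\Sigma}$ by $\hat{\Sigma} + \mu_2 I_p$ so that the problem stays well-posed when n < p (a sketch with illustrative names):

```python
import numpy as np
from scipy.linalg import eigh

def ridge_sir_directions(Gamma, Sigma, mu2, d):
    """Ridge-regularized SIR directions: solve the generalized
    eigenproblem Gamma a = lambda (Sigma + mu2 * I) a, which is
    well-posed even when Sigma is singular (n < p)."""
    p = Sigma.shape[0]
    eigval, eigvec = eigh(Gamma, Sigma + mu2 * np.eye(p))
    return eigvec[:, ::-1][:, :d]           # first d directions, p x d
```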
SIR in large dimensions: sparse versions
Specific issue: introducing sparsity in a multiple-index model. Most authors use shrinkage approaches.
First version: sparse penalization of the ridge solution. If $(\hat{A}, \hat{C})$ are the solutions of the ridge SIR described in the previous slide, [Ni et al., 2005, Li and Yin, 2008] propose to shrink this solution by minimizing
$$E_{s,1}(\alpha) = \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma}\, \mathrm{Diag}(\alpha)\, \hat{A} \hat{C}_h \right\|^2 + \mu_1 \|\alpha\|_{L^1}$$
(regression formulation of SIR).
Second version: [Li and Nachtsheim, 2008] derives the sparse optimization problem from the correlation formulation of SIR:
$$\min_{a_j^s} \sum_{i=1}^n \left( P_{\hat{a}_j}(X | y_i) - (a_j^s)^T x_i \right)^2 + \mu_{1,j} \|a_j^s\|_{L^1},$$
in which $P_{\hat{a}_j}$ is the projection of $\hat{\mathbb{E}}(X | Y = y_i) = \bar{X}_h$ onto the space spanned by the solution of the ridge problem.
Characteristics of the different approaches and possible extensions

                          [Li and Yin, 2008]       [Li and Nachtsheim, 2008]
sparsity on               shrinkage coefficients   estimates
nb of optimization pbs    1                        d
sparsity                  common to all dims       specific to each dim

Extension to block-sparse SIR (as in PCA)?
Outline
1 Background and motivation
2 Presentation of SIR
3 Our proposal
4 Simulations
IS-SIR: a two-step approach
Background: back to the functional setting, we suppose that $t_1, \ldots, t_p$ are split into D intervals $I_1, \ldots, I_D$.
First step: solve the ridge problem on the digitized functions (viewed as high dimensional vectors) to obtain $\hat{A}$ and $\hat{C}$:
$$\min_{A, C} \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma} A C_h \right\|^2 + \mu_2 \sum_{h=1}^H \hat{p}_h \left\| A C_h \right\|^2.$$
Second step: sparse shrinkage using the intervals. If $P_{\hat{A}}(\mathbb{E}(X | Y = y_i)) = (\bar{X}_h - \bar{X})^T \hat{A}$ for the h such that $y_i \in \tau_h$, and if $P_i = (P_i^1, \ldots, P_i^d)^T$ and $P^j = (P_1^j, \ldots, P_n^j)^T$, we solve
$$\arg\min_{\alpha \in \mathbb{R}^D} \sum_{j=1}^d \left\| P^j - \mathbf{X} \Delta(\hat{a}_j) \alpha \right\|^2 + \mu_1 \|\alpha\|_{L^1}$$
with $\Delta(\hat{a}_j)$ the $(p \times D)$-matrix such that $\Delta_{lk}(\hat{a}_j) = \hat{a}_{jl}$ if $t_l \in I_k$ and 0 otherwise.
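A compact sketch of this second step for one direction $\hat{a}_j$, in Python: build the n × D design matrix $\mathbf{X}\Delta(\hat{a}_j)$, then solve the lasso on $\alpha$. The lasso is solved here by plain ISTA rather than glmnet, and all names are hypothetical:

```python
import numpy as np

def interval_design(a_hat, interval_of, X):
    """Build X @ Delta(a_hat): Delta is p x D with Delta[l, k] = a_hat[l]
    if sampling point t_l belongs to interval I_k, and 0 otherwise.
    `interval_of[l]` gives the index (0..D-1) of the interval of t_l."""
    p = len(a_hat)
    D = interval_of.max() + 1
    Delta = np.zeros((p, D))
    Delta[np.arange(p), interval_of] = a_hat
    return X @ Delta                        # n x D design matrix

def lasso_ista(Z, P, mu1, n_iter=500):
    """Plain ISTA for min_alpha ||P - Z @ alpha||^2 + mu1 * ||alpha||_1."""
    L = 2 * np.linalg.norm(Z, 2) ** 2       # Lipschitz constant of the gradient
    alpha = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = 2 * Z.T @ (Z @ alpha - P)
        a = alpha - grad / L
        alpha = np.sign(a) * np.maximum(np.abs(a) - mu1 / L, 0.0)  # soft threshold
    return alpha
```

Running the lasso on $\mathbf{X}\Delta(\hat{a}_j)$ instead of $\mathbf{X}$ itself is what turns pointwise sparsity into interval-wise sparsity: a zero in $\alpha$ kills a whole interval of the expanded direction.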
IS-SIR: characteristics
uses the approach based on the correlation formulation (because the dimensionality of the optimization problem is smaller);
uses a shrinkage approach and optimizes the shrinkage coefficients in a single optimization problem;
handles the functional setting by penalizing entire intervals and not just isolated points.
Parameter estimation
H (number of slices): SIR is usually known to be not very sensitive to the number of slices (as long as H > d + 1). We took H = 10 (i.e., 10/30 observations per slice);
µ2 and d (ridge estimate $\hat{A}$):
L-fold CV for µ2 (for a $d_0$ large enough). Note that GCV as described in [Li and Yin, 2008] cannot be used, since the current version of the L2 penalty involves an estimate of $\Sigma^{-1}$;
using again L-fold CV, for every $d = 1, \ldots, d_0$, an estimate of
$$R(d) = d - \mathbb{E}\left[ \mathrm{Tr}\left( \Pi_d \hat{\Pi}_d \right) \right],$$
in which $\Pi_d$ and $\hat{\Pi}_d$ are the projectors onto the first d dimensions of the EDR space and of its estimate, is derived similarly as in [Liquet and Saracco, 2012]. The evolution of $\hat{R}(d)$ versus d is studied to select a relevant d;
µ1 (LASSO): glmnet is used, in which µ1 is selected by CV along the regularization path.
An automatic approach to define the intervals
1 Initial state: for every $k = 1, \ldots, p$, $I_k = \{t_k\}$.
2 Iterate:
along the regularization path, select three values for µ1: P% of the coefficients are zero, P% of the coefficients are nonzero, and best GCV;
define D− ("strong zeros") and D+ ("strong non zeros");
merge consecutive "strong zeros" (resp. "strong non zeros"), as well as "strong zeros" (resp. "strong non zeros") separated by a small number of intervals of undetermined type;
until no more merges can be performed.
3 Output: a collection of models (the first with p intervals, the last with 1), $M^*_D$ (optimal for GCV) and the corresponding $\mathrm{GCV}_D$ versus D (number of intervals).
Final solution: minimize $\mathrm{GCV}_D$ over D.
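The merging rule of step 2 can be illustrated as a run merge on per-interval labels. The slides leave the exact thresholds (the value of P, the tolerated gap length) unspecified, so the following Python sketch is purely hypothetical: '+' marks a "strong non zero" interval, '-' a "strong zero", '?' an undetermined one.

```python
def merge_intervals(labels, max_gap=1):
    """Merge consecutive intervals carrying the same label, absorbing
    runs of at most `max_gap` undetermined intervals ('?') that sit
    between two runs of the same determined label.
    Returns the merged intervals as (label, length) runs."""
    labels = list(labels)
    changed = True
    while changed:
        changed = False
        # run-length encode the current labels
        runs = []
        for lab in labels:
            if runs and runs[-1][0] == lab:
                runs[-1][1] += 1
            else:
                runs.append([lab, 1])
        # absorb short undetermined runs flanked by identical labels
        for i in range(1, len(runs) - 1):
            lab, length = runs[i]
            if lab == "?" and length <= max_gap \
                    and runs[i - 1][0] == runs[i + 1][0] != "?":
                runs[i][0] = runs[i - 1][0]
                changed = True
        labels = [lab for lab, length in runs for _ in range(length)]
    merged = []
    for lab in labels:
        if merged and merged[-1][0] == lab:
            merged[-1] = (lab, merged[-1][1] + 1)
        else:
            merged.append((lab, 1))
    return merged
```

Each pass produces a coarser set of intervals, and the loop stops when no absorption is possible anymore, matching the "until no more iterations" stopping rule.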
Outline
1 Background and motivation
2 Presentation of SIR
3 Our proposal
4 Simulations
Simulation framework
Data generated with $Y = \sum_{j=1}^d \log \langle X, a_j \rangle$, with $X(t) = Z(t) + \epsilon$, in which Z is a Gaussian process with mean $\mu(t) = -5 + 4t - 4t^2$ and the Matérn 3/2 covariance function with parameters $\sigma = 0.1$ and $\theta = 0.2/\sqrt{3}$, and $\epsilon$ is a centered Gaussian variable, independent of Z, with standard deviation 0.1;
$a_j(t) = \sin\left( \frac{(2+j)\pi t}{2} - \frac{(j-1)\pi}{3} \right) \mathbb{1}_{I_j}(t)$;
two models: (M1) d = 1 and $I_1 = [0.2, 0.4]$; (M2) d = 3 and $I_1 = [0, 0.1]$, $I_2 = [0.5, 0.65]$, $I_3 = [0.65, 0.78]$.
Ridge step (model M1)
Selection of µ2: µ2 = 1
Selection of d: d = 1
Definition of the intervals (plots of $\hat{a}_1$ versus t)
D = 200 (initial state)
D = 147 (retained solution)
D = 43
D = 5
Definition of the intervals
(figure: CV error versus the number of intervals, from 0 to 200)
Conclusion
IS-SIR:
a sparse dimension reduction model adapted to the functional framework;
a fully automated definition of relevant intervals in the range of the predictors.
Perspectives:
application to real data;
block-wise sparse SIR?
Aneiros, G. and Vieu, P. (2014).
Variable selection in infinite-dimensional problems.
Statistics and Probability Letters, 94:12–20.
Bernard-Michel, C., Gardes, L., and Girard, S. (2008).
A note on sliced inverse regression with regularizations.
Biometrics, 64(3):982–986.
Casadebaig, P., Guilioni, L., Lecoeur, J., Christophe, A., Champolivier, L., and Debaeke, P. (2011).
SUNFLO, a model to simulate genotype-specific performance of the sunflower crop in contrasting environments.
Agricultural and Forest Meteorology, 151(2):163–178.
Ferraty, F., Hall, P., and Vieu, P. (2010).
Most-predictive design points for functional data predictors.
Biometrika, 97(4):807–824.
Ferré, L. and Yao, A. (2003).
Functional sliced inverse regression analysis.
Statistics, 37(6):475–488.
Fraiman, R., Gimenez, Y., and Svarc, M. (2015).
Feature selection for functional data.
Journal of Multivariate Analysis.
In Press.
Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015).
Grouped variable importance with random forests and application to multiple functional data analysis.
Computational Statistics and Data Analysis, 90:15–35.
James, G., Wang, J., and Zhu, J. (2009).
Functional linear regression that’s interpretable.
Annals of Statistics, 37(5A):2083–2108.
Li, L. and Nachtsheim, C. (2008).
Sparse sliced inverse regression.
Technometrics, 48(4):503–510.
Li, L. and Yin, X. (2008).
Sliced inverse regression with regularizations.
Biometrics, 64:124–131.
Liquet, B. and Saracco, J. (2012).
A graphical tool for selecting the number of slices and the dimension of the model in SIR and SAVE approaches.
Computational Statistics, 27(1):103–125.
Matsui, H. and Konishi, S. (2011).
Variable selection for functional regression models via the L1 regularization.
Computational Statistics and Data Analysis, 55(12):3304–3310.
Ni, L., Cook, D., and Tsai, C. (2005).
A note on shrinkage sliced inverse regression.
Biometrika, 92(1):242–247.

Interpretable Sparse Sliced Inverse Regression for digitized functional data

  • 1.
    Interpretable Sparse SlicedInverse Regression for digitized functional data Victor Picheny, Rémi Servien & Nathalie Villa-Vialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org Séminaire Institut de Mathématiques de Bordeaux 8 avril 2016 Nathalie Villa-Vialaneix | IS-SIR 1/26
  • 2.
    Sommaire 1 Background andmotivation 2 Presentation of SIR 3 Our proposal 4 Simulations Nathalie Villa-Vialaneix | IS-SIR 2/26
  • 3.
    Sommaire 1 Background andmotivation 2 Presentation of SIR 3 Our proposal 4 Simulations Nathalie Villa-Vialaneix | IS-SIR 3/26
  • 4.
    A typical casestudy: meta-model in agronomy climate (daily time series: rain, temperature...) plant phenotypes predictions (yield, N leaching...) Agronomic model Nathalie Villa-Vialaneix | IS-SIR 4/26
  • 5.
    A typical casestudy: meta-model in agronomy climate (daily time series: rain, temperature...) plant phenotypes predictions (yield, N leaching...) Agronomic model Agronomic model: based on biological and chemical knowledge; Nathalie Villa-Vialaneix | IS-SIR 4/26
  • 6.
    A typical casestudy: meta-model in agronomy climate (daily time series: rain, temperature...) plant phenotypes predictions (yield, N leaching...) Agronomic model Agronomic model: based on biological and chemical knowledge; computationaly expensive to use; Nathalie Villa-Vialaneix | IS-SIR 4/26
  • 7.
    A typical casestudy: meta-model in agronomy climate (daily time series: rain, temperature...) plant phenotypes predictions (yield, N leaching...) Agronomic model Agronomic model: based on biological and chemical knowledge; computationaly expensive to use; useful for realistic predictions but not to understand the link between the inputs and the outputs. Nathalie Villa-Vialaneix | IS-SIR 4/26
  • 8.
    A typical casestudy: meta-model in agronomy climate (daily time series: rain, temperature...) plant phenotypes predictions (yield, N leaching...) Agronomic model Agronomic model: based on biological and chemical knowledge; computationaly expensive to use; useful for realistic predictions but not to understand the link between the inputs and the outputs. Metamodeling: train a simplified, fast and interpretable model which can be used as a proxy for the agronomic model. Nathalie Villa-Vialaneix | IS-SIR 4/26
  • 9.
    A first casestudy: SUNFLO [Casadebaig et al., 2011] Inputs: 5 daily time series (length: one year) and 8 phenotypes for different sunflower types Output: sunflower yield Data: 1000 sunflower types × 190 climatic series (different places and years) (n = 190 000) of variables in R5×183 × R8 Nathalie Villa-Vialaneix | IS-SIR 5/26
  • 10.
    Main facts obtainedfrom a preliminary study R. Kpekou internship The study focused on the influence of the climate on the yield: 5 functional variables digitized at 183 points. Nathalie Villa-Vialaneix | IS-SIR 6/26
  • 11.
    Main facts obtainedfrom a preliminary study R. Kpekou internship The study focused on the influence of the climate on the yield: 5 functional variables digitized at 183 points. Main result: Using summary of the variables (mean, sd...) on several weeks and an automatic aggregating procedure in a random forest method, led to obtain good accuracy in prediction. Nathalie Villa-Vialaneix | IS-SIR 6/26
  • 12.
    Question and mathematicalframework A functional regression problem: X: random variable (functional) & Y: random real variable E(Y|X)? Nathalie Villa-Vialaneix | IS-SIR 7/26
  • 13.
    Question and mathematicalframework A functional regression problem: X: random variable (functional) & Y: random real variable E(Y|X)? Data: n i.i.d. observations (xi, yi)i=1,...,n. xi is not perfectly known but sampled at (fixed) points xi = (xi(t1), . . . , xi(tp))T ∈ Rp . We denote: X =   xT 1 ... xT n   . Nathalie Villa-Vialaneix | IS-SIR 7/26
  • 14.
    Question and mathematicalframework A functional regression problem: X: random variable (functional) & Y: random real variable E(Y|X)? Data: n i.i.d. observations (xi, yi)i=1,...,n. xi is not perfectly known but sampled at (fixed) points xi = (xi(t1), . . . , xi(tp))T ∈ Rp . We denote: X =   xT 1 ... xT n   . Question: Find a model which is easily interpretable and points out relevant intervals for the prediction within the range of X. Nathalie Villa-Vialaneix | IS-SIR 7/26
  • 15.
    Related works (variableselection in FDA) LASSO / L1 regularization in linear models [Ferraty et al., 2010, Aneiros and Vieu, 2014] (isolated evaluation points), [Matsui and Konishi, 2011] (selects elements of an expansion basis), [James et al., 2009] (sparsity on derivatives: piecewise constant predictors) [Fraiman et al., 2015] (blinding approach useable for various problems: PCA, regression...) [Gregorutti et al., 2015] adaptation of the importance of variables in random forest for groups of variables Nathalie Villa-Vialaneix | IS-SIR 8/26
  • 16.
    Related works (variableselection in FDA) LASSO / L1 regularization in linear models [Ferraty et al., 2010, Aneiros and Vieu, 2014] (isolated evaluation points), [Matsui and Konishi, 2011] (selects elements of an expansion basis), [James et al., 2009] (sparsity on derivatives: piecewise constant predictors) [Fraiman et al., 2015] (blinding approach useable for various problems: PCA, regression...) [Gregorutti et al., 2015] adaptation of the importance of variables in random forest for groups of variables Our proposal: a semi-parametric (not entirely linear) model which selects relevant intervals combined with an automatic procedure to define the intervals. Nathalie Villa-Vialaneix | IS-SIR 8/26
  • 17.
    Sommaire 1 Background andmotivation 2 Presentation of SIR 3 Our proposal 4 Simulations Nathalie Villa-Vialaneix | IS-SIR 9/26
  • 18.
    SIR in multidimensionalframework SIR: a semi-parametric regression model for X ∈ Rp Y = F(aT 1 X, . . . , aT d X, ) for a1, . . . , ad ∈ Rp (to be estimated), F : Rd+1 → R, unknown, and , an error, independant from X. Standard assumption for SIR Y X | PA (X) in which A is the so-called EDR space, spanned by (ak )k=1,...,d. Nathalie Villa-Vialaneix | IS-SIR 10/26
  • 19.
    Estimation Equivalence between SIRand eigendecomposition Nathalie Villa-Vialaneix | IS-SIR 11/26
  • 20.
    Estimation Equivalence between SIRand eigendecomposition A is included in the space spanned by the first d Σ-orthogonal eigenvectors of the generalized eigendecomposition problem: Γa = λΣa, with Σ = E (X − E(X|Y)))T E(X|Y) and Γ = E E(X|Y)T E(X|Y) Nathalie Villa-Vialaneix | IS-SIR 11/26
  • 21.
    Estimation Equivalence between SIRand eigendecomposition A is included in the space spanned by the first d Σ-orthogonal eigenvectors of the generalized eigendecomposition problem: Γa = λΣa, with Σ = E (X − E(X|Y)))T E(X|Y) and Γ = E E(X|Y)T E(X|Y) Estimation (when n > p) compute X = 1 n n i=1 xi and ˆΣ = 1 n XT (X − X) Nathalie Villa-Vialaneix | IS-SIR 11/26
  • 22.
    Estimation Equivalence between SIRand eigendecomposition A is included in the space spanned by the first d Σ-orthogonal eigenvectors of the generalized eigendecomposition problem: Γa = λΣa, with Σ = E (X − E(X|Y)))T E(X|Y) and Γ = E E(X|Y)T E(X|Y) Estimation (when n > p) compute X = 1 n n i=1 xi and ˆΣ = 1 n XT (X − X) split the range of Y into H different slices: τ1, ... τH and estimate ˆE(X|Y) = 1 nh i: yi∈τh xi h=1,...,H , with nh = |{i : yi ∈ τh}|, ˆΓ = ˆE(X|Y)T DˆE(X|Y) with D = Diag n1 n , . . . , nH n Nathalie Villa-Vialaneix | IS-SIR 11/26
  • 23.
    Estimation Equivalence between SIRand eigendecomposition A is included in the space spanned by the first d Σ-orthogonal eigenvectors of the generalized eigendecomposition problem: Γa = λΣa, with Σ = E (X − E(X|Y)))T E(X|Y) and Γ = E E(X|Y)T E(X|Y) Estimation (when n > p) compute X = 1 n n i=1 xi and ˆΣ = 1 n XT (X − X) split the range of Y into H different slices: τ1, ... τH and estimate ˆE(X|Y) = 1 nh i: yi∈τh xi h=1,...,H , with nh = |{i : yi ∈ τh}|, ˆΓ = ˆE(X|Y)T DˆE(X|Y) with D = Diag n1 n , . . . , nH n solving the eigendecomposition problem ˆΓa = λˆΣa gives the eigenvectors a1, . . . , ad ⇒ ˆA = (a1, . . . , ad), p × d Nathalie Villa-Vialaneix | IS-SIR 11/26
  • 24.
Equivalent formulations

SIR as a regression problem: [Li and Yin, 2008] show that SIR is equivalent to the (double) minimization of
$E(A, C) = \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma} A C_h \right\|^2$
for $\bar{X}_h = \frac{1}{n_h}\sum_{i:\, y_i \in \tau_h} x_i$, $A$ a $(p \times d)$ matrix and $C_h$ a vector in $\mathbb{R}^d$.
Rk: Given $A$, $C$ is obtained as the solution of an ordinary least squares problem...

SIR as a canonical correlation problem: [Li and Nachtsheim, 2008] show that SIR rewrites as the double optimization problem $\max_{a_j, \phi} \mathrm{Cor}(\phi(Y), a_j^T X)$, where $\phi$ is any function $\mathbb{R} \to \mathbb{R}$ and the $(a_j)_j$ are $\Sigma$-orthonormal.
Rk: The solution is shown to satisfy $\phi(y) = a_j^T \mathbb{E}(X|Y = y)$, and $a_j$ is also obtained as the solution of the mean square error problem $\min_{a_j} \mathbb{E}\left[\left(\phi(Y) - a_j^T X\right)^2\right]$.
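The remark about $C$ can be checked directly: for fixed $A$, each $C_h$ is the ordinary least squares solution with design matrix $\hat{\Sigma} A$. A small numpy sketch (all quantities synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 6, 2
M = rng.normal(size=(p, p))
Sigma_hat = M @ M.T / p + np.eye(p)   # synthetic SPD stand-in for the covariance
A = rng.normal(size=(p, d))           # a candidate basis, held fixed
xdiff_h = rng.normal(size=p)          # stands in for Xbar_h - Xbar

# For fixed A, minimizing ||xdiff_h - Sigma_hat A C_h||^2 over C_h is OLS
# with design matrix Sigma_hat @ A:
C_h, *_ = np.linalg.lstsq(Sigma_hat @ A, xdiff_h, rcond=None)

# Sanity check via the normal equations: the residual is orthogonal
# to the column space of the design matrix.
resid = xdiff_h - Sigma_hat @ A @ C_h
```

The per-slice weights $\hat{p}_h$ do not change the minimizer of each slice-wise problem, so they are omitted here.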
SIR in large dimensions: problem

In large dimension (or in Functional Data Analysis), $n < p$, so $\hat{\Sigma}$ is ill-conditioned and does not have an inverse ⇒ $Z = (\mathbf{X} - \mathbb{1}_n\bar{X}^T)\hat{\Sigma}^{-1/2}$ cannot be computed.

Different solutions have been proposed in the literature, based on:
- prior dimension reduction (e.g., PCA) [Ferré and Yao, 2003] (in the framework of FDA);
- regularization (ridge...) [Li and Yin, 2008, Bernard-Michel et al., 2008];
- sparse SIR [Li and Yin, 2008, Li and Nachtsheim, 2008, Ni et al., 2005].
SIR in large dimensions: ridge penalty / $L^2$-regularization of $\hat{\Sigma}$

Following [Li and Yin, 2008], who show that SIR is equivalent to the minimization of $E(A, C)$, [Bernard-Michel et al., 2008] propose to add a ridge penalty in the high-dimensional setting:
$E_2(A, C) = \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma} A C_h \right\|^2 + \mu_2 \sum_{h=1}^H \hat{p}_h \left\| A C_h \right\|^2.$
They also show that this problem is equivalent to finding the eigenvectors of the generalized eigenvalue problem $\hat{\Gamma} a = \lambda \left( \hat{\Sigma} + \mu_2 I_p \right) a$.
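A sketch of this regularized eigenproblem in the $n < p$ regime, where the raw $\hat{\Sigma}$ is singular but $\hat{\Sigma} + \mu_2 I_p$ is not (sizes and $\mu_2$ chosen arbitrarily for illustration):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n, p, H, mu2 = 30, 100, 5, 1.0         # n < p: the raw covariance is singular
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / n                  # rank <= n - 1 < p: no inverse
slices = np.array_split(np.argsort(y), H)
means = np.array([Xc[idx].mean(axis=0) for idx in slices])
weights = np.array([len(idx) / n for idx in slices])
Gamma = means.T @ np.diag(weights) @ means

# Ridge-regularized generalized eigenproblem: Gamma a = lambda (Sigma + mu2 I) a.
# Adding mu2 I makes the right-hand matrix positive definite, so eigh accepts it.
_, eigvecs = eigh(Gamma, Sigma + mu2 * np.eye(p))
a1 = eigvecs[:, -1]                    # leading regularized EDR direction
```

Without the $\mu_2 I_p$ term, `eigh` would be handed a singular second matrix and the problem would be ill-posed, which is exactly the failure described on the previous slide.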
SIR in large dimensions: sparse versions

Specific issue when introducing sparsity in SIR: sparsity bears on a multiple-index model. Most authors use shrinkage approaches.

First version: sparse penalization of the ridge solution. If $(\hat{A}, \hat{C})$ are the solutions of the ridge SIR described on the previous slide, [Ni et al., 2005, Li and Yin, 2008] propose to shrink this solution by minimizing
$E_{s,1}(\alpha) = \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma}\, \mathrm{Diag}(\alpha)\, \hat{A} \hat{C}_h \right\|^2 + \mu_1 \|\alpha\|_{L^1}$
(regression formulation of SIR).
Second version: [Li and Nachtsheim, 2008] derive the sparse optimization problem from the correlation formulation of SIR:
$\min_{a_j^s} \sum_{i=1}^n \left( P_{\hat{a}_j}(X|y_i) - (a_j^s)^T x_i \right)^2 + \mu_{1,j} \|a_j^s\|_{L^1},$
in which $P_{\hat{a}_j}$ is the projection of $\hat{\mathbb{E}}(X|Y = y_i) = \bar{X}_h$ onto the space spanned by the solution of the ridge problem.
Characteristics of the different approaches and possible extensions

                          [Li and Yin, 2008]        [Li and Nachtsheim, 2008]
  sparsity on             shrinkage coefficients    estimates
  nb of optimization pbs  1                         d
  sparsity                common to all dims        specific to each dim

Extension to block-sparse SIR (as in PCA)?
Outline
1 Background and motivation
2 Presentation of SIR
3 Our proposal
4 Simulations
IS-SIR: a two-step approach

Background: Back to the functional setting, we suppose that $t_1, \dots, t_p$ are split into $D$ intervals $I_1, \dots, I_D$.

First step: Solve the ridge problem on the digitized functions (viewed as high-dimensional vectors) to obtain $\hat{A}$ and $\hat{C}$:
$\min_{A,C} \sum_{h=1}^H \hat{p}_h \left\| \bar{X}_h - \bar{X} - \hat{\Sigma} A C_h \right\|^2 + \mu_2 \sum_{h=1}^H \hat{p}_h \left\| A C_h \right\|^2.$

Second step: Sparse shrinkage using the intervals. If $P_{\hat{A}}(\mathbb{E}(X|Y = y_i)) = (\bar{X}_h - \bar{X})^T \hat{A}$ for $h$ s.t. $y_i \in \tau_h$, and if $P_i = (P_i^1, \dots, P_i^d)^T$ and $P^j = (P_1^j, \dots, P_n^j)^T$, we solve
$\arg\min_{\alpha \in \mathbb{R}^D} \sum_{j=1}^d \left\| P^j - (\mathbf{X} \Delta(\hat{a}_j)) \alpha \right\|^2 + \mu_1 \|\alpha\|_{L^1}$
with $\Delta(\hat{a}_j)$ the $(p \times D)$ matrix such that $\Delta_{kl}(\hat{a}_j) = \hat{a}_{jl}$ if $t_l \in I_k$ and 0 otherwise.
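The matrix $\Delta(\hat{a}_j)$ is easy to build explicitly. A toy numpy sketch (the helper name `interval_design` and the toy values are ours): each column carries $\hat{a}_j$ on the grid points of one interval, so a zero in $\alpha$ switches off a whole interval at once:

```python
import numpy as np

def interval_design(a_j, interval_id, D):
    """Build the (p x D) matrix Delta(a_j): column k carries the entries
    of a_j at the grid points t_l that fall in interval I_k, 0 elsewhere."""
    p = len(a_j)
    Delta = np.zeros((p, D))
    Delta[np.arange(p), interval_id] = a_j
    return Delta

# Toy grid: p = 6 points t_1..t_6 assigned to D = 3 intervals
a_j = np.array([0.5, -0.2, 0.1, 0.3, 0.0, 0.7])
interval_id = np.array([0, 0, 1, 1, 2, 2])   # interval index of each t_l
Delta = interval_design(a_j, interval_id, 3)

# One alpha entry per interval: alpha_k = 0 drops the whole interval I_k
alpha = np.array([1.0, 1.0, 0.0])
shrunk = Delta @ alpha                       # interval-sparsified direction
```

This is what makes the sparsity interpretable for functional data: the LASSO acts on $D$ interval coefficients rather than on $p$ isolated grid points.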
IS-SIR: characteristics
- uses the approach based on the correlation formulation (because the dimensionality of the optimization problem is smaller);
- uses a shrinkage approach and optimizes the shrinkage coefficients in a single optimization problem;
- handles the functional setting by penalizing entire intervals and not just isolated points.
Parameter estimation

$H$ (number of slices): SIR is usually known to be not very sensitive to the number of slices (as long as $H > d + 1$). We took $H = 10$ (i.e., 10/30 observations per slice).

$\mu_2$ and $d$ (ridge estimate $\hat{A}$):
- $L$-fold CV for $\mu_2$ (for a $d_0$ large enough). Note that GCV as described in [Li and Yin, 2008] cannot be used, since the current version of the $L^2$ penalty involves the use of an estimate of $\Sigma^{-1}$.
- Using again $L$-fold CV, $\forall\, d = 1, \dots, d_0$, an estimate of $R(d) = d - \mathbb{E}\left[\mathrm{Tr}\left(\Pi_d \hat{\Pi}_d\right)\right]$, in which $\Pi_d$ and $\hat{\Pi}_d$ are the projectors onto the first $d$ dimensions of the EDR space and its estimate, is derived similarly as in [Liquet and Saracco, 2012]. The evolution of $\hat{R}(d)$ versus $d$ is studied to select a relevant $d$.

$\mu_1$ (LASSO): glmnet is used, in which $\mu_1$ is selected by CV along the regularization path.
An automatic approach to define intervals
1. Initial state: $\forall\, k = 1, \dots, p$, $\tau_k = \{t_k\}$.
2. Iterate:
   - along the regularization path, select three values for $\mu_1$: one for which P% of the coefficients are zero, one for which P% of the coefficients are non-zero, and the one with the best GCV;
   - define $D^-$ ("strong zeros") and $D^+$ ("strong non-zeros");
   - merge consecutive "strong zeros" (or "strong non-zeros"), or "strong zeros" (resp. "strong non-zeros") separated by a small number of intervals of undetermined type.
   Until no more iterations can be performed.
3. Output: a collection of models (the first with $p$ intervals, the last with 1), $M_D^*$ (optimal for GCV) and the corresponding $GCV_D$ versus $D$ (number of intervals).

Final solution: minimize $GCV_D$ over $D$.
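The merging rule in step 2 can be coded in a few lines. A simplified sketch (the labels `'-'`, `'+'`, `'?'`, the `max_gap` threshold and the single-pass logic are our assumptions, not the talk's exact procedure):

```python
def merge_intervals(labels, max_gap=1):
    """One merging pass over per-interval labels: '-' = strong zero,
    '+' = strong non-zero, '?' = undetermined. Short undecided runs
    (length <= max_gap) flanked by the same decided label are absorbed,
    then consecutive identical decided labels are fused into one interval."""
    labs = list(labels)
    if not labs:
        return []
    i = 0
    while i < len(labs):                 # absorb short '?' gaps
        if labs[i] == '?':
            j = i
            while j < len(labs) and labs[j] == '?':
                j += 1
            if 0 < i and j < len(labs) and labs[i - 1] == labs[j] \
                    and j - i <= max_gap:
                for k in range(i, j):
                    labs[k] = labs[j]
            i = j
        else:
            i += 1
    out = [labs[0]]                      # fuse runs of identical decided labels
    for lab in labs[1:]:
        if lab != out[-1] or lab == '?':
            out.append(lab)
    return out

# Nine initial intervals; the lone '?' between two strong zeros is absorbed,
# the longer undecided run is left untouched.
merged = merge_intervals(['-', '-', '?', '-', '+', '+', '?', '?', '+'])
```

Iterating such passes until the label sequence stops changing yields the decreasing collection of interval counts $D$ described in step 3.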
Outline
1 Background and motivation
2 Presentation of SIR
3 Our proposal
4 Simulations
Simulation framework

Data generated with $Y = \sum_{j=1}^d \log \langle X, a_j \rangle$, with $X(t) = Z(t) + \epsilon$, in which:
- $Z$ is a Gaussian process with mean $\mu(t) = -5 + 4t - 4t^2$ and the Matérn 3/2 covariance function with parameters $\sigma = 0.1$ and $\theta = 0.2/\sqrt{3}$;
- $\epsilon$ is a centered Gaussian variable independent of $Z$, with standard deviation 0.1;
- $a_j(t) = \sin\left(\frac{t(2+j)\pi}{2} - \frac{(j-1)\pi}{3}\right)\mathbb{1}_{I_j}(t)$.

Two models: for (M1), $d = 1$ and $I_1 = [0.2, 0.4]$; for (M2), $d = 3$ and $I_1 = [0, 0.1]$, $I_2 = [0.5, 0.65]$ and $I_3 = [0.65, 0.78]$.
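Curves of this kind can be simulated directly; a numpy sketch assuming one standard parametrization of the Matérn 3/2 kernel (the talk's exact convention for $\sigma$ and $\theta$ may differ, and the jitter term is ours):

```python
import numpy as np

def matern32(s, t, sigma=0.1, theta=0.2 / np.sqrt(3)):
    """Matern 3/2 covariance: k(r) = sigma^2 (1 + sqrt(3) r / theta)
    * exp(-sqrt(3) r / theta). One common parametrization, assumed here."""
    r = np.sqrt(3.0) * np.abs(s - t) / theta
    return sigma**2 * (1.0 + r) * np.exp(-r)

rng = np.random.default_rng(3)
p = 200
t = np.linspace(0.0, 1.0, p)
mu = -5.0 + 4.0 * t - 4.0 * t**2                 # mean function of Z
K = matern32(t[:, None], t[None, :])             # covariance on the grid
L = np.linalg.cholesky(K + 1e-6 * np.eye(p))     # jitter for numerical stability
Z = mu + L @ rng.normal(size=p)                  # one Gaussian-process path
X = Z + 0.1 * rng.normal(size=p)                 # digitization noise, sd 0.1
```

Repeating the last two lines $n$ times gives the digitized design matrix on which the two IS-SIR steps are run.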
Ridge step (model M1)
Selection of $\mu_2$: $\mu_2 = 1$.
Ridge step (model M1)
Selection of $d$: $d = 1$.
Definition of the intervals (model M1)
[Figures: estimated $\hat{a}_1$ over $t \in [0, 1]$ at successive merging steps, for $D = 200$ (initial state), $D = 147$ (retained solution), $D = 43$ and $D = 5$; and CV error versus the number of intervals $D$.]
Conclusion

IS-SIR:
- a sparse dimension reduction model adapted to the functional framework;
- a fully automated definition of relevant intervals in the range of the predictors.

Perspectives:
- application to real data;
- block-wise sparse SIR?
References

Aneiros, G. and Vieu, P. (2014). Variable selection in infinite-dimensional problems. Statistics and Probability Letters, 94:12–20.
Bernard-Michel, C., Gardes, L., and Girard, S. (2008). A note on sliced inverse regression with regularizations. Biometrics, 64(3):982–986.
Casadebaig, P., Guilioni, L., Lecoeur, J., Christophe, A., Champolivier, L., and Debaeke, P. (2011). SUNFLO, a model to simulate genotype-specific performance of the sunflower crop in contrasting environments. Agricultural and Forest Meteorology, 151(2):163–178.
Ferraty, F., Hall, P., and Vieu, P. (2010). Most-predictive design points for functional data predictors. Biometrika, 97(4):807–824.
Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37(6):475–488.
Fraiman, R., Gimenez, Y., and Svarc, M. (2015). Feature selection for functional data. Journal of Multivariate Analysis. In press.
Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015). Grouped variable importance with random forests and application to multiple functional data analysis. Computational Statistics and Data Analysis, 90:15–35.
James, G., Wang, J., and Zhu, J. (2009). Functional linear regression that's interpretable. Annals of Statistics, 37(5A):2083–2108.
Li, L. and Nachtsheim, C. (2008). Sparse sliced inverse regression. Technometrics, 48(4):503–510.
Li, L. and Yin, X. (2008). Sliced inverse regression with regularizations. Biometrics, 64:124–131.
Liquet, B. and Saracco, J. (2012). A graphical tool for selecting the number of slices and the dimension of the model in SIR and SAVE approaches. Computational Statistics, 27(1):103–125.
Matsui, H. and Konishi, S. (2011). Variable selection for functional regression models via the L1 regularization. Computational Statistics and Data Analysis, 55(12):3304–3310.
Ni, L., Cook, D., and Tsai, C. (2005). A note on shrinkage sliced inverse regression. Biometrika, 92(1):242–247.