OQUAIDO Chair in Applied Mathematics
Programme of the Journées données fonctionnelles (functional data days)
June 19-20, 2017, Institut de Mathématiques de Toulouse

Published in: Science

1. About functional SIR
Victor Picheny, Rémi Servien & Nathalie Villa-Vialaneix
nathalie.villa@toulouse.inra.fr, http://www.nathalievilla.org
Journées “Données fonctionnelles”, Institut de Mathématiques de Toulouse, June 19th 2017
Nathalie Villa-Vialaneix | SISIR 1/34
2. A joint work of the SFCB team: Victor Picheny, Rémi Servien, NV2
3. Outline
   1. Background and motivation
   2. Presentation of SIR
   3. Our proposal
   4. Simulations and real data
4. Part 1: Background and motivation
5. Introduction
- $X$ a functional random variable and $Y \in \mathbb{R}$
- $n$ i.i.d. realizations of $(X, Y)$
6. Objectives
- variable selection in functional regression
- selection of full intervals made of consecutive points
- without any a priori information on the intervals
- fully data-driven procedure
10. Question and mathematical framework
A functional regression problem: $X$ is a functional random variable and $Y$ a real random variable. What is $E(Y|X)$?
Data: $n$ i.i.d. observations $(x_i, y_i)_{i=1,\dots,n}$. $x_i$ is not perfectly known but sampled at (fixed) points: $x_i = (x_i(t_1), \dots, x_i(t_p))^T \in \mathbb{R}^p$. We denote $\mathbf{X} = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}$.
Question: Find a model that is easily interpretable and points out relevant intervals for the prediction within the definition domain of $X$.
Method: Do not expand $X$ on a functional basis but use the fact that the entries of the digitized function $x_i$ are ordered in a natural way.
12. Related works (variable selection in FDA)
- LASSO / L1 regularization in linear models: [Ferraty et al., 2010, Aneiros and Vieu, 2014] (isolated evaluation points), [Matsui and Konishi, 2011] (selects elements of an expansion basis)
- [Fraiman et al., 2016]: blinding approach usable for various problems (PCA, regression...)
- [Gregorutti et al., 2015]: adaptation of the importance of variables in random forests for groups of variables
- [Fauvel et al., 2015, Ferraty and Hall, 2015]: cross validation and a greedy update of the selected evaluation points to select the most relevant evaluation points in a nonparametric framework
However, none of these approaches proposes to automatically design and select contiguous sets of variables.
15. Related works (selection of groups of variables)
- [James et al., 2009]: L1 regularization in a linear model with sparsity on derivatives (piecewise constant predictors)
- [Park et al., 2016]: criterion based on a minimization of the overall correlation during a greedy segmentation
- [Grollemund et al., 2017]: Bayesian approach in which a posterior distribution over informative intervals can be obtained
All are proposed in the framework of the linear model, and the second one does not use the target variable to define and select relevant intervals.
Our proposal: a semi-parametric (not entirely linear) model which selects relevant intervals, combined with an automatic procedure to define the intervals.
16. Part 2: Presentation of SIR
18. SIR in multidimensional framework
SIR: a semi-parametric regression model for $X \in \mathbb{R}^p$:
$Y = F(a_1^T X, \dots, a_d^T X, \epsilon)$
for $a_1, \dots, a_d \in \mathbb{R}^p$ (to be estimated), $F: \mathbb{R}^{d+1} \to \mathbb{R}$ unknown, and $\epsilon$ an error independent of $X$.
Standard assumption for SIR: $Y \perp\!\!\!\perp X \mid P_{\mathcal{A}}(X)$, in which $\mathcal{A}$ is the so-called EDR space, spanned by $(a_k)_{k=1,\dots,d}$.
SIR is the regression extension of Linear Discriminant Analysis.
23. Estimation
Equivalence between SIR and eigendecomposition: $\mathcal{A}$ is included in the space spanned by the first $d$ $\Sigma$-orthogonal eigenvectors of the generalized eigendecomposition problem $\Gamma a = \lambda \Sigma a$, with $\Sigma$ the covariance matrix of $X$ and $\Gamma$ the covariance matrix of $E(X|Y)$.
Estimation (when $n > p$):
- compute $\bar{X} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\hat\Sigma = \frac{1}{n}\mathbf{X}^T(\mathbf{X} - \mathbf{1}_n\bar{X}^T)$ (with $\mathbf{1}_n$ the all-ones vector of length $n$)
- split the range of $Y$ into $H$ different slices $\tau_1, \dots, \tau_H$ and estimate $\hat{E}(X|Y) = \left(\frac{1}{n_h}\sum_{i:\, y_i \in \tau_h} x_i\right)_{h=1,\dots,H}$, with $n_h = |\{i : y_i \in \tau_h\}|$, in each slice, to obtain an estimate $\hat\Gamma$
- solve the eigendecomposition problem $\hat\Gamma a = \lambda\hat\Sigma a$ and obtain the eigenvectors $a_1, \dots, a_d$
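
The three estimation steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used by the authors: the function name `sir_directions`, the equal-count slicing by ranks of `y`, and the Cholesky whitening used to solve the generalized eigenproblem are all choices made for this sketch. The optional `ridge` argument adds a multiple of the identity to the covariance estimate, which is only needed when the covariance estimate is singular.

```python
import numpy as np

def sir_directions(X, y, H=10, d=1, ridge=0.0):
    """Sketch of SIR: return a (p, d) matrix of estimated EDR directions."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # centre the predictors
    Sigma = Xc.T @ Xc / n + ridge * np.eye(p)    # covariance estimate (optionally ridged)

    # Slices with roughly equal counts, built from the ordering of y (assumption).
    slices = np.array_split(np.argsort(y), H)
    Gamma = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)                 # centred slice mean
        Gamma += (len(idx) / n) * np.outer(m, m) # weighted between-slice covariance

    # Solve Gamma a = lambda Sigma a through the whitening Sigma = L L^T.
    L = np.linalg.cholesky(Sigma)
    L_inv = np.linalg.inv(L)
    M = L_inv @ Gamma @ L_inv.T                  # symmetric matrix with the same eigenvalues
    _, vecs = np.linalg.eigh(M)                  # eigenvalues returned in ascending order
    top = vecs[:, ::-1][:, :d]                   # leading d eigenvectors
    return L_inv.T @ top                         # Sigma-orthonormal directions

# Toy single-index example: y depends on X only through its first coordinate.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 5))
a_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y_demo = (X_demo @ a_true) ** 3 + 0.1 * rng.normal(size=500)
A_hat = sir_directions(X_demo, y_demo, H=10, d=1)
cosine = abs(A_hat[:, 0] @ a_true) / np.linalg.norm(A_hat[:, 0])
```

In this toy setting `cosine` measures how well the first estimated direction aligns with the true index; a value close to 1 indicates a good recovery.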
25. SIR in large dimensions: problem
In large dimension (or in Functional Data Analysis), $n < p$ and $\hat\Sigma$ is ill-conditioned and does not have an inverse, so $Z = (\mathbf{X} - \mathbf{1}_n\bar{X}^T)\hat\Sigma^{-1/2}$ cannot be computed (with $\mathbf{1}_n$ the all-ones vector of length $n$).
Different solutions have been proposed in the literature, based on:
- prior dimension reduction (e.g., PCA) [Ferré and Yao, 2003] (in the framework of FDA)
- regularization (ridge...) [Li and Yin, 2008, Bernard-Michel et al., 2008]: equivalent to the generalized eigendecomposition problem $\hat\Gamma a = \lambda(\hat\Sigma + \mu_2 \mathbb{I})a$
- sparse SIR [Li and Yin, 2008, Li and Nachtsheim, 2008, Ni et al., 2005]
- QZ-SIR [Coudret et al., 2014]: uses a method similar to the QR algorithm
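
A small numerical check makes the problem concrete. The sketch below uses simulated Gaussian data (an assumption made purely for illustration, as are the sizes and the value of `mu2`) to show that the empirical covariance is rank deficient when n < p, and that adding a ridge term as in the regularized eigenproblem restores invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                                  # fewer observations than sampling points
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)
Sigma_hat = Xc.T @ Xc / n                      # p x p empirical covariance

rank = np.linalg.matrix_rank(Sigma_hat)        # at most n - 1, hence singular when n <= p

mu2 = 0.1                                      # hypothetical ridge parameter
Sigma_ridge = Sigma_hat + mu2 * np.eye(p)
smallest = np.linalg.eigvalsh(Sigma_ridge).min()  # bounded below by mu2 (up to rounding)
```

Because every eigenvalue of the ridged matrix is at least `mu2`, its inverse and inverse square root are well defined, at the price of one more tuning parameter.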
26. SIR in large dimensions: sparse versions
Specific issue when introducing sparsity in SIR: sparsity on a multiple-index model. Most authors use shrinkage approaches, or sparsity on a single-index model and depletion (not shown).
First version: Li and Yin (2008), based on the regression formulation
- Pro: sparsity common to all dimensions $d$
- Con: minimization problem with dependent variables in $\mathbb{R}^p$
Second version: Li and Nachtsheim (2008), based on the correlation formulation
- Pro: minimization problem with independent variables in $\mathbb{R}^d$
- Con: sparsity different in all dimensions $d$
30. Equivalent formulations
SIR as a regression problem: [Li and Yin, 2008] shows that SIR is equivalent to the (double) minimization of
$E(A, C) = \sum_{h=1}^H \hat{p}_h \left\| (\bar{X}_h - \bar{X}) - \hat\Sigma A C_h \right\|^2$
for $\bar{X}_h = \frac{1}{n_h}\sum_{i:\, y_i \in \tau_h} x_i$, $A$ a $(p \times d)$-matrix and $C = (C_1, \dots, C_H)$ with each $C_h$ a vector in $\mathbb{R}^d$.
Rk: Given $A$, $C$ is obtained as the solution of an ordinary least squares problem.
SIR as a canonical correlation problem: [Li and Nachtsheim, 2008] shows that SIR can be rewritten as the double optimisation problem $\max_{a_j, \phi} \mathrm{Cor}(\phi(Y), a_j^T X)$, where $\phi$ is any function $\mathbb{R} \to \mathbb{R}$ and $(a_j)_j$ are $\Sigma$-orthonormal.
Rk: The solution is shown to satisfy $\phi(y) = a_j^T E(X|Y = y)$, and $a_j$ is also obtained as the solution of the mean square error problem $\min_{a_j} E\left[\left(\phi(Y) - a_j^T X\right)^2\right]$.
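
The first remark can be made explicit. As a hedged worked step (assuming the norm in $E(A, C)$ is the Euclidean norm and that $\hat\Sigma A$ has full column rank, neither of which is stated on the slide), each $C_h$ enters only its own term of the sum, so for fixed $A$ the minimization decouples into $H$ ordinary least squares problems with design matrix $\hat\Sigma A$ and response $\bar{X}_h - \bar{X}$:

```latex
\hat{C}_h
  = \arg\min_{C_h \in \mathbb{R}^d}
    \left\| (\bar{X}_h - \bar{X}) - \hat\Sigma A\, C_h \right\|^2
  = \left( A^T \hat\Sigma^2 A \right)^{-1} A^T \hat\Sigma \,(\bar{X}_h - \bar{X}),
  \qquad h = 1, \dots, H .
```

The weights $\hat{p}_h$ do not affect these minimizers since each one only rescales its own term.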
32. SIR in large dimensions: sparse versions
First version: sparse penalization of the ridge solution. If $(\hat{A}, \hat{C})$ are the solutions of the ridge SIR, [Ni et al., 2005, Li and Yin, 2008] propose to shrink this solution by minimizing
$E_{s,1}(\alpha) = \sum_{h=1}^H \hat{p}_h \left\| (\bar{X}_h - \bar{X}) - \hat\Sigma\, \mathrm{Diag}(\alpha) \hat{A} \hat{C}_h \right\|^2 + \mu_1 \|\alpha\|_{L_1}$
(regression formulation of SIR).
Second version: [Li and Nachtsheim, 2008] derive the sparse optimization problem from the correlation formulation of SIR:
$\min_{a_j^s} \sum_{i=1}^n \left[ P_{\hat{a}_j}(X|y_i) - (a_j^s)^T x_i \right]^2 + \mu_{1,j} \|a_j^s\|_{L_1}$,
in which $P_{\hat{a}_j}$ is the projection of $\hat{E}(X|Y = y_i) = \bar{X}_h$ onto the space spanned by the solution of the ridge problem.
33. Characteristics of the different approaches and possible extensions
|                              | [Li and Yin, 2008]     | [Li and Nachtsheim, 2008] |
| sparsity on                  | shrinkage coefficients | estimates                 |
| nb of optimization problems  | 1                      | d                         |
| sparsity                     | common to all dims     | specific to each dim      |
34. Part 3: Our proposal
35. SIR in large dimensions: our sparse version
Background: Back to the functional setting, we suppose that $t_1, \dots, t_p$ are split into $D$ intervals $I_1, \dots, I_D$. The approach is based on the minimization problem of Li and Nachtsheim (2008).
Our adaptation: sparsity on the intervals using $\alpha = (\alpha_1, \dots, \alpha_D)$:
$\forall\, l = 1, \dots, p, \quad \hat{a}^s_{jl} = \hat\alpha_k \hat{a}_{jl}$ for $k$ such that $t_l \in I_k$.
- the sparsity constraint is put on $\alpha$ and not directly on $\hat{a}^s_j$
- the $\alpha$ are made identical for all dimensions of the projection, $j = 1, \dots, d$
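
To make the interval-wise shrinkage concrete, here is a minimal plain-Python sketch. The helper names `expand_alpha` and `sparse_direction`, and the representation of intervals as ranges of point indices, are assumptions made for this illustration only.

```python
def expand_alpha(alpha, intervals):
    """Return one shrinkage coefficient per sampling point (the diagonal of Lambda(alpha))."""
    p = sum(len(interval) for interval in intervals)
    diag = [0.0] * p
    for a_k, interval in zip(alpha, intervals):
        for l in interval:
            diag[l] = a_k            # every point of interval k shares alpha_k
    return diag

def sparse_direction(alpha, intervals, a_hat):
    """Shrink a ridge direction a_hat interval by interval."""
    diag = expand_alpha(alpha, intervals)
    return [w * a for w, a in zip(diag, a_hat)]

# Six sampling points split into D = 3 intervals; only the middle one is kept.
intervals = [range(0, 2), range(2, 5), range(5, 6)]
alpha = [0.0, 1.5, 0.0]
a_hat = [0.3, -0.2, 0.5, 0.1, -0.4, 0.7]
a_sparse = sparse_direction(alpha, intervals, a_hat)
```

A zero coefficient removes a whole interval at once, which is what makes the selected model readable in terms of intervals rather than isolated points.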
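36. SIR in large dimensions: our sparse version
Li and Nachtsheim (2008) (LASSO):
$\min_{a_j^s} \sum_{i=1}^n \left[ P_{\hat{a}_j}(X|y_i) - (a_j^s)^T x_i \right]^2 + \mu_{1,j} \|a_j^s\|_{L_1}$,
in which $P_{\hat{a}_j}$ is the projection of $\hat{E}(X|Y = y_i) = \bar{X}_h$ (for $h$ such that $y_i$ is in slice $h$) onto the space spanned by the $\hat{a}_j$.
Our adaptation:
$\hat\alpha = \arg\min_{\alpha \in \mathbb{R}^D} \sum_{j=1}^d \sum_{i=1}^n \left[ P_{\hat{a}_j}(X|y_i) - (\Lambda(\alpha)\hat{a}_j)^T x_i \right]^2 + \mu_1 \|\alpha\|_{L_1}$
with $\forall\, l = 1, \dots, p$, $\hat{a}^s_{jl} = \hat\alpha_k \hat{a}_{jl}$ for $k$ such that $t_l \in I_k$, and $\Lambda(\alpha) = \mathrm{Diag}\left(\alpha_1 \mathbb{I}_{|I_1|}, \dots, \alpha_D \mathbb{I}_{|I_D|}\right) \in \mathcal{M}_{p \times p}$.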
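
As a hedged side note, the adapted criterion can be rewritten as a standard LASSO in $\alpha$. The shorthand below is not from the slides: $r_{ij}$ stands for $P_{\hat{a}_j}(X|y_i)$, $x_{il}$ for $x_i(t_l)$, and $k(l)$ for the index of the interval containing $t_l$.

```latex
(\Lambda(\alpha)\hat{a}_j)^T x_i
  = \sum_{l=1}^{p} \alpha_{k(l)}\, \hat{a}_{jl}\, x_{il}
  = \sum_{k=1}^{D} \alpha_k\, z_{ijk},
  \qquad
  z_{ijk} := \sum_{l:\, t_l \in I_k} \hat{a}_{jl}\, x_{il},

\hat\alpha
  = \arg\min_{\alpha \in \mathbb{R}^D}
    \sum_{j=1}^{d} \sum_{i=1}^{n}
    \Big( r_{ij} - \sum_{k=1}^{D} \alpha_k\, z_{ijk} \Big)^2
    + \mu_1 \|\alpha\|_{L_1}.
```

Stacking the $n \times d$ pairs $(i, j)$ as rows therefore gives a design matrix with only $D$ columns, one per interval, so a single LASSO fit handles all $d$ directions at once.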
37. Summary: SISIR, a two-step approach
First step: Solve the projection problem (using SIR and L2-regularization of $\Sigma$), which provides the estimates $(\hat{a}_j)_{j \in \{1,\dots,d\}}$ of the vectors spanning the EDR space.
Second step: Sparsity on the $D$ intervals using $\alpha = (\alpha_1, \dots, \alpha_D)$, by solving a LASSO problem. This handles the functional setting by penalizing entire intervals and not just isolated points.
38. SISIR: Characteristics
- uses the approach based on the correlation formulation (because the dimensionality of the optimization problem is smaller);
- uses a shrinkage approach and optimizes shrinkage coefficients in a single optimization problem;
- handles the functional setting by penalizing entire intervals and not just isolated points.
44. An automatic approach to define intervals
1 Initial state: $\forall\, k = 1, \dots, p$, $\tau_k = \{t_k\}$
2 Iterate:
  - along the regularization path, select three values for $\mu_1$: the one for which P% of the coefficients are zero, the one for which P% of the coefficients are non zero, and the one with the best GCV
  - define $D^-$ (“strong zeros”) and $D^+$ (“strong non zeros”)
  - merge consecutive “strong zeros” (or “strong non zeros”), or “strong zeros” (resp. “strong non zeros”) separated by a small number of intervals of undetermined type
  Until no more iterations can be performed.
3 Output: Collection of models (first with $p$ intervals, last with 1), $\mathcal{M}^*_D$ (optimal for GCV) and corresponding $\mathrm{GCV}_D$ versus $D$ (number of intervals).
Final solution: Minimize $\mathrm{GCV}_D$ over $D$.
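
The merge step of the iteration can be illustrated with a short plain-Python sketch. It is a simplified reading of the slide, not the SISIR package code: the labels are taken as given (how $D^-$ and $D^+$ are derived from the three values of $\mu_1$ is not reproduced here), and the function name `merge_intervals` and the parameter `max_gap` (the tolerated number of undetermined intervals between two intervals of the same strong type) are hypothetical.

```python
def merge_intervals(intervals, labels, max_gap=0):
    """Merge neighbouring intervals that share the same strong label.

    intervals: ordered, contiguous (first_index, last_index) pairs.
    labels: 'zero', 'nonzero', or None (undetermined), one per interval.
    max_gap: number of undetermined intervals that may be absorbed between
             two intervals carrying the same strong label.
    """
    merged = []
    i, n = 0, len(intervals)
    while i < n:
        start, end = intervals[i]
        label = labels[i]
        j = i + 1
        if label is not None:
            while j < n:
                # Look past at most max_gap undetermined intervals.
                k = j
                while k < n and labels[k] is None and k - j < max_gap:
                    k += 1
                if k < n and labels[k] == label:
                    end = intervals[k][1]   # absorb everything up to interval k
                    j = k + 1
                else:
                    break
        merged.append((start, end))
        i = j
    return merged

# Five single-point intervals, as in the initial state of the procedure.
intervals = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
labels = ['zero', 'zero', None, 'zero', 'nonzero']
strict = merge_intervals(intervals, labels, max_gap=0)
tolerant = merge_intervals(intervals, labels, max_gap=1)
```

With `max_gap=0` only strictly consecutive intervals of the same type are merged, while `max_gap=1` also bridges the single undetermined interval between two strong zeros.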
45. Part 4: Simulations and real data
46. Simulation framework
Data generated with:
- $X(t)$ a Gaussian process with mean $\mu(t) = -5 + 4t - 4t^2$ and a Matérn covariance
- $a_j(t) = \sin\left(\frac{t(2+j)\pi}{2} - \frac{(j-1)\pi}{3}\right) \mathbb{I}_{I_j}(t)$
- $Y = \sum_{j=1}^d \log\left|\langle X, a_j \rangle\right|$
First model (M1): $d = 1$, $I_1 = [0.2, 0.4]$.
47. Definition of the intervals
[Figure, four panels: D = p = 200 (initial state = LASSO), D = 142, D = 41, D = 5]
48. Second model (M2): $d = 3$ and $I_1 = [0, 0.1]$, $I_2 = [0.5, 0.65]$ and $I_3 = [0.65, 0.78]$.
49. Second model
[Figure, two panels: SISIR, standard sparse]
50. Tecator dataset
- relevant intervals
- easily interpretable
- good MSE
51. Sunflower dataset
- climatic time series (between 1975 and 2012 in France)
- daily measures from April to October
- X = evapotranspiration, Y = yield, n = 111, p = 309
52. Sunflower dataset
- only two points identified outside the interval
- focus on the second half of the interval
- matches expert knowledge
53. Conclusion
SI-SIR:
- sparse dimension reduction model adapted to the functional framework
- fully automated definition of relevant intervals in the range of the predictors
Package SISIR available on CRAN at https://cran.r-project.org/package=SISIR.
Perspectives:
- adaptation to multiple X
- application to large-scale real data (agricultural application: X = {temperature, rainfall, ...}, Y = {yield})
- replace the CV criterion?
55. References
Aneiros, G. and Vieu, P. (2014). Variable selection in infinite-dimensional problems. Statistics and Probability Letters, 94:12–20.
Bernard-Michel, C., Gardes, L., and Girard, S. (2008). A note on sliced inverse regression with regularizations. Biometrics, 64(3):982–986.
Coudret, R., Liquet, B., and Saracco, J. (2014). Comparison of sliced inverse regression approaches for undetermined cases. Journal de la Société Française de Statistique, 155(2):72–96.
Fauvel, M., Deschene, C., Zullo, A., and Ferraty, F. (2015). Fast forward feature selection of hyperspectral images for classification with Gaussian mixture models. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6):2824–2831.
Ferraty, F. and Hall, P. (2015). An algorithm for nonlinear, nonparametric model choice and prediction. Journal of Computational and Graphical Statistics, 24(3):695–714.
Ferraty, F., Hall, P., and Vieu, P. (2010). Most-predictive design points for functional data predictors. Biometrika, 97(4):807–824.
Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37(6):475–488.
Fraiman, R., Gimenez, Y., and Svarc, M. (2016). Feature selection for functional data. Journal of Multivariate Analysis, 146:191–208.
Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015). Grouped variable importance with random forests and application to multiple functional data analysis. Computational Statistics and Data Analysis, 90:15–35.
Grollemund, P., Abraham, C., Baragatti, M., and Pudlo, P. (2017). Bayesian functional linear regression with sparse step functions. Preprint.
James, G., Wang, J., and Zhu, J. (2009). Functional linear regression that’s interpretable. Annals of Statistics, 37(5A):2083–2108.
Li, L. and Nachtsheim, C. (2008). Sparse sliced inverse regression. Technometrics, 48(4):503–510.
Li, L. and Yin, X. (2008). Sliced inverse regression with regularizations. Biometrics, 64(1):124–131.
Liquet, B. and Saracco, J. (2012). A graphical tool for selecting the number of slices and the dimension of the model in SIR and SAVE approaches. Computational Statistics, 27(1):103–125.
Matsui, H. and Konishi, S. (2011). Variable selection for functional regression models via the l1 regularization. Computational Statistics and Data Analysis, 55(12):3304–3310.
Ni, L., Cook, D., and Tsai, C. (2005). A note on shrinkage sliced inverse regression. Biometrika, 92(1):242–247.
Park, A., Aston, J., and Ferraty, F. (2016). Stable and predictive functional domain selection with application to brain images. Preprint arXiv 1606.02186.
60. Parameter estimation
- $H$ (number of slices): SIR is usually known to be not very sensitive to the number of slices ($> d + 1$). We took $H = 10$ (i.e., 10/30 observations per slice).
- $\mu_2$ and $d$ (ridge estimate $\hat{A}$):
  - L-fold CV for $\mu_2$ (for a $d_0$ large enough). Note that GCV as described in [Li and Yin, 2008] cannot be used since the current version of the $L_2$ penalty involves the use of an estimate of $\Sigma^{-1}$.
  - using L-fold CV again, for all $d = 1, \dots, d_0$, an estimate of $R(d) = d - E\left[\mathrm{Tr}\left(\Pi_d \hat\Pi_d\right)\right]$, in which $\Pi_d$ and $\hat\Pi_d$ are the projector onto the first $d$ dimensions of the EDR space and its estimate, is derived similarly as in [Liquet and Saracco, 2012]. The evolution of $\hat{R}(d)$ versus $d$ is studied to select a relevant $d$.
- $\mu_1$ (LASSO): glmnet is used, in which $\mu_1$ is selected by CV along the regularization path.
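
The quantity inside the expectation in $R(d)$ can be illustrated numerically. The sketch below is an assumption-laden simplification: it uses Euclidean orthogonal projectors (the slides do not say which inner product defines $\Pi_d$), compares a single pair of subspaces rather than averaging over cross-validation folds, and the names `projector` and `subspace_risk` are hypothetical.

```python
import numpy as np

def projector(A):
    """Euclidean orthogonal projector onto the column space of A (p x d, full column rank)."""
    Q, _ = np.linalg.qr(A)           # reduced QR: Q has orthonormal columns spanning col(A)
    return Q @ Q.T

def subspace_risk(A_ref, A_est):
    """d - Tr(P_ref P_est): 0 when the two subspaces coincide, d when they are orthogonal."""
    d = A_ref.shape[1]
    return d - np.trace(projector(A_ref) @ projector(A_est))

# Two-dimensional subspaces of R^4.
A = np.eye(4)[:, :2]
B = np.eye(4)[:, 2:]
same = subspace_risk(A, A)           # identical subspaces
apart = subspace_risk(A, B)          # mutually orthogonal subspaces
```

Averaging such values over folds for each candidate $d$ would give a curve of the kind described on the slide, from which a relevant dimension can be read.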