
- 1. READING SEMINAR ON CLASSICS: Regression Shrinkage and Selection via the LASSO, by Robert Tibshirani. Presented by Ulcinaite Agne, November 4, 2012.
- 7. Outline: 1 Introduction. 2 OLS estimates: OLS critics; standard improving techniques. 3 LASSO: definition; motivation for LASSO; orthonormal design case; function forms; example of prostate cancer; prediction error and estimation of t. 4 Algorithm for finding LASSO solutions. 5 Simulation. 6 Conclusions.
- 9. Introduction. The article: Regression Shrinkage and Selection via the LASSO, by Robert Tibshirani, published in 1996 in the Journal of the Royal Statistical Society, Series B (Methodological), Vol. 58, No. 1.
- 11. OLS estimates. We consider the usual regression situation. The data: (x_i, y_i), i = 1, ..., N, where x_i = (x_{i1}, ..., x_{ip})^T and y_i are the regressors and the response for the i-th observation. The ordinary least squares (OLS) estimates minimize the residual sum of squares (RSS): RSS = \sum_{i=1}^{N} \big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \big)^2.
- 14. OLS critics. The two reasons why data analysts are often not satisfied with OLS estimates: Prediction accuracy: OLS estimates have low bias but large variance. Interpretation: with too many predictors, it would be better to have a smaller subset exhibiting the strongest effects.
- 17. Standard improving techniques. Subset selection: small changes in the data can result in very different models. Ridge regression: \hat\beta^{ridge} = \arg\min_\beta \sum_{i=1}^{N} \big( y_i - \beta_0 - \sum_j \beta_j x_{ij} \big)^2 subject to \sum_j \beta_j^2 \le t. Ridge does not set any of the coefficients to 0 and hence does not give an easily interpretable model.
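This behavior of ridge is easy to see numerically. A minimal sketch (scikit-learn is an assumption of this example, not part of the paper; the design, coefficients, and penalty value are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
N, p = 50, 8
X = rng.normal(size=(N, p))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0])  # several true zeros
y = X @ beta + rng.normal(size=N)

ols = LinearRegression().fit(X, y)
# alpha plays the role of the Lagrangian form of the bound t
ridge = Ridge(alpha=10.0).fit(X, y)

# ridge shrinks every coefficient toward 0 but sets none exactly to 0
print(ols.coef_)
print(ridge.coef_)
```

So ridge improves prediction accuracy via shrinkage, but the fitted model still involves all p predictors.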
- 22. Definition. We consider the same data as in the OLS estimation case: (x_i, y_i), i = 1, ..., N, where x_i = (x_{i1}, ..., x_{ip})^T. The LASSO (Least Absolute Shrinkage and Selection Operator) estimate (\hat\alpha, \hat\beta) is defined by (\hat\alpha, \hat\beta) = \arg\min \sum_{i=1}^{N} \big( y_i - \alpha - \sum_j \beta_j x_{ij} \big)^2 subject to \sum_j |\beta_j| \le t.
- 25. Definition. The amount of shrinkage is controlled by the parameter t \ge 0 applied to the estimates. Let \hat\beta_j^o be the full least squares estimates and let t_0 = \sum_j |\hat\beta_j^o|. Values t < t_0 will shrink the solutions towards 0, setting some coefficients exactly equal to 0. For example, taking t = t_0/2 has an effect roughly similar to finding the best subset of size p/2.
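The sparsity this definition produces can be demonstrated in a few lines. A sketch using scikit-learn (an assumption; note sklearn solves the equivalent Lagrangian form (1/2N)||y − Xb||² + alpha·Σ|b_j| rather than the bound Σ|b_j| ≤ t, with larger alpha corresponding to smaller t):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, p = 50, 8
X = rng.normal(size=(N, p))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0])  # several true zeros
y = X @ beta + rng.normal(size=N)

# penalty value is illustrative; larger alpha = smaller bound t
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # some entries are exactly 0 -- unlike ridge
```

Unlike ridge, the fitted coefficient vector contains exact zeros, so the LASSO does continuous shrinkage and subset selection at once.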
- 27. Motivation for LASSO. The LASSO came from the proposal of Breiman (1993). Breiman's non-negative garotte minimizes \sum_{i=1}^{N} \big( y_i - \alpha - \sum_j c_j \hat\beta_j^o x_{ij} \big)^2 subject to c_j \ge 0, \sum_j c_j \le t.
- 32. Orthonormal design case. Let X be the n × p design matrix with ij-th entry x_{ij}, and suppose X^T X = I. The solution of the previous minimization problem is the soft-thresholded estimate \hat\beta_j = \mathrm{sign}(\hat\beta_j^o)(|\hat\beta_j^o| - \gamma)^+. For comparison: best subset selection (of size k) keeps the k largest coefficients |\hat\beta_j^o|; the ridge regression solutions are \hat\beta_j^o/(1+\gamma); the garotte estimates are (1 - \gamma/\hat\beta_j^{o2})^+ \hat\beta_j^o.
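These three closed forms are one-liners, which makes the qualitative differences easy to inspect. A sketch with illustrative OLS coefficients and threshold (both hypothetical values chosen for the example):

```python
import numpy as np

def soft_threshold(b, g):
    """LASSO in the orthonormal case: sign(b) * (|b| - g)_+"""
    return np.sign(b) * np.maximum(np.abs(b) - g, 0.0)

def ridge_shrink(b, g):
    """Ridge in the orthonormal case: b / (1 + g)"""
    return b / (1.0 + g)

def garotte(b, g):
    """Non-negative garotte: (1 - g / b^2)_+ * b"""
    return np.maximum(1.0 - g / b**2, 0.0) * b

b_ols = np.array([3.0, 1.5, -0.4, 0.2])  # hypothetical OLS estimates
g = 0.5
print(soft_threshold(b_ols, g))  # -> [2.5, 1.0, 0.0, 0.0]: small ones zeroed
print(ridge_shrink(b_ols, g))    # everything shrunk, nothing zeroed
print(garotte(b_ols, g))         # zeroes small ones, shrinks large ones less
```

Soft-thresholding translates every coefficient toward zero by γ and truncates at zero; ridge rescales; the garotte shrinks large coefficients proportionally less.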
- 34. Function forms. [Figure: the shrinkage function forms for (a) subset regression, (b) ridge regression, (c) the LASSO and (d) the garotte.]
- 35. [Figure: estimation picture for (a) the LASSO and (b) ridge regression.]
- 38. Example of prostate cancer. Data examined: from a study by Stamey (1989). A linear model is fit to log(prostate specific antigen), lpsa. The factors: log(cancer volume) lcavol; log(prostate weight) lweight; age; log(benign prostatic hyperplasia amount) lbph; seminal vesicle invasion svi; log(capsular penetration) lcp; Gleason score gleason; percentage of Gleason scores 4 or 5 pgg45.
- 39. Statistics of the example. Estimated coefficients and test error results for different subset and shrinkage methods applied to the prostate data. The blank entries correspond to variables omitted.
- 45. Prediction error and estimation of t. Methods for the estimation of the LASSO parameter t: cross-validation; generalized cross-validation; an analytical unbiased estimate of risk. Strictly speaking, the first two methods are applicable in the 'X-random' case, and the third method applies to the X-fixed case.
- 48. Prediction error and estimation of t. Suppose that Y = \eta(X) + \varepsilon, where E(\varepsilon) = 0 and \mathrm{var}(\varepsilon) = \sigma^2. The mean squared error and prediction error are ME = E\{\hat\eta(X) - \eta(X)\}^2 and PE = E\{Y - \hat\eta(X)\}^2 = ME + \sigma^2.
- 53. Cross-validation. The prediction error (PE) is estimated by fivefold cross-validation. The LASSO is indexed in terms of the normalised parameter s = t / \sum_j |\hat\beta_j^o|, and PE is estimated over a grid of values of s from 0 to 1 inclusive. Create a 5-fold partition of the dataset. For each fold, all but one of the chunks are used for training and the remaining chunk for testing. Repeat 5 times so that each chunk is used once for testing. The value \hat s yielding the lowest estimated PE is selected.
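This procedure is exactly what a penalty-path cross-validator automates. A sketch with scikit-learn's LassoCV (an assumption; it searches over a grid of Lagrangian penalties alpha rather than a grid of s values, but the fivefold logic is the same):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
N, p = 100, 8
X = rng.normal(size=(N, p))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0])
y = X @ beta + rng.normal(size=N)

# five folds, each used once for testing; the penalty with the lowest
# estimated prediction error across folds is selected
cv = LassoCV(cv=5, n_alphas=100).fit(X, y)
print(cv.alpha_)   # selected penalty
print(cv.coef_)    # coefficients refit at the selected penalty
```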
- 54. Generalized cross-validation. The constraint \sum_j |\beta_j| \le t is re-written as \sum_j \beta_j^2/|\beta_j| \le t, so the constrained solution \tilde\beta can be expressed as a ridge regression estimator: \tilde\beta = (X^T X + \lambda W^-)^{-1} X^T y, where W = \mathrm{diag}(|\tilde\beta_j|) and W^- denotes a generalized inverse. The number of effective parameters in the constrained fit \tilde\beta may be approximated by p(t) = \mathrm{tr}\{ X (X^T X + \lambda W^-)^{-1} X^T \}. The generalised cross-validation style statistic is GCV(t) = \frac{1}{N} \frac{RSS(t)}{\{1 - p(t)/N\}^2}.
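The statistic is straightforward to compute once a fitted coefficient vector is available. A sketch (the data, the fitted coefficients, and the choice λ = 1 are all hypothetical; the generalized inverse W^- is handled here by restricting the trace to the nonzero, active coefficients, one common reading of the formula):

```python
import numpy as np

def gcv_statistic(X, y, beta, lam):
    """GCV(t) = (1/N) RSS(t) / {1 - p(t)/N}^2, with
    p(t) = tr{ X (X^T X + lam W^-)^{-1} X^T } and W = diag(|beta_j|),
    computed over the active (nonzero) coefficients."""
    N = len(y)
    active = np.abs(beta) > 1e-12
    Xa = X[:, active]
    A = Xa.T @ Xa + lam * np.diag(1.0 / np.abs(beta[active]))
    p_eff = np.trace(Xa @ np.linalg.solve(A, Xa.T))  # effective parameters
    rss = np.sum((y - X @ beta) ** 2)
    return rss / N / (1.0 - p_eff / N) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([3.0, 1.5, 0.0]) + rng.normal(size=30)
# hypothetical LASSO fit with the third coefficient zeroed out
print(gcv_statistic(X, y, np.array([2.5, 1.0, 0.0]), lam=1.0))
```

In practice one evaluates GCV(t) on a grid of t values and picks the minimizer.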
- 55. Unbiased estimate of risk. This method is based on Stein's (1981) unbiased estimate of risk. Denote the estimated standard error of \hat\beta_j^o by \hat\tau = \hat\sigma/\sqrt{N}, where \hat\sigma^2 = \sum_i (y_i - \hat y_i)^2/(N - p). Then the formula R\{\hat\beta(\gamma)\} \approx \hat\tau^2 \big( p - 2\,\#\{j : |\hat\beta_j^o/\hat\tau| < \gamma\} + \sum_{j=1}^{p} \max(|\hat\beta_j^o/\hat\tau|, \gamma)^2 \big) is derived as an approximately unbiased estimate of the risk. Hence an estimate of \gamma can be obtained as the minimizer of R\{\hat\beta(\gamma)\}: \hat\gamma = \arg\min_{\gamma \ge 0} R\{\hat\beta(\gamma)\}. From this we obtain an estimate of the LASSO parameter t: \hat t = \sum_j (|\hat\beta_j^o| - \hat\gamma)^+.
- 59. Algorithm for finding LASSO solutions. We fix t \ge 0. The minimization of \sum_{i=1}^{N} (y_i - \sum_j \beta_j x_{ij})^2 subject to \sum_j |\beta_j| \le t can be seen as a least squares problem with 2^p inequality constraints (one for each sign vector \delta). Denote by G the m × p matrix corresponding to the m linear inequality constraints on the p-vector \beta; for our problem, m = 2^p. Denote g(\beta) = \sum_{i=1}^{N} (y_i - \sum_j \beta_j x_{ij})^2. The set E is the equality set, corresponding to those constraints which are exactly met.
- 65. Algorithm for finding LASSO solutions. Outline of the algorithm: 1. Start with E = \{i_0\}, where \delta_{i_0} = \mathrm{sign}(\hat\beta^o). 2. Find \hat\beta to minimize g(\beta) subject to G_E \beta \le t\mathbf{1}. 3. While \sum_j |\hat\beta_j| > t: 4. add i to the set E, where \delta_i = \mathrm{sign}(\hat\beta), and find \hat\beta to minimize g(\beta) = \sum_{i=1}^{N} (y_i - \sum_j \beta_j x_{ij})^2 subject to G_E \beta \le t\mathbf{1}. This procedure must always converge in a finite number of steps, since one element is added to the set E at each step and there is a total of 2^p elements.
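The constrained problem the algorithm solves can be checked against a general-purpose solver. A sketch using scipy's SLSQP (an assumption; here the 2^p linear constraints δ^T β ≤ t are summarized by the single equivalent constraint Σ|β_j| ≤ t, and the data and bound t are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, p = 40, 5
X = rng.normal(size=(N, p))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + 0.1 * rng.normal(size=N)
t = 1.5  # bound chosen small enough that the constraint binds

def g(b):
    """Residual sum of squares g(beta)."""
    return np.sum((y - X @ b) ** 2)

# Sum|b_j| <= t stands in for the 2^p constraints delta^T b <= t
cons = {"type": "ineq", "fun": lambda b: t - np.sum(np.abs(b))}
res = minimize(g, x0=np.zeros(p), constraints=cons, method="SLSQP")
print(res.x, np.sum(np.abs(res.x)))  # the L1 norm sits at the bound t
```

The active-set scheme above is more efficient because it only ever enforces the few sign constraints that are actually violated.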
- 72. Least angle regression algorithm (Efron et al., 2004). 1. Standardize the predictors to have mean zero and unit norm. Start with the residual r = y - \bar y and \beta_1 = ... = \beta_p = 0. 2. Find the predictor x_j most correlated with r. 3. Move \beta_j from 0 towards its least-squares coefficient \langle x_j, r \rangle, until some other competitor x_k has as much correlation with the current residual as does x_j. 4. Move \beta_j and \beta_k in the direction defined by their joint least squares coefficient of the current residual on (x_j, x_k), until some other competitor x_l has as much correlation with the current residual. 5. If a non-zero coefficient hits zero, drop its variable from the active set of variables and recompute the current joint least squares direction. 6. Continue in this way until all p predictors have been entered. After min(N − 1, p) steps, we arrive at the full least-squares solution.
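The steps above are implemented in scikit-learn's LARS routines (an assumption of this sketch; the data are illustrative). Passing method="lasso" enables the drop-a-variable modification of step 5, which makes the path the exact LASSO solution path:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
N, p = 100, 8
X = rng.normal(size=(N, p))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0])
y = X @ beta + rng.normal(size=N)

# method="lasso" adds the LASSO modification (step 5 above)
alphas, active, coefs = lars_path(X, y, method="lasso")
print(active)        # order in which predictors enter the model
print(coefs[:, -1])  # end of the path: the full least-squares solution
```

One attraction of LARS is that the whole path costs about as much as a single least-squares fit.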
- 75. Simulation. In the example, 50 data sets, each consisting of 20 observations, were simulated from the model y = \beta^T x + \sigma\varepsilon, where \beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T and \varepsilon is standard normal. [Figure: mean-squared errors over 200 simulations from the model.]
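A small-scale version of this experiment is easy to reproduce. A sketch with scikit-learn (an assumption; σ = 3 and the fixed penalties alpha = 1 stand in for the noise level and the CV-chosen t of the paper, so the numbers are only indicative):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0])
sigma = 3.0
mse = {"ols": [], "ridge": [], "lasso": []}

for _ in range(50):                    # 50 simulated data sets
    X = rng.normal(size=(20, 8))       # 20 observations each
    y = X @ beta + sigma * rng.normal(size=20)
    for name, model in [("ols", LinearRegression()),
                        ("ridge", Ridge(alpha=1.0)),
                        ("lasso", Lasso(alpha=1.0))]:
        b = model.fit(X, y).coef_
        mse[name].append(np.mean((b - beta) ** 2))  # coefficient MSE

for name, vals in mse.items():
    print(name, round(float(np.mean(vals)), 3))
```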
- 76. Simulation. [Tables: the most frequent models selected by the LASSO and the most frequent models selected by subset regression.]
- 78. Conclusions. The LASSO is a worthy competitor to subset selection and ridge regression. Performance in different scenarios: with a small number of large effects, subset selection does best, the LASSO not quite as well, and ridge regression quite poorly; with a small to moderate number of moderate-size effects, the LASSO does best, followed by ridge regression and then subset selection; with a large number of small effects, ridge regression does best, followed by the LASSO and then subset selection.
- 79. References. Tibshirani, R. (1996) Regression Shrinkage and Selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58(1), 267–288. Hastie, T., Tibshirani, R. and Friedman, J. (2008) The Elements of Statistical Learning. Springer-Verlag, 57–73. Das, A. and Kempe, D. Algorithms for Subset Selection in Linear Regression. Wang, Y. (2007) A Note on the LASSO in Model Selection.
- 80. The End.
