Ridge regression, lasso and elastic net

A talk given by Yunting Sun at the NYC Open Data meetup; see more information at www.nycopendata.com or join us at www.meetup.com/nyc-open-data.

  1. Ridge Regression, LASSO and Elastic Net
     A talk given at the NYC open data meetup, 2/13/2014; find more at www.nycopendata.com.
     Yunting Sun, Google Inc
  2. Overview
     · Linear Regression
     · Ordinary Least Square
     · Ridge Regression
     · LASSO
     · Elastic Net
     · Examples
     · Exercises
     Note: make sure that you have installed the elasticnet package.
         library(MASS)
         library(elasticnet)
  3. Linear Regression
     n observations, each with one response variable and p predictors:
         Y = (y1, ..., yn)^T,  X is the n x p matrix with rows xi = (xi1, ..., xip)
     · We want to find a linear combination of the predictors to predict y: use y = x^T beta
     · Examples
       - find the relationship between pressure and water boiling point
         (describe the actual relationship between y and the predictors)
       - use GDP to predict the interest rate (the accuracy of the prediction is
         important but the actual relationship may not matter)
  4. Quality of an estimator
     · Suppose y = x^T beta + epsilon with epsilon ~ N(0, sigma^2).
     · Prediction error at x0, the difference between the actual response and the
       model prediction:
           EPE(x0) = E[(y0 - x0^T beta.hat)^2 | x = x0]
                   = sigma^2 + Bias^2(x0^T beta.hat) + Var(x0^T beta.hat)
       where Bias(x0^T beta.hat) = E(x0^T beta.hat) - x0^T beta.
     · The second and third terms make up the mean squared error of x0^T beta.hat
       in estimating x0^T beta.
     · How to estimate the prediction error?
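The decomposition on this slide can be spelled out step by step, using y0 = x0^T beta + epsilon with epsilon independent of beta.hat; the cross term vanishes because epsilon has mean zero:

```latex
\begin{aligned}
EPE(x_0) &= E\big[(y_0 - x_0^T\hat\beta)^2 \mid x = x_0\big] \\
         &= E\big[(\epsilon + x_0^T\beta - x_0^T\hat\beta)^2\big] \\
         &= \sigma^2 + E\big[(x_0^T\beta - x_0^T\hat\beta)^2\big] \\
         &= \sigma^2 + \big(E[x_0^T\hat\beta] - x_0^T\beta\big)^2
                     + \mathrm{Var}(x_0^T\hat\beta).
\end{aligned}
```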
  5. K-fold Cross Validation
     · Split the dataset into K groups
       - leave one group out as the test set
       - use the remaining K-1 groups as the training set to train the model
       - estimate the prediction error of the model on the test set
  6. K-fold Cross Validation
     Let Ei be the prediction error for the ith test group; the average
     prediction error is
         E.bar = (1/K) * sum_{i=1}^{K} Ei
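The procedure can be sketched in R; the data and fold assignment here are illustrative, not from the slides:

```r
# K-fold CV estimate of prediction error for a simple linear model.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n)
d <- data.frame(x = x, y = y)

K <- 5
fold <- sample(rep(1:K, length = n))  # random assignment of rows to K groups
E <- rep(0, K)
for (k in 1:K) {
    fit <- lm(y ~ x, data = d[fold != k, ])  # train on the other K-1 groups
    test <- d[fold == k, ]
    E[k] <- mean((test$y - predict(fit, newdata = test))^2)
}
mean(E)  # average prediction error over the K folds
```

With noise variance 1, the averaged error should come out close to 1.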
  7. Quality of an estimator
     · Mean squared error of the estimator:
         MSE(beta.hat) = E[(beta.hat - beta)^2] = Bias^2(beta.hat) + Var(beta.hat)
     · A biased estimator may achieve smaller MSE than an unbiased estimator
     · Useful when our goal is to understand the relationship instead of prediction
  8. Least Squares Estimator (LSE)
         Y (n x 1) = X (n x p) beta (p x 1) + epsilon (n x 1)
         yi = sum_{j=1}^{p} xij betaj + epsiloni,  i = 1, ..., n,
         epsiloni i.i.d. N(0, sigma^2)
     Minimize the Residual Sum of Squares (RSS):
         beta.hat = argmin_beta (Y - X beta)^T (Y - X beta)
     The solution is
         beta.hat = (X^T X)^{-1} X^T Y
     and is uniquely well defined when X^T X is invertible (requires n > p).
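The closed form can be checked numerically against `lm`; the toy design below is assumed for illustration:

```r
# Verify beta.hat = (X^T X)^{-1} X^T Y agrees with lm() without intercept.
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
beta <- c(1, -2, 0.5)
Y <- X %*% beta + rnorm(n)

beta.hat <- solve(t(X) %*% X, t(X) %*% Y)  # solve the normal equations
fit <- lm(Y ~ X - 1)                       # same model, no intercept
max(abs(beta.hat - coef(fit)))             # agrees up to numerical error
```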
  9. Pros
     · E(beta.hat) = beta: unbiased
     · LSE has the minimum MSE among unbiased linear estimators, though a biased
       estimator may have smaller MSE than LSE
     · explicit form
     · O(np^2) computation
     · confidence intervals, significance tests for coefficients
 10. Cons
     · Var(beta.hat) = (X^T X)^{-1} sigma^2
     · Multicollinearity leads to high variance of the estimator
       - exact or approximate linear relationship among predictors
       - (X^T X)^{-1} tends to have large entries
     · Requires n > p, i.e., more observations than predictors
     · Estimated prediction error:
           E_{x0} EPE(x0) = sigma^2 (p/n) + sigma^2
       so the prediction error increases linearly as a function of p
     · Hard to interpret when the number of predictors is large; need a smaller
       subset that exhibits the strongest effects
 11. Example: Leukemia classification
     · Leukemia data, Golub et al., Science 1999
     · There are 38 training samples and 34 test samples with p = 7129 genes in
       total (p >> n)
     · Xij is the gene expression value for sample i and gene j
     · Sample i either has tumor type AML or ALL
     · We want to select genes relevant to tumor type
       - eliminate the trivial genes
       - grouped selection, as many genes are highly correlated
     · LSE does not work here!
 12. Solution: regularization
     · Instead of minimizing RSS,
           minimize( RSS + lambda x penalty on the parameters ),  lambda >= 0
     · Trade bias for smaller variance: biased estimator when lambda > 0
     · Continuous variable selection (unlike AIC, BIC, subset selection)
     · lambda can be chosen by cross validation
 13. Ridge Regression
         beta.hat.ridge = argmin_beta { ||Y - X beta||_2^2 + lambda ||beta||_2^2 }
                        = (X^T X + lambda I)^{-1} X^T Y
     Pros:
     · works when p >> n
     · handles multicollinearity
     · biased but smaller variance and smaller MSE (Mean Squared Error)
     · explicit solution
     Cons:
     · shrinks coefficients toward zero but cannot produce a parsimonious model
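The explicit solution can be explored directly. In this sketch (toy data, assumed for illustration), lambda = 0 recovers OLS and a larger lambda shrinks the coefficient norm:

```r
# Closed-form ridge estimator as a function of lambda.
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% c(1, -2, 0.5) + rnorm(n)

ridge <- function(lambda)
    solve(t(X) %*% X + lambda * diag(p), t(X) %*% Y)

ols <- solve(t(X) %*% X, t(X) %*% Y)
max(abs(ridge(0) - ols))            # lambda = 0 gives the OLS solution
sum(ridge(10)^2) < sum(ridge(0)^2)  # larger lambda shrinks ||beta||^2
```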
 14. Grouped Selection
     · If two predictors are highly correlated with each other, their estimated
       coefficients will be similar.
     · If some variables are exactly identical, they will have the same
       coefficients.
     Ridge is good for grouped selection but not good for eliminating trivial genes.
 15. Example: Ridge Regression (Collinearity)
     · multicollinearity: x3 = x1 + x2
     · show that ridge regression beats OLS in the multicollinearity case
         library(MASS)
         n = 500
         z = rnorm(n, 0, 1)
         y = z + 0.2 * rnorm(n, 0, 1)
         x1 = z + rnorm(n, 0, 1)
         x2 = z + rnorm(n, 0, 1)
         x3 = x1 + x2
         d = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
 16. OLS
         # OLS fails to calculate the coefficient for x3
         ols.model = lm(y ~ . - 1, d)
         coef(ols.model)
         ##     x1     x2     x3
         ## 0.3053 0.3187     NA
 17. Ridge Regression
         # choose tuning parameter
         ridge.model = lm.ridge(y ~ . - 1, d, lambda = seq(0, 10, 0.1))
         lambda.opt = ridge.model$lambda[which.min(ridge.model$GCV)]
         # ridge regression (shrinks coefficients)
         coef(lm.ridge(y ~ . - 1, d, lambda = lambda.opt))
         ##     x1     x2     x3
         ## 0.1771 0.1092 0.1258
 18. Approximately multicollinear
     · show that ridge regression corrects the coefficient signs and reduces the
       mean squared error
         x3 = x1 + x2 + 0.05 * rnorm(n, 0, 1)
         d = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
         d.train = d[1:400, ]
         d.test = d[401:500, ]
 19. OLS
         ols.train = lm(y ~ . - 1, d.train)
         coef(ols.train)
         ##      x1      x2      x3
         ## -0.3764 -0.3522  0.6839
         # prediction errors
         sum((d.test$y - predict(ols.train, newdata = d.test))^2)
         ## [1] 37.53
 20. Ridge Regression
         # choose tuning parameter for ridge regression
         ridge.train = lm.ridge(y ~ . - 1, d.train, lambda = seq(0, 10, 0.1))
         lambda.opt = ridge.train$lambda[which.min(ridge.train$GCV)]
         ridge.model = lm.ridge(y ~ . - 1, d.train, lambda = lambda.opt)
         coef(ridge.model)  # correct signs
         ##     x1     x2     x3
         ## 0.1713 0.1936 0.1430
         coefs = coef(ridge.model)
         sum((d.test$y - as.matrix(d.test[, -1]) %*% matrix(coefs, 3, 1))^2)
         ## [1] 36.87
 21. LASSO
         beta.hat.lasso = argmin_beta { ||Y - X beta||_2^2 + lambda ||beta||_1 }
     Or equivalently
         min_beta ||Y - X beta||_2^2  s.t.  sum_{j=1}^{p} |betaj| <= t
     Pros
     · allows p >> n
     · enforces sparsity in the parameters
     · as lambda goes to 0 (t goes to infinity), beta.hat goes to the OLS
       solution; as lambda goes to infinity (t goes to 0), beta.hat goes to 0
     · quadratic programming problem; the lars solution requires O(np^2)
 22. Cons
     · if a group of predictors are highly correlated among themselves, LASSO
       tends to pick only one of them and shrink the others to zero
     · cannot do grouped selection; tends to select one variable per group
     LASSO is good for eliminating trivial genes but not good for grouped selection.
 23. LARS algorithm of Efron et al. (2004)
     · stepwise variable selection (least angle regression and shrinkage)
     · a less greedy version of traditional forward selection methods
     · solves the entire lasso solution path efficiently
     · same order of computational effort as a single OLS fit: O(np^2)
 24. LARS Path
         min_beta ||Y - X beta||_2^2  s.t.  ||beta||_1 <= s ||beta.hat.OLS||_1,
         s in [0, 1]
 25. Parsimonious model
         library(MASS)
         n = 20
         # beta is sparse
         beta = matrix(c(3, 1.5, 0, 0, 2, 0, 0, 0), 8, 1)
         p = length(beta)
         rho = 0.3
         corr = matrix(0, p, p)
         for (i in seq(p)) {
             for (j in seq(p)) {
                 corr[i, j] = rho^abs(i - j)
             }
         }
         X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
         y = X %*% beta + 3 * rnorm(n, 0, 1)
         d = as.data.frame(cbind(y, X))
         colnames(d) = c("y", paste0("x", seq(p)))
 26. OLS
         n.sim = 100
         mse = rep(0, n.sim)
         for (i in seq(n.sim)) {
             X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
             y = X %*% beta + 3 * rnorm(n, 0, 1)
             d = as.data.frame(cbind(y, X))
             colnames(d) = c("y", paste0("x", seq(p)))
             # fit OLS without intercept
             ols.model = lm(y ~ . - 1, d)
             mse[i] = sum((coef(ols.model) - beta)^2)
         }
         median(mse)
         ## [1] 6.32
 27. Ridge Regression
         n.sim = 100
         mse = rep(0, n.sim)
         for (i in seq(n.sim)) {
             X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
             y = X %*% beta + 3 * rnorm(n, 0, 1)
             d = as.data.frame(cbind(y, X))
             colnames(d) = c("y", paste0("x", seq(p)))
             ridge.cv = lm.ridge(y ~ . - 1, d, lambda = seq(0, 10, 0.1))
             lambda.opt = ridge.cv$lambda[which.min(ridge.cv$GCV)]
             # fit ridge regression without intercept
             ridge.model = lm.ridge(y ~ . - 1, d, lambda = lambda.opt)
             mse[i] = sum((coef(ridge.model) - beta)^2)
         }
         median(mse)
         ## [1] 4.074
 28. LASSO
         library(elasticnet)
         n.sim = 100
         mse = rep(0, n.sim)
         for (i in seq(n.sim)) {
             X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
             y = X %*% beta + 3 * rnorm(n, 0, 1)
             obj.cv = cv.enet(X, y, lambda = 0, s = seq(0.1, 1, length = 100),
                 plot.it = FALSE, mode = "fraction", trace = FALSE, max.steps = 80)
             s.opt = obj.cv$s[which.min(obj.cv$cv)]
             lasso.model = enet(X, y, lambda = 0, intercept = FALSE)
             coefs = predict(lasso.model, s = s.opt, type = "coefficients",
                 mode = "fraction")
             mse[i] = sum((coefs$coefficients - beta)^2)
         }
         median(mse)
         ## [1] 3.393
 29. Elastic Net
         beta.hat.enet = argmin_beta { (Y - X beta)^T (Y - X beta)
                                       + lambda1 ||beta||_1 + lambda2 ||beta||_2^2 }
     Pros
     · enforces sparsity
     · no limitation on the number of selected variables
     · encourages a grouping effect in the presence of highly correlated predictors
     Cons
     · the naive elastic net suffers from double shrinkage
     Correction:
         beta.hat.enet = (1 + lambda2) beta.hat.naive
 30. LASSO vs Elastic Net
     Construct a data set with grouped effects to show that Elastic Net
     outperforms LASSO in grouped selection.
     · Two independent "hidden" factors z1 and z2; response
           y = z1 + 0.1 z2 + N(0, 1)
       so z1 is the dominant factor and z2 a minor factor.
     · The 6 predictors X = (x1, ..., x6) fall into two correlated groups:
           x1 = z1 + e1,  x2 = -z1 + e2,  x3 = z1 + e3   (group z1)
           x4 = z2 + e4,  x5 = -z2 + e5,  x6 = z2 + e6   (group z2)
       we would like to shrink the minor group (x4, x5, x6) to zero.
 31. Simulated data
         N = 100
         z1 = runif(N, min = 0, max = 20)
         z2 = runif(N, min = 0, max = 20)
         y = z1 + 0.1 * z2 + rnorm(N)
         X = cbind(z1 %*% matrix(c(1, -1, 1), 1, 3),
                   z2 %*% matrix(c(1, -1, 1), 1, 3))
         X = X + matrix(rnorm(N * 6), N, 6)
 32. LASSO path
         library(elasticnet)
         obj.lasso = enet(X, y, lambda = 0)
         plot(obj.lasso, use.color = TRUE)
 33. Elastic Net
         library(elasticnet)
         obj.enet = enet(X, y, lambda = 0.5)
         plot(obj.enet, use.color = TRUE)
 34. How to choose the tuning parameters
     For each lambda in a sequence, find the s that minimizes the CV prediction
     error, then pick the lambda which minimizes the CV prediction error.
         library(elasticnet)
         obj.cv = cv.enet(X, y, lambda = 0.5, s = seq(0, 1, length = 100),
             mode = "fraction", trace = FALSE, max.steps = 80)
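The two-stage search can be sketched as an outer loop over a lambda grid with `cv.enet` doing the inner CV over s; the grid values here are hypothetical, and the grouped-effects data is regenerated so the snippet is self-contained:

```r
library(elasticnet)

# regenerate the grouped-effects data from the simulated-data slide
set.seed(1)
N <- 100
z1 <- runif(N, min = 0, max = 20)
z2 <- runif(N, min = 0, max = 20)
y <- z1 + 0.1 * z2 + rnorm(N)
X <- cbind(z1 %*% matrix(c(1, -1, 1), 1, 3),
           z2 %*% matrix(c(1, -1, 1), 1, 3))
X <- X + matrix(rnorm(N * 6), N, 6)

# for each lambda, find the best s by CV; keep the pair with smallest CV error
lambdas <- c(0, 0.01, 0.1, 0.5, 1)  # hypothetical grid
best <- list(cv = Inf)
for (lam in lambdas) {
    obj <- cv.enet(X, y, lambda = lam, s = seq(0, 1, length = 100),
                   mode = "fraction", trace = FALSE, plot.it = FALSE,
                   max.steps = 80)
    i <- which.min(obj$cv)
    if (obj$cv[i] < best$cv)
        best <- list(lambda = lam, s = obj$s[i], cv = obj$cv[i])
}
c(best$lambda, best$s)  # chosen tuning parameters
```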
 35. Prostate Cancer Example
     · Predictors are eight clinical measures
     · Training set with 67 observations
     · Test set with 30 observations
     · Model fitting and tuning parameter selection by tenfold CV on the training set
     · Compare model performance by prediction mean-squared error on the test data
 36. Compare models
     · medium correlation among predictors; the highest correlation is 0.76
     · elastic net beats LASSO, and ridge regression beats OLS
 37. Summary
     · Ridge Regression
       - good for multicollinearity and grouped selection
       - not good for variable selection
     · LASSO
       - good for variable selection
       - not good for grouped selection among strongly correlated predictors
     · Elastic Net
       - combines the strengths of Ridge Regression and LASSO
     · Regularization
       - trades bias for variance reduction
       - better prediction accuracy
 38. Reference
     Most of the material covered in these slides is adapted from
     · Paper: Regularization and variable selection via the elastic net
     · Slides: http://www.stanford.edu/~hastie/TALKS/enet_talk.pdf
     · The Elements of Statistical Learning
 39. Exercise 1: simulated data
         beta = matrix(c(rep(3, 15), rep(0, 25)), 40, 1)
         sigma = 15
         n = 500
         z1 = matrix(rnorm(n, 0, 1), n, 1)
         z2 = matrix(rnorm(n, 0, 1), n, 1)
         z3 = matrix(rnorm(n, 0, 1), n, 1)
         X1 = z1 %*% matrix(rep(1, 5), 1, 5) + 0.01 * matrix(rnorm(n * 5), n, 5)
         X2 = z2 %*% matrix(rep(1, 5), 1, 5) + 0.01 * matrix(rnorm(n * 5), n, 5)
         X3 = z3 %*% matrix(rep(1, 5), 1, 5) + 0.01 * matrix(rnorm(n * 5), n, 5)
         X4 = matrix(rnorm(n * 25, 0, 1), n, 25)
         X = cbind(X1, X2, X3, X4)
         Y = X %*% beta + sigma * rnorm(n, 0, 1)
         Y.train = Y[1:400]
         X.train = X[1:400, ]
         Y.test = Y[401:500]
         X.test = X[401:500, ]
 40. Questions
     · Fit OLS, LASSO, ridge regression and elastic net to the training data and
       calculate the prediction error on the test data
     · Simulate the data set 100 times and compare the median mean-squared
       errors of those models
 41. Exercise 2: Diabetes
     · x: a matrix with 10 columns
     · y: a numeric vector (442 rows)
     · x2: a matrix with 64 columns
         library(elasticnet)
         data(diabetes)
         colnames(diabetes)
         ## [1] "x"  "y"  "x2"
 42. Questions
     · Fit LASSO and Elastic Net to the data with the optimal tuning parameter
       chosen by cross validation
     · Compare the solution paths of the two methods