
Ridge regression, lasso and elastic net

This talk was given by Yunting Sun at the NYC open data meetup; see more information at www.nycopendata.com or join us at www.meetup.com/nyc-open-data



  1. Ridge Regression, LASSO and Elastic Net
     A talk given at NYC open data meetup, 2/13/2014. Find more at www.nycopendata.com.
     Yunting Sun, Google Inc.
     file:///Users/ytsun/elasticnet/index.html (1/42)
  2. Overview
     · Linear Regression
     · Ordinary Least Squares
     · Ridge Regression
     · LASSO
     · Elastic Net
     · Examples
     · Exercises
     Note: make sure that you have installed the elasticnet package.
       library(MASS)
       library(elasticnet)
  3. Linear Regression
     n observations, each with one response variable and p predictors:
       $Y = (y_1, \ldots, y_n)^T \in \mathbb{R}^{n \times 1}$, $X = (X_1, \ldots, X_p) \in \mathbb{R}^{n \times p}$, $x = (x_1, \ldots, x_p)$
     · We want to find a linear combination of predictors, $\hat y = x^T \beta$
       - to predict $y$
       - to describe the actual relationship between $x$ and $y$
     · Examples
       - find the relationship between pressure and water boiling point
       - use GDP to predict interest rate (the accuracy of the prediction is important but the actual relationship may not matter)
  4. Quality of an estimator
     · Suppose $y = x^T \beta + \epsilon$, $\epsilon \sim N(0, \sigma^2)$.
     · Prediction error at $x_0$, the difference between the actual response and the model prediction:
       $EPE(x_0) = E[(y_0 - x_0^T \hat\beta)^2 \mid x = x_0] = \sigma^2 + [\mathrm{Bias}(x_0^T \hat\beta)]^2 + \mathrm{Var}(x_0^T \hat\beta)$
       where $\mathrm{Bias}(x_0^T \hat\beta) = E(x_0^T \hat\beta) - x_0^T \beta$.
     · The second and third terms make up the mean squared error of $x_0^T \hat\beta$ in estimating $x_0^T \beta$.
     · How to estimate prediction error?
  5. K-fold Cross Validation
     · Split the dataset into K groups
       - leave one group out as the test set
       - use the remaining K-1 groups as the training set to train the model
       - estimate the prediction error of the model on the test set
  6. K-fold Cross Validation
     Let $E_i$ be the prediction error for the ith test group; the average prediction error is
       $E = \frac{1}{K} \sum_{i=1}^{K} E_i$
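The scheme above can be written out directly in base R. A minimal sketch of K-fold cross validation for a simple linear model (the simulated data, K = 5, and squared-error loss are illustrative choices, not from the slides):

```r
# Minimal K-fold cross validation for a linear model (illustrative sketch)
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n)
d <- data.frame(y = y, x = x)

K <- 5
fold <- sample(rep(1:K, length.out = n))  # random fold assignment
E <- rep(0, K)
for (k in 1:K) {
  train <- d[fold != k, ]                 # K-1 groups for training
  test  <- d[fold == k, ]                 # held-out group for testing
  fit <- lm(y ~ x, data = train)
  E[k] <- mean((test$y - predict(fit, newdata = test))^2)
}
cv.error <- mean(E)                       # average prediction error over folds
```

With noise variance 1, `cv.error` should come out close to 1 here.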
  7. Quality of an estimator
     · Mean squared error of the estimator:
       $MSE(\hat\beta) = E[(\hat\beta - \beta)^2] = [\mathrm{Bias}(\hat\beta)]^2 + \mathrm{Var}(\hat\beta)$
     · A biased estimator may achieve smaller MSE than an unbiased estimator
     · useful when our goal is to understand the relationship instead of prediction
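A small simulation makes the bias-variance tradeoff concrete: shrinking the sample mean toward zero introduces bias but cuts variance enough to lower the MSE when the true mean is small (the shrinkage factor 0.5 and the true mean 0.2 are illustrative choices, not from the slides):

```r
# Biased vs unbiased estimator of a mean mu = 0.2 (illustrative sketch)
set.seed(1)
mu <- 0.2
n.sim <- 10000
unbiased <- rep(0, n.sim)
shrunk   <- rep(0, n.sim)
for (i in 1:n.sim) {
  x <- rnorm(5, mean = mu, sd = 1)
  unbiased[i] <- mean(x)        # unbiased, MSE = Var = 1/5
  shrunk[i]   <- 0.5 * mean(x)  # biased, but Var is quartered
}
mse.unbiased <- mean((unbiased - mu)^2)  # approx 0.20
mse.shrunk   <- mean((shrunk - mu)^2)    # approx 0.25*0.2 + 0.1^2 = 0.06
```

Here the biased estimator wins: its squared bias (0.01) is far smaller than the variance it removes.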
  8. Least Squares Estimator (LSE)
     $Y_{n \times 1} = X_{n \times p} \beta_{p \times 1} + \epsilon_{n \times 1}$, i.e.
       $y_i = \sum_{j=1}^{p} x_{ij} \beta_j + \epsilon_i, \quad i = 1, \ldots, n, \qquad \epsilon_i \ \text{i.i.d.} \ N(0, \sigma^2)$
     Minimize the Residual Sum of Squares (RSS):
       $\hat\beta = \arg\min_\beta (Y - X\beta)^T (Y - X\beta)$
     The solution $\hat\beta = (X^T X)^{-1} X^T Y$ is uniquely well defined when $X^T X$ is invertible (fails when $p > n$).
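The closed form can be checked directly against `lm`. A minimal sketch (the simulated data and coefficient values are illustrative):

```r
# LSE from the normal equations, checked against lm (illustrative sketch)
set.seed(1)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), n, p)
beta <- c(1, -2, 0.5)
Y <- X %*% beta + rnorm(n)

beta.hat <- solve(t(X) %*% X) %*% t(X) %*% Y  # (X^T X)^{-1} X^T Y
fit <- lm(Y ~ X - 1)                          # same model, no intercept
# the two estimates agree up to numerical precision
max(abs(as.vector(beta.hat) - unname(coef(fit))))
```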
  9. Pros
     · $E(\hat\beta) = \beta$, unbiased
     · LSE has the minimum MSE among unbiased linear estimators, though a biased estimator may have smaller MSE than LSE
     · explicit form
     · computation $O(np^2)$
     · confidence intervals, significance of coefficients
  10. Cons
      · $\mathrm{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$; multicollinearity leads to high variance of the estimator
        - exact or approximate linear relationship among predictors
        - $(X^T X)^{-1}$ tends to have large entries
      · Requires n > p, i.e., the number of observations larger than the number of predictors
      · Estimated prediction error $E_{x_0}[EPE(x_0)] = \sigma^2 (p/n) + \sigma^2$ increases linearly as a function of p
      · Hard to interpret when the number of predictors is large; need a smaller subset that exhibits the strongest effects
  11. Example: Leukemia classification
      · Leukemia Data, Golub et al., Science 1999
      · There are 38 training samples and 34 test samples with p = 7129 genes in total (p >> n)
      · $X_{ij}$ is the gene expression value for sample i and gene j
      · Sample i either has tumor type AML or ALL
      · We want to select genes relevant to tumor type
        - eliminate the trivial genes
        - grouped selection, as many genes are highly correlated
      · LSE does not work here!
  12. Solution: regularization
      · Instead of minimizing RSS, minimize $RSS + \lambda \times (\text{penalty on the parameters})$
      · Trade bias for smaller variance: a biased estimator when $\lambda \neq 0$
      · Continuous variable selection (unlike AIC, BIC, subset selection)
      · $\lambda$ can be chosen by cross validation
  13. Ridge Regression
      $\hat\beta^{ridge} = \arg\min_\beta \{\|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2\} = (X^T X + \lambda I)^{-1} X^T Y$
      Pros:
      · works when p >> n
      · handles multicollinearity
      · biased but smaller variance and smaller MSE (Mean Squared Error)
      · explicit solution
      Cons:
      · shrinks coefficients toward zero but cannot produce a parsimonious model
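The explicit solution is easy to code by hand. A minimal sketch (the data are illustrative; note that `lm.ridge` from MASS standardizes predictors internally, so a hand-rolled estimate will not match `lm.ridge` output exactly — the comparison below is only between λ = 0 and λ > 0):

```r
# Ridge estimator (X^T X + lambda I)^{-1} X^T Y (illustrative sketch)
set.seed(1)
n <- 50
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% rep(1, p) + rnorm(n)

ridge.coef <- function(X, Y, lambda) {
  p <- ncol(X)
  solve(t(X) %*% X + lambda * diag(p)) %*% t(X) %*% Y
}

b.ols   <- ridge.coef(X, Y, 0)   # lambda = 0 recovers OLS
b.ridge <- ridge.coef(X, Y, 10)  # lambda > 0 shrinks toward zero
sum(b.ridge^2) < sum(b.ols^2)    # ridge vector has smaller norm
```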
  14. Grouped Selection
      · if two predictors are highly correlated among themselves, their estimated coefficients will be similar
      · if some variables are exactly identical, they will have the same coefficients
      Ridge is good for grouped selection but not good for eliminating trivial genes.
  15. Example: Ridge Regression (Collinearity)
      · multicollinearity: x3 = x1 + x2
      · show that ridge regression beats OLS in the multicollinearity case
        library(MASS)
        n = 500
        z = rnorm(n, 0, 1)
        y = z + 0.2 * rnorm(n, 0, 1)
        x1 = z + rnorm(n, 0, 1)
        x2 = z + rnorm(n, 0, 1)
        x3 = x1 + x2
        d = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
  16. OLS
        # OLS fails to calculate the coefficient for x3
        ols.model = lm(y ~ . - 1, d)
        coef(ols.model)
        ##     x1     x2     x3
        ## 0.3053 0.3187     NA
  17. Ridge Regression
        # choose tuning parameter
        ridge.model = lm.ridge(y ~ . - 1, d, lambda = seq(0, 10, 0.1))
        lambda.opt = ridge.model$lambda[which.min(ridge.model$GCV)]
        # ridge regression (shrinks coefficients)
        coef(lm.ridge(y ~ . - 1, d, lambda = lambda.opt))
        ##     x1     x2     x3
        ## 0.1771 0.1902 0.1258
  18. Approximately multicollinear
      · show that ridge regression corrects coefficient signs and reduces mean squared error
        x3 = x1 + x2 + 0.05 * rnorm(n, 0, 1)
        d = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
        d.train = d[1:400, ]
        d.test = d[401:500, ]
  19. OLS
        ols.train = lm(y ~ . - 1, d.train)
        coef(ols.train)
        ##      x1      x2      x3
        ## -0.3764 -0.3522  0.6839
        # prediction errors
        sum((d.test$y - predict(ols.train, newdata = d.test))^2)
        ## [1] 37.53
  20. Ridge Regression
        # choose tuning parameter for ridge regression
        ridge.train = lm.ridge(y ~ . - 1, d.train, lambda = seq(0, 10, 0.1))
        lambda.opt = ridge.train$lambda[which.min(ridge.train$GCV)]
        ridge.model = lm.ridge(y ~ . - 1, d.train, lambda = lambda.opt)
        coef(ridge.model)  # correct signs
        ##     x1     x2     x3
        ## 0.1713 0.1936 0.1340
        coefs = coef(ridge.model)
        sum((d.test$y - as.matrix(d.test[, -1]) %*% matrix(coefs, 3, 1))^2)
        ## [1] 36.87
  21. LASSO
      $\hat\beta^{lasso} = \arg\min_\beta \{\|Y - X\beta\|_2^2 + \lambda \|\beta\|_1\}$
      or equivalently
      $\min_\beta \|Y - X\beta\|_2^2 \quad \text{s.t.} \quad \sum_{j=1}^{p} |\beta_j| \le t$
      Pros
      · allows p >> n
      · enforces sparsity in parameters
      · as $\lambda$ goes to 0 (t goes to $\infty$), $\hat\beta$ goes to the OLS solution; as $\lambda$ goes to $\infty$ (t goes to 0), $\hat\beta$ goes to 0
      · quadratic programming problem; the LARS solution requires $O(np^2)$
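In the special case of an orthonormal design ($X^T X = I$), the lasso solution has a closed form — soft thresholding of the OLS coefficients — which makes the sparsity and continuous shrinkage explicit. A sketch of this standard special case (not shown in the slides; the coefficient values are hypothetical):

```r
# Lasso with orthonormal X: beta_j = sign(b_j) * max(|b_j| - lambda/2, 0),
# where b = X^T Y is the OLS estimate (illustrative sketch)
soft.threshold <- function(b, lambda) {
  sign(b) * pmax(abs(b) - lambda / 2, 0)
}

b.ols <- c(3, 1.5, 0.4, -0.2)  # hypothetical OLS coefficients
soft.threshold(b.ols, 1)       # small coefficients set exactly to 0
# [1] 2.5 1.0 0.0 0.0
```

Large coefficients are shrunk by a constant, while coefficients below the threshold are zeroed out — exactly the "continuous variable selection" behavior described earlier.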
  22. Cons
      · if a group of predictors is highly correlated among themselves, LASSO tends to pick only one of them and shrink the others to zero
      · cannot do grouped selection; tends to select one variable per group
      LASSO is good for eliminating trivial genes but not good for grouped selection.
  23. LARS algorithm of Efron et al. (2004)
      · stepwise variable selection (least angle regression and shrinkage)
      · less greedy version of traditional forward selection methods
      · solves the entire lasso solution path efficiently
      · same order of computational effort as a single OLS fit, $O(np^2)$
  24. LARS Path
      $\min_\beta \|Y - X\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \le s \, \|\hat\beta^{OLS}\|_1, \quad s \in [0, 1]$
  25. Parsimonious model
        library(MASS)
        n = 20
        # beta is sparse
        beta = matrix(c(3, 1.5, 0, 0, 2, 0, 0, 0), 8, 1)
        p = length(beta)
        rho = 0.3
        corr = matrix(0, p, p)
        for (i in seq(p)) {
          for (j in seq(p)) {
            corr[i, j] = rho^abs(i - j)
          }
        }
        X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
        y = X %*% beta + 3 * rnorm(n, 0, 1)
        d = as.data.frame(cbind(y, X))
        colnames(d) = c("y", paste0("x", seq(p)))
  26. OLS
        n.sim = 100
        mse = rep(0, n.sim)
        for (i in seq(n.sim)) {
          X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
          y = X %*% beta + 3 * rnorm(n, 0, 1)
          d = as.data.frame(cbind(y, X))
          colnames(d) = c("y", paste0("x", seq(p)))
          # fit OLS without intercept
          ols.model = lm(y ~ . - 1, d)
          mse[i] = sum((coef(ols.model) - beta)^2)
        }
        median(mse)
        ## [1] 6.32
  27. Ridge Regression
        n.sim = 100
        mse = rep(0, n.sim)
        for (i in seq(n.sim)) {
          X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
          y = X %*% beta + 3 * rnorm(n, 0, 1)
          d = as.data.frame(cbind(y, X))
          colnames(d) = c("y", paste0("x", seq(p)))
          ridge.cv = lm.ridge(y ~ . - 1, d, lambda = seq(0, 10, 0.1))
          lambda.opt = ridge.cv$lambda[which.min(ridge.cv$GCV)]
          # fit ridge regression without intercept
          ridge.model = lm.ridge(y ~ . - 1, d, lambda = lambda.opt)
          mse[i] = sum((coef(ridge.model) - beta)^2)
        }
        median(mse)
        ## [1] 4.074
  28. LASSO
        library(elasticnet)
        n.sim = 100
        mse = rep(0, n.sim)
        for (i in seq(n.sim)) {
          X = mvrnorm(n, mu = rep(0, p), Sigma = corr)
          y = X %*% beta + 3 * rnorm(n, 0, 1)
          obj.cv = cv.enet(X, y, lambda = 0, s = seq(0.1, 1, length = 100), plot.it = FALSE,
                           mode = "fraction", trace = FALSE, max.steps = 80)
          s.opt = obj.cv$s[which.min(obj.cv$cv)]
          lasso.model = enet(X, y, lambda = 0, intercept = FALSE)
          coefs = predict(lasso.model, s = s.opt, type = "coefficients", mode = "fraction")
          mse[i] = sum((coefs$coefficients - beta)^2)
        }
        median(mse)
        ## [1] 3.393
  29. Elastic Net
      $\hat\beta^{enet} = \arg\min_\beta \{(Y - X\beta)^T (Y - X\beta) + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1\}$
      Pros
      · enforces sparsity
      · no limitation on the number of selected variables
      · encourages a grouping effect in the presence of highly correlated predictors
      Cons
      · the naive elastic net suffers from double shrinkage
      Correction: $\hat\beta^{enet} = (1 + \lambda_2) \hat\beta^{naive}$
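The correction step is a simple rescaling, which the `enet` function applies internally. A hand-rolled sketch of just that step (the naive coefficient values here are hypothetical, not a real fit):

```r
# Correcting double shrinkage (illustrative sketch): the elastic net estimate
# rescales the naive solution by (1 + lambda2) to undo the extra ridge shrinkage
lambda2 <- 1
beta.naive <- c(1.2, 0, 0.8, 0)          # hypothetical naive elastic net fit
beta.enet <- (1 + lambda2) * beta.naive  # corrected elastic net estimate
beta.enet
# [1] 2.4 0.0 1.6 0.0
```

Note the zeros stay zero: the correction restores magnitude without affecting which variables are selected.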
  30. LASSO vs Elastic Net
      Construct a data set with grouped effects to show that Elastic Net outperforms LASSO in grouped selection.
      · Two independent "hidden" factors $z_1$ and $z_2$; the response is $y = z_1 + 0.1 z_2 + N(0, 1)$, so $z_1$ is the dominant factor and $z_2$ a minor factor
      · 6 predictors $X = (x_1, \ldots, x_6)$ fall into two correlated groups:
        $x_1 = z_1 + \epsilon_1, \quad x_2 = -z_1 + \epsilon_2, \quad x_3 = z_1 + \epsilon_3$
        $x_4 = z_2 + \epsilon_4, \quad x_5 = -z_2 + \epsilon_5, \quad x_6 = z_2 + \epsilon_6$
      · we would like to shrink the minor-factor group $(x_4, x_5, x_6)$ to zero
  31. Simulated data
        N = 100
        z1 = runif(N, min = 0, max = 20)
        z2 = runif(N, min = 0, max = 20)
        y = z1 + 0.1 * z2 + rnorm(N)
        X = cbind(z1 %*% matrix(c(1, -1, 1), 1, 3), z2 %*% matrix(c(1, -1, 1), 1, 3))
        X = X + matrix(rnorm(N * 6), N, 6)
  32. LASSO path
        library(elasticnet)
        obj.lasso = enet(X, y, lambda = 0)
        plot(obj.lasso, use.color = TRUE)
  33. Elastic Net
        library(elasticnet)
        obj.enet = enet(X, y, lambda = 0.5)
        plot(obj.enet, use.color = TRUE)
  34. How to choose tuning parameters
      For each $\lambda$ in a sequence, find the $s$ that minimizes the CV prediction error, and then pick the $\lambda$ which minimizes the CV prediction error.
        library(elasticnet)
        obj.cv = cv.enet(X, y, lambda = 0.5, s = seq(0, 1, length = 100), mode = "fraction",
                         trace = FALSE, max.steps = 80)
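The two-level search described above can be written as a loop over a grid of $\lambda$ values, running `cv.enet` once per value and keeping the $(\lambda, s)$ pair with the smallest CV error. A sketch (the grid values are illustrative choices; the data generation repeats the earlier simulated-data slide):

```r
library(elasticnet)

# simulated data as on the earlier slide
set.seed(1)
N <- 100
z1 <- runif(N, min = 0, max = 20)
z2 <- runif(N, min = 0, max = 20)
y <- z1 + 0.1 * z2 + rnorm(N)
X <- cbind(z1 %*% matrix(c(1, -1, 1), 1, 3), z2 %*% matrix(c(1, -1, 1), 1, 3))
X <- X + matrix(rnorm(N * 6), N, 6)

lambda.grid <- c(0, 0.01, 0.1, 1, 10)  # illustrative grid
best <- list(cv = Inf)
for (lambda in lambda.grid) {
  obj.cv <- cv.enet(X, y, lambda = lambda, s = seq(0, 1, length = 100),
                    mode = "fraction", trace = FALSE, max.steps = 80,
                    plot.it = FALSE)
  i <- which.min(obj.cv$cv)              # best s for this lambda
  if (obj.cv$cv[i] < best$cv)
    best <- list(lambda = lambda, s = obj.cv$s[i], cv = obj.cv$cv[i])
}
best$lambda
best$s
```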
  35. Prostate Cancer Example
      · Predictors are eight clinical measures
      · Training set with 67 observations
      · Test set with 30 observations
      · Model fitting and tuning parameter selection by tenfold CV on the training set
      · Compare model performance by prediction mean-squared error on the test data
  36. Compare models
      · medium correlation among predictors; the highest correlation is 0.76
      · elastic net beats LASSO, and ridge regression beats OLS
  37. Summary
      · Ridge Regression
        - good for multicollinearity, grouped selection
        - not good for variable selection
      · LASSO
        - good for variable selection
        - not good for grouped selection with strongly correlated predictors
      · Elastic Net
        - combines the strengths of Ridge Regression and LASSO
      · Regularization
        - trades bias for variance reduction
        - better prediction accuracy
  38. Reference
      Most of the materials covered in these slides are adapted from
      · Paper: Zou and Hastie, "Regularization and variable selection via the elastic net"
      · Slides: http://www.stanford.edu/~hastie/TALKS/enet_talk.pdf
      · The Elements of Statistical Learning
  39. Exercise 1: simulated data
        beta = matrix(c(rep(3, 15), rep(0, 25)), 40, 1)
        sigma = 15
        n = 500
        z1 = matrix(rnorm(n, 0, 1), n, 1)
        z2 = matrix(rnorm(n, 0, 1), n, 1)
        z3 = matrix(rnorm(n, 0, 1), n, 1)
        X1 = z1 %*% matrix(rep(1, 5), 1, 5) + 0.01 * matrix(rnorm(n * 5), n, 5)
        X2 = z2 %*% matrix(rep(1, 5), 1, 5) + 0.01 * matrix(rnorm(n * 5), n, 5)
        X3 = z3 %*% matrix(rep(1, 5), 1, 5) + 0.01 * matrix(rnorm(n * 5), n, 5)
        X4 = matrix(rnorm(n * 25, 0, 1), n, 25)
        X = cbind(X1, X2, X3, X4)
        Y = X %*% beta + sigma * rnorm(n, 0, 1)
        Y.train = Y[1:400]
        X.train = X[1:400, ]
        Y.test = Y[401:500]
        X.test = X[401:500, ]
  40. Questions:
      · Fit OLS, LASSO, ridge regression and elastic net to the training data and calculate the prediction error on the test data
      · Simulate the data set 100 times and compare the median mean-squared errors for those models
  41. Exercise 2: Diabetes
      · x: a matrix with 10 columns
      · y: a numeric vector (442 rows)
      · x2: a matrix with 64 columns
        library(elasticnet)
        data(diabetes)
        colnames(diabetes)
        ## [1] "x"  "y"  "x2"
  42. Questions
      · Fit LASSO and Elastic Net to the data with the optimal tuning parameter chosen by cross validation
      · Compare the solution paths for the two methods
