LINEAR MODEL SELECTION AND REGULARIZATION
Chapter 06
Disclaimer: This PPT is adapted from IOM 530: Intro. to Statistical Learning
Outline
• Subset Selection
  • Best Subset Selection
  • Stepwise Selection
  • Choosing the Optimal Model
• Shrinkage Methods
  • Ridge Regression
  • The Lasso
Improving on the Least Squares Regression Estimates?
• We want to improve the linear regression model by replacing the least squares fit, i.e., the coefficient values that minimize the mean squared error (MSE), with some alternative fitting procedure
• There are two reasons we might not want to just use the ordinary least squares (OLS) estimates:
1. Prediction Accuracy
2. Model Interpretability
1. Prediction Accuracy
• The least squares estimates have relatively low bias and low variability, especially when the relationship between Y and X is linear and the number of observations n is much bigger than the number of predictors p (n ≫ p)
• But when n and p are of comparable size (n ≈ p), the least squares fit can have high variance and may result in overfitting and poor predictions on unseen observations
• And when n < p, the variability of the least squares fit increases dramatically, and the variance of these estimates is infinite
2. Model Interpretability
• When we have a large number of variables X in the model, there will generally be many that have little or no effect on Y
• Leaving these variables in the model makes it harder to see the “big picture”, i.e., the effect of the “important variables”
• The model would be easier to interpret if we removed the unimportant variables (i.e., set their coefficients to zero)
Solution
• Subset Selection
  • Identify a subset of the p predictors X that we believe to be related to the response Y, and then fit the model using this subset
  • E.g., best subset selection and stepwise selection
• Shrinkage
  • Involves shrinking the estimated coefficients towards zero
  • This shrinkage reduces the variance
  • Some of the coefficients may shrink to exactly zero, so shrinkage methods can also perform variable selection
  • E.g., ridge regression and the lasso
• Dimension Reduction
  • Involves projecting all p predictors into an M-dimensional space where M < p, and then fitting a linear regression model
  • E.g., Principal Components Regression
6.1 SUBSET SELECTION
6.1.1 Best Subset Selection
• In this approach, we run a linear regression for each possible combination of the X predictors (2^p models in total; a sketch follows below)
• How do we judge which subset is the “best”?
• One simple approach is to take the subset with the smallest RSS or the largest R2
• Unfortunately, one can show that the model that includes all the variables will always have the largest R2 (and smallest RSS)
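A minimal best-subset sketch, assuming scikit-learn and NumPy (the data, sizes, and names below are synthetic placeholders, not from the slides): enumerate every non-empty subset of predictors, fit OLS on each, and keep the lowest-RSS model of each size.

```python
# Hypothetical illustration of best subset selection (not the slides' code).
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 6                 # keep p small: all 2^p subsets get fitted
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)   # only 2 real signals

best = {}                     # subset size -> (RSS, predictor indices)
for k in range(1, p + 1):
    for subset in combinations(range(p), k):
        cols = list(subset)
        fit = LinearRegression().fit(X[:, cols], y)
        rss = float(((y - fit.predict(X[:, cols])) ** 2).sum())
        if k not in best or rss < best[k][0]:
            best[k] = (rss, subset)

for k in sorted(best):
    rss, subset = best[k]
    print(f"best size-{k} model: predictors {subset}, RSS = {rss:.1f}")
```

The inner loop is exactly why the method scales badly: for p = 6 it fits 63 models, but for p = 30 it would fit over a billion.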
Credit Data: R2 vs. Subset Size
• The RSS will always decline, and R2 will always increase, as the number of variables increases, so on their own they are not very useful for choosing the subset size
• The red line tracks the best model for a given number of predictors, according to RSS and R2
Other Measures of Comparison
• To compare different models, we can use other approaches:
  • Adjusted R2
  • AIC (Akaike information criterion)
  • BIC (Bayesian information criterion)
  • Cp (equivalent to AIC for least squares linear regression)
• These methods add a penalty to the RSS for the number of variables (i.e., complexity) in the model; their standard forms are given below
• None are perfect
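For reference, these criteria take the following standard forms for a least squares model with d predictors, where σ̂² is an estimate of the error variance (these formulas come from the ISLR textbook this deck follows, not from the slide itself):

```latex
\begin{aligned}
C_p &= \tfrac{1}{n}\left(\mathrm{RSS} + 2d\hat{\sigma}^2\right), &
\mathrm{AIC} &= \tfrac{1}{n\hat{\sigma}^2}\left(\mathrm{RSS} + 2d\hat{\sigma}^2\right),\\
\mathrm{BIC} &= \tfrac{1}{n}\left(\mathrm{RSS} + \log(n)\,d\hat{\sigma}^2\right), &
\text{Adjusted } R^2 &= 1 - \frac{\mathrm{RSS}/(n-d-1)}{\mathrm{TSS}/(n-1)}.
\end{aligned}
```

Cp, AIC, and BIC all charge the RSS a per-variable penalty (BIC's log(n) exceeds 2 once n > 7, so BIC favors smaller models), which is why these criteria can fall and then rise as variables are added, while RSS only falls.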
Credit Data: Cp, BIC, and Adjusted R2
• A small value of Cp or BIC indicates a low error, and thus a better model
• A large value of the Adjusted R2 indicates a better model
6.1.2 Stepwise Selection
• Best Subset Selection is computationally intensive, especially when we have a large number of predictors (large p)
• More attractive methods:
  • Forward Stepwise Selection: begins with the model containing no predictors, and then adds, one at a time, the predictor that improves the model the most, until no further improvement is possible (a sketch follows below)
  • Backward Stepwise Selection: begins with the model containing all predictors, and then deletes, one at a time, the predictor whose removal improves the model the most, until no further improvement is possible
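A minimal forward-stepwise sketch, again assuming scikit-learn with synthetic placeholder data: starting from the empty model, greedily add the predictor whose inclusion lowers RSS the most. In practice, the best model size along this path would then be chosen by cross-validation or by a criterion such as Cp or BIC.

```python
# Hypothetical illustration of forward stepwise selection.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = 3 * X[:, 2] - X[:, 5] + rng.normal(size=n)

def rss(cols):
    """RSS of the OLS fit using the predictors in cols."""
    fit = LinearRegression().fit(X[:, cols], y)
    return float(((y - fit.predict(X[:, cols])) ** 2).sum())

selected, remaining = [], list(range(p))
while remaining:
    # at step k this tries p - k candidates, versus 2^p for best subset
    scores = {j: rss(selected + [j]) for j in remaining}
    j_best = min(scores, key=scores.get)
    selected.append(j_best)
    remaining.remove(j_best)
    print(f"added predictor {j_best}: model {selected}, RSS = {scores[j_best]:.1f}")
```

The greedy search fits on the order of p² models instead of 2^p, at the cost of not being guaranteed to find the best subset of each size.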
6.2 SHRINKAGE METHODS
6.2.1 Ridge Regression
• Ordinary Least Squares (OLS) estimates the β's by minimizing the residual sum of squares
  $\mathrm{RSS} = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2$
• Ridge Regression uses a slightly different criterion: it minimizes
  $\mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2$
Ridge Regression Adds a Penalty on the β's!
• The effect of this equation is to add a penalty of the form $\lambda \sum_{j=1}^{p} \beta_j^2$, where the tuning parameter λ is a positive value
• This has the effect of “shrinking” large values of the β's towards zero
• It turns out that such a constraint should improve the fit, because shrinking the coefficients can significantly reduce their variance
• Notice that when λ = 0, we get the OLS estimates!
Credit Data: Ridge Regression
• As λ increases, the standardized coefficients shrink towards zero
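A minimal sketch of the pattern in this figure, assuming scikit-learn (which calls the tuning parameter alpha) and synthetic placeholder data: refit ridge regression on standardized predictors over a grid of λ values and watch the coefficients shrink.

```python
# Hypothetical illustration of ridge shrinkage as lambda grows.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=200)
Xs = StandardScaler().fit_transform(X)   # standardize first, as ridge requires

for lam in [0.01, 1.0, 100.0, 10000.0]:
    coef = Ridge(alpha=lam).fit(Xs, y).coef_
    print(f"lambda = {lam:>8}: coefficients = {np.round(coef, 3)}")
# tiny lambda is essentially OLS; large lambda pushes every coefficient
# toward (but never exactly to) zero.
```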
Why can shrinking towards zero be a good thing to do?
• It turns out that the OLS estimates generally have low bias but can be highly variable. In particular, when n and p are of similar size, or when n < p, the OLS estimates will be extremely variable
• The penalty term makes the ridge regression estimates biased, but it can also substantially reduce their variance
• Thus, there is a bias/variance trade-off
Ridge Regression Bias/Variance
• Black: Bias
• Green: Variance
• Purple: MSE
• Increasing λ increases the bias but decreases the variance
Bias/Variance Trade-off
• In general, the ridge regression estimates will be more biased than the OLS ones but will have lower variance
• Ridge regression will work best in situations where the OLS estimates have high variance
Computational Advantages of Ridge Regression
• If p is large, then using the best subset selection approach requires searching through an enormous number of possible models
• With Ridge Regression, for any given λ, we only need to fit one model, and the computations turn out to be very simple (see the closed form below)
• Ridge Regression can even be used when p > n, a situation where OLS fails completely!
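One way to see both claims (a standard result, though not shown on the slide): for centered data and a fixed λ, the ridge estimate has a closed form, a single regularized linear solve:

```latex
\hat{\beta}^{\mathrm{ridge}}_{\lambda} = \left(X^{\top}X + \lambda I\right)^{-1} X^{\top} y
```

Since X^T X + λI is invertible for any λ > 0, even when p > n, the fit is always well defined, whereas the OLS solution (the λ = 0 case) does not exist there.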
6.2.2 The LASSO
• Ridge Regression isn’t perfect
• One significant problem is that the penalty term will never force any of the coefficients to be exactly zero. Thus, the final model will include all p variables, which makes it harder to interpret
• A more modern alternative is the LASSO
• The LASSO works in a similar way to Ridge Regression, except that it uses a different penalty term
LASSO’s Penalty Term
• Ridge Regression minimizes
  $\mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2$
• The LASSO estimates the β's by minimizing
  $\mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$
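A minimal sketch of the consequence of this penalty, assuming scikit-learn with synthetic placeholder data (and noting that sklearn's alpha scalings for Lasso and Ridge are comparable but not identical): the lasso zeroes out coefficients exactly, while ridge merely shrinks them.

```python
# Hypothetical illustration of lasso sparsity vs. ridge shrinkage.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 20))
beta = np.zeros(20)
beta[:3] = [4.0, -3.0, 2.0]            # only 3 of 20 predictors matter
y = X @ beta + rng.normal(size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)
print("lasso coefficients exactly zero:", int((lasso.coef_ == 0).sum()), "of 20")
print("ridge coefficients exactly zero:", int((ridge.coef_ == 0).sum()), "of 20")
```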
What’s the Big Deal?
• This seems like a very similar idea, but there is a big difference
• Using this penalty, it can be shown mathematically that some coefficients end up being set to exactly zero
• With the LASSO, we can produce a model that has high predictive power and is simple to interpret
Credit Data: LASSO
6.2.3 Selecting the Tuning Parameter
• We need to decide on a value for λ
• Select a grid of candidate values, use cross-validation to estimate the test error for each value of λ, and select the value that gives the smallest cross-validation error (a sketch follows below)
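A minimal sketch of this recipe, assuming scikit-learn with synthetic placeholder data: lay out a grid of candidate λ values, estimate the test error of each by 10-fold cross-validation, and keep the λ with the smallest cross-validation error.

```python
# Hypothetical illustration of choosing lambda by cross-validation.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

grid = {"alpha": np.logspace(-3, 1, 30)}     # candidate lambda values
search = GridSearchCV(Lasso(max_iter=10_000), grid,
                      scoring="neg_mean_squared_error", cv=10)
search.fit(X, y)
print("lambda with smallest CV error:", search.best_params_["alpha"])
print("estimated test MSE at that lambda:", -search.best_score_)
```

The same recipe works for ridge regression by swapping in Ridge; the grid, the folds, and the "pick the minimum CV error" rule are unchanged.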