Lecture:
Seven sins of regression analysis
Based on the book Naked Statistics by Charles Wheelan
Chaudhary Awais Salman
Doctoral Researcher in Future Energy
Course instructor
School of Business, Society and Engineering
Future Energy – Centre of Excellence
Email: awaissalman@hotmail.com
1. Using regression to analyze a nonlinear relationship
● “Do not use when there is not a linear association between the variables that you are
analyzing.”
● “Remember, a regression coefficient describes the slope of the “line of best fit” for the data; a
line that is not straight will have a different slope in different places.”
● Consider the following hypothetical relationship
between the number of golf lessons that a person
takes during a month and their golf score.
● The first few golf lessons appear to bring the score
down rapidly, which is good.
● But once the person is spending between
$200 and $300 a month on lessons,
the lessons do not seem to have much effect
at all.
The most important point here is that we cannot accurately summarize the
relationship between lessons and scores with a single coefficient.
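The point about a single coefficient can be illustrated with a minimal sketch. The spending and score figures below are made up for illustration (they are not taken from the lecture or the book), and `ols_slope` is a hypothetical helper name. The sketch fits one line to all the data and then separate lines to the early and late parts of the curve.

```python
# Hypothetical data (an assumption, not from the lecture):
# monthly spending on lessons (USD) and average golf score that month.
spend = [0, 50, 100, 150, 200, 250, 300]
score = [100, 92, 86, 82, 80, 80, 80]


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (one explanatory variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


overall_slope = ols_slope(spend, score)        # one line through everything
early_slope = ols_slope(spend[:4], score[:4])  # $0 to $150 a month
late_slope = ols_slope(spend[4:], score[4:])   # $200 to $300 a month

print(f"overall slope: {overall_slope:.4f} strokes per dollar")
print(f"early slope:   {early_slope:.4f} strokes per dollar")
print(f"late slope:    {late_slope:.4f} strokes per dollar")
```

With these made-up numbers, the single overall slope understates the benefit of the first lessons and overstates the benefit of the later ones, which is the sense in which one coefficient cannot summarize a curved relationship.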
2. Correlation does not equal causation.
● We cannot prove with statistics alone that a change in one variable is causing a change in the
other.
● A sloppy regression equation can produce a large and statistically significant association between
two variables that have nothing to do with one another.
● The graph shows a near-perfect correlation
between US spending on science and
suicides by hanging. The R-squared equals
0.99.
● But a causal link between the two does not make any sense.
● Regression analysis can only
demonstrate an association between two
variables.
● It is the scientist’s or analyst’s duty to interpret
the results according to their expertise.
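A small simulation shows how easily a high R-squared can arise between unrelated quantities. This is a sketch under stated assumptions: the two series below are synthetic, not the data behind the graph, and the only thing they share is that both drift upward over time. The names `series_a`, `series_b` and `r_squared` are hypothetical.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(20)

# Two synthetic series generated independently of each other.
# Each has its own upward trend plus its own random noise.
series_a = [10 + 1.0 * t + rng.gauss(0, 0.5) for t in years]
series_b = [500 + 30.0 * t + rng.gauss(0, 15) for t in years]

r2 = r_squared(series_a, series_b)
print(f"R-squared between two unrelated trending series: {r2:.3f}")
```

Neither series has any influence on the other in this sketch, yet the shared time trend alone is enough to produce an R-squared close to 1.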
3. Reverse causality.
● Let’s return to the example from the first sin: predicting golf scores from money spent on golf lessons.
● Suppose the variable for golf lessons turns out to be consistently associated with worse scores.
● The more lessons a person takes, the worse they shoot!
● One explanation could be that they have a really, really bad golf instructor.
● A more plausible explanation is that people tend to take more lessons when they are
playing poorly; bad golf is causing more lessons, not the other way around.
● (There are some simple methodological fixes to a problem of this nature. For example, I might
include golf lessons in one month as an explanatory variable for golf scores in the next month.)
A statistical association between A and B
does not prove that A causes B. In fact, it’s
entirely plausible that B is causing A.
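The lagged-variable fix mentioned in the bullets above can be sketched with a simulation. Everything here is an assumption made for illustration: the coefficients, the noise levels and the names `lessons`, `score` and `ols_slope` are hypothetical. In the simulated world, lessons genuinely help next month's score, but a bad month also prompts more lessons in that same month.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
months = 2000
score = [90.0]   # golf score per month (higher means worse play)
lessons = []     # lesson hours per month

for t in range(months):
    # Reverse channel: a bad month (high score) prompts more lessons that month.
    lessons.append(4 + 0.3 * (score[t] - 90) + rng.gauss(0, 0.5))
    # Forward channel: each lesson hour lowers NEXT month's score by one stroke.
    score.append(95 - 1.0 * lessons[t] + rng.gauss(0, 3))

same_month = ols_slope(lessons, score[:months])  # score in month t on lessons in month t
next_month = ols_slope(lessons, score[1:])       # score in month t+1 on lessons in month t

print(f"same-month slope: {same_month:+.2f}")
print(f"next-month slope: {next_month:+.2f}")
```

In this sketch the same-month regression makes lessons look harmful because it mostly picks up "bad golf causes lessons", while the lagged regression recovers a negative slope close to the effect that was built into the simulation.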
4. Omitted variable bias.
● Regression results will be misleading and inaccurate if the regression equation leaves out an
important explanatory variable, particularly if other variables in the equation “pick up” that effect.
● You see a headline, “Golfers More Prone to Heart Disease, Cancer, and Arthritis!”
● But people play more golf when they get older, particularly in retirement.
● Any analysis that leaves out age as an explanatory variable is going to miss the fact that golfers, on
average, will be older than nongolfers.
● Golf isn’t killing people; old age is killing people.
● And they happen to enjoy playing golf while it does.
● When age is inserted into the regression analysis as a control variable, we will get a different
outcome.
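The golf-and-age story can be reproduced with a short simulation. This is a sketch under assumed numbers: in the simulated data, golf has no effect on illness at all, age drives both golf and illness, and the names `ols_one` and `ols_two` are hypothetical helpers rather than any library's API.

```python
import random


def ols_one(xs, ys):
    """Slope from regressing ys on a single explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def ols_two(x1, x2, ys):
    """Slopes (b1, b2) from regressing ys on two explanatory variables."""
    n = len(ys)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(ys) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (y - my) for a, y in zip(x1, ys))
    s2y = sum((b - m2) * (y - my) for b, y in zip(x2, ys))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(2)
n = 5000
age = [rng.uniform(20, 80) for _ in range(n)]
# Older people play more golf (assumed relationship).
golf = [0.1 * a + rng.gauss(0, 1) for a in age]
# Illness depends on age only; golf has zero true effect by construction.
illness = [0.5 * a + rng.gauss(0, 5) for a in age]

naive_golf = ols_one(golf, illness)              # age left out
golf_coef, age_coef = ols_two(golf, age, illness)  # age included as a control

print(f"golf coefficient without age: {naive_golf:.2f}")
print(f"golf coefficient with age:    {golf_coef:.2f}")
print(f"age coefficient:              {age_coef:.2f}")
```

When age is left out, the golf variable "picks up" the age effect and looks strongly harmful. Once age enters as a control, the golf coefficient falls to roughly zero, matching how the data were generated.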
5. Highly correlated explanatory variables (multicollinearity).
If a regression equation includes two or more explanatory variables that are highly correlated with each
other, the analysis will not necessarily be able to discern the true relationship between each of those
variables and the outcome being explained.
● Assume we are trying to gauge the effect of illegal drug use on SAT scores. Specifically, we
have data on whether the participants in our study have ever used cocaine and also on whether
they have ever used heroin.
● The methodological challenge is that people who have used heroin have likely also used
cocaine.
● If we put both variables in the regression equation, we will have very few individuals who have
used one drug but not the other, which leaves us very little variation in the data with which to
calculate their independent effects.
Similarly, the correlation between a husband’s and a wife’s educational
attainments is so high that we cannot depend on regression analysis to give
us coefficients that meaningfully isolate the effect of either parent’s
education on their child’s SAT scores.
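The parents' education example can be sketched numerically. The data below are simulated under assumed values (both parents' education matter equally, with a true coefficient of 10 each), and the variance inflation factor (VIF) is a common diagnostic that the lecture itself does not mention. The names `simulate`, `ols_two` and `vif` are hypothetical.

```python
import random


def ols_two(x1, x2, ys):
    """Slopes (b1, b2) from regressing ys on two explanatory variables."""
    n = len(ys)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(ys) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (y - my) for a, y in zip(x1, ys))
    s2y = sum((b - m2) * (y - my) for b, y in zip(x2, ys))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(seed, n=200):
    """Simulated families: years of education of each parent and child's score."""
    rng = random.Random(seed)
    edu_a = [rng.gauss(14, 2) for _ in range(n)]
    edu_b = [a + rng.gauss(0, 0.2) for a in edu_a]  # almost identical to edu_a
    sat = [10 * a + 10 * b + rng.gauss(0, 20) for a, b in zip(edu_a, edu_b)]
    return edu_a, edu_b, sat


# Variance inflation factor for the two regressors: 1 / (1 - r^2).
edu_a, edu_b, _ = simulate(seed=0)
n = len(edu_a)
ma, mb = sum(edu_a) / n, sum(edu_b) / n
sab = sum((a - ma) * (b - mb) for a, b in zip(edu_a, edu_b))
saa = sum((a - ma) ** 2 for a in edu_a)
sbb = sum((b - mb) ** 2 for b in edu_b)
vif = 1 / (1 - sab ** 2 / (saa * sbb))
print(f"VIF: {vif:.0f}")

# Refit on three independent samples and compare the individual coefficients.
fits = [ols_two(*simulate(seed)) for seed in (0, 1, 2)]
for b1, b2 in fits:
    print(f"coef A: {b1:6.2f}   coef B: {b2:6.2f}   sum: {b1 + b2:6.2f}")
```

In this sketch the sum of the two coefficients stays close to the built-in total of 20 across samples, while the split between the two parents can differ noticeably from one sample to the next. That is the sense in which the regression cannot isolate either parent's effect.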
6. Extrapolating beyond the data.
● Regression analysis, like all forms of statistical inference, is designed to offer us insights
into the world around us. We seek patterns that will hold true for the larger population.
● However, our results are valid only for a population that is similar to the sample on which
the analysis has been done.
● Consider a regression equation that predicts ice cream sales from the
outside temperature.
[Figure: linear regression of ice cream sales (USD, axis 0 to 800) against temperature (deg F, axis 70 to 90). Fitted line: y = 12.728x - 442.39, R² = 0.8774]
● What will sales be at 120 deg F (about 49 deg C) or higher?
● Sales would likely be zero, because no one would leave
home to buy ice cream at such extreme
temperatures, even though the fitted line predicts ever-higher sales.
● Fun fact: Swedish people like ice cream in winter
● So as a researcher, you are responsible for recognizing the
boundaries of your data and the limits they place on your analysis
https://www.thelocal.se/20160406/nine-bizarre-swedish-eating-habits-that-will-confuse-foreigners
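As a worked illustration, the fitted line from the chart can be evaluated inside and outside the observed temperature range. This sketch assumes the chart's decimal commas are decimal points (slope 12.728, intercept -442.39) and that the data span 70 to 90 deg F, as the axis suggests. The function names are hypothetical.

```python
# Coefficients read from the chart (assumption: decimal commas are decimal points).
SLOPE = 12.728
INTERCEPT = -442.39
FITTED_RANGE_F = (70.0, 90.0)  # temperature range covered by the data (assumed from the axis)


def predict_sales(temp_f):
    """Ice cream sales (USD) predicted by the fitted line."""
    return SLOPE * temp_f + INTERCEPT


def within_fitted_range(temp_f):
    """True if the temperature lies inside the range the line was fitted on."""
    low, high = FITTED_RANGE_F
    return low <= temp_f <= high


for temp in (80, 120):
    note = "inside data range" if within_fitted_range(temp) else "EXTRAPOLATION"
    print(f"{temp} deg F: predicted sales {predict_sales(temp):.2f} USD ({note})")
```

At 120 deg F the line gives 12.728 × 120 - 442.39 ≈ 1085 USD, well above anything on the chart. The equation itself offers no warning that this lies outside the data, so a range check of this kind has to come from the analyst.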
7. Data mining (too many variables).
● If omitting important variables is a potential problem, then presumably adding as many
explanatory variables as possible to a regression equation must be the solution. Nope.
● Your results can be compromised if you include too many variables, particularly extraneous
explanatory variables with no theoretical justification.
● For example, one should not design a research strategy built around the following premise: Since
we don’t know what causes autism, we should put as many potential explanatory variables as
possible in the regression equation just to see what might turn up as statistically significant; then
maybe we’ll get some answers.
● If you put enough junk variables in a regression equation, one of them is bound to meet the
threshold for statistical significance just by chance. The further danger is
that junk variables are not always easily recognized as such.
● Clever researchers can always build a theory after the fact for why some curious variable that is
really just nonsense turns up as statistically significant.
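How often junk variables clear the significance bar can be checked with a simulation. This is a sketch under stated assumptions: the outcome and all 200 candidate variables are pure random noise, each candidate is tested on its own against the outcome, and 2.01 is used as the approximate two-sided 5% critical t value for 48 degrees of freedom.

```python
import math
import random

rng = random.Random(42)
n_obs = 50      # observations per variable
n_junk = 200    # number of junk explanatory variables to try
T_CRIT = 2.01   # approximate two-sided 5% critical value for 48 degrees of freedom


def t_statistic(xs, ys):
    """t statistic for the slope in a one-variable regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


# The outcome is random noise, so no variable can truly explain it.
outcome = [rng.gauss(0, 1) for _ in range(n_obs)]

significant = 0
for _ in range(n_junk):
    junk = [rng.gauss(0, 1) for _ in range(n_obs)]
    if abs(t_statistic(junk, outcome)) > T_CRIT:
        significant += 1

print(f"{significant} of {n_junk} junk variables look significant at the 5% level")
```

With a 5% threshold, roughly one junk variable in twenty is expected to pass by chance alone, so a search over enough candidates will usually turn up something that looks like a finding.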
