Problem set 3
Jonathan Zimmermann
31 October 2015
Exercise 1
Suppose you collect data from a survey on wages, education, experience, and gender. In
addition, you ask for information about marijuana usage. The original question is: “On how
many separate occasions last month did you smoke marijuana?"
a) Write an equation that would allow you to estimate the effects of marijuana usage on wage, while
controlling for other factors. You should be able to make statements such as,“Smoking marijuana five
more times per month is estimated to change wage by x%."
>>
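To make percentage statements of this kind, we need a log-linear model, with log(wage) as the dependent
variable and marijuana usage in levels. The regression equation would look like this:
log(wage) = β0 + β1marijuana_usage + β2education + β3experience + δ1gender + u
Here gender is a dummy variable, and smoking marijuana five more times per month is estimated to change
wage by approximately 100 · 5 · β1 percent.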
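To illustrate how a coefficient from the log-linear model translates into a statement about percentages, here is a minimal sketch. The value of `beta1` is a made-up number chosen for illustration, not an estimate from any data.

```r
# Hypothetical coefficient on marijuana_usage (an assumption, not an estimate).
beta1 <- -0.01
delta_usage <- 5  # five more occasions per month

# Approximate percentage change in wage: 100 * beta1 * delta_usage.
approx_pct <- 100 * beta1 * delta_usage

# Exact percentage change implied by the log-linear model.
exact_pct <- 100 * (exp(beta1 * delta_usage) - 1)

print(c(approximate = approx_pct, exact = exact_pct))
```

For small coefficients the two numbers are close; the approximation becomes less accurate as the product of the coefficient and the change in usage grows.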
b) Write a model that would allow you to test whether drug usage has different effects on wages for men
and women. How would you test that there are no differences in the effects of drug usage for men and
women?
>>
We need to add an interaction term between the gender dummy and the marijuana variable. The new
regression equation would look like this:
log(wage) = β0 + β1marijuana_usage + β2education + β3experience + δ1gender + δ2gender ∗ marijuana_usage + u
To test whether the effect of drug usage differs between men and women, we test the following hypothesis
with a t-test:
H0 : δ2 = 0
H1 : δ2 ≠ 0
To perform the t-test, we first calculate the t-statistic as the estimated interaction coefficient divided by
its standard error:
$$t = \frac{\hat{\delta}_2 - 0}{\operatorname{se}(\hat{\delta}_2)}$$
We then compare it with the critical value given by the (1 − α/2) percentile of the t distribution with
n − k − 1 = n − 6 degrees of freedom (the model has k = 5 regressors plus an intercept). If the absolute
value of the t-statistic is greater than the critical value, we reject H0.
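The test on the interaction coefficient can be sketched in R as follows. The data are simulated stand-ins, since the survey is hypothetical: every variable name and parameter value below is an assumption made for illustration.

```r
set.seed(1)
n <- 500
# Simulated stand-in data; all names and parameter values are hypothetical.
d <- data.frame(
  usage      = rpois(n, 3),
  education  = round(rnorm(n, 13, 2)),
  experience = round(runif(n, 0, 30)),
  gender     = rbinom(n, 1, 0.5)
)
d$lwage <- 1 + 0.08 * d$education + 0.02 * d$experience - 0.01 * d$usage -
  0.10 * d$gender + rnorm(n, sd = 0.4)

# Model from part b): main effects plus the usage-by-gender interaction.
fit <- lm(lwage ~ usage + education + experience + gender + usage:gender, data = d)
est <- coef(summary(fit))["usage:gender", ]

# t statistic = estimate / standard error, compared with the two-sided critical value.
t_stat <- est["Estimate"] / est["Std. Error"]
crit   <- qt(1 - 0.05 / 2, df = df.residual(fit))
reject <- abs(t_stat) > crit

print(c(t = unname(t_stat), critical = crit, reject = unname(reject)))
```

The residual degrees of freedom returned by `df.residual(fit)` equal n − 6 here, because the model estimates five slopes and an intercept.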
c) Suppose you think it is better to measure marijuana usage by putting people into one of four categories:
nonuser, light user (1 to 5 times per month), moderate user (6 to 10 times per month), and heavy
user (more than 10 times per month). Now, write a model that allows you to estimate the effects of
marijuana usage on wage.
>>
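Incorporating this change into the model in a), with nonusers as the base group, we would have:
log(wage) = β0 + β2education + β3experience + δ1gender + δ2light_user + δ3moderate_user + δ4heavy_user + u
Each of δ2, δ3 and δ4 measures the approximate proportional wage difference between that category and
nonusers, and all coefficients can be estimated by running the regression by OLS as usual.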
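As a sketch of how the four categories could be built from the original count variable, the snippet below uses `cut()` on a handful of made-up answers. The vector `usage` and the label names are assumptions for illustration.

```r
# Hypothetical answers to "how many occasions last month?"
usage <- c(0, 1, 5, 6, 10, 11, 30)

# Intervals are right-closed by default: (-Inf,0], (0,5], (5,10], (10,Inf].
category <- cut(usage,
                breaks = c(-Inf, 0, 5, 10, Inf),
                labels = c("nonuser", "light_user", "moderate_user", "heavy_user"))

# One 0/1 dummy per non-base category (nonuser is the omitted base group).
dummies <- model.matrix(~ category)[, -1]

print(data.frame(usage, category))
print(dummies)
```

Because `nonuser` is the first factor level, R treats it as the base group, which matches the model written above.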
d) Using the model in part (c), explain in detail how to test the null hypothesis that marijuana usage has
no effect on wage.
>>
We need to test whether δ2, δ3 and δ4 are jointly significant, using an F-test:
H0 : δ2 = 0 AND δ3 = 0 AND δ4 = 0
H1 : H0 is false
Let’s call the model in c) the “unrestricted model”. The “restricted model” would then be:
log(wage) = β0 + β2education + β3experience + δ1gender + u
We then calculate the F-statistic, using the following formula:
$$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k - 1)}$$
where q = number of restrictions = 3 (because we test three parameters) and k = number of variables in the
unrestricted model = 6.
We would then reject H0 if the F-statistic is higher than the critical value of the F (Fisher) distribution
with d1 = q and d2 = n − k − 1 degrees of freedom.
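The F-test described above can be sketched in R by fitting both models and comparing their sums of squared residuals. The data are simulated stand-ins and all names and parameter values are assumptions, so the numbers themselves carry no meaning.

```r
set.seed(2)
n <- 400
levels_cat <- c("nonuser", "light", "moderate", "heavy")
# Simulated stand-in data; names and parameter values are hypothetical.
d <- data.frame(
  education  = round(rnorm(n, 13, 2)),
  experience = round(runif(n, 0, 30)),
  gender     = rbinom(n, 1, 0.5),
  category   = factor(sample(levels_cat, n, replace = TRUE), levels = levels_cat)
)
d$lwage <- 1 + 0.08 * d$education + 0.02 * d$experience - 0.10 * d$gender +
  rnorm(n, sd = 0.4)

unrestricted <- lm(lwage ~ education + experience + gender + category, data = d)
restricted   <- lm(lwage ~ education + experience + gender, data = d)

ssr_ur <- sum(resid(unrestricted)^2)
ssr_r  <- sum(resid(restricted)^2)
q      <- 3                          # number of restrictions
df_ur  <- df.residual(unrestricted)  # n - k - 1 with k = 6

f_stat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
p_val  <- pf(f_stat, q, df_ur, lower.tail = FALSE)

print(c(F = f_stat, p_value = p_val))
# anova() on the two nested models reports the same F statistic.
print(anova(restricted, unrestricted))
```

In practice `anova(restricted, unrestricted)` is the shorter route; the manual computation is shown only to mirror the formula.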
e) What are some potential problems with drawing causal inference using the survey data that you
collected?
>>
The survey data might have multiple problems that would make it unrepresentative of the population. Two
of the biggest issues are self-selection and social desirability bias. In this study, for example, we could
expect individuals to deliberately (or unconsciously) report lower values than their actual marijuana
consumption, for fear of looking like an addict (social desirability). Other issues might be linked to
the way the data has been collected. For example, if the survey has been conducted in a particular area or at
a particular time of day, the respondents might not be a truly random sample of the population; this
will be the case if the survey is conducted by phone during the day, at times when the active
population is at work (which would result in an overrepresentation of unemployed people, housewives, retired
people, etc.). There are of course many other response biases that could make the data inaccurate, such as
acquiescence bias. Finally, even with accurate data, marijuana usage is not randomly assigned: it may be
correlated with unobserved factors in u that also affect wage, so the estimated coefficients need not be
causal effects.
Exercise 2
** Use the data in nbasal.RData for this exercise. **
a) Estimate a linear regression model relating points per game to experience in the league and position
(guard, forward, or center). Include experience in quadratic form and use centers as the base group.
Report the results (including SRF, the sample size, and R-squared).
>>
load("nbasal.RData")
The regression model is:
points = β0 + β1exper + β2expersq + δ1guard + δ2forward + u
The SRF is:
a = lm(points~exper+expersq+guard+forward,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## 4.76076 1.28067 -0.07184 2.31469 1.54457
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.220 -4.268 -1.003 3.444 22.265
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.76076 1.17862 4.039 7.03e-05 ***
## exper 1.28067 0.32853 3.898 0.000123 ***
## expersq -0.07184 0.02407 -2.985 0.003106 **
## guard 2.31469 1.00036 2.314 0.021444 *
## forward 1.54457 1.00226 1.541 0.124492
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.668 on 264 degrees of freedom
## Multiple R-squared: 0.09098, Adjusted R-squared: 0.07721
## F-statistic: 6.606 on 4 and 264 DF, p-value: 4.426e-05
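Regression results (standard errors in parentheses):
$$\widehat{points} = \underset{(1.17862)}{4.76076} + \underset{(0.32853)}{1.28067}\,exper - \underset{(0.02407)}{0.07184}\,expersq + \underset{(1.00036)}{2.31469}\,guard + \underset{(1.00226)}{1.54457}\,forward$$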
The sample size is:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.09097856
b) Holding experience fixed, does a guard score more than a center? How much more? Is the difference
statistically significant?
>>
Yes, a guard seems to score more than a center. Holding experience and experience squared fixed, a guard
is estimated to score on average 2.31469 (δ1) more points per game.
To check whether this positive effect is statistically significant, we can test the following hypothesis:
H0 : δ1 = 0
H1 : δ1 > 0
The one-sided p-value of δ1 is 0.010722 (the two-sided p-value divided by two, since the estimate is
positive), so the difference is statistically significant at the 5% level and falls just short of
significance at the 1% level.
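The one-sided p-value can be reproduced from the estimate and standard error printed in the regression output, without access to the data set. A minimal sketch, using only numbers copied from part a):

```r
# Values copied from the regression output in part a).
estimate  <- 2.31469
std_error <- 1.00036
df_resid  <- 264

t_stat <- estimate / std_error

# Two-sided p-value, as printed by summary(), and the one-sided version for H1: delta1 > 0.
p_two_sided <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)
p_one_sided <- pt(t_stat, df = df_resid, lower.tail = FALSE)

print(c(t = t_stat, two_sided = p_two_sided, one_sided = p_one_sided))
```

Halving the two-sided p-value is only valid when the estimate has the sign stated in H1, which is the case here.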
c) Now, add marital status to the equation. Holding position and experience fixed, are married players
more productive (based on points per game)?
>>
The new regression model is:
points = β0 + β1exper + β2expersq + δ1guard + δ2forward + δ3marr + u
The SRF is:
a = lm(points~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## 4.70294 1.23326 -0.07037 2.28632 1.54091
## marr
## 0.58427
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.874 -4.227 -1.251 3.631 22.412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.70294 1.18174 3.980 8.93e-05 ***
## exper 1.23326 0.33421 3.690 0.000273 ***
## expersq -0.07037 0.02416 -2.913 0.003892 **
## guard 2.28632 1.00172 2.282 0.023265 *
## forward 1.54091 1.00298 1.536 0.125660
## marr 0.58427 0.74040 0.789 0.430751
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.672 on 263 degrees of freedom
## Multiple R-squared: 0.09313, Adjusted R-squared: 0.07588
## F-statistic: 5.401 on 5 and 263 DF, p-value: 9.526e-05
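Regression results (standard errors in parentheses):
$$\widehat{points} = \underset{(1.18174)}{4.70294} + \underset{(0.33421)}{1.23326}\,exper - \underset{(0.02416)}{0.07037}\,expersq + \underset{(1.00172)}{2.28632}\,guard + \underset{(1.00298)}{1.54091}\,forward + \underset{(0.74040)}{0.58427}\,marr$$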
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.09312579
Yes, married players seem to be more productive than unmarried players in this sample. Holding experience,
experience squared and position fixed, a married player is estimated to score on average 0.58427 (δ3) more
points per game. However, this difference might not be statistically significant.
To check whether the positive effect is statistically significant, we test the following hypothesis:
H0 : δ3 = 0
H1 : δ3 > 0
The one-sided p-value of δ3 is 0.2153757 (the two-sided p-value divided by two), so H0 could only be
rejected at significance levels of about 21.5% or higher. At any conventional level (1%, 5% or 10%), the
effect is not statistically significant.
d) Add interactions of marital status with both experience variables. In this expanded model, is there
strong evidence that marital status affects points per game?
>>
The new regression model is:
points = β0 + β1exper + β2expersq + δ1guard + δ2forward + δ3marr + δ4marr ∗ exper + δ5marr ∗ expersq + u
The SRF is:
a = lm(points~exper+expersq+guard+forward+marr+marr*exper+marr*expersq,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
## marr * exper + marr * expersq, data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## 5.81615 0.70255 -0.02950 2.25079 1.62915
## marr exper:marr expersq:marr
## -2.53750 1.27965 -0.09359
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
## marr * exper + marr * expersq, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.239 -4.328 -1.067 3.742 22.197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.81615 1.34878 4.312 2.29e-05 ***
## exper 0.70255 0.43405 1.619 0.1067
## expersq -0.02950 0.03267 -0.903 0.3674
## guard 2.25079 1.00002 2.251 0.0252 *
## forward 1.62915 1.00199 1.626 0.1052
## marr -2.53750 2.03822 -1.245 0.2143
## exper:marr 1.27965 0.68229 1.876 0.0618 .
## expersq:marr -0.09359 0.04887 -1.915 0.0566 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.654 on 261 degrees of freedom
## Multiple R-squared: 0.1058, Adjusted R-squared: 0.08184
## F-statistic: 4.413 on 7 and 261 DF, p-value: 0.0001188
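Regression results (standard errors in parentheses):
$$\widehat{points} = \underset{(1.34878)}{5.81615} + \underset{(0.43405)}{0.70255}\,exper - \underset{(0.03267)}{0.02950}\,expersq + \underset{(1.00002)}{2.25079}\,guard + \underset{(1.00199)}{1.62915}\,forward - \underset{(2.03822)}{2.53750}\,marr + \underset{(0.68229)}{1.27965}\,exper \cdot marr - \underset{(0.04887)}{0.09359}\,expersq \cdot marr$$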
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.1058214
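This time, we want to test three coefficients at the same time, with a two-sided alternative (we are
interested in an effect in either direction). This is therefore a joint hypothesis test: we want to know
whether, together, all the coefficients that involve marital status have an effect on points:
H0 : δ3 = 0 AND δ4 = 0 AND δ5 = 0
H1 : H0 is false
The individual t-test on δ3 (two-sided p-value 0.2142624) is not enough here, because marital status also
enters through the two interaction terms. The appropriate test is an F-test that compares this model with
the model in a), which excludes all three terms. Using the R-squared values reported above (0.1058214
unrestricted, 0.09097856 restricted), with q = 3 restrictions and n − k − 1 = 261 degrees of freedom:
F = [(0.1058214 − 0.09097856)/3] / [(1 − 0.1058214)/261] ≈ 1.44
This is below the 10% critical value of the F distribution with 3 and 261 degrees of freedom (about 2.1),
so we cannot reject H0. So no, there is no strong evidence that marital status affects points per game.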
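As a sketch, the joint null H0 : δ3 = δ4 = δ5 = 0 can be checked with the R-squared form of the F statistic, using only numbers printed in the outputs above. The restricted model is taken to be the regression from part a), which contains none of the marital-status terms.

```r
# R-squared values copied from the outputs above.
r2_unrestricted <- 0.1058214   # model with marr, exper:marr and expersq:marr
r2_restricted   <- 0.09097856  # model from part a), without any marr terms
q     <- 3    # restrictions under H0
df_ur <- 261  # residual degrees of freedom of the unrestricted model

f_stat <- ((r2_unrestricted - r2_restricted) / q) / ((1 - r2_unrestricted) / df_ur)
p_val  <- pf(f_stat, df1 = q, df2 = df_ur, lower.tail = FALSE)
crit10 <- qf(0.90, df1 = q, df2 = df_ur)

print(c(F = f_stat, p_value = p_val, critical_10pct = crit10))
```

With the data loaded, the same test could be obtained by fitting both models with `lm()` and passing them to `anova()`.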
e) Estimate the model from part (c) but use assists per game as the dependent variable. Are there any
notable differences from part (c)? Discuss.
>>
The new regression model is:
assists = β0 + β1exper + β2expersq + δ1guard + δ2forward + δ3marr + u
The SRF is:
a = lm(assists~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Coefficients:
## (Intercept) exper expersq guard forward
## -0.22581 0.44360 -0.02673 2.49167 0.44747
## marr
## 0.32190
summary(a)
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3127 -1.0780 -0.3157 0.6788 8.2488
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.225809 0.354904 -0.636 0.52516
## exper 0.443603 0.100372 4.420 1.45e-05 ***
## expersq -0.026726 0.007256 -3.683 0.00028 ***
## guard 2.491672 0.300842 8.282 6.19e-15 ***
## forward 0.447471 0.301220 1.486 0.13860
## marr 0.321899 0.222359 1.448 0.14891
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.704 on 263 degrees of freedom
## Multiple R-squared: 0.3499, Adjusted R-squared: 0.3375
## F-statistic: 28.31 on 5 and 263 DF, p-value: < 2.2e-16
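Regression results (standard errors in parentheses):
$$\widehat{assists} = \underset{(0.354904)}{-0.225809} + \underset{(0.100372)}{0.443603}\,exper - \underset{(0.007256)}{0.026726}\,expersq + \underset{(0.300842)}{2.491672}\,guard + \underset{(0.301220)}{0.447471}\,forward + \underset{(0.222359)}{0.321899}\,marr$$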
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.3498759
As we can see, there are some differences compared to c), but nothing major. Except for the intercept,
which changed sign, the direction of all the effects is the same. The intercept, which was highly statistically
significant in c), is no longer statistically significant, and the variable “guard” is now much more significant
than in c). The model also explains much more of the variation in assists than in points (R-squared of
about 0.35 versus 0.09). All the coefficients changed in magnitude, sometimes substantially. Most of these
differences in magnitude are explained by the different scales of “assists” and “points”:
mean(data$assists)
## [1] 2.408922
mean(data$points)
## [1] 10.21041
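One rough way to put the two sets of coefficients on a comparable footing is to express each coefficient as a share of the mean of its dependent variable. This is an illustrative choice rather than part of the exercise; the numbers are copied from the outputs of parts c) and e).

```r
# Means and coefficients copied from the outputs above.
mean_points  <- 10.21041
mean_assists <- 2.408922

coef_points  <- c(guard = 2.28632,  forward = 1.54091,  marr = 0.58427)
coef_assists <- c(guard = 2.491672, forward = 0.447471, marr = 0.321899)

# Each coefficient as a share of the mean of its dependent variable.
relative <- rbind(points  = coef_points / mean_points,
                  assists = coef_assists / mean_assists)

print(round(relative, 3))
```

On this scale the guard effect is much larger for assists than for points, which is consistent with its much smaller p-value in the assists regression.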
一比一原版(YU毕业证)约克大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 

Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperial College London

  • 1. Problem set 3
Jonathan Zimmermann
31 October 2015
Exercise 1
Suppose you collect data from a survey on wages, education, experience, and gender. In addition, you ask for information about marijuana usage. The original question is: “On how many separate occasions last month did you smoke marijuana?"
a) Write an equation that would allow you to estimate the effects of marijuana usage on wage, while controlling for other factors. You should be able to make statements such as, “Smoking marijuana five more times per month is estimated to change wage by x%."
>>
To interpret the coefficients in that way, we need a log-linear model. The regression equation would look like this:
log(wage) = β0 + β1 marijuana_usage + β2 education + β3 experience + δ1 gender + u
Here 100·β1 is the approximate percentage change in wage for one additional occasion, so five more occasions per month are estimated to change wage by roughly 100·5·β1 percent.
b) Write a model that would allow you to test whether drug usage has different effects on wages for men and women. How would you test that there are no differences in the effects of drug usage for men and women?
>>
We would need to add an interaction term between the gender and the marijuana variables. The new regression equation would look like this:
log(wage) = β0 + β1 marijuana_usage + β2 education + β3 experience + δ1 gender + δ2 gender ∗ marijuana_usage + u
To test whether there are differences in the effects of drug usage for men and women, we could test the following hypothesis with a t-test:
H0 : δ2 = 0
H1 : δ2 ≠ 0
To perform the t-test, we would first calculate the t-statistic as the estimated coefficient divided by its standard error:
t = (δ̂2 − 0) / se(δ̂2)
We would then look up the critical value at the (1 − α/2) percentile of the t distribution with n − k − 1 = n − 6 degrees of freedom (k = 5 explanatory variables). If the absolute value of the t-statistic is greater than the critical value, we reject H0.
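As a rough illustration of the interaction test in b), the sketch below fits the model on simulated data and compares the t-statistic of the interaction term with the two-sided critical value. Everything in it is an assumption made for the example: the data are generated at random, the variable names are hypothetical, and gender is coded as a dummy called female (1 = female, 0 = male).

```r
# Simulated data only: none of these numbers come from the survey.
set.seed(1)
n <- 500
female     <- rbinom(n, 1, 0.5)        # hypothetical coding of the gender dummy
marijuana  <- rpois(n, 2)              # occasions of use last month
education  <- round(rnorm(n, 13, 2))
experience <- round(runif(n, 0, 30))
log_wage <- 1 + 0.08 * education + 0.02 * experience - 0.10 * female -
  0.01 * marijuana - 0.02 * female * marijuana + rnorm(n, sd = 0.4)

# Log-linear model with the gender * marijuana interaction
fit <- lm(log_wage ~ marijuana + education + experience + female +
            marijuana:female)

# Row of the coefficient table for the interaction term (delta_2)
est    <- coef(summary(fit))["marijuana:female", ]
t_stat <- est["Estimate"] / est["Std. Error"]

df_resid <- df.residual(fit)           # n - k - 1 with k = 5 regressors
alpha    <- 0.05
crit     <- qt(1 - alpha / 2, df_resid)

reject <- abs(t_stat) > crit           # TRUE means H0: delta_2 = 0 is rejected
cat("t =", t_stat, " critical value =", crit, " reject H0:", reject, "\n")
```

The same decision can be read from the Pr(>|t|) column of summary(fit): H0 is rejected at level alpha exactly when that p-value is below alpha.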
c) Suppose you think it is better to measure marijuana usage by putting people into one of four categories: nonuser, light user (1 to 5 times per month), moderate user (6 to 10 times per month), and heavy user (more than 10 times per month). Now, write a model that allows you to estimate the effects of marijuana usage on wage.
  • 2. >>
Incorporating this change into the model in a), we would have:
log(wage) = β0 + β2 education + β3 experience + δ1 gender + δ2 light_user + δ3 moderate_user + δ4 heavy_user + u
Nonusers are the base group, so δ2, δ3 and δ4 each measure the approximate proportional wage difference relative to nonusers. The coefficients can be estimated by running the regression as usual.
d) Using the model in part (c), explain in detail how to test the null hypothesis that marijuana usage has no effect on wage.
>>
We need to test whether δ2, δ3 and δ4 are jointly significant, using an F-test:
H0 : δ2 = 0 AND δ3 = 0 AND δ4 = 0
H1 : H0 is false
Let’s call the model in c) the “unrestricted model”. The “restricted model” would then be:
log(wage) = β0 + β2 education + β3 experience + δ1 gender + u
We then calculate the F-statistic, using the following formula:
F = [(SSR_restricted − SSR_unrestricted) / q] / [SSR_unrestricted / (n − k − 1)]
where q = number of restrictions = 3 (because we test three parameters) and k = number of explanatory variables in the unrestricted model = 6.
We would then reject H0 if the F-statistic is higher than the critical value of the F distribution with q and n − k − 1 degrees of freedom at the chosen significance level.
e) What are some potential problems with drawing causal inference using the survey data that you collected?
>>
The survey data might have several problems that would make it unrepresentative of the population. Two of the biggest issues are self-selection and social desirability bias. In this study, for example, we could expect individuals to deliberately (or unconsciously) report lower values than their actual marijuana consumption, for fear of looking like an addict (social desirability). Other issues might be linked to the way the data has been collected.
For example, if the survey has been conducted in a particular area or at a particular time of the day, the respondents might not be a truly random sample of the population. This will be the case, for example, if the survey is conducted by phone during the day, at times when the active population is at work (which would result in an overrepresentation of unemployed people, housewives, retired people, etc.). There are of course many other response biases that could make the data inaccurate, such as acquiescence bias.
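The joint F-test described in d) can be sketched in R as follows. This is only an illustration under stated assumptions: the data are simulated (with no true marijuana effect, so H0 holds by construction), and the variable names and category shares are hypothetical. The statistic is computed by hand from the two sums of squared residuals and then compared with the one reported by anova().

```r
# Simulated data only: variable names and category shares are assumptions.
set.seed(2)
n <- 400
education  <- round(rnorm(n, 13, 2))
experience <- round(runif(n, 0, 30))
female     <- rbinom(n, 1, 0.5)
usage <- sample(c("none", "light", "moderate", "heavy"), n,
                replace = TRUE, prob = c(0.60, 0.20, 0.12, 0.08))
light_user    <- as.numeric(usage == "light")
moderate_user <- as.numeric(usage == "moderate")
heavy_user    <- as.numeric(usage == "heavy")

# Wages do not depend on usage here, so H0 is true in this simulation
log_wage <- 1 + 0.08 * education + 0.02 * experience - 0.10 * female +
  rnorm(n, sd = 0.4)

unrestricted <- lm(log_wage ~ education + experience + female +
                     light_user + moderate_user + heavy_user)
restricted   <- lm(log_wage ~ education + experience + female)

ssr_u <- sum(resid(unrestricted)^2)
ssr_r <- sum(resid(restricted)^2)
q <- 3   # number of restrictions
k <- 6   # explanatory variables in the unrestricted model

f_stat  <- ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
crit    <- qf(0.95, q, n - k - 1)                       # 5% critical value
p_value <- pf(f_stat, q, n - k - 1, lower.tail = FALSE)

# The built-in comparison of nested models gives the same statistic
f_anova <- anova(restricted, unrestricted)$F[2]
cat("F =", f_stat, " critical value =", crit, " p-value =", p_value, "\n")
```

Computing the statistic by hand mirrors the formula in d); in practice anova(restricted, unrestricted) alone is enough.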
  • 3. Exercise 2
** Use the data in nbasal.RData for this exercise. **
a) Estimate a linear regression model relating points per game to experience in the league and position (guard, forward, or center). Include experience in quadratic form and use centers as the base group. Report the results (including SRF, the sample size, and R-squared).
>>
load("nbasal.RData")
The regression model is:
points = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + u
The SRF is:
a = lm(points~exper+expersq+guard+forward,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Coefficients:
## (Intercept)        exper      expersq        guard      forward
##     4.76076      1.28067     -0.07184      2.31469      1.54457
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -11.220  -4.268  -1.003   3.444  22.265
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4.76076    1.17862   4.039 7.03e-05 ***
## exper        1.28067    0.32853   3.898 0.000123 ***
## expersq     -0.07184    0.02407  -2.985 0.003106 **
## guard        2.31469    1.00036   2.314 0.021444 *
## forward      1.54457    1.00226   1.541 0.124492
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.668 on 264 degrees of freedom
## Multiple R-squared:  0.09098, Adjusted R-squared:  0.07721
## F-statistic: 6.606 on 4 and 264 DF,  p-value: 4.426e-05
  • 4. Regression results (standard errors in parentheses):
points = 4.76076 (1.17862) + 1.28067 (0.32853) exper − 0.07184 (0.02407) expersq + 2.31469 (1.00036) guard + 1.54457 (1.00226) forward
The sample size is:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.09097856
b) Holding experience fixed, does a guard score more than a center? How much more? Is the difference statistically significant?
>>
Yes, a guard is estimated to score more than a center. Controlling for experience and experience squared, a guard scores on average 2.31469 (δ1) more points per game than a center. To check whether this positive difference is statistically significant, we can test the following hypothesis:
H0 : δ1 = 0
H1 : δ1 > 0
The one-sided p-value of δ1 is 0.010722 (the two-sided p-value of 0.021444 divided by two), so the difference is statistically significant at the 5% level and falls just short of significance at the 1% level.
c) Now, add marital status to the equation. Holding position and experience fixed, are married players more productive (based on points per game)?
>>
The new regression model is:
points = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + δ3 marr + u
The SRF is:
a = lm(points~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Coefficients:
## (Intercept)        exper      expersq        guard      forward
##     4.70294      1.23326     -0.07037      2.28632      1.54091
##        marr
##     0.58427
  • 5. summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -10.874  -4.227  -1.251   3.631  22.412
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4.70294    1.18174   3.980 8.93e-05 ***
## exper        1.23326    0.33421   3.690 0.000273 ***
## expersq     -0.07037    0.02416  -2.913 0.003892 **
## guard        2.28632    1.00172   2.282 0.023265 *
## forward      1.54091    1.00298   1.536 0.125660
## marr         0.58427    0.74040   0.789 0.430751
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.672 on 263 degrees of freedom
## Multiple R-squared:  0.09313, Adjusted R-squared:  0.07588
## F-statistic: 5.401 on 5 and 263 DF,  p-value: 9.526e-05
Regression results (standard errors in parentheses):
points = 4.70294 (1.18174) + 1.23326 (0.33421) exper − 0.07037 (0.02416) expersq + 2.28632 (1.00172) guard + 1.54091 (1.00298) forward + 0.58427 (0.74040) marr
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.09312579
The point estimate suggests that married players are more productive than non-married players. Controlling for experience, experience squared and position, a married player scores on average 0.58427 (δ3) more points per game. However, this difference might not be statistically significant. To check whether there is a statistically significant positive effect, we need to test the following hypothesis:
H0 : δ3 = 0
H1 : δ3 > 0
The one-sided p-value of δ3 is 0.2153757 (the two-sided p-value divided by two), so H0 could only be rejected at significance levels above about 21.5%. At conventional levels (1%, 5% or 10%), the effect of marital status is therefore not statistically significant.
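The one-sided p-value used in c) can be reproduced from the reported estimate and standard error of marr. The sketch below is only a check on the arithmetic: it uses the rounded values printed in the summary output, so the result matches the reported p-value only approximately.

```r
# Values copied from the summary output for marr (rounded as printed)
estimate  <- 0.58427
std_error <- 0.74040
df_resid  <- 263                     # residual degrees of freedom

t_stat <- estimate / std_error

# Two-sided p-value (H1: delta_3 != 0), as reported by summary()
p_two_sided <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

# One-sided p-value for H1: delta_3 > 0
p_one_sided <- pt(t_stat, df_resid, lower.tail = FALSE)

cat("t =", t_stat, " two-sided p =", p_two_sided,
    " one-sided p =", p_one_sided, "\n")
```

Halving the two-sided p-value gives the one-sided p-value only because the estimate has the sign stated in H1; with a negative estimate, the one-sided p-value for H1 : δ3 > 0 would be one minus half the two-sided value.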
  • 6. d) Add interactions of marital status with both experience variables. In this expanded model, is there strong evidence that marital status affects points per game?
>>
The new regression model is:
points = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + δ3 marr + δ4 marr∗exper + δ5 marr∗expersq + u
The SRF is:
a = lm(points~exper+expersq+guard+forward+marr+marr*exper+marr*expersq,data)
a
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
##     marr * exper + marr * expersq, data = data)
##
## Coefficients:
##  (Intercept)         exper       expersq         guard       forward
##      5.81615       0.70255      -0.02950       2.25079       1.62915
##         marr    exper:marr  expersq:marr
##     -2.53750       1.27965      -0.09359
summary(a)
##
## Call:
## lm(formula = points ~ exper + expersq + guard + forward + marr +
##     marr * exper + marr * expersq, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -10.239  -4.328  -1.067   3.742  22.197
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.81615    1.34878   4.312 2.29e-05 ***
## exper         0.70255    0.43405   1.619   0.1067
## expersq      -0.02950    0.03267  -0.903   0.3674
## guard         2.25079    1.00002   2.251   0.0252 *
## forward       1.62915    1.00199   1.626   0.1052
## marr         -2.53750    2.03822  -1.245   0.2143
## exper:marr    1.27965    0.68229   1.876   0.0618 .
## expersq:marr -0.09359    0.04887  -1.915   0.0566 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.654 on 261 degrees of freedom
## Multiple R-squared:  0.1058, Adjusted R-squared:  0.08184
## F-statistic: 4.413 on 7 and 261 DF,  p-value: 0.0001188
  • 7. Regression results (standard errors in parentheses):
points = 5.81615 (1.34878) + 0.70255 (0.43405) exper − 0.02950 (0.03267) expersq + 2.25079 (1.00002) guard + 1.62915 (1.00199) forward − 2.53750 (2.03822) marr + 1.27965 (0.68229) exper∗marr − 0.09359 (0.04887) expersq∗marr
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.1058214
This time, we want a two-sided test (we are interested in an effect in either direction) on three coefficients at the same time. This is therefore a joint hypothesis test: we want to know whether, together, all the terms that involve marital status have an effect on points:
H0 : δ3 = 0 AND δ4 = 0 AND δ5 = 0
H1 : H0 is false
A joint hypothesis calls for an F-test rather than a t-test on δ3 alone. Under H0 the model reduces to the model in a), so the F-statistic can be computed from the two R-squared values reported above:
F = [(0.1058214 − 0.09097856) / 3] / [(1 − 0.1058214) / 261] ≈ 1.44
This is below the 5% critical value of the F distribution with 3 and 261 degrees of freedom (about 2.64), so we cannot reject H0. Individually, the two-sided p-value of δ3 is 0.2142624, and the two interaction terms are significant only at the 10% level (p-values 0.0618 and 0.0566). So no, we cannot say there is strong evidence that marital status affects points per game.
e) Estimate the model from part (c) but use assists per game as the dependent variable. Are there any notable differences from part (c)? Discuss.
>>
The new regression model is:
assists = β0 + β1 exper + β2 expersq + δ1 guard + δ2 forward + δ3 marr + u
The SRF is:
a = lm(assists~exper+expersq+guard+forward+marr,data)
a
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Coefficients:
## (Intercept)        exper      expersq        guard      forward
##    -0.22581      0.44360     -0.02673      2.49167      0.44747
##        marr
##     0.32190
  • 8. summary(a)
##
## Call:
## lm(formula = assists ~ exper + expersq + guard + forward + marr,
##     data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.3127 -1.0780 -0.3157  0.6788  8.2488
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.225809   0.354904  -0.636  0.52516
## exper        0.443603   0.100372   4.420 1.45e-05 ***
## expersq     -0.026726   0.007256  -3.683  0.00028 ***
## guard        2.491672   0.300842   8.282 6.19e-15 ***
## forward      0.447471   0.301220   1.486  0.13860
## marr         0.321899   0.222359   1.448  0.14891
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.704 on 263 degrees of freedom
## Multiple R-squared:  0.3499, Adjusted R-squared:  0.3375
## F-statistic: 28.31 on 5 and 263 DF,  p-value: < 2.2e-16
Regression results (standard errors in parentheses):
assists = −0.225809 (0.354904) + 0.443603 (0.100372) exper − 0.026726 (0.007256) expersq + 2.491672 (0.300842) guard + 0.447471 (0.301220) forward + 0.321899 (0.222359) marr
The sample size is still:
summary(a)$df[2]+length(coef(a)) # = Degrees of freedom + number of coefficients. nrow(data) would have
## [1] 269
The r-squared is:
summary(a)$r.squared
## [1] 0.3498759
As we can see, there are some differences compared to c), but nothing major. Except for the intercept, which changed sign, the direction of all the effects is the same. The intercept, which was highly statistically significant in c), is no longer statistically significant, and the variable “guard” is now much more significant than in c). All the coefficients changed in magnitude, sometimes substantially. Most of these differences in magnitude are explained by the different scales of “assists” and “points”: