Stats Coursework
Part A:
Question 1:
a) Table 1 reports the regression of “nettfa” on “inc”, “age” and “agesq”. The coefficient
on “age” is -1.569052. Read literally, each additional year of age lowers an individual’s
predicted Net Financial Assets by $1,569.05 (“nettfa” is measured in thousands of
dollars), holding all other variables fixed. On its own, this coefficient isn’t
interesting, because “age” cannot change while “agesq” is held fixed: the effect of
“age” on “nettfa” depends on both terms and changes with age.
b) nettfa = -39.7499 + 1.333173 inc - 1.569052 age + 0.0364665 age^2
            (33.15342)  (0.0780732)    (1.606975)     (0.018022)
n = 1494, R^2 = 0.2043
I am not too concerned about the sign of the coefficient on “age”, because the presence
of “agesq” changes how we interpret the effect of age on Net Financial Assets. The
estimated partial effect of age is -1.569052 + 2(0.0364665)age, which is negative only
below a turning point of about 21.5 years. From the age of 22 onwards, each additional
year increases predicted “nettfa”, and by a growing amount.
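The age at which the effect of “age” changes sign can be checked with a short calculation. The sketch below is illustrative only: it uses the two coefficients reported in b), and the variable and function names are my own.

```python
# Coefficients on age and age^2 from the estimated equation in b).
b_age = -1.569052
b_agesq = 0.0364665

def age_effect(age):
    """Partial effect of one more year of age on nettfa (in $1000s)."""
    return b_age + 2 * b_agesq * age

# The effect changes sign where b_age + 2 * b_agesq * age = 0.
turning_point = -b_age / (2 * b_agesq)
print(round(turning_point, 2))  # about 21.51
```

Evaluating age_effect at 21 and at 22 gives a negative and a positive value respectively, which is why the text refers to age 22 as the point after which extra years raise “nettfa”.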
c) The partial effect of “age” on “nettfa” is Θ2 = β2 + 2β3age. At age = 25 this is
Θ2 = β2 + 50β3, so β2 = Θ2 - 50β3. Substituting this into our regression gives an
equation where Θ2 is the coefficient on “age” and β3 is the coefficient on a new
variable, (agesq - 50*age), which I have generated in the software as the variable “c”.
Table 2 displays the results of this regression.
From the regression of “nettfa” on “inc”, “age” and “c”, the coefficient on “age”,
namely the estimate of Θ2, is 0.2542717.
Testing the hypotheses: H0: Θ2 = 0   Ha: Θ2 ≠ 0
We use the two-sided p-value reported by the software and fail to reject the null
hypothesis (H0) if p > 0.05. The p-value for Θ2 is 0.723 > 0.05.
Therefore we fail to reject the null that Θ2 = 0 at the 5% significance level: the
estimated partial effect of “age” on “nettfa” at 25 years is not statistically
different from zero.
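As a consistency check, the estimate of Θ2 should equal the partial effect at age 25 implied by the coefficients in b). A minimal sketch, assuming only the numbers reported in the text (the variable names are illustrative):

```python
b_age = -1.569052     # coefficient on age in b)
b_agesq = 0.0364665   # coefficient on age^2 in b)

# Partial effect of age evaluated at age = 25.
theta2 = b_age + 2 * b_agesq * 25

reported = 0.2542717  # coefficient on age in the regression that uses "c"
print(abs(theta2 - reported) < 1e-5)  # True: they agree up to rounding
```

The reparameterised regression does not change the fit; it only makes the software report the standard error and p-value of Θ2 directly.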
d) Adding the explanatory variable “incsq” to the regression gives the results in
Table 3.
Setting our hypotheses as: H0: βincsq = 0   Ha: βincsq ≠ 0,
we reject the null if the two-sided p-value is below 0.05; otherwise we fail to reject
it and treat “incsq” as insignificant. What we observe in the results is:
p-value = 0.000 < 0.05, therefore we reject the null hypothesis and conclude that
“incsq” is statistically significant at the 5% level.
To check whether the effect of “inc” on “nettfa” is increasing over the whole sample,
we look at the sample values of “inc” and see that the smallest is 10.14. The partial
effect of “inc” on “nettfa” in the model with “incsq” is β1 + 2β2inc. Evaluating it at
the lowest value of “inc” gives Θ3 = β1 + 20.28β2, so β1 = Θ3 - 20.28β2, and
substituting this into the regression isolates Θ3:
nettfa = β0 + Θ3inc + β2(incsq - 20.28*inc) + β3age + β4agesq + u.
The coefficient on “inc” in this regression is the partial effect at the lowest income
in the sample. If Θ3 and β2 are both positive, the effect of “inc” on “nettfa” is
positive at every income level in the sample.
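The substitution can be written out step by step. In the sketch below the first line is the model with “incsq”, with coefficients labelled as in the final equation of the text; that labelling is inferred from the text rather than taken from Table 3.

```latex
\begin{align*}
\text{nettfa} &= \beta_0 + \beta_1\,\text{inc} + \beta_2\,\text{incsq}
                 + \beta_3\,\text{age} + \beta_4\,\text{agesq} + u,\\
\Theta_3 &= \beta_1 + 2\beta_2(10.14) = \beta_1 + 20.28\,\beta_2
  \quad\Longrightarrow\quad \beta_1 = \Theta_3 - 20.28\,\beta_2,\\
\text{nettfa} &= \beta_0 + \Theta_3\,\text{inc}
                 + \beta_2(\text{incsq} - 20.28\,\text{inc})
                 + \beta_3\,\text{age} + \beta_4\,\text{agesq} + u.
\end{align*}
```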
Question 2:
a) nettfa = -20.98499 + 0.7705833 inc + 0.0251267 (age-25)^2 + 2.477927 male + 6.886223 e401k
            (2.472022)   (0.061452)      (0.0025934)            (2.047776)      (2.12275)
n = 2017, R^2 = 0.1279
The coefficient on the explanatory variable “e401k” implies that, holding the other
variables fixed, a single person who is eligible for the 401(k) pension plan is
predicted to hold $6,886.22 more in Net Financial Assets than one who is not eligible.
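The reported standard errors make it easy to see which coefficients are individually significant. The sketch below simply divides each coefficient by its standard error, pairing them in the order in which they appear in the equation; the dictionary layout and names are illustrative, and the ratios rely on the usual (non-robust) standard errors.

```python
# (coefficient, standard error) pairs from the OLS equation in a).
ols = {
    "inc": (0.7705833, 0.061452),
    "(age-25)^2": (0.0251267, 0.0025934),
    "male": (2.477927, 2.047776),
    "e401k": (6.886223, 2.12275),
}

# t-ratio for H0: coefficient = 0.
t_ratios = {name: coef / se for name, (coef, se) in ols.items()}

# nettfa is in $1000s, so the e401k coefficient in dollars is:
e401k_dollars = round(ols["e401k"][0] * 1000, 2)

for name, t in t_ratios.items():
    print(f"{name:12s} t = {t:6.2f}  significant at 5%: {abs(t) > 1.96}")
```

On these numbers “e401k” has a t-ratio above 1.96 while “male” does not.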
b) The results of the regression are shown in Table 4, with “g” representing the variable
[age-25]. To test whether our model satisfies the assumption of homoskedasticity, we
take the squared residuals from the OLS regression and regress them on our explanatory
variables. Our initial hypotheses
H0: Var(u | inc, [age-25]^2, male, e401k) = σ^2   Ha: Var(u | inc, [age-25]^2, male, e401k) ≠ σ^2
can then be written as
H0: δ1 = δ2 = δ3 = δ4 = 0   Ha: δj ≠ 0 for at least one j,
where û^2 = δ0 + δ1inc + δ2(age-25)^2 + δ3male + δ4e401k + v.
We conduct an F-test for overall significance and reject H0 if F > F(4, 2012) = 2.37,
the 5% critical value. Table 5 reports F = 4322.52, so we reject our null hypothesis and
conclude that the variance of the error term depends on the explanatory variables, i.e.
the model is heteroskedastic.
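The F statistic for this test can be computed from the R-squared of the auxiliary regression of the squared residuals on the explanatory variables. The sketch below shows the formula only: the auxiliary R-squared used is a hypothetical value chosen for illustration and is not taken from Table 5, and the function names are my own.

```python
def bp_f_stat(r2_aux, n, k):
    """F statistic for joint significance in the auxiliary regression
    of squared residuals on k regressors, given its R-squared."""
    return (r2_aux / k) / ((1.0 - r2_aux) / (n - k - 1))

def bp_lm_stat(r2_aux, n):
    """LM form of the same test: n times the auxiliary R-squared."""
    return n * r2_aux

n, k = 2017, 4          # sample size and number of regressors from the text
r2_example = 0.05       # hypothetical auxiliary R-squared, NOT from Table 5
f_crit = 2.37           # 5% critical value used in the text

f_stat = bp_f_stat(r2_example, n, k)
print(f_stat > f_crit)  # True means homoskedasticity is rejected
```

With n - k - 1 = 2012 denominator degrees of freedom, even a small auxiliary R-squared produces an F statistic well above the critical value.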
c) Estimating our equation by Least Absolute Deviations (LAD), we obtain:
nettfa = -7.01748 + 0.3274764 inc + 0.0047929 (age-25)^2 + 0.2868279 male + 2.77321 e401k
          (0.9453906) (0.0235015)     (0.0009918)            (0.7831437)      (0.812017)
n = 2017, Pseudo R^2 = 0.0678
The coefficient on “e401k” in this estimation implies that eligibility for a 401(k)
pension plan raises the predicted median of an individual’s net financial assets by
$2,773.21, holding the other variables fixed.
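One reason the LAD and OLS coefficients can differ is that LAD fits the conditional median while OLS fits the conditional mean. The simplest case, a model with only a constant, shows the difference. The data below are hypothetical values chosen to mimic a skewed wealth distribution; they are not drawn from the coursework dataset.

```python
import statistics

def sum_abs_dev(values, c):
    """LAD objective: sum of absolute deviations from c."""
    return sum(abs(v - c) for v in values)

def sum_sq_dev(values, c):
    """OLS objective: sum of squared deviations from c."""
    return sum((v - c) ** 2 for v in values)

# Hypothetical net financial assets in $1000s, with one very large value.
assets = [1.0, 2.0, 3.0, 4.0, 250.0]

mean_fit = statistics.mean(assets)      # minimises the squared deviations
median_fit = statistics.median(assets)  # minimises the absolute deviations

print(mean_fit, median_fit)
```

The single large observation pulls the mean far above the median, which is the kind of sensitivity LAD avoids.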
d) Comparing the OLS estimation with the heteroskedasticity-robust LAD estimation, we
see the same general indication from the data in both cases: eligibility for a 401(k)
pension plan in the U.S. is associated with a higher level of net financial assets.
The LAD estimate ($2,773.21) is, however, less than half the OLS estimate ($6,886.22).
Part B:
Question 1:
a) educ = γ0 + γ1age + γ2agesq + γ3black + γ4othrural + γ5smcity + γ6meduc + γ7feduc + v
In this reduced-form equation for “educ”, all the explanatory variables are exogenous,
so there is no issue with estimating the equation by OLS.
b) To test whether our instrumental variables are relevant, we use the OLS estimation
of the “educ” equation and test the coefficient on each instrument:
H0: γmeduc = 0, γfeduc = 0   Ha: γmeduc ≠ 0 or γfeduc ≠ 0
Each coefficient is tested with a two-tailed t-test at α = 0.05, against the critical
value:
tc = t(0.025, 1129) = 1.96
We obtain: tmeduc = 8.3226 > 1.96 and tfeduc = 8.7055 > 1.96
Therefore we reject the null hypothesis for both instrumental variables and conclude
that “meduc” and “feduc” are relevant instruments for “educ” in Model 1.
c) Model 4, which instruments “educ” with both “meduc” and “feduc”, has two instruments
for one endogenous regressor and is therefore over-identified. The Sargan
over-identification test has as its null hypothesis that the instruments are
uncorrelated with the structural error. With a p-value of 0.809272 we fail to reject
this null, so the test gives no evidence against the validity of “meduc” and “feduc”
as instruments for “educ” in the model.
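With two instruments and one endogenous regressor there is one over-identifying restriction, so the Sargan statistic is compared with a chi-square distribution with 1 degree of freedom. The Sargan statistic itself is not reported in the text, so the sketch below only shows how a p-value is obtained from such a statistic; the function name is my own.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square(1) variable at x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))

n_instruments, n_endogenous = 2, 1
df = n_instruments - n_endogenous  # 1 over-identifying restriction

crit_5pct = 3.841  # 5% critical value of chi-square(1)
print(round(chi2_sf_1df(crit_5pct), 3))  # 0.05
```

A p-value as large as 0.809272 corresponds to a statistic far below 3.841, so the null of valid instruments is not rejected.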
d) Model 3 includes the residuals from Model 2 (“Model 2 Residuals”), represented by v̂.
So in Model 3 we have:
kids = β0 + β1x1 + … + βkxk + δ1v̂
with H0: δ1 = 0   Ha: δ1 ≠ 0
Conducting a two-tailed t-test with α = 0.05, hence tc = 1.96,
we obtain: t = 0.6663 < 1.96
Therefore we fail to reject the null hypothesis. There is no evidence that education is
correlated with the unobservable effects on fertility, so the endogeneity problem does
not appear to affect Model 1.
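The comparison with 1.96 can also be expressed as a p-value. The sketch below uses the standard normal distribution as an approximation to the t distribution, which is an assumption that is reasonable only because the sample is large; the function name is illustrative.

```python
import math

def two_sided_p_normal(t):
    """Two-sided p-value for a t-ratio, using the normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))

t_vhat = 0.6663  # t-ratio on the Model 2 residuals, from the text
p = two_sided_p_normal(t_vhat)
print(p > 0.05)  # True: fail to reject H0
```

The same function applied to the t-ratios of 8.3226 and 8.7055 reported for “meduc” and “feduc” returns values that are essentially zero.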
e) Since we know that the model exhibits homoskedasticity, I would opt to use the
coefficient on education from Model 1, where we have not used any instrumental
variables. The Hausman test in d) gives no evidence that “educ” suffers from the
endogeneity problem, and in that case OLS is more efficient than instrumental-variables
estimation. Even with “feduc” and “meduc” as instruments, we would still be omitting
variables that contribute to “educ”, so the coefficient from Model 4 isn’t any better
than that in Model 1, taking into account our results from the Sargan and Hausman
tests.
