Stats Coursework

Part A:
Question 1:
a) As Table 1 shows, regressing “nettfa” on “inc”, “age” and “agesq” gives a coefficient
on “age” of -1.569052. Read in isolation, this says that each additional year of age
lowers an individual’s Net Financial Assets by $1,569.05, holding the other variables
fixed. On its own, however, this coefficient is not informative: “agesq” cannot be held
fixed while “age” changes, so the effect of “age” on “nettfa” depends on both
coefficients and varies with age.
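b) The estimated equation, with standard errors in parentheses, is:
$$\widehat{\text{nettfa}} = \underset{(33.15342)}{-39.7499} + \underset{(0.0780732)}{1.333173}\,\text{inc} - \underset{(1.606975)}{1.569052}\,\text{age} + \underset{(0.018022)}{0.0364665}\,\text{age}^2$$
$$n = 1494, \quad R^2 = 0.2043$$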
I am not too concerned about the negative sign of the coefficient on “age”, because the
presence of “agesq” changes how we interpret the effect of age on Net Financial Assets.
The estimated partial effect of age is -1.569052 + 2(0.0364665)age, which equals zero at
about age 21.5. From roughly age 22 onwards the negative effect has reversed, so each
additional year of age increases the explained variable.
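The age at which the effect of age changes sign can be checked directly from the two reported coefficients. The short sketch below is only an illustration; the names `partial_effect_of_age` and `turning_point` are made up for this example and are not part of the original analysis.

```python
# Coefficients on age and age^2 as reported in part b).
B_AGE = -1.569052
B_AGESQ = 0.0364665

def partial_effect_of_age(age):
    """Estimated change in nettfa for one more year of age, evaluated at `age`."""
    return B_AGE + 2 * B_AGESQ * age

# Age at which the partial effect switches from negative to positive.
turning_point = -B_AGE / (2 * B_AGESQ)

print(f"turning point: {turning_point:.2f} years")
for age in (20, 25, 40, 60):
    print(age, round(partial_effect_of_age(age), 4))
```

Evaluating the function at age 25 gives the quantity that part c) estimates directly as the coefficient on “age” in the reparameterised regression.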
c) Setting age = 25 in Θ2=β2+2β3age gives Θ2=β2+50β3, so β2=Θ2-50β3. Substituting this
into our regression gives an equation in which Θ2 is the coefficient on “age” and β3 is
the coefficient on a new variable, (agesq-50*age), which I have generated in the software
as the variable “c”. Table 2 displays the results of a regression with these coefficients
and explanatory variables.
So we see from the regression of “nettfa” on “inc”, “age” and “c” that the coefficient
on “age”, namely the estimate of Θ2, has a value of 0.2542717.
Testing the hypotheses: H0: Θ2=0 Ha: Θ2≠0
We use the two-sided p-value reported by the software: if p>0.05 we fail to reject the
null hypothesis (H0). From our results, the p-value for Θ2 is 0.723>0.05.
Therefore we fail to reject the null that Θ2=0 at the 5% significance level and conclude
that the partial effect of “age” on “nettfa” at 25 years is not statistically different
from zero.
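The reparameterisation can be illustrated numerically: the coefficient on “age” in the regression that uses c = agesq - 50*age should equal β2 + 50β3 from the original regression. The sketch below uses synthetic data (an assumption, not the coursework sample) and a small hypothetical helper `ols` built on NumPy.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients, with the intercept placed first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic data (assumption): generated only to illustrate the algebra.
rng = np.random.default_rng(0)
n = 500
inc = rng.uniform(10, 100, n)
age = rng.uniform(25, 64, n)
nettfa = -40 + 1.3 * inc - 1.5 * age + 0.036 * age**2 + rng.normal(0, 5, n)

# Original parameterisation: inc, age, age^2.
b = ols(nettfa, inc, age, age**2)

# Reparameterised model: age^2 is replaced by c = age^2 - 50*age.
c = age**2 - 50 * age
t = ols(nettfa, inc, age, c)

theta2_direct = t[2]                # coefficient on age in the new regression
theta2_implied = b[2] + 50 * b[3]   # beta2 + 2*beta3*25 from the original one
print(theta2_direct, theta2_implied)
```

The two numbers agree up to floating-point error. The practical advantage of the reparameterised regression is that the software reports a standard error and p-value for Θ2 directly.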
d) Including the explanatory variable, “incsq” in the regression, we observe the results in
Table 3.

Setting our hypotheses as H0: βincsq=0 Ha: βincsq≠0, we check whether the two-sided
p-value is below 0.05; if it is not, we fail to reject the null and “incsq” is
statistically insignificant. What we observe in the results is a p-value for the
coefficient of 0.000<0.05, so we reject the null hypothesis and conclude that “incsq” is
statistically significant at the 5% level.
To assess whether the effect of “inc” on “nettfa” is strictly increasing, we look at the
sample values of “inc” and see that the smallest value of this variable is 10.14. The
partial effect of “inc” on “nettfa” in the previous model is β1+2β2inc. Evaluating this
at the lowest value of “inc” defines Θ3=β1+20.28β2; substituting β1=Θ3-20.28β2 into the
regression isolates Θ3:
nettfa = β0 + Θ3inc + β2(incsq-20.28*inc) + β3age + β4agesq + u.
The coefficient on “inc” in this regression, Θ3, is the partial effect of “inc” on
“nettfa” at the lowest income in the sample, which is the quantity needed to judge
whether the effect is increasing over the sample range.
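The substitution can be written out step by step. The coefficient numbering (β1 on “inc”, β2 on “incsq”) follows the partial effect stated in the text.

```latex
\begin{aligned}
\text{nettfa} &= \beta_0 + \beta_1\,\text{inc} + \beta_2\,\text{incsq} + \beta_3\,\text{age} + \beta_4\,\text{agesq} + u,\\
\Theta_3 &= \beta_1 + 2\beta_2(10.14) = \beta_1 + 20.28\,\beta_2
\;\Longrightarrow\; \beta_1 = \Theta_3 - 20.28\,\beta_2,\\
\text{nettfa} &= \beta_0 + (\Theta_3 - 20.28\,\beta_2)\,\text{inc} + \beta_2\,\text{incsq} + \beta_3\,\text{age} + \beta_4\,\text{agesq} + u\\
&= \beta_0 + \Theta_3\,\text{inc} + \beta_2\,(\text{incsq} - 20.28\,\text{inc}) + \beta_3\,\text{age} + \beta_4\,\text{agesq} + u.
\end{aligned}
```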
Question 2:
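a) The estimated equation, with standard errors in parentheses, is:
$$\widehat{\text{nettfa}} = \underset{(2.472022)}{-20.98499} + \underset{(0.061452)}{0.7705833}\,\text{inc} + \underset{(0.0025934)}{0.0251267}\,(\text{age}-25)^2 + \underset{(2.047776)}{2.477927}\,\text{male} + \underset{(2.12275)}{6.886223}\,\text{e401k}$$
$$n = 2017, \quad R^2 = 0.1279$$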
Our reported coefficient on the explanatory variable “e401k” implies that, holding
income, age and gender fixed, eligibility for the 401(k) pension plan is associated with
Net Financial Assets that are $6,886.22 higher for someone who is single.
b) The results of the regression are shown in Table 4, with “g” representing the variable
[age-25]. To determine whether our model satisfies the assumption of homoskedasticity,
we take the squared OLS residuals and regress them on our explanatory variables. Our
initial hypotheses,
H0: Var(u | inc, [age-25]², male, e401k)=σ² Ha: Var(u | inc, [age-25]², male, e401k)≠σ²
can then be written as:
H0: δ1=δ2=δ3=δ4=0 Ha: at least one δj≠0
where û² = δ0 + δ1inc + δ2(age-25)² + δ3male + δ4e401k + v
Conducting an F-test for the overall significance of this auxiliary regression, with
rejection criterion F > F4,2012 = 2.37 (the 5% critical value), we see from Table 5 that
F = 4322.52. We therefore reject the null hypothesis of homoskedasticity and conclude
that the variance of the error term depends on the explanatory variables in the model,
i.e. the errors are heteroskedastic.
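The test described above (regress the squared residuals on the explanatory variables and use the F statistic of that auxiliary regression) can be sketched with NumPy alone. The data below are synthetic and deliberately heteroskedastic (an assumption made for illustration, not the coursework sample), and the helper names are hypothetical; the 5% critical value 2.37 is the one quoted in the text.

```python
import numpy as np

def ols_resid_and_r2(y, X):
    """Fit OLS (X already contains a constant); return residuals and R^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return resid, r2

def heteroskedasticity_f_stat(y, X):
    """F statistic from regressing squared OLS residuals on the regressors."""
    n, p = X.shape            # p counts the constant, so there are k = p - 1 slopes
    k = p - 1
    resid, _ = ols_resid_and_r2(y, X)
    _, r2_aux = ols_resid_and_r2(resid**2, X)
    return (r2_aux / k) / ((1.0 - r2_aux) / (n - k - 1))

# Synthetic data (assumption): the error spread grows with income.
rng = np.random.default_rng(1)
n = 2017
inc = rng.uniform(10, 100, n)
age25sq = (rng.uniform(25, 64, n) - 25) ** 2
male = rng.integers(0, 2, n)
e401k = rng.integers(0, 2, n)
u = rng.normal(0, 1, n) * 0.5 * inc
y = -20 + 0.8 * inc + 0.02 * age25sq + 2.5 * male + 7 * e401k + u
X = np.column_stack([np.ones(n), inc, age25sq, male, e401k])

f_stat = heteroskedasticity_f_stat(y, X)
print("reject homoskedasticity:", f_stat > 2.37)
```

With four slope coefficients and n = 2017 the statistic has (4, 2012) degrees of freedom, matching the critical value used in the text.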
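c) Estimating our equation by Least Absolute Deviations (LAD), we obtain (standard errors
in parentheses):
$$\widehat{\text{nettfa}} = \underset{(0.9453906)}{-7.01748} + \underset{(0.0235015)}{0.3274764}\,\text{inc} + \underset{(0.0009918)}{0.0047929}\,(\text{age}-25)^2 + \underset{(0.7831437)}{0.2868279}\,\text{male} + \underset{(0.812017)}{2.77321}\,\text{e401k}$$
$$n = 2017, \quad \text{Pseudo } R^2 = 0.0678$$
Because LAD models the conditional median rather than the conditional mean, the
coefficient on “e401k” obtained in this estimation states that eligibility for a 401(k)
pension plan is associated with an increase of $2,773.21 in an individual’s median net
financial assets.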
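To show why LAD and OLS coefficients can differ, the sketch below approximates LAD by iteratively reweighted least squares. This is one common way to approximate the LAD solution and is not necessarily the algorithm the coursework software uses. The data are synthetic, with a few large outliers added on purpose, and `lad_irls` is a hypothetical helper name.

```python
import numpy as np

def lad_irls(y, X, n_iter=200, eps=1e-6):
    """Approximate LAD coefficients by iteratively reweighted least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # start from the OLS fit
    for _ in range(n_iter):
        resid = y - X @ coef
        # Weight 1/|residual| (smoothed by eps) makes the weighted squared loss
        # behave like an absolute loss; sw is the square root of that weight.
        sw = (resid**2 + eps**2) ** -0.25
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Synthetic data (assumption): true line y = 1 + 2x, plus ten large outliers.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 201)
y = 1 + 2 * x + rng.normal(0, 0.5, x.size)
y[-10:] += 200
X = np.column_stack([np.ones(x.size), x])

ols_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
lad_coef = lad_irls(y, X)
print("OLS slope:", ols_coef[1])
print("LAD slope:", lad_coef[1])
```

The OLS slope is pulled strongly towards the outliers while the LAD slope stays close to the slope of the bulk of the data, which is one reason the two estimators can give quite different coefficients in a variable such as net financial assets.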
d) Comparing the OLS estimation with the LAD estimation, which is less sensitive to
outliers, we can see that both give the same general indication from the data: eligibility
for a 401(k) pension plan in the U.S. is associated with a higher level of net financial
assets. The LAD estimate ($2,773.21) is, however, less than half the OLS estimate
($6,886.22).
Part B:
Question 1:
a) Educ = ϒ0 + ϒ1age + ϒ2agesq + ϒ3black + ϒ4othrural + ϒ5smcity + ϒ6meduc + ϒ7feduc + v
As we see in the specified equation for education, all the explanatory variables for
the reduced form of the variable “educ” are exogenous and so there is no issue with
estimating the equation using the OLS method.
b) To test whether our instrumental variables are relevant, we look at the OLS
estimation of “educ”:
H0: ϒmeduc = ϒfeduc = 0 Ha: ϒmeduc ≠ 0 or ϒfeduc ≠ 0
Testing each coefficient with a two-tailed t-test at α=0.05 against the critical value:
tc = t0.05,1129 = 1.96
We obtain: tmeduc = 8.3226 > 1.96 and tfeduc = 8.7055 > 1.96
Therefore we reject the null hypothesis for both instrumental variables and conclude
that “meduc” and “feduc” are relevant instruments for “educ” in Model 1.
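The text tests each instrument with its own t-test. Because the null hypothesis is a joint one, an alternative is a joint F-test that compares the reduced form with and without the two instruments. The sketch below shows that alternative on synthetic data (an assumption, not the coursework sample); the helper names are hypothetical, and 3.00 is the approximate 5% critical value for 2 and a large number of degrees of freedom.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals (X already contains a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

def joint_f_stat(y, X_restricted, X_full):
    """F statistic for excluding the extra columns of X_full."""
    n, p_full = X_full.shape
    q = p_full - X_restricted.shape[1]     # number of restrictions tested
    ssr_r, ssr_u = ssr(y, X_restricted), ssr(y, X_full)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - p_full))

# Synthetic reduced form (assumption): educ depends on both parents' education.
rng = np.random.default_rng(3)
n = 1000
age = rng.uniform(20, 50, n)
meduc = rng.normal(10, 3, n)
feduc = rng.normal(10, 3, n)
educ = 6 + 0.02 * age + 0.2 * meduc + 0.2 * feduc + rng.normal(0, 2, n)

X_restricted = np.column_stack([np.ones(n), age])
X_full = np.column_stack([np.ones(n), age, meduc, feduc])

f_relevance = joint_f_stat(educ, X_restricted, X_full)
print("instruments jointly relevant:", f_relevance > 3.00)
```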

c) The Sargan over-identification test is carried out on Model 4, which includes the
instrumented “educ”. Using both “meduc” and “feduc” as instruments for the single
endogenous variable “educ” makes the model over-identified, and the null hypothesis of
the test is that all instruments are uncorrelated with the error term. With a p-value of
0.809272 we fail to reject this null, so the test gives no evidence against the validity
of “meduc” and “feduc” as instruments for “educ” in the model.
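The mechanics of an over-identification test of this kind can be sketched as follows: estimate the model by two-stage least squares, regress the 2SLS residuals on all exogenous variables, and compare n times the R² with a chi-squared distribution whose degrees of freedom equal the number of instruments minus the number of endogenous regressors (here 2 - 1 = 1, with 5% critical value 3.84). The data are synthetic and the helper names hypothetical; the software used for the coursework may compute the statistic differently.

```python
import numpy as np

def fit(y, X):
    """OLS coefficients (X already contains a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def sargan_stat(y, endog, exog, instruments):
    """n * R^2 from regressing 2SLS residuals on all exogenous variables."""
    n = len(y)
    const = np.ones(n)
    Z = np.column_stack([const, exog, instruments])    # all exogenous variables
    X = np.column_stack([const, exog, endog])          # structural regressors
    endog_hat = Z @ fit(endog, Z)                      # first-stage fitted values
    beta = fit(y, np.column_stack([const, exog, endog_hat]))   # 2SLS coefficients
    resid = y - X @ beta                               # residuals use actual endog
    fitted = Z @ fit(resid, Z)
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return n * r2

# Synthetic data (assumption): educ is endogenous through v.
rng = np.random.default_rng(4)
n = 2000
age = rng.uniform(20, 50, n)
z = rng.normal(0, 1, (n, 2))                 # stand-ins for meduc and feduc
v = rng.normal(0, 1, n)
e = rng.normal(0, 1, n)
educ = 12 + z[:, 0] + z[:, 1] + v
kids = 3 - 0.1 * educ + 0.05 * age + e + 0.5 * v

stat_valid = sargan_stat(kids, educ, age, z)
# Make the second instrument invalid by letting it enter the outcome directly.
stat_invalid = sargan_stat(kids + 1.0 * z[:, 1], educ, age, z)
print(stat_valid, stat_invalid)
```

A statistic below 3.84 means the null of valid instruments is not rejected, which corresponds to a large p-value such as the one reported in the text; the deliberately invalid instrument produces a statistic far above that threshold.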
d) In Model 3 “Model 2 Residuals” is included, represented by “𝑣̂”. So in Model 3 we
have:
Kids = β0 + β1x1 +…+ βkxk + δ1𝑣̂ + error
With H0: δ1 = 0 Ha: δ1 ≠ 0
Conducting a two-tailed t-test with α=0.05, hence tc = 1.96
We obtain: t𝑣̂ = 0.6663 < 1.96
Therefore we fail to reject the null hypothesis: we find no evidence that education is
correlated with the unobserved factors affecting fertility, so the endogeneity problem
does not appear to affect Model 1.
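The regression-based endogeneity test used here can be sketched in a few lines: save the residuals from the reduced form for “educ”, add them to the structural equation, and look at the t statistic on that added regressor. The data are synthetic with endogeneity built in (an assumption for illustration), the helper names are hypothetical, and the standard errors assume homoskedasticity.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients, homoskedastic standard errors and residuals."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov)), resid

def endogeneity_t_stat(y, endog, exog, instruments):
    """t statistic on the reduced-form residuals added to the structural equation."""
    const = np.ones(len(y))
    Z = np.column_stack([const, exog, instruments])
    _, _, v_hat = ols_with_se(endog, Z)              # reduced-form residuals
    X = np.column_stack([const, exog, endog, v_hat])
    coef, se, _ = ols_with_se(y, X)
    return coef[-1] / se[-1]

# Synthetic data (assumption): educ is endogenous because v enters the kids equation.
rng = np.random.default_rng(5)
n = 2000
age = rng.uniform(20, 50, n)
z = rng.normal(0, 1, (n, 2))                         # stand-ins for meduc and feduc
v = rng.normal(0, 1, n)
educ = 12 + z[:, 0] + z[:, 1] + v
kids = 3 - 0.1 * educ + 0.05 * age + rng.normal(0, 1, n) + 0.8 * v

t_endog = endogeneity_t_stat(kids, educ, age, z)
print("evidence of endogeneity:", abs(t_endog) > 1.96)
```

In this constructed example the statistic is far beyond 1.96, whereas the coursework value of 0.6663 falls inside the non-rejection region.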
e) Since we know that the model exhibits homoskedasticity, I would opt to use the
coefficient on education from Model 1, where we have not used any instrumental
variables. The Hausman test gives no evidence that “educ” suffers from the endogeneity
problem, and when a regressor is exogenous OLS is more efficient than
instrumental-variables estimation. Even with “feduc” and “meduc” as instruments we would
still be omitting variables that contribute to “educ”, so, taking into account our
results from the Sargan and Hausman tests, the coefficient from Model 4 is no better
than that in Model 1.
