ECN 425: Introduction to Econometrics
Alvin Murphy Arizona State University: Fall 2018
Assignment #1
Due at the beginning of class on Thursday, September 6th
PART I: DERIVING OLS ESTIMATORS
(You must show all work to receive full credit)
1) Suppose the population regression function can be written as $y = \beta_0 + \beta_1 x + u$, where
$E(u) = 0$ and $E(u \mid x) = 0$. The sample equivalents to these two restrictions imply:
$\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i = 0$ and $\frac{1}{n}\sum_{i=1}^{n} x_i \hat{u}_i = 0$. Parts (a)-(c) of this problem ask you to derive the OLS
estimators for $\beta_0$ and $\beta_1$. Please show all of your work.
(20 points: 5/5/10)
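(a) Use $\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i = 0$ to demonstrate that the OLS estimator for $\beta_0$ can be written as:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.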
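Here is a quick numerical check of the identity in part (a). It does not replace the derivation. With made-up illustrative data and an arbitrary slope, setting the intercept to $\bar{y} - \hat{\beta}_1 \bar{x}$ makes the residuals average to zero. The helper name `intercept_given_slope` and the data values are assumptions chosen for illustration.

```python
from statistics import mean

def intercept_given_slope(x, y, b1):
    """Intercept implied by (1/n) * sum(u_hat_i) = 0 for a given slope b1."""
    return mean(y) - b1 * mean(x)

# Illustrative data only (not from the assignment)
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.5, 6.0, 8.5]

b1 = 1.3  # an arbitrary slope: the part (a) identity holds for any value
b0 = intercept_given_slope(x, y, b1)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
print(abs(mean(residuals)) < 1e-12)  # residuals average to zero up to rounding
```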
(b) Use $\frac{1}{n}\sum_{i=1}^{n} x_i \hat{u}_i = 0$ together with the result from (a) to demonstrate that the OLS
estimator for $\beta_1$ can be written as:
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}$.
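The sketch below uses small made-up data. It computes $\hat{\beta}_1$ with the part (b) formula and $\hat{\beta}_0$ with the part (a) formula. It then confirms that both sample moment conditions hold. `ols_from_moments` is a hypothetical helper name.

```python
from statistics import mean

def ols_from_moments(x, y):
    """Slope from the part (b) formula and intercept from part (a).

    Requires some variation in x; otherwise the denominator is zero.
    """
    x_bar, y_bar = mean(x), mean(y)
    num = sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum(xi * (xi - x_bar) for xi in x)
    b1 = num / den
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data only (not from the assignment)
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.5, 6.0, 8.5]
b0, b1 = ols_from_moments(x, y)

# Both sample moment conditions should hold, up to floating-point error
u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
print(b0, b1)
print(abs(sum(u_hat)) < 1e-9, abs(sum(xi * ui for xi, ui in zip(x, u_hat))) < 1e-9)
```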
(c) Use your result from (b) together with the definition of the variance and covariance to
demonstrate that $\hat{\beta}_1 = \frac{\mathrm{cov}(x_i, y_i)}{\mathrm{var}(x_i)}$.
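Here is a numerical illustration of the part (c) result on made-up data. The ratio of the sample covariance to the sample variance matches the part (b) formula, because the common divisor ($n-1$ here) cancels. `sample_cov` is a hypothetical helper. Python 3.10 and later also provide `statistics.covariance`.

```python
from statistics import mean, variance

def sample_cov(x, y):
    """Sample covariance with an n - 1 divisor, matching statistics.variance."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

# Illustrative data only (not from the assignment)
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.5, 6.0, 8.5]

ratio = sample_cov(x, y) / variance(x)  # the n - 1 divisors cancel

# Slope from the part (b) formula, for comparison
x_bar, y_bar = mean(x), mean(y)
b1 = (sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
      / sum(xi * (xi - x_bar) for xi in x))
print(ratio, b1)  # the two values agree up to rounding
```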
2) Suppose the population regression function is $y_i = \beta_0 + \beta_1 z_i + u_i$, and you estimate the
following sample regression function: $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{u}_i$, where $x \neq z$.
(20 points: 10/10)
(a) Express your estimator, $\hat{\beta}_1$, in terms of the data and parameters of the population
regression function: $x_i$, $z_i$, $\beta_1$, and $u_i$.
(b) Use your result from (a) to demonstrate that $\hat{\beta}_1$ is generally a biased estimator for $\beta_1$.
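One way to see the problem 2 bias numerically is with a simulation. It assumes a particular design that is not given in the assignment:

- $z \sim N(0,1)$
- $x = 0.5z + e$, with $e \sim N(0,1)$ independent of $z$
- $u \sim N(0,1)$
- $\beta_0 = 1$ and $\beta_1 = 2$

Under these assumptions, the average slope from regressing $y$ on $z$ should sit near 2. The average slope from regressing $y$ on $x$ should settle near $\beta_1 \mathrm{cov}(x,z)/\mathrm{var}(x) = 2 \times 0.5/1.25 = 0.8$.

```python
import random
from statistics import fmean

def slope(x, y):
    """OLS slope of y on x, using the formula from problem 1(b)."""
    x_bar, y_bar = fmean(x), fmean(y)
    return (sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
            / sum(xi * (xi - x_bar) for xi in x))

def simulate(beta0=1.0, beta1=2.0, n=200, reps=500, seed=425):
    """Average slope estimates when regressing y on z (correct) vs. x (wrong)."""
    rng = random.Random(seed)
    on_z, on_x = [], []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        # Assumption: x is correlated with z (cov = 0.5, var(x) = 1.25) but x != z
        x = [0.5 * zi + rng.gauss(0, 1) for zi in z]
        # The true model uses z, with u ~ N(0, 1) independent of x and z
        y = [beta0 + beta1 * zi + rng.gauss(0, 1) for zi in z]
        on_z.append(slope(z, y))
        on_x.append(slope(x, y))
    return fmean(on_z), fmean(on_x)

avg_on_z, avg_on_x = simulate()
print(f"average slope on z: {avg_on_z:.3f}, average slope on x: {avg_on_x:.3f}")
```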
PART II: USING A FAKE DATA EXPERIMENT TO INVESTIGATE OLS ESTIMATORS
A fake data experiment can be a useful way to investigate the properties of an estimator. This
process begins by specifying the “true” economic model (i.e. the population regression
function). The next step is to use this model to generate some data that represent a population.
Finally, by taking repeated samples from the population and using these samples to estimate the
sample regression function several times, you can evaluate how well your estimator performs
(e.g. bias and variance) under specific conditions.
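Below is a compact sketch of the three steps. It assumes a hypothetical true model $y = 1 + 0.5z + u$ with $u \sim N(0,1)$, a population of 500, and samples of 25 drawn with replacement. None of these values come from “fake1.dta”. The “population parameters” are taken as the OLS estimates on the full population. Bias is estimated as the average sample estimate minus that value.

```python
import random
from statistics import fmean, variance

def ols(x, y):
    """Simple-regression OLS estimates (intercept, slope)."""
    x_bar, y_bar = fmean(x), fmean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

rng = random.Random(2018)

# Step 1: a hypothetical "true" model y = 1.0 + 0.5 z + u, u ~ N(0, 1)
BETA0, BETA1, N_POP, N_SAMPLE, REPS = 1.0, 0.5, 500, 25, 1000

# Step 2: generate a population from that model
z = [rng.uniform(0, 10) for _ in range(N_POP)]
y = [BETA0 + BETA1 * zi + rng.gauss(0, 1) for zi in z]
pop_b0, pop_b1 = ols(z, y)  # "population parameters" computed from the population data

# Step 3: repeatedly sample from the population and re-estimate
slopes = []
for _ in range(REPS):
    idx = [rng.randrange(N_POP) for _ in range(N_SAMPLE)]  # draws with replacement
    slopes.append(ols([z[i] for i in idx], [y[i] for i in idx])[1])

print(f"estimated bias of slope: {fmean(slopes) - pop_b1:+.4f}")
print(f"sampling variance of slope: {variance(slopes):.4f}")
```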
3) In this problem, you will use a fake data experiment to demonstrate the importance of
correctly specifying the form of the sample regression function. More precisely, you will
compare the bias of the OLS estimator when the model is correctly specified to the bias
when the model is incorrectly specified to use the wrong explanatory variable. In the file
“fake1.dta”, I have generated a population of 500 observations from the (true) regression
equation $y = \beta_0 + \beta_1 z + u$, such that $E(u) = 0$, $E(u \mid z) = 0$, and $\mathrm{var}(u \mid z) = \sigma^2$.
(25 points: 5/5/5/5/5)
a) Use these data to calculate the population parameters $\beta_0$ and $\beta_1$. Use 2 decimal places.
b) Now, take a random 5% sample from the population and discard the remaining
observations. This can be done using the command “bsample round(0.05*_N)”. Use
this random sample to calculate OLS estimates for $\beta_0$ and $\beta_1$. Report your results out to
3 decimal places.
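Stata's `bsample` samples with replacement and replaces the data in memory with the draw. So `bsample round(0.05*_N)` keeps 25 rows of a 500-row population, possibly with repeats. Below is a rough Python analogue. `bsample_like` is a hypothetical name, and the population is a list of row indices standing in for “fake1.dta”.

```python
import random

def bsample_like(rows, frac=0.05, rng=random):
    """Draw round(frac * N) rows *with replacement*, roughly mimicking
    Stata's `bsample round(0.05*_N)`; everything not drawn is discarded."""
    k = round(frac * len(rows))
    return [rng.choice(rows) for _ in range(k)]

population = list(range(500))  # stand-in for the 500 rows of fake1.dta
sample = bsample_like(population, 0.05, random.Random(1))
print(len(sample))
```

For a draw without replacement, `random.sample(rows, k)` would be the closer analogue. The assignment's command, however, samples with replacement.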
c) Repeat part (b) 19 more times, saving the values for $\hat{\beta}_0$ and $\hat{\beta}_1$ on each iteration. Thus,
on each iteration you are reloading “fake1.dta”, taking a new randomly chosen 5%
sample, and using that sample to generate estimates for $\beta_0$ and $\beta_1$. … $\widehat{\text{bias}}(\hat{\beta}_0)$ and
$\widehat{\text{bias}}(\hat{\beta}_1)$. Of your 20 samples, what is the closest and the farthest that you come
from recovering the true values of $\beta_0$ and $\beta_1$?
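The bookkeeping below rests on two assumptions:

- The bias is estimated as the average of the 20 estimates minus the population value from part (a).
- “Closest” and “farthest” mean the smallest and largest absolute deviation of a single estimate from that value.

The numbers in the example are illustrative, not results.

```python
from statistics import fmean

def summarize(estimates, true_value):
    """Estimated bias, plus the closest and farthest single estimate from the truth."""
    bias = fmean(estimates) - true_value
    deviations = [abs(b - true_value) for b in estimates]
    return bias, min(deviations), max(deviations)

# Illustrative numbers only; replace with your 20 saved estimates and the part (a) value
b1_hats = [0.48, 0.53, 0.51, 0.46, 0.55]
bias, closest, farthest = summarize(b1_hats, true_value=0.50)
print(f"bias={bias:+.3f} closest={closest:.3f} farthest={farthest:.3f}")
```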
d) Repeat the exercise in parts (b) and (c), except this time you will incorrectly replace z …
… $\widehat{\text{bias}}(\hat{\beta}_0)$ and $\widehat{\text{bias}}(\hat{\beta}_1)$ consistent with the theoretical properties of
correctly specified OLS estimators? Are your sample results from part (d) consistent with what
you learned from problem #2 about the theoretical properties of an OLS estimator that is
incorrectly specified to use the wrong explanatory variable? Please explain your answers.
¹ Stata hint: After typing in the commands for the first iteration in part (b), you can use the
review window to click on those same commands 19 more times, rather than typing them again.
PART III: EMPIRICAL ANALYSIS²
4) Use airfare.dta to answer the following questions. (15
points: 5/5/5)
(i) Report the mean, standard deviation, minimum, and maximum airfare for: (a) one-way
flights less than 500 miles, (b) one-way flights between 500 and 1000 miles, (c) one-way
flights between 1000 and 2000 miles, and (d) one-way flights over 2000 miles.
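The part (i) summaries can be sketched in Python. The sketch assumes columns named `dist` (miles) and `fare` (dollars); the actual names in airfare.dta may differ. The assignment does not say how to treat flights at exactly 500, 1000, or 2000 miles, so the half-open intervals used here are also an assumption. The rows are made up.

```python
from statistics import mean, stdev

def fare_summary(dist, fare, lo, hi):
    """Mean, st. dev., min and max fare for flights with lo <= dist < hi."""
    sel = [f for d, f in zip(dist, fare) if lo <= d < hi]
    return mean(sel), stdev(sel), min(sel), max(sel)

# Made-up rows for illustration; the real data come from airfare.dta
dist = [300, 450, 700, 900, 1500, 1800, 2100, 2500]
fare = [110, 130, 150, 170, 210, 240, 280, 300]
bins = [(0, 500), (500, 1000), (1000, 2000), (2000, float("inf"))]
for lo, hi in bins:
    print((lo, hi), fare_summary(dist, fare, lo, hi))
```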
(ii) Estimate a regression model where a one mile increase in
flight distance changes the
fare by a constant dollar amount. Use your result to predict the
price of flying 250 miles.
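For part (ii), a level-level model $\text{fare} = \beta_0 + \beta_1 \text{dist} + u$ makes $\beta_1$ the constant dollar change per extra mile. The prediction at 250 miles is $\hat{\beta}_0 + 250\hat{\beta}_1$. The sketch uses hypothetical stand-in data that lie exactly on a line, so the output is easy to check.

```python
from statistics import fmean

def ols(x, y):
    """Simple-regression OLS estimates (intercept, slope)."""
    x_bar, y_bar = fmean(x), fmean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical stand-ins for the distance (miles) and fare (dollars) columns
dist = [200, 400, 800, 1600]
fare = [70.0, 90.0, 130.0, 210.0]

b0, b1 = ols(dist, fare)       # b1 is the dollar change in fare per extra mile
predicted_250 = b0 + b1 * 250  # predicted fare for a 250-mile flight
print(f"fare = {b0:.2f} + {b1:.4f} * dist; predicted fare at 250 miles: {predicted_250:.2f}")
```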
(iii) Now estimate a regression model where a one percent increase in flight distance
leads to a constant percentage change in price. Use your result to report the elasticity of
airfare to flight distance.
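For part (iii), a log-log model $\log(\text{fare}) = \beta_0 + \beta_1 \log(\text{dist}) + u$ implies a constant elasticity, and $\hat{\beta}_1$ is that elasticity. The sketch below uses illustrative data generated with an elasticity of 0.4 (not real airfares), so the fitted slope should recover 0.4.

```python
import math
from statistics import fmean

def ols_slope(x, y):
    """OLS slope of y on x."""
    x_bar, y_bar = fmean(x), fmean(y)
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

# Illustrative data built with a constant elasticity of 0.4 (not real airfares)
dist = [200, 400, 800, 1600]
fare = [60.0 * d ** 0.4 for d in dist]

# In log(fare) = b0 + b1 * log(dist), b1 is the elasticity of fare with respect to distance
elasticity = ols_slope([math.log(d) for d in dist], [math.log(f) for f in fare])
print(f"estimated elasticity: {elasticity:.3f}")
```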
5) Is there adverse selection in the market for health care? I have obtained state-level data on
the share of the population with health insurance, and on health outcomes for the insured
population, for 2004, 2005, 2008, 2009, and 2010. These data are from the Behavioral Risk
Factor Surveillance System. This question asks you to investigate the data, run some
regressions, and interpret the results. The file BRFSS.dta contains data on state population,
state population with health insurance, and health outcomes for the insured population.
(20 points: 5/5/5/5)
a) Generate a variable, share_insured, that measures the share of the state population with
health insurance. Report summary statistics for share_insured for each year (mean, st. dev.,
min, max). Did the share of people with health insurance in the average state increase
during the 2000s?
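Here is a minimal sketch of part (a). It assumes each record holds a year, a state's population, and its insured population. This layout is hypothetical, since BRFSS.dta's variable names are not given here. The per-year statistics are unweighted across states, which matches “the average state”.

```python
from collections import defaultdict
from statistics import mean, stdev

def share_insured_by_year(rows):
    """rows: (year, state_population, insured_population) tuples, one per state-year.

    Returns {year: (mean, st. dev., min, max)} of share_insured across states.
    """
    shares = defaultdict(list)
    for year, population, insured in rows:
        shares[year].append(insured / population)  # share_insured for one state
    return {year: (mean(s), stdev(s), min(s), max(s))
            for year, s in sorted(shares.items())}

# Made-up rows for two hypothetical states (not BRFSS values)
rows = [(2004, 1000, 800), (2004, 2000, 1500), (2010, 1000, 850), (2010, 2000, 1700)]
for year, stats in share_insured_by_year(rows).items():
    print(year, [round(v, 3) for v in stats])
```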
b) Use a simple linear regression model to estimate how the share of people with health
insurance impacts health outcomes for the insured population. Report slope coefficients,
their standard errors, the number of observations, and the R² in the table below.
² Stata hint: you might find the if and bysort commands helpful on this part of the assignment.
Dependent Variable:

| | avg. days health not good | avg. days health prevented regular activity | disability (%) | use equipment because of disability (%) | exercise in past month (%) | asthma (%) | diabetes (%) |
|---|---|---|---|---|---|---|---|
| Slope coefficient | ___ | ___ | ___ | ___ | ___ | ___ | ___ |
| (std. error) | (___) | (___) | (___) | (___) | (___) | (___) | (___) |
| N | ___ | ___ | ___ | ___ | ___ | ___ | ___ |
| R² | ___ | ___ | ___ | ___ | ___ | ___ | ___ |
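The table entries for one dependent variable could be computed as below. The function returns:

- the slope
- the conventional (homoskedasticity-only) standard error $\sqrt{\hat{\sigma}^2/\sum_i (x_i - \bar{x})^2}$, with $\hat{\sigma}^2 = \text{SSR}/(n-2)$
- $N$
- $R^2$

These are the standard errors that Stata's `regress` reports by default. The function name and data are illustrative assumptions, not values from BRFSS.dta.

```python
import math
from statistics import fmean

def simple_regression(x, y):
    """OLS of y on x with a constant: slope, conventional SE of the slope, N, R^2."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total sum of squares
    sigma2 = ssr / (n - 2)                                       # error-variance estimate
    se_b1 = math.sqrt(sigma2 / sxx)
    return b1, se_b1, n, 1 - ssr / sst

# Illustrative data only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, n, r2 = simple_regression(x, y)
print(f"slope={b1:.3f} ({se_b1:.3f}), N={n}, R2={r2:.3f}")
```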
c) Based on your results, how would increasing the share of the state population with health
insurance by 1% affect the average days that insured consumers report their health is not
good? How would it affect the percentage of the insured consumers with diabetes? Are
these results consistent with the presence of adverse selection in the market for health
insurance? Explain your answer.
d) Does it seem reasonable to expect that the model we
estimated in part (b) provides an
unbiased estimator for the impact of health insurance on health
outcomes in the insured
population? If so, justify your answer by explaining why you
suspect SLR.1 through
SLR.4 are satisfied. If not, explain why you suspect one or
more of the four SLR
assumptions are violated.