Week 6: Dummy Variables
The dummy variable trap occurs when
a dummy is not defined as zero or one
there is more than one type of category using dummies
the intercept is omitted
none of the above
The next 13 questions are based on the following information. Suppose we specify that
y = α + βx + δ1Male + δ2Female + γ1Left + γ2Center + γ3Right + ε, where Left, Center, and
Right refer to the three possible political orientations. A variable Fringe is created as the
sum of Left and Right, and a variable x*Male is created as the product of x and Male.
Which of the following creates a dummy variable trap? Regress y on an intercept, x,
Male and Left
Male, Left, and Center
Left, Center, and Right
None of these
Which of the following creates a dummy variable trap? Regress y on an intercept, x,
Male and Fringe
Male, Center, and Fringe.
Both of the above
None of the above
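The algebra behind these trap questions can be checked numerically. Below is a small Python/NumPy sketch (not part of the original quiz; the data are simulated and variable names are illustrative) that flags a dummy variable trap as an exact linear dependence in the design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
male = rng.integers(0, 2, n).astype(float)      # 1 = male, 0 = female
orient = rng.integers(0, 3, n)                  # 0 = Left, 1 = Center, 2 = Right
left = (orient == 0).astype(float)
center = (orient == 1).astype(float)
right = (orient == 2).astype(float)
fringe = left + right                           # Fringe = Left + Right
intercept = np.ones(n)

def full_rank(*cols):
    """True when no regressor is an exact linear combination of the others."""
    X = np.column_stack(cols)
    return np.linalg.matrix_rank(X) == X.shape[1]

# Left + Center + Right = intercept, so all three dummies plus an intercept is a trap:
assert not full_rank(intercept, x, left, center, right)
# Dropping one orientation dummy removes the dependence:
assert full_rank(intercept, x, male, left, center)
# Male and Fringe is fine, but Male, Center, and Fringe is a trap
# (Center + Fringe = Left + Center + Right = intercept):
assert full_rank(intercept, x, male, fringe)
assert not full_rank(intercept, x, male, center, fringe)
```

The rank check mirrors the rule of thumb: with an intercept included, a complete set of dummies for any one categorization is perfectly collinear.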
The variable Fringe is interpreted as
being on the Left or on the Right
being on both the Left and the Right
being twice the value of being on the Left or being on the Right
none of these
Using Fringe instead of Left and Right separately in this specification is done to force
the slopes of Left and Right to be
the same
half the slope of Center
twice the slope of Center
the same as the slope of Center
If we regress y on an intercept, x, Male, Left, and Center, the slope coefficient on
Male is interpreted as the intercept difference between males and females
regardless of political orientation
assuming a Right political orientation
assuming a Left or Center political orientation
none of the above
If we regress y on an intercept, x, Male, and x*Male the slope coefficient on x*Male is
interpreted as
the difference between the male and female intercept
the male slope coefficient estimate
the difference between the male and female slope coefficient estimates
none of these
Suppose we regress y on an intercept, x, and Male, and then do another regression,
regressing y on an intercept, x, and Female. The slope estimates on Male and on
Female should be
equal to one another
equal but opposite in sign
bear no necessary relationship to one another
none of these
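The "equal but opposite in sign" answer above is an algebraic identity, since Female = 1 − Male reparametrizes the same model. A simulated check in Python (the true coefficients below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
male = rng.integers(0, 2, n).astype(float)
female = 1.0 - male
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + 1.5 * male + rng.normal(size=n)   # assumed male intercept shift = 1.5

def ols(y, *cols):
    """OLS coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_male = ols(y, x, male)[2]
b_female = ols(y, x, female)[2]
assert np.isclose(b_male, -b_female)   # identical fit, dummy coefficient sign flipped
```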
Suppose we regress y on an intercept, x, Male, Left and Center and then do another regression,
regressing y on an intercept, x, and Center and Right. The interpretation of the slope estimate
on Center should be
the intercept for those from the political center in both regressions
the difference between the Center and Right intercepts in the first regression, and the
difference between the Center and Left intercepts in the second regression
the difference between the Center and Left intercepts in the first regression, and the
difference between the Center and Right intercepts in the second regression
none of these
Suppose we regress y on an intercept, x, Male, Left and Center and then do another
regression, regressing y on an intercept, x, and Center and Right. The slope estimate on
Center in the second regression should be
the same as the slope estimate on Center in the first regression
equal to the difference between the original Center coefficient and the Left coefficient
equal to the difference between the original Center coefficient and the Right
coefficient
unrelated to the first regression results
Suppose we regress y on an intercept, Male, Left, and Center. The base category is
a male on the left
a female on the left
a male on the right
a female on the right
Suppose we regress y on an intercept, Male, Left, and Center. The intercept is
interpreted as the intercept of a
male
male on the right
female
female on the right
Researcher A has used the specification:
y = αA + βAx + βML MaleLeft + βMC MaleCenter + βMR MaleRight + βFL FemaleLeft + βFC FemaleCenter + ε
Here MaleLeft is a dummy representing a male on the left; other variables are defined in
similar fashion.
Researcher B has used the specification:
y = αB + βBx + γMale + δLeft + θCenter + λML Male*Left + λMC Male*Center + ε
Here Male*Left is a variable calculated as the product of Male and Left; other variables are
defined in similar fashion. These specifications are fundamentally
different
the same, so that the estimate of βML should be equal to the estimate of λML
the same, so that the estimate of βML should be equal to the sum of the estimates of γ, δ, and λML
the same, so that the sum of the estimates of βML, βMC, and βMR should be equal to the
estimate of γ.
In the preceding question, the base categories for specifications A and B are,
respectively,
male on the right and female on the right
male on the right and female on the left
female on the right and female on the right
female on the right and male on the right
Analysis of variance is designed to
estimate the influence of different categories on a dependent variable
test whether a particular category has a nonzero influence on a dependent variable
test whether the intercepts for all categories in an OLS regression are the same
none of these
Suppose you have estimated wage = 5 + 3education + 2gender, where gender is one
for male and zero for female. If gender had been one for female and zero for male, this
result would have been
Unchanged
wage = 5 + 3education - 2gender
wage = 7 + 3education + 2gender
wage = 7 + 3education - 2gender
Suppose we have estimated y = 10 + 1.5x + 4D where y is earnings, x is experience,
and D is zero for females and one for males. If we had coded the dummy as minus
one for females and one for males the results (10, 1.5, 4) would have been
a) 14, 1.5, -4 b) 18, 1.5, -4 c) 12, 1.5, 2 d) 12, 1.5, -2
Suppose we have estimated y = 10 + 2x + 3D where y is earnings, x is experience and D
is zero for females and one for males. If we had coded the dummy as one for females and
two for males, the results (10, 2, 3) would have been
a) 10, 2, 3 b) 10, 2, 1.5 c) 7, 2, 3 d) 7, 2, 1.5
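The recoding questions above are pure algebra: substituting the new dummy coding into the original fitted equation must leave every fitted value unchanged. A Python check of all three recodings (the education, x, and dummy values below are arbitrary test points):

```python
# Original: wage = 5 + 3*ed + 2*gender, with gender = 1 for male.
# Recode g* = 1 for female, i.e. g* = 1 - gender:
#   wage = 5 + 3*ed + 2*(1 - g*) = 7 + 3*ed - 2*g*
for ed in (0, 12, 16):
    for gender in (0, 1):
        g_star = 1 - gender
        assert 5 + 3*ed + 2*gender == 7 + 3*ed - 2*g_star

# y = 10 + 1.5x + 4D with D in {0, 1}; recode D* = 2D - 1 in {-1, 1}:
#   y = 10 + 1.5x + 4*(D* + 1)/2 = 12 + 1.5x + 2*D*
for x in (0.0, 1.0, 3.5):
    for D in (0, 1):
        D_star = 2*D - 1
        assert 10 + 1.5*x + 4*D == 12 + 1.5*x + 2*D_star

# y = 10 + 2x + 3D with D in {0, 1}; recode D* = D + 1 in {1, 2}:
#   y = 10 + 2x + 3*(D* - 1) = 7 + 2x + 3*D*
for x in (0.0, 2.0):
    for D in (0, 1):
        D_star = D + 1
        assert 10 + 2*x + 3*D == 7 + 2*x + 3*D_star
```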
The following relates to the next three questions. In a study investigating the effect of a
new computer instructional technology for economics principles, a researcher taught a
control class in the normal way and an experimental class using the new technology. She
regressed student final exam numerical grade (out of 100) on GPA, Male, Age, Tech (a
dummy equaling unity for the experimental class), and interaction variables Tech*GPA,
Tech*Male, and Tech*Age. Age and Tech*GPA had coefficients jointly insignificantly
different from zero, so she dropped them and ended up with
grade = 45 + 9*GPA + 5*Male + 10*Tech - 6*Tech*Male - 0.2*Tech*Age
with all coefficients significant. She concludes that a) age makes no difference in the
control group, but older students do not seem to benefit as much from the computer
technology, and that b) the effect of GPA is the same regardless of what group a student is
in.
These empirical results suggest that
both conclusions are warranted
neither conclusion is warranted
only the first conclusion is warranted
only the second conclusion is warranted
These point estimates suggest that in the control class
males and females perform equally
females outperform males
males outperform females
we can only assess relative performance in the new technology group
These point estimates measure the impact of the new technology on male and female
scores, respectively, to be
a) 5 and zero b) 4 and 10 c) –1 and 10 d) 9 and 10
The MLE is popular because it
maximizes R-square
minimizes the sum of squared errors
has desirable sampling distribution properties
maximizes both the likelihood and loglikelihood functions
To find the MLE we maximize the
likelihood
log likelihood
probability of having obtained our sample
all of these
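Because the log is a monotonic transformation, maximizing the likelihood and maximizing the log-likelihood give the same estimate. A minimal numerical sketch (simulated Bernoulli data; SciPy's bounded search is used, and for a Bernoulli sample the MLE is the sample mean):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
y = rng.integers(0, 2, 500)           # Bernoulli sample with unknown p

def neg_loglik(p):
    """Negative Bernoulli log-likelihood of the sample."""
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
# The maximizer of the (log-)likelihood equals the sample mean:
assert abs(res.x - y.mean()) < 1e-4
```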
In a logit regression, to report the influence of an explanatory variable x on the
probability of observing a one for the dependent variable we report
the slope coefficient estimate for x
the average of the slope coefficient estimates for x of all the observations in the
sample
the slope coefficient estimate for x for the average observation in the sample
none of these
The logit functional form
is linear in the logarithms of the variables
has either zero or one on the left-hand side
forces the left-hand variable to lie between zero and one
none of these
The logit model is employed when
all the regressors are dummy variables
the dependent variable is a dummy variable
we need a flexible functional form
none of these
In the logit model the predicted value of the dependent variable is interpreted as
the probability that the dependent variable is one
the probability that the dependent variable is zero
the fraction of the observations in the sample that are ones
the fraction of the observations in the sample that are zeroes.
To find the maximum likelihood estimates the computer searches over all possible
values of the
dependent variable
independent variables
coefficients
all of the above
The MLE is popular because
it maximizes R-square and so creates the best fit to the data
it is unbiased
it is easily calculated with the help of a computer
none of these
In large samples the MLE is
a) unbiased b) efficient c) normally distributed d) all of these
To predict the value of a dependent dummy variable for a new observation we should
predict it as a one if
the estimated probability of this observation’s dependent variable being a one is
greater than fifty percent
more than half of the observations are ones
the expected payoff of doing so is greater than the expected payoff of predicting it as a
zero
none of these
Which of the following is the best way to measure the prediction success of a logit
specification?
the percentage of correct predictions across all the data
the average of the percent correct predictions in each category
a weighted average of the percent correct predictions in each category, where the
weights are the fractions of the observations in each category
the sum across all the observations of the net benefits from each observation's
prediction
A negative coefficient on an explanatory variable x in a logit specification means that an
increase in x will, ceteris paribus,
increase the probability that an observation’s dependent variable is a one
decrease the probability that an observation’s dependent variable is a one
the direction of change of the probability that an observation is a one cannot be
determined unequivocally from the sign of this slope coefficient
You have estimated a logit model and found for a new individual that the estimated
probability of her being a one (as opposed to a zero) is 40%. The benefit of correctly
classifying this person is $1,000, regardless of whether she is a one or a zero. The cost
of classifying this person as a one when she is actually a zero is $500. You should
classify this person as a one when the other misclassification cost exceeds what value?
a) $750 b) $1,000 c) $1,250 d) $1,500
You have estimated a logit model and found for a new individual that the estimated
probability of her being a one (as opposed to a zero) is 40%. The benefit of correctly
classifying this person is $2,000, regardless of whether she is a one or a zero. The cost
of classifying this person as a zero when she is actually a one is $1600. You should be
indifferent to classifying this person as a one or a zero when the other misclassification
cost equals what value?
a) $100 b) $200 c) $300 d) $400
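Both payoff questions above follow the same rule: classify an observation as a one when the expected payoff of doing so exceeds the expected payoff of classifying it as a zero. A Python sketch verifying the two break-even values ($1,250 and $400):

```python
def expected_payoffs(p, benefit, cost_10, cost_01):
    """(payoff of predicting one, payoff of predicting zero), with P(actual = 1) = p.
    cost_10: cost of predicting one when the actual value is zero;
    cost_01: cost of predicting zero when the actual value is one."""
    as_one = p * benefit - (1 - p) * cost_10
    as_zero = (1 - p) * benefit - p * cost_01
    return as_one, as_zero

# p = 0.4, benefit $1,000, cost of a false "one" $500:
# predicting one pays 0.4*1000 - 0.6*500 = 100, so indifference occurs where
# predicting zero also pays 100, i.e. 600 - 0.4*cost_01 = 100 -> cost_01 = $1,250.
as_one, as_zero = expected_payoffs(0.4, 1000, 500, 1250)
assert abs(as_one - 100) < 1e-9 and abs(as_zero - 100) < 1e-9

# p = 0.4, benefit $2,000, cost of a false "zero" $1,600:
# predicting zero pays 0.6*2000 - 0.4*1600 = 560; indifference at cost_10 = $400.
as_one, as_zero = expected_payoffs(0.4, 2000, 400, 1600)
assert abs(as_one - 560) < 1e-9 and abs(as_zero - 560) < 1e-9
```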
You have estimated a logit model to determine the probability that an individual is
earning more than ten dollars an hour, with observations earning more than ten
dollars an hour coded as ones; your estimated logit index function is
-22 + 2*Ed – 6*Female + 4*Exp
where Ed is years of education, Female is a dummy with value one for females, and Exp is
years of experience. You have been asked to classify a new observation with 10 years of
education and 2 years of experience. You should classify her as
a) a one b) a zero c) too close to call
d) not enough information to make a classification
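For the classification question above, the estimated index evaluates to exactly zero, so the implied probability is exactly 50 percent. In Python:

```python
import math

def logit_prob(index):
    """P(y = 1) implied by a logit index value."""
    return 1.0 / (1.0 + math.exp(-index))

# Estimated index: -22 + 2*Ed - 6*Female + 4*Exp, with Ed = 10, Female = 1, Exp = 2
index = -22 + 2*10 - 6*1 + 4*2
assert index == 0
assert logit_prob(index) == 0.5      # exactly on the 50% boundary: too close to call
```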
In the preceding question, suppose you believe that the influence of experience
depends on gender. To incorporate this into your logit estimation procedure you
should
add an interaction variable defined as the product of Ed and Female
estimate using only the female observations and again using only the male
observations
add a new explanatory variable coded as zero for the male observations and whatever is
the value of the experience variable for the female observations
none of the above
From estimating a logit model you have produced a slope estimate of 0.3 on the
explanatory variable x. This means that a unit increase in x will cause
an increase in the probability of being a y=1 observation of 0.3
an increase in the probability of being a y=0 observation of 0.3
an increase in the ratio of these two probabilities of 0.3
none of the above
You have obtained the following regression results using data on law students from
the class of 1980 at your university:
Income = 11 + .24GPA - .15Female + .14Married - .02Married*Female
where the variables are self-explanatory. Consider married individuals with equal GPAs.
Your results suggest that compared to female income, male income is higher by
a) 0.01 b) 0.02 c) 0.15 d) 0.17
Suppose you have run the following regression:
y = α + βx + γUrban + δImmigrant + θUrban*Immigrant + ε
where Urban is a dummy indicating that an individual lives in a city rather than in a rural
area, and Immigrant is a dummy indicating that an individual is an immigrant rather
than a native. The following three questions refer to this information.
The coefficient on Urban is interpreted as the ceteris paribus difference in y between
an urban person and a rural person
an urban native and a rural native
an urban immigrant and a rural immigrant
none of these
The coefficient on Immigrant is interpreted as the ceteris paribus difference in y between
an immigrant and a native
a rural immigrant and a rural native
an urban immigrant and an urban native
none of these
The coefficient on Urban*Immigrant is interpreted as the ceteris paribus difference in y
between an urban immigrant and
a rural native
a rural immigrant
an urban native
none of these
You have estimated a logit model to determine the success of an advertising program in
a town, with successes coded as ones; your estimated logit index function is -70 +
2*PerCap + 3*South where PerCap is the per capita income in the town (measured in
thousands of dollars), and South is a dummy with value one for towns in the south and
zero for towns in the north, the only other region. If the advertising program is a
success, you will make $5000; if it is a failure you will lose $3000. You are considering
two towns, one in the south and one in the north, both with per capita incomes of
$35,000. You should undertake the advertising program
in both towns
in neither town
in only the south town
in only the north town
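The advertising question combines the logit probability with an expected-profit calculation. A Python sketch with the numbers from the question (per capita income entered as 35, since income is measured in thousands):

```python
import math

def logit_prob(index):
    return 1.0 / (1.0 + math.exp(-index))

def expected_profit(index, gain=5000, loss=3000):
    """Expected profit of advertising, given the estimated success probability."""
    p = logit_prob(index)
    return p * gain - (1 - p) * loss

south = expected_profit(-70 + 2*35 + 3*1)   # index = 3, success probability ~ 0.95
north = expected_profit(-70 + 2*35 + 3*0)   # index = 0, success probability = 0.5
assert south > 0 and north > 0              # positive expected profit in both towns
```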
Week 7: Hypothesis Testing
The square root of an F statistic is distributed as a t statistic. This statement is
a) true b) true only under special conditions c) false
To conduct a t test we need to
divide a parameter estimate by its standard error
estimate something that is supposed to be zero and see if it is zero
estimate something that is supposed to be zero and divide it by its standard error
If a null hypothesis is true, when we impose the restrictions of this null the minimized
sum of squared errors
a) becomes smaller b) does not change c) becomes bigger
d) changes in an indeterminate fashion
If a null hypothesis is false, when we impose the restrictions of this null the
minimized sum of squared errors
a) becomes smaller b) does not change c) becomes bigger
d) changes in an indeterminate fashion
Suppose you have 25 years of quarterly data and specify that demand for your product
is a linear function of price, income, and quarter of the year, where quarter of the year
affects only the intercept. You wish to test the null that ceteris paribus demand is the
same in spring, summer, and fall, against the alternative that demand is different in all
quarters. The degrees of freedom for your F test are
a) 2 and 19 b) 2 and 94 c) 3 and 19 d) 3 and 94
In the preceding question, suppose you wish to test the hypothesis that the entire
relationship (i.e., that the two slopes and the intercept) is the same for all quarters,
versus the alternative that the relationship is completely different in all quarters. The
degrees of freedom for your F test are
a) 3 and 94 b) 6 and 88 c) 9 and 82 d) none of these
In the preceding question, suppose you are certain that the intercepts are different across
the quarters, and wish to test the hypothesis that both slopes are unchanged across the
quarters, against the alternative that the slopes are different in each quarter.
The degrees of freedom for your F test are
a) 3 and 94 b) 6 and 88 c) 9 and 82 d) none of these
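All three F-test questions above reduce to counting parameters and restrictions: the numerator degrees of freedom equal the number of restrictions imposed by the null, and the denominator equals observations minus parameters in the unrestricted model. A Python check of the counting:

```python
n = 25 * 4   # 25 years of quarterly data -> 100 observations

def f_dof(n_obs, k_unrestricted, n_restrictions):
    """(numerator, denominator) degrees of freedom for an F test."""
    return n_restrictions, n_obs - k_unrestricted

# Intercept shifts only: intercept + price + income + 3 quarter dummies = 6 parameters;
# "same in spring, summer, and fall" imposes 2 equality restrictions.
assert f_dof(n, 6, 2) == (2, 94)

# Entirely separate relationships: 4 quarters x 3 parameters = 12 parameters;
# collapsing to one common relationship imposes 12 - 3 = 9 restrictions.
assert f_dof(n, 12, 9) == (9, 88)   # not among the listed choices

# Intercepts free, common slopes vs. free slopes: unrestricted has 12 parameters;
# common slopes impose 8 - 2 = 6 restrictions.
assert f_dof(n, 12, 6) == (6, 88)
```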
As the sample size becomes very large, the t distribution
collapses to a spike because its variance becomes very small
collapses to a normally distributed spike
approximates more and more closely a normal distribution with mean one
approximates more and more closely a standard normal distribution
Suppose we are using 35 observations to regress wage on an intercept, education,
experience, gender, and dummies for black and hispanic (the base being white). In
addition we are allowing the slope on education to be different for the three race
categories. When using a t test to test for discrimination against females, the degrees of
freedom is
a) 26 b) 27 c) 28 d) 29
After running a regression, to find the covariance between the first and second slope
coefficient estimates we
calculate the square root of the product of their variances
look at the first off-diagonal element of the correlation matrix
look at the first diagonal element of the variance-covariance matrix
none of these
Suppose you have used Eviews to regress output on capital, labor, and a time trend by
clicking on these variables in the order above, or, equivalently, using the command ls y
cap lab time c. To test for constant returns to scale using the Wald – Coefficient
Restrictions button you need to provide the software with the following information
a) cap+lab =1
b) c(1)+c(2)=1
c) c(2)+c(3) = 1
d) none of these
When testing a joint null, an F test is used instead of several separate t tests because
the t tests may not agree with each other
the F test is easier to calculate
the collective results of the t test could mislead
the t tests are impossible to calculate in this case
The rationale behind the F test is that if the null hypothesis is true, by imposing the
null hypothesis restrictions on the OLS estimation the per restriction sum of squared
errors
falls by a significant amount
rises by a significant amount
falls by an insignificant amount
rises by an insignificant amount
Suppose we are regressing wage on an intercept, education, experience, gender, and
dummies for black and hispanic (the base being white). To find the restricted SSE to
calculate an F test to test the null hypothesis that the black and hispanic coefficients are
equal we should regress wage on an intercept, education, experience, gender, and a new
variable constructed as the
sum of the black and hispanic dummies
difference between the black and hispanic dummies
product of the black and hispanic dummies
none of these
In the preceding question, if the null hypothesis is true then, compared to the
unrestricted SSE, the restricted SSE should be
a) smaller b) the same c) larger d) unpredictable
In question 14, if we regress wage on an intercept, education, experience, gender, and a
dummy for white, compared to the restricted SSE in that question, the resulting sum of
squared errors should be
a) smaller b) the same c) larger d) unpredictable
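The restricted-SSE questions above can be confirmed by simulation: summing the black and hispanic dummies imposes equal coefficients on them, and (with an intercept present) replacing that sum by a white dummy spans exactly the same column space, so the SSE is identical. A NumPy sketch with made-up data and coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
educ = rng.normal(13, 2, n)
exper = rng.normal(10, 4, n)
gender = rng.integers(0, 2, n).astype(float)
race = rng.integers(0, 3, n)                      # 0 white, 1 black, 2 hispanic
black = (race == 1).astype(float)
hisp = (race == 2).astype(float)
wage = 4 + 0.8*educ + 0.2*exper + gender - 1.5*black - 1.5*hisp + rng.normal(size=n)

def sse(y, *cols):
    """Sum of squared OLS residuals from regressing y on an intercept and cols."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

unrestricted = sse(wage, educ, exper, gender, black, hisp)
restricted = sse(wage, educ, exper, gender, black + hisp)    # equal coefficients imposed
white_dummy = sse(wage, educ, exper, gender, 1.0 - black - hisp)

assert restricted >= unrestricted             # a restriction can only raise the SSE
assert np.isclose(restricted, white_dummy)    # same column space -> identical SSE
```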
Suppose you have specified the demand for beer (measured in liters) as
LnBeer = β0 + β1lnBeerprice + β2lnOthergoodsprice + β3lnIncome + ε
where the notation should be obvious. Economists will tell you that in theory this
relationship should be homogeneous of degree zero, meaning that if income and prices all
increase by the same percent, demand should not change. Testing homogeneity of degree
zero means testing the null that
a) β1 = β2 = β3 = 0 b) β1 + β2 + β3 = 0 c) β1 + β2 + β3 = 1 d) none of these
Suppose you have run a logit regression in which defaulting on a credit card payment is
related to people’s income, gender, education, and age, with the coefficients on income
and age, but not education, allowed to be different for males versus females. The next 4
questions relate to this information.
The degrees of freedom for the LR test of the null hypothesis that gender does not
matter is
a) 1 b) 2 c) 3 d) 4
To calculate the LR test statistic for this null we need to compute twice the difference
between the
restricted and unrestricted maximized likelihoods
restricted and unrestricted maximized loglikelihoods
unrestricted and restricted maximized likelihoods
unrestricted and restricted maximized loglikelihoods
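In the credit-card setup, "gender does not matter" removes the gender dummy plus the two gender interactions on income and age, i.e. three restrictions. More generally, the LR statistic is twice the unrestricted-minus-restricted maximized log-likelihoods, and imposing a restriction can never raise the maximized likelihood. A minimal Bernoulli example (one restriction, p = 0.5 under the null; data simulated):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
y = rng.integers(0, 2, 400)

def bernoulli_loglik(p, y):
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

ll_u = bernoulli_loglik(y.mean(), y)   # unrestricted MLE: p = sample mean
ll_r = bernoulli_loglik(0.5, y)        # restricted (null): p = 0.5
lr = 2 * (ll_u - ll_r)                 # twice the log-likelihood difference
assert ll_r <= ll_u and lr >= 0        # the restriction cannot raise the maximum
p_value = chi2.sf(lr, df=1)            # one restriction -> chi-square with 1 df
assert 0 <= p_value <= 1
```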
Suppose the null that the slopes on income and age are the same for males and
females is true. Then compared to the unrestricted maximized likelihood, the
restricted maximized likelihood should be
a) smaller b) the same c) bigger d) unpredictable
The coefficient on income can be interpreted as ceteris paribus the change in the
______ resulting from a unit increase in income.
probability of defaulting
odds ratio of defaulting versus not defaulting
log odds ratio of defaulting versus not defaulting
none of these
Week 9: Specification
Specification refers to choice of
test statistic
estimating procedure
functional form and explanatory variables
none of these
Omitting a relevant explanatory variable when running a regression
never creates bias
sometimes creates bias
always creates bias
Omitting a relevant explanatory variable when running a regression usually
increases the variance of coefficient estimates
decreases the variance of coefficient estimates
does not affect the variance of coefficient estimates
Suppose that y = α + βx + θw + ε but that you have ignored w and regressed y on
only x. If x and w are negatively correlated in your data, the OLS estimate of β will be
biased downward if
θ is positive
θ is negative
β is positive
β is negative
Suppose that y = α + βx + θw + ε but that you have ignored w and regressed y on
only x. The OLS estimate of β will be unbiased if x and w are
collinear
orthogonal
positively correlated
negatively correlated
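The omitted-variable questions above can be illustrated by simulation, where θ denotes the true coefficient on the omitted variable w: with θ > 0 and x negatively correlated with w, the short-regression slope is biased downward, while orthogonal regressors leave it (approximately) unbiased. A NumPy sketch with assumed true coefficients:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50000
beta, theta = 1.0, 2.0                 # assumed true slopes on x and the omitted w

def short_slope(x, w):
    """Slope on x when w is wrongly omitted from the regression."""
    y = 0.5 + beta * x + theta * w + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

w = rng.normal(size=n)
x_neg = -0.8 * w + 0.6 * rng.normal(size=n)      # x and w negatively correlated
x_orth = rng.normal(size=n)                      # x and w orthogonal (independent)

# theta > 0 and corr(x, w) < 0 -> the estimate of beta is biased downward:
assert short_slope(x_neg, w) < beta - 0.5
# orthogonal regressors -> omitting w leaves the estimate (approximately) unbiased:
assert abs(short_slope(x_orth, w) - beta) < 0.1
```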
Omitting an explanatory variable from a regression in which you know it belongs
could be a legitimate decision if doing so
increases R-square
decreases the SSE
decreases MSE
decreases variance
In general, omitting a relevant explanatory variable creates
bias and increases variance
bias and decreases variance
no bias and increases variance
no bias and decreases variance
Suppose you know for sure that a variable does not belong in a regression as an
explanatory variable. If someone includes this variable in their regression, in general
this will create
bias and increase variance
bias and decrease variance
no bias and increase variance
no bias and decrease variance
Adding an irrelevant explanatory variable which is orthogonal to the other
explanatory variables causes
bias and no change in variance
bias and an increase in variance
no bias and no change in variance
no bias and an increase in variance
A good thing about data mining is that it
avoids bias
decreases MSE
increases R-square
may uncover an empirical regularity which causes you to improve your specification
A bad thing about data mining is that it is likely to
create bias
capitalize on chance
both of the above
none of the above
The bad effects of data mining can be minimized by
keeping variables in your specification that common sense tells you definitely belong
setting aside some data to be used to check the specification
performing a sensitivity analysis
all of the above
A sensitivity analysis is conducted by varying the specification to see what happens to
Bias
MSE
R-square
the coefficient estimates
The RESET test is used mainly to check for
collinearity
orthogonality
functional form
capitalization on chance
To perform the RESET test we rerun the regression adding as regressors the squares
and cubes of the
dependent variable
suspect explanatory variable
forecasts of the dependent variable
none of these
The RESET test is
a) a z test b) a t test c) a chi-square test d) an F test
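RESET can be computed by hand as an F test: fit the candidate model, re-fit after adding the squares and cubes of the fitted values, and compare the two sums of squared errors. A Python sketch in which the true relationship is quadratic, so the test should reject a linear specification (data simulated):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 4, n)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(size=n)   # true relation is quadratic

def sse(y, X):
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

X0 = np.column_stack([np.ones(n), x])              # (mis)specified linear model
yhat = X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]  # forecasts of the dependent variable
X1 = np.column_stack([X0, yhat**2, yhat**3])       # RESET: add their squares and cubes

sse0, sse1 = sse(y, X0), sse(y, X1)
q, dof = 2, n - X1.shape[1]
F = ((sse0 - sse1) / q) / (sse1 / dof)
p_value = f_dist.sf(F, q, dof)
assert p_value < 0.01     # the omitted curvature is detected
```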
Regressing y on x using a distributed lag model specifies that y is determined by
the lagged value of y
the lagged value of x
several lagged values of x
several lagged values of x, with the coefficients on the lagged x’s decreasing as the
lag becomes longer
Selecting the lag length in a distributed lag model is usually done by
minimizing the MSE
maximizing R-square
maximizing the t values
minimizing an information criterion
A major problem with distributed lag models is that
R-square is low
coefficient estimates are biased
variances of coefficient estimates are large
the lag length is impossible to determine
The rationale behind the Koyck distributed lag is that it
eliminates bias
increases the fit of the equation
exploits an information criterion
incorporates more information into estimation
In the Koyck distributed lag model, as the lag lengthens the coefficients on the lagged
explanatory variable
a) increase and then decrease b) decrease forever
c) decrease for awhile and then become zero d) none of these
Using the lagged value of the dependent variable as an explanatory variable is often
done to
avoid bias
reduce MSE
improve the fit of a specification
facilitate estimation of some complicated models
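The Koyck answers can be seen directly: the weights β·λ^k on lagged x decline geometrically forever, and the Koyck transformation replaces the infinite lag with a single lagged dependent variable, which is why a lagged y "facilitates estimation." A NumPy check of both facts (β, λ, and the intercept below are illustrative values, not from the text):

```python
import numpy as np

beta, lam, a = 2.0, 0.6, 1.0        # illustrative impact coefficient, decay rate, intercept

# Koyck weights on lagged x decline geometrically: beta * lam**k
weights = beta * lam ** np.arange(6)
assert np.all(weights > 0)
assert np.all(np.diff(weights) < 0)          # decrease forever, never reaching zero

# The transformation: if y_t = a + sum_k beta*lam^k * x_{t-k}, then
#   y_t - lam*y_{t-1} = a*(1 - lam) + beta*x_t   (error term aside).
# Check the identity on a deterministic series:
x = np.random.default_rng(6).normal(size=300)
y = np.array([a + sum(beta * lam**k * x[t - k] for k in range(t + 1))
              for t in range(300)])
lhs = y[1:] - lam * y[:-1]
rhs = a * (1 - lam) + beta * x[1:]
assert np.allclose(lhs, rhs, atol=1e-6)
```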
Week 10: Multicollinearity; Applied Econometrics
Multicollinearity occurs whenever
the dependent variable is highly correlated with the independent variables
the independent variables are highly orthogonal
there is a close linear relationship among the independent variables
there is a close nonlinear relationship among the independent variables
High collinearity is not a problem if
no bias is created
R-square is high
the variance of the error term is small
none of these
The multicollinearity problem is very similar to the problems caused by
nonlinearities
omitted explanatory variables
a small sample size
orthogonality
Multicollinearity causes
low R-squares
biased coefficient estimates
biased coefficient variance estimates
none of these
A symptom of multicollinearity is
estimates don’t change much when a regressor is omitted
t values on important variables are quite big
the variance-covariance matrix contains small numbers
none of these
Suppose your specification is y = βx + γMale + δFemale + θWeekday + λWeekend + ε
there is no problem with this specification because the intercept has been omitted
there is high collinearity but not perfect collinearity
there is perfect collinearity
there is orthogonality
Suppose you regress y on x and the square of x.
Estimates will be biased with large variances
It doesn’t make sense to use the square of x as a regressor
The regression will not run because these two regressors are perfectly correlated
There should be no problem with this.
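The x-and-x² question is worth verifying numerically: the two regressors are related, but not through an exact *linear* dependence, so the design matrix keeps full column rank and the regression runs. In Python (data simulated):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x, x**2])

# x and x**2 are (nonlinearly) related but not perfectly correlated,
# so OLS has no trouble with this design matrix:
assert abs(np.corrcoef(x, x**2)[0, 1]) < 1.0
assert np.linalg.matrix_rank(X) == 3
```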
A friend has told you that his multiple regression has a high R-square but all the
estimates of the regression slopes are insignificantly different from zero on the basis of
t tests of significance. This has probably happened because the
intercept has been omitted
explanatory variables are highly collinear
explanatory variables are highly orthogonal
dependent variable doesn’t vary by much
Dropping a variable can be a solution to a multicollinearity problem because it
avoids bias
increases t values
eliminates the collinearity
could decrease mean square error
The main way of dealing with a multicollinearity problem is to
drop one of the offending regressors
increase the sample size
incorporate additional information
transform the regressors
A result of multicollinearity is that
coefficient estimates are biased
t statistics are too small
the variance of the error is overestimated
variances of coefficient estimates are large
A result of multicollinearity is that
OLS is no longer the BLUE
Variances of coefficient estimates are overestimated
R-square is misleadingly small
Estimates are sensitive to small changes in the data
Suppose you are estimating y = α + βx + δz + θw + ε for which the CLR assumptions
hold and x, z, and w are not orthogonal to one another. You estimate incorporating the
information that β = δ. To do this you will regress
y on an intercept, 2x, and w
y on an intercept, (x+z), and w
y - x on an intercept, z, and w
none of these
In the preceding question, suppose that in fact β is not equal to δ. Then in general,
compared to regressing without this extra information, your estimate of θ
is unaffected
is still unbiased
has a smaller variance
nothing can be said about what will happen to this estimate
Economic theory tells us that when estimating the real demand for exports we should use
the ___ exchange rate, and when estimating the real demand for money we should use the
___ interest rate. The blanks should be filled with, respectively,
real; real
real; nominal
nominal; real
nominal; nominal
You have run a regression of the change in inflation on unemployment. Economic
theory tells us that our estimate of the natural rate of unemployment is
the intercept estimate
the slope estimate
minus the intercept estimate divided by the slope estimate
minus the slope estimate divided by the intercept estimate
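For the natural-rate question: setting the fitted change in inflation to zero and solving gives u* = -(intercept)/(slope). A quick check with illustrative estimates (not from the text):

```python
# Hypothetical fitted regression: change_in_inflation = a + b * unemployment
a, b = 3.0, -0.5          # illustrative estimates

# The natural rate is the unemployment level at which inflation stops changing:
#   0 = a + b * u_star  ->  u_star = -a / b
u_star = -a / b
assert u_star == 6.0
assert abs(a + b * u_star) < 1e-12   # inflation change is zero at the natural rate
```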
You have thirty observations from a major golf tournament in which the percentage of
putts made was recorded for distances ranging from one foot to thirty feet, in
increments of one foot (i.e., you have 30 observations). You propose estimating
success as a function of distance. What functional form should you use?
linear
logistic
quadratic
exponential
Starting with a comprehensive model and testing down to find the best specification
has the advantage that
complicated models are inherently better
testing down is guaranteed to find the best specification
testing should be unbiased
pretest bias is eliminated
Before estimating your chosen specification you should
data mine
check for multicollinearity
look at the data
test for zero coefficients
The interocular trauma test is
a) a t test b) an F test c) a chi-square test d) none of the above
When the sample size is quite large, a researcher needs to pay special attention to
coefficient magnitudes
t statistic magnitudes
statistical significance
type I errors
Your only measure of a key economic variable is unsatisfactory but you use it
anyway. This is an example of
knowing the context
asking the right questions
compromising
a sensitivity analysis
“Asking the right question” means
selecting the appropriate null hypothesis
looking for a lost item where you lost it instead of where the light is better
resisting the temptation to change a problem so that it has a mathematically elegant
solution
all of the above
A sensitivity analysis involves
avoiding type I errors
checking for multicollinearity
omitting variables with low t values
examining the impact of specification changes
When testing if a coefficient is zero it is traditional to use a type I error rate of 5%.
When testing if a variable should remain in a specification we should
continue to use a type I error rate of 5%
use a smaller type I error rate
use a larger type I error rate
forget about the type I error rate and instead choose a type II error rate
An example of knowing the context is knowing that
some months have five Sundays
only children from poor families are eligible for school lunch programs
many auctions require a reserve price to be exceeded before an item is sold
all of the above
A type III error occurs when
you make a type I and a type II error simultaneously
type I and type II errors are confused
the right answer is provided to the wrong question
the wrong functional form has been used
The adage that begins with “Graphs force you to notice ….” is completed with
outliers
incorrect functional forms
what you never expected to see
the real relationships among data
In econometrics, KISS stands for
a) keeping it safely sane b) keep it simple, stupid c) keep it sensibly simple
d) keep inference sophisticatedly simple
An advantage of simple models is that they
do not place unrealistic demands on the data
are less likely to lead to serious mistakes
facilitate subjective insights
all of the above
An example of the laugh test is that
your coefficient estimates are of unreasonable magnitude
your functional form is very unusual
your coefficient estimates are all negative
some of your t values are negative
Hunting statistical significance with a shotgun means
avoiding multicollinearity by transforming data
throwing every explanatory variable you can think of into your specification
using F tests rather than t tests
using several different type I error rates
“Capitalizing on chance” means that
by luck you have found the correct specification
you have found a specification that explains the peculiarities of your data set
you have found the best way of incorporating capital into the production function
you have done the opposite of data mining
The adage that begins with “All models are wrong, ….” is completed with
especially those with low R-squares
but some are useful
so it is impossible to find a correct specification
but that should not concern us
Those claiming that statistical significance is being misused are referring to the
problem that
there may be a type I error
there may be a type II error
the coefficient magnitude may not be of consequence
there may be too much multicollinearity
Those worried that researchers are “using statistical significance to sanctify a result”
suggest that statistical analysis be supplemented by
looking for corroborating evidence
looking for disconfirming evidence
assessing the magnitude of coefficients
all of the above
To deal with results tainted by subjective specification decisions undertaken during
the heat of econometric battle it is suggested that researchers
eliminate multicollinearity
report a sensitivity analysis
use F tests instead of t tests
use larger type I error rates
You have regressed yt on xt and xt-1, obtaining a positive coefficient estimate on xt, as
expected, but a negative coefficient estimate on lagged x. This
indicates that something is wrong with the regression
implies that the short-run effect of x is smaller than its long-run effect
implies that the short-run effect of x is larger than its long-run effect
is due to high collinearity
Outliers should
be deleted from the data
be set equal to the sample average
prompt an investigation into their legitimacy
be neutralized somehow
Influential observations
can be responsible for a wrong sign
are another name for outliers
require use of an unusual specification
all of the above
Suppose you are estimating the returns to education and so regress wage on years of
education and some other explanatory variables. One problem with this is that people
with higher general ability levels, for which you have no measure, tend to opt for more
years of education, creating bias in your estimation. This bias is referred to as
multicollinearity bias
pretest bias
self-selection bias
omitted variable bias
A wrong sign could result from
a theoretical oversight
an interpretation error
a data problem
all of the above
Week 11: Autocorrelated errors; heteroskedasticity
If errors are nonspherical it means that they are
autocorrelated
heteroskedastic
autocorrelated or heteroskedastic
autocorrelated or heteroskedastic, or both
The most important consequence of nonspherical errors is that
coefficient estimates are biased
inference is biased
OLS is no longer BLUE
none of these
Upon discovering via a test that you have nonspherical errors you should
use generalized least squares
find the appropriate transformation of the variables
double-check your specification
use an autocorrelation- or heteroskedasticity-consistent variance-covariance matrix
estimate
GLS can be performed by running OLS on variables transformed so that the error
term in the transformed relationship is
homoskedastic
spherical
serially uncorrelated
eliminated
Second-order autocorrelated errors means that the current error εt is a linear function of
a) εt-1 b) εt-1 squared c) εt-2 d) εt-1 and εt-2
Suppose you have an autocorrelated error with rho equal to 0.4. You should transform
each variable xt to become
a) .4xt b) .6xt c) xt - .4xt-1 d) .6xt - .4xt-1
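The quasi-differencing in this question can be checked numerically. A minimal sketch with rho = 0.4 (the data values here are illustrative, not from the question):

```python
import numpy as np

rho = 0.4
x = np.array([10.0, 12.0, 11.0, 13.0, 14.0])  # illustrative series

# GLS transformation for AR(1) errors: x*_t = x_t - rho * x_{t-1}.
# The first observation is simply dropped here, as in Cochrane-Orcutt;
# full GLS would instead rescale it by sqrt(1 - rho**2).
x_star = x[1:] - rho * x[:-1]
print(x_star)  # approximately [8.0, 6.2, 8.6, 8.8]
```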
Pushing the autocorrelation- or heteroskedasticity-consistent variance-covariance
matrix button in econometrics software when running OLS causes
the GLS estimation procedure to be used
the usual OLS coefficient estimates to be produced, but with corrected estimated
variances of these coefficient estimates
new OLS coefficient estimates to be produced, along with corrected estimated
variances of these coefficient estimates
the observations automatically to be weighted to remove the bias in the coefficient
estimates
A “too-big” t statistic could come about because of
a very large sample size
multicollinearity
upward bias in our variance estimates
downward bias in our variance estimates
A “too-big” t statistic could come about because of
a) Multicollinearity b) a small sample size c) orthogonality d) none of these
The DW test
is called the Durbin-Watson test
has a statistic that should be close to 2.0 when the null is true
is defective whenever the lagged value of the dependent variable appears as a regressor
all of the above
The Breusch-Godfrey test
is used to test the null of no autocorrelation
is valid even when the lagged value of the dependent variable appears as a regressor
is a chi-square test
all of the above
To use the Breusch-Godfrey statistic to test the null of no autocorrelation against the
alternative of second-order autocorrelated errors, we need to regress the OLS residuals
on ____ and use ____ degrees of freedom for our test statistic. The
blanks are best filled with
two lags of the OLS residuals; 2
the original explanatory variables and one lag of the OLS residuals; 1
the original explanatory variables and two lags of the OLS residuals; 2
the original explanatory variables, their lags, and one lag of the OLS residuals; 1
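As a worked check, the Breusch-Godfrey auxiliary regression can be sketched in a few lines of numpy. The data here are simulated under the null of no autocorrelation, and all variable names are my own:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # simulated data, no autocorrelation

# Step 1: OLS of y on an intercept and x; save the residuals.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Step 2: regress the residuals on the original regressors plus two
# lags of the residuals (the first two observations are dropped).
Z = np.column_stack([X[2:], e[1:-1], e[:-2]])
g = np.linalg.lstsq(Z, e[2:], rcond=None)[0]
fit = Z @ g
r2 = 1 - np.sum((e[2:] - fit) ** 2) / np.sum((e[2:] - e[2:].mean()) ** 2)

# Step 3: the BG statistic, (sample size) * R-square from the auxiliary
# regression, is chi-square with 2 df under the null because two
# residual lags were added.
bg = (n - 2) * r2
print(bg)  # should be small relative to the chi-square(2) critical value 5.99
```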
With heteroskedasticity we should use weighted least squares where
by doing so we maximize R-square
we use bigger weights on those observations with error terms that have bigger variances
we use bigger weights on those observations with error terms that have smaller
variances
the weights are bigger whenever the coefficient estimates are more reliable
Suppose you are estimating y = α + βx + γz + ε but that the variance of ε is
proportional to the square of x. Then to find the GLS estimate we should regress
y on an intercept, 1/x, and z/x
y/x on 1/x and z/x
y/x on an intercept, 1/x, and z/x
not possible because we don’t know the factor of proportionality
none of these
Pushing the heteroskedasticity-consistent variance-covariance matrix button in
econometric software
removes the coefficient estimate bias from using OLS
does not change the OLS coefficient estimates
increases the t values
none of these
Suppose your dependent variable is aggregate household demand for electricity for
various cities. To correct for heteroskedasticity you should
multiply observations by the city size
divide observations by the city size
multiply observations by the square root of the city size
divide observations by the square root of the city size
none of these
Suppose your dependent variable is crime rates for various cities. To correct for
heteroskedasticity you should
multiply observations by the city size
divide observations by the city size
multiply observations by the square root of the city size
divide observations by the square root of the city size
none of these
When using the eyeball test for heteroskedasticity, under the null we would expect the
relationship between the squared residuals and the explanatory variable to be such that
as the explanatory variable gets bigger the squared residual gets bigger
as the explanatory variable gets bigger the squared residual gets smaller
when the explanatory variable is quite small or quite large the squared residual will be
large relative to its value otherwise
there is no evident relationship
Suppose you are estimating the relationship y = α + βx + γz + ε but you suspect that
the 50 male observations have a different error variance than the 40 female
observations. The degrees of freedom for the Goldfeld-Quandt test are
a) 50 and 40 b) 49 and 39 c) 48 and 38 d) 47 and 37
In the previous question, suppose you had chosen to use the studentized BP test. The
degrees of freedom would then have been
a) 1 b) 2 c) 3 d) 4
In the previous question, to conduct the studentized BP test you would have regressed
the squared residuals on an intercept and
a) x b) z c) x and z d) a dummy for gender
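A sketch of the studentized BP auxiliary regression for this two-group case (simulated data with 50 "male" and 40 "female" observations as in the question; the heteroskedasticity built in here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n_m, n_f = 50, 40
male = np.concatenate([np.ones(n_m), np.zeros(n_f)])
x = rng.normal(size=n_m + n_f)
z = rng.normal(size=n_m + n_f)
# Illustrative data: male error standard deviation 2, female 1.
e = rng.normal(size=n_m + n_f) * np.where(male == 1, 2.0, 1.0)
y = 1.0 + 0.5 * x - 0.3 * z + e

# OLS residuals from the original specification.
X = np.column_stack([np.ones(n_m + n_f), x, z])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression: squared residuals on an intercept and the
# gender dummy; one slope, hence 1 degree of freedom for the chi-square.
W = np.column_stack([np.ones(n_m + n_f), male])
u = resid**2
fit = W @ np.linalg.lstsq(W, u, rcond=None)[0]
r2 = 1 - np.sum((u - fit) ** 2) / np.sum((u - u.mean()) ** 2)
bp = (n_m + n_f) * r2
print(bp)  # compare with the chi-square(1) critical value 3.84
```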
Suppose you are estimating demand for electricity using aggregated data on household
income and on electricity demand across 30 cities of differing sizes Ni. Your
specification is that household demand is a linear function of household income and city
price. To estimate using GLS you should regress
per capita demand on an intercept, price and per capita income
aggregate demand on an intercept, price, and aggregate income
per capita demand on the inverse of Ni, price divided by Ni, and per capita income
none of these
Suppose you are estimating student performance on an economics exam, regressing
exam score on an intercept, GPA, and a dummy MALE. The CLR model assumptions
apply except that you have determined that the error variance for the male observations
is eight but for females it is only two. To estimate using GLS you should transform by
dividing the male observations by 8 and the female observations by 2
multiplying the male observations by 2 and the female observations by 8
dividing the male observations by 2
multiplying the female observations by 8
Suppose the CLR model applies except that the errors are nonspherical of known
form so that you can calculate the GLS estimator. Then
the R-square calculated using the GLS estimates is smaller than the OLS R-square
the R-square calculated using the GLS estimates is equal to the OLS R-square
the R-square calculated using the GLS estimates is larger than the OLS R-square
nothing can be said about the relative magnitudes of R-square
Consider a case in which there is a nonspherical error of known form so that you can
calculate the GLS estimator. You have conducted a Monte Carlo study to investigate the
difference between OLS and GLS, using the computer to generate 2000 samples with
nonspherical errors, from which you calculate the following.
2000 OLS estimates and their average betaolsbar
2000 estimated variances of these OLS estimates and their average betaolsvarbar
the estimated variance of the 2000 OLS estimates, varbetaols.
2000 corresponding GLS estimates and their average betaglsbar
2000 estimated variances of these GLS estimates and their average betaglsvarbar
the estimated variance of the 2000 GLS estimates,varbetagls
The following six questions refer to this information.
You should find that betaolsbar and betaglsbar are
approximately equal, and varbetaols and varbetagls are also approximately equal
not approximately equal, and varbetaols and varbetagls are also not approximately
equal
approximately equal, but varbetaols and varbetagls are not approximately equal
not approximately equal, but varbetaols and varbetagls are approximately equal
You should expect that
betaolsbar and betaglsbar are approximately equal
betaolsbar is bigger than betaglsbar
betaolsbar is smaller than betaglsbar
not possible to determine relative size here
You should expect that
Varbetaols and Varbetagls are approximately equal
Varbetaols is bigger than Varbetagls
Varbetaols is smaller than Varbetagls
not possible to determine relative size here
You should expect that varbetaols and betaolsvarbar are
approximately equal and varbetagls and betaglsvarbar are also approximately equal
not approximately equal but varbetagls and betaglsvarbar are approximately equal
approximately equal but varbetagls and betaglsvarbar are not approximately equal
not approximately equal and varbetagls and betaglsvarbar are also not approximately
equal
You should expect that
varbetaols and betaolsvarbar are approximately equal
varbetaols is bigger than betaolsvarbar
varbetaols is smaller than betaolsvarbar
not possible to determine relative size here
You should expect that
varbetagls and betaglsvarbar are approximately equal
varbetagls is bigger than betaglsvarbar
varbetagls is smaller than betaglsvarbar
not possible to determine relative size here
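A Monte Carlo of this design can be sketched as follows (the sample size, coefficient values, and variance function are all illustrative assumptions). Because OLS and GLS are both unbiased under nonspherical errors while GLS is efficient, the two averages should both be close to the true β and the GLS estimates should vary less:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, beta = 50, 2000, 2.0
x = rng.uniform(1.0, 5.0, size=n)  # fixed regressor across repetitions
sd = np.sqrt(4.0 + 9.0 * x**2)     # known heteroskedasticity: V(error) = 4 + 9x^2
X = np.column_stack([np.ones(n), x])

ols_hats, gls_hats = [], []
for _ in range(reps):
    y = 1.0 + beta * x + sd * rng.normal(size=n)
    # OLS slope estimate.
    ols_hats.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    # GLS: divide every observation (including the intercept column) by
    # the error standard deviation, then run OLS on the transformed data.
    gls_hats.append(np.linalg.lstsq(X / sd[:, None], y / sd, rcond=None)[0][1])

ols_hats, gls_hats = np.array(ols_hats), np.array(gls_hats)
print(ols_hats.mean(), gls_hats.mean())  # both near 2: both unbiased
print(ols_hats.var(), gls_hats.var())    # GLS variance smaller: GLS efficient
```

Note that drawing errors with variance 4 + 9x² requires multiplying the standard-normal draws by the square root of that expression, which is the point of a later question in this section.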
Suppose the CLR model holds but the presence of nonspherical errors causes the
variance estimates of the OLS estimator to be an underestimate. Because of this, when
testing for the significance of a slope coefficient using the large-sample critical t
value of 1.96, the type I error rate
is higher than 5%
is lower than 5%
remains fixed at 5%
not possible to tell what happens to the type I error rate
Suppose you want to undertake a Monte Carlo study to examine the impact of
heteroskedastic errors of the form V(ε) = 4 + 9x², where x is one of the explanatory
variables in your specification. After getting the computer to draw errors from a
standard normal, to create the desired heteroskedasticity you need to multiply the ith
error by
a) 3xi b) 2 + 3xi c) 4 + 9xi² d) none of these
Suppose the CLR model assumptions apply to y = α + βx + γz + ε except that the
variance of the error is proportional to x squared. To produce the GLS estimator you
should regress y/x on
an intercept, 1/x, and z/x
an intercept and z/x
1/x and z/x
not possible to produce GLS because the factor of proportionality is not known
Suppose the CLR model assumptions apply to y = α + βx + γz + ε. You mistakenly
think that the variance of the error is proportional to x squared and so transform the
data appropriately and run OLS. If x and z are positively correlated in the data, then
your estimate of β is
biased upward
biased downward
unbiased
not possible to determine the nature of the bias here
Suppose income is the dependent variable in a regression and contains errors of
measurement (i) caused by people rounding their income to the nearest $100, or (ii)
caused by people not knowing their exact income but always guessing within 5% of
the true value. In case (i) there is
heteroskedasticity and the same for case (ii)
heteroskedasticity but not for case (ii)
no heteroskedasticity but heteroskedasticity for case (ii)
no heteroskedasticity and the same for case (ii)
Suppose you have regressed score on an economics exam on GPA for 50 individuals,
ordered from smallest to largest GPA. The DW statistic is 1.5; you should conclude that
the errors are autocorrelated
there is heteroskedasticity
there is multicollinearity
there is a functional form misspecification
A regression using the specification y = α + βx + γz + ε produced SSE = 14 using
annual data for 1961-1970, and SSE = 45 using data for 1971-1988. The Goldfeld-
Quandt test statistic for a change in error variance beginning in 1971 is
a) 3.2 b) 1.8 c) 1.5 d) none of these
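The arithmetic behind this statistic: each SSE is divided by its degrees of freedom (observations minus the three estimated coefficients), and the ratio of the resulting variance estimates is the test statistic:

```python
# Goldfeld-Quandt statistic: ratio of SSE/df for the two subperiods.
sse1, df1 = 14, 10 - 3  # 1961-1970: 10 observations, 3 coefficients -> 7 df
sse2, df2 = 45, 18 - 3  # 1971-1988: 18 observations, 3 coefficients -> 15 df

gq = (sse2 / df2) / (sse1 / df1)  # larger variance estimate on top
print(gq)  # 1.5
```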
Week 12: Bayesian Statistics
The main difference between Bayesian and classical statisticians is
their choice of prior
their definitions of probability
their views of the type I error rate
the formulas for probability used in calculations
Suppose a classical statistician estimates via OLS an unknown parameter beta and
because the CLR model assumptions hold declares the resulting estimate’s sampling
distribution to be such that it is unbiased and has minimum variance among all linear
unbiased estimators. For the Bayesian the sampling distribution
is also unbiased
is biased because of the prior
has a smaller variance
does not exist
Suppose the CNLR model applies and with a very large sample size the classical
statistician produces an estimate betahat = 6, with variance 4. With the same data, using an
ignorance prior, a Bayesian produces a normal posterior distribution with mean 6 and
variance 4. The next ten questions refer to this information.
The sampling distribution of betahat
has mean 6
has mean beta
is graphed with beta on the horizontal axis
has the same interpretation as the posterior distribution
The posterior distribution of beta
has mean 6
has mean beta
is graphed with betahat on the horizontal axis
has the same interpretation as the sampling distribution
In this example the Bayesian estimate of beta would be the same as the classical
estimate if the loss function were
a) all-or-nothing b) absolute c) quadratic d) all of the above
If the Bayesian had used an informative prior instead of an ignorance prior the
posterior would have had
the same mean but a smaller variance
the same mean but a larger variance
a different mean and a smaller variance
a different mean and a larger variance
For the Bayesian, the probability that beta is greater than 7 is
a) 40% b) 31% c) 16% d) not a meaningful question
For the classical statistician, the probability that beta is greater than 7 is
a) 40% b) 31% c) 16% d) not a meaningful question
Suppose we want to test the null hypothesis that beta is equal to 4, against the
alternative that beta is greater than 4. The classical statistician’s p value is
approximately
a) .16 b) .31 c) .32 d) none of these
Suppose we want to test the null hypothesis that beta is less than or equal to 4, against
the alternative that beta is greater than 4. The Bayesian statistician’s probability that the
null is true is approximately
a) .16 b) .69 c) .84 d) none of these
The Bayesian would interpret the interval from 2.7 to 9.3 as
an interval which if calculated in repeated samples would cover the true value of beta
90% of the time
a range containing the true value of beta with 90% probability
an interval that the Bayesian would bet contains the true value of beta
Consider the interval from 2.7 to 9.3. For the Bayesian the probability that the true
value of beta is not in this interval is
approximately equal to the probability that beta is less than 3.4
a lot greater than the probability that beta is less than 3.4
a lot less than the probability that beta is less than 3.4
not a meaningful question
Bayes theorem says that the posterior is
equal to the likelihood
proportional to the likelihood
equal to the prior times the likelihood
proportional to the prior times the likelihood
The subjective element in a Bayesian analysis comes about through use of
an ignorance prior
an informative prior
the likelihood
the posterior
The Bayesian loss function tells us
the loss incurred by using a particular point estimate
the expected loss incurred by using a particular point estimate
the loss associated with a posterior distribution
the expected loss associated with a posterior distribution
The usual “Bayesian point estimate” is the mean of the posterior distribution. This
assumes
a quadratic loss function
an absolute loss function
an all-or-nothing loss function
no particular loss function
The Bayesian point estimate is chosen by
minimizing the loss
minimizing expected loss
finding the mean of the posterior distribution
all of the above
From the Bayesian perspective a sensitivity analysis checks to see by how much the
results change when a different
loss function is used
prior is used
posterior is used
data set is used
The main output from a Bayesian analysis is
the likelihood
the prior distribution
the posterior distribution
a point estimate
When hypothesis testing in a Bayesian framework the type I error
is fixed
is irrelevant
is set equal to the type II error
none of the above
The Bayesian accepts/rejects a null hypothesis based on
minimizing the type I error
minimizing the type II error
maximizing the benefit from this decision
maximizing the expected benefit from this decision
Suppose you are a Bayesian and your posterior distribution for next month’s
unemployment rate is a normal distribution with mean 8.0 and variance 0.25. If this
month’s unemployment rate is 8.1 percent, what would you say is the probability that
unemployment will increase from this month to next month?
a) 50% b) 42% c) 5% d) 2.3%
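This posterior probability is a one-line normal tail calculation, here using only the standard library:

```python
from math import erf, sqrt

mean, var = 8.0, 0.25
sd = sqrt(var)  # 0.5

# P(next month's rate > this month's 8.1) under the N(8, 0.25) posterior.
z = (8.1 - mean) / sd                  # 0.2
p = 0.5 * (1 - erf(z / sqrt(2)))       # upper-tail probability 1 - Phi(z)
print(round(p, 3))  # 0.421, i.e. about 42%
```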
If a Bayesian has a quadratic loss function, his/her preferred point estimate is
the mean of the posterior distribution
the median of the posterior distribution
the mode of the posterior distribution
cannot be determined unless the specific quadratic loss function is known
Suppose the net cost to a firm of undertaking a venture is $1800 if beta is less than or
equal to one and its net profit is $Q if beta is greater than one. Your posterior
distribution for beta is normal with mean 2.28 and variance unity. Any value of Q
bigger than what number entices you to undertake this venture?
a) 100 b) 200 c) 300 d) 450
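The decision threshold can be checked directly: the venture is worthwhile when its expected net payoff under the posterior is positive:

```python
from math import erf, sqrt

# Posterior for beta: normal with mean 2.28 and variance 1.
p_null = 0.5 * (1 + erf((1.0 - 2.28) / sqrt(2)))  # P(beta <= 1), about 0.10

# Expected net payoff: Q * P(beta > 1) - 1800 * P(beta <= 1).
# Undertake the venture when this is positive, i.e. when
# Q > 1800 * p_null / (1 - p_null).
q_threshold = 1800 * p_null / (1 - p_null)
print(q_threshold)  # roughly 200
```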
A Bayesian has a client with a loss function equal to the absolute value of the difference
between the true value of beta and the point estimate of beta. The posterior distribution
is f(beta) = 2*beta for beta between zero and one, with f(beta) zero elsewhere. (This
distribution has mean two-thirds and variance one-eighteenth.) Approximately what
point estimate should be given to this client?
a) 0.50 b) 0.66 c) 0.71 d) 0.75
A Bayesian has a client with a quadratic loss function. The posterior distribution is
beta = 1, 2, and 3 with probabilities 0.1, 0.3 and 0.6, respectively. What point
estimate should be given to this client?
a) 1 b) 2 c) 3 d) none of these
A Bayesian has a client with an all-or-nothing loss function. The posterior distribution is
beta = 1, 2, and 3 with probabilities 0.1, 0.3 and 0.6, respectively. What point estimate
should be given to this client?
a) 1 b) 2 c) 3 d) none of these
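These loss-function questions reduce to a standard correspondence: quadratic loss implies the posterior mean, absolute loss the median, and all-or-nothing loss the mode. A short sketch for the discrete posterior used in the last two questions:

```python
# Posterior: beta = 1, 2, 3 with probabilities 0.1, 0.3, 0.6.
support = [1, 2, 3]
probs = [0.1, 0.3, 0.6]

# Quadratic loss -> posterior mean (here 2.5).
mean = sum(b * p for b, p in zip(support, probs))

# Absolute loss -> posterior median: first point where the CDF reaches 0.5
# (here 3, since the cumulative probabilities are 0.1, 0.4, 1.0).
cum, median = 0.0, None
for b, p in zip(support, probs):
    cum += p
    if cum >= 0.5:
        median = b
        break

# All-or-nothing loss -> posterior mode (here 3).
mode = support[probs.index(max(probs))]

print(mean, median, mode)
```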
Answers
Week 1: Statistical Foundations I 1c, 2c, 3c, 4a, 5b, 6d, 7a, 8c, 9c, 10c, 11a, 12c, 13d,
14c, 15c, 16a, 17a, 18d, 19c, 20a, 21a, 22a, 23a, 24c, 25a, 26d, 27a, 28b, 29c, 30a, 31b,
32c, 33a, 34c, 35b, 36d, 37b, 38a, 39b, 40b, 41d, 42b, 43a, 44b, 45a, 46b, 47c, 48c, 49b,
50d, 51c, 52a
Week 2: Statistical Foundations II 1b, 2c, 3c, 4b, 5c, 6a, 7d, 8b, 9a, 10a, 11d, 12d, 13d,
14a, 15b, 16a, 17a, 18b, 19c, 20d, 21b, 22d, 23b, 24c, 25d, 26b, 27d, 28d, 29d, 30c, 31c,
32a, 33c, 34b, 35c, 36d, 37c, 38b, 39a
Week 3: What is Regression Analysis? 1a, 2c, 3b, 4d, 5c, 6d, 7a, 8d, 9a, 10d, 11b, 12b,
13b, 14d, 15b, 16c, 17c, 18b, 19a, 20a, 21d, 22d, 23d, 24c, 25d, 26b, 27a, 28d, 29c, 30b,
31a, 32b, 33c, 34d, 35b, 36b, 37a, 38c, 39c
Week 4: The CLR Model 1d, 2c, 3b, 4c, 5d, 6c, 7d, 8b, 9b, 10c, 11d, 12c, 13c, 14d, 15d,
16c, 17c, 18a, 19a, 20a, 21c, 22b, 23d, 24a, 25c
Week 5: Sampling Distributions 1d, 2d, 3c, 4d, 5d, 6c, 7d, 8c, 9b, 10c, 11a, 12d, 13c,
14d, 15d, 16d, 17d, 18d, 19d, 20c, 21c, 22a, 23a, 24b, 25b, 26c, 27b, 28c, 29b, 30b, 31b,
32c, 33d, 34b, 35d
Week 6: Dummy Variables 1d, 2c, 3b, 4a, 5a, 6a, 7c, 8b, 9b, 10b, 11d, 12d, 13c, 14c,
15c, 16d, 17c, 18c, 19a, 20c, 21b, 22c, 23d, 24d, 25c, 26b, 27a, 28c, 29d, 30d, 31c, 32d,
33b, 34c, 35d, 36d, 37c, 38d, 39d, 40b, 41b, 42d, 43a
Week 7: Hypothesis Testing 1b, 2c, 3c, 4c, 5b, 6d, 7b, 8d, 9b, 10d, 11b, 12c, 13d, 14a,
15c, 16b, 17b, 18c, 19d, 20a, 21c
Week 9: Specification 1c, 2b, 3b, 4c, 5b, 6c, 7b, 8c, 9c, 10d, 11b, 12d, 13d, 14c, 15d, 16d,
17c, 18d, 19c, 20d, 21b, 22d
Week 10: Multicollinearity; Applied Econometrics 1c, 2d, 3c, 4d, 5d, 6c, 7d, 8b, 9d,
10c, 11d, 12d, 13b, 14c, 15b, 16c, 17b, 18c, 19c, 20d, 21a, 22c, 23d, 24d, 25c, 26d, 27c,
28c, 29c, 30d, 31a, 32b, 33b, 34b, 35c, 36d, 37b, 38c, 39c, 40a, 41c, 42d
Week 11: Nonspherical Errors 1d, 2b, 3c, 4b, 5d, 6c, 7b, 8d, 9d, 10d, 11d, 12c, 13c, 14c,
15b, 16d, 17c, 18d, 19d, 20a, 21d, 22d, 23c, 24a, 25c, 26a, 27b, 28b, 29d, 30a, 31a,
32d, 33a, 34c, 35c, 36d, 37c
Week 12: Bayesian Statistics 1b, 2d, 3b, 4a, 5d, 6c, 7b, 8d, 9a, 10a, 11b, 12a, 13d, 14b,
15a, 16a, 17b, 18b, 19c, 20d, 21d, 22b, 23a, 24b, 25c, 26d, 27c

 
Goals of psychology
Goals of psychologyGoals of psychology
Goals of psychology
 
Introduction to psychology
Introduction to psychologyIntroduction to psychology
Introduction to psychology
 
Econometrics and statistics mcqs part 1
Econometrics and statistics mcqs part 1 Econometrics and statistics mcqs part 1
Econometrics and statistics mcqs part 1
 
International trade theory mcqs
International trade theory mcqsInternational trade theory mcqs
International trade theory mcqs
 
Fiscal policy
Fiscal policyFiscal policy
Fiscal policy
 
Is curve in goods market
Is curve in goods marketIs curve in goods market
Is curve in goods market
 
Life cycle income hypothesis
Life cycle income hypothesisLife cycle income hypothesis
Life cycle income hypothesis
 
Permanent income hypothesis
Permanent income hypothesisPermanent income hypothesis
Permanent income hypothesis
 
Relative income hypothesis
Relative income hypothesisRelative income hypothesis
Relative income hypothesis
 
Law of consumption of keynes( macroeconomics)
Law of consumption of keynes( macroeconomics)Law of consumption of keynes( macroeconomics)
Law of consumption of keynes( macroeconomics)
 
Theories of social cognition
Theories of social cognitionTheories of social cognition
Theories of social cognition
 
Social cognition
Social cognitionSocial cognition
Social cognition
 
Corona and social media
Corona and social mediaCorona and social media
Corona and social media
 
Media and foreign
Media and foreignMedia and foreign
Media and foreign
 

Recently uploaded

Test bank for advanced assessment interpreting findings and formulating diffe...
Test bank for advanced assessment interpreting findings and formulating diffe...Test bank for advanced assessment interpreting findings and formulating diffe...
Test bank for advanced assessment interpreting findings and formulating diffe...robinsonayot
 
Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...
Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...
Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...priyasharma62062
 
Pension dashboards forum 1 May 2024 (1).pdf
Pension dashboards forum 1 May 2024 (1).pdfPension dashboards forum 1 May 2024 (1).pdf
Pension dashboards forum 1 May 2024 (1).pdfHenry Tapper
 
Thane Call Girls , 07506202331 Kalyan Call Girls
Thane Call Girls , 07506202331 Kalyan Call GirlsThane Call Girls , 07506202331 Kalyan Call Girls
Thane Call Girls , 07506202331 Kalyan Call GirlsPriya Reddy
 
Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...
Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...
Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...HyderabadDolls
 
Kurla Capable Call Girls ,07506202331, Sion Affordable Call Girls
Kurla Capable Call Girls ,07506202331, Sion Affordable Call GirlsKurla Capable Call Girls ,07506202331, Sion Affordable Call Girls
Kurla Capable Call Girls ,07506202331, Sion Affordable Call GirlsPriya Reddy
 
Q1 2024 Conference Call Presentation vF.pdf
Q1 2024 Conference Call Presentation vF.pdfQ1 2024 Conference Call Presentation vF.pdf
Q1 2024 Conference Call Presentation vF.pdfAdnet Communications
 
Bhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call Girl
Bhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call GirlBhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call Girl
Bhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call GirlPriya Reddy
 
Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...
Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...
Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...priyasharma62062
 
Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...
Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...
Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...rakulpreet584566
 
Vip Call Girls Bhubaneswar😉 Bhubaneswar 9777949614 Housewife Call Girls Serv...
Vip Call Girls Bhubaneswar😉  Bhubaneswar 9777949614 Housewife Call Girls Serv...Vip Call Girls Bhubaneswar😉  Bhubaneswar 9777949614 Housewife Call Girls Serv...
Vip Call Girls Bhubaneswar😉 Bhubaneswar 9777949614 Housewife Call Girls Serv...Call Girls Mumbai
 
TriStar Gold- 05-13-2024 corporate presentation
TriStar Gold- 05-13-2024 corporate presentationTriStar Gold- 05-13-2024 corporate presentation
TriStar Gold- 05-13-2024 corporate presentationAdnet Communications
 
[[Nerul]] MNavi Mumbai Honoreble Call Girls Number-9833754194-Panvel Best Es...
[[Nerul]] MNavi Mumbai Honoreble  Call Girls Number-9833754194-Panvel Best Es...[[Nerul]] MNavi Mumbai Honoreble  Call Girls Number-9833754194-Panvel Best Es...
[[Nerul]] MNavi Mumbai Honoreble Call Girls Number-9833754194-Panvel Best Es...priyasharma62062
 
2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...
2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...
2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...priyasharma62062
 
Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...
Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...
Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...kajal
 
Mahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Mahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsMahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Mahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsDeepika Singh
 
Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...
Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...
Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...priyasharma62062
 
Collecting banker, Capacity of collecting Banker, conditions under section 13...
Collecting banker, Capacity of collecting Banker, conditions under section 13...Collecting banker, Capacity of collecting Banker, conditions under section 13...
Collecting banker, Capacity of collecting Banker, conditions under section 13...RaniT11
 
Vip Call Girls Rasulgada😉 Bhubaneswar 9777949614 Housewife Call Girls Servic...
Vip Call Girls Rasulgada😉  Bhubaneswar 9777949614 Housewife Call Girls Servic...Vip Call Girls Rasulgada😉  Bhubaneswar 9777949614 Housewife Call Girls Servic...
Vip Call Girls Rasulgada😉 Bhubaneswar 9777949614 Housewife Call Girls Servic...Call Girls Mumbai
 

Recently uploaded (20)

Test bank for advanced assessment interpreting findings and formulating diffe...
Test bank for advanced assessment interpreting findings and formulating diffe...Test bank for advanced assessment interpreting findings and formulating diffe...
Test bank for advanced assessment interpreting findings and formulating diffe...
 
Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...
Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...
Turbhe Fantastic Escorts📞📞9833754194 Kopar Khairane Marathi Call Girls-Kopar ...
 
Pension dashboards forum 1 May 2024 (1).pdf
Pension dashboards forum 1 May 2024 (1).pdfPension dashboards forum 1 May 2024 (1).pdf
Pension dashboards forum 1 May 2024 (1).pdf
 
Thane Call Girls , 07506202331 Kalyan Call Girls
Thane Call Girls , 07506202331 Kalyan Call GirlsThane Call Girls , 07506202331 Kalyan Call Girls
Thane Call Girls , 07506202331 Kalyan Call Girls
 
Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...
Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...
Garia ^ (Call Girls) in Kolkata - Book 8005736733 Call Girls Available 24 Hou...
 
Kurla Capable Call Girls ,07506202331, Sion Affordable Call Girls
Kurla Capable Call Girls ,07506202331, Sion Affordable Call GirlsKurla Capable Call Girls ,07506202331, Sion Affordable Call Girls
Kurla Capable Call Girls ,07506202331, Sion Affordable Call Girls
 
Q1 2024 Conference Call Presentation vF.pdf
Q1 2024 Conference Call Presentation vF.pdfQ1 2024 Conference Call Presentation vF.pdf
Q1 2024 Conference Call Presentation vF.pdf
 
Bhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call Girl
Bhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call GirlBhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call Girl
Bhayandar Capable Call Girls ,07506202331,Mira Road Beautiful Call Girl
 
Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...
Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...
Kopar Khairane Cheapest Call Girls✔✔✔9833754194 Nerul Premium Call Girls-Navi...
 
Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...
Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...
Call Girls In Kolkata-📞7033799463-Independent Escorts Services In Dam Dam Air...
 
Vip Call Girls Bhubaneswar😉 Bhubaneswar 9777949614 Housewife Call Girls Serv...
Vip Call Girls Bhubaneswar😉  Bhubaneswar 9777949614 Housewife Call Girls Serv...Vip Call Girls Bhubaneswar😉  Bhubaneswar 9777949614 Housewife Call Girls Serv...
Vip Call Girls Bhubaneswar😉 Bhubaneswar 9777949614 Housewife Call Girls Serv...
 
TriStar Gold- 05-13-2024 corporate presentation
TriStar Gold- 05-13-2024 corporate presentationTriStar Gold- 05-13-2024 corporate presentation
TriStar Gold- 05-13-2024 corporate presentation
 
[[Nerul]] MNavi Mumbai Honoreble Call Girls Number-9833754194-Panvel Best Es...
[[Nerul]] MNavi Mumbai Honoreble  Call Girls Number-9833754194-Panvel Best Es...[[Nerul]] MNavi Mumbai Honoreble  Call Girls Number-9833754194-Panvel Best Es...
[[Nerul]] MNavi Mumbai Honoreble Call Girls Number-9833754194-Panvel Best Es...
 
2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...
2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...
2999,Vashi Fantastic Ellete Call Girls📞📞9833754194 CBD Belapur Genuine Call G...
 
Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...
Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...
Call Girls in Benson Town / 8250092165 Genuine Call girls with real Photos an...
 
Mahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Mahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsMahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Mahendragarh Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
 
W.D. Gann Theory Complete Information.pdf
W.D. Gann Theory Complete Information.pdfW.D. Gann Theory Complete Information.pdf
W.D. Gann Theory Complete Information.pdf
 
Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...
Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...
Female Russian Escorts Mumbai Call Girls-((ANdheri))9833754194-Jogeshawri Fre...
 
Collecting banker, Capacity of collecting Banker, conditions under section 13...
Collecting banker, Capacity of collecting Banker, conditions under section 13...Collecting banker, Capacity of collecting Banker, conditions under section 13...
Collecting banker, Capacity of collecting Banker, conditions under section 13...
 
Vip Call Girls Rasulgada😉 Bhubaneswar 9777949614 Housewife Call Girls Servic...
Vip Call Girls Rasulgada😉  Bhubaneswar 9777949614 Housewife Call Girls Servic...Vip Call Girls Rasulgada😉  Bhubaneswar 9777949614 Housewife Call Girls Servic...
Vip Call Girls Rasulgada😉 Bhubaneswar 9777949614 Housewife Call Girls Servic...
 

Econometrics and statistics mcqs part 2

Week 6: Dummy Variables

The dummy variable trap occurs when
a) a dummy is not defined as zero or one
b) there is more than one type of category using dummies
c) the intercept is omitted
d) none of the above

The next 13 questions are based on the following information. Suppose we specify that
y = α + βx + δ1*Male + δ2*Female + θ1*Left + θ2*Center + θ3*Right + ε
where Left, Center, and Right refer to the three possible political orientations. A variable Fringe is created as the sum of Left and Right, and a variable x*Male is created as the product of x and Male.

Which of the following creates a dummy variable trap? Regress y on an intercept, x, and
a) Male and Left
b) Male, Left, and Center
c) Left, Center, and Right
d) None of these

Which of the following creates a dummy variable trap? Regress y on an intercept, x, and
a) Male and Fringe
b) Male, Center, and Fringe
c) Both of the above
d) None of the above

The variable Fringe is interpreted as
a) being on the Left or on the Right
b) being on both the Left and the Right
c) being twice the value of being on the Left or being on the Right
d) none of these

Using Fringe instead of Left and Right separately in this specification is done to force the slopes of Left and Right to be
a) the same
b) half the slope of Center
c) twice the slope of Center
d) the same as the slope of Center

If we regress y on an intercept, x, Male, Left, and Center, the slope coefficient on Male is interpreted as the intercept difference between males and females
a) regardless of political orientation
b) assuming a Right political orientation
c) assuming a Left or Center political orientation
d) none of the above
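The trap in the questions above can be illustrated numerically: with an intercept plus a dummy for every category, the design matrix is perfectly collinear. A minimal sketch with made-up data (NumPy assumed available):

```python
import numpy as np

# Hypothetical data: Male and Female dummies that sum to one for everyone.
male = np.array([1., 0., 1., 1., 0., 0., 1., 0.])
female = 1.0 - male
x = np.arange(8, dtype=float)
intercept = np.ones(8)

# Trap: intercept = Male + Female, so the columns are linearly dependent.
X_trap = np.column_stack([intercept, x, male, female])
print(np.linalg.matrix_rank(X_trap))  # 3, not 4: perfect collinearity

# Fix: drop one category (Female becomes the base).
X_ok = np.column_stack([intercept, x, male])
print(np.linalg.matrix_rank(X_ok))    # 3: full column rank
```

The same rank deficiency arises with an intercept plus all three political-orientation dummies, which is why one category is always left out as the base.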
If we regress y on an intercept, x, Male, and x*Male, the slope coefficient on x*Male is interpreted as
a) the difference between the male and female intercept
b) the male slope coefficient estimate
c) the difference between the male and female slope coefficient estimates
d) none of these

Suppose we regress y on an intercept, x, and Male, and then do another regression, regressing y on an intercept, x, and Female. The slope estimates on Male and on Female should
a) be equal to one another
b) be equal but opposite in sign
c) bear no necessary relationship to one another
d) none of these

Suppose we regress y on an intercept, x, Male, Left and Center and then do another regression, regressing y on an intercept, x, and Center and Right. The interpretation of the slope estimate on Center should be
a) the intercept for those from the political center in both regressions
b) the difference between the Center and Right intercepts in the first regression, and the difference between the Center and Left intercepts in the second regression
c) the difference between the Center and Left intercepts in the first regression, and the difference between the Center and Right intercepts in the second regression
d) none of these

Suppose we regress y on an intercept, x, Male, Left and Center and then do another regression, regressing y on an intercept, x, and Center and Right. The slope estimate on Center in the second regression should be
a) the same as the slope estimate on Center in the first regression
b) equal to the difference between the original Center coefficient and the Left coefficient
c) equal to the difference between the original Center coefficient and the Right coefficient
d) unrelated to the first regression results

Suppose we regress y on an intercept, Male, Left, and Center. The base category is
a) a male on the left
b) a female on the left
c) a male on the right
d) a female on the right

Suppose we regress y on an intercept, Male, Left, and Center. The intercept is interpreted as the intercept of a
a) male
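The x*Male interaction question above can be checked directly: in y = a + b*x + c*Male + d*(x*Male), OLS recovers d as the male-female slope difference. A sketch with noise-free, made-up numbers (females follow y = 1 + 2x, males follow y = 3 + 5x):

```python
import numpy as np

# Constructed data with no noise, so OLS recovers the coefficients exactly.
x = np.array([0., 1., 2., 3., 0., 1., 2., 3.])
male = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
y = np.where(male == 1, 3 + 5 * x, 1 + 2 * x)

# Regress y on an intercept, x, Male, and x*Male.
X = np.column_stack([np.ones_like(x), x, male, x * male])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))  # [1. 2. 2. 3.]: the x*Male coefficient is 5 - 2 = 3
```

The Male coefficient (2) is the intercept difference, and the x*Male coefficient (3) is the slope difference, matching the interpretations asked for above.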
b) male on the right
c) female
d) female on the right

Researcher A has used the specification:
y = α + βx + βML*MaleLeft + βMC*MaleCenter + βMR*MaleRight + βFL*FemaleLeft + βFC*FemaleCenter + ε
Here MaleLeft is a dummy representing a male on the left; other variables are defined in similar fashion. Researcher B has used the specification:
y = αB + βB*x + δ*Male + θL*Left + θC*Center + λML*Male*Left + λMC*Male*Center + ε
Here Male*Left is a variable calculated as the product of Male and Left; other variables are defined in similar fashion. These specifications are
a) fundamentally different
b) the same, so that the estimate of βML should be equal to the estimate of λML
c) the same, so that the estimate of βML should be equal to the sum of the estimates of δ, θL, and λML
d) the same, so that the sum of the estimates of βML, βMC, and βMR should be equal to the estimate of δ

In the preceding question, the base categories for specifications A and B are, respectively,
a) male on the right and female on the right
b) male on the right and female on the left
c) female on the right and female on the right
d) female on the right and male on the right

Analysis of variance is designed to
a) estimate the influence of different categories on a dependent variable
b) test whether a particular category has a nonzero influence on a dependent variable
c) test whether the intercepts for all categories in an OLS regression are the same
d) none of these

Suppose you have estimated wage = 5 + 3*education + 2*gender, where gender is one for male and zero for female. If gender had been one for female and zero for male, this result would have been
a) unchanged
b) wage = 5 + 3*education - 2*gender
c) wage = 7 + 3*education + 2*gender
d) wage = 7 + 3*education - 2*gender

Suppose we have estimated y = 10 + 1.5x + 4D, where y is earnings, x is experience, and D is zero for females and one for males. If we had coded the dummy as minus
one for females and one for males, the results (10, 1.5, 4) would have been
a) 14, 1.5, -4
b) 18, 1.5, -4
c) 12, 1.5, 2
d) 12, 1.5, -2

Suppose we have estimated y = 10 + 2x + 3D, where y is earnings, x is experience, and D is zero for females and one for males. If we had coded the dummy as one for females and two for males, the results (10, 2, 3) would have been
a) 10, 2, 3
b) 10, 2, 1.5
c) 7, 2, 3
d) 7, 2, 1.5

The following relates to the next three questions. In a study investigating the effect of a new computer instructional technology for economics principles, a researcher taught a control class in the normal way and an experimental class using the new technology. She regressed student final exam numerical grade (out of 100) on GPA, Male, Age, Tech (a dummy equaling unity for the experimental class), and interaction variables Tech*GPA, Tech*Male, and Tech*Age. Age and Tech*GPA had coefficients jointly insignificantly different from zero, so she dropped them and ended up with
grade = 45 + 9*GPA + 5*Male + 10*Tech - 6*Tech*Male - 0.2*Tech*Age
with all coefficients significant. She concludes that a) age makes no difference in the control group, but older students do not seem to benefit as much from the computer technology, and that b) the effect of GPA is the same regardless of what group a student is in.
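The estimated equation above can be evaluated directly to see how the technology effect differs by gender and age; the GPA of 3.0 and age of 20 below are made-up illustrative values:

```python
# The researcher's estimated equation from the question set.
def grade(gpa, male, tech, age):
    return 45 + 9 * gpa + 5 * male + 10 * tech - 6 * tech * male - 0.2 * tech * age

# Technology effect = grade with Tech=1 minus grade with Tech=0,
# holding GPA, gender, and age fixed (hypothetical GPA 3.0, age 20).
male_effect = grade(3.0, 1, 1, 20) - grade(3.0, 1, 0, 20)
female_effect = grade(3.0, 0, 1, 20) - grade(3.0, 0, 0, 20)
print(male_effect, female_effect)  # 0.0 6.0 at age 20
```

Note that the technology effect (10 - 6*Male - 0.2*Age) shrinks with age for both genders, which is what the researcher's first conclusion is about.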
These empirical results suggest that
a) both conclusions are warranted
b) neither conclusion is warranted
c) only the first conclusion is warranted
d) only the second conclusion is warranted

These point estimates suggest that in the control class
a) males and females perform equally
b) females outperform males
c) males outperform females
d) we can only assess relative performance in the new technology group

These point estimates measure the impact of the new technology on male and female scores, respectively, to be
a) 5 and zero
b) 4 and 10
c) -1 and 10
d) 9 and 10

The MLE is popular because it
a) maximizes R-square
b) minimizes the sum of squared errors
c) has desirable sampling distribution properties
d) maximizes both the likelihood and log-likelihood functions

To find the MLE we maximize the
a) likelihood
b) log likelihood
c) probability of having obtained our sample
d) all of these

In a logit regression, to report the influence of an explanatory variable x on the probability of observing a one for the dependent variable we report
a) the slope coefficient estimate for x
b) the average of the slope coefficient estimates for x of all the observations in the sample
c) the slope coefficient estimate for x for the average observation in the sample
d) none of these

The logit functional form
a) is linear in the logarithms of the variables
b) has either zero or one on the left-hand side
c) forces the left-hand variable to lie between zero and one
d) none of these

The logit model is employed when
a) all the regressors are dummy variables
b) the dependent variable is a dummy variable
c) we need a flexible functional form
d) none of these

In the logit model the predicted value of the dependent variable is interpreted as
a) the probability that the dependent variable is one
b) the probability that the dependent variable is zero
c) the fraction of the observations in the sample that are ones
d) the fraction of the observations in the sample that are zeroes

To find the maximum likelihood estimates the computer searches over all possible values of the
a) dependent variable
b) independent variables
c) coefficients
d) all of the above

The MLE is popular because
a) it maximizes R-square and so creates the best fit to the data
b) it is unbiased
c) it is easily calculated with the help of a computer
d) none of these

In large samples the MLE is
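Several of the logit questions above turn on the logistic functional form, P(y=1) = exp(index) / (1 + exp(index)). A minimal sketch showing that it keeps the predicted value strictly between zero and one:

```python
import math

# The logit transform: maps any index value into the open interval (0, 1).
def logit_prob(index):
    return math.exp(index) / (1 + math.exp(index))

for idx in (-3.0, 0.0, 3.0):
    p = logit_prob(idx)
    print(round(p, 3))        # 0.047, 0.5, 0.953
    assert 0 < p < 1          # the functional form bounds the prediction
```

This is why the predicted value in a logit model is read as the probability that the dependent variable is one.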
a) unbiased
b) efficient
c) normally distributed
d) all of these

To predict the value of a dependent dummy variable for a new observation we should predict it as a one if
a) the estimated probability of this observation's dependent variable being a one is greater than fifty percent
b) more than half of the observations are ones
c) the expected payoff of doing so is greater than the expected payoff of predicting it as a zero
d) none of these

Which of the following is the best way to measure the prediction success of a logit specification?
a) the percentage of correct predictions across all the data
b) the average of the percent correct predictions in each category
c) a weighted average of the percent correct predictions in each category, where the weights are the fractions of the observations in each category
d) the sum across all the observations of the net benefits from each observation's prediction

A negative coefficient on an explanatory variable x in a logit specification means that an increase in x will, ceteris paribus,
a) increase the probability that an observation's dependent variable is a one
b) decrease the probability that an observation's dependent variable is a one
c) the direction of change of the probability that an observation is a one cannot be determined unequivocally from the sign of this slope coefficient

You have estimated a logit model and found for a new individual that the estimated probability of her being a one (as opposed to a zero) is 40%. The benefit of correctly classifying this person is $1,000, regardless of whether she is a one or a zero. The cost of classifying this person as a one when she is actually a zero is $500. You should classify this person as a one when the other misclassification cost exceeds what value?
a) $750
b) $1,000
c) $1,250
d) $1,500

You have estimated a logit model and found for a new individual that the estimated probability of her being a one (as opposed to a zero) is 40%.
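The $1,000/$500 classification question above follows from comparing expected payoffs under each classification; a sketch of that arithmetic, where C stands for the misclassification cost being solved for:

```python
# Setup from the question: p = P(she is a one) = 0.4, a correct
# classification pays $1,000 either way, calling her a one when she is a
# zero costs $500, and C is the cost of calling her a zero when she is a one.
p, benefit, cost_false_one = 0.4, 1000, 500

def payoff_classify_one():
    # Right with probability p, wrong (she was a zero) with probability 1-p.
    return p * benefit - (1 - p) * cost_false_one

def payoff_classify_zero(cost_false_zero):
    # Right with probability 1-p, wrong (she was a one) with probability p.
    return (1 - p) * benefit - p * cost_false_zero

# Classify as a one once the expected payoff of doing so is higher:
for C in (1000, 1250, 1500):
    print(C, payoff_classify_one() > payoff_classify_zero(C))
```

Setting the two expected payoffs equal and solving for C gives the threshold the question asks for; the same template handles the $2,000/$1,600 indifference question below.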
The benefit of correctly classifying this person is $2,000, regardless of whether she is a one or a zero. The cost of classifying this person as a zero when she is actually a one is $1,600. You should be indifferent to classifying this person as a one or a zero when the other misclassification cost equals what value?
a) $100
b) $200
c) $300
d) $400

You have estimated a logit model to determine the probability that an individual is earning more than ten dollars an hour, with observations earning more than ten dollars an hour coded as ones; your estimated logit index function is
-22 + 2*Ed - 6*Female + 4*Exp
where Ed is years of education, Female is a dummy with value one for females, and Exp is years of experience. You have been asked to classify a new observation with 10 years of education and 2 years of experience. You should classify her as
a) a one
b) a zero
c) too close to call
d) not enough information to make a classification

In the preceding question, suppose you believe that the influence of experience depends on gender. To incorporate this into your logit estimation procedure you should
a) add an interaction variable defined as the product of Ed and Female
b) estimate using only the female observations and again using only the male observations
c) add a new explanatory variable coded as zero for the male observations and whatever is the value of the experience variable for the female observations
d) none of the above

From estimating a logit model you have produced a slope estimate of 0.3 on the explanatory variable x. This means that a unit increase in x will cause
a) an increase in the probability of being a y=1 observation of 0.3
b) an increase in the probability of being a y=0 observation of 0.3
c) an increase in the ratio of these two probabilities of 0.3
d) none of the above

You have obtained the following regression results using data on law students from the class of 1980 at your university:
Income = 11 + .24*GPA - .15*Female + .14*Married - .02*Married*Female
where the variables are self-explanatory. Consider married individuals with equal GPAs. Your results suggest that compared to female income, male income is higher by
a) 0.01
b) 0.02
c) 0.15
d) 0.17

Suppose you have run the following regression:
y = α + βx + δ*Urban + θ*Immigrant + λ*Urban*Immigrant + ε
where Urban is a dummy indicating that an individual lives in a city rather than in a rural area, and Immigrant is a dummy indicating that an individual is an immigrant rather than a native. The following three questions refer to this information.
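The classification question above is settled by evaluating the estimated index at the new observation's values (she is female, so Female = 1):

```python
import math

# Plug the new observation into the estimated logit index
# -22 + 2*Ed - 6*Female + 4*Exp, with Ed = 10, Female = 1, Exp = 2.
index = -22 + 2 * 10 - 6 * 1 + 4 * 2
prob = math.exp(index) / (1 + math.exp(index))
print(index, prob)  # 0 0.5: the predicted probability sits exactly at one half
```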
The coefficient on Urban is interpreted as the ceteris paribus difference in y between
a) an urban person and a rural person
b) an urban native and a rural native
c) an urban immigrant and a rural immigrant
d) none of these

The coefficient on Immigrant is interpreted as the ceteris paribus difference in y between
a) an immigrant and a native
b) a rural immigrant and a rural native
c) an urban immigrant and an urban native
d) none of these

The coefficient on Urban*Immigrant is interpreted as the ceteris paribus difference in y between an urban immigrant and
a) a rural native
b) a rural immigrant
c) an urban native
d) none of these

You have estimated a logit model to determine the success of an advertising program in a town, with successes coded as ones; your estimated logit index function is
-70 + 2*PerCap + 3*South
where PerCap is the per capita income in the town (measured in thousands of dollars), and South is a dummy with value one for towns in the south and zero for towns in the north, the only other region. If the advertising program is a success, you will make $5,000; if it is a failure you will lose $3,000. You are considering two towns, one in the south and one in the north, both with per capita incomes of $35,000. You should undertake the advertising program
a) in both towns
b) in neither town
c) in only the south town
d) in only the north town

Week 7: Hypothesis Testing

The square root of an F statistic is distributed as a t statistic. This statement is
a) true
b) true only under special conditions
c) false

To conduct a t test we need to
a) divide a parameter estimate by its standard error
b) estimate something that is supposed to be zero and see if it is zero
c) estimate something that is supposed to be zero and divide it by its standard error

If a null hypothesis is true, when we impose the restrictions of this null the minimized sum of squared errors
a) becomes smaller
b) does not change
c) becomes bigger
d) changes in an indeterminate fashion

If a null hypothesis is false, when we impose the restrictions of this null the minimized sum of squared errors
a) becomes smaller
b) does not change
c) becomes bigger
d) changes in an indeterminate fashion

Suppose you have 25 years of quarterly data and specify that demand for your product
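The advertising question above can be checked by converting each town's index into a probability and then an expected profit:

```python
import math

# Index function from the question: -70 + 2*PerCap + 3*South, with PerCap
# in thousands of dollars. Success pays $5,000; failure loses $3,000.
def expected_profit(percap, south):
    index = -70 + 2 * percap + 3 * south
    p = math.exp(index) / (1 + math.exp(index))   # P(success)
    return p * 5000 - (1 - p) * 3000

print(round(expected_profit(35, 1)))  # south town, PerCap = 35
print(round(expected_profit(35, 0)))  # north town, PerCap = 35
```

The decision rule is to advertise wherever the expected profit is positive, so comparing the two printed values against zero answers the question.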
is a linear function of price, income, and quarter of the year, where quarter of the year affects only the intercept. You wish to test the null that ceteris paribus demand is the same in spring, summer, and fall, against the alternative that demand is different in all quarters. The degrees of freedom for your F test are
a) 2 and 19
b) 2 and 94
c) 3 and 19
d) 3 and 94

In the preceding question, suppose you wish to test the hypothesis that the entire relationship (i.e., the two slopes and the intercept) is the same for all quarters, versus the alternative that the relationship is completely different in all quarters. The degrees of freedom for your F test are
a) 3 and 94
b) 6 and 88
c) 9 and 82
d) none of these

In the preceding question, suppose you are certain that the intercepts are different across the quarters, and wish to test the hypothesis that both slopes are unchanged across the quarters, against the alternative that the slopes are different in each quarter. The degrees of freedom for your F test are
a) 3 and 94
b) 6 and 88
c) 9 and 82
d) none of these

As the sample size becomes very large, the t distribution
a) collapses to a spike because its variance becomes very small
b) collapses to a normally-distributed spike
c) approximates more and more closely a normal distribution with mean one
d) approximates more and more closely a standard normal distribution

Suppose we are using 35 observations to regress wage on an intercept, education, experience, gender, and dummies for black and hispanic (the base being white). In addition we are allowing the slope on education to be different for the three race categories.
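The first degrees-of-freedom question above reduces to counting observations, estimated coefficients, and restrictions:

```python
# 25 years of quarterly data, quarters shifting the intercept only.
n = 25 * 4            # 100 observations
k = 1 + 2 + 3         # intercept, price and income slopes, 3 quarter dummies
restrictions = 2      # "spring = summer = fall" is two equality restrictions

# Numerator df = number of restrictions; denominator df = n - k.
print(restrictions, n - k)  # 2 94
```

The two follow-up questions change only these counts: more restrictions in the numerator, and more estimated coefficients (hence fewer denominator df) when slopes are also allowed to differ by quarter.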
When using a t test to test for discrimination against females, the degrees of freedom is
a) 26 b) 27 c) 28 d) 29

After running a regression, to find the covariance between the first and second slope coefficient estimates we
calculate the square root of the product of their variances
look at the first off-diagonal element of the correlation matrix
look at the first diagonal element of the variance-covariance matrix
none of these

Suppose you have used Eviews to regress output on capital, labor, and a time trend by clicking on these variables in the order above, or, equivalently, using the command ls y cap lab time c. To test for constant returns to scale using the Wald – Coefficient Restrictions button you need to provide the software with the following information
a) cap+lab=1 b) c(1)+c(2)=1 c) c(2)+c(3)=1 d) none of these

When testing a joint null, an F test is used instead of several separate t tests because
the t tests may not agree with each other
the F test is easier to calculate
the collective results of the t tests could mislead
the t tests are impossible to calculate in this case

The rationale behind the F test is that if the null hypothesis is true, by imposing the null hypothesis restrictions on the OLS estimation the per-restriction sum of squared errors
falls by a significant amount
rises by a significant amount
falls by an insignificant amount
rises by an insignificant amount

Suppose we are regressing wage on an intercept, education, experience, gender, and dummies for black and hispanic (the base being white). To find the restricted SSE to calculate an F test of the null hypothesis that the black and hispanic coefficients are equal, we should regress wage on an intercept, education, experience, gender, and a new variable constructed as the
sum of the black and hispanic dummies
difference between the black and hispanic dummies
product of the black and hispanic dummies
none of these

In the preceding question, if the null hypothesis is true then, compared to the unrestricted SSE, the restricted SSE should be
a) smaller b) the same c) larger d) unpredictable

In question 14, if we regress wage on an intercept, education, experience, gender, and a dummy for white, compared to the restricted SSE in that question, the resulting sum of squared errors should be
a) smaller b) the same c) larger d) unpredictable

Suppose you have specified the demand for beer (measured in liters) as lnBeer = β0 + β1lnBeerprice + β2lnOthergoodsprice + β3lnIncome + ε, where the notation should be obvious. Economists will tell you that in theory this relationship should be homogeneous of degree zero, meaning that if income and prices all increase by the same percent, demand should not change.
Testing homogeneity of degree zero means testing the null that
a) β1 = β2 = β3 = 0 b) β1 + β2 + β3 = 0 c) β1 + β2 + β3 = 1 d) none of these

Suppose you have run a logit regression in which defaulting on a credit card payment is related to people’s income, gender, education, and age, with the coefficients on income and age, but not education, allowed to be different for males versus females. The next 4 questions relate to this information.

The degrees of freedom for the LR test of the null hypothesis that gender does not matter is
a) 1 b) 2 c) 3 d) 4

To calculate the LR test statistic for this null we need to compute twice the difference between the
restricted and unrestricted maximized likelihoods
restricted and unrestricted maximized loglikelihoods
unrestricted and restricted maximized likelihoods
unrestricted and restricted maximized loglikelihoods

Suppose the null that the slopes on income and age are the same for males and females is true. Then compared to the unrestricted maximized likelihood, the restricted maximized likelihood should be
a) smaller b) the same c) bigger d) unpredictable

The coefficient on income can be interpreted as ceteris paribus the change in the ____ resulting from a unit increase in income.
probability of defaulting
odds ratio of defaulting versus not defaulting
log odds ratio of defaulting versus not defaulting
none of these

Week 9: Specification

Specification refers to choice of
test statistic
estimating procedure
functional form and explanatory variables
none of these

Omitting a relevant explanatory variable when running a regression
never creates bias
sometimes creates bias
always creates bias

Omitting a relevant explanatory variable when running a regression usually
increases the variance of coefficient estimates
decreases the variance of coefficient estimates
does not affect the variance of coefficient estimates
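The LR statistic described above can be sketched numerically. A minimal sketch, assuming hypothetical maximized loglikelihood values (not from any real data set); the restricted model drops the gender dummy and the two gender interaction terms, i.e. 3 restrictions, and 7.815 is the 5% chi-square critical value for 3 degrees of freedom:

```python
# Hypothetical maximized loglikelihoods for illustration only:
llf_unrestricted = -412.6   # full logit, gender effects included
llf_restricted = -418.9     # logit re-estimated with gender removed

# LR statistic: twice the difference between the unrestricted and
# restricted maximized loglikelihoods
lr = 2 * (llf_unrestricted - llf_restricted)

# 5% critical value of a chi-square with 3 degrees of freedom
chi2_crit_3df_5pct = 7.815

reject_null = lr > chi2_crit_3df_5pct
print(round(lr, 1), reject_null)   # LR of 12.6 exceeds 7.815, so reject
```

Note that the statistic uses loglikelihoods, not likelihoods, matching the second question in the block above.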
Suppose that y = α + βx + γw + ε but that you have ignored w and regressed y on only x. If x and w are negatively correlated in your data, the OLS estimate of β will be biased downward if
β is positive
β is negative
γ is positive
γ is negative

Suppose that y = α + βx + γw + ε but that you have ignored w and regressed y on only x. The OLS estimate of β will be unbiased if x and w are
collinear
orthogonal
positively correlated
negatively correlated

Omitting an explanatory variable from a regression in which you know it belongs could be a legitimate decision if doing so
increases R-square
decreases the SSE
decreases MSE
decreases variance

In general, omitting a relevant explanatory variable creates
bias and increases variance
bias and decreases variance
no bias and increases variance
no bias and decreases variance

Suppose you know for sure that a variable does not belong in a regression as an explanatory variable. If someone includes this variable in their regression, in general this will create
bias and increase variance
bias and decrease variance
no bias and increase variance
no bias and decrease variance

Adding an irrelevant explanatory variable which is orthogonal to the other explanatory variables causes
bias and no change in variance
bias and an increase in variance
no bias and no change in variance
no bias and an increase in variance
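A quick Monte Carlo sketch (a hypothetical setup, not from the text) illustrates the omitted-variable bias direction discussed above: with γ positive and x, w negatively correlated, the short regression's slope estimate falls below the true β.

```python
import random

random.seed(42)
true_alpha, true_beta, true_gamma = 1.0, 2.0, 3.0
n, reps = 200, 500
slope_estimates = []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    # w is negatively correlated with x (assumed design for illustration)
    w = [-0.8 * xi + random.gauss(0, 0.5) for xi in x]
    y = [true_alpha + true_beta * xi + true_gamma * wi + random.gauss(0, 1)
         for xi, wi in zip(x, w)]
    # short regression of y on x alone, with w wrongly omitted
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope_estimates.append(sxy / sxx)

avg_slope = sum(slope_estimates) / reps
# gamma > 0 and cov(x, w) < 0, so the estimate is biased downward:
# the short-regression slope centers near beta + gamma*(-0.8) = -0.4,
# far below the true beta of 2
print(round(avg_slope, 2))
```

The bias equals γ times the auxiliary-regression coefficient of w on x, which is why orthogonality of x and w (that coefficient being zero) removes it.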
A good thing about data mining is that it
avoids bias
decreases MSE
increases R-square
may uncover an empirical regularity which causes you to improve your specification

A bad thing about data mining is that it is likely to
create bias
capitalize on chance
both of the above
none of the above

The bad effects of data mining can be minimized by
keeping variables in your specification that common sense tells you definitely belong
setting aside some data to be used to check the specification
performing a sensitivity analysis
all of the above

A sensitivity analysis is conducted by varying the specification to see what happens to
bias
MSE
R-square
the coefficient estimates

The RESET test is used mainly to check for
collinearity
orthogonality
functional form
capitalization on chance

To perform the RESET test we rerun the regression adding as regressors the squares and cubes of the
dependent variable
suspect explanatory variable
forecasts of the dependent variable
none of these

The RESET test is
a) a z test b) a t test c) a chi-square test d) an F test

Regressing y on x using a distributed lag model specifies that y is determined by
the lagged value of y
the lagged value of x
several lagged values of x
several lagged values of x, with the coefficients on the lagged x’s decreasing as the lag becomes longer

Selecting the lag length in a distributed lag model is usually done by
minimizing the MSE
maximizing R-square
maximizing the t values
minimizing an information criterion

A major problem with distributed lag models is that
R-square is low
coefficient estimates are biased
variances of coefficient estimates are large
the lag length is impossible to determine

The rationale behind the Koyck distributed lag is that it
eliminates bias
increases the fit of the equation
exploits an information criterion
incorporates more information into estimation

In the Koyck distributed lag model, as the lag lengthens the coefficients on the lagged explanatory variable
a) increase and then decrease b) decrease forever c) decrease for a while and then become zero d) none of these

Using the lagged value of the dependent variable as an explanatory variable is often done to
avoid bias
reduce MSE
improve the fit of a specification
facilitate estimation of some complicated models

Week 10: Multicollinearity; Applied Econometrics

Multicollinearity occurs whenever
the dependent variable is highly correlated with the independent variables
the independent variables are highly orthogonal
there is a close linear relationship among the independent variables
there is a close nonlinear relationship among the independent variables

High collinearity is not a problem if
no bias is created
R-square is high
the variance of the error term is small
none of these

The multicollinearity problem is very similar to the problems caused by
nonlinearities
omitted explanatory variables
a small sample size
orthogonality

Multicollinearity causes
low R-squares
biased coefficient estimates
biased coefficient variance estimates
none of these

A symptom of multicollinearity is
estimates don’t change much when a regressor is omitted
t values on important variables are quite big
the variance-covariance matrix contains small numbers
none of these

Suppose your specification is y = βx + δ1Male + δ2Female + γ1Weekday + γ2Weekend + ε
there is no problem with this specification because the intercept has been omitted
there is high collinearity but not perfect collinearity
there is perfect collinearity
there is orthogonality

Suppose you regress y on x and the square of x.
Estimates will be biased with large variances
It doesn’t make sense to use the square of x as a regressor
The regression will not run because these two regressors are perfectly correlated
There should be no problem with this.

A friend has told you that his multiple regression has a high R-square but all the estimates of the regression slopes are insignificantly different from zero on the basis of t tests of significance. This has probably happened because the
intercept has been omitted
explanatory variables are highly collinear
explanatory variables are highly orthogonal
dependent variable doesn’t vary by much

Dropping a variable can be a solution to a multicollinearity problem because it
avoids bias
increases t values
eliminates the collinearity
could decrease mean square error

The main way of dealing with a multicollinearity problem is to
drop one of the offending regressors
increase the sample size
incorporate additional information
transform the regressors

A result of multicollinearity is that
coefficient estimates are biased
t statistics are too small
the variance of the error is overestimated
variances of coefficient estimates are large

A result of multicollinearity is that
OLS is no longer the BLUE
variances of coefficient estimates are overestimated
R-square is misleadingly small
estimates are sensitive to small changes in the data

Suppose you are estimating y = α + βx + γz + δw + ε, for which the CLR assumptions hold and x, z, and w are not orthogonal to one another. You estimate incorporating the information that β = γ. To do this you will regress
y on an intercept, 2x, and w
y on an intercept, (x+z), and w
y-x on an intercept, z, and w
none of these

In the preceding question, suppose that in fact β is not equal to γ. Then in general, compared to regressing without this extra information, your estimate of δ
is unaffected
is still unbiased
has a smaller variance
nothing can be said about what will happen to this estimate

Economic theory tells us that when estimating the real demand for exports we should use the ____ exchange rate, and when estimating the real demand for money we should use the ____ interest rate. The blanks should be filled with
real; real
real; nominal
nominal; real
nominal; nominal

You have run a regression of the change in inflation on unemployment. Economic theory tells us that our estimate of the natural rate of unemployment is
the intercept estimate
the slope estimate
minus the intercept estimate divided by the slope estimate
minus the slope estimate divided by the intercept estimate

You have thirty observations from a major golf tournament in which the percentage of putts made was recorded for distances ranging from one foot to thirty feet, in increments of one foot (i.e., you have 30 observations). You propose estimating success as a function of distance. What functional form should you use?
linear
logistic
quadratic
exponential

Starting with a comprehensive model and testing down to find the best specification has the advantage that
complicated models are inherently better
testing down is guaranteed to find the best specification
testing should be unbiased
pretest bias is eliminated

Before estimating your chosen specification you should
data mine
check for multicollinearity
look at the data
test for zero coefficients

The interocular trauma test is
a) a t test b) an F test c) a chi-square test d) none of the above

When the sample size is quite large, a researcher needs to pay special attention to
coefficient magnitudes
t statistic magnitudes
statistical significance
type I errors
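The natural-rate question above is a small piece of arithmetic: in a Phillips-curve regression of the change in inflation on unemployment, Δπ = a + b·u, the natural rate is the unemployment rate at which inflation is stable (Δπ = 0), i.e. minus the intercept divided by the slope. A sketch with made-up estimates (a_hat and b_hat are hypothetical, not from the text):

```python
# Hypothetical estimates from regressing the change in inflation on
# unemployment: delta_pi = a + b * u
a_hat = 3.0    # intercept estimate (assumed for illustration)
b_hat = -0.5   # slope estimate (assumed for illustration)

# Inflation is stable when delta_pi = 0, so the natural rate is minus the
# intercept estimate divided by the slope estimate:
natural_rate = -a_hat / b_hat
print(natural_rate)  # 6.0
```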
Your only measure of a key economic variable is unsatisfactory but you use it anyway. This is an example of
knowing the context
asking the right questions
compromising
a sensitivity analysis

“Asking the right question” means
selecting the appropriate null hypothesis
looking for a lost item where you lost it instead of where the light is better
resisting the temptation to change a problem so that it has a mathematically elegant solution
all of the above

A sensitivity analysis involves
avoiding type I errors
checking for multicollinearity
omitting variables with low t values
examining the impact of specification changes

When testing if a coefficient is zero it is traditional to use a type I error rate of 5%. When testing if a variable should remain in a specification we should
continue to use a type I error rate of 5%
use a smaller type I error rate
use a larger type I error rate
forget about the type I error rate and instead choose a type II error rate

An example of knowing the context is knowing that
some months have five Sundays
only children from poor families are eligible for school lunch programs
many auctions require a reserve price to be exceeded before an item is sold
all of the above

A type III error occurs when
you make a type I and a type II error simultaneously
type I and type II errors are confused
the right answer is provided to the wrong question
the wrong functional form has been used

The adage that begins with “Graphs force you to notice ….” is completed with
outliers
incorrect functional forms
what you never expected to see
the real relationships among data

In econometrics, KISS stands for
a) keeping it safely sane b) keep it simple, stupid c) keep it sensibly simple d) keep inference sophisticatedly simple

An advantage of simple models is that they
do not place unrealistic demands on the data
are less likely to lead to serious mistakes
facilitate subjective insights
all of the above

An example of the laugh test is that
your coefficient estimates are of unreasonable magnitude
your functional form is very unusual
your coefficient estimates are all negative
some of your t values are negative

Hunting statistical significance with a shotgun means
avoiding multicollinearity by transforming data
throwing every explanatory variable you can think of into your specification
using F tests rather than t tests
using several different type I error rates

“Capitalizing on chance” means that by luck
you have found the correct specification
you have found a specification that explains the peculiarities of your data set
you have found the best way of incorporating capital into the production function
you have done the opposite of data mining

The adage that begins with “All models are wrong, ….” is completed with
especially those with low R-squares
but some are useful
so it is impossible to find a correct specification
but that should not concern us

Those claiming that statistical significance is being misused are referring to the problem that
there may be a type I error
there may be a type II error
the coefficient magnitude may not be of consequence
there may be too much multicollinearity

Those worried that researchers are “using statistical significance to sanctify a result”
suggest that statistical analysis be supplemented by
looking for corroborating evidence
looking for disconfirming evidence
assessing the magnitude of coefficients
all of the above

To deal with results tainted by subjective specification decisions undertaken during the heat of econometric battle it is suggested that researchers
eliminate multicollinearity
report a sensitivity analysis
use F tests instead of t tests
use larger type I error rates

You have regressed yt on xt and xt-1, obtaining a positive coefficient estimate on xt, as expected, but a negative coefficient estimate on lagged x. This
indicates that something is wrong with the regression
implies that the short-run effect of x is smaller than its long-run effect
implies that the short-run effect of x is larger than its long-run effect
is due to high collinearity

Outliers should
be deleted from the data
be set equal to the sample average
prompt an investigation into their legitimacy
be neutralized somehow

Influential observations
can be responsible for a wrong sign
is another name for outliers
require use of an unusual specification
all of the above

Suppose you are estimating the returns to education and so regress wage on years of education and some other explanatory variables. One problem with this is that people with higher general ability levels, for which you have no measure, tend to opt for more years of education, creating bias in your estimation. This bias is referred to as
multicollinearity bias
pretest bias
self-selection bias
omitted variable bias

A wrong sign could result from
a theoretical oversight
an interpretation error
a data problem
all of the above

Week 11: Autocorrelated errors; heteroskedasticity

If errors are nonspherical it means that they are
autocorrelated
heteroskedastic
autocorrelated or heteroskedastic
autocorrelated or heteroskedastic, or both

The most important consequence of nonspherical errors is that
coefficient estimates are biased
inference is biased
OLS is no longer BLUE
none of these

Upon discovering via a test that you have nonspherical errors you should
use generalized least squares
find the appropriate transformation of the variables
double-check your specification
use an autocorrelation- or heteroskedasticity-consistent variance-covariance matrix estimate

GLS can be performed by running OLS on variables transformed so that the error term in the transformed relationship is
homoskedastic
spherical
serially uncorrelated
eliminated

Second-order autocorrelated errors means that the current error εt is a linear function of
a) εt-1 b) εt-1 squared c) εt-2 d) εt-1 and εt-2

Suppose you have an autocorrelated error with rho equal to 0.4. You should transform each variable xt to become
a) .4xt b) .6xt c) xt - .4xt-1 d) .6xt - .4xt-1

Pushing the autocorrelation- or heteroskedasticity-consistent variance-covariance matrix button in econometrics software when running OLS causes
the GLS estimation procedure to be used
the usual OLS coefficient estimates to be produced, but with corrected estimated variances of these coefficient estimates
new OLS coefficient estimates to be produced, along with corrected estimated variances of these coefficient estimates
the observations automatically to be weighted to remove the bias in the coefficient estimates

A “too-big” t statistic could come about because of
a very large sample size
multicollinearity
upward bias in our variance estimates
downward bias in our variance estimates

A “too-big” t statistic could come about because of
a) multicollinearity b) a small sample size c) orthogonality d) none of these

The DW test
is called the Durbin-Watson test
should be close to 2.0 when the null is true
is defective whenever the lagged value of the dependent variable appears as a regressor
all of the above

The Breusch-Godfrey test
is used to test the null of no autocorrelation
is valid even when the lagged value of the dependent variable appears as a regressor
is a chi-square test
all of the above

To use the Breusch-Godfrey statistic to test the null of no autocorrelation against the alternative of second-order autocorrelated errors, we need to regress the OLS residuals on ____ and use ____ degrees of freedom for our test statistic. The blanks are best filled with
two lags of the OLS residuals; 2
the original explanatory variables and one lag of the OLS residuals; 1
the original explanatory variables and two lags of the OLS residuals; 2
the original explanatory variables, their lags, and one lag of the OLS residuals; 1

With heteroskedasticity we should use weighted least squares where
by doing so we maximize R-square
we use bigger weights on those observations with error terms that have bigger variances
we use bigger weights on those observations with error terms that have smaller variances
the weights are bigger whenever the coefficient estimates are more reliable

Suppose you are estimating y = α + βx + γz + ε but that the variance of ε is proportional to the square of x. Then to find the GLS estimate we should regress
y on an intercept, 1/x, and z/x
y/x on 1/x and z/x
y/x on an intercept, 1/x, and z/x
not possible because we don’t know the factor of proportionality
none of these

Pushing the heteroskedasticity-consistent variance-covariance matrix button in econometric software
removes the coefficient estimate bias from using OLS
does not change the OLS coefficient estimates
increases the t values
none of these

Suppose your dependent variable is aggregate household demand for electricity for various cities. To correct for heteroskedasticity you should
multiply observations by the city size
divide observations by the city size
multiply observations by the square root of the city size
divide observations by the square root of the city size
none of these

Suppose your dependent variable is crime rates for various cities. To correct for heteroskedasticity you should
multiply observations by the city size
divide observations by the city size
multiply observations by the square root of the city size
divide observations by the square root of the city size
none of these

When using the eyeball test for heteroskedasticity, under the null we would expect the relationship between the squared residuals and the explanatory variable to be such that
as the explanatory variable gets bigger the squared residual gets bigger
as the explanatory variable gets bigger the squared residual gets smaller
when the explanatory variable is quite small or quite large the squared residual will be large relative to its value otherwise
there is no evident relationship

Suppose you are estimating the relationship y = α + βx + γz + ε but you suspect that the 50 male observations have a different error variance than the 40 female observations. The degrees of freedom for the Goldfeld-Quandt test are
a) 50 and 40 b) 49 and 39 c) 48 and 38 d) 47 and 37

In the previous question, suppose you had chosen to use the studentized BP test. The degrees of freedom would then have been
a) 1 b) 2 c) 3 d) 4

In the previous question, to conduct the studentized BP test you would have regressed the squared residuals on an intercept and
a) x b) z c) x and z d) a dummy for gender

Suppose you are estimating demand for electricity using aggregated data on household income and on electricity demand across 30 cities of differing sizes Ni. Your specification is that household demand is a linear function of household income and city price. To estimate using GLS you should regress
per capita demand on an intercept, price, and per capita income
aggregate demand on an intercept, price, and aggregate income
per capita demand on the inverse of Ni, price divided by Ni, and per capita income
none of these

Suppose you are estimating student performance on an economics exam, regressing exam score on an intercept, GPA, and a dummy MALE. The CLR model assumptions apply except that you have determined that the error variance for the male observations is eight but for females it is only two. To estimate using GLS you should transform by
dividing the male observations by 8 and the female observations by 2
multiplying the male observations by 2 and the female observations by 8
dividing the male observations by 2
multiplying the female observations by 8

Suppose the CLR model applies except that the errors are nonspherical of known form so that you can calculate the GLS estimator. Then
the R-square calculated using the GLS estimates is smaller than the OLS R-square
the R-square calculated using the GLS estimates is equal to the OLS R-square
the R-square calculated using the GLS estimates is larger than the OLS R-square
nothing can be said about the relative magnitudes of R-square

Consider a case in which there is a nonspherical error of known form so that you can calculate the GLS estimator. You have conducted a Monte Carlo study to investigate the difference between OLS and GLS, using the computer to generate 2000 samples with nonspherical errors, from which you calculate the following:
2000 OLS estimates and their average, betaolsbar
2000 estimated variances of these OLS estimates and their average, betaolsvarbar
the estimated variance of the 2000 OLS estimates, varbetaols
2000 corresponding GLS estimates and their average, betaglsbar
2000 estimated variances of these GLS estimates and their average, betaglsvarbar
the estimated variance of the 2000 GLS estimates, varbetagls
The following six questions refer to this information.

You should find that betaolsbar and betaglsbar are
approximately equal, and varbetaols and varbetagls are also approximately equal
not approximately equal, and varbetaols and varbetagls are also not approximately equal
approximately equal, but varbetaols and varbetagls are not approximately equal
not approximately equal, but varbetaols and varbetagls are approximately equal

You should expect that
betaolsbar and betaglsbar are approximately equal
betaolsbar is bigger than betaglsbar
betaolsbar is smaller than betaglsbar
not possible to determine relative size here

You should expect that
varbetaols and varbetagls are approximately equal
varbetaols is bigger than varbetagls
varbetaols is smaller than varbetagls
not possible to determine relative size here

You should expect that varbetaols and betaolsvarbar are
approximately equal, and varbetagls and betaglsvarbar are also approximately equal
not approximately equal, but varbetagls and betaglsvarbar are approximately equal
approximately equal, but varbetagls and betaglsvarbar are not approximately equal
not approximately equal, and varbetagls and betaglsvarbar are also not approximately equal

You should expect that
varbetaols and betaolsvarbar are approximately equal
varbetaols is bigger than betaolsvarbar
varbetaols is smaller than betaolsvarbar
not possible to determine relative size here

You should expect that
varbetagls and betaglsvarbar are approximately equal
varbetagls is bigger than betaglsvarbar
varbetagls is smaller than betaglsvarbar
not possible to determine relative size here

Suppose the CLR model holds but the presence of nonspherical errors causes the variance estimates of the OLS estimator to be an underestimate. Because of this, when testing for the significance of a slope coefficient using for our large sample the critical t value 1.96, the type I error rate
is higher than 5%
is lower than 5%
remains fixed at 5%
not possible to tell what happens to the type I error rate
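A small Monte Carlo sketch of the last point, using entirely hypothetical numbers: when the null is true but errors are heteroskedastic and we use the usual homoskedastic OLS variance formula, the estimated standard errors are too small where it matters, so the true type I error rate exceeds the nominal 5%.

```python
import math
import random

random.seed(0)
n, reps = 50, 2000
x = [i / 10 for i in range(1, n + 1)]
rejections = 0
for _ in range(reps):
    # True model: y = 1 + 0*x + e, so the null "slope is zero" is TRUE.
    # Error standard deviation grows with x^2 (strong heteroskedasticity).
    y = [1 + random.gauss(0, 0.2 * xi * xi) for xi in x]
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    se_b = math.sqrt(s2 / sxx)   # usual homoskedastic formula -- too small here
    if abs(b / se_b) > 1.96:     # nominal 5% two-sided test
        rejections += 1

print(rejections / reps)  # noticeably above 0.05
```

Replacing the usual formula with a heteroskedasticity-consistent variance estimate would pull the rejection rate back toward the nominal 5%.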
Suppose you want to undertake a Monte Carlo study to examine the impact of heteroskedastic errors of the form V(ε) = 4 + 9x², where x is one of the explanatory variables in your specification. After getting the computer to draw errors from a standard normal, to create the desired heteroskedasticity you need to multiply the ith error by
a) 3xi b) 2 + 3xi c) 4 + 9xi² d) none of these

Suppose the CLR model assumptions apply to y = α + βx + γz + ε except that the variance of the error is proportional to x squared. To produce the GLS estimator you should regress y/x on
an intercept, 1/x, and z/x
an intercept and z/x
1/x and z/x
not possible to produce GLS because the factor of proportionality is not known

Suppose the CLR model assumptions apply to y = α + βx + γz + ε. You mistakenly think that the variance of the error is proportional to x squared and so transform the data appropriately and run OLS. If x and z are positively correlated in the data, then your estimate of β is
biased upward
biased downward
unbiased
not possible to determine the nature of the bias here

Suppose income is the dependent variable in a regression and contains errors of measurement (i) caused by people rounding their income to the nearest $100, or (ii) caused by people not knowing their exact income but always guessing within 5% of the true value. In case (i) there is
heteroskedasticity and the same for case (ii)
heteroskedasticity but not for case (ii)
no heteroskedasticity but heteroskedasticity for case (ii)
no heteroskedasticity and the same for case (ii)

Suppose you have regressed score on an economics exam on GPA for 50 individuals, ordered from smallest to largest GPA. The DW statistic is 1.5; you should conclude that
the errors are autocorrelated
there is heteroskedasticity
there is multicollinearity
there is a functional form misspecification

A regression using the specification y = α + βx + γz + ε produced SSE = 14 using annual data for 1961-1970, and SSE = 45 using data for 1971-1988. The Goldfeld-Quandt test statistic for a change in error variance beginning in 1971 is
a) 3.2 b) 1.8 c) 1.5 d) none of these

Week 12: Bayesian Statistics

The main difference between Bayesian and classical statisticians is
their choice of prior
their definitions of probability
their views of the type I error rate
the formulas for probability used in calculations

Suppose a classical statistician estimates via OLS an unknown parameter beta and, because the CLR model assumptions hold, declares the resulting estimate’s sampling distribution to be such that it is unbiased and has minimum variance among all linear unbiased estimators. For the Bayesian the sampling distribution
is also unbiased
is biased because of the prior
has a smaller variance
does not exist

Suppose the CNLR model applies and with a very large sample size the classical statistician produces an estimate betahat = 6, with variance 4. With the same data, using an ignorance prior, a Bayesian produces a normal posterior distribution with mean 6 and variance 4. The next ten questions refer to this information.

The sampling distribution of betahat
has mean 6
has mean beta
is graphed with beta on the horizontal axis
has the same interpretation as the posterior distribution

The posterior distribution of beta
has mean 6
has mean beta
is graphed with betahat on the horizontal axis
has the same interpretation as the sampling distribution

In this example the Bayesian estimate of beta would be the same as the classical estimate if the loss function were
a) all-or-nothing b) absolute c) quadratic d) all of the above

If the Bayesian had used an informative prior instead of an ignorance prior the posterior would have had
the same mean but a smaller variance
the same mean but a larger variance
a different mean and a smaller variance
a different mean and a larger variance

For the Bayesian, the probability that beta is greater than 7 is
a) 40% b) 31% c) 16% d) not a meaningful question

For the classical statistician, the probability that beta is greater than 7 is
a) 40% b) 31% c) 16% d) not a meaningful question

Suppose we want to test the null hypothesis that beta is equal to 4, against the alternative that beta is greater than 4. The classical statistician’s p value is approximately
a) .16 b) .31 c) .32 d) none of these

Suppose we want to test the null hypothesis that beta is less than or equal to 4, against the alternative that beta is greater than 4. The Bayesian statistician’s probability that the null is true is approximately
a) .16 b) .69 c) .84 d) none of these

The Bayesian would interpret the interval from 2.7 to 9.3 as
an interval which if calculated in repeated samples would cover the true value of beta 90% of the time
a range containing the true value of beta with 90% probability
an interval that the Bayesian would bet contains the true value of beta

Consider the interval from 2.7 to 9.3. For the Bayesian the probability that the true value of beta is not in this interval is
approximately equal to the probability that beta is less than 3.4
a lot greater than the probability that beta is less than 3.4
a lot less than the probability that beta is less than 3.4
not a meaningful question

Bayes theorem says that the posterior is
equal to the likelihood
proportional to the likelihood
equal to the prior times the likelihood
proportional to the prior times the likelihood

The subjective element in a Bayesian analysis comes about through use of
an ignorance prior
an informative prior
the likelihood
the posterior
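The posterior probabilities in this block are ordinary normal-distribution calculations on the N(6, 4) posterior (standard deviation 2). A sketch using only the Python standard library:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ N(mean, sd^2), via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

mean, sd = 6.0, 2.0   # posterior N(6, 4) from the questions above

# Bayesian P(beta > 7): z = (7 - 6)/2 = 0.5, roughly 31%
p_gt_7 = 1 - normal_cdf(7, mean, sd)

# Bayesian P(beta <= 4): z = (4 - 6)/2 = -1, roughly 16%
p_le_4 = normal_cdf(4, mean, sd)

print(round(p_gt_7, 3), round(p_le_4, 3))
```

The interval from 2.7 to 9.3 is 6 plus or minus 1.645 standard deviations, which is where the 90% probability statements in the interval questions come from.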
The Bayesian loss function tells us
the loss incurred by using a particular point estimate
the expected loss incurred by using a particular point estimate
the loss associated with a posterior distribution
the expected loss associated with a posterior distribution

The usual "Bayesian point estimate" is the mean of the posterior distribution. This assumes
a quadratic loss function
an absolute loss function
an all-or-nothing loss function
no particular loss function

The Bayesian point estimate is chosen by
minimizing the loss
minimizing expected loss
finding the mean of the posterior distribution
all of the above

From the Bayesian perspective a sensitivity analysis checks to see by how much the results change when a different
loss function is used
prior is used
posterior is used
data set is used

The main output from a Bayesian analysis is
the likelihood
the prior distribution
the posterior distribution
a point estimate

When hypothesis testing in a Bayesian framework the type I error
is fixed
is irrelevant
is set equal to the type II error
none of the above

The Bayesian accepts/rejects a null hypothesis based on
minimizing the type I error
minimizing the type II error
maximizing the benefit from this decision
maximizing the expected benefit from this decision
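As a sketch of why the loss function pins down the point estimate, the snippet below minimizes expected loss over a grid of candidate estimates for a simple discrete posterior (the one that appears in the final questions of this set): quadratic loss leads to the posterior mean, absolute loss to the posterior median, and all-or-nothing loss to the mode.

```python
# Discrete posterior from the final questions of this set:
# beta = 1, 2, 3 with probabilities 0.1, 0.3, 0.6.
posterior = {1: 0.1, 2: 0.3, 3: 0.6}

def expected_quadratic_loss(est):
    return sum(p * (b - est) ** 2 for b, p in posterior.items())

def expected_absolute_loss(est):
    return sum(p * abs(b - est) for b, p in posterior.items())

# Minimize expected loss over a fine grid of candidate point estimates.
grid = [i / 100 for i in range(100, 301)]
best_quad = min(grid, key=expected_quadratic_loss)   # posterior mean, 2.5
best_abs = min(grid, key=expected_absolute_loss)     # posterior median, 3
mode = max(posterior, key=posterior.get)             # all-or-nothing: mode, 3

post_mean = sum(b * p for b, p in posterior.items())
print(best_quad, post_mean, best_abs, mode)
```

The grid search is just a brute-force stand-in for the calculus argument; it makes visible that each loss function selects a different summary of the same posterior.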
Suppose you are a Bayesian and your posterior distribution for next month's unemployment rate is a normal distribution with mean 8.0 and variance 0.25. If this month's unemployment rate is 8.1 percent, what would you say is the probability that unemployment will increase from this month to next month? a) 50% b) 42% c) 5% d) 2.3%

If a Bayesian has a quadratic loss function, his/her preferred point estimate is
the mean of the posterior distribution
the median of the posterior distribution
the mode of the posterior distribution
cannot be determined unless the specific quadratic loss function is known

Suppose the net cost to a firm of undertaking a venture is $1800 if beta is less than or equal to one and its net profit is $Q if beta is greater than one. Your posterior distribution for beta is normal with mean 2.28 and variance unity. Any value of Q bigger than what number entices you to undertake this venture? a) 100 b) 200 c) 300 d) 450

A Bayesian has a client with a loss function equal to the absolute value of the difference between the true value of beta and the point estimate of beta. The posterior distribution is f(beta) = 2*beta for beta between zero and one, with f(beta) zero elsewhere. (This distribution has mean two-thirds and variance one-eighteenth.) Approximately what point estimate should be given to this client? a) 0.50 b) 0.66 c) 0.71 d) 0.75

A Bayesian has a client with a quadratic loss function. The posterior distribution is beta = 1, 2, and 3 with probabilities 0.1, 0.3 and 0.6, respectively. What point estimate should be given to this client? a) 1 b) 2 c) 3 d) none of these

A Bayesian has a client with an all-or-nothing loss function. The posterior distribution is beta = 1, 2, and 3 with probabilities 0.1, 0.3 and 0.6, respectively. What point estimate should be given to this client? a) 1 b) 2 c) 3 d) none of these

Answers

Week 1: Statistical Foundations I
1c, 2c, 3c, 4a, 5b, 6d, 7a, 8c, 9c, 10c, 11a, 12c, 13d, 14c, 15c, 16a, 17a, 18d, 19c, 20a, 21a, 22a, 23a, 24c, 25a, 26d, 27a, 28b, 29c, 30a, 31b, 32c, 33a, 34c, 35b, 36d, 37b, 38a, 39b, 40b, 41d, 42b, 43a, 44b, 45a, 46b, 47c, 48c, 49b, 50d, 51c, 52a

Week 2: Statistical Foundations II
1b, 2c, 3c, 4b, 5c, 6a, 7d, 8b, 9a, 10a, 11d, 12d, 13d, 14a, 15b, 16a, 17a, 18b, 19c, 20d, 21b, 22d, 23b, 24c, 25d, 26b, 27d, 28d, 29d, 30c, 31c, 32a, 33c, 34b, 35c, 36d, 37c, 38b, 39a

Week 3: What is Regression Analysis?
1a, 2c, 3b, 4d, 5c, 6d, 7a, 8d, 9a, 10d, 11b, 12b, 13b, 14d, 15b, 16c, 17c, 18b, 19a, 20a, 21d, 22d, 23d, 24c, 25d, 26b, 27a, 28d, 29c, 30b, 31a, 32b, 33c, 34d, 35b, 36b, 37a, 38c, 39c

Week 4: The CLR Model
1d, 2c, 3b, 4c, 5d, 6c, 7d, 8b, 9b, 10c, 11d, 12c, 13c, 14d, 15d, 16c, 17c, 18a, 19a, 20a, 21c, 22b, 23d, 24a, 25c

Week 5: Sampling Distributions
1d, 2d, 3c, 4d, 5d, 6c, 7d, 8c, 9b, 10c, 11a, 12d, 13c, 14d, 15d, 16d, 17d, 18d, 19d, 20c, 21c, 22a, 23a, 24b, 25b, 26c, 27b, 28c, 29b, 30b, 31b, 32c, 33d, 34b, 35d

Week 6: Dummy Variables
1d, 2c, 3b, 4a, 5a, 6a, 7c, 8b, 9b, 10b, 11d, 12d, 13c, 14c, 15c, 16d, 17c, 18c, 19a, 20c, 21b, 22c, 23d, 24d, 25c, 26b, 27a, 28c, 29d, 30d, 31c, 32d, 33b, 34c, 35d, 36d, 37c, 38d, 39d, 40b, 41b, 42d, 43a

Week 7: Hypothesis Testing
1b, 2c, 3c, 4c, 5b, 6d, 7b, 8d, 9b, 10d, 11b, 12c, 13d, 14a, 15c, 16b, 17b, 18c, 19d, 20a, 21c

Week 9: Specification
1c, 2b, 3b, 4c, 5b, 6c, 7b, 8c, 9c, 10d, 11b, 12d, 13d, 14c, 15d, 16d, 17c, 18d, 19c, 20d, 21b, 22d

Week 10: Multicollinearity; Applied Econometrics
1c, 2d, 3c, 4d, 5d, 6c, 7d, 8b, 9d, 10c, 11d, 12d, 13b, 14c, 15b, 16c, 17b, 18c, 19c, 20d, 21a, 22c, 23d, 24d, 25c, 26d, 27c, 28c, 29c, 30d, 31a, 32b, 33b, 34b, 35c, 36d, 37b, 38c, 39c, 40a, 41c, 42d

Week 11: Nonspherical Errors
1d, 2b, 3c, 4b, 5d, 6c, 7b, 8d, 9d, 10d, 11d, 12c, 13c, 14c, 15b, 16d, 17c, 18d, 19d, 20a, 21d, 22d, 23c, 24a, 25c, 26a, 27b, 28b, 29d, 30a, 31a, 32d, 33a, 34c, 35c, 36d, 37c

Week 12: Bayesian Statistics
1b, 2d, 3b, 4a, 5d, 6c, 7b, 8d, 9a, 10a, 11b, 12a, 13d, 14b, 15a, 16a, 17b, 18b, 19c, 20d, 21d, 22b, 23a, 24b, 25c, 26d, 27c
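As a numerical check on the Week 12 calculation questions (the unemployment forecast, the venture decision, and the absolute-loss estimate), the arithmetic can be verified with a short standard-library script; the helper name `phi` is ours, and the venture threshold of roughly 200 corresponds to rounding P(beta > 1) to 0.9.

```python
import math

def phi(z):
    # Standard normal CDF (helper name is ours).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Unemployment: posterior N(8.0, 0.25), so sd = 0.5;
# P(next month's rate exceeds this month's 8.1) is about 42%.
p_up = 1.0 - phi((8.1 - 8.0) / 0.5)

# Venture: posterior for beta is N(2.28, 1). Undertake when expected
# profit is positive: Q * P(beta > 1) - 1800 * P(beta <= 1) > 0.
p_win = 1.0 - phi((1.0 - 2.28) / 1.0)          # about 0.9
q_threshold = 1800.0 * (1.0 - p_win) / p_win   # about 200

# Absolute loss -> posterior median. With f(beta) = 2*beta on [0, 1],
# F(m) = m**2 = 0.5 gives m = 1/sqrt(2), about 0.71.
median = math.sqrt(0.5)

print(p_up, q_threshold, median)
```

These reproduce the keyed answers 22b (42%), 24b (about 200), and 25c (about 0.71).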