Econ Cross Section Multiple Regression Prof Jason .docx

Econ 5420: Cross Section:
Multiple Regression
Prof. Jason Blevins
Department of Economics
The Ohio State University
Simple Regression Notation
• Previously, we have been using the single-variable model
\[ Y_i = \beta_0 + \beta_1 X_i + u_i. \]
• The i subscript represents a single observation. For a dataset with n observations we have n equations:
\begin{align*}
Y_1 &= \beta_0 + \beta_1 X_1 + u_1, \\
Y_2 &= \beta_0 + \beta_1 X_2 + u_2, \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_n + u_n.
\end{align*}
• This model only allows for one independent variable Xi.
• To allow for multiple independent variables in our regression, we have to generalize the notation.
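Multiple Regression Notation
• Extending to K regressors, we add a k subscript.
• We write the model for K regressors and n observations as
\[ Y_i = \beta_0 + \underbrace{\beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_K X_{iK}}_{K \text{ regressors}} + u_i \]
• Writing out the entire system of n equations:
\begin{align*}
Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{12} + \cdots + \beta_K X_{1K} + u_1, \\
Y_2 &= \beta_0 + \beta_1 X_{21} + \beta_2 X_{22} + \cdots + \beta_K X_{2K} + u_2, \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_{n1} + \beta_2 X_{n2} + \cdots + \beta_K X_{nK} + u_n.
\end{align*}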
Interpreting Multiple Regression Coefficients
• Coefficients in multiple linear regression are called the multiple regression coefficients or partial regression coefficients.
• Previously, β1 in our single-variable linear regression model represented the change in the dependent variable associated with a one-unit increase in the independent variable.
• Holding other variables constant: Now, for some variable Xk, βk represents the change in the dependent variable Y associated with a one-unit increase in the k-th independent variable Xk, holding the other independent variables constant.
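To make the "holding other variables constant" interpretation concrete, the short derivation below is a sketch in the notation of the multiple regression model above. It compares two hypothetical observations, Y and Y', that are identical (including the error term) except that the k-th regressor is one unit higher in the second:

```latex
\begin{align*}
Y  &= \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \cdots + \beta_K X_K + u, \\
Y' &= \beta_0 + \beta_1 X_1 + \cdots + \beta_k (X_k + 1) + \cdots + \beta_K X_K + u, \\
Y' - Y &= \beta_k .
\end{align*}
```

Every other term cancels, which is why βk is read as a partial effect.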
Example: Financial Aid
Example (Studenmund 2010, p. 42)
• FINAIDi = amount of financial aid awarded (dollars per year)
• PARENTi = expected family contribution (dollars per year)
• HSRANKi = GPA rank in high school (percentage, 0–100, with 100 being the highest)
\[ FINAID_i = \beta_0 + \beta_1 PARENT_i + \beta_2 HSRANK_i + u_i \]
Example: Financial Aid (cont'd)
Example
\[ \widehat{FINAID}_i = 8927 - 0.36\, PARENT_i + 87.4\, HSRANK_i \]
• Here β̂1 = −0.36 means that, holding high school rank fixed, students whose parents can contribute an additional $1 will receive on average $0.36 less in financial aid.
• For an extra $1,000 in expected parental contributions, that's $360 less in financial aid.
• (−$360 = $1,000 × (−0.36))
• Because of linearity, this is the estimated effect for both high and low ranked students.
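As a quick check of this arithmetic, here is a minimal Python sketch (Python is used here for illustration; the course itself uses Stata). It evaluates the fitted financial aid equation using the unrounded coefficient estimates reported in the Stata output for this example. The function name `predicted_finaid` and the two students are hypothetical.

```python
# Coefficient estimates as reported in the Stata output for the finaid regression.
B0 = 8926.929          # intercept (_cons)
B_PARENT = -0.3567721  # coefficient on PARENT
B_HSRANK = 87.37815    # coefficient on HSRANK


def predicted_finaid(parent, hsrank):
    """Fitted financial aid (dollars per year) for given PARENT and HSRANK."""
    return B0 + B_PARENT * parent + B_HSRANK * hsrank


# Two hypothetical students with the same rank whose expected
# parental contributions differ by $1,000.
low = predicted_finaid(parent=10_000, hsrank=80)
high = predicted_finaid(parent=11_000, hsrank=80)
diff = high - low

# Change in predicted aid for an extra $1,000 of PARENT, rank held fixed.
print(round(diff, 2))
```

Because the model is linear, the same difference results for any fixed value of `hsrank`; the slides' figure of $360 uses the coefficient rounded to −0.36.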

Geometry of Multiple Regression
Figure 1: Geometric interpretation of multiple regression (FINAID plotted against PARENT and HSRANK, with slopes −0.36 and 87.4)

The Coefficient of Determination
• The most common measure of the overall fit of a regression is the coefficient of determination, denoted R².
• This measure summarizes the fit of a single regression independently.
• It is also useful to compare the fit of a collection of regressions with different combinations of included independent variables.
Decomposition of Variance
• To define R², we use the decomposition of variance.
• SST is the total variation in Yi, the total sum of squares:
\[ SST \equiv \sum_{i=1}^{n} (Y_i - \bar{Y})^2. \]
• SST can be factored into two components as SST = SSE + SSR:
1. SSE captures the explained deviations, those of the fitted values Ŷi from the mean Ȳ:
\[ SSE \equiv \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]
2. SSR captures the "unexplained" or residual deviations:
\[ SSR \equiv \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
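Decomposition of Variance
Mathematically:
\begin{align*}
SST &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \\
    &= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \\
    &= \sum_{i=1}^{n} \hat{u}_i^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \\
    &= SSR + SSE.
\end{align*}
In other words (more in line with our regression form),
\[ SST = SSE + SSR. \]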
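The decomposition can be checked numerically. The following Python sketch (an illustration, not part of the original slides) fits a single-regressor OLS line to a small hypothetical dataset and confirms that the total sum of squares equals the explained plus the residual sum of squares. The data values and variable names are assumptions made for the example.

```python
# Hypothetical data, used only for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 6.0]
n = len(Y)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Closed-form OLS estimates for one regressor plus an intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

Y_hat = [b0 + b1 * x for x in X]  # fitted values

SST = sum((y - y_bar) ** 2 for y in Y)               # total variation
SSE = sum((yh - y_bar) ** 2 for yh in Y_hat)         # explained variation
SSR = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # residual variation

print(SST, SSE + SSR)  # the two numbers agree up to rounding error
```

The identity relies on the regression including an intercept and being estimated by OLS; with other coefficient values the cross term does not vanish.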
Unadjusted R²
• The coefficient of determination is defined as
\[ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}. \]
• It is the fraction of total variation in Yi (i.e., SST) that is explained by the included regressors (i.e., SSE).
• Measures such as R² are called goodness of fit measures.
• A higher value of R² indicates that the regression fits the data better since X can explain more of the variation in Y.
A Problem with R²
• When adding a variable, R² never decreases.
• R² can rise even if the added variable is unrelated to Y.
• To see this, recall that
\[ R^2 = \frac{SSE}{SST}. \]
• SST is a function of Yi only, so it is unchanged.
• But adding an additional X variable to the regression (weakly) increases SSE and hence R².
• Therefore, looking at R² alone can encourage the addition of too many explanatory variables.
Adjusted R²
• The adjusted R² is defined as
\[ \bar{R}^2 = 1 - \frac{SSR/(n - K - 1)}{SST/(n - 1)}. \]
• There is a "penalty" in the sense that increasing the number of regressors (K) decreases R̄² unless the SSR decreases (SSE increases) enough to compensate.
• Intuitively, the adjusted R² measures the percentage of the variation in Y around Ȳ that is explained by the regressors, with an adjustment for the degrees of freedom.
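The penalty is easiest to see by writing the adjusted measure in terms of the unadjusted one. The following short derivation is a sketch that uses only the two definitions above:

```latex
\begin{align*}
\bar{R}^2 &= 1 - \frac{SSR/(n-K-1)}{SST/(n-1)}
           = 1 - \frac{SSR}{SST}\cdot\frac{n-1}{n-K-1} \\
          &= 1 - (1 - R^2)\,\frac{n-1}{n-K-1}.
\end{align*}
```

Since (n − 1)/(n − K − 1) exceeds one whenever K ≥ 1, the adjusted value lies below R² unless the fit is perfect, and the factor grows as regressors are added.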
R̄² May Increase or Decrease
• As opposed to the standard R², R̄² may increase, decrease, or stay the same when an additional regressor is added.
• The direction of the change will depend on whether the fit improves enough to justify the decrease in the degrees of freedom.
• As with R², R̄² is bounded above by 1.0.
• However, it may also be negative, while the minimum possible R² is 0.

Ordinary Least Squares Regression in Stata
• Ordinary least squares: To run OLS in Stata, use the regress command followed by the dependent variable and then the independent variables. For example:
regress y x1 x2
• Fitted values: To generate a new variable, say yhat, containing the fitted values, run predict yhat, xb following the regress command.
• Residuals: To generate a new variable, say uhat, containing the residuals, run predict uhat, resid following the regress command.
Stata Example: Input
use finaid
regress finaid parent hsrank
predict yhat, xb
predict uhat, resid

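Stata Example: Output
. use finaid
. regress finaid parent hsrank

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(2, 47)      =   68.33
       Model |  1.0496e+09     2   524779682           Prob > F      =  0.0000
    Residual |   360941186    47   7679599.7           R-squared     =  0.7441
-------------+------------------------------           Adj R-squared =  0.7332
       Total |  1.4105e+09    49  28785725.5           Root MSE      =  2771.2

------------------------------------------------------------------------------
      finaid |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      parent |  -.3567721   .0316851   -11.26   0.000    -.4205143   -.2930299
      hsrank |   87.37815   20.67413     4.23   0.000     45.78717    128.9691
       _cons |   8926.929   1739.083     5.13   0.000     5428.346    12425.51
------------------------------------------------------------------------------

. predict yhat, xb
. predict uhat, resid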
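The R-squared and Adj R-squared entries in this output can be reproduced from the sums of squares in the same table. The Python sketch below is an illustration only (the course uses Stata); it recovers the model sum of squares as MS × df because the SS column prints it in rounded scientific notation, and the variable names are assumptions.

```python
# Values read off the Stata regression table for the finaid example.
ssr = 360941186.0        # Residual SS (unexplained)
sse = 524779682.0 * 2    # Model SS, recovered as Model MS x Model df
sst = sse + ssr          # Total SS
n, K = 50, 2             # observations and number of regressors

r2 = 1 - ssr / sst
adj_r2 = 1 - (ssr / (n - K - 1)) / (sst / (n - 1))

print(round(r2, 4), round(adj_r2, 4))
```

The denominators n − K − 1 = 47 and n − 1 = 49 match the df column for the Residual and Total rows.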

CASE
6.3 ELECTRONIC TIMING SYSTEM FOR OLYMPICS
Sarah Chang is the owner of a small electronics company. In six
months, a proposal is due for an electronic timing system for
the next Olympic Games. For several years, Chang's company
has been developing a new microprocessor, a critical component
in a timing system that would be superior to any product
currently on the market. However, progress in research and
development has been slow, and Chang is unsure whether her
staff can produce the microprocessor in time. If they succeed in
developing the microprocessor (probability p1), there is an
excellent chance (probability p2) that Chang's company will win
the $1 million Olympic contract. If they do not, there is a small
chance (probability p3) that she will still be able to win the
same contract with an alternative but inferior timing system that
has already been developed.
If she continues the project, Chang must invest $200,000 in
research and development. In addition, making a proposal
(which she will decide whether to do after seeing whether the
R&D is successful) requires developing a prototype timing
system at an additional cost. This additional cost is $50,000 if
R&D is successful (so that she can develop the new timing
system), and it is $40,000 if R&D is unsuccessful (so that she
needs to go with the older timing system). Finally, if Chang
wins the contract, the finished product will cost an additional
$150,000 to produce.
a. Develop a decision tree that can be used to solve Chang's
problem. You can assume in this part of the problem that she is
using EMV (of her net profit) as a decision criterion. Build the
tree so that she can enter any values for p1, p2, and p3 (in input
cells) and automatically see her optimal EMV and optimal
strategy from the tree.
b. If p2 = 0.8 and p3 = 0.1, what value of p1 makes Chang
indifferent between abandoning the project and going ahead

with it?
c. How much would Chang benefit if she knew for certain that
the Olympic organization would guarantee her the contract?
(This guarantee would be in force only if she were successful in
developing the product.) Assume p1 = 0.4, p2 = 0.8, and p3 =
0.1.
d. Suppose now that this is a relatively big project for Chang.
Therefore, she decides to use expected utility as her criterion,
with an exponential utility function. Using some trial and error,
see which risk tolerance changes her initial decision from “go
ahead” to “abandon” when p1 = 0.4, p2 = 0.8, and p3 = 0.1.
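The rollback logic behind a tree for part a can be sketched in a few lines of Python. This is one reading of the cash flows in the case, not a definitive solution: amounts are in thousands of dollars, the production cost is assumed to be incurred only if the contract is won, and the function names `emv_continue` and `optimal_emv` are hypothetical.

```python
def emv_continue(p1, p2, p3):
    """EMV (in $1000s) of continuing R&D, folding the tree back from the right."""
    RD, PROTO_NEW, PROTO_OLD, PRODUCE, CONTRACT = 200, 50, 40, 150, 1000

    def propose_or_not(p_win, proto_cost):
        # The proposal decision is made after the R&D outcome is known.
        win = CONTRACT - RD - proto_cost - PRODUCE   # net profit if the contract is won
        lose = -RD - proto_cost                      # net profit if the bid fails
        propose = p_win * win + (1 - p_win) * lose
        return max(propose, -RD)                     # not proposing loses only the R&D outlay

    success_branch = propose_or_not(p2, PROTO_NEW)   # R&D succeeds: new system
    failure_branch = propose_or_not(p3, PROTO_OLD)   # R&D fails: older system
    return p1 * success_branch + (1 - p1) * failure_branch


def optimal_emv(p1, p2, p3):
    """Best EMV over the first decision: abandon (EMV 0) or continue."""
    return max(0.0, emv_continue(p1, p2, p3))


print(optimal_emv(0.4, 0.8, 0.1))
```

Keeping p1, p2, and p3 as function arguments plays the role of the input cells the case asks for; the same function can be evaluated over a grid of p1 values to explore part b.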
Week 2 - Assignment
Uncertainty
Read Case 6.3: Electronic Timing System for Olympics on pages 275-276 of the textbook. For this assignment, you will assess and use the correct support tool to develop a decision tree as described in Part “a” of Case 6.3. Analyze and apply the best decision making process to provide answers and brief explanations for parts “a”, “b”, “c”, and “d”. The answers and explanations can be placed in the same Excel document as the decision tree.

In your Excel document,
1. Develop a decision tree using the most appropriate support tool as described in Part a.
2. Calculate the value of p1 as described in Part b. Show calculations.
3. Calculate the possible profit using the most appropriate support tool as described in Part c. Show calculations.
4. Calculate risk tolerance as described in Part d. Show calculations.
Carefully review the Grading Rubric (http://ashford.waypointoutcomes.com/assessment/20454/preview) for the criteria that will be used to evaluate your assignment.
Waypoint Assignment Submission
The assignments in this course will be submitted to Waypoint. Please refer to the instructions below to submit your assignment.
1. Click on the Assignment Submission button. The Waypoint "Student Dashboard" will open in a new browser window.
2. Browse for your assignment.
3. Click Upload.
4. Confirm that your assignment was successfully submitted by viewing the appropriate week's assignment tab in Waypoint.
For more detailed instructions, refer to the Waypoint Tutorial (https://bridgepoint.equella.ecollege.com/curriculum/file/dc358708-3d2b-41a6-a000-ff53b3cc3794/1/Waypoint%20Tutorial.pdf).

Econ 5420: Cross Section:
Introduction
Prof. Jason Blevins
Department of Economics
The Ohio State University
http://jblevins.org/
What is Econometrics?
Model ⇐⇒ Econometrics ⇐⇒ Data
Figure 1: What is econometrics?
• Literally “economic measurement”
• Quantitative analysis of economic problems
• Application of statistical methods to connect theoretical
economic models to data
• Bridges abstract economic theory and real-world human
economic activity
• Notion of a “true model”

Relevance
• Academic research
• Every field of economics
• With the exception of pure theory
• Government
• Procurement auctions (e.g., FCC spectrum auctions)
• Assignment of scarce resources (e.g., timber and spectrum
auctions)
• Antitrust
• Environmental regulation
• Business
• Estimate demand for a new product
• Forecasting sales
• Pricing financial assets
• Search engine advertisements
Non-ideal data
• Traditional statistics: controlled experiments
• Economists rarely have this luxury

• Much of econometrics focuses on statistical analysis of
data under the non-ideal circumstances inherent in
measuring economic interactions.
Quote: Ragnar Frisch
[T]here are several aspects of the quantitative
approach to economics, and no single one of these
aspects, taken by itself, should be confounded with
econometrics. Thus, econometrics is by no means
the same as economic statistics. Nor is it identical
with what we call general economic theory, although
a considerable portion of this theory has a definitely
quantitative character. Nor should econometrics be
taken as synonomous with the application of
mathematics to economics. . . . It is the unification of
all three that is powerful. And it is this unification that
constitutes econometrics.
–Frisch (1933)

Roles of Econometrics
In light of these definitions, it is clear that econometrics is used
for:
1. quantifying economic relationships (estimation),
2. testing economic theories, and
3. prediction and forecasting.
Demand Example
Example
Let Q denote the quantity demanded of a particular good. We
expect that Q should depend on P, the price of the good itself,
Ps, the price of a substitute good, and Yd, disposable income.
Without saying more about the specific relationships between
these variables, we can represent this notion using the abstract
functional relationship
Q = f(P, Ps, Yd).
Typically, we would expect demand (Q) to decrease with P

and increase with Ps.
Examples of the Roles of Econometrics
Examples of the three roles applied to the demand model
Q = f(P, Ps, Yd):
1. Quantifying economic relationships: In a linear model
Q = β1 + β2P + β3Ps + β4Yd,
what are the values of β1, β2, β3, and β4?
2. Testing economic theories: Is the good a normal good?
3. Prediction and forecasting: How many units would be
demanded, given hypothetical values of prices and
income?
Estimation: Quantifying Uncertainty
Figure 2: Statistical significance. Three panels plot Q against Yd: (a) Small sample, (b) Large sample, (c) Low variance.
A Note on Causality
• Correlation does not imply causation!
• Suppose we observe that when large numbers of people
carry umbrellas to work it tends to rain.
• Obviously, carrying umbrellas does not cause it to rain!
• This is of course just a correlation, and the causation runs
in the opposite direction.
• For the purposes of prediction, perhaps the number of
umbrellas is a good predictor of rain, but for interpretation
of the underlying process it is nonsensical.

Another Note on Causality
Example
A study of traffic accidents due to alcohol examined police
reports of traffic accidents. For each, researchers recorded
whether the driver had consumed alcohol and whether or not
the report noted an empty beer container in the vehicle.
Statement A: A recent study has found that drinking beer while
driving may lead to increased risk of an accident.
Statement B: A recent study has found that empty beer
containers in cars may lead to increased risk of an accident.
Econometric Data
There are three basic types of econometric data:
1. Cross-sectional data: observations on different individual
units (e.g., people or firms) with no natural ordering.
2. Time series data: observations at different points in time
(e.g., monthly or yearly) with a natural ordering (time).
3. Panel data: a mixture of time series and cross sectional

data consisting of observations on multiple individuals
(unordered) at different points in time (ordered).
Cross-Sectional Data
Example
A cross-sectional dataset on different students including age,
GPA, and hours studied per week:
Student  Age  GPA  Hours
1        20   3.4  10
2        21   3.1  5
3        19   3.9  12
...      ...  ...  ...
Time Series Data: Example
Example
A time series dataset consisting of semester-by-semester

observations on a single student:
Semester     Age  GPA  Hours
Fall 2013    18   3.3  7
Spring 2014  18   3.8  5
Fall 2014    19   3.5  12
...          ...  ...  ...
Panel Data: Example
Example
A panel dataset with observations on multiple students across
multiple semesters:
Student  Semester     Age  GPA  Hours
1        Fall 2013    18   3.3  7
1        Spring 2014  18   3.8  5
1        Fall 2014    19   3.5  12
...      ...          ...  ...  ...
2        Fall 2013    20   3.1  10
2        Spring 2014  20   2.9  10
2        Fall 2014    21   3.4  18
...      ...          ...  ...  ...
Mean, Variance, and Standard Deviation
• Three important features of random variables: mean,
variance, and standard deviation.
• Referred to as “moments” of a distribution.
• The formula depends on whether the distribution is discrete
(sum) or continuous (integral).
Expected Value

The expected value or mean of a discrete random variable is
the sum of all possible outcomes weighted by the probability
that each outcome occurs.
• Discrete random variable Z takes on a countable number
of outcomes.
• Let k denote the number of outcomes and let z1, z2, . . . , zk denote the values of the outcomes.
• Let P(z1), P(z2), . . . , P(zk) denote the probabilities associated with each outcome.
• The expected value of Z, denoted E[Z] or µ, is
\[ \mu = E[Z] = \sum_{i=1}^{k} P(z_i) z_i = P(z_1) z_1 + \cdots + P(z_k) z_k \]
Expected Value
Example
An individual makes a bet with the possibility of losing $1.00,
breaking even, winning $3.00, or winning $5.00. Let Z be a

random variable representing the winnings.
Outcome -$1.00 $0.00 $3.00 $5.00
Probability 0.30 0.40 0.20 0.10
The mean outcome is:
µ = E[Z] = (0.3 × −1) + (0.4 × 0) + (0.2 × 3) + (0.1 × 5)
= −0.3 + 0.6 + 0.5 = 0.8
Therefore, the expected payoff for this bet is $0.80.
Variance
The variance of a discrete random variable Z, denoted Var(Z)
or σ², is a measure of the variability of the distribution and is
defined as
\[ \sigma^2 = \mathrm{Var}[Z] = E[(Z - \mu)^2] = \sum_{i=1}^{k} P(z_i)(z_i - \mu)^2. \]
The variance is the expected value of (Z − µ)2 which is the
anticipated average value of the squared deviations of Z from
the mean.

Variance: Example
Example
Returning to the example, even though the odds are favorable
one might also be concerned about the possibility of extreme
outcomes. The variance of the winnings is
\begin{align*}
\sigma^2 &= [0.3 \times (-1 - 0.8)^2] + [0.4 \times (0 - 0.8)^2] + [0.2 \times (3 - 0.8)^2] + [0.1 \times (5 - 0.8)^2] \\
         &= [0.3 \times (-1.8)^2] + [0.4 \times (-0.8)^2] + [0.2 \times (2.2)^2] + [0.1 \times (4.2)^2] \\
         &= [0.3 \times 3.24] + [0.4 \times 0.64] + [0.2 \times 4.84] + [0.1 \times 17.64] \\
         &= 0.972 + 0.256 + 0.968 + 1.764 \\
         &= 3.96.
\end{align*}
Standard Deviation
The standard deviation, denoted σ, is the square root of the
variance.
Example
In our example, the variance was σ² = 3.96 so the standard
deviation is
\[ \sigma = \sqrt{3.96} \approx 1.99 \]
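The three calculations for the betting example can be reproduced with a few lines of Python. This is a minimal sketch of the discrete formulas above; the variable names are chosen for the example and are not from the slides.

```python
from math import sqrt

# Outcomes (dollars) and probabilities from the betting example.
outcomes = [-1.0, 0.0, 3.0, 5.0]
probs = [0.30, 0.40, 0.20, 0.10]

# Expected value: probability-weighted sum of the outcomes.
mu = sum(p * z for p, z in zip(probs, outcomes))

# Variance: probability-weighted sum of squared deviations from the mean.
var = sum(p * (z - mu) ** 2 for p, z in zip(probs, outcomes))

# Standard deviation: square root of the variance.
sd = sqrt(var)

print(round(mu, 2), round(var, 2), round(sd, 2))
```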
Sample Statistics
In contrast to the population mean, the sample average or
sample mean is a sum over n sampled values (called
realizations) of a random variable where all of the weights are
equal to 1/n. Let {Z1, Z2, . . . , Zn} denote a sample of size n
from the distribution of Z. The sample average of
{Z1, Z2, . . . , Zn} is
\[ \bar{Z} = \frac{1}{n} \sum_{i=1}^{n} Z_i = \frac{1}{n} Z_1 + \cdots + \frac{1}{n} Z_n. \]
The sample variance is defined similarly:
\[ s^2 = \frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z})^2. \]
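As an illustration of these two formulas, the Python sketch below computes the sample mean and the sample variance with the 1/n weighting used here, taking the three Hours values from the cross-sectional example as the sample. It also compares the result with `statistics.pvariance` from the standard library, which uses the same 1/n divisor (whereas `statistics.variance` divides by n − 1).

```python
from statistics import pvariance

# Hours studied per week for the three students in the cross-sectional example.
hours = [10, 5, 12]
n = len(hours)

# Sample mean: every realization gets weight 1/n.
z_bar = sum(hours) / n

# Sample variance with the 1/n divisor, as defined above.
s2 = sum((z - z_bar) ** 2 for z in hours) / n

print(z_bar, s2, pvariance(hours))
```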
Normal Distribution
• Mean, variance, and standard deviation can be
illustrated intuitively with the normal distribution.
• Write N(µ, σ2) to denote the normal distribution with
mean µ and variance σ2.
• The standard deviation is defined as the square root of the
variance, σ in this case.
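The role of the two parameters can be explored with Python's standard library, as in the sketch below. Note one assumption about notation: the slides write N(µ, σ²) with the variance as the second argument, while `statistics.NormalDist` takes the standard deviation, so N(1, 2) is constructed with sigma equal to the square root of 2.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)     # N(0, 1)
shifted = NormalDist(mu=1.0, sigma=sqrt(2.0))  # N(1, 2): mean 1, variance 2

# Each density peaks at its own mean; a larger variance gives a lower peak.
peak_std = std_normal.pdf(0.0)
peak_shifted = shifted.pdf(1.0)

# Probability of landing within one standard deviation of the mean.
within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(round(peak_std, 4), round(peak_shifted, 4), round(within_one_sd, 4))
```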
Normal Distribution
Figure 3: N(0, 1) (probability density plotted against x, for x from −9 to 9)
Normal Distribution
Figure 4: N(1, 2) (probability density plotted against x, for x from −9 to 9)
Simple Regression
• Ordinary Least Squares (OLS) with a single independent
variable.
• The theoretical model of interest is the linear model
Yi = β0 + β1Xi + ui.
• From this equation, we seek to use the information
contained in a dataset of observations (Xi, Yi) to estimate
the values of β0 and β1. We call the estimates β̂0 and β̂1.
• The fitted values are
Ŷi = β̂0 + β̂1Xi
• The residuals are the differences between the fitted values
Ŷi and the observed values Yi:
ûi ≡ Yi − Ŷi.
The Geometry of Simple Regression
Figure 5: Estimated regression line and residuals (Y plotted against X, showing the observations Y1 and Y2 at X1 and X2, the fitted values Ŷ1 and Ŷ2, and the residuals û1 and û2)

Simple Regression
• We need to formally define our loss function, the criterion
by which we determine whether the fit is good or not.
• OLS is founded on minimizing the sum of squared
residuals (SSR), defined as
\[ SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_n^2, \]
where n is the sample size.
• OLS estimates of β0 and β1 are defined to be the values of
β̂0 and β̂1 which, when plugged into the estimated
regression equation, minimize the SSR.
• Note that different samples (of the same size) yield

different estimates.
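The minimization can be illustrated with a short Python sketch. It uses the standard closed-form solution for the single-regressor case (the slope formula is not stated on these slides, so treat it as an assumption brought in for the example) and then checks that nudging either coefficient away from the OLS values raises the SSR. The data and function names are hypothetical.

```python
def ols_simple(X, Y):
    """Closed-form OLS intercept and slope for one regressor."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))
    b0 = y_bar - b1 * x_bar
    return b0, b1


def ssr(b0, b1, X, Y):
    """Sum of squared residuals for a candidate line b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))


# Hypothetical sample, for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 6.0]

b0_hat, b1_hat = ols_simple(X, Y)
best = ssr(b0_hat, b1_hat, X, Y)

# SSR at eight nearby coefficient pairs; each one exceeds the OLS value.
nearby = [ssr(b0_hat + d0, b1_hat + d1, X, Y)
          for d0 in (-0.1, 0.0, 0.1)
          for d1 in (-0.1, 0.0, 0.1)
          if (d0, d1) != (0.0, 0.0)]

print(b0_hat, b1_hat, best, min(nearby))
```

A different sample drawn from the same population would give different values of `b0_hat` and `b1_hat`, which is the point of the last bullet above.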
Why Ordinary Least Squares?
OLS is used so often for several reasons.
1. It is very straightforward to implement, both by hand and
computationally, and it is simple to work with theoretically.
2. The criterion of minimizing the squared residuals is intuitive.
3. Other nice properties: the regression line passes through
the means of X and Y, (X̄, Ȳ), the sum of the residuals is
zero, etc.
Simple Regression: Interpretation
In the regression line
Ŷi = β̂0 + β̂1Xi,
the coefficient on Xi represents the amount by which we
predict Yi will increase when Xi increases by one unit.
Simple Regression: Interpretation

Example
Let Yi be an individual i’s annual demand for housing in dollars
and let Xi be individual i’s annual income, also measured in
dollars.
Then β̂1 is the number of additional dollars individual i is
predicted to spend on housing when income increases by one
dollar.
The intercept, β̂0, is an individual’s predicted expenditure on
housing when income is zero.
Anscombe's Quartet: A Cautionary Tale
Figure: four scatterplot panels, y1 against x1, y2 against x2, y3 against x3, and y4 against x4 (x axes from 4 to 18, y axes from 4 to 12).
OLS Review: Sum of the Residuals is Zero
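To see that the residuals sum to zero, we can look at the
average of the residuals. If the average is zero, so is the sum.
\begin{align*}
\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i
  &= \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) \\
  &= \frac{1}{n} \sum_{i=1}^{n} Y_i - \hat{\beta}_0 - \hat{\beta}_1 \frac{1}{n} \sum_{i=1}^{n} X_i \\
  &= \bar{Y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{X}.
\end{align*}
But recall that β̂0 = Ȳ − β̂1X̄, and so
\begin{align*}
\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i
  &= \bar{Y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{X} \\
  &= \bar{Y} - (\bar{Y} - \hat{\beta}_1 \bar{X}) - \hat{\beta}_1 \bar{X} \\
  &= 0.
\end{align*}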
OLS Review: Fitted Value at Mean
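Show that the regression line passes through the point (X̄, Ȳ).
We can evaluate the regression line at the point X̄ and show
that the fitted value is indeed Ȳ.
Substituting for β̂0 gives:
\begin{align*}
\hat{Y} &= \hat{\beta}_0 + \hat{\beta}_1 X \\
        &= (\bar{Y} - \hat{\beta}_1 \bar{X}) + \hat{\beta}_1 X.
\end{align*}
Then evaluating at the point X = X̄ gives:
\begin{align*}
\hat{Y} &= (\bar{Y} - \hat{\beta}_1 \bar{X}) + \hat{\beta}_1 \bar{X} \\
        &= \bar{Y}.
\end{align*}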
OLS Review: Average of Predicted Values
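Another property is that the average of the predicted values Ŷi
equals the average of the observations Yi:
\begin{align*}
\frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i
  &= \frac{1}{n} \sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 X_i) \\
  &= \hat{\beta}_0 + \hat{\beta}_1 \bar{X} \\
  &= (\bar{Y} - \hat{\beta}_1 \bar{X}) + \hat{\beta}_1 \bar{X} \\
  &= \bar{Y}.
\end{align*}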
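The three OLS review properties can be verified numerically. The Python sketch below is an illustration under stated assumptions: it regresses GPA on Hours for the three students in the cross-sectional example, using the closed-form single-regressor estimates, and then checks each property in turn. The variable names are chosen for the example.

```python
# Hours (X) and GPA (Y) for the three students in the cross-sectional example.
X = [10.0, 5.0, 12.0]
Y = [3.4, 3.1, 3.9]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Closed-form OLS estimates; the intercept uses b0 = y_bar - b1 * x_bar.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in X]
residuals = [y - yh for y, yh in zip(Y, fitted)]

resid_sum = sum(residuals)        # property 1: residuals sum to zero
value_at_mean = b0 + b1 * x_bar   # property 2: line passes through (x_bar, y_bar)
fitted_mean = sum(fitted) / n     # property 3: mean fitted value equals y_bar

print(resid_sum, value_at_mean - y_bar, fitted_mean - y_bar)
```

All three printed quantities are zero up to floating-point rounding error.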
References
Frisch, R. (1933). Editor’s note. Econometrica 1, 1–4.
User: Jason Blevins
      name:  <unnamed>
       log:  /tmp/hw1-problem4.smcl
  log type:  smcl
 opened on:  12 Jan 2018, 11:09:18

. use hprice1
. regress price sqrft bdrms

      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(2, 85)      =   72.96
       Model |  580009.152     2  290004.576           Prob > F      =  0.0000
    Residual |  337845.354    85  3974.65122           R-squared     =  0.6319
-------------+------------------------------           Adj R-squared =  0.6233
       Total |  917854.506    87  10550.0518           Root MSE      =  63.045

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sqrft |   .1284362   .0138245     9.29   0.000     .1009495    .1559229
       bdrms |   15.19819   9.483517     1.60   0.113    -3.657582    34.05396
       _cons |    -19.315   31.04662    -0.62   0.536    -81.04399      42.414
------------------------------------------------------------------------------

. log close
      name:  <unnamed>
       log:  /tmp/hw1-problem4.smcl
  log type:  smcl
 closed on:  12 Jan 2018, 11:09:50

Econ 5420: Econometrics II
The Ohio State University
Spring 2018
Prof. Jason Blevins
Homework 1
Due in class on January 16.
Review Chapters 1–3 of Wooldridge and complete the exercises below. (Textbook exercise
numbers are given in parentheses for reference, but the first two are not from our textbook.)
Problem 1. (Studenmund, 1.11) The distinction between the
stochastic error term ui and the
residual ûi is one of the most important in this class.
a. List at least two differences between the error term and the
residual.
b. Usually, we can never observe the error term, but we can get around this difficulty if we
assume values for the true coefficients. Calculate values of the error term and residual for
each of the following six observations given that the true β0 equals 0.0, the true β1 equals
1.5, and the estimated regression equation is Ŷi = 0.48 + 1.32 · Xi:
Yi  2  6  3  8  5  4
Xi  1  4  2  5  3  4
(Hint: To answer this question, you’ll have to solve for ui in the “true model” equation for Yi.)
Problem 2. In order to estimate a regression equation using
Ordinary Least Squares (OLS), it
must be linear in the coefficients. Determine whether the
coefficients in each of the following
equations could be estimated using OLS.
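a. \( Y_i = \beta_0 + \beta_1 \ln X_i + u_i \)
b. \( \ln Y_i = \beta_0 + \beta_1 \ln X_i + u_i \)
c. \( Y_i = \beta_0 + \beta_1 X_i^{\beta_2} + u_i \)
d. \( Y_i^{\beta_0} = \beta_1 + \beta_2 X_i^2 + u_i \)
e. \( Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i \)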
Problem 3. (Wooldridge, 3.4) The median starting salary for new law school graduates is determined by
\[ \ln SALARY = \beta_0 + \beta_1 LSAT + \beta_2 GPA + \beta_3 \ln LIBVOL + \beta_4 \ln COST + \beta_5 RANK + u, \]
where LSAT is the median LSAT score for the graduating class,
GPA is the median college GPA for
the class, LIBVOL is the number of volumes in the law school
library, COST is the annual cost of
attending law school, and RANK is a law school ranking (with

RANK = 1 being the best).
a. Explain why we expect β5 ≤ 0.
b. What signs do you expect for the other slope parameters?
Justify your answers.
c. Using the lawsch85 dataset, the estimated equation is
\[ \widehat{\ln SALARY} = 8.34 + 0.0047\, LSAT + 0.248\, GPA + 0.095 \ln LIBVOL + 0.038 \ln COST - 0.0033\, RANK \]
n = 136, R² = 0.842.
What is the predicted ceteris paribus difference in salary for
schools with a median GPA
different by one point? (Report your answer as a percentage.)
d. Interpret the coefficient on the variable ln LIBVOL.
e. Would you say it is better to attend a higher ranked law
school? How much is a difference
in ranking of 20 worth in terms of predicted starting salary?
Problem 4. (Wooldridge, C3.2) Use the hprice1 dataset to
estimate the model
PRICE = β0 +β1SQRFT +β2BDRMS + u
where PRICE is the house price measured in thousands of
dollars.

a. Write out the results in equation form.
b. What is the estimated increase in price for a house with one
more bedroom, holding square
footage constant?
c. What is the estimated increase in price for a house with an
additional bedroom that is 140
square feet in size? Compare this to your answer in part b.
d. What percentage of the variation in price is explained by
square footage and number of
bedrooms?
e. The first house in the sample has SQRFT = 2,438 and BDRMS = 4. Find the predicted
selling price for this house from the OLS regression line.
f. The actual selling price of the first house in the sample was
$300,000 (so PRICE = 300).
Find the residual for this house. Does it suggest that the buyer
underpaid or overpaid for
the house?