SlideShare a Scribd company logo
1 of 48
Download to read offline
Basic Statistics and
Econometric approaches
Devesh Roy
IFPRI-ICAR training
21st September, 2015
2
Why study Econometrics?
• Rare in economics (and many other areas without labs!) to have
experimental data
• Need to use nonexperimental, or observational, data to make
inferences
• Important to be able to apply economic theory to real world data
Elements of econometrics
• An empirical analysis uses data to test a theory or to estimate a
relationship
• A formal economic model can be tested
• Theory may be ambiguous as to the effect of some policy change –
can use econometrics to evaluate the program
4
Types of Data – Cross Sectional
• Cross-sectional data as a random sample
• Each observation is a new farmer, firm, etc. with information at a
point in time
• If the data is not a random sample, we have a sample-selection
problem
5
Types of Data – Panel
• Can pool random cross sections and treat similar to a normal cross
section. Will just need to account for time differences.
• Can follow the same random individual observations over time –
known as panel data or longitudinal data
6
Types of Data – Time Series
• Time series data has a separate observation for each time period – e.g.
prices
• Since not a random sample, different problems to consider
• Trends and seasonality will be important
7
The Question of Causality
• Simply establishing a relationship between variables is not sufficient
• In economics and most of the work we strive to show that effect is
causal
• If we truly control for enough other variables, then the estimated
ceteris paribus effect can often be considered to be causal
• Can be difficult to establish causality
8
Example: Returns to Education (Wooldridge
textbook example)
• A model of human capital investment implies getting more education
should lead to higher earnings
• In the simplest case, this implies an equation like
ueducationEarnings  10 
9
Example: (continued)
• The estimate of 1, is the return to education, but
can it be considered causal?
• While the error term, u, includes other factors
affecting earnings, want to control for as much as
possible
• Some things are still unobserved, which can
threaten establishing causality
Economics 20 - Prof. Anderson 10
Simple linear regression
• y = 0 + 1x + u, we typically refer to y as the
• Dependent Variable, or
• Left-Hand Side Variable, or
• Explained Variable, or
• Regressand
11
Terminology, continued
• In the simple linear regression of y on x, we typically refer to x as the
• Independent Variable, or
• Right-Hand Side Variable, or
• Explanatory Variable, or
• Regressor, or
• Covariate, or
• Control Variables
12
A Simple Assumption
• The average value of u, the error term, in the population is 0. That is,
• E(u) = 0
• This is not a restrictive assumption, since we can always use 0 to
normalize E(u) to 0
13
Zero Conditional Mean (very important
condition)
• We need to make a crucial assumption about how u and x are related
• We want it to be the case that knowing something about x does not
give us any information about u, so that they are completely
unrelated. That is, that
• E(u|x) = E(u) = 0, which implies
• E(y|x) = 0 + 1x
14
.
.
x1 x2
E(y|x) as a linear function of x, where for any x
the distribution of y is centered about E(y|x)
E(y|x) = 0 + 1x
y
f(y)
15
Ordinary Least Squares
• Basic idea of regression is to estimate the population parameters
from a sample
• Let {(xi,yi): i=1, …,n} denote a random sample of size n from the
population
• For each observation in this sample, it will be the case that
• yi = 0 + 1xi + ui
Economics 20 - Prof. Anderson 16
.
.
.
.
y4
y1
y2
y3
x1 x2 x3 x4
}
}
{
{
u1
u2
u3
u4
x
y
Population regression line, sample data points
and the associated error terms
E(y|x) = 0 + 1x
Gauss Markov assumptions
• Assumption 1- linearity in parameters- Population equation is linear
in the (unknown) parameters.
• Assumption 2- Sample is random
• Assumption 3- There is variation in x
• Assumption 4- Zero conditional mean E(u/x)=0
18
Deriving OLS Estimates
• To derive the OLS estimates we need to realize that our main
assumption of E(u|x) = E(u) = 0 also implies that
• Cov(x,u) = E(xu) = 0
• Why? Remember from basic probability that Cov(X,Y) = E(XY) –
E(X)E(Y)
Economics 20 - Prof. Anderson 19
Deriving OLS continued
• We can write our 2 restrictions just in terms of x, y, 0 and 1 , since u
= y – 0 – 1x
• E(y – 0 – 1x) = 0
• E[x(y – 0 – 1x)] = 0
• These are called moment restrictions
20
Deriving OLS using M.O.M.
• The method of moments approach to estimation implies imposing
the population moment restrictions on the sample moments
• What does this mean? Recall that for E(X), the mean of a population
distribution, a sample estimator of E(X) is simply the arithmetic mean
of the sample
21
More Derivation of OLS
• We want to choose values of the parameters that will ensure that
the sample versions of our moment restrictions are true
• The sample versions are as follows:
 
  0ˆˆ
0ˆˆ
1
10
1
1
10
1








n
i
iii
n
i
ii
xyxn
xyn


22
More Derivation of OLS
• Given the definition of a sample mean, and
properties of summation, we can rewrite the first
condition as follows
xy
xy
10
10
ˆˆ
or
,ˆˆ




23
More Derivation of OLS
  
   
    








n
i
ii
n
i
i
n
i
ii
n
i
ii
n
i
iii
xxyyxx
xxxyyx
xxyyx
1
2
1
1
1
1
1
1
11
ˆ
ˆ
0ˆˆ



24
So the OLS estimated slope is
  
 
  0thatprovided
ˆ
1
2
1
2
1
1










n
i
i
n
i
i
n
i
ii
xx
xx
yyxx

25
Summary of OLS slope estimate
• The slope estimate is the sample covariance between x and y divided
by the sample variance of x
• If x and y are positively correlated, the slope will be positive
• If x and y are negatively correlated, the slope will be negative
• Only need x to vary in our sample
26
More OLS
• Intuitively, OLS is fitting a line through the sample points such that
the sum of squared residuals is as small as possible, hence the term
least squares
• The residual, û, is an estimate of the error term, u, and is the
difference between the fitted line (sample regression function) and
the sample point
Economics 20 - Prof. Anderson 27
.
.
.
.
y4
y1
y2
y3
x1 x2 x3 x4
}
}
{
{
û1
û2
û3
û4
x
y
Sample regression line, sample data points
and the associated estimated error terms
xy 10
ˆˆˆ  
28
Minimizing the sum of least squares
• Given the intuitive idea of fitting a line, we can set up a formal
minimization problem
• That is, we want to choose our parameters such that we minimize
the following:
    

n
i
ii
n
i
i xyu
1
2
10
1
2 ˆˆˆ 
29
Alternate approach, continued
• If one uses calculus to solve the minimization problem for the two
parameters you obtain the following first order conditions, which are
the same as we obtained before, multiplied by n
 
  0ˆˆ
0ˆˆ
1
10
1
10






n
i
iii
n
i
ii
xyx
xy


30
Algebraic Properties of OLS
• The sum of the OLS residuals is zero
• Thus, the sample average of the OLS residuals is zero as well
• The sample covariance between the regressors and the OLS residuals
is zero
• The OLS regression line always goes through the mean of the sample
31
Algebraic Properties (precise)
xy
ux
n
u
u
n
i
ii
n
i
in
i
i
10
1
1
1
ˆˆ
0ˆ
0
ˆ
thus,and0ˆ
 








Economics 20 - Prof. Anderson 32
More terminology
 
 
SSRSSESSTThen
(SSR)squaresofsumresidualtheisˆ
(SSE)squaresofsumexplainedtheisˆ
(SST)squaresofsumtotaltheis
:followingthedefinethenWeˆˆ
part,dunexplaineanandpart,explainedanofup
madebeingasnobservatioeachofcan thinkWe
2
2
2







i
i
i
iii
u
yy
yy
uyy
Economics 20 - Prof. Anderson 33
Proof that SST = SSE + SSR
      
  
   
 
 

 







0ˆˆthatknowweand
SSEˆˆ2SSR
ˆˆˆ2ˆ
ˆˆ
ˆˆ
22
2
22
yyu
yyu
yyyyuu
yyu
yyyyyy
ii
ii
iiii
ii
iiii
Economics 20 - Prof. Anderson 34
Goodness-of-Fit
• How do we think about how well our sample regression line fits our
sample data?
• Can compute the fraction of the total sum of squares (SST) that is
explained by the model, call this the R-squared of regression
• R2 = SSE/SST = 1 – SSR/SST
35
Unbiasedness of OLS
• Assume the population model is linear in parameters as y = 0 + 1x +
u
• Assume we can use a random sample of size n, {(xi, yi): i=1, 2, …, n},
from the population model. Thus we can write the sample model yi =
0 + 1xi + ui
• Assume E(u|x) = 0 and thus E(ui|xi) = 0
• Much of the problem in econometric estimation that we are going
to face is because of violation of this assumption
• Assume there is variation in xi
Economics 20 - Prof. Anderson 36
Unbiasedness of OLS (cont)
• In order to think about unbiasedness, we need to rewrite our
estimator in terms of the population parameter
• Start with a simple rewrite of the formula as
 
 




22
21 where,ˆ
xxs
s
yxx
ix
x
ii

Economics 20 - Prof. Anderson 37
Unbiasedness of OLS (cont)
    
   
 
   
  ii
iii
ii
iii
iiiii
uxx
xxxxx
uxx
xxxxx
uxxxyxx










10
10
10



Economics 20 - Prof. Anderson 38
Unbiasedness of OLS (cont)
 
   
 
 
211
2
1
2
ˆ
thusand,
asrewrittenbecannumeratorthe,so
,0
x
ii
iix
iii
i
s
uxx
uxxs
xxxxx
xx











39
Unbiasedness Summary
• The OLS estimates of 1 and 0 are unbiased
• Proof of unbiasedness depends on assumptions – if any assumption
fails, then OLS is not necessarily unbiased
• Remember unbiasedness is a description of the estimator – in a given
sample we may be “near” or “far” from the true parameter
40
Variance of the OLS Estimators
• Now we know that the sampling distribution of our estimate is
centered around the true parameter
• Want to think about how spread out this distribution is
• Much easier to think about this variance under an additional
assumption, so
• Assume Var(u|x) = s2 (Homoskedasticity)
41
Variance of OLS (cont)
• Var(u|x) = E(u2|x)-[E(u|x)]2
• E(u|x) = 0, so s2 = E(u2|x) = E(u2) = Var(u)
• Thus s2 is also the unconditional variance, called the error variance
• s, the square root of the error variance is called the standard
deviation of the error
• Can say: E(y|x)=0 + 1x and Var(y|x) = s2
Economics 20 - Prof. Anderson 42
.
.
x1 x2
Homoskedastic Case
E(y|x) = 0 + 1x
y
f(y|x)
Economics 20 - Prof. Anderson 43
.
xx1 x2
f(y|x)
Heteroskedastic Case
x3
.
. E(y|x) = 0 + 1x
44
Variance of OLS (cont)
 
   
 12
22
2
2
2
2
2
2
222
2
2
2
2
2
2
2
211
ˆ1
11
11
1ˆ
ss
ss

Var
s
s
s
d
s
d
s
uVard
s
udVar
s
ud
s
VarVar
x
x
x
i
x
i
x
ii
x
ii
x
ii
x















































45
Variance of OLS Summary
• The larger the error variance, s2, the larger the variance of the slope
estimate
• The larger the variability in the xi, the smaller the variance of the
slope estimate
• As a result, a larger sample size should decrease the variance of the
slope estimate
• Problem that the error variance is unknown
46
Estimating the Error Variance
• We don’t know what the error variance, s2, is, because we don’t
observe the errors, ui
• What we observe are the residuals, ûi
• We can use the residuals to form an estimate of the error variance
47
Error Variance Estimate (cont)
 
   
 
 2/ˆ
2
1
ˆ
isofestimatorunbiasedanThen,
ˆˆ
ˆˆ
ˆˆˆ
22
2
1100
1010
10






 nSSRu
n
u
xux
xyu
i
i
iii
iii
s
s



48
Error Variance Estimate (cont)
 
     2
1
2
1
1
2
/ˆˆse
,ˆoferrorstandardthe
havethen weforˆsubstituteweif
ˆsdthatrecall
regressiontheoferrorStandardˆˆ
 


xx
s
i
x
s

ss
s
ss

More Related Content

Viewers also liked

Interesting Research Paper Topics
Interesting Research Paper TopicsInteresting Research Paper Topics
Interesting Research Paper TopicsEssayAcademy
 
Weka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule GenerationWeka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule Generationrsathishwaran
 
Seven statistical tools of quality
Seven statistical tools of qualitySeven statistical tools of quality
Seven statistical tools of qualityBalaji Tamilselvam
 
Comparative Study of the Quality of Life, Quality of Work Life and Organisati...
Comparative Study of the Quality of Life, Quality of Work Life and Organisati...Comparative Study of the Quality of Life, Quality of Work Life and Organisati...
Comparative Study of the Quality of Life, Quality of Work Life and Organisati...inventy
 
BASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNING
BASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNINGBASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNING
BASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNINGCheryl Asia
 
Inferential statistics powerpoint
Inferential statistics powerpointInferential statistics powerpoint
Inferential statistics powerpointkellula
 
Inferential statistics.ppt
Inferential statistics.pptInferential statistics.ppt
Inferential statistics.pptNursing Path
 
descriptive and inferential statistics
descriptive and inferential statisticsdescriptive and inferential statistics
descriptive and inferential statisticsMona Sajid
 
Indian Paper Industry
Indian Paper IndustryIndian Paper Industry
Indian Paper IndustryJaspal Singh
 
Statistical tools in research
Statistical tools in researchStatistical tools in research
Statistical tools in researchShubhrat Sharma
 
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...Spandan Mishra, PhD.
 
Statistical Process Control (SPC) Tools - 7 Basic Tools
Statistical Process Control (SPC) Tools - 7 Basic ToolsStatistical Process Control (SPC) Tools - 7 Basic Tools
Statistical Process Control (SPC) Tools - 7 Basic ToolsMadeleine Lee
 

Viewers also liked (13)

Interesting Research Paper Topics
Interesting Research Paper TopicsInteresting Research Paper Topics
Interesting Research Paper Topics
 
Weka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule GenerationWeka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule Generation
 
Seven statistical tools of quality
Seven statistical tools of qualitySeven statistical tools of quality
Seven statistical tools of quality
 
Consignment
ConsignmentConsignment
Consignment
 
Comparative Study of the Quality of Life, Quality of Work Life and Organisati...
Comparative Study of the Quality of Life, Quality of Work Life and Organisati...Comparative Study of the Quality of Life, Quality of Work Life and Organisati...
Comparative Study of the Quality of Life, Quality of Work Life and Organisati...
 
BASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNING
BASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNINGBASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNING
BASIC STATISTICAL TOOLS IN EDUCATIONAL PLANNING
 
Inferential statistics powerpoint
Inferential statistics powerpointInferential statistics powerpoint
Inferential statistics powerpoint
 
Inferential statistics.ppt
Inferential statistics.pptInferential statistics.ppt
Inferential statistics.ppt
 
descriptive and inferential statistics
descriptive and inferential statisticsdescriptive and inferential statistics
descriptive and inferential statistics
 
Indian Paper Industry
Indian Paper IndustryIndian Paper Industry
Indian Paper Industry
 
Statistical tools in research
Statistical tools in researchStatistical tools in research
Statistical tools in research
 
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
A Multivariate Cumulative Sum Method for Continuous Damage Monitoring with La...
 
Statistical Process Control (SPC) Tools - 7 Basic Tools
Statistical Process Control (SPC) Tools - 7 Basic ToolsStatistical Process Control (SPC) Tools - 7 Basic Tools
Statistical Process Control (SPC) Tools - 7 Basic Tools
 

More from International Food Policy Research Institute- South Asia Office

More from International Food Policy Research Institute- South Asia Office (20)

8. Dr Rao.pdf
8. Dr Rao.pdf8. Dr Rao.pdf
8. Dr Rao.pdf
 
13. Manish Patel.pdf
13. Manish Patel.pdf13. Manish Patel.pdf
13. Manish Patel.pdf
 
15. Smita Sirohi.pdf
15. Smita Sirohi.pdf15. Smita Sirohi.pdf
15. Smita Sirohi.pdf
 
6. CD Mayee.pdf
6. CD Mayee.pdf6. CD Mayee.pdf
6. CD Mayee.pdf
 
10. Keshavulu_1.pdf
10. Keshavulu_1.pdf10. Keshavulu_1.pdf
10. Keshavulu_1.pdf
 
12. Swati Nayak.pdf
12. Swati Nayak.pdf12. Swati Nayak.pdf
12. Swati Nayak.pdf
 
16. Dr Anjani.pdf
16. Dr Anjani.pdf16. Dr Anjani.pdf
16. Dr Anjani.pdf
 
9. Malavika Dadlani_1.pdf
9. Malavika Dadlani_1.pdf9. Malavika Dadlani_1.pdf
9. Malavika Dadlani_1.pdf
 
7. Neeru Bhooshan.pdf
7. Neeru Bhooshan.pdf7. Neeru Bhooshan.pdf
7. Neeru Bhooshan.pdf
 
4. Raj Ganesh.pdf
4. Raj Ganesh.pdf4. Raj Ganesh.pdf
4. Raj Ganesh.pdf
 
11. Surinder K Tikoo_1.pdf
11. Surinder K Tikoo_1.pdf11. Surinder K Tikoo_1.pdf
11. Surinder K Tikoo_1.pdf
 
3. DK Yadava.pdf
3. DK Yadava.pdf3. DK Yadava.pdf
3. DK Yadava.pdf
 
14. Paresh Verma_1.pdf
14. Paresh Verma_1.pdf14. Paresh Verma_1.pdf
14. Paresh Verma_1.pdf
 
5. Ram Kaundinya.pdf
5. Ram Kaundinya.pdf5. Ram Kaundinya.pdf
5. Ram Kaundinya.pdf
 
1. Dr Anjani.pdf
1. Dr Anjani.pdf1. Dr Anjani.pdf
1. Dr Anjani.pdf
 
2. David Spielman.pdf
2. David Spielman.pdf2. David Spielman.pdf
2. David Spielman.pdf
 
GFPR 2021 South Asia Launch Ppt - Dr. Shahidur Rashid
GFPR 2021 South Asia Launch Ppt - Dr. Shahidur RashidGFPR 2021 South Asia Launch Ppt - Dr. Shahidur Rashid
GFPR 2021 South Asia Launch Ppt - Dr. Shahidur Rashid
 
GFPR 2021 South Asia Launch Ppt - Dr. Johan Swinnen
GFPR 2021 South Asia Launch Ppt - Dr. Johan SwinnenGFPR 2021 South Asia Launch Ppt - Dr. Johan Swinnen
GFPR 2021 South Asia Launch Ppt - Dr. Johan Swinnen
 
Book Launch : Agricultural Transformation in Nepal: Trends, Prospects and Pol...
Book Launch : Agricultural Transformation in Nepal: Trends, Prospects and Pol...Book Launch : Agricultural Transformation in Nepal: Trends, Prospects and Pol...
Book Launch : Agricultural Transformation in Nepal: Trends, Prospects and Pol...
 
Understanding the landscape of pulse policy in India and implications for trade
Understanding the landscape of pulse policy in India and implications for tradeUnderstanding the landscape of pulse policy in India and implications for trade
Understanding the landscape of pulse policy in India and implications for trade
 

Recently uploaded

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 

Recently uploaded (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 

ICAR- IFPRI - Basic statistics and econometric approaches lecture 3 - Devesh Roy

  • 1. Basic Statistics and Econometric approaches Devesh Roy IFPRI-ICAR training 21st September, 2015
  • 2. 2 Why study Econometrics? • Rare in economics (and many other areas without labs!) to have experimental data • Need to use nonexperimental, or observational, data to make inferences • Important to be able to apply economic theory to real world data
  • 3. Elements of econometrics • An empirical analysis uses data to test a theory or to estimate a relationship • A formal economic model can be tested • Theory may be ambiguous as to the effect of some policy change – can use econometrics to evaluate the program
  • 4. 4 Types of Data – Cross Sectional • Cross-sectional data as a random sample • Each observation is a new farmer, firm, etc. with information at a point in time • If the data is not a random sample, we have a sample-selection problem
  • 5. 5 Types of Data – Panel • Can pool random cross sections and treat similar to a normal cross section. Will just need to account for time differences. • Can follow the same random individual observations over time – known as panel data or longitudinal data
  • 6. 6 Types of Data – Time Series • Time series data has a separate observation for each time period – e.g. prices • Since not a random sample, different problems to consider • Trends and seasonality will be important
  • 7. 7 The Question of Causality • Simply establishing a relationship between variables is not sufficient • In economics and most of the work we strive to show that effect is causal • If we truly control for enough other variables, then the estimated ceteris paribus effect can often be considered to be causal • Can be difficult to establish causality
  • 8. 8 Example: Returns to Education (Wooldridge textbook example) • A model of human capital investment implies getting more education should lead to higher earnings • In the simplest case, this implies an equation like ueducationEarnings  10 
  • 9. 9 Example: (continued) • The estimate of 1, is the return to education, but can it be considered causal? • While the error term, u, includes other factors affecting earnings, want to control for as much as possible • Some things are still unobserved, which can threaten establishing causality
  • 10. Economics 20 - Prof. Anderson 10 Simple linear regression • y = 0 + 1x + u, we typically refer to y as the • Dependent Variable, or • Left-Hand Side Variable, or • Explained Variable, or • Regressand
  • 11. 11 Terminology, continued • In the simple linear regression of y on x, we typically refer to x as the • Independent Variable, or • Right-Hand Side Variable, or • Explanatory Variable, or • Regressor, or • Covariate, or • Control Variables
  • 12. 12 A Simple Assumption • The average value of u, the error term, in the population is 0. That is, • E(u) = 0 • This is not a restrictive assumption, since we can always use 0 to normalize E(u) to 0
  • 13. 13 Zero Conditional Mean (very important condition) • We need to make a crucial assumption about how u and x are related • We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is, that • E(u|x) = E(u) = 0, which implies • E(y|x) = 0 + 1x
  • 14. 14 . . x1 x2 E(y|x) as a linear function of x, where for any x the distribution of y is centered about E(y|x) E(y|x) = 0 + 1x y f(y)
  • 15. 15 Ordinary Least Squares • Basic idea of regression is to estimate the population parameters from a sample • Let {(xi,yi): i=1, …,n} denote a random sample of size n from the population • For each observation in this sample, it will be the case that • yi = 0 + 1xi + ui
  • 16. Economics 20 - Prof. Anderson 16 . . . . y4 y1 y2 y3 x1 x2 x3 x4 } } { { u1 u2 u3 u4 x y Population regression line, sample data points and the associated error terms E(y|x) = 0 + 1x
  • 17. Gauss Markov assumptions • Assumption 1- linearity in parameters- Population equation is linear in the (unknown) parameters. • Assumption 2- Sample is random • Assumption 3- There is variation in x • Assumption 4- Zero conditional mean E(u/x)=0
  • 18. 18 Deriving OLS Estimates • To derive the OLS estimates we need to realize that our main assumption of E(u|x) = E(u) = 0 also implies that • Cov(x,u) = E(xu) = 0 • Why? Remember from basic probability that Cov(X,Y) = E(XY) – E(X)E(Y)
  • 19. Economics 20 - Prof. Anderson 19 Deriving OLS continued • We can write our 2 restrictions just in terms of x, y, 0 and 1 , since u = y – 0 – 1x • E(y – 0 – 1x) = 0 • E[x(y – 0 – 1x)] = 0 • These are called moment restrictions
  • 20. 20 Deriving OLS using M.O.M. • The method of moments approach to estimation implies imposing the population moment restrictions on the sample moments • What does this mean? Recall that for E(X), the mean of a population distribution, a sample estimator of E(X) is simply the arithmetic mean of the sample
  • 21. 21 More Derivation of OLS • We want to choose values of the parameters that will ensure that the sample versions of our moment restrictions are true • The sample versions are as follows:     0ˆˆ 0ˆˆ 1 10 1 1 10 1         n i iii n i ii xyxn xyn  
  • 22. 22 More Derivation of OLS • Given the definition of a sample mean, and properties of summation, we can rewrite the first condition as follows xy xy 10 10 ˆˆ or ,ˆˆ    
  • 23. 23 More Derivation of OLS                     n i ii n i i n i ii n i ii n i iii xxyyxx xxxyyx xxyyx 1 2 1 1 1 1 1 1 11 ˆ ˆ 0ˆˆ   
  • 24. 24 So the OLS estimated slope is        0thatprovided ˆ 1 2 1 2 1 1           n i i n i i n i ii xx xx yyxx 
  • 25. 25 Summary of OLS slope estimate • The slope estimate is the sample covariance between x and y divided by the sample variance of x • If x and y are positively correlated, the slope will be positive • If x and y are negatively correlated, the slope will be negative • Only need x to vary in our sample
  • 26. 26 More OLS • Intuitively, OLS is fitting a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares • The residual, û, is an estimate of the error term, u, and is the difference between the fitted line (sample regression function) and the sample point
  • 27. Economics 20 - Prof. Anderson 27 . . . . y4 y1 y2 y3 x1 x2 x3 x4 } } { { û1 û2 û3 û4 x y Sample regression line, sample data points and the associated estimated error terms xy 10 ˆˆˆ  
  • 28. 28 Minimizing the sum of least squares • Given the intuitive idea of fitting a line, we can set up a formal minimization problem • That is, we want to choose our parameters such that we minimize the following:       n i ii n i i xyu 1 2 10 1 2 ˆˆˆ 
  • 29. 29 Alternate approach, continued • If one uses calculus to solve the minimization problem for the two parameters you obtain the following first order conditions, which are the same as we obtained before, multiplied by n     0ˆˆ 0ˆˆ 1 10 1 10       n i iii n i ii xyx xy  
  • 30. 30 Algebraic Properties of OLS • The sum of the OLS residuals is zero • Thus, the sample average of the OLS residuals is zero as well • The sample covariance between the regressors and the OLS residuals is zero • The OLS regression line always goes through the mean of the sample
  • 32. Economics 20 - Prof. Anderson 32 More terminology     SSRSSESSTThen (SSR)squaresofsumresidualtheisˆ (SSE)squaresofsumexplainedtheisˆ (SST)squaresofsumtotaltheis :followingthedefinethenWeˆˆ part,dunexplaineanandpart,explainedanofup madebeingasnobservatioeachofcan thinkWe 2 2 2        i i i iii u yy yy uyy
  • 33. Economics 20 - Prof. Anderson 33 Proof that SST = SSE + SSR                             0ˆˆthatknowweand SSEˆˆ2SSR ˆˆˆ2ˆ ˆˆ ˆˆ 22 2 22 yyu yyu yyyyuu yyu yyyyyy ii ii iiii ii iiii
  • 34. Economics 20 - Prof. Anderson 34 Goodness-of-Fit • How do we think about how well our sample regression line fits our sample data? • Can compute the fraction of the total sum of squares (SST) that is explained by the model, call this the R-squared of regression • R2 = SSE/SST = 1 – SSR/SST
  • 35. 35 Unbiasedness of OLS • Assume the population model is linear in parameters as y = 0 + 1x + u • Assume we can use a random sample of size n, {(xi, yi): i=1, 2, …, n}, from the population model. Thus we can write the sample model yi = 0 + 1xi + ui • Assume E(u|x) = 0 and thus E(ui|xi) = 0 • Much of the problem in econometric estimation that we are going to face is because of violation of this assumption • Assume there is variation in xi
  • 36. Economics 20 - Prof. Anderson 36 Unbiasedness of OLS (cont) • In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameter • Start with a simple rewrite of the formula as         22 21 where,ˆ xxs s yxx ix x ii 
  • 37. Economics 20 - Prof. Anderson 37 Unbiasedness of OLS (cont)                  ii iii ii iii iiiii uxx xxxxx uxx xxxxx uxxxyxx           10 10 10   
  • 38. Economics 20 - Prof. Anderson 38 Unbiasedness of OLS (cont)           211 2 1 2 ˆ thusand, asrewrittenbecannumeratorthe,so ,0 x ii iix iii i s uxx uxxs xxxxx xx           
  • 39. 39 Unbiasedness Summary • The OLS estimates of 1 and 0 are unbiased • Proof of unbiasedness depends on assumptions – if any assumption fails, then OLS is not necessarily unbiased • Remember unbiasedness is a description of the estimator – in a given sample we may be “near” or “far” from the true parameter
  • 40. 40 Variance of the OLS Estimators • Now we know that the sampling distribution of our estimate is centered around the true parameter • Want to think about how spread out this distribution is • Much easier to think about this variance under an additional assumption, so • Assume Var(u|x) = s2 (Homoskedasticity)
  • 41. 41 Variance of OLS (cont) • Var(u|x) = E(u2|x)-[E(u|x)]2 • E(u|x) = 0, so s2 = E(u2|x) = E(u2) = Var(u) • Thus s2 is also the unconditional variance, called the error variance • s, the square root of the error variance is called the standard deviation of the error • Can say: E(y|x)=0 + 1x and Var(y|x) = s2
  • 42. Economics 20 - Prof. Anderson 42 . . x1 x2 Homoskedastic Case E(y|x) = 0 + 1x y f(y|x)
  • 43. Economics 20 - Prof. Anderson 43 . xx1 x2 f(y|x) Heteroskedastic Case x3 . . E(y|x) = 0 + 1x
  • 44. 44 Variance of OLS (cont)        12 22 2 2 2 2 2 2 222 2 2 2 2 2 2 2 211 ˆ1 11 11 1ˆ ss ss  Var s s s d s d s uVard s udVar s ud s VarVar x x x i x i x ii x ii x ii x                                               
  • 45. 45 Variance of OLS Summary • The larger the error variance, s2, the larger the variance of the slope estimate • The larger the variability in the xi, the smaller the variance of the slope estimate • As a result, a larger sample size should decrease the variance of the slope estimate • Problem that the error variance is unknown
  • 46. 46 Estimating the Error Variance • We don’t know what the error variance, s2, is, because we don’t observe the errors, ui • What we observe are the residuals, ûi • We can use the residuals to form an estimate of the error variance
  • 47. 47 Error Variance Estimate (cont)          2/ˆ 2 1 ˆ isofestimatorunbiasedanThen, ˆˆ ˆˆ ˆˆˆ 22 2 1100 1010 10        nSSRu n u xux xyu i i iii iii s s   
  • 48. 48 Error Variance Estimate (cont)        2 1 2 1 1 2 /ˆˆse ,ˆoferrorstandardthe havethen weforˆsubstituteweif ˆsdthatrecall regressiontheoferrorStandardˆˆ     xx s i x s  ss s ss