Probability theory 2

Outline
Topics
Parametric distributions
Random sampling and sampling distribution of $\bar{Y}$
Law of large numbers and central limit theorem

Applied Statistics for Economics
3. Parametric Probability Distributions, Random
Sampling, and the Law of Large Numbers

SFC - juliohuato@gmail.com

Spring 2012

Topics

The topics for this chapter are:
1. The normal, chi-squared, F, and t distributions
2. Random sampling and the distribution of the sample average
3. Large-sample approximations and laws of large numbers



The most widely used distributions in econometrics are the following:

1. Normal: $N(\mu, \sigma^2)$
2. Chi-squared: $\chi^2_m$
3. Student t: $t_m$
4. F distribution: $F_{m,n}$


Normal distribution

The normal distribution has a bell-shaped probability density. The
normal density with mean $\mu$ and variance $\sigma^2$ is symmetric around
its mean. It has approximately 68% of its probability mass between
$\mu - \sigma$ and $\mu + \sigma$; 95% between $\mu - 2\sigma$ and $\mu + 2\sigma$; and 99.7%
between $\mu - 3\sigma$ and $\mu + 3\sigma$.
The normal with mean $\mu$ and variance $\sigma^2$ is denoted $N(\mu, \sigma^2)$.
The standard normal distribution is the normal distribution with
mean $\mu = 0$ and variance $\sigma^2 = 1$. It is denoted $N(0, 1)$.
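
The 68%, 95%, and 99.7% figures can be checked numerically. What follows is a minimal sketch using `statistics.NormalDist` from Python's standard library; the helper name `mass_within` is an illustrative choice, not something from the course material.

```python
from statistics import NormalDist

def mass_within(k, mu=0.0, sigma=1.0):
    """Probability that a N(mu, sigma^2) variable lies in [mu - k*sigma, mu + k*sigma]."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Prints roughly 0.6827, 0.9545, 0.9973 for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(mass_within(k), 4))
```

The result does not depend on the particular mean and variance: calling `mass_within(2, mu=5.0, sigma=3.0)` gives the same probability as `mass_within(2)`.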


Normal distribution

Random variables with a standard normal distribution are denoted
as $Z$. The standard normal cumulative distribution function is
denoted by $\Phi$: $\Pr(Z \le c) = \Phi(c)$, where $c$ is a constant.
The textbook tables give you the values of the standard normal
cumulative distribution function. So does Excel.
If you have a normally distributed r.v. $Y$ and want to find specific
probabilities using the tables, standardize it first:

$$Z = \frac{Y - \mu}{\sigma}$$


Normal distribution

Let $Y \sim N(\mu, \sigma^2)$. Then $Z = (Y - \mu)/\sigma \sim N(0, 1)$.
Let $c_1$ and $c_2$ be two numbers such that $c_1 < c_2$, and let $d_1 = (c_1 - \mu)/\sigma$
and $d_2 = (c_2 - \mu)/\sigma$. Then:

$$\Pr(Y \le c_2) = \Pr(Z \le d_2) = \Phi(d_2)$$

$$\Pr(Y \ge c_1) = \Pr(Z \ge d_1) = 1 - \Phi(d_1)$$

$$\Pr(c_1 \le Y \le c_2) = \Pr(d_1 \le Z \le d_2) = \Phi(d_2) - \Phi(d_1)$$
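
The three probability formulas on this slide can be sketched in code as follows. The numbers (a normal r.v. with mean 1 and variance 4, and cutoffs 0 and 2) are hypothetical, and `prob_between` is an illustrative helper name; `NormalDist().cdf` plays the role of the table lookup for Φ.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the Phi of the slides

def prob_between(c1, c2, mu, sigma):
    """Pr(c1 <= Y <= c2) for Y ~ N(mu, sigma^2), computed by standardizing."""
    d1 = (c1 - mu) / sigma
    d2 = (c2 - mu) / sigma
    return PHI(d2) - PHI(d1)

# Hypothetical example: Y ~ N(1, 4), so mu = 1 and sigma = 2.
mu, sigma = 1.0, 2.0
p_le = PHI((2.0 - mu) / sigma)             # Pr(Y <= 2) = Phi(0.5)
p_ge = 1.0 - PHI((0.0 - mu) / sigma)       # Pr(Y >= 0) = 1 - Phi(-0.5)
p_mid = prob_between(0.0, 2.0, mu, sigma)  # Pr(0 <= Y <= 2)
print(round(p_le, 4), round(p_ge, 4), round(p_mid, 4))
```

Note that `sigma` is the standard deviation, not the variance, so a variance of 4 enters the code as `sigma = 2.0`.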


Multivariate normal distribution

The normal distribution generalized to many r.v.’s is called the
multivariate normal. For two r.v.’s, $X$ and $Y$, it is called the bivariate normal.
If $X$ and $Y$ have a bivariate normal distribution with covariance $\sigma_{XY}$,
and $a$ and $b$ are constants, then

$$aX + bY \sim N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\sigma_{XY}).$$

Similarly, if n r.v.’s have a multivariate normal distribution, then:
1. any linear combination of these variables is normally distributed,
2. the marginal distribution of each of the variables is normal, and
3. the r.v.’s are independent if their covariances are zero.¹

¹ We said before that if two r.v.’s are independent, then their covariance is zero. We also said the converse is
not necessarily true. In the special case of a joint normal distribution, the converse is true.
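
A quick simulation can illustrate that the variance of aX + bY equals a²σ²_X + b²σ²_Y + 2abσ_XY. This is only a sketch: the constants a = 2 and b = 3, the covariance 0.5, and the unit variances are hypothetical. The pair (X, Y) is built from two independent standard normal draws so that it is bivariate normal with the chosen covariance.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

a, b = 2.0, 3.0
rho = 0.5  # hypothetical; X and Y each have variance 1, so sigma_XY = rho

combos = []
for _ in range(20000):
    z1 = random.gauss(0.0, 1.0)
    z2 = random.gauss(0.0, 1.0)
    x = z1
    # Mixing z1 and z2 this way gives var(Y) = 1 and cov(X, Y) = rho.
    y = rho * z1 + math.sqrt(1.0 - rho**2) * z2
    combos.append(a * x + b * y)

theoretical_var = a**2 * 1.0 + b**2 * 1.0 + 2.0 * a * b * rho
simulated_var = statistics.variance(combos)
print(theoretical_var, round(simulated_var, 2))
```

The simulated variance should land close to the theoretical value of 19, with the gap shrinking as the number of draws grows.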

Chi-squared

The chi-squared distribution is the distribution of the sum of m squared
independent standard normal r.v.’s. This distribution depends on m (the
‘degrees of freedom’ of the distribution).
Let $Z_1, Z_2, Z_3$ be three independent standard normal r.v.’s. Then
$Z_1^2 + Z_2^2 + Z_3^2$ has a chi-squared distribution with 3 degrees of freedom.
Formally and in general:

$$Z_1^2 + \cdots + Z_m^2 \sim \chi^2_m$$
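
The definition translates directly into a simulation: draw m independent standard normals, square them, and add them up. The sketch below assumes m = 3 and uses the standard fact that a chi-squared r.v. with m degrees of freedom has mean m; `chi_squared_draw` is an illustrative helper name.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def chi_squared_draw(m):
    """One draw of Z_1^2 + ... + Z_m^2 with independent standard normal Z's."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))

m = 3
draws = [chi_squared_draw(m) for _ in range(20000)]

# The sample mean of the draws should be near m.
print(round(statistics.mean(draws), 2))
```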


Student t distribution

The Student t distribution with m degrees of freedom is defined as the
distribution of the ratio of a standard normal r.v. to the square root
of an independently distributed chi-squared r.v. with m degrees of
freedom divided by m.
Let $Z$ be a standard normal r.v. and $W$ a r.v. with a chi-squared distribution
with m degrees of freedom, with $Z$ and $W$ independently distributed.
Then

$$\frac{Z}{\sqrt{W/m}} \sim t_m$$

The t density function has a bell shape, similar to the normal. But when
m is small (20 or less) the tails are fatter. With m > 30, the t is
approximated well by the standard normal, and as $m \to \infty$ the $t_m$
distribution converges to the standard normal.
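
The "fatter tails" claim can be illustrated by simulating the ratio of a standard normal r.v. to the square root of an independent chi-squared r.v. divided by its degrees of freedom. This sketch assumes m = 5 and compares the share of draws beyond ±2 with the corresponding standard normal probability; `t_draw` is an illustrative helper name.

```python
import math
import random
from statistics import NormalDist

random.seed(2)  # fixed seed so the sketch is reproducible

def t_draw(m):
    """One draw of Z / sqrt(W/m): Z standard normal, W chi-squared with m d.f."""
    z = random.gauss(0.0, 1.0)
    w = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return z / math.sqrt(w / m)

m = 5
draws = [t_draw(m) for _ in range(20000)]

tail_t = sum(abs(t) > 2.0 for t in draws) / len(draws)  # simulated Pr(|t_5| > 2)
tail_normal = 2.0 * (1.0 - NormalDist().cdf(2.0))       # exact Pr(|Z| > 2)
print(round(tail_t, 3), round(tail_normal, 3))
```

With only 5 degrees of freedom, roughly twice as much probability sits beyond ±2 as under the standard normal.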


The F distribution

The F distribution with (m, n) d.f. is defined as the distribution of the
ratio of a chi-squared r.v. with m d.f., divided by m, to an independently
distributed chi-squared r.v. with n d.f., divided by n.
Let $W$ be a chi-squared r.v. with m d.f. and $V$ a chi-squared r.v. with n d.f.,
where $W$ and $V$ are independently distributed. Then

$$\frac{W/m}{V/n} \sim F_{m,n}$$

When the d.f. of the denominator (n) increase indefinitely, $V/n$ approaches
the mean of an infinite number of squared standard normal r.v.’s.
That mean is 1, because a standard normal r.v. has mean 0 and variance 1,
so its square has mean 1. In other words, as $n \to \infty$ the $F_{m,n}$
distribution of $\frac{W/m}{V/n}$ converges to the distribution of $W/m$, that is,
of a $\chi^2_m$ r.v. divided by m. This limit is denoted $F_{m,\infty}$.
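
As a numerical illustration of the denominator's behavior: V/n is the average of n squared independent standard normal draws, each of which has mean 1 (a standard normal has mean 0 and variance 1). The sketch below, with an illustrative helper name `denominator`, shows V/n settling near 1 as n grows.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

def denominator(n):
    """V/n, where V is a sum of n squared independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n

# V/n fluctuates a lot for small n and hugs 1 for large n.
for n in (10, 1000, 100000):
    print(n, round(denominator(n), 3))
```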


Random sampling

Virtually all the statistical and econometric procedures we’ll use involve
averages of a sample of data. That’s why we need to characterize the
distribution of sample averages.
Random sampling is randomly drawing a sample from a larger
population. The average of a sample is, therefore, a r.v., because it
depends on the particular sample drawn. Since it is a random variable, the
sample average has a probability distribution (the sampling distribution).
But before we talk about the average of a random sample, let’s say more
about random sampling in general.


Random sampling
To say it differently, random sampling is the selection at random of n
objects from a population such that each member of the population is
equally likely to be included in the sample.
Example: Suppose you record the length of your commute to school and
the weather on a sample of days picked randomly. The population from
which you draw your sample is all your commuting days. If you draw your
sample randomly, each commuting day has an equal chance of being
picked.
Since the choice of days is random, learning the commuting time on one
sampled day won’t tell you anything about the commuting time
on any other sampled day. That is, the value of the commuting time on
each sampled day is an independently distributed r.v.
Let the observations in the sample be $Y_1, \ldots, Y_n$. Because the days are
picked randomly, the value of the r.v. on day i, $Y_i$, is itself random. If you
pick different days, you get different values of $Y$. Because of random
sampling, you can treat $Y_i$ as a r.v.: before it is sampled, $Y_i$ can take
many possible values; after it is sampled, $Y_i$ has a specific value.
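
The commuting example can be mimicked with a few lines of code. This is only a sketch: the population of 180 commuting days and their commute times are made up for illustration, and `random.sample` is used to pick days so that every day is equally likely to be included.

```python
import random

random.seed(4)  # fixed seed so the sketch is reproducible

# Hypothetical population: commute times (in minutes) for 180 commuting days.
population = [round(random.uniform(20.0, 60.0), 1) for _ in range(180)]

n = 10
# random.sample picks n distinct days; each day is equally likely to be chosen.
sampled_days = random.sample(range(len(population)), n)
sample = [population[day] for day in sampled_days]

print(sampled_days)  # which days were drawn
print(sample)        # the observed values Y_1, ..., Y_n for this sample
```

Changing the seed changes which days are drawn, and with them the observed values, which is what it means for each observation to be a r.v. before sampling.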

i.i.d.
Since $Y_1, \ldots, Y_n$ are drawn randomly from the same population (e.g.,
commuting days), the marginal distribution of $Y_i$ is the same for each
$i = 1, \ldots, n$. And this marginal distribution is the marginal distribution of
the population variable $Y$ being sampled. When $Y_i$ has the same
marginal distribution for $i = 1, \ldots, n$, then $Y_1, \ldots, Y_n$ are said to be
identically distributed.
And when $Y_1, \ldots, Y_n$ are drawn from the same distribution and are
independently distributed, they are said to be i.i.d. (independently and
identically distributed).
Formally: In a simple random sample, n objects are drawn at random
from a population and each object is equally likely to be drawn. The
value of the r.v. $Y$ for the i-th randomly drawn object is $Y_i$. Since each
object is equally likely to be drawn, the r.v.’s $Y_1, \ldots, Y_n$ are i.i.d.:
the distribution of $Y_i$ is the same for all $i = 1, \ldots, n$, and $Y_1$ is
distributed independently of $Y_2, \ldots, Y_n$, and so on.

Sampling distribution of the sample average
The sample average, $\bar{Y}$, of the n observations $Y_1, Y_2, \ldots, Y_n$ is:

$$\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

By drawing a random sample, we ensure that the sample average is a r.v.
Since the sample is random, each $Y_i$ is random. Since the n observations
are random, their average is random. If we had drawn a different sample,
the $Y$’s would have been different and their average would have been
different. From sample to sample, the value of $\bar{Y}$ changes.
Since $\bar{Y}$ is a r.v., it has a probability distribution. It is called the
sampling distribution of $\bar{Y}$: the probability of the possible values of $\bar{Y}$
that could be computed for different possible samples $Y_1, Y_2, \ldots, Y_n$.
Sample averages and their sampling distributions play a key role in
statistics.

Mean of $\bar{Y}$

Let the observations $Y_1, Y_2, \ldots, Y_n$ be i.i.d. and let $\mu_Y$ and $\sigma_Y^2$ be the mean
and variance of $Y_i$. (All $Y_i$ have the same mean and variance since the
observations are i.i.d. draws.)
If n = 2, then the mean of $Y_1 + Y_2$ is $E(Y_1 + Y_2) = \mu_Y + \mu_Y = 2\mu_Y$.
Therefore, the mean of the sample average is
$E[\frac{1}{2}(Y_1 + Y_2)] = \frac{1}{2} \cdot 2\mu_Y = \mu_Y$. In general,

$$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y$$

Question: What’s the variance of (aX + bY )?


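Variance of $\bar{Y}$

We learned before that $\mathrm{var}(aX + bY) = a^2\sigma_X^2 + 2ab\sigma_{XY} + b^2\sigma_Y^2$.
With two i.i.d. draws (n = 2), $\mathrm{var}(Y_1 + Y_2) = 2\sigma_Y^2$, and $\mathrm{var}(\bar{Y}) = \frac{1}{2}\sigma_Y^2$.
Why does the covariance term drop out?
For general n, since $Y_1, Y_2, \ldots, Y_n$ are i.i.d., $\mathrm{cov}(Y_i, Y_j) = 0$ for $i \ne j$, so

$$\mathrm{var}(\bar{Y}) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{var}(Y_i)$$

$$\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n}$$

The standard deviation:

$$\mathrm{s.d.}(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}$$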


Mean, variance, and s.d. of $\bar{Y}$

Just to summarize these results:

$$E(\bar{Y}) = \mu_Y$$

$$\mathrm{var}(\bar{Y}) = \frac{\sigma_Y^2}{n}$$

$$\mathrm{s.d.}(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}$$

Note: These results hold regardless of the distribution of $Y$. If, in addition,
$Y_1, \ldots, Y_n$ are i.i.d. draws from $Y \sim N(\mu_Y, \sigma_Y^2)$, then $\bar{Y}$ is itself
normally distributed, with $E(\bar{Y}) = \mu_Y$ and $\mathrm{var}(\bar{Y}) = \sigma_Y^2/n$.
In other words, $\bar{Y} \sim N(\mu_Y, \sigma_Y^2/n)$.
Random sampling ensures that the observations are i.i.d. draws from the
population r.v.
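
Since the mean and variance results hold for any distribution of Y, they can be checked by simulation with a deliberately non-normal population. The sketch below is one such check under assumed settings: draws are Uniform(0, 1), which has mean 1/2 and variance 1/12, the sample size is n = 25, and 10,000 samples are drawn.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the sketch is reproducible

n = 25        # sample size
reps = 10000  # number of independent samples

# Y_i ~ Uniform(0, 1): mu_Y = 1/2 and sigma_Y^2 = 1/12 (not a normal r.v.).
mu_y = 0.5
var_y = 1.0 / 12.0

# One sample average per replication.
sample_means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

# Each line prints the simulated value next to the theoretical one.
print(round(statistics.mean(sample_means), 3), mu_y)
print(round(statistics.variance(sample_means), 5), round(var_y / n, 5))
print(round(statistics.stdev(sample_means), 4), round(math.sqrt(var_y / n), 4))
```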


Law of large numbers

Sampling distributions are key in developing statistical and econometric
procedures. That’s why it is important to understand, mathematically,
the sampling distribution of $\bar{Y}$.
There are two approaches to characterizing the sampling distribution of
$\bar{Y}$: (1) the ‘exact’ approach and (2) the ‘approximate’ approach.
The exact approach requires the mathematical derivation of a formula for
the sampling distribution that holds for any value of n. The result is
called the exact or finite-sample distribution of $\bar{Y}$. As we learned, if $Y$
is a normal r.v. and $Y_1, \ldots, Y_n$ are i.i.d., then the exact distribution of $\bar{Y}$
is normal with mean $\mu_Y$ and variance $\sigma_Y^2/n$.



What if $Y$ is not a normal r.v.? Then the derivation of the exact
probability distribution of $\bar{Y}$ is very hard. That’s why we use the
approximate or large-sample approach. The resulting sampling
distribution is often called an asymptotic distribution (asymptotic
means that the approximation becomes exact in the limit as $n \to \infty$).
The beauty of this is that the approximations can be very accurate even
when the sample size is only moderately large, say, n = 30. If we use
really large samples (thousands or tens of thousands of observations),
then we can comfortably rely on asymptotic distributions, since they are
very close approximations to the exact sampling distributions.


In deriving asymptotic sampling distributions, we will invoke two powerful
mathematical results: (1) the law of large numbers and (2) the central
limit theorem.
The law of large numbers says that if the observations in a sample, $Y_i$,
$i = 1, \ldots, n$, are i.i.d. with $E(Y_i) = \mu_Y$ and if large outliers are unlikely (in
other words, if the variance of $Y_i$ is finite: $\mathrm{var}(Y_i) = \sigma_Y^2 < \infty$), then $\bar{Y}$
converges in probability to $\mu_Y$.
The sample average $\bar{Y}$ converges in probability to (or “is consistent
for”) $\mu_Y$ if the probability that $\bar{Y}$ is “close” to $\mu_Y$ becomes arbitrarily
close to one as n increases.
(Usually, when statisticians say that a given sample average is
consistent, they mean that the sample average converges in probability
to the population average. In other words, the larger n is, the more
likely the sample average is to be close to the population average. This
concept is key in estimating the population average from a sample.)
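
Convergence in probability can be made concrete by estimating, for several sample sizes, the probability that the sample average is "close" to the population mean. This sketch rests on assumed settings: draws are Uniform(0, 1) with mean 0.5, "close" means within 0.05 of the mean, and 500 samples are simulated per sample size.

```python
import random

random.seed(6)  # fixed seed so the sketch is reproducible

mu_y = 0.5   # mean of Uniform(0, 1)
eps = 0.05   # what counts as "close" in this sketch
reps = 500   # samples simulated per sample size

def prob_close(n):
    """Share of simulated samples of size n whose average is within eps of mu_y."""
    hits = 0
    for _ in range(reps):
        y_bar = sum(random.random() for _ in range(n)) / n
        if abs(y_bar - mu_y) < eps:
            hits += 1
    return hits / reps

probs = {n: prob_close(n) for n in (10, 100, 1000)}
print(probs)
```

The estimated probability rises toward one as n increases, which is the behavior the law of large numbers describes.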

Central limit theorem

If the observations in a sample $Y_1, \ldots, Y_n$ are i.i.d. with $E(Y_i) = \mu_Y$ and
$\mathrm{var}(Y_i) = \sigma_Y^2$, where $0 < \sigma_Y^2 < \infty$, then, regardless of the distribution of
$Y_i$, as n increases indefinitely ($n \to \infty$) the distribution of $\bar{Y}$
becomes arbitrarily well approximated by a normal distribution with mean
$E(\bar{Y}) = \mu_Y$ and variance $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$.
In other words, the distribution of $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$ (where $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$)
becomes arbitrarily well approximated by the standard normal
distribution.
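
The theorem can be illustrated with a strongly skewed population. The sketch below assumes Exponential(1) draws (mean 1, standard deviation 1), a sample size of n = 100, and 5,000 simulated samples. It standardizes each sample average and checks how often the result falls in [-1.96, 1.96], an interval that holds about 95% of the mass of a standard normal.

```python
import math
import random

random.seed(7)  # fixed seed so the sketch is reproducible

n = 100
reps = 5000
mu_y, sigma_y = 1.0, 1.0  # mean and s.d. of an Exponential(1) r.v.

def standardized_mean():
    """(Y_bar - mu_Y) / sigma_Ybar for one sample of n exponential draws."""
    y_bar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (y_bar - mu_y) / (sigma_y / math.sqrt(n))

z_values = [standardized_mean() for _ in range(reps)]

# Share of standardized averages inside [-1.96, 1.96]; should be near 0.95.
share = sum(abs(z) <= 1.96 for z in z_values) / reps
print(round(share, 3))
```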


Central limit theorem

How large should n be for this approximation to normality to be good? It
depends on the distribution of $Y_i$. If $Y_i$ is normal, then $\bar{Y}$ is normal for
any n (even if small). If $Y_i$ has a distribution very far from normal, then
the approximation may require $n \ge 30$ or even more. In most cases, when
$n \ge 100$, the distribution of $\bar{Y}$ should look pretty normal.
Since the distribution of $\bar{Y}$ approaches the normal as n grows large,
$\bar{Y}$ is said to be asymptotically normally distributed.
We’re ready for statistics!

