ERF Training Workshop: Panel Data 5

ERF Training Workshop
Panel Data 5
Raimundo Soto
Instituto de Economía, PUC-Chile

DISCRETE VARIABLE MODELS
• A discrete variable is usually represented by $y_{it} \in \{0, 1\}$
• For example,
0 = did not take the treatment
1 = did take the treatment
• Assume: $\Pr(y_{it} = 1) = p$
• Hence: $\Pr(y_{it} = 0) = 1 - p$

DISCRETE VARIABLE MODELS
• The expected value of $y_{it}$ is:
$$E(y_{it}) = \Pr(y_{it} = 1) \cdot 1 + \Pr(y_{it} = 0) \cdot 0 = \Pr(y_{it} = 1) = p$$
• Let us assume now that $p = F(x_{it}, \beta)$
• Then: $\Pr(y_{it} = 1) = F(x_{it}, \beta) = E(y_{it} \mid x_{it})$

LINEAR PROBABILITY MODEL
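• Let us estimate a linear model of $F(x_{it}, \beta)$:
$$y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$$
• Naturally, it follows that
$$E(y_{it} \mid x_{it}) = \alpha_i + \beta x_{it}$$
• Problem: the forecast $\hat{y}_{it} = \hat{\alpha}_i + \hat{\beta} x_{it}$ could lie outside $[0, 1]$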

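• Furthermore, the model is heteroskedastic, and awkwardly so, because the heteroskedasticity depends on $\beta$.
• Since $\alpha_i + \beta x_{it} + \varepsilon_{it}$ must be either 1 or 0, then
$\varepsilon_{it} = -\alpha_i - \beta x_{it}$ with probability $1 - F(x_{it}, \alpha, \beta)$
$\varepsilon_{it} = 1 - \alpha_i - \beta x_{it}$ with probability $F(x_{it}, \alpha, \beta)$
• Thus, with $F(x_{it}, \alpha, \beta) = \alpha_i + \beta x_{it}$, the variance of $\varepsilon_{it}$ is:
$$Var(\varepsilon_{it}) = (\alpha_i + \beta x_{it})(1 - \alpha_i - \beta x_{it})$$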
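The two problems of the linear probability model can be checked numerically. The following is a minimal sketch in Python; the values of `alpha` and `beta` and the grid of `x` are illustrative assumptions, not taken from the slides. It builds the two-point distribution of the error term and shows that its variance equals p(1 − p), which turns negative once the fitted probability leaves [0, 1].

```python
# Two-point error distribution of the linear probability model.
# alpha, beta and the x values are illustrative assumptions.
alpha, beta = 0.2, 0.3

def lpm_error_moments(x):
    """Return (p, mean, variance) of eps when P(y=1|x) = alpha + beta*x."""
    p = alpha + beta * x                      # fitted "probability"
    outcomes = [(-p, 1.0 - p), (1.0 - p, p)]  # (value of eps, its probability)
    mean = sum(e * pr for e, pr in outcomes)
    var = sum((e - mean) ** 2 * pr for e, pr in outcomes)
    return p, mean, var

for x in (0.0, 1.0, 2.0, 3.0):
    p, mean, var = lpm_error_moments(x)
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"x={x:.1f}  p={p:.2f}  E[eps]={mean:+.3f}  Var[eps]={var:.3f}{flag}")
```

The variance changes with x through alpha + beta*x, which is the heteroskedasticity noted above.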

• Let’s keep the idea of modeling the probability that $y_{it} = 1$, but change the specification so that the probability is properly defined (between 0 and 1)
• This holds if $F(x_{it}, \beta)$ is a cumulative distribution function (CDF)
• There are two main families of models:
– Probabilistic models using the logistic CDF, called Logit
– Probabilistic models using the normal CDF, called Probit
• Both types of models are highly non-linear
– They cannot be estimated using Least Squares
– They must be estimated by maximum likelihood, when such an estimator exists

NON-LINEAR PROBABILITY MODEL
• Logit Model
$$\Lambda(w) = \frac{e^{w}}{1 + e^{w}}$$
– Note: the variance is $\pi^2/3$
• Probit Model
$$\phi(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$
– Note: $\sigma^2 = 1$; it cannot be identified in a two-state model
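The two CDFs and the variance of the logistic distribution can be verified with a few lines of Python. This is only a sketch using the standard library; the function names and the integration grid are assumptions made for illustration.

```python
import math

def logistic_cdf(w):
    """Lambda(w) = e^w / (1 + e^w), written to avoid overflow for large |w|."""
    if w >= 0:
        return 1.0 / (1.0 + math.exp(-w))
    ew = math.exp(w)
    return ew / (1.0 + ew)

def normal_cdf(w):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def logistic_pdf(w):
    """Logistic density, Lambda(w) * (1 - Lambda(w))."""
    lam = logistic_cdf(w)
    return lam * (1.0 - lam)

# Variance of the standard logistic by a Riemann sum on [-40, 40] (its mean is zero).
step = 0.001
var_logistic = sum((k * step) ** 2 * logistic_pdf(k * step) * step
                   for k in range(-40000, 40001))

print(var_logistic, math.pi ** 2 / 3)
for w in (-2.0, 0.0, 2.0):
    print(w, round(logistic_cdf(w), 4), round(normal_cdf(w), 4))
```

The numerical variance should agree with $\pi^2/3$, while the normal CDF used in the probit has unit variance by normalization.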

Figure: Density Function (Normal and Logistic)

Figure: Cumulative Distribution Function

Figure: Linear Model (OK between 0.3 and 0.7)

Figure: Logistic gives more probability to (y=0)

PARAMETERS AND MARGINAL EFFECTS
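• In the linear model, $y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$, the parameters are simply the marginal effects:
$$\frac{\partial E(y \mid x)}{\partial x} = \frac{\partial(\alpha_i + \beta x_{it})}{\partial x_{it}} = \beta$$
• In a discrete variable model:
$$\frac{\partial E(y \mid x)}{\partial x} = \frac{\partial F(\alpha_i + \beta x_{it})}{\partial(\alpha_i + \beta x_{it})} \cdot \frac{\partial(\alpha_i + \beta x_{it})}{\partial x_{it}} = \beta f(\alpha_i + \beta x_{it}) \neq \beta$$
where $f(w) = dF(w)/dw$ is the density associated with $F$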

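• Note, however, that in the logistic:
$$\frac{\partial \Lambda(\alpha_i + \beta x_{it})}{\partial(\alpha_i + \beta x_{it})} = \frac{e^{\alpha_i + \beta x_{it}}}{\left(1 + e^{\alpha_i + \beta x_{it}}\right)^2} = \Lambda(\alpha_i + \beta x_{it})\left[1 - \Lambda(\alpha_i + \beta x_{it})\right]$$
• So that:
$$\frac{\partial E(y \mid x)}{\partial x} = \frac{\partial \Lambda(\alpha_i + \beta x_{it})}{\partial(\alpha_i + \beta x_{it})} \cdot \frac{\partial(\alpha_i + \beta x_{it})}{\partial x_{it}} = \beta \Lambda(\alpha_i + \beta x_{it})\left[1 - \Lambda(\alpha_i + \beta x_{it})\right]$$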
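The logit marginal effect can be checked against a numerical derivative. The sketch below is illustrative only: the parameter values and function names are assumptions, and the finite-difference step is an arbitrary small number.

```python
import math

def logistic_cdf(w):
    return 1.0 / (1.0 + math.exp(-w))

def logit_marginal_effect(alpha, beta, x):
    """Analytical effect: beta * Lambda(w) * (1 - Lambda(w)), with w = alpha + beta*x."""
    lam = logistic_cdf(alpha + beta * x)
    return beta * lam * (1.0 - lam)

def numerical_effect(alpha, beta, x, h=1e-6):
    """Central finite difference of E[y|x] = Lambda(alpha + beta*x) with respect to x."""
    up = logistic_cdf(alpha + beta * (x + h))
    down = logistic_cdf(alpha + beta * (x - h))
    return (up - down) / (2.0 * h)

alpha, beta = -0.5, 1.2   # illustrative values
for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, logit_marginal_effect(alpha, beta, x), numerical_effect(alpha, beta, x))
```

The effect is largest, beta/4, where alpha + beta*x = 0 and shrinks toward zero in the tails, so unlike the linear model it depends on where it is evaluated.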

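• In standard (non-panel) models there is an approximate correspondence among the estimators of the linear model ($L$), probit ($\phi$) and logit ($\Lambda$):
$$\hat{\beta}_{\Lambda} \cong 1.6\, \hat{\beta}_{\phi}$$
$$\hat{\beta}_{L} \cong 0.4\, \hat{\beta}_{\phi}$$
except for the constant, for which
$$\hat{\beta}_{L} \cong 0.4\, \hat{\beta}_{\phi} + 0.5$$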
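One way to see where factors of this size come from is to compare the slopes of the two CDFs at zero. This is a hedged sketch rather than the derivation behind the slide: it only shows that matching the densities at w = 0 gives a ratio close to 1.6, that the normal density at zero is close to 0.4, and that the rescaled logistic CDF stays near the normal CDF. The grid is an arbitrary choice.

```python
import math

def logistic_cdf(w):
    return 1.0 / (1.0 + math.exp(-w))

def normal_cdf(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

# Densities at w = 0: normal is 1/sqrt(2*pi), logistic is 1/4.
normal_slope = 1.0 / math.sqrt(2.0 * math.pi)   # close to 0.4 (linear vs probit)
slope_ratio = normal_slope / 0.25               # close to 1.6 (logit vs probit)

# Largest gap between Lambda(1.6*w) and the normal CDF on a grid over [-4, 4].
grid = [k / 100 for k in range(-400, 401)]
max_gap = max(abs(logistic_cdf(1.6 * w) - normal_cdf(w)) for w in grid)

print(round(normal_slope, 3), round(slope_ratio, 3), round(max_gap, 3))
```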

MULTI-STATE MODELS
• So far, we have specified the discrete variable as $y_{it} \in \{0, 1\}$
• We could have three (or more) states
• For example,
0 = travel to Cairo by bus
1 = travel to Cairo by train
2 = travel to Cairo by plane
• The problems are similar (slightly more complicated)

GENERAL PLAN
• It seems we have 4 cases:

          Fixed Effects   Random Effects
Logit     Almost OK       Bias
Probit    Bias            OK

LOGIT MODEL
• Assume T=2 and consider the log-likelihood function of a sample of size N:
$$\log L = -\sum_{i=1}^{N}\sum_{t=1}^{2} \log\left[1 + \exp(\alpha_i + \beta x_{it})\right] + \sum_{i=1}^{N}\sum_{t=1}^{2} y_{it}(\alpha_i + \beta x_{it})$$
• Suppose that $x_{it} = 0$ if $t = 1$ and $x_{it} = 1$ if $t = 2$. Then the first derivatives are:

LOGIT MODEL
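$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N}\sum_{t=1}^{2}\left[y_{it} - \frac{e^{\alpha_i + \beta x_{it}}}{1 + e^{\alpha_i + \beta x_{it}}}\right] x_{it}$$
$$\frac{\partial \log L}{\partial \alpha_i} = \sum_{t=1}^{2}\left[y_{it} - \frac{e^{\alpha_i + \beta x_{it}}}{1 + e^{\alpha_i + \beta x_{it}}}\right]$$
• Solving:
$\hat{\alpha}_i = \infty$ if $y_{i1} + y_{i2} = 2$
$\hat{\alpha}_i = -\infty$ if $y_{i1} + y_{i2} = 0$
$\hat{\alpha}_i = -\beta/2$ if $y_{i1} + y_{i2} = 1$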

LOGIT MODEL
• The estimator of $\alpha_i$ does not exist if $\sum_{t=1}^{T} y_{it} = 0$ or $\sum_{t=1}^{T} y_{it} = T$. Obviously!
• The estimator of $\beta$ is inconsistent with fixed T as $N \to \infty$ because of the incidental parameter problem (Neyman and Scott, 1948): as N grows, so does the number of parameters to be estimated (one $\alpha_i$ for each i)
• In fact, in the T=2 case the estimator of $\beta$ converges to twice the true value, $plim\, \hat{\beta} = 2\beta$ (Hsiao, pp. 160-161)

LOGIT MODEL
• However, there is a Logit estimator that is consistent, called conditional logit
• It uses only the information from units that switch states, i.e., units that move from 0 → 1 or from 1 → 0
• Therefore, the estimator is conditional on observing a change in state, which allows identifying $\beta$ first and $\alpha_i$ later
• Note that this estimator eliminates:
– All units that do not change state (always 0 or always 1)
– All variables that do not change over time
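The contrast between the fixed-effects and conditional estimators can be illustrated by simulation in the T=2 design with $x_{i1} = 0$ and $x_{i2} = 1$. The sketch below rests on assumptions that are not in the slides: the individual effects are drawn from a standard normal, the true $\beta$ is 1, and the seed and sample size are arbitrary. In this design, substituting $\hat{\alpha}_i = -\beta/2$ for the switchers into the score for $\beta$ gives the unconditional estimate $2\log(n_{01}/n_{10})$, while the conditional estimate is $\log(n_{01}/n_{10})$, where $n_{01}$ and $n_{10}$ count the units moving 0 → 1 and 1 → 0.

```python
import math
import random

def simulate_two_period_logit(n_units, beta, seed=0):
    """Count switchers (0->1 and 1->0) for T=2 with x_i1 = 0 and x_i2 = 1."""
    rng = random.Random(seed)
    n01 = n10 = 0
    for _ in range(n_units):
        alpha_i = rng.gauss(0.0, 1.0)   # assumed distribution of the individual effects
        p1 = 1.0 / (1.0 + math.exp(-alpha_i))            # t = 1, x = 0
        p2 = 1.0 / (1.0 + math.exp(-(alpha_i + beta)))   # t = 2, x = 1
        y1 = rng.random() < p1
        y2 = rng.random() < p2
        if (y1, y2) == (False, True):
            n01 += 1
        elif (y1, y2) == (True, False):
            n10 += 1
    return n01, n10

true_beta = 1.0
n01, n10 = simulate_two_period_logit(200_000, true_beta)
beta_conditional = math.log(n01 / n10)        # conditional logit: uses switchers only
beta_fixed_effects = 2.0 * beta_conditional   # unconditional ML with one alpha_i per unit
print(beta_conditional, beta_fixed_effects)
```

With a large N the first number should be close to the true value and the second close to twice the true value, however large the sample.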

LOGIT MODEL
• The conditional Logit model allows estimating $\beta$ using the iterative Newton-Raphson algorithm (or similar techniques, e.g. BHHH). The same algorithm produces the variance of the estimator, $var(\hat{\beta})$.
• Scores (1st derivatives): $s(\beta) = \frac{\partial \mathcal{L}_c(\beta)}{\partial \beta} = \sum_i 1(0 < y_{i+} <$ …

PROBIT MODEL
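• Following the linear probability model, we assume that individual effects are random and normally distributed
$$y_{it} \neq 0 \iff \beta x_{it} + \alpha_i + \varepsilon_{it} > 0$$
• Let $\nu_{it} = \alpha_i + \varepsilon_{it}$
• Then, $\nu_{it} \sim N(0, \sigma_{\nu}^2)$
• And:
$$F(y, z) = \begin{cases} \phi(z) & \text{if } y \neq 0 \\ 1 - \phi(z) & \text{if } y = 0 \end{cases}$$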

PROBIT MODEL
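• The likelihood of each panel is
$$\Pr(y_{i1}, y_{i2}, \dots, y_{in} \mid x_{i1}, x_{i2}, \dots, x_{in}) = \int_{-\infty}^{\infty} \frac{e^{-\nu_i^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}} \left\{\prod_{t=1}^{n} F(y_{it}, \alpha_i + \beta x_{it} + \nu_i)\right\} d\nu_i$$
• This integral can be represented by
$$\int_{-\infty}^{\infty} g(y_{it}, x_{it}, \nu_i)\, d\nu_i$$
• And approximated using Gauss-Hermite quadrature methods (i.e., quadrature with weight function $e^{-x^2}$):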

PROBIT MODEL
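• Quadrature’s notion:
$$\int_{-\infty}^{\infty} e^{-x^2} h(x)\, dx \approx \sum_{m=1}^{M} \omega_m^* h(a_m^*)$$
where $\omega_m^*$ are the weights, $h(a_m^*)$ are the quadrature ordinates (the function evaluated at the abscissas $a_m^*$) and M is the number of quadrature points
– The idea is to approximate $h(x)$ properly using a polynomial of order M (this is the key issue)
• Writing $f(x) = e^{-x^2} h(x)$, the above integral can be written as:
$$\int_{-\infty}^{\infty} f(x)\, dx \approx \sum_{m=1}^{M} \omega_m^* e^{(a_m^*)^2} f(a_m^*)$$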
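Both forms of the quadrature rule can be tried on a small case. The sketch below uses the three-point Gauss-Hermite rule, whose nodes ($0, \pm\sqrt{3/2}$) and weights ($2\sqrt{\pi}/3$, $\sqrt{\pi}/6$) are known in closed form; the choice of M = 3, the test function and the helper names are assumptions for illustration. An M-point rule integrates polynomials of degree up to 2M − 1 exactly, so $h(x) = x^4$ is reproduced without error.

```python
import math

# Three-point Gauss-Hermite rule for the weight function e^{-x^2}.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6.0,
           2.0 * math.sqrt(math.pi) / 3.0,
           math.sqrt(math.pi) / 6.0)

def gauss_hermite(h):
    """Approximate the integral of e^{-x^2} h(x) over the real line."""
    return sum(w * h(a) for w, a in zip(WEIGHTS, NODES))

def integrate_unweighted(f):
    """Approximate the integral of f(x) by re-inserting e^{a^2} at each node."""
    return sum(w * math.exp(a * a) * f(a) for w, a in zip(WEIGHTS, NODES))

# h(x) = x^4 has degree 4 <= 2*3 - 1, so the rule is exact: 3*sqrt(pi)/4.
exact = 3.0 * math.sqrt(math.pi) / 4.0
approx = gauss_hermite(lambda x: x ** 4)
same = integrate_unweighted(lambda x: math.exp(-x * x) * x ** 4)
print(approx, same, exact)
```

For integrands that are not polynomials, such as products of normal CDFs, the approximation improves as M grows.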

PROBIT MODEL
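• The likelihood function of each panel (unit) is approximated using:
$$l_i \approx \sqrt{2}\, \sigma_i \sum_{m=1}^{M} w_m^* e^{(a_m^*)^2} g(y_{it}, x_{it}, \sqrt{2}\, \sigma_i a_m^* + \varepsilon_i)$$
• The log-likelihood function of the complete sample (all units) is approximated using:
$$L \approx \sum_{i=1}^{n} w_i \log\left[\sqrt{2}\, \sigma_i \sum_{m=1}^{M} w_m^* e^{(a_m^*)^2} \frac{e^{-(\sqrt{2}\, \sigma_i a_m^* + \varepsilon_i)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}} \prod_{t=1}^{n} F(y_{it}, \alpha_i + \beta x_{it} + \sqrt{2}\, \sigma_i a_m^* + \varepsilon_i)\right]$$
where $w_m^*$ and $a_m^*$ are the quadrature weights and abscissas, and $w_i$ is the weight given to unit i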
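A simplified version of this approximation can be sketched in Python. It is not the formula above: it is the plain (non-adaptive) case with a single random effect distributed $N(0, \sigma^2)$, no intercept, a scalar regressor and unit weights, so the change of variable is simply $\nu = \sqrt{2}\,\sigma a_m$. The function names and the number of quadrature points are assumptions; the nodes and weights come from NumPy's `hermgauss`.

```python
import math
from numpy.polynomial.hermite import hermgauss

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def re_probit_unit_likelihood(y, x, beta, sigma, n_points=12):
    """Likelihood of one unit, integrating out a N(0, sigma^2) effect by quadrature."""
    nodes, weights = hermgauss(n_points)   # nodes a_m and weights w_m for e^{-a^2}
    total = 0.0
    for a_m, w_m in zip(nodes, weights):
        nu = math.sqrt(2.0) * sigma * a_m  # change of variable nu = sqrt(2)*sigma*a
        prod = 1.0
        for y_t, x_t in zip(y, x):
            q = 1.0 if y_t != 0 else -1.0  # F(y, z) = phi(z) if y != 0, 1 - phi(z) if y = 0
            prod *= normal_cdf(q * (beta * x_t + nu))
        total += w_m * prod
    return total / math.sqrt(math.pi)      # the Jacobian and the normal constant combine to 1/sqrt(pi)

# Illustrative unit with three periods.
print(re_probit_unit_likelihood([1, 0, 1], [0.2, -0.5, 1.0], beta=0.8, sigma=0.7))
```

For T = 1 the integral has the closed form $\phi(\beta x / \sqrt{1 + \sigma^2})$, which gives a convenient check on the number of quadrature points.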

Likelihood Estimation
• The likelihood function is “the probability that a sample of observations of size n is a realization of a particular distribution $f(y_{it}, x_{it} \mid \theta)$”
• Let’s call it $\mathcal{L}_n(\theta \mid y_{it}, x_{it})$

Example of Likelihood Estimation
• Consider “bicycle accidents on campus” in a given year. Suppose we record the following sample
{2,0,3,4,1,3,0,2,3,4,3,5}
• What do you think is the model that generated this sample?
• What do you think is the distribution that generated this sample?

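• OK, let’s try Poisson (although I think it is Normal)
• Poisson distribution of each observation:
$$f(y_i \mid \theta) = \frac{e^{-\theta} \theta^{y_i}}{y_i!}$$
• When observations are independent, the joint probability or likelihood function is the product of the marginals:
$$L(\theta \mid y) = \prod_{i=1}^{12} \frac{e^{-\theta} \theta^{y_i}}{y_i!} = \frac{e^{-12\theta}\, \theta^{30}}{\prod_{i=1}^{12} y_i!}$$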

• We want to pick $\theta$ so as to maximize this probability (likelihood). There are two ways:
– Trying different values (most often used)
– Using calculus
• Our likelihood function is really ugly (non-linear)
• But we can use logs to make it much nicer

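• Taking logs:
$$\log L(\theta \mid y) = -12\theta + 30 \log\theta - \sum_{i=1}^{12} \log(y_i!)$$
• To get the optimal $\theta$: differentiate twice, set the first derivative equal to zero and solve it for $\theta$, and make sure the second derivative is negative:
• 1st derivative: $-12 + 30 \cdot \frac{1}{\theta}$
• 2nd derivative: $-30 \cdot \frac{1}{\theta^2}$, which is negative
• Therefore $\hat{\theta} = 30/12 = 2.5$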
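Both routes to the maximum, trying different values and using calculus, can be reproduced for the bicycle-accident sample. This is a minimal sketch; the grid of candidate values is an arbitrary assumption.

```python
import math

sample = [2, 0, 3, 4, 1, 3, 0, 2, 3, 4, 3, 5]

def poisson_loglik(theta, data):
    """log L = -n*theta + (sum of y) * log(theta) - sum of log(y!)."""
    return (-len(data) * theta
            + sum(data) * math.log(theta)
            - sum(math.lgamma(y + 1) for y in data))

# Trying different values: evaluate the log-likelihood on a grid 0.01, 0.02, ..., 10.00.
grid = [k / 100 for k in range(1, 1001)]
theta_grid = max(grid, key=lambda th: poisson_loglik(th, sample))

# Using calculus: the first-order condition -n + sum(y)/theta = 0 gives the sample mean.
theta_calculus = sum(sample) / len(sample)

print(theta_grid, theta_calculus)   # prints 2.5 2.5
```

The term with the factorials does not depend on theta, so it shifts the log-likelihood without moving the maximum.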

• Of all Poisson distributions, the one that best describes the data is the one with parameter 2.5
• What about my normal?
– Well, I can fit the normal to the data and find $\mu, \sigma^2$
• Is your model better than mine?
– No way!!!

• The likelihood function is “the probability that a sample of observations of size n is a realization of a particular distribution $f(y_{it}, x_{it} \mid \theta)$”:
$$\mathcal{L}_n(\theta \mid y_{it}, x_{it})$$
• The joint distribution is the product of the conditional density times the marginal density:
$$f(y_{it}, x_{it} \mid \theta) = f(y_{it} \mid x_{it}, \theta)\, f(x_{it} \mid \theta)$$

• A statistic is sufficient with respect to a model and its unknown parameters if “no other statistic can be calculated using the same sample that could bring additional information vis-à-vis the true value of the parameters”.
• Usually, a sufficient statistic is a simple function of the data, e.g., the sum of the observations.
• In our case, a sufficient statistic of $f(x_{it} \mid \theta)$ is the sum of the observations (for the units that change state, because for the others $\alpha_i$ is undefined)

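• Therefore, the conditional log likelihood function is:
$$\mathcal{L}_c(\beta) = \sum_i 1(0 < y_{i+} < T)\left\{\sum_t y_{it} x_{it}'\beta - \log \sum_{z:\, \sum_t z_t = y_{i+}} e^{\sum_t z_t x_{it}'\beta}\right\}$$
where $y_{i+} = \sum_t y_{it}$ and the $z_t$ represent all possible cases in which there is a change in state, i.e., all sequences $(z_1, \dots, z_T)$ of zeros and ones with $\sum_t z_t = y_{i+}$.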
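The conditional log likelihood can be evaluated directly by enumerating the sequences $z$. The sketch below assumes a scalar regressor and a small T, since the enumeration grows quickly with T; the function name and the example panels are hypothetical.

```python
import math
from itertools import combinations

def conditional_loglik(beta, panels):
    """Conditional logit log likelihood for a scalar regressor.

    panels is a list of (y, x) pairs, one per unit, where y and x are
    equal-length sequences over t = 1..T.
    """
    total = 0.0
    for y, x in panels:
        T, y_plus = len(y), sum(y)
        if y_plus == 0 or y_plus == T:
            continue   # indicator 1(0 < y_i+ < T): non-switchers drop out
        numerator = sum(y_t * x_t * beta for y_t, x_t in zip(y, x))
        # Sum over every 0/1 sequence z with the same number of ones as y.
        denominator = sum(
            math.exp(sum(x[t] * beta for t in ones))
            for ones in combinations(range(T), y_plus)
        )
        total += numerator - math.log(denominator)
    return total

# Illustrative T = 2 panels with x = (0, 1): two units 0 -> 1, one 1 -> 0, one always 1.
panels = [((0, 1), (0.0, 1.0)), ((1, 0), (0.0, 1.0)),
          ((1, 1), (0.0, 1.0)), ((0, 1), (0.0, 1.0))]
print(conditional_loglik(0.5, panels))
```

In this T = 2 design each switcher contributes the log of a logistic probability in $\beta$ alone, so the $\alpha_i$ no longer appear.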
