# ERF Training Workshop Panel Data 5

Raimundo Soto - Catholic University of Chile

ERF Training on Advanced Panel Data Techniques Applied to Economic Modelling

29 -31 October, 2018
Cairo, Egypt


### ERF Training Workshop Panel Data 5

1. 1. ERF Training Workshop Panel Data 5 — Raimundo Soto, Instituto de Economía, PUC-Chile
2. 2. DISCRETE VARIABLE MODELS
   • A discrete variable is usually represented by $y_{it} \in \{0, 1\}$
   • For example: $0$ = does not take the treatment, $1$ = takes the treatment
   • Assume: $\text{Prob}(y_{it} = 1) = p$
   • Hence: $\text{Prob}(y_{it} = 0) = 1 - p$
3. 3. DISCRETE VARIABLE MODELS
   • The expected value of $y_{it}$ is: $E[y] = \text{Prob}(y_{it} = 1) \cdot 1 + \text{Prob}(y_{it} = 0) \cdot 0 = \text{Prob}(y_{it} = 1) = p$
   • Let us now assume that $p = F(x_{it}, \beta)$
   • Then: $\text{Prob}(y_{it} = 1) = F(x_{it}, \beta) = E[y|x]$
4. 4. LINEAR PROBABILITY MODEL
   • Let us estimate a linear model of $F(x_{it}, \beta)$: $y_{it} = \alpha_i + \beta x_{it} + \epsilon_{it}$
   • Naturally, it verifies that $E[y|x] = \alpha_i + \beta x_{it}$
   • Problem: the forecast $\hat{y}_{it} = \alpha_i + \beta x_{it}$ could lie outside $[0, 1]$
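The out-of-range forecast problem is easy to see numerically. A minimal sketch with made-up data (the `xs` and `ys` values are illustrative, not from the workshop): fit the linear probability model by simple OLS and inspect the fitted "probabilities".

```python
# Sketch: linear probability model fitted by OLS on made-up binary data.
# With extreme x values the fitted "probability" escapes [0, 1].
xs = [-4, -2, -1, 0, 1, 2, 4, 6]
ys = [0, 0, 0, 0, 1, 1, 1, 1]  # binary outcome y_it

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
beta = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        / sum((x - xbar) ** 2 for x in xs))
alpha = ybar - beta * xbar

fitted = [alpha + beta * x for x in xs]
print(min(fitted), max(fitted))  # the extremes lie below 0 and above 1
```

Here the fit at $x = 6$ exceeds 1 and the fit at $x = -4$ is negative, even though every observed outcome is 0 or 1.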
5. 5. LINEAR PROBABILITY MODEL
   • Furthermore, the model is heteroskedastic, and rather awkwardly so, because the heteroskedasticity depends on $\beta$.
   • Since $y_{it} = \alpha_i + \beta x_{it} + \epsilon_{it}$ must be either 1 or 0:
     $\epsilon_{it} = -\alpha_i - \beta x_{it}$ with probability $1 - F(x_{it}, \alpha, \beta)$
     $\epsilon_{it} = 1 - \alpha_i - \beta x_{it}$ with probability $F(x_{it}, \alpha, \beta)$
   • Thus, the variance of $\epsilon_{it}$ is: $\text{var}(\epsilon_{it}) = (\alpha_i + \beta x_{it})(1 - \alpha_i - \beta x_{it})$
6. 6. LINEAR PROBABILITY MODEL
   • Let's keep the notion of studying the probability that $y_{it} = 1$, but change the specification so that the probability is properly defined (between 0 and 1)
   • The latter obtains if $F(x_{it}, \beta)$ is a cumulative distribution function (CDF)
   • There are two main families of models:
     – Probabilistic models using the logistic CDF, called Logit
     – Probabilistic models using the normal CDF, called Probit
   • Both types of models are highly non-linear
     – They cannot be estimated using Least Squares
     – They must be estimated using likelihood functions, when such an estimator exists
7. 7. NON-LINEAR PROBABILITY MODEL
   • Logit model: $\Lambda(w) = \dfrac{e^w}{1 + e^w}$
     – Note: the variance is $\pi^2 / 3$
   • Probit model: $\Phi(w) = \displaystyle\int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$
     – Note: $\sigma^2 = 1$; it cannot be identified in a two-state model
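The two CDFs above can be sketched directly with the standard library (the normal CDF written via the error function); function names here are illustrative:

```python
import math

def logit_cdf(w):
    # Logistic CDF: Lambda(w) = e^w / (1 + e^w); implied variance is pi^2 / 3
    return math.exp(w) / (1.0 + math.exp(w))

def probit_cdf(w):
    # Standard normal CDF via the error function; sigma^2 is fixed at 1
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

# Both are proper CDFs: 0.5 at zero, but the logistic has fatter tails
print(logit_cdf(0.0), probit_cdf(0.0))
print(logit_cdf(-3.0), probit_cdf(-3.0))
```

Evaluating both at $w = -3$ illustrates the point made on a later slide: the logistic assigns noticeably more probability mass to the tails than the normal.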
8. 8. Density Function [figure: normal vs. logistic density functions]
9. 9. Cumulative Distribution Function [figure: normal vs. logistic CDFs]
10. 10. Cumulative Distribution Function [figure: the linear model tracks the CDF well only between 0.3 and 0.7]
11. 11. Cumulative Distribution Function [figure: the logistic gives more probability to (y = 0)]
12. 12. PARAMETERS AND MARGINAL EFFECTS
   • In the linear model $y_{it} = \alpha_i + \beta x_{it} + \epsilon_{it}$, marginal effects are simply the parameters: $\dfrac{\partial E[y|x]}{\partial x} = \dfrac{\partial(\alpha_i + \beta x_{it})}{\partial x_{it}} = \beta$
   • In a discrete variable model: $\dfrac{\partial E[y|x]}{\partial x} = \dfrac{\partial F(\alpha_i + \beta x_{it})}{\partial(\beta x_{it})} \cdot \dfrac{\partial(\beta x_{it})}{\partial x_{it}} = \beta f(\alpha_i + \beta x_{it}) \neq \beta$, where $f(\cdot) = dF(\cdot)/d(\cdot)$
13. 13. PARAMETERS AND MARGINAL EFFECTS
   • Note, however, that for the logistic: $\dfrac{\partial \Lambda(\alpha_i + \beta x_{it})}{\partial(\alpha_i + \beta x_{it})} = \dfrac{e^{\alpha_i + \beta x_{it}}}{\left(1 + e^{\alpha_i + \beta x_{it}}\right)^2} = \Lambda(\alpha_i + \beta x_{it})\left[1 - \Lambda(\alpha_i + \beta x_{it})\right]$
   • So that: $\dfrac{\partial E[y|x]}{\partial x} = \beta\, \Lambda(\alpha_i + \beta x_{it})\left[1 - \Lambda(\alpha_i + \beta x_{it})\right]$
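The logistic marginal effect formula can be checked against a numerical derivative; a short sketch (parameter values are arbitrary):

```python
import math

def logit_me(alpha_i, beta, x):
    """Marginal effect dE[y|x]/dx = beta * L(z) * (1 - L(z)) at z = alpha_i + beta*x,
    where L is the logistic CDF."""
    z = alpha_i + beta * x
    lam = math.exp(z) / (1.0 + math.exp(z))
    return beta * lam * (1.0 - lam)

# At z = 0 the effect peaks at beta/4; far from zero it shrinks toward 0
print(logit_me(0.0, 2.0, 0.0))  # 0.5 = 2.0 * 0.25
```

This also makes the slide's point concrete: unlike the linear model, the effect of $x$ varies with the evaluation point and never equals $\beta$ itself (except in the limit $\Lambda(1-\Lambda) \to 1$, which is unattainable).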
14. 14. PARAMETERS AND MARGINAL EFFECTS
   • In standard (non-panel) models there is an approximate correspondence among the estimators of the linear model (L), probit ($\Phi$) and logit ($\Lambda$):
     $\hat{\beta}_\Lambda \approx 1.6\, \hat{\beta}_\Phi$
     $\hat{\beta}_L \approx 0.4\, \hat{\beta}_\Phi$, except for the constant: $\hat{\beta}_L \approx 0.4\, \hat{\beta}_\Phi + 0.5$
15. 15. MULTI-STATE MODELS
   • So far, we have specified the discrete variable as $y_{it} \in \{0, 1\}$
   • We could have three (or more) states
   • For example: $0$ = travel to Cairo by bus, $1$ = travel to Cairo by train, $2$ = travel to Cairo by plane
   • The problems are similar (slightly more complicated)
16. 16. GENERAL PLAN
   • It seems we have 4 cases:

   |        | Fixed Effects | Random Effects |
   |--------|---------------|----------------|
   | Logit  |               |                |
   | Probit |               |                |
17. 17. GENERAL PLAN
   • It seems we have 4 cases:

   |        | Fixed Effects | Random Effects |
   |--------|---------------|----------------|
   | Logit  | Almost OK     | Bias           |
   | Probit | Bias          | OK             |
18. 18. LOGIT MODEL
   • Assume $T = 2$ and consider the log-likelihood function of a sample of size $N$:
     $\log L = -\sum_{i=1}^{N} \sum_{t=1}^{2} \log\left(1 + \exp(\alpha_i + \beta x_{it})\right) + \sum_{i=1}^{N} \sum_{t=1}^{2} y_{it}\,(\alpha_i + \beta x_{it})$
   • Suppose that $x_{i1} = 0$ and $x_{i2} = 1$. Then the first derivatives are:
19. 19. LOGIT MODEL
   $\dfrac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} \sum_{t=1}^{2} \left[ y_{it} - \dfrac{e^{\alpha_i + \beta x_{it}}}{1 + e^{\alpha_i + \beta x_{it}}} \right] x_{it}$
   $\dfrac{\partial \log L}{\partial \alpha_i} = \sum_{t=1}^{2} \left[ y_{it} - \dfrac{e^{\alpha_i + \beta x_{it}}}{1 + e^{\alpha_i + \beta x_{it}}} \right]$
   • Solving:
     $\hat{\alpha}_i = \infty$ if $y_{i1} + y_{i2} = 2$
     $\hat{\alpha}_i = -\infty$ if $y_{i1} + y_{i2} = 0$
     $\hat{\alpha}_i = -\beta/2$ if $y_{i1} + y_{i2} = 1$
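The $\hat{\alpha}_i = -\beta/2$ case can be verified numerically: with $y_{i1} + y_{i2} = 1$, $x_{i1} = 0$ and $x_{i2} = 1$, the score for $\alpha_i$ is zero where $\Lambda(\alpha) + \Lambda(\alpha + \beta) = 1$. A sketch solving that condition by bisection (function names are illustrative):

```python
import math

def lam(z):
    # Logistic CDF
    return math.exp(z) / (1.0 + math.exp(z))

def alpha_hat(beta, lo=-50.0, hi=50.0):
    """Solve the score condition Lambda(a) + Lambda(a + beta) = 1 for a by
    bisection (the y_i1 + y_i2 = 1 case, with x_i1 = 0 and x_i2 = 1)."""
    f = lambda a: lam(a) + lam(a + beta) - 1.0  # strictly increasing in a
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(alpha_hat(1.4))  # approximately -0.7, i.e. -beta/2
```

By the symmetry $\Lambda(-z) = 1 - \Lambda(z)$, the root is exactly $-\beta/2$, matching the slide.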
20. 20. LOGIT MODEL
   • The estimator of $\alpha_i$ does not exist if $\sum_{t=1}^{T} y_{it} = 0$ or $\sum_{t=1}^{T} y_{it} = T$. Obviously!
   • The estimator of $\beta$ is inconsistent with fixed $T$ as $N \to \infty$ because of the incidental parameters problem (Neyman and Scott, 1948): as $N$ grows, so does the number of parameters to be estimated (one $\alpha_i$ for each $i$)
   • In fact, the estimator of $\beta$ is inconsistent: $\text{plim}\, \hat{\beta} = 2\beta$ (Hsiao, pp. 160–161)
21. 21. LOGIT MODEL
   • However, there is a Logit estimator that is consistent, called the conditional logit
   • It uses only the information from units that switch states, i.e., units that go from $0 \to 1$ or from $1 \to 0$
   • The estimator is therefore conditional on observing a change in state, which allows identifying first $\beta$ and later $\alpha_i$
   • Note that this estimator eliminates:
     – All units that do not change state (always 0 or always 1)
     – All variables that do not change over time
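Why conditioning on a change of state works can be shown in a few lines: for $T = 2$, the probability of the sequence $(0, 1)$ given exactly one switch equals $\Lambda(\beta(x_2 - x_1))$, and the $\alpha_i$ cancels from the ratio. A sketch (parameter values are arbitrary):

```python
import math

def lam(z):
    # Logistic CDF
    return math.exp(z) / (1.0 + math.exp(z))

beta, x1, x2 = 1.0, 0.2, 1.7

# Among switchers, P(y = (0,1) | one switch) = Lambda(beta*(x2 - x1)):
# the fixed effect alpha_i cancels in the ratio p01 / (p01 + p10)
for alpha_i in (-3.0, 0.0, 2.5):
    p01 = (1 - lam(alpha_i + beta * x1)) * lam(alpha_i + beta * x2)
    p10 = lam(alpha_i + beta * x1) * (1 - lam(alpha_i + beta * x2))
    print(round(p01 / (p01 + p10), 10))  # same value for every alpha_i
```

The printed conditional probability is identical for all three values of $\alpha_i$, which is exactly what makes $\beta$ estimable without estimating the incidental parameters.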
22. 22. LOGIT MODEL
   • The conditional logit model allows estimating $\beta$ using the iterative Newton–Raphson algorithm (or similar techniques, e.g. BHHH). The same algorithm produces the variance of the estimator, $\text{var}(\hat{\beta})$.
   • Scores (1st derivatives): $s(\beta) = \dfrac{\partial \ell(\beta)}{\partial \beta} = \sum_i 1(0 < y_{i+} < \ldots$
23. 23. PROBIT MODEL
   • Following the linear probability model, we assume that the individual effects are random and normally distributed
   • $y_{it} \neq 0 \iff \beta x_{it} + \alpha_i + \epsilon_{it} > 0$
   • Let $\nu_{it} = \alpha_i + \epsilon_{it}$; then $\nu_{it} \sim N(0, \sigma_\nu^2)$
   • And: $F(y, z) = \Phi(z)$ if $y \neq 0$; $F(y, z) = 1 - \Phi(z)$ if $y = 0$
24. 24. PROBIT MODEL
   • The likelihood of each panel (unit) is:
     $\Pr(y_{i1}, \ldots, y_{iT} \,|\, x_{i1}, \ldots, x_{iT}) = \displaystyle\int_{-\infty}^{\infty} \frac{e^{-\nu_i^2 / 2\sigma_\nu^2}}{\sqrt{2\pi\sigma_\nu^2}} \prod_{t=1}^{T} F(y_{it}, \nu_i + \beta x_{it})\, d\nu_i$
   • This integral can be represented by $\displaystyle\int_{-\infty}^{\infty} g(y_{it}, x_{it}, \nu_i)\, d\nu_i$
   • And approximated using Gauss–Hermite quadrature methods (with weight function $e^{-x^2}$):
25. 25. PROBIT MODEL
   • The notion of quadrature: $\displaystyle\int_{-\infty}^{\infty} e^{-x^2}\, h(x)\, dx \approx \sum_{m=1}^{M} w_m^*\, h(a_m^*)$
     – where the $w_m^*$ are weights, the $a_m^*$ are quadrature ordinates, and $M$ is the number of quadrature points
     – The idea is to properly approximate $h(x)$ by a polynomial of order $M$ (this is the key issue)
   • The above integral can then be written as: $\displaystyle\int_{-\infty}^{\infty} f(x)\, dx \approx \sum_{m=1}^{M} w_m^*\, e^{(a_m^*)^2} f(a_m^*)$
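A minimal sketch of the quadrature rule, using the standard 3-point Gauss–Hermite nodes and weights hard-coded (exact for polynomial integrands up to degree 5):

```python
import math

# 3-point Gauss-Hermite rule for the integral of e^{-x^2} * h(x) over the
# real line: nodes are 0 and +/- sqrt(3/2); weights sum to sqrt(pi)
nodes = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
weights = [math.sqrt(math.pi) / 6.0,
           2.0 * math.sqrt(math.pi) / 3.0,
           math.sqrt(math.pi) / 6.0]

def gauss_hermite(h):
    # Approximates the integral of e^{-x^2} * h(x) dx over (-inf, inf)
    return sum(w * h(a) for w, a in zip(weights, nodes))

# Sanity check: integral of e^{-x^2} * x^2 equals sqrt(pi)/2
print(gauss_hermite(lambda x: x * x), math.sqrt(math.pi) / 2.0)
```

In the probit application $h$ is the product of the $T$ per-period probabilities evaluated at the scaled node, so $M$ controls how well that product is captured by a polynomial, which is the "key issue" noted above.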
26. 26. PROBIT MODEL
   • The likelihood of each panel (unit) is approximated using:
     $L_i \approx \dfrac{1}{\sqrt{\pi}} \sum_{m=1}^{M} w_m^* \prod_{t=1}^{T} F\!\left(y_{it},\, \beta x_{it} + \sqrt{2}\,\sigma_\nu\, a_m^*\right)$
   • The likelihood function of the complete sample (all units) is then approximated using:
     $\log L \approx \sum_{i=1}^{N} \log\!\left[ \dfrac{1}{\sqrt{\pi}} \sum_{m=1}^{M} w_m^* \prod_{t=1}^{T} F\!\left(y_{it},\, \beta x_{it} + \sqrt{2}\,\sigma_\nu\, a_m^*\right) \right]$
27. 27. Likelihood Estimation
   • The likelihood function is "the probability that a sample of observations of size $n$ is a realization of a particular distribution $g(y_{it}, x_{it} | \theta)$"
   • Let's call it $\ell(\theta \,|\, y_{it}, x_{it})$
28. 28. Example of Likelihood Estimation
   • Consider "bicycle accidents on campus" in a given year. Suppose we record the following sample: {2, 0, 3, 4, 1, 3, 0, 2, 3, 4, 3, 5}
   • What do you think is the model that generated this sample?
   • What do you think is the distribution that generated this sample?
29. 29. Example of Likelihood Estimation
   • OK, let's try the Poisson (although I think it is Normal)
   • Poisson distribution of each observation: $P(y_i; \lambda) = \dfrac{e^{-\lambda} \lambda^{y_i}}{y_i!}$
   • When observations are independent, the joint probability, or likelihood function, is the product of the marginals: $L(\lambda) = \prod_{i=1}^{n} \dfrac{e^{-\lambda} \lambda^{y_i}}{y_i!}$
30. 30. Example of Likelihood Estimation
   • We want to pick $\lambda$ so as to maximize this probability (likelihood). There are two ways:
     – Trying different values (most often used)
     – Using calculus
   • Our likelihood function is really ugly (non-linear)
   • But we can use logs to make it much nicer
31. 31. Example of Likelihood Estimation
   • Taking logs: $\log L(\lambda) = -n\lambda + \left(\sum_i y_i\right) \log \lambda - \sum_i \log(y_i!) = -12\lambda + 30 \log \lambda - \text{const.}$
   • To get the optimal $\lambda$: differentiate, set the first derivative to zero, and check that the second derivative is negative:
     – 1st derivative: $-12 + 30 \cdot \dfrac{1}{\lambda}$
     – 2nd derivative: $-30 \cdot \dfrac{1}{\lambda^2}$, which is negative
   • Therefore $\hat{\lambda} = 2.5$
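The whole calculation can be reproduced in a few lines using the sample from the slides; `loglik` is an illustrative helper, not part of the original material:

```python
import math

data = [2, 0, 3, 4, 1, 3, 0, 2, 3, 4, 3, 5]  # bicycle-accident sample
n, total = len(data), sum(data)               # n = 12, sum of y = 30

# Setting the score -n + (sum y)/lambda to zero gives lambda_hat = mean
lam_hat = total / n
print(lam_hat)  # 2.5

def loglik(lam):
    # Poisson log-likelihood: sum of (-lambda + y*log(lambda) - log(y!))
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in data)

# lambda_hat beats nearby candidate values, confirming the maximum
print(loglik(lam_hat) > loglik(2.4), loglik(lam_hat) > loglik(2.6))
```

This mirrors the slide exactly: the score $-12 + 30/\lambda$ vanishes at $\hat{\lambda} = 30/12 = 2.5$, and the log-likelihood is indeed higher there than at neighboring values.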
32. 32. Example of Likelihood Estimation
   • Of all Poisson distributions, the one that best describes the data is the one with parameter $\lambda = 2.5$
   • What about my Normal?
     – Well, I can fit the Normal to the data and find $\mu, \sigma^2$
   • Is your model better than mine?
     – No way!!!
33. 33. Likelihood Estimation
   • The likelihood function is "the probability that a sample of observations of size $n$ is a realization of a particular distribution $g(y_{it}, x_{it} | \theta)$": $\ell(\theta \,|\, y_{it}, x_{it})$
   • The joint distribution is the product of the conditional density times the marginal density: $g(y_{it}, x_{it} | \theta) = g(y_{it} | x_{it}, \theta)\, g(x_{it} | \theta)$
34. 34. Likelihood Estimation
   • A statistic is sufficient with respect to a model and its unknown parameters if "no other statistic that can be calculated from the same sample provides any additional information as to the true value of the parameters".
   • Usually, a sufficient statistic is a simple function of the data, e.g., the sum of the observations.
   • In our case, a sufficient statistic is the sum of the observations, $\sum_t y_{it}$ (for the units that change state, because for the others $\alpha_i$ is undefined)
35. 35. Likelihood Estimation
   • Therefore, the conditional log-likelihood function is:
     $\ell(\beta) = \sum_i 1(0 < y_{i+} < T) \left[ \sum_t y_{it}\, x_{it}' \beta - \log \sum_{z \in B_i} \exp\!\left( \sum_t z_t\, x_{it}' \beta \right) \right]$
     where the $z$ represent all possible sequences with a change in state that match the observed total, i.e., $B_i = \{ z : \sum_t z_t = y_{i+} \}$.
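The conditional log-likelihood above can be computed by brute force for small $T$, enumerating all sequences $z$ with $\sum_t z_t = y_{i+}$. A sketch for a single unit with a scalar regressor (`cond_loglik_unit` is an illustrative helper):

```python
import math
from itertools import product

def cond_loglik_unit(y, x, beta):
    """One unit's conditional-logit contribution: the denominator sums over
    all 0/1 sequences z with the same total sum(z) = y_plus; the fixed
    effect alpha_i has already dropped out of this expression."""
    T, y_plus = len(y), sum(y)
    if y_plus == 0 or y_plus == T:  # non-switchers contribute nothing
        return 0.0
    num = sum(yt * xt * beta for yt, xt in zip(y, x))
    den = sum(math.exp(sum(zt * xt * beta for zt, xt in zip(z, x)))
              for z in product((0, 1), repeat=T) if sum(z) == y_plus)
    return num - math.log(den)

# For T = 2 with x = (0, 1) and y = (0, 1), this reduces to log Lambda(beta)
print(cond_loglik_unit((0, 1), (0, 1), 1.5))
```

For $T = 2$ the set $B_i$ contains only $(0,1)$ and $(1,0)$, so the contribution collapses to the switcher probability used by the conditional logit, consistent with the earlier slides.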