
ERF Training Workshop Panel Data 5


Raimundo Soto - Catholic University of Chile

ERF Training on Advanced Panel Data Techniques Applied to Economic Modelling

29 -31 October, 2018
Cairo, Egypt


  1. ERF Training Workshop Panel Data 5. Raimundo Soto, Instituto de Economía, PUC-Chile
  2. DISCRETE VARIABLE MODELS
     • A discrete variable is usually represented by $y_{it} = 0, 1$
     • For example: $0 =$ did not take the treatment, $1 =$ did take the treatment
     • Assume: $\Pr(y_{it} = 1) = p$
     • Hence: $\Pr(y_{it} = 0) = 1 - p$
  3. DISCRETE VARIABLE MODELS
     • The expected value of $y_{it}$ is:
       $E[y] = \Pr(y_{it} = 1) \cdot 1 + \Pr(y_{it} = 0) \cdot 0 = \Pr(y_{it} = 1) = p$
     • Let us now assume that $p = F(x_{it}, \beta)$
     • Then: $\Pr(y_{it} = 1) = F(x_{it}, \beta) = E[y \mid x]$
  4. LINEAR PROBABILITY MODEL
     • Let us estimate a linear model of $F(x_{it}, \beta)$:
       $y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$
     • Naturally, it satisfies $E[y \mid x] = \alpha_i + \beta x_{it}$
     • Problem: the forecast $\hat{y}_{it} = \hat{\alpha}_i + \hat{\beta} x_{it}$ can lie outside $[0, 1]$
  5. LINEAR PROBABILITY MODEL
     • Furthermore, the model is heteroskedastic, and awkwardly so, because the heteroskedasticity depends on $\beta$.
     • Since $\alpha_i + \beta x_{it} + \varepsilon_{it}$ must be either 1 or 0, then
       $\varepsilon_{it} = -\alpha_i - \beta x_{it}$ with probability $1 - F(x_{it}, \alpha, \beta)$
       $\varepsilon_{it} = 1 - \alpha_i - \beta x_{it}$ with probability $F(x_{it}, \alpha, \beta)$
     • Thus, the variance of $\varepsilon_{it}$ is:
       $\operatorname{Var}(\varepsilon_{it}) = (\alpha_i + \beta x_{it})(1 - \alpha_i - \beta x_{it})$
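Both problems are easy to see on simulated data; a minimal sketch (all names and numbers here are illustrative, not from the slides):

```python
# Sketch of the two LPM problems: fitted "probabilities" outside [0, 1]
# and an error variance that depends on the coefficients themselves.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(0, 2, n)                      # hypothetical regressor
p_true = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))  # true P(y=1|x), logistic
y = rng.binomial(1, p_true)

# OLS of y on a constant and x: the linear probability model
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b

print("share of fitted values outside [0,1]:",
      np.mean((fitted < 0) | (fitted > 1)))

# Var(eps|x) = (a + b*x)(1 - a - b*x): largest near p = 0.5 and shrinking
# toward zero at the extremes; it can even go negative where the fitted
# value has escaped [0, 1], which is part of the problem.
var_eps = fitted * (1 - fitted)
```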
  6. LINEAR PROBABILITY MODEL
     • Let's keep the notion of studying the probability that $y_{it} = 1$, but change the specification so that the probability is properly defined (between 0 and 1)
     • The latter obtains if $F(x_{it}, \beta)$ is a cumulative distribution function (CDF)
     • There are two main families of models:
       – Probabilistic models using the logistic CDF, called Logit
       – Probabilistic models using the normal CDF, called Probit
     • Both types of models are highly non-linear:
       – They cannot be estimated using Least Squares
       – They must be estimated by maximum likelihood, when such an estimator exists
  7. NON-LINEAR PROBABILITY MODELS
     • Logit model: $\Lambda(w) = \dfrac{e^w}{1 + e^w}$
       – Note: the variance is $\pi^2/3$
     • Probit model: $\Phi(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$
       – Note: $\sigma^2 = 1$; it cannot be identified in a two-state model
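A brief numerical aside on how close the two CDFs are once scales are matched (the $\sqrt{3}/\pi$ rescaling follows from the logistic variance $\pi^2/3$ noted above; the scipy names are standard):

```python
# Compare the logit CDF Λ(w) = e^w/(1+e^w) with the probit CDF Φ(w).
import numpy as np
from scipy.stats import norm, logistic

w = np.linspace(-4, 4, 801)
Lambda = 1 / (1 + np.exp(-w))   # identical to e^w / (1 + e^w)
Phi = norm.cdf(w)

# Rescale the logistic to unit variance (scale = sqrt(3)/pi); after that
# the two CDFs differ by only about 0.01 at most.
print(np.max(np.abs(Phi - logistic.cdf(w, scale=np.sqrt(3) / np.pi))))
```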
  8. Density Function [figure: density functions of the normal and logistic distributions]
  9. Cumulative Distribution Function [figure: CDFs of the normal and logistic distributions]
  10. Cumulative Distribution Function [figure: a linear model tracks the CDF well only between roughly 0.3 and 0.7]
  11. Cumulative Distribution Function [figure: the logistic assigns more probability to $y = 0$]
  12. PARAMETERS AND MARGINAL EFFECTS
     • In the linear model, $y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$, the parameters are simply the marginal effects:
       $\dfrac{\partial E[y \mid x]}{\partial x} = \dfrac{\partial(\alpha_i + \beta x_{it})}{\partial x_{it}} = \beta$
     • In a discrete variable model:
       $\dfrac{\partial E[y \mid x]}{\partial x} = \dfrac{\partial F(\alpha_i + \beta x_{it})}{\partial(\beta x_{it})} \cdot \dfrac{\partial(\beta x_{it})}{\partial x_{it}} = \beta\, f(\alpha_i + \beta x_{it}) \neq \beta$
       where $f(\cdot) = dF(\cdot)$ is the density.
  13. PARAMETERS AND MARGINAL EFFECTS
     • Note, however, that in the logistic:
       $\dfrac{\partial \Lambda(\alpha_i + \beta x_{it})}{\partial(\alpha_i + \beta x_{it})} = \dfrac{e^{\alpha_i + \beta x_{it}}}{\left(1 + e^{\alpha_i + \beta x_{it}}\right)^2} = \Lambda(\alpha_i + \beta x_{it})\left[1 - \Lambda(\alpha_i + \beta x_{it})\right]$
     • So that:
       $\dfrac{\partial E[y \mid x]}{\partial x} = \dfrac{\partial \Lambda(\alpha_i + \beta x_{it})}{\partial(\beta x_{it})} \cdot \dfrac{\partial(\beta x_{it})}{\partial x_{it}} = \beta\, \Lambda(\alpha_i + \beta x_{it})\left[1 - \Lambda(\alpha_i + \beta x_{it})\right]$
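A quick numerical check of this formula (the values of $\alpha_i$ and $\beta$ are illustrative): the analytic marginal effect matches a finite-difference derivative and, unlike in the linear model, varies with $x$:

```python
# Logit marginal effect: beta * Λ(z) * (1 - Λ(z)), not beta itself.
import numpy as np

def Lam(z):
    return 1 / (1 + np.exp(-z))

beta, alpha_i = 0.8, -0.2            # hypothetical coefficients
x = np.array([-2.0, 0.0, 2.0])
z = alpha_i + beta * x

marg = beta * Lam(z) * (1 - Lam(z))  # analytic marginal effect

# finite-difference check of dE[y|x]/dx = dΛ(α_i + βx)/dx
h = 1e-6
fd = (Lam(alpha_i + beta * (x + h)) - Lam(alpha_i + beta * (x - h))) / (2 * h)
print(np.allclose(marg, fd))         # True; effect is largest where z = 0
```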
  14. PARAMETERS AND MARGINAL EFFECTS
     • In standard (non-panel) models there is an approximate correspondence among the estimates of the linear model (L), probit ($\phi$), and logit ($\Lambda$):
       $\beta_\Lambda \cong \beta_\phi \times 1.6$
       $\beta_L \cong \beta_\phi \times 0.4$
       except for the constant: $\beta_L \cong \beta_\phi \times 0.4 + 0.5$
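These conversion factors can be checked by simulation; a rough sketch assuming statsmodels is available (the DGP, sample size, and coefficients are illustrative, and the ratios are only approximate):

```python
# Rule-of-thumb check: fit LPM, probit, and logit to the same probit data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = (0.3 + 0.7 * x + rng.normal(size=n) > 0).astype(int)  # probit DGP

X = sm.add_constant(x)
b_probit = sm.Probit(y, X).fit(disp=0).params
b_logit = sm.Logit(y, X).fit(disp=0).params
b_lpm = sm.OLS(y, X).fit().params

print(b_logit[1] / b_probit[1])            # roughly 1.6
print(b_lpm[1] / b_probit[1])              # roughly 0.4
print(b_lpm[0], 0.4 * b_probit[0] + 0.5)   # constant: ≈ 0.4*β_φ + 0.5
```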
  15. MULTI-STATE MODELS
     • So far, we have specified the discrete variable as $y_{it} = 0, 1$
     • We could have three (or more) states
     • For example: $0 =$ travel to Cairo by bus, $1 =$ travel to Cairo by train, $2 =$ travel to Cairo by plane
     • The problems are similar (slightly more complicated)
  16. GENERAL PLAN
     • It seems we have 4 cases: {Logit, Probit} × {Fixed Effects, Random Effects}
  17. GENERAL PLAN
     • It seems we have 4 cases:

                    Fixed Effects    Random Effects
        Logit       Almost OK        Bias
        Probit      Bias             OK
  18. LOGIT MODEL
     • Assume $T = 2$ and consider the log-likelihood function of a sample of size $N$:
       $\log L = -\sum_{i=1}^{N} \sum_{t=1}^{2} \log\left[1 + \exp(\alpha_i + \beta x_{it})\right] + \sum_{i=1}^{N} \sum_{t=1}^{2} y_{it}\,(\alpha_i + \beta x_{it})$
     • Suppose that $x_{i1} = 0$ and $x_{i2} = 1$. Then the first derivatives are:
  19. LOGIT MODEL
     • $\dfrac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} \sum_{t=1}^{2} \left[ y_{it} - \dfrac{e^{\alpha_i + \beta x_{it}}}{1 + e^{\alpha_i + \beta x_{it}}} \right] x_{it}$
     • $\dfrac{\partial \log L}{\partial \alpha_i} = \sum_{t=1}^{2} \left[ y_{it} - \dfrac{e^{\alpha_i + \beta x_{it}}}{1 + e^{\alpha_i + \beta x_{it}}} \right]$
     • Solving:
       $\hat{\alpha}_i = \infty$ if $y_{i1} + y_{i2} = 2$
       $\hat{\alpha}_i = -\infty$ if $y_{i1} + y_{i2} = 0$
       $\hat{\alpha}_i = -\beta/2$ if $y_{i1} + y_{i2} = 1$
  20. LOGIT MODEL
     • The estimator of $\alpha_i$ does not exist if $\sum_{t=1}^{T} y_{it} = 0$ or $\sum_{t=1}^{T} y_{it} = T$. Obviously!
     • The estimator of $\beta$ is inconsistent with fixed $T$ as $N \to \infty$ because of the incidental parameter problem (Neyman and Scott, 1948): as $N$ grows, so does the number of parameters to be estimated (one $\alpha_i$ for each $i$)
     • In fact, the estimator of $\beta$ is inconsistent: $\operatorname{plim} \hat{\beta} = 2\beta$ (Hsiao, pp. 160-161)
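To see the $2\beta$ result in numbers: in this two-period design the first-order conditions above give the fixed-effects MLE in closed form, $\hat{\beta} = 2\log(n_{01}/n_{10})$, where $n_{01} = \#\{y_i = (0,1)\}$ and $n_{10} = \#\{y_i = (1,0)\}$, while the conditional estimator introduced on the next slide is $\log(n_{01}/n_{10})$. A small Monte Carlo sketch (simulated data, illustrative values) reproduces the bias:

```python
# Monte Carlo for the T = 2 design above (x_i1 = 0, x_i2 = 1):
# the FE MLE converges to 2*beta, the conditional estimator to beta.
import numpy as np

rng = np.random.default_rng(2)
N, beta = 200_000, 1.0
alpha = rng.normal(size=N)                 # fixed effects

p1 = 1 / (1 + np.exp(-alpha))              # P(y_i1 = 1), since x_i1 = 0
p2 = 1 / (1 + np.exp(-(alpha + beta)))     # P(y_i2 = 1), since x_i2 = 1
y1 = rng.random(N) < p1
y2 = rng.random(N) < p2

n01 = np.sum(~y1 & y2)                     # switchers 0 -> 1
n10 = np.sum(y1 & ~y2)                     # switchers 1 -> 0
print("FE MLE:      ", 2 * np.log(n01 / n10))  # close to 2β = 2.0
print("conditional: ", np.log(n01 / n10))      # close to β  = 1.0
```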
  21. LOGIT MODEL
     • However, there is a consistent Logit estimator, called the conditional logit
     • It uses only the information from units that switch states, i.e., units that move from $0 \to 1$ or from $1 \to 0$
     • The estimator is therefore conditional on observing a change in state, which allows identifying first $\beta$ and later $\alpha_i$
     • Note that this estimator eliminates:
       – All units that do not change state (always 0 or 1)
       – All variables that do not change over time
  22. LOGIT MODEL
     • The conditional Logit model allows estimating $\beta$ using the iterative Newton-Raphson algorithm (or similar techniques, e.g. BHHH). The same algorithm produces the variance of the estimator, $\operatorname{var}(\hat{\beta})$.
     • Scores (first derivatives), obtained by differentiating the conditional log-likelihood $\mathcal{L}_c(\beta)$ given on slide 35:
       $s(\beta) = \dfrac{\partial \mathcal{L}_c(\beta)}{\partial \beta} = \sum_i 1(0 < y_{i+} < T) \left[ \sum_t y_{it} x_{it} - \dfrac{\sum_{z \in B_i} \left(\sum_t z_t x_{it}\right) e^{\sum_t z_t x_{it}'\beta}}{\sum_{z \in B_i} e^{\sum_t z_t x_{it}'\beta}} \right]$
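A sketch of how that iteration might look on simulated data. For $T = 2$ the conditional likelihood of the switchers conveniently reduces to a logit of $1\{y_i = (0,1)\}$ on $\Delta x_i = x_{i2} - x_{i1}$ with no constant, so a few Newton-Raphson steps on the score solve it (all data and values illustrative):

```python
# Conditional logit for T = 2 via hand-rolled Newton-Raphson.
import numpy as np

rng = np.random.default_rng(3)
N, beta_true = 5_000, 1.0
alpha = rng.normal(size=N)
x1, x2 = rng.normal(size=N), rng.normal(size=N)
y1 = rng.random(N) < 1 / (1 + np.exp(-(alpha + beta_true * x1)))
y2 = rng.random(N) < 1 / (1 + np.exp(-(alpha + beta_true * x2)))

switch = y1 != y2                 # only switchers carry information
dx = (x2 - x1)[switch]
d = y2[switch].astype(float)      # 1 if the unit moved 0 -> 1

b = 0.0
for _ in range(25):               # Newton-Raphson on the conditional score
    p = 1 / (1 + np.exp(-b * dx))
    score = np.sum((d - p) * dx)
    hess = -np.sum(p * (1 - p) * dx**2)
    b -= score / hess
print(b, 1 / np.sqrt(-hess))      # estimate (near 1.0) and asymptotic s.e.
```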
  23. PROBIT MODEL
     • Following the linear probability model, we assume that the individual effects are random and normally distributed
     • $y_{it} \neq 0 \iff \beta x_{it} + \alpha_i + \varepsilon_{it} > 0$
     • Let $\nu_{it} = \alpha_i + \varepsilon_{it}$
     • Then $\nu_{it} \sim N(0, \sigma_\nu^2)$
     • And: $F(y, z) = \Phi(z)$ if $y \neq 0$; $F(y, z) = 1 - \Phi(z)$ if $y = 0$
  24. PROBIT MODEL
     • The likelihood of each panel (unit) is:
       $\Pr(y_{i1}, y_{i2}, \ldots, y_{in} \mid x_{i1}, x_{i2}, \ldots, x_{in}) = \int_{-\infty}^{\infty} \dfrac{e^{-\nu_i^2 / 2\sigma_\nu^2}}{\sqrt{2\pi}\,\sigma_\nu} \prod_{t=1}^{n} F(y_{it}, \beta x_{it} + \nu_i)\, d\nu_i$
     • This integral can be represented as $\int_{-\infty}^{\infty} g(y_{it}, x_{it}, \nu_i)\, d\nu_i$
     • and approximated using Gauss-Hermite quadrature methods (weight function $e^{-x^2}$):
  25. PROBIT MODEL
     • The idea of quadrature:
       $\int_{-\infty}^{\infty} e^{-x^2} h(x)\, dx \approx \sum_{m=1}^{M} \omega_m^*\, h(a_m^*)$
       where the $\omega_m^*$ are weights, the $a_m^*$ are quadrature abscissas, and $M$ is the number of quadrature points
       – The idea is to approximate $h(x)$ properly with a polynomial of order $M$ (this is the key issue)
     • The integral above can then be written as:
       $\int_{-\infty}^{\infty} f(x)\, dx \approx \sum_{m=1}^{M} \omega_m^*\, e^{(a_m^*)^2} f(a_m^*)$
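The mechanics in a few lines: numpy supplies the nodes $a_m^*$ and weights $\omega_m^*$ for the $e^{-x^2}$ weight function, and the test integral below has a known closed form (the choice of $h$ is just an illustration):

```python
# Gauss-Hermite quadrature demo: ∫ e^{-x²} cos(x) dx = sqrt(pi)*e^{-1/4}.
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(12)   # M = 12 points
approx = np.sum(weights * np.cos(nodes))               # Σ ω_m* h(a_m*)
exact = np.sqrt(np.pi) * np.exp(-0.25)
print(approx, exact)                                   # agree to ~1e-15
```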
  26. PROBIT MODEL
     • The likelihood function of each panel (unit) is approximated by:
       $l_i \approx \sqrt{2}\,\sigma_i \sum_{m=1}^{M} \omega_m^*\, e^{(a_m^*)^2}\, g\!\left(y_{it}, x_{it}, \sqrt{2}\,\sigma_i a_m^* + \varepsilon_i\right)$
     • The likelihood function of the complete sample (all units) is approximated by:
       $L \approx \sum_{i=1}^{n} w_i \log\left\{ \sqrt{2}\,\sigma_i \sum_{m=1}^{M} \omega_m^*\, e^{(a_m^*)^2}\, \dfrac{e^{-\left(\sqrt{2}\,\sigma_i a_m^* + \varepsilon_i\right)^2 / 2\sigma_\nu^2}}{\sqrt{2\pi}\,\sigma_\nu} \prod_{t=1}^{n} F\!\left(y_{it}, \beta x_{it} + \sqrt{2}\,\sigma_i a_m^* + \varepsilon_i\right) \right\}$
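A hedged sketch of this computation for one unit, in its simpler non-adaptive form (no unit-specific $\sigma_i$, $\varepsilon_i$ offsets): after the change of variable $\nu = \sqrt{2}\,\sigma_\nu a$, the normal mixing density turns into the $e^{-a^2}$ weight that the quadrature expects. The function name and the toy data are illustrative:

```python
# One unit's RE-probit likelihood by (non-adaptive) Gauss-Hermite quadrature.
import numpy as np
from scipy.stats import norm

def panel_loglik(y, x, beta, sigma_nu, M=16):
    """log Pr(y_i1..y_iT | x_i1..x_iT) for one unit, by quadrature."""
    a, w = np.polynomial.hermite.hermgauss(M)
    nu = np.sqrt(2.0) * sigma_nu * a                 # transformed abscissas
    # F(y_it, βx_it + ν): Φ(z) if y = 1, 1 - Φ(z) if y = 0
    z = beta * x[:, None] + nu[None, :]              # T x M index values
    F = np.where(y[:, None] == 1, norm.cdf(z), 1 - norm.cdf(z))
    # mix Π_t F over ν; the change of variable leaves a 1/sqrt(pi) factor
    return np.log(np.sum(w * np.prod(F, axis=0)) / np.sqrt(np.pi))

y = np.array([0, 1, 1, 0, 1])                        # toy panel, T = 5
x = np.array([-0.5, 0.2, 1.0, -1.3, 0.7])
print(panel_loglik(y, x, beta=0.8, sigma_nu=1.0))
```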
  27. Likelihood Estimation
     • The likelihood function is "the probability that a sample of observations of size $n$ is a realization of a particular distribution $f(y_{it}, x_{it} \mid \theta)$"
     • Let's call it $\mathcal{L}_n(\theta \mid y_{it}, x_{it})$
  28. Example of Likelihood Estimation
     • Consider "bicycle accidents on campus" in a given year. Suppose we record the following sample: {2, 0, 3, 4, 1, 3, 0, 2, 3, 4, 3, 5}
     • What do you think is the model that generated this sample?
     • What do you think is the distribution that generated this sample?
  29. Example of Likelihood Estimation
     • OK, let's try the Poisson (although I think it is Normal)
     • The Poisson distribution of each observation is: $f(y_i \mid \theta) = \dfrac{e^{-\theta}\,\theta^{y_i}}{y_i!}$
     • When observations are independent, the joint probability, or likelihood function, is the product of the marginals: $L(\theta) = \prod_{i=1}^{n} \dfrac{e^{-\theta}\,\theta^{y_i}}{y_i!}$
  30. Example of Likelihood Estimation
     • We want to pick $\theta$ so as to make this probability (likelihood) a maximum. There are two ways:
       – Trying different values (most often used)
       – Using calculus
     • Our likelihood function is really ugly (non-linear)
     • But we can take logs to make it much nicer
  31. Example of Likelihood Estimation
     • Taking logs: $\log L(\theta) = -n\theta + \left(\sum_i y_i\right)\log\theta - \sum_i \log(y_i!) = -12\theta + 30\log\theta + \text{constant}$
     • To get the optimal $\theta$: differentiate, set the first derivative to zero, check that the second derivative is negative, and solve the first-order condition for $\theta$:
     • First derivative: $-12 + 30 \cdot \dfrac{1}{\theta}$
     • Second derivative: $-30 \cdot \dfrac{1}{\theta^2}$, which is negative
     • Therefore $\hat{\theta} = 30/12 = 2.5$
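A numerical cross-check of the calculus (the optimizer should land on the analytic answer $30/12 = 2.5$, which is also the sample mean):

```python
# Maximize the Poisson log-likelihood over θ for the bicycle-accident data.
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([2, 0, 3, 4, 1, 3, 0, 2, 3, 4, 3, 5])

def neg_loglik(theta):
    # -log L(θ) = nθ - (Σy) log θ + Σ log(y!)  (last term is a constant)
    return len(y) * theta - y.sum() * np.log(theta)

res = minimize_scalar(neg_loglik, bounds=(0.01, 10), method="bounded")
print(res.x, y.mean())   # both 2.5: the Poisson MLE is the sample mean
```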
  32. Example of Likelihood Estimation
     • Of all Poisson distributions, the one that best describes the data is the one with parameter 2.5
     • What about my Normal?
       – Well, I can fit the Normal to the data and find $\mu, \sigma^2$
     • Is your model better than mine?
       – No way!!!
  33. Likelihood Estimation
     • The likelihood function is "the probability that a sample of observations of size $n$ is a realization of a particular distribution $f(y_{it}, x_{it} \mid \theta)$": $\mathcal{L}_n(\theta \mid y_{it}, x_{it})$
     • The joint distribution is the product of the conditional density and the marginal density:
       $f(y_{it}, x_{it} \mid \theta) = f(y_{it} \mid x_{it}, \theta)\, f(x_{it} \mid \theta)$
  34. Likelihood Estimation
     • A statistic is sufficient with regard to a model and its unknown parameters if "no other statistic that can be calculated from the same sample provides additional information about the true value of the parameters".
     • Usually, a sufficient statistic is a simple function of the data, e.g., the sum of the observations.
     • In our case, a sufficient statistic for $f(x_{it} \mid \theta)$ is the sum of the observations (computed over the units that change state, because for the others $\alpha_i$ is undefined)
  35. Likelihood Estimation
     • Therefore, the conditional log-likelihood function is:
       $\mathcal{L}_c(\beta) = \sum_i 1(0 < y_{i+} < T) \left[ \sum_t y_{it} x_{it}'\beta - \log \sum_{z \in B_i} e^{\sum_t z_t x_{it}'\beta} \right]$
       where $B_i$ is the set of all sequences $z = (z_1, \ldots, z_T) \in \{0,1\}^T$ with $\sum_t z_t = y_{i+}$, i.e., all possible arrangements of the observed changes in state.
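A brute-force sketch of one unit's contribution to $\mathcal{L}_c(\beta)$, enumerating the set $B_i$ explicitly (only feasible for small $T$; the data and $\beta$ are illustrative):

```python
# One unit's conditional log-likelihood by enumerating B_i.
import numpy as np
from itertools import combinations

def unit_cond_loglik(y, x, beta):
    T, ones = len(y), int(y.sum())
    if ones in (0, T):                    # no change in state: unit drops out
        return 0.0
    num = y @ x * beta                    # Σ_t y_it x_it'β
    # denominator: every arrangement z with the same number of ones
    denom = 0.0
    for idx in combinations(range(T), ones):
        z = np.zeros(T)
        z[list(idx)] = 1
        denom += np.exp(z @ x * beta)
    return num - np.log(denom)

y = np.array([0, 1, 0, 1.0])              # toy unit with y_i+ = 2, T = 4
x = np.array([-0.3, 0.9, 0.1, 1.2])
print(unit_cond_loglik(y, x, beta=0.8))
```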
