
- 1. National Accounts and SAM Estimation Using Cross-Entropy Methods Sherman Robinson
- 2. Estimation Problem • Partial equilibrium models such as IMPACT require balanced and consistent datasets that represent disaggregated production and demand by commodity • Estimating such a dataset requires an efficient method to incorporate and reconcile information from a variety of sources
- 3. Primary Data Sources for IMPACT Base Year • FAOSTAT for country totals for: – Production: Area, Yields and Supply – Demand: Total, Food, Intermediate, Feed, Other Demands – Trade: Exports, Imports, Net Trade – Nutrition: Calories per capita, calories per kg of commodity • AQUASTAT for country irrigated and rainfed production • SPAM pixel-level estimates of the global allocation of production
- 4. Estimating a Consistent and Disaggregated Database • Estimate IMPACT Country Database (from FAOSTAT) → Estimate Technology-Disaggregated Production (from IMPACT Country Database and FAO AQUASTAT) → Estimate Geographically Disaggregated Production (from Technology-Disaggregated Production and SPAM)
- 5. Bayesian Work Plan (iterative loop): Priors on values and estimation errors of production, demand, and trade → Estimation by Cross-Entropy Method → Check results against priors and identify potential data problems → New information to correct identified problems → back to revised priors
- 6. Information Theory Approach • Goal is to recover parameters and data we observe imperfectly: estimation rather than prediction. • Assume very little information about the error-generating process and nothing about the functional form of the error distribution. • Very different from standard statistical approaches (e.g., econometrics), which usually have lots of data
- 7. Estimation Principles • Use all the information you have. • Do not use or assume any information you do not have. • Arnold Zellner: “Efficient Information Processing Rule (IPR).” • Close links to Bayesian estimation
- 8. Information Theory • Need to be flexible in incorporating information in parameter/data estimation – Lots of different forms of information • In classic statistics, the “information” in a data set can be summarized by the moments of the distribution of the data – Summarizes what is needed for estimation • We need a broader view of “estimation” and need to define “information”
- 9. An analogy from physics • Force is whatever induces a change from an initial state of motion to a final state of motion: F = dp/dt
- 10. Inference is dynamics as well: old beliefs + information → new beliefs. “Information” is what induces a change in rational beliefs.
- 11. Information Theory • Suppose an event E will occur with probability p. What is the information content of a message stating that E occurs? • If p is high, the event's occurrence carries little information; if p is low, its occurrence is a surprise and contains a lot of information – The content of the message is not the issue: the amount, not the meaning, of information
- 12. Information Theory • Shannon (1948), who worked at AT&T, developed a formal measure of the “information content” of the arrival of a message: h(p) = log(1/p), so h(p) → 0 as p → 1 and h(p) → ∞ as p → 0
- 13. Information Theory • For a set of events, the expected information content of a message before it arrives is the entropy measure: H(p) = Σ_{k=1}^n p_k h(p_k) = −Σ_{k=1}^n p_k log(p_k), where Σ_{k=1}^n p_k = 1
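The entropy measure can be illustrated in a few lines of Python (an illustrative sketch; the `entropy` helper is a hypothetical name, not part of any IMPACT code):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_k p_k log(p_k), in nats."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

uniform = [0.25] * 4               # maximum uncertainty over four events
peaked = [0.97, 0.01, 0.01, 0.01]  # one nearly certain event

print(entropy(uniform))  # log(4), about 1.386: the maximum for four events
print(entropy(peaked))   # much smaller: little expected information
```

The uniform distribution attains the maximum log(n), anticipating the MaxEnt principle discussed on the next slides.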
- 14. Claude Shannon
- 15. E.T. Jaynes • Jaynes proposed using the Shannon entropy measure in estimation • Maximum entropy (MaxEnt) principle: – Out of all probability distributions that are consistent with the constraints, choose the one that has maximum uncertainty (maximizes the Shannon entropy metric) • The idea is to estimate probabilities (or frequencies) – In the absence of any constraints, entropy is maximized by the uniform distribution
- 16. E.T. Jaynes
- 17. Estimation With a Prior • The estimation problem is to estimate a set of probabilities that are “close” to a known prior and that satisfy various known moment constraints. • Jaynes suggested using the criterion of minimizing the Kullback-Leibler “cross entropy” (CE) “divergence” between the estimated probabilities and the prior.
- 18. Cross Entropy Estimation • Minimize: Σ_k p_k log(p_k / p̄_k) = Σ_k p_k log(p_k) − Σ_k p_k log(p̄_k), where p̄_k is the prior probability. • “Divergence”, not “distance”: the measure is not symmetric and does not satisfy the triangle inequality. It is not a “norm”.
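A small numeric check makes the asymmetry concrete (illustrative sketch; `cross_entropy` is a hypothetical helper name):

```python
import math

def cross_entropy(p, q):
    """Kullback-Leibler divergence sum_k p_k log(p_k / q_k)."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(cross_entropy(p, q))  # positive whenever p differs from q
print(cross_entropy(q, p))  # a different number: the measure is not symmetric
print(cross_entropy(p, p))  # zero only when the distributions coincide
```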
- 19. MaxEnt vs Cross-Entropy • If the prior is specified as a uniform distribution, the CE estimate is equivalent to the MaxEnt estimate • Laplace’s Principle of Insufficient Reason: in the absence of any information, you should choose the uniform distribution, which has maximum uncertainty – A uniform prior is an admission of “ignorance”, not knowledge
- 20. Cross Entropy Measure • Two kinds of information – Prior distribution of the probabilities – Moments of the distribution • Any moments can be specified – Inequalities can also be specified – Moments measured with error will be considered – Summary statistics such as quantiles
- 21. Cross-Entropy Measure • Minimize Σ_{k=1}^K p_k ln(p_k / p̄_k) subject to constraints (information) about moments, Σ_{k=1}^K x_{t,k} p_k = y_t, and the adding-up constraint (finite distribution), Σ_{k=1}^K p_k = 1
- 22. Lagrangian • L = Σ_{k=1}^K p_k ln(p_k / p̄_k) + Σ_{t=1}^T λ_t (y_t − Σ_{k=1}^K x_{t,k} p_k) + μ (1 − Σ_{k=1}^K p_k)
- 23. First-Order Conditions • ∂L/∂p_k: ln(p_k) − ln(p̄_k) + 1 − Σ_{t=1}^T λ_t x_{t,k} − μ = 0 • ∂L/∂λ_t: y_t − Σ_{k=1}^K x_{t,k} p_k = 0 • ∂L/∂μ: 1 − Σ_{k=1}^K p_k = 0
- 24. Solution • p_k = p̄_k exp(Σ_{t=1}^T λ_t x_{t,k}) / Ω(λ_1, λ_2, …, λ_T), where Ω(λ_1, λ_2, …, λ_T) = Σ_{k=1}^K p̄_k exp(Σ_{t=1}^T λ_t x_{t,k})
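With a single moment constraint, this solution can be sketched in Python by bisecting on the one Lagrange multiplier; the example reuses Jaynes's loaded-die illustration (uniform prior over six faces, observed mean 4.5). The function name, bracket, and iteration count are illustrative assumptions, not part of the IMPACT implementation:

```python
import math

def ce_solve(prior, x, y):
    """Cross-entropy solution with one moment constraint sum_k p_k x_k = y:
    p_k = prior_k * exp(lam * x_k) / Omega, found by bisection on lam."""
    def tilt(lam):
        w = [pb * math.exp(lam * xk) for pb, xk in zip(prior, x)]
        omega = sum(w)                      # the partition function
        return [wk / omega for wk in w]
    lo, hi = -50.0, 50.0                    # bracket for the multiplier
    for _ in range(200):                    # the tilted mean is monotone in lam
        mid = 0.5 * (lo + hi)
        p = tilt(mid)
        if sum(pk * xk for pk, xk in zip(p, x)) < y:
            lo = mid
        else:
            hi = mid
    return tilt(0.5 * (lo + hi))

# Jaynes's die: uniform prior over six faces, but the observed mean is 4.5
p = ce_solve([1/6] * 6, [1, 2, 3, 4, 5, 6], 4.5)
print(p)  # probabilities tilt exponentially toward the high faces
```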
- 25. Cross-Entropy (CE) Estimates • Ω is called the “partition function”. • Can be viewed as a limiting (non-parametric) form of a Bayesian estimator, transforming prior and sample information into posterior estimates of probabilities. • Not strictly Bayesian because you specify the prior not as a frequency function but as a discrete set of probabilities.
- 26. From Probabilities to Parameters • From information theory, we now have a way to use “information” to estimate probabilities • But in economics, we want to estimate parameters of a model or a “consistent” data set • How do we move from estimating probabilities to estimating parameters and/or data?
- 27. Types of Information • Values: – Areas, production, demand, trade • Coefficients: technology – Crop and livestock yields – Input-output coefficients for processed commodities (sugar, oils) • Prior Distribution of measurement error: – Mean – Standard error of measurement – “Informative” or “uninformative” prior distribution
- 28. Data Estimation • Generate a prior “best” estimate of all entries: Values and/or coefficients. • A “prototype” based on: – Values and aggregates • Historical and current data • Expert Knowledge – Coefficients: technology and behavior • Current and/or historical data • Assumption of behavior and technical stability
- 29. Estimation Constraints • Nationally – Area times Yield = Production by crop – Total area = Sum of area over crops – Total Demand = Sum of demand over types of demand – Net trade = Supply – Demand • Globally – Net trade sums to 0
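As a sketch, these national and global identities can be checked mechanically on a toy dataset (hypothetical numbers and field names; the actual constraints are imposed inside the GAMS estimation program):

```python
def check_balances(countries):
    """Check the national identities and the global net-trade identity
    on a toy single-commodity dataset (hypothetical numbers)."""
    for c in countries.values():
        supply = c["area"] * c["yield"]     # Area x Yield = Production
        demand = sum(c["demand"].values())  # Total demand = sum over demand types
        # Net trade = Supply - Demand, country by country
        assert abs(c["net_trade"] - (supply - demand)) < 1e-9
    # Globally, net trade must sum to zero
    assert abs(sum(c["net_trade"] for c in countries.values())) < 1e-9

toy = {
    "A": {"area": 10.0, "yield": 2.0, "demand": {"food": 12.0, "feed": 3.0}, "net_trade": 5.0},
    "B": {"area": 8.0, "yield": 1.5, "demand": {"food": 14.0, "feed": 3.0}, "net_trade": -5.0},
}
check_balances(toy)  # the toy data satisfy every identity
```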
- 30. Measurement Error • Error specification – Error on coefficients or values – Additive or multiplicative errors • Multiplicative errors – Logarithmic distribution – Errors cannot be negative • Additive – Possibility of entries changing sign
- 31. Error Specification • Typical error specification (additive): x_i = x̄_i + e_i, where e_i = Σ_k W_{i,k} v_{i,k}, 0 ≤ W_{i,k} ≤ 1, Σ_k W_{i,k} = 1, and v_{i,k} is the “support set” for the errors
- 32. Error Specification • Errors are weighted averages of support-set values – The v parameters are fixed and have the units of the item being estimated. – The W variables are probabilities that need to be estimated. • This converts the problem of estimating errors into one of estimating probabilities.
- 33. Error Specification • The technique provides a bridge between standard estimation, where the parameters to be estimated are in “natural” units, and the information approach, where the parameters are probabilities. – The specified support set provides the link.
- 34. Error Specification • Conversion of a “standard” stochastic specification with continuous random variables into a specification with a discrete set of probabilities – Golan, Judge, and Miller • The problem is to estimate a discrete probability distribution
- 35. Uninformative Prior • Prior incorporates only information about the bounds between which the errors must fall. • Uniform distribution is the continuous uninformative prior in Bayesian analysis. – Laplace: Principle of insufficient reason • We specify a finite probability distribution that approximates the uniform distribution.
- 36. Uninformative Prior • Assume that the bounds are set at ±3s, where s is a constant. • For a uniform distribution on [−3s, 3s], the variance is: (3s − (−3s))² / 12 = 36s² / 12 = 3s²
- 37. 7-Element Support Set • v_1 = −3s, v_2 = −2s, v_3 = −s, v_4 = 0, v_5 = s, v_6 = 2s, v_7 = 3s • The prior is uniform, w̄_k = 1/7, so Σ_k w̄_k = 1 and the prior variance is Σ_k w̄_k v_k² = (9 + 4 + 1 + 0 + 1 + 4 + 9)s² / 7 = 4s²
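A quick check of the 7-element prior (illustrative Python, with s = 1):

```python
s = 1.0
v = [-3*s, -2*s, -s, 0.0, s, 2*s, 3*s]  # 7-element support set
w_bar = [1/7] * 7                       # finite uniform (uninformative) prior
prior_mean = sum(wk * vk for wk, vk in zip(w_bar, v))
prior_var = sum(wk * vk**2 for wk, vk in zip(w_bar, v))
print(prior_mean, prior_var)  # mean 0; variance 4*s**2, above the continuous limit 3*s**2
```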
- 38. Uninformative Prior • A finite uniform prior with a 7-element support set is a conservative uninformative prior. • Adding more elements would more closely approximate the continuous uniform distribution, reducing the prior variance toward the limit of 3s². • The posterior distribution is essentially unconstrained.
- 39. Informative Prior • Start with a prior on both the mean and the standard deviation of the error distribution – The prior mean is normally zero. – The standard deviation of e_i is the prior on the standard error of measurement of the item. • Define the support set with s = σ so that the bounds are now ±3σ.
- 40. Informative Prior, 2 Parameters • Mean: Σ_k W_{i,k} v_{i,k} = 0 • Variance: Σ_k W_{i,k} v_{i,k}² = σ_i²
- 41. 3-Element Support Set • v_{i,1} = −3σ_i, v_{i,2} = 0, v_{i,3} = 3σ_i
- 42. Informative Prior, 2 Parameters • Variance: 9σ_i² W_{i,1} + 0·W_{i,2} + 9σ_i² W_{i,3} = σ_i² • With the zero-mean condition W_{i,1} = W_{i,3}, the solution is W_{i,1} = W_{i,3} = 1/18 and W_{i,2} = 16/18
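These weights can be verified directly (illustrative Python, with an arbitrary σ):

```python
sigma = 2.0
v = [-3*sigma, 0.0, 3*sigma]  # 3-element support set
W = [1/18, 16/18, 1/18]       # weights implied by the mean and variance conditions
mean = sum(wk * vk for wk, vk in zip(W, v))
var = sum(wk * vk**2 for wk, vk in zip(W, v))
print(mean, var)  # mean 0 and variance sigma**2, as required
```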
- 43. Informative Prior: 4 Parameters • Must specify prior for additional statistics – Skewness and Kurtosis • Assume symmetric distribution: – Skewness is zero. • Specify normal prior: – Kurtosis is a function of σ. • Can recover additional information on error distribution.
- 44. Informative Prior, 4 Parameters • Mean: Σ_k W_{i,k} v_{i,k} = 0 • Variance: Σ_k W_{i,k} v_{i,k}² = σ_i² • Skewness: Σ_k W_{i,k} v_{i,k}³ = 0 • Kurtosis: Σ_k W_{i,k} v_{i,k}⁴ = 3σ_i⁴
- 45. 5-Element Support Set • v_{i,1} = −3.0σ_i, v_{i,2} = −1.5σ_i, v_{i,3} = 0, v_{i,4} = 1.5σ_i, v_{i,5} = 3.0σ_i
- 46. Informative Prior, 4 Parameters • Variance: 9σ_i² W_{i,1} + 2.25σ_i² W_{i,2} + 0 + 2.25σ_i² W_{i,4} + 9σ_i² W_{i,5} = σ_i² • Kurtosis: 81σ_i⁴ W_{i,1} + (81/16)σ_i⁴ W_{i,2} + 0 + (81/16)σ_i⁴ W_{i,4} + 81σ_i⁴ W_{i,5} = 3σ_i⁴ • Solution: W_{i,1} = W_{i,5} = 1/162, W_{i,2} = W_{i,4} = 16/81, W_{i,3} = 48/81
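Again the weights can be verified against all four moment conditions (illustrative Python, with σ_i = 1):

```python
sigma = 1.0
v = [-3.0*sigma, -1.5*sigma, 0.0, 1.5*sigma, 3.0*sigma]  # 5-element support set
W = [1/162, 16/81, 48/81, 16/81, 1/162]                  # solution of the 4 moment conditions
moments = [sum(wk * vk**n for wk, vk in zip(W, v)) for n in (1, 2, 3, 4)]
print(moments)  # approximately [0, sigma**2, 0, 3*sigma**4], matching a normal prior
```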
- 47. Implementation • Implement program in GAMS – Large, difficult, estimation problem – Major advances in solvers. Solution is now robust and routine. • CE minimand similar to maximum likelihood estimators. • Excel front end for GAMS program – Easy to use
- 48. Implementation • Workflow: Data Collection (FAOSTAT Database: Commodity Balance, Food Balance) → Data Cleaning and Setting Priors (Crop Production, Livestock Production, Commodity Demand and Trade, Processed Commodities (oilseeds, sugar, etc.)) → Data Estimation with Cross Entropy (Nationally: Area × Yield = Supply; Nationally: Trade = Supply − Demand; Globally: Supply = Demand) → IMPACT 3