National Accounts and SAM Estimation Using Cross-Entropy Methods
Sherman Robinson
Estimation Problem
• Partial equilibrium models such as IMPACT require balanced and consistent datasets that represent disaggregated production and demand by commodity
• Estimating such a dataset requires an efficient method to incorporate and reconcile information from a variety of sources
2
Primary Data Sources for IMPACT
Base Year
• FAOSTAT for country totals for:
– Production: Area, Yields and Supply
– Demand: Total, Food, Intermediate, Feed, Other
Demands
– Trade: Exports, Imports, Net Trade
– Nutrition: Calories per capita, calories per kg of
commodity
• AQUASTAT for country irrigation and rainfed
production
• SPAM for pixel-level estimates of the global allocation of production
3
Estimating a Consistent and
Disaggregated Database
4
• Estimate IMPACT country database (input: FAOSTAT)
• Estimate technology-disaggregated production (inputs: IMPACT country database, FAO AQUASTAT)
• Estimate geographically disaggregated production (inputs: technology-disaggregated production, SPAM)
5
Bayesian Work Plan
• Priors on values and estimation errors of production, demand, and trade
• Estimation by cross-entropy method
• Check results against priors and identify potential data problems
• New information to correct identified problems
Information Theory Approach
• Goal is to recover parameters and data we
observe imperfectly. Estimation rather than
prediction.
• Assume very little information about the error
generating process and nothing about the
functional form of the error distribution.
• Very different from standard statistical
approaches (e.g., econometrics).
– Those approaches usually have lots of data
6
Estimation Principles
• Use all the information you have.
• Do not use or assume any information you do
not have.
• Arnold Zellner: “Efficient Information
Processing Rule (IPR).”
• Close links to Bayesian estimation
7
Information Theory
• Need to be flexible in incorporating
information in parameter/data estimation
– Lots of different forms of information
• In classic statistics, the “information” in a data set can be summarized by the moments of the distribution of the data
– Summarizes what is needed for estimation
• We need a broader view of “estimation” and
need to define “information”
8
9
An analogy from physics
initial state of motion → Force → final state of motion
Force is whatever induces a change of motion:
$$F = \frac{dp}{dt}$$
10
Inference is dynamics as well
old beliefs → information → new beliefs
“Information” is what induces a change in rational beliefs.
Information Theory
• Suppose an event E will occur with probability
p. What is the information content of a
message stating that E occurs?
• If p is “high”, event occurrence has little
“information.” If p is low, event occurrence is a
surprise, and contains a lot of information
– Content of the message is not the issue: amount,
not meaning, of information
11
Information Theory
• Shannon (1948) developed a formal measure of
“information content” of the arrival of a message
(he worked for AT&T)



)(0
0)(1
)/1log()(
phthenpIF
phthenpIF
pph
12
Information Theory
• For a set of events, the expected information
content of a message before it arrives is the
entropy measure:
$$H(p) = \sum_{k=1}^{n} p_k\, h(p_k) = -\sum_{k=1}^{n} p_k \log(p_k), \qquad \text{where } \sum_{k} p_k = 1$$
13
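To make these two definitions concrete, here is a small numerical sketch (not from the slides; the probabilities are made up) of the information content h(p) and the entropy H(p):

```python
import numpy as np

def info_content(p):
    """Shannon information content h(p) = log(1/p) of an event with probability p."""
    return np.log(1.0 / p)

def entropy(p):
    """Expected information content H(p) = -sum_k p_k log(p_k) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

# A near-certain event carries little information; a rare event carries a lot.
print(info_content(0.99))   # ~0.01
print(info_content(0.01))   # ~4.6

# Entropy is largest for the uniform distribution over a given number of events.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ~ 1.386
print(entropy([0.97, 0.01, 0.01, 0.01]))  # much smaller
```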
14
Claude Shannon (photo)
E.T. Jaynes
• Jaynes proposed using the Shannon entropy
measure in estimation
• Maximum entropy (MaxEnt) principle:
– Out of all probability distributions that are
consistent with the constraints, choose the one
that has maximum uncertainty (maximizes the
Shannon entropy metric)
• Idea of estimating probabilities (or frequencies)
– In the absence of any constraints, entropy is
maximized for the uniform distribution
15
16
E.T. Jaynes (photo)
Estimation With a Prior
• The estimation problem is to estimate a set of
probabilities that are “close” to a known prior
and that satisfy various known moment
constraints.
• Jaynes suggested using the criterion of
minimizing the Kullback-Leibler “cross
entropy” (CE) “divergence” between the
estimated probabilities and the prior.
17
18
Cross Entropy Estimation
Minimize:
$$\sum_{k} p_k \log\!\left(\frac{p_k}{\bar{p}_k}\right) = \sum_{k} p_k \log p_k - \sum_{k} p_k \log \bar{p}_k$$
where $\bar{p}_k$ is the prior probability.
“Divergence”, not “distance”. Measure is not symmetric
and does not satisfy the triangle inequality. It is not a
“norm”.
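A minimal numerical sketch of this divergence, with made-up prior and estimated probabilities; it also shows the asymmetry noted above:

```python
import numpy as np

def kl_divergence(p, p_bar):
    """Cross-entropy (KL) divergence sum_k p_k log(p_k / p_bar_k) of estimate p from prior p_bar."""
    p, p_bar = np.asarray(p, float), np.asarray(p_bar, float)
    return np.sum(p * np.log(p / p_bar))

prior = np.array([0.25, 0.25, 0.25, 0.25])   # uniform prior
estimate = np.array([0.4, 0.3, 0.2, 0.1])    # hypothetical posterior estimate

print(kl_divergence(estimate, prior))   # divergence of the estimate from the prior
print(kl_divergence(prior, estimate))   # a different value: the measure is not symmetric
```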
MaxEnt vs Cross-Entropy
• If the prior is specified as a uniform distribution, the CE estimate is equivalent to the MaxEnt estimate
• Laplace’s Principle of Insufficient Reason: In the
absence of any information, you should choose
the uniform distribution, which has maximum
uncertainty
– Uniform distribution as a prior is an admission of
“ignorance”, not knowledge
19
Cross Entropy Measure
• Two kinds of information
– Prior distribution of the probabilities
– Moments of the distribution
• Can know any moments
– Can also specify inequalities
– Moments with error will be considered
– Summary statistics such as quantiles
20
21
Cross-Entropy Measure
Minimize
$$\sum_{k=1}^{K} p_k \ln\!\left(\frac{p_k}{\bar{p}_k}\right)$$
subject to constraints (information) about moments
$$\sum_{k=1}^{K} p_k\, x_{t,k} = y_t$$
and the adding-up constraint (finite distribution)
$$\sum_{k=1}^{K} p_k = 1$$
22
Lagrangian
$$L = \sum_{k=1}^{K} p_k \ln\!\left(\frac{p_k}{\bar{p}_k}\right) + \sum_{t=1}^{T} \lambda_t \left( y_t - \sum_{k=1}^{K} p_k\, x_{t,k} \right) + \mu \left( 1 - \sum_{k=1}^{K} p_k \right)$$
23
First Order Conditions
$$\frac{\partial L}{\partial p_k} = \ln p_k - \ln \bar{p}_k + 1 - \sum_{t=1}^{T} \lambda_t\, x_{t,k} - \mu = 0$$
$$\frac{\partial L}{\partial \lambda_t} = y_t - \sum_{k=1}^{K} p_k\, x_{t,k} = 0$$
$$\frac{\partial L}{\partial \mu} = 1 - \sum_{k=1}^{K} p_k = 0$$
24
Solution
$$p_k = \frac{\bar{p}_k \exp\!\left(\sum_{t=1}^{T} \lambda_t\, x_{t,k}\right)}{\Omega(\lambda_1, \lambda_2, \ldots, \lambda_T)}$$
where
$$\Omega(\lambda_1, \lambda_2, \ldots, \lambda_T) = \sum_{k=1}^{K} \bar{p}_k \exp\!\left(\sum_{t=1}^{T} \lambda_t\, x_{t,k}\right)$$
Cross-Entropy (CE) Estimates
• Ω is called the “partition function”.
• Can be viewed as a limiting form (non-
parametric) of a Bayesian estimator,
transforming prior and sample information
into posterior estimates of probabilities.
• Not strictly Bayesian because you do not specify the prior as a frequency function, but as a discrete set of probabilities.
25
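As an illustration (not part of the original presentation), the sketch below solves a toy cross-entropy problem with a single moment constraint: the support points x_k, the prior, and the target moment y are hypothetical, and the Lagrange multiplier is found numerically with SciPy rather than with GAMS. With a uniform prior, the same code returns the MaxEnt estimate.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical example: support points x_k, a uniform prior, and one moment
# constraint sum_k p_k x_k = y. The CE solution is p_k = p_bar_k exp(lambda x_k) / Omega.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p_bar = np.full(5, 0.2)
y = 3.8                                      # target moment (made up)

def ce_probs(lam):
    w = p_bar * np.exp(lam * x)
    return w / w.sum()                       # Omega is the normalizing denominator

def moment_gap(lam):
    return ce_probs(lam) @ x - y

lam_star = brentq(moment_gap, -10.0, 10.0)   # solve the first-order condition for lambda
p = ce_probs(lam_star)
print(p, p @ x)                              # estimated probabilities; weighted mean equals y
```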
From Probabilities to Parameters
• From information theory, we now have a way
to use “information” to estimate probabilities
• But in economics, we want to estimate
parameters of a model or a “consistent” data
set
• How do we move from estimating
probabilities to estimating parameters and/or
data?
26
Types of Information
• Values:
– Areas, production, demand, trade
• Coefficients: technology
– Crop and livestock yields
– Input-output coefficients for processed
commodities (sugar, oils)
• Prior Distribution of measurement error:
– Mean
– Standard error of measurement
– “Informative” or “uninformative” prior
distribution
27
Data Estimation
• Generate a prior “best” estimate of all entries:
Values and/or coefficients.
• A “prototype” based on:
– Values and aggregates
• Historical and current data
• Expert Knowledge
– Coefficients: technology and behavior
• Current and/or historical data
• Assumption of behavior and technical stability
28
Estimation Constraints
• Nationally
– Area times Yield = Production by crop
– Total area = Sum of area over crops
– Total Demand = Sum of demand over types of
demand
– Net trade = Supply – Demand
• Globally
– Net trade sums to 0
29
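A small sketch of how these identities can be checked on a candidate dataset; the country records are entirely hypothetical and use a single commodity:

```python
# Hypothetical country-level records: area, yield, production, demand components, trade.
countries = {
    "A": {"area": 10.0, "yield": 2.0, "production": 20.0,
          "food": 12.0, "feed": 3.0, "other": 1.0, "total_demand": 16.0, "net_trade": 4.0},
    "B": {"area": 5.0, "yield": 3.0, "production": 15.0,
          "food": 14.0, "feed": 4.0, "other": 1.0, "total_demand": 19.0, "net_trade": -4.0},
}

for c, d in countries.items():
    assert abs(d["area"] * d["yield"] - d["production"]) < 1e-6               # Area x Yield = Production
    assert abs(d["food"] + d["feed"] + d["other"] - d["total_demand"]) < 1e-6 # demand components add up
    assert abs(d["production"] - d["total_demand"] - d["net_trade"]) < 1e-6   # Net trade = Supply - Demand

assert abs(sum(d["net_trade"] for d in countries.values())) < 1e-6            # net trade sums to 0 globally
print("all identities hold")
```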
Measurement Error
• Error specification
– Error on coefficients or values
– Additive or multiplicative errors
• Multiplicative errors
– Logarithmic distribution
– Errors cannot be negative
• Additive
– Possibility of entries changing sign
30
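A short illustration of the difference (hypothetical numbers): an additive error can flip the sign of an entry, while a multiplicative error applied as exp(e) cannot:

```python
import numpy as np

x_bar = 2.0                      # hypothetical prior value of an entry
e = -3.0                         # hypothetical error term

x_add = x_bar + e                # additive error: the entry can change sign (here it becomes -1.0)
x_mult = x_bar * np.exp(e)       # multiplicative error: the factor exp(e) > 0, so the sign is preserved
print(x_add, x_mult)
```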
Error Specification
Typical error specification (additive):
$$x_i = \bar{x}_i + e_i$$
$$e_i = \sum_{k} W_{i,k}\, v_{i,k}$$
where $0 \le W_{i,k} \le 1$ and $\sum_{k} W_{i,k} = 1$,
and $v_{i,k}$ is the "support set" for the errors.
31
Error Specification
• Errors are weighted averages of support set values
– The v parameters are fixed and have the units of the item being estimated.
– The W variables are probabilities that need to be estimated.
• Convert the problem of estimating errors into one of estimating probabilities (see the sketch below).
32
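A minimal sketch of this construction with made-up numbers: the support points v are fixed in the units of the item, and the weights W are the probabilities to be estimated:

```python
import numpy as np

# Hypothetical illustration: an additive error built as a probability-weighted
# average of fixed support points v (in the units of the item being estimated).
x_bar = 100.0                                                  # prior value of the item
v = np.array([-30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0])    # support set (fixed)
W = np.array([0.05, 0.05, 0.10, 0.40, 0.20, 0.10, 0.10])      # probabilities to be estimated

assert np.all((0 <= W) & (W <= 1)) and abs(W.sum() - 1) < 1e-12
e = W @ v            # the error is the weighted average of the support points
x = x_bar + e        # estimated value
print(e, x)          # 3.5, 103.5
```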
Error Specification
• The technique provides a bridge between
standard estimation where parameters to be
estimated are in “natural” units and the
information approach where the parameters
are probabilities.
– The specified support set provides the link.
33
Error Specification
• Conversion of a “standard” stochastic
specification with continuous random
variables into a specification with a discrete
set of probabilities
– Golan, Judge, Miller
• Problem is to estimate a discrete probability
distribution
34
Uninformative Prior
• Prior incorporates only information about the
bounds between which the errors must fall.
• Uniform distribution is the continuous
uninformative prior in Bayesian analysis.
– Laplace: Principle of insufficient reason
• We specify a finite probability distribution that
approximates the uniform distribution.
35
Uninformative Prior
• Assume that the bounds are set at ±3s where
s is a constant.
• For uniform distribution, the variance is:
36
$$\frac{\bigl(3s - (-3s)\bigr)^2}{12} = \frac{(6s)^2}{12} = 3s^2$$
37
7-Element Support Set
$$v_1 = -3s,\; v_2 = -2s,\; v_3 = -s,\; v_4 = 0,\; v_5 = s,\; v_6 = 2s,\; v_7 = 3s$$
and the prior is $\bar{w}_k = \dfrac{1}{7}$, so the prior variance is
$$\sum_{k} \bar{w}_k v_k^2 = \frac{(9 + 4 + 1 + 0 + 1 + 4 + 9)\,s^2}{7} = 4s^2$$
Uninformative Prior
• Finite uniform prior with 7-element support set is
a conservative uninformative prior.
• Adding more elements would more closely
approximate the continuous uniform distribution,
reducing the prior variance toward the limit of
$3s^2$.
• Posterior distribution is essentially unconstrained.
38
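A quick numerical check (not from the slides) of the prior variance implied by the 7-element support with uniform weights, against the continuous uniform limit:

```python
import numpy as np

s = 1.0                                            # scale constant (arbitrary here)
v = np.array([-3, -2, -1, 0, 1, 2, 3]) * s         # 7-element support set
w_bar = np.full(7, 1.0 / 7.0)                      # uniform (uninformative) prior weights

prior_var = w_bar @ v**2                           # = 4 s^2
continuous_var = (3*s - (-3*s))**2 / 12.0          # = 3 s^2, continuous uniform on [-3s, 3s]
print(prior_var, continuous_var)
```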
Informative Prior
• Start with a prior on both mean and standard
deviation of the error distribution
– Prior mean is normally zero.
– The standard deviation of e is the prior on the standard error of measurement of the item.
• Define the support set with s=σ so that the
bounds are now ±3σ.
39
40
Informative Prior, 2 Parameters
Mean: $\sum_{k} W_{i,k}\, v_{i,k} = 0$
Variance: $\sum_{k} W_{i,k}\, v_{i,k}^2 = \sigma_i^2$
41
3-Element Support Set
$$v_{i,1} = -3\sigma_i, \qquad v_{i,2} = 0, \qquad v_{i,3} = 3\sigma_i$$
42
Informative Prior, 2 Parameters
$$9\sigma_i^2\, W_{i,1} + 0 \cdot W_{i,2} + 9\sigma_i^2\, W_{i,3} = \sigma_i^2$$
$$W_{i,1} = W_{i,3} = \frac{1}{18}, \qquad W_{i,2} = \frac{16}{18}$$
Informative Prior: 4 Parameters
• Must specify prior for additional statistics
– Skewness and Kurtosis
• Assume symmetric distribution:
– Skewness is zero.
• Specify normal prior:
– Kurtosis is a function of σ.
• Can recover additional information on error
distribution.
43
44
Informative Prior, 4 Parameters
Mean: $\sum_{k} W_{i,k}\, v_{i,k} = 0$
Variance: $\sum_{k} W_{i,k}\, v_{i,k}^2 = \sigma_i^2$
Skewness: $\sum_{k} W_{i,k}\, v_{i,k}^3 = 0$
Kurtosis: $\sum_{k} W_{i,k}\, v_{i,k}^4 = 3\sigma_i^4$
45
5-Element Support Set
$$v_{i,1} = -3.0\sigma_i, \quad v_{i,2} = -1.5\sigma_i, \quad v_{i,3} = 0, \quad v_{i,4} = 1.5\sigma_i, \quad v_{i,5} = 3.0\sigma_i$$
46
Informative Prior, 4 Parameters
Variance: $9\sigma_i^2\,(W_{i,1} + W_{i,5}) + 2.25\sigma_i^2\,(W_{i,2} + W_{i,4}) + 0 \cdot W_{i,3} = \sigma_i^2$
Kurtosis: $81\sigma_i^4\,(W_{i,1} + W_{i,5}) + \tfrac{81}{16}\sigma_i^4\,(W_{i,2} + W_{i,4}) + 0 \cdot W_{i,3} = 3\sigma_i^4$
Solution: $W_{i,1} = W_{i,5} = \dfrac{1}{162}; \qquad W_{i,2} = W_{i,4} = \dfrac{16}{81}; \qquad W_{i,3} = \dfrac{48}{81}$
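The two informative-prior support sets can be checked numerically. The sketch below verifies the 3-element weights against the mean and variance constraints and recovers the 5-element weights by solving the five moment/adding-up equations as a linear system (σ is set to 1 for illustration):

```python
import numpy as np

sigma = 1.0

# 3-element support (2 moment constraints): verify the weights from the slides.
v3 = np.array([-3.0, 0.0, 3.0]) * sigma
W3 = np.array([1/18, 16/18, 1/18])
print(W3 @ v3, W3 @ v3**2)                               # mean = 0, variance = sigma^2

# 5-element support (4 moment constraints plus adding-up): solve the linear system.
v5 = np.array([-3.0, -1.5, 0.0, 1.5, 3.0]) * sigma
A = np.vstack([np.ones(5), v5, v5**2, v5**3, v5**4])     # adding-up, mean, variance, skewness, kurtosis
b = np.array([1.0, 0.0, sigma**2, 0.0, 3*sigma**4])      # normal prior implies kurtosis 3*sigma^4
W5 = np.linalg.solve(A, b)
print(W5)                                                # [1/162, 16/81, 48/81, 16/81, 1/162]
```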
Implementation
• Implement program in GAMS
– Large, difficult estimation problem
– Major advances in solvers. Solution is now robust
and routine.
• CE minimand similar to maximum likelihood estimators.
• Excel front end for GAMS program
– Easy to use
47
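The actual estimation is a large GAMS program; the sketch below is only a toy Python version of the same idea, with made-up numbers: additive errors on supply and demand are written over 7-element support sets, and the cross-entropy of the weights from uniform priors is minimized subject to the trade identity.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical (made-up) prior values that violate the identity trade = supply - demand.
supply0, demand0, trade0 = 100.0, 90.0, 6.0              # the identity would require trade = 10

s_sup, s_dem = 5.0, 5.0                                  # prior standard errors of measurement
v = np.array([-3, -2, -1, 0, 1, 2, 3], float)            # 7-element support (scaled by s)
V_sup, V_dem = v * s_sup, v * s_dem
w_prior = np.full(7, 1/7)                                # uninformative prior weights

def unpack(z):
    return z[:7], z[7:]

def cross_entropy(z):
    Ws, Wd = unpack(z)
    return np.sum(Ws * np.log(Ws / w_prior)) + np.sum(Wd * np.log(Wd / w_prior))

cons = [
    {"type": "eq", "fun": lambda z: unpack(z)[0].sum() - 1.0},          # weights add to 1
    {"type": "eq", "fun": lambda z: unpack(z)[1].sum() - 1.0},
    # identity: (supply0 + e_sup) - (demand0 + e_dem) = trade0
    {"type": "eq", "fun": lambda z: (supply0 + unpack(z)[0] @ V_sup)
                                   - (demand0 + unpack(z)[1] @ V_dem) - trade0},
]
z0 = np.full(14, 1/7)
res = minimize(cross_entropy, z0, method="SLSQP", bounds=[(1e-9, 1)] * 14, constraints=cons)

Ws, Wd = unpack(res.x)
print("supply:", supply0 + Ws @ V_sup, " demand:", demand0 + Wd @ V_dem, " trade:", trade0)
```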
Implementation
48
IMPACT 3 FAOSTAT Database
• Data Collection: Commodity Balance, Food Balance
• Data Cleaning and Setting Priors: Crop Production; Livestock Production; Commodity Demand and Trade; Processed Commodities (oilseeds, sugar, etc.)
• Data Estimation with Cross Entropy: Nationally, Area × Yield = Supply; Nationally, Trade = Supply − Demand; Globally, Supply = Demand