Sessione II - Estimation methods and accuracy - P.D. Falorsi, F. Petrarca, P. Righi, The anticipated variance as a measure for the accuracy of complex multisource statistics (updated 2018)
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist... (Gautier Marti)
A Generative Adversarial Networks model to generate realistic correlation matrices. In these slides, we discuss a use case in quantitative finance (comparison of risk-based portfolio allocation methods), and how to improve the seminal model with information geometry (Riemannian neural networks suited for correlation matrices). There are many use cases to explore within, and outside, quantitative finance. The Riemannian geometry of correlation matrices is still under-developed.
We highlight exciting problems at the intersection of Riemannian geometry and deep learning.
Binomial Distribution, Part 4: covers the moment generating function, the additive property, the characteristic function, and the mode of the binomial distribution, under the complementary Statistics syllabus of the University of Calicut for the BSc core in Mathematics, Physics & Computer Science.
Geographic Information Systems (May - 2018) [IDOL - Old Course | Question Paper] (Mumbai B.Sc.IT Study)
Modeling cross-sectional correlations between thousands of stocks, across countries and industries, can be challenging. In this paper, we demonstrate the advantages of using Hierarchical Principal Component Analysis (HPCA) over classic PCA. We also introduce a statistical clustering algorithm for identifying homogeneous clusters of stocks, or “synthetic sectors”. We apply these methods to study cross-sectional correlations in the US, Europe, China, and Emerging Markets.
My recent attempts at using GANs for simulating realistic stock returns (Gautier Marti)
A presentation for the Hong Kong Machine Learning meetup summarizing my hobby research over the past year. My goal is to simulate realistic multivariate financial time series. This would make it possible to compare different statistical methods for portfolio construction, study complex networks, test algorithmic trading strategies, and experiment with reinforcement learning. Still far from being achieved...
This document summarizes a presentation on statistical clustering, hierarchical PCA, and their applications to portfolio management. It introduces PCA and how the first principal component/eigenportfolio can represent the market portfolio. It then describes hierarchical PCA, which partitions assets into clusters and allows for different correlations between and within clusters. The document provides examples analyzing global stock markets with hierarchical PCA. It also describes an algorithm for statistically generating clusters rather than using predefined classifications. Finally, it discusses applications of statistical clustering and hierarchical PCA models to portfolio optimization and mean-variance analysis.
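The eigenportfolio idea mentioned above can be sketched in a few lines. This is not the presentation's code; the one-factor returns and all dimensions below are invented for illustration:

```python
import numpy as np

# Synthetic one-factor returns: every asset loads on a common "market" factor.
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, size=(500, 1))   # common market factor
noise = rng.normal(0.0, 0.005, size=(500, 5))   # idiosyncratic noise
returns = market + noise

# PCA via the eigendecomposition of the correlation matrix of returns.
corr = np.corrcoef(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
first_pc = eigvecs[:, -1]                       # leading eigenvector

# Eigenportfolio: leading eigenvector rescaled so the weights sum to one.
weights = first_pc / first_pc.sum()
```

With a dominant common factor the weights come out all positive and roughly equal, which is why the first principal component is often read as a proxy for the market portfolio.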
The document discusses implied volatility, the volatility value that makes a pricing model such as Black-Scholes match observed option prices, as an alternative to historical estimates of an asset's volatility. It describes using the binomial model and Newton-Raphson iteration to estimate implied volatility from option prices. Practical applications of implied volatility include volatility forecasting and use as an input for risk models. The proposed next steps are to implement the Newton-Raphson algorithm for a distributed environment and to consider estimating a volatility surface rather than a single value.
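The Newton-Raphson step can be sketched as follows. This is a generic illustration, not the presentation's distributed implementation, and it inverts the closed-form Black-Scholes call price rather than a binomial tree:

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    # Black-Scholes price of a European call option.
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def vega(S, K, T, r, sigma):
    # Sensitivity of the call price to sigma (the Newton-Raphson denominator).
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return S * exp(-0.5 * d1 ** 2) / sqrt(2.0 * pi) * sqrt(T)

def implied_vol(price, S, K, T, r, sigma0=0.2, tol=1e-10, max_iter=100):
    # Newton-Raphson: sigma <- sigma - f(sigma) / f'(sigma),
    # where f(sigma) is the model price minus the observed price.
    sigma = sigma0
    for _ in range(max_iter):
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            break
        sigma -= diff / vega(S, K, T, r, sigma)
    return sigma

# Round-trip check: price a call at 25% volatility, then recover that volatility.
target_price = bs_call(100.0, 105.0, 0.5, 0.01, 0.25)
iv = implied_vol(target_price, 100.0, 105.0, 0.5, 0.01)
```

Because vega is smooth and positive away from expiry, the iteration typically converges in a handful of steps from a reasonable starting guess.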
CrossSim: exploiting mutual relationships to detect similar OSS projects (Davide Ruscio)
Slides presented at SEAA 2018 http://dsd-seaa2018.fit.cvut.cz/seaa/ related to the paper http://reposto.di.univaq.it/aigon2/index.php/attachments/single/211
Software development is a knowledge-intensive activity, which requires mastering several languages, frameworks, technology trends (among other aspects) under the pressure of ever-increasing arrays of external libraries and resources.
Recommender systems are gaining high relevance in software engineering, since they aim at providing developers with real-time recommendations that can reduce the time spent discovering and understanding reusable artifacts from software repositories, thus yielding productivity and quality gains.
In this presentation, we focus on the problem of mining open source software repositories to identify similar projects, which can be evaluated and possibly reused by developers. To this end, CROSSSIM is proposed as a novel approach to model open source software projects and related artifacts and to compute similarities among them. An evaluation on a dataset of 580 GitHub projects shows that CROSSSIM outperforms an existing technique that had previously been shown to perform well at detecting similar GitHub repositories.
Linear Regression with R programming.pptx (anshikagoel52)
The document discusses linear regression and its applications. It begins with defining data mining and business analytics. It then outlines the stages of analytics and data mining processes. Linear regression is introduced as a supervised machine learning algorithm that models the relationship between a scalar dependent variable and one or more explanatory variables. Linear regression can be used for prediction and forecasting based on fitting a model to observed data. An example case study is given of using linear regression to analyze computer price data and predict the price of a new computer configuration based on factors like CPU speed, hard drive size, RAM, etc.
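The case study's idea can be sketched with ordinary least squares. The configuration data below are invented for illustration (the presentation itself uses R):

```python
import numpy as np

# Invented computer-price data: CPU speed (GHz), RAM (GB), HDD (TB) -> price.
X = np.array([
    [2.0,  4, 0.5],
    [2.5,  8, 1.0],
    [3.0,  8, 1.0],
    [3.2, 16, 2.0],
    [3.6, 16, 2.0],
    [4.0, 32, 4.0],
])
y = np.array([400.0, 550.0, 620.0, 850.0, 920.0, 1300.0])  # observed prices

# Ordinary least squares with an intercept column: minimize ||A b - y||^2.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the price of a new configuration (3.4 GHz, 16 GB RAM, 2 TB HDD).
new_config = np.array([1.0, 3.4, 16, 2.0])  # leading 1 multiplies the intercept
predicted_price = float(new_config @ coef)
```

The fitted coefficients quantify how much each feature contributes to the price, and the prediction for the new configuration falls between the two most similar observed machines.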
DL-FOIL is an algorithm for concept learning that produces concept descriptions in description logic. It was modified from a previous DL-FOIL algorithm to improve the specialization procedure and heuristic. Preliminary experiments show it achieves good match rates compared to another concept learning method on several ontology datasets. Ongoing work includes additional evaluations, improving specialization procedures and heuristics, and addressing scalability through parallel and distributed computation.
Goodness-of-fit tests for regression models: the functional data case (NeuroMat)
In this talk the topic of goodness-of-fit for regression models with functional covariates is considered. Although several papers on checking regression models have been published in the last two decades, the case where the covariates are functional is quite recent and has become of interest in recent years. We review the most recent advances in this area and propose a new goodness-of-fit test for the null hypothesis of a functional linear model with scalar response. Our test generalizes to the functional framework a previous one, designed for the goodness-of-fit of regression models with multivariate covariates using random projections. The test statistic is easy to compute using geometrical and matrix arguments, and simple to calibrate in its distribution by a wild bootstrap on the residuals. Some theoretical aspects are derived, and the finite-sample properties of the test are illustrated by a simulation study. Finally, the test is applied to real data for checking the assumption of the functional linear model, and a graphical tool is introduced. Lecturer: Wenceslao González-Manteiga, Univ. de Santiago de Compostela, Spain.
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst... (Daniel Valcarce)
This document summarizes a presentation on additive smoothing for relevance-based language modelling of recommender systems. It discusses using pseudo-relevance feedback and relevance models for collaborative filtering recommendations. Specifically, it examines how different collection-based smoothing techniques like Dirichlet priors, Jelinek-Mercer, and absolute discounting can demote the desired IDF effect, which promotes less popular items. The document proposes using additive smoothing, which does not demote the IDF effect. Experiments on movie recommendation datasets show additive smoothing achieves better accuracy, diversity, and novelty than other smoothing methods.
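The contrast the document draws can be sketched numerically. In this hypothetical two-item user profile (all counts invented), additive smoothing gives an unseen item a small uniform mass, while Jelinek-Mercer interpolation scores it by its collection frequency, letting popularity leak in:

```python
from collections import Counter

# Hypothetical user profile (ratings as pseudo term counts) and global
# collection popularity; all numbers are invented for illustration.
profile = Counter({"item_a": 4, "item_b": 1})
collection = Counter({"item_a": 900, "item_b": 50, "item_c": 50})

vocab = set(profile) | set(collection)
profile_len = sum(profile.values())
collection_len = sum(collection.values())

def p_additive(item, gamma=0.1):
    # Additive (Laplace/Lidstone) smoothing: only profile counts matter, so
    # unseen items get the same small mass regardless of their popularity
    # and the IDF-like effect that demotes popular items is preserved.
    return (profile[item] + gamma) / (profile_len + gamma * len(vocab))

def p_jelinek_mercer(item, lam=0.5):
    # Jelinek-Mercer interpolates with the collection model, so unseen items
    # are scored by collection frequency: popular items get promoted and the
    # IDF effect is demoted.
    return ((1 - lam) * profile[item] / profile_len
            + lam * collection[item] / collection_len)
```

Both estimators yield proper probability distributions over the vocabulary; they differ only in how the probability mass reserved for unseen items is allocated.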
Extracting relevant Metrics with Spectral Clustering - Evelyn Trautmann (PyData)
Numerous metrics arise on a fast-growing online platform. As the number of metrics increases, methods of exploratory data analysis become more and more important. We show how recognizing similar metrics and clustering them can make monitoring feasible and provide a better understanding of their mutual dependencies.
This lecture series covers the use of the R language, its interface, and the functions required to evaluate financial risk models. It further covers R applications for financial market data, risk measurement, modern portfolio theory, modelling returns with generalized hyperbolic and lambda distributions, Value at Risk (VaR) modelling, extreme value methods and models, the ARCH and GARCH classes of risk models, and portfolio optimization approaches.
As optimization (or prescriptive analytics) has grown as a tool for business decision-making, a key factor in its success has been the adoption of model-based optimization. Using this approach, an analyst’s major work is to describe a problem of interest by means of an algebraic model, while the computation of a solution is left to general-purpose, off-the-shelf software. Powerful modeling systems manage the difficulties of translating between the human modeler’s ideas and the computer software’s needs. This tutorial introduces model-based optimization and offers a guide to its effective use.
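To make "describe the model, let the software solve it" concrete, here is a tiny algebraic model (a classic two-product mix with made-up coefficients). In practice an off-the-shelf solver would handle it; for a two-variable toy instance we can enumerate the vertices of the feasible polygon directly:

```python
from itertools import combinations

# A two-product mix model with made-up coefficients:
#   maximize 3x + 5y
#   subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
# Constraints are stored as a*x + b*y <= c (signs flipped for the lower bounds).
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]

def feasible(x, y, eps=1e-9):
    return all(a * x + b * y <= c + eps for a, b, c in constraints)

def intersection(c1, c2):
    # Point where two constraint boundary lines meet (None if parallel).
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

# For a 2-variable LP the optimum lies at a vertex of the feasible region,
# so enumerating pairwise intersections suffices for this toy instance.
vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersection(c1, c2)) is not None and feasible(*p)]
best = max(vertices, key=lambda p: 3 * p[0] + 5 * p[1])
```

The point of model-based optimization is that the analyst writes only the algebraic statement at the top; the vertex enumeration here stands in for what a general-purpose solver does at scale.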
Qu speaker series 14: Synthetic Data Generation in Finance (QuantUniversity)
In this master class, Stefan shows how to create synthetic time-series data using generative adversarial networks (GAN). GANs train a generator and a discriminator network in a competitive setting so that the generator learns to produce samples that the discriminator cannot distinguish from a given class of training data. The goal is to yield a generative model capable of producing synthetic samples representative of this class. While most popular with image data, GANs have also been used to generate synthetic time-series data in the medical domain. Subsequent experiments with financial data explored whether GANs can produce alternative price trajectories useful for ML training or strategy backtests.
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on... (IJERA Editor)
Cost estimating at the schematic design stage, as the basis of project evaluation, engineering design, and cost management, plays an important role in project decisions made under a limited definition of scope, constraints on available information and time, and the presence of uncertainties. The purpose of this study is to compare the performance of cost estimation models built with two different hybrid artificial intelligence approaches: regression analysis-adaptive neuro-fuzzy inference system (RANFIS) and case-based reasoning-genetic algorithm (CBR-GA). The models were developed on the same 50 low-cost apartment project datasets in Indonesia. Tested on another five datasets, the models proved to perform very well in terms of accuracy. The CBR-GA model was the best performer but suffered from the disadvantage of needing 15 cost drivers, compared to only 4 required by RANFIS for on-par performance.
March 2, 2018 - Machine Learning for Production Forecasting (David Fulford)
This document summarizes a presentation on using machine learning for production forecasting. It discusses challenges with traditional forecasting models for unconventional wells, which can have long transient flow periods. A new transient hyperbolic model was developed that better accounts for the different flow regimes. Machine learning techniques like Markov chain Monte Carlo simulation are applied to estimate model parameters and quantify uncertainty. This allows incorporating historical data to improve forecasts of future well performance compared to simple regression models.
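The Markov chain Monte Carlo idea mentioned above can be sketched on a toy decline-curve problem. This is not the presentation's transient hyperbolic model: it fits a simple exponential decline to invented data with a random-walk Metropolis sampler under a flat prior:

```python
import math
import random

# Toy setting: estimate the decline rate b of q(t) = q0 * exp(-b*t) from noisy
# synthetic production data. Model, noise level, and data are all invented.
random.seed(1)
q0, true_b, sigma = 100.0, 0.3, 2.0
data = [(t, q0 * math.exp(-true_b * t) + random.gauss(0, sigma)) for t in range(12)]

def log_likelihood(b):
    # Gaussian measurement noise around the model curve.
    return sum(-(q - q0 * math.exp(-b * t)) ** 2 / (2 * sigma ** 2) for t, q in data)

b = 0.5                                    # deliberately poor starting value
samples = []
for i in range(5000):
    proposal = b + random.gauss(0, 0.05)   # random-walk proposal
    if proposal > 0:
        delta = log_likelihood(proposal) - log_likelihood(b)
        if delta >= 0 or random.random() < math.exp(delta):
            b = proposal                   # accept the move
    if i >= 1000:                          # discard burn-in
        samples.append(b)

posterior_mean = sum(samples) / len(samples)
```

The retained samples approximate the posterior over the decline rate, so the spread of `samples` (not just the mean) quantifies forecast uncertainty, which is the point of using MCMC instead of a single regression fit.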
Price optimization for high-mix, low-volume environments | Using R and Tablea... (Wil Davis)
Worthington Industries’ steel products are highly customized to end-user specifications. This high-mix, low-volume business makes price optimization using traditional methods difficult. Determining which products/markets to include or exclude from a given comparative analysis is often subjective and can lead to inconsistent recommendations. In our case, machine learning methods resulted in over-fitting due to insufficient training data. Tableau with R allows our analysts to test different market conditions using the power of predictive analytics (logistic regression) in a user-friendly environment. This tool represents the latest evolution in Worthington’s growing adoption of Tableau Server, deploying increasingly sophisticated features to our 50+ users.
This document discusses using machine learning algorithms to predict the direction of movements in the Standard & Poor's 500 stock index. It compares the performance of artificial neural networks (ANN) to logistic regression, linear discriminant analysis, quadratic discriminant analysis, and k-nearest neighbors classification. The ANN achieved approximately 61% accuracy in predicting the direction of returns using opening stock prices, outperforming the other techniques. The document serves to analyze which algorithm provides the most accurate financial forecasts.
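As a sketch of the simplest model in that comparison, a logistic regression direction classifier can be trained by gradient descent. The two-feature data below are synthetic, not the S&P 500 series used in the document:

```python
import numpy as np

# Synthetic stand-in for the direction-prediction task: two features drive the
# sign of the next move through a noisy linear rule. All data are invented.
rng = np.random.default_rng(42)
n = 400
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.5, -1.0]) + rng.normal(0.0, 0.5, n) > 0).astype(float)

# Logistic regression trained by plain batch gradient descent on the log-loss.
w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(up)
    w -= lr * X.T @ (p - y) / n             # gradient of mean log-loss w.r.t. w
    b -= lr * (p - y).mean()                # ... and w.r.t. the intercept

accuracy = float((((X @ w + b) > 0) == (y == 1)).mean())
```

On real market data the achievable accuracy is far lower (the document reports roughly 61%); this synthetic rule is nearly separable, so the classifier recovers the signs of the true weights and scores well above chance.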
Slide: Formal Verification of Probabilistic Systems in ASMETA (Riccardo Melioli)
Information Technology (IT) systems are constantly growing in everyday life, particularly in safety-critical scenarios (such as automotive, avionics, and medical applications) in which reliability and correctness are the main requirements that must be guaranteed.
The complexity of information systems is increasing, and recently Cyber-Physical Systems (CPS) have emerged: systems in which a software component interacts continuously with the physical system in which it operates. Compared to the dynamics of classical systems, the physical component of a CPS introduces new aspects to consider in a system's behavior, in particular probabilistic behaviors.
"An Evaluation of Models for Runtime Approximation in Link Discovery" as presented in the IEEE/WIC/ACM WI, August 25th, 2017, held in Leipzig, Germany.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
This document discusses calibration of computer models in the face of model discrepancy. It begins by introducing the problem of calibrating a computer model S to a real complex system Z, where discrepancy δ exists between them. The standard Bayesian approach of Kennedy and O'Hagan is described. An issue is that Bayesian inference is performed on the joint model Mζ regardless of data size. The document explores using a Bayesian treed model to partition the input/calibration space, allowing basic GP models to be fit in each region to better represent local features and discontinuities. It suggests this approach may help mitigate non-identifiability issues compared to a standard Bayesian calibration. Modularizing the Bayesian analysis by learning model components separately from different data
R. Piergiovanni, 10 March 2021 -
Methods and tools for shared monitoring of the survey
More Related Content
Similar to Sessione II - Estimation methods and accuracy - P.D. Falorsi, F. Petrarca, P. Righi, The anticipated variance as a measure for the accuracy of complex multisource statistics (updated 2018)
S. Corradini, L. Martinez, 30 November - 1 December 2021 -
Webinar: Labour inclusion: the national picture and Istat's experience
Title: The employment condition of people with disabilities
L. Lavecchia, 30 November - 1 December 2021 -
Webinar: The information framework for the Green Deal: developments and information demand on energy issues
Title: Measuring energy poverty in Italy
V. Buratta, 30 November - 1 December 2021 -
Webinar: The data strategy: the European initiative and the national response
Title: Istat's role in the National and European Data Strategy
E. Fornero, 30 November - 1 December 2021 -
Webinar: Gender statistics by default: the paradigm shift in statistics and beyond
Title: Illusions, commonplaces and truths in the fight against gender disparities
A. Perrazzelli, 30 November - 1 December 2021 -
Webinar: Gender statistics by default: the paradigm shift in statistics and beyond
Title: Gender quality to support growth
A. Tinto, 30 November - 1 December 2021 -
Webinar: The effects of the pandemic on life satisfaction and well-being: analyses and perspectives
Title: The impact of the pandemic on the subjective component of Equitable and Sustainable Well-being
L. Becchetti, 30 November - 1 December 2021 -
Webinar: The effects of the pandemic on life satisfaction and well-being: analyses and perspectives
Title: The pandemic through subjective indicators at the international level: a paradox?
G. Onder, 30 November - 1 December 2021 -
Webinar: The lesson of the crisis for demographic and social statistics
Title: The ISS mortality surveillance system and new perspectives
C. Romano, 30 November - 1 December 2021 -
Webinar: The lesson of the crisis for demographic and social statistics
Title: New tools and surveys for pertinent information during an emergency
S. Prati, M. Battaglini, G. Corsetti, 30 November - 1 December 2021 -
Webinar: The lesson of the crisis for demographic and social statistics
Title: The challenge for demography: timeliness and quality of information
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Service experts provided a customer specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Service (AWS.)
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
Main Java[All of the Base Concepts}.docxadhitya5119
This is part 1 of my Java Learning Journey. This Contains Custom methods, classes, constructors, packages, multithreading , try- catch block, finally block and more.
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...Diana Rendina
Librarians are leading the way in creating future-ready citizens – now we need to update our spaces to match. In this session, attendees will get inspiration for transforming their library spaces. You’ll learn how to survey students and patrons, create a focus group, and use design thinking to brainstorm ideas for your space. We’ll discuss budget friendly ways to change your space as well as how to find funding. No matter where you’re at, you’ll find ideas for reimagining your space in this session.
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
Chapter wise All Notes of First year Basic Civil Engineering.pptxDenish Jangid
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to objective, scope and outcome the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. . Noise Pollution Harmful Effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
How to Fix the Import Error in the Odoo 17Celine George
An import error occurs when a program fails to import a module or library, disrupting its execution. In languages like Python, this issue arises when the specified module cannot be found or accessed, hindering the program's functionality. Resolving import errors is crucial for maintaining smooth software operation and uninterrupted development processes.
Sessione II - Estimation methods and accuracy - P.D. Falorsi F. Petrarca, P.Righi, The anticipated variance as a measure for the accuracy of complex multisource statistics| (updates 2018 )
1. MEASURING THE ACCURACY OF AGGREGATES FROM A STATISTICAL REGISTER
Piero Demetrio Falorsi, Francesca Petrarca, Paolo Righi
Workshop Comitato Consultivo per le Metodologie Statistiche - Roma, 19 November 2018
2. Overview
Background
Formal definition of the problem and motivation
The measure of accuracy
Computational aspects
Strategies for making users aware of the accuracy
Preliminary conclusions & further steps
P. D Falorsi, F. Petrarca, P. Righi– workshop CCMS, 19 november 2018
3. Background
The register values are the output of statistical processes subject to statistical uncertainty with respect to both units and variables.
The availability of a register enables different stakeholders to produce estimates for different domains by summing up the domain values in the register.
Some of these estimates could be highly inaccurate.
The Italian Integrated System of Statistical Registers
4. Definition of the problem and motivation

$y_k = \tilde{y}_k + e_k$, where $y_k$ is the true value, $\tilde{y}_k = f(\mathbf{x}_k; \boldsymbol{\vartheta})$ is the theoretical value generated by a model, and $e_k$ is a random error. The values $y_k$ are observed in a sample $S$.

Sample membership indicator: $\lambda_k = 1$ if $k \in S$, $\lambda_k = 0$ otherwise; $E_P(\lambda_k) = \pi_k$ is the inclusion probability.

Model uncertainty: $E_M(\mathbf{e}) = \mathbf{0}_N$ and $V_M(\mathbf{e}\mathbf{e}') = \boldsymbol{\Sigma}_y$, where $E_M$ denotes the model expectation and $V_M$ the model variance.

Sampling uncertainty: $E_P(\boldsymbol{\lambda}) = \boldsymbol{\pi}$, where $E_P$ denotes the sampling expectation and $V_P$ the sampling variance.
5. Definition of the problem and motivation

Target unknown parameter: $Y_d = \sum_{k \in R_d} y_k = \sum_{k \in R_d} \left[ f(\mathbf{x}_k; \boldsymbol{\vartheta}) + e_k \right]$

Register prediction: $\hat{Y}_d = \sum_{k \in R_d} \hat{y}_k = \sum_{k \in R_d} f(\mathbf{x}_k; \mathbf{t})$

$\hat{y}_k = f(\mathbf{x}_k; \mathbf{t})$, where $\mathbf{t}$ is the estimate of $\boldsymbol{\vartheta}$ based on the observation of the values $y_k$ on the sample $S$.

MOTIVATION: How to make users aware of the accuracy, considering both sources of uncertainty (model and design)?
6. Definition of the problem and motivation

An incomplete list of cases in which it is necessary to make the users aware of the accuracy:

Topic | Register | Statistical analysis
Living population (with weights for over/undercoverage) | Population | Overcoverage/undercoverage models
Level of instruction | Population | GLM
Employment status | Occupation | HMM
Census | Microdata Database | SAE projections
Local units (main variables) | Economic units | Regression
Economic variables | Frame | Model-assisted projection
Main cultivar | Farm register | Model-assisted projection

(First presentation of the paper in the advisory.)
7. The measure of accuracy

In our observational setting:
the sampling design enables the observation of the sample $S$;
the statistical model $M$ generates the variable $y$.

Proposed measure, the Anticipated Variance (Isaki and Fuller, 1982; Särndal et al., 1992; Nedyalkova and Tillé, 2008; Nirel and Glickman, 2009; Falorsi and Righi, 2015):

$AV(\hat{Y}_d) = E_P E_M (\hat{Y}_d - Y_d)^2 = E_P V_M(\hat{Y}_d \mid \boldsymbol{\lambda}) + V_P E_M(\hat{Y}_d \mid \boldsymbol{\lambda}) - V_M(Y_d).$

The AV neutralizes the variability due to the pure model variability of the parameter $Y_d$.

Alternative measure, the Global Variance (Wolter, 1986):

$GV(\hat{Y}_d) = E_P E_M \left[ \hat{Y}_d - E_P E_M(\hat{Y}_d) \right]^2 = E_P V_M(\hat{Y}_d \mid \boldsymbol{\lambda}) + V_P E_M(\hat{Y}_d \mid \boldsymbol{\lambda}).$
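The AV can be made concrete with a small simulation. The sketch below is illustrative only, not the authors' software: it assumes a toy simple linear model with $\boldsymbol{\Sigma}_y = \sigma^2 \mathbf{I}$ and Poisson sampling, and estimates $AV(\hat{Y}_d) = E_P E_M (\hat{Y}_d - Y_d)^2$ by Monte Carlo over both the model and the design. All names (`theta`, `pi`, `domain`) and numeric values are assumptions.

```python
# Illustrative Monte Carlo sketch of the Anticipated Variance (AV) for a
# domain total, under a toy simple linear model and Poisson sampling.
# The setting is an assumption for illustration, not the authors' method.
import numpy as np

rng = np.random.default_rng(0)

N = 2000                                                  # register size
x = np.column_stack([np.ones(N), rng.uniform(1, 5, N)])   # covariates
theta = np.array([2.0, 0.5])                              # model parameter
sigma = 1.0                                               # Sigma_y = sigma^2 I
pi = np.full(N, 0.15)                                     # inclusion probabilities
domain = np.zeros(N, dtype=bool)
domain[:500] = True                                       # domain membership

def one_replicate():
    e = rng.normal(0.0, sigma, N)        # model step: y_k = x_k' theta + e_k
    y = x @ theta + e
    lam = rng.random(N) < pi             # design step: Poisson sample S
    # t: solution of the estimating equations on the sample S
    t = np.linalg.solve(x[lam].T @ x[lam], x[lam].T @ y[lam])
    Y_d = y[domain].sum()                # target unknown parameter
    Yhat_d = (x[domain] @ t).sum()       # register prediction
    return (Yhat_d - Y_d) ** 2

# AV(Yhat_d) = E_P E_M (Yhat_d - Y_d)^2, averaged over both sources
av_mc = np.mean([one_replicate() for _ in range(2000)])
print(f"Monte Carlo AV of the domain total: {av_mc:.1f}")
```

Note that the squared error is taken against the realized total $Y_d$, not its model expectation, which is exactly what distinguishes the AV from the Global Variance above.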
8. Computational aspects

Two main approximations:
We consider the Taylor series expansion of the function $f(\mathbf{x}_k; \mathbf{t})$ evaluated at the point $f(\mathbf{x}_k; \boldsymbol{\vartheta})$.
We approximate the actual sampling design with a Poisson sampling design which has the same first-order inclusion probabilities as the actual design. This makes for a conservative measure of the sampling variability.
9. Computational aspects: the term $E_P[V_M(\hat{Y}_{d,app} \mid \boldsymbol{\lambda})]$

$V_M(\hat{Y}_{d,app} \mid \boldsymbol{\lambda}) \cong \boldsymbol{\gamma}_d' \, \mathbf{F} \, V_M(\mathbf{t} \mid \boldsymbol{\lambda}) \, \mathbf{F}' \boldsymbol{\gamma}_d$

where $\mathbf{F}$ is the matrix of derivatives of $f$ with respect to $\boldsymbol{\vartheta}$ and $\boldsymbol{\gamma}_d$ is the domain membership vector.

$V_M(\mathbf{t} \mid \boldsymbol{\lambda})$ may be derived with the usual inferential approaches. The vector $\boldsymbol{\lambda}$ should be explicitly introduced in the formula of $V_M(\mathbf{t} \mid \boldsymbol{\lambda})$ in such a way that it is computed on the whole set $R$.

$E_P[V_M(\hat{Y}_{d,app} \mid \boldsymbol{\lambda})] = \boldsymbol{\gamma}_d' \, \mathbf{F} \, E_P[V_M(\mathbf{t} \mid \boldsymbol{\lambda})] \, \mathbf{F}' \boldsymbol{\gamma}_d,$

where $E_P[V_M(\mathbf{t} \mid \boldsymbol{\lambda})]$ is calculated as the term 0 of the linear approximation of $V_M(\mathbf{t} \mid \boldsymbol{\lambda})$ evaluated at $\boldsymbol{\pi}$.
10. Computational aspects: the term $V_P[E_M(\hat{Y}_{d,app} \mid \boldsymbol{\lambda})]$

$V_P[E_M(\hat{Y}_{d,app} \mid \boldsymbol{\lambda})] \cong \boldsymbol{\gamma}_d' \, \mathbf{F} \, V_P(\tilde{\mathbf{t}}) \, \mathbf{F}' \boldsymbol{\gamma}_d$

$V_P(\tilde{\mathbf{t}}) \cong \mathbf{G}_\lambda' \, V_P(\boldsymbol{\lambda}) \, \mathbf{G}_\lambda \le \mathbf{G}_\lambda' \, \mathbf{D}[\pi_j(1-\pi_j)] \, \mathbf{G}_\lambda$

where $\tilde{\mathbf{t}}$ is the solution of the system of estimating equations in which the $y$ values are substituted by their predictions $\hat{y}$, $\mathbf{G}_\lambda$ is the matrix of derivatives of the estimating equations with respect to $\boldsymbol{\lambda}$, and $\mathbf{D}[\pi_j(1-\pi_j)]$ is the diagonal matrix of the variances under a Poisson sampling.
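The diagonal bound above rests on a simple fact: under Poisson sampling the $\lambda_j$ are independent Bernoulli$(\pi_j)$, so the design variance of any linearized statistic $\sum_j g_j \lambda_j$ is exactly $\sum_j g_j^2 \pi_j(1-\pi_j)$, i.e. $\mathbf{g}' \mathbf{D}[\pi_j(1-\pi_j)] \mathbf{g}$. A minimal numeric check (the vector `g` stands in for a row of $\mathbf{G}_\lambda$; all values are illustrative assumptions):

```python
# Check that, under Poisson sampling, the design variance of a linearized
# statistic sum_j g_j * lambda_j equals g' D[pi_j(1-pi_j)] g.
import numpy as np

rng = np.random.default_rng(1)
N = 1000
pi = rng.uniform(0.05, 0.5, N)        # first-order inclusion probabilities
g = rng.normal(0.0, 1.0, N)           # linearization coefficients (assumed)

analytic = np.sum(g**2 * pi * (1 - pi))   # g' D[pi(1-pi)] g

reps = 20000
lam = rng.random((reps, N)) < pi          # independent Poisson-sampling draws
stat = (lam * g).sum(axis=1)              # sum_j g_j lambda_j, per draw
mc = stat.var()

print(f"analytic: {analytic:.2f}  monte carlo: {mc:.2f}")
```

For a design with dependent $\lambda_j$ (e.g. fixed-size designs), this diagonal expression is only the conservative upper bound stated in the slide, not an equality.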
11. Example: the classical simple linear model

$y_k = f(\mathbf{x}_k; \boldsymbol{\vartheta}) = \mathbf{x}_k' \boldsymbol{\theta}$ with $\boldsymbol{\Sigma}_y = \sigma^2 \mathbf{I}$.

$\mathbf{t}$ is obtained as the solution of the system of estimating equations:

$\left( \sum_{j \in R} \mathbf{x}_j \mathbf{x}_j' \lambda_j \right)^{-1} \sum_{j \in R} \mathbf{x}_j y_j \lambda_j - \mathbf{t} = \mathbf{0}_I.$

The standard expression for the variance matrix:

$V_M(\mathbf{t} \mid \boldsymbol{\lambda}) = \sigma^2 \left( \sum_{j \in R} \mathbf{x}_j \mathbf{x}_j' \lambda_j \right)^{-1}$

The sampling expected value (term 0 of the linear approximation):

$E_P[V_M(\mathbf{t} \mid \boldsymbol{\lambda})] \cong \sigma^2 \left( \sum_{j \in R} \mathbf{x}_j \mathbf{x}_j' \pi_j \right)^{-1}$
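The term-0 approximation simply replaces $\lambda_j$ with $\pi_j$ inside $V_M(\mathbf{t} \mid \boldsymbol{\lambda})$. A quick simulation (register, covariates and probabilities are illustrative assumptions) shows how close the Poisson-sampling average of $\sigma^2 (\sum_j \mathbf{x}_j \mathbf{x}_j' \lambda_j)^{-1}$ is to $\sigma^2 (\sum_j \mathbf{x}_j \mathbf{x}_j' \pi_j)^{-1}$:

```python
# Checks, for the simple linear model, that the sampling expectation of
# V_M(t | lambda) = sigma^2 (sum_j x_j x_j' lambda_j)^{-1} is well
# approximated by sigma^2 (sum_j x_j x_j' pi_j)^{-1} (term-0 approximation).
# The register and inclusion probabilities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
N, sigma2 = 5000, 1.0
x = np.column_stack([np.ones(N), rng.uniform(1, 5, N)])
pi = np.full(N, 0.2)

approx = sigma2 * np.linalg.inv((pi[:, None] * x).T @ x)  # term-0 formula

reps = 500
acc = np.zeros((2, 2))
for _ in range(reps):
    lam = rng.random(N) < pi                  # Poisson sample
    acc += sigma2 * np.linalg.inv(x[lam].T @ x[lam])
mc = acc / reps                               # E_P[V_M(t | lambda)]

print("term-0 approximation:\n", approx)
print("Monte Carlo average:\n", mc)
```

The residual gap is a Jensen-type bias of order $1/n$ in the expected sample size, which is why the slide states the relation with $\cong$ rather than equality.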
12. Example: general linear model

$y_k = f(\mathbf{x}_k; \boldsymbol{\vartheta}) = \mathbf{x}_k' \boldsymbol{\theta}$ with general $\boldsymbol{\Sigma}_y$.

Variance matrix:

$V_M(\mathbf{t} \mid \boldsymbol{\lambda}) = \left( \mathbf{X}_S' \boldsymbol{\Sigma}_{y,S}^{-1} \mathbf{X}_S \right)^{-1} = \left( \mathbf{X}' \mathbf{D}(\lambda_j) \boldsymbol{\Sigma}_y^{-1} \mathbf{D}(\lambda_j) \mathbf{X} \right)^{-1}$

where $\mathbf{D}(\lambda_j) = \mathrm{diag}(\lambda_j;\ j = 1, \ldots, N)$.

The sampling expected value (term 0 of the linear approximation) is obtained by substituting $\lambda_j$ with $\pi_j$:

$E_P[V_M(\mathbf{t} \mid \boldsymbol{\lambda})] \cong \left( \mathbf{X}' \mathbf{D}(\pi_j) \boldsymbol{\Sigma}_y^{-1} \mathbf{D}(\pi_j) \mathbf{X} \right)^{-1}$
13. Example: GLM

Estimating equations:

$\mathbf{H}(\mathbf{t}) = \mathbf{F}_S' \boldsymbol{\Sigma}_{y,S}^{-1} \left[ \hat{\mathbf{y}}_S(\mathbf{t}) - \mathbf{y}_S \right] = \mathbf{F}' \mathbf{D}(\lambda_j) \boldsymbol{\Sigma}_y^{-1} \mathbf{D}(\lambda_j) \left[ \hat{\mathbf{y}}(\mathbf{t}) - \mathbf{y} \right] = \mathbf{0}_I.$

Variance matrix:

$V_M(\mathbf{t} \mid \boldsymbol{\lambda}) = \left[ \mathbf{A}(\boldsymbol{\vartheta} \mid \boldsymbol{\lambda}) \right]^{-1} \mathbf{F}' \mathbf{D}(\lambda_j) \boldsymbol{\Sigma}_y^{-1} \mathbf{D}(\lambda_j) \left[ V_M(\hat{\mathbf{y}}(\mathbf{t}) \mid \boldsymbol{\lambda}) + \boldsymbol{\Sigma}_y - 2\,\mathrm{Cov}_M(\hat{\mathbf{y}}(\mathbf{t}), \mathbf{y} \mid \boldsymbol{\lambda}) \right] \mathbf{D}(\lambda_j) \boldsymbol{\Sigma}_y^{-1} \mathbf{D}(\lambda_j) \mathbf{F} \left[ \mathbf{A}(\boldsymbol{\vartheta} \mid \boldsymbol{\lambda}) \right]^{-1}$

The sampling expected value (term 0 of the linear approximation) may be obtained from the above by substituting $\boldsymbol{\lambda}$ with $\boldsymbol{\pi}$.
14. Strategies for making users aware of the accuracy

The plug-in estimate of the AV may be computed by replacing the unknown parameters $\boldsymbol{\vartheta}$, $\mathbf{y}$ and $\boldsymbol{\Sigma}_y$ with the estimates $\mathbf{t}$, $\hat{\mathbf{y}}$ and $\hat{\boldsymbol{\Sigma}}_y$ in the expressions of the different components of the AV.

These plug-in estimates (Ziegler, 2015, point 5, p. 121) are strongly consistent estimators of the variance.

This is not computationally feasible for a generic register user: users may define their aggregates on the fly.
15. Strategies for making users aware of the accuracy

Two strategies for ensuring that users are aware of the accuracy:

1. The first is based on the development of a software application that, together with the production of the aggregates $\hat{Y}_d$, provides the user with the estimates of the corresponding AV.

2. The second exploits the existing relationship between the squared relative error

$\frac{AV(\hat{Y}_d)}{\hat{Y}_d^2} = \epsilon^2(\hat{Y}_d)$

and the total of the estimate $\hat{Y}_d$. A model often used for the presentation of the sampling errors in the Italian social sample surveys is $\epsilon^2(\hat{Y}_d) = \alpha_1 \hat{Y}_d^{\alpha_2} u_d$.
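The second strategy amounts to fitting a generalized-variance-function model of the form $\epsilon^2(\hat{Y}_d) = \alpha_1 \hat{Y}_d^{\alpha_2}$ to a set of (total, squared relative error) pairs; taking logs makes it linear in the parameters. A minimal sketch under stated assumptions (the pairs are synthetic stand-ins; in practice they would come from the plug-in AV estimates):

```python
# Minimal sketch of fitting the generalized-variance-function model
# eps^2(Y_d) = alpha_1 * Y_d**alpha_2 by log-log least squares.
# The (Y, eps2) pairs are synthetic; real inputs would be domain totals
# and their plug-in, AV-based squared relative errors.
import numpy as np

rng = np.random.default_rng(3)

alpha1_true, alpha2_true = 4.0, -0.8      # assumed "true" parameters
Y = rng.uniform(1e3, 1e6, 200)            # synthetic domain totals
eps2 = alpha1_true * Y**alpha2_true * np.exp(rng.normal(0, 0.05, 200))

# log eps^2 = log alpha_1 + alpha_2 * log Y  ->  ordinary least squares
A = np.column_stack([np.ones_like(Y), np.log(Y)])
coef, *_ = np.linalg.lstsq(A, np.log(eps2), rcond=None)
alpha1_hat, alpha2_hat = np.exp(coef[0]), coef[1]

print(f"alpha_1 estimate: {alpha1_hat:.2f}, alpha_2 estimate: {alpha2_hat:.3f}")
```

Once $\alpha_1$ and $\alpha_2$ are published, a user who builds an aggregate $\hat{Y}_d$ on the fly can read off an approximate relative error without rerunning the full AV computation.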
16. Strategies for making users aware of the accuracy

Both strategies are based on the developments presented above. The second is less computationally cumbersome and can be applied on the fly.
17. Preliminary conclusions & further steps

• We are reflecting on different strategies which allow the users of a statistical register to be aware of the accuracy of their estimates.
• We have proposed the AV as a suitable measure of the accuracy.
• We have deepened some aspects of the computation of the different components of the AV, considering a simplified statistical setting.
• Further steps in this research line are to evaluate the strength, robustness and computational feasibility of the results with some simulation studies.