Communication: Communicating accuracy of register statistics
1. Communicating Accuracy of
Register Statistics
Thomas Laitila
Statistics Sweden and Örebro University
Presentation at Nordiskt Statistikermöte, Bergen, 2013
2. Outline
Why - Why measure accuracy?
Criteria - Criteria for measures of uncertainty
of register statistics
CIm - Confidence Image
Example
Discussion
Bergen, 2013, Thomas Laitila
3. Why - Some basic questions
• What is statistics (statistical inference
methods) all about?
• What makes statistics so special, and why is it
of value to us?
4. Why - Chatterjee (2003)
• There are two methods for deriving
statements: deduction and induction
• Statistics is an extension of epistemology
(the theory of knowledge and knowledge
building)
• Statistics provides a method for inductive
inference
6. Why - Induction, example
[Diagram: Swedish labor market; sample of units (ignorable nonresponse); estimate of the rate of unemployment]
7. Why - Induction, another example
[Diagram: register on units; Swedish labor market; estimate of the rate of unemployment; derived variables]
8. Why - Induction and Evidence
• All evidence about the general case comes with
uncertainty
• Statements derived by induction are uncertain
• Example: Inductive statement – a man will
inevitably die
– Evidence – no man born more than, e.g., 150
years ago is still alive.
9. Why - Why is statistical inference so special?
• Statistics is, so far, the only theory that provides
objective measures of the uncertainty of inductive
inference.
• Objective measures are important for the general
communication of statistics.
10. Why - Summing up
• Register statistics yield inductive statements
• Register statistics are therefore uncertain
• Statistical inference provides objective
measures of uncertainty
• Inference on register statistics should be
founded in statistical inference theory
• Do we have appropriate statistical tools?
Yes and no
11. Criteria - Approaches for statistical
inference on register statistics
• Model based methods
– Multivariate techniques
– Data mining methods
– Stochastic processes
– and more
• Sample surveys
– Use sample surveys as a complement for
measuring uncertainty
12. Criteria – Criteria for a measure
a) Founded in statistical inference theory
• Interpretable and objective measures
b) Easy for users to interpret
• How easy is it to interpret an ordinary
confidence interval?
c) Low cost
d) Comparable with measures in sample surveys
• Comparability/coherence
13. CIm – A new statistical tool
• Statistical inference methods center on
– a point estimator, and
– its sampling distribution
• In register statistics, treating variables as fixed,
there is
– a point estimate, but
– its sampling distribution is degenerate (with fixed
values and a complete register, the estimate cannot vary)
• One avenue for finding appropriate tools for register
statistics is to develop statistical inference
procedures that are not based on the sampling
distribution of an estimator!
14. CIm - Laitila (2012)
• Confidence Images
• Idea: Use external information to restrict the
potential values of the study variables (y_1, y_2, …, y_N)
– This restricts the potential values of the population
parameter of interest, t = f(y_1, y_2, …, y_N)
– The more information, the more t is restricted.
• Information can come in any form, as long as it
comes with a measure of uncertainty
• We can use registers, sample surveys, earlier
statistics, Google, Facebook, whatever!
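The idea can be illustrated with a brute-force sketch, assuming each unknown y_k is restricted to a small finite set of candidate values. The function name `confidence_image` and the numbers are hypothetical. The image of f over all admissible vectors is the set of values t can take; shrinking any candidate set (adding information) can only shrink the image.

```python
from itertools import product


def confidence_image(f, candidate_sets):
    """Return the sorted set {f(y) : y_k in candidate_sets[k] for every k}.

    Brute-force enumeration over all combinations; only feasible for
    very small, hypothetical examples.
    """
    return sorted({f(y) for y in product(*candidate_sets)})


# Hypothetical information: y_1 is known exactly, y_2 and y_3 are only restricted.
candidates = [[5], [0, 1, 2], [0, 10]]
print(confidence_image(sum, candidates))  # [5, 6, 7, 15, 16, 17]
print(confidence_image(max, candidates))  # [5, 10]

# More information (y_3 is known to be 0) gives a smaller image for the total.
print(confidence_image(sum, [[5], [0, 1, 2], [0]]))  # [5, 6, 7]
```

Note that the image of the total in the first call is not a connected set of points.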
15. Example - Estimation of the total number
of cattle on Swedish farms
County  No. of units  No. of missing values  Sum of observed y_k
1       18713         3817                   393797
2       14321         2918                   296944
3       12281         2475                   261832
4       10836         2213                   216535
5       8646          1763                   185285
6       7233          1485                   148029
Total   72030         14671                  1502422
Table 1: Information¹ in the available register on farms (N = 72030)
¹ No measurement or coverage errors in the register.
Problem: Estimate the total number of cattle with an interval estimate
using the information in the register, which contains missing values.
16. Example - Pieces of information
• A1: Available data in the register
• A2: The 100 largest farms are in the register,
and the 100th largest farm has 553 cattle.
• A3: Table 2 (below)
• A4: A 95% CI for the proportion of farms with
zero cattle: 0.60–0.71
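With A1 alone, a missing cattle count could be arbitrarily large, so the total is bounded only from below by the observed sum. A2 caps every missing value at 553, assuming cattle counts are non-negative and that none of the 100 largest farms has a missing value. A minimal sketch with a hypothetical function name `bounds_total` reproduces the A1–A2 row of Table 3:

```python
import math

# Figures from Table 1 and piece of information A2.
OBSERVED_SUM = 1_502_422  # sum of the observed y_k, all counties
N_MISSING = 14_671        # farms whose y_k is missing
MAX_MISSING = 553         # A2: missing farms are not among the 100 largest


def bounds_total(observed_sum, n_missing, max_missing=None):
    """Interval for the total when every missing y_k lies in [0, max_missing].

    With A1 alone (max_missing=None) there is no upper restriction.
    """
    lower = observed_sum  # every missing farm has at least 0 cattle
    if max_missing is None:
        return lower, math.inf
    upper = observed_sum + n_missing * max_missing
    return lower, upper


print(bounds_total(OBSERVED_SUM, N_MISSING))               # (1502422, inf)
print(bounds_total(OBSERVED_SUM, N_MISSING, MAX_MISSING))  # (1502422, 9615485)
```

In thousands, the second result is 1502 to 9615, the A1–A2 row of Table 3.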
17. Example – Table 2
        In register                    In population
County  y_k = 0  y_k ≥ 553  y_k ≥ 100  y_k ≥ 100
1       9108     29         1252       1288
2       6989     17         931        959
3       5960     21         784        800
4       5329     12         677        701
5       4196     10         581        601
6       3565     11         467        477
Total   35147    100        4692       4826
Table 2: Additional information (number of units)
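A3 tells us how many farms with at least 100 cattle have a missing value in each county: the population count minus the register count. This sketch assumes, per the footnote to Table 1, that there are no coverage errors, so every unobserved farm with y_k ≥ 100 is one of the missing units. The function and dictionary names are hypothetical:

```python
# Table 1: farms with a missing y_k, per county.
missing = {1: 3817, 2: 2918, 3: 2475, 4: 2213, 5: 1763, 6: 1485}
# Table 2: farms with y_k >= 100 in the register (observed) and in the population.
register_ge100 = {1: 1252, 2: 931, 3: 784, 4: 677, 5: 581, 6: 467}
population_ge100 = {1: 1288, 2: 959, 3: 800, 4: 701, 5: 601, 6: 477}


def split_missing(missing, register_ge100, population_ge100):
    """Per county: (missing farms with y_k >= 100, missing farms with y_k < 100)."""
    result = {}
    for county, n_missing in missing.items():
        big = population_ge100[county] - register_ge100[county]
        result[county] = (big, n_missing - big)
    return result


split = split_missing(missing, register_ge100, population_ge100)
print(split)
print(sum(b for b, _ in split.values()), sum(s for _, s in split.values()))  # 134 14537
```

So, of the 14671 missing values, 134 belong to farms with at least 100 cattle and 14537 to farms with fewer than 100.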
18. Example – Calculated CIms
Information used  Confidence level  Lower bound  Upper bound
A1–A2             100%              1502         9615
A1–A3             100%              1516         3016
A1–A4             95%               1516         2217
Table 3: Confidence intervals for the total number of cattle based
on information sets A1–A4 (thousands of cattle; true value 1.56 million)
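The A1–A3 and A1–A4 rows can be reproduced from the all-county totals. This sketch assumes integer cattle counts (so "below 100" means at most 99) and uses the cap of 553 from A2. It also assumes that A4 restricts the number of zero-cattle farms among all N farms to between ceil(0.60N) and floor(0.71N). It is one way to arrive at the reported bounds, not necessarily the computation in Laitila (2012); all names are hypothetical.

```python
# All-county totals from Tables 1 and 2, plus A2 and A4.
N = 72_030                # farms in the register
N_MISSING = 14_671        # farms with a missing cattle count
OBSERVED_SUM = 1_502_422  # sum of observed cattle counts
CAP = 553                 # A2: no missing farm has more than 553 cattle
REG_ZERO = 35_147         # A3: observed farms with zero cattle
REG_GE100 = 4_692         # A3: observed farms with >= 100 cattle
POP_GE100 = 4_826         # A3: all farms with >= 100 cattle


def split_missing():
    """Missing farms with 100..CAP cattle ("big") and with 0..99 cattle ("small")."""
    big = POP_GE100 - REG_GE100
    return big, N_MISSING - big


def bounds_a1_a3():
    big, small = split_missing()
    lower = OBSERVED_SUM + big * 100
    upper = OBSERVED_SUM + big * CAP + small * 99
    return lower, upper


def bounds_a1_a4(low_pct=60, high_pct=71):
    """Also use A4: the share of zero-cattle farms lies in [low_pct, high_pct] %."""
    big, small = split_missing()
    zeros_min = -(-N * low_pct // 100)  # ceil(N * low_pct / 100)
    zeros_max = N * high_pct // 100     # floor(N * high_pct / 100)
    # Zero-cattle farms among the missing ones; only "small" farms can be zero.
    miss_zero_min = max(0, zeros_min - REG_ZERO)
    miss_zero_max = min(small, zeros_max - REG_ZERO)
    if miss_zero_min > miss_zero_max:
        raise ValueError("information pieces are inconsistent")
    # Upper bound: as few zeros as allowed, other small farms at 99 cattle.
    upper = OBSERVED_SUM + big * CAP + (small - miss_zero_min) * 99
    # Lower bound: as many zeros as allowed, other small farms at 1 cattle.
    lower = OBSERVED_SUM + big * 100 + (small - miss_zero_max) * 1
    return lower, upper


for name, (lo, hi) in [("A1-A3", bounds_a1_a3()), ("A1-A4", bounds_a1_a4())]:
    print(name, round(lo / 1000), round(hi / 1000))
# A1-A3 1516 3016
# A1-A4 1516 2217
```

Because A4 itself holds with 95% confidence, the resulting image carries that level rather than 100%.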
19. Discussion
• "Image" rather than "interval", because the method
may not produce a connected interval of points.
The CIm may consist of, e.g., several disjoint
intervals
• The CIm generalizes directly to
multivariate cases.
• It is easy to calculate in some cases; in others
the calculation can be very complicated.
Research is needed here.
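When information restricts a value to a union of ranges, the image of the total can be a union of disjoint intervals. This sketch assumes integer-valued y_k and uses a hypothetical piece of information: a farm has either 0 cattle or between 100 and 553. It adds interval unions one unit at a time and merges overlaps. That avoids enumerating every combination, but in general the number of intervals can still grow. The function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or adjacent integer intervals (lo, hi)."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def image_of_total(allowed_per_unit):
    """Image of y_1 + ... + y_n when y_k may be any integer in allowed_per_unit[k].

    Each allowed_per_unit[k] is a list of integer intervals (lo, hi).
    """
    image = [(0, 0)]
    for allowed in allowed_per_unit:
        # Sum of two integer intervals [a, b] + [c, d] is [a + c, b + d].
        image = merge([(a + c, b + d) for a, b in image for c, d in allowed])
    return image


zero_or_large = [(0, 0), (100, 553)]  # hypothetical information on one farm
print(image_of_total([zero_or_large]))      # [(0, 0), (100, 553)]
print(image_of_total([zero_or_large] * 2))  # [(0, 0), (100, 1106)]
```

With one farm the image is two disjoint intervals; with two farms the ranges overlap and partly merge, but the gap between 0 and 100 remains.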
20. Discussion
• The CIm fulfills all four criteria listed above.
• Most interesting:
– Traditional confidence intervals are special cases
of CIms
– Any kind of information (data) can be used, as
long as there is a probability measure of its
certainty
• The CIm is a theory; methodological development
is needed.
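If each piece of information holds with a stated probability, one simple, conservative way to attach a confidence level to the resulting image is the union (Bonferroni) bound. Under that bound, all pieces hold simultaneously with probability at least 1 minus the sum of their error probabilities, and when they all hold, the image contains the true value. This is an assumption for illustration, not necessarily how Laitila (2012) combines levels, but it agrees with the 95% level reported for A1–A4 in Table 3. The function name is hypothetical.

```python
def joint_confidence(levels):
    """Lower bound on the probability that all pieces of information hold.

    Uses the union (Bonferroni) bound: P(all hold) >= 1 - sum(1 - level).
    """
    return max(0.0, 1.0 - sum(1.0 - level for level in levels))


# A1-A3 treated as certain, A4 a 95% confidence interval.
print(joint_confidence([1.0, 1.0, 1.0, 0.95]))  # approximately 0.95
# Two independent-looking 95% pieces guarantee only about 90% jointly.
print(joint_confidence([0.95, 0.95]))           # approximately 0.90
```

The bound requires no assumption about dependence between the pieces, which is why it is conservative.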
21. Thanks for your attention!
To request the paper Laitila (2012):
thomas.laitila@scb.se