This document outlines a framework developed by Statistics Netherlands to assess the quality of administrative data sources for statistical purposes. It identifies three "hyperdimensions" - source, metadata, and data - to structure 57 quality indicators across 19 dimensions. The framework was tested on 8 administrative data sources using quality checklists. Most sources scored well, though one was found to have delivery and definition issues. The document discusses developing approaches to evaluate the technical and accuracy-related quality of the data dimension. Future work includes fully focusing on the data hyperdimension and studying administrative data quality in a European context through the BLUE-ETS project.
Determination of administrative data quality: recent results and new developments.
1. Determination of Administrative Data Quality:
Recent results and new developments
Piet J.H. Daas, Saskia J.L. Ossen,
and Martijn Tennekes
Statistics Netherlands
May 6, 2010, Helsinki, Finland
2. Overview
Introduction
View on quality
Framework developed for admin. data sources
• Construction and composition
Application (first part)
• Checklist and results
New developments
• Ideas and future work
• BLUE-ETS
3. Introduction
Statistics Netherlands is increasing its use of
data (sources) collected and maintained by
others
• To decrease response burden and costs
As a result, Statistics Netherlands:
• Becomes more dependent on administrative data sources
• Must be able to monitor the quality of those data
sources
– What is ‘quality’ in this context?
4. View on quality
Statistics Netherlands defines quality of
administrative data sources as:
“Usability for the production of statistics”
Differs from ‘quality’ as used by the data source
keeper
– The keeper often does not have statistical use in mind
– So the quality report of the data source
keeper (if available) can’t be used
And it is the quality of the input!
5. Framework developed
No standard framework available for input quality
of administrative data sources
Quality of administrative data is only occasionally
discussed in the literature
• Majority of studies on quality and statistics focus on:
– output quality
– quality of survey data
Framework for the determination of the quality of
administrative data sources based on:
• Statistics Netherlands experiences and ideas
• Including the results published by others
6. Framework overview (1)
Many quality indicators were identified
• In total 57!
Many dimensions were identified
• In total 19
How to combine and structure these indicators?
• Distinguish different views on quality
• These views are called hyperdimensions
3 hyperdimensions were required to combine all
quality indicators into a single framework!
• First step towards a structured approach
7. Framework overview (2)
Three high level views on the input quality of
administrative data sources
• 3 hyperdimensions
9. 3 Different high level views on quality
SOURCE:
- Focus on data source as a whole
- Delivery related aspects
- and some other things
METADATA:
- Focuses on the (availability of the) information required to
understand and use the data in the data source
DATA:
- Technical checks
- Accuracy related issues
10. Determine Source and Metadata quality
With a checklist
• Used for both Source and
Metadata
Tested 8 administrative data sources
• Took on average about 2 hours per
data source
Results expressed at the
dimensional level
• 5 dimensions for Source, 4 for Metadata
11. Checklist results (1) - Source
Table 1. Evaluation results for the Source hyperdimension
Dimension                 IPA   SFR   CWI   ERR   1FigHE  1FigSGE  NCP   MBA
1. Supplier               +     +     +     +     +       +        +     +
2. Relevance              +     +     +     o     +       +        +     +
3. Privacy and Security   +     +     +     +     +       +/o      +     +
4. Delivery               o     +     -     +     +       o        +     +
5. Procedures             +     +/o   +     +/o   +/o     +/o      o     +
+, good; o, reasonable; -, poor; ?, unclear
IPA: Insurance Policy records Administration
SFR: Student Finance Register
CWI: register of Centre for Work and Income
ERR: Exam Results Register
1FigHE: coordinated register for Higher Education
1FigSGE: coordinated register for Secondary General Education
NCP: National Car Pass register
MBA: Dutch Municipal Base Administration
12. Checklist results (2) - Metadata
Table 2. Evaluation results for the Metadata hyperdimension
Dimension                 IPA   SFR   CWI   ERR   1FigHE  1FigSGE  NCP   MBA
1. Clarity                +     +     -     o     +       +        +     +
2. Comparability          +/o   +     -     +     +       +        +     +
3. Unique keys            +     +     +     +     +       +        +     +
4. Data treatment         +/o   ?(+)  ?     ?(o)  ?(+)    ?(+)     +     +
+, good; o, reasonable; -, poor; ?, unclear
Data source abbreviations as in Table 1.
13. Overall conclusions
Data sources
• CWI is the only data source with negative scores
– Tempted to recommend not using it!
– Result of delivery issues and vague definitions
– However, it is the only administrative data source that contains
educational data on the non-student part of the population!
– Solve the weaknesses!!
• Other data sources
– Quite OK (there are always some things you can improve)
– Data processing by data source keeper needs attention
Checklist
– Good way to assist the user, quite fast
– Quality information on a basic but essential level
– Not all information is commonly known!
14. What about the Data hyperdimension
How to study data quality?
• A draft list of indicators is available
– 10 dimensions and 26 indicators
• A structured approach needs to be
developed!
1. Data inspection should be efficient
2. Assist user with scripts/software (where possible)
• A checklist?
16. Data: Technical checks
Very basic
• For RAW data
• Should be easy and quick
• No other info required!
Examples
• File size
• Number of (unique) units / records received
• Metadata compliance (standard for XML-files)
• Visual checks (Data fingerprinting)
– 2 examples
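A minimal sketch of how the first three technical checks could be scripted, assuming the raw delivery is a comma-separated file with a header row. The function name `technical_checks`, the column names, and the demo file are hypothetical; the column comparison is only a simple stand-in for a real metadata compliance check (for example, validating an XML file against its schema).

```python
import csv
import os
import tempfile


def technical_checks(path, key_column, expected_columns):
    """Run basic technical checks on a raw delimited file (illustrative only)."""
    size_bytes = os.path.getsize(path)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        # Collect the unit identifier of every record received.
        keys = [row.get(key_column) for row in reader]
    return {
        "file_size_bytes": size_bytes,
        "records": len(keys),
        "unique_units": len(set(keys)),
        # Stand-in for metadata compliance: compare header with agreed columns.
        "missing_columns": sorted(set(expected_columns) - set(header)),
        "unexpected_columns": sorted(set(header) - set(expected_columns)),
    }


# Demo on a tiny made-up delivery (three records, one duplicated unit).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "delivery.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("unit_id,birth_year\nA1,1980\nA2,1975\nA2,1975\n")
    report = technical_checks(demo_path, "unit_id", ["unit_id", "birth_year", "sex"])

print(report)
```

No information beyond the file itself and the agreed column list is needed, which matches the "easy and quick" requirement for checks on raw data.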
18. Data: Accuracy related indicators
First true indicators in the process
• Information from other data sources is required
Examples of indicators for units
• Over coverage indicator
– Units in source not belonging to NSI-population
• Under coverage indicators
– Missing units
– NSI-population units not in source
– Selectivity
– Representativity of units in data source
compared to NSI-population (RISQ-project)
• Linkability indicators
– Correctly linked units, incorrectly linked units, and selectivity of linked units
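A minimal sketch of the over and under coverage indicators for units, assuming both the source and the NSI-population can be reduced to sets of unit identifiers. The function name, the identifiers, and the choice of denominators (source size for over coverage, population size for under coverage) are assumptions for illustration, not definitions taken from the framework.

```python
def coverage_indicators(source_ids, population_ids):
    """Compare unit identifiers in a source with the NSI-population (illustrative)."""
    source = set(source_ids)
    population = set(population_ids)
    over = source - population    # units in source not belonging to the population
    under = population - source   # population units not in the source
    return {
        "over_coverage_rate": len(over) / len(source) if source else 0.0,
        "under_coverage_rate": len(under) / len(population) if population else 0.0,
        "linked_units": len(source & population),
    }


# Made-up identifiers: one unit outside the population, two population units missing.
result = coverage_indicators(
    ["u1", "u2", "u3", "u9"],
    ["u1", "u2", "u3", "u4", "u5"],
)
print(result)
```

Selectivity and representativity need auxiliary variables on the units and are not covered by this sketch.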
19. Data: Output related indicators
Report data quality on an aggregated level
• Quality of the output!
• Need to link input quality to output quality
Examples of indicators:
• Precision of estimates of core variables
• Selectivity of core variable totals
20. How to report data quality ?
‘Quality Report Card’
• paper / computerized version
• Place where all results are combined and presented
in an orderly way
Which indicators always?
• Is there a basic/minimum set?
• Hierarchy of quality indicators
Which indicators can be automatically determined?
• Create standardized scripts
• Create a software prototype
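One possible shape for a computerized report card, sketched under assumptions: the class `ReportCard` and its methods are hypothetical and not part of the framework. The scores entered are the CWI columns of Tables 1 and 2. Only the four single-symbol scores of the legend are handled; mixed scores such as +/o or ?(+) would need an extension.

```python
from dataclasses import dataclass, field

# Score legend as used in the checklist result tables.
LEGEND = {"+": "good", "o": "reasonable", "-": "poor", "?": "unclear"}


@dataclass
class ReportCard:
    """Collects dimension scores for one data source (illustrative sketch)."""
    source_name: str
    scores: dict = field(default_factory=dict)  # (hyperdimension, dimension) -> symbol

    def add(self, hyperdimension, dimension, symbol):
        if symbol not in LEGEND:
            raise ValueError(f"unknown score symbol: {symbol!r}")
        self.scores[(hyperdimension, dimension)] = symbol

    def poor_dimensions(self):
        """Dimensions that scored '-', in the order they were added."""
        return [f"{h}: {d}" for (h, d), s in self.scores.items() if s == "-"]

    def render(self):
        lines = [f"Quality Report Card: {self.source_name}"]
        for (h, d), s in self.scores.items():
            lines.append(f"  {h:<9}{d:<22}{s} ({LEGEND[s]})")
        return "\n".join(lines)


card = ReportCard("CWI")
for dimension, symbol in [("Supplier", "+"), ("Relevance", "+"),
                          ("Privacy and Security", "+"), ("Delivery", "-"),
                          ("Procedures", "+")]:
    card.add("Source", dimension, symbol)
for dimension, symbol in [("Clarity", "-"), ("Comparability", "-"),
                          ("Unique keys", "+"), ("Data treatment", "?")]:
    card.add("Metadata", dimension, symbol)

print(card.render())
print(card.poor_dimensions())
```

Results for the Data hyperdimension could be added to the same structure once its indicators are fixed.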
21. Future plans
Fully focus on Data hyperdimension
• Is a lot of work!
Study this in a European context
• BLUE-Enterprise and Trade Statistics project
– 7th Framework program
– From 1-4-2010 till 31-3-2013
– One of the topics is the study of admin. data quality
– This topic is studied jointly by the NSIs of:
Netherlands, Italy, Norway, Slovakia, Sweden
22. Thank you for your attention!
More details in the Q2010-paper
Checklist can be obtained
• From the Statistics Netherlands website
• By mailing pjh.daas@cbs.nl to request a copy