Determination of Administrative Data Quality:
   Recent results and new developments


           Piet J.H. Daas, Saskia J.L. Ossen,
                  and Martijn Tennekes

                 Statistics Netherlands
                 May 6, 2010, Helsinki, Finland
Overview

  Introduction
  View on quality
  Framework developed for admin. data sources
  • Construction and composition
  Application (first part)
  • Checklist and results
  New developments
  • Ideas and future work
  • BLUE-ETS


Introduction
 Statistics Netherlands is increasing its use of
  data (sources) collected and maintained by
  others
  • To decrease response burden and costs


 As a result, Statistics Netherlands:
  • Becomes more dependent on administrative data sources
  • Must be able to monitor the quality of those data
    sources
     – What is ‘quality’ in this context?
View on quality

 Statistics Netherlands defines quality of
  administrative data sources as:
      “Usability for the production of statistics”

 This differs from ‘quality’ as used by the data source
  keeper
     – The keeper often does not have statistical use in mind
     – So the keeper's quality report (if available) cannot
       simply be reused

 And it concerns the quality of the input!
Framework developed
 No standard framework is available for the input quality
  of administrative data sources

 The quality of administrative data is only occasionally
  discussed in the literature
 • The majority of studies on quality and statistics focus on:
     – output quality
     – quality of survey data


 Framework for determining the quality of
  administrative data sources, based on:
 • Statistics Netherlands' experiences and ideas
 • Results published by others
Framework overview (1)
 Many quality indicators were identified
 • In total 57!

 Many dimensions were identified
 • In total 19

 How to combine and structure these indicators?
 • Distinguish different views on quality
 • Alternative name: hyperdimensions

 Three hyperdimensions were needed to combine all
  quality indicators into a single framework!
 • A first step towards a structured approach
Framework overview (2)
  Three high-level views on the input quality of
  administrative data sources
   • 3 hyperdimensions




Three different high-level views on quality

[Figure: the three hyperdimensions Source, Metadata and Data]

  SOURCE:
          - Focus on the data source as a whole
          - Delivery-related aspects
          - and some other things

  METADATA:
          - Focuses on the (availability of the) information
            required to understand and use the data in the
            data source

  DATA:
          - Technical checks
          - Accuracy-related issues
Determine Source and Metadata quality

  With a checklist
   • Used for both Source and
     Metadata

  Tested on 8 administrative data sources
  • Took about 2 hours per data source
    on average

  Results expressed at the
   dimension level
  • 5 dimensions for Source, 4 for Metadata
Checklist results (1) - Source

    Table 1. Evaluation results for the Source hyperdimension

    Dimension                 IPA     SFR     CWI     ERR     1FigHE  1FigSGE NCP     MBA
    1. Supplier               +       +       +       +       +       +       +       +
    2. Relevance              +       +       +       o       +       +       +       +
    3. Privacy and Security   +       +       +       +       +       +/o     +       +
    4. Delivery               o       +       -       +       +       o       +       +
    5. Procedures             +       +/o     +       +/o     +/o     +/o     o       +

    +, good; o, reasonable; -, poor; ?, unclear

    IPA: Insurance Policy records Administration
    SFR: Student Finance Register
    CWI: register of Centre for Work and Income
    ERR: Exam Results Register
    1FigHE: coordinated register for Higher Education
    1FigSGE: coordinated register for Secondary General Education
    NCP: National Car Pass register
    MBA: Dutch Municipal Base Administration
Checklist results (2) - Metadata

    Table 2. Evaluation results for the Metadata hyperdimension

    Dimension                 IPA     SFR     CWI     ERR     1FigHE  1FigSGE NCP     MBA
    1. Clarity                +       +       -       o       +       +       +       +
    2. Comparability          +/o     +       -       +       +       +       +       +
    3. Unique keys            +       +       +       +       +       +       +       +
    4. Data treatment         +/o     ?(+)    ?       ?(o)    ?(+)    ?(+)    +       +

    +, good; o, reasonable; -, poor; ?, unclear
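The statement that CWI is the only negatively scoring source can be read directly off Tables 1 and 2. A minimal sketch of that reading, assuming the scores are transcribed as literal strings (mixed scores such as "+/o" and "?(o)" kept as-is); the names `SOURCE_TABLE`, `METADATA_TABLE` and `flag_sources` are illustrative and not part of the checklist itself.

```python
from typing import Callable

SOURCES = ["IPA", "SFR", "CWI", "ERR", "1FigHE", "1FigSGE", "NCP", "MBA"]

# Scores transcribed from Table 1 (Source), one entry per data source in SOURCES order.
SOURCE_TABLE = {
    "Supplier":             ["+", "+", "+", "+", "+", "+", "+", "+"],
    "Relevance":            ["+", "+", "+", "o", "+", "+", "+", "+"],
    "Privacy and Security": ["+", "+", "+", "+", "+", "+/o", "+", "+"],
    "Delivery":             ["o", "+", "-", "+", "+", "o", "+", "+"],
    "Procedures":           ["+", "+/o", "+", "+/o", "+/o", "+/o", "o", "+"],
}

# Scores transcribed from Table 2 (Metadata).
METADATA_TABLE = {
    "Clarity":        ["+", "+", "-", "o", "+", "+", "+", "+"],
    "Comparability":  ["+/o", "+", "-", "+", "+", "+", "+", "+"],
    "Unique keys":    ["+", "+", "+", "+", "+", "+", "+", "+"],
    "Data treatment": ["+/o", "?(+)", "?", "?(o)", "?(+)", "?(+)", "+", "+"],
}


def flag_sources(is_flagged: Callable[[str], bool], *tables: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each data source to the dimensions whose score satisfies is_flagged."""
    flagged: dict[str, list[str]] = {}
    for table in tables:
        for dimension, scores in table.items():
            if len(scores) != len(SOURCES):
                raise ValueError(f"row {dimension!r} does not have one score per source")
            for source, score in zip(SOURCES, scores):
                if is_flagged(score):
                    flagged.setdefault(source, []).append(dimension)
    return flagged


poor = flag_sources(lambda s: s == "-", SOURCE_TABLE, METADATA_TABLE)
unclear = flag_sources(lambda s: s.startswith("?"), SOURCE_TABLE, METADATA_TABLE)
print(poor)             # {'CWI': ['Delivery', 'Clarity', 'Comparability']}
print(sorted(unclear))  # sources with an unclear ("?") score
```

The second query shows that the unclear scores all fall on Data treatment, which matches the remark that data processing by the data source keeper needs attention.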
Overall conclusions
 Data sources
 • CWI is the only data source with negative scores
    – One is tempted to recommend not using it!
        – Caused by delivery issues and vague definitions
    – However, it is the only administrative data source that contains
      educational data on the non-student part of the population!
    – So its weaknesses should be solved!
 • Other data sources
    – Quite good (there is always room for improvement)
    – Data processing by the data source keeper needs attention

 Checklist
    – A good and fairly fast way to assist the user
    – Provides quality information at a basic but essential level
    – Not all of this information is commonly known!
What about the Data hyperdimension?

  How to study data quality?
  • A draft list of indicators is available
     – 10 dimensions and 26 indicators
  • A structured approach needs to be
    developed!
     1. Data inspection should be efficient
     2. Assist the user with scripts/software (where possible)
  • A checklist?
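As a consistency check on the counts in the framework overview, the sketch below models each hyperdimension as a named group of dimensions. It assumes that the 19 dimensions are the 5 Source and 4 Metadata dimensions of Tables 1 and 2 plus the 10 draft Data dimensions; the Data dimension names are hypothetical placeholders because only their number is given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperdimension:
    """A high-level view on input quality and the quality dimensions it groups."""
    name: str
    dimensions: tuple[str, ...]


SOURCE = Hyperdimension(
    "Source", ("Supplier", "Relevance", "Privacy and Security", "Delivery", "Procedures"))
METADATA = Hyperdimension(
    "Metadata", ("Clarity", "Comparability", "Unique keys", "Data treatment"))
# Assumption: only the number of draft Data dimensions (10) is known; names are placeholders.
DATA = Hyperdimension("Data", tuple(f"data dimension {i}" for i in range(1, 11)))

FRAMEWORK = (SOURCE, METADATA, DATA)


def count_dimensions(framework: tuple[Hyperdimension, ...]) -> int:
    """Total number of dimensions across all hyperdimensions."""
    return sum(len(h.dimensions) for h in framework)


print(count_dimensions(FRAMEWORK))  # 19 under the stated assumption
```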
Overview of data quality approach

[Figure: overview of the data quality approach]
Data: Technical checks

 Very basic
 • For raw data
 • Should be easy and quick
 • No other information required!

 Examples
 •   File size
 •   Number of (unique) units / records received
 •   Metadata compliance (standard for XML files)
 •   Visual checks (data fingerprinting)
      – 2 examples
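A minimal sketch of the first two technical checks (file size, number of records and of unique units), assuming the raw delivery is a CSV file with a header row and a unit-identifier column. The function name `technical_checks`, the column name `unit_id` and the demo file are hypothetical; no information from other sources is used, in line with the checks described above.

```python
import csv
import os
import tempfile


def technical_checks(path: str, key: str) -> dict[str, int]:
    """Basic checks on a raw CSV delivery: file size, records and unique units."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    keys = [row[key] for row in rows]
    n_unique = len(set(keys))
    return {
        "file_size_bytes": os.path.getsize(path),
        "n_records": len(rows),
        "n_unique_units": n_unique,
        "n_repeated_keys": len(rows) - n_unique,  # records whose key occurred earlier
    }


# Demo on a tiny hypothetical delivery in which unit 2 appears twice.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write("unit_id,value\n1,10\n2,\n2,7\n3,5\n")
checks = technical_checks(tmp.name, key="unit_id")
os.remove(tmp.name)
print(checks)
```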
Technical checks: Visualization examples

[Figure: two visualization examples, ‘Missing data’ and ‘Data fingerprinting’]
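A missing-data display can be backed by simple counts. The sketch below assumes records are dicts of strings (as produced by `csv.DictReader`) and treats an empty or blank field as missing; the text map ('#' missing, '.' present) is only a crude stand-in for the graphical displays on the slide, not the data fingerprinting method itself. The records are hypothetical.

```python
def missing_share(rows: list[dict[str, str]]) -> dict[str, float]:
    """Fraction of records with a missing (empty or blank) value, per variable."""
    if not rows:
        return {}
    columns = list(rows[0])
    return {
        c: sum(1 for r in rows if not (r.get(c) or "").strip()) / len(rows)
        for c in columns
    }


def missing_map(rows: list[dict[str, str]], columns: list[str]) -> list[str]:
    """One text line per record: '#' where a value is missing, '.' where present."""
    return [
        "".join("#" if not (r.get(c) or "").strip() else "." for c in columns)
        for r in rows
    ]


# Hypothetical raw records.
records = [
    {"unit_id": "1", "age": "34", "income": ""},
    {"unit_id": "2", "age": "", "income": " "},
    {"unit_id": "3", "age": "51", "income": "2000"},
]
print(missing_share(records))
print("\n".join(missing_map(records, ["unit_id", "age", "income"])))
```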
Data: Accuracy-related indicators
 The first true indicators in the process
 • Information from other data sources is required

 Examples of indicators for units
 • Overcoverage indicator
         – Units in the source that do not belong to the NSI population
 • Undercoverage indicators
     – Missing units
         – NSI-population units not in the source
     – Selectivity
         – Representativity of the units in the data source
           compared to the NSI population (RISQ project)
 • Linkability indicators
         – Correctly and incorrectly linked units, and selectivity of the linked units
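Once units share a common identifier, the coverage indicators reduce to set operations. The sketch assumes the source and the NSI population are both available as sets of unit identifiers. For selectivity it adapts the R-indicator from the RISQ project, R = 1 - 2S(rho), where S(rho) is the standard deviation of the inclusion propensities; as a simplifying assumption, each unit's propensity is estimated as the coverage rate of its cell (for example an age group), whereas a model such as logistic regression would normally be fitted. All names and data are hypothetical.

```python
from collections import Counter
from statistics import stdev


def overcoverage_rate(source_ids: set[str], population_ids: set[str]) -> float:
    """Share of units in the source that do not belong to the NSI population."""
    return len(source_ids - population_ids) / len(source_ids)


def undercoverage_rate(source_ids: set[str], population_ids: set[str]) -> float:
    """Share of NSI-population units that are missing from the source."""
    return len(population_ids - source_ids) / len(population_ids)


def cell_propensities(population: dict[str, str], source_ids: set[str]) -> list[float]:
    """Inclusion propensity per population unit, estimated as the coverage rate
    of its cell (population maps unit id -> cell label)."""
    totals = Counter(population.values())
    covered = Counter(cell for uid, cell in population.items() if uid in source_ids)
    return [covered[cell] / totals[cell] for cell in population.values()]


def r_indicator(propensities: list[float]) -> float:
    """R = 1 - 2 * S(rho); equals 1 when all cells are covered at the same rate."""
    return 1 - 2 * stdev(propensities)


# Hypothetical example: unit "9" is overcoverage, unit "4" is undercoverage.
population = {"1": "young", "2": "young", "3": "old", "4": "old"}
source = {"1", "2", "3", "9"}
print(overcoverage_rate(source, set(population)))   # 0.25
print(undercoverage_rate(source, set(population)))  # 0.25
print(round(r_indicator(cell_propensities(population, source)), 3))  # 0.423
```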
Data: Output-related indicators

 Report data quality at an aggregated level
 • Quality of the output!
 • Input quality needs to be linked to output quality


 Examples of indicators:
 • Precision of estimates of core variables
 • Selectivity of core variable totals
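As one illustration of a precision indicator, the sketch computes the standard error s / sqrt(n) and the relative standard error of a mean. It assumes the values can be treated as a simple random sample, an assumption that often does not hold for register data; the function name `precision_of_mean` and the values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def precision_of_mean(values: list[float]) -> dict[str, float]:
    """Mean of a core variable with its standard error s / sqrt(n) and relative
    standard error, treating the values as a simple random sample."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a standard error")
    estimate = float(mean(values))
    se = stdev(values) / sqrt(n)
    relative_se = se / abs(estimate) if estimate else float("nan")
    return {"estimate": estimate, "se": se, "relative_se": relative_se}


# Hypothetical values of a core variable.
print(precision_of_mean([2, 4, 4, 4, 5, 5, 7, 9]))
```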
How to report data quality?
 ‘Quality Report Card’
 • Paper or computerized version
 • The place where all results are combined and
   presented in an orderly way

 Which indicators should always be reported?
 • Is there a basic/minimum set?
 • Hierarchy of quality indicators

 Which indicators can be determined automatically?
 • Create standardized scripts
 • Create a software prototype
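One possible shape for a computerized Quality Report Card, sketched under the assumption that each indicator result is stored with its hyperdimension, dimension, value and whether a standardized script produced it. All class and field names are hypothetical; the example fills the card with two of CWI's poor checklist scores from Tables 1 and 2.

```python
from dataclasses import dataclass, field

HYPERDIMENSIONS = ("Source", "Metadata", "Data")


@dataclass
class IndicatorResult:
    hyperdimension: str  # one of HYPERDIMENSIONS
    dimension: str
    indicator: str
    value: str           # checklist score ("+", "o", "-", "?") or a formatted number
    automated: bool      # True if produced by a standardized script


@dataclass
class QualityReportCard:
    data_source: str
    results: list[IndicatorResult] = field(default_factory=list)

    def add(self, result: IndicatorResult) -> None:
        if result.hyperdimension not in HYPERDIMENSIONS:
            raise ValueError(f"unknown hyperdimension: {result.hyperdimension}")
        self.results.append(result)

    def render(self) -> str:
        """Plain-text card, grouped by hyperdimension in framework order."""
        lines = [f"Quality Report Card: {self.data_source}"]
        for hyper in HYPERDIMENSIONS:
            rows = [r for r in self.results if r.hyperdimension == hyper]
            if not rows:
                continue
            lines.append(hyper)
            for r in rows:
                how = "automated" if r.automated else "manual"
                lines.append(f"  {r.dimension} / {r.indicator}: {r.value} ({how})")
        return "\n".join(lines)


card = QualityReportCard("CWI")
card.add(IndicatorResult("Metadata", "Clarity", "checklist score", "-", automated=False))
card.add(IndicatorResult("Source", "Delivery", "checklist score", "-", automated=False))
print(card.render())
```

Rendering in a fixed hyperdimension order keeps cards for different sources comparable, whatever order the results were added in.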
Future plans

 Focus fully on the Data hyperdimension
 • This is a lot of work!


 Study this in a European context
 • BLUE-Enterprise and Trade Statistics project (BLUE-ETS)
     –   7th Framework Programme
     –   From 1 April 2010 to 31 March 2013
     –   One of the topics is the study of administrative data quality
     –   This topic is studied jointly by the NSIs of the
         Netherlands, Italy, Norway, Slovakia and Sweden
Thank you for your attention!

 More details in the Q2010 paper
 The checklist can be obtained
 • From the Statistics Netherlands website
 • By e-mailing pjh.daas@cbs.nl to request a copy

PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePedroFerreira53928
 
Accounting and finance exit exam 2016 E.C.pdf
Accounting and finance exit exam 2016 E.C.pdfAccounting and finance exit exam 2016 E.C.pdf
Accounting and finance exit exam 2016 E.C.pdfYibeltalNibretu
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resourcesdimpy50
 

Recently uploaded (20)

How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
 
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Accounting and finance exit exam 2016 E.C.pdf
Accounting and finance exit exam 2016 E.C.pdfAccounting and finance exit exam 2016 E.C.pdf
Accounting and finance exit exam 2016 E.C.pdf
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resources
 

Determination of administrative data quality: recent results and new developments.

  • 1. Determination of Administrative Data Quality: Recent results and new developments
      Piet J.H. Daas, Saskia J.L. Ossen, and Martijn Tennekes
      Statistics Netherlands
      May 6, 2010, Helsinki, Finland
  • 2. Overview
      ▪ Introduction
      ▪ View on quality
      ▪ Framework developed for administrative data sources
          • Construction and composition
      ▪ Application (first part)
          • Checklist and results
      ▪ New developments
          • Ideas and future work
          • BLUE-ETS
  • 3. Introduction
      ▪ Statistics Netherlands increasingly uses data (sources) collected and maintained by others
          • To reduce response burden and costs
      ▪ As a result, Statistics Netherlands:
          • Becomes more dependent on administrative data sources
          • Must be able to monitor the quality of those data sources
              – What is ‘quality’ in this context?
  • 4. View on quality
      ▪ Statistics Netherlands defines the quality of administrative data sources as: “Usability for the production of statistics”
      ▪ This differs from ‘quality’ as used by the data source keeper
          – The keeper often does not have statistical use in mind
          – The keeper's quality report (if available) cannot be used
      ▪ And it is the quality of the input!
  • 5. Framework developed
      ▪ No standard framework is available for the input quality of administrative data sources
      ▪ The quality of administrative data is only occasionally addressed in the literature
          • Most studies on quality and statistics focus on:
              – output quality
              – the quality of survey data
      ▪ Framework for determining the quality of administrative data sources, based on:
          • Statistics Netherlands' experiences and ideas
          • Results published by others
  • 6. Framework overview (1)
      ▪ Many quality indicators were identified
          • 57 in total!
      ▪ Many dimensions were identified
          • 19 in total
      ▪ How to combine and structure these indicators?
          • Distinguish different views on quality
          • Alternative name: hyperdimensions
      ▪ 3 hyperdimensions were required to combine all quality indicators into a single framework!
          • A first step towards a structured approach
  • 7. Framework overview (2)
      ▪ Three high-level views on the input quality of administrative data sources
          • 3 hyperdimensions
  • 8. Framework overview (2)
      ▪ Three high-level views on the input quality of administrative data sources
          • 3 hyperdimensions
      [Figure: 3 different high-level views on quality]
  • 9. 3 different high-level views on quality
      SOURCE:
          – Focus on the data source as a whole
          – Delivery-related aspects
          – And some other things
      METADATA:
          – Focuses on the (availability of the) information required to understand and use the data in the data source
      DATA:
          – Technical checks
          – Accuracy-related issues
  • 10. Determine Source and Metadata quality
      ▪ With a checklist
          • Used for both Source and Metadata
      ▪ Tested on 8 administrative data sources
          • Took about 2 hours per data source on average
      ▪ Results expressed at the dimension level
          • 5 dimensions for Source, 4 for Metadata
  • 11. Checklist results (1): Source
      Table 1. Evaluation results for the Source hyperdimension

      Dimension                IPA   SFR   CWI   ERR   1FigHE  1FigSGE  NCP   MBA
      1. Supplier              +     +     +     +     +       +        +     +
      2. Relevance             +     +     +     o     +       +        +     +
      3. Privacy and Security  +     +     +     +     +       +/o      +     +
      4. Delivery              o     +     -     +     +       o        +     +
      5. Procedures            +     +/o   +     +/o   +/o     +/o      o     +

      +, good; o, reasonable; -, poor; ?, unclear
      IPA: Insurance Policy records Administration
      SFR: Student Finance Register
      CWI: register of the Centre for Work and Income
      ERR: Exam Results Register
      1FigHE: coordinated register for Higher Education
      1FigSGE: coordinated register for Secondary General Education
      NCP: National Car Pass register
      MBA: Dutch Municipal Base Administration
  • 12. Checklist results (2): Metadata
      Table 2. Evaluation results for the Metadata hyperdimension

      Dimension                IPA   SFR   CWI   ERR   1FigHE  1FigSGE  NCP   MBA
      1. Clarity               +     +     -     o     +       +        +     +
      2. Comparability         +/o   +     -     +     +       +        +     +
      3. Unique keys           +     +     +     +     +       +        +     +
      4. Data treatment        +/o   ?(+)  ?     ?(o)  ?(+)    ?(+)     +     +

      Score legend and source abbreviations as for Table 1.
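The symbolic scores in Tables 1 and 2 lend themselves to a simple mechanical scan. The Python sketch below transcribes both tables as printed and lists, for each data source, the dimensions on which it scored '-' (poor). The function name weak_points and the encoding of mixed scores as strings such as "+/o" and "?(+)" are illustrative assumptions, not part of the authors' checklist.

```python
# Scores transcribed from Tables 1 and 2.
# Legend: "+" good, "o" reasonable, "-" poor, "?" unclear.
# Assumed encoding: "+/o" is a mixed score, "?(x)" is unclear but leaning to x.
SOURCES = ["IPA", "SFR", "CWI", "ERR", "1FigHE", "1FigSGE", "NCP", "MBA"]

SOURCE_SCORES = {
    "Supplier":             ["+", "+", "+", "+", "+", "+", "+", "+"],
    "Relevance":            ["+", "+", "+", "o", "+", "+", "+", "+"],
    "Privacy and Security": ["+", "+", "+", "+", "+", "+/o", "+", "+"],
    "Delivery":             ["o", "+", "-", "+", "+", "o", "+", "+"],
    "Procedures":           ["+", "+/o", "+", "+/o", "+/o", "+/o", "o", "+"],
}

METADATA_SCORES = {
    "Clarity":        ["+", "+", "-", "o", "+", "+", "+", "+"],
    "Comparability":  ["+/o", "+", "-", "+", "+", "+", "+", "+"],
    "Unique keys":    ["+", "+", "+", "+", "+", "+", "+", "+"],
    "Data treatment": ["+/o", "?(+)", "?", "?(o)", "?(+)", "?(+)", "+", "+"],
}


def weak_points(*tables):
    """Map each data source to the dimensions on which it scored '-' (poor)."""
    result = {}
    for table in tables:
        for dimension, scores in table.items():
            for source, score in zip(SOURCES, scores):
                if score == "-":
                    result.setdefault(source, []).append(dimension)
    return result


print(weak_points(SOURCE_SCORES, METADATA_SCORES))
# {'CWI': ['Delivery', 'Clarity', 'Comparability']}
```

On the transcribed tables this flags CWI as the only source with poor scores, which agrees with the first conclusion on slide 13.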
  • 13. Overall conclusions
      ▪ Data sources
          • CWI is the only data source with negative scores
              – Tempting to recommend not using it!
              – Caused by delivery issues and vague definitions
              – However, it is the only administrative data source that contains educational data on the non-student part of the population!
              – Solve the weaknesses!!
          • Other data sources
              – Quite OK (there is always something that can be improved)
              – Data processing by the data source keeper needs attention
      ▪ Checklist
          – A good way to assist the user; quite fast
          – Quality information at a basic but essential level
          – Not all information is commonly known!
  • 14. What about the Data hyperdimension?
      ▪ How to study data quality?
          • A draft list of indicators is available
              – 10 dimensions and 26 indicators
          • A structured approach needs to be developed!
              1. Data inspection should be efficient
              2. Assist the user with scripts/software (where possible)
          • A checklist?
  • 15. Overview of data quality approach
      [Figure: overview of the data quality approach]
  • 16. Data: Technical checks
      ▪ Very basic
          • For RAW data
          • Should be easy and quick
          • No other information required!
      ▪ Examples
          • File size
          • Number of (unique) units / records received
          • Metadata compliance (standard for XML files)
          • Visual checks (data fingerprinting)
              – 2 examples
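Several of the technical checks on slide 16 need nothing but the raw file itself. The following is a minimal Python sketch. It assumes the raw data arrive as a comma-separated file with a header row and a single unit-identifier column. The function technical_checks and the column name person_id are hypothetical. The sketch computes the file size, the number of records received and the number of unique units. Metadata compliance of XML files and the visual checks are not covered.

```python
import csv
import os
import tempfile


def technical_checks(path, key_column):
    """Basic checks on a raw CSV file; no other information is needed.

    Hypothetical helper: assumes a header row and one unit-identifier column.
    """
    file_size = os.path.getsize(path)  # file size in bytes
    n_records = 0
    keys = set()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            n_records += 1
            keys.add(row[key_column])
    return {
        "file_size_bytes": file_size,
        "records": n_records,
        "unique_units": len(keys),
        # records whose key already occurred earlier in the file
        "records_with_repeated_key": n_records - len(keys),
    }


# Usage with a small made-up file (person_id is an assumed column name).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("person_id,income\n1,100\n2,200\n2,250\n")
print(technical_checks(tmp.name, "person_id"))
# records=3, unique_units=2, records_with_repeated_key=1
os.remove(tmp.name)
```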
  • 17. Technical checks: Visualization examples
      [Figures: ‘Missing data’ and ‘Data fingerprinting’]
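The ‘Missing data’ visualization rests on a simple per-variable count. The sketch below computes the share of missing values per variable and prints it as text bars. It assumes that records are dictionaries and that both None and the empty string mean "missing". It is a stand-in for the plotted version, not an implementation of the data fingerprinting technique on slide 17. The names ROWS, missing_profile and text_plot are hypothetical.

```python
ROWS = [  # made-up records; None and "" both count as missing
    {"id": "1", "age": "34", "income": ""},
    {"id": "2", "age": "", "income": ""},
    {"id": "3", "age": "51", "income": "1200"},
    {"id": "4", "age": "28", "income": None},
]


def missing_profile(rows, columns):
    """Share of missing values (None or empty string) per column."""
    n = len(rows)
    profile = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        profile[col] = missing / n if n else 0.0
    return profile


def text_plot(profile, width=20):
    """Print each column's missing share as a bar of '#' characters."""
    for col, share in profile.items():
        bar = "#" * round(share * width)
        print(f"{col:<8} {bar:<{width}} {share:.0%}")


text_plot(missing_profile(ROWS, ["id", "age", "income"]))
```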
  • 18. Data: Accuracy-related indicators
      ▪ First true indicators in the process
          • Information from other data sources is required
      ▪ Examples of indicators for units
          • Over-coverage indicator
              – Units in the source that do not belong to the NSI population
          • Under-coverage indicators
              – Missing units: NSI-population units not in the source
              – Selectivity: representativeness of the units in the data source compared to the NSI population (RISQ project)
          • Linkability indicators
              – Correctly and incorrectly linked units, and selectivity of the linked units
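Once the source can be matched to the NSI population, the over- and under-coverage indicators on slide 18 reduce to set differences. This sketch assumes both are given as collections of a shared unit identifier and that matching is exact on that key. Expressing over-coverage as a share of the source and under-coverage as a share of the population is an assumed convention, and the function name is hypothetical. Selectivity and linkability need more information than this and are left out.

```python
def coverage_indicators(source_ids, population_ids):
    """Over- and under-coverage shares, matching units on an exact shared key.

    Hypothetical helper; the choice of denominators is an assumption.
    """
    source = set(source_ids)
    population = set(population_ids)
    if not source or not population:
        raise ValueError("source and population must both be non-empty")
    over = source - population   # units in the source, not in the NSI population
    under = population - source  # NSI-population units missing from the source
    return {
        "over_coverage": len(over) / len(source),
        "under_coverage": len(under) / len(population),
    }


print(coverage_indicators(["A", "B", "C", "X"], ["A", "B", "C", "D", "E"]))
# {'over_coverage': 0.25, 'under_coverage': 0.4}
```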
  • 19. Data: Output-related indicators
      ▪ Report data quality at an aggregated level
          • Quality of the output!
          • Need to link input quality to output quality
      ▪ Examples of indicators:
          • Precision of estimates of core variables
          • Selectivity of core variable totals
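Slide 19 does not say how the precision of core-variable estimates is to be measured. One generic option is a bootstrap standard error of the mean of a core variable. This is shown here only as an assumption, not as the authors' method. The procedure resamples the records with replacement, recomputes the mean each time, and takes the standard deviation of those means. The function name, the number of replicates and the fixed seed are illustrative choices.

```python
import random
import statistics


def bootstrap_se(values, n_boot=1000, seed=1):
    """Bootstrap standard error of the mean of a core variable.

    Resamples the records with replacement n_boot times; the fixed seed
    makes the result reproducible. Generic technique, assumed here.
    """
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)


incomes = [1200, 950, 1430, 1010, 2200, 870, 1650, 1320]  # made-up values
print(round(bootstrap_se(incomes), 1))
```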
  • 20. How to report data quality?
      ▪ ‘Quality Report Card’
          • Paper / computerized version
          • Place where all results are combined and presented in an orderly way
      ▪ Which indicators always?
          • Is there a basic/minimum set?
          • Hierarchy of quality indicators
      ▪ Which indicators can be determined automatically?
          • Create standardized scripts
          • Create a software prototype
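A computerized ‘Quality Report Card’ could start as a list of indicator results, each tagged with its hyperdimension and dimension. A flag would mark the basic/minimum set asked about on slide 20. The sketch below is one hypothetical layout. The IndicatorResult fields, the basic flag and the ordering Source, Metadata, Data are assumptions, and the example values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndicatorResult:
    hyperdimension: str  # "Source", "Metadata" or "Data"
    dimension: str
    indicator: str
    value: str
    basic: bool = False  # hypothetical flag: part of the minimum set


ORDER = {"Source": 0, "Metadata": 1, "Data": 2}  # assumed presentation order


def report_card(results, basic_only=False):
    """Combine indicator results into one ordered, printable report card."""
    selected = [r for r in results if r.basic or not basic_only]
    selected.sort(key=lambda r: (ORDER[r.hyperdimension], r.dimension, r.indicator))
    lines = []
    current = None
    for r in selected:
        if r.hyperdimension != current:  # start a new hyperdimension section
            current = r.hyperdimension
            lines.append(current.upper())
        lines.append(f"  {r.dimension} / {r.indicator}: {r.value}")
    return "\n".join(lines)


RESULTS = [  # made-up example values
    IndicatorResult("Data", "Accuracy", "Over coverage", "0.25", basic=True),
    IndicatorResult("Source", "Delivery", "Checklist score", "-", basic=True),
    IndicatorResult("Data", "Technical checks", "File size (bytes)", "35"),
]
print(report_card(RESULTS, basic_only=True))
# SOURCE
#   Delivery / Checklist score: -
# DATA
#   Accuracy / Over coverage: 0.25
```

Keeping the full result list and filtering at print time means the same data can produce both a minimum card and a complete one.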
  • 21. Future plans
      ▪ Focus fully on the Data hyperdimension
          • This is a lot of work!
      ▪ Study this in a European context
          • BLUE-Enterprise and Trade Statistics project (BLUE-ETS)
              – 7th Framework Programme
              – From 1-4-2010 to 31-3-2013
              – One of the topics is the study of administrative data quality
              – This topic is studied jointly by the NSIs of the Netherlands, Italy, Norway, Slovakia and Sweden
  • 22. Thank you for your attention!
      ▪ More details in the Q2010 paper
      ▪ The checklist can be obtained
          • From the Statistics Netherlands website
          • By mailing pjh.daas@cbs.nl to request a copy