Quality checklist for registers applied to
      online price information and
        offline route information

          Saskia J.L. Ossen, Piet J.H. Daas,
                   and Marco Puts

                Statistics Netherlands
                May 5, 2010, Helsinki, Finland
Overview
 ▪ Introduction
 ▪ Quality framework for registers
 ▪ Checklist for registers
 ▪ Application of checklist to other data sources
   • Offline routing information
   • Online (internet) price information
 ▪ Results
 ▪ Conclusions
 ▪ Future work
Introduction
 ▪ Statistics Netherlands wants to increase its use of
   data (sources) collected and maintained by others
   • Not only registers and administrative data sources
   • But also other data sources
      – internet
      – route information
      – ….

 ▪ As a result, Statistics Netherlands:
   • Becomes more dependent on data sources from others
   • Must be able to monitor the quality of those data sources
      – How?
      – By applying the previously developed checklist for registers?
Quality framework for registers

 ▪ Statistics Netherlands has developed a framework
   for determining the quality of registers

 ▪ Composed of:
   • 3 high-level views on quality (hyperdimensions)
   • Each view focuses on a different group of quality
     aspects
Quality framework
3 different high-level views on quality

  SOURCE:
    - Focuses on the data source as a whole
    - Mainly delivery-related aspects
    - And some other things

  METADATA:
    - Focuses on the (availability of the) information required to
      understand and use the data in the data source

  DATA:
    - Technical checks
    - Accuracy-related issues
Framework composition

  HYPERDIMENSION (Source, Metadata, Data)
        |  n > 1
  DIMENSION (5 for Source, 4 for Metadata)
        |  n >= 1
  QUALITY INDICATOR
        |  1:n
  Measurement method
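
The hierarchy on this slide can be sketched as a small data structure. The Python below is only an illustration and is not part of the presentation: the class names are hypothetical, and the dimension names are taken from the checklist score tables later in the deck. The slides do not list individual indicators or measurement methods, so those are left empty.

```python
from dataclasses import dataclass, field


@dataclass
class QualityIndicator:
    name: str
    # Each indicator has one or more measurement methods (1:n).
    measurement_methods: list[str] = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    # Each dimension has at least one quality indicator (n >= 1).
    indicators: list[QualityIndicator] = field(default_factory=list)


@dataclass
class Hyperdimension:
    name: str
    # Each hyperdimension has more than one dimension (n > 1).
    dimensions: list[Dimension] = field(default_factory=list)


# Dimension names as they appear in the score tables of this deck.
SOURCE = Hyperdimension("Source", [Dimension(n) for n in (
    "Supplier", "Relevance", "Privacy and security", "Delivery", "Procedures")])
METADATA = Hyperdimension("Metadata", [Dimension(n) for n in (
    "Clarity", "Comparability", "Unique keys", "Data treatment")])

print(len(SOURCE.dimensions), len(METADATA.dimensions))
```

The two counts printed match the "5 for Source" and "4 for Metadata" noted in the diagram.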
Determine Source and Metadata quality

 ▪ With a checklist
   • Used for both Source and Metadata

 ▪ Extensively tested on registers

 ▪ What about other data sources?
Apply checklist to other sources

 ▪ (1) Offline route information
   • For Transport statistics
      – Check number of km driven
      – Border crossing(s)
 ▪ Price information on the internet (www)
   • (2) Flight ticket prices (manual and automatic)
   • (3) Supermarket product prices
   • (4) House prices
   • (5) Product prices of unmanned filling stations
Approach used for testing checklist

 ▪ Applied the checklist to 5 data sources
    1. Looked at the scores obtained
       • Identify quality issues
    2. Assessed the ease of use of the checklist
       • Applicability of questions
    3. Looked for missing quality aspects
       • Are any indicators missing?
Checklist scores (1) - Source

Table 1 Evaluation results for the Source hyperdimension
                       Offline route   Internet prices
                       information     Supermarket   Prices of   Prices of          Prices of
                                       prices        houses      filling stations   flight tickets

Supplier               +               ?             ?           ?                  ?
Relevance              +               +             ?           ?                  +
Privacy and security   +               +             +           +                  +
Delivery               +               +             +           +                  +
Procedures             +/o             o/+           o/+         o                  o
+, good; o, reasonable; -, poor; ?, unclear
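
One way to read such a score table is to list, per data source, the dimensions that came out as unclear. The sketch below is illustrative only: the scores are transcribed from the Source table above, while the function name and data layout are assumptions made for this example.

```python
# Scores transcribed from the Source hyperdimension table.
# Legend: "+" good, "o" reasonable, "-" poor, "?" unclear.
SOURCES = [
    "Offline route information",
    "Supermarket prices",
    "Prices of houses",
    "Prices of filling stations",
    "Prices of flight tickets",
]

SOURCE_SCORES = {
    "Supplier":             ["+", "?", "?", "?", "?"],
    "Relevance":            ["+", "+", "?", "?", "+"],
    "Privacy and security": ["+", "+", "+", "+", "+"],
    "Delivery":             ["+", "+", "+", "+", "+"],
    "Procedures":           ["+/o", "o/+", "o/+", "o", "o"],
}


def unclear_dimensions(scores, sources):
    """Map each data source to the dimensions scored '?' (unclear)."""
    result = {name: [] for name in sources}
    for dimension, row in scores.items():
        for name, score in zip(sources, row):
            if score == "?":
                result[name].append(dimension)
    return result


for name, dims in unclear_dimensions(SOURCE_SCORES, SOURCES).items():
    print(f"{name}: {', '.join(dims) or 'none'}")
```

For the offline route information this yields no unclear dimensions, whereas every internet price source has at least Supplier marked as unclear.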
Source conclusions

 ▪ Route information closely resembles registers; no
   quality issues identified

 ▪ Internet data are more difficult
   • Who supplies the price information on a website?
   • Legal issues of collecting data via websites
   • Websites change, often unexpectedly
   • No real deliveries when collecting internet data
Checklist scores (2) - Metadata

Table 1 Evaluation results for the Metadata hyperdimension
                       Offline route   Internet prices
                       information     Supermarket   Prices of   Prices of          Prices of
                                       prices        houses      filling stations   flight tickets

Clarity                +               +/o           +/o         +/o                +/o
Comparability          +               +             ?           ?                  +
Unique keys            +               +             +           +                  +
Data treatment         o               +             +           +                  +
+, good; o, reasonable; -, poor; ?, unclear
Metadata conclusions

 ▪ No major issues for the Metadata part of the checklist

 ▪ Routing information: no problems

 ▪ Internet data: somewhat more difficult
   • Clarity of the internet population
   • Clarity of the time periods to which prices refer
Checklist applicability

Table 5 Applicability of the quality checklist for the Source hyperdimension
                       Offline route information   Internet prices
Supplier               +                           -
Relevance              +                           +
Privacy and security   o                           o
Delivery               +                           -
Procedures             +                           o

Table 6 Applicability of the quality checklist for the Metadata hyperdimension
                       Offline route information   Internet prices
Clarity                +                           +
Comparability          +                           +
Unique keys            +                           +
Data treatment         +                           o

relevant (+), partly relevant (o), generally not directly applicable (-)
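
The two applicability tables can be combined to show which checklist dimensions are not fully relevant for each type of data source. This is a hedged sketch: the marks are copied from Tables 5 and 6, and the helper name `not_fully_relevant` is hypothetical.

```python
# Applicability marks transcribed from Tables 5 and 6.
# Legend: "+" relevant, "o" partly relevant,
#         "-" generally not directly applicable.
APPLICABILITY = {
    # dimension: (offline route information, internet prices)
    "Supplier":             ("+", "-"),
    "Relevance":            ("+", "+"),
    "Privacy and security": ("o", "o"),
    "Delivery":             ("+", "-"),
    "Procedures":           ("+", "o"),
    "Clarity":              ("+", "+"),
    "Comparability":        ("+", "+"),
    "Unique keys":          ("+", "+"),
    "Data treatment":       ("+", "o"),
}

ROUTE, INTERNET = 0, 1  # column positions in the tuples above


def not_fully_relevant(column):
    """Return the dimensions whose mark in the given column is not '+'."""
    return [dim for dim, marks in APPLICABILITY.items() if marks[column] != "+"]


print("Offline route information:", not_fully_relevant(ROUTE))
print("Internet prices:", not_fully_relevant(INTERNET))
```

Only one dimension is flagged for route information, against five for internet prices, most of them in the Source hyperdimension.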
Missing quality aspects

 ▪ Only for internet data
   • Availability of the website
   • Burden on the website
   • Errors in the data on the website
   • Representativeness of website information
   • Possibility of collecting data automatically
Overall conclusions
 ▪ Source hyperdimension
   • Directly applicable to route information
   • Inherent differences for internet prices
 ▪ Metadata hyperdimension
   • Generally applicable

 ▪ Future research will focus on:
   • Adapting the checklist to internet data
   • Legal issues for internet data
   • Data quality
Thank you for your attention!

 ▪ Questions?
