Re-thinking Data Validation
Anette Morgils Hertz – ahz@dst.dk
Katja Overgaard – kao@dst.dk
Statistics Denmark
10/18/2019 1
Motivation
• Look at data across statistical domains
• Special knowledge of the international production of large multinationals is necessary
• Complex and time-consuming
Motivation
• Given the same resources, how do we establish this new validation routine?
• Rearranging our validation routines
• Always focus on the errors that matter the most (Key Account System)
• Monitoring the performance of the validation routines to ensure they perform optimally (monitoring report)
Agenda
• About us
• SKV and Validation team
• Validation Routines then and now
• Monitoring the Validation Routines
• Key Account System
• Conclusion
About us
• What is International Trade in Goods (ITGS)?
• Import/Export of goods
• Around 9000 commodity codes and 250 country codes
• Published monthly
• What is International Trade in Services (ITSS)?
• Import/Export of services
• Around 70 service codes and 250 country codes
• Published in aggregated form monthly and in detail quarterly
• What is Balance of Payments?
• Uses data from different sources, incl. ITGS and ITSS
SKV and Validation Team
• SKV team
• SKV → Company critical to our statistics
• Team dedicated to validating the data reported by these companies
• Validation team
• Responsible for all other validation and communication with the companies
Team         No. companies validated   Value validated
SKV          21                        42,3 Bill.
Validation   82                        4,01 Bill.
Validation Routines then and now
• Old system: 59 different validation routines
• Now: some validation routines have been closed, others rearranged
Key Account System
One person = One company
Key Account System
Changing weights →
Monitoring Validation Routines
• Monitoring report
• Measuring the quality of a validation routine:
\[ \text{hit rate} = \frac{\text{actual errors}}{\text{identified probable errors}} \]
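The hit rate can be computed per validation routine and used to compare routines. The following is a minimal sketch, not the monitoring report itself: the class `RoutineStats`, the routine names and all counts are hypothetical, chosen only to illustrate the ratio of actual errors to identified probable errors.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RoutineStats:
    """Counts for one validation routine (hypothetical structure)."""
    name: str
    flagged: int    # identified probable errors
    confirmed: int  # actual errors among the flagged records


def hit_rate(stats: RoutineStats) -> Optional[float]:
    """Return actual errors / identified probable errors.

    Returns None when the routine flagged nothing, since the ratio
    is undefined in that case.
    """
    if stats.flagged == 0:
        return None
    return stats.confirmed / stats.flagged


def rank_routines(routines: List[RoutineStats]) -> List[Tuple[str, Optional[float]]]:
    """Sort routines by hit rate, highest first; undefined rates go last."""
    scored = [(r.name, hit_rate(r)) for r in routines]
    return sorted(scored, key=lambda pair: (pair[1] is None, -(pair[1] or 0.0)))


# Made-up example counts, for illustration only.
example = [
    RoutineStats("routine_a", flagged=40, confirmed=10),
    RoutineStats("routine_b", flagged=5, confirmed=4),
    RoutineStats("routine_c", flagged=0, confirmed=0),
]
ranking = rank_routines(example)
```

A routine with a low hit rate flags many records that turn out to be correct, which makes it a candidate for being closed or rearranged.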
Monitoring Validation Routines
• Graphs for comparison
• Absolute errors
Period    Value in DKK   Amount
2019M05   3.272.563      1
Conclusion
• To spend our resources optimally, we must monitor every validation routine
• A change of focus is necessary
• Start with the potential errors with the biggest impact
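Starting with the potential errors that have the biggest impact can be sketched as a simple ordering of flagged records. This is only an illustration under stated assumptions: impact is taken to be the absolute reported value, which the slides do not specify, and the field names `company` and `value_dkk` as well as all figures are hypothetical.

```python
from typing import Dict, List, Optional


def prioritise(flagged: List[Dict], top_n: Optional[int] = None) -> List[Dict]:
    """Order flagged records by assumed impact (absolute value), largest first.

    If top_n is given, only that many records are returned.
    """
    ordered = sorted(flagged, key=lambda rec: abs(rec["value_dkk"]), reverse=True)
    return ordered if top_n is None else ordered[:top_n]


# Made-up flagged records, for illustration only.
flagged_records = [
    {"company": "A", "value_dkk": 120_000},
    {"company": "B", "value_dkk": -5_400_000},
    {"company": "C", "value_dkk": 950_000},
]
worklist = prioritise(flagged_records, top_n=2)
```

With a fixed amount of staff time, working through such a list from the top means the records left unchecked are the ones with the smallest assumed impact.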
Thank you!
Anette Morgils Hertz, email: ahz@dst.dk
Katja Overgaard, email: kao@dst.dk
