Bad Data: Why Do We Care? The Move Toward Data-Driven Government

Presentation on bad data given by Stefaan G. Verhulst of The GovLab to participants in a webinar hosted by the American Society for Public Administration on July 26, 2017.

Published in: Data & Analytics

  1. BAD DATA: THE HOBGOBLIN OF EFFECTIVE GOVERNMENT. STEFAAN VERHULST
  2. BAD DATA: WHY DO WE CARE? THE MOVE TOWARD DATA-DRIVEN GOVERNMENT
  3. IMPROVING GOVERNMENT; EMPOWERING CITIZENS; CREATING OPPORTUNITY; SOLVING PUBLIC PROBLEMS
  4. BAD DATA: WHY DO WE CARE? IMPACTS THE QUALITY OF DECISIONS AND UNDERMINES THE RELIABILITY OF DATA-DRIVEN POLICY MAKING. A Kansas City audit showed that employees were labeling cases brought in through 311 as closed, even though the problems had never been fixed. Denver’s Road Home planned to end homelessness, but the city didn’t keep any data.
  5. BAD DATA: WHY DO WE CARE? CIVIL LIBERTY CONCERNS AND LIABILITY RISKS. An audit of the Mental Health and Addiction Services Division in Multnomah County, Oregon, found errors, inconsistencies and haphazard coding. A Dallas audit on the Security of Weapons Inventories and Storage found that police department employees could “add, delete and modify sensitive data.”
  6. BAD DATA: WHY DO WE CARE? INCREASES COSTS, WASTE AND INEFFICIENCIES. 60%: the estimated fraction of time that data scientists spend cleaning and organizing data, according to CrowdFlower.
  7. BAD DATA: WHY DO WE CARE? Undermines trust in government, which is already at an all-time low.
  8. BAD DATA? Usefulness? Jack Olson: “data has quality if it satisfies the requirements of its intended use.” Data attributes? Wang and Strong propose fifteen data dimensions that determine bad/quality data, assembled into four categories: ACCURACY, RELEVANCY, REPRESENTATION, ACCESSIBILITY.
  9. BAD DATA AND THE DATA VALUE CHAIN
       • COLLECTION: poor/dirty data entry; duplication; inconsistencies; non-representation/bias
       • PROCESSING: insufficient security provisions; aggregation and correlation challenges
       • SHARING: lack of interoperable institutional norms and practices; improper or unauthorized access; conflicting legal jurisdiction; different levels of security
       • ANALYZING: inaccurate data modeling; biased algorithms; poor problem definition/design
       • USING: faulty reporting; lack of understanding; misinterpretation
  10. BAD DATA: COLLECTION STAGE. POOR/DIRTY DATA ENTRY; DUPLICATION; INCONSISTENCIES; NON-REPRESENTATION/BIAS. Gartner predicts that 25% of Fortune 1000 companies will have information that is inaccurate, incomplete or duplicated.
  11. BAD DATA: PROCESSING STAGE. INSUFFICIENT SECURITY PROVISIONS; AGGREGATION AND CORRELATION CHALLENGES. CONSIDER: 9,040,595,509 data records have been lost or stolen since 2013. The number of US data breaches tracked in 2016 increased over 40% from the previous year.
  12. BAD DATA: ANALYSIS STAGE. INACCURATE DATA MODELING; BIASED ALGORITHMS; POOR PROBLEM DEFINITION/DESIGN. FOR INSTANCE: Algorithms in Florida’s criminal courts produced biased risk predictions; African American defendants were 77% more likely to be considered “higher risk” of committing crimes than their Caucasian counterparts.
  13. BAD DATA: USE STAGE. FAULTY REPORTING; LACK OF UNDERSTANDING; MISINTERPRETATION
  14. BAD DATA: DETERMINING FACTORS
       • TECHNOLOGICAL CHALLENGES AND MISCONFIGURATIONS
       • INDIVIDUAL OR INSTITUTIONAL NORMS AND STANDARDS OF QUALITY
       • LEGAL CONFUSION OR GAPS
       • MISALIGNED INCENTIVES OR INTERESTS
  15. THANK YOU. stefaan@thegovlab.org, thegovlab.org
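
The collection-stage problems named in the slides (poor data entry, duplication, inconsistencies) can often be surfaced with simple record checks. The following is a minimal sketch, not a method from the presentation: the record layout, the field names (`case_id`, `status`, `resolved_on`), the sample values, and the function `find_collection_issues` are all hypothetical and invented for illustration.

```python
# Hypothetical 311-style service records. Field names and values are
# invented for illustration and do not come from any real audit.
records = [
    {"case_id": "A-100", "status": "closed", "resolved_on": "2017-03-02"},
    {"case_id": "A-101", "status": "closed", "resolved_on": None},
    {"case_id": "A-101", "status": "closed", "resolved_on": None},
    {"case_id": "A-102", "status": "Open ", "resolved_on": None},
    {"case_id": "", "status": "open", "resolved_on": None},
]

# Assumed controlled vocabulary for the status field.
ALLOWED_STATUSES = {"open", "closed"}


def find_collection_issues(rows):
    """Return a dict mapping each issue name to the row indices that show it."""
    issues = {
        "duplicate": [],                  # same case_id seen earlier
        "missing_id": [],                 # empty or absent case_id
        "bad_status": [],                 # status outside the allowed set
        "closed_without_resolution": [],  # marked closed, no resolution date
    }
    seen_ids = set()
    for index, row in enumerate(rows):
        case_id = (row.get("case_id") or "").strip()
        if not case_id:
            issues["missing_id"].append(index)
        elif case_id in seen_ids:
            issues["duplicate"].append(index)
        else:
            seen_ids.add(case_id)

        status = row.get("status")
        if status not in ALLOWED_STATUSES:
            issues["bad_status"].append(index)
        if status == "closed" and not row.get("resolved_on"):
            issues["closed_without_resolution"].append(index)
    return issues


issues = find_collection_issues(records)
for name, row_indices in issues.items():
    print(name, row_indices)
```

The `closed_without_resolution` check is a rough analogue of the pattern described in the Kansas City example, where cases were marked closed although the underlying problem had not been fixed. A real audit would need domain-specific rules rather than these assumed ones.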
