Data Quality
Issues and “Fixes”
Two Definitions of Quality
• Conformance to Requirements
• (Traditional Producer-Oriented
  Definition)
• Fitness for Use
• (Modern Client-Oriented
  Definition)
Definition of Process Quality
• Process Improvements Focus
• (Do It Right the First Time)
• Can be Reduced to Slogans
• Can also lead to Continuous
  Improvements
• Kaizen
Be Real: Four Quality Costs
• Costs of Reputation and Loss of
  Business from Inaction
• Cost of Prevention to Avoid Errors
• Cost of Detection to Find Errors
• Cost of Repairing Errors Found
Quality and Cost: Two Worlds
Repair Methods
• Goal is “Fixing” to Fit Use
• Data Editing
• Data Imputation
• Data Fabrication
• Raking at NSS
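Raking is commonly understood as iterative proportional fitting: cell values are rescaled in turn until row and column totals match known control totals. The slide does not say how raking is applied at NSS, so what follows is only a generic sketch. The function name `rake`, the tolerance, and the sample table are illustrative assumptions.

```python
def rake(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale a 2-D table so its margins approach the target totals.

    Generic iterative proportional fitting; names and defaults are
    hypothetical and not taken from the slides.
    """
    cells = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                factor = target / total
                cells[i] = [v * factor for v in cells[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                factor = target / total
                for row in cells:
                    row[j] *= factor
        # Columns now match exactly; stop once rows also match.
        gap = max(abs(sum(cells[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return cells


# Made-up survey counts and control totals (both sets of targets sum to 170).
sample = [[40, 30], [35, 50]]
raked = rake(sample, row_targets=[80, 90], col_targets=[85, 85])
```

The row and column targets must sum to the same grand total, otherwise the two scaling steps can never agree.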
Data Editing
• Honest Differences of Opinion or
  Real Errors?
• Need for Redundancy in System for
  Can’t Fail Items
• Achieving Measurability to Frame
  Expectations and Improvements
Data Editing Techniques
• Minimizing Processing Errors
• Definitional (e.g., Range) Tests
• Deterministic Tests
• Probabilistic Tests
    – Outlier Tests
    – Ratio Tests
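The two probabilistic tests named above can be sketched as follows. The interquartile-range rule for outliers and the ratio bounds of 0.5 and 2.0 are assumptions chosen for illustration; the slides do not specify any particular rule or cutoff.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k IQRs outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def ratio_flags(current, previous, low=0.5, high=2.0):
    """Return indexes where current/previous falls outside [low, high]."""
    flagged = []
    for i, (cur, prev) in enumerate(zip(current, previous)):
        if prev == 0:
            flagged.append(i)  # ratio undefined: send to manual review
            continue
        ratio = cur / prev
        if ratio < low or ratio > high:
            flagged.append(i)
    return flagged


# Made-up values for illustration only.
incomes = [31_000, 42_000, 38_500, 45_000, 40_000, 1_000_000]
print(iqr_outliers(incomes))                         # [1000000]
print(ratio_flags([105, 300, 98], [100, 100, 100]))  # [1]
```

Both functions only flag values for review; deciding whether a flagged value is a real error is left to the editor.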
Types of Edits Illustrated
• Range Test
    Age Negative
• Deterministic Tests
    If Age ≤ 14, then code as Child
• Probabilistic Tests
    If Income > $1,000,000, take a look
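The three edit types illustrated above might be coded along these lines. The comparison operators in the slide are not fully legible, so this sketch assumes "age 14 or younger" for the Child rule and "income above $1,000,000" for the review flag. The field names and the `run_edits` function are hypothetical.

```python
CHILD_MAX_AGE = 14                  # assumed cutoff: 14 or younger
INCOME_REVIEW_THRESHOLD = 1_000_000  # assumed rule: strictly above this


def run_edits(record):
    """Return (edited_record, flags) without changing the input record."""
    edited = dict(record)
    flags = []
    age = edited.get("age")
    income = edited.get("income")

    if age is not None and age < 0:
        # Range test: a negative age is impossible by definition.
        flags.append("range: age is negative")
    elif age is not None and age <= CHILD_MAX_AGE:
        # Deterministic test: the rule fixes the code with certainty.
        edited["category"] = "Child"

    if income is not None and income > INCOME_REVIEW_THRESHOLD:
        # Probabilistic test: unusual but possible, so flag, do not change.
        flags.append("probabilistic: income unusually high, take a look")

    return edited, flags


edited, flags = run_edits({"age": 10, "income": 2_000_000})
```

Note that only the deterministic rule changes the record. The range and probabilistic tests record a flag and leave the value for a person to review.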
Practical Editing Tips
• Edit for Diagnosis, not just
  Correction
• Don’t Edit Outside Your Confidence
  Interval
• Preserve the Original Dataset as
  Backup to Avoid Irreversible
  Changes
• Keep Tallies of all Errors Found
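Two of the tips above, preserving the original dataset and keeping tallies of errors, can be combined in one small routine. This is a minimal sketch; the `edit_with_audit` function, the check names, and the record fields are all hypothetical.

```python
import copy
from collections import Counter


def edit_with_audit(records, checks):
    """Apply checks to a deep copy of records; return (edited, tally).

    Each check is a (name, is_error, fix) tuple; fix may be None when
    the error should be counted but not corrected.
    """
    edited = copy.deepcopy(records)  # the original stays untouched
    tally = Counter()
    for row in edited:
        for name, is_error, fix in checks:
            if is_error(row):
                tally[name] += 1
                if fix is not None:
                    fix(row)
    return edited, tally


checks = [
    ("negative_age",
     lambda r: r["age"] is not None and r["age"] < 0,
     lambda r: r.update(age=None)),          # blank out impossible ages
    ("missing_income",
     lambda r: r.get("income") is None,
     None),                                  # count only, no correction
]

raw = [
    {"age": -1, "income": 100},
    {"age": 30, "income": None},
    {"age": -5, "income": None},
]
edited_rows, tally = edit_with_audit(raw, checks)
print(dict(tally))  # {'negative_age': 2, 'missing_income': 2}
```

Because the tally is kept per check, it also supports the diagnostic use of edits: a check that fires often points to a problem upstream in the system rather than in individual records.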
Not All Errors Need to Be Corrected
• Resist your Perfectionist Tendencies
More Practical Edit Tips
• Use your skilled staff to
  improve system rather than
  just edit data
• Never just depend on Intuition
  but still use it too!
• Employ Redundancy, Frugally!
Capture-Recapture Methods (Double Keying Example)
• Two-by-Two Table with Cells
                A   B
                C   D
• Compare Data Keyed the Same Each
  Time (A) with Errors Detected (B and C)
• How to Estimate D?
• One Model: D = BC/A?
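The model D = BC/A follows from assuming the two keyings are independent: under independence the cross-product ratio AD/(BC) equals 1, which gives D = BC/A for the cell that cannot be observed directly. A minimal sketch of the arithmetic, with made-up counts and a hypothetical function name:

```python
def estimate_unseen_cell(a, b, c):
    """Estimate cell D of a 2x2 table as B*C/A, assuming independence."""
    if a <= 0:
        raise ValueError("cell A must be positive to estimate D")
    return b * c / a


# Made-up counts for illustration only.
a, b, c = 9_800, 120, 80
d_hat = estimate_unseen_cell(a, b, c)
print(round(d_hat, 2))  # 0.98
```

The estimate is only as good as the independence assumption. If both keyers tend to stumble on the same hard-to-read entries, this model will understate D.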
Bottom Line Take-Away
• Use Data Checking to
  Understand Data’s Fitness for
  Use
• Edit but Don’t Over-Edit
• Use Edit Checks to Prevent
  Future Errors
Data Editing and Data Imputation
• Joint Role of Imputation and
  Editing: No Clear Line?
• Editing “Fixes” Are Often
  Model-Based Hunches
• Data Quality (editing)
• Information Quality
  (imputation)
