ILCS Raking

  1. CRRC Data Quality Issues and “Fixes” • Dr. Fritz Scheuren • July 3, 2009 (for academic purposes only)
  2. Two Definitions of Quality • Conformance to Requirements (traditional, producer-oriented definition) • Fitness for Use (modern, client-oriented definition)
  3. Definition of Process Quality • Focus on Process Improvements (Do It Right the First Time) • Can Be Reduced to Slogans • Can Also Lead to Continuous Improvement (Kaizen)
  4. Be Real: Four Quality Costs • Cost to Reputation and Loss of Business from Inaction • Cost of Prevention to Avoid Errors • Cost of Detection to Find Errors • Cost of Repairing Errors Found
  5. Quality and Cost: Two Worlds
  6. Repair Methods • The Goal Is “Fixing” to Fit Use • Data Editing • Data Imputation • Data Fabrication • Raking at NSS
  7. Data Editing • Honest Differences of Opinion or Real Errors? • Need for Redundancy in the System for Can’t-Fail Items • Achieving Measurability to Frame Expectations and Improvements
  8. Data Editing Techniques • Minimizing Processing Errors • Definitional (e.g., Range) Tests • Deterministic Tests • Probabilistic Tests (Outlier Tests, Ratio Tests)
  9. Types of Edits Illustrated • Range Test: Age Is Negative • Deterministic Test: If Age = 14, Then Code as Child • Probabilistic Test: If Income > $1,000,000, Take a Look
  10. Practical Editing Tips • Edit for Diagnosis, Not Just Correction • Don’t Edit Outside Your Confidence Interval • Preserve the Original Dataset as a Backup to Avoid Irreversible Changes • Keep Tallies of All Errors Found
  11. Not All Errors Need to Be Corrected • Resist Your Perfectionist Tendencies
  12. More Practical Editing Tips • Use Your Skilled Staff to Improve the System Rather Than Just Edit Data • Never Depend on Intuition Alone, but Still Use It! • Employ Redundancy, Frugally!
  13. Capture-Recapture Methods (Double-Keying Example) • Two-by-Two Table with Cells A, B, C, D • Compare Items Keyed the Same Both Times (A) with Errors Detected (B and C) • How to Estimate D? • One Model: D = BC/A?
  14. Bottom-Line Takeaways • Use Data Checking to Understand the Data’s Fitness for Use • Edit, but Don’t Over-Edit • Use Edit Checks to Prevent Future Errors
  15. Data Editing and Data Imputation • Joint Role of Imputation and Editing: No Clear Line? • Editing “Fixes” Are Often Model-Based Hunches • Editing Serves Data Quality • Imputation Serves Information Quality
  16. Imputation Versus Editing • What Is Imputation? • Handles Missing and Misreported Data • Imputation Goal: Roughly Right (Information Quality) • Editing Goal: Often “Correction,” Exactly Right? (Data Quality)
  17. Data Imputation Techniques • Imputation Needs More Justification When Data Quality Is the Goal • It Must Be No More Than Cosmetic, If Done at All • It Can Be Applied Aggressively Only for an Information Quality Goal
  18. Fellegi-Holt Example • Identify Errors with Automated Edit-Detection Software • Hot-Deck Acceptable Values from Records That Pass Edits • Can Be Worth Doing If Errors Are Minor or Cosmetic (e.g., Rounding)
  19. More on Imputation • Treat Influential Errors Individually, Not Just Automatically • That Said, Software Fixes Can Lead to Better Documentation (Paradata Matters) • Need to Measure Variance Impacts • Provides a Natural Brake on Over-Editing, but Is Seldom Used for This
  20. Edit/Imputation Summary • Most Editing Mainly Eliminates the Bad • Replacing It with a (Good?) Guess of Some Sort • Imputation Emphasizes Guessing Even More
  21. More on Editing/Imputation • Best Imputation Practice Tries to Quantify the Impact of Guessing on Information Quality • Editing Has Not Improved as Much as Imputation • Editing/Imputation Needs More Joint Theory, Especially to Measure and Use Mean Square Error Impacts
  22. First Illustrative Example • Fabrication/Falsification • Illustrates the General Points about Editing and Imputation • Emphasizes the Importance of the Fabrication Threat to Quality
  23. Fabrication/Falsification • Respondents/Interviewers Make Up Data • How Common? • How to Reduce? • How to Detect?
  24. Right Structure, Right Resources • Examine Practice Elsewhere? • Website • The Key Is the Right Incentives • Good Staff/Training • But Eternal Vigilance
  25. Second Illustration • Raking Application at NSS • To Link Up to the Next Talk • To Illustrate Information Quality That Is Fit for Use Despite Data Quality Limitations
  26. Raking Quality “Fix” • What Is Raking? • How Does It Improve Quality? Not Data Quality but Information Quality • Sometimes Both: Better Point Estimates, More Stable (Smaller Variances)
  27. Quality Summary • Editing: Data Quality • Imputation: Information Quality • Raking: Information Quality • Fabrication: Can Harm Both • Must Always Be Guarded Against
  28. Almost Done Now • Tried to Stay Practical, with a Frank Discussion of Key Weaknesses in Current Practice • Deeper Understanding of Data Quality • But at an Applied Level
  29. Thank You • Fritz Scheuren
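The edit types on slides 8 and 9 can be sketched in Python. This is a minimal illustration, not the author's or NSS's procedure: the record layout (`age`, `status`, `income`) and the function names `edit_record` and `run_edits` are hypothetical, and the thresholds only mirror slide 9's examples. In the spirit of slide 10's tips, it flags records for diagnosis instead of changing them and keeps a tally of every error type.

```python
from collections import Counter

# Illustrative sketch; function names and the record layout are hypothetical.
# Each record is a dict with "age", "status", and "income"; thresholds mirror
# the slide's examples only.


def edit_record(rec):
    """Return the list of edit flags raised by one record (the record is not changed)."""
    flags = []
    # Range (definitional) test: a negative age is impossible.
    if rec["age"] < 0:
        flags.append("range:age_negative")
    # Deterministic test: an age of 14 implies the "child" code.
    if rec["age"] == 14 and rec["status"] != "child":
        flags.append("deterministic:age14_not_child")
    # Probabilistic test: a very large income may be real, so flag it for review.
    if rec["income"] > 1_000_000:
        flags.append("probabilistic:income_review")
    return flags


def run_edits(records):
    """Flag records for diagnosis and tally every error type found."""
    flagged = {}
    tallies = Counter()
    for i, rec in enumerate(records):
        flags = edit_record(rec)
        if flags:
            flagged[i] = flags
            tallies.update(flags)
    return flagged, tallies


records = [
    {"age": 34, "status": "adult", "income": 42_000},
    {"age": -3, "status": "adult", "income": 15_000},
    {"age": 14, "status": "adult", "income": 0},
    {"age": 51, "status": "adult", "income": 2_500_000},
]
flagged, tallies = run_edits(records)
print(flagged)
print(dict(tallies))
```

Because nothing is overwritten, the original records stay intact as the backup slide 10 recommends, and the tallies show where the process itself might be improved.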
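Slide 13's model D = BC/A is the independence estimate familiar from capture-recapture: if the two keyings err independently, the cross-product ratio AD/(BC) equals one. A minimal sketch, assuming a table layout the slide does not spell out: rows are the first keying (correct/wrong) and columns the second, so A = both correct, B = only the second wrong, C = only the first wrong, and D = both wrong in the same way, hence undetected. It also assumes disagreements can be adjudicated to tell B from C. Strictly, observed agreements include the D cell, so using them as A is an approximation that works best when error rates are small. Function names and counts are hypothetical.

```python
# Illustrative sketch; function names and counts are hypothetical.
# Assumed 2x2 layout: rows = first keying, columns = second keying.
#   A = both correct, B = only second wrong, C = only first wrong, D = both wrong.


def estimate_undetected(a, b, c):
    """Independence-model estimate of cell D (errors made identically in both keyings)."""
    if a <= 0:
        raise ValueError("cell A must be positive")
    return b * c / a


def keying_error_rates(a, b, c):
    """Estimated error rate of each keying pass, including the unseen cell D."""
    d = estimate_undetected(a, b, c)
    n = a + b + c + d           # estimated total number of items keyed
    first_pass = (c + d) / n    # first keying wrong: cells C and D
    second_pass = (b + d) / n   # second keying wrong: cells B and D
    return d, first_pass, second_pass


# Hypothetical counts: 9,800 items correct both times, 120 where only the
# second keying was wrong, 80 where only the first keying was wrong.
d_hat, rate1, rate2 = keying_error_rates(9_800, 120, 80)
print(f"D = {d_hat:.2f}, first-pass rate = {rate1:.4f}, second-pass rate = {rate2:.4f}")
```

With these made-up counts the model expects fewer than one undetected item, which is the kind of evidence that can justify not re-keying everything a third time.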
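Slide 18 pairs Fellegi-Holt edit detection with hot-deck imputation. The sketch below covers only the hot-deck step, under simplifying assumptions: the field to impute is already known (a real Fellegi-Holt system first finds the smallest set of fields to change), donors are drawn at random from records in the same imputation class that pass the edits, and the edit rule, field names, and function names are hypothetical. It returns new records so the original data survive, and it marks imputed values as a simple form of the paradata slide 19 mentions.

```python
import random

# Illustrative sketch; the edit rule, field names, and function names are hypothetical.


def passes_edits(rec):
    """Hypothetical edit rule: a plausible age lies between 0 and 120."""
    return 0 <= rec["age"] <= 120


def hot_deck(records, field, passes, class_key, seed=0):
    """Random hot deck within imputation classes (not a full Fellegi-Holt system).

    Records failing `passes` get `field` replaced by a value drawn from a
    passing donor in the same class. Returns new dicts; inputs are untouched.
    """
    rng = random.Random(seed)  # seeded so the imputation is reproducible
    donors = {}
    for rec in records:
        if passes(rec):
            donors.setdefault(rec[class_key], []).append(rec[field])

    result = []
    for rec in records:
        new = dict(rec)
        new["imputed"] = False  # paradata: was this value imputed?
        pool = donors.get(rec[class_key])
        if not passes(rec) and pool:
            new[field] = rng.choice(pool)
            new["imputed"] = True
        result.append(new)
    return result


records = [
    {"id": 1, "region": "north", "age": 34},
    {"id": 2, "region": "north", "age": 250},  # fails the edit
    {"id": 3, "region": "north", "age": 41},
    {"id": 4, "region": "south", "age": 29},
]
result = hot_deck(records, "age", passes_edits, "region", seed=1)
print([(r["id"], r["age"], r["imputed"]) for r in result])
```

A failing record with no passing donor in its class is left unchanged, so it can be reviewed individually rather than fixed automatically.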
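Slide 19 calls for measuring the variance impact of imputation. One standard way, not named on the slides, is multiple imputation pooled with Rubin's rules: impute m times, analyze each completed dataset, then combine. A minimal sketch; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance

# Illustrative sketch of Rubin's combining rules; the function name is hypothetical.


def combine_imputations(estimates, within_variances):
    """Pool m completed-data results with Rubin's rules.

    estimates:        the point estimate from each of the m imputed datasets
    within_variances: the usual complete-data variance of each estimate
    Returns (pooled estimate, total variance, share of variance due to imputation).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need m >= 2 estimates with one variance each")
    pooled = mean(estimates)
    within = mean(within_variances)         # average within-imputation variance
    between = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    share = (1 + 1 / m) * between / total if total > 0 else 0.0
    return pooled, total, share


# Hypothetical results from three imputed datasets.
pooled, total, share = combine_imputations([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
print(pooled, round(total, 4), round(share, 4))
```

Here the between-imputation variance is 1, so the total variance is 4 + (4/3)(1), about 5.33, and a quarter of it comes from imputation; treating a single imputed dataset as fully observed would have reported only 4.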
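Slide 21 asks for joint theory to measure mean square error impacts. For reference, the standard decomposition for an estimator θ̂ of a target θ is:

```latex
\[
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \operatorname{Var}(\hat{\theta}) + \big[\operatorname{Bias}(\hat{\theta})\big]^2,
\qquad
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta .
\]
```

Editing and imputation can move either term: correcting a gross error can remove bias, while single imputation that treats imputed values as observed tends to understate variance.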
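Slides 25 and 26 mention raking at NSS without showing the mechanics. Raking (iterative proportional fitting) adjusts survey weights one variable at a time until the weighted sample reproduces known population totals for each variable's categories. A minimal sketch, assuming hypothetical variables, categories, and control totals; it is not the NSS implementation, and production raking usually adds weight trimming and convergence diagnostics.

```python
# Illustrative sketch of raking; the function name, variables, and totals are hypothetical.


def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Rake weights to marginal control totals by iterative proportional fitting.

    weights:    initial (e.g., design) weight for each respondent
    categories: {variable: [category of each respondent]}
    targets:    {variable: {category: population control total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, labels in categories.items():
            # Current weighted total in each category of this variable.
            current = {}
            for wi, lab in zip(w, labels):
                current[lab] = current.get(lab, 0.0) + wi
            # Scale every respondent so this variable's margins hit their targets.
            factors = {lab: targets[var][lab] / total for lab, total in current.items()}
            for i, lab in enumerate(labels):
                w[i] *= factors[lab]
            largest_change = max(largest_change, max(abs(f - 1.0) for f in factors.values()))
        if largest_change < tol:  # a full pass changed nothing: all margins match
            return w
    raise RuntimeError("raking did not converge; check targets and empty categories")


# Hypothetical example: four respondents, two raking variables.
sex = ["m", "m", "f", "f"]
age = ["young", "old", "young", "old"]
targets = {"sex": {"m": 40, "f": 60}, "age": {"young": 30, "old": 70}}
weights = rake([1.0] * 4, {"sex": sex, "age": age}, targets)
print([round(x, 6) for x in weights])
```

In this example the weights converge to 12, 28, 18, and 42, which reproduce both the sex totals (40/60) and the age totals (30/70). No reported value is changed, which is why the slides place raking under information quality rather than data quality.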