ILCS Raking

Transcript

  • 1. CRRC Data Quality Issues and “Fixes” • Dr. Fritz Scheuren • July 3, 2009 (for academic purposes only)
  • 2. Two Definitions of Quality • Conformance to Requirements • (Traditional Producer-Oriented Definition) • Fitness for Use • (Modern Client-Oriented Definition)
  • 3. Definition of Process Quality • Process Improvements Focus • (Do It Right the First Time) • Can be Reduced to Slogans • Can also lead to Continuous Improvements • Kaizen
  • 4. Be Real: Four Quality Costs • Costs of Reputation and Loss of Business from Inaction • Cost of Prevention to Avoid Errors • Cost of Detection to Find Errors • Cost of Repairing Errors Found
  • 5. Quality and Cost: 2 Worlds
  • 6. Repair Methods • Goal is “Fixing” to Fit Use • Data Editing • Data Imputation • Data Fabrication • Raking at NSS
  • 7. Data Editing • Honest Differences of Opinion or Real Errors? • Need for Redundancy in System for Can’t Fail Items • Achieving Measurability to Frame Expectations and Improvements
  • 8. Data Editing Techniques • Minimizing Processing Errors • Definitional (e.g., Range) Tests • Deterministic Tests • Probabilistic Tests – Outlier Tests – Ratio Tests
  • 9. Types of Edits Illustrated • Range Test: Age Negative • Deterministic Test: If Age <= 14, then code as Child • Probabilistic Test: If Income > $1,000,000, take a look
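The three kinds of edit on this slide can be sketched as a small flagging routine. This is only an illustration, not the speaker's software: the field names (`age`, `status`, `income`), the sample records, and the helper `run_edits` are hypothetical. The comparison operators seem to have been lost from the slide text, so the cutoffs used here (age 14 or under, income above $1,000,000) are assumptions. The routine flags records without changing them and keeps a tally of what it finds.

```python
from collections import Counter

def run_edits(record):
    """Return a list of edit flags for one record; the record is not modified."""
    flags = []
    age = record.get("age")
    income = record.get("income")
    # Range (definitional) test: a negative age is impossible.
    if age is not None and age < 0:
        flags.append("range:age_negative")
    # Deterministic test: age 14 or under implies the "child" code (assumed cutoff).
    if age is not None and 0 <= age <= 14 and record.get("status") != "child":
        flags.append("deterministic:should_be_child")
    # Probabilistic test: a very high income is possible but unusual,
    # so it is flagged for a look rather than changed (assumed threshold).
    if income is not None and income > 1_000_000:
        flags.append("probabilistic:income_review")
    return flags

# Hypothetical records, for illustration only.
records = [
    {"id": 1, "age": -3, "status": "adult", "income": 20_000},
    {"id": 2, "age": 10, "status": "adult", "income": 0},
    {"id": 3, "age": 45, "status": "adult", "income": 2_500_000},
    {"id": 4, "age": 30, "status": "adult", "income": 40_000},
]

flagged = {r["id"]: run_edits(r) for r in records}
# Tally of every flag raised, by type.
tally = Counter(flag for flags in flagged.values() for flag in flags)
```

Returning flags instead of overwriting values keeps the original dataset intact and makes the error counts available for diagnosis.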
  • 10. Practical Editing Tips • Edit for Diagnosis, not just Correction • Don’t Edit Outside Your Confidence Interval • Preserve the Original Dataset as Backup to Avoid Irreversible Changes • Keep Tallies of all Errors Found
  • 11. Not all errors need to be corrected Resist your Perfectionist Tendencies
  • 12. More Practical Edit Tips • Use your skilled staff to improve system rather than just edit data • Never just depend on Intuition but still use it too! • Employ Redundancy, Frugally!
  • 13. Capture-Recapture Methods (Double Keying Example) • Two-by-Two Table with Cells A, B, C, D • Comparing Data Keyed the Same Each Time (A) with Errors Detected (B and C) • How to Estimate D? • One Model: D = BC/A?
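The model D = BC/A on this slide follows from assuming the two keying passes err independently: in a two-by-two table, independence makes the cross-product ratio AD/(BC) equal to 1. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def estimate_undetected(a, b, c):
    """Estimate the unobserved cell D of a two-by-two table as B*C/A.

    a: count in cell A (keyed the same each time)
    b, c: counts in cells B and C (errors detected in one pass or the other)
    Assumes the two passes are independent, so that A*D = B*C.
    """
    if a <= 0:
        raise ValueError("cell A must be positive to estimate D")
    return b * c / a

# Hypothetical counts, for illustration only.
d_hat = estimate_undetected(a=9600, b=120, c=80)
```

With these counts the estimate is 120 × 80 / 9600 = 1, i.e. about one error missed by both passes. If the two keyers tend to make the same mistakes, independence fails and the estimate will be too low.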
  • 14. Bottom Line Take-Away • Use Data Checking to Understand Data’s Fitness for Use • Edit but Don’t Over-Edit • Use Edit Checks to Prevent Future Errors
  • 15. Data Editing and Data Imputation • Joint Role of Imputation and Editing: No Clear Line? • Editing “fixes” Often are Model-Based Hunches • Data Quality (editing) • Information Quality (imputation)
  • 16. Imputation Versus Editing • What is Imputation? • Handles Missing and Misreported Data • Imputation Goal is “roughly right”: Information Quality • Editing Goal is often “correction” (exactly right?): Data Quality
  • 17. Data Imputation Techniques • Imputation Needs More Justification when Data Quality is the Goal • Must be no more than Cosmetic in Nature, if done at all • Can only be Aggressively applied for Information Quality Goal
  • 18. Fellegi-Holt Example • Identify Errors with Automated Edit Detection Software • Hot Deck acceptable values from Records that Pass Edits • Can be worth doing if errors are minor or cosmetic (e.g., Rounding)
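The hot-deck step on this slide can be sketched as follows. This is a simplified random hot deck within an imputation class, not a full Fellegi-Holt implementation (it does no error localization across multiple edits). The edit rule, field names, class variable `region`, and sample records are all hypothetical.

```python
import random

def passes_edits(rec):
    """Hypothetical edit: age must be present and between 0 and 120."""
    return rec["age"] is not None and 0 <= rec["age"] <= 120

def hot_deck(records, field, passes, class_key, rng):
    """Replace `field` in failing records with a value drawn from a donor
    record in the same class that passes the edits. Returns new records;
    the input list is left untouched."""
    donors = {}
    for r in records:
        if passes(r):
            donors.setdefault(r[class_key], []).append(r[field])
    result = []
    for r in records:
        new = dict(r)  # copy, so the original dataset is preserved
        pool = donors.get(r[class_key])
        if not passes(r) and pool:
            new[field] = rng.choice(pool)
            new[field + "_imputed"] = True  # paradata: mark the change
        else:
            # Passing records, and failing records with no donor, stay as they are.
            new[field + "_imputed"] = False
        result.append(new)
    return result

# Hypothetical records, for illustration only.
survey = [
    {"id": 1, "region": "N", "age": 34},
    {"id": 2, "region": "N", "age": 51},
    {"id": 3, "region": "N", "age": -7},
    {"id": 4, "region": "S", "age": 28},
    {"id": 5, "region": "S", "age": None},
]
imputed = hot_deck(survey, "age", passes_edits, "region", random.Random(0))
```

The `_imputed` flag records which values were replaced, so the impact of imputation on estimates and variances can be examined later.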
  • 19. More on Imputation • Treat Influential Errors Individually, not just Automatically • That Said, Software Fixes can lead to Better Documentation (Paradata Matters) • Need to Measure Variance Impacts • Provides a natural brake on Overediting but is seldom used for this
  • 20. Edit/Imputation Summary • Most Editing Mainly Eliminates the Bad • Replacing it with a (Good?) Guess of some Sort • Imputation emphasizes Guessing even more
  • 21. More Editing/Imputation • Best Imputation Practice tries to quantify Guessing impact on Information Quality • Editing has not improved as much as Imputation • Editing/Imputation needs more Joint Theory, especially to Measure and Use Mean Square Error Impacts
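For reference, the mean square error mentioned on this slide has the standard decomposition into variance and squared bias. The notation ($\hat{\theta}$ for an estimate built from edited or imputed data, $\theta$ for the true value) is introduced here for illustration and does not come from the slides.

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathrm{E}\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2
```

Read this way, an edit or imputation rule helps when it lowers the sum of the two terms, which is why both the variance impact and the bias impact need to be measured.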
  • 22. First Illustrative Example • Fabrication/Falsification • Illustrate the General Points about Editing and Imputation • Emphasize Importance of Fabrication threat to Quality
  • 23. Fabrication/Falsification • Respondent/Interviewer Make up Data • How Common? • How to Reduce? • How to Detect?
  • 24. Right Structure, Right Resources • Examine Practice Elsewhere? • www.amstat.org Website • Key is right incentives • Good staff/training • But Eternal Vigilance
  • 25. Second Illustration • Raking Application at NSS • To link up to Next Talk • To illustrate Information Quality that is fit for use despite Data Quality
  • 26. Raking Quality “Fix” • What is Raking? • How does it improve quality? Not Data Quality but Information Quality • Sometimes both: Better Point Estimates, More Stable (smaller variances)
  • 27. Quality Summary • Editing: Data Quality • Imputation: Information Quality • Raking: Information Quality • Fabrication Can Harm Both • Must be guarded against always
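Raking is commonly implemented as iterative proportional fitting: weighted cell totals are scaled in turn so that each margin matches a known control total (for example, census counts), repeating until the margins settle. The sketch below is a minimal two-way version with hypothetical sample counts and control totals. It is not the NSS application referred to in the talk.

```python
def rake(table, row_targets, col_targets, tol=1e-9, max_iter=100):
    """Two-way raking (iterative proportional fitting).

    table: list of rows of non-negative weighted cell totals
    row_targets, col_targets: control totals for each margin
    Returns a new table whose margins match the targets (approximately,
    if max_iter is reached first). The input table is not modified.
    """
    cells = [row[:] for row in table]
    for _ in range(max_iter):
        # Scale each row to its control total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                factor = target / total
                cells[i] = [v * factor for v in cells[i]]
        # Scale each column to its control total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                factor = target / total
                for row in cells:
                    row[j] *= factor
        # Columns now match; stop once the rows also match.
        if all(abs(sum(cells[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return cells

# Hypothetical sample totals and control totals, for illustration only.
sample = [[30.0, 20.0], [10.0, 40.0]]
raked = rake(sample, row_targets=[600.0, 400.0], col_targets=[450.0, 550.0])
```

Raking changes the weights rather than any reported value, which is why it improves information quality and not data quality. The association observed in the sample is preserved: the cross-product ratio of the table is the same before and after raking.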
  • 28. Almost Done Now • Tried to Stay Practical, with a Frank Discussion of Key Weaknesses in Current Practice • Deeper Understanding of Data Quality • But at an Applied Level
  • 29. Thank You (Շնորհակալություն) • Fritz Scheuren • Scheuren@aol.com