ILCS Raking

    ILCS Raking - Presentation Transcript

    1. CRRC Data Quality Issues and “Fixes” Dr. Fritz Scheuren July 3, 2009 (for academic purposes only)
    2. Two Definitions of Quality • Conformance to Requirements • (Traditional Producer-Oriented Definition) • Fitness for Use • (Modern Client-Oriented Definition)
    3. Definition of Process Quality • Process Improvements Focus • (Do It Right the First Time) • Can be Reduced to Slogans • Can also lead to Continuous Improvements • Kaizen
    4. Be Real: Four Quality Costs • Costs of Reputation and Loss of Business from Inaction • Cost of Prevention to Avoid Errors • Cost of Detection to Find Errors • Cost of Repairing Errors Found
    5. Quality and Cost: 2 Worlds
    6. Repair Methods • Goal is “Fixing” to Fit Use • Data Editing • Data Imputation • Data Fabrication • Raking at NSS
    7. Data Editing • Honest Differences of Opinion or Real Errors? • Need for Redundancy in System for Can’t Fail Items • Achieving Measurability to Frame Expectations and Improvements
    8. Data Editing Techniques • Minimizing Processing Errors • Definitional (e.g., Range) Tests • Deterministic Tests • Probabilistic Tests – Outlier Tests – Ratio Tests
    9. Types of Edits Illustrated • Range Test: Age Negative • Deterministic Test: If Age <= 14, then code as Child • Probabilistic Test: If Income > $1,000,000, take a look
    10. Practical Editing Tips • Edit for Diagnosis, not just Correction • Don’t Edit Outside Your Confidence Interval • Preserve the Original Dataset as Backup to Avoid Irreversible Changes • Keep Tallies of all Errors Found
    11. Not all errors need to be corrected • Resist your Perfectionist Tendencies
    12. More Practical Edit Tips • Use your skilled staff to improve system rather than just edit data • Never just depend on Intuition but still use it too! • Employ Redundancy, Frugally!
    13. Capture-Recapture Methods (Double Keying Example) • Two-by-Two Table with Cells A B C D • Comparing Data Keyed the Same each time (A) with Errors Detected (B and C) • How to Estimate D? • One Model: D = BC/A?
    14. Bottom Line Take-Away • Use Data Checking to Understand Data’s Fitness for Use • Edit but Don’t Over-Edit • Use Edit Checks to Prevent Future Errors
    15. Data Editing and Data Imputation • Joint Role of Imputation and Editing No Clear Line? • Editing “fixes” Often are Model-Based Hunches • Data Quality (editing) • Information Quality (imputation)
    16. Imputation Versus Editing • What is Imputation? • Handles Missing and Misreported Data • Imputation Goal is roughly right! Information Quality • Editing Goal often “correction” Exactly right? Data Quality
    17. Data Imputation Techniques • Imputation Needs More Justification when Data Quality is the Goal • Must be no more than Cosmetic in Nature, if done at all • Can only be Aggressively applied for Information Quality Goal
    18. Fellegi-Holt Example • Identify Errors with Automated Edit Detection Software • Hot Deck acceptable values from Records that Pass Edits • Can be worth doing if errors are minor or cosmetic (e.g., Rounding)
    19. More on Imputation • Treat Influential Errors Individually not just Automatically • That Said, Software Fixes can lead to Better Documentation (Paradata Matters) • Need to Measure Variance Impacts • Provides a natural brake on Overediting but seldom used for this.
    20. Edit/Imputation Summary • Most Editing Mainly Eliminates the Bad • Replacing it with a (Good?) Guess of some Sort • Imputation emphasizes Guessing even more
    21. More Editing/Imputation • Best Imputation Practice tries to quantify Guessing impact on Information Quality • Editing has not improved as much as Imputation • Editing/Imputation needs more Joint Theory, especially to Measure and Use Mean Square Error Impacts
    22. First Illustrative Example • Fabrication/Falsification • Illustrate the General Points about Editing and Imputation • Emphasize Importance of Fabrication threat to Quality
    23. Fabrication/Falsification • Respondent/Interviewer Make up Data • How Common? • How to Reduce? • How to Detect?
    24. Right Structure Right Resources • Examine Practice Elsewhere? • www.amstat.org Website • Key is right incentives • Good staff/training • But Eternal Vigilance
    25. Second Illustration • Raking Application at NSS • To link up to Next Talk • To illustrate Information Quality that is fit for use despite Data Quality
    26. Raking Quality “Fix” • What is Raking? • How does it improve quality? Not Data Quality But Information Quality • Sometimes both: Better Point Estimates, More Stable (smaller variances)
    27. Quality Summary • Editing: Data Quality • Imputation: Information Quality • Raking: Information Quality • Fabrication Can Harm Both • Must be guarded against always
    28. Almost Done Now • Tried to Stay Practical, with a Frank Discussion of Key Weaknesses in Current Practice • Deeper Understanding of Data Quality • But at an Applied Level
    29. Thank You • Fritz Scheuren • Scheuren@aol.com
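The three edit types on slide 9, together with the advice on slide 10 to edit for diagnosis, preserve the original data, and keep tallies, can be sketched as follows. This is a minimal illustration, not the presenter's system. It assumes the comparison operators missing from the slide text were "at most 14" and "above $1,000,000". The names `Record`, `run_edits`, and `tally_flags` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed thresholds, read off the slide's illustrations.
CHILD_MAX_AGE = 14
INCOME_REVIEW_THRESHOLD = 1_000_000


@dataclass(frozen=True)  # frozen: edit checks can flag a record but never change it
class Record:
    age: Optional[int]
    income: Optional[float]
    category: Optional[str] = None


def run_edits(rec: Record) -> List[str]:
    """Return diagnostic flags for one record without modifying it."""
    flags = []
    if rec.age is not None and rec.age < 0:
        # Range test: a negative age is impossible by definition.
        flags.append("range: age is negative")
    elif rec.age is not None and rec.age <= CHILD_MAX_AGE and rec.category != "Child":
        # Deterministic test: the age fixes what the category code must be.
        flags.append("deterministic: should be coded Child")
    if rec.income is not None and rec.income > INCOME_REVIEW_THRESHOLD:
        # Probabilistic test: unusual but possible, so flag it for a look.
        flags.append("probabilistic: income above review threshold")
    return flags


def tally_flags(records: Iterable[Record]) -> Counter:
    """Count flags by edit type, as a running tally of errors found."""
    tally = Counter()
    for rec in records:
        for flag in run_edits(rec):
            tally[flag.split(":")[0]] += 1
    return tally
```

Returning flags instead of corrected values keeps diagnosis separate from correction, so an analyst can decide which flagged records are worth fixing.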
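The model on slide 13, D = BC/A, is the estimate obtained when the rows and columns of the two-by-two table are treated as independent, so that AD = BC. A small sketch of the arithmetic follows. The function name and the counts in the usage note are hypothetical, and the reading of D as the cell that comparing the two keyings cannot observe is an assumption based on the slide's wording.

```python
def estimate_unobserved_cell(a: float, b: float, c: float) -> float:
    """Estimate cell D of a 2x2 table as B*C/A under the independence model."""
    if a <= 0:
        raise ValueError("cell A must be positive to estimate D")
    if b < 0 or c < 0:
        raise ValueError("cells B and C must be non-negative")
    return b * c / a
```

With made-up counts A = 1000, B = 20, and C = 30, the call `estimate_unobserved_cell(1000, 20, 30)` evaluates 20 * 30 / 1000. The estimate is only as good as the independence assumption: if the two keyings tend to make the same mistakes, D will be underestimated.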
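The donor step mentioned on slide 18, hot-decking acceptable values from records that pass the edits, might look like the sketch below. It is a simple random hot deck only. It does not implement Fellegi-Holt error localization, and the names `hot_deck`, `passes_edit`, and the `imputed` marker are hypothetical. The input rows are left untouched, in line with the advice to preserve the original dataset.

```python
import random
from typing import Callable, Dict, List


def hot_deck(
    rows: List[Dict[str, object]],
    field: str,
    passes_edit: Callable[[Dict[str, object]], bool],
    rng: random.Random,
) -> List[Dict[str, object]]:
    """Return copies of rows where failing values of `field` are replaced by donor values."""
    # Donor pool: values of the field taken from records that pass the edit.
    donors = [row[field] for row in rows if passes_edit(row)]
    if not donors:
        raise ValueError("no record passes the edit, so there are no donors")
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the original data stay unchanged
        failed = not passes_edit(row)
        new_row["imputed"] = failed  # record which values were imputed
        if failed:
            new_row[field] = rng.choice(donors)
        result.append(new_row)
    return result
```

Passing in a seeded `random.Random` makes a run reproducible, and the `imputed` marker is one way to keep the documentation trail that slide 19 asks for.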
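Slide 26 asks "What is Raking?" without spelling it out. In its textbook form, raking is iterative proportional fitting: a table of survey weights is rescaled row by row and column by column until its margins match known control totals. The sketch below shows that textbook version for a two-way table. The slides do not describe the NSS procedure, so the function name and parameters are assumptions.

```python
from typing import List, Sequence


def rake(
    table: Sequence[Sequence[float]],
    row_targets: Sequence[float],
    col_targets: Sequence[float],
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> List[List[float]]:
    """Rescale a two-way table so its row and column sums match the targets."""
    if abs(sum(row_targets) - sum(col_targets)) > 1e-9 * max(1.0, abs(sum(row_targets))):
        raise ValueError("row and column targets must have the same grand total")
    cells = [[float(v) for v in row] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                cells[i] = [v * target / total for v in cells[i]]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                for row in cells:
                    row[j] *= target / total
        # Columns now match; stop once the rows also match within tolerance.
        if all(abs(sum(cells[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return cells
```

For example, `rake([[20, 30], [25, 25]], [60, 40], [45, 55])` returns a table whose rows sum to 60 and 40 and whose columns sum to 45 and 55. The individual responses are not changed, only the weights, which fits the slide's point that raking improves information quality, not data quality.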

    Gohar Khachatryan, 4 months ago

    © All Rights Reserved