# Data Quality: Issues and Fixes

1. CR RC ILCS Raking: Motivate Need and Illustrate Basic Approach. Dr. Ali Mushtaq, July 3, 2009 (for academic purposes only)
2. What is Raking?
   - A way to adjust survey totals “t” to independent controls “T”
   - Takes existing survey weights, usually w_ij = 1/p_ij, where p_ij is the probability of selection
   - Ratios them up to each total T in turn, until the results are as close as wanted
3. What is the Value?
   - Can increase the stability of survey results (reduces sample variance)
   - Gets results that are close to desired outcomes (reduces bias arising from minor operational errors)
4. What Results to Expect?
   - If the controls are reasonable, the raking process will converge (“hit” all controls)
   - And improve survey results related to the control totals
5. More Information Quality
   - Only the weights are changed by raking, not the survey data
   - Data quality is thus unchanged
   - But information quality is usually improved
6. What Does Raking Cost?
   - Usually done quickly on a PC
   - Independent controls need to be consistent with each other
   - The sample must be reasonably large for raking to be safely applied
   - Some costs are incurred to explain the method
7. Raking Made Simple
   - “Fudge” factor intuition
   - Develop a ratio of the target total divided by the sample total
   - Repeat this process with each of the controls in turn
8. NSS Example from ILCS: While the NSS RA survey is raked across four dimensions (age, gender, marz, and urban/rural), the example here uses just two dimensions.
9. Table 1: Raking Example – Source Survey Data
10. Table 2: Desired Marginals
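
The ratio-adjustment loop described in slides 2 and 7 can be sketched in Python for the two-dimensional case. This is a minimal illustration, not the procedure used for the NSS survey: the contents of Tables 1 and 2 are not reproduced in this transcript, so the weighted cell totals and control totals below are made-up numbers, and the function name `rake_2d`, the tolerance, and the iteration cap are assumptions. The sketch also assumes every cell is positive.

```python
def rake_2d(cells, row_targets, col_targets, tol=1e-8, max_iter=100):
    """Rake a 2-D table of weighted survey totals to row and column controls.

    Hypothetical helper for illustration. Returns (raked_table, iterations).
    Assumes all cells are positive, so no row or column sums to zero.
    """
    table = [list(row) for row in cells]  # copy so the input is left untouched
    for iteration in range(1, max_iter + 1):
        # Ratio each row to its control: factor = target total / current total.
        for i, target in enumerate(row_targets):
            factor = target / sum(table[i])
            table[i] = [value * factor for value in table[i]]
        # Repeat the same ratio step for each column control.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in table)
            for row in table:
                row[j] *= factor
        # Columns now match exactly; stop once the rows are close enough too.
        worst_gap = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if worst_gap < tol:
            return table, iteration
    raise RuntimeError("raking did not converge; check that the controls are consistent")


# Made-up weighted totals for a 2 x 2 table (NOT the values from Table 1).
cells = [[20.0, 30.0],
         [35.0, 15.0]]
# Made-up control totals (NOT the values from Table 2); both sets sum to 100.
row_targets = [60.0, 40.0]
col_targets = [55.0, 45.0]

raked, iterations = rake_2d(cells, row_targets, col_targets)
print("iterations:", iterations)
for row in raked:
    print([round(value, 3) for value in row])
```

To carry the result back to the survey file, each respondent's weight in cell (i, j) would be multiplied by `raked[i][j] / cells[i][j]`; the survey data themselves are not changed. If the row and column controls do not add up to the same grand total, the loop cannot hit both sets at once and the sketch raises an error, which is one way to see why the controls need to be consistent with each other.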