Data Cleaning Basics for Managers

What every Manager should learn and understand about data analytics - Data Cleaning

  1. Data Cleaning: What every Manager should learn in data analytics. Lydia Gitonga, Project Management | Data for Social Good | Business Analytics
  2. What is data cleaning? Why data cleaning? How? When? …
  3. What is Data Cleaning? This is the process of preparing data for analysis. It involves identifying, and then eliminating or modifying, information in your datasets that is incorrect, incomplete, or potentially misleading.
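  4. Data ready for use. [Figure: scattered raw values (e.g. 6677, 549, 890, -1) and column labels (Address, Age, Date), shown alongside the formulas for the mean, $\bar{x} = \frac{1}{n}\sum_{i} x_i$, and the standard deviation, $S = \sigma = \sqrt{\frac{\sum_{i} (x_i - \bar{x})^{2}}{n}}$.]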
  5. Data Cleaning Guiding Principles. End goal: • What is your goal in analyzing the data? • What is the project/business goal? Understanding your data: • What does your data contain? • How was the data collected? • How was the data recorded?
  6. What are the data quality criteria that must be attained? Validity, Accuracy, Completeness, Consistency, Uniformity
  7. Validity: The data must conform to a set of rules and constraints that correspond to the real world.
  8. Accuracy: The data must be correct. Comparing it with third-party or other known information can be used to verify your data.
  9. Completeness: Is the data complete? Are all important measures available?
  10. Consistency: The data should not contradict itself. There should be a reasonable degree of uniformity across the set of measures.
  11. Uniformity: Measurements should use the same scales, units, conversion rates, etc. throughout the dataset.
  12. Cleaning Process
  13. Missing values, outliers, duplicates, harmonization (or normalization) of data, formatting
  14. Thank You!
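
The cleaning steps named in the deck (formatting, duplicates, missing values, outliers) could be sketched roughly as follows. This is only an illustration, not the presenter's method: the records, the field names `name` and `age`, the helper functions, and the cutoff of 2 standard deviations are all assumptions made up for the example. It uses only the Python standard library, and the outlier check uses the population standard deviation (dividing by n).

```python
from statistics import mean, pstdev

# Hypothetical raw records; names and values are invented for illustration.
raw_rows = [
    {"name": " Alice ", "age": "34"},
    {"name": "alice", "age": "34"},   # duplicate once formatting is harmonized
    {"name": "Bob", "age": ""},       # missing value
    {"name": "Carol", "age": "29"},
    {"name": "Dan", "age": "31"},
    {"name": "Eve", "age": "36"},
    {"name": "Faith", "age": "30"},
    {"name": "Grace", "age": "33"},
    {"name": "Heidi", "age": "890"},  # likely a data-entry error
]

def format_row(row):
    """Formatting: trim whitespace, unify case, convert age to int or None."""
    age_text = row["age"].strip()
    return {
        "name": row["name"].strip().title(),
        "age": int(age_text) if age_text else None,
    }

def drop_duplicates(rows):
    """Duplicates: keep the first occurrence of each (name, age) pair."""
    seen, unique = set(), []
    for row in rows:
        key = (row["name"], row["age"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def find_outliers(values, threshold=2.0):
    """Outliers: values more than `threshold` population std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

formatted = [format_row(r) for r in raw_rows]
unique_rows = drop_duplicates(formatted)
missing = [r["name"] for r in unique_rows if r["age"] is None]
ages = [r["age"] for r in unique_rows if r["age"] is not None]
outliers = find_outliers(ages)

print(len(raw_rows), "raw rows,", len(unique_rows), "after removing duplicates")
print("Missing age:", missing)
print("Outlier ages:", outliers)
```

The sketch only flags missing values and outliers rather than deleting or filling them. Whether to drop, correct, or impute a flagged value depends on the end goal and on how the data was collected, so that decision is left to the analyst.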

