Data Cleaning
What every manager should learn in data analytics...
Lydia Gitonga
Project Management | Data for Social Good | Business Analytics
What is data cleaning?
Why data cleaning?
How?
When?
What is Data Cleaning?
Data cleaning is the process of preparing data for analysis. It involves identifying information in your datasets that is incorrect, incomplete or potentially misleading, and then eliminating or modifying it.
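Data ready for use
[Figure: a table of raw values with columns labelled Address, Age and Date, shown alongside the formulas for the mean and the standard deviation:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
$S = \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$]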
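The mean and standard deviation formulas on this slide can be checked with a short calculation. This is a minimal Python sketch using only the standard library, and the sample values are made up for illustration. Dividing by n gives the population standard deviation shown on the slide, which `statistics.pstdev` also computes; a sample standard deviation would divide by n - 1 instead.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation: sqrt(sum((x_i - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n  # x-bar = (sum of x_i) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(sum(sample) / len(sample))   # 5.0
print(population_std(sample))      # 2.0
# Cross-check against the standard library's population standard deviation.
print(math.isclose(population_std(sample), statistics.pstdev(sample)))  # True
```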
End Goal
• What is your goal in analyzing the data? What is the project or business goal?
Understanding your data
• What does your data contain?
• How was the data collected?
• How was the data recorded?
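A quick profile helps answer "What does your data contain?" before any cleaning starts. The sketch below assumes records are Python dictionaries, as the standard library's `csv.DictReader` produces; the `profile` function and the sample rows are hypothetical. For each column it counts filled values, distinct values and the most common value, which often exposes odd entries early.

```python
from collections import Counter

# Hypothetical records; in practice these might come from csv.DictReader.
records = [
    {"address": "12 Main St", "age": "34", "date": "2023-01-05"},
    {"address": "", "age": "-1", "date": "05/01/2023"},
    {"address": "12 Main St", "age": "34", "date": "2023-01-05"},
]

def profile(rows):
    """Summarise each column: row count, non-empty count, distinct values, mode."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r.get(col, "") for r in rows]
        filled = [v for v in values if v not in ("", None)]
        counts = Counter(filled)
        summary[col] = {
            "rows": len(values),
            "non_empty": len(filled),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return summary

for column, stats in profile(records).items():
    print(column, stats)
```

Here the profile would show an empty address and two different date styles, both worth investigating before analysis.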
Data Cleaning Guiding Principles
What data quality criteria must be attained?
Validity
Accuracy
Completeness
Consistency
Uniformity
Validity
The data must conform to defined rules and constraints that reflect the real world; for example, an age cannot be negative.
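Validity rules can be written down as explicit checks. In this Python sketch the field names and limits (an age between 0 and 120, a date that is not in the future, a non-blank address) are hypothetical examples; real constraints come from the business domain. Fields that are absent are skipped here, since missing data is a completeness question.

```python
from datetime import date

# Hypothetical rules; each returns True when the value is valid.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "date": lambda v: isinstance(v, date) and v <= date.today(),
    "address": lambda v: isinstance(v, str) and v.strip() != "",
}

def validate(record):
    """Return the names of present fields that break their rule."""
    return [field for field, rule in RULES.items()
            if field in record and not rule(record[field])]

print(validate({"age": -1, "date": date(2023, 1, 5), "address": "12 Main St"}))
# ['age']
```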
Accuracy
The data must be correct. Comparing it against third-party sources or against information already known to be true helps verify your data.
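One way to compare data with known information is a lookup against a trusted reference table. This Python sketch assumes a hypothetical official price list keyed by product code; the function name and data are illustrative. Records with no reference entry cannot be checked this way, so they are skipped rather than flagged.

```python
def check_against_reference(records, reference, key, field):
    """Flag records whose `field` disagrees with a trusted reference table.

    `reference` maps a key value to the value known to be correct.
    Returns (key, recorded value, expected value) for each mismatch.
    """
    mismatches = []
    for rec in records:
        expected = reference.get(rec[key])
        if expected is not None and rec[field] != expected:
            mismatches.append((rec[key], rec[field], expected))
    return mismatches

# Hypothetical data: unit prices recorded in sales vs. the official price list.
price_list = {"SKU-1": 100, "SKU-2": 250}
sales = [
    {"sku": "SKU-1", "unit_price": 100},
    {"sku": "SKU-2", "unit_price": 205},  # disagrees with the reference
    {"sku": "SKU-3", "unit_price": 80},   # no reference entry: not checked
]
print(check_against_reference(sales, price_list, "sku", "unit_price"))
# [('SKU-2', 205, 250)]
```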
Completeness
Is the data complete? Are all important measures available?
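Completeness can be measured as the share of rows that have a value for each important field. In this Python sketch the list of required fields and the sample rows are hypothetical; an empty string or `None` counts as missing, while a value such as 0 counts as present.

```python
REQUIRED = ["address", "age", "date"]  # hypothetical "important measures"

def completeness(rows, required):
    """Share of rows (0.0 to 1.0) with a non-empty value for each required field."""
    total = len(rows)
    result = {}
    for field in required:
        filled = sum(1 for r in rows if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result

rows = [
    {"address": "12 Main St", "age": 34, "date": "2023-01-05"},
    {"address": "", "age": 29},
    {"address": "4 Park Rd", "age": None, "date": "2023-02-11"},
    {"address": "9 Hill Ave", "age": 41, "date": "2023-03-02"},
]
print(completeness(rows, REQUIRED))
# {'address': 0.75, 'age': 0.75, 'date': 0.75}
```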
Consistency
The data should not contradict itself. There should be a reasonable degree of agreement across the set of measures.
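Contradictions usually show up when related fields are checked against each other. This Python sketch uses two hypothetical rules: an end date must not precede its start date, and a recorded age must match the birth date as of a chosen reference date. The field names and the sample record are illustrative only.

```python
from datetime import date

def age_on(birth, on):
    """Completed years of age on the date `on`."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)

def inconsistencies(record, as_of):
    """List contradictions between related fields of one record."""
    problems = []
    if record["end_date"] < record["start_date"]:
        problems.append("end_date before start_date")
    if age_on(record["birth_date"], as_of) != record["age"]:
        problems.append("age does not match birth_date")
    return problems

# Hypothetical record with two internal contradictions.
rec = {"birth_date": date(1990, 6, 15), "age": 30,
       "start_date": date(2023, 3, 1), "end_date": date(2023, 2, 1)}
print(inconsistencies(rec, as_of=date(2024, 1, 1)))
# ['end_date before start_date', 'age does not match birth_date']
```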
Uniformity
Measurements should use the same units, scales, conversion rates and so on throughout the dataset.
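A common uniformity fix is converting every measurement to one unit. This Python sketch converts weights to kilograms using a hypothetical lookup table; the factor for pounds uses the exact international definition (1 lb = 0.45359237 kg). Unknown units raise an error instead of being guessed.

```python
# Hypothetical conversion factors to a single target unit (kilograms).
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kg(value, unit):
    """Convert a weight to kilograms; raise on units with no known factor."""
    try:
        return value * TO_KG[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None

readings = [(72, "kg"), (1500, "g"), (160, "LB")]
print([round(to_kg(v, u), 2) for v, u in readings])  # [72.0, 1.5, 72.57]
```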
Cleaning Process
Missing values
Outliers
Duplicates
Harmonization (or normalization) of data
Formatting
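The five cleaning steps can be combined into one small pass. This is a minimal Python sketch under stated assumptions, not a definitive method: the rows, the region aliases and the day-first reading of slashed dates are hypothetical; the 1.5 x IQR rule is just one common outlier rule; and median imputation is one of several ways to fill gaps. The steps run in a different order from the list: formatting and harmonization come first so that duplicates become detectable, and outliers are flagged before imputation so that the implausible value is excluded from the fill value.

```python
import statistics
from datetime import datetime

# Hypothetical raw rows: stray spaces, inconsistent labels, mixed date
# formats, one duplicate, one missing age and one implausible age.
raw = [
    {"name": " Amina ", "region": "north", "age": "34", "date": "2023-01-05"},
    {"name": "Amina", "region": "North ", "age": "34", "date": "05/01/2023"},
    {"name": "Brian", "region": "N", "age": "", "date": "2023-02-11"},
    {"name": "Chao", "region": "South", "age": "29", "date": "2023-03-02"},
    {"name": "Dina", "region": "south", "age": "31", "date": "2023-03-09"},
    {"name": "Eli", "region": "East", "age": "290", "date": "2023-04-01"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumption: slashed dates are day-first
REGION_ALIASES = {"n": "North", "north": "North", "s": "South",
                  "south": "South", "east": "East"}

def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # left for manual review

def clean(rows):
    # Formatting and harmonization: trim text, map label variants,
    # convert ages to numbers and dates to one format.
    tidy = []
    for r in rows:
        region = r["region"].strip()
        tidy.append({
            "name": r["name"].strip(),
            "region": REGION_ALIASES.get(region.lower(), region),
            "age": int(r["age"]) if r["age"].strip() else None,
            "date": parse_date(r["date"]),
        })

    # Duplicates: keep the first of any records that are now identical.
    seen, unique = set(), []
    for r in tidy:
        key = (r["name"], r["region"], r["age"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Outliers: flag (not delete) ages outside the 1.5 * IQR fences.
    # statistics.quantiles needs at least two known ages.
    ages = [r["age"] for r in unique if r["age"] is not None]
    q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    for r in unique:
        r["age_outlier"] = r["age"] is not None and not (low <= r["age"] <= high)

    # Missing values: fill missing ages with the median of non-outlier ages.
    typical = [r["age"] for r in unique
               if r["age"] is not None and not r["age_outlier"]]
    median_age = statistics.median(typical)
    for r in unique:
        r["age_imputed"] = r["age"] is None
        if r["age_imputed"]:
            r["age"] = median_age
    return unique

for row in clean(raw):
    print(row)
```

Flagging outliers and imputed values in extra columns, rather than silently overwriting or deleting them, keeps every change visible for review.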
Thank You!
