Data preprocessing

Published in: Education, Technology

  1. Data Preprocessing. V. Saranya, AP/CSE, Sri Vidya College of Engineering & Technology, Virudhunagar.
  2. Preprocessing: Real-world databases are noisy. They contain missing values and inconsistent data, are huge in size, and are drawn from multiple sources. Low-quality data leads to low-quality mining results.
  3. Techniques: (1) data cleaning, (2) data transformation, (3) data integration, (4) data reduction.
  4. Data Cleaning: Remove noise and correct inconsistencies by filling in missing values, smoothing noisy data, and identifying outliers.
  5. Data Integration: Merge data from multiple sources, for example into a data warehouse (DW). Issue: the same attribute may have different names in different databases, which can cause inconsistencies.
  6. Data Transformation: Normalization scales values into a small common range. Example: -2, 32, 100 → -0.02, 0.32, 1.00.
  7. Data Reduction: Reduce the size of the data by aggregating or clustering it.
  8. Data reduction includes: (1) Data aggregation: building a data cube.
  9. (2) Data generalization: replacing low-level values with higher-level concepts from a concept hierarchy.
  10. (3) Attribute subset selection: removing irrelevant attributes through correlation analysis.
  11. (4) Dimensionality reduction: reducing the number of dimensions (attributes) in the data.
  12. (5) Numerosity reduction: replacing the data with alternative, smaller representations.
  13. Need for preprocessing: Large databases often contain incomplete, noisy, and inconsistent data.
  14. Reasons for incomplete data: attributes may not be available; values may go unrecorded because of a misunderstanding; data may have been deleted; data may be missing; data may be faulty; errors may occur during data transformation.
  15. Data Discretization: A form of data reduction that replaces numeric values with a smaller set of intervals or labels.
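
The cleaning steps named on slide 4 (filling in missing values and smoothing noisy data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which fill or smoothing strategy is meant, so mean imputation and smoothing by bin means are used here, and the function names and the sample numbers are hypothetical.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed values.

    Assumption: mean imputation; the slides do not prescribe a strategy.
    """
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        bin_mean = sum(current_bin) / len(current_bin)
        smoothed.extend([bin_mean] * len(current_bin))
    return smoothed


# Illustrative data only.
print(fill_missing([4, None, 8]))
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
```

Smoothing by bin means trades detail for stability: each value is pulled toward its neighbours, so isolated noisy readings have less influence on later mining steps.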
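
The normalization example on slide 6 (-2, 32, 100) looks like scaling by a power of ten. A minimal sketch follows, assuming that reading: every value is divided by the smallest power of ten that brings all absolute values to at most 1. The function name `decimal_scale` is hypothetical, and some definitions of decimal scaling require the result to be strictly below 1, which would divide by 1000 here instead of 100.

```python
def decimal_scale(values):
    """Divide every value by the smallest power of ten that brings
    the largest absolute value down to at most 1."""
    largest = max(abs(v) for v in values)
    factor = 1
    while largest / factor > 1:
        factor *= 10
    return [v / factor for v in values]


# The three values from the slide.
print(decimal_scale([-2, 32, 100]))
```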
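
Data aggregation (slide 8) can be illustrated with a single roll-up, which is one cell-building step of a data cube rather than a full cube. The sketch below assumes hypothetical sales rows of the form (year, quarter, amount) and sums them to one total per year; the row layout and the figures are invented for illustration.

```python
from collections import defaultdict


def aggregate_by_year(rows):
    """Roll quarterly rows (year, quarter, amount) up to yearly totals."""
    totals = defaultdict(int)
    for year, _quarter, amount in rows:
        totals[year] += amount
    return dict(totals)


# Hypothetical quarterly sales figures.
sales = [
    (2023, "Q1", 100),
    (2023, "Q2", 150),
    (2024, "Q1", 120),
    (2024, "Q2", 130),
]
print(aggregate_by_year(sales))
```

Four input rows become two, which is the point of aggregation as a reduction technique: the mining step sees less data at a coarser level of detail.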
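
For attribute subset selection through correlation analysis (slide 10), one possible reading is to drop attributes that are nearly redundant with an attribute already kept. The sketch below assumes that reading and uses the Pearson correlation coefficient; the threshold of 0.95, the function names, and the sample columns are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def drop_correlated(columns, threshold=0.95):
    """Keep a column only if its absolute correlation with every
    column kept so far is below `threshold`."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Illustrative columns: height_m is just height_cm in other units.
columns = {
    "height_cm": [150, 160, 170, 180],
    "height_m": [1.5, 1.6, 1.7, 1.8],
    "age": [30, 22, 41, 35],
}
print(list(drop_correlated(columns)))
```

The result depends on column order, because the first of two redundant attributes is the one that survives.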
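
Data discretization (slide 15) reduces data by mapping many distinct numeric values to a few intervals. The slides do not name a method, so the sketch below assumes equal-width binning; the function name and sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to the index (0 .. n_bins - 1) of its
    equal-width interval between min(values) and max(values).

    Assumes the values are not all identical (width would be zero).
    """
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    labels = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it.
        index = min(int((v - low) / width), n_bins - 1)
        labels.append(index)
    return labels


# Illustrative values split into two intervals: [0, 10) and [10, 20].
print(equal_width_bins([0, 5, 10, 15, 20], 2))
```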
