Data Preprocessing

BY:
K.KOTTAISAMY
II MCA
Data Preprocessing

Data cleaning
Data integration
Data transformation
Data reduction

Data discretization
Major tasks in Data Preprocessing

Data cleaning
Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies

Data integration
Integration of multiple databases, data cubes, or files

Data transformation
Normalization and aggregation
Data reduction
Reduced representation in volume but produces the same or
similar analytical results

Data discretization
Part of data reduction but with particular importance,
especially for numerical data
Data Preprocessing
Data Cleaning
Data in the real world is dirty:
Lots of incorrect data, e.g., from faulty instruments, human or
computer error, or transmission error
Incomplete:
lacking attribute values, or containing only aggregate data
e.g., Occupation = “ ” (missing data)
Noisy: containing noise, errors, or outliers
e.g., Salary = “−10” (an error)
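The two kinds of dirty data above can be illustrated with a minimal Python sketch. The records, the "unknown" placeholder, and the choice of the median as a repair value are assumptions made for illustration; the slides do not prescribe a specific cleaning rule.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"occupation": "engineer", "salary": 52000},
    {"occupation": None, "salary": 48000},       # incomplete
    {"occupation": "teacher", "salary": -10},    # noisy: impossible value
    {"occupation": "engineer", "salary": 61000},
]

def clean(rows):
    """Return cleaned copies: fill missing occupations, repair invalid salaries."""
    valid = [r["salary"] for r in rows
             if r["salary"] is not None and r["salary"] >= 0]
    fill_salary = median(valid)  # assumed repair strategy
    cleaned = []
    for r in rows:
        row = dict(r)  # copy so the input is left untouched
        if row["occupation"] is None or not row["occupation"].strip():
            row["occupation"] = "unknown"
        if row["salary"] is None or row["salary"] < 0:
            row["salary"] = fill_salary
        cleaned.append(row)
    return cleaned

cleaned = clean(records)
print(cleaned[1]["occupation"], cleaned[2]["salary"])
```

Whether to fill, flag, or drop a bad value depends on the mining task; the sketch only shows one option.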
Data Integration
Data integration:
combines data from multiple sources

Schema integration
integrate metadata from different sources

Detecting and resolving data value conflicts
for the same real world entity, attribute values from
different sources are different,
e.g., different scales, metric vs. British units

Removing duplicates and redundant data
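As a rough sketch of these steps, the Python below merges two hypothetical sources that use different attribute names and units, converts pounds to kilograms, and drops the duplicate entity. The field names and the rule that the first source wins on conflict are assumptions for illustration only.

```python
LB_TO_KG = 0.45359237  # one avoirdupois pound in kilograms

# Hypothetical sources describing the same products.
source_a = [{"id": 1, "weight_kg": 2.0}, {"id": 2, "weight_kg": 5.0}]
source_b = [{"product_id": 2, "weight_lb": 11.0},
            {"product_id": 3, "weight_lb": 4.0}]

def integrate(a_rows, b_rows):
    """Map both schemas to (id, weight_kg) and keep one row per entity."""
    merged = {}
    for r in a_rows:
        merged[r["id"]] = {"id": r["id"], "weight_kg": r["weight_kg"]}
    for r in b_rows:
        key = r["product_id"]           # schema integration: product_id -> id
        if key not in merged:           # duplicate entity: keep source A's row
            kg = round(r["weight_lb"] * LB_TO_KG, 3)  # resolve unit conflict
            merged[key] = {"id": key, "weight_kg": kg}
    return [merged[k] for k in sorted(merged)]

result = integrate(source_a, source_b)
print(result)
```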
Data Transformation
Smoothing:

remove noise from data

Aggregation: summarization, data cube construction
Generalization: concept hierarchy climbing
Normalization: scaled to fall within a small, specified
range
min-max normalization
z-score normalization
normalization by decimal scaling
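The slide does not name a smoothing method. One common choice, assumed here, is smoothing by bin means: sort the values, split them into equal-size bins, and replace each value with the mean of its bin. The price list is made up for illustration.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, partition into bins of bin_size, replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current = ordered[start:start + bin_size]
        bin_mean = sum(current) / len(current)
        smoothed.extend([bin_mean] * len(current))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical data
smoothed = smooth_by_bin_means(prices, 3)
print(smoothed)
```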
Data Transformation: Normalization
Min-max normalization: to [new_minA, new_maxA]
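
\[ v' = \frac{v - \mathrm{min}_A}{\mathrm{max}_A - \mathrm{min}_A}\,(\mathrm{new\_max}_A - \mathrm{new\_min}_A) + \mathrm{new\_min}_A \]

Ex. Let income range from $12,000 to $98,000, normalized to
[0.0, 1.0]. Then $73,600 is mapped to

\[ \frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000}\,(1.0 - 0) + 0 = 0.716 \]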
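The min-max formula translates directly into code. A minimal Python sketch, where the function and parameter names are illustrative and the guard against a zero-width range is an added assumption:

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Linearly map v from [min_a, max_a] to [new_min, new_max]."""
    if max_a == min_a:
        raise ValueError("min_a and max_a must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

# Income example: range $12,000 to $98,000, value $73,600.
scaled = min_max(73_600, 12_000, 98_000)
print(round(scaled, 3))  # 0.716
```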
Data Transformation: Normalization

Z-score normalization (μ: mean, σ: standard deviation):

\[ v' = \frac{v - \mu_A}{\sigma_A} \]

Ex. Let μ = 54,000, σ = 16,000. Then

\[ \frac{73{,}600 - 54{,}000}{16{,}000} = 1.225 \]
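A short Python sketch of the same computation. The helper `z_scores` estimates μ and σ from the data itself and uses the population standard deviation, which is an assumption since the slide does not say whether the sample or population form is meant.

```python
from statistics import mean, pstdev

def z_score(v, mu, sigma):
    """Standardize a single value given a known mean and standard deviation."""
    if sigma == 0:
        raise ValueError("sigma must be non-zero")
    return (v - mu) / sigma

def z_scores(values):
    """Standardize a whole attribute using its own mean and population std dev."""
    mu, sigma = mean(values), pstdev(values)
    return [z_score(v, mu, sigma) for v in values]

# Slide example: mean 54,000, standard deviation 16,000, value 73,600.
example = z_score(73_600, 54_000, 16_000)
print(round(example, 3))  # 1.225
```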
Data Transformation: Normalization

Normalization by decimal scaling

\[ v' = \frac{v}{10^j} \]

where j is the smallest integer such that max(|v'|) < 1
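A minimal Python sketch of decimal scaling. It searches for j starting from 0, so it assumes j is non-negative; the input values are made up for illustration.

```python
def decimal_scale(values):
    """Return (j, scaled) where scaled = v / 10**j and max(|scaled|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0  # assumption: only non-negative j is considered
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]

# Hypothetical attribute ranging from -986 to 917.
j, scaled = decimal_scale([-986, 917])
print(j, scaled)
```

Here the largest absolute value is 986, so j = 3 and every value is divided by 1,000.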
Data reduction
Why data reduction?
A database/data warehouse may store terabytes of data
Complex data analysis/mining may take a very long time to
run on the complete data set
Data reduction
Obtain a reduced representation of the data set that is much
smaller in volume yet produces the same (or almost the same)
analytical results
Data reduction strategies
Data cube aggregation
Dimensionality reduction, e.g., remove unimportant attributes
Data Compression
Discretization and concept hierarchy generation
Data cube aggregation

The lowest level of a data cube (base cuboid)
The aggregated data for an individual entity of
interest
E.g., a customer in a phone calling data warehouse
Multiple levels of aggregation in data cubes
Further reduce the size of data to deal with
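The effect of aggregating to higher levels can be sketched in Python. The call records, the tuple layout, and the `roll_up` helper are hypothetical; a real data cube would be built by a warehouse or OLAP engine.

```python
from collections import defaultdict

# Hypothetical base-level facts: (customer, year, quarter, call_minutes)
calls = [
    ("alice", 2023, 1, 120), ("alice", 2023, 2, 95),
    ("alice", 2024, 1, 130), ("bob", 2023, 1, 40),
    ("bob", 2023, 3, 60),
]

def roll_up(rows, key):
    """Sum call minutes over all rows that share the same key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Aggregate away the quarter, then the year as well.
per_customer_year = roll_up(calls, key=lambda r: (r[0], r[1]))
per_customer = roll_up(calls, key=lambda r: r[0])
print(len(calls), len(per_customer_year), len(per_customer))
```

Each roll-up leaves fewer rows to analyze (5, then 3, then 2 here) while preserving the totals at that level.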
Dimensionality Reduction

Feature selection (attribute subset selection):
Select a minimum set of attributes that is sufficient for
the data mining task.

Heuristic methods
step-wise forward selection
step-wise backward elimination
combining forward selection and backward elimination
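Step-wise forward selection can be sketched as a greedy loop: start with no attributes and repeatedly add the one that improves a score the most, stopping when nothing helps. The `score` function below is a toy stand-in (made-up usefulness weights minus a per-attribute penalty); in practice it would be something like the cross-validated accuracy of a model.

```python
def forward_selection(attributes, score, min_gain=1e-9):
    """Greedily add the attribute with the best score gain until no gain remains."""
    selected = []
    best = score(selected)
    remaining = list(attributes)
    while remaining:
        candidate, cand_score = max(
            ((a, score(selected + [a])) for a in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score - best <= min_gain:
            break  # no remaining attribute improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected

# Toy score: hypothetical weights, with a small cost per attribute kept.
weights = {"age": 0.30, "income": 0.25, "zip": 0.01, "id": 0.0}

def score(subset):
    return sum(weights[a] for a in subset) - 0.02 * len(subset)

chosen = forward_selection(weights, score)
print(chosen)
```

Backward elimination is the mirror image: start with all attributes and repeatedly drop the one whose removal hurts the score least.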
Data Discretization

Three types of attributes:
Nominal: values from an unordered set
Ordinal: values from an ordered set
Continuous: real numbers

Discretization:
Some classification algorithms only accept
categorical attributes.
Reduce data size by discretization
Prepare for further analysis
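The slide does not name a discretization method. Equal-width binning, assumed here, is one of the simplest: divide the attribute's range into k intervals of the same width and replace each value with its interval index. The ages are made up for illustration.

```python
def equal_width_bins(values, k):
    """Map each value to a bin index in 0..k-1 using k equal-width intervals."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single bin
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [18, 22, 25, 31, 47, 52, 60, 78]  # hypothetical continuous attribute
labels = equal_width_bins(ages, 3)
print(labels)
```

With a range of 18 to 78 and k = 3, the intervals are 20 wide, so the eight ages collapse to three categorical labels.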