Datacleaning.ppt
S.PARANIPUSHPA
DEPARTMENT OF CS AND IT
NADAR SARASWATHI COLLEGE OF ARTS AND SCIENCE
• Invalid values: Some features have a well-known set of allowed values, e.g. gender must only be “F” (Female) or “M” (Male). In this case it’s easy to detect wrong values.
• Formats: The most common issue. Values can arrive in different formats, such as a name written as “Name, Surname” or “Surname, Name”.
• Attribute dependencies: When the value of one feature depends on the value of another feature. For example, in school data, the “number of students” depends on whether the person “is teacher?”. Someone who is not a teacher can’t have any students.
• Uniqueness: Repeated values can appear in features that only allow unique values. For example, we can’t have two products with the same identifier.
• Missing values: Some features in the dataset may have blank or null values.
• Misspellings: Incorrectly written values.
• Misfielded values: When a feature contains values that belong to another feature.
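
Several of the issues listed above can be flagged with small checks. The following is a minimal sketch in plain Python, not a definitive implementation. The records and the field names (`id`, `gender`, `is_teacher`, `num_students`) are hypothetical and only mirror the examples on the slides.

```python
# Hypothetical school records; field names and values are assumptions
# made up for illustration, following the examples in the slides.
records = [
    {"id": 1, "gender": "F", "is_teacher": True, "num_students": 30},
    {"id": 2, "gender": "X", "is_teacher": False, "num_students": 0},
    {"id": 3, "gender": "M", "is_teacher": False, "num_students": 12},
    {"id": 4, "gender": None, "is_teacher": True, "num_students": 25},
]

ALLOWED_GENDERS = {"F", "M"}


def find_issues(rows):
    """Return (id, issue) pairs for every row that fails a check."""
    issues = []
    for row in rows:
        gender = row["gender"]
        if gender is None or gender == "":
            # Missing value: blank or null.
            issues.append((row["id"], "missing value"))
        elif gender not in ALLOWED_GENDERS:
            # Invalid value: outside the well-known set.
            issues.append((row["id"], "invalid value"))
        # Attribute dependency: a non-teacher cannot have students.
        if not row["is_teacher"] and row["num_students"] > 0:
            issues.append((row["id"], "attribute dependency"))
    return issues


issues = find_issues(records)
print(issues)
```

Each check only reports the problem; deciding how to repair the flagged rows is a separate step.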
• Visualisation: Visualising all the values of each feature, or a random sample of them, to see whether they look right.
• Outlier analysis: Analysing whether a value could be a human error, e.g. a 300-year-old person in the “age” feature.
• Validation code: Writing code that checks whether the data is right. For example, for uniqueness, checking that the length of the data is the same as the length of the vector of unique values.
• We can apply many methods to fix the different
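
The uniqueness check and the outlier check described in the bullets above can be sketched as follows. This is an illustrative sketch: the function names, the sample data, and the plausible age range of 0 to 120 are assumptions, not part of the slides.

```python
def all_unique(values):
    """True when the data is as long as its collection of unique values."""
    return len(values) == len(set(values))


def out_of_range(values, low, high):
    """Return the values that fall outside the closed interval [low, high]."""
    return [v for v in values if not (low <= v <= high)]


# Hypothetical sample data for illustration.
product_ids = ["A1", "B2", "C3", "B2"]
ages = [34, 27, 300, 51]

ids_are_unique = all_unique(product_ids)
suspicious_ages = out_of_range(ages, 0, 120)  # 0..120 is an assumed range

print(ids_are_unique)
print(suspicious_ages)
```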
• Indicator variables: This technique converts categorical data into boolean values by creating indicator variables. If a feature has more than two values (n), we have to create n-1 columns; the remaining value is implied when all the indicators are false.
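
A minimal sketch of indicator variables with n-1 columns, in plain Python. The helper name `indicator_columns`, the `is_` column prefix, and the choice of the alphabetically first category as the implied baseline are all assumptions made for this example.

```python
def indicator_columns(values, baseline=None):
    """Build n-1 boolean columns for a categorical feature with n values.

    The baseline category gets no column; it is implied when every
    indicator in a row is False.
    """
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]  # assumed convention for this sketch
    kept = [c for c in categories if c != baseline]
    return {f"is_{c}": [v == c for v in values] for c in kept}


# Hypothetical feature with three categories, so two columns are created.
colours = ["red", "green", "blue", "green"]
encoded = indicator_columns(colours)
print(encoded)
```

Here "blue" is the baseline, so a row with both indicators set to False represents "blue".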
• Data Binning or Bucketing: A pre-processing technique used to reduce the effects of minor observation errors. The values are divided into intervals, and each value is replaced by a categorical value that represents its interval.
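
Binning can be sketched with the standard library's `bisect` module. The cut points, the labels, and the sample ages below are assumptions chosen only to illustrate the idea.

```python
import bisect


def bin_values(values, edges, labels):
    """Replace each value with the label of the interval it falls in.

    `edges` holds the interior cut points in ascending order, so there
    must be exactly one more label than there are edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly len(edges) + 1 labels")
    # bisect_right puts a value equal to an edge into the upper interval.
    return [labels[bisect.bisect_right(edges, v)] for v in values]


# Hypothetical ages and cut points.
ages = [5, 17, 18, 42, 65, 80]
binned = bin_values(ages, edges=[18, 65], labels=["child", "adult", "senior"])
print(binned)
```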
• Centering & Scaling: We can centre the data of one feature by subtracting the mean from all values. To scale the data, we divide the centred feature by the standard deviation:
  scaled value = (value - mean) / standard deviation
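
Centering and scaling can be sketched with the standard library's `statistics` module. The function name and the sample heights are assumptions. This sketch uses the sample standard deviation (`statistics.stdev`); the slides do not say whether the sample or the population version is meant.

```python
import statistics


def standardise(values):
    """Centre on the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        raise ValueError("cannot scale a constant feature")
    return [(v - mean) / sd for v in values]


# Hypothetical feature values.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardise(heights)
print(scaled)
```

After the transformation the feature has mean 0 and standard deviation 1, up to floating-point rounding.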
• Other techniques: For example, we can group the outliers under the same value, or replace each value with the number of times that it appears in the feature.
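
Both techniques can be sketched in a few lines. Reading "group the outliers" as capping extreme values at a chosen limit is an interpretation, and the function names, limits, and sample data are assumptions for illustration.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each value with the number of times it appears."""
    counts = Counter(values)
    return [counts[v] for v in values]


def cap_outliers(values, low, high):
    """Give every value outside [low, high] the nearest limit instead."""
    return [min(max(v, low), high) for v in values]


# Hypothetical sample data.
grades = ["A", "B", "A", "C", "A"]
ages = [34, 27, 300, 51]

grade_counts = frequency_encode(grades)
capped_ages = cap_outliers(ages, 0, 100)  # limits are assumed

print(grade_counts)
print(capped_ages)
```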
THANK
YOU
