SlideShare a Scribd company logo
1 of 18
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Affiliated Institution of G.G.S.IP.U, Delhi
BCA
Data Warehouse & Data Mining
20302
Data Preprocessing
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
2
• Data Preprocessing: An Overview
– Data Quality
– Major Tasks in Data Preprocessing
• Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation and Data Discretization
• Summary
Data Preprocessing
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
3
Data Quality: Why Preprocess the Data?
 Measures for data quality: A multidimensional view
◦ Accuracy: correct or wrong, accurate or not
◦ Completeness: not recorded, unavailable, …
◦ Consistency: some modified but some not, dangling, …
◦ Timeliness: timely update?
◦ Believability: how trustable the data are correct?
◦ Interpretability: how easily the data can be understood?
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
4
Reasons for inaccurate data
 Data collection instruments may be faulty
 Human or computer errors occurring at data entry
 Users may purposely submit incorrect data for
mandatory fields when they don’t want to share
personal information
 Technology limitations such as buffer size
 Incorrect data may also result from inconsistencies in
naming conventions or inconsistent formats
 Duplicate tuples also require cleaning
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
5
Reasons for incomplete data
Attributes of interest may not be available
Other data may not be included as it was not
considered imp at the time of entry
Relevant data may not be recorded due to
misunderstanding or equipment malfunctions
Inconsistent data may be deleted
Data history or modifications may be overlooked
Missing data
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
6
Major Tasks in Data Preprocessing
 Data cleaning
◦ Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies
 Data integration
◦ Integration of multiple databases, data cubes, or files
 Data reduction
◦ Dimensionality reduction
◦ Numerosity reduction
◦ Data compression
 Data transformation and data discretization
◦ Normalization
◦ Concept hierarchy generation
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
7
Forms of Data Preprocessing
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
8
Why Is Data Preprocessing Important?
 No quality data, no quality mining results!
 Quality decisions must be based on quality data
 e.g., duplicate or missing data may cause incorrect or even
misleading statistics.
 Data warehouse needs consistent integration of quality
data
 Data extraction, cleaning, and transformation comprises the
majority of the work of building a data warehouse
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
9
Data Cleaning
 Data in the Real World Is Dirty :- Lots of potentially incorrect data, e.g.,
instrument faulty, human or computer error, transmission error
◦ incomplete: lacking attribute values, lacking certain attributes of
interest, or containing only aggregate data
 e.g., Occupation = “ ” (missing data
◦ noisy: containing noise, errors, or outliers
 e.g., Salary = “−10” (an error)
◦ inconsistent: containing discrepancies in codes or names, e.g.,
 Age = “42”, Birthday = “03/07/2010”
 Was rating “1, 2, 3”, now rating “A, B, C”
 discrepancy between duplicate records
◦ Intentional(e.g., disguised missing data)
 Jan. 1 as everyone’s birthday?
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
10
Incomplete (Missing) Data
 Data is not always available
◦ E.g., many tuples have no recorded value for several
attributes, such as customer income in sales data
 Missing data may be due to
◦ equipment malfunction
◦ inconsistent with other recorded data and thus deleted
◦ data not entered due to misunderstanding
◦ certain data may not be considered important at the time
of entry
◦ not register history or changes of the data
 Missing data may need to be inferred
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
How to Handle Missing Data?
 Ignore the tuple: usually done when class label is missing
(when doing classification)—not effective when the % of
missing values per attribute varies considerably
 Fill in the missing value manually: tedious + infeasible
 Fill in it automatically with
◦ a global constant : e.g., “unknown”, a new class?!
◦ the attribute mean
◦ the attribute mean for all samples belonging to the same class:
smarter
◦ the most probable value: inference-based such as Bayesian
formula or decision tree
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Noisy Data
• Noise: random error or variance in a measured variable
• Incorrect attribute values may be due to
– faulty data collection instruments
– data entry problems
– data transmission problems
– technology limitation
– inconsistency in naming convention
• Other data problems which require data cleaning
– duplicate records
– incomplete data
– inconsistent data
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
How to Handle Noisy Data?
 Binning
◦ first sort data and partition into (equal-frequency) bins
◦ then one can smooth by bin means, smooth by bin median,
smooth by bin boundaries, etc.
 Regression
◦ smooth by fitting the data into regression functions
 Outlier Analysis by Clustering
◦ detect and remove outliers
 Combined computer and human inspection
◦ detect suspicious values and check by human (e.g., deal
with possible outliers)
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Binning method to Smooth data
Binning Method smooth the sorted data value by
consulting its neighborhood i.e. the values around it.
The sorted values are distributed into a number of
“buckets” or “bins”.
Since binning methods consult the neighborhood of
values, they perform local smoothing.
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Binning Methods for Data Smoothing
 Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into equal-frequency (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 15, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 34, 34
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Data Cleaning as a Process
I. Data discrepancy detection
Discrepancies can be caused by the following factors:
Poorly designed data entry forms that have many optional fields
Human error in data entry
Deliberate errors
Data decay(outdated addresses)
Inconsistent data representations
Instrumental and device errors
System errors
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Data discrepancy can be detected by :
 Use metadata (e.g., domain, range, dependency, distribution)
 Check field overloading
 Check uniqueness rule, consecutive rule and null rule
 Use commercial tools
 Data scrubbing tools : use simple domain knowledge (e.g., postal code,
spell-check) to detect errors and make corrections
 Data auditing tools : by analyzing data to discover rules and relationship
to detect violators (e.g., correlation and clustering to find outliers)
II. Data migration and integration
 Data migration tools: allow transformations to be specified
 ETL (Extraction/Transformation/Loading) tools: allow users to specify
transformations through a graphical user interface
# Potter’s Wheel is a publicly available data cleaning tool that integrates
discrepancy detection and transformation.
Data Cleaning as a Process
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
THANK YOU

More Related Content

What's hot

2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagationKrish_ver2
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introductionDr-Dipali Meher
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
Data Reduction Stratergies
Data Reduction StratergiesData Reduction Stratergies
Data Reduction StratergiesAnjaliSoorej
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learningamalalhait
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Mustafa Sherazi
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olapSalah Amean
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluationeShikshak
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Difference between File system And DBMS.pptx
Difference between File system And DBMS.pptxDifference between File system And DBMS.pptx
Difference between File system And DBMS.pptxShayanMujahid2
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 

What's hot (20)

2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagation
 
Data mining primitives
Data mining primitivesData mining primitives
Data mining primitives
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introduction
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
Data Reduction Stratergies
Data Reduction StratergiesData Reduction Stratergies
Data Reduction Stratergies
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Attributes
AttributesAttributes
Attributes
 
Difference between File system And DBMS.pptx
Difference between File system And DBMS.pptxDifference between File system And DBMS.pptx
Difference between File system And DBMS.pptx
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
Tree pruning
 Tree pruning Tree pruning
Tree pruning
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
unit 1.pptx
unit 1.pptxunit 1.pptx
unit 1.pptx
 

Viewers also liked

Data preprocessing
Data preprocessingData preprocessing
Data preprocessingHoang Nguyen
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingAmuthamca
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingHarry Potter
 
An efficient data preprocessing method for mining
An efficient data preprocessing method for miningAn efficient data preprocessing method for mining
An efficient data preprocessing method for miningKamesh Waran
 
1.6.data preprocessing
1.6.data preprocessing1.6.data preprocessing
1.6.data preprocessingKrish_ver2
 
Introduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web ApplicationIntroduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web ApplicationOlga Scrivner
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data miningRohit Kumar
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 

Viewers also liked (16)

Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
An efficient data preprocessing method for mining
An efficient data preprocessing method for miningAn efficient data preprocessing method for mining
An efficient data preprocessing method for mining
 
1.6.data preprocessing
1.6.data preprocessing1.6.data preprocessing
1.6.data preprocessing
 
Big data security
Big data securityBig data security
Big data security
 
Introduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web ApplicationIntroduction to Text Mining and Visualization with Interactive Web Application
Introduction to Text Mining and Visualization with Interactive Web Application
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
What is big data?
What is big data?What is big data?
What is big data?
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 

Similar to Data Preprocessing- Data Warehouse & Data Mining

DM Lecture 3
DM Lecture 3DM Lecture 3
DM Lecture 3asad199
 
Machine learning topics machine learning algorithm into three main parts.
Machine learning topics  machine learning algorithm into three main parts.Machine learning topics  machine learning algorithm into three main parts.
Machine learning topics machine learning algorithm into three main parts.DurgaDeviP2
 
Data pre processing
Data pre processingData pre processing
Data pre processingpommurajopt
 
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Anastasija Nikiforova
 
Data Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningData Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningShivarkarSandip
 
Functions of Data Quality Assurance.pptx
Functions of Data Quality Assurance.pptxFunctions of Data Quality Assurance.pptx
Functions of Data Quality Assurance.pptxaahent
 
Application of KDD & its future scope
Application of KDD & its future scopeApplication of KDD & its future scope
Application of KDD & its future scopeTanmay Sethi
 
Dimensional Modelling-Data Warehouse & Data Mining
 Dimensional Modelling-Data Warehouse & Data Mining Dimensional Modelling-Data Warehouse & Data Mining
Dimensional Modelling-Data Warehouse & Data MiningTrinity Dwarka
 
Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...TEST Huddle
 
DATA preprocessing.pptx
DATA preprocessing.pptxDATA preprocessing.pptx
DATA preprocessing.pptxChandra Meena
 
Why Data Science Projects Fail
Why Data Science Projects FailWhy Data Science Projects Fail
Why Data Science Projects FailSense Corp
 
Why Data Science Projects Fail
Why Data Science Projects FailWhy Data Science Projects Fail
Why Data Science Projects FailSense Corp
 
Data mining in the field of library
Data mining in the field of libraryData mining in the field of library
Data mining in the field of libraryMegha Goyal
 
BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxAkash527744
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesChristopher Eaker
 
Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...Blackbaud Pacific
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernAmin Chowdhury
 

Similar to Data Preprocessing- Data Warehouse & Data Mining (20)

DM Lecture 3
DM Lecture 3DM Lecture 3
DM Lecture 3
 
My3prep
My3prepMy3prep
My3prep
 
Assignmentdatamining
AssignmentdataminingAssignmentdatamining
Assignmentdatamining
 
Machine learning topics machine learning algorithm into three main parts.
Machine learning topics  machine learning algorithm into three main parts.Machine learning topics  machine learning algorithm into three main parts.
Machine learning topics machine learning algorithm into three main parts.
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
 
Data Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningData Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data Cleaning
 
Functions of Data Quality Assurance.pptx
Functions of Data Quality Assurance.pptxFunctions of Data Quality Assurance.pptx
Functions of Data Quality Assurance.pptx
 
Application of KDD & its future scope
Application of KDD & its future scopeApplication of KDD & its future scope
Application of KDD & its future scope
 
Dimensional Modelling-Data Warehouse & Data Mining
 Dimensional Modelling-Data Warehouse & Data Mining Dimensional Modelling-Data Warehouse & Data Mining
Dimensional Modelling-Data Warehouse & Data Mining
 
Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...Saksham Sarode - Building Effective test Data Management in Distributed Envir...
Saksham Sarode - Building Effective test Data Management in Distributed Envir...
 
Intro to Data Management
Intro to Data ManagementIntro to Data Management
Intro to Data Management
 
DATA preprocessing.pptx
DATA preprocessing.pptxDATA preprocessing.pptx
DATA preprocessing.pptx
 
Why Data Science Projects Fail
Why Data Science Projects FailWhy Data Science Projects Fail
Why Data Science Projects Fail
 
Why Data Science Projects Fail
Why Data Science Projects FailWhy Data Science Projects Fail
Why Data Science Projects Fail
 
Data mining in the field of library
Data mining in the field of libraryData mining in the field of library
Data mining in the field of library
 
BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptx
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing Concern
 

More from Trinity Dwarka

Why BAJMC in Trinity Dwarka
Why BAJMC in Trinity DwarkaWhy BAJMC in Trinity Dwarka
Why BAJMC in Trinity DwarkaTrinity Dwarka
 
Career Options after BCA
Career Options after BCACareer Options after BCA
Career Options after BCATrinity Dwarka
 
Principles of Management-Management-Concept & Meaning
  Principles of Management-Management-Concept & Meaning  Principles of Management-Management-Concept & Meaning
Principles of Management-Management-Concept & MeaningTrinity Dwarka
 
Principles of Management- Management Process & Functions
Principles of Management- Management Process  &  FunctionsPrinciples of Management- Management Process  &  Functions
Principles of Management- Management Process & FunctionsTrinity Dwarka
 
Principles of Management- Managerial Levels & Roles-
Principles of Management- Managerial Levels & Roles-Principles of Management- Managerial Levels & Roles-
Principles of Management- Managerial Levels & Roles-Trinity Dwarka
 
Management-Concept & Meaning
 Management-Concept & Meaning Management-Concept & Meaning
Management-Concept & MeaningTrinity Dwarka
 
Principles of Management- Planning
Principles of Management- PlanningPrinciples of Management- Planning
Principles of Management- PlanningTrinity Dwarka
 
Organizing Authority & Responsibility- Principles of Management
Organizing Authority & Responsibility- Principles of ManagementOrganizing Authority & Responsibility- Principles of Management
Organizing Authority & Responsibility- Principles of ManagementTrinity Dwarka
 
Staffing- Principles of Management
Staffing- Principles of ManagementStaffing- Principles of Management
Staffing- Principles of ManagementTrinity Dwarka
 
Directing-Principles of Management
Directing-Principles of ManagementDirecting-Principles of Management
Directing-Principles of ManagementTrinity Dwarka
 
Computer Networks- Network Basics
Computer Networks- Network BasicsComputer Networks- Network Basics
Computer Networks- Network BasicsTrinity Dwarka
 
Java Programming- Introduction to Java Applet Programs
Java Programming- Introduction to Java Applet ProgramsJava Programming- Introduction to Java Applet Programs
Java Programming- Introduction to Java Applet ProgramsTrinity Dwarka
 
Linux Environment- Linux vs Unix
Linux Environment- Linux vs UnixLinux Environment- Linux vs Unix
Linux Environment- Linux vs UnixTrinity Dwarka
 
Linux Environment- Linux Basics
Linux Environment- Linux BasicsLinux Environment- Linux Basics
Linux Environment- Linux BasicsTrinity Dwarka
 
BCA-Mobile Computing- BASICS OF MOBILE COMPUTING
BCA-Mobile Computing- BASICS OF MOBILE COMPUTINGBCA-Mobile Computing- BASICS OF MOBILE COMPUTING
BCA-Mobile Computing- BASICS OF MOBILE COMPUTINGTrinity Dwarka
 
INTRODUCTION TO INFORMATION TECHNOLOGY- IT Basics
INTRODUCTION TO INFORMATION TECHNOLOGY- IT BasicsINTRODUCTION TO INFORMATION TECHNOLOGY- IT Basics
INTRODUCTION TO INFORMATION TECHNOLOGY- IT BasicsTrinity Dwarka
 
Database Management System
Database Management System Database Management System
Database Management System Trinity Dwarka
 
JAVA PROGRAMMING- OOP Concept
JAVA PROGRAMMING- OOP ConceptJAVA PROGRAMMING- OOP Concept
JAVA PROGRAMMING- OOP ConceptTrinity Dwarka
 
E-Commerce- Introduction to E-Commerce
E-Commerce- Introduction to E-CommerceE-Commerce- Introduction to E-Commerce
E-Commerce- Introduction to E-CommerceTrinity Dwarka
 
DIGITAL ELECTRONICS- Minimization Technique Karnaugh Map
DIGITAL ELECTRONICS- Minimization TechniqueKarnaugh MapDIGITAL ELECTRONICS- Minimization TechniqueKarnaugh Map
DIGITAL ELECTRONICS- Minimization Technique Karnaugh Map Trinity Dwarka
 

More from Trinity Dwarka (20)

Why BAJMC in Trinity Dwarka
Why BAJMC in Trinity DwarkaWhy BAJMC in Trinity Dwarka
Why BAJMC in Trinity Dwarka
 
Career Options after BCA
Career Options after BCACareer Options after BCA
Career Options after BCA
 
Principles of Management-Management-Concept & Meaning
  Principles of Management-Management-Concept & Meaning  Principles of Management-Management-Concept & Meaning
Principles of Management-Management-Concept & Meaning
 
Principles of Management- Management Process & Functions
Principles of Management- Management Process  &  FunctionsPrinciples of Management- Management Process  &  Functions
Principles of Management- Management Process & Functions
 
Principles of Management- Managerial Levels & Roles-
Principles of Management- Managerial Levels & Roles-Principles of Management- Managerial Levels & Roles-
Principles of Management- Managerial Levels & Roles-
 
Management-Concept & Meaning
 Management-Concept & Meaning Management-Concept & Meaning
Management-Concept & Meaning
 
Principles of Management- Planning
Principles of Management- PlanningPrinciples of Management- Planning
Principles of Management- Planning
 
Organizing Authority & Responsibility- Principles of Management
Organizing Authority & Responsibility- Principles of ManagementOrganizing Authority & Responsibility- Principles of Management
Organizing Authority & Responsibility- Principles of Management
 
Staffing- Principles of Management
Staffing- Principles of ManagementStaffing- Principles of Management
Staffing- Principles of Management
 
Directing-Principles of Management
Directing-Principles of ManagementDirecting-Principles of Management
Directing-Principles of Management
 
Computer Networks- Network Basics
Computer Networks- Network BasicsComputer Networks- Network Basics
Computer Networks- Network Basics
 
Java Programming- Introduction to Java Applet Programs
Java Programming- Introduction to Java Applet ProgramsJava Programming- Introduction to Java Applet Programs
Java Programming- Introduction to Java Applet Programs
 
Linux Environment- Linux vs Unix
Linux Environment- Linux vs UnixLinux Environment- Linux vs Unix
Linux Environment- Linux vs Unix
 
Linux Environment- Linux Basics
Linux Environment- Linux BasicsLinux Environment- Linux Basics
Linux Environment- Linux Basics
 
BCA-Mobile Computing- BASICS OF MOBILE COMPUTING
BCA-Mobile Computing- BASICS OF MOBILE COMPUTINGBCA-Mobile Computing- BASICS OF MOBILE COMPUTING
BCA-Mobile Computing- BASICS OF MOBILE COMPUTING
 
INTRODUCTION TO INFORMATION TECHNOLOGY- IT Basics
INTRODUCTION TO INFORMATION TECHNOLOGY- IT BasicsINTRODUCTION TO INFORMATION TECHNOLOGY- IT Basics
INTRODUCTION TO INFORMATION TECHNOLOGY- IT Basics
 
Database Management System
Database Management System Database Management System
Database Management System
 
JAVA PROGRAMMING- OOP Concept
JAVA PROGRAMMING- OOP ConceptJAVA PROGRAMMING- OOP Concept
JAVA PROGRAMMING- OOP Concept
 
E-Commerce- Introduction to E-Commerce
E-Commerce- Introduction to E-CommerceE-Commerce- Introduction to E-Commerce
E-Commerce- Introduction to E-Commerce
 
DIGITAL ELECTRONICS- Minimization Technique Karnaugh Map
DIGITAL ELECTRONICS- Minimization TechniqueKarnaugh MapDIGITAL ELECTRONICS- Minimization TechniqueKarnaugh Map
DIGITAL ELECTRONICS- Minimization Technique Karnaugh Map
 

Recently uploaded

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 

Recently uploaded (20)

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

Data Preprocessing- Data Warehouse & Data Mining

  • 1. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Affiliated Institution of G.G.S.IP.U, Delhi BCA Data Warehouse & Data Mining 20302 Data Preprocessing
  • 2. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 2 • Data Preprocessing: An Overview – Data Quality – Major Tasks in Data Preprocessing • Data Cleaning • Data Integration • Data Reduction • Data Transformation and Data Discretization • Summary Data Preprocessing
  • 3. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 3 Data Quality: Why Preprocess the Data?  Measures for data quality: A multidimensional view ◦ Accuracy: correct or wrong, accurate or not ◦ Completeness: not recorded, unavailable, … ◦ Consistency: some modified but some not, dangling, … ◦ Timeliness: timely update? ◦ Believability: how trustable the data are correct? ◦ Interpretability: how easily the data can be understood?
  • 4. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 4 Reasons for inaccurate data  Data collection instruments may be faulty  Human or computer errors occurring at data entry  Users may purposely submit incorrect data for mandatory fields when they don’t want to share personal information  Technology limitations such as buffer size  Incorrect data may also result from inconsistencies in naming conventions or inconsistent formats  Duplicate tuples also require cleaning
  • 5. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 5 Reasons for incomplete data Attributes of interest may not be available Other data may not be included as it was not considered imp at the time of entry Relevant data may not be recorded due to misunderstanding or equipment malfunctions Inconsistent data may be deleted Data history or modifications may be overlooked Missing data
  • 6. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 6 Major Tasks in Data Preprocessing  Data cleaning ◦ Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies  Data integration ◦ Integration of multiple databases, data cubes, or files  Data reduction ◦ Dimensionality reduction ◦ Numerosity reduction ◦ Data compression  Data transformation and data discretization ◦ Normalization ◦ Concept hierarchy generation
  • 7. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 7 Forms of Data Preprocessing
  • 8. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 8 Why Is Data Preprocessing Important?  No quality data, no quality mining results!  Quality decisions must be based on quality data  e.g., duplicate or missing data may cause incorrect or even misleading statistics.  Data warehouse needs consistent integration of quality data  Data extraction, cleaning, and transformation comprises the majority of the work of building a data warehouse
  • 9. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 9 Data Cleaning  Data in the Real World Is Dirty :- Lots of potentially incorrect data, e.g., instrument faulty, human or computer error, transmission error ◦ incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data  e.g., Occupation = “ ” (missing data ◦ noisy: containing noise, errors, or outliers  e.g., Salary = “−10” (an error) ◦ inconsistent: containing discrepancies in codes or names, e.g.,  Age = “42”, Birthday = “03/07/2010”  Was rating “1, 2, 3”, now rating “A, B, C”  discrepancy between duplicate records ◦ Intentional(e.g., disguised missing data)  Jan. 1 as everyone’s birthday?
  • 10. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 10 Incomplete (Missing) Data  Data is not always available ◦ E.g., many tuples have no recorded value for several attributes, such as customer income in sales data  Missing data may be due to ◦ equipment malfunction ◦ inconsistent with other recorded data and thus deleted ◦ data not entered due to misunderstanding ◦ certain data may not be considered important at the time of entry ◦ not register history or changes of the data  Missing data may need to be inferred
  • 11. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 How to Handle Missing Data?  Ignore the tuple: usually done when class label is missing (when doing classification)—not effective when the % of missing values per attribute varies considerably  Fill in the missing value manually: tedious + infeasible  Fill in it automatically with ◦ a global constant : e.g., “unknown”, a new class?! ◦ the attribute mean ◦ the attribute mean for all samples belonging to the same class: smarter ◦ the most probable value: inference-based such as Bayesian formula or decision tree
  • 12. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Noisy Data • Noise: random error or variance in a measured variable • Incorrect attribute values may be due to – faulty data collection instruments – data entry problems – data transmission problems – technology limitation – inconsistency in naming convention • Other data problems which require data cleaning – duplicate records – incomplete data – inconsistent data
  • 13. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 How to Handle Noisy Data?  Binning ◦ first sort data and partition into (equal-frequency) bins ◦ then one can smooth by bin means, smooth by bin median, smooth by bin boundaries, etc.  Regression ◦ smooth by fitting the data into regression functions  Outlier Analysis by Clustering ◦ detect and remove outliers  Combined computer and human inspection ◦ detect suspicious values and check by human (e.g., deal with possible outliers)
  • 14. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Binning method to Smooth data Binning Method smooth the sorted data value by consulting its neighborhood i.e. the values around it. The sorted values are distributed into a number of “buckets” or “bins”. Since binning methods consult the neighborhood of values, they perform local smoothing.
  • 15. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Binning Methods for Data Smoothing  Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34 * Partition into equal-frequency (equi-depth) bins: - Bin 1: 4, 8, 9, 15 - Bin 2: 21, 21, 24, 25 - Bin 3: 26, 28, 29, 34 * Smoothing by bin means: - Bin 1: 9, 9, 9, 9 - Bin 2: 23, 23, 23, 23 - Bin 3: 29, 29, 29, 29 * Smoothing by bin boundaries: - Bin 1: 4, 4, 15, 15 - Bin 2: 21, 21, 25, 25 - Bin 3: 26, 26, 34, 34
  • 16. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Data Cleaning as a Process I. Data discrepancy detection Discrepancies can be caused by the following factors: Poorly designed data entry forms that have many optional fields Human error in data entry Deliberate errors Data decay(outdated addresses) Inconsistent data representations Instrumental and device errors System errors
  • 17. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Data discrepancy can be detected by :  Use metadata (e.g., domain, range, dependency, distribution)  Check field overloading  Check uniqueness rule, consecutive rule and null rule  Use commercial tools  Data scrubbing tools : use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections  Data auditing tools : by analyzing data to discover rules and relationship to detect violators (e.g., correlation and clustering to find outliers) II. Data migration and integration  Data migration tools: allow transformations to be specified  ETL (Extraction/Transformation/Loading) tools: allow users to specify transformations through a graphical user interface # Potter’s Wheel is a publicly available data cleaning tool that integrates discrepancy detection and transformation. Data Cleaning as a Process
  • 18. TRINITY INSTITUTE OF PROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 THANK YOU