Data Preprocessing
MRS. MANISHA PATIL, ASST PROFESSOR, MIT ACSC ALANDI
Objectives
Understand the concept of data preprocessing
Discuss the various types of data and possible errors
Understand the need for and use of various data preprocessing operations
Introduction to data preprocessing
Data collected for analysis in data science is usually in a raw, unprocessed state.
Data preprocessing is the task of transforming raw data so that it is ready to be fed into an algorithm.
Data preparation usually takes place in two phases for any data science project:
● Data preprocessing
● Data wrangling
Data types and forms
Categorical data
● Nominal data
● Ordinal data
Numerical data
● Interval data
● Ratio data
Categorical data
This data is non-numeric and consists of text that can be coded as numbers.
It can be of two types.
Nominal data:
This data is used to label variables without providing any quantitative value. For example, gender can be assigned numbers, but those numbers are only labels.
Nominal categories are mutually exclusive.
Ordinal data:
This type of data is used to label variables that need to follow some order.
For example, a company collects feedback about the quality of its service on an ordered scale.
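The difference between nominal and ordinal coding can be sketched in a few lines of Python. The category names and codes below are hypothetical and chosen only for illustration; the point is that ordinal ranks can be compared and sorted, while nominal codes are labels only.

```python
# Hypothetical categories, chosen only for illustration.
NOMINAL_CODES = {"female": 0, "male": 1, "other": 2}       # codes are labels only
ORDINAL_SCALE = ["poor", "average", "good", "excellent"]   # position encodes order


def encode_ordinal(value):
    """Return the rank of a feedback level; a higher rank means better service."""
    return ORDINAL_SCALE.index(value)


feedback = ["good", "poor", "excellent", "average"]
ranks = [encode_ordinal(f) for f in feedback]

# Sorting by rank is meaningful for ordinal data.
sorted_feedback = sorted(feedback, key=encode_ordinal)

print(ranks)
print(sorted_feedback)
```

Sorting or comparing the values in NOMINAL_CODES would run without error, but the result would carry no meaning, which is why nominal data is usually one hot encoded instead.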
Numerical data
This data is numeric and usually follows an order of values.
Types of numeric data
Interval data:
This type of data follows a numeric scale in which both the order of the values and the exact difference between them are meaningful.
The distance between adjacent values on an interval scale is always equal.
Ratio data:
Possible data error types
Missing data
● Missing Completely At Random (MCAR)
● Missing At Random (MAR)
● Missing Not At Random (MNAR)
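The three mechanisms differ in what the chance of a value being missing depends on. The sketch below simulates each one on a small, hypothetical list of (role, salary) records; the records, probabilities, and threshold are assumptions made for illustration and are not taken from the slides.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical records: (role, salary)
records = [("Manager", 42000), ("CEO", 90000), ("Secretary", 28530),
           ("Manager", 32000), ("Secretary", 37450), ("CEO", 85000)]


def mcar(rows, p=0.3):
    """MCAR: every salary has the same chance p of being missing."""
    return [(role, None if rng.random() < p else sal) for role, sal in rows]


def mar(rows, p_by_role):
    """MAR: the chance of a missing salary depends only on the observed role."""
    return [(role, None if rng.random() < p_by_role.get(role, 0.0) else sal)
            for role, sal in rows]


def mnar(rows, threshold=80000):
    """MNAR: missingness depends on the unobserved value itself
    (here, salaries at or above the threshold are always withheld)."""
    return [(role, None if sal >= threshold else sal) for role, sal in rows]


print(mcar(records))
print(mar(records, {"CEO": 0.8}))
print(mnar(records))
```

The mechanism matters for the choice of repair: dropping rows is least harmful under MCAR, while under MNAR any simple fill can bias the result.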
Record  Cust Id  Salary  Date of Birth  Role          Spouse
1       A121     42000   1985-30-05     Manager       Anjali
2       A122             07-02-1982     CEO           Priya
3       A123     28530   11-09-1987     Asst Manager
4       A124     32000   12/24/1986                   Rahul
5       A125     37450   09/07/1988     Secy          Bina
6       A126     37450   07-09-1987     Secretary     Sumit
Manual Input
Data inconsistency
Regional Formats
Numerical units
Wrong data types
File Manipulation
Missing anonymization
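Regional date formats and inconsistent labels, as seen in the sample records, can be normalized with a small helper. This is a minimal sketch using only the Python standard library; the list of candidate formats, their order, and the alias table are assumptions. The order decides how an ambiguous string such as 07-02-1982 is read (here, day-month-year).

```python
from datetime import datetime

# Candidate formats, tried in order (an assumption, not a rule from the slides).
FORMATS = ["%Y-%d-%m", "%d-%m-%Y", "%m/%d/%Y"]

# Hypothetical alias table for inconsistent role labels.
ROLE_ALIASES = {"Secy": "Secretary"}


def parse_birth_date(text):
    """Return the date in ISO form (YYYY-MM-DD), or None if no format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_role(role):
    """Map known abbreviations to one spelling; leave other values unchanged."""
    return ROLE_ALIASES.get(role, role)


for raw in ["1985-30-05", "07-02-1982", "12/24/1986", "not a date"]:
    print(raw, "->", parse_birth_date(raw))
print(normalize_role("Secy"))
```

Returning None for unparseable input keeps the bad value visible as missing data instead of silently guessing.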
Various Data Preprocessing Operations
Data Cleaning
Data cleaning handles irrelevant or missing data.
Data is cleaned by filling in the missing values, smoothing any noisy data, identifying and removing outliers, and resolving inconsistencies.
Filling missing values
● Replace missing values with zeros
● Drop rows with missing values
● Replace missing values with the mean/mode/median
# Method 1 - Fill every missing value with 0
# (df is assumed to be an existing pandas DataFrame)
print("\n\n Every Missing Value Replaced with '0':")
print("--------------------------------------------")
print(df.fillna(0))
# Method 2 - Drop rows that have missing values
# (df is assumed to be an existing pandas DataFrame)
print("\n\n Dropping Rows with Missing Values:")
print("----------------------------------------")
print(df.dropna())
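# Method 3 - Replace missing values with the median value
# (df is assumed to be an existing pandas DataFrame with a column 'C01')
median = df['C01'].median()
# Assign the result back; fillna(inplace=True) on a selected column may not update df in recent pandas versions
df['C01'] = df['C01'].fillna(median)
print("\n\n Missing Values for Column 1 Replaced with Median Value:")
print("--------------------------------------------------")
print(df)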
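The three methods above assume that df already exists. A self-contained sketch that can be run as is might look like the following; the sample values and the second column are hypothetical, and only the column name 'C01' comes from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column name 'C01' comes from the slides.
df = pd.DataFrame({"C01": [10.0, np.nan, 30.0, 50.0],
                   "C02": [1.0, 2.0, np.nan, 4.0]})

filled_zero = df.fillna(0)        # Method 1: every missing value becomes 0
dropped = df.dropna()             # Method 2: keep only rows with no missing value

median_c01 = df["C01"].median()   # the median ignores missing values by default
filled_median = df.assign(C01=df["C01"].fillna(median_c01))   # Method 3

# The same pattern with the column means instead of the median
filled_mean = df.fillna(df.mean())

print(filled_zero)
print(dropped)
print(filled_median)
print(filled_mean)
```

None of these calls changes df itself; each returns a new DataFrame, which makes it easy to compare the strategies side by side before choosing one.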
Smoothing noisy data
One hot encoding
Label encoding assigns a numeric value to each categorical value. This is acceptable when the labels have a natural order,
but nominal features do not have any order.
E.g. the color values of a car do not have any order among themselves.
To avoid implying a false order, one hot encoding is used for nominal attributes.
It splits the column that contains the nominal categorical data into many columns, one for each category present in that column. Each column contains 0 or 1, depending on whether the row belongs to that column's category.
E.g. Color_of_cars = ['white', 'red', 'black']
The one hot encoding matrix will be
white  red  black
1      0    0
0      1    0
0      0    1
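The mechanism can be written out by hand in a few lines. This is a plain-Python sketch meant to show what the encoding does, not a replacement for a library encoder; the helper name one_hot is hypothetical.

```python
def one_hot(values, categories=None):
    """Return (column names, 0/1 matrix) with one column per category."""
    if categories is None:
        # Keep categories in the order in which they first appear
        categories = list(dict.fromkeys(values))
    matrix = [[1 if value == category else 0 for category in categories]
              for value in values]
    return categories, matrix


color_of_cars = ["white", "red", "black"]
columns, matrix = one_hot(color_of_cars)

print(columns)
for row in matrix:
    print(row)
```

Each row contains exactly one 1, so no category is treated as larger or smaller than another.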
Data Reduction
Data reduction is a technique used in data mining to reduce the size of a dataset while still preserving the most important information.
It reduces the data by removing unimportant and unwanted features.
Data Cube Aggregation
● Data cubes are multidimensional sets of data that can be stored in a spreadsheet.
● A data cube can have two, three, or more dimensions.
● Each dimension represents an attribute of interest.
● Data cube aggregation applies aggregation at various levels of a data cube to represent the original data set, thus achieving data reduction.
● Data cubes provide fast access to pre-computed, summarized data.
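Aggregating along a dimension can be sketched as follows. The sales facts, the dimension names (year, quarter, branch), and the helper roll_up are hypothetical and serve only to show how summing over dropped dimensions shrinks the data.

```python
from collections import defaultdict

# Hypothetical facts: (year, quarter, branch, amount)
sales = [
    (2023, "Q1", "Pune", 120), (2023, "Q2", "Pune", 150),
    (2023, "Q1", "Mumbai", 200), (2023, "Q2", "Mumbai", 180),
    (2024, "Q1", "Pune", 130), (2024, "Q1", "Mumbai", 210),
]


def roll_up(facts, keep):
    """Sum the amount over every dimension whose index is not listed in keep."""
    totals = defaultdict(int)
    for *dims, amount in facts:
        key = tuple(dims[i] for i in keep)
        totals[key] += amount
    return dict(totals)


by_year_branch = roll_up(sales, keep=(0, 2))   # quarter is aggregated away
by_year = roll_up(sales, keep=(0,))            # only year remains

print(by_year_branch)
print(by_year)
```

Six detailed rows become four at the year and branch level and two at the year level, and the summaries can be stored and queried in place of the original rows.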
Numerosity Reduction
● Numerosity reduction is a technique used in data mining to reduce the number of data points in a dataset while still preserving the most important information.
● It is a data reduction technique that replaces the original data with a smaller form of data representation.
● There are two kinds of technique for numerosity reduction: parametric and non-parametric methods.
● Numerosity reduction can be beneficial when the dataset is too large to be processed efficiently, or when it contains a large amount of irrelevant or redundant data points.
● In parametric methods, the data is represented using a model. The model is used to estimate the data, so that only the model parameters need to be stored instead of the actual data. Regression and log-linear methods are used for creating such models.
● Non-parametric methods for storing reduced representations of the data include histograms, clustering, sampling and data cube aggregation.
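The two families can be contrasted on one small data set. The sketch below fits a straight line by least squares (parametric: only two numbers are stored) and builds an equal-width histogram (non-parametric: only bucket counts are stored). The data and the helper names fit_line and histogram are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def histogram(values, bin_width):
    """Count the values per bucket; keys are the bucket start values."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical, perfectly linear data
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

slope, intercept = fit_line(xs, ys)   # parametric: two parameters replace ten points
hist = histogram(ys, bin_width=5)     # non-parametric: four counts replace ten values

print(slope, intercept)
print(hist)
```

Real data rarely lies exactly on a line, so the parametric form trades some accuracy for the smaller representation.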
Data Discretization
Data discretization techniques can be used to reduce the number of values of a given continuous attribute by dividing the range of the attribute into intervals.
This leads to a concise, easy-to-use, knowledge-level representation of mining results.
Discretization is the process through which we can transform continuous variables, models or functions into a discrete form.
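Dividing a range into intervals can be sketched with equal-width binning, one simple way to discretize. The ages, the number of bins, and the helper name equal_width_bins are hypothetical, and the sketch assumes the values are not all identical.

```python
def equal_width_bins(values, n_bins):
    """Replace each value with the index (0 .. n_bins - 1) of its interval.

    Assumes max(values) > min(values), so the interval width is not zero.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    labels = []
    for v in values:
        # The maximum would fall just outside the last interval, so clamp it
        index = min(int((v - lowest) / width), n_bins - 1)
        labels.append(index)
    return labels


ages = [22, 25, 31, 38, 45, 52, 60, 67]   # hypothetical continuous attribute
bins = equal_width_bins(ages, 3)

print(bins)
```

Eight distinct ages are reduced to three interval labels, which could then be named, for example, young, middle and senior.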

Chapter 3 Data Preprocessing techniques.pptx
