1. What is preprocessing?
• Preprocessing is the stage of building a program that occurs BEFORE any code is compiled.
• At this point, no variables are initialized and no program code is executed.
• Code can be swapped and replaced here (text substitution), but no run-time computations are performed.
Preprocessor
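The "swapping and replacing" described above is plain text substitution. The sketch below is a toy model of object-like macro expansion written in Python. The function name `expand_macros` and the single-pass, whole-word matching are illustrative assumptions. A real preprocessor, such as the C preprocessor, also handles function-like macros, rescanning, string literals, and comments.

```python
import re


def expand_macros(source: str, macros: dict[str, str]) -> str:
    """Toy object-like macro expansion (hypothetical helper, not a real preprocessor).

    Each whole-word occurrence of a macro name is replaced by its replacement
    text in a single pass. The result is never evaluated, so an expression
    such as "16 * 2" is left untouched for the compiler.
    """
    if not macros:
        return source
    # \b anchors ensure SIZE does not match inside SIZES or MAX_SIZE.
    pattern = re.compile(r"\b(" + "|".join(re.escape(name) for name in macros) + r")\b")
    return pattern.sub(lambda match: macros[match.group(1)], source)


print(expand_macros("int buf[SIZE * 2];", {"SIZE": "16"}))
# int buf[16 * 2];
```

The multiplication survives unchanged. Substitution happens, but no computation does.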
Data Preprocessing
• Data preprocessing is an important step in the data mining process. The phrase "garbage in, garbage out" applies particularly well to data mining and machine learning projects.
• Data-gathering methods are often loosely controlled, resulting in out-of-range values (e.g., Income: −100), impossible data combinations (e.g., Sex: Male, Pregnant: Yes), missing values, etc. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data must be addressed first, before running an analysis.[1] Data preprocessing is often considered the most important phase of a machine learning project.
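The problems listed above can often be caught with simple rule checks before any analysis is run. The sketch below is illustrative only: the field names, the sample records, and the rule that income must be non-negative are assumptions chosen to mirror the examples in the text.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"id": 1, "income": 52000, "sex": "Female", "pregnant": "Yes"},
    {"id": 2, "income": -100, "sex": "Female", "pregnant": "No"},
    {"id": 3, "income": 48000, "sex": "Male", "pregnant": "Yes"},
    {"id": 4, "income": None, "sex": "Male", "pregnant": "No"},
]


def screen(record: dict) -> list[str]:
    """Return the data-quality problems found in one record (assumed rules)."""
    problems = []
    income = record["income"]
    if income is None:
        problems.append("missing income")
    elif income < 0:  # assumption: a negative income is out of range
        problems.append("out-of-range income")
    if record["sex"] == "Male" and record["pregnant"] == "Yes":
        problems.append("impossible combination: male and pregnant")
    return problems


for rec in records:
    issues = screen(rec)
    if issues:
        print(rec["id"], issues)
# 2 ['out-of-range income']
# 3 ['impossible combination: male and pregnant']
# 4 ['missing income']
```

Flagged records can then be corrected, imputed, or removed, depending on what is known about how the data were collected.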
If a lot of irrelevant and redundant information is present, or the data are noisy and unreliable, knowledge discovery during the training phase becomes more difficult.
Data preparation and filtering steps can take a considerable amount of processing time.
Data preprocessing includes cleaning, instance selection, normalization, transformation, feature extraction and selection, etc. The product of data preprocessing is the final training set.
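Two of the simplest ways to strip redundant or irrelevant information are to remove duplicate instances and to remove features that never vary. This is a minimal sketch in plain Python with made-up data. The helper names are hypothetical, and practical instance and feature selection methods are usually far more sophisticated than exact-match deduplication.

```python
def drop_duplicate_rows(rows: list[tuple]) -> list[tuple]:
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def drop_constant_columns(rows: list[tuple]) -> list[tuple]:
    """Remove columns that hold the same value in every row (they carry no information)."""
    if not rows:
        return rows
    keep = [j for j in range(len(rows[0])) if len({row[j] for row in rows}) > 1]
    return [tuple(row[j] for j in keep) for row in rows]


# Hypothetical instances: two numeric features, a region code, and a class label.
data = [
    (5.1, 3.5, "EU", 0),
    (4.9, 3.0, "EU", 1),
    (5.1, 3.5, "EU", 0),  # exact duplicate of the first row
    (6.2, 2.9, "EU", 1),
]
training_set = drop_constant_columns(drop_duplicate_rows(data))
print(training_set)
# [(5.1, 3.5, 0), (4.9, 3.0, 1), (6.2, 2.9, 1)]
```

Deduplication runs first so that a column is judged constant only over the distinct instances that actually remain in the training set.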
• Below are brief descriptions of the methods used in the data preprocessing step.
• Data cleaning is the process of detecting, correcting, or removing inaccurate records from data;
• [3] Data normalization is the process used to rescale the range of independent variables or features of the data, typically into [0, 1] or [-1, +1];
• [4] Data transformation is the process of converting data from one format into the format expected by the subsequent analysis;
• [5] Feature extraction is the process of transforming the input data into a set of features that represent the input data well;
• [6] Data reduction is the transformation of numerical data into a corrected, ordered, and simplified form, reducing the amount of data or its dimensionality.
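One common way to normalize a feature into [0, 1] is min-max scaling, x' = (x - min) / (max - min). For [-1, +1], the result is stretched and shifted: x' = 2(x - min) / (max - min) - 1. The text does not name a specific normalization method, so min-max scaling is an assumption here. The rule of mapping a constant feature to the midpoint of the target range is also an assumed convention.

```python
def min_max_scale(values: list[float], low: float = 0.0, high: float = 1.0) -> list[float]:
    """Linearly rescale values so the minimum maps to `low` and the maximum to `high`.

    Raises ValueError on an empty list (from min/max).
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumed convention: a constant feature has no range, so map it to the midpoint.
        mid = (low + high) / 2
        return [mid for _ in values]
    return [low + (v - v_min) / (v_max - v_min) * (high - low) for v in values]


ages = [20.0, 30.0, 40.0, 60.0]
print(min_max_scale(ages))             # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(ages, -1.0, 1.0))  # [-1.0, -0.5, 0.0, 1.0]
```

In practice, the minimum and maximum are computed on the training data and then reused to scale new data. Values outside the training range can therefore fall outside [0, 1].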
Machine learning process
• Basics of data mining
