Data Science
Data Preprocessing
(Feature Selection and Merging)
Data Preprocessing
• Data Cleaning: handling missing values, outliers, and duplicates
• Data Transformation: scaling, normalization, categorical encoding
• Data Reduction (dimensionality reduction): selecting relevant features/columns
• Data Integration: combining multiple datasets (merging)
Feature Selection
A feature is an attribute of the data that influences, or is useful for solving, the problem at hand. Choosing the important features for a model is known as feature selection, and it is often performed to remove irrelevant or redundant features from the dataset.
We can define feature selection as "a process of automatically or manually selecting the subset of most appropriate and relevant features to be used in model building." Feature selection works by either including the important features or excluding the irrelevant ones, without changing the features themselves.
Feature Selection Techniques
• Supervised feature selection: These techniques take the target variable into account and can be used on labeled datasets.
• Unsupervised feature selection: These techniques ignore the target variable and can be used on unlabeled datasets.
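The difference between the two families can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not a recommended pipeline: the column layout (`informative`, `noise`, `constant`) and the choice of `VarianceThreshold` and `SelectKBest` with `f_classif` are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

# Synthetic data (hypothetical): one informative column, one noise column,
# and one constant column that carries no information at all.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                  # binary target (the labels)
informative = y + rng.normal(0, 0.3, size=n)    # strongly related to y
noise = rng.normal(0, 1, size=n)                # unrelated to y
constant = np.full(n, 5.0)                      # zero variance
X = np.column_stack([informative, noise, constant])

# Unsupervised: VarianceThreshold looks only at X and drops the constant column.
vt = VarianceThreshold(threshold=0.0)
vt.fit(X)
unsup_mask = vt.get_support()

# Supervised: SelectKBest scores each remaining column against y (ANOVA F-test)
# and keeps the single best-scoring one.
skb = SelectKBest(score_func=f_classif, k=1)
skb.fit(X[:, unsup_mask], y)
sup_mask = skb.get_support()

print("kept by variance filter:", unsup_mask)
print("kept by supervised filter:", sup_mask)
```

Note that the unsupervised step never sees `y`, so it cannot tell the informative column from the noise column; only the supervised step can.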
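Common techniques for selecting relevant features
• Feature importance
• Recursive Feature Elimination (RFE)
• Forward selection / backward elimination
• Principal Component Analysis (PCA); strictly speaking this is a feature extraction method, since it builds new features rather than selecting existing ones
• Filter methods
• Domain knowledge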
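Two of the techniques listed above, feature importance and RFE, can be sketched with scikit-learn as follows. The synthetic dataset, the random forest, and the logistic regression estimator are all assumptions chosen for illustration; other estimators would work as well.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset: 8 features, of which 3 are informative.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3,
    n_redundant=0, random_state=0, shuffle=False,
)

# Feature importance: a tree ensemble exposes one importance score per feature.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

# RFE: repeatedly fit the estimator and drop the weakest feature
# until only the requested number of features remains.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)
selected = rfe.support_    # boolean mask of the kept features
ranking = rfe.ranking_     # 1 = kept; larger numbers were eliminated earlier

print("importances:", importances.round(3))
print("RFE keeps columns:", [i for i, keep in enumerate(selected) if keep])
```

The two methods need not agree on which features matter, which is one reason to compare several techniques before dropping columns.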
Feature Extraction
Feature extraction transforms the original features into a new set of features through mathematical transformations or projections.
Feature selection, by contrast, keeps a subset of the original features based on their relevance. Both techniques are used for dimensionality reduction, which can improve model performance and reduce overfitting. Selection also tends to preserve interpretability, since the retained features are unchanged, whereas extracted features are combinations of the originals and can be harder to interpret.
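The contrast between selection and extraction can be made concrete with a short sketch. The random data and the choice of columns 0 and 2 are arbitrary assumptions; PCA stands in here as one example of an extraction method.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))    # hypothetical dataset with 4 features

# Feature selection: keep columns 0 and 2 exactly as they are.
X_selected = X[:, [0, 2]]

# Feature extraction: PCA builds 2 new features, each a linear
# combination of all 4 original columns.
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)

print(X_selected.shape, X_extracted.shape)
print("weights of the first new feature:", pca.components_[0].round(3))
```

Both results have two columns, but only `X_selected` contains values that appear in the original data.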
Merging: Combining Multiple Datasets
Merging, also known as joining, is a fundamental operation in data science in which we combine data from multiple datasets based on a common attribute or key.
Merging is essential when the information we need is spread across several datasets or when integrating data from multiple sources. When merging, it is important to ensure that the keys used are consistent (same data type, format, and spelling) and that missing values, including those produced by unmatched rows, are handled appropriately.
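A minimal pandas sketch of these two precautions is shown below. The `customers` and `orders` tables and their column names are hypothetical, and filling unmatched amounts with 0 is just one possible policy, not a general rule.

```python
import pandas as pd

# Hypothetical tables: note the inconsistent key "a2 " in customers.
customers = pd.DataFrame(
    {"customer_id": ["A1", "a2 ", "A3"], "name": ["Ana", "Ben", "Caz"]}
)
orders = pd.DataFrame(
    {"customer_id": ["A1", "A2", "A4"], "amount": [10.0, 25.0, 7.5]}
)

# Make the keys consistent before merging: trim whitespace, unify case.
for df in (customers, orders):
    df["customer_id"] = df["customer_id"].str.strip().str.upper()

# validate= raises an error if keys are unexpectedly duplicated;
# indicator= adds a "_merge" column showing where each row came from.
merged = customers.merge(
    orders, on="customer_id", how="left",
    validate="one_to_one", indicator=True,
)

# Rows with no match in orders have a missing amount; inspect them,
# then handle the missing values explicitly.
unmatched = merged[merged["_merge"] == "left_only"]
merged["amount"] = merged["amount"].fillna(0.0)

print(merged)
```

Without the normalization step, "a2 " would not match "A2" and that customer would silently appear to have no orders.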
Types of Merges
The most common method for merging data is through a process called “joining”. There are several types of joins.
• Inner join: Returns only the rows whose key values match in both tables; the match is typically an equality comparison on the common columns.
• Left join (left outer join): Returns all rows from the left table, not just those with a match in the right table. Where there is no match, the right table's columns are filled with missing values.
• Right join (right outer join): Returns all rows from the right table, not just those with a match in the left table. Where there is no match, the left table's columns are filled with missing values.
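• Full outer join: Returns all rows from both the left and right tables, matching them where possible and filling in missing values where there is no match.
• Cross join (Cartesian join): Returns every possible combination of rows from the two tables; no key is used.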
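The join types can be compared side by side in pandas. The two small tables are hypothetical, and the sketch assumes a pandas version that supports `how="cross"` (1.2 or later).

```python
import pandas as pd

# Hypothetical tables sharing the column "key"; keys 2 and 3 appear in both.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = left.merge(right, on="key", how="inner")        # matching keys only
left_join = left.merge(right, on="key", how="left")     # every row of left
right_join = left.merge(right, on="key", how="right")   # every row of right
outer = left.merge(right, on="key", how="outer")        # every row of both
cross = left.merge(right, how="cross")                  # all row combinations

for name, df in [("inner", inner), ("left", left_join),
                 ("right", right_join), ("outer", outer), ("cross", cross)]:
    print(f"{name}: {len(df)} rows")
```

In the left, right, and outer results, unmatched rows carry NaN in the columns that come from the other table, so a join is often followed by a missing-value step.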
Thanks for Watching!
