Data Preprocessing: Feature Selection and Merging

Data Science
Data Preprocessing
(Feature Selection and Merging)

Data Preprocessing
• Data Cleaning: handling missing values, outliers, and duplicates
• Data Transformation: scaling, normalization, categorical encoding
• Data Reduction (dimension reduction): selecting relevant features/columns
• Data Integration: combining multiple datasets (merging)

Feature Selection
A feature is an attribute of the data that influences the problem or is useful
for solving it. Choosing the features that matter for the model is known as
feature selection, and it is often performed to remove irrelevant or redundant
features from the dataset.
Feature selection can be defined as "a process of automatically or manually
selecting the subset of most appropriate and relevant features to be used in
model building." It works by either including the important features or
excluding the irrelevant ones, and it leaves the values of the retained
features unchanged.

Feature Selection Techniques
• Supervised feature selection: these techniques use the target variable and
therefore apply to labeled datasets.
• Unsupervised feature selection: these techniques ignore the target variable
and can therefore be applied to unlabeled datasets.
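
As a minimal sketch of the distinction above, the following uses only the Python standard library. The toy dataset, the column names, and the thresholds are hypothetical and chosen for illustration. The unsupervised filter looks only at each feature's variance, while the supervised filter compares each feature with the target.

```python
from statistics import mean, pvariance

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    # A constant column has zero spread; treat its correlation as 0.
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical toy dataset: column name -> values.
features = {
    "area":     [50, 60, 80, 100, 120],
    "rooms":    [2, 2, 3, 4, 5],
    "constant": [1, 1, 1, 1, 1],   # carries no information
    "noise":    [3, 1, 3, 1, 3],   # varies, but unrelated to the target
}
price = [150, 180, 240, 310, 360]  # target variable (labels)

def select_unsupervised(features, min_variance=0.0):
    """Keep features whose variance exceeds a threshold (target not used)."""
    return [name for name, col in features.items()
            if pvariance(col) > min_variance]

def select_supervised(features, target, min_abs_corr=0.5):
    """Keep features that correlate strongly with the target."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= min_abs_corr]

unsupervised = select_unsupervised(features)
supervised = select_supervised(features, price)
```

Here the unsupervised filter can only drop `constant`, because it has no access to the labels; the supervised filter also drops `noise`, because it can see that the column is almost uncorrelated with `price`.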

Common techniques for selecting relevant features
• Feature Importance
• Recursive Feature Elimination (RFE)
• Forward/Backward Elimination
• Principal Component Analysis (PCA), which is strictly a feature extraction
technique
• Filter Method
• Domain Knowledge
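
To make one of the listed techniques concrete, here is a rough sketch of greedy forward selection. The `score` argument stands in for whatever model evaluation is used in practice (for example a cross-validated score); `toy_score` and its `GAIN` table are hypothetical placeholders, not a real model.

```python
def forward_select(candidates, score, max_features):
    """Greedy forward selection.

    Start with no features and repeatedly add the single feature that
    improves `score` the most; stop when nothing improves it further.
    """
    selected = []
    best = score(selected)
    while len(selected) < max_features:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        trial_scores = {f: score(selected + [f]) for f in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] <= best:
            break  # no remaining candidate improves the score
        selected.append(winner)
        best = trial_scores[winner]
    return selected

# Hypothetical scorer standing in for a real model evaluation.
GAIN = {"area": 0.60, "rooms": 0.25, "noise": 0.0}

def toy_score(subset):
    total = sum(GAIN[f] for f in subset)
    if "area" in subset and "rooms" in subset:
        total -= 0.20          # overlapping (redundant) information
    return total - 0.01 * len(subset)  # small cost per extra feature

chosen = forward_select(["area", "rooms", "noise"], toy_score, max_features=3)
```

Backward elimination follows the same idea in reverse: start with all features and repeatedly remove the one whose removal hurts the score least.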

Feature Extraction
Feature extraction transforms the original features into a new set of features
through mathematical transformations or projections.
Feature selection, by contrast, keeps a subset of the original features based
on their relevance. Both techniques are used for dimensionality reduction,
which can improve model performance and reduce overfitting. Feature selection
also preserves interpretability, because the retained features keep their
original meaning, whereas extracted features are combinations of the originals.
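
The contrast can be sketched in a few lines of plain Python. The data and the projection weights below are hypothetical; a method such as PCA would learn the weights from the data rather than take them as given.

```python
# Hypothetical rows with two original features (x1, x2).
rows = [(2.0, 4.1), (3.0, 6.0), (4.0, 7.9)]

def select_columns(rows, indices):
    """Feature selection: keep a subset of the original columns unchanged."""
    return [tuple(row[i] for i in indices) for row in rows]

def project(rows, weights):
    """Feature extraction: build one new feature as a weighted sum
    (a linear projection) of the original features."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]

selected = select_columns(rows, [0])   # values of x1, exactly as given
extracted = project(rows, (0.5, 0.5))  # new feature mixing x1 and x2
```

Both results have one column instead of two, but only `selected` still contains values that appear in the original data.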

Merging: Combining Multiple Datasets
Merging, also known as joining, is a fundamental operation in data science in
which we combine data from multiple datasets based on a common attribute or
key.
Merging is essential when the information we need is spread across several
datasets or when integrating data from multiple sources. When merging, it is
important to ensure that the keys are consistent across datasets (same
meaning, data type, and formatting) and that missing values, including those
produced by rows with no match, are handled appropriately.
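
One way to check key consistency before merging is sketched below. The datasets, the `customer_id` column, and the normalization rule (trim whitespace, lowercase) are assumptions made for illustration; the right rule depends on the data at hand.

```python
def normalize_key(value):
    """Bring key values into one format; keep missing keys as None."""
    if value is None:
        return None
    return str(value).strip().lower()

def key_report(left, right, key):
    """Summarize how well the keys of two datasets line up."""
    left_keys = {normalize_key(r.get(key)) for r in left}
    right_keys = {normalize_key(r.get(key)) for r in right}
    missing = sum(1 for r in left + right if r.get(key) is None)
    left_keys.discard(None)
    right_keys.discard(None)
    return {
        "only_in_left": sorted(left_keys - right_keys),
        "only_in_right": sorted(right_keys - left_keys),
        "rows_missing_key": missing,
    }

# Hypothetical datasets with inconsistently formatted keys.
customers = [
    {"customer_id": " C001 ", "name": "Asha"},
    {"customer_id": "c002", "name": "Ravi"},
]
orders = [
    {"customer_id": "C001", "amount": 250},
    {"customer_id": "C003", "amount": 90},
    {"customer_id": None, "amount": 40},   # missing key
]

report = key_report(customers, orders, "customer_id")
```

Without normalization, `" C001 "` and `"C001"` would be treated as different keys and the corresponding rows would silently fail to match.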

Types of Merges
The most common method for merging data is through a process called
“joining”. There are several types of joins.
• Inner join: Returns only the rows whose values in the common (key) columns
match in both tables.
• Left join (left outer join): Returns all the rows from the left table, not
just the rows with a match in the right table. Where there is no match, the
right table's columns are filled with missing values (NULL).
• Right join (right outer join): Returns all the rows from the right table,
not just the rows with a match in the left table. Where there is no match, the
left table's columns are filled with missing values (NULL).

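• Full outer join: Returns all the rows from both the left and right tables,
matching them where possible and filling the columns of unmatched rows with
missing values (NULL).
• Cross join (Cartesian join): Returns all possible combinations of rows from
the two tables; no key is used.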
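
The join types can be illustrated with a small pure-Python sketch that treats each table as a list of dictionaries. The `join` and `cross_join` helpers and the sample tables are hypothetical and written for clarity, not efficiency. In practice a library routine would normally be used, for example pandas `DataFrame.merge` with its `how` parameter.

```python
from itertools import product

def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; unmatched sides are filled with None."""
    if how not in ("inner", "left", "right", "outer"):
        raise ValueError(f"unsupported join type: {how}")
    # Templates that supply None for every column of the other table.
    blank_left = dict.fromkeys(c for row in left for c in row)
    blank_right = dict.fromkeys(c for row in right for c in row)

    result = []
    matched_right = set()
    for l in left:
        matches = [(i, r) for i, r in enumerate(right) if r[key] == l[key]]
        for i, r in matches:
            matched_right.add(i)
            result.append({**l, **r})
        if not matches and how in ("left", "outer"):
            result.append({**blank_right, **l})   # keep unmatched left row
    if how in ("right", "outer"):
        for i, r in enumerate(right):
            if i not in matched_right:
                result.append({**blank_left, **r})  # keep unmatched right row
    return result

def cross_join(left, right):
    """Every combination of a left row with a right row (no key involved)."""
    return list(product(left, right))

# Hypothetical sample tables.
employees = [
    {"dept_id": 1, "name": "Asha"},
    {"dept_id": 2, "name": "Ravi"},
    {"dept_id": 4, "name": "Meera"},   # no matching department
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "HR"},
    {"dept_id": 3, "dept": "IT"},      # no matching employee
]

inner = join(employees, departments, "dept_id", "inner")
left_rows = join(employees, departments, "dept_id", "left")
right_rows = join(employees, departments, "dept_id", "right")
outer = join(employees, departments, "dept_id", "outer")
cross = cross_join(employees, departments)
```

With these tables the inner join keeps only departments 1 and 2, the left join adds Meera with `dept` set to `None`, the right join adds IT with `name` set to `None`, the full outer join contains both of those extra rows, and the cross join pairs every employee with every department.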
