This presentation by Ms. Aakanksha Jain discusses dimensionality reduction techniques, focusing on feature selection and feature extraction and highlighting backward feature elimination and forward feature selection. It emphasizes the importance of dimension reduction in data analysis and machine learning, and demonstrates the concepts with practical Python implementations, including a hands-on feature selection session on a sample dataset using dedicated Python libraries.
Content
01. What is dimensionality reduction? Why dimension reduction is important?
02. Techniques to achieve dimension reduction: Feature Selection and Feature Extraction; basic understanding of feature selection
03. Backward feature elimination and Forward feature selection technique
04. Hands-on session on feature selection: Python implementation in Jupyter Lab
What is Dimensionality Reduction?
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data.
Techniques to Achieve Dimension Reduction

Feature extraction: finds a set of new features (i.e., through some mapping f()) from the existing features:

x = [x_1, x_2, ..., x_N]^T  ->  y = f(x) = [y_1, y_2, ..., y_K]^T,  K << N

Feature selection: chooses a subset of the original features:

x = [x_1, x_2, ..., x_N]^T  ->  [x_{i_1}, x_{i_2}, ..., x_{i_K}]^T,  K << N

The mapping f() could be linear or non-linear.
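To make the contrast concrete, here is a minimal sketch (the array shapes and the use of PCA as one possible mapping f are illustrative assumptions, not part of the original slide):

import numpy as np
from sklearn.decomposition import PCA

# a toy data matrix: 10 samples, N = 4 original features
X = np.random.rand(10, 4)

# feature extraction: map all N features into K = 2 new features y = f(x)
# (PCA is one possible linear choice of the mapping f)
pca = PCA(n_components=2)
Y = pca.fit_transform(X)      # shape (10, 2); each new feature mixes the originals

# feature selection: keep K = 2 of the original features unchanged
X_subset = X[:, [0, 2]]       # shape (10, 2); the chosen columns survive as-is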
Feature Selection Techniques

FILTER Method: features are selected via various statistical test scores.

WRAPPER Method: selects the combination of features that produces the best model result.

EMBEDDED Method: features are selected by combining the qualities of the Filter and Wrapper methods.
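As an illustration, here is a minimal sketch of the Filter and Embedded methods using scikit-learn on synthetic data (the dataset and parameters are assumptions for the example; the Wrapper method is demonstrated with mlxtend in the hands-on below):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso

# synthetic regression data: 100 samples, 8 features, only 3 informative
X, y = make_regression(n_samples=100, n_features=8, n_informative=3, random_state=0)

# FILTER: score each feature with a univariate statistical test, keep the top k
selector = SelectKBest(score_func=f_regression, k=3)
X_filtered = selector.fit_transform(X, y)
print(selector.get_support(indices=True))   # indices of the selected features

# EMBEDDED: selection happens inside model training; L1 regularisation
# drives the coefficients of unhelpful features to zero
lasso = Lasso(alpha=1.0).fit(X, y)
print(np.nonzero(lasso.coef_)[0])           # features with non-zero coefficients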
Backward Feature Elimination

[Diagram: the backward feature elimination workflow]
• Initially we start with the complete dataset and all features.
• The model learns iteratively against the dependent variable.
• The significance of each feature is checked iteratively.
• After removing a feature, the impact on model performance is checked.
• The most significant features are kept.
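The loop in this workflow can also be written out by hand. Below is a minimal from-scratch sketch, assuming a pandas DataFrame X, a target y, linear regression as the model, and cross-validated MSE as the performance check (the hands-on below uses mlxtend instead):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, k_features):
    features = list(X.columns)
    while len(features) > k_features:
        # score the model with each remaining feature removed in turn
        scores = {}
        for f in features:
            rest = [c for c in features if c != f]
            scores[f] = cross_val_score(LinearRegression(), X[rest], y,
                                        scoring='neg_mean_squared_error').mean()
        # drop the feature whose removal hurts performance the least
        least_useful = max(scores, key=scores.get)
        features.remove(least_useful)
    return features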
Backward Feature Elimination

Assumptions:
• There are no missing values in our dataset.
• The variance of all the variables is very high.
• The correlation between independent variables is very low.
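Each assumption can be checked directly in pandas before running the elimination; a quick sketch, assuming the data DataFrame loaded in the hands-on below:

# assumption 1: no missing values in any column
print(data.isnull().sum())

# assumption 2: the variance of each variable is high
print(data.var(numeric_only=True))

# assumption 3: correlation between independent variables is low
print(data.drop(['ID', 'count'], axis=1).corr())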
Backward Feature Elimination

Step I:
To perform backward feature elimination, first train the model using all variables, say n of them.

Step II:
Next, we calculate the performance of the model.

ACCURACY: 92%
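Steps I and II amount to fitting on all n variables and scoring that fit; a minimal sketch, assuming the X and y built in the code below (the 92% figure is from the author's own run):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step I: train the model using all variables
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Step II: calculate the performance of the model (R^2 score here)
print(model.score(X_test, y_test))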
Python Code

# importing the libraries
import pandas as pd

# reading the file
data = pd.read_csv('backward_feature_elimination.csv')

# first 5 rows of the data
data.head()

# shape of the data
data.shape

# creating the training data
X = data.drop(['ID', 'count'], axis=1)
y = data['count']

# checking shapes
X.shape, y.shape

# installation of mlxtend
!pip install mlxtend
Python Code

# importing the libraries
from mlxtend.feature_selection import SequentialFeatureSelector as sfs
from sklearn.linear_model import LinearRegression

# setting parameters to apply Backward Feature Elimination
lreg = LinearRegression()
sfs1 = sfs(lreg, k_features=4, forward=False, verbose=1, scoring='neg_mean_squared_error')

# apply Backward Feature Elimination
sfs1 = sfs1.fit(X, y)

# checking selected features
feat_names = list(sfs1.k_feature_names_)
print(feat_names)

# setting new dataframe (.copy() avoids pandas' SettingWithCopyWarning)
new_data = data[feat_names].copy()
new_data['count'] = data['count']
Python Code

# first five rows of the new data
new_data.head()

# shape of new and original data
new_data.shape, data.shape

Congratulations!!
We have successfully implemented backward feature elimination.
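The agenda also mentions forward feature selection; with mlxtend it is the same call with forward=True (a sketch reusing the X and y from above):

from mlxtend.feature_selection import SequentialFeatureSelector as sfs
from sklearn.linear_model import LinearRegression

# forward selection starts from an empty set and adds features one at a time
lreg = LinearRegression()
sfs2 = sfs(lreg, k_features=4, forward=True, verbose=1, scoring='neg_mean_squared_error')
sfs2 = sfs2.fit(X, y)
print(list(sfs2.k_feature_names_))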