A short presentation on preprocessing data and feature scaling. It highlights the rationale behind scaling your continuous variables and the different methods for doing so.
2. What is feature scaling?
Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.
3. Why use feature scaling?
● Many estimators are sensitive to the scale of their inputs (see the sketch below)
  ○ Regression
  ○ SVM
  ○ kNN
  ○ Neural networks
● Image processing
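A quick illustration of why distance-based estimators such as kNN care about scale (a minimal sketch; the feature values are made up for illustration):

import numpy as np

# Toy points with two features on very different scales:
# income (~1e4) and age (~1e1).
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])

# The Euclidean distance is dominated by the income axis: the 35-year
# age gap adds almost nothing, so an unscaled kNN effectively ignores age.
print(np.linalg.norm(a - b))  # ~1000.6, of which income contributes ~1000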
4. How to use feature scaling?
● Methods (sketched below)
  ○ Calculate z-scores of your feature variables
  ○ Min-max method
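Both methods are one-liners in NumPy; a minimal sketch, assuming X is a 2-D array with one column per feature (the toy values are made up):

import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Z-score: (x - mean) / std, computed per feature (column)
X_z = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max: (x - min) / (max - min), rescales each feature to [0, 1]
X_mm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))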
5. When to use feature scaling?
● Andrew Ng's rule of thumb
  ○ Okay if variables fall roughly in the [-⅓, ⅓] to [-3, 3] range
  ○ Otherwise, scale (a quick range check is sketched below)
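One way to eyeball those ranges (a minimal sketch; the span-based check is my interpretation of the heuristic, not from the slides, and the toy data is made up):

import numpy as np

X = np.array([[0.1, 5000.0],
              [0.2, 7000.0],
              [0.3, 9000.0]])

# A feature whose span is far wider than [-3, 3] (span 6) or far narrower
# than [-1/3, 1/3] (span 2/3) is a candidate for scaling.
spans = X.max(axis=0) - X.min(axis=0)
needs_scaling = (spans < 2 / 3) | (spans > 6)  # boolean flag per feature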
6. sklearn.preprocessing
● preprocessing.scale
  ○ by default scales each feature to mean 0 and variance 1
  ○ i.e. calculates the z-score
● preprocessing.normalize
  ○ scales each sample (row) to a unit vector, not each feature
  ○ useful for text classification/clustering
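A minimal sketch of both calls (the toy array is made up for illustration):

from sklearn import preprocessing
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardize each column to mean 0, variance 1 (z-scores)
X_scaled = preprocessing.scale(X)

# Rescale each row to unit L2 norm (the default); common for text vectors
X_normed = preprocessing.normalize(X, norm="l2")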
9. Fit scaler only on training set

from sklearn import preprocessing

# Fit on the training split only, then reuse the fitted scaler on the test
# split so no test-set statistics leak into preprocessing.
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)

● also MinMaxScaler() (same fit/transform pattern; sketched below)
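The same pattern with MinMaxScaler (a minimal sketch; X_train and X_test are assumed to be defined as above):

from sklearn import preprocessing

mm_scale = preprocessing.MinMaxScaler().fit(X_train)  # learns per-feature min/max
X_train_mm = mm_scale.transform(X_train)  # training features mapped to [0, 1]
X_test_mm = mm_scale.transform(X_test)    # same mapping; may fall outside [0, 1]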