Feature Scaling
by
Gautam Kumar
What is Feature Scaling?
Feature Scaling is a technique to standardize the independent features in the data to a fixed range.
It is performed to handle highly varying magnitudes, values, or units.
Why feature scaling (Standardization)?
It is a data pre-processing step applied to the independent variables, or features, of the data.
It normalizes the data to a particular range.
It can also speed up the calculations in an algorithm; gradient descent, for example, typically converges faster on scaled features.
When to do scaling?
• Feature scaling matters for algorithms that rely on distances, variances, or gradients:
  • K-nearest neighbors (KNN)
  • K-Means
  • Principal Component Analysis (PCA)
  • Gradient descent
• Feature scaling is not required for algorithms that rely on rules (tree-based splits):
  • CART
  • Random Forests
  • Gradient Boosted Decision Trees
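A small sketch of why scaling matters for distance-based methods such as KNN. The toy data (an age column and an income column) and the helper `nearest_to_first` are hypothetical and only illustrate the point; they are not from the slides.

```python
import numpy as np

# Hypothetical toy data: column 0 is age in years, column 1 is income in dollars.
X = np.array([
    [25.0, 50_000.0],  # row 0: the query point
    [60.0, 50_800.0],  # row 1: very different age, similar income
    [26.0, 51_000.0],  # row 2: almost the same age, slightly higher income
])

def nearest_to_first(points):
    """Return the index (1 or 2) of the row closest to row 0 (Euclidean)."""
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return int(np.argmin(dists)) + 1

# Standardize each column: subtract its mean, divide by its standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

raw_neighbor = nearest_to_first(X)            # income dominates the distance
scaled_neighbor = nearest_to_first(X_scaled)  # both features contribute

print(raw_neighbor, scaled_neighbor)
```

On the raw data the income column dominates the distance, so row 1 looks closest; after standardization the age difference counts as well and row 2 becomes the nearest neighbor.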
Distance calculation using different techniques:
• Euclidean Distance: the square root of the sum of squared differences between the coordinates, where X is the data point, Y is the centroid, and K is the number of features:
  $d(X, Y) = \sqrt{\sum_{i=1}^{K} (X_i - Y_i)^2}$
• Manhattan Distance: the sum of absolute differences between the coordinates (feature values) of the data point and the centroid of each class:
  $d(X, Y) = \sum_{i=1}^{K} |X_i - Y_i|$
• Minkowski Distance: a generalization of the two methods above with an order parameter p (p = 1 gives Manhattan, p = 2 gives Euclidean):
  $d(X, Y) = \left( \sum_{i=1}^{K} |X_i - Y_i|^p \right)^{1/p}$
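The three distances can be sketched with one function, since Minkowski distance with order p = 1 is Manhattan and with p = 2 is Euclidean. The function names and the sample `point` and `centroid` values are illustrative assumptions, not part of the slides.

```python
import numpy as np

def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two feature vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def euclidean_distance(x, y):
    """Special case p = 2."""
    return minkowski_distance(x, y, p=2)

def manhattan_distance(x, y):
    """Special case p = 1."""
    return minkowski_distance(x, y, p=1)

# Hypothetical data point and centroid with K = 3 features.
point = [1.0, 2.0, 3.0]
centroid = [4.0, 6.0, 3.0]

d_euclid = euclidean_distance(point, centroid)     # sqrt(9 + 16 + 0)
d_manhattan = manhattan_distance(point, centroid)  # 3 + 4 + 0
print(d_euclid, d_manhattan)
```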
Feature Scaling Techniques
• Min-Max Normalization
• Standardization
• Max Abs Scaling
• Robust Scaling
• Quantile Transformer Scaling
• Power Transformer Scaling
• Unit Vector Scaling
Min Max Normalization
• This technique re-scales a feature or observation value to a distribution between 0 and 1, or to another given range.
• The target range is configurable, for example [0,1], [0,5], or [-1,1]; a symmetric range such as [-1,1] is a common choice when the data contains negative values.
• This technique responds well if the standard deviation is small and when the distribution is not Gaussian. Because it relies on the minimum and maximum, it is sensitive to outliers.
• sklearn.preprocessing.MinMaxScaler
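A minimal sketch of min-max scaling with scikit-learn, assuming made-up training and test arrays. It shows the default range of [0, 1] and a custom `feature_range`; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: two features on very different scales.
X_train = np.array([[-10.0, 200.0],
                    [0.0, 400.0],
                    [30.0, 1000.0]])
X_test = np.array([[10.0, 600.0]])

# Default feature_range is (0, 1); min and max are learned from the training set.
scaler = MinMaxScaler()
X_train_01 = scaler.fit_transform(X_train)
X_test_01 = scaler.transform(X_test)  # reuses the training min and max

# The same data mapped to a symmetric range instead.
scaler_sym = MinMaxScaler(feature_range=(-1, 1))
X_train_sym = scaler_sym.fit_transform(X_train)

print(X_train_01)
print(X_test_01)
```

Each column is transformed as (x - min) / (max - min), then stretched to the requested range.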
Standardization
• This technique re-scales a feature so that its distribution has a mean of 0 and a variance of 1.
• Scaling happens independently on each feature, by computing the relevant statistics (mean and standard deviation) on the samples in the training set.
• If the data is not normally distributed, this is not the best scaler to use.
• sklearn.preprocessing.StandardScaler
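A short sketch of standardization, assuming small invented arrays. The statistics are computed on the training set only and then reused for the test set, as the slide describes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test samples with two features.
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])
X_test = np.array([[2.0, 700.0]])

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learns per-feature mean and std
X_test_std = scaler.transform(X_test)        # applies the training statistics

print(scaler.mean_)                 # per-feature training means
print(X_train_std.mean(axis=0))     # approximately 0 for each feature
print(X_train_std.std(axis=0))      # approximately 1 for each feature
```

Test values are not guaranteed to have zero mean or unit variance, since they are scaled with the training statistics.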
Max Abs Scaling
Scales each feature by its maximum absolute value.
This technique scales each feature individually such that the maximal absolute value of each feature in the training set is 1.0, so the scaled values lie in [-1, 1]. It does not shift or center the data, so zero entries stay at 0.0.
On positive-only data, this scaler behaves similarly to the Min Max scaler.
sklearn.preprocessing.MaxAbsScaler
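A minimal sketch of max-abs scaling on an invented array, to show that each column is only divided by its largest absolute value and that zeros are left untouched.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical data with a mixed-sign column and a non-negative column.
X = np.array([[-4.0, 0.0],
              [2.0, 5.0],
              [0.0, 10.0]])

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.max_abs_)  # largest absolute value per feature
print(X_scaled)         # each column divided by that value
```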
Robust Scaling
This scaling technique is robust to outliers. If our data contains many outliers, scaling using the mean and standard deviation of the data won’t work well.
This technique removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range).
sklearn.preprocessing.robust_scale
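A small sketch comparing robust scaling with standardization on an invented column that contains one large outlier. It uses the `RobustScaler` class alongside the `robust_scale` function named above; the data is an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler, robust_scale

# Hypothetical single feature with one extreme outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# Default: subtract the median, divide by the interquartile range (25th to 75th percentile).
robust = RobustScaler()
X_robust = robust.fit_transform(X)

# The function form applies the same transformation in one call.
X_robust_fn = robust_scale(X)

# For comparison: mean and standard deviation are pulled by the outlier.
X_std = StandardScaler().fit_transform(X)

print(robust.center_, robust.scale_)  # median and IQR
print(X_robust.ravel())
print(X_std.ravel())
```

With standardization the four ordinary values are squeezed into a very narrow band, while robust scaling keeps them spread out and leaves the outlier clearly visible.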
Quantile Transformer Scaling
This technique transforms the features to follow a uniform or a normal distribution.
A quantile transform maps a variable’s probability distribution to another probability distribution. The transformation is applied to each feature independently.
First, an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution.
The obtained values are then mapped to the desired output distribution using the associated quantile function.
When the target distribution is Gaussian, the result is a standard normal distribution, centered on a mean of 0 with a standard deviation of 1.0.
sklearn.preprocessing.quantile_transform
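A sketch of the quantile transform using the `QuantileTransformer` class (the estimator counterpart of the function named above). The skewed sample data, the seed, and the choice of `n_quantiles=100` are assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Hypothetical right-skewed feature drawn from an exponential distribution.
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(1000, 1))

# Map the empirical distribution to a uniform distribution on [0, 1].
to_uniform = QuantileTransformer(
    n_quantiles=100, output_distribution="uniform", random_state=0
)
X_uniform = to_uniform.fit_transform(X)

# Map the same data to a standard normal distribution instead.
to_normal = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
)
X_normal = to_normal.fit_transform(X)

print(X_uniform.min(), X_uniform.max())
print(X_normal.mean(), X_normal.std())
```

The mapping is monotonic, so the rank order of the samples is kept while the shape of the distribution changes.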
Power Transformer Scaling
Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like.
This is useful for modeling issues related to heteroscedasticity, that is, the variability of a variable being unequal across its range.
The power transform finds the optimal scaling factor for stabilizing variance and minimizing skewness through maximum likelihood estimation.
sklearn.preprocessing.power_transform
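A sketch of a power transform with scikit-learn's `PowerTransformer`, assuming invented log-normal data. The Box-Cox method is chosen here because the sample is strictly positive; the `skewness` helper is a hypothetical utility written for this example.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    v = np.asarray(values, dtype=float).ravel()
    centered = v - v.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)

# Hypothetical strictly positive, right-skewed feature.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Box-Cox requires positive data; the exponent is estimated by maximum likelihood.
# standardize=True also rescales the output to zero mean and unit variance.
pt = PowerTransformer(method="box-cox", standardize=True)
X_t = pt.fit_transform(X)

print(pt.lambdas_)                  # fitted exponent per feature
print(skewness(X), skewness(X_t))   # skewness before and after
```

For data that contains zeros or negative values, `method="yeo-johnson"` (the default) would be the option to try instead.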
Unit Vector Scaling
This scaling technique considers the whole feature vector and scales it to unit length.
Unit vector scaling means dividing each component by the Euclidean length of the vector (L2 norm).
For non-negative features, the unit vector technique produces values in the range [0,1]; with negative components the range is [-1,1]. When dealing with features with hard boundaries, this is quite useful.
Ex. when dealing with image data, the colors can range only from 0 to 255.
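A minimal sketch of unit vector scaling on invented rows, using scikit-learn's `normalize` function (an assumption, since the slide names no API here) next to the equivalent NumPy expression.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical samples; each row is one feature vector.
X = np.array([[3.0, 4.0],
              [0.0, 255.0],
              [-6.0, 8.0]])

# Divide every row by its Euclidean (L2) length.
X_unit = normalize(X, norm="l2")

# The same operation written directly with NumPy.
X_unit_np = X / np.linalg.norm(X, axis=1, keepdims=True)

print(X_unit)
print(np.linalg.norm(X_unit, axis=1))  # every row now has length 1
```

Unlike the other scalers, this works per sample (row) rather than per feature (column), and the third row shows that negative components stay negative.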
Any Questions?
Contact:Gautam.kmr2893@outlook.com
Thank You
