Lesson 5
Scaling and Normalization
Kush Kulshrestha
Topics
1. Scaling of Data
2. Data Normalization
3. Difference between Scaling and Normalization
4. Min-max scaler and Box-Cox Transformation
Scaling of data
• Variable scaling requires taking values that span a specific range and representing them in another range.
• The standard method is to scale variables to [0,1].
• Scaling can introduce some distortion or bias into the data, depending on the method, but the overall shape of the distribution remains the same.
• Depending on the modeling tool, scaling variable ranges can be beneficial or sometimes even required.
One way of doing this is:
Linear Scaling Transform
• The first task in this scaling is to determine the minimum and maximum values of the variable.
• Then apply the transform:
x' = (x - min{x_1, ..., x_N}) / (max{x_1, ..., x_N} - min{x_1, ..., x_N})
• This introduces no distortion to the variable distribution.
• There is a one-to-one relationship between the original and scaled values.
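The linear scaling transform can be sketched in a few lines of plain Python. The function name `linear_scale` and the sample numbers are illustrative assumptions, not part of the original slides.

```python
def linear_scale(values):
    """Scale values to [0, 1] using the minimum and maximum of the sample."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no range to scale.
        raise ValueError("cannot scale a constant variable: max equals min")
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample values, for illustration only.
sample = [20.0, 35.0, 50.0, 80.0]
scaled = linear_scale(sample)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because the transform is linear, the relative spacing of the values is unchanged; only the range differs.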
Out of Range values
• In data preparation, the data used is only a sample of the population.
• Therefore, it is not certain that the actual minimum and maximum values of the variable have been
discovered when scaling the ranges.
• If some values that turn up later in the mining process are outside of the limits discovered in the sample,
they are called out-of-range values.
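As a rough illustration of how out-of-range values arise, the sketch below takes the minimum and maximum from a sample and then applies the same transform to values seen later. The helper names `fit_range` and `apply_scale` and all numbers are assumptions made for this example.

```python
def fit_range(sample):
    """Return the (min, max) limits discovered in the sample."""
    return min(sample), max(sample)


def apply_scale(value, lo, hi):
    """Apply linear scaling using limits that were fixed beforehand."""
    return (value - lo) / (hi - lo)


# Limits come from the sample only (made-up numbers): lo = 10, hi = 50.
sample = [10.0, 20.0, 30.0, 40.0, 50.0]
lo, hi = fit_range(sample)

# Values that turn up later may fall outside those limits.
later_values = [2.0, 30.0, 54.0]
scaled_later = [apply_scale(v, lo, hi) for v in later_values]
print(scaled_later)  # roughly [-0.2, 0.5, 1.1]
```

The first and last results fall outside [0,1] because 2 and 54 lie beyond the limits found in the sample.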
Dealing with Out of Range values
• After range scaling, all variables should be in the range of [0,1].
• Out-of-range values, however, scale to values such as -0.2 or 1.1, which can cause unwanted behavior.
Solution 1: Ignore that the range has been exceeded.
• Most modeling tools have (at least) some capacity to handle numbers outside the scaling range.
• Important question to ask: Does this affect the quality of the model?
Solution 2: Exclude the out of range instances.
• One problem is that reducing the number of instances reduces the confidence that the sample represents
the population.
• Another problem is the introduction of bias: out-of-range values may occur with a certain pattern, and excluding
these instances removes samples according to that pattern, which distorts the sample.
Solution 3: Clip the out of range values
• If the value is greater than 1, assign 1 to it. If less than 0, assign 0.
• This approach assumes that out-of-range values are somehow equivalent to the range limit values.
• Therefore, the information content on the limits is distorted by projecting multiple values into a single
value.
• This also introduces some bias.
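Clipping can be sketched as follows. The helper name `clip_unit` and the input values are assumptions for illustration.

```python
def clip_unit(value):
    """Clip an already-scaled value to the [0, 1] range."""
    return min(1.0, max(0.0, value))


# Scaled values, two of them above the upper limit (made-up numbers).
scaled = [-0.2, 0.5, 1.1, 1.7]
clipped = [clip_unit(v) for v in scaled]
print(clipped)  # [0.0, 0.5, 1.0, 1.0]
```

Note that 1.1 and 1.7 both become 1.0, which is the loss of information described above: several distinct values are projected onto a single limit value.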
Solution 4: Making room for out of range values
• The linear scaling transform provides an undistorted normalization but cannot accommodate out-of-range values.
• Therefore, we should modify it so that it can also represent values that fall outside the sampled range.
• Most of the population lies inside the range, so for these values the normalization should remain linear.
• The solution is to reserve some part of the [0,1] range for the out-of-range values.
• The amount of space reserved depends on the confidence level of the sample:
e.g., at 98% confidence the linear part is [0.01, 0.99].
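A minimal sketch of reserving space at both ends, assuming the reserved space is split equally between the two ends (which reproduces the [0.01, 0.99] example for 98% confidence). The function name `scale_with_margin` and its parameters are hypothetical.

```python
def scale_with_margin(value, lo, hi, confidence=0.98):
    """Map [lo, hi] linearly onto [margin, 1 - margin].

    Assumption: the space reserved for out-of-range values is
    (1 - confidence), divided equally between the two ends.
    """
    margin = (1.0 - confidence) / 2.0  # about 0.01 per side for 98%
    return margin + (value - lo) / (hi - lo) * (1.0 - 2.0 * margin)


# The sample limits now land near 0.01 and 0.99 instead of 0 and 1.
print(scale_with_margin(10.0, 10.0, 50.0))
print(scale_with_margin(50.0, 10.0, 50.0))
```

This only handles the linear part; values outside [lo, hi] still need to be fitted into the reserved space.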
Squashing the out of range values
• Now the problem reduces to fitting the out-of-range values into the space left for them.
• The greater the difference between a value and the range limit, the less likely any such value is found.
• Therefore, the transformation should be such that the farther a value lies outside the range, the smaller the
additional increase towards one or decrease towards zero.
• One possibility is to use functions of the form y = 1/x and attach them to the ends of the linear part.
It is difficult to carry out the scaling in pieces, depending on where each data point falls.
We can do all of the above steps using one function called Softmax scaling.
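To make the piecewise idea concrete, here is one possible sketch that attaches hyperbolic (1/x-shaped) tails to a linear middle part. The specific tail formula, the name `squash_scale`, and the `margin` parameter are assumptions chosen so that the pieces join smoothly; the slides do not prescribe a particular form.

```python
def squash_scale(value, lo, hi, margin=0.01):
    """Linear inside [lo, hi]; hyperbolic tails outside.

    Inside the range, values map linearly onto [margin, 1 - margin].
    Outside, a tail of the form margin / (1 + k * d) is used, where d is
    the distance to the nearest range limit and k is chosen so the slope
    matches the linear part at the join (an illustrative choice).
    """
    slope = (1.0 - 2.0 * margin) / (hi - lo)  # slope of the linear part
    if value < lo:
        d = lo - value
        return margin / (1.0 + slope * d / margin)        # decays towards 0
    if value > hi:
        d = value - hi
        return 1.0 - margin / (1.0 + slope * d / margin)  # approaches 1
    return margin + slope * (value - lo)


for v in (-50.0, 0.0, 50.0, 100.0, 150.0, 1000.0):
    print(v, squash_scale(v, 0.0, 100.0))
```

The branching on where each value falls is exactly the kind of bookkeeping that a single smooth function avoids.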
Softmax Scaling
• The extent of the linear part can be controlled by one parameter.
• The space assigned for out-of-range values can be controlled by the level of uncertainty in the sample.
• Non-identical values always have different normalized values.
Softmax scaling is based on the logistic function:
y = 1 / (1 + e^{-x})
where y is the scaled value and x is the input value.
• The logistic function maps the original range (-∞, ∞) to (0, 1) and is approximately linear
around the middle of the transform.
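A hedged sketch of softmax scaling follows. The slides give only the logistic function; the pre-scaling step used here, t = (x - mean) / (lam * std / (2π)), is one commonly cited parameterization and is an assumption, as are the names `logistic`, `softmax_scale`, and `lam`.

```python
import math
import statistics


def logistic(t):
    """Numerically stable logistic function 1 / (1 + e^(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # avoids overflow for very negative t
    return e / (1.0 + e)


def softmax_scale(value, mean, std, lam=2.0):
    """Softmax scaling of one value.

    Assumption: lam sets how many standard deviations around the mean
    fall in the roughly linear part of the logistic curve.
    """
    t = (value - mean) / (lam * std / (2.0 * math.pi))
    return logistic(t)


# Mean and standard deviation come from the sample (made-up numbers).
sample = [10.0, 20.0, 30.0, 40.0, 50.0]
mean = statistics.fmean(sample)
std = statistics.pstdev(sample)

# In-range and far out-of-range values all land between 0 and 1.
scaled = [softmax_scale(v, mean, std) for v in (-400.0, 10.0, 30.0, 50.0, 500.0)]
print(scaled)
```

One caveat: in floating-point arithmetic, values extremely far from the mean can round to exactly 0.0 or 1.0, so the "always different" property holds mathematically but not necessarily numerically.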
Min-Max Scaler In scikit-learn
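The original slide content for this heading did not survive extraction, so the following is only a minimal sketch of typical `MinMaxScaler` usage, with made-up data. It fits the scaler on training data and reuses the fitted limits on new data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One feature, five training rows (made-up numbers).
train = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
# New data seen later, including values outside the training range.
new = np.array([[2.0], [30.0], [54.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # learns min and max from train
new_scaled = scaler.transform(new)          # reuses the learned min and max

print(scaler.data_min_, scaler.data_max_)
print(train_scaled.ravel())
print(new_scaled.ravel())  # values outside [0, 1] are possible here
```

Recent scikit-learn versions also accept a `clip=True` argument on `MinMaxScaler`, which clips transformed values to the feature range; whether that is appropriate depends on the trade-offs discussed for clipping.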
Data Normalization
• Applying a function to each data point z_i in the data: y_i = f(z_i)
• Unlike scaling, normalization changes not only the values but also the shape or distribution of the data.
• The point of normalization is to change your observations so that they can be described as a normal
distribution.
• In general, you'll only want to normalize your data if you're going to be using a machine learning or
statistics technique that assumes your data is normally distributed.
• Or before using any technique containing “Gaussian” in its name.
Data Standardization
• The standardized values are also called z-scores.
• The terms normalization and standardization are sometimes used interchangeably.
• Standardization transforms data to have a mean of zero and a standard deviation of 1.
• It is done by subtracting the mean from each data point and dividing by the standard deviation:
z = (x - mean) / std
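Standardization can be sketched with the standard library alone. The function name `standardize` and the data are illustrative; the population standard deviation is used here as an assumption, and `statistics.stdev` would give the sample version instead.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / std for v in values]


# Made-up data with mean 5 and population standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)
print(z)
print(statistics.fmean(z), statistics.pstdev(z))  # close to 0 and 1
```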
Log Transformation
• The log transformation can be used to make highly skewed distributions less skewed.
• This can be valuable both for making patterns in the data more interpretable and for helping to meet the
assumptions of inferential statistics.
• The figure shows an example of how a log transformation can make patterns more visible. Both graphs plot
the brain weight of animals as a function of their body weight. The raw weights are shown in the left
panel; the log-transformed weights are plotted in the right panel.
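The effect on skewness can be checked numerically. The sketch below uses synthetic values spanning several orders of magnitude (not the animal data from the figure) and a simple moment-based skewness helper, both of which are assumptions for illustration.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic, strongly right-skewed values (powers of ten).
raw = [1.0, 10.0, 100.0, 1000.0, 10000.0]
logged = [math.log10(v) for v in raw]

print(skewness(raw))     # clearly positive: long right tail
print(skewness(logged))  # near zero: evenly spaced after the log
```

After the transform the values are evenly spaced, which is why patterns that were compressed near the origin become visible.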
Box-Cox Transformation
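The Box-Cox transformation of the variable x is indexed by λ and is defined as:
x(λ) = (x^λ - 1) / λ   if λ ≠ 0
x(λ) = ln(x)           if λ = 0
• Box-Cox transformations require strictly positive values; they cannot handle zero or negative values.
• One way to deal with this is to shift all the data points by a constant (for example, one based on the
minimum data point value) so that every value becomes positive.
• Use the same λ that was estimated on the training set when applying the Box-Cox transformation to the
test set.
• Another way to write this equation: (x^λ - 1) / λ tends to ln(x) as λ tends to 0.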
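A small sketch of the Box-Cox formula, assuming the standard one-parameter definition (x^λ - 1)/λ with ln(x) at λ = 0. The function name `box_cox`, the shift rule, and the fixed `lam` value are hypothetical choices for illustration; in practice λ is usually estimated from the training data.

```python
import math


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# For small lam the general formula approaches the logarithm.
print(box_cox(5.0, 1e-6), math.log(5.0))

# Shifting data that contains zero or negative values (made-up numbers).
# Assumption: shift so that the smallest value becomes 1.
raw = [-3.0, 0.0, 2.0, 7.0]
shift = 1.0 - min(raw) if min(raw) <= 0 else 0.0
shifted = [v + shift for v in raw]

# Reuse one lam (and the same shift) for training and test data.
lam = 0.5  # hypothetical value standing in for a lam fitted on training data
transformed = [box_cox(v, lam) for v in shifted]
print(transformed)
```

Libraries such as SciPy can estimate λ for you: `scipy.stats.boxcox` returns the fitted λ alongside the transformed data when no λ is supplied, and that λ can then be passed back in for the test set.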
Notice that with this definition, x = 1 always maps to the transformed value 0 for all values of λ. To see how
the transformation works, look at the examples in the figure.
In the top row, the choice λ = 1 simply shifts x to the value x−1, which is a straight line. In the bottom row (on
a semi-logarithmic scale), the choice λ = 0 corresponds to a logarithmic transformation, which is now a
straight line. We superimpose a larger collection of transformations on a semi-logarithmic scale in Figure 2.
Why do we need normalization?
1. Some algorithms need normally distributed data as input, otherwise the results are not reliable.
2. Easy comparison of values.
3. Interpretation of results makes more sense.
4. The objective function can converge faster in some cases if the data is normalized, i.e., the speed of the
algorithm increases.
5. Can create complex features that may improve the model (or make it non-linear)

Lesson 5 Scaling and Normalization

  • 1. Lesson 5 Scaling and Normalization Kush Kulshrestha
  • 2. Topics 1. Scaling of Data 2. Data Normalization 3. Difference between Scaling and Normalization 4. Min-max scalar and Box-Cox Transformation
  • 3. Scaling of data • Variable scaling requires taking values that span a specific range and representing them in another range. • The standard method is to scale variables to [0,1]. • This may introduce various distortions or biases into the data but the distribution or shape remains same. • Depending on the modeling tool, scaling variable ranges can be beneficial or sometimes even required. One way of doing this is: Linear Scaling Transform • First task in this scaling is to determine the minimum and maximum values of variables. • Then applying the transform: (x - min{x1, xN}) / (max{x1, xN} - min{x1, xN}) • This introduces no distortion to the variable distribution. • Has a one-to-one relationship between the original and scaled values.
  • 4. Scaling of data Out of Range values • In data preparation, the data used is only a sample of the population. • Therefore, it is not certain that the actual minimum and maximum values of the variable have been discovered when scaling the ranges. • If some values that turn up later in the mining process are outside of the limits discovered in the sample, they are called out-of-range values.
  • 5. Scaling of data Dealing with Out of Range values • After range scaling, all variables should be in the range of [0,1]. • Out-of-range values, however, have values like -0.2 or 1.1 which can cause unwanted behavior. Solution 1: Ignore that the range has been exceeded. • Most modeling tools have (at least) some capacity to handle numbers outside the scaling range. • Important question to ask: Does this affect the quality of the model? Solution 2: Exclude the out of range instances. • One problem is that reducing the number of instances reduces the confidence that the sample represents the population. • Another problem: Introduction of bias. Out-of-range values may occur with a certain pattern and ignoring these instances removes samples according to a pattern introducing distortion to the sample
  • 6. Scaling of data Dealing with Out of Range values Solution 3: Clip the out of range values • If the value is greater than 1, assign 1 to it. If less than 0, assign 0. • This approach assumes that out-of-range values are somehow equivalent to the range limit values. • Therefore, the information content at the limits is distorted by projecting multiple values onto a single value. • This also introduces some bias. Solution 4: Making room for out of range values • The linear scaling transform provides an undistorted normalization but suffers from out-of-range values. • Therefore, we should modify it so that it can also accommodate values that are out of range. • Most of the population is inside the range, so for these values the normalization should be linear. • The solution is to reserve some part of the range for the out-of-range values. • The amount of reserved space depends on the confidence level of the sample: e.g. with 98% confidence, the linear part is [0.01, 0.99].
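Solutions 3 and 4 can be illustrated with a short Python sketch. The helper names `clip_to_unit` and `scale_with_margin`, the `margin` parameter, and the sample limits are hypothetical; the second function only covers the linear part of Solution 4 and leaves the handling of out-of-range values open.

```python
def clip_to_unit(y):
    """Solution 3: force an already scaled value back into [0, 1]."""
    return min(1.0, max(0.0, y))


def scale_with_margin(x, lo, hi, margin=0.01):
    """Solution 4, linear part only: map [lo, hi] onto [margin, 1 - margin].

    margin=0.01 corresponds to the 98% confidence example, which
    reserves [0, 0.01) and (0.99, 1] for out-of-range values.
    """
    return margin + (1.0 - 2.0 * margin) * (x - lo) / (hi - lo)


# Clipping maps every out-of-range value onto a range limit.
clipped = [clip_to_unit(y) for y in (-0.2, 0.4, 1.1)]

# In-range values now occupy [0.01, 0.99] instead of [0, 1].
inner = [scale_with_margin(x, 10.0, 50.0) for x in (10.0, 30.0, 50.0)]
```

Note how clipping sends both -0.2 and any other negative value to the same point, which is the loss of information the slide describes.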
  • 7. Scaling of data Dealing with Out of Range values Squashing the out of range values • Now the problem reduces to fitting the out-of-range values into the space left for them. • The greater the difference between a value and the range limit, the less likely any such value is found. • Therefore, the transformation should be such that the farther a value lies from the range, the smaller the additional increase towards one or decrease towards zero. • One possibility is to use functions of the form y = 1/x and attach them to the ends of the linear part. It is difficult to carry out the scaling in pieces depending on the nature of the data point. We can do all of the above steps using one function called Softmax scaling.
  • 8. Scaling of data Dealing with Out of Range values Softmax Scaling: • The extent of the linear part can be controlled by one parameter. • The space assigned for out-of-range values can be controlled by the level of uncertainty in the sample. • Non-identical values always have different normalized values. Softmax scaling is based on the logistic function: y = 1 / (1 + e^(-x)) where y is the scaled value and x is the input value. • The logistic function transforms the original range of (-∞, ∞) to (0, 1) and is approximately linear in the middle part of the transform.
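A minimal Python sketch of softmax scaling follows. The slide only gives the logistic function, so the pre-processing step here is an assumption: the input is centered on the sample mean and divided by the sample standard deviation times a hypothetical `linear_width` parameter, which stands in for the "one parameter" that controls the linear part.

```python
import math
import statistics


def softmax_scale(x, mean, std, linear_width=1.0):
    """Squash x into (0, 1) with the logistic function y = 1 / (1 + e^(-z)).

    z = (x - mean) / (linear_width * std) is an assumed parameterization;
    a larger linear_width stretches the near-linear middle region.
    """
    z = (x - mean) / (linear_width * std)
    # Numerically stable evaluation of the logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


sample = [10.0, 20.0, 30.0, 40.0, 50.0]   # hypothetical sample
mean = statistics.fmean(sample)
std = statistics.pstdev(sample)

at_mean = softmax_scale(30.0, mean, std)     # the mean maps to 0.5
in_range = softmax_scale(50.0, mean, std)
far_above = softmax_scale(200.0, mean, std)  # out of range, still below 1
far_below = softmax_scale(-100.0, mean, std) # out of range, still above 0
```

Values far outside the sample range are squeezed ever closer to 0 or 1 without reaching them, so no separate clipping step is needed.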
  • 9. Scaling of data Min-Max Scaler In scikit-learn
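As a rough illustration of min-max scaling in scikit-learn, the sketch below fits `MinMaxScaler` on a small training sample and then applies it to new data. The arrays are made-up values chosen for the example; it assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data: rows are instances, columns are variables.
train = np.array([[10.0], [20.0], [30.0], [50.0]])
new = np.array([[5.0], [40.0], [60.0]])

scaler = MinMaxScaler(feature_range=(0, 1))

# fit_transform learns the min and max from the training sample only.
train_scaled = scaler.fit_transform(train)

# transform reuses those limits, so unseen values may leave [0, 1].
new_scaled = scaler.transform(new)
```

Here `new_scaled` contains values below 0 and above 1 for the inputs 5 and 60, which are exactly the out-of-range values discussed on the earlier slides.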
  • 10. Data Normalization • Applying a function to each data point zi in the data: yi = f(zi) • Unlike scaling, normalization changes not only the values but also the shape or distribution of the data. • The point of normalization is to change your observations so that they can be described as a normal distribution. • In general, you'll only want to normalize your data if you're going to be using a machine learning or statistics technique that assumes your data is normally distributed. • Or before using any technique containing “Gaussian” in its name.
  • 11. Data Normalization Data Standardization • Also called z-scores. • The terms normalization and standardization are sometimes used interchangeably. • Standardization transforms data to have a mean of zero and a standard deviation of 1. • Done by subtracting the mean and dividing by the standard deviation for each data point.
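Standardization as described on this slide can be sketched with the Python standard library. The function name `standardize` and the sample values are assumptions for the example, and the population standard deviation is used here as one possible choice.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed choice)
    if std == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mean) / std for x in values]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
z_scores = standardize(data)

# After standardization the mean is 0 and the standard deviation is 1.
z_mean = statistics.fmean(z_scores)
z_std = statistics.pstdev(z_scores)
```

Note that standardization is a linear transform, so it does not by itself make a skewed distribution normal.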
  • 12. Data Normalization Log Transformation • The log transformation can be used to make highly skewed distributions less skewed. • This can be valuable both for making patterns in the data more interpretable and for helping to meet the assumptions of inferential statistics. • Figure shows an example of how a log transformation can make patterns more visible. Both graphs plot the brain weight of animals as a function of their body weight. The raw weights are shown in the left panel; the log-transformed weights are plotted in the right panel.
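The effect of a log transformation on a skewed variable can be seen in a small Python sketch. The weights below are invented values spanning several orders of magnitude, not the data from the figure.

```python
import math

# Hypothetical body weights in kg, spanning six orders of magnitude.
body_weights_kg = [0.01, 1.0, 100.0, 10000.0]

# The log is only defined for strictly positive values.
log_weights = [math.log10(w) for w in body_weights_kg]

# Gaps between neighbours: hugely uneven on the raw scale, even on the log scale.
raw_gaps = [b - a for a, b in zip(body_weights_kg, body_weights_kg[1:])]
log_gaps = [b - a for a, b in zip(log_weights, log_weights[1:])]
```

On the raw scale the largest value dominates any plot, while on the log scale the four points are evenly spaced.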
  • 13. Data Normalization Box-Cox Transformation The Box-Cox transformation of the variable x is indexed by λ, and is defined as: x(λ) = (x^λ - 1) / λ. • Box-Cox transformations cannot handle zero or negative values. • One way to deal with this is to add a constant to all the data points so that the smallest value becomes positive. • The λ estimated on the training set should be reused when applying the Box-Cox transformation to the test set. • Another way to write this equation: x(λ) = (x^λ - 1) / λ if λ ≠ 0, and x(λ) = ln(x) if λ = 0, since (x^λ - 1) / λ approaches ln(x) as λ tends to 0.
  • 14. Data Normalization Box-Cox Transformation Notice that with this definition of the transformed value x(λ), x = 1 always maps to the point x(λ) = 0 for all values of λ. To see how the transformation works, look at the examples in the figure.
  • 15. Data Normalization Box-Cox Transformation In the top row, the choice λ = 1 simply shifts x to the value x−1, which is a straight line. In the bottom row (on a semi-logarithmic scale), the choice λ = 0 corresponds to a logarithmic transformation, which is now a straight line. We superimpose a larger collection of transformations on a semi-logarithmic scale in Figure 2.
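The properties of the Box-Cox transformation discussed on these slides can be checked with a small Python sketch. It assumes the common form (x^λ - 1) / λ with the logarithm as the λ = 0 case; the function name `box_cox`, the data, and the hand-picked λ are hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive x for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)          # limiting case as lambda tends to 0
    return (x ** lam - 1.0) / lam


# x = 1 maps to 0 for every lambda.
at_one = [box_cox(1.0, lam) for lam in (-1.0, 0.0, 0.5, 1.0, 2.0)]

# lambda = 1 simply shifts x to x - 1.
shifted = box_cox(5.0, 1.0)

# For a very small lambda the result is close to the natural log.
near_log = box_cox(5.0, 1e-6)

# Reuse one lambda (here chosen by hand) for both training and test data.
lam = 0.5
train_t = [box_cox(x, lam) for x in (1.0, 2.0, 4.0)]
test_t = [box_cox(x, lam) for x in (3.0,)]
```

In practice λ is usually estimated from the training data; `scipy.stats.boxcox`, for example, returns both the transformed values and the fitted λ when no λ is supplied.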
  • 17. Data Normalization Why do we need normalization? 1. Some algorithms need normally distributed data as input, otherwise the results are not reliable. 2. Easy comparison of values. 3. Interpretation of results makes more sense. 4. Optimization of the objective function can converge faster in some cases if the data is normalized, i.e., the speed of the algorithm increases. 5. Can create complex features that may improve the model (or make it non-linear).