Preparing your data for Machine Learning with Feature Scaling
1. Feature Scaling
IS IT LIKE A SCHOOL UNIFORM? EVERYONE WEARS THE SAME.
(There may be variations in shade, but the base color remains the same.)
2. What is feature scaling?
Feature scaling, or data normalization, is the process of
standardizing the scale/range of the explanatory
data (the independent variables).
Set outliers aside first, since they may distort the result.
Scaling matters because many machine learning algorithms rely on Euclidean distance.
Let's see on the right how the range can impact your
model.
Here, for age the squared difference is
(48 - 27)^2 = 441, and
for salary it is (79000 - 48000)^2 = 961000000, so the distance,
and with it your model, will be dominated by the feature with the largest values.
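The effect can be checked with a few lines of Python. This is a minimal sketch, assuming the two points are the rows with ages 48 and 27 and salaries 79000 and 48000 from the practice dataset, which reproduce the squared differences 441 and 961000000 quoted above. The variable names are illustrative only.

```python
import math

# Two rows from the practice dataset, as (age, salary).
# Assumption: these are the rows behind the figures 441 and 961000000.
a = (48, 79000)
b = (27, 48000)

age_sq = (a[0] - b[0]) ** 2        # squared age difference
salary_sq = (a[1] - b[1]) ** 2     # squared salary difference

# Euclidean distance on the raw, unscaled features.
distance = math.sqrt(age_sq + salary_sq)

# Fraction of the squared distance that comes from salary alone.
salary_share = salary_sq / (age_sq + salary_sq)

print(age_sq, salary_sq)  # 441 961000000
print(f"distance = {distance:.2f}, salary share = {salary_share:.6%}")
```

Almost all of the distance comes from salary, so age has practically no say until both features are brought to a comparable scale.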
3. Ways of Scaling
Standardization
A process of rescaling one or more numeric
values in your dataset so that they have a
mean value of 0 and a standard
deviation of 1
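In formula terms, each value x becomes z = (x - mean) / standard deviation. A minimal sketch using only the Python standard library follows. The function name `standardize` and the sample numbers are hypothetical, and the population standard deviation is an assumption, since the slides do not say which variant is meant.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

# Illustrative numbers with mean 5 and standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(sample)
print(z)
```

After the transformation the values are expressed in units of standard deviations from the mean, so they are no longer tied to the original unit of measurement.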
Normalization
Data normalization is the process of
rescaling one or more attributes to the
range of 0 to 1. This means that the largest
value for each attribute is 1 and the
smallest value is 0.
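In formula terms, each value x becomes (x - min) / (max - min). The sketch below applies this to the Age column of the practice dataset. The function name `min_max_normalize` is hypothetical and the code uses only the Python standard library.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]

# Age column from the practice dataset.
ages = [44, 27, 30, 38, 40, 35, 45, 48, 50, 37]
scaled_ages = min_max_normalize(ages)

for age, scaled in zip(ages, scaled_ages):
    print(f"{age:>3} -> {scaled:.3f}")
```

Because the result depends only on the minimum and maximum, a single extreme value can squeeze all other values into a narrow band, which is one reason to deal with outliers first.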
*Many algorithms (such as SVM, K-nearest
neighbors, and logistic regression) work better
when features are scaled to a common range.
*Standardization works better with linear
regression, logistic regression and linear
discriminant analysis
4. Let's Practice
Standardization
Country   Age   Salary   Purchased
France     44    72000   0
Spain      27    48000   1
Germany    30    54000   0
Spain      38    61000   0
Germany    40    76000   1
France     35    58000   1
Spain      45    52000   0
France     48    79000   1
Germany    50    83000   0
France     37    67000   1
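One way to work through this exercise is sketched below with the Python standard library only. It assumes that only the numeric feature columns Age and Salary are standardized, that Country (categorical) and Purchased (the label) are left untouched, and that the population standard deviation is used. The helper name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev

# The practice dataset: (country, age, salary, purchased).
rows = [
    ("France", 44, 72000, 0),
    ("Spain", 27, 48000, 1),
    ("Germany", 30, 54000, 0),
    ("Spain", 38, 61000, 0),
    ("Germany", 40, 76000, 1),
    ("France", 35, 58000, 1),
    ("Spain", 45, 52000, 0),
    ("France", 48, 79000, 1),
    ("Germany", 50, 83000, 0),
    ("France", 37, 67000, 1),
]

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Standardize each numeric feature column independently.
age_z = standardize([r[1] for r in rows])
salary_z = standardize([r[2] for r in rows])

print(f"{'Country':<8} {'Age':>6} {'Salary':>7} Purchased")
for (country, _, _, purchased), az, sz in zip(rows, age_z, salary_z):
    print(f"{country:<8} {az:>6.2f} {sz:>7.2f} {purchased}")
```

Both columns now have mean 0 and standard deviation 1, so neither dominates a Euclidean distance. In practice a library scaler such as scikit-learn's `StandardScaler` is a common alternative; it also uses the population standard deviation.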