HELP YOUR DATA BE
NORMAL
DAMIAN MINGLE
CHIEF DATA SCIENTIST
@DamianMingle
GET THE FULL STORY
bit.ly/UseSciKitNow
Want faster model run times and
better accuracy?
Try Normalizing Your Data
What’s Normal Anyway?
 Often stated as “scaling individual samples to have unit norm” or
“scale input vectors individually to unit norm (vector length)”
 Adjusting values measured on different scales to a notionally common
scale
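A minimal sketch of what “unit norm” means, using NumPy with an invented 3-4-5 vector:

```python
import numpy as np

# A sample (row vector) with L2 length 5 (a 3-4-5 triangle)
v = np.array([3.0, 4.0])

# Dividing the vector by its own length gives it unit norm
unit_v = v / np.linalg.norm(v)

print(unit_v)                  # [0.6 0.8]
print(np.linalg.norm(unit_v))  # 1.0
```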
Why Normalization Matters
 In truth, not all machine learning models are sensitive to magnitude.
 Data on the same scale can help machine learning models learn
(think k-nearest neighbors and coefficients in regression)
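A toy sketch (numbers invented) of why magnitude matters for distance-based models like k-nearest neighbors: without a common scale, the large-magnitude feature dominates the Euclidean distance.

```python
import numpy as np

# Two hypothetical samples: feature 1 in grams, feature 2 in meters
a = np.array([5000.0, 1.2])
b = np.array([5200.0, 3.9])

# Contribution of each feature to the squared Euclidean distance
contrib = (a - b) ** 2
share = contrib / contrib.sum()

# The gram-scale feature swamps the other: it accounts for
# essentially all of the distance a k-NN model would see
print(share)
```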
Power in SciKit Learn
 Preprocessing
 Clustering
 Regression
 Classification
 Dimensionality Reduction
 Model Selection
Let’s Look at an ML Recipe
Normalization
The Imports
from sklearn.datasets import load_iris
from sklearn import preprocessing
Separate Features from Target
iris = load_iris()
print(iris.data.shape)
X = iris.data
y = iris.target
Normalize the Features
normalized_X = preprocessing.normalize(X)
Normalization Recipe
# Normalize the data attributes for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing
# load the iris dataset
iris = load_iris()
print(iris.data.shape)
# separate the data from the target attributes
X = iris.data
y = iris.target
# normalize the data attributes
normalized_X = preprocessing.normalize(X)
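A quick sanity check on the recipe above: after preprocessing.normalize, every row (sample) of the Iris data should have an L2 norm of 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing

iris = load_iris()
normalized_X = preprocessing.normalize(iris.data)

# Each of the 150 samples now has unit L2 norm
row_norms = np.linalg.norm(normalized_X, axis=1)
print(row_norms.min(), row_norms.max())  # both 1.0
```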
Resources
 Society of Data Scientists
 SciKit Learn
Scikit Learn: Data Normalization Techniques That Work