© SuperDataScience
Machine Learning A-Z: Course Slides
NOT FOR DISTRIBUTION
www.superdatascience.com
Welcome to the course!
Dear student,
Welcome to the “Machine Learning A-Z” course brought to you by SuperDataScience. We are
super-excited to have you on board! In this class you will learn many interesting and useful
concepts while having lots of fun.
These slides may be updated from time to time. If this happens, you will be able to find them in
the course materials repository, with the new version indicated in the filename.
We kindly ask that you use these slides only for the purpose of supporting your own learning
journey and we look forward to seeing you inside the class!
Enjoy machine learning,
Kirill & Hadelin
PS: if you are not yet enrolled in the course, you can find it here.
Who Are Your Instructors?
• Hello! My name is Kirill Eremenko
• I have a bachelor's degree in Physics & Maths and a background in Data Analytics consulting
• I used to host the SuperDataScience podcast

• Hi there! My name is Hadelin de Ponteves
• I have a master's in Machine Learning and I used to do Reinforcement Learning at Google
• I wrote a research paper on Machine Learning

We've been teaching online together since 2016, and over 1 million students have enrolled in our
Machine Learning and Data Science courses. You can be confident that you are in good hands!
Data Preprocessing
The Machine Learning Process
Data Pre-Processing
• Import the data
• Clean the data
• Split into training & test sets
• Feature Scaling
Modelling
• Build the model
• Train the model
• Make predictions
Evaluation
• Calculate performance metrics
• Make a verdict
Training Set & Test Set
ŷ = b0 + b1*X1 + b2*X2
Split the data: Train 80% vs. Test 20%.
On the test set, compare the predicted values ŷ against the actual values y.
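The split above can be sketched in plain Python (in the course itself this is done with scikit-learn's train_test_split; the function name and seed below are our own):

```python
# Illustrative 80/20 train/test split. Shuffling before splitting avoids any
# ordering bias in the original dataset.
import random

def split_80_20(rows, seed=42):
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # shuffle before splitting
    cut = int(len(rows) * 0.8)          # first 80% -> training set
    return rows[:cut], rows[cut:]       # (train, test)

train, test = split_80_20(range(100))
print(len(train), len(test))            # 80 20
```

The model is trained only on `train`; `test` is held back so the evaluation sees data the model has never touched.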
Feature Scaling
Feature Scaling
Normalization:
X' = (X − Xmin) / (Xmax − Xmin), values end up in [0 ; 1]

Standardization:
X' = (X − μ) / σ, values end up roughly in [-3 ; +3]
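Both formulas can be written directly; a minimal plain-Python sketch (scikit-learn's MinMaxScaler and StandardScaler apply the same formulas column-wise):

```python
# Normalization and standardization, exactly as defined on the slide.
import math

def normalize(xs):                # X' = (X - Xmin) / (Xmax - Xmin) -> [0; 1]
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):              # X' = (X - mu) / sigma -> roughly [-3; +3]
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [(x - mu) / sigma for x in xs]

print(normalize([40, 44, 45]))    # [0.0, 0.8, 1.0]
```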
Feature Scaling
Salary     Age
70,000 $   45 yrs
60,000 $   44 yrs
52,000 $   40 yrs
Row-to-row differences: salary 10,000 and 8,000 vs. age 1 and 4. Unscaled, salary would dominate any distance-based comparison.
Feature Scaling
Normalization:
X' = (X − Xmin) / (Xmax − Xmin), values in [0 ; 1]
Feature Scaling
Salary     Age
70,000 $   45 yrs
60,000 $   44 yrs
52,000 $   40 yrs
Feature Scaling
Salary (normalized)   Age
1                     45 yrs
0.444                 44 yrs
0                     40 yrs
Feature Scaling
Salary (normalized)   Age (normalized)
1                     1
0.444                 0.8
0                     0
(e.g. for age 44: (44 − 40) / (45 − 40) = 0.8)
Regression
Simple Linear Regression
ŷ = b0 + b1*X1
where ŷ is the dependent variable, X1 the independent variable, b0 the y-intercept (constant), and b1 the slope coefficient.
Simple Linear Regression
Potatoes [t] = b0 + b1 × Fertilizer [kg]
ŷ = b0 + b1*X1, with b0 = 8 [t] and b1 = 3 [t/kg]: with no fertilizer the predicted yield is 8 t, and each extra kg of fertilizer (+1 kg) adds 3 t (+3 t) of yield.
Axes: y [tonnes] (Potato yield) vs. X1 [kg] (Nitrogen Fertilizer). Each point represents a separate harvest.
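The fitted line from this slide can be turned into a one-line predictor (the function name is ours; b0 = 8 t and b1 = 3 t/kg are the slide's values):

```python
# Predicted potato yield in tonnes for a given amount of nitrogen fertilizer.
def predict_yield(fertilizer_kg, b0=8.0, b1=3.0):
    return b0 + b1 * fertilizer_kg   # y-hat = b0 + b1 * X1

print(predict_yield(0), predict_yield(1), predict_yield(2))   # 8.0 11.0 14.0
```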
Ordinary Least Squares
Simple Linear Regression
Ordinary Least Squares: choose b0 and b1 in ŷ = b0 + b1*X1 such that SUM(yi − ŷi)² is minimized.
For each data point, the residual is εi = yi − ŷi: the vertical distance between the actual value yi and the predicted value ŷi on the line.
Axes: y [tonnes] (Potato yield) vs. X1 [kg] (Nitrogen Fertilizer).
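The b0, b1 that minimize SUM(yi − ŷi)² have a standard closed form; a minimal sketch, checked on points that lie exactly on the y = 8 + 3x line from the previous slide:

```python
# Ordinary Least Squares for one predictor:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),   b0 = ybar - b1 * xbar
def ols_fit(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    return b0, b1

xs = [0, 1, 2, 3]
ys = [8.0, 11.0, 14.0, 17.0]     # exactly on y = 8 + 3x
print(ols_fit(xs, ys))           # (8.0, 3.0)
```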
Multiple Linear Regression
ŷ = b0 + b1*X1 + b2*X2 + … + bn*Xn
where ŷ is the dependent variable, X1…Xn are the independent variables, b0 the y-intercept (constant), and b1…bn the slope coefficients.
Multiple Linear Regression
Potatoes [t] = 8 [t] + 3 [t/kg] × Fertilizer [kg] − 0.54 [t/°C] × AvgTemp [°C] + 0.04 [t/mm] × Rain [mm]
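Plugging the printed coefficients into a function (the function and argument names are ours; the coefficients are exactly the ones on the slide):

```python
# Multiple linear regression prediction with the slide's coefficients:
# 8 t intercept, +3 t/kg fertilizer, -0.54 t/degC temperature, +0.04 t/mm rain.
def predict_potatoes(fertilizer_kg, avg_temp_c, rain_mm):
    return 8.0 + 3.0 * fertilizer_kg - 0.54 * avg_temp_c + 0.04 * rain_mm

print(predict_potatoes(2.0, 10.0, 100.0))   # 8 + 6 - 5.4 + 4 = 12.6
```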
Additional Reading
The Application of Multiple Linear Regression and Artificial Neural Network Models for Yield Prediction of Very Early Potato Cultivars before Harvest
Magdalena Piekutowska et al. (2021)
Link: https://www.mdpi.com/2073-4395/11/5/885
R Squared
Two ways to fit the data: the regression line ŷ, or simply predicting the average y_avg.
Regression: SS_res = SUM(yi − ŷi)²
Average: SS_tot = SUM(yi − y_avg)²
R² = 1 − SS_res / SS_tot
(Both plots: y [tonnes] (Potato yield) vs. X1 [kg] (Nitrogen Fertilizer).)
Rule of thumb (for our tutorials)*:
1.0 = Perfect fit (suspicious)
~0.9 = Very good
<0.7 = Not great
<0.4 = Terrible
<0 = Model makes no sense for this data
*This is highly dependent on the context
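R² as defined above, in a few lines (function name is ours):

```python
# R^2 = 1 - SS_res / SS_tot. A perfect fit gives 1.0; predicting the
# average for every point gives exactly 0.0.
def r_squared(y, y_hat):
    y_avg = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_avg) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

y = [10.0, 12.0, 14.0, 16.0]
print(r_squared(y, y))               # 1.0  (perfect fit)
print(r_squared(y, [13.0] * 4))      # 0.0  (just predicting the average)
```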
Adjusted R Squared
R² = 1 − SS_res / SS_tot, with SS_res = SUM(yi − ŷi)²  (goodness of fit: greater is better)
Problem: add a variable, e.g. ŷ = b0 + b1*X1 + b2*X2 + b3*X3, and R² can only go up: SS_tot doesn't change, while SS_res will decrease or stay the same. (This is because Ordinary Least Squares drives SS_res → min.)
Solution:
Adj R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)
k = number of independent variables, n = sample size
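The adjustment formula, directly transcribed (the example n and k values are made up, just to show the penalty growing with k):

```python
# Adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1): for the same raw R^2,
# more predictors (larger k) means a lower adjusted score.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2 = 0.90, n = 50 observations, different numbers of predictors:
print(round(adjusted_r2(0.90, 50, 3), 4))    # 0.8935
print(round(adjusted_r2(0.90, 50, 10), 4))   # 0.8744
```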
Assumptions of Linear Regression
Anscombe's quartet (1973): four datasets with nearly identical summary statistics and regression lines, yet completely different shapes; a reminder to check the assumptions, not just the fit.
Assumptions of Linear Regression
1. Linearity
(Linear relationship between Y and each X)
2. Homoscedasticity
(Equal variance)
3. Multivariate Normality
(Normality of error distribution)
4. Independence
(of observations. Includes "no autocorrelation")
5. Lack of Multicollinearity
(Predictors are not correlated with each other: X1 ≁ X2)
6. The Outlier Check
(This is not an assumption, but an "extra")
Bonus
Download the Assumptions poster at:
superdatascience.com/assumptions
Additional Reading
Verifying the Assumptions of Linear Regression
in Python and R
Eryk Lewinson (2019)
Link: towardsdatascience.com/verifying-the-assumptions-of-linear-regression-in-python-and-r-f4cd2907d4c0
Profit      | R&D Spend  | Admin      | Marketing  | State
192,261.83  | 165,349.20 | 136,897.80 | 471,784.10 | New York
191,792.06  | 162,597.70 | 151,377.59 | 443,898.53 | California
191,050.39  | 153,441.51 | 101,145.55 | 407,934.54 | California
182,901.99  | 144,372.41 | 118,671.85 | 383,199.62 | New York
166,187.94  | 142,107.34 | 91,391.77  | 366,168.42 | California
y = b0 + b1*x1 + b2*x2 + b3*x3 + ???  (how do we handle the categorical State column?)
Profit      | R&D Spend  | Admin      | Marketing  | State      | New York | California
192,261.83  | 165,349.20 | 136,897.80 | 471,784.10 | New York   | 1 | 0
191,792.06  | 162,597.70 | 151,377.59 | 443,898.53 | California | 0 | 1
191,050.39  | 153,441.51 | 101,145.55 | 407,934.54 | California | 0 | 1
182,901.99  | 144,372.41 | 118,671.85 | 383,199.62 | New York   | 1 | 0
166,187.94  | 142,107.34 | 91,391.77  | 366,168.42 | California | 0 | 1
Dummy Variables: y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*D1
Including both dummies, y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*D1 + b5*D2, is a problem:
D2 = 1 - D1, so the two dummy variables are perfectly correlated.
The Dummy Variable Trap: never include every dummy in the model. Always omit one dummy variable (for a category with N levels, use N − 1 dummies).
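A hand-rolled sketch of dummy encoding with one level omitted (pandas' `get_dummies(..., drop_first=True)` is the usual way; this plain-Python version just makes the trap avoidance explicit):

```python
# Encode a categorical column as 0/1 dummies, dropping the first level
# so the remaining columns are not perfectly correlated.
def dummy_encode(values, drop_first=True):
    levels = sorted(set(values))              # e.g. ['California', 'New York']
    kept = levels[1:] if drop_first else levels
    return [[1 if v == level else 0 for level in kept] for v in values]

states = ['New York', 'California', 'California', 'New York']
print(dummy_encode(states))   # [[1], [0], [0], [1]] -> one 'New York' column
```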
[Diagram: candidate predictors X1…X7 all feeding into y]
Why not simply include them all?
5 methods of building models:
1. All-in
2. Backward Elimination
3. Forward Selection
4. Bidirectional Elimination
5. Score Comparison
(Methods 2–4 are collectively known as Stepwise Regression.)
"All-in" – cases:
• Prior knowledge; OR
• You have to; OR
• Preparing for Backward Elimination
Backward Elimination
STEP 1: Select a significance level to stay in the model (e.g. SL = 0.05)
STEP 2: Fit the full model with all possible predictors
STEP 3: Consider the predictor with the highest P-value. If P > SL, go to STEP 4; otherwise go to FIN
STEP 4: Remove the predictor
STEP 5: Fit the model without this variable, then return to STEP 3
FIN: Your Model Is Ready
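The STEP 1-5 loop, sketched with a stand-in for the fitting step: `fit_pvalues` below is a hypothetical helper returning hard-coded p-values (in practice you would refit OLS, e.g. with statsmodels, on every pass, and the p-values would change after each removal):

```python
# Backward elimination control flow. fit_pvalues is a stand-in for a real
# model fit; the numbers in the table are made up for illustration.
def fit_pvalues(predictors):
    table = {'R&D': 0.001, 'Admin': 0.61, 'Marketing': 0.06, 'State': 0.95}
    return {p: table[p] for p in predictors}

def backward_elimination(predictors, sl=0.05):
    predictors = list(predictors)
    while predictors:
        pvals = fit_pvalues(predictors)        # STEP 2/5: fit the model
        worst = max(pvals, key=pvals.get)      # STEP 3: highest p-value
        if pvals[worst] <= sl:
            break                              # all p <= SL -> FIN
        predictors.remove(worst)               # STEP 4: remove the predictor
    return predictors

print(backward_elimination(['R&D', 'Admin', 'Marketing', 'State']))  # ['R&D']
```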
Forward Selection
STEP 1: Select a significance level to enter the model (e.g. SL = 0.05)
STEP 2: Fit all simple regression models y ~ xn. Select the one with the lowest P-value
STEP 3: Keep this variable and fit all possible models with one extra predictor added to the one(s) you
already have
STEP 4: Consider the predictor with the lowest P-value. If P < SL, go to STEP 3, otherwise go to FIN
FIN: Keep the previous model
Bidirectional Elimination
STEP 1: Select a significance level to enter and to stay in the model
e.g.: SLENTER = 0.05, SLSTAY = 0.05
STEP 2: Perform the next step of Forward Selection (new variables must have: P < SLENTER to enter)
STEP 3: Perform ALL steps of Backward Elimination (old variables must have P < SLSTAY to stay)
STEP 4: Repeat STEPS 2–3 until no new variables can enter and no old variables can exit
FIN: Your Model Is Ready
All Possible Models
STEP 1: Select a criterion of goodness of fit (e.g. Akaike criterion)
STEP 2: Construct All Possible Regression Models: 2^N − 1 total combinations
STEP 3: Select the one with the best criterion
FIN: Your Model Is Ready
Example:
10 columns means
1,023 models
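Checking the count (the 2^N − 1 non-empty subsets of N candidate predictors):

```python
# Every non-empty subset of 10 candidate predictors is one candidate model.
from itertools import combinations

N = 10
count = sum(1 for k in range(1, N + 1) for _ in combinations(range(N), k))
print(count, 2 ** N - 1)    # 1023 1023
```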
In this section we learned:
1. How to create dummies for categorical IVs
2. How to avoid the dummy variable trap
3. Backward, Forward, Bidirectional, All Possible
4. We actually built a model. Step-By-Step!!
5. How to use adjusted R-squared in modelling
6. How to interpret coefficients of a MLR
Simple Linear Regression · Multiple Linear Regression · Polynomial Linear Regression
Polynomial Linear Regression
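Polynomial regression stays *linear* in the coefficients: expand x into powers and predict the same way (the coefficients below are made up for illustration):

```python
# Expand x into [1, x, x^2, ...] and predict with a linear combination:
# y = b0 + b1*x + b2*x^2 + ...  -- still linear regression, on new features.
def poly_features(x, degree):
    return [x ** d for d in range(degree + 1)]

def predict(x, coeffs):
    feats = poly_features(x, len(coeffs) - 1)
    return sum(b * f for b, f in zip(coeffs, feats))

print(predict(2.0, [1.0, 0.0, 3.0]))   # 1 + 0*2 + 3*4 = 13.0
```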
Vladimir Vapnik
1992
Ordinary Least Squares: SUM(y − ŷ)² → min
ε-Insensitive Tube: a tube of half-width ε around the regression line; errors for points inside the tube are ignored.
Points outside the tube get Slack Variables: ξi for points above the tube, ξi* for points below (e.g. ξ2, ξ3, ξ5 above and ξ1*, ξ4* below).
SVR minimizes:
(1/2)·‖w‖² + C · Σ(i=1..m) (ξi + ξi*) → min
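The tube's cost per point can be written as a minimal sketch (only the ε-insensitive error term, not the full ‖w‖² optimization):

```python
# epsilon-insensitive error: zero inside the tube, slack xi = |y - y_hat| - eps
# for a point outside it.
def epsilon_insensitive(y, y_hat, eps):
    return max(0.0, abs(y - y_hat) - eps)

print(epsilon_insensitive(10.0, 10.3, eps=0.5))   # 0.0  (inside the tube)
print(epsilon_insensitive(10.0, 11.2, eps=0.5))   # ~0.7 (slack above the tube)
```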
Additional Reading:
Chapter 4 – Support Vector Regression (from: Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers)
By Mariette Awad & Rahul Khanna (2015)
Link: https://core.ac.uk/download/pdf/81523322.pdf
Section on SVM:
• SVM Intuition
Section on Kernel SVM:
• Kernel SVM Intuition
• Mapping to a higher dimension
• The Kernel Trick
• Types of Kernel Functions
• Non-linear Kernel SVR
Image source: http://www.cs.toronto.edu/~duvenaud/cookbook/index.html
[Scatter plot of X2 vs. X1 (predicting Y), with Split 1 at X1 = 20, Split 2 at X2 = 170, Split 3 at X2 = 200, Split 4 at X1 = 40]
The splits build up a tree:
X1 < 20?
├─ Yes → X2 < 200? → two leaves
└─ No  → X2 < 170?
         ├─ Yes → X1 < 40? → two leaves
         └─ No  → one leaf
[Same scatter of X2 vs. X1 with the four splits; each terminal region is labeled with the average of Y for the points it contains: 65.7, 300.5, −64.1, 1023, 0.7]
The final regression tree, each leaf predicting the average Y of its region:
X1 < 20?
├─ Yes → X2 < 200? → leaves 300.5 / 65.7
└─ No  → X2 < 170?
         ├─ Yes → X1 < 40? → leaves −64.1 / 0.7
         └─ No  → leaf 1023
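The final tree as nested ifs (the split thresholds and leaf values are from the slide; which leaf value belongs to which region is our assumption for illustration):

```python
# A regression tree is just nested threshold checks ending in a constant
# prediction (the average Y of the training points in that region).
def predict(x1, x2):
    if x1 < 20:
        return 300.5 if x2 < 200 else 65.7   # leaf-to-region mapping assumed
    if x2 >= 170:
        return 1023.0
    return -64.1 if x1 < 40 else 0.7

print(predict(10, 150))   # 300.5  (region x1 < 20, x2 < 200)
```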
STEP 1: Pick at random K data points from the Training set.
STEP 2: Build the Decision Tree associated to these K data points.
STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2.
STEP 4: For a new data point, make each one of your Ntree trees predict the value of Y for the data point in question, and assign the new data point the average across all of the predicted Y values.
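STEPS 1-4 in miniature, with each "tree" replaced by a deliberately tiny stand-in (the mean of its bootstrap sample) so that the averaging step stands out:

```python
# Simplified random-forest regression: bootstrap-sample K points per "tree"
# (STEP 1), let each "tree" predict (here: just the sample mean, a stand-in
# for a fitted decision tree, STEP 2), and average over Ntree trees (STEP 4).
import random

def random_forest_predict(y_train, n_trees=100, k=20, seed=0):
    rng = random.Random(seed)
    tree_preds = []
    for _ in range(n_trees):
        sample = [rng.choice(y_train) for _ in range(k)]
        tree_preds.append(sum(sample) / k)
    return sum(tree_preds) / n_trees

y = [10.0, 12.0, 14.0, 16.0, 18.0]
print(random_forest_predict(y))   # close to the overall mean of 14.0
```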
Classification
What is Classification?
Classification: a Machine Learning technique to identify the category of new observations based on training data.
(Examples: customers likely to stay vs. likely to leave; photos of dogs vs. cats.)
Logistic Regression
ln(p / (1 − p)) = b0 + b1*X1
Example: will a customer purchase health insurance (Yes/No), given their Age?
The curve gives the probability of "Yes" at each age (X1 [yrs], from 18 to 60): e.g. about 42% at 35 yrs and about 81% at 45 yrs. Predict YES when the probability is ≥ 50%, NO when it is < 50%.
Logistic regression: predict a categorical dependent variable from a number of independent variables.
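The curve can be reconstructed from the two probabilities quoted on this slide, p(35) = 0.42 and p(45) = 0.81 (this derivation of b0 and b1 is ours, not the course's actual fitted values):

```python
# Two points on the sigmoid pin down both coefficients of
# ln(p / (1 - p)) = b0 + b1 * X.
import math

def logit(p):
    return math.log(p / (1 - p))

b1 = (logit(0.81) - logit(0.42)) / (45 - 35)
b0 = logit(0.42) - b1 * 35

def p_yes(age):
    return 1 / (1 + math.exp(-(b0 + b1 * age)))

print(p_yes(35), p_yes(45))   # 0.42 and 0.81, recovered
print(p_yes(45) >= 0.5)       # True -> predict YES at 45 yrs
```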
Logistic Regression
ln(p / (1 − p)) = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4
Will purchase health insurance (Yes/No), based on: Age, Income, Level of Education, Family or Single.
Maximum Likelihood
For a candidate curve (y [yes/no]: Took up offer? vs. X1 [yrs]: Age, 18 to 60), read off each observation's predicted probability: p̂ for the actual YES points (0.03, 0.54, 0.92, 0.95, 0.98) and 1 − p̂ for the actual NO points (p̂ = 0.01, 0.04, 0.10, 0.58, 0.96). Multiply them all together:
Likelihood = 0.03 × 0.54 × 0.92 × 0.95 × 0.98 × (1 − 0.01) × (1 − 0.04) × (1 − 0.10) × (1 − 0.58) × (1 − 0.96)
Likelihood = 0.00019939
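The slide's probabilities multiplied out:

```python
# Likelihood of one candidate curve: p-hat for each actual YES point,
# (1 - p-hat) for each actual NO point, all multiplied together.
yes_probs = [0.03, 0.54, 0.92, 0.95, 0.98]   # predicted p for the YES points
no_probs = [0.01, 0.04, 0.10, 0.58, 0.96]    # predicted p for the NO points

likelihood = 1.0
for p in yes_probs:
    likelihood *= p
for p in no_probs:
    likelihood *= (1 - p)

print(round(likelihood, 8))   # 0.00019939
```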
Maximum Likelihood
Repeat for different candidate curves and compare:
Likelihood = 0.00007418
Likelihood = 0.00012845
Likelihood = 0.00016553
Likelihood = 0.00019939 ← Best Curve
The curve with the maximum likelihood is chosen.
Before K-NN: a scatter (x1 vs. x2) with Category 1, Category 2, and a new data point.
After K-NN: the new data point is assigned to Category 1.
STEP 1: Choose the number K of neighbors
STEP 2: Take the K nearest neighbors of the new data point, according to the Euclidean distance
STEP 3: Among these K neighbors, count the number of data points in each category
STEP 4: Assign the new data point to the category where you counted the most neighbors
Your Model is Ready
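The four steps in plain Python (the course uses scikit-learn's KNeighborsClassifier; this sketch uses a tiny made-up dataset that reproduces the 3-vs-2 vote shown later):

```python
# K-NN classification: Euclidean distance (STEP 2), count neighbors per
# category (STEP 3), assign the majority category (STEP 4).
import math
from collections import Counter

def knn_classify(point, data, k=5):
    # data: list of ((x1, x2), category) pairs
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    nearest = sorted(data, key=lambda d: dist(point, d[0]))[:k]
    votes = Counter(cat for _, cat in nearest)
    return votes.most_common(1)[0][0]

data = [((1, 1), 'Cat1'), ((1, 2), 'Cat1'), ((2, 1), 'Cat1'),
        ((8, 8), 'Cat2'), ((9, 8), 'Cat2')]
print(knn_classify((2, 2), data))   # 'Cat1' (3 of the 5 neighbors)
```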
STEP 1: Choose the number K of neighbors: K = 5
[Scatter (x1 vs. x2): Category 1, Category 2, and the new data point]
Euclidean distance between P1(x1, y1) and P2(x2, y2):
D = √((x2 − x1)² + (y2 − y1)²)
Among the 5 nearest neighbors of the new data point (x1 vs. x2):
Category 1: 3 neighbors
Category 2: 2 neighbors
→ Assign the new data point to Category 1.
[Diagram (x1 vs. x2): the Maximum Margin Hyperplane (Maximum Margin Classifier) sits midway between the Positive Hyperplane and the Negative Hyperplane, which pass through the Support Vectors; the distance between them is the Maximum Margin]
[Two scatters (x1 vs. x2): Linearly Separable vs. Not Linearly Separable]
Mapping to a higher dimension, 1D example: points on the x1 axis are not separable by a single threshold. Apply f = x − 5, then f = (x − 5)²: in the new (x, f) space the two groups become linearly separable.
2D Space → 3D Space: a Mapping Function adds a New Dimension z, where the classes become separable by a Hyperplane.
3D Space → 2D Space: projecting the separating hyperplane back down gives a Non-Linear Separator in the original space.
Mapping to a Higher Dimensional Space
can be highly compute-intensive
The Gaussian RBF Kernel:
K(x⃗, l⃗ⁱ) = e^(−‖x⃗ − l⃗ⁱ‖² / (2σ²))
Image source: http://www.cs.toronto.edu/~duvenaud/cookbook/index.html
[2D Space (x1 vs. x2): a Landmark l⃗ is placed among the points; K(x⃗, l⃗) = e^(−‖x⃗ − l⃗‖² / (2σ²)) measures how close each point is to it]
With two landmarks (Simplified Formula):
Green when: K(x⃗, l⃗¹) + K(x⃗, l⃗²) > 0
Red when: K(x⃗, l⃗¹) + K(x⃗, l⃗²) = 0
Types of Kernel Functions:
Gaussian RBF Kernel: K(x⃗, l⃗ⁱ) = e^(−‖x⃗ − l⃗ⁱ‖² / (2σ²))
Sigmoid Kernel: K(X, Y) = tanh(γ · Xᵀ·Y + r)
Polynomial Kernel: K(X, Y) = (γ · Xᵀ·Y + r)^d, γ > 0
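The Gaussian RBF kernel is a one-liner; it returns 1 at the landmark and decays toward 0 with distance (σ controls how fast):

```python
# K(x, l) = exp(-||x - l||^2 / (2 * sigma^2))
import math

def rbf(x, landmark, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-d2 / (2 * sigma ** 2))

print(rbf((0.0, 0.0), (0.0, 0.0)))   # 1.0 at the landmark
print(rbf((5.0, 5.0), (0.0, 0.0)))   # ~0, far from the landmark
```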
Section on SVM:
• SVM Intuition
Section on SVR:
• SVR Intuition
Section on Kernel SVM:
• Kernel SVM Intuition
• Mapping to a higher dimension
• The Kernel Trick
• Types of Kernel Functions
• Non-linear Kernel SVR
Image source: http://www.cs.toronto.edu/~duvenaud/cookbook/index.html
[Non-linear SVR, step by step: take the 1D data (Y vs. X), map it to a higher dimension with the Gaussian RBF kernel K(x⃗, l⃗ⁱ) = e^(−‖x⃗ − l⃗ⁱ‖² / (2σ²)), fit the ε-insensitive tube there, then project back to obtain a non-linear SVR curve in the original Y vs. X space]
[Illustration: a pile of wrenches, some produced by machine m1 and some by machine m2]
Bayes' Theorem:
P(A|B) = P(B|A) · P(A) / P(B)
P(A|B) = P(B|A) · P(A) / P(B)
Example (X1 = Age, X2 = Salary): classify people into Category 1 = Walks vs. Category 2 = Drives.
[Scatter of Salary vs. Age: Walks and Drives classes, plus a new data point to classify]
P(A|B) = P(B|A) · P(A) / P(B)
Applied to our problem:
P(Walks|X) = P(X|Walks) · P(Walks) / P(X)
Plan of attack: #1 P(Walks) (Prior Probability), #2 P(X) (Marginal Likelihood), #3 P(X|Walks) (Likelihood), #4 P(Walks|X) (Posterior Probability)
P(Drives|X) = P(X|Drives) · P(Drives) / P(X)
Posterior Probability = Likelihood × Prior Probability / Marginal Likelihood
(again four ingredients, steps #1–#4)
P(Walks|X) vs. P(Drives|X)
P(Walks) = Number of Walkers / Total Observations = 10/30
P(X) = Number of Similar Observations / Total Observations = 4/30
P(X|Walks) = Number of Similar Observations Among Those Who Walk / Total Number of Walkers = 3/10
P(Walks|X) = (3/10 × 10/30) / (4/30) = 0.75
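The arithmetic above can be checked in a few lines of Python; a minimal sketch using the slide's counts:

```python
# Bayes' theorem with the counts from the slides:
# 30 observations, 10 walkers, 20 drivers; 4 observations are
# "similar" to the new point X, of which 3 walk and 1 drives.
def posterior(likelihood, prior, marginal):
    # P(class|X) = P(X|class) * P(class) / P(X)
    return likelihood * prior / marginal

p_walks_given_x = posterior(3 / 10, 10 / 30, 4 / 30)    # 0.75
p_drives_given_x = posterior(1 / 20, 20 / 30, 4 / 30)   # 0.25
print(p_walks_given_x, p_drives_given_x)
```

The two posteriors sum to 1, since Walks and Drives are the only classes.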
P(Drives|X) = (1/20 × 20/30) / (4/30) = 0.25
P(Walks|X) vs. P(Drives|X)
0.75 vs. 0.25
0.75 > 0.25
P(Walks|X) > P(Drives|X)
[Scatter plot: the new data point is assigned to the Walks category]
P(Drives) = Number of Drivers / Total Observations = 20/30
P(X) = Number of Similar Observations / Total Observations = 4/30
P(X|Drives) = Number of Similar Observations Among Those Who Drive / Total Number of Drivers = 1/20
P(Drives|X) = (1/20 × 20/30) / (4/30) = 0.25
P(X) = Number of Similar Observations / Total Observations = 4/30  (NOTE: same both times)
P(Walks|X) vs. P(Drives|X)
P(X|Walks) · P(Walks) / P(X)  vs.  P(X|Drives) · P(Drives) / P(X)
(P(X) is the same on both sides, so it can be dropped when comparing the two classes)
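Because the marginal P(X) is the same for both classes, only the numerators need comparing — a small sketch:

```python
# Compare unnormalized scores P(X|class) * P(class).
# Dividing both by the same P(X) cannot change which one is larger.
score_walks = (3 / 10) * (10 / 30)    # likelihood * prior for Walks
score_drives = (1 / 20) * (20 / 30)   # likelihood * prior for Drives
prediction = "Walks" if score_walks > score_drives else "Drives"
print(prediction)
```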
[Scatter plot of training data in the (x1, x2) plane]
[Split 1: x2 = 60]
[Split 2: x1 = 50]
[Split 3: x1 = 70]
[Split 4: x2 = 20]
[Tree so far: root node "x2 < 60" with Yes/No branches]
[Tree so far: root "x2 < 60", plus a node "x1 < 50" with Yes/No branches]
[Tree so far: root "x2 < 60", plus nodes "x1 < 70" and "x1 < 50", each with Yes/No branches]
[Final tree: root "x2 < 60"; internal nodes "x1 < 70", "x1 < 50", "x2 < 20", each with Yes/No branches down to the leaves]
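The finished tree is just nested if/else tests on the split values. A plain-Python sketch — the thresholds are the four splits above, but the exact Yes/No wiring and the leaf names are illustrative assumptions, since the diagram itself is not reproduced here:

```python
def predict_region(x1, x2):
    # Thresholds from the slides; wiring and leaf names are assumed.
    if x2 < 60:
        if x1 < 70:
            return "region 1" if x2 < 20 else "region 2"
        return "region 3"
    return "region 4" if x1 < 50 else "region 5"

print(predict_region(30, 10), predict_region(40, 90))
```

In practice the splits are chosen automatically (e.g. by scikit-learn's DecisionTreeClassifier) to maximise the information gained at each node.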
STEP 1: Pick at random K data points from the Training set.
STEP 2: Build the Decision Tree associated to these K data points.
STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2.
STEP 4: For a new data point, make each one of your Ntree trees predict the category to which the data point belongs, and assign the new data point to the category that wins the majority vote.
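STEP 4, the majority vote, can be sketched like this; the three stand-in "trees" below are hypothetical placeholders for decision trees that STEPS 1–3 would build on random subsets of the training set:

```python
from collections import Counter

def majority_vote(trees, point):
    # Each tree votes for a category; the most common vote wins.
    votes = Counter(tree(point) for tree in trees)
    return votes.most_common(1)[0][0]

# Stand-ins: real trees would be fitted on K random training points.
trees = [lambda p: "Walks", lambda p: "Drives", lambda p: "Walks"]
print(majority_vote(trees, point=(30, 40000)))
```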
[Logistic regression curve: p̂ (Probability) vs X (values 20, 30, 40, 50 marked), y = Actual DV; threshold at p̂ = 0.5 — predict ŷ = 0 below it, ŷ = 1 above it (ŷ = Predicted DV)]
[Four sample points #1–#4 projected onto the curve to read off their probabilities p̂]
[The same points #1–#4 converted to ŷ (Predicted DV): disagreements with the actual y are False Positives (Type I Error) and False Negatives (Type II Error)]
Confusion Matrix & Accuracy
Confusion Matrix & Accuracy

              Prediction NEG                              Prediction POS
Actual NEG    TRUE NEG                                    FALSE POS (Type I Error, False Positives)
Actual POS    FALSE NEG (Type II Error, False Negatives)  TRUE POS

Image source: nature.com
Confusion Matrix & Accuracy

              Prediction NEG                        Prediction POS
Actual NEG    43                                    12 (Type I Error, False Positives)
Actual POS    4 (Type II Error, False Negatives)    41

Accuracy Rate and Error Rate:
AR = Correct / Total = (TN + TP) / Total = 84/100 = 84%
ER = Incorrect / Total = (FP + FN) / Total = 16/100 = 16%
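The two rates follow directly from the four cells; a quick check in Python with the matrix above:

```python
# Cells from the slide: TN = 43, FP = 12, FN = 4, TP = 41.
tn, fp, fn, tp = 43, 12, 4, 41
total = tn + fp + fn + tp                # 100 observations
accuracy_rate = (tn + tp) / total        # 84 / 100
error_rate = (fp + fn) / total           # 16 / 100
print(accuracy_rate, error_rate)
```

With scikit-learn, `confusion_matrix(y_true, y_pred)` produces the same four cells from raw predictions.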
Additional Reading
"Understanding the Confusion Matrix from Scikit-learn", Samarth Agrawal (2021)
Link: https://towardsdatascience.com/understanding-the-confusion-matrix-from-scikit-learn-c51d88929c79
Scenario 1 (y = Actual DV, ŷ = Predicted DV):

          ŷ = 0    ŷ = 1
y = 0     9,700      150
y = 1        50      100

Accuracy Rate = Correct / Total
AR = 9,800/10,000 = 98%
Scenario 2 (the model now always predicts 0):

          ŷ = 0    ŷ = 1
y = 0     9,850        0
y = 1       150        0

Accuracy Rate = Correct / Total
AR = 9,850/10,000 = 98.5%

Accuracy went up even though the model stopped predicting class 1 entirely — accuracy alone can be misleading.
[CAP setup: Purchased (0–10,000) on the vertical axis vs Total Contacted (0–100,000) on the horizontal axis]
[CAP chart: Purchased % vs Total Contacted %, with curves for the Crystal Ball, a Good Model, Random (the diagonal, for an overall 10% response rate) and a Poor Model]
Note:
CAP = Cumulative Accuracy Profile
ROC = Receiver Operating Characteristic
[CAP chart comparing the Model curve against the Random line]
[CAP analysis: aR = area between the Good Model curve and the Random Model line; aP = area between the Perfect Model curve and the Random Model line]

AR = aR / aP
[Alternative rule of thumb: read X = Purchased % at 50% Total Contacted]

90% < X < 100%: Too Good
80% < X < 90%: Very Good
70% < X < 80%: Good
60% < X < 70%: Poor
X < 60%: Rubbish
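AR = aR/aP can be approximated numerically from sampled points on the curves. A sketch using the trapezoidal rule — the curve values below are made-up illustrative data, not figures from the slides:

```python
def area_above_random(xs, ys):
    # Trapezoidal area between a CAP curve and the random diagonal y = x.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        area += (x1 - x0) * ((y0 + y1) / 2 - (x0 + x1) / 2)
    return area

xs      = [0.0, 0.25, 0.5, 0.75, 1.0]    # Total Contacted %
model   = [0.0, 0.55, 0.80, 0.95, 1.0]   # Purchased % (illustrative)
perfect = [0.0, 1.00, 1.00, 1.00, 1.0]   # crystal-ball curve (coarse)
accuracy_ratio = area_above_random(xs, model) / area_above_random(xs, perfect)
print(accuracy_ratio)
```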
Clustering
What is Clustering?
Clustering – grouping unlabelled data
What is Clustering?
Supervised Learning (e.g. Regression, Classification) vs Unsupervised Learning (e.g. Clustering)
Image source: mdpi.com/2073-8994/10/12/734
What is Clustering?
[Before/after: Spending Score vs Annual Income $ scatter, then the same data grouped into clusters]
K-Means Clustering
K-Means Clustering
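The animation on the following slides alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A minimal 1-D sketch without libraries (in practice you would use scikit-learn's KMeans):

```python
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

print(kmeans([1, 2, 3, 10, 11, 12], centroids=[1, 12]))   # [2.0, 11.0]
```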
The Elbow Method
The Elbow Method
Within Cluster Sum of Squares:
WCSS = Σ over clusters i of Σ over points P in cluster i of distance(P, C_i)²
[K = 1: all points in Cluster 1 around centroid C1]
[K = 2: Cluster 1 and Cluster 2 with centroids C1 and C2]
[K = 3: Clusters 1–3 with centroids C1, C2, C3]
The Elbow Method
[Elbow chart: WCSS vs number of clusters — the "elbow" marks the optimal number of clusters]
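WCSS as defined above, computed directly on toy 2-D points; it always shrinks as K grows, and the elbow is where the drop levels off:

```python
def wcss(clusters):
    # Sum of squared distances of each point to its cluster centroid.
    total = 0.0
    for points in clusters:
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points)
    return total

one_cluster  = [[(0, 0), (0, 2), (10, 0), (10, 2)]]
two_clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(wcss(one_cluster), wcss(two_clusters))   # 104.0 4.0
```

With scikit-learn, the fitted model exposes the same quantity as `KMeans(...).inertia_`.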
K-Means++
K-Means++
[Random initialization trap: two runs of K-Means on the same data can produce different clusterings (Clusters 1–3) — different results]
K-Means++
K-Means++ Initialization Algorithm:
Step 1: Choose the first centroid at random among the data points
Step 2: For each of the remaining data points, compute the distance (D) to the nearest of the already selected centroids
Step 3: Choose the next centroid among the remaining data points using weighted random selection, weighted by D²
Step 4: Repeat Steps 2 and 3 until all k centroids have been selected
Step 5: Proceed with standard k-means clustering
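Steps 1–4 in code, for 1-D points (a sketch; scikit-learn's KMeans uses init='k-means++' by default):

```python
import random

def kmeans_pp_init(points, k, rng=None):
    rng = rng or random.Random(0)
    centroids = [rng.choice(points)]                  # Step 1
    while len(centroids) < k:                         # Step 4
        # Step 2: squared distance to the nearest chosen centroid.
        d2 = [min((p - c) ** 2 for c in centroids) for p in points]
        # Step 3: weighted random selection, weighted by D^2.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

print(kmeans_pp_init([0, 1, 2, 10, 11, 12], k=2))
```

An already-chosen point has D² = 0, so it can never be picked again; far-away points are the most likely next centroids, which is what avoids the initialization trap.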
K-Means++
[Result: Cluster 1, Cluster 2, Cluster 3 — now stable across runs]
Hierarchical Clustering (HC)
[Before HC vs After HC — same outcome as K-Means but a different process]
STEP 1: Make each data point a single-point cluster → that forms N clusters
STEP 2: Take the two closest data points and make them one cluster → that forms N-1 clusters
STEP 3: Take the two closest clusters and make them one cluster → that forms N-2 clusters
STEP 4: Repeat STEP 3 until there is only one cluster
FIN
[Euclidean distance between P1(x1, y1) and P2(x2, y2) = √((x2 − x1)² + (y2 − y1)²)]
Distance Between Two Clusters:
• Option 1: Closest Points
• Option 2: Furthest Points
• Option 3: Average Distance
• Option 4: Distance Between Centroids
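The four options in code, for clusters of 1-D points (distance = absolute difference); SciPy's `linkage` calls these single, complete, average and centroid linkage:

```python
def closest(a, b):                       # Option 1: closest points
    return min(abs(x - y) for x in a for y in b)

def furthest(a, b):                      # Option 2: furthest points
    return max(abs(x - y) for x in a for y in b)

def average(a, b):                       # Option 3: average distance
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

def between_centroids(a, b):             # Option 4: centroid distance
    return abs(sum(a) / len(a) - sum(b) / len(b))

a, b = [0, 2], [5, 9]
print(closest(a, b), furthest(a, b), average(a, b), between_centroids(a, b))
```

The choice of option changes which clusters merge first, so it can change the final dendrogram.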
Consider the following dataset of N = 6 data points
STEP 1: Make each data point a single-point cluster → that forms 6 clusters
STEP 2: Take the two closest data points and make them one cluster → that forms 5 clusters
STEP 3: Take the two closest clusters and make them one cluster → that forms 4 clusters
STEP 4: Repeat STEP 3 until there is only one cluster
FIN
[Dendrogram: points P1–P6 on the horizontal axis, Euclidean distance on the vertical axis]
[Dendrogram threshold giving 2 clusters]
[A lower threshold gives 4 clusters]
[A still lower threshold gives 6 clusters]
[Optimal threshold: cross the largest vertical distance on the dendrogram → 2 clusters]
[Second example with points P1–P9: crossing the largest vertical distance → 3 clusters]
People who bought also bought …
User ID   Movies liked
46578     Movie1, Movie2, Movie3, Movie4
98989     Movie1, Movie2
71527     Movie1, Movie2, Movie4
78981     Movie1, Movie2
89192     Movie2, Movie4
61557     Movie1, Movie3

Potential Rules:
Movie1 → Movie2
Movie2 → Movie4
Movie1 → Movie3
Transaction ID   Products purchased
46578            Burgers, French Fries, Vegetables
98989            Burgers, French Fries, Ketchup
71527            Vegetables, Fruits
78981            Pasta, Fruits, Butter, Vegetables
89192            Burgers, Pasta, French Fries
61557            Fruits, Orange Juice, Vegetables
87923            Burgers, French Fries, Ketchup, Mayo

Potential Rules:
Burgers → French Fries
Vegetables → Fruits
Burgers, French Fries → Ketchup
Support:
Market Basket Optimisation: support(I) = (# transactions containing I) / (total # transactions)
Movie Recommendation: support(M) = (# user watchlists containing M) / (total # user watchlists)
Support = 10 / 100 = 10%
Confidence:
Market Basket Optimisation: confidence(I1 → I2) = (# transactions containing I1 and I2) / (# transactions containing I1)
Movie Recommendation: confidence(M1 → M2) = (# watchlists containing M1 and M2) / (# watchlists containing M1)
Confidence = 7 / 40 = 17.5%
Lift:
Market Basket Optimisation: lift(I1 → I2) = confidence(I1 → I2) / support(I2)
Movie Recommendation: lift(M1 → M2) = confidence(M1 → M2) / support(M2)
Lift = 17.5% / 10% = 1.75
Step 1: Set a minimum support and confidence
Step 2: Take all the subsets in transactions having higher support than minimum support
Step 3: Take all the rules of these subsets having higher confidence than minimum confidence
Step 4: Sort the rules by decreasing lift
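The three metrics, computed directly on the movie table above with the rule Movie1 → Movie2 as the example (the 10/100, 7/40 figures earlier come from a different, larger dataset):

```python
watchlists = [
    {"Movie1", "Movie2", "Movie3", "Movie4"},
    {"Movie1", "Movie2"},
    {"Movie1", "Movie2", "Movie4"},
    {"Movie1", "Movie2"},
    {"Movie2", "Movie4"},
    {"Movie1", "Movie3"},
]

def support(items):
    # Fraction of watchlists containing every item in the set.
    return sum(items <= w for w in watchlists) / len(watchlists)

def confidence(lhs, rhs):
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    return confidence(lhs, rhs) / support(rhs)

print(support({"Movie1"}), confidence({"Movie1"}, {"Movie2"}),
      lift({"Movie1"}, {"Movie2"}))
```

A lift above 1 means liking Movie1 makes liking Movie2 more probable than it is for a random user.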
People who bought also bought …
Support (Eclat works with sets of items):
Market Basket Optimisation: support({I1, I2}) = (# transactions containing I1 and I2) / (total # transactions)
Movie Recommendation: support({M1, M2}) = (# watchlists containing M1 and M2) / (total # watchlists)
Step 1: Set a minimum support
Step 2: Take all the subsets in transactions having higher support than minimum support
Step 3: Sort these subsets by decreasing support
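The three Eclat steps sketched on the market-basket table, for item pairs with an assumed minimum support of 3/7:

```python
from itertools import combinations

transactions = [
    {"Burgers", "French Fries", "Vegetables"},
    {"Burgers", "French Fries", "Ketchup"},
    {"Vegetables", "Fruits"},
    {"Pasta", "Fruits", "Butter", "Vegetables"},
    {"Burgers", "Pasta", "French Fries"},
    {"Fruits", "Orange Juice", "Vegetables"},
    {"Burgers", "French Fries", "Ketchup", "Mayo"},
]
items = sorted(set().union(*transactions))

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 1: minimum support; Step 2: keep frequent pairs; Step 3: sort.
min_support = 3 / 7
frequent = sorted(
    ((support(set(pair)), pair) for pair in combinations(items, 2)
     if support(set(pair)) >= min_support),
    reverse=True,
)
print(frequent)
```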
[Five machines / ad designs: D1–D5]
Examples used for educational purposes. No affiliation with Coca-Cola.
[For each machine: distribution of Return, with bands showing where we think the μ* values will be. I.e. we are NOT trying to guess the distributions behind the machines]
We’ve generated our own bandit configuration
New Round
UCB:
• Deterministic
• Requires update at every round

Thompson Sampling:
• Probabilistic
• Can accommodate delayed feedback
• Better empirical evidence
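For contrast, the deterministic UCB selection rule can be sketched as a minimal UCB1 (not the course's implementation): given the same pull counts and rewards, it always picks the same arm, which is why it needs an update every round.

```python
import math

def ucb1_select(pulls, rewards, total_rounds):
    """UCB1: deterministically pick the arm with the highest upper
    confidence bound, mean reward + sqrt(2 ln N / n_i).
    Arms that were never played get priority."""
    best_arm, best_bound = 0, float("-inf")
    for i, n_i in enumerate(pulls):
        if n_i == 0:
            return i  # play every arm at least once first
        bound = rewards[i] / n_i + math.sqrt(2 * math.log(total_rounds) / n_i)
        if bound > best_bound:
            best_arm, best_bound = i, bound
    return best_arm

# After 30 rounds split evenly, exploration bonuses are equal,
# so the arm with the best average reward is chosen.
arm = ucb1_select(pulls=[10, 10, 10], rewards=[2, 5, 9], total_rounds=30)
```
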
Here’s what we will learn:
• Types of Natural Language Processing
• Classical vs Deep Learning Models
• End-to-end Deep Learning Models
• Bag-Of-Words
• Note: Seq2Seq and Chatbots are outside the
scope of this course
[Venn diagram: Natural Language Processing and Deep Learning overlap; their intersection is DNLP, and Seq2Seq sits inside DNLP]
Some examples:
1. If / Else Rules (Chatbot)
2. Audio frequency components analysis (Speech
Recognition)
3. Bag-of-words model (Classification)
4. CNN for text recognition (Classification)
5. Seq2Seq (many applications)
Comment                  Pass/Fail
Great job!               1
Amazing work.            1
Well done.               1
Very well written.       1
Poor effort.             0
Could have done better.  0
Try harder next time.    0
…                        …
Image Source: www.wildml.com
[Diagram: Seq2Seq encoder-decoder. The Encoder reads the input tokens ("Hello", "Kirill", ",", "Checking", …, EOS) into hidden states h0 … hn; the Decoder then generates the reply ("Yes", "I'm", "back", EOS) from states g0 … g2: "Yes I'm back"]
End-to-end
Deep Learning
Models
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... , 0]
20,000 elements long: one position per word, e.g. "if", "badminton", "table"
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... , 0]
20,000 elements long; some positions are reserved for special words such as SOS (start of sentence) and EOS (end of sentence)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... , 0]
20,000 elements long
Hello Kirill, Checking if you are back to Oz. Let me know if you are around … Cheers, V
[1, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, ... , 3]
20,000 elements long
Hello Kirill, Checking if you are back to Oz. Let me know if you are around … Cheers, V
Hey mate, have you read about Hinton’s capsule networks?
Training Data:
Did you like that recipe I sent you last week?
Hi Kirill, are you coming to dinner tonight?
Dear Kirill, would you like to service your car with us again?
Are you coming to Australia in December?
…
Training Data (as bag-of-words vectors):
…
[1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, ... , 2]
[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, ... , 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, ... , 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, ... , 1]
[1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, ... , 1]
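The vectorization above can be sketched as follows. A toy vocabulary stands in for the 20,000-word one, and the final extra position counting out-of-vocabulary words is an assumption of this sketch, not the course's exact layout:

```python
def bag_of_words(text, vocabulary):
    """Represent `text` as a fixed-length vector of word counts.

    One position per vocabulary word; the last extra position counts
    words that are not in the vocabulary (an assumption of this sketch)."""
    vector = [0] * (len(vocabulary) + 1)
    index = {word: i for i, word in enumerate(vocabulary)}
    # Crude tokenization: lowercase and strip commas/periods.
    for word in text.lower().replace(",", " ").replace(".", " ").split():
        vector[index.get(word, len(vocabulary))] += 1
    return vector

vocab = ["hello", "kirill", "checking", "if", "you", "are", "back"]
v = bag_of_words("Hello Kirill, checking if you are back. Are you around?", vocab)
# "you" and "are" each occur twice; "around?" lands in the out-of-vocabulary slot
```
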
Image Source: www.helloacm.com
[Chart: 1956, 1980 and 2017, with growth factors 2x and 25,600x]
Log-scale
Source: mkomo.com
Source: nature.com
Source: Time Magazine
Geoffrey Hinton
Image Source: www.austincc.edu
[Diagram: a neural network. Input Layer: nodes 1-3 receiving input values 1-3. Hidden Layer: nodes 4-7. Output Layer: node 8 producing the output value]
[Diagram: the same nodes shown as just an Input Layer and an Output Layer]
[Diagram: Input Layer, several Hidden Layers, Output Layer: a deep neural network]
Supervised:
• Artificial Neural Networks: used for Regression & Classification
• Convolutional Neural Networks: used for Computer Vision
• Recurrent Neural Networks: used for Time Series Analysis

Unsupervised:
• Self-Organizing Maps: used for Feature Detection
• Deep Boltzmann Machines: used for Recommendation Systems
• AutoEncoders: used for Recommendation Systems
What we will learn in this section:
• The Neuron
• The Activation Function
• How do Neural Networks work? (example)
• How do Neural Networks learn?
• Gradient Descent
• Stochastic Gradient Descent
• Backpropagation
Image Source: www.austincc.edu
Image Source: Wikipedia
Image Source: Wikipedia
[Diagram: a biological neuron, with the axon and dendrites labeled]
[Diagram: the artificial neuron receives input signals 1 … m and produces an output signal]
[Diagram: input values 1 … m (X1, X2, …, Xm) reach the neuron through synapses]
[Diagram: the input values X1, X2, …, Xm are the independent variables of one observation; the neuron emits the output value y]
Note: standardize the input values before feeding them to the network.
Additional Reading:
Efficient BackProp, by Yann LeCun et al. (1998)
Link: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
[Diagram: neuron with inputs X1 … Xm (independent variables) and output value y]
The output value can be:
• Continuous (price)
• Binary (will exit yes/no)
• Categorical
[Diagram: with a categorical output, the neuron produces several output values y1, y2, …, yp, one per category]
[Diagram: the inputs X1 … Xm and the output y always refer to a single observation, the same observation on both sides]
[Diagram: every synapse carries a weight w1, w2, …, wm applied to its input value]
[Diagram: what happens inside the neuron?]
1st step: the neuron takes the weighted sum of its inputs: w1X1 + w2X2 + … + wmXm
2nd step: the neuron applies the activation function φ to the weighted sum: φ(w1X1 + w2X2 + … + wmXm)
3rd step: the neuron passes the resulting signal y on to the next layer.
Threshold Function: φ(x) = 1 if x ≥ 0, else 0 (a hard yes/no)
Sigmoid: φ(x) = 1 / (1 + e^(−x)), smoothly rising from 0 to 1
Rectifier (ReLU): φ(x) = max(x, 0)
Hyperbolic Tangent (tanh): φ(x) = (e^x − e^(−x)) / (e^x + e^(−x)), ranging from −1 to 1
Additional Reading:
Deep sparse rectifier neural networks, by Xavier Glorot et al. (2011)
Link: http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf
For a binary dependent variable (y = 0 or 1):
• With a threshold activation function: ŷ = 1 if w1X1 + … + wmXm ≥ 0, else 0.
• With a sigmoid activation function: ŷ = P(y = 1) = φ(w1X1 + … + wmXm).
[Diagram: the full network: inputs X1 … Xm, a Hidden Layer (nodes 4-7), and the Output Layer producing y]
[Diagram: Input Layer X1-X4 (X1: Area (sq ft), X2: Bedrooms, X3: Distance to city (miles), X4: Age) connected straight to the Output Layer y with weights w1-w4]
Price = w1*x1 + w2*x2 + w3*x3 + w4*x4
With no hidden layer, the network reduces to this weighted sum.
[Diagram: the same four inputs (Area, Bedrooms, Distance to city, Age) now feed a Hidden Layer, which feeds the Output Layer y]
[Diagram: each hidden neuron fires only for certain combinations of the inputs; together their signals feed the output neuron, which predicts y = Price]
[Diagram: the perceptron produces a predicted value ŷ, which we compare against the actual value y]
[Diagram: perceptron with inputs X1 … Xm, weights w1 … wm, prediction ŷ and actual value y]
Cost function: C = ½(ŷ − y)²
C = Σ ½(ŷ − y)², summed over every row of the training set.
[Diagram: the same network (same weights w1 … wm) is fed each row in turn; every row yields a prediction ŷ compared with its actual value y]
After the whole training set has passed through, adjust the weights w1, w2, …, wm to reduce C.
Additional Reading:
A list of cost functions used in neural networks, alongside applications, CrossValidated (2015)
Link: http://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
[Diagram: the simplest case: one input X1, one weight w1, prediction ŷ versus actual value y]
C = ½(ŷ − y)²
[Plot: the cost C = ½(ŷ − y)² traced against ŷ: a parabola whose minimum ("Best!") marks the optimal weight]
[Diagram: the house-price network with its hidden layer has 25 weights to optimize]
Brute force? Testing just 1,000 candidate values per weight would mean
1,000 × 1,000 × … × 1,000 = 1,000^25 = 10^75 combinations.
Sunway TaihuLight, the world's fastest supercomputer at the time, runs at 93 PFLOPS (93 × 10^15 floating-point operations per second). Even at one combination per operation:
10^75 / (93 × 10^15) ≈ 1.08 × 10^58 seconds ≈ 3.42 × 10^50 years.
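The arithmetic checks out:

```python
combinations = 1000 ** 25            # 1,000 values for each of 25 weights = 10^75
flops = 93e15                        # Sunway TaihuLight: 93 PFLOPS
seconds = combinations / flops       # about 1.08e58 seconds
years = seconds / (365.25 * 24 * 3600)
# about 3.4e50 years: brute force is hopeless
```
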
Image Source: neuralnetworksanddeeplearning.com
[Plot: gradient descent: differentiate the cost C = ½(ŷ − y)² and look at the slope to decide which way is downhill]
[Plot: repeatedly stepping against the slope, like a ball rolling downhill, brings ŷ to the minimum of the cost curve: "Best!"]
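Following the slope downhill can be sketched on the simplest one-weight case, ŷ = w·x with cost C = ½(ŷ − y)²; the data point below is made up:

```python
def gradient_descent(x, y, w=0.0, learning_rate=0.1, steps=100):
    """Minimize C(w) = 0.5 * (w*x - y)**2 by stepping against the slope.
    The slope is dC/dw = (w*x - y) * x."""
    for _ in range(steps):
        y_hat = w * x
        gradient = (y_hat - y) * x
        w -= learning_rate * gradient  # step downhill
    return w

w = gradient_descent(x=2.0, y=3.0)
# converges toward w = y/x = 1.5, where the cost curve bottoms out
```
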
Batch Gradient Descent: feed the whole training set through the network, compute C = Σ ½(ŷ − y)², and adjust the weights once per pass.
Stochastic Gradient Descent: take the rows one at a time: run one row through the network, adjust the weights w1, w2, …, wm, then move on to the next row and adjust again.
Additional Reading:
A Neural Network in 13 lines of Python (Part 2 - Gradient Descent), by Andrew Trask (2015)
Link: https://iamtrask.github.io/2015/07/27/python-network-part2/
Additional Reading:
Neural Networks and Deep Learning, by Michael Nielsen (2015)
Link: http://neuralnetworksanddeeplearning.com/chap2.html
Image Source: neuralnetworksanddeeplearning.com
Forward Propagation
Backpropagation
STEP 1: Randomly initialise the weights to small numbers close to 0 (but not 0).
STEP 2: Input the first observation of your dataset in the input layer, each feature in one input node.
STEP 3: Forward-Propagation: from left to right, the neurons are activated in such a way that each neuron's contribution is limited by the weights. Propagate the activations until you get the predicted result ŷ.
STEP 4: Compare the predicted result to the actual result. Measure the generated error.
STEP 5: Back-Propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
STEP 6: Repeat Steps 2 to 5 and update the weights after each observation (Reinforcement Learning). Or: repeat Steps 2 to 5 but update the weights only after a batch of observations (Batch Learning).
STEP 7: When the whole training set has passed through the ANN, that makes an epoch. Redo more epochs.
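A minimal sketch of these steps on a single sigmoid neuron, updating after every observation. The toy OR-style dataset, the learning rate and the epoch count are all illustrative choices, not the course's Keras implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, targets, epochs=1000, learning_rate=0.5):
    """Steps 1-7 for one sigmoid neuron with a bias weight."""
    weights = [0.01] * (len(rows[0]) + 1)          # STEP 1: small non-zero weights
    for _ in range(epochs):                        # STEP 7: many epochs
        for x, y in zip(rows, targets):            # STEP 2: one observation at a time
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            y_hat = sigmoid(z)                     # STEP 3: forward-propagation
            error = y_hat - y                      # STEP 4: measure the error
            # STEPS 5-6: back-propagate and update after each observation
            delta = error * y_hat * (1 - y_hat)
            weights[0] -= learning_rate * delta
            for i, xi in enumerate(x):
                weights[i + 1] -= learning_rate * delta * xi
    return weights

# Toy data: output is 1 whenever either input is 1 (an OR gate)
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
w = train(rows, targets)
preds = [round(sigmoid(w[0] + w[1] * a + w[2] * b)) for a, b in rows]
```
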
What we will learn in this section:
• What are Convolutional Neural Networks?
• Step 1 - Convolution Operation
• Step 1(b) - ReLU Layer
• Step 2 - Pooling
• Step 3 - Flattening
• Step 4 - Full Connection
• Summary
• EXTRA: Softmax & Cross-Entropy
Image Source: a talk by Geoffrey Hinton
Source: Google Trends
Yann LeCun
Google Facebook
Input Image → CNN → Output Label (Image class)
CNN → Happy
CNN → Sad
A B/W image of 2×2 px is stored as a 2d array of pixel values (Pixel 1 … Pixel 4).
A colored image of 2×2 px is stored as a 3d array: a red channel, a green channel and a blue channel, each a 2d array of the same pixels.
0 0 0 0 0 0 0
0 1 0 0 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 1 0 0 0 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0
STEP 1: Convolution
STEP 2: Max Pooling
STEP 3: Flattening
STEP 4: Full Connection
Additional Reading:
Gradient-Based Learning Applied to Document Recognition, by Yann LeCun et al. (1998)
Link: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
Additional Reading:
Introduction to Convolutional Neural Networks, by Jianxin Wu (2017)
Link: http://cs.nju.edu.cn/wujx/paper/CNN.pdf
Input Image (7×7):
0 0 0 0 0 0 0
0 1 0 0 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 1 0 0 0 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0

Feature Detector (3×3):
0 0 1
1 0 0
0 1 1
Sliding the Feature Detector over the Input Image one step (stride) at a time, multiplying the overlapping cells elementwise and adding them up at each position, produces the Feature Map:
0 1 0 0 0
0 1 1 1 0
1 0 0 2 1
1 4 2 1 0
0 0 1 2 1
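The sliding-window operation above, as code (valid mode, stride 1, no kernel flip, which is how CNN libraries compute their "convolution"):

```python
def convolve(image, detector):
    """Slide the feature detector over the image (stride 1) and record,
    at each position, the elementwise product summed over the window."""
    kh, kw = len(detector), len(detector[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i][j] = sum(
                image[i + a][j + b] * detector[a][b]
                for a in range(kh)
                for b in range(kw)
            )
    return feature_map

image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
detector = [[0, 0, 1], [1, 0, 0], [0, 1, 1]]
fmap = convolve(image, detector)  # a 5x5 feature map; the strongest match is 4
```

The highest value marks where the detector's pattern matches the image best.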
[Diagram: Input Image → many Feature Maps → Convolutional Layer]
We create many feature maps (one per feature detector) to obtain our first convolution layer.
Image Source: docs.gimp.org/en/plug-in-convmatrix.html
Well-known convolution kernels include: Sharpen, Blur, Edge Enhance, Edge Detect and Emboss.
NOT
FOR
DISTRIBUTION
©
SUPERDATASCIENCE
www.superdatascience.com
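For reference, common 3x3 versions of these kernels (the slides show the matrices from the GIMP documentation; the variants below are the standard textbook ones, so treat them as illustrative):

```python
import numpy as np

# Common 3x3 image-processing kernels (illustrative standard variants;
# the slides show the versions from the GIMP convolution-matrix docs).
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
box_blur = np.full((3, 3), 1 / 9)        # every pixel averaged with its neighbours
edge_detect = np.array([[0,  1, 0],
                        [1, -4, 1],
                        [0,  1, 0]])     # Laplacian: responds only to intensity changes

# Kernels whose weights sum to 1 preserve overall brightness;
# edge detectors sum to 0, so flat regions map to black.
print(sharpen.sum(), round(float(box_blur.sum()), 2), edge_detect.sum())
```

This weight-sum property is why a sharpen kernel keeps the image recognisable while an edge-detect kernel produces a mostly black output.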
Image Source: leonardoaraujosantos.gitbooks.io
Input Image
0 0 0 0 0 0 0
0 1 0 0 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 1 0 0 0 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0
Convolutional Layer
Feature Maps
Rectifier: y = max(0, x) is applied to every value of the feature maps, removing the negative values and so increasing non-linearity.
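The rectifier step can be sketched as follows (a minimal NumPy illustration; the sample values are hypothetical):

```python
import numpy as np

def relu(x):
    # Rectifier: y = max(0, x), applied element-wise;
    # every negative value in the feature map becomes 0.
    return np.maximum(0, x)

# Hypothetical feature-map values, including negatives that a
# detector with negative weights would produce.
fm = np.array([[-1.5, 2.0],
               [ 0.0, -0.3]])
print(relu(fm))  # negatives become 0, non-negatives pass through
```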
Image Source: http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf
Additional Reading:
Understanding Convolutional Neural Networks with a Mathematical Model
By C.-C. Jay Kuo (2016)
Link: https://arxiv.org/pdf/1609.04112.pdf
Additional Reading:
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
By Kaiming He et al. (2015)
Link: https://arxiv.org/pdf/1502.01852.pdf
Image Source: Wikipedia
Feature Map
0 1 0 0 0
0 1 1 1 0
1 0 0 2 1
1 4 2 1 0
0 0 1 2 1

Max Pooling

Pooled Feature Map
1 1 0
4 2 1
0 2 1

Max pooling keeps the maximum value inside each 2x2 window, moving with a stride of 2; windows that overhang the border are simply truncated.
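The max pooling walk-through above can be sketched in NumPy (a minimal illustration using the feature map from the slides):

```python
import numpy as np

feature_map = np.array([
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 2, 1],
    [1, 4, 2, 1, 0],
    [0, 0, 1, 2, 1],
])

def max_pool(fm, size=2, stride=2):
    """Max pooling: keep the maximum of each size x size window.
    Windows that overhang the border are truncated (NumPy slicing
    past the end of an array just stops at the edge)."""
    oh = -(-fm.shape[0] // stride)  # ceiling division
    ow = -(-fm.shape[1] // stride)
    out = np.zeros((oh, ow), dtype=fm.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fm[i * stride:i * stride + size,
                           j * stride:j * stride + size].max()
    return out

pooled = max_pool(feature_map)
print(pooled)
```

Pooling a 5x5 map with 2x2 windows and stride 2 gives a 3x3 pooled feature map, halving each dimension (rounding up at the border).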
Additional Reading:
Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
By Dominik Scherer et al. (2010)
Link: http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf
Input Image
0 0 0 0 0 0 0
0 1 0 0 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 1 0 0 0 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0
Convolution → Convolutional Layer → Pooling → Pooling Layer
Image Source: scs.ryerson.ca/~aharley/vis/conv/flat.html
Pooled Feature Map
1 1 0
4 2 1
0 2 1

Flattening
The pooled feature map is read row by row into a single column vector: 1, 1, 0, 4, 2, 1, 0, 2, 1.
Pooling Layer
Flattening
Input layer of a future ANN
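The flattening step above can be sketched as (a minimal NumPy illustration using the pooled feature map from the slides):

```python
import numpy as np

pooled = np.array([[1, 1, 0],
                   [4, 2, 1],
                   [0, 2, 1]])

# Flattening: read the pooled feature map row by row into one long
# vector; this vector becomes the input layer of the fully connected ANN.
vector = pooled.flatten()
print(vector)  # [1 1 0 4 2 1 0 2 1]
```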
Input Image
Convolutional
Layer
0 0 0 0 0 0 0
0 1 0 0 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 1 0 0 0 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0
Pooling Layer
Convolution → Pooling → Flattening
Input layer of a future ANN
Flattening → Input Layer (X1, X2, …, Xm) → Fully Connected Layer → Output Layer → output value y
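A forward pass through such a fully connected network can be sketched as follows (the layer sizes and random weights are purely illustrative, not the course's trained model):

```python
import numpy as np

rng = np.random.default_rng(42)

# Flattened vector from the pooling step (9 values) as the input layer.
x = np.array([1, 1, 0, 4, 2, 1, 0, 2, 1], dtype=float)

# Illustrative layer sizes: 9 inputs -> 6 hidden neurons -> 2 outputs.
W1 = rng.normal(scale=0.1, size=(6, 9)); b1 = np.zeros(6)
W2 = rng.normal(scale=0.1, size=(2, 6)); b2 = np.zeros(2)

hidden = np.maximum(0, W1 @ x + b1)   # fully connected layer + rectifier
scores = W2 @ hidden + b2             # raw scores for the two output neurons
print(scores.shape)  # (2,)
```

During training, backpropagation adjusts W1, W2, b1 and b2 so that the two output neurons respond to dog-like and cat-like combinations of features.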
Flattening → fully connected layers → output neurons: Dog, Cat

[Animation: the flattened vector is passed through the fully connected layers; successive frames highlight the neuron activations (values such as 0.9, 0.8, 0.4, 0.2, 0.1) that contribute most to each class. The dog photo ends with outputs of roughly Dog: 0.95, Cat: 0.05; the cat photo with roughly Dog: 0.21, Cat: 0.79.]
Image Source: a talk by Geoffrey Hinton
Additional Reading:
The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)
By Adit Deshpande (2016)
Link: https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
Flattening → fully connected layers → raw scores z1, z2 → softmax → Dog: 0.95, Cat: 0.05

The two output neurons produce raw scores z1 and z2; the softmax function squashes them into probabilities that add up to 1 (here Dog: 0.95, Cat: 0.05).
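The softmax step can be sketched as follows (a minimal NumPy illustration; the raw score values are hypothetical, chosen so the output reproduces the 0.95 / 0.05 split from the slide):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the exponentials are
    # then normalised so the outputs sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores z1, z2 for the Dog and Cat output neurons.
z = np.array([np.log(19.0), 0.0])   # chosen so softmax gives [0.95, 0.05]
probs = softmax(z)
print(probs.round(2))  # [0.95 0.05]
```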
Predicted: Dog^ = 0.9, Cat^ = 0.1 vs. actual label: Dog = 1, Cat = 0
NN1 predictions (Dog^, Cat^): (0.9, 0.1), (0.1, 0.9), (0.4, 0.6)
NN2 predictions (Dog^, Cat^): (0.6, 0.4), (0.3, 0.7), (0.1, 0.9)
Actual labels (Dog, Cat): (1, 0), (0, 1), (1, 0)
NN1:
Row   Dog^   Cat^   Dog   Cat
#1    0.9    0.1    1     0
#2    0.1    0.9    0     1
#3    0.4    0.6    1     0

NN2:
Row   Dog^   Cat^   Dog   Cat
#1    0.6    0.4    1     0
#2    0.3    0.7    0     1
#3    0.1    0.9    1     0

                       NN1          NN2
Classification Error   1/3 = 0.33   1/3 = 0.33
Mean Squared Error     0.25         0.71
Cross-Entropy          0.38         1.06
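The three error measures in the table can be reproduced as follows (a minimal sketch; Dog^/Cat^ are the predicted probabilities, Dog/Cat the one-hot labels):

```python
import numpy as np

# One-hot labels (Dog, Cat) and each network's predicted probabilities
# for the three rows shown in the table.
labels = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
nn1 = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]])
nn2 = np.array([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])

def classification_error(pred, y):
    # Fraction of rows where the arg-max class is wrong.
    return np.mean(pred.argmax(axis=1) != y.argmax(axis=1))

def mean_squared_error(pred, y):
    # Squared differences summed over classes, averaged over rows.
    return np.mean(np.sum((pred - y) ** 2, axis=1))

def cross_entropy(pred, y):
    # -sum(y * log(pred)) per row, averaged over rows.
    return np.mean(-np.sum(y * np.log(pred), axis=1))

for name, pred in [("NN1", nn1), ("NN2", nn2)]:
    print(name,
          round(classification_error(pred, labels), 2),
          round(mean_squared_error(pred, labels), 2),
          round(cross_entropy(pred, labels), 2))
```

Note how cross-entropy (and MSE) separate the two networks even though their classification error is identical: they penalise confidently wrong probabilities, which is why cross-entropy is the preferred loss for classification.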
Additional Reading:
A Friendly Introduction to Cross-Entropy Loss
By Rob DiPietro (2016)
Link: https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/
Additional Reading:
How to Implement a Neural Network: Intermezzo 2
By Peter Roelants (2016)
Link: http://peterroelants.github.io/posts/neural_network_implementation_intermezzo02/

Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf