Machine Learning
Introduction
Applied Areas
• Image Recognition, Voice Recognition
• Text Translation
• Ad Tech
• Medical
• Biology
• Industry
Linear Regression
Prediction of Housing Price
[Figure: plot of price (USD) against area (m²), with the price at 50 m² marked "?"]
How much is it when the area is 50 m²?
Collect Data with Label
[Figure: plot of price (USD) against area (m²)]
Label: The output we want to know (price).
Feature(s): The input(s) used to predict the output (area).
Cost
[Figure: plot of price (USD) against area (m²), annotated with the cost]
How to get a model
Model: y = θ0 + θ1 * x
Cost Function: J(θ0, θ1)
Find θ0 and θ1 that minimize J(θ0, θ1).
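As a rough illustration of finding the parameters that minimize the cost, here is a minimal sketch. The slides do not define J, so the halved mean squared error used here is an assumption, as are the training data and the function names. The fit uses the closed-form least-squares solution for a single feature rather than an iterative search.

```python
# Hypothetical training data (not from the slides): area in m², price in USD.
areas = [30.0, 40.0, 60.0, 80.0]
prices = [62000.0, 79000.0, 121000.0, 158000.0]

def predict(theta0, theta1, x):
    """Model from the slide: y = θ0 + θ1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """Assumed cost J(θ0, θ1): mean squared error over the data, halved."""
    m = len(xs)
    return sum((predict(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

def fit(xs, ys):
    """Closed-form least squares for one feature: the (θ0, θ1) minimizing J."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

theta0, theta1 = fit(areas, prices)
best_cost = cost(theta0, theta1, areas, prices)
```

Moving either parameter away from the fitted values, for example `cost(theta0 + 100, theta1, areas, prices)`, gives a cost at least as large as `best_cost`.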
Predict
Model: y = θ0 + θ1 * x
How much is it when the area is 50 m²?
[Figure: plot of price (USD) against area (m²); the model reads off 100000 USD at 50 m²]
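Prediction is just evaluating the model at a new input. The slides do not give the fitted parameters, so θ0 = 0 and θ1 = 2000 below are hypothetical values, chosen only because they reproduce the 100000 USD answer shown for 50 m².

```python
THETA0 = 0.0      # hypothetical intercept in USD (not from the slides)
THETA1 = 2000.0   # hypothetical slope in USD per m² (not from the slides)

def predict_price(area_m2):
    """Evaluate the model y = θ0 + θ1 * x for one area."""
    return THETA0 + THETA1 * area_m2

price_at_50 = predict_price(50.0)  # 100000.0 with the hypothetical values above
```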
Neural Network
Categorical Classification
Features: Experience
Java    7 years             0 years          1 year           3 years
Ruby    0 years             0 years          2 years          2 years
ObjC    2 years             5 years          0 years          0 years
PHP     0 years             2 years          5 years          1 year
Label   Android Developer   iOS Developer    Web Developer    ?
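One common way to hand such a table to a model is to store each person as a row of feature values and each label as a one-hot vector. The sketch below assumes that encoding; the variable and function names are illustrative.

```python
CLASSES = ["Android", "iOS", "Web"]
FEATURE_NAMES = ["Java", "Ruby", "ObjC", "PHP"]

# One row per labeled person: years of experience in each language.
X = [
    [7, 0, 2, 0],  # Android Developer
    [0, 0, 5, 2],  # iOS Developer
    [1, 2, 0, 5],  # Web Developer
]
labels = ["Android", "iOS", "Web"]

# The fourth person in the table has no label yet.
x_unlabeled = [3, 2, 0, 1]

def one_hot(label, classes=CLASSES):
    """Return a vector with 1 at the label's class position and 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]

Y = [one_hot(label) for label in labels]
```

With this encoding, `Y` is the 3 x 3 identity matrix because each labeled person belongs to a different class.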
Binary Classification
Features: Experience
Java    7 years    0 years    1 year       3 years
Ruby    0 years    0 years    2 years      2 years
ObjC    2 years    5 years    0 years      0 years
PHP     0 years    2 years    5 years      1 year
Label   Male(1)    Male(1)    Female(0)    ?
Neural Network
[Figure, repeated for each training example: the inputs (Java, Ruby, ObjC, PHP) go into a box marked "?", which produces the outputs (Android, iOS, Web)]
Inputs (Java, Ruby, ObjC, PHP) -> Outputs (Android, iOS, Web)
7, 0, 2, 0 -> 1, 0, 0
0, 0, 5, 2 -> 0, 1, 0
1, 2, 0, 5 -> 0, 0, 1
Neural Network
Inputs: x1 (Java), x2 (Ruby), x3 (ObjC), x4 (PHP)
Outputs: y1 (Android), y2 (iOS), y3 (Web)
Weights:
θ11 θ12 θ13
θ21 θ22 θ23
θ31 θ32 θ33
θ41 θ42 θ43
y1 = θ11 * x1 + θ21 * x2 + θ31 * x3 + θ41 * x4
y2 = θ12 * x1 + θ22 * x2 + θ32 * x3 + θ42 * x4
y3 = θ13 * x1 + θ23 * x2 + θ33 * x3 + θ43 * x4
The bias term is omitted.
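The three equations can be written as one loop over the weight matrix. The sketch below borrows the example weights Θex that appear later in the deck and applies them to the first training example; the function name `forward` is an illustrative choice, and the bias term is left out as in the slides.

```python
# Example weights Θex from a later slide; theta[i][j] connects input i to output j.
THETA_EX = [
    [1, 4, 2],
    [3, 2, 4],
    [2, 3, 1],
    [3, 1, 2],
]

def forward(x, theta):
    """Compute y_j = sum over i of theta[i][j] * x[i], with no bias term."""
    n_outputs = len(theta[0])
    return [sum(theta[i][j] * x[i] for i in range(len(x)))
            for j in range(n_outputs)]

x = [7, 0, 2, 0]            # Java, Ruby, ObjC, PHP
y = forward(x, THETA_EX)    # [11, 34, 16]
```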
Neural Network
[Figure: the same network, highlighting θ11, θ12, and θ13 on the connections from x1 (Java) to y1, y2, and y3]
The bias term is omitted.
Neural Network
The same equations are applied to each training example (m=1, m=2, m=3):
y1 = θ11 * x1 + θ21 * x2 + θ31 * x3 + θ41 * x4
y2 = θ12 * x1 + θ22 * x2 + θ32 * x3 + θ42 * x4
y3 = θ13 * x1 + θ23 * x2 + θ33 * x3 + θ43 * x4
The bias term is omitted.
Neural Network
XΘ = Y
X: 3 x 4 (m x n)
x11 x12 x13 x14
x21 x22 x23 x24
x31 x32 x33 x34
Θ: 4 x 3
θ11 θ12 θ13
θ21 θ22 θ23
θ31 θ32 θ33
θ41 θ42 θ43
Y: 3 x 3
y11 y12 y13
y21 y22 y23
y31 y32 y33
With the training data:
X:
7 0 2 0
0 0 5 2
1 2 0 5
Θ:
θ11 θ12 θ13
θ21 θ22 θ23
θ31 θ32 θ33
θ41 θ42 θ43
Y:
1 0 0
0 1 0
0 0 1
Calculate an example of Y
X:
7 0 2 0
0 0 5 2
1 2 0 5
Example of Theta: Θex
1 4 2
3 2 4
2 3 1
3 1 2
XΘex:
11 34 16
16 17 9
22 13 20
Output: Yex (each row of XΘex divided by its row sum, truncated to two decimals)
0.18 0.55 0.26
0.38 0.40 0.21
0.40 0.23 0.36
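The numbers on this slide can be reproduced with a plain matrix product followed by a per-row normalization. The normalization step is inferred from the values (each row of Yex matches the corresponding product row divided by its sum); the helper names are illustrative.

```python
X = [[7, 0, 2, 0], [0, 0, 5, 2], [1, 2, 0, 5]]
THETA_EX = [[1, 4, 2], [3, 2, 4], [2, 3, 1], [3, 1, 2]]

def matmul(a, b):
    """Multiply an m x n matrix by an n x k matrix (both as lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def normalize_rows(m):
    """Divide every entry by the sum of its row so that each row sums to 1."""
    return [[v / sum(row) for v in row] for row in m]

raw = matmul(X, THETA_EX)    # [[11, 34, 16], [16, 17, 9], [22, 13, 20]]
y_ex = normalize_rows(raw)   # rows are close to the Yex values on the slide
```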
Cost
Yex:
0.18 0.55 0.26
0.38 0.40 0.21
0.40 0.23 0.36
Y:
1 0 0
0 1 0
0 0 1
Cost: J(Θex) = (0.18-1)² + (0.55-0)² + (0.26-0)²
             + (0.38-0)² + (0.40-1)² + (0.21-0)²
             + (0.40-0)² + (0.23-0)² + (0.36-1)²
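The cost is a sum of squared differences between each entry of Yex and the entry of Y in the same position. Since Y is the identity matrix, the third row pairs 0.23 with 0 and 0.36 with 1. A minimal sketch, with an illustrative function name:

```python
Y_EX = [[0.18, 0.55, 0.26], [0.38, 0.40, 0.21], [0.40, 0.23, 0.36]]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def squared_error_cost(predicted, target):
    """Sum of squared element-wise differences between two matrices."""
    return sum((p - t) ** 2
               for pred_row, target_row in zip(predicted, target)
               for p, t in zip(pred_row, target_row))

j = squared_error_cost(Y_EX, Y)  # about 2.21
```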
How to get a model
Cost Function: J(Θ)
Find Θ which minimizes J(Θ).
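One common way to search for such a Θ is gradient descent. The slides do not say which method is used, so the sketch below is an assumption: it takes J(Θ) to be the sum of squared differences between XΘ and Y (without any row normalization), starts from all zeros, and uses a hypothetical learning rate and step count.

```python
X = [[7, 0, 2, 0], [0, 0, 5, 2], [1, 2, 0, 5]]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(a, b):
    """Multiply an m x n matrix by an n x k matrix (both as lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def cost(theta):
    """Assumed J(Θ): sum of squared entries of XΘ - Y."""
    pred = matmul(X, theta)
    return sum((p - t) ** 2
               for pr, tr in zip(pred, Y) for p, t in zip(pr, tr))

def gradient(theta):
    """Gradient of the assumed cost: 2 * X^T (XΘ - Y)."""
    pred = matmul(X, theta)
    residual = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(pred, Y)]
    return [[2 * v for v in row] for row in matmul(transpose(X), residual)]

def gradient_descent(theta, learning_rate=0.005, steps=500):
    """Repeatedly step against the gradient and record the cost each time."""
    history = [cost(theta)]
    for _ in range(steps):
        grad = gradient(theta)
        theta = [[t - learning_rate * g for t, g in zip(t_row, g_row)]
                 for t_row, g_row in zip(theta, grad)]
        history.append(cost(theta))
    return theta, history

theta0 = [[0.0] * 3 for _ in range(4)]   # 4 x 3 matrix of zeros
theta, history = gradient_descent(theta0)
```

The learning rate is kept small because the inputs are not scaled; a larger value can make the cost grow instead of shrink.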
More complicated model
Inputs: x1 (Java), x2 (Ruby), x3 (ObjC), x4 (PHP)
Hidden Layer: a1, a2, a3, a4, a5
Outputs: y1 (Android), y2 (iOS), y3 (Web)
The bias term is omitted.
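A forward pass through a network with one hidden layer can be sketched as two weighted sums in a row. The slides do not name an activation function or give weights, so the sigmoid on the hidden layer and the weight values below are assumptions made only for illustration; the bias term is omitted as in the slides.

```python
import math

def sigmoid(z):
    """Assumed hidden-layer activation (not specified in the slides)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, activation=None):
    """weights[i][j] connects input i to unit j; no bias term."""
    sums = [sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
            for j in range(len(weights[0]))]
    return [activation(s) for s in sums] if activation else sums

# Hypothetical weights: 4 inputs -> 5 hidden units -> 3 outputs.
W1 = [[0.1 * (i - j) for j in range(5)] for i in range(4)]   # 4 x 5
W2 = [[0.1 * (i + j) for j in range(3)] for i in range(5)]   # 5 x 3

x = [7, 0, 2, 0]            # Java, Ruby, ObjC, PHP
a = layer(x, W1, sigmoid)   # hidden activations a1..a5
y = layer(a, W2)            # outputs y1..y3 (Android, iOS, Web)
```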
Model Evaluation
A low cost does not always mean a good model.
Overfitting
[Figure: plot of price (USD) against area (m²)]
Though the cost is very low, is this a good model?
Test Data
[Figure: plot of price (USD) against area (m²)]
Give your model Test Data that was not used for training.
Evaluate the cost against the Test Data.
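Holding out test data can be sketched as follows. The data, the 25% split ratio, the fixed seed, and the halved mean squared error cost are all assumptions for illustration; the model is the one-feature linear regression from earlier in the deck, fitted in closed form.

```python
import random

# Hypothetical (area in m², price in USD) pairs, not from the slides.
data = [(30.0, 62000.0), (35.0, 70000.0), (40.0, 79000.0), (50.0, 101000.0),
        (60.0, 121000.0), (70.0, 138000.0), (80.0, 158000.0), (90.0, 181000.0)]

def split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as test data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

def fit(rows):
    """Closed-form least squares for y = θ0 + θ1 * x."""
    m = len(rows)
    mean_x = sum(x for x, _ in rows) / m
    mean_y = sum(y for _, y in rows) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1

def cost(theta, rows):
    """Assumed cost: mean squared error, halved."""
    theta0, theta1 = theta
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in rows) / (2 * len(rows))

train, test = split(data)
theta = fit(train)                  # the model only ever sees the training rows
train_cost = cost(theta, train)
test_cost = cost(theta, test)       # cost on rows the model has not seen
```

A test cost far above the training cost is a sign of overfitting.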
Hyper Parameter Tuning
• Should I add more units?
• Should I add more hidden layers?
• etc.
[Figure: two alternative network architectures, each mapping X to Y]
Cross Validation Data
When tuning hyper parameters, we should not evaluate them with the Test Data.
Otherwise, the hyper parameters will be optimized for the Test Data, and your model will lose generalization.
Split off part of the Training Data and use it as Cross Validation Data to evaluate the hyper parameters.
The Test Data must be used only for the final evaluation.
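The sketch below shows one way to keep the three data sets apart. The slides discuss the number of units and hidden layers as hyper parameters; to keep the example short, a regularization strength `lam` on a one-feature linear model stands in for them, which is a swapped-in technique and not something the slides cover. The data, split ratios, seed, and candidate values are hypothetical.

```python
import random

# Hypothetical (area in m², price in USD) pairs, not from the slides.
data = [(30.0, 62000.0), (35.0, 70000.0), (40.0, 79000.0), (45.0, 90000.0),
        (50.0, 101000.0), (60.0, 121000.0), (70.0, 138000.0), (75.0, 149000.0),
        (80.0, 158000.0), (90.0, 181000.0)]

def three_way_split(rows, cv_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle a copy, then cut it into training, cross validation, and test rows."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    n_cv = max(1, int(len(rows) * cv_ratio))
    return rows[n_test + n_cv:], rows[n_test:n_test + n_cv], rows[:n_test]

def fit(rows, lam):
    """Least squares for y = θ0 + θ1 * x with an L2 penalty lam on θ1."""
    m = len(rows)
    mean_x = sum(x for x, _ in rows) / m
    mean_y = sum(y for _, y in rows) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    theta1 = sxy / (sxx + lam)
    return mean_y - theta1 * mean_x, theta1

def cost(theta, rows):
    """Assumed cost: mean squared error, halved."""
    theta0, theta1 = theta
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in rows) / (2 * len(rows))

train, cv, test = three_way_split(data)

# Choose the hyper parameter using only the cross validation rows.
candidates = [0.0, 10.0, 100.0, 1000.0]
best_lam = min(candidates, key=lambda lam: cost(fit(train, lam), cv))

# Touch the test rows once, for the final evaluation.
final_theta = fit(train, best_lam)
final_test_cost = cost(final_theta, test)
```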
Entire Flow
Collect Data
Split Data
Train and Cross Validate Model
Evaluate Model
Caution
This slide deck omits a lot of details, especially the math, and is not precise.
Some parts may even be incorrect.
Demonstration
