Deep Learning
Rouyun Pan
Outline
• Neural Networks

• Regression and Classification

• Deep Learning 

• Convolutional neural network
The concept of learning in an ML system
• Learning = Improving with experience at some task

• Improve over task T,

• With respect to performance measure, P

• Based on experience, E.
A.I. ⊃ Machine learning (NN, SVM, DT, ...) ⊃ Deep learning (CNN, RNN, LSTM, ...)
Case: Housing Price Prediction
Housing Price Prediction
[Diagram: inputs such as size, number of rooms, zip code, and view feed hidden features (family size, traffic, life quality), which feed the predicted price.]
Basic neural network
[Diagram: inputs x1-x4 (input layer) → hidden layer → output Y' (output layer).]
Basic neural network
Each hidden and output unit computes a weighted sum of its inputs (a minimal sketch follows).
[Same diagram: inputs x1-x4 → hidden layer → output Y'.]
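A minimal sketch of those weighted sums, assuming NumPy and made-up sizes and weights (x1-x4 as a 4-element input, 3 hidden neurons, a sigmoid activation):

import numpy as np

def sigmoid(z):
    # squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical example: 4 inputs (x1..x4), 3 hidden neurons, 1 output
x = np.array([0.5, 1.2, -0.3, 0.8])    # input layer
W1 = np.random.randn(3, 4) * 0.1       # hidden-layer weights
b1 = np.zeros(3)                        # hidden-layer biases
W2 = np.random.randn(1, 3) * 0.1       # output-layer weights
b2 = np.zeros(1)

h = sigmoid(W1 @ x + b1)    # many weighted sums, one per hidden neuron
y_hat = W2 @ h + b2         # output Y'
print(y_hat)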
http://www.asimovinstitute.org/neural-network-zoo/
Learning Strategy
• Supervised learning

• Unsupervised learning

• Reinforcement learning
Supervised learning
• There is a training data set for which we already know the correct output.

• The regression problem: 

Predicting results within a continuous output

• The classification problem: 

Predicting results in a discrete output
Application
Input (X) | Output (Y) | Application | Model
House size | Price | Real estate | Standard NN
Ad types, user info | Click on ad | Online advertising | Standard NN
Image | Object (1, ..., 1000) | Photo tagging | CNN
Audio | Text transcript | Speech recognition | RNN
English | Chinese | Machine translation | RNN
Image, radar info | Position of the cars | Autonomous driving | Customized hybrid
Unsupervised learning
• The data have no target attribute.

• Analyze the data to look for patterns and clusters.
Reinforcement learning
• The agent takes actions in an environment so as to maximize some notion of cumulative reward.
The workflow for supervised learning
• Training phase: Data → Feature Extraction → Train the model (with Labels) → Evaluate the model
• Predicting phase: New data → Feature Extraction → Model → Predict
How to train a model
• Training data set.

• The layers and neurons

• Hypothesis / Activation function

• Cost / Loss Function 

• Optimization algorithm
Linear regression
• Hypothesis: hθ(x) = θ0 + θ1·x
Training dataset
How to choose parameters
*Choose so that is close to y for our training example (x, y)
20
Cost function
The cost function quantifies the gap between the network's outputs and the actual values. A common choice is the mean squared error:
• J(θ0, θ1) = (1 / (2m)) · Σ_i (hθ(x^(i)) - y^(i))^2
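A small sketch of the mean-squared-error cost, using the 1/(2m) convention and made-up numbers:

import numpy as np

def mse_cost(predictions, targets):
    # J = 1/(2m) * sum((h(x) - y)^2)
    m = len(targets)
    return np.sum((predictions - targets) ** 2) / (2 * m)

y_true = np.array([200.0, 330.0, 450.0])   # e.g. house prices
y_pred = np.array([210.0, 300.0, 470.0])   # model outputs
print(mse_cost(y_pred, y_true))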
Cost function (cont.)

Calculate the cost
[Worked examples: the cost J(θ0, θ1) evaluated for several candidate parameter values.]
The plot for cost function
Find the best weights to minimize the loss
[Plots: as the weights move toward the optimum, the loss decreases (e.g., 800 → 360 → 100).]
Optimization algorithm
Gradient Descent:

An iterative optimization algorithm for finding the minimum of a function

• Repeat until convergence: θ_j := θ_j - α · ∂J(θ)/∂θ_j
* one epoch = one pass of all the training examples
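A minimal gradient-descent sketch, assuming a single-feature linear model hθ(x) = θ0 + θ1·x, a made-up toy data set, and an assumed learning rate:

import numpy as np

# toy data: house size (x) vs. price (y); numbers are made up
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.5, 4.1, 6.2, 8.1])

theta0, theta1 = 0.0, 0.0    # parameters of h(x) = theta0 + theta1 * x
alpha = 0.05                  # learning rate
m = len(y)

for epoch in range(1000):    # one epoch = one pass of all training examples
    h = theta0 + theta1 * x                 # predictions
    grad0 = np.sum(h - y) / m               # dJ/dtheta0
    grad1 = np.sum((h - y) * x) / m         # dJ/dtheta1
    theta0 -= alpha * grad0                 # step opposite the gradient
    theta1 -= alpha * grad1

print(theta0, theta1)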
Gradient Descent
[Illustration: where the slope ∂J/∂θ ≥ 0 the update decreases θ; where the slope < 0 it increases θ.]
Learning rate
[Plots: too small a learning rate converges slowly; too large a learning rate overshoots or diverges.]
• Typical values to try: ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
Local minimum
[Plot: a loss surface with a local minimum and the global minimum.]
At a local minimum the gradient ∂J/∂θ = 0, so gradient descent can stop there rather than at the global minimum.
Momentum
Movement = negative of gradient + momentum (a fraction of the previous movement), so the optimizer can keep moving through plateaus and shallow local minima.
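A sketch of the momentum update on a toy 1-D loss J(θ) = (θ - 3)^2; the learning rate and momentum factor here are assumed values:

# momentum update sketch on a 1-D quadratic J(theta) = (theta - 3)^2
alpha, beta = 0.05, 0.9       # learning rate and momentum factor (assumed values)
theta, velocity = 0.0, 0.0

for _ in range(100):
    grad = 2 * (theta - 3)                        # dJ/dtheta
    velocity = beta * velocity - alpha * grad     # movement = momentum + negative of gradient
    theta += velocity                             # keeps moving even where the gradient is small

print(theta)   # approaches the minimum at theta = 3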
Mini-Batch optimization
• Mini-batch optimization has the following advantages:

• Reduces memory usage.

• Helps avoid being trapped in local minima, thanks to the randomness of the mini-batches (see the sketch below).
* Batch size = the number of training examples in one pass
* Iterations = number of passes, each pass using [batch size] examples
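A mini-batch sketch for the same toy linear model; the batch size and learning rate are assumed values:

import numpy as np

# mini-batch gradient descent sketch (same toy linear model as above)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = 2.0 * x + np.random.randn(8) * 0.1

theta0, theta1 = 0.0, 0.0
alpha, batch_size = 0.01, 2       # batch size = examples used in one update

for epoch in range(200):
    order = np.random.permutation(len(x))         # shuffle each epoch
    for start in range(0, len(x), batch_size):    # iterations per epoch = m / batch_size
        idx = order[start:start + batch_size]
        h = theta0 + theta1 * x[idx]
        theta0 -= alpha * np.mean(h - y[idx])
        theta1 -= alpha * np.mean((h - y[idx]) * x[idx])

print(theta0, theta1)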
Back propagation (BP)
[Diagram: inputs x1-x4 → hidden layer → output layer produces predicted Y; the gap between predicted Y and the label Y is propagated backwards to update the weights.]
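A minimal backpropagation sketch for a tiny network (4 inputs, 3 sigmoid hidden units, 1 linear output, squared error); all sizes, data, and hyperparameters are made up:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# tiny network: 4 inputs -> 3 hidden (sigmoid) -> 1 output (linear)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                 # 20 made-up training examples
y = X @ np.array([1.0, -2.0, 0.5, 0.3])      # made-up targets (labels)

W1, b1 = rng.normal(size=(3, 4)) * 0.5, np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)) * 0.5, np.zeros(1)
alpha = 0.05

for epoch in range(500):
    # forward pass
    h = sigmoid(X @ W1.T + b1)               # hidden activations
    y_hat = (h @ W2.T + b2).ravel()          # predicted Y
    err = y_hat - y                          # gap to the label Y

    # backward pass: propagate the error and update the weights
    dW2 = (err[:, None] * h).mean(axis=0, keepdims=True)
    db2 = err.mean(keepdims=True)
    dh = err[:, None] * W2                   # error pushed back through W2
    dz1 = dh * h * (1 - h)                   # through the sigmoid
    dW1 = dz1.T @ X / len(X)
    db1 = dz1.mean(axis=0)

    W2 -= alpha * dW2; b2 -= alpha * db2
    W1 -= alpha * dW1; b1 -= alpha * db1

print(np.mean(err ** 2))    # training error after the updates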
Feature scaling

Mean Normalization
• Scale each feature, e.g. x := (x - μ) / (max - min)

• Helps make sure gradient descent is working properly (a minimal sketch follows)
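A small sketch of per-feature mean normalization, assuming the (x - mean) / (max - min) variant and made-up house data:

import numpy as np

def mean_normalize(X):
    # x := (x - mean) / (max - min), applied per feature (column)
    mu = X.mean(axis=0)
    spread = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / spread

X = np.array([[2104.0, 3.0],     # house size, number of rooms (made-up values)
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
print(mean_normalize(X))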
Make sure gradient descent is working properly
[Plot: the cost J(θ) versus the number of iterations.]
Under/Overfitting
Underfitting - high bias | Sweet spot | Overfitting - high variance
[Plots: train error and test error; the test error rises again once the model overfits.]
Avoid Overfitting
• Reduce the number of features

• Add more training data.

• Regularization

• Dropout
Regularization
• Keep all the features, but reduce the magnitude of the parameters (a minimal sketch follows).
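A sketch of L2 regularization added to the MSE cost; lam is an assumed regularization strength, and the bias term is left unpenalized by convention:

import numpy as np

def regularized_cost(theta, predictions, targets, lam):
    # MSE plus an L2 penalty that keeps all features but shrinks the weights
    m = len(targets)
    mse = np.sum((predictions - targets) ** 2) / (2 * m)
    penalty = lam * np.sum(theta[1:] ** 2) / (2 * m)   # bias term usually not penalized
    return mse + penalty

theta = np.array([0.5, 2.0, -1.0])
print(regularized_cost(theta, np.array([1.0, 2.0]), np.array([1.2, 1.8]), lam=0.1))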
Dropout
• Instead of using all neurons, "drop out" some at random

(usually with 0.5 probability; see the sketch below)
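A sketch of (inverted) dropout with an assumed keep probability of 0.5:

import numpy as np

def dropout(activations, keep_prob=0.5, training=True):
    # randomly zero out neurons during training; scale so the expected value is unchanged
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(h))   # roughly half of the activations are dropped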
Classification
[Examples of classification problems and their decision boundaries (figures).]
Logistic Regression
• Want 0 ≤ hθ(x) ≤ 1

• Sigmoid Function (Logistic Function):

hθ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(-z))
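A small sketch of the sigmoid function and the logistic-regression hypothesis it defines:

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); output always lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # logistic-regression hypothesis: an estimated probability that y = 1
    return sigmoid(np.dot(theta, x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~0.007, 0.5, ~0.993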
Logistic Regression Cost function
• With the sigmoid hypothesis, the squared-error cost becomes non-convex, so gradient descent can get stuck in local minima; a different cost function is needed.
Cost Function & Gradient Descent
• Cost function - log loss (cross-entropy) for the sigmoid function:

J(θ) = -(1/m) Σ_i [ y^(i) log hθ(x^(i)) + (1 - y^(i)) log(1 - hθ(x^(i))) ]

• Gradient Descent on this cost, as before (a minimal sketch follows)
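A sketch of the log loss (cross-entropy) on made-up predictions; the small epsilon is only there to avoid log(0):

import numpy as np

def cross_entropy(y_hat, y):
    # log loss: -1/m * sum(y*log(h) + (1-y)*log(1-h))
    eps = 1e-12                              # avoid log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1, 0, 1, 1])
y_hat = np.array([0.9, 0.2, 0.7, 0.6])
print(cross_entropy(y_hat, y))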
DL Frameworks
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
Deep learning (DL)
Why is DL Hot Now?

ImageNet Challenge

GPU Usage for ImageNet

Image Classification Task
Convolutional Neural Network (CNN)
CNN
• Fully connected vs. locally connected neural network

• Share the weights across hidden units
Convolution
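A minimal sketch of a 2-D "valid" convolution (implemented as cross-correlation, as most CNN libraries do); the image and kernel values are made up:

import numpy as np

def conv2d(image, kernel):
    # valid 2-D convolution of a single channel with a single filter
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge detector
print(conv2d(image, edge_kernel))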
Visualization of Modulation
Ref: Visualizing Higher-Layer Features of a Deep Network
AlexNet
• A large, deep convolutional neural network (8 layers) that classifies the images in the training set into 1000 different classes.

• On the test data, it achieved top-1 and top-5 error rates of 39.7% and 18.9%.

Convolutional layers + fully-connected layers:
CONV layers: 5

Fully connected layers: 3

Weights: 61M

MACs: 724M
AlexNet
• Trained the network with 2 GPUs on ImageNet data, which contained
over 1.2 million annotated images from 1000 categories.

• Used ReLU for the nonlinearity functions (Found to decrease training
time as ReLUs are several times faster than the conventional tanh
function).

• Used data augmentation techniques that consisted of image
translations, horizontal reflections, and patch extractions.

• Implemented dropout layers in order to combat the problem of
overfitting to the training data.

• Trained the model using batch stochastic gradient descent, with specific
values for momentum and weight decay.
GPU & Big data
• Trained on two GTX 580 GPUs for five to six days.
Data augmentation
• It consisted of image translations, horizontal reflections,
and patch extractions.
Rectified Linear Unit (ReLU)
ReLU function
• The nonlinearity function found to decrease training time, as ReLUs are several times faster than the conventional tanh function (see the sketch below)
[Plot: ReLU vs. tanh activation curves.]
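A small sketch comparing ReLU with tanh on a few sample inputs:

import numpy as np

def relu(z):
    # max(0, z): cheap to compute and does not saturate for positive inputs
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))        # [0. 0. 0. 1.5 3. ]
print(np.tanh(z))     # for comparison: saturates near -1 and 1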
Pooling
• Reduce the resolution of each channel independently

• Increase translation invariance and noise resilience (a minimal sketch follows)
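A sketch of 2x2 max pooling with stride 2 on one channel; the feature-map values are made up and even height/width is assumed:

import numpy as np

def max_pool_2x2(channel):
    # 2x2 max pooling with stride 2 on a single channel (height and width assumed even)
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 3, 2]], dtype=float)
print(max_pool_2x2(fmap))   # [[6. 2.] [2. 7.]]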
Local response
normalization (LRN)
• Tries to mimic the inhibition scheme in the brain
Dropout
• Avoids overfitting in the fully connected (FC) layers.
Revolution of Depth
http://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf
CNN comparison
Demo
• Tensorflow playground

http://playground.tensorflow.org/

• ConvNetJS CIFAR-10 demo

http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
Resource
• Deep Learning on Coursera, Andrew Ng, Stanford University

https://www.coursera.org/specializations/deep-learning

• Deep Learning on Udacity

https://www.udacity.com/course/deep-learning--ud730

• Machine Learning Foundations, HT Lin, National Taiwan University

https://www.coursera.org/learn/ntumlone-mathematicalfoundations/

• TensorFlow

https://www.tensorflow.org/

• cnn-benchmarks

https://github.com/jcjohnson/cnn-benchmarks

Deep learning