Seoul National University System Health & Risk Management
Deep Learning and Tensorflow Implementation
Myungyon Kim
2016. 11. 16.
Contents
• Feature Engineering
• Deep Neural Network
• Tensorflow
• Tensorflow Implementation
• Future Work
• References
Feature Engineering
• Data Representation
– Raw sensor data (e.g., vibration, acceleration, temperature)
: complex, high-dimensional, redundant, and noisy
→ hard to discover useful information and insights
– Necessary to find a good, suitable way to represent the data
• Feature engineering
– Process of creating and extracting features that represent the system well
– Fundamental to the application of machine learning algorithms
– The quality and quantity of the features strongly influence the results
– Based on physical and domain knowledge and the engineer's intuition
• Rotor team case
• Image processing
Edge detection, corner detection, HoG (Histogram of Oriented Gradients)
Time-domain features
– Kinetic energy related: RMS, Max, Mean
– Data statistics related: Kurtosis, Skewness
– Waveform related: Crest Factor, Impulse Factor
Frequency-domain features
– System characteristic frequency features: Gear Mesh Freq., Sideband Freq., Harmonic Freq.
– Statistical frequency features: Freq. Center, RMS Freq., Component Ratio of 1x
(a short NumPy sketch of a few of these features follows below)
https://en.wikipedia.org/wiki/Edge_detection https://en.wikipedia.org/wiki/Corner_detection http://www.mdpi.com/1424-8220/16/7/1134/htm
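A minimal NumPy/SciPy sketch (illustrative only; the signal and variable names are my own, not from the slides) of a few of the time-domain features listed above, computed from a 1-D vibration signal:

import numpy as np
from scipy.stats import kurtosis, skew  # higher-order statistics

def time_domain_features(x):
    """Compute a few common time-domain features of a 1-D vibration signal x."""
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    return {
        "rms": rms,
        "max": peak,
        "mean": np.mean(x),
        "kurtosis": kurtosis(x),                      # peakedness of the amplitude distribution
        "skewness": skew(x),                          # asymmetry of the amplitude distribution
        "crest_factor": peak / rms,                   # peak-to-RMS ratio
        "impulse_factor": peak / np.mean(np.abs(x)),  # peak-to-mean-absolute ratio
    }

# Example: features of a noisy 60 Hz vibration signal sampled at 10 kHz
t = np.arange(0.0, 1.0, 1e-4)
signal = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(t.size)
print(time_domain_features(signal))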
• Manual Feature Engineering
– "Coming up with features is difficult, time-consuming, requires expert
knowledge. 'Applied machine learning' is basically feature engineering."
- Andrew Ng (Stanford University, Chief Scientist at Baidu Research)
– Feature engineering (selecting and extracting features) is fundamental for machine
learning, but it is difficult, tedious, and expensive
– For some applications, we may have no idea which features we should use
• Problems of Current PHM Practices
– A considerable amount of human expertise and knowledge is required.
– Different systems and data require different feature engineering approaches
→ Features relevant to the diagnosis of one system may NOT be suitable for the
diagnosis of another system.
• Automated Feature Learning
– Manual feature engineering needs to be replaced with automated feature
learning using deep learning
• Deep Learning (Deep Neural Network)
– Multiple processing layers, with several linear and nonlinear transformations
– Learns and extracts hierarchical features automatically
→ replaces handcrafted, manually extracted features
– Mimics the information processing and communication patterns of the nervous
system in the human brain (inspired by advances in neuroscience)
– Various algorithms and architectures (DNN, CNN, DBN, RNN) *
– Applied to various fields (computer vision, speech recognition, natural
language processing, bioinformatics)
* DNN: Deep Neural Network
CNN: Convolutional Neural Network
DBN: Deep Belief Network
RNN: Recurrent Neural Network
Deep Neural Network
• History of Neural Networks
– Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
– "Perceptrons" – Marvin Minsky, 1969 (→ 1st AI winter)
– Multilayer neural network (composition of perceptrons)
– Backpropagation – "Learning representations by back-propagating errors",
Nature, Rumelhart, Hinton & Williams, 1986 / Paul Werbos, 1974
– Several problems in multilayer, deep neural networks
: hard to train, vanishing gradient, local minima, overfitting (→ 2nd AI winter)
– Pre-training (greedy pre-training using RBMs) *
→ clever way to initialize the weight values
– ReLU (rectifier)
– Dropout
– CNN (Convolutional Neural Networks)
– Computing power / GPUs (Graphics Processing Units)
– Large amounts of digital data
* RBM: Restricted Boltzmann Machine
• Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
– Simplest form of a feedforward neural network
– Linear, binary classifier
– Training a neural network = obtaining suitable, correct weight and bias values
– Rosenblatt expected that, with sufficiently advanced hardware, his perceptron
would eventually be able to classify/recognize almost anything
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
(Figures: decision boundaries of the OR and AND perceptrons on the unit square)
AND perceptron: w1 = 1, w2 = 1, θ = 1.5
OR perceptron: w1 = 1, w2 = 1, θ = 0.5
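A small NumPy sketch of the AND/OR perceptrons above, using the weights and thresholds shown on the slide (w1 = w2 = 1, θ = 1.5 for AND, θ = 0.5 for OR):

import numpy as np

def perceptron(x, w, theta):
    """Fire (output 1) if the weighted sum of the inputs exceeds the threshold, else 0."""
    return int(np.dot(w, x) > theta)

w = np.array([1.0, 1.0])
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([a, b])
    print(x, "AND:", perceptron(x, w, 1.5), "OR:", perceptron(x, w, 0.5))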
• Perceptron (Artificial Neural Network) – Frank Rosenblatt, 1957
Activation function
– Activated (input > threshold θ) or not (input < θ)
– Sigmoid function (activation function)
→ to train a neural network, the activation function should be differentiable
– $\mathrm{sig}(X) = \dfrac{1}{1 + e^{-X}}$
– Nonlinear activation function
→ "squashes" the linear net input into a specific range
& adds nonlinear properties to the NN
– allows networks to compute nontrivial problems using only a small number of
nodes
https://commons.wikimedia.org/wiki/File:Sigmoid-function-2.svg
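A short NumPy sketch of the sigmoid and its derivative (the derivative sig(X)(1 − sig(X)) is what backpropagation uses later):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)     # maximum value 0.25, reached at x = 0

x = np.linspace(-6, 6, 5)
print(sigmoid(x))            # values squashed into (0, 1)
print(sigmoid_deriv(0.0))    # 0.25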
• “Perceptrons” – Marvin Minsky, 1969
– A single-layer perceptron cannot solve nonlinear classification problems
– XOR (exclusive or): logical operation that outputs true only when the inputs differ
– To solve nonlinear problems, an MLP (multilayer perceptron, multilayer neural
network) is needed
– However, it is not easy (very hard) to train an MLP properly
http://www.aistudy.com/neural/multilayer_perceptron.htm
(Figure: XOR on the unit square — the positive and negative points are not linearly separable)
→ cannot be solved by a single perceptron (linear classifier)
https://www.amazon.com/Perceptrons-Introduction-Computational-Geometry-Expanded/dp/0262631113
→ 1st AI winter
• “Perceptrons” – Marvin Minsky, 1969
Examples: neural networks for logical operations
http://toritris.weebly.com/perceptron-2-logical-operations.html
AND, OR → a single-layer perceptron is enough
XOR → a multi-layer perceptron is needed (see the sketch below)
(*Different weights and thresholds can be used, e.g., the AND/OR values shown earlier.)
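A minimal sketch (my own choice of weights, one of many valid settings) showing XOR built from the perceptrons above: XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).

import numpy as np

def perceptron(x, w, theta):
    return int(np.dot(w, x) > theta)

def xor(x1, x2):
    x = np.array([x1, x2])
    h1 = perceptron(x, np.array([1.0, 1.0]), 0.5)      # hidden unit 1: OR
    h2 = perceptron(x, np.array([-1.0, -1.0]), -1.5)   # hidden unit 2: NAND
    return perceptron(np.array([h1, h2]), np.array([1.0, 1.0]), 1.5)  # output: AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))   # 0, 1, 1, 0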
• Backpropagation
("Learning representations by back-propagating errors", Nature, Rumelhart, Hinton & Williams, 1986 /
Paul Werbos, 1974)
– "Backward propagation of errors"
– Common method to train NNs with multiple layers
– Error gradient with respect to each weight
→ using the chain rule, quantify the influence of each weight on the final error
– Used with an optimization method such as the gradient descent algorithm
Error gradient:
the partial derivative of the cost with respect to a specific weight, calculated using the chain rule
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
• Backpropagation
– Procedure
1. Initialize the weights randomly
2. Forward propagation (through the neural network, to obtain the output & cost)
3. Backward propagation (influence of each weight on the error)
4. Weight update: $W := W - \alpha \frac{\partial}{\partial W} \mathrm{cost}(W)$
Repeat these steps until the performance of the network is satisfactory
• Cost function (Error)
$\mathrm{Cost} = \frac{1}{2m} \sum_{i=1}^{m} \left( \mathrm{target}^{(i)} - \mathrm{output}^{(i)} \right)^{2}$
• Gradient descent algorithm
- Find W, b that minimize the cost using the delta rule
- Used in many minimization problems
Convex function → global minimum guaranteed
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
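A tiny NumPy sketch of the procedure above for a single linear neuron (the data and learning rate are my own illustrative choices): forward pass, cost, gradient from the chain rule, and the weight update W := W − α ∂cost/∂W.

import numpy as np

np.random.seed(0)
X = np.random.randn(100)             # inputs
target = 3.0 * X + 1.0               # true relation: W = 3, b = 1

W, b, alpha = 0.0, 0.0, 0.1
for step in range(200):
    output = W * X + b                               # 2. forward propagation
    cost = np.mean((target - output) ** 2) / 2.0     # cost = (1/2m) * sum (target - output)^2
    dW = np.mean((output - target) * X)              # 3. backward propagation (chain rule)
    db = np.mean(output - target)
    W -= alpha * dW                                  # 4. weight update
    b -= alpha * db

print(W, b, cost)   # W -> ~3.0, b -> ~1.0, cost -> ~0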
• Several problems in multilayer, deep Neural Network
1. Vanishing gradient
– Error gradient: multiplication of gradients in backward direction
 for early layers, error gradients vanish
– Back-propagation fail to train earlier-layer parameters properly
– Early layers are responsible for detecting the simple patterns and the
building blocks (ex. Edge for facial recognition)
 when early layers are not trained properly, the result will be inaccurate
https://www.youtube.com/watch?v=E5a3nDpaXjw
http://kawahara.ca/how-to-compute-the-derivative-of-a-sigmoid-function-fully-worked-example/
(Figure: derivative of the sigmoid function, maximum value 0.25)
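A quick numeric illustration (an assumed 10-layer chain with sigmoid activations, ignoring the weight factors) of why the gradient vanishes: the backpropagated gradient contains a product of per-layer sigmoid derivatives, each at most 0.25.

import numpy as np

def sigmoid_deriv(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

np.random.seed(0)
pre_activations = np.random.randn(10)    # one pre-activation per layer (assumed values)
grad = 1.0
for z in pre_activations:                # multiply the derivatives going backwards
    grad *= sigmoid_deriv(z)

print(grad)         # typically ~1e-7 or smaller: the early layers barely learn
print(0.25 ** 10)   # even the best case decays geometrically (~9.5e-7)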
• Several problems in multilayer, deep Neural Network
2. Local minima
– When the gradient descent algorithm is used to train a deep neural network, it
can get stuck in local minima
– The optimal solution cannot be obtained
https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
• Several problems in multilayer, deep Neural Network
3. Overfitting
– Complex model: too many parameters relative to the number of observations
– Learns not only the true relationship, but also noise and random errors
– Overreacts to minor fluctuations in the given training data
→ poor predictive performance
4. Hard to train correctly, and training takes a long time
→ other machine learning algorithms, such as the support vector machine, were used instead
http://www.slideshare.net/fcollova/introduction-to-neural-network
(Figure: under-fitting / just right / over-fitting)
→ 2nd AI winter
• Pre-training using RBM – Geoffrey Hinton
*clever way to initialize weight values
– RBM: restricted, special case of the Boltzmann machine / undirected, generative,
energy-based model with a visible input layer and a hidden layer /
connections between the layers but not within a layer
– By greedy, layer-wise training* of RBMs and stacking them (*Bengio, 2007)
→ the weights of a DNN are initialized well (pre-training)
→ faster convergence of the fine-tuning and improved performance
– 1. Pre-training: learn generally useful feature detectors (RBM, AE)
2. Fine-tuning: the whole network is trained further by supervised backpropagation
http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DeepBeliefNetworks
RBM
http://deeplearning.net/tutorial/rbm.html
Layer-wise training of a DBN * (DBN: Deep Belief Network)
• ReLU (Rectified Linear Unit, Rectifier)
– Activation function defined as
$f(x) = \max(0, x)$
– Powerful activation function that substitutes for the sigmoid function
– Sigmoid: derivative is at most 0.25 → vanishing gradient problem
ReLU: derivative is 0 or 1 → the error is transferred 100% through active units: no vanishing gradient!
– Sparse activation → fast and effective training of DNNs with large datasets
→ no need to use unsupervised pre-training (RBM, AE)
http://nn.readthedocs.io/en/latest/transfer/
ReLU and its derivative
• Dropout
– Regularization technique: prevents co-adaptation on the training data
– At each training step, individual nodes are either "dropped out" of the net
with probability 1 − p or kept with probability p → reduced network (fewer parameters)
– By avoiding training all nodes on all training data, dropout reduces
overfitting in NNs → can be thought of as an ensemble of smaller NNs
– Significantly improves the speed of training
– Reduces the tightly fitted interactions between nodes
→ learns more robust features that generalize better to new data
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (2014), Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov
(Figure: dropout applied to a standard network)
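A minimal NumPy sketch of the dropout idea (inverted-dropout style, my own implementation rather than the paper's code): each activation is kept with probability p and scaled by 1/p during training, so the expected activation is unchanged at test time.

import numpy as np

def dropout(activations, p=0.5, training=True):
    """Randomly zero each activation with probability 1 - p during training."""
    if not training:
        return activations                         # test time: use the full network
    mask = (np.random.rand(*activations.shape) < p).astype(activations.dtype)
    return activations * mask / p                  # scale so E[output] stays the same

h = np.ones((2, 8))
print(dropout(h, p=0.5))   # roughly half of the units are dropped at each training step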
• Convolutional Neural Network (CNN)
– A type of feed-forward artificial neural network
– Inspired by the connectivity pattern between neurons of the visual cortex*
– Individual cortical neurons respond to stimuli in a small region of space
(the receptive field)
– The receptive fields of different neurons partially overlap such that they tile
the visual field → mathematically, a convolution operation.
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
*visual cortex: the region of the cerebral cortex that processes visual information
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
• Convolutional Neural Network (CNN)
MLP
– A multilayer perceptron suffers from the curse of dimensionality due to full
connectivity between nodes (large number of weights)
– Does not take the spatial structure of the data into account
→ treats input pixels that are far apart and close together in the same way
– Full connectivity of neurons is wasteful for image recognition, and the
huge number of parameters quickly leads to overfitting
CNN
– Mitigates the challenges posed by the MLP architecture by exploiting the
spatially local correlation present in images (local connectivity)
– Each filter (set of weights) is shared across the entire visual field
→ weight sharing reduces the number of parameters dramatically, thus
lowering the memory requirements and training time.
• Convolutional Neural Network (CNN)
MLP (deep neural network): nodes are fully connected to the adjacent layers
CNN: locally connected (receptive field)
http://neuralnetworksanddeeplearning.com/chap6.html
• Structure of CNN
– Convolutional layer
1) depth: number of filters
2) filter size: spatial extent of each filter (number of weight parameters)
3) stride: step size of the filter movement, inversely proportional to the conv. layer's output dimension
4) zero-padding: pad the input with zeros on the border, controls the spatial size of the output volume
→ the convolutional layer's output size depends on these parameters
(for input width W, filter size F, padding P, and stride S: output width = (W − F + 2P)/S + 1;
see the TensorFlow sketch after this list)
– ReLU layer
Nonlinear activation function which increases the nonlinear properties
– Pooling layer
Nonlinear down-sampling (e.g., max pooling, average pooling)
– Fully connected layer
After several conv. and pooling layers → high-level reasoning via FC layers,
fully connected to all activations in the previous layer (same as a common DNN)
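A hedged TensorFlow 1.x-style sketch of one conv → ReLU → max-pooling stage (the filter size, depth, stride, and padding values are arbitrary example choices, not prescribed by the slides). With 'SAME' zero-padding the convolution preserves the spatial size and the 2x2 pooling halves it.

import tensorflow as tf   # assumes the TensorFlow 1.x API

x = tf.placeholder(tf.float32, [None, 28, 28, 1])                  # batch of 28x28 gray images
W = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))    # 5x5 filters, depth 32
b = tf.Variable(tf.zeros([32]))

conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')    # stride 1, zero-padded
relu = tf.nn.relu(conv + b)                                        # nonlinearity
pool = tf.nn.max_pool(relu, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')        # 28x28 -> 14x14

print(pool.get_shape())   # (?, 14, 14, 32)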
• Structure of CNN
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
• Structure of CNN
Convolutional layer
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
• Structure of CNN
Convolutional layer
Pooling layer
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
(Figure annotations: stride = 1; stride; zero-padding)
• Examples of CNN
http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
Tensorflow
• Tensorflow basics
– Open-source package for machine learning and deep learning developed by the
Google Brain team (within Google's Machine Intelligence research organization)
– Library for numerical computation using data flow graphs
– The graph structure contains all of the information, operations, and data
– Node: represents a mathematical operation, a point of data entry, an output
result, or a read/write variable
– Edge: describes the relationships between nodes through their inputs and
outputs, and carries tensors (the basic data structure of TensorFlow)
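A minimal TensorFlow 1.x-style sketch of a data flow graph: the nodes below are operations, the edges carry tensors, and nothing is computed until a session runs the graph.

import tensorflow as tf   # assumes the TensorFlow 1.x API

a = tf.constant(2.0)       # node: data entry
b = tf.constant(3.0)
c = tf.add(a, b)           # node: mathematical operation; the edges carry tensors

with tf.Session() as sess:  # the graph is executed only inside a session
    print(sess.run(c))      # 5.0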
• Placeholder
– "Symbolic" variables that we manipulate during program execution
– Data is provided through feed_dict when we run the graph
• Session
– Create a session to evaluate the specified symbolic expressions
– Indeed, before we run the session, nothing has been executed in the code
– TensorFlow is both an interface for expressing machine learning algorithms
and an implementation for running them
(Figure: example graph — nodes containing variables and operations → operation → result of the operation)
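A short TensorFlow 1.x-style sketch of the placeholder/session workflow described above (the values fed in are arbitrary examples):

import tensorflow as tf   # assumes the TensorFlow 1.x API

x = tf.placeholder(tf.float32)   # symbolic variable, no value yet
y = tf.placeholder(tf.float32)
op = x * y                       # only builds the graph; nothing runs here

with tf.Session() as sess:
    # data is supplied through feed_dict only when the graph is actually run
    print(sess.run(op, feed_dict={x: 3.0, y: 7.0}))   # 21.0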
• Tensorboard
– Visualization tools
– Make it easier to understand, debug, and optimize TensorFlow programs
– 1) visualize TensorFlow graphs, 2) plot quantitative metrics, 3) show
additional data
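A hedged sketch (TensorFlow 1.x-style summary API; the log directory and metric names are my own) of how a graph and a scalar metric are exported for TensorBoard:

import tensorflow as tf   # assumes the TensorFlow 1.x summary API

loss = tf.placeholder(tf.float32, name="loss")
tf.summary.scalar("loss", loss)              # quantitative metric to plot
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter("./logs", sess.graph)   # graph visualization
    for step in range(100):
        summary = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
        writer.add_summary(summary, step)
    writer.close()

# Then launch the dashboard with:  tensorboard --logdir=./logs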
Tensorflow Implementation
• Example 1. Single-layer NN
(Figure: annotated code screenshot — nodes containing variables and operations, result of the operation,
session / run / training; the network has 784 input nodes and 10 output nodes)
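A hedged TensorFlow 1.x-style sketch of the single-layer network in Example 1 (784 input nodes → 10 output nodes, softmax output, cross-entropy cost); the MNIST helper used here is the standard tutorial loader, assumed rather than taken from the slide.

import tensorflow as tf   # assumes the TensorFlow 1.x API
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)   # standard MNIST helper

x = tf.placeholder(tf.float32, [None, 784])   # 784 input nodes (28x28 pixels)
y_ = tf.placeholder(tf.float32, [None, 10])   # 10 output nodes (digit classes)

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)        # single layer: linear map -> softmax

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})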
• Example 2. Multi-layer NN: CNN
MNIST dataset
(handwritten digits)
Training set: 60,000 examples
Testing set: 10,000 examples
http://yann.lecun.com/exdb/mnist/
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
Package and MNIST load
• Example 2. Multi-layer NN: CNN
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
Define weights and parameters / Define the CNN structure
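A hedged TensorFlow 1.x-style sketch of the kind of CNN structure defined on this slide (filter sizes and layer widths are my own example values, not necessarily those used in the referenced notebook): two conv/pool stages followed by a fully connected layer and a softmax output.

import tensorflow as tf   # assumes the TensorFlow 1.x API

def conv_pool(inp, W, b):
    """Convolution with stride 1 and 'SAME' padding, ReLU, then 2x2 max pooling."""
    conv = tf.nn.relu(tf.nn.conv2d(inp, W, strides=[1, 1, 1, 1], padding='SAME') + b)
    return tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])
x_img = tf.reshape(x, [-1, 28, 28, 1])

W1 = tf.Variable(tf.truncated_normal([3, 3, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.zeros([32]))
W2 = tf.Variable(tf.truncated_normal([3, 3, 32, 64], stddev=0.1))
b2 = tf.Variable(tf.zeros([64]))
Wf = tf.Variable(tf.truncated_normal([7 * 7 * 64, 10], stddev=0.1))
bf = tf.Variable(tf.zeros([10]))

h1 = conv_pool(x_img, W1, b1)            # 28x28 -> 14x14, depth 32
h2 = conv_pool(h1, W2, b2)               # 14x14 -> 7x7,  depth 64
flat = tf.reshape(h2, [-1, 7 * 7 * 64])
logits = tf.matmul(flat, Wf) + bf        # fully connected layer -> 10 classes
pred = tf.nn.softmax(logits)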
• Example 2. Multi-layer NN: CNN
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
Define variables and functions
Session and summary
Train (find the weight values that minimize the cost)
• Example 2. Multi-layer NN: CNN
– Tensorboard graph
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/vis_cnn_mnist.ipynb
• Example 2. Multi-layer NN: CNN
– Visualization of each layer
https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/cnn_mnist_simple.ipynb
Input image (28×28) → 1st conv. layer (28×28) → ReLU (28×28) → max pooling (14×14)
Convolution filter: 5×5
Thank you
References
– http://sebastianraschka.com/faq/docs/visual-backpropagation.html
– http://kawahara.ca/how-to-compute-the-derivative-of-a-sigmoid-function-fully-worked-example/
– https://www.youtube.com/watch?v=E5a3nDpaXjw
– http://deeplearning.net/tutorial/rbm.html
– http://cs231n.stanford.edu/slides/winter1516_lecture7.pdf
– https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/cnn_mnist_simple.ipynb
– http://www.mdpi.com/1424-8220/16/7/1134/htm
– https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
– http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
– http://www.aistudy.com/neural/multilayer_perceptron.htm
– https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
– https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
– https://en.wikipedia.org/wiki/Deep_learning
– https://en.wikipedia.org/wiki/Feature_engineering
– https://en.wikipedia.org/wiki/Edge_detection
– https://en.wikipedia.org/wiki/Corner_detection
– http://www.erogol.com/brief-history-machine-learning/
– http://vaaaaaanquish.hatenablog.com/entry/2015/01/26/060622
– http://yann.lecun.com/exdb/mnist/
– http://darkpgmr.tistory.com/116
– http://neuralnetworksanddeeplearning.com/chap6.html