Introduction to Deep Neural Network
Liwei Ren, Ph.D.
San Jose, California, November 2016
Agenda
• What a DNN is
• How a DNN works
• Why a DNN works
• Those DNNs in action
• Where the challenges are
• Successful stories
• Security problems
• Summary
• Quiz
• What else
What is a DNN?
• DNN and AI in the secular world
What is a DNN?
• DNN in the technical world
What is a DNN?
• Categorizing the DNNs:
What is a DNN?
• Three technical elements
• Architecture: the graph, weights/biases, activation functions
• Activity Rule: weights/biases, activation functions
• Learning Rule: a typical one is the backpropagation algorithm
• Three masters in this area:
What is a DNN?
• Given a practical problem, we have two approaches to solve it.
What is a DNN?
• An example: image recognition
What is a DNN?
• In the mathematical world
– A DNN is a mathematical function f: D → S, where D ⊆ R^n and S ⊆ R^m, which is constructed by a directed-graph-based architecture.
– A DNN is also a composition of functions from a network of primitive functions.
What is a DNN?
• We denote a feed-forward DNN function by O = f(I), which is determined by a few parameters G, Φ, W, B
• Hyper-parameters:
– G is the directed graph which presents the structure
– Φ presents one or multiple activation functions for activating the nodes
• Parameters:
– W is the vector of weights relevant to the edges
– B is the vector of biases relevant to the nodes
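A minimal sketch of evaluating O = f(I) for such a network (a hypothetical illustration, assuming a fully connected feed-forward architecture and a single activation function φ; shapes and values are made up):

```python
import numpy as np

def feed_forward(I, weights, biases, phi):
    """Evaluate O = f(I): weights[k] and biases[k] map layer k to layer k+1."""
    a = I
    for W, b in zip(weights, biases):
        a = phi(W @ a + b)           # activate every node of the next layer
    return a

# A toy 3-4-2 network with sigmoid activation
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
O = feed_forward(np.array([0.5, -1.0, 2.0]), weights, biases, sigmoid)
```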
What is a DNN?
• Activation at a node:
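In symbols: a node with inputs x1, …, xk, incoming weights w1, …, wk and bias b outputs y = φ(w1x1 + … + wkxk + b); the bias can be treated as a weight on a constant input of 1.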
What is a DNN?
• Activation function:
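A sketch of some common choices, including the sigmoid φ(v) = 1/(1 + e^(−v)) used later in this deck (the selection below is an assumption, not the slide's chart):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))   # bounded, monotonically increasing

def sigmoid_prime(v):
    s = sigmoid(v)
    return s * (1.0 - s)              # handy closed form for backpropagation

def tanh(v):
    return np.tanh(v)                 # bounded in (-1, 1)

def relu(v):
    return np.maximum(0.0, v)         # popular in modern deep networks
```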
What is a DNN?
• G = (V, E) is a graph and Φ is a set of activation functions.
• <G, Φ> constructs a family of functions F:
– F(G, Φ) = { f | f is a function constructed by <G, Φ, W> where W ∈ R^N }
• N = the total number of weights at all nodes of the output layer and hidden layers.
• Each f(I) can be denoted by f(I, W).
What is a DNN?
• Mathematically, a DNN-based supervised machine learning technology can be described as follows:
– Given g ∈ { h | h: D → S where D ⊆ R^n and S ⊆ R^m } and δ > 0, find f ∈ F(G, Φ) such that ‖f − g‖ < δ.
• Essentially, it is to identify a W ∈ R^N such that ‖f(∗, W) − g‖ < δ
• However, in practice, g is not explicitly expressed. It usually appears as a sequence of samples:
– { <I(j), T(j)> | T(j) = g(I(j)), j = 1, 2, …, M }
• where I(j) is an input vector and T(j) is its corresponding target vector.
How Does a DNN Work?
• The function g is not explicitly expressed, so we are not able to calculate ‖g − f(∗, W)‖
• Instead, we evaluate the error function E(W) = (1/2M) ∑_j ‖T(j) − f(I(j), W)‖²
• We expect to determine W such that E(W) < δ
• How to identify W ∈ R^N so that E(W) < δ? Let's solve the nonlinear optimization problem min{ E(W) | W ∈ R^N }, i.e.:
– min{ (1/2M) ∑_j ‖T(j) − f(I(j), W)‖² | W ∈ R^N } (P1)
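A minimal sketch of this batch error in numpy (assuming a generic network function f(I, W), e.g., the feed-forward sketch earlier; names here are illustrative):

```python
import numpy as np

def batch_error(samples, W, f):
    """E(W) = (1/2M) * sum_j ||T(j) - f(I(j), W)||^2 over M samples."""
    M = len(samples)
    return sum(np.sum((T - f(I, W)) ** 2) for I, T in samples) / (2.0 * M)
```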
How Does a DNN Work?
• (P1) is for batch-mode training; however, it is too expensive.
• In order to reduce the computational cost, a sequential mode is introduced.
• Picking <I, T> ∈ { <I(1), T(1)>, <I(2), T(2)>, …, <I(M), T(M)> } sequentially, let the output of the network be O = f(I, W) for any W:
• Error function E(W) = ‖T − f(I, W)‖²/2 = ∑_j (T_j − O_j)²/2
• Each O_j can be considered as a function of W. We denote it as O_j(W).
• We have the optimization problem for training in sequential mode:
– min{ ∑_j (T_j − O_j(W))²/2 | W ∈ R^N } (P2)
How Does a DNN Work?
• One may ask whether we get the same solution for both batch mode and sequential mode.
• BTW
– batch mode = offline mode
– sequential mode = online mode
• We focus on online mode in this talk
How Does a DNN Work?
• How to solve the unconstrained nonlinear optimization problem (P2)?
• The general approach of unconstrained nonlinear optimization is to find local minima of E(W) by using the iterative process of Gradient Descent.
• ∂E = (∂E/∂W_1, ∂E/∂W_2, …, ∂E/∂W_N)
• The iterations:
– ΔW_j = −γ ∂E/∂W_j for j = 1, …, N
– Updating W in each step by
• W_j^(k+1) = W_j^(k) − γ ∂E(W^(k))/∂W_j for j = 1, …, N (A1)
• until E(W^(k+1)) < δ or E(W^(k+1)) cannot be reduced anymore
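A minimal sketch of iteration (A1), assuming E takes a weight vector W ∈ R^N; for illustration the gradient is approximated with central finite differences rather than backpropagation:

```python
import numpy as np

def gradient_descent(E, W, gamma=0.1, delta=1e-6, max_iter=10000, h=1e-5):
    """Iterate W_j <- W_j - gamma * dE/dW_j until E < delta or E stalls."""
    for _ in range(max_iter):
        grad = np.array([(E(W + h * e) - E(W - h * e)) / (2 * h)
                         for e in np.eye(len(W))])   # numerical dE/dW_j
        W_next = W - gamma * grad
        if E(W_next) < delta:        # required accuracy reached
            return W_next
        if E(W_next) >= E(W):        # E(W) cannot be reduced anymore
            return W
        W = W_next
    return W
```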
How Does a DNN Work?
• The algorithm of Gradient Descent:
How Does a DNN Work?
• From the perspective of mathematics, the process of Gradient Descent is straightforward.
• However, from the perspective of scientific computing, it is quite challenging to calculate the values of all ∂E/∂W_j for j = 1, …, N:
– The complexity of expressing each ∂E/∂W_j where j = 1, …, N.
– There are (k+1) layers of function composition for a DNN of k hidden layers.
How Does a DNN Work?
• For example, we have a very simple network as follows with the activation function φ(v) = 1/(1 + e^(−v)).
• E(W) = [T − f(I, W)]²/2 = [T − φ(w1φ(w3I + w2) + w0)]²/2, and we have:
– ∂E/∂w0 = −[T − φ(w1φ(w3I + w2) + w0)] φ′(w1φ(w3I + w2) + w0)
– ∂E/∂w1 = −[T − φ(w1φ(w3I + w2) + w0)] φ′(w1φ(w3I + w2) + w0) φ(w3I + w2)
– ∂E/∂w2 = −w1 [T − φ(w1φ(w3I + w2) + w0)] φ′(w1φ(w3I + w2) + w0) φ′(w3I + w2)
– ∂E/∂w3 = −I w1 [T − φ(w1φ(w3I + w2) + w0)] φ′(w1φ(w3I + w2) + w0) φ′(w3I + w2)
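These expressions can be sanity-checked numerically; the sketch below (with illustrative values for I, T and the weights) compares the analytic ∂E/∂w0 against a central-difference estimate:

```python
import numpy as np

phi = lambda v: 1.0 / (1.0 + np.exp(-v))          # sigmoid activation
phi_p = lambda v: phi(v) * (1.0 - phi(v))          # its derivative

I, T = 0.7, 0.3                                    # illustrative sample
w0, w1, w2, w3 = 0.1, -0.4, 0.2, 0.5               # illustrative weights

def E(w):
    a0, a1, a2, a3 = w
    return (T - phi(a1 * phi(a3 * I + a2) + a0)) ** 2 / 2.0

u = w1 * phi(w3 * I + w2) + w0                     # net input of the output node
analytic = -(T - phi(u)) * phi_p(u)                # the formula for dE/dw0 above

h = 1e-6
numeric = (E([w0 + h, w1, w2, w3]) - E([w0 - h, w1, w2, w3])) / (2 * h)
print(analytic, numeric)                           # should agree to ~1e-9
```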
How Does a DNN Work?
• Let's imagine a network of N inputs, M outputs and K hidden layers, each of which has L nodes.
– It is a daunting task to express ∂E/∂w_j explicitly. The last simple example already shows this.
• The backpropagation (BP) algorithm was proposed as a rescue:
– Main idea: the weights of the (k−1)-th hidden layer can be expressed by those of the k-th layer recursively.
– We can start with the output layer, which is considered the (K+1)-th layer.
How Does a DNN Work?
• The BP algorithm has the following major steps:
1. Feed-forward computation
2. Back-propagation to the output layer
3. Back-propagation to the hidden layers
4. Weight updates
How Does a DNN Work?
• A general DNN can be drawn as follows
How Does a DNN Work?
• How to express the weights of the (k−1)-th hidden layer by the weights of the k-th layer recursively?
How Does a DNN Work?
• Let us experience BP with our small network.
– E(W) = [T − f(I, W)]²/2 = [T − φ(w1φ(w3I + w2) + w0)]²/2.
• ∂E/∂w0 = −φ′(O)(T − O)
• ∂E/∂w1 = −φ′(O)(T − O) H
• ∂E/∂w2 = −φ′(O)(T − O) φ′(H) w1 · 1
• ∂E/∂w3 = −φ′(O)(T − O) φ′(H) w1 · I
– Let H0^(1) = 1, H1^(1) = H = φ(w3I + w2), and H1^(0) = I. We verify the following:
• δ1^(2) = φ′(O)(T − O)
• w0⁺ = w0 + γ δ1^(2) H0^(1), w1⁺ = w1 + γ δ1^(2) H1^(1)
• δ1^(1) = φ′(H1^(1)) δ1^(2) w1
• w2⁺ = w2 + γ δ1^(1) H0^(0), w3⁺ = w3 + γ δ1^(1) H1^(0)
• where w0 = w0,1^(2), w1 = w1,1^(2), w2 = w0,1^(1), w3 = w1,1^(1)
• (Here φ′(O) and φ′(H) are shorthand for φ′ evaluated at the net inputs of O and H, respectively.)
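A sketch of one BP update for this tiny network (continuing the illustrative values and the phi/phi_p helpers from the earlier snippet; γ is an assumed learning rate), following the four major steps of the BP algorithm:

```python
gamma = 0.5                          # illustrative learning rate

# 1. Feed-forward computation
H = phi(w3 * I + w2)                 # hidden node H1^(1); H0^(1) = 1 is the bias input
O = phi(w1 * H + w0)                 # network output

# 2. Back-propagation to the output layer
delta2 = phi_p(w1 * H + w0) * (T - O)      # delta_1^(2), phi' at O's net input

# 3. Back-propagation to the hidden layer
delta1 = phi_p(w3 * I + w2) * delta2 * w1  # delta_1^(1), phi' at H's net input

# 4. Weight updates
w0 += gamma * delta2 * 1.0           # w0+ = w0 + gamma * delta_1^(2) * H0^(1)
w1 += gamma * delta2 * H             # w1+ = w1 + gamma * delta_1^(2) * H1^(1)
w2 += gamma * delta1 * 1.0           # w2+ = w2 + gamma * delta_1^(1) * H0^(0)
w3 += gamma * delta1 * I             # w3+ = w3 + gamma * delta_1^(1) * H1^(0)
```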
Why Does a DNN Work?
• It is amazing! However, why does it work?
• For an FNN, it is to ask whether the following approximation problem has a solution:
– Given g ∈ { h | h: D → S where D ⊆ R^n and S ⊆ R^m } and δ > 0, find a W ∈ R^N such that ‖f(∗, W) − g‖ < δ.
• Universal approximation theorem (S):
– Let φ(·) be a bounded and monotonically increasing continuous function. Let I_m denote the m-dimensional unit hypercube [0,1]^m. The space of continuous functions on I_m is denoted by C(I_m). Then, given any function f ∈ C(I_m) and ε > 0, there exist an integer N, real constants v_i, b_i ∈ R and real vectors w_i ∈ R^m, where i = 1, …, N, such that
|F(x) − f(x)| < ε
for all x ∈ I_m, where F(x) = ∑_{i=1}^{N} v_i φ(w_i^T x + b_i) is an approximation to the function f which is independent of φ.
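As a sketch, the approximator F in the theorem is just a one-hidden-layer network with linear output weights (the constants here are placeholders; the theorem guarantees that suitable ones exist but does not construct them):

```python
import numpy as np

def F(x, v, w, b, phi=lambda t: 1.0 / (1.0 + np.exp(-t))):
    """F(x) = sum_{i=1}^{N} v_i * phi(w_i^T x + b_i)."""
    return sum(v[i] * phi(w[i] @ x + b[i]) for i in range(len(v)))
```

Training is what searches for the N, v_i, w_i, b_i that push the error below ε; the theorem only asserts their existence.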
Why Does a DNN Work?
• Its corresponding network with only one hidden layer
– NOTE: this is not even a general case for one hidden layer. It is a special case. WHY?
– However, it is powerful and encouraging from the mathematical perspective.
Why Does a DNN Work?
General networks have a corresponding general version of the Universal Approximation Theorem:
Why Does a DNN Work?
• Universal approximation theorem (G):
– Let φ(·) be a bounded and monotonically increasing continuous function. Let S be a compact space in R^m. Let C(S) = { g | g: S ⊂ R^m → R^n is continuous }. Then, given any function f ∈ C(S) and ε > 0, there exists an FNN as shown above which constructs the network function F such that
‖F(x) − f(x)‖ < ε
where F is an approximation to the function f which is independent of φ.
• It seems both shallow and deep neural networks can construct an approximation to a given function.
– Which is better?
– Or which is more efficient in terms of using fewer nodes?
Why Does a DNN Work?
• Mathematical foundation of neural networks:
Those DNNs in action
• A DNN has three elements
• Architecture: the graph, weights/biases, activation functions
• Activity Rule: weights/biases, activation functions
• Learning Rule: a typical one is the backpropagation algorithm
• The architecture basically determines the capability of a specific DNN
– Different architectures are suitable for different applications.
– The most general architecture of an ANN is a DAG (directed acyclic graph).
Those DNNs in action
• There are a few well-known categories of DNNs.
What Are the Challenges?
• Given a specific problem, there are a few questions before one starts the journey with DNNs:
– Do you understand the problem that you need to solve?
– Do you really want to solve this problem with a DNN, and why?
• Do you have an alternative yet effective solution?
– Do you know how to describe the problem in DNN terms mathematically?
– Do you know how to implement a DNN, beyond a few APIs and sizzling hype?
– How to collect sufficient data for training?
– How to solve the problem efficiently and cost-effectively?
What Are the Challenges?
• 1st Challenge:
– a full mesh network suffers from the curse of dimensionality.
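For a sense of scale (an illustrative count, not a figure from the slide): fully connecting a 200×200-pixel input (40,000 nodes) to a single hidden layer of 10,000 nodes already requires 4×10⁸ weights for that one layer, before any deeper layers are added.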
What Are the Challenges?
• Many FNN tasks do not need a full mesh network.
• For example, if we can present the input vector as a grid, nearest-neighborhood models can be used to construct an effective FNN with far fewer connections:
– Image recognition
– Go (圍棋): a game that two players play on a 19×19 grid of lines.
What Are the Challenges?
• The 2nd challenge is how to describe a technical problem in terms of a DNN, i.e., mathematical modeling. There are generally two approaches:
– Applying a well-studied DNN architecture to describe the problem. Deep understanding of the specific network is usually required!
• Two general DNN architectures are well known:
– FNN: feedforward neural network. Its special architecture CNN (convolutional neural network) is widely used in many applications such as image recognition and Go.
– RNN: recurrent neural network. Its special architecture is LSTM (long short-term memory), which has been applied successfully in speech recognition and language translation.
• For example, if we want to try an FNN, how do we describe the problem in terms of <input vector, output vector> with fixed dimensions?
– Creating a novel DNN architecture from the ground up if none of the existing models fits your problem. Deep understanding of DNN theory and algorithms is required.
What Are the Challenges?
• Handwritten digit recognition:
– Modeling this problem is straightforward
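For example (a standard formulation, assumed here rather than taken from the slide's figure): a 28×28 grayscale digit image flattens into a 784-dimensional input vector I, and the target T is a 10-dimensional one-hot vector whose j-th component marks the digit j.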
What Are the Challenges?
• Image recognition is also straightforward to model
What Are the Challenges?
• However, due to the curse of dimensionality, we can use a special FNN:
– Convolutional neural network (CNN)
What Are the Challenges?
• How to construct a DNN to describe language translation?
– LSTM networks are used for this
• How to construct a DNN to describe the problem of malware classification?
• How to construct a DNN to describe network traffic for security purposes?
What Are the Challenges?
• The 3rd challenge is how to collect sufficient training data. To achieve the required accuracy, sufficient training data is necessary. WHY?
What Are the Challenges?
• The 4th challenge is how to identify various talents for providing a DNN solution to specific problems:
– Someone who knows how to use existing DL APIs such as TensorFlow.
– Someone who understands various DNN architectures in depth, so that he/she knows how to evaluate and identify a suitable DNN architecture to solve the problem.
– Someone who understands the theory and algorithms of DNNs in depth, so that he/she can create and design a novel DNN from the ground up.
Successful Stories
• ImageNet: 1M+ images, 1000+ categories, CNN
Successful Stories
• Unsupervised learning neural networks… YouTube and the Cat.
Successful Stories
• AlphaGo, a significant milestone in AI history
– More significant than Deep Blue
• Both the Policy Network and the Value Network are CNNs.
Successful Stories
• Google Neural Machine Translation… LSTM (Long Short-Term Memory) networks
Successful Stories
• Microsoft Speech Recognition… LSTM and TDNN (Time-Delay Neural Networks)
Security Problems
• Not disclosed for the public version.
Summary
• What a DNN is
• How a DNN works
• Why a DNN works
• The categories of DNNs
• Some challenges
• Well-known stories
• Security problems
Quiz
• Why do we choose the activation function to be a nonlinear function?
• Why deep? Why are deep networks better than shallow networks?
• What is the difference between online and batch mode training?
• Will online and batch mode training converge to the same solution?
• Why do we need the backpropagation algorithm?
• Why do we apply convolutional neural networks to image recognition?
Quiz
• If we solve a problem with an FNN,
– how many layers deep should we go?
– How many nodes are good for each layer?
– How to estimate and optimize the cost?
• Is it guaranteed that the backpropagation algorithm converges to a solution?
• Why do we need sufficient data for training in order to achieve a certain accuracy?
• Can a DNN do tasks beyond extending human capabilities or automating extensive manual tasks?
– To prove a mathematical theorem... or to introduce an interesting concept… or to appreciate a poem… or to love…
Quiz
• AlphaGo is trained for a 19×19 lattice. If we play the Go game on a 20×20 board, can AlphaGo handle it?
• ImageNet is trained for 1000 categories. If we add the 1001st category, what should we do?
• People do consider a special DNN as a black box. Why?
• More questions from you …
What Else?
• What to share next from me? Why do you care?
– Various DNNs: principles, examples, analysis and experiments…
• ImageNet, AlphaGo, GNMT, etc.
– My Ph.D. work and its relevance to DNNs
– A Little History of AI and Artificial Neural Networks
– Various Schools of the AI Discipline
– Strong AI vs. Weak AI
What Else?
• What to share next from me? Why do you care?
– Questions when thinking about AI:
• Are we able to understand how we learn?
• Are we going in the right directions mathematically and scientifically?
• Are there simple principles for cognition, like those Newton and Einstein established for understanding our universe?
• What do we lack between now and the coming of so-called Strong AI?
What Else?
• What to share next from me? Why do you care?
• Questions about who we are:
– Are we created?
– Are we the AI of the creator?
• My little theory about the Universe
More Related Content

What's hot

Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problems
Alexander Litvinenko
 
Sparse autoencoder
Sparse autoencoderSparse autoencoder
Sparse autoencoder
Devashish Patel
 
K-means and GMM
K-means and GMMK-means and GMM
K-means and GMM
Sanghyuk Chun
 
Chapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationChapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image Transformation
Varun Ojha
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image Generation
Jason Anderson
 
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارشبررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
پروژه مارکت
 
Perceptron
PerceptronPerceptron
Perceptron
VARUN KUMAR
 
Lecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D SignalLecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D Signal
VARUN KUMAR
 
Lp and ip programming cp 9
Lp and ip programming cp 9Lp and ip programming cp 9
Lp and ip programming cp 9
M S Prasad
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes Family
Kota Matsui
 
Neural Networks - How do they work?
Neural Networks - How do they work?Neural Networks - How do they work?
Neural Networks - How do they work?
Accubits Technologies
 
Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)Tatsuya Yokota
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
NAVER Engineering
 
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Universitat Politècnica de Catalunya
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Ashwin Rao
 
Max net
Max netMax net
Detailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss FunctionDetailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss Function
범준 김
 
deep learning
deep learningdeep learning
deep learning
Aravindharamanan S
 

What's hot (20)

Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problems
 
Sparse autoencoder
Sparse autoencoderSparse autoencoder
Sparse autoencoder
 
K-means and GMM
K-means and GMMK-means and GMM
K-means and GMM
 
Chapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationChapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image Transformation
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image Generation
 
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارشبررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
 
Perceptron
PerceptronPerceptron
Perceptron
 
Lecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D SignalLecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D Signal
 
Lp and ip programming cp 9
Lp and ip programming cp 9Lp and ip programming cp 9
Lp and ip programming cp 9
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes Family
 
Neural Networks - How do they work?
Neural Networks - How do they work?Neural Networks - How do they work?
Neural Networks - How do they work?
 
Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
 
Max net
Max netMax net
Max net
 
Detailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss FunctionDetailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss Function
 
Chapter 16
Chapter 16Chapter 16
Chapter 16
 
Aleksander gegov
Aleksander gegovAleksander gegov
Aleksander gegov
 
deep learning
deep learningdeep learning
deep learning
 

Viewers also liked

企业安全市场综述
企业安全市场综述 企业安全市场综述
企业安全市场综述
Liwei Ren任力偉
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
Ahmed_hashmi
 
neural network
neural networkneural network
neural network
STUDENT
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
Nagarajan
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
REHMAT ULLAH
 
聊一聊大明朝的火器
聊一聊大明朝的火器聊一聊大明朝的火器
聊一聊大明朝的火器
Liwei Ren任力偉
 
Artificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google BrainArtificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google Brain
Rawan Al-Omari
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networksstellajoseph
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkDEEPASHRI HK
 
硅谷的那点事儿
硅谷的那点事儿硅谷的那点事儿
硅谷的那点事儿
Liwei Ren任力偉
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
Chun-Hao Chang
 
Introduction to Artificial Neural Network
Introduction to Artificial Neural Network Introduction to Artificial Neural Network
Introduction to Artificial Neural Network
Qingkai Kong
 
Neural network 20161210_jintaekseo
Neural network 20161210_jintaekseoNeural network 20161210_jintaekseo
Neural network 20161210_jintaekseo
JinTaek Seo
 
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
FAST CAMPUS
 
Árboles de Decisión en Weka
Árboles de Decisión en WekaÁrboles de Decisión en Weka
Árboles de Decisión en Weka
Lorena Quiñónez
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
Anastasiia Kornilova
 
Perceptron Simple y Regla Aprendizaje
Perceptron  Simple y  Regla  AprendizajePerceptron  Simple y  Regla  Aprendizaje
Perceptron Simple y Regla Aprendizaje
Roberth Figueroa-Diaz
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a function
Taisuke Oe
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
Gilles Louppe
 
Neural network
Neural networkNeural network
Neural network
KRISH na TimeTraveller
 

Viewers also liked (20)

企业安全市场综述
企业安全市场综述 企业安全市场综述
企业安全市场综述
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
neural network
neural networkneural network
neural network
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
 
聊一聊大明朝的火器
聊一聊大明朝的火器聊一聊大明朝的火器
聊一聊大明朝的火器
 
Artificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google BrainArtificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google Brain
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
硅谷的那点事儿
硅谷的那点事儿硅谷的那点事儿
硅谷的那点事儿
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
 
Introduction to Artificial Neural Network
Introduction to Artificial Neural Network Introduction to Artificial Neural Network
Introduction to Artificial Neural Network
 
Neural network 20161210_jintaekseo
Neural network 20161210_jintaekseoNeural network 20161210_jintaekseo
Neural network 20161210_jintaekseo
 
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
 
Árboles de Decisión en Weka
Árboles de Decisión en WekaÁrboles de Decisión en Weka
Árboles de Decisión en Weka
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
 
Perceptron Simple y Regla Aprendizaje
Perceptron  Simple y  Regla  AprendizajePerceptron  Simple y  Regla  Aprendizaje
Perceptron Simple y Regla Aprendizaje
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a function
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
Neural network
Neural networkNeural network
Neural network
 

Similar to Introduction to Deep Neural Network

UofT_ML_lecture.pptx
UofT_ML_lecture.pptxUofT_ML_lecture.pptx
UofT_ML_lecture.pptx
abcdefghijklmn19
 
Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learning
simaokasonse
 
Time complexity.ppt
Time complexity.pptTime complexity.ppt
Time complexity.ppt
YekoyeTigabuYeko
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihmSajid Marwat
 
Find all hazards in this circuit. Redesign the circuit as a three-le.pdf
Find all hazards in this circuit.  Redesign the circuit as a three-le.pdfFind all hazards in this circuit.  Redesign the circuit as a three-le.pdf
Find all hazards in this circuit. Redesign the circuit as a three-le.pdf
Arrowdeepak
 
Chapter One.pdf
Chapter One.pdfChapter One.pdf
Chapter One.pdf
abay golla
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
Fabian Pedregosa
 
Digital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systemsDigital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systemsChandrashekhar Padole
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
台灣資料科學年會
 
Scala and Deep Learning
Scala and Deep LearningScala and Deep Learning
Scala and Deep Learning
Oswald Campesato
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processing
Dongang (Sean) Wang
 
Activation function
Activation functionActivation function
Activation function
RakshithGowdakodihal
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithms
Eellekwameowusu
 
Cdc18 dg lee
Cdc18 dg leeCdc18 dg lee
Cdc18 dg lee
whatthehellisit
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
nyomans1
 
Recursion in Java
Recursion in JavaRecursion in Java
Recursion in Java
Fulvio Corno
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.
Tariq Khan
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
Ahmed BESBES
 
C++ and Deep Learning
C++ and Deep LearningC++ and Deep Learning
C++ and Deep Learning
Oswald Campesato
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexityAbbas Ali
 

Similar to Introduction to Deep Neural Network (20)

UofT_ML_lecture.pptx
UofT_ML_lecture.pptxUofT_ML_lecture.pptx
UofT_ML_lecture.pptx
 
Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learning
 
Time complexity.ppt
Time complexity.pptTime complexity.ppt
Time complexity.ppt
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihm
 
Find all hazards in this circuit. Redesign the circuit as a three-le.pdf
Find all hazards in this circuit.  Redesign the circuit as a three-le.pdfFind all hazards in this circuit.  Redesign the circuit as a three-le.pdf
Find all hazards in this circuit. Redesign the circuit as a three-le.pdf
 
Chapter One.pdf
Chapter One.pdfChapter One.pdf
Chapter One.pdf
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
Digital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systemsDigital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systems
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 
Scala and Deep Learning
Scala and Deep LearningScala and Deep Learning
Scala and Deep Learning
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processing
 
Activation function
Activation functionActivation function
Activation function
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithms
 
Cdc18 dg lee
Cdc18 dg leeCdc18 dg lee
Cdc18 dg lee
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
 
Recursion in Java
Recursion in JavaRecursion in Java
Recursion in Java
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
 
C++ and Deep Learning
C++ and Deep LearningC++ and Deep Learning
C++ and Deep Learning
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexity
 

More from Liwei Ren任力偉

信息安全领域里的创新和机遇
信息安全领域里的创新和机遇信息安全领域里的创新和机遇
信息安全领域里的创新和机遇
Liwei Ren任力偉
 
防火牆們的故事
防火牆們的故事防火牆們的故事
防火牆們的故事
Liwei Ren任力偉
 
移动互联网时代下创新的思维
移动互联网时代下创新的思维移动互联网时代下创新的思维
移动互联网时代下创新的思维
Liwei Ren任力偉
 
非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究
Liwei Ren任力偉
 
世纪猜想
世纪猜想世纪猜想
世纪猜想
Liwei Ren任力偉
 
Arm the World with SPN based Security
Arm the World with SPN based SecurityArm the World with SPN based Security
Arm the World with SPN based Security
Liwei Ren任力偉
 
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching ProblemExtending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
Liwei Ren任力偉
 
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsNear Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Liwei Ren任力偉
 
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Liwei Ren任力偉
 
Phase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillatorsPhase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillators
Liwei Ren任力偉
 
On existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problemOn existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problem
Liwei Ren任力偉
 
Math stories
Math storiesMath stories
Math stories
Liwei Ren任力偉
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
Liwei Ren任力偉
 
IoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and SolutionsIoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and Solutions
Liwei Ren任力偉
 
Taxonomy of Differential Compression
Taxonomy of Differential CompressionTaxonomy of Differential Compression
Taxonomy of Differential Compression
Liwei Ren任力偉
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and Applications
Liwei Ren任力偉
 
Overview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) TechnologyOverview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) Technology
Liwei Ren任力偉
 
DLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and AlgorithmsDLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and Algorithms
Liwei Ren任力偉
 
Mathematical Modeling for Practical Problems
Mathematical Modeling for Practical ProblemsMathematical Modeling for Practical Problems
Mathematical Modeling for Practical Problems
Liwei Ren任力偉
 
Securing Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the CloudSecuring Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the Cloud
Liwei Ren任力偉
 

More from Liwei Ren任力偉 (20)

信息安全领域里的创新和机遇
信息安全领域里的创新和机遇信息安全领域里的创新和机遇
信息安全领域里的创新和机遇
 
防火牆們的故事
防火牆們的故事防火牆們的故事
防火牆們的故事
 
移动互联网时代下创新的思维
移动互联网时代下创新的思维移动互联网时代下创新的思维
移动互联网时代下创新的思维
 
非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究
 
世纪猜想
世纪猜想世纪猜想
世纪猜想
 
Arm the World with SPN based Security
Arm the World with SPN based SecurityArm the World with SPN based Security
Arm the World with SPN based Security
 
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching ProblemExtending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
 
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsNear Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
 
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
 
Phase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillatorsPhase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillators
 
On existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problemOn existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problem
 
Math stories
Math storiesMath stories
Math stories
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
 
IoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and SolutionsIoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and Solutions
 
Taxonomy of Differential Compression
Taxonomy of Differential CompressionTaxonomy of Differential Compression
Taxonomy of Differential Compression
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and Applications
 
Overview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) TechnologyOverview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) Technology
 
DLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and AlgorithmsDLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and Algorithms
 
Mathematical Modeling for Practical Problems
Mathematical Modeling for Practical ProblemsMathematical Modeling for Practical Problems
Mathematical Modeling for Practical Problems
 
Securing Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the CloudSecuring Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the Cloud
 

Recently uploaded

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 

Recently uploaded (20)

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Introduction to Deep Neural Network

  • 1. Copyright 2011 Trend Micro Inc. 1 Introduction to Deep Neural Network Liwei Ren, Ph.D San Jose, California, Nov, 2016
  • 2. Copyright 2011 Trend Micro Inc. Agenda • What a DNN is • How a DNN works • Why a DNN works • Those DNNs in action • Where the challenges are • Successful stories • Security problems • Summary • Quiz • What else 2
  • 3. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN and AI in the secular world 3
  • 4. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN and AI in the secular world 4
  • 5. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN and AI in the secular world 5
  • 6. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN in the technical world 6
  • 7. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN in the technical world 7
  • 8. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN in the technical world 8
  • 9. Copyright 2011 Trend Micro Inc. What is a DNN? • Categorizing the DNNs : 9
  • 10. Copyright 2011 Trend Micro Inc. What is a DNN? • Three technical elements • Architecture: the graph, weights/biases, activation functions • Activity Rule: weights/biases, activation functions • Learning Rule: a typical one is backpropagation algorithm • Three masters in this area: 10
  • 11. Copyright 2011 Trend Micro Inc. What is a DNN? • Given a practical problem , we have two approaches to solve it. 11
  • 12. Copyright 2011 Trend Micro Inc. What is a DNN? • An example: image recognition 12
  • 13. Copyright 2011 Trend Micro Inc. What is a DNN? • An example: image recognition 13
  • 14. Copyright 2011 Trend Micro Inc. What is a DNN? • In the mathematical world – A DNN is a mathematical function f: D  S, where D ⊆ Rn and S ⊆ Rm, which is constructed by a directed graph based architecture. – A DNN is also a composition of functions from a network of primitive functions. 14
  • 15. Copyright 2011 Trend Micro Inc. What is a DNN? • We denote the a feed-forward DNN function by O= f(I) which is determined by a few parameters G, Φ ,W,B • Hyper-parameters: – G is the directed graph which presents the structure – Φ presents one or multiple activation functions for activating the nodes • Parameters: – W is the vector of weights relevant to the edges – B is the vector of biases relevant to the nodes 15
  • 16. Copyright 2011 Trend Micro Inc. What is a DNN? • Activation at a node: 16
  • 17. Copyright 2011 Trend Micro Inc. What is a DNN? • Activation function: 17
  • 18. Copyright 2011 Trend Micro Inc. What is a DNN? • G=(V,E) is a graph and Φ is a set of activation functions. • <G,Φ> constructs a family of functions F: – F(G,Φ) = { f | f is a function constructed by <G, Φ ,W> where WϵRN } • N= total number of weights at all nodes of output layer and hidden layers. • Each f(I) can be denoted by f(I ,W). 18
  • 19. Copyright 2011 Trend Micro Inc. What is a DNN? • Mathematically, a DNN based supervised machine learning technology can be described as follows : – Given g ϵ { h | h:D  S where D ⊆ Rn and S ⊆ Rm} and δ>0 , find f ϵ F(G,Φ) such that 𝑓 − 𝑔 < δ. • Essentially, it is to identify a W ϵ RN such that 𝑓(∗, 𝑊) − 𝑔 < δ • However, in practice, g is not explicitly expressed . It usually appears in a sequence of samples: – { <I(j),T(j)> | T(j) =g(I(j)), j=1, 2, …,M} • where I(j) is an input vector and T(j) is its corresponding target vector. 19
  • 20. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • The function g is not explicitly expressed, we are not able to calculate g − f(∗, W) • Instead, we evaluate the error function E(W)= 1 2𝑀 ∑||T(j) - f(I(j),W)||2 • We expect to determine W such that E(W) < δ • How to identify W ϵ RN so that E(W) < δ ? Lets solve the nonlinear optimization problem min{E(W)| W ϵ RN} , i.e.: min{ 1 2𝑀 ∑|| T(j) - f(I(j),W) ||2 | W ϵ RN } (P1) 20
• 21. How Does a DNN Work?
– (P1) is for batch-mode training; however, it is computationally expensive.
– In order to reduce the computational cost, a sequential mode is introduced.
– Picking <I,T> ϵ { <I(1),T(1)>, <I(2),T(2)>, …, <I(M),T(M)> } sequentially, let the output of the network be O = f(I,W) for any W:
• Error function E(W) = ||T − f(I,W)||²/2 = Σⱼ (Tⱼ − Oⱼ)²/2
• Each Oⱼ can be considered as a function of W; we denote it Oⱼ(W).
– We have the optimization problem for training in sequential mode:
• min{ Σⱼ (Tⱼ − Oⱼ(W))²/2 | W ϵ RN } (P2)
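As a concrete illustration, here is a minimal numpy sketch (not from the deck; the toy network f and all names are illustrative assumptions) of the batch error in (P1) versus the per-sample error in (P2):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def f(I, W):
    """Toy stand-in for a network function O = f(I, W).
    Here: one hidden layer, with W = (W1, b1, W2, b2)."""
    W1, b1, W2, b2 = W
    return sigmoid(W2 @ sigmoid(W1 @ I + b1) + b2)

def batch_error(samples, W):
    # E(W) = (1/(2M)) * sum_j ||T(j) - f(I(j), W)||^2   -- problem (P1)
    M = len(samples)
    return sum(np.sum((T - f(I, W)) ** 2) for I, T in samples) / (2.0 * M)

def sample_error(I, T, W):
    # E(W) = ||T - f(I, W)||^2 / 2                      -- problem (P2)
    return 0.5 * np.sum((T - f(I, W)) ** 2)

# e.g. M = 3 random samples for a 2-input, 1-output toy net
rng = np.random.default_rng(0)
W = (rng.normal(size=(3, 2)), np.zeros(3), rng.normal(size=(1, 3)), np.zeros(1))
samples = [(rng.random(2), rng.random(1)) for _ in range(3)]
print(batch_error(samples, W), sample_error(*samples[0], W))
```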
• 22. How Does a DNN Work?
– One may ask: do batch mode and sequential mode arrive at the same solution?
– By the way:
• batch mode = offline mode
• sequential mode = online mode
– We focus on the online mode in this talk.
• 23. How Does a DNN Work?
– How do we solve the unconstrained nonlinear optimization problem (P2)?
– The general approach for unconstrained nonlinear optimization is to find local minima of E(W) by the iterative process of Gradient Descent.
– The gradient is ∇E = (∂E/∂W1, ∂E/∂W2, …, ∂E/∂WN).
– The iterations:
• ΔWj = −γ ∂E/∂Wj for j = 1, …, N
• Update W in each step by Wj(k+1) = Wj(k) − γ ∂E(W(k))/∂Wj for j = 1, …, N (A1)
• until E(W(k+1)) < δ or E(W(k+1)) cannot be reduced any further.
• 24. How Does a DNN Work?
– The algorithm of Gradient Descent:
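A minimal sketch of iteration (A1), assuming only that the error function E can be evaluated; the central-difference gradient below is an illustrative stand-in for the analytic gradient discussed next:

```python
import numpy as np

def gradient_descent(E, W0, gamma=0.1, delta=1e-6, max_iter=10000):
    """Iterate W(k+1) = W(k) - gamma * grad E(W(k))  -- rule (A1) --
    until E drops below delta or stops decreasing."""
    def grad(W, eps=1e-7):
        # central-difference estimate of (dE/dW1, ..., dE/dWN)
        g = np.empty_like(W)
        for j in range(W.size):
            e = np.zeros_like(W); e[j] = eps
            g[j] = (E(W + e) - E(W - e)) / (2 * eps)
        return g

    W = W0.astype(np.float64)
    for _ in range(max_iter):
        W_next = W - gamma * grad(W)
        if E(W_next) < delta or E(W_next) >= E(W):
            return W_next
        W = W_next
    return W

# e.g. minimizing a toy quadratic E(W) = ||W - 1||^2 / 2
W_star = gradient_descent(lambda W: 0.5 * np.sum((W - 1.0) ** 2), np.zeros(3))
```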
• 25. How Does a DNN Work?
– From the perspective of mathematics, the process of Gradient Descent is straightforward.
– However, from the perspective of scientific computing, it is quite challenging to calculate the values of all ∂E/∂Wj for j = 1, …, N:
• The complexity of expressing each ∂E/∂Wj, j = 1, …, N, explicitly.
• There are (k+1) layers of function composition for a DNN with k hidden layers.
• 26. How Does a DNN Work?
– For example, take a very simple network (one input I, one hidden node, one output) with the activation function φ(v) = 1/(1 + e⁻ᵛ).
– With E(W) = [T − f(I,W)]²/2 = [T − φ(w₁φ(w₃I + w₂) + w₀)]²/2, we have:
• ∂E/∂w₀ = −[T − φ(w₁φ(w₃I + w₂) + w₀)] φ′(w₁φ(w₃I + w₂) + w₀)
• ∂E/∂w₁ = −[T − φ(w₁φ(w₃I + w₂) + w₀)] φ′(w₁φ(w₃I + w₂) + w₀) φ(w₃I + w₂)
• ∂E/∂w₂ = −w₁ [T − φ(w₁φ(w₃I + w₂) + w₀)] φ′(w₁φ(w₃I + w₂) + w₀) φ′(w₃I + w₂)
• ∂E/∂w₃ = −I w₁ [T − φ(w₁φ(w₃I + w₂) + w₀)] φ′(w₁φ(w₃I + w₂) + w₀) φ′(w₃I + w₂)
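To sanity-check these formulas, here is a small sketch (names are illustrative, not from the deck) comparing the analytic partials against central finite differences:

```python
import numpy as np

def phi(v):          # sigmoid activation
    return 1.0 / (1.0 + np.exp(-v))

def dphi(v):         # its derivative: phi'(v) = phi(v) * (1 - phi(v))
    return phi(v) * (1.0 - phi(v))

def E(w, I, T):
    w0, w1, w2, w3 = w
    return 0.5 * (T - phi(w1 * phi(w3 * I + w2) + w0)) ** 2

def analytic_grad(w, I, T):
    w0, w1, w2, w3 = w
    H = phi(w3 * I + w2)            # hidden activation
    u = w1 * H + w0                 # output pre-activation
    r = T - phi(u)                  # residual T - f(I, W)
    return np.array([
        -r * dphi(u),                               # dE/dw0
        -r * dphi(u) * H,                           # dE/dw1
        -r * dphi(u) * w1 * dphi(w3 * I + w2),      # dE/dw2
        -r * dphi(u) * w1 * dphi(w3 * I + w2) * I,  # dE/dw3
    ])

w, I, T = np.array([0.1, -0.4, 0.3, 0.8]), 0.5, 1.0
eps = 1e-6
numeric = np.array([(E(w + eps * np.eye(4)[j], I, T) - E(w - eps * np.eye(4)[j], I, T)) / (2 * eps)
                    for j in range(4)])
assert np.allclose(analytic_grad(w, I, T), numeric, atol=1e-8)
```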
• 27. How Does a DNN Work?
– Let's imagine a network of N inputs, M outputs and K hidden layers, each of which has L nodes.
• It is a daunting task to express each ∂E/∂wj explicitly; the last simple example already shows this.
– The backpropagation (BP) algorithm was proposed as the rescue:
• Main idea: the gradient terms at the (k−1)-th hidden layer can be computed from those at the k-th layer recursively.
• We can start with the output layer, which is treated as the (K+1)-th layer.
• 28. How Does a DNN Work?
– The BP algorithm has the following major steps:
1. Feed-forward computation
2. Back-propagation to the output layer
3. Back-propagation to the hidden layers
4. Weight updates
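A compact numpy sketch of these four steps for a fully connected network with sigmoid activations (the function and variable names are illustrative assumptions, not the deck's code):

```python
import numpy as np

def phi(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_step(Ws, bs, I, T, gamma=0.5):
    """One online-mode BP update. Ws[k], bs[k] are the weight matrix
    and bias vector of layer k."""
    # 1. Feed-forward computation: keep every layer's activation H[k].
    H = [np.asarray(I, dtype=float)]
    for W, b in zip(Ws, bs):
        H.append(phi(W @ H[-1] + b))
    # 2. Back-propagation to the output layer:
    #    delta = phi'(.) * (T - O); for the sigmoid, phi' = O * (1 - O).
    O = H[-1]
    delta = O * (1.0 - O) * (T - O)
    # 3./4. Back-propagate through the hidden layers and update weights,
    #    matching the slides' rule w+ = w + gamma * delta * H.
    for k in reversed(range(len(Ws))):
        delta_below = H[k] * (1.0 - H[k]) * (Ws[k].T @ delta) if k > 0 else None
        Ws[k] += gamma * np.outer(delta, H[k])
        bs[k] += gamma * delta
        delta = delta_below
    return 0.5 * np.sum((T - O) ** 2)   # per-sample error, as in (P2)

# e.g. a 2-3-1 network
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
bs = [np.zeros(3), np.zeros(1)]
err = train_step(Ws, bs, I=[0.2, 0.7], T=np.array([1.0]))
```

Note that the delta of the layer below is computed before the current layer's weights are updated, so the recursion uses the weights of the current iteration.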
• 29. How Does a DNN Work?
• 30. How Does a DNN Work?
– A general DNN can be drawn as follows:
• 31. How Does a DNN Work?
– How do we express the weights of the (k−1)-th hidden layer by the weights of the k-th layer recursively?
• 32. How Does a DNN Work?
– Let us experience BP with our small network, where E(W) = [T − f(I,W)]²/2 = [T − φ(w₁φ(w₃I + w₂) + w₀)]²/2 (writing φ′(O) and φ′(H) loosely for the derivatives at the output and hidden pre-activations):
• ∂E/∂w₀ = −φ′(O)(T − O)
• ∂E/∂w₁ = −φ′(O)(T − O) H
• ∂E/∂w₂ = −φ′(O)(T − O) φ′(H) w₁ · 1
• ∂E/∂w₃ = −φ′(O)(T − O) φ′(H) w₁ · I
– Let H₀⁽⁰⁾ = H₀⁽¹⁾ = 1, H₁⁽¹⁾ = H = φ(w₃I + w₂) and H₁⁽⁰⁾ = I; we can verify the following:
• δ₁⁽²⁾ = φ′(O)(T − O)
• w₀⁺ = w₀ + γ δ₁⁽²⁾ H₀⁽¹⁾, w₁⁺ = w₁ + γ δ₁⁽²⁾ H₁⁽¹⁾
• δ₁⁽¹⁾ = φ′(H₁⁽¹⁾) δ₁⁽²⁾ w₁
• w₂⁺ = w₂ + γ δ₁⁽¹⁾ H₀⁽⁰⁾, w₃⁺ = w₃ + γ δ₁⁽¹⁾ H₁⁽⁰⁾
• where w₀ = w₀,₁⁽²⁾, w₁ = w₁,₁⁽²⁾, w₂ = w₀,₁⁽¹⁾, w₃ = w₁,₁⁽¹⁾
• 33. Why Does a DNN Work?
– It is amazing! However, why does it work?
– For an FNN, this is to ask whether the following approximation problem has a solution:
• Given g ϵ { h | h: D → S where D ⊆ Rn and S ⊆ Rm } and δ > 0, find a W ϵ RN such that ||f(∗,W) − g|| < δ.
– Universal approximation theorem (S):
• Let φ(·) be a bounded and monotonically increasing continuous function. Let Iₘ denote the m-dimensional unit hypercube [0,1]ᵐ, and let C(Iₘ) denote the space of continuous functions on Iₘ. Then, given any function f ϵ C(Iₘ) and ε > 0, there exist an integer N, real constants vᵢ, bᵢ ϵ R and real vectors wᵢ ϵ Rᵐ, i = 1, …, N, such that |F(x) − f(x)| < ε for all x ϵ Iₘ, where F(x) = Σᵢ₌₁ᴺ vᵢ φ(wᵢᵀx + bᵢ). That is, functions of this form can approximate f, for any such choice of φ.
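A small demonstration of the theorem's form F(x) = Σᵢ vᵢ φ(wᵢᵀx + bᵢ); this sketch is not from the slides, and it assumes random inner weights with a least-squares fit of the outer coefficients vᵢ, which is just one easy way to exhibit such an F:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = lambda v: 1.0 / (1.0 + np.exp(-v))

f = lambda x: np.sin(2 * np.pi * x)     # a continuous target on [0, 1]

N = 50                                  # number of hidden units
w = rng.normal(scale=10.0, size=N)      # random inner weights w_i
c = rng.uniform(0.0, 1.0, size=N)       # sigmoid centers spread over [0, 1]
b = -w * c                              # biases b_i so that w_i * c_i + b_i = 0

x = np.linspace(0.0, 1.0, 200)
Phi = phi(np.outer(x, w) + b)           # Phi[j, i] = phi(w_i * x_j + b_i)
v, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)  # fit outer coefficients v_i

F = Phi @ v                             # F(x) = sum_i v_i * phi(w_i x + b_i)
print("max |F - f| on the grid:", np.abs(F - f(x)).max())
```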
• 34. Why Does a DNN Work?
– Its corresponding network has only one hidden layer.
• NOTE: this is not even a general case of one hidden layer; it is a special case. WHY?
• However, it is powerful and encouraging from the mathematical perspective.
• 35. Why Does a DNN Work?
– General networks have a corresponding general version of the Universal Approximation Theorem:
• 36. Why Does a DNN Work?
– Universal approximation theorem (G):
• Let φ(·) be a bounded and monotonically increasing continuous function. Let S be a compact space in Rm, and let C(S) = { g | g: S ⊂ Rm → Rn is continuous }. Then, given any function f ϵ C(S) and ε > 0, there exists an FNN as shown above whose network function F satisfies ||F(x) − f(x)|| < ε for all x ϵ S, again for any such choice of φ.
– It seems both shallow and deep neural networks can construct an approximation to a given function.
• Which is better?
• Or which is more efficient, in terms of using fewer nodes?
• 37. Why Does a DNN Work?
– Mathematical foundation of neural networks:
• 38. Those DNNs in Action
– A DNN has three elements:
• Architecture: the graph, weights/biases, activation functions
• Activity Rule: weights/biases, activation functions
• Learning Rule: a typical one is the backpropagation algorithm
– The architecture basically determines the capability of a specific DNN:
• Different architectures are suitable for different applications.
• The most general architecture of an ANN is a DAG (directed acyclic graph).
• 39. Those DNNs in Action
– There are a few well-known categories of DNNs:
• 40. What Are the Challenges?
– Given a specific problem, there are a few questions to answer before starting the journey with DNNs:
• Do you understand the problem that you need to solve?
• Do you really want to solve this problem with a DNN, and why? Do you have an alternative yet effective solution?
• Do you know how to describe the problem in DNN terms mathematically?
• Do you know how to implement a DNN, beyond a few APIs and sizzling hype?
• How will you collect sufficient data for training?
• How will you solve the problem efficiently and cost-effectively?
• 41. What Are the Challenges?
– 1st challenge: a full-mesh (fully connected) network suffers the curse of dimensionality.
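For instance (an illustrative count, not from the slides): fully connecting a 1000×1000 grayscale image (10⁶ inputs) to a single hidden layer of 10,000 nodes already requires 10¹⁰ weights, before any further layers are added.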
• 42. What Are the Challenges?
– Many FNN tasks do not need a full-mesh network.
– For example, if we can present the input vector as a grid, nearest-neighborhood models can be used to construct an effective FNN with far fewer connections:
• Image recognition
• GO (圍棋): a game that two players play on a 19×19 grid of lines.
• 43. What Are the Challenges?
– The 2nd challenge is how to describe a technical problem in terms of a DNN, i.e., mathematical modeling. There are generally two approaches:
• Applying a well-studied DNN architecture to describe the problem. Deep understanding of the specific network is usually required! Two general DNN architectures are well known:
– FNN: feedforward neural network. Its special architecture, the CNN (convolutional neural network), is widely used in many applications such as image recognition and GO.
– RNN: recurrent neural network. Its special architecture is the LSTM (long short-term memory), which has been applied successfully in speech recognition and language translation.
– For example, if we want to try an FNN, how do we describe the problem in terms of <Input vector, Output vector> with fixed dimensions?
• Creating a novel DNN architecture from the ground up if none of the existing models fits your problem. Deep understanding of DNN theory and algorithms is required.
• 44. What Are the Challenges?
– Handwriting digit recognition: modeling this problem is straightforward.
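A minimal sketch of that modeling step, assuming MNIST-style 28×28 grayscale images (the dimensions and names are illustrative assumptions):

```python
import numpy as np

def encode(image, label):
    """Cast a 28x28 grayscale digit image into the <Input, Target> form
    used above: I is a 784-dim vector, T a 10-dim one-hot vector."""
    I = image.reshape(784).astype(np.float64) / 255.0   # flatten, scale to [0, 1]
    T = np.zeros(10)
    T[label] = 1.0                                      # one-hot target
    return I, T

# e.g. a fake sample: a random image labelled '7'
I, T = encode(np.random.randint(0, 256, size=(28, 28)), 7)
```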
• 45. What Are the Challenges?
– Image recognition is also straightforward to model.
• 46. What Are the Challenges?
– However, due to the curse of dimensionality, we use a special FNN: the convolutional neural network (CNN).
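To see why a CNN tames the dimensionality, here is a bare-bones 2-D convolution sketch (illustrative, not the deck's code): a single small kernel is reused across the whole image, so the layer carries only a handful of shared weights instead of one weight per input-output pair.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries):
    the same small kernel is slid over the image, so the layer needs only
    kernel.size weights, regardless of the image size."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

image = np.random.rand(28, 28)
edge = np.array([[1.0, 0.0, -1.0]] * 3)   # a 3x3 vertical-edge kernel: 9 shared weights
fmap = conv2d(image, edge)                # 26x26 feature map
```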
• 47. What Are the Challenges?
– How to construct a DNN to describe language translation? LSTM networks are used there.
– How to construct a DNN to describe the problem of malware classification?
– How to construct a DNN to describe network traffic for security purposes?
• 48. What Are the Challenges?
– The 3rd challenge is how to collect sufficient training data: the required accuracy can only be achieved with enough training samples. WHY?
• 49. What Are the Challenges?
– The 4th challenge is how to identify the various talents needed to deliver a DNN solution to a specific problem:
• Someone who knows how to use existing DL APIs such as TensorFlow.
• Someone who understands various DNN architectures in depth, so that he/she can evaluate and identify a suitable DNN architecture for the problem.
• Someone who understands DNN theory and algorithms in depth, so that he/she can create and design a novel DNN from the ground up.
• 50. Successful Stories
– ImageNet: 1M+ images, 1000+ categories; CNNs.
• 51. Successful Stories
– Unsupervised learning with neural networks… YouTube and the Cat.
• 52. Successful Stories
– AlphaGo, a significant milestone in AI history (more significant than Deep Blue).
– Both its Policy Network and Value Network are CNNs.
• 53. Successful Stories
– Google Neural Machine Translation… an LSTM (Long Short-Term Memory) network.
• 54. Successful Stories
– Microsoft speech recognition… LSTM and TDNN (Time-Delay Neural Network) architectures.
• 55. Security Problems
– Not disclosed in the public version.
• 56. Summary
– What a DNN is
– How a DNN works
– Why a DNN works
– The categories of DNNs
– Some challenges
– Well-known stories
– Security problems
• 57. Quiz
– Why do we choose the activation function to be nonlinear?
– Why deep? Why are deep networks better than shallow ones?
– What is the difference between online and batch-mode training?
– Will online and batch-mode training converge to the same solution?
– Why do we need the backpropagation algorithm?
– Why do we apply convolutional neural networks to image recognition?
• 58. Quiz
– If we solve a problem with an FNN:
• How many layers deep should we go?
• How many nodes are good for each layer?
• How do we estimate and optimize the cost?
– Is it guaranteed that the backpropagation algorithm converges to a solution?
– Why do we need sufficient data for training in order to achieve a certain accuracy?
– Can a DNN do tasks beyond extending human capabilities or automating intensive manual tasks?
• To prove a mathematical theorem… or to introduce an interesting concept… or to appreciate a poem… or to love…
• 59. Quiz
– AlphaGo is trained on a 19×19 lattice. If we play GO on a 20×20 board, can AlphaGo handle it?
– ImageNet models are trained for 1000 categories. If we add a 1001st category, what should we do?
– People consider a specific DNN a black box. Why?
– More questions from you…
• 60. What Else?
– What to share next from me? Why do you care?
• Various DNNs: principles, examples, analysis and experiments… ImageNet, AlphaGo, GNMT, etc.
• My Ph.D. work and its relevance to DNNs
• A little history of AI and artificial neural networks
• Various schools of the AI discipline
• Strong AI vs. Weak AI
• 61. What Else?
– What to share next from me? Why do you care?
• Questions to ponder when thinking about AI:
– Are we able to understand how we learn?
– Are we going in the right directions mathematically and scientifically?
– Are there simple principles for cognition, like those Newton and Einstein established for understanding our universe?
– What do we lack between now and the coming of so-called Strong AI?
• 62. What Else?
– What to share next from me? Why do you care?
• Questions about who we are:
– Are we created?
– Are we the AI of the creator?
• My little theory about the Universe