Introduction to Deep Neural Network
Liwei Ren, Ph.D
San Jose, California, November 2016
Agenda
• What a DNN is
• How a DNN works
• Why a DNN works
• Those DNNs in action
• Where the challenges are
• Successful stories
• Security problems
• Summary
• Quiz
• What else
What is a DNN?
• DNN and AI in the secular world [image slides]
What is a DNN?
• DNN in the technical world [image slides]
What is a DNN?
• Categorizing the DNNs: [diagram]
What is a DNN?
• Three technical elements
– Architecture: the graph, weights/biases, activation functions
– Activity Rule: weights/biases, activation functions
– Learning Rule: a typical one is the backpropagation algorithm
• Three masters in this area: [photos]
What is a DNN?
• Given a practical problem, we have two approaches to solve it. [diagram]
What is a DNN?
• An example: image recognition [image slides]
What is a DNN?
• In the mathematical world
– A DNN is a mathematical function f: D → S, where D ⊆ R^n and S ⊆ R^m, which is constructed by a directed-graph-based architecture.
– A DNN is also a composition of functions from a network of primitive functions.
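To make the "composition of primitive functions" view concrete, here is a minimal sketch (not from the slides; the layer sizes, the sigmoid choice, and the name `dense` are illustrative):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def dense(W, b, phi):
    """Primitive function at one layer: x -> phi(W x + b)."""
    return lambda x: phi(W @ x + b)

# A 3-4-2 network f = f2 ∘ f1 : D ⊆ R^3 -> S ⊆ R^2
rng = np.random.default_rng(0)
f1 = dense(rng.normal(size=(4, 3)), np.zeros(4), sigmoid)
f2 = dense(rng.normal(size=(2, 4)), np.zeros(2), sigmoid)
f = lambda x: f2(f1(x))

print(f(np.array([0.5, -1.0, 2.0])))  # a point in S ⊆ R^2
```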
What is a DNN?
• We denote a feed-forward DNN function by O = f(I), which is determined by a few parameters G, Φ, W, B
• Hyper-parameters:
– G is the directed graph which presents the structure
– Φ presents one or multiple activation functions for activating the nodes
• Parameters:
– W is the vector of weights relevant to the edges
– B is the vector of biases relevant to the nodes
What is a DNN?
• Activation at a node: [diagram]
• Activation function: [plots]
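The activation plots appear as images in the deck; as a minimal sketch, the common choices can be written directly (the sample weights below are arbitrary):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def relu(v):
    return np.maximum(0.0, v)

# Activation at a node: out = phi(sum_i w_i * x_i + b)
w, b = np.array([0.4, -0.6]), 0.1
x = np.array([1.0, 2.0])
net = w @ x + b
print(sigmoid(net), np.tanh(net), relu(net))
```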
What is a DNN?
• G = (V, E) is a graph and Φ is a set of activation functions.
• <G, Φ> constructs a family of functions F:
– F(G,Φ) = { f | f is a function constructed by <G, Φ, W> where W ϵ R^N }
• N = total number of weights at all nodes of the output layer and hidden layers.
• Each f(I) can be denoted by f(I, W).
What is a DNN?
• Mathematically, a DNN-based supervised machine learning technology can be described as follows:
– Given g ϵ { h | h: D → S where D ⊆ R^n and S ⊆ R^m } and δ > 0, find f ϵ F(G,Φ) such that ‖f − g‖ < δ.
• Essentially, it is to identify a W ϵ R^N such that ‖f(∗, W) − g‖ < δ
• However, in practice, g is not explicitly expressed. It usually appears as a sequence of samples:
– { <I(j), T(j)> | T(j) = g(I(j)), j = 1, 2, …, M }
• where I(j) is an input vector and T(j) is its corresponding target vector.
How Does a DNN Work?
• Since the function g is not explicitly expressed, we are not able to calculate ‖g − f(∗, W)‖.
• Instead, we evaluate the error function E(W) = (1/2M) ∑_j ‖T(j) − f(I(j), W)‖²
• We expect to determine W such that E(W) < δ
• How to identify W ϵ R^N so that E(W) < δ? Let's solve the nonlinear optimization problem min{ E(W) | W ϵ R^N }, i.e.:
min{ (1/2M) ∑_j ‖T(j) − f(I(j), W)‖² | W ϵ R^N } (P1)
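As a minimal sketch (assuming a network function `f(I, W)` and a list of samples are given), the batch error of (P1) is just a mean of squared residuals:

```python
import numpy as np

def batch_error(f, W, samples):
    """E(W) = (1/2M) * sum_j ||T(j) - f(I(j), W)||^2 over M samples."""
    M = len(samples)
    return sum(np.sum((T - f(I, W)) ** 2) for I, T in samples) / (2.0 * M)
```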
How Does a DNN Work?
• (P1) is for batch-mode training; however, it is too expensive.
• In order to reduce the computational cost, a sequential mode is introduced.
• Picking <I, T> ϵ { <I(1), T(1)>, <I(2), T(2)>, …, <I(M), T(M)> } sequentially, let the output of the network be O = f(I, W) for any W:
• Error function E(W) = ‖T − f(I, W)‖²/2 = ∑_j (Tj − Oj)²/2
• Each Oj can be considered as a function of W. We denote it as Oj(W).
• We have the optimization problem for training with sequential mode:
– min{ ∑_j (Tj − Oj(W))²/2 | W ϵ R^N } (P2)
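A sketch of the contrast (the helper names are illustrative): batch mode forms one gradient from all M samples per update, while sequential mode updates after every sample:

```python
def online_epoch(W, samples, grad_E, gamma=0.1):
    """Sequential (online) mode: one update of W per sample <I, T>.
    grad_E(W, I, T) is assumed to return the per-sample gradient dE/dW."""
    for I, T in samples:
        W = W - gamma * grad_E(W, I, T)
    return W
```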
How Does a DNN Work?
• One may ask whether we get the same solution for both batch mode and sequential mode.
• BTW:
– batch mode = offline mode
– sequential mode = online mode
• We focus on online mode in this talk
How Does a DNN Work?
• How to solve the unconstrained nonlinear optimization problem (P2)?
• The general approach of unconstrained nonlinear optimization is to find local minima of E(W) by using the iterative process of Gradient Descent.
• ∂E = (∂E/∂W1, ∂E/∂W2, …, ∂E/∂WN)
• The iterations:
– ΔWj = −γ ∂E/∂Wj for j = 1, …, N
– Updating W in each step by
• Wj^(k+1) = Wj^(k) − γ ∂E(W^(k))/∂Wj for j = 1, …, N (A1)
• until E(W^(k+1)) < δ or E(W^(k+1)) cannot be reduced anymore
How Does a DNN Work?
• The algorithm of Gradient Descent: [algorithm slide]
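The algorithm itself is shown as an image; below is a minimal numpy sketch of iteration (A1), using a finite-difference gradient as a stand-in (purely illustrative; in practice the gradient comes from backpropagation):

```python
import numpy as np

def grad(E, W, eps=1e-6):
    """Numerical gradient of E at W via central differences."""
    g = np.zeros_like(W)
    for j in range(W.size):
        d = np.zeros_like(W)
        d[j] = eps
        g[j] = (E(W + d) - E(W - d)) / (2 * eps)
    return g

def gradient_descent(E, W, gamma=0.1, delta=1e-9, max_iter=10000):
    for _ in range(max_iter):
        W_next = W - gamma * grad(E, W)   # iteration (A1)
        if E(W_next) < delta:             # reached the target error
            return W_next
        if E(W_next) >= E(W):             # cannot be reduced anymore
            return W
        W = W_next
    return W

# Example: E(W) = ||W - c||^2 / 2 has its minimum at W = c.
c = np.array([1.0, -2.0])
print(gradient_descent(lambda W: 0.5 * np.sum((W - c) ** 2), np.zeros(2)))
```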
How Does a DNN Work?
• From the perspective of mathematics, the process of Gradient Descent is straightforward.
• However, from the perspective of scientific computing, it is quite challenging to calculate the values of all ∂E/∂Wj for j = 1, …, N:
– The complexity of presenting each ∂E/∂Wj where j = 1, …, N.
– There are (k+1)-layer function compositions for a DNN of k hidden layers.
How Does a DNN Work?
• For example, we have a very simple network as follows, with the activation function φ(v) = 1/(1 + e^(−v)). [diagram]
• E(W) = [T − f(I, W)]²/2 = [T − φ(w1 φ(w3 I + w2) + w0)]²/2, we have:
– ∂E/∂w0 = −[T − φ(w1 φ(w3 I + w2) + w0)] φ′(w1 φ(w3 I + w2) + w0)
– ∂E/∂w1 = −[T − φ(w1 φ(w3 I + w2) + w0)] φ′(w1 φ(w3 I + w2) + w0) φ(w3 I + w2)
– ∂E/∂w2 = −w1 [T − φ(w1 φ(w3 I + w2) + w0)] φ′(w1 φ(w3 I + w2) + w0) φ′(w3 I + w2)
– ∂E/∂w3 = −I w1 [T − φ(w1 φ(w3 I + w2) + w0)] φ′(w1 φ(w3 I + w2) + w0) φ′(w3 I + w2)
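A quick numeric check of these four partials (a sketch; I, T and the weights are arbitrary test values, and φ′ uses the sigmoid identity φ′ = φ(1 − φ)):

```python
import numpy as np

phi = lambda v: 1 / (1 + np.exp(-v))
dphi = lambda v: phi(v) * (1 - phi(v))  # sigmoid derivative

def E(w, I=0.5, T=1.0):
    w0, w1, w2, w3 = w
    return (T - phi(w1 * phi(w3 * I + w2) + w0)) ** 2 / 2

w, I, T = np.array([0.1, 0.2, 0.3, 0.4]), 0.5, 1.0
net_h = w[3] * I + w[2]              # hidden node's net input
net_o = w[1] * phi(net_h) + w[0]     # output node's net input
r = -(T - phi(net_o)) * dphi(net_o)  # factor shared by all four partials
analytic = np.array([r,
                     r * phi(net_h),
                     r * w[1] * dphi(net_h),
                     r * w[1] * dphi(net_h) * I])

eps = 1e-6
numeric = np.array([(E(w + eps * e) - E(w - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(analytic, numeric))  # True
```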
How Does a DNN Work?
• Let's imagine a network of N inputs, M outputs and K hidden layers, each of which has L nodes.
– It is a daunting task to express ∂E/∂wj explicitly. The last simple example already shows this.
• The backpropagation (BP) algorithm was proposed as a rescue:
– Main idea: the weights of the (k−1)-th hidden layer can be expressed by the k-th layer recursively.
– We can start with the output layer, which is considered as the (K+1)-th layer.
How Does a DNN Work?
• The BP algorithm has the following major steps:
1. Feed-forward computation
2. Back-propagation to the output layer
3. Back-propagation to the hidden layers
4. Weight updates
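A compact sketch of these four steps for a fully connected network with sigmoid activations (the layer shapes, γ, and the list-of-matrices representation are illustrative, not from the slides):

```python
import numpy as np

phi = lambda v: 1 / (1 + np.exp(-v))

def bp_step(Ws, bs, I, T, gamma=0.5):
    """One online BP update over weight matrices Ws and bias vectors bs."""
    # 1. Feed-forward computation
    acts = [I]
    for W, b in zip(Ws, bs):
        acts.append(phi(W @ acts[-1] + b))
    # 2. Back-propagation to the output layer (phi' = O(1-O) for sigmoid)
    O = acts[-1]
    delta = (T - O) * O * (1 - O)
    # 3. Back-propagation to the hidden layers + 4. weight updates
    for k in reversed(range(len(Ws))):
        H = acts[k]
        delta_prev = (Ws[k].T @ delta) * H * (1 - H)  # recursion to layer below
        Ws[k] += gamma * np.outer(delta, H)           # w+ = w + gamma * delta * H
        bs[k] += gamma * delta
        delta = delta_prev
    return Ws, bs

# Example: a 3-4-2 network, one training step on a random sample.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
bp_step(Ws, bs, rng.random(3), np.array([1.0, 0.0]))
```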
How Does a DNN Work?
• A general DNN can be drawn as follows [diagram]
How Does a DNN Work?
• How to express the weights of the (k−1)-th hidden layer by the weights of the k-th layer recursively? [derivation slide]
How Does a DNN Work?
• Let us experience BP with our small network.
– E(W) = [T − f(I, W)]²/2 = [T − φ(w1 φ(w3 I + w2) + w0)]²/2.
• ∂E/∂w0 = −φ′(O) (T − O)
• ∂E/∂w1 = −φ′(O) (T − O) H
• ∂E/∂w2 = −φ′(O) (T − O) φ′(H) w1 · 1
• ∂E/∂w3 = −φ′(O) (T − O) φ′(H) w1 · I
– Let H0^(1) = 1, H1^(1) = H = φ(w3 I + w2), H1^(0) = I; we verify the following:
• δ1^(2) = φ′(O) (T − O)
• w0^+ = w0 + γ δ1^(2) H0^(1), w1^+ = w1 + γ δ1^(2) H1^(1)
• δ1^(1) = φ′(H1^(1)) δ1^(2) w1
• w2^+ = w2 + γ δ1^(1) H0^(0), w3^+ = w3 + γ δ1^(1) H1^(0)
• where w0 = w0,1^(2), w1 = w1,1^(2), w2 = w0,1^(1), w3 = w1,1^(1)
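To see the δ-form reproduce the direct partials from the earlier slide, a tiny check (same illustrative test values as before):

```python
import numpy as np

phi = lambda v: 1 / (1 + np.exp(-v))
w0, w1, w2, w3 = 0.1, 0.2, 0.3, 0.4
I, T, gamma = 0.5, 1.0, 0.5

H = phi(w3 * I + w2)            # H1^(1)
O = phi(w1 * H + w0)
d2 = O * (1 - O) * (T - O)      # delta1^(2) = phi'(O)(T - O)
d1 = H * (1 - H) * d2 * w1      # delta1^(1) = phi'(H1^(1)) delta1^(2) w1

# The updates w+ = w + gamma * delta * H equal w - gamma * dE/dw:
print(w0 + gamma * d2 * 1, w1 + gamma * d2 * H)   # w0+, w1+
print(w2 + gamma * d1 * 1, w3 + gamma * d1 * I)   # w2+, w3+
```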
Why Does a DNN Work?
• It is amazing! However, why does it work?
• For an FNN, it is to ask whether the following approximation problem has a solution:
– Given g ϵ { h | h: D → S where D ⊆ R^n and S ⊆ R^m } and δ > 0, find a W ϵ R^N such that ‖f(∗, W) − g‖ < δ.
• Universal approximation theorem (S):
– Let φ(.) be a bounded and monotonically-increasing continuous function. Let I_m denote the m-dimensional unit hypercube [0,1]^m. The space of continuous functions on I_m is denoted by C(I_m). Then, given any function f ϵ C(I_m) and ε > 0, there exist an integer N, real constants vi, bi ϵ R and real vectors wi ϵ R^m, where i = 1, …, N, such that
|F(x) − f(x)| < ε
for all x ϵ I_m, where F(x) = ∑_{i=1}^{N} vi φ(wi^T x + bi) is an approximation to the function f which is independent of φ.
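A small empirical illustration of the theorem (entirely illustrative: the target function, N, and the random-feature least-squares fit are my choices, not the slide's): fit F(x) = ∑ vi φ(wi x + bi) to a 1-D continuous function by solving for the vi with random wi, bi.

```python
import numpy as np

phi = lambda v: 1 / (1 + np.exp(-v))
rng = np.random.default_rng(0)

N = 50                              # hidden units
x = np.linspace(0.0, 1.0, 200)
f = np.sin(2 * np.pi * x)           # a target f in C([0, 1])

w = rng.normal(scale=10.0, size=N)  # random inner weights w_i
b = rng.normal(scale=5.0, size=N)   # random biases b_i
Phi = phi(np.outer(x, w) + b)       # 200 x N matrix of phi(w_i x + b_i)
v, *_ = np.linalg.lstsq(Phi, f, rcond=None)  # solve for the v_i

print(np.max(np.abs(Phi @ v - f)))  # typically a small sup-norm error
```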
Why Does a DNN Work?
• Its corresponding network with only one hidden layer [diagram]
– NOTE: this is not even a general case for one hidden layer. It is a special case. WHY?
– However, it is powerful and encouraging from the mathematical perspective.
Why Does a DNN Work?
• The general networks have a general version of the Universal Approximation Theorem accordingly: [diagram]
Why Does a DNN Work?
• Universal approximation theorem (G):
– Let φ(.) be a bounded and monotonically-increasing continuous function. Let S be a compact space in R^m. Let C(S) = { g | g: S ⊂ R^m → R^n is continuous }. Then, given any function f ϵ C(S) and ε > 0, there exists an FNN as shown above which constructs the network function F such that
‖F(x) − f(x)‖ < ε
where F is an approximation to the function f which is independent of φ.
• It seems both shallow and deep neural networks can construct an approximation to a given function.
– Which is better?
– Or which is more efficient in terms of using fewer nodes?
Why Does a DNN Work?
• Mathematical foundation of neural networks: [diagram]
Those DNNs in action
• A DNN has three elements:
– Architecture: the graph, weights/biases, activation functions
– Activity Rule: weights/biases, activation functions
– Learning Rule: a typical one is the backpropagation algorithm
• The architecture basically determines the capability of a specific DNN
– Different architectures are suitable for different applications.
– The most general architecture of an ANN is a DAG (directed acyclic graph).
Those DNNs in action
• There are a few well-known categories of DNNs. [table]
What Are the Challenges?
• Given a specific problem, there are a few questions before one starts the journey with DNNs:
– Do you understand the problem that you need to solve?
– Do you really want to solve this problem with a DNN, and why?
• Do you have an alternative yet effective solution?
– Do you know how to describe the problem in DNN terms mathematically?
– Do you know how to implement a DNN, beyond a few APIs and sizzling hype?
– How to collect sufficient data for training?
– How to solve the problem efficiently and cost-effectively?
What Are the Challenges?
• 1st Challenge:
– A full-mesh network has the curse of dimensionality. [diagram]
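A back-of-the-envelope count (my numbers, not the slide's) shows why full mesh blows up:

```python
# One fully connected layer needs inputs x units weights (plus biases).
pixels = 200 * 200 * 3            # a modest RGB image, flattened
units = 1000                      # first hidden layer
print(pixels * units + units)     # 120,001,000 parameters in one layer
```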
What Are the Challenges?
• Many tasks of an FNN do not need a full-mesh network.
• For example, if we can present the input vector as a grid, nearest-neighborhood models can be used when constructing an effective FNN, which can reduce connections:
– Image recognition
– GO (圍棋): a game that two players play on a 19x19 grid of lines.
What Are the Challenges?
• The 2nd challenge is how to describe a technical problem in terms of a DNN, i.e., mathematical modeling. There are generally two approaches:
– Applying a well-learned DNN architecture to describe the problem. Deep understanding of the specific network is usually required!
• Two general DNN architectures are well-known:
– FNN: feedforward neural network. Its special architecture CNN (convolutional neural network) is widely used in many applications such as image recognition, GO, etc.
– RNN: recurrent neural network. Its special architecture is LSTM (long short-term memory), which has been applied successfully in speech recognition, language translation, etc.
• For example, if we want to try an FNN, how do we describe the problem in terms of <Input vector, Output vector> with fixed dimension?
– Creating a novel DNN architecture from the ground up if none of the existing models fits your problem. Deep understanding of DNN theory/algorithms is required.
What Are the Challenges?
• Handwriting digit recognition:
– Modeling this problem is straightforward [diagram]
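A sketch of the modeling step (the 28x28 image and 10-class one-hot encoding are the standard digit-recognition convention, assumed here rather than read off the slide):

```python
import numpy as np

image = np.random.rand(28, 28)    # placeholder for a grayscale digit image
I = image.reshape(784)            # fixed-dimension input vector in R^784
T = np.zeros(10)
T[7] = 1.0                        # one-hot target vector for the digit "7"
```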
What Are the Challenges?
• Image recognition is also straightforward [diagram]
What Are the Challenges?
• However, due to the curse of dimensionality, we can use a special FNN:
– Convolutional neural network (CNN) [diagram]
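A minimal sketch of the weight sharing that lets a CNN escape the full mesh (sizes are illustrative): one small kernel slides over the whole grid, so 9 weights cover every location.

```python
import numpy as np

def conv2d(x, k):
    """'Valid' 2-D cross-correlation of grid x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.random.rand(19, 19)   # e.g. a GO-board-sized input grid
k = np.random.rand(3, 3)     # 9 shared weights for the entire grid
print(conv2d(x, k).shape)    # (17, 17)
```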
What Are the Challenges?
• How to construct a DNN to describe language translation?
– They use LSTM networks
• How to construct a DNN to describe the problem of malware classification?
• How to construct a DNN to describe network traffic for security purposes?
What Are the Challenges?
• The 3rd challenge is how to collect sufficient training data. To achieve the required accuracy, sufficient training data is necessary. WHY? [diagram]
What Are the Challenges?
• The 4th challenge is how to identify various talents for providing a DNN solution to specific problems:
– Who knows how to use existing DL APIs such as TensorFlow?
– Who understands various DNN architectures in depth, so that he/she knows how to evaluate and identify a suitable DNN architecture to solve the problem?
– Who understands the theory and algorithms of DNNs in depth, so that he/she can create and design a novel DNN from the ground up?
Successful Stories
• ImageNet: 1M+ images, 1000+ categories, CNN [images]
Successful Stories
• Unsupervised learning neural networks… YouTube and the Cat. [images]
Successful Stories
• AlphaGo, a significant milestone in AI history
– More significant than Deep Blue
• Both the Policy Network and the Value Network are CNNs.
Successful Stories
• Google Neural Machine Translation… LSTM (Long Short-Term Memory) network
Successful Stories
• Microsoft Speech Recognition… LSTM and TDNN (Time-Delay Neural Networks)
Security Problems
• Not disclosed for the public version.
Summary
• What a DNN is
• How a DNN works
• Why a DNN works
• The categories of DNNs
• Some challenges
• Well-known stories
• Security problems
Quiz
• Why do we choose the activation function as a nonlinear function?
• Why deep? Why are deep networks better than shallow networks?
• What is the difference between online and batch mode training?
• Will online and batch mode training converge to the same solution?
• Why do we need the backpropagation algorithm?
• Why do we apply convolutional neural networks to image recognition?
Quiz
• If we solve a problem with an FNN,
– how many layers deep should we go?
– How many nodes are good for each layer?
– How to estimate and optimize the cost?
• Is it guaranteed that the backpropagation algorithm converges to a solution?
• Why do we need sufficient data for training in order to achieve certain accuracy?
• Can a DNN do tasks beyond extending human capabilities or automating extensive manual tasks?
– To prove a mathematical theorem… or to introduce an interesting concept… or to appreciate a poem… or to love…
Quiz
• AlphaGo is trained for a 19x19 lattice. If we play the GO game on a 20x20 board, can AlphaGo handle it?
• ImageNet is trained for 1000 categories. If we add the 1001-th category, what should we do?
• People do consider a special DNN as a black box. Why?
• More questions from you…
What Else?
• What to share next from me? Why do you care?
– Various DNNs: principles, examples, analysis and experiments…
• ImageNet, AlphaGo, GNMT, etc.
– My Ph.D. work and its relevance to DNNs
– A little history of AI and artificial neural networks
– Various schools of the AI discipline
– Strong AI vs. Weak AI
What Else?
• What to share next from me? Why do you care?
– Questions when thinking about AI:
• Are we able to understand how we learn?
• Are we going in the right directions mathematically and scientifically?
• Are there simple principles for cognition, like what Newton and Einstein established for understanding our universe?
• What do we lack between now and the coming of so-called Strong AI?
What Else?
• What to share next from me? Why do you care?
• Questions about who we are.
– Are we created?
– Are we the AI of the creator?
• My little theory about the Universe

More Related Content

What's hot

Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsAlexander Litvinenko
 
Chapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationChapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationVarun Ojha
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationJason Anderson
 
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارشبررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارشپروژه مارکت
 
Lecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D SignalLecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D SignalVARUN KUMAR
 
Lp and ip programming cp 9
Lp and ip programming cp 9Lp and ip programming cp 9
Lp and ip programming cp 9M S Prasad
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes FamilyKota Matsui
 
Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)Tatsuya Yokota
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkAshwin Rao
 
Detailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss FunctionDetailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss Function범준 김
 

What's hot (20)

Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problems
 
Sparse autoencoder
Sparse autoencoderSparse autoencoder
Sparse autoencoder
 
K-means and GMM
K-means and GMMK-means and GMM
K-means and GMM
 
Chapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image TransformationChapter 4 Image Processing: Image Transformation
Chapter 4 Image Processing: Image Transformation
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image Generation
 
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارشبررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
بررسی دو روش شناسایی سیستم های متغیر با زمان به همراه شبیه سازی و گزارش
 
Perceptron
PerceptronPerceptron
Perceptron
 
Lecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D SignalLecture 14 Properties of Fourier Transform for 2D Signal
Lecture 14 Properties of Fourier Transform for 2D Signal
 
Lp and ip programming cp 9
Lp and ip programming cp 9Lp and ip programming cp 9
Lp and ip programming cp 9
 
Neural Processes Family
Neural Processes FamilyNeural Processes Family
Neural Processes Family
 
Neural Networks - How do they work?
Neural Networks - How do they work?Neural Networks - How do they work?
Neural Networks - How do they work?
 
Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)Linked CP Tensor Decomposition (presented by ICONIP2012)
Linked CP Tensor Decomposition (presented by ICONIP2012)
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
Backpropagation (DLAI D3L1 2017 UPC Deep Learning for Artificial Intelligence)
 
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
 
Max net
Max netMax net
Max net
 
Detailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss FunctionDetailed Description on Cross Entropy Loss Function
Detailed Description on Cross Entropy Loss Function
 
Chapter 16
Chapter 16Chapter 16
Chapter 16
 
Aleksander gegov
Aleksander gegovAleksander gegov
Aleksander gegov
 
deep learning
deep learningdeep learning
deep learning
 

Viewers also liked

Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications Ahmed_hashmi
 
neural network
neural networkneural network
neural networkSTUDENT
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural networkNagarajan
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSREHMAT ULLAH
 
Artificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google BrainArtificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google BrainRawan Al-Omari
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networksstellajoseph
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkDEEPASHRI HK
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxChun-Hao Chang
 
Introduction to Artificial Neural Network
Introduction to Artificial Neural Network Introduction to Artificial Neural Network
Introduction to Artificial Neural Network Qingkai Kong
 
Neural network 20161210_jintaekseo
Neural network 20161210_jintaekseoNeural network 20161210_jintaekseo
Neural network 20161210_jintaekseoJinTaek Seo
 
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측FAST CAMPUS
 
Árboles de Decisión en Weka
Árboles de Decisión en WekaÁrboles de Decisión en Weka
Árboles de Decisión en WekaLorena Quiñónez
 
Perceptron Simple y Regla Aprendizaje
Perceptron  Simple y  Regla  AprendizajePerceptron  Simple y  Regla  Aprendizaje
Perceptron Simple y Regla AprendizajeRoberth Figueroa-Diaz
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a functionTaisuke Oe
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsGilles Louppe
 

Viewers also liked (20)

企业安全市场综述
企业安全市场综述 企业安全市场综述
企业安全市场综述
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
neural network
neural networkneural network
neural network
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
 
聊一聊大明朝的火器
聊一聊大明朝的火器聊一聊大明朝的火器
聊一聊大明朝的火器
 
Artificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google BrainArtificial Neural Network Seminar - Google Brain
Artificial Neural Network Seminar - Google Brain
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
硅谷的那点事儿
硅谷的那点事儿硅谷的那点事儿
硅谷的那点事儿
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
 
Introduction to Artificial Neural Network
Introduction to Artificial Neural Network Introduction to Artificial Neural Network
Introduction to Artificial Neural Network
 
Neural network 20161210_jintaekseo
Neural network 20161210_jintaekseoNeural network 20161210_jintaekseo
Neural network 20161210_jintaekseo
 
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
Boosted decision tree를 활용한 lending club의 채무자 원리금 상환 여부 예측
 
Árboles de Decisión en Weka
Árboles de Decisión en WekaÁrboles de Decisión en Weka
Árboles de Decisión en Weka
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
 
Perceptron Simple y Regla Aprendizaje
Perceptron  Simple y  Regla  AprendizajePerceptron  Simple y  Regla  Aprendizaje
Perceptron Simple y Regla Aprendizaje
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a function
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
Neural network
Neural networkNeural network
Neural network
 

Similar to Introduction to Deep Neural Network

Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learningsimaokasonse
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihmSajid Marwat
 
Find all hazards in this circuit. Redesign the circuit as a three-le.pdf
Find all hazards in this circuit.  Redesign the circuit as a three-le.pdfFind all hazards in this circuit.  Redesign the circuit as a three-le.pdf
Find all hazards in this circuit. Redesign the circuit as a three-le.pdfArrowdeepak
 
Chapter One.pdf
Chapter One.pdfChapter One.pdf
Chapter One.pdfabay golla
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Fabian Pedregosa
 
Digital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systemsDigital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systemsChandrashekhar Padole
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習台灣資料科學年會
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingDongang (Sean) Wang
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsEellekwameowusu
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdfnyomans1
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.Tariq Khan
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchAhmed BESBES
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexityAbbas Ali
 

Similar to Introduction to Deep Neural Network (20)

UofT_ML_lecture.pptx
UofT_ML_lecture.pptxUofT_ML_lecture.pptx
UofT_ML_lecture.pptx
 
Learning Deep Learning
Learning Deep LearningLearning Deep Learning
Learning Deep Learning
 
Time complexity.ppt
Time complexity.pptTime complexity.ppt
Time complexity.ppt
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihm
 
Find all hazards in this circuit. Redesign the circuit as a three-le.pdf
Find all hazards in this circuit.  Redesign the circuit as a three-le.pdfFind all hazards in this circuit.  Redesign the circuit as a three-le.pdf
Find all hazards in this circuit. Redesign the circuit as a three-le.pdf
 
Chapter One.pdf
Chapter One.pdfChapter One.pdf
Chapter One.pdf
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
Digital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systemsDigital Signal Processing Tutorial:Chapt 1 signal and systems
Digital Signal Processing Tutorial:Chapt 1 signal and systems
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 
Scala and Deep Learning
Scala and Deep LearningScala and Deep Learning
Scala and Deep Learning
 
RNN and sequence-to-sequence processing
RNN and sequence-to-sequence processingRNN and sequence-to-sequence processing
RNN and sequence-to-sequence processing
 
Activation function
Activation functionActivation function
Activation function
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithms
 
Cdc18 dg lee
Cdc18 dg leeCdc18 dg lee
Cdc18 dg lee
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
 
Recursion in Java
Recursion in JavaRecursion in Java
Recursion in Java
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
 
C++ and Deep Learning
C++ and Deep LearningC++ and Deep Learning
C++ and Deep Learning
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexity
 

More from Liwei Ren任力偉

信息安全领域里的创新和机遇
信息安全领域里的创新和机遇信息安全领域里的创新和机遇
信息安全领域里的创新和机遇Liwei Ren任力偉
 
移动互联网时代下创新的思维
移动互联网时代下创新的思维移动互联网时代下创新的思维
移动互联网时代下创新的思维Liwei Ren任力偉
 
非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究Liwei Ren任力偉
 
Arm the World with SPN based Security
Arm the World with SPN based SecurityArm the World with SPN based Security
Arm the World with SPN based SecurityLiwei Ren任力偉
 
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching ProblemExtending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching ProblemLiwei Ren任力偉
 
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsNear Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsLiwei Ren任力偉
 
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...Liwei Ren任力偉
 
Phase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillatorsPhase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillatorsLiwei Ren任力偉
 
On existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problemOn existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problemLiwei Ren任力偉
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool EvaluationLiwei Ren任力偉
 
IoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and SolutionsIoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and SolutionsLiwei Ren任力偉
 
Taxonomy of Differential Compression
Taxonomy of Differential CompressionTaxonomy of Differential Compression
Taxonomy of Differential CompressionLiwei Ren任力偉
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and ApplicationsLiwei Ren任力偉
 
Overview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) TechnologyOverview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) TechnologyLiwei Ren任力偉
 
DLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and AlgorithmsDLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and AlgorithmsLiwei Ren任力偉
 
Mathematical Modeling for Practical Problems
Mathematical Modeling for Practical ProblemsMathematical Modeling for Practical Problems
Mathematical Modeling for Practical ProblemsLiwei Ren任力偉
 
Securing Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the CloudSecuring Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the CloudLiwei Ren任力偉
 

More from Liwei Ren任力偉 (20)

信息安全领域里的创新和机遇
信息安全领域里的创新和机遇信息安全领域里的创新和机遇
信息安全领域里的创新和机遇
 
防火牆們的故事
防火牆們的故事防火牆們的故事
防火牆們的故事
 
移动互联网时代下创新的思维
移动互联网时代下创新的思维移动互联网时代下创新的思维
移动互联网时代下创新的思维
 
非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究非齐次特征值问题解存在性研究
非齐次特征值问题解存在性研究
 
世纪猜想
世纪猜想世纪猜想
世纪猜想
 
Arm the World with SPN based Security
Arm the World with SPN based SecurityArm the World with SPN based Security
Arm the World with SPN based Security
 
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching ProblemExtending Boyer-Moore Algorithm to an Abstract String Matching Problem
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
 
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and AlgorithmsNear Duplicate Document Detection: Mathematical Modeling and Algorithms
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
 
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo...
 
Phase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillatorsPhase locking in chains of multiple-coupled oscillators
Phase locking in chains of multiple-coupled oscillators
 
On existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problemOn existence of the solution of inhomogeneous eigenvalue problem
On existence of the solution of inhomogeneous eigenvalue problem
 
Math stories
Math storiesMath stories
Math stories
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
 
IoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and SolutionsIoT Security: Problems, Challenges and Solutions
IoT Security: Problems, Challenges and Solutions
 
Taxonomy of Differential Compression
Taxonomy of Differential CompressionTaxonomy of Differential Compression
Taxonomy of Differential Compression
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and Applications
 
Overview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) TechnologyOverview of Data Loss Prevention (DLP) Technology
Overview of Data Loss Prevention (DLP) Technology
 
DLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and AlgorithmsDLP Systems: Models, Architecture and Algorithms
DLP Systems: Models, Architecture and Algorithms
 
Mathematical Modeling for Practical Problems
Mathematical Modeling for Practical ProblemsMathematical Modeling for Practical Problems
Mathematical Modeling for Practical Problems
 
Securing Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the CloudSecuring Your Data for Your Journey to the Cloud
Securing Your Data for Your Journey to the Cloud
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Introduction to Deep Neural Network

  • 1. Copyright 2011 Trend Micro Inc. 1 Introduction to Deep Neural Network Liwei Ren, Ph.D San Jose, California, Nov, 2016
  • 2. Copyright 2011 Trend Micro Inc. Agenda • What a DNN is • How a DNN works • Why a DNN works • Those DNNs in action • Where the challenges are • Successful stories • Security problems • Summary • Quiz • What else 2
  • 3. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN and AI in the secular world 3
  • 4. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN and AI in the secular world 4
  • 5. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN and AI in the secular world 5
  • 6. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN in the technical world 6
  • 7. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN in the technical world 7
  • 8. Copyright 2011 Trend Micro Inc. What is a DNN? • DNN in the technical world 8
  • 9. Copyright 2011 Trend Micro Inc. What is a DNN? • Categorizing the DNNs : 9
  • 10. Copyright 2011 Trend Micro Inc. What is a DNN? • Three technical elements • Architecture: the graph, weights/biases, activation functions • Activity Rule: weights/biases, activation functions • Learning Rule: a typical one is backpropagation algorithm • Three masters in this area: 10
  • 11. Copyright 2011 Trend Micro Inc. What is a DNN? • Given a practical problem , we have two approaches to solve it. 11
  • 12. Copyright 2011 Trend Micro Inc. What is a DNN? • An example: image recognition 12
  • 13. Copyright 2011 Trend Micro Inc. What is a DNN? • An example: image recognition 13
  • 14. Copyright 2011 Trend Micro Inc. What is a DNN? • In the mathematical world – A DNN is a mathematical function f: D  S, where D ⊆ Rn and S ⊆ Rm, which is constructed by a directed graph based architecture. – A DNN is also a composition of functions from a network of primitive functions. 14
  • 15. Copyright 2011 Trend Micro Inc. What is a DNN? • We denote the a feed-forward DNN function by O= f(I) which is determined by a few parameters G, Φ ,W,B • Hyper-parameters: – G is the directed graph which presents the structure – Φ presents one or multiple activation functions for activating the nodes • Parameters: – W is the vector of weights relevant to the edges – B is the vector of biases relevant to the nodes 15
  • 16. Copyright 2011 Trend Micro Inc. What is a DNN? • Activation at a node: 16
  • 17. Copyright 2011 Trend Micro Inc. What is a DNN? • Activation function: 17
  • 18. Copyright 2011 Trend Micro Inc. What is a DNN? • G=(V,E) is a graph and Φ is a set of activation functions. • <G,Φ> constructs a family of functions F: – F(G,Φ) = { f | f is a function constructed by <G, Φ ,W> where WϵRN } • N= total number of weights at all nodes of output layer and hidden layers. • Each f(I) can be denoted by f(I ,W). 18
  • 19. Copyright 2011 Trend Micro Inc. What is a DNN? • Mathematically, a DNN based supervised machine learning technology can be described as follows : – Given g ϵ { h | h:D  S where D ⊆ Rn and S ⊆ Rm} and δ>0 , find f ϵ F(G,Φ) such that 𝑓 − 𝑔 < δ. • Essentially, it is to identify a W ϵ RN such that 𝑓(∗, 𝑊) − 𝑔 < δ • However, in practice, g is not explicitly expressed . It usually appears in a sequence of samples: – { <I(j),T(j)> | T(j) =g(I(j)), j=1, 2, …,M} • where I(j) is an input vector and T(j) is its corresponding target vector. 19
  • 20. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • The function g is not explicitly expressed, we are not able to calculate g − f(∗, W) • Instead, we evaluate the error function E(W)= 1 2𝑀 ∑||T(j) - f(I(j),W)||2 • We expect to determine W such that E(W) < δ • How to identify W ϵ RN so that E(W) < δ ? Lets solve the nonlinear optimization problem min{E(W)| W ϵ RN} , i.e.: min{ 1 2𝑀 ∑|| T(j) - f(I(j),W) ||2 | W ϵ RN } (P1) 20
  • 21. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • (P1) is for batch mode training, however ,it is too expensive. • In order to reduce the computational cost, a sequential mode is introduced. • Picking <I,T> ϵ {<I(1),T(1) >, <I(2),T(2)> ,…, <I(M),T(M)>} sequentially, let the output of the network as O= f(I,W) for any W: • Error function E(W)= ||T- f(I,W)||2 /2 = ∑(Tj-Oj)2 /2 • Each Oj can be considered as a function of W. We denote it as Oj(W). • We have the optimization problem for training with sequential mode: – min{ ∑(Tj-Oj(W) )2 /2 | W ϵ RN} (P2) 21
  • 22. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • One may ask whether we get the same solution for both batch mode and sequential mode ? • BTW – batch mode = offline mode – sequential mode = online mode • We focus on online mode in this talk 22
  • 23. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • How to solve the unconstrained nonlinear optimization problem (P2)? • The general approach of unconstrained nonlinear optimization is to find local minima of E(W) by using the iterative process of Gradient Descent. •∂E = (∂E/∂W1, ∂E/∂W2, …, ∂E/∂WT) • The iterations: – ΔWj = - γ ∂E/∂Wj for j=1, …,T – Updating W in each step by • Wj (k+1) = Wj (k) - γ ∂E(W (k))/∂Wj for j=1, …,T (A1) • until E(W (k+1)) < δ or E(W (k+1)) can not be reduced anymore 23
  • 24. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • The algorithm of Gradient Descent: 24
  • 25. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • From the perspective of mathematics, the process of Gradient Descent is straightforward. • However, from the perspective of scientific computing, it is quite challenging to calculate the values of all ∂E/∂Wj for j=1, …,N: – The complexity of presenting each ∂E/∂Wj where j=1, …,N. – There are (k+1)-layer function compositions for a DNN of k hidden layers. 25
  • 26. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • For example, we have a very simple network as follows with the activation function φ(v)=1/(1 + 𝑒−𝑣 ). • E(W) = [ T - f(I,W) ]2 /2= [T – φ(w1φ(w3I+ w2) + w0)]2 /2, we have: – ∂E/∂w0 = -[T – φ(w1φ(w3I + w2) + w0)] φ’(w1φ(w3I+w2) + w0) – ∂E/∂w1 = -[T – φ(w1φ(w3I + w2) + w0)] φ’(w1φ(w3I+w2) + w0) φ(w3I+w2) – ∂E/∂w2 = - w1 [T – φ(w1φ(w3I + w2) + w0)] φ’(w1φ(w3I+w2) + w0) φ’(w3I+w2) – ∂E/∂w3 = - I w1 [T – φ(w1φ(w3I + w2) + w0)] φ’(w1φ(w3I+w2) + w0) φ’(w3I+w2) 26
  • 27. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • Lets imagine a network of N inputs, M outputs and K hidden layers each of which has L nodes. – It is a daunting task to express ∂E/∂wj explicitly. Last simple example already shows this. • The backpropagation (BP) algorithm was proposed as a rescue: – Main idea : the weights of (k-1)-th hidden layer can be expressed by the k-th layer recursively. – We can start with the output layer which is considered as (L+1)- layer. 27
  • 28. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • BP algorithm has the following major steps: 1. Feed-forward computation 2. Back-propagation to the output layer 3. Back-propagation to the hidden layers 4. Weight updates 28
  • 29. Copyright 2011 Trend Micro Inc. How Does a DNN work ? 29
  • 30. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • A general DNN can be drawn as follows 30
  • 31. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • How to express the weights of (k-1)-th hidden layer by the weights of k-th layer recursively? 31
  • 32. Copyright 2011 Trend Micro Inc. How Does a DNN work ? • Let us experience the BP with our small network. – E(W) = [ T - f(I,W) ]2 /2= [T – φ(w1φ(w3I+ w2) + w0)]2 /2. • ∂E/∂w0 = - φ’(O) (T – O) • ∂E/∂w1 = -φ’(O) (T – O) φ(O) • ∂E/∂w2 = - φ’(O) (T – O) φ’(H) w1 * 1 • ∂E/∂w3 = - φ’(O) (T – O) φ’(H) w1 * I – Let H0 (1)= 1, H1 (1) = H = φ(w3I+ w2), H1 (0) = I, we verify the follows: • δ1 (2)= φ’(O) (T – O) • w0 + = w0 + γ δ1 (2) H0 (1) , w1 + = w1 + γ δ1 (2) H1 (1) • δ1 (1)= φ’(H1 (1)) δ1 (2) w1 • w2 + = w2 + γ δ1 (1) H0 (0) , w3 + = w3 + γ δ1 (1) H1 (0) • where w0 = w0,1 (2) , w1 = w1,1 (2), w2 = w0,1 (1) , w2 = w1,1 (1) 32
• 33. Why Does a DNN Work? • It is amazing! However, why does it work? • For a FNN, it is to ask whether the following approximation problem has a solution: – Given g ϵ { h | h: D → S where D ⊆ Rn and S ⊆ Rm } and δ>0, find a W ϵ RN such that ||f(∗,W) − g|| < δ. • Universal approximation theorem (S): – Let φ(.) be a bounded and monotonically-increasing continuous function. Let Im denote the m-dimensional unit hypercube [0,1]m. The space of continuous functions on Im is denoted by C(Im). Then, given any function f ϵ C(Im) and ε>0, there exist an integer N, real constants vi, bi ϵ R and real vectors wi ϵ Rm, where i=1, …, N, such that |F(x) − f(x)| < ε for all x ϵ Im, where F(x) = ∑i=1..N vi φ(wiᵀx + bi). This holds for any such φ: the approximation power does not depend on the particular choice of φ.
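The form F(x) = ∑ vi φ(wiᵀx + bi) is easy to experiment with. In the sketch below (illustrative; the sin target, N = 200 and the random choice of wi, bi are all assumptions), the wi and bi are fixed at random and only the vi are fit by least squares, yet F already approximates the target well on [0,1]:

    import numpy as np

    phi = lambda v: 1.0 / (1.0 + np.exp(-v))           # bounded, increasing
    rng = np.random.default_rng(0)
    N = 200
    w, b = rng.normal(0, 10, N), rng.normal(0, 10, N)  # random hidden units
    x = np.linspace(0.0, 1.0, 500)
    f_target = np.sin(2 * np.pi * x)                   # some f in C([0,1])
    Phi = phi(np.outer(x, w) + b)                      # 500 x N matrix phi(wi*x + bi)
    v, *_ = np.linalg.lstsq(Phi, f_target, rcond=None) # fit the vi only
    print(np.max(np.abs(Phi @ v - f_target)))          # small max error on the grid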
• 34. Why Does a DNN Work? • Its corresponding network with only one hidden layer – NOTE: this is not even a general case for one hidden layer. It is a special case. WHY? – However, it is powerful and encouraging from the mathematical perspective.
• 35. Why Does a DNN Work? • General networks have a corresponding general version of the Universal Approximation Theorem:
• 36. Why Does a DNN Work? • Universal approximation theorem (G): – Let φ(.) be a bounded and monotonically-increasing continuous function. Let S be a compact space in Rm. Let C(S) = { g | g: S ⊂ Rm → Rn is continuous }. Then, given any function f ϵ C(S) and ε>0, there exists a FNN as shown above whose network function F satisfies ||F(x) − f(x)|| < ε for all x ϵ S. Again, this holds for any such φ. • It seems both shallow and deep neural networks can construct an approximation to a given function. – Which is better? – Or which is more efficient, i.e., uses fewer nodes?
• 37. Why Does a DNN Work? • Mathematical foundation of neural networks:
• 38. Those DNNs in action • A DNN has three elements • Architecture: the graph, weights/biases, activation functions • Activity Rule: weights/biases, activation functions • Learning Rule: a typical one is the backpropagation algorithm • The architecture basically determines the capability of a specific DNN – Different architectures are suitable for different applications. – The most general architecture of an ANN is a DAG (directed acyclic graph).
• 39. Those DNNs in action • There are a few well-known categories of DNNs.
• 40. What Are the Challenges? • Given a specific problem, there are a few questions before one starts the journey with DNNs: – Do you understand the problem that you need to solve? – Do you really want to solve this problem with a DNN, and why? • Do you have an alternative yet effective solution? – Do you know how to describe the problem in DNN terms mathematically? – Do you know how to implement a DNN, beyond a few APIs and the sizzling hype? – How to collect sufficient data for training? – How to solve the problem efficiently and cost-effectively?
• 41. What Are the Challenges? • 1st challenge: – a full-mesh network suffers from the curse of dimensionality, as the count below shows.
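A back-of-the-envelope count (sizes here are assumed purely for illustration):

    # One fully connected (full mesh) layer between a 1000x1000 input image
    # and an equally large hidden layer:
    n_in = 1000 * 1000
    n_hidden = 1000 * 1000
    print(n_in * n_hidden)   # 10^12 weights for a single layer -- intractable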
• 42. What Are the Challenges? • Many FNN tasks do not need a full-mesh network. • For example, if we can present the input vector as a grid, nearest-neighborhood models can be used to construct an effective FNN with far fewer connections – Image recognition – GO (圍棋): a game that two players play on a 19x19 grid of lines.
• 43. What Are the Challenges? • The 2nd challenge is how to describe a technical problem in terms of a DNN, i.e., mathematical modeling. There are generally two approaches: – Applying a well-studied DNN architecture to describe the problem. Deep understanding of the specific network is usually required! • Two general DNN architectures are well-known – FNN: feedforward neural network. Its special architecture CNN (convolutional neural network) is widely used in many applications such as image recognition and GO. – RNN: recurrent neural network. Its special architecture is LSTM (long short-term memory), which has been applied successfully in speech recognition and language translation. • For example, if we want to try a FNN, how do we describe the problem in terms of <Input vector, Output vector> with fixed dimensions? – Creating a novel DNN architecture from the ground up if none of the existing models fits your problem. Deep understanding of DNN theory and algorithms is required.
• 44. What Are the Challenges? • Handwritten digit recognition: – Modeling this problem is straightforward, as the sketch below shows.
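With MNIST-style data (assumed here for illustration), the <Input vector, Target vector> encoding is just flattening plus a one-hot label:

    import numpy as np

    image = np.zeros((28, 28))        # placeholder 28x28 grayscale digit
    I = image.reshape(-1) / 255.0     # input vector in R^784
    T = np.eye(10)[7]                 # one-hot target in R^10 for the digit 7
    print(I.shape, T)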
• 45. What Are the Challenges? • Modeling image recognition is also straightforward
• 46. What Are the Challenges? • However, due to the curse of dimensionality, we can use a special FNN: – the convolutional neural network (CNN)
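A rough comparison of parameter counts (sizes assumed for illustration) shows why the shared, local kernels of a CNN help:

    # Dense: every pixel of a 28x28 input connects to every unit of an
    # equally large hidden layer. Conv: 32 shared 5x5 kernels slide over it.
    dense_weights = (28 * 28) * (28 * 28)   # 614,656
    conv_weights = 32 * (5 * 5)             # 800
    print(dense_weights, conv_weights)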
• 47. What Are the Challenges? • How do we construct a DNN to describe language translation? – Systems such as Google's GNMT use LSTM networks • How do we construct a DNN to describe the problem of malware classification? • How do we construct a DNN to describe network traffic for security purposes?
• 48. What Are the Challenges? • The 3rd challenge is how to collect sufficient training data. To achieve the required accuracy, sufficient training data is necessary. WHY?
• 49. What Are the Challenges? • The 4th challenge is how to identify the various talents needed to deliver a DNN solution for specific problems: – Who knows how to use existing DL APIs such as TensorFlow? – Who understands various DNN architectures in depth, so that he/she can evaluate and identify a suitable DNN architecture to solve the problem? – Who understands the theory and algorithms of DNNs in depth, so that he/she can create and design a novel DNN from the ground up?
• 50. Successful Stories • ImageNet: 1M+ images, 1000+ categories, CNN
• 51. Successful Stories • Unsupervised learning neural networks… YouTube and the Cat.
• 52. Successful Stories • AlphaGo, a significant milestone in AI history – More significant than Deep Blue • Both the Policy Network and the Value Network are CNNs.
• 53. Successful Stories • Google Neural Machine Translation… an LSTM (Long Short-Term Memory) network
• 54. Successful Stories • Microsoft Speech Recognition… LSTM and TDNN (Time Delay Neural Networks)
• 55. Security Problems • Not disclosed for the public version.
• 56. Summary • What a DNN is • How a DNN works • Why a DNN works • The categories of DNNs • Some challenges • Well-known stories • Security problems
• 57. Quiz • Why do we choose the activation function to be nonlinear? • Why deep? Why are deep networks better than shallow ones? • What is the difference between online and batch-mode training? • Will online and batch-mode training converge to the same solution? • Why do we need the backpropagation algorithm? • Why do we apply convolutional neural networks to image recognition?
• 58. Quiz • If we solve a problem with a FNN, – how many layers deep should we go? – How many nodes are good for each layer? – How do we estimate and optimize the cost? • Is it guaranteed that the backpropagation algorithm converges to a solution? • Why do we need sufficient training data in order to achieve a certain accuracy? • Can a DNN do more than extend human capabilities or automate extensive manual tasks? – Prove a mathematical theorem… or introduce an interesting concept… or appreciate a poem… or love…
• 59. Quiz • AlphaGo is trained for a 19x19 lattice. If we play GO on a 20x20 board, can AlphaGo handle it? • ImageNet models are trained for 1000 categories. If we add a 1001-st category, what should we do? • People often consider a DNN a black box. Why? • More questions from you…
• 60. What Else? • What to share next from me? Why do you care? – Various DNNs: principles, examples, analysis and experiments… • ImageNet, AlphaGo, GNMT, etc. – My Ph.D. work and its relevance to DNNs – A little history of AI and artificial neural networks – Various schools of the AI discipline – Strong AI vs. Weak AI
• 61. What Else? • What to share next from me? Why do you care? – Questions when thinking about AI: • Are we able to understand how we learn? • Are we going in the right directions, mathematically and scientifically? • Are there simple principles for cognition, like what Newton and Einstein established for understanding our universe? • What do we lack between now and the coming of so-called Strong AI?
• 62. What Else? • What to share next from me? Why do you care? • Questions about who we are. – Are we created? – Are we the AI of the creator? • My little theory about the Universe