Neural networks. Overview
Oleksandr Baiev, PhD
Senior Engineer
Samsung R&D Institute Ukraine
Neural networks. Overview
• Common principles
– Structure
– Learning
• Shallow and Deep NN
• Additional methods
– Conventional
– Voodoo
Canonical/Typical tasks
Solutions in general
$x^{(j)} = (x_1, x_2, x_3, x_4, \ldots, x_i, \ldots) \in X$
$y^{(j)} = (y_1, y_2, \ldots, y_k, \ldots) \in Y$
$F: X \to Y$
Classification (the superscript $(j)$ is the index of the sample in the dataset; targets are one-hot):
$y^{(1)} = (1,0,0)$: sample of class “0”
$y^{(2)} = (0,0,1)$: sample of class “2”
$y^{(3)} = (0,1,0)$: sample of class “1”
$y^{(4)} = (0,1,0)$: sample of class “1”
Regression:
$y^{(1)} = 0.3$
$y^{(2)} = 0.2$
$y^{(3)} = 1.0$
$y^{(4)} = 0.65$
What are artificial Neural Networks?
Is it biology?
Simulating biological neural networks (synapses, axons, chains, layers, etc.) is a good abstraction for understanding the topology.
But a biological NN is only inspiration and illustration. Nothing more!
What are artificial Neural Networks?
Let’s imagine a black box F: inputs and params go in, outputs come out.
General form:
$outputs = F(inputs, params)$
Steps:
1) choose the “form” of F
2) find the params
What are artificial Neural Networks?
It’s simple math!
Output of the i-th neuron:
$s_i = \sum_{j=1}^{n} w_{ij} x_j + b_i$
$y_i = f(s_i)$
where the weights $w_{ij}$ and bias $b_i$ are the free parameters and $f$ is the activation function.
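A minimal sketch of that neuron in numpy (the sizes, values, and the sigmoid choice are illustrative, not prescribed by the slides):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def neuron(x, w, b):
    # s_i = sum_{j=1..n} w_ij * x_j + b_i, then y_i = f(s_i)
    s = np.dot(w, x) + b
    return sigmoid(s)

x = np.array([0.5, -1.0, 2.0])  # n = 3 inputs
w = np.array([0.1, 0.4, -0.2])  # free parameters: weights
b = 0.3                         # free parameter: bias
y = neuron(x, w, b)             # output of the i-th neuron
```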
What are artificial Neural Networks?
It’s simple math!
activation: $y = f(wx + b) = \mathrm{sigmoid}(wx + b)$
(a run of slides plots this curve for different values of $w$ and $b$)
What are artificial Neural Networks?
It’s simple math!
With $n$ inputs and $m$ neurons in a hidden layer, the output of the i-th neuron is:
$s_i = \sum_{j=1}^{n} w_{ij} x_j + b_i$
$y_i = f(s_i)$
Output of the k-th layer:
1) $S_k = W_k X_k + B_k = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ w_{m1} & w_{m2} & \cdots & w_{mn} \end{pmatrix}_k \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}_k + \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{pmatrix}_k$
2) $Y_k = f_k(S_k)$, applied element-wise
Form of F: a superposition of such layers (Kolmogorov & Arnold function superposition)
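In numpy the whole layer is one matrix-vector product; a hedged sketch (shapes, seed, and the sigmoid choice are mine):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def layer_forward(x, W, b, f=sigmoid):
    # 1) S_k = W_k X_k + B_k, with W of shape (m, n), x of shape (n,), b of shape (m,)
    # 2) Y_k = f_k(S_k), applied element-wise
    return f(W @ x + b)

rng = np.random.default_rng(0)
n, m = 4, 3                    # n inputs, m neurons in the hidden layer
W = rng.normal(size=(m, n))
b = rng.normal(size=m)
x = rng.normal(size=n)

h = layer_forward(x, W, b)     # output of the k-th layer, shape (m,)
# F itself is a superposition of such layers:
# y = layer_forward(layer_forward(x, W1, b1), W2, b2)
```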
Neural networks. Overview
• Common principles
– Structure
– Learning
• Shallow and Deep NN
• Additional methods
– Conventional
– Voodoo
How to find parameters W and B?
Supervised learning:
Training set (pairs of variables and responses): $\{(X; Y)_i\}, \; i = 1..N$
Find: $W^*, B^* = \underset{W,B}{\operatorname{argmin}} \; L(F(X), Y)$
Cost function (loss, error):
logloss: $L(F(X), Y) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{i,j} \log f_{i,j}$
where $y_{i,j}$ is “1” if the i-th sample is of class j, else “0”, and the predictions are previously scaled: $f_{i,j} = f_{i,j} / \sum_j f_{i,j}$
rmse: $L(F(X), Y) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (F(X_i) - Y_i)^2}$
These are just examples. The cost function depends on the problem (classification, regression) and on domain knowledge.
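Both example costs sketched in numpy (assuming, as the scaling note suggests, that the classifier emits positive scores normalized per sample):

```python
import numpy as np

def logloss(F, Y):
    # F: positive scores, shape (N, M); scale first: f_ij = f_ij / sum_j f_ij
    P = F / F.sum(axis=1, keepdims=True)
    # Y: one-hot targets, y_ij = 1 if the i-th sample is of class j, else 0
    return -np.mean(np.sum(Y * np.log(P), axis=1))

def rmse(F, Y):
    # F, Y: predicted and target values, shape (N,)
    return np.sqrt(np.mean((F - Y) ** 2))
```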
Training or optimization algorithm
So, we have the model cost $L$ (the error of prediction),
and we want to update the weights in order to minimize $L$:
$w^* = w + \alpha \Delta w$
In accordance with gradient descent: $\Delta w = -\nabla L$
This is clear for a network with only one layer (we have the predicted outputs and the targets, so we can evaluate $L$).
But how do we find $\Delta w$ for the hidden layers?
Meet “Error Back Propagation”
Find $\Delta w$ for each layer, from the last to the first, as the influence of the weights on the cost:
$\Delta w_{i,j} = \frac{\partial L}{\partial w_{i,j}}$
and, by the chain rule:
$\frac{\partial L}{\partial w_{i,j}} = \frac{\partial L}{\partial f_j} \frac{\partial f_j}{\partial s_j} \frac{\partial s_j}{\partial w_{i,j}}$
Error Back Propagation
Details
$\frac{\partial L}{\partial w_{i,j}} = \frac{\partial L}{\partial f_j} \frac{\partial f_j}{\partial s_j} \frac{\partial s_j}{\partial w_{i,j}}, \qquad \delta_j = \frac{\partial L}{\partial f_j} \frac{\partial f_j}{\partial s_j}$
$\delta_j = \begin{cases} L'(F(X), Y) \, f'(s_j), & \text{output layer} \\ \left( \sum_{l \in \text{next layer}} \delta_l \, w_{j,l} \right) f'(s_j), & \text{hidden layers} \end{cases}$
$\Delta w = -\alpha \, \delta \, x$
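A hedged end-to-end sketch of these formulas on a single sample (numpy; the two-layer shape, sigmoid hidden units, linear output, and squared-error cost are illustrative choices, not the slides' prescription):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([1.0])     # one sample and its target
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer
alpha = 0.1

# forward pass
s1 = W1 @ x + b1;  f1 = sigmoid(s1)
s2 = W2 @ f1 + b2; f2 = s2                     # linear output
L = 0.5 * np.sum((f2 - t) ** 2)

# backward pass: delta_j = (dL/df_j) * (df_j/ds_j)
d2 = (f2 - t) * 1.0                            # output layer: L'(F(X), Y) * f'(s_j)
d1 = (W2.T @ d2) * f1 * (1.0 - f1)             # hidden: (sum_l delta_l * w_jl) * f'(s_j)

# updates: delta_w = -alpha * delta * x
W2 -= alpha * np.outer(d2, f1); b2 -= alpha * d2
W1 -= alpha * np.outer(d1, x);  b1 -= alpha * d1
```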
Gradient Descent
in real life
Recall gradient descent:
$w^* = w + \alpha \Delta w$
$\alpha$ is a “step” coefficient; in ML terms, the learning rate. Typical: $\alpha = 0.01..0.1$
Recall the cost function: $L = \frac{1}{N} \sum^{N} \ldots$ sums over all the samples. And what if $N = 10^6$ or more?
GD modification: update $w$ for each sample.
Gradient Descent
Stochastic & Minibatch
“Batch” GD ($L$ over the full set): needs a lot of memory
Stochastic GD ($L$ for each sample): fast, but fluctuates
Minibatch GD ($L$ over subsets): less memory & fewer fluctuations
The minibatch size depends on the HW. Typical: minibatch = 32…256
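The three variants differ only in how many samples feed each update; a sketch (`grad_L` is an assumed callback returning the gradient of $L$ on the given subset):

```python
import numpy as np

def minibatch_gd(w, X, Y, grad_L, alpha=0.05, batch=64, epochs=100):
    # batch = len(X) gives "batch" GD, batch = 1 gives stochastic GD
    N = len(X)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(N)            # reshuffle the samples each epoch
        for start in range(0, N, batch):
            idx = order[start:start + batch]  # evaluate L (and its gradient) on a subset
            w = w - alpha * grad_L(w, X[idx], Y[idx])
    return w
```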
Termination criteria
By epoch count: a maximum number of iterations over the whole data set
By the value of the gradient: a gradient equal to 0 means a minimum, but a small gradient => very slow learning
When the cost didn’t change during several epochs: if the error does not change, the training procedure is not converging
Early stopping: stop when the “validation” score starts to increase even while the “train” score continues to decrease (sketched below)
Typical: epochs = 50…200
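A sketch of early stopping combined with the epoch cap (`train_one_epoch` and `validation_cost` are hypothetical caller-supplied callbacks standing in for the loop above):

```python
def early_stopping(w, train_one_epoch, validation_cost, max_epochs=200, patience=10):
    # train_one_epoch and validation_cost are hypothetical callbacks
    best_cost, best_w, bad_epochs = float("inf"), w, 0
    for epoch in range(max_epochs):      # termination by epoch count
        w = train_one_epoch(w)
        cost = validation_cost(w)
        if cost < best_cost:
            best_cost, best_w, bad_epochs = cost, w, 0
        else:
            bad_epochs += 1              # validation score got worse
        if bad_epochs >= patience:       # it kept increasing: stop early
            return best_w
    return best_w                        # keep the best validated weights
```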
Neural networks. Overview
• Common principles
– Structure
– Learning
• Shallow and Deep NN
• Additional methods
– Conventional
– Voodoo
What about the “form” of F?
Network topology
“Shallow” networks: 1-2 hidden layers => not enough parameters => poor separation abilities
“Deep” networks are NNs with 2..10 layers
“Very deep” networks are NNs with >10 layers
Deep learning. Problems
• Big networks => too much separating ability => overfitting
• Vanishing gradient problem during training
• Complex error surface => local minima
• Curse of dimensionality => memory & computations: between adjacent layers of $m^{(i-1)}$ and $m^{(i)}$ neurons, $\dim W^{(i)} = m^{(i-1)} \cdot m^{(i)}$
Neural networks. Overview
• Common principles
– Structure
– Learning
• Shallow and Deep NN
• Additional methods
– Conventional
– Voodoo
Additional methods
Conventional
• Momentum (damps the variations on the error surface):
$\Delta w^{(t)} = -\alpha \nabla L(w^{(t)}) + \beta \Delta w^{(t-1)}$, where the second term is the momentum. Typical: $\beta = 0.9$
• LR decay (make smaller steps near the optimum):
$\alpha^{(t)} = k \alpha^{(t-1)}, \; 0 < k < 1$. Typical: apply LR decay ($k = 0.1$) every 10..100 epochs
• Weight decay (prevents weight growth and smooths F):
$L^* = L + \lambda \| w^{(t)} \|$; L1 or L2 regularization is often used. Typical: L2 with $\lambda = 0.0005$
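All three tricks in one update step; a sketch using the slide's typical values (the decay-every-50-epochs schedule and the L2 gradient term $2\lambda w$ are my concrete choices):

```python
import numpy as np

def update(w, dw_prev, grad, epoch, alpha0=0.1, beta=0.9, k=0.1, lam=0.0005):
    # LR decay: alpha_t = k * alpha_{t-1}, applied here every 50 epochs
    alpha = alpha0 * k ** (epoch // 50)
    # L2 weight decay: L* = L + lam * ||w||^2 adds 2 * lam * w to the gradient
    g = grad + 2.0 * lam * w
    # momentum: carry over a fraction beta of the previous step
    dw = -alpha * g + beta * dw_prev
    return w + dw, dw
```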
Neural networks. Overview
• Common principles
– Structure
– Learning
• Shallow and Deep NN
• Additional methods
– Conventional
– Voodoo
Additional methods
Contemporary
Dropout/DropConnect:
– ensembles of networks
– $2^N$ networks in one: for each example, hide neuron outputs at random ($P = 0.5$)
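A sketch of a dropout layer (the rescaling by $1/(1-p)$ is the common “inverted dropout” convention, not something the slide specifies):

```python
import numpy as np

def dropout(y, p=0.5, training=True, rng=None):
    if not training:
        return y                     # at test time, use the full "ensemble"
    rng = rng or np.random.default_rng()
    mask = rng.random(y.shape) >= p  # hide each neuron's output with probability p
    return y * mask / (1.0 - p)      # rescale to keep the expected activation
```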
Additional methods
Contemporary
Data augmentation: more data, covering all available cases:
– affine transformations, flips, crops, contrast, noise, scaling
– pseudo-labeling
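A few of those cheap transforms for a grayscale image in numpy (the crop margin, contrast factor, and noise level are arbitrary illustrations):

```python
import numpy as np

def augment(img, rng):
    # img: (H, W) array with values in [0, 1]
    return [
        img,                                                        # original
        np.fliplr(img),                                             # horizontal flip
        img[2:-2, 2:-2],                                            # crop
        np.clip(1.2 * (img - 0.5) + 0.5, 0.0, 1.0),                 # contrast
        np.clip(img + rng.normal(0.0, 0.05, img.shape), 0.0, 1.0),  # noise
    ]
```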
Additional methods
Contemporary
New activation functions:
– Linear: $y_i = f(s_i) = a s_i$
– ReLU: $y_i = \max(s_i, 0)$
– Leaky ReLU: $y_i = \begin{cases} s_i, & s_i > 0 \\ a s_i, & \text{otherwise} \end{cases}$. Typical: $a = 0.01$
– Maxout: $y_i = \max(s_{1,i}, s_{2,i}, \ldots, s_{k,i})$. Typical: $k = 2..3$
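The same four activations as numpy one-liners (for maxout, `s` stacks the $k$ pre-activations along the first axis):

```python
import numpy as np

def linear(s, a=1.0):
    return a * s

def relu(s):
    return np.maximum(s, 0.0)

def leaky_relu(s, a=0.01):   # typical a = 0.01
    return np.where(s > 0, s, a * s)

def maxout(s):               # s: shape (k, m), typical k = 2..3
    return np.max(s, axis=0)
```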
Additional methods
Contemporary
Pre-training:
– train layer-by-layer
– re-train an “other” network
Sources
• Geoffrey Hinton, course “Neural Networks for Machine Learning” [http://www.coursera.org/course/neuralnets]
• Ian Goodfellow, Yoshua Bengio and Aaron Courville, “Deep Learning” [http://www.deeplearningbook.org/]
• http://neuralnetworksanddeeplearning.com
• CS231n: Convolutional Neural Networks for Visual Recognition [http://cs231n.stanford.edu/]
• CS224d: Deep Learning for Natural Language Processing [http://cs224d.stanford.edu/]
• Schmidhuber, “Deep Learning in Neural Networks: An Overview”
• kaggle.com competitions and forums