CS 5751 Machine Learning, Chapter 4: Artificial Neural Networks
Artificial Neural Networks
• Threshold units
• Gradient descent
• Multilayer networks
• Backpropagation
• Hidden layer representations
• Example: Face recognition
• Advanced topics
Connectionist Models
Consider humans
• Neuron switching time ~0.001 second
• Number of neurons ~10^10
• Connections per neuron ~10^4 to 10^5
• Scene recognition time ~0.1 second
• 100 inference steps do not seem like enough
must use lots of parallel computation!
Properties of artificial neural nets (ANNs):
• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed processing
• Emphasis on tuning weights automatically
When to Consider Neural Networks
• Input is high-dimensional discrete or real-valued (e.g., raw
sensor input)
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Form of target function is unknown
• Human readability of result is unimportant
Examples:
• Speech phoneme recognition [Waibel]
• Image classification [Kanade, Baluja, Rowley]
• Financial prediction
ALVINN drives 70 mph on highways
[figure: 30x32 sensor input retina feeding 4 hidden units, with outputs ranging from Sharp Left through Straight Ahead to Sharp Right]
Perceptron
[figure: inputs X1 ... Xn with weights w1 ... wn, a fixed input X0 = 1 with bias weight w0, feeding a unit that computes Σ_{i=0}^{n} w_i x_i and thresholds it]

o(x_1, ..., x_n) = 1 if w_0 + w_1 x_1 + ... + w_n x_n > 0, and -1 otherwise

Sometimes we will use the simpler vector notation:

o(x) = 1 if w · x > 0, and -1 otherwise
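A minimal Python sketch of this threshold unit (the function name and the example weights below are illustrative, not from the slides):

```python
def perceptron_output(weights, x):
    """Threshold unit: output 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.
    weights[0] is the bias weight w0 (its fixed input x0 = 1 is implicit)."""
    net = weights[0] + sum(w_i * x_i for w_i, x_i in zip(weights[1:], x))
    return 1 if net > 0 else -1
```

For example, weights w0 = -0.8, w1 = w2 = 0.5 make this unit compute AND(x1, x2): the net input exceeds 0 only when both inputs are 1.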
Decision Surface of Perceptron
[figures: two decision surfaces over inputs x1 and x2, one separable by a line and one not]
Represents some useful functions
• What weights represent g(x1,x2) = AND(x1,x2)?
But some functions not representable
• e.g., not linearly separable
• therefore, we will want networks of these ...
Perceptron Training Rule

w_i ← w_i + Δw_i, where Δw_i = η (t − o) x_i

where:
• t = c(x) is the target value
• o is the perceptron output
• η is a small constant (e.g., 0.1) called the learning rate

Can prove the rule will converge if the training data is linearly separable and η is sufficiently small.
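The rule can be sketched in Python; the function name and the convention of prepending x0 = 1 for the bias weight are assumptions of this sketch:

```python
def perceptron_train_step(weights, x, t, eta=0.1):
    """Apply the perceptron training rule once:
    w_i <- w_i + eta * (t - o) * x_i, with x0 = 1 for the bias weight."""
    xb = [1.0] + list(x)  # prepend the fixed input x0 = 1
    o = 1 if sum(w * xi for w, xi in zip(weights, xb)) > 0 else -1
    return [w + eta * (t - o) * xi for w, xi in zip(weights, xb)]
```

When the output already matches the target, t − o = 0 and the weights are left unchanged; only misclassified examples move the weights.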
Gradient Descent

To understand, consider a simpler linear unit, where

o = w_0 + w_1 x_1 + ... + w_n x_n

Idea: learn w_i's that minimize the squared error

E[w] ≡ (1/2) Σ_{d ∈ D} (t_d − o_d)^2

where D is the set of training examples.
Gradient Descent [figure slide]
Gradient Descent

Gradient: ∇E[w] ≡ [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]

Training rule: Δw = −η ∇E[w]

i.e., Δw_i = −η ∂E/∂w_i
Gradient Descent

∂E/∂w_i = ∂/∂w_i (1/2) Σ_d (t_d − o_d)^2
        = (1/2) Σ_d ∂/∂w_i (t_d − o_d)^2
        = (1/2) Σ_d 2 (t_d − o_d) ∂/∂w_i (t_d − o_d)
        = Σ_d (t_d − o_d) ∂/∂w_i (t_d − w · x_d)
∂E/∂w_i = Σ_d (t_d − o_d)(−x_{i,d})
Gradient Descent

GRADIENT-DESCENT(training_examples, η)

Each training example is a pair of the form ⟨x, t⟩, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).

- Initialize each w_i to some small random value
- Until the termination condition is met, do
  - Initialize each Δw_i to zero
  - For each ⟨x, t⟩ in training_examples, do
    * Input the instance x and compute output o
    * For each linear unit weight w_i, do
      Δw_i ← Δw_i + η (t − o) x_i
  - For each linear unit weight w_i, do
    w_i ← w_i + Δw_i
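A direct Python transcription of GRADIENT-DESCENT for the linear unit. The fixed epoch count standing in for the unspecified termination condition, and the zero initialization in place of small random values, are assumptions made for reproducibility:

```python
def gradient_descent(training_examples, eta=0.05, epochs=500):
    """Batch gradient descent for a linear unit.
    training_examples: list of (x, t) pairs, where each x already
    includes x0 = 1 for the bias weight."""
    n = len(training_examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        delta = [0.0] * n  # initialize each delta_w_i to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]
        w = [wi + di for wi, di in zip(w, delta)]  # update after the full pass
    return w
```

On exactly linear data (e.g., t = 2 x) the learned weights approach the true ones, since the squared error has a single global minimum for a linear unit.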
Summary
Perceptron training rule guaranteed to succeed if
• Training examples are linearly separable
• Sufficiently small learning rate η
Linear unit training rule uses gradient descent
• Guaranteed to converge to hypothesis with
minimum squared error
• Given sufficiently small learning rate η
• Even when training data contains noise
• Even when training data not separable by H
Incremental (Stochastic) Gradient Descent
Batch mode Gradient Descent:
Do until satisfied:
1. Compute the gradient ∇E_D[w]
2. w ← w − η ∇E_D[w]
Incremental mode Gradient Descent:
Do until satisfied:
- For each training example d in D
1. Compute the gradient ∇E_d[w]
2. w ← w − η ∇E_d[w]
where E_D[w] ≡ (1/2) Σ_{d ∈ D} (t_d − o_d)^2 and E_d[w] ≡ (1/2)(t_d − o_d)^2
Incremental Gradient Descent can approximate Batch Gradient
Descent arbitrarily closely if η made small enough
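A Python sketch of incremental mode (function name and epoch loop are assumptions, as before); the only change from batch mode is that the weights are updated after every example rather than after a full pass:

```python
def incremental_gradient_descent(training_examples, eta=0.05, epochs=500):
    """Incremental (stochastic) gradient descent for a linear unit:
    descend the per-example error E_d[w] = (t_d - o_d)^2 / 2."""
    n = len(training_examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            # update immediately, using only this example's gradient
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w
```

On the same linear data as the batch version, this converges to essentially the same weights when η is small.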
Multilayer Networks of Sigmoid Units
[network diagrams: a network over units d0, d1, d2, d3 with hidden units h1, h2; a network with inputs x1, x2, hidden units h1, h2, and output o]
Multilayer Decision Space [figure slide]
Sigmoid Unit
[figure: inputs X1 ... Xn with weights w1 ... wn and a fixed input X0 = 1 with bias weight w0]

net = Σ_{i=0}^{n} w_i x_i        o = σ(net) = 1 / (1 + e^{−net})

σ(x) is the sigmoid function: σ(x) = 1 / (1 + e^{−x})

Nice property: dσ(x)/dx = σ(x)(1 − σ(x))

We can derive gradient descent rules to train:
• One sigmoid unit
• Multilayer networks of sigmoid units → Backpropagation
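The sigmoid and its "nice property" in Python:

```python
import math

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """The nice property: d sigma(x)/dx = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

sigmoid(0) is 0.5 and sigmoid_derivative(0) is 0.25, the sigmoid's maximum slope; the derivative can be checked against a numerical difference quotient.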
The Sigmoid Function
[plot: σ(x) = 1 / (1 + e^{−x}); output rises from 0 to 1 as the net input ranges from −6 to 6]
Sort of a rounded step function
Unlike step function, can take derivative (makes learning
possible)
Error Gradient for a Sigmoid Unit

∂E/∂w_i = ∂/∂w_i (1/2) Σ_{d ∈ D} (t_d − o_d)^2
        = (1/2) Σ_d 2 (t_d − o_d) ∂/∂w_i (t_d − o_d)
        = Σ_d (t_d − o_d) (−∂o_d/∂w_i)
        = −Σ_d (t_d − o_d) (∂o_d/∂net_d)(∂net_d/∂w_i)

But we know:

∂o_d/∂net_d = ∂σ(net_d)/∂net_d = o_d (1 − o_d)
∂net_d/∂w_i = ∂(w · x_d)/∂w_i = x_{i,d}

So:

∂E/∂w_i = −Σ_{d ∈ D} (t_d − o_d) o_d (1 − o_d) x_{i,d}
Backpropagation Algorithm

Initialize all weights to small random numbers.
Until satisfied, do
• For each training example, do
  1. Input the training example and compute the outputs
  2. For each output unit k:
     δ_k ← o_k (1 − o_k)(t_k − o_k)
  3. For each hidden unit h:
     δ_h ← o_h (1 − o_h) Σ_{k ∈ outputs} w_{k,h} δ_k
  4. Update each network weight w_{i,j}:
     w_{i,j} ← w_{i,j} + Δw_{i,j}, where Δw_{i,j} = η δ_j x_{i,j}
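The four steps can be sketched in Python for a network with one hidden layer of sigmoid units. The list-of-lists weight layout and the leading bias weight in each row are assumptions of this sketch, not prescribed by the slides:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, t, w_hidden, w_out, eta=0.05):
    """One backpropagation pass for a single training example (x, t).
    Each weight row starts with a bias weight (its input is fixed at 1).
    Returns the updated (w_hidden, w_out)."""
    # 1. Forward pass: compute hidden activations h and outputs o
    xb = [1.0] + list(x)
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + h
    o = [sigmoid(sum(w * hi for w, hi in zip(row, hb))) for row in w_out]
    # 2. delta_k = o_k (1 - o_k)(t_k - o_k) for each output unit
    delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # 3. delta_h = o_h (1 - o_h) * sum_k w_kh delta_k for each hidden unit
    delta_h = [h[j] * (1 - h[j]) *
               sum(w_out[k][j + 1] * delta_o[k] for k in range(len(o)))
               for j in range(len(h))]
    # 4. w_ji <- w_ji + eta * delta_j * x_ji
    w_out = [[w + eta * dk * hi for w, hi in zip(row, hb)]
             for row, dk in zip(w_out, delta_o)]
    w_hidden = [[w + eta * dj * xi for w, xi in zip(row, xb)]
                for row, dj in zip(w_hidden, delta_h)]
    return w_hidden, w_out
```

Repeating this step on a training example drives the network's output toward the target for that example.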
More on Backpropagation
• Gradient descent over entire network weight vector
• Easily generalized to arbitrary directed graphs
• Will find a local, not necessarily global error minimum
– In practice, often works well (can run multiple times)
• Often include weight momentum α
• Minimizes error over training examples
• Will it generalize well to subsequent examples?
• Training can take thousands of iterations -- slow!
– Using network after training is fast
Δw_{i,j}(n) = η δ_j x_{i,j} + α Δw_{i,j}(n − 1)
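The momentum update for a single weight might look like this in Python (the names are illustrative):

```python
def momentum_update(grad_term, prev_delta, eta=0.05, alpha=0.9):
    """Delta w_ji(n) = eta * delta_j * x_ji + alpha * Delta w_ji(n - 1).
    grad_term is delta_j * x_ji at iteration n; prev_delta is the
    previous iteration's weight change Delta w_ji(n - 1)."""
    return eta * grad_term + alpha * prev_delta
```

With α near 1, each update keeps part of the previous update's direction, which tends to speed travel along flat regions of the error surface.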
Learning Hidden Layer Representations
A network with eight inputs, three hidden units, and eight outputs is trained to reproduce each input pattern at its output:

Input    →  Output
10000000 →  10000000
01000000 →  01000000
00100000 →  00100000
00010000 →  00010000
00001000 →  00001000
00000100 →  00000100
00000010 →  00000010
00000001 →  00000001
Learning Hidden Layer Representations
Learned hidden unit values for each input:

Input    →  Hidden Values  →  Output
10000000 →  .89 .04 .08    →  10000000
01000000 →  .01 .11 .88    →  01000000
00100000 →  .01 .97 .27    →  00100000
00010000 →  .99 .97 .71    →  00010000
00001000 →  .03 .05 .02    →  00001000
00000100 →  .22 .99 .99    →  00000100
00000010 →  .80 .01 .98    →  00000010
00000001 →  .60 .94 .01    →  00000001
Output Unit Error during Training
Sum of squared errors for each output unit
[plot: per-unit error, y axis 0 to 0.9, over 2500 training iterations]
Hidden Unit Encoding
Hidden unit encoding for one input
[plot: hidden unit values, y axis 0 to 1, over 2500 training iterations]
Input to Hidden Weights
Weights from inputs to one hidden unit
[plot: weight values, y axis roughly −5 to 4, over 2500 training iterations]
Convergence of Backpropagation
Gradient descent to some local minimum
• Perhaps not global minimum
• Momentum can cause quicker convergence
• Stochastic gradient descent also results in faster
convergence
• Can train multiple networks and get different results (using
different initial weights)
Nature of convergence
• Initialize weights near zero
• Therefore, initial networks near-linear
• Increasingly non-linear functions as training progresses
Expressive Capabilities of ANNs
Boolean functions:
• Every Boolean function can be represented by network
with a single hidden layer
• But that may require a number of hidden units that is exponential in the number of inputs
Continuous functions:
• Every bounded continuous function can be approximated
with arbitrarily small error by a network with one hidden
layer [Cybenko 1989; Hornik et al. 1989]
• Any function can be approximated to arbitrary accuracy by
a network with two hidden layers [Cybenko 1988]
Overfitting in ANNs
Error versus weight updates (example 1)
[plot: training-set and validation-set error versus number of weight updates, 0 to 20000; error roughly 0.002 to 0.01]
Overfitting in ANNs
Error versus weight updates (Example 2)
[plot: training-set and validation-set error versus number of weight updates, 0 to 6000; error roughly 0 to 0.08]
Neural Nets for Face Recognition
[figure: network with 30x32 inputs and four outputs (left, straight, right, up); typical input images]
90% accurate at learning head pose and at recognizing 1 of 20 faces
Learned Network Weights
[figure: typical input images and learned weights for the 30x32-input network with outputs left, straight, right, up]
Alternative Error Functions

Penalize large weights:

E(w) ≡ (1/2) Σ_{d ∈ D} Σ_{k ∈ outputs} (t_{kd} − o_{kd})^2 + γ Σ_{i,j} w_{ji}^2

Train on target slopes as well as values:

E(w) ≡ (1/2) Σ_{d ∈ D} Σ_{k ∈ outputs} [ (t_{kd} − o_{kd})^2 + μ Σ_j ( ∂t_{kd}/∂x_d^j − ∂o_{kd}/∂x_d^j )^2 ]

Tie together weights:
• e.g., in phoneme recognition
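The first alternative, the weight penalty, is simple to sketch in Python (the function name, argument layout, and flat weight list are illustrative assumptions):

```python
def penalized_error(targets, outputs, weights, gamma=0.001):
    """Squared error plus a penalty on large weights:
    E = 1/2 * sum_d sum_k (t_kd - o_kd)^2 + gamma * sum_ji w_ji^2
    targets/outputs: per-example lists of output-unit values;
    weights: all network weights, flattened into one list."""
    sse = 0.5 * sum((t - o) ** 2
                    for td, od in zip(targets, outputs)
                    for t, o in zip(td, od))
    return sse + gamma * sum(w * w for w in weights)
```

Minimizing this trades fit against weight magnitude, so gradient descent on it shrinks weights that do not pay for themselves in reduced error.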
Recurrent Networks
[figures: a feedforward network computing y(t+1) from x(t); a recurrent network with a feedback loop; the recurrent network unfolded in time over inputs x(t), x(t−1), x(t−2) and outputs y(t+1), y(t), y(t−1)]
Neural Network Summary
• physiologically (neurons) inspired model
• powerful (accurate), slow, opaque (hard to
understand resulting model)
• bias: preferential
– based on gradient descent
– finds local minimum
– affected by initial conditions, parameters
• neural units
– linear
– linear threshold
– sigmoid
Neural Network Summary (cont)
• gradient descent
– convergence
• linear units
– limitation: hyperplane decision surface
– learning rule
• multilayer network
– advantage: can have non-linear decision surface
– backpropagation to learn
• backprop learning rule
• learning issues
– units used
Neural Network Summary (cont)
• learning issues (cont)
– batch versus incremental (stochastic)
– parameters
• initial weights
• learning rate
• momentum
– cost (error) function
• sum of squared errors
• can include penalty terms
• recurrent networks
– simple
– backpropagation through time
More Related Content

Similar to ANN Slides.pdf

Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedOmid Vahdaty
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
(CMP305) Deep Learning on AWS Made EasyCmp305
(CMP305) Deep Learning on AWS Made EasyCmp305(CMP305) Deep Learning on AWS Made EasyCmp305
(CMP305) Deep Learning on AWS Made EasyCmp305Amazon Web Services
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowNicholas McClure
 
Feed forward back propogation algorithm .pptx
Feed forward back propogation algorithm .pptxFeed forward back propogation algorithm .pptx
Feed forward back propogation algorithm .pptxneelamsanjeevkumar
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural NetworksDatabricks
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningTapas Majumdar
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningVishwas Lele
 
Neural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting RecognitionNeural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting RecognitionJohn Liu
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionEmanuele Bezzi
 
Deep learning from scratch
Deep learning from scratch Deep learning from scratch
Deep learning from scratch Eran Shlomo
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on sparkSatyendra Rana
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
Neural Networks and Deep Learning: An Intro
Neural Networks and Deep Learning: An IntroNeural Networks and Deep Learning: An Intro
Neural Networks and Deep Learning: An IntroFariz Darari
 
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Simplilearn
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”Dr.(Mrs).Gethsiyal Augasta
 
Deep Neural Networks for Computer Vision
Deep Neural Networks for Computer VisionDeep Neural Networks for Computer Vision
Deep Neural Networks for Computer VisionAlex Conway
 

Similar to ANN Slides.pdf (20)

Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data Demystified
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
(CMP305) Deep Learning on AWS Made EasyCmp305
(CMP305) Deep Learning on AWS Made EasyCmp305(CMP305) Deep Learning on AWS Made EasyCmp305
(CMP305) Deep Learning on AWS Made EasyCmp305
 
Neural network
Neural networkNeural network
Neural network
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in Tensorflow
 
Feed forward back propogation algorithm .pptx
Feed forward back propogation algorithm .pptxFeed forward back propogation algorithm .pptx
Feed forward back propogation algorithm .pptx
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
MXNet Workshop
MXNet WorkshopMXNet Workshop
MXNet Workshop
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Neural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting RecognitionNeural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting Recognition
 
Deep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an IntroductionDeep Learning with Apache Spark: an Introduction
Deep Learning with Apache Spark: an Introduction
 
Deep learning from scratch
Deep learning from scratch Deep learning from scratch
Deep learning from scratch
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on spark
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Neural Networks and Deep Learning: An Intro
Neural Networks and Deep Learning: An IntroNeural Networks and Deep Learning: An Intro
Neural Networks and Deep Learning: An Intro
 
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 
Deep Neural Networks for Computer Vision
Deep Neural Networks for Computer VisionDeep Neural Networks for Computer Vision
Deep Neural Networks for Computer Vision
 

Recently uploaded

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

ANN Slides.pdf

  • 1. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 1 Artificial Neural Networks • Threshold units • Gradient descent • Multilayer networks • Backpropagation • Hidden layer representations • Example: Face recognition • Advanced topics
  • 2. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 2 Connectionist Models Consider humans • Neuron switching time ~.001 second • Number of neurons ~1010 • Connections per neuron ~104-5 • Scene recognition time ~.1 second • 100 inference step does not seem like enough must use lots of parallel computation! Properties of artificial neural nets (ANNs): • Many neuron-like threshold switching units • Many weighted interconnections among units • Highly parallel, distributed process • Emphasis on tuning weights automatically
  • 3. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 3 When to Consider Neural Networks • Input is high-dimensional discrete or real-valued (e.g., raw sensor input) • Output is discrete or real valued • Output is a vector of values • Possibly noisy data • Form of target function is unknown • Human readability of result is unimportant Examples: • Speech phoneme recognition [Waibel] • Image classification [Kanade, Baluja, Rowley] • Financial prediction
  • 4. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 4 ALVINN drives 70 mph on highways 4 Hidden Units 30x32 Sensor Input Retina Sharp Left Sharp Right Straight Ahead
  • 5. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 5 Perceptron W n W 2 W 0 W 1 X1 Σ X2 Xn X0=1 i n i i x w ∑ =0      = ∑ = otherwise 1 - if 1 n 1 i i i x w σ    > ⋅ =    > + + + = otherwise 1 - 0 if 1 : notation ctor simpler ve use will we Sometimes otherwise 1 - 0 ... if 1 ) ,..., ( 1 1 0 1 x w ) x o( x w x w w x x o n n n r r r
  • 6. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 6 Decision Surface of Perceptron X2 X 1 X2 X 1 Represents some useful functions • What weights represent g(x1,x2) = AND(x1,x2)? But some functions not representable • e.g., not linearly separable • therefore, we will want networks of these ...
  • 7. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 7 Perceptron Training Rule • small ly sufficient is and separable linearly is data training If converge it will prove Can rate learning called .1) (e.g., constant small is output perceptron is lue target va is ) ( ) ( where η η η • • • • = • − = ∆ ∆ + ← o x c t x o t w w w w i i i i i r
  • 8. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 8 Gradient Descent [ ] examples training of set the is Where ) ( error squared the minimize that s ' learn : Idea where , simple consider , understand To 2 2 1 1 1 0 D o t w E w x w ... x w w o t linear uni D d d d i n n ∑ ∈ − ≡ + + + = r
  • 9. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 9 Gradient Descent
  • 10. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 10 Gradient Descent i i i n w E w w E w w E w E w E w E ∂ ∂ − = ∆ ∇ − = ∆       ∂ ∂ ∂ ∂ ∂ ∂ ≡ ∇ i.e., ] [ : rule Training ,..., , ] [ Gradient 1 0 η η r r
  • 11. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 11 Gradient Descent ) )( ( ) ( ) ( ) ( ) ( 2 2 1 ) ( 2 1 ) ( 2 1 , 2 2 ∑ ∑ ∑ ∑ ∑ − − = ∂ ∂ ⋅ − ∂ ∂ − = − ∂ ∂ − = − ∂ ∂ = − ∂ ∂ = ∂ ∂ d d i d d i d d d i d d d d d i d d d d d i d d d i i x o t w E x w t w o t o t w o t o t w o t w w E r r
  • 12. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 12 Gradient Descent i i i i i i i i i w w w x o t w w w o x examples training ,t x w w ). (e.g., . rning rate is the lea value. et output s the t es and t i input valu ctor of is the ve x where , ,t x e form pair of th ples is a ining exam Each tra examples training ∆ + ← − + ∆ ← ∆ ∗ > < ∆ • • > < − do t w, unit weigh linear each For - ) ( do , t unit weigh linear each For * output compute and instance Input the do , _ in each For - zero. to each Initialize - do met, is condition on terminati the Until value random small some to each Initialize 05 arg ) , _ ( DESCENT GRADIENT η η η r r r r
  • 13. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 13 Summary Perceptron training rule guaranteed to succeed if • Training examples are linearly separable • Sufficiently small learning rate η Linear unit training rule uses gradient descent • Guaranteed to converge to hypothesis with minimum squared error • Given sufficiently small learning rate η • Even when training data contains noise • Even when training data not separable by H
• 14. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 14 Incremental (Stochastic) Gradient Descent
  Batch mode Gradient Descent:
  Do until satisfied:
  1. Compute the gradient ∇E_D[w]
  2. w ← w − η∇E_D[w]
  where E_D[w] ≡ ½ Σ_{d∈D} (t_d − o_d)²
  Incremental mode Gradient Descent:
  Do until satisfied:
  - For each training example d in D
    1. Compute the gradient ∇E_d[w]
    2. w ← w − η∇E_d[w]
  where E_d[w] ≡ ½ (t_d − o_d)²
  Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if η made small enough
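In code, the only change for the incremental (stochastic) variant is updating the weights after each example instead of accumulating over the whole set (a sketch with made-up example data; names and constants are my own):

```python
import random

def stochastic_gradient_descent(training_examples, eta=0.05, epochs=500):
    """Incremental (stochastic) gradient descent for a linear unit.

    Uses the single-example error E_d[w] = 1/2 (t_d - o_d)^2 and
    updates w immediately after each training example d.
    """
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                w[i] += eta * (t - o) * x[i]  # update immediately
    return w

# Same kind of noiseless target t = 2*x1 + 1; w should approach (1, 2)
examples = [([1, x], 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w = stochastic_gradient_descent(examples)
```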
• 15. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 15 Multilayer Networks of Sigmoid Units
  (figures: two example networks — one with inputs d0–d3, and one with inputs x1, x2, hidden units h1, h2, and output o)
  • 16. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 16 Multilayer Decision Space
• 17. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 17 Sigmoid Unit
  net = Σ_{i=0}^{n} w_i x_i
  o = σ(net) = 1/(1 + e^(−net))
  σ(x) is the sigmoid function: 1/(1 + e^(−x))
  Nice property: dσ(x)/dx = σ(x)(1 − σ(x))
  We can derive gradient descent rules to train
  • one sigmoid unit
  • multilayer networks of sigmoid units → Backpropagation
• 18. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 18 The Sigmoid Function
  σ(x) = 1/(1 + e^(−x))
  (plot: output versus net input)
  Sort of a rounded step function
  Unlike step function, can take derivative (makes learning possible)
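The sigmoid and its "nice property" can be checked numerically (a quick sketch, not from the slides):

```python
import math

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    """The property from the slide: d sigma/dx = sigma(x) * (1 - sigma(x))"""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the identity against a central finite difference at a few points
for x in (-2.0, 0.0, 3.0):
    h = 1e-5
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_deriv(x)) < 1e-8
```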
• 19. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 19 Error Gradient for a Sigmoid Unit
  ∂E/∂w_i = ∂/∂w_i ½ Σ_{d∈D} (t_d − o_d)²
          = ½ Σ_d ∂/∂w_i (t_d − o_d)²
          = ½ Σ_d 2(t_d − o_d) ∂/∂w_i (t_d − o_d)
          = Σ_d (t_d − o_d) (−∂o_d/∂w_i)
          = −Σ_d (t_d − o_d) (∂o_d/∂net_d)(∂net_d/∂w_i)
  But we know:
  ∂o_d/∂net_d = ∂σ(net_d)/∂net_d = o_d(1 − o_d)
  ∂net_d/∂w_i = ∂(w·x_d)/∂w_i = x_{i,d}
  So:
  ∂E/∂w_i = −Σ_{d∈D} (t_d − o_d) o_d (1 − o_d) x_{i,d}
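The final gradient formula can be computed directly and verified against a finite-difference estimate of E (a sketch with hypothetical weights and training examples of my own choosing):

```python
import math

def sigmoid_unit_grad(w, examples):
    """dE/dw_i = -sum_d (t_d - o_d) * o_d * (1 - o_d) * x_id,
    the gradient of the squared error for one sigmoid unit."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    grad = [0.0] * len(w)
    for x, t in examples:
        o = sig(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            grad[i] += -(t - o) * o * (1.0 - o) * xi
    return grad

# Hypothetical two-weight unit and two training examples
examples = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0)]
w = [0.3, -0.2]
grad = sigmoid_unit_grad(w, examples)
```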
• 20. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 20 Backpropagation Algorithm
  Initialize all weights to small random numbers.
  Until satisfied, do
  • For each training example, do
    1. Input the training example and compute the outputs
    2. For each output unit k
       δ_k ← o_k(1 − o_k)(t_k − o_k)
    3. For each hidden unit h
       δ_h ← o_h(1 − o_h) Σ_{k∈outputs} w_{h,k} δ_k
    4. Update each network weight w_{i,j}
       w_{i,j} ← w_{i,j} + Δw_{i,j}, where Δw_{i,j} = η δ_j x_{i,j}
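Steps 1–4 can be turned into a small one-hidden-layer implementation (a sketch; the network sizes, learning rate, epoch count, and the XOR example are my own choices, not from the slides):

```python
import math, random

def backprop_train(examples, n_in, n_hidden, n_out, eta=0.3, epochs=2000):
    """One-hidden-layer backpropagation following steps 1-4 above.

    Each weight list's index 0 is a bias weight (input x0 = 1).
    Returns (w_h, w_o, forward) where forward(x) gives the output list.
    """
    random.seed(0)  # deterministic for reproducibility
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    w_h = [[random.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [[random.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)]
           for _ in range(n_out)]

    def forward(x):
        h = [sig(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_h]
        o = [sig(w[0] + sum(wi * hi for wi, hi in zip(w[1:], h))) for w in w_o]
        return h, o

    for _ in range(epochs):
        for x, t in examples:
            h, o = forward(x)                                  # step 1
            d_o = [ok * (1 - ok) * (tk - ok)                   # step 2
                   for ok, tk in zip(o, t)]
            d_h = [hj * (1 - hj) * sum(w_o[k][j + 1] * d_o[k]  # step 3
                                       for k in range(n_out))
                   for j, hj in enumerate(h)]
            for k in range(n_out):                             # step 4
                w_o[k][0] += eta * d_o[k]
                for j in range(n_hidden):
                    w_o[k][j + 1] += eta * d_o[k] * h[j]
            for j in range(n_hidden):
                w_h[j][0] += eta * d_h[j]
                for i in range(n_in):
                    w_h[j][i + 1] += eta * d_h[j] * x[i]

    return w_h, w_o, lambda x: forward(x)[1]

# XOR is not linearly separable, so a single perceptron cannot represent it;
# this network usually learns it, though backprop may occasionally stop in a
# local minimum
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
_, _, net = backprop_train(xor, n_in=2, n_hidden=3, n_out=1, eta=0.5, epochs=5000)
```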
• 21. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 21 More on Backpropagation
  • Gradient descent over entire network weight vector
  • Easily generalized to arbitrary directed graphs
  • Will find a local, not necessarily global error minimum
    – In practice, often works well (can run multiple times)
  • Often include weight momentum α
    Δw_{i,j}(n) = η δ_j x_{i,j} + α Δw_{i,j}(n − 1)
  • Minimizes error over training examples
  • Will it generalize well to subsequent examples?
  • Training can take thousands of iterations -- slow!
    – Using network after training is fast
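The momentum update can be illustrated on a single weight (a toy sketch minimizing E(w) = w², with constants of my own choosing; on real error surfaces the momentum term's main benefit is rolling through plateaus and small local minima, which a simple quadratic cannot show):

```python
def descend(grad, w0=5.0, eta=0.1, alpha=0.0, steps=200):
    """Gradient descent on a single weight with a momentum term:
    Delta w(n) = -eta * grad(w) + alpha * Delta w(n-1)."""
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -eta * grad(w) + alpha * dw  # current step reuses the last step
        w += dw
    return w

grad = lambda w: 2 * w            # gradient of E(w) = w^2
plain = descend(grad, alpha=0.0)  # no momentum
mom = descend(grad, alpha=0.9)    # with momentum weight alpha
```

Both runs approach the minimum at w = 0; with momentum the trajectory overshoots and oscillates before settling, which is the price of the extra "velocity".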
• 22. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 22 Learning Hidden Layer Representations
  (figure: a network mapping the 8 inputs to the 8 outputs through a small hidden layer)
  Input → Output
  00000001 → 00000001
  00000010 → 00000010
  00000100 → 00000100
  00001000 → 00001000
  00010000 → 00010000
  00100000 → 00100000
  01000000 → 01000000
  10000000 → 10000000
• 23. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 23 Learning Hidden Layer Representations
  Input → Hidden Values → Output
  00000001 → .01 .94 .60 → 00000001
  00000010 → .98 .01 .80 → 00000010
  00000100 → .99 .99 .22 → 00000100
  00001000 → .02 .05 .03 → 00001000
  00010000 → .71 .97 .99 → 00010000
  00100000 → .27 .97 .01 → 00100000
  01000000 → .88 .11 .01 → 01000000
  10000000 → .08 .04 .89 → 10000000
• 24. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 24 Output Unit Error during Training
  (plot: sum of squared errors for each output unit versus training epochs, 0–2500)
• 25. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 25 Hidden Unit Encoding
  (plot: hidden unit encoding for one input versus training epochs, 0–2500)
• 26. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 26 Input to Hidden Weights
  (plot: weights from the inputs to one hidden unit versus training epochs, 0–2500)
  • 27. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 27 Convergence of Backpropagation Gradient descent to some local minimum • Perhaps not global minimum • Momentum can cause quicker convergence • Stochastic gradient descent also results in faster convergence • Can train multiple networks and get different results (using different initial weights) Nature of convergence • Initialize weights near zero • Therefore, initial networks near-linear • Increasingly non-linear functions as training progresses
• 28. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 28 Expressive Capabilities of ANNs Boolean functions: • Every Boolean function can be represented by a network with a single hidden layer • But that might require a number of hidden units exponential in the number of inputs Continuous functions: • Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989] • Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988]
• 29. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 29 Overfitting in ANNs
  (plot, example 1: training-set and validation-set error versus number of weight updates, 0–20000)
• 30. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 30 Overfitting in ANNs
  (plot, example 2: training-set and validation-set error versus number of weight updates, 0–6000)
• 31. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 31 Neural Nets for Face Recognition
  (figure: a network with 30x32 inputs and outputs left, strt, rgt, up, alongside typical input images)
  90% accurate learning head pose, and recognizing 1-of-20 faces
• 32. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 32 Learned Network Weights
  (figure: typical input images and the learned weights of the 30x32-input network with outputs left, strt, rgt, up)
• 33. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 33 Alternative Error Functions
  Penalize large weights:
  E(w) ≡ ½ Σ_{d∈D} Σ_{k∈outputs} (t_{kd} − o_{kd})² + γ Σ_{i,j} w_{ji}²
  Train on target slopes as well as values:
  E(w) ≡ ½ Σ_{d∈D} Σ_{k∈outputs} [ (t_{kd} − o_{kd})² + μ Σ_{j∈inputs} (∂t_{kd}/∂x_d^j − ∂o_{kd}/∂x_d^j)² ]
  Tie weights together:
  • e.g., in phoneme recognition
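The weight-penalty term changes the gradient in a simple way: differentiating γ Σ w_ji² just adds 2γw_ji to each component, so every step also shrinks the weights toward zero. A minimal sketch (function name and constants are my own):

```python
def weight_decay_step(w, grad_E, eta=0.1, gamma=0.01):
    """One gradient step for the penalized error
    E'(w) = E(w) + gamma * sum_ji w_ji^2.
    The penalty adds 2 * gamma * w_ji to each gradient component."""
    return [wi - eta * (g + 2.0 * gamma * wi) for wi, g in zip(w, grad_E)]

# With a zero error gradient, each step multiplies every weight
# by (1 - 2 * eta * gamma), pulling it toward zero
w = weight_decay_step([1.0, -2.0], [0.0, 0.0])
```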
• 34. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 34 Recurrent Networks
  (figures: a feedforward network mapping x(t) to y(t+1); a recurrent network with a feedback loop; and the same recurrent network unfolded in time over x(t), x(t−1), x(t−2) and y(t+1), y(t), y(t−1))
• 35. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 35 Neural Network Summary • physiologically (neurons) inspired model • powerful (accurate), slow, opaque (hard to understand resulting model) • bias: preferential – based on gradient descent – finds local minimum – affected by initial conditions, parameters • neural units – linear – linear threshold – sigmoid
  • 36. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 36 Neural Network Summary (cont) • gradient descent – convergence • linear units – limitation: hyperplane decision surface – learning rule • multilayer network – advantage: can have non-linear decision surface – backpropagation to learn • backprop learning rule • learning issues – units used
  • 37. CS 5751 Machine Learning Chapter 4 Artificial Neural Networks 37 Neural Network Summary (cont) • learning issues (cont) – batch versus incremental (stochastic) – parameters • initial weights • learning rate • momentum – cost (error) function • sum of squared errors • can include penalty terms • recurrent networks – simple – backpropagation through time