Prof. Mohammad-R. Akbarzadeh-T
Ferdowsi University of Mashhad
A Presentation by:
• Hosein Mohebbi
• M.-Sajad Abvisani
DEEP LEARNING
• Ian Goodfellow
Google Brain
h-index 37
• Yoshua Bengio
Professor, U of Montreal
h-index 111
• Aaron Courville
Professor, U of Montreal
h-index 46
Hubel & Wiesel
Single channel:
• 1-D: Audio waveform. The axis we convolve over corresponds to time. We discretize time and measure the amplitude of the waveform once per time step.
• 2-D: Audio data that has been preprocessed with a Fourier transform. We can transform the audio waveform into a 2-D tensor, with different rows corresponding to different frequencies and different columns corresponding to different points in time.
• 3-D: Volumetric data. A common source of this kind of data is medical imaging technology, such as CT scans.
Multi-channel:
• 1-D: Skeleton animation data. Animations of 3-D computer-rendered characters are generated by altering the pose of a “skeleton” over time. Each channel in the data represents the angle about one axis of one joint.
• 2-D: Color image data. One channel contains the red pixels, one the green pixels, and one the blue pixels. The convolution kernel moves over both the horizontal and vertical axes of the image, conferring translation equivariance in both directions.
• 3-D: Color video data. One axis corresponds to time, one to the height of the video frame, and one to the width of the video frame.
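As a rough illustration (a sketch assuming NumPy; all sizes are made-up examples, not from the slides), these formats correspond to tensors of the following shapes:

import numpy as np

# Illustrative shapes only; the sizes are invented for this example.
audio_1d       = np.zeros((16000,))            # 1-D, single channel: 1 s of 16 kHz audio
spectrogram_2d = np.zeros((128, 100))          # 2-D, single channel: frequencies x time steps
ct_volume_3d   = np.zeros((64, 256, 256))      # 3-D, single channel: depth x height x width
skeleton_1d    = np.zeros((30, 100))           # 1-D, multi-channel: joint-angle channels x time steps
color_image_2d = np.zeros((3, 224, 224))       # 2-D, multi-channel: RGB channels x height x width
color_video_3d = np.zeros((3, 100, 224, 224))  # 3-D, multi-channel: channels x time x height x width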
• Local connectivity
• Parameter sharing
• Deeper nets
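To see why local connectivity and parameter sharing matter, compare parameter counts for a fully connected layer and a convolutional layer on the same input; this is only a back-of-the-envelope sketch using the 32x32x3 input and 28x28x6 output sizes that appear later in the slides:

# Hypothetical comparison: 32x32x3 input mapped to a 28x28x6 output.
fc_params   = (32 * 32 * 3) * (28 * 28 * 6)   # every output unit connects to every input value
conv_params = 6 * (5 * 5 * 3 + 1)             # 6 shared 5x5x3 filters, each with one bias
print(fc_params, conv_params)                 # roughly 14.5 million vs. 456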
• For example, if we had six 5x5 filters, we would get six separate activation maps:
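A minimal sketch of this in PyTorch (assuming a single 32x32 RGB input; the framework choice is ours, not the slides'):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                                    # one 32x32 RGB image
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)   # six 5x5 filters
maps = conv(x)
print(maps.shape)                                                 # torch.Size([1, 6, 28, 28]): 6 activation maps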
• Preview: a ConvNet is a sequence of convolutional layers, interspersed with activation functions
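For instance, such a stack might be written as the following sketch (the layer sizes are arbitrary illustrative choices, not taken from the slides):

import torch.nn as nn

convnet = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),   nn.ReLU(),   # conv layer followed by an activation
    nn.Conv2d(6, 10, kernel_size=5),  nn.ReLU(),
    nn.Conv2d(10, 16, kernel_size=5), nn.ReLU(),
)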
• Separable kernels
Naive multidimensional convolution requires O(w^d) operations; a separable convolution requires only O(w × d).
• FFT
By the convolution theorem, a * b = F⁻¹{F{a} · F{b}}, so convolution can be computed via the Fourier transform.
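A quick sanity check of these two ideas with NumPy/SciPy (illustrative signals and kernels of our choosing; scipy.signal.fftconvolve implements exactly the FFT route):

import numpy as np
from scipy.signal import convolve, fftconvolve

# Separable 2-D kernel: a w x w kernel that is an outer product of two 1-D kernels,
# so it can be applied as two 1-D convolutions instead of one full 2-D convolution.
k1 = np.array([1.0, 2.0, 1.0])
kernel_2d = np.outer(k1, k1)

# Convolution theorem: direct convolution and FFT-based convolution agree.
a = np.random.randn(1000)
b = np.random.randn(50)
direct  = convolve(a, b)       # naive convolution
via_fft = fftconvolve(a, b)    # a * b computed as F^-1{F{a} . F{b}}
print(np.allclose(direct, via_fft))   # True, up to floating-point error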
Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]
A 32x32 input convolved repeatedly with 5x5 filters shrinks the volumes spatially (32 -> 28 -> 24 ...).
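The arithmetic behind the shrinking, as a small sketch (this is the standard convolution output-size formula; the loop and sizes are just an illustration):

def output_size(n, f, stride=1, pad=0):
    # Output width of a convolution: (N - F + 2P) / stride + 1
    return (n - f + 2 * pad) // stride + 1

n = 32
for _ in range(3):
    n = output_size(n, f=5)
    print(n)          # 28, 24, 20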
e.g. input 7x7
3x3 filter, applied with stride 1
pad with a 1-pixel border => 7x7 output!
In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding of (F-1)/2, which preserves the spatial size.
e.g. F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
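A hedged check of this rule in PyTorch (the example sizes are ours): setting padding = (F-1)/2 with stride 1 leaves the spatial size unchanged.

import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)                                      # 7x7 input, single channel
conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1)       # F=3 -> zero pad with (3-1)/2 = 1
print(conv(x).shape)                                              # torch.Size([1, 1, 7, 7]): size preserved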
• Randomly set some neurons to zero in the forward pass.
• Q: Suppose that with all inputs present at test time the output of this neuron is x. What would its output be during training time, in expectation? (e.g. if p = 0.5)
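A minimal sketch of dropout in NumPy (the "inverted dropout" formulation; here p is the probability of keeping a unit and the sizes are illustrative). Without the division by p, the expected training-time output would be p·x, i.e. x/2 for p = 0.5.

import numpy as np

p = 0.5                                       # probability of keeping a unit (illustrative value)
activations = np.random.randn(100)

# Training: randomly zero units; dividing by p keeps the expected output equal
# to the test-time output ("inverted dropout").
mask = np.random.rand(*activations.shape) < p
train_out = activations * mask / p

# Test: use all units; no extra scaling is needed with the inverted formulation.
test_out = activations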
• Makes the representations smaller and more manageable
• Operates over each activation map independently
How many parameters need to be learned in a pooling layer?
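A short max-pooling sketch (PyTorch, with illustrative sizes); note that pooling introduces no learnable parameters, only the hyperparameters F and stride:

import torch
import torch.nn as nn

x = torch.randn(1, 6, 28, 28)                     # 6 activation maps of size 28x28
pool = nn.MaxPool2d(kernel_size=2, stride=2)      # pools each map independently
y = pool(x)
print(y.shape)                                     # torch.Size([1, 6, 14, 14])
print(sum(p.numel() for p in pool.parameters()))   # 0 learnable parameters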
• Contains neurons that connect to the entire input volume, as in ordinary neural networks
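A sketch of a fully connected layer at the end of a ConvNet (the sizes are illustrative, loosely LeNet-like):

import torch
import torch.nn as nn

x = torch.randn(1, 16, 5, 5)          # final activation volume from the conv layers
fc = nn.Linear(16 * 5 * 5, 10)        # each of the 10 outputs connects to the entire volume
scores = fc(x.flatten(start_dim=1))   # flatten to a vector, then apply the FC layer
print(scores.shape)                   # torch.Size([1, 10])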
IMAGENET CLASSIFICATION WITH DEEP CONVOLUTIONAL
NEURAL NETWORKS
• Using data augmentation to reduce overfitting
“AlexNet”
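A hedged sketch of the kind of augmentation AlexNet used, random 224x224 crops and horizontal flips (torchvision names; the exact pipeline here is our illustration, not the paper's code):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(224),           # random crop from a larger (e.g. 256x256) image
    transforms.RandomHorizontalFlip(),    # random left-right reflection
    transforms.ToTensor(),
])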