2. DEEP LEARNING
Ian Goodfellow
Google Brain
h-index 37
Yoshua Bengio
Professor, U of Montreal
h-index 111
Aaron Courville
Professor, U of Montreal
h-index 46
5. 1-D, 2-D, and 3-D data formats, single-channel and multi-channel
Single-channel
  1-D  Audio waveform: The axis we convolve over corresponds to time. We discretize time and measure the amplitude of the waveform once per time step.
  2-D  Audio data that has been preprocessed with a Fourier transform: We can transform the audio waveform into a 2-D tensor with different rows corresponding to different frequencies and different columns corresponding to different points in time.
  3-D  Volumetric data: A common source of this kind of data is medical imaging technology, such as CT scans.
Multi-channel
  1-D  Skeleton animation data: Animations of 3-D computer-rendered characters are generated by altering the pose of a “skeleton” over time. Each channel in the data fed to the model represents the angle about one axis of one joint.
  2-D  Color image data: One channel contains the red pixels, one the green pixels, and one the blue pixels. The convolution kernel moves over both the horizontal and vertical axes of the image, conferring translation equivariance in both directions.
  3-D  Color video data: One axis corresponds to time, one to the height of the video frame, and one to the width of the video frame.
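To make the multi-channel case concrete, here is a minimal pure-Python sketch of a 1-D convolution over multi-channel data such as the skeleton animation example: the kernel slides along the time axis only, but sums over every channel at each step. It assumes a valid (unpadded), stride-1 cross-correlation, which is what most deep learning libraries call convolution; the function name and the toy data are hypothetical.

```python
def conv1d_multichannel(x, w, bias=0.0):
    """Valid (unpadded), stride-1 1-D cross-correlation over C channels.

    x: list of C channels, each a list of T samples (e.g. joint angles over time).
    w: list of C kernel slices, each a list of K weights (one slice per channel).
    Returns a single output channel of length T - K + 1.
    """
    C, T, K = len(x), len(x[0]), len(w[0])
    assert len(w) == C, "kernel needs one slice per input channel"
    out = []
    for t in range(T - K + 1):      # slide only along the time axis
        s = bias
        for c in range(C):          # ...but sum contributions from every channel
            for k in range(K):
                s += w[c][k] * x[c][t + k]
        out.append(s)
    return out

# Two hypothetical channels (e.g. two joint angles) over 5 time steps.
x = [[1, 2, 3, 4, 5],
     [0, 1, 0, 1, 0]]
w = [[1, 0, -1],   # kernel slice applied to channel 0
     [1, 1, 1]]    # kernel slice applied to channel 1
print(conv1d_multichannel(x, w))  # [-1.0, 0.0, -1.0]
```

A real layer would use several such kernels to produce several output channels; the single-kernel version keeps the channel-summing step visible.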
27. e.g. input 7x7
3x3 filter, applied with stride 1
pad with 1 pixel border => 7x7 output!
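The 7x7 example can be checked with a small pure-Python sketch. `zero_pad` and `conv2d` are hypothetical helpers written for illustration (cross-correlation, no kernel flip, as most deep learning libraries implement it), and the 3x3 box filter is an arbitrary choice; any 3x3 kernel gives the same output shape.

```python
def zero_pad(img, p):
    """Surround a 2-D list with a border of p zeros on every side."""
    width = len(img[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in img]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom

def conv2d(img, kernel, stride=1):
    """Valid 2-D cross-correlation of a square F x F kernel over img."""
    H, W, F = len(img), len(img[0]), len(kernel)
    out = []
    for i in range(0, H - F + 1, stride):
        row = []
        for j in range(0, W - F + 1, stride):
            row.append(sum(kernel[a][b] * img[i + a][j + b]
                           for a in range(F) for b in range(F)))
        out.append(row)
    return out

image = [[r * 7 + c for c in range(7)] for r in range(7)]  # 7x7 input
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]                 # 3x3 box filter

out = conv2d(zero_pad(image, 1), kernel, stride=1)
print(len(out), len(out[0]))  # 7 7
```

Without the padding, the same call on the raw 7x7 image returns a 5x5 output.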
in general, it is common to see CONV layers with
stride 1, filters of size FxF, and zero-padding of
(F-1)/2, which preserves the spatial size.
e.g. F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
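The padding rule follows from the usual output-size formula for a conv layer, (N - F + 2P)/S + 1, where N is the input width, F the filter size, P the padding, and S the stride. A minimal sketch, with a hypothetical function name:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a conv layer: (N - F + 2P) / S + 1."""
    span = n - f + 2 * pad
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    if span % stride != 0:
        raise ValueError(f"stride {stride} does not fit evenly: N - F + 2P = {span}")
    return span // stride + 1

for f in (3, 5, 7):
    p = (f - 1) // 2  # padding that preserves size for odd F at stride 1
    print(f, p, conv_output_size(7, f, stride=1, pad=p))
# prints 3 1 7, then 5 2 7, then 7 3 7
```

With P = (F-1)/2 and S = 1 the formula reduces to (N - F + F - 1) + 1 = N for any odd F, which is why this padding keeps the spatial size unchanged.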
29. Dropout
Randomly set some neurons to zero in the forward pass.
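This slide describes dropout. Below is a minimal pure-Python sketch of its training-time forward pass. It assumes `p_keep` is the probability of keeping a unit (some texts parameterize by the drop probability instead; the two coincide at 0.5), and the function name and activations are hypothetical. The mask is returned because the backward pass must zero the same units.

```python
import random

def dropout_forward(activations, p_keep=0.5, rng=random):
    """Training-time dropout: keep each activation with probability p_keep, else zero it.

    Returns the dropped activations and the binary mask used.
    """
    mask = [1 if rng.random() < p_keep else 0 for _ in activations]
    return [a * m for a, m in zip(activations, mask)], mask

rng = random.Random(0)                # seeded so runs are reproducible
h = [0.3, 1.2, -0.7, 2.0, 0.5, 0.9]   # hypothetical hidden-layer activations
dropped, mask = dropout_forward(h, p_keep=0.5, rng=rng)
print(mask)
print(dropped)
```

A fresh mask is drawn on every forward pass, so each training step effectively trains a different thinned sub-network.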
30. Q: Suppose that with all inputs present at test time the output of this
neuron is x. What would its output be during training time, in
expectation? (e.g. if p = 0.5)
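One way to answer the question is to enumerate every dropout mask for a small neuron. This sketch assumes a linear neuron a = w1*x1 + w2*x2 (no nonlinearity) with dropout applied to its inputs, each input kept independently with probability p; the weights, inputs, and function name are hypothetical. Exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

def expected_train_output(weights, inputs, p_keep):
    """Exact E[a] for a = sum(w_i * m_i * x_i), where each mask bit m_i
    is 1 with probability p_keep, independently of the others."""
    total = Fraction(0)
    for mask in product((0, 1), repeat=len(inputs)):
        prob = Fraction(1)
        for m in mask:
            prob *= p_keep if m else 1 - p_keep
        a = sum(w * m * x for w, m, x in zip(weights, mask, inputs))
        total += prob * a
    return total

w = [Fraction(2), Fraction(3)]  # hypothetical weights w1, w2
x = [Fraction(1), Fraction(4)]  # hypothetical inputs x1, x2
test_output = sum(wi * xi for wi, xi in zip(w, x))  # all inputs present: 2*1 + 3*4
train_expectation = expected_train_output(w, x, Fraction(1, 2))
print(test_output, train_expectation)  # 14 7
```

By linearity of expectation the training-time output is p*x in expectation (x/2 for p = 0.5), which is why outputs are multiplied by p at test time: it makes test-time outputs match their training-time expectation.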
31. Pooling layer
Makes the representations smaller and more manageable
Operates over each activation map independently
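This slide describes the pooling layer. Below is a minimal pure-Python sketch of max pooling (one common pooling operation), assuming a 2x2 window with stride 2. Pooling runs on each activation map separately, so the depth is unchanged while width and height are halved. Helper names and the toy volume are hypothetical.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over one 2-D activation map (valid windows only)."""
    H, W = len(fmap), len(fmap[0])
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, W - size + 1, stride)]
            for i in range(0, H - size + 1, stride)]

def pool_volume(volume, size=2, stride=2):
    """Pool each activation map (channel) independently; depth is unchanged."""
    return [max_pool2d(fmap, size, stride) for fmap in volume]

volume = [                  # toy 2 x 4 x 4 volume (depth, height, width)
    [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]],
    [[0, -1, 9, 2],
     [3, 4, 1, 1],
     [2, 2, 5, 6],
     [8, 1, 0, 7]],
]
print(pool_volume(volume))  # [[[6, 8], [3, 4]], [[4, 9], [8, 7]]]
```

Pooling has no learned parameters; only the window size and stride are chosen.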