The document discusses the history and development of artificial neural networks and deep learning. It describes early neural network models like perceptrons from the 1950s and their use of weighted sums and activation functions. It then explains how additional developments led to modern deep learning architectures like convolutional neural networks and recurrent neural networks, which use techniques such as hidden layers, backpropagation, and word embeddings to learn from large datasets.
10. Threshold logic function
• Saturating linear function
• Continuous function
• Discontinuous derivative
11. Sigmoid function
• Most popular activation function
• Output in (0,1)
• Continuous derivative
• Easy to differentiate (see the sketch below)
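A minimal NumPy sketch (not from the slides) of the activation functions just described; the [-1, 1] saturation range for the saturating linear unit is an assumption for illustration.

```python
import numpy as np

def threshold(x):
    # Threshold logic function: hard 0/1 step, discontinuous at 0.
    return np.where(x >= 0.0, 1.0, 0.0)

def saturating_linear(x):
    # Saturating linear function: linear on [-1, 1], clipped outside.
    # Continuous, but its derivative jumps at x = -1 and x = +1.
    return np.clip(x, -1.0, 1.0)

def sigmoid(x):
    # Sigmoid: smooth, output in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # The sigmoid's derivative is sigmoid(x) * (1 - sigmoid(x)),
    # which is why it is "easy to differentiate".
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-4.0, 4.0, 9)
print(threshold(x), saturating_linear(x), sigmoid(x).round(3), sep="\n")
```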
12. Artificial neural network (ANN) structure
• Number of input/output signals
• Number of hidden layers
• Number of neurons per layer
• Neuron weights
• Topology
• Biases (see the code sketch below)
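A hedged sketch of how these structural choices appear in code; every size below (4 inputs, two hidden layers of 8 neurons, 1 output) is an invented example, not a value from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural choices of the ANN (illustrative values):
n_inputs = 4             # number of input signals
n_outputs = 1            # number of output signals
hidden = [8, 8]          # number of hidden layers and neurons per layer

# Topology as a chain of layer sizes: input -> hidden -> output.
sizes = [n_inputs] + hidden + [n_outputs]

# One weight matrix and one bias vector per pair of adjacent layers.
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    # Propagate an input vector through every layer in the topology.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return a

print(forward(np.ones(n_inputs)))
```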
16. Neural network learning
• Two types of learning
– Parameter learning
• Learn the neuron connection weights
– Structure learning
• Learn the ANN structure from the training data
17. Error function
• Consider an ANN with n neurons
• For each learning example (x, d)
– Training error caused by the current weights w
• Training error caused by w over the entire set of
learning examples (a code sketch follows)
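The slides show the error formulas as figures, so the exact definition is not in the text; the sketch below assumes the standard squared-error choice, E(w) = sum over examples of ½‖d − y(x; w)‖².

```python
import numpy as np

def example_error(y, d):
    # Squared error of the network output y against the target d
    # for one learning example (x, d); the 1/2 simplifies the gradient.
    return 0.5 * np.sum((np.asarray(d) - np.asarray(y)) ** 2)

def total_error(forward, X, D):
    # Training error caused by the current weights w over the
    # entire set of learning examples: sum of per-example errors.
    return sum(example_error(forward(x), d) for x, d in zip(X, D))
```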
20. Parameter learning: backpropagation of error
• Calculate the total error at the output
• Calculate the contributions to the error at each step,
going backwards (see the sketch below)
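A minimal backpropagation sketch in NumPy for a tiny two-layer sigmoid network; the network sizes, learning rate, and toy data are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One made-up learning example (x, d).
x = np.array([0.5, -0.2, 0.1])
d = np.array([1.0])

# A 3-4-1 network: one hidden layer of 4 sigmoid neurons.
W1, b1 = rng.normal(0.0, 0.5, (3, 4)), np.zeros(4)
W2, b2 = rng.normal(0.0, 0.5, (4, 1)), np.zeros(1)
lr = 0.5  # learning rate (arbitrary choice)

for step in range(100):
    # Forward pass.
    h = sigmoid(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Total error at the output.
    error = 0.5 * np.sum((d - y) ** 2)

    # Backward pass: error contributions at each step, going backwards.
    delta2 = (y - d) * y * (1.0 - y)            # output layer
    delta1 = (delta2 @ W2.T) * h * (1.0 - h)    # hidden layer

    # Gradient-descent updates of weights and biases.
    W2 -= lr * np.outer(h, delta2); b2 -= lr * delta2
    W1 -= lr * np.outer(x, delta1); b1 -= lr * delta1

print(f"error after training: {error:.5f}")
```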
21. Backpropagation discussion
• Initial weights
• Learning rate
• Number of neurons per hidden layer
• Number of hidden layers
35. AI will transform the internet
• @Andrew Ng
• Technology areas with potential for a paradigm shift:
– Computer vision
– Speech recognition & speech synthesis
– Language understanding: machine translation; web
search; dialog systems; …
– Advertising
– Personalization/recommendation systems
– Robotics
• All of this is hard: scalability, algorithms.
42. Convolution
• Convolution is a mathematical operation on two
functions f and g, producing a third function that is
typically viewed as a modified version of one of the
original functions (discrete example below).
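In the discrete 1D case the definition becomes (f ∗ g)[n] = Σ_m f[m] · g[n − m]; a minimal NumPy illustration with a made-up signal and kernel:

```python
import numpy as np

# Discrete 1D convolution: (f * g)[n] = sum over m of f[m] * g[n - m].
f = np.array([1.0, 2.0, 3.0])      # a toy signal
g = np.array([0.5, 0.5])           # a simple smoothing kernel

# np.convolve computes exactly this sum ("full" overlap by default).
print(np.convolve(f, g))           # -> [0.5 1.5 2.5 1.5]
```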
43. Convolutional neural networks
• ConvNets are a kind of neural network that uses
many identical copies of the same neuron
– Large number of neurons
– Large computational models
– The number of actual weights (parameters) to be
learned stays fairly small
44. A 2D convolutional neural network
• A convolutional neural network can learn a neuron once and
use it in many places, making the model easier to learn
and reducing error (see the sketch below).
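A small sketch of why that weight sharing keeps the learned parameter count low; the image and kernel sizes are invented for the comparison.

```python
# Parameter count: one shared 3x3 kernel vs. a fully-connected layer
# mapping a 32x32 image to a 32x32 feature map (illustrative sizes).
H, W, K = 32, 32, 3

conv_params = K * K + 1            # one shared kernel plus one bias
dense_params = (H * W) * (H * W)   # every input wired to every output

print(conv_params)                 # 10
print(dense_params)                # 1048576
```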
45. Structure of ConvNets
• Problem
– Predict whether a human is speaking or not
• Input: audio samples at different points in time
47. A more sophisticated approach
• Local properties of the data
– Frequency of sounds (increasing/decreasing)
• Look at a small window of the audio sample
– Create a group of neurons, A, to compute certain features
– The output of this convolutional layer is fed into a
fully-connected layer, F (see the sketch below)
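A hedged NumPy sketch of this structure: the same group of neurons A is applied to every small window of the audio, and its outputs feed a fully-connected layer F. All sizes and the final sigmoid readout are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

audio = rng.normal(size=100)     # toy audio samples over time
window = 5                       # size of the small window A sees
n_features = 4                   # features A computes per window

# One group of neurons A, reused on every window (weight sharing).
A_W = rng.normal(0.0, 0.1, (window, n_features))
A_b = np.zeros(n_features)

windows = np.lib.stride_tricks.sliding_window_view(audio, window)
conv_out = sigmoid(windows @ A_W + A_b)      # shape (96, 4)

# The convolutional layer's output feeds a fully-connected layer F.
F_W = rng.normal(0.0, 0.1, (conv_out.size, 1))
p_speaking = sigmoid(conv_out.ravel() @ F_W)
print(p_speaking)                # "is a human speaking?" score
```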
62. Recurrent neural networks (RNNs) have loops
• A loop allows information to be passed from one step
of the network to the next.
63. Unroll RNN
• Recurrent neural networks are intimately
related to sequences and lists (see the sketch below).
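A minimal vanilla-RNN sketch of what unrolling makes explicit: the same weights are applied at every time step, and information flows from step to step through the hidden state; all sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

n_in, n_hidden = 3, 5
W_xh = rng.normal(0.0, 0.1, (n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden (the loop)
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    # One pass around the loop: the new state depends on the
    # current input and the state carried over from the last step.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Unrolled over a 7-step toy sequence: the same rnn_step is reused.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(7, n_in)):
    h = rnn_step(x_t, h)
print(h)
```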
64. Examples
• Predict the last word in "the clouds are in the sky"
• The gap between the relevant information and the
place where it is needed is small
• RNNs can learn to use the past information
65. • “I grew up in France… I speak fluent French.”
• As the gap grows, RNNs become unable to
learn to connect the information.
67. LSTM networks
• A special kind of RNN
• Capable of learning long-term dependencies
• Structured as a chain of repeating
neural-network modules
71. Core idea behind LSTMs
• The key to LSTMs is the cell state, the horizontal line
running through the top of the diagram.
• The cell state runs straight down the entire chain, with
only some minor linear interactions.
• It is easy for information to just flow along it unchanged.
72. Gates
• The ability to remove or add information to
the cell state, carefully regulated by
structures called gates
• Sigmoid
– Decides how much of each component should be
let through
– Zero means let nothing through
– One means let everything through
• An LSTM has three of these gates (see the sketch below)
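A tiny sketch of the gating mechanism itself: a sigmoid output, pointwise-multiplied with a vector, decides how much of each component gets through; the numbers are made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

gate = sigmoid(np.array([-10.0, 0.0, 10.0]))   # ~[0.0, 0.5, 1.0]
candidate = np.array([5.0, 5.0, 5.0])

# 0 lets nothing through, 1 lets everything through.
print(gate * candidate)                        # ~[0.0, 2.5, 5.0]
```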
73. LSTM step 1
• Decide what information we're going to throw
away from the cell state
• Forget gate layer
74. LSTM step 2
• Decide what new information we're going to
store in the cell state
• Input gate layer
75. LSTM step 3
• Update the old cell state, Ct−1, into the new
cell state Ct (see the sketch below)
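The three steps combined, as a minimal NumPy LSTM cell following the standard equations (forget gate f, input gate i, candidate values, then Ct = f · Ct−1 + i · candidate, plus the output gate); sizes and initialization are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid = 3, 4
# One weight matrix per gate, acting on [h_prev, x_t] concatenated.
Wf, Wi, Wc, Wo = (rng.normal(0.0, 0.1, (n_in + n_hid, n_hid))
                  for _ in range(4))
bf, bi, bc, bo = (np.zeros(n_hid) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(z @ Wf + bf)         # step 1: what to throw away
    i = sigmoid(z @ Wi + bi)         # step 2: what new info to store
    C_tilde = np.tanh(z @ Wc + bc)   #         candidate values
    C = f * C_prev + i * C_tilde     # step 3: update Ct-1 into Ct
    o = sigmoid(z @ Wo + bo)         # output gate
    h = o * np.tanh(C)
    return h, C

h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, C = lstm_step(x_t, h, C)
print(h.round(3), C.round(3))
```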
92. • Inspired by the architectural depth of the brain,
researchers wanted for decades to train deep
multi-layer neural networks.
• No successful attempts were reported before 2006
(exception: convolutional neural networks, LeCun
1998).
• SVM: Vapnik and his co-workers developed the
Support Vector Machine (1993), a shallow
architecture.
• Breakthrough in 2006!
93. 2006 breakthrough
• More data
• Faster hardware: GPUs, multi-core CPUs
• Working ideas on how to train deep
architectures
94. • Beat the state of the art in many areas:
– Language modeling (Mikolov et al., 2012)
– Image recognition (Krizhevsky won the 2012
ImageNet competition)
– Sentiment classification (Socher et al., 2011)
– Speech recognition (Dahl et al., 2010)
– MNIST handwritten digit recognition (Ciresan et al., 2010)