2. Outline
■ Modeling humans in machines
■ Introduction to neural nets
■ What makes an algorithm intelligent?
■ Learning
– Supervised learning
■ Deep learning
– Neural nets in detail
■ Demo and use case
■ Future
6. Neural networks
■ The mammal brain is organized in a deep
architecture (Serre, Kreiman, Kouh, Cadieu,
Knoblich, & Poggio, 2007)
(e.g., the visual system has 5 to 10 levels)
■ Neural networks were very popular in the early 1990s but fell
out of favor after they were found not to
perform well
■ Why are they gaining power again now? Deep
architectures might be able to represent some
functions that are otherwise not efficiently
representable. The breakthrough came in 2006/2007 with
papers by Hinton and Bengio
16. Learning
■ Supervised machine learning: The program is “trained” on a pre-defined set of
“training examples”, which then facilitate its ability to reach an accurate conclusion
when given new data.
■ Semi-supervised machine learning: The program infers the unknown labels through
“label propagation”, utilizing similarities between different examples and inferring
non-existent labels from existent ones.
■ Unsupervised machine learning: The program is given a bunch of data and must find
patterns and relationships therein – e.g., clustering via a nearest-neighbor algorithm.
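The supervised setting can be made concrete with a toy classifier. Below is a minimal Python sketch (data and labels are made up for illustration): a 1-nearest-neighbor classifier is "trained" on labeled examples, then labels new data by similarity.

```python
# Minimal sketch of supervised learning: a 1-nearest-neighbor classifier.
# The points and labels below are invented for illustration only.

def nearest_neighbor_predict(train_points, train_labels, query):
    """Predict the label of `query` as the label of the closest training point."""
    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_points)), key=lambda i: dist2(train_points[i], query))
    return train_labels[best]

# "Training examples": feature vectors with known labels.
X = [(1.0, 1.0), (1.2, 0.8), (8.0, 9.0), (9.0, 8.5)]
y = ["cat", "cat", "dog", "dog"]

# New, unseen data points get the label of their nearest training example.
print(nearest_neighbor_predict(X, y, (1.1, 0.9)))  # -> cat
print(nearest_neighbor_predict(X, y, (8.5, 9.0)))  # -> dog
```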
17. Supervised Learning
■ Binary classification: Does this person have that disease?
■ Regression: What is the market value of this house?
■ Multiclass classification: Digit recognition, Face recognition
18. Supervised Learning
■ Goal: Given a number of features, try to make sense of them!
■ Example: Employee satisfaction rate – what does it depend on? Given these
features in a dataset, try to predict the rate.
23. Supervised Learning
■ But how do we adjust ourselves? How do we know at each step we are getting better?
■ Measurement of wrongness: Loss functions
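A loss function can be sketched in a few lines. The Python example below (with made-up numbers) shows two common measurements of wrongness: mean squared error for regression and the 0-1 error for classification.

```python
# Two common loss functions, sketched with invented numbers.

def mse(predictions, targets):
    """Mean squared error: the usual loss for regression."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def zero_one_loss(predictions, targets):
    """0-1 error: the fraction of wrong answers, used for classification."""
    return sum(p != t for p, t in zip(predictions, targets)) / len(targets)

print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))                       # -> 0.1666...
print(zero_one_loss(["cat", "dog", "dog"], ["cat", "cat", "dog"]))  # -> 0.333...
```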
25. Gradient descent
How do we know how to “roll down
the hill”?
The gradient (the derivatives of the
loss function with respect to each individual
feature weight, i.e., each parameter)
tells us “which way is down”.
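“Rolling down the hill” can be sketched in a few lines of Python, assuming the simplest possible model: one weight w, a made-up dataset, and the MSE loss. The derivative of the loss with respect to w tells us which way to step.

```python
# Minimal gradient descent sketch: fit y ≈ w * x by rolling down the
# MSE loss surface. The data and learning rate are made up.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x

w = 0.0                     # start somewhere on the hill
learning_rate = 0.01

for step in range(1000):
    # Gradient of the loss L(w) = mean((w*x - y)^2) with respect to w:
    # dL/dw = mean(2 * (w*x - y) * x) -- this is "which way is down".
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # take a small step downhill

print(round(w, 2))  # -> 1.99, close to the true slope of 2
```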
26. What exactly is deep learning?
■ “a network would need more than one hidden layer to be a deep network; networks
with one or two hidden layers are traditional neural networks…”
■ “in my experience, a network can be considered deep when there is at least one
hidden layer. Although the term deep learning can be fuzzy, …”
■ “in my own thinking, deep is not related to the number of layers, but it talks about
how hard the feature to be discovered is…”
■ – from a discussion on StackExchange
27. Deep learning
■ What is the difference? Remember the quote from Yann LeCun from before? It goes
on:
■ “A pattern recognition system is like a black box with a camera at one end, a green
light and a red light on top, and a whole bunch of knobs on the front…. Now, imagine a
box with 500 million knobs, 1,000 light bulbs, and 10 million images to train it with.
That’s what a typical Deep Learning system is.”
29. Aim: Learning features
■ Deep learning excels in tasks where the basic unit, a single
pixel, a single frequency, or a single word has very little
meaning in and of itself, but the combination of such units
has a useful meaning. It can learn these useful combinations
of values without any human intervention.
31. Neural networks
■ An input, output, and one or more hidden layers
of units/neurons/perceptrons
■ Each connection between two neurons has a
weight w (similar to the perceptron weights).
The best weights can again be found with gradient
descent.
Image courtesy of
http://ljs.academicdirect.org/A15/053_070.htm
32. Neural networks
■ Example: The input vector [7, 1, 2] goes into the input
units.
■ These values are then propagated forward to the
hidden units using a weighted-sum transfer
function for each hidden unit (forward
propagation); each hidden unit in turn calculates its
output via an activation function.
Image courtesy of
http://ljs.academicdirect.org/A15/053_070.htm
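The forward pass for the [7, 1, 2] example can be sketched as follows. The layer sizes and weight values here are made up (a real network would learn them), and a sigmoid is assumed as the activation function.

```python
import math

# Forward propagation sketch for the [7, 1, 2] example.
# All weights below are invented; a trained network would learn them.

def sigmoid(z):
    """Activation function: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights):
    """One layer: each unit computes a weighted sum of its inputs
    (the transfer function), then applies the activation function."""
    return [sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)))
            for unit_weights in weights]

inputs = [7.0, 1.0, 2.0]

# 3 input units -> 2 hidden units -> 1 output unit.
hidden_weights = [[0.1, -0.2, 0.3], [-0.4, 0.5, 0.1]]
output_weights = [[0.6, -0.7]]

hidden = layer_forward(inputs, hidden_weights)   # hidden-unit outputs
output = layer_forward(hidden, output_weights)   # network output
print(output)
```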
33. Neural networks
■ Why deep?
■ Depth is the number of parameterized transformations a
signal encounters as it propagates from the
input layer to the output layer, where a
parameterized transformation is a processing
unit that has trainable parameters, such as
weights.
Image courtesy of
http://ljs.academicdirect.org/A15/053_070.htm
34. Aim: Learning features
■ The goal of deep learning methods is to learn higher
levels of feature from lower level features.
35. Notes for Demo
■ Overfitting – there is such a thing as learning too much, or too specifically!
■ Regularization – a technique that prevents overfitting
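One common form of regularization (L2, which penalizes large weights so the model cannot fit the training data too specifically) can be sketched as follows; the weight values and penalty strength below are made up for illustration.

```python
# Sketch: L2 regularization adds a penalty on large weights to the loss.
# All numbers here, including reg_strength (lambda), are invented.

def mse(predictions, targets):
    """Mean squared error: the data-fit part of the loss."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def regularized_loss(predictions, targets, weights, reg_strength=0.1):
    """Training loss = data loss + lambda * sum of squared weights."""
    penalty = reg_strength * sum(w ** 2 for w in weights)
    return mse(predictions, targets) + penalty

preds, targets = [1.1, 1.9], [1.0, 2.0]
small_w = [0.5, -0.3]   # modest weights -> small penalty
large_w = [5.0, -3.0]   # large weights -> heavily penalized

print(regularized_loss(preds, targets, small_w))
print(regularized_loss(preds, targets, large_w))
```

With identical predictions, the model with large weights pays a much bigger total loss, so gradient descent is pushed toward simpler, smaller-weight solutions.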
46. Future of deep learning
■ Deep learning has a lot of hype right now, and it is apparent that it is very useful for
specific tasks.
■ What frontiers and challenges do you think are the most exciting for researchers in
the field of neural networks in the next ten years?
■ I cannot see ten years into the future. For me, the wall of fog starts at about 5 years. ...
I think that the most exciting areas over the next five years will be really understanding
videos and text. I will be disappointed if in five years time we do not have something
that can watch a YouTube video and tell a story about what happened. I have had a lot
of disappointments.
– From Geoffrey Hinton’s AMA on Reddit
48. Further resources
■ Introductory:
■ Andrew Ng’s Machine Learning course on Coursera
■ Geoffrey Hinton’s Neural Networks course on Coursera
■ Advanced:
■ Who is afraid of non-convex loss functions? by Yann LeCun http://videolectures.net/eml07_lecun_wia/
■ For those who like papers, recent advances:
■ Playing Atari with Deep Reinforcement Learning - http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
■ Unsupervised Face Detection - http://cs.stanford.edu/~quocle/faces_full.pdf
Neural nets – the most common deep learning structure
Sensing, reasoning and communicating. Within these macro areas, we can make more fine-grained distinctions related to speech and image recognition, different flavors of reasoning (e.g., logic versus evidence-based), and the generation of language to facilitate communication. In other words, cognition breaks down to taking stuff in, thinking about it and then telling someone what you have concluded.
How do you tell a car from a dog as a human?
Netflix, Amazon recommendations, how do they do it, how do we do it? – similar people (you like matrix, I liked matrix, I also liked terminator, maybe you’ll like it too?)
WHY are we doing this?
Siri, Cortana, Google now -> we do it so you don’t have to
Specialized brain cells take a signal and carry it along
Shallow architectures took much less time to train and had comparable or even higher accuracies
2006 paper - A fast learning algorithm for deep belief nets.
How would you (as a human) describe this?
I will talk about ml first
Challenge: Teaching a machine to tell the difference between a dog and a car
Important thing vs non-important thing
What do we do in this case, as humans? – Draw the best fit line
Typically, the dataset is represented in a matrix where rows are examples and columns are features
The features generally have “coefficients” in the equation that we call weights.
This coefficient for our feature becomes the weight – something that implies how important this is
Grandmother example!
Ask 0-1 error, gradient
3D, more complicated cases
Units -> Very specialized little workers
Specialized cells in the brain
The figure above shows a network with a 3-unit input layer, 5-unit hidden layer and an output layer with 2 units
Weight is what makes neurons specialized; each gets a different weight (toilet size gets less weight, salary more weight) – think of it as thicker/thinner connection lines in between
Lots of small impulses, if below a threshold nothing happens but if you cross the threshold, you get an action potential
texture neurons
Hair face body part neurons
Posture neurons
White dog example
Paul Allen AI Research Institute project: a computer learning from text and graphics to pass the 4th-grade exam on its own