and Deep Learning
Prof. dr. Max Welling
Drs. Jorgen Sandig
MSc. Taco Cohen
All purpose machine learning
Using Neural Networks:
- Using large amounts of data
- Learning very complex problems
- Automatically learning features
A new era of machine learning
Deep learning wins all competitions
- IJCNN 2011 Traffic Sign Recognition Competition
- ISBI 2012 Segmentation of neuronal structures in EM
- ICDAR 2011 Chinese
A lot of state-of-the-art systems use deep learning to some extent
- IBMs Watson: Jeopardy contest 2011
- Google’s self-driving car
- Google Glass
- Facebook face recognition
- Facebook user modelling
Mostly image and sound recognition tasks (difficult)
Linking neurons and training
- Initialize randomly
- Sequentially give it data.
- See what the difference is between
network output and actual output.
- Update the weights according to this error.
- End result: give a model input, and it
produces a proper output.
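The steps above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a single linear neuron and a made-up target rule (y = 2x + 1); the names `train`, `learning_rate` and the toy data are hypothetical and not from the slides.

```python
import random

def train(samples, epochs=200, learning_rate=0.05, seed=0):
    """Train one linear neuron, output = w * x + b, following the listed steps."""
    rng = random.Random(seed)
    # Initialize randomly.
    w = rng.uniform(-1.0, 1.0)
    b = rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        # Sequentially give it data.
        for x, target in samples:
            output = w * x + b
            # Difference between network output and actual output.
            error = target - output
            # Update the weights according to this error.
            w += learning_rate * error * x
            b += learning_rate * error
    return w, b

# Toy data generated from the assumed rule y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train(samples)

# End result: give the model an input and it produces an output close to 2x + 1.
prediction = w * 3.0 + b
```

After training, `w` and `b` end up close to 2 and 1, so `prediction` is close to 7. Real networks have many neurons and non-linear units, but the loop has the same shape.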
Quest for the weights. The weights are the
The Perceptron (1958)
“A machine which senses, recognizes, remembers, and responds like the human mind”
“Remarkable machine… [was] capable of what amounts to thought” - The New Yorker
Criticism and downfall (1969)
- Perceptrons are painfully limited. They cannot even learn a
simple XOR function!
- No feasible way of learning networks with multiple layers
- Interest in neural networks almost completely disappeared
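The XOR limitation can be reproduced with a small sketch. It assumes the classic perceptron learning rule with a step threshold; the function names and the 100-epoch limit are illustrative choices, not from the slides.

```python
def predict(weights, x1, x2):
    """Step-threshold output of a single perceptron with two inputs."""
    w1, w2, bias = weights
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

def train_perceptron(samples, epochs=100):
    """Perceptron rule: nudge the weights only when the output is wrong."""
    w1, w2, bias = 0, 0, 0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            error = target - predict((w1, w2, bias), x1, x2)
            if error != 0:
                mistakes += 1
                w1 += error * x1
                w2 += error * x2
                bias += error
        if mistakes == 0:
            break  # every sample is classified correctly
    return w1, w2, bias

def accuracy(weights, samples):
    correct = sum(predict(weights, x1, x2) == target for (x1, x2), target in samples)
    return correct / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(train_perceptron(AND), AND)
xor_accuracy = accuracy(train_perceptron(XOR), XOR)
```

AND is linearly separable, so training ends with every case correct. For XOR no single line separates the two classes, so at least one of the four cases stays wrong no matter how long the perceptron trains.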
Renewed interest (90’s)
- Learning multiple layers
- “Back propagation”
- Can theoretically learn any function
Very slow and inefficient
- Machine learning attention
towards SVMs, random forests
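Learning multiple layers with back propagation can be sketched on the XOR problem that a single perceptron cannot solve. This is a minimal sketch under stated assumptions: one hidden layer of sigmoid units, squared error, and plain per-sample gradient steps. The network size, learning rate, seed and all names are hypothetical choices, not from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(net, x):
    """Return hidden activations and the single network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output

def mean_squared_error(net, samples):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in samples) / len(samples)

def train(samples, n_hidden=4, epochs=5000, lr=0.5, seed=1):
    rng = random.Random(seed)
    net = {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }
    initial_loss = mean_squared_error(net, samples)
    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(net, x)
            # Error signal at the output (squared error through the sigmoid).
            delta_out = (output - target) * output * (1.0 - output)
            # Propagate the error signal back to each hidden unit.
            delta_hidden = [delta_out * net["w_out"][j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                net["w_out"][j] -= lr * delta_out * hidden[j]
                for i in range(2):
                    net["w_hidden"][j][i] -= lr * delta_hidden[j] * x[i]
                net["b_hidden"][j] -= lr * delta_hidden[j]
            net["b_out"] -= lr * delta_out
    return net, initial_loss

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net, initial_loss = train(XOR)
final_loss = mean_squared_error(net, XOR)
```

With a setup like this the error usually drops far enough to classify XOR correctly, although the outcome depends on the random start. Even this tiny example needs thousands of passes over four data points, which hints at why training was considered slow.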
Deep learning (2006)
- Quest: Mimic human brain representations
- Large networks
- Lots of data
Simple back propagation fails
on large networks.
Deep learning (2006)
- Exactly the same networks as
before, just BIGGER
- Combination of three factors:
- (Big data)
- Better algorithms
- Parallel computing (GPU)
Restricted Boltzmann machine
Pre-training: Learn the representation by parts!
Very strong unsupervised learning
After pre-training, use back propagation
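The slides do not say how a restricted Boltzmann machine is trained. A common choice is one-step contrastive divergence (CD-1), which is assumed in the sketch below. The tiny binary data set, layer sizes, learning rate and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, weights, hidden_bias):
    # P(h_j = 1 | v) for each hidden unit j; weights[i][j] links visible i to hidden j.
    return [sigmoid(hidden_bias[j] + sum(v[i] * weights[i][j] for i in range(len(v))))
            for j in range(len(hidden_bias))]

def visible_probs(h, weights, visible_bias):
    # P(v_i = 1 | h) for each visible unit i.
    return [sigmoid(visible_bias[i] + sum(h[j] * weights[i][j] for j in range(len(h))))
            for i in range(len(visible_bias))]

def cd1_step(v0, weights, visible_bias, hidden_bias, lr, rng):
    """One CD-1 update for a single binary input vector (assumed training rule)."""
    p_h0 = hidden_probs(v0, weights, hidden_bias)
    h0 = [1 if rng.random() < p else 0 for p in p_h0]   # sample hidden states
    p_v1 = visible_probs(h0, weights, visible_bias)     # reconstruct the input
    p_h1 = hidden_probs(p_v1, weights, hidden_bias)
    for i in range(len(v0)):
        for j in range(len(h0)):
            # Data-driven statistics minus reconstruction-driven statistics.
            weights[i][j] += lr * (v0[i] * p_h0[j] - p_v1[i] * p_h1[j])
        visible_bias[i] += lr * (v0[i] - p_v1[i])
    for j in range(len(h0)):
        hidden_bias[j] += lr * (p_h0[j] - p_h1[j])

def train_rbm(data, n_hidden=2, epochs=1000, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_visible = len(data[0])
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
    visible_bias = [0.0] * n_visible
    hidden_bias = [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            cd1_step(v0, weights, visible_bias, hidden_bias, lr, rng)
    return weights, visible_bias, hidden_bias

def reconstruct(v, weights, visible_bias, hidden_bias):
    return visible_probs(hidden_probs(v, weights, hidden_bias), weights, visible_bias)

# Unlabeled toy data: no targets are needed, which is what makes this unsupervised.
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
weights, visible_bias, hidden_bias = train_rbm(data)
reconstruction = reconstruct(data[0], weights, visible_bias, hidden_bias)
```

In the pre-training scheme described above, weights learned this way would serve as the starting point for one layer of the network, and back propagation would then fine-tune the whole stack.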
Parallel (GPU) power
- Every set of weights can be stored as a matrix (w_ij)
- GPUs are built to solve common parallel problems fast!
- All similar calculations are done at the same time: a huge performance boost.
- CPU parallelizing
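The matrix view can be illustrated with a short sketch. It assumes the convention that `weights[i][j]` is the weight from input j to output neuron i; the slides only give the symbol w_ij, so this layout and all names are assumptions. The thread pool only demonstrates that the rows are independent of each other; it is not meant to show GPU-level speed.

```python
from concurrent.futures import ThreadPoolExecutor

def neuron_sum(row, inputs):
    """Weighted sum for one output neuron: one row of the weight matrix."""
    return sum(w * x for w, x in zip(row, inputs))

def layer_sequential(weights, inputs):
    # Matrix-vector product computed one row after another.
    return [neuron_sum(row, inputs) for row in weights]

def layer_parallel(weights, inputs):
    # Each row needs only its own weights and the shared inputs,
    # so all rows can be handed out to workers at the same time.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda row: neuron_sum(row, inputs), weights))

weights = [
    [1.0, 2.0],   # weights into output neuron 0
    [0.5, -1.0],  # weights into output neuron 1
    [0.0, 3.0],   # weights into output neuron 2
]
inputs = [2.0, 1.0]

sequential = layer_sequential(weights, inputs)  # [4.0, 0.0, 3.0]
parallel = layer_parallel(weights, inputs)
```

Both versions give the same result. A GPU exploits exactly this independence, but across thousands of rows and many inputs at once.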
Future of Deep Learning
- Currently an explosion of developments
- Hessian-free optimization (2010)
- Long Short Term Memory (2011)
- Large Convolutional nets, max-pooling (2011)
- Nesterov’s Gradient Descent (2013)
- Currently state of the art but...
- No way of doing logical inference (extrapolation)
- No easy integration of abstract knowledge
- Hypothesis space bias might not conform to reality
When to apply Deep Learning
- Generally, vision and sound
- Works great for any other problem too!
- A lot of data / features
- Don’t want to make your own features
- State of the art results
How to apply Deep Learning
Deep learning is very difficult!
- No easy plug and play software
- Far too many different networks/options/additions
- Mathematics and programming very challenging
- Research is fast paced
- Learning a network is both an art and a science
Cooperation university <=> business
How to apply Deep Learning
- For most current business problems, there is no need for
expensive hardware; e.g., we use a laptop.