Deep Learning: Towards General Artificial Intelligence

Dr. Rukshan Batuwita (Machine Learning Scientist)
Senior Data Scientist
Ambiata Pvt Ltd, Sydney, Australia

What is Artificial Intelligence?
• Field of Study to Develop Machines that act
like Humans!
• Recent Definition: Develop Machines that act
rationally!

General AI
• Learning and Reasoning
• Planning
• Adaptability
• Vision
• Speech recognition
• Automation
• Mobility
• etc.

Narrow AI
• Learning and Reasoning
• Planning
• Adaptability
• Vision
• Speech recognition
• Automation
• Mobility
• etc.
• People have been working on Narrow AI for the last 50 years
• This has many applications

What is Machine Learning?
• Computer Program: Input to output mapping
[Diagram: Inputs → Computer Program (Algorithm/List of Instructions) → Outputs]
• When we know the algorithm to solve a task, then we can program it
• AI Problems: the algorithm is not known
[Diagram: Inputs → ? → Outputs. Ex. Inputs → ? → Cat or Dog]

In Machine Learning…
[Diagram: example inputs (. . .) → Train/Learn with Machine Learning techniques → Algorithm (Model) → Cat or Dog]

Introduction to
Artificial Neural
Networks
(Biologically Inspired)

Biological Neurons
[Figure: Biological Brain → Biological Neural Network → Biological Neuron]
[Figure: Inputs → Processing/Computing → Outputs]

Biological Learning
• Biological Neuron
• Learning happens due to some chemical reactions in synaptic connections
• A typical adult human brain has about 10^14 synapses (connections)
[Figure: A synaptic connection]

Artificial Neuron
• A computational model
y = f( \sum_{j=1}^{d} w_j x_j )
• Called ‘Perceptron’
• Introduced in the late 1950’s
• Weights can be learned by an optimization method
like Gradient Descent
[Figure: Inputs → Weights (represent chemicals) → Processing/Computing → Outputs]
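The neuron model and the idea of learning its weights by gradient descent can be sketched in a few lines. This is only an illustration under stated assumptions: the sigmoid activation, the cross-entropy loss, the per-sample (stochastic) update, the logical-OR toy data and all function names are choices made for this sketch and do not come from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, inputs, activation):
    """Compute y = f(sum_j w_j * x_j) for one artificial neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

def train_neuron(samples, n_inputs, learning_rate=0.5, epochs=2000):
    """Learn the weights with per-sample (stochastic) gradient descent."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, label in samples:
            prediction = neuron(weights, inputs, sigmoid)
            # For a sigmoid neuron with cross-entropy loss,
            # dLoss/dw_j = (prediction - label) * x_j.
            for j in range(n_inputs):
                weights[j] -= learning_rate * (prediction - label) * inputs[j]
    return weights

# Toy data (assumption): logical OR. The constant third input 1.0 acts as a bias.
OR_SAMPLES = [([0, 0, 1.0], 0), ([0, 1, 1.0], 1), ([1, 0, 1.0], 1), ([1, 1, 1.0], 1)]
learned_weights = train_neuron(OR_SAMPLES, n_inputs=3)
predictions = [round(neuron(learned_weights, x, sigmoid)) for x, _ in OR_SAMPLES]
```

Swapping `sigmoid` for an identity function (and the loss for squared error) would give the linear-regression view of the same neuron.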

Perceptron
Activation Functions:
• Linear (identity) activation: Perceptron = Linear Regression
• Sigmoid activation: Perceptron = Logistic Regression
y = f( \sum_{j=1}^{d} w_j x_j )

Artificial Neural Network
[Figure: layers of neurons connected by weights]
• Artificial Neurons are connected together to form a network
• Called Multi Layer Perceptron (MLP)
• A Non-linear model of the parameters
• Trained by the popular Backpropagation algorithm (Gradient Descent)

Backpropagation – Main Idea
1. Calculate Error/Loss = f(Label , Prediction)
2. Calculate Gradient/Derivative of the Loss w.r.t. each weight
3. In order to calculate the gradient of the inner weights,
apply the chain rule of derivatives
4. Update each weight in the direction of the negative gradient (Gradient
Descent)
Error = f(label, prediction)
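The four steps can be sketched for a tiny network with one hidden layer. This is a minimal illustration, not a production implementation: the squared-error loss, sigmoid activations, the 3-unit hidden layer, the XOR toy data, the learning rate and every function name are assumptions made for this sketch.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network prediction for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    prediction = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, prediction

def mean_loss(data, w_hidden, w_out):
    """Step 1: squared error between label and prediction, averaged over the data."""
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def backprop_step(data, w_hidden, w_out, learning_rate):
    """One full-batch gradient descent update using the chain rule."""
    grad_out = [0.0] * len(w_out)
    grad_hidden = [[0.0] * len(row) for row in w_hidden]
    for x, y in data:
        hidden, prediction = forward(x, w_hidden, w_out)
        # Step 2: gradient of the loss w.r.t. the output neuron's weighted sum.
        delta_out = 2 * (prediction - y) * prediction * (1 - prediction)
        for k, h in enumerate(hidden):
            grad_out[k] += delta_out * h
            # Step 3: chain rule through the output weight and the hidden sigmoid.
            delta_hidden = delta_out * w_out[k] * h * (1 - h)
            for j, xj in enumerate(x):
                grad_hidden[k][j] += delta_hidden * xj
    n = len(data)
    # Step 4: move every weight in the direction of the negative gradient.
    new_w_out = [w - learning_rate * g / n for w, g in zip(w_out, grad_out)]
    new_w_hidden = [[w - learning_rate * g / n for w, g in zip(row, grad_row)]
                    for row, grad_row in zip(w_hidden, grad_hidden)]
    return new_w_hidden, new_w_out

random.seed(0)
# Toy data (assumption): XOR, with a constant 1.0 input acting as a bias.
XOR = [([0, 0, 1.0], 0), ([0, 1, 1.0], 1), ([1, 0, 1.0], 1), ([1, 1, 1.0], 0)]
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
initial_loss = mean_loss(XOR, w_hidden, w_out)
for _ in range(2000):
    w_hidden, w_out = backprop_step(XOR, w_hidden, w_out, learning_rate=0.5)
final_loss = mean_loss(XOR, w_hidden, w_out)
```

Comparing `initial_loss` with `final_loss` shows whether the updates reduced the error. Because the loss is non-convex, a run like this may still settle in a poor local minimum.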

Before 2006…
• Quite popular in 1980’s and 1990’s
• Worked well for some pattern recognition problems:
– Ex: Handwritten digit recognition with LeNet (LeCun et al., 1998), used by the US postal service
• Other ML methods (e.g. Kernel methods such as SVMs) dominated ANNs in the early 2000’s
• Main problems of ANNs:
– Local minima (since the loss function is non-convex)
– Difficult to train networks with more than 3 or 4 layers
• Overfitting
• Computational time
• Vanishing Gradient problem (e.g. when Sigmoid activation is used)
• (didn’t work well in more complex problems like general image classification)
[Pictured: Yann LeCun, NYU; Geoff Hinton, Uni Toronto; Yoshua Bengio, Uni Montreal]

After 2006…
• Several major breakthroughs happened giving birth to
Deep Learning
• In general, Deep Learning is nothing but good old Neural Networks with many layers
[Figure: a neural network with N layers]
• Deep Learning methods have been significantly
outperforming the existing methods in major Computer
Vision and Speech Recognition competitions since 2010

ImageNet Results…
About 14M images in about 22k categories/concepts

Main Advancements
made Deep Learning
possible

1. Unsupervised Feature Learning
• In classical Machine Learning:
Raw Data → Feature Extraction → Feature pre-processing → Model Learning → Model
80%-90% of the effort (Human effort)
• In Deep Learning:
Raw Data → Feature Learning + Model Learning (Deep Learning) → Model
Feature Learning = Representation Learning = Embedding Learning

Feature Learning/Representation Learning
(Ex. Face Detection)
[Figure: Input Pixels → Layer 1 (Detects Edges) → Layer 2 (Detects Face parts: Combination of edges) → Deeper layer (Detects Faces)]

Techniques for Representation
Learning
1. Layer-wise unsupervised pre-training
   1. Stacked Autoencoders
Autoencoder: Pixel input → Encode → Decode → Pixel output
• Hidden units learn to act as Edge Detectors
• No labels required
• Unsupervised Training
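A very small autoencoder shows why no labels are needed: the training target is the input itself. The sketch below is an assumption-laden toy, not the architecture from the slides. It uses a linear encoder and decoder with no activation function, a 2-value input compressed to a 1-value code, and made-up data lying on a line.

```python
import random

def encode(x, w_enc):
    """Compress a 2-value input into a single hidden code."""
    return w_enc[0] * x[0] + w_enc[1] * x[1]

def decode(code, w_dec):
    """Expand the hidden code back into 2 output values."""
    return [w_dec[0] * code, w_dec[1] * code]

def reconstruction_loss(data, w_enc, w_dec):
    """Mean squared difference between each input and its reconstruction."""
    total = 0.0
    for x in data:
        out = decode(encode(x, w_enc), w_dec)
        total += (out[0] - x[0]) ** 2 + (out[1] - x[1]) ** 2
    return total / len(data)

def train_autoencoder(data, w_enc, w_dec, learning_rate=0.05, epochs=500):
    """Per-sample gradient descent on the reconstruction error (no labels used)."""
    w_enc, w_dec = list(w_enc), list(w_dec)
    for _ in range(epochs):
        for x in data:
            code = encode(x, w_enc)
            out = decode(code, w_dec)
            err = [out[0] - x[0], out[1] - x[1]]  # the target is the input itself
            # Gradient of the squared error w.r.t. the code (chain rule via the decoder).
            d_code = 2 * (err[0] * w_dec[0] + err[1] * w_dec[1])
            for i in range(2):
                w_dec[i] -= learning_rate * 2 * err[i] * code
                w_enc[i] -= learning_rate * d_code * x[i]
    return w_enc, w_dec

# Toy data (assumption): 2-D points that all lie on one line, so one code value suffices.
DATA = [(t, 0.5 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
rng = random.Random(0)
start_enc = [rng.uniform(-0.5, 0.5) for _ in range(2)]
start_dec = [rng.uniform(-0.5, 0.5) for _ in range(2)]
initial_loss = reconstruction_loss(DATA, start_enc, start_dec)
w_enc, w_dec = train_autoencoder(DATA, start_enc, start_dec)
final_loss = reconstruction_loss(DATA, w_enc, w_dec)
```

With image pixels as input and non-linear hidden units, the learned encoder weights are what the slide describes as edge detectors.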

Stacked Autoencoders
1. Train one autoencoder layer at a time [unsupervised learning] and stack them
2. Then train the final network using the available labels [supervised learning]
[Figure: INPUT → Low level features → Higher level features → Higher level features → LABEL]

Techniques for Representation Learning
1. Layer-wise unsupervised pre-training
   2. Deep Belief Networks (Restricted Boltzmann Machines (RBMs) stacked together)

Techniques for Representation Learning
2. Deep Convolutional Networks
Convolution Filters
• Kernel/convolution matrix/mask/filter (e.g. an Edge Detector)
• X: 3x3 Image patch with pixel values X_1 … X_9
• W: 3x3 filter with weights W_1 … W_9
• Z = CONV(X, W), where z = \sum_{i=1}^{9} x_i w_i
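The filter sum can be tried out directly. The sketch below applies a 3x3 filter to every 3x3 patch of a small image. The vertical-edge filter, the image values and the function names are assumptions chosen for illustration. As is common in deep learning code, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv_patch(patch, kernel):
    """z = sum_i x_i * w_i over one image patch and a filter of the same size."""
    return sum(x * w
               for patch_row, kernel_row in zip(patch, kernel)
               for x, w in zip(patch_row, kernel_row))

def conv2d(image, kernel):
    """Slide the filter over every position where it fully fits (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[conv_patch([row[c:c + k] for row in image[r:r + k]], kernel)
             for c in range(cols)]
            for r in range(rows)]

# Assumed example: a vertical-edge filter and a 3x5 image with a dark-to-bright edge.
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
IMAGE = [[0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9]]
feature_map = conv2d(IMAGE, VERTICAL_EDGE)  # [[0, 27, 27]]
```

The flat region gives 0 and the two positions that straddle the edge give 27, which is the sense in which the filter acts as an edge detector.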

Techniques for feature learning
[Figure: convolutional network with a Feature Extraction stage followed by a Classification stage]
• Convolutional Filters (low-level and high-level) are also learned automatically with Backprop
• Subsampling = average or max (max pooling); gives noise reduction
• Different types of filters result in different feature maps
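Subsampling can be sketched as applying max or average to each small block of a feature map. The 2x2 non-overlapping window, the feature-map values and the function names below are assumptions for illustration.

```python
def pool2d(feature_map, size=2, reduce=max):
    """Subsample by applying `reduce` to each non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        pooled_row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[r + dr][c + dc]
                     for dr in range(size) for dc in range(size)]
            pooled_row.append(reduce(block))
        pooled.append(pooled_row)
    return pooled

def average(values):
    return sum(values) / len(values)

# Assumed 4x4 feature map.
FEATURE_MAP = [[1, 3, 0, 2],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
max_pooled = pool2d(FEATURE_MAP, reduce=max)      # [[4, 2], [2, 8]]
avg_pooled = pool2d(FEATURE_MAP, reduce=average)  # [[2.5, 1.0], [0.75, 6.5]]
```

Each output value summarizes four inputs, so small shifts or noise inside a block change the pooled map less than they change the original.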

[Figure: a 2X2 filter with weights W_1, W_2, W_3, W_4 slides over the input layer (x1, x2, x3, x4, x5, x6, …); the same four weights are reused at every position]
• Each layer is represented by connected neurons
• Each convolution layer is connected to the previous layers sparsely and with shared weights

• Convolution and Subsampling (Pooling) lead to translation-invariant features
• Also works for language (document classification, translation) and voice recognition

Motivations for
Feature/Representation/Embedding Learning

Motivations for
Feature/Representation learning
1. Cut down the effort of handcrafting features
2. Hierarchical, distributed, compositional knowledge
representations in the Brain
– Humans organize their concepts and ideas hierarchically
– Humans first learn simple concepts and compose them
together to represent complex ideas
– Human problem solving/Engineering (multiple levels of
abstraction)
– Human language understanding
– Pattern recognition in the brain, etc.

Motivations for
Feature/Representation learning
• Hierarchical, distributed, compositional
knowledge representation/pattern recognition
in the Brain
[Figure: Pattern Recognition in the Brain vs. Pattern Recognition in Deep Learning]

Motivations for
Feature/Representation learning
3. Power of distributed, compositional
representations
• Concepts are represented as compositions of features
at different levels
• The number of concepts that can be represented grows
exponentially with the size of the network
[Figure: Input → Low-level representations (e.g. edges) → High-level representations]
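A quick count illustrates the exponential growth. The sketch assumes the simplest possible case, where each feature is either on or off and every on/off pattern stands for a different concept. That simplification is ours, not the slides'.

```python
from itertools import product

def distinct_codes(n_features):
    """Enumerate every on/off pattern over n binary features."""
    return list(product([0, 1], repeat=n_features))

# Number of patterns that n shared binary features can express.
counts = {n: len(distinct_codes(n)) for n in (1, 2, 3, 10)}
# counts == {1: 2, 2: 4, 3: 8, 10: 1024}
```

A representation with one dedicated unit per concept would need 1024 units to cover what 10 shared features cover here.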

Motivations for
Feature/Representation learning
4. Manifold Learning
• Assumption: Input data has some structure (not 100%
random) which is concentrated in a lower-dimensional
manifold of the original features
• Ex: most of the arbitrary pixel value configurations don’t create
the images of faces
• Representation in each layer can be considered as a
learned manifold of the previous layer
[Embedded slide: For AI Tasks: Manifold structure]
• Examples concentrate near a lower-dimensional “manifold”
• Evidence: most input configurations are unlikely
• E.g. Pixels (32*32 image)

Motivations for
Feature/Representation learning
5. Transfer Learning
– Generalization: ability of a model to predict well on
unseen test data
– Representation of complex concepts -> Deep
Networks
– Good generalization of complex models like Deep
Neural Networks relies on the availability of a large
amount of labeled training data
– Most of the available data is not labeled
– In Transfer Learning
1. Train a Deep Network with unlabeled data in an unsupervised
manner
2. Use the available labeled data to train the required model

Motivations for
Feature/Representation learning
5. Transfer Learning
Example: Image recognition model
[Figure: Unsupervised pre-training with unlabeled data to learn the representations of different levels of abstraction → Transfer the knowledge → Supervised Learning with available labeled data (labels: car, ..., Human)]

Variations of Transfer Learning
• Multi-Instance Learning (when labels are not
available at the instance level)
Document Classification Model
Based on the similarity of the sentence/word
embeddings [Kotzias, Denil and de Freitas, 2014]

• Max-margin Learning without labels
[From machine learning to machine reasoning, Leon Bottou, 2014]

• Max-margin Learning without labels
[Natural Language Processing (Almost) from Scratch, Ronan Collobert et al., 2011]

Other advancements
made Deep Learning
possible

Other advancements…
• ‘Dropout’ regularization during Backpropagation
training, for better generalization
• Rectified Linear Units (ReLU) instead of Sigmoid
(mitigates the vanishing gradient problem)
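Both ideas are easy to sketch. The comparison below multiplies ten local derivatives together, as the chain rule does across ten layers, under a deliberately crude assumption: every layer sees the same pre-activation value 2.0 and the weights are ignored. The dropout function uses the common "inverted dropout" variant. All names and numbers are illustrative assumptions.

```python
import math
import random

def sigmoid_grad(z):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: exactly 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero each unit with probability drop_prob, rescale the rest."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

# Crude assumption: the same pre-activation z = 2.0 at each of 10 layers, weights ignored.
DEPTH = 10
sigmoid_chain = sigmoid_grad(2.0) ** DEPTH  # shrinks towards zero
relu_chain = relu_grad(2.0) ** DEPTH        # stays at 1.0

rng = random.Random(0)
dropped = dropout([1.0] * 8, drop_prob=0.5, rng=rng)  # each entry is 0.0 or 2.0
```

Dropout is applied only during training. With the inverted variant the surviving activations are scaled up, so no rescaling is needed at prediction time.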

Other architectures…
• Memory Networks (LSTM)
– Question answering
• Recurrent Networks
– Handling inputs with sequential relationships
(voice recognition)
• Combination of existing architectures

Improved Computing Power…
GPU Computation
– Parallel Neural Network Training on GPU clusters
(ideal for simple Matrix/Vector operations, hence for
backpropagation)
– Reduced the training time of deep networks from
weeks to days
– NVIDIA CUDA Deep Neural Network library (cuDNN)

Improved Computing Power…
• Commodity Hardware
– Multi-core single machines, clusters, GPU clusters
• Open source software
– Torch (open source ML library,
https://github.com/torch/torch7/)
– Theano (from Yoshua Bengio’s group,
http://deeplearning.net/software/theano/)
– Caffe
– Google TensorFlow

Industrial Applications of
Deep Learning
Techniques

Google Brain Project
– Started by Andrew Ng in 2011
– In 2012: a Neural Network with 1 billion connections
was trained across 16,000 CPU cores
– They considered this ANN as simulating a very small-scale
“newborn brain” and asked: if it is shown YouTube video
for a week, what will it learn?
– Used Unsupervised learning (Self-taught learning) with an
Autoencoder to learn features from unlabeled images
– Exposed to frames from 10M YouTube videos over a
week
http://googleblog.blogspot.com.au/2012/06/using-large-scale-brain-simulations-for.html
http://static.googleusercontent.com/media/research.google.com/en//archive/unsupervised_icml2012.pdf
[Pictured: Andrew Ng, Stanford]

Google Brain Project
What Happened?
• One of the artificial neurons learned to respond strongly
to pictures of Cats.

Evolution of Deep Learning at Google
– Google has been heavily investing in Deep
Learning research
– In 2013 Google hired Geoff Hinton and acquired
his start-up company DNNResearch Inc.
– In 2014 they purchased a UK-based Machine
Learning company called DeepMind Technologies
for an estimated $650 million

DeepMind
Apollo Program for AI
Working towards solving General AI with
Deep Reinforcement Learning….

DeepMind
• Famous paper: Applying Deep RL to train agents to play classic Atari games

DeepMind Video
• https://www.youtube.com/watch?v=V1eYniJ0Rnk

AlphaGo
• Traditional Chinese game - Go
• The most complex board game of all
• AlphaGo beat the Go world champion Lee Sedol in 4 out of 5 games

Deep Dream
(http://deepdreamgenerator.com)
• What features will be picked up by Google’s
Deep ANNs?
[Figure: Original Image → Deep ANN → Original Image + Recognized Features]

Deep Learning Products at Google
• Google Voice Recognition (in Android and Search by Voice)
• Google search by Image (Search for similar images to an uploaded image)

Facebook
• Yann LeCun is the head of Facebook AI Research
• Face Recognition: DeepFace
– claimed to have close to human-level performance
• Personal Assistant: Facebook M

Other…
• Microsoft Cortana, Skype Translator
• NVIDIA Self Driving Cars
• Image Captioning Systems
• Siemens Medical Image Diagnostics

Deep Learning in Robotics
• Computer Vision, Speech Recognition and NLP
are direct applications in Robotics
• Training Robots to do specific tasks through
Deep Learning
– At UC Berkeley: Train a robot to perform tasks via
trial and error (e.g. screw a cap onto a water bottle)

Deep Learning in Robotics
• At Cornell: Deep Learning for detecting
Robotic Grasps (using Baxter)
Deep Learning for Detecting Robotic Grasps, Ian Lenz, Honglak Lee, Ashutosh Saxena. To
appear in International Journal of Robotics Research (IJRR), 2014.
http://pr.cs.cornell.edu/deepgrasping/

Challenges
• So far it has worked only in pattern-recognition
domains where there are good structural
patterns in the input data (Vision, Voice,
Language)
• With other kinds of datasets (finance,
marketing, human behavior, biology), there
are not yet any known applications

Resources
Key players for talks, lectures, papers, tutorials,
datasets, etc.
• Yann LeCun, NYU, Facebook AI Research
• Geoff Hinton, Uni Toronto, Google
• Yoshua Bengio, Uni Montreal
• Andrew Ng, Stanford, Baidu
• Nando de Freitas, Oxford, DeepMind
