
Deep Learning: Towards General Artificial Intelligence


Over the past several years, Deep Learning methods have revolutionized several areas of Pattern Recognition, namely Computer Vision, Speech Recognition and Natural Language Processing. These techniques have mainly been developed by academics working closely with tech giants such as Google, Microsoft and Facebook, where the research outcomes have been integrated into commercial products such as Google image and voice search, Google Translate, Microsoft Cortana and Facebook M, with many more applications yet to come. More recently, Google DeepMind has been working towards Artificial General Intelligence using Deep Reinforcement Learning methods; its AlphaGo system beat world champion Lee Sedol at the complex Chinese board game Go in March 2016. This talk presents a thorough introduction to the major Deep Learning techniques, recent breakthroughs and some exciting applications.

Published in: Data & Analytics


  1. 1. Deep Learning: Towards General Artificial Intelligence Dr. Rukshan Batuwita (Machine Learning Scientist) Senior Data Scientist Ambiata Pvt Ltd, Sydney, Australia
  2. 2. What is Artificial Intelligence? • Field of Study to Develop Machines that act like Humans! • Recent Definition: Develop Machines that act rationally!
  3. 3. General AI • Learning and Reasoning • Planning • Adaptability • Vision • Speech recognition • Automation • Mobility • etc.
  4. 4. Narrow AI • Learning and Reasoning • Planning • Adaptability • Vision • Speech recognition • Automation • Mobility • etc. • People have been working on this for the last 50 years • This has many applications
  5. 5. What is Machine Learning? • Computer Program: input-to-output mapping: Inputs → Computer Program (Algorithm/List of Instructions) → Outputs • When we know the algorithm to solve a task, we can program it • AI Problems: Inputs → ? → Outputs (Ex. Inputs → ? → Cat or Dog)
  6. 6. In Machine Learning… • Machine Learning techniques Train/Learn the Algorithm (Model) • The learned Algorithm (Model) outputs Cat or Dog
  7. 7. Introduction to Artificial Neural Networks (Biologically Inspired)
  8. 8. Biological Neurons • Biological Brain → Biological Neural Network → Biological Neuron • Biological Neuron: Inputs → Processing/Computing → Outputs
  9. 9. Biological Learning • Biological Neuron Learning happens due to chemical reactions in synaptic connections • A typical adult human brain has about $10^{14}$ synapses (connections)
  10. 10. Artificial Neuron • A computational model: $y = f\left(\sum_{j=1}^{d} w_j x_j\right)$ (Inputs $x_j$ → Processing/Computing → Output $y$) • Called ‘Perceptron’ • Introduced in the late 1950s • Weights $w_j$ (represent the chemicals) can be learned by an optimization method like Gradient Descent
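  11. 11. Perceptron • $y = f\left(\sum_{j=1}^{d} w_j x_j\right)$, where $f$ is the Activation Function • With a linear (identity) activation: Perceptron = Linear Regression • With a sigmoid (logistic) activation: Perceptron = Logistic Regression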
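The perceptron formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function names (`perceptron`, `identity`, `sigmoid`) and the input and weight values are assumptions chosen for the example, and no bias term is included because the slide's formula has none.

```python
import math


def perceptron(x, w, activation):
    """Compute y = f(sum_j w_j * x_j) for one input vector x."""
    weighted_sum = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return activation(weighted_sum)


def identity(a):
    # Linear activation: the perceptron output is a linear regression prediction.
    return a


def sigmoid(a):
    # Logistic activation: the perceptron output is a logistic regression probability.
    return 1.0 / (1.0 + math.exp(-a))


# Illustrative (assumed) inputs and weights, d = 3.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]

linear_output = perceptron(x, w, identity)    # weighted sum itself
logistic_output = perceptron(x, w, sigmoid)   # weighted sum squashed into (0, 1)
```

Only the activation function changes between the two calls; the weighted sum is shared, which is the point the slide makes.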
  12. 12. Artificial Neural Network • Artificial Neurons are connected together (through weights) to form a network • Called a Multi-Layer Perceptron (MLP) • A model that is non-linear in its parameters • Trained by the popular Backpropagation algorithm (Gradient Descent)
  13. 13. Backpropagation – Main Idea 1. Calculate Error/Loss = f(Label, Prediction) 2. Calculate the Gradient/Derivative of the Loss w.r.t. each weight 3. To calculate the gradient of the inner weights, apply the chain rule of derivatives 4. Update each weight in the direction of the negative gradient (Gradient Descent)
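The four backpropagation steps can be traced on a very small network. The sketch below is an illustration under stated assumptions, not the speaker's code: it assumes one hidden layer of sigmoid units, a sigmoid output, a squared-error loss, no bias terms, and made-up starting weights and learning rate. The names `forward` and `train_step` are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_hidden, w_out):
    """Forward pass: inputs -> hidden sigmoid units -> one sigmoid output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, y


def train_step(x, label, w_hidden, w_out, lr):
    h, y = forward(x, w_hidden, w_out)
    # Step 1: loss as a function of label and prediction (squared error, assumed).
    loss = 0.5 * (y - label) ** 2
    # Step 2: gradient of the loss w.r.t. the output-layer weights.
    delta_out = (y - label) * y * (1.0 - y)
    grad_out = [delta_out * hi for hi in h]
    # Step 3: chain rule to reach the inner (hidden-layer) weights.
    delta_hidden = [delta_out * w_out[k] * h[k] * (1.0 - h[k])
                    for k in range(len(h))]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    # Step 4: move every weight against its gradient (gradient descent).
    new_w_out = [w - lr * g for w, g in zip(w_out, grad_out)]
    new_w_hidden = [[w - lr * g for w, g in zip(row, g_row)]
                    for row, g_row in zip(w_hidden, grad_hidden)]
    return loss, new_w_hidden, new_w_out


# Illustrative (assumed) training example and starting weights.
x, label = [1.0, 0.5], 1.0
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]

losses = []
for _ in range(50):
    loss, w_hidden, w_out = train_step(x, label, w_hidden, w_out, lr=0.5)
    losses.append(loss)
```

Repeating the step on the same example should drive the recorded loss down, which is a quick sanity check that the gradients point the right way.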
  14. 14. Evolution of Neural Networks
  15. 15. Before 2006… • Quite popular in the 1980s and 1990s • Worked well for some pattern recognition problems: – Ex: Handwritten digit recognition, LeNet (LeCun et al., 1998), used by the US Postal Service • Other ML methods (e.g. Kernel methods such as SVMs) dominated ANNs in the early 2000s • Main problems of ANNs: – Local minima (since the loss function is non-convex) – Difficult to train networks with more than 3 or 4 layers • Overfitting • Computational time • Vanishing Gradient problem (e.g. when Sigmoid activation is used) • (didn’t work well on more complex problems like general image classification) • Yann LeCun, NYU; Geoff Hinton, Uni Toronto; Yoshua Bengio, Uni Montreal
  16. 16. After 2006… • Several major breakthroughs happened, giving birth to Deep Learning • In general, Deep Learning is nothing but good old Neural Networks with many (N) layers • Deep Learning methods have been significantly outperforming the existing methods in major Computer Vision and Speech Recognition competitions since 2010
  17. 17. ImageNet Results… About 14M images in about 22k categories/concepts
  18. 18. Main Advancements That Made Deep Learning Possible
  19. 19. 1. Unsupervised Feature Learning • In classical Machine Learning: Raw Data → Feature Extraction → Feature Pre-processing → Model Learning, with 80%-90% of the effort (human effort) going into the feature steps • In Deep Learning: Raw Data → Feature Learning → Model Learning (Deep Learning = Feature Learning + Model) • Feature Learning = Representation Learning = Embedding Learning
  20. 20. Feature Learning/Representation Learning (Ex. Face Detection) • Input Pixels → Layer 1 (detects edges) → Layer 2 (detects face parts as combinations of edges) → Deeper layer (detects faces)
  21. 21. Techniques for Representation Learning 1. Layer-wise unsupervised pre-training (a) Stacked Autoencoders • Autoencoder: Pixel input → Encode → hidden layer (Edge Detectors) → Decode → Pixel output • No labels required • Unsupervised Training
  22. 22. Techniques for Representation Learning: Stacked Autoencoders 1. Train one autoencoder layer at a time [unsupervised learning] and stack them (Input → low-level features → higher-level features) 2. Then train the final network using the available labels [supervised learning] (INPUT → … → LABEL)
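Step 1 of the stacked-autoencoder recipe (greedy, layer-wise unsupervised training) can be sketched as follows. This is a toy illustration under several assumptions that are not in the slides: sigmoid encoders, linear decoders, squared reconstruction error, per-example gradient descent, no bias terms, and a tiny made-up dataset. `train_autoencoder`, `encode` and `decode` are hypothetical names. The supervised fine-tuning of step 2 is not shown.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def encode(x, w_enc):
    """Map an input vector to its hidden code."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_enc]


def decode(h, w_dec):
    """Reconstruct the input from the hidden code (linear decoder, assumed)."""
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_dec]


def train_autoencoder(data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one autoencoder layer to reconstruct its own input (no labels)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode(x, w_enc)
            x_hat = decode(h, w_dec)
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            total += 0.5 * sum(e * e for e in err)
            # Backpropagate the reconstruction error to the hidden layer
            # before the decoder weights are changed.
            delta_h = [sum(err[i] * w_dec[i][k] for i in range(n_in))
                       * h[k] * (1.0 - h[k]) for k in range(n_hidden)]
            for i in range(n_in):
                for k in range(n_hidden):
                    w_dec[i][k] -= lr * err[i] * h[k]
            for k in range(n_hidden):
                for j in range(n_in):
                    w_enc[k][j] -= lr * delta_h[k] * x[j]
        losses.append(total)
    return w_enc, w_dec, losses


# Toy unlabeled "pixel" data (assumed for illustration).
data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
        [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]]

# Layer 1 learns codes of the raw input; layer 2 learns codes of those codes.
w_enc1, w_dec1, losses1 = train_autoencoder(data, n_hidden=3)
codes1 = [encode(x, w_enc1) for x in data]
w_enc2, w_dec2, losses2 = train_autoencoder(codes1, n_hidden=2, seed=1)
codes2 = [encode(c, w_enc2) for c in codes1]
```

After both layers are trained, the two encoders stacked together map raw input to the higher-level code; the decoders are only needed during pre-training.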
  23. 23. Techniques for Representation Learning 1. Layer-wise unsupervised pre-training (b) Deep Belief Networks (Restricted Boltzmann Machines (RBMs) stacked together)
  24. 24. Techniques for Representation Learning 2. Deep Convolution Networks • Convolution Filters (kernel/convolution matrix/mask/filter), e.g. an Edge Detector • A 3x3 image patch X with pixels $x_1, \dots, x_9$ is combined with filter weights $w_1, \dots, w_9$: $z = \sum_{i=1}^{9} x_i w_i$, i.e. Z = CONV(X, W)
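The filter response for a single 3x3 patch is just the sum of element-wise products. The sketch below illustrates this with an assumed vertical-edge kernel and two made-up patches; `conv_patch` is a hypothetical helper name. As is common in deep learning, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv_patch(patch, kernel):
    """Return z = sum_{i=1..9} x_i * w_i for a 3x3 patch and a 3x3 kernel."""
    return sum(x * w
               for patch_row, kernel_row in zip(patch, kernel)
               for x, w in zip(patch_row, kernel_row))


# Assumed example kernel: responds to dark-to-bright transitions from left to right.
vertical_edge_kernel = [[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]]

edge_patch = [[0, 0, 1],
              [0, 0, 1],
              [0, 0, 1]]   # a vertical edge
flat_patch = [[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]]   # uniform brightness, no edge

z_edge = conv_patch(edge_patch, vertical_edge_kernel)   # 3: strong response
z_flat = conv_patch(flat_patch, vertical_edge_kernel)   # 0: no response
```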
  25. 25. Techniques for feature learning 2. Deep Convolution Networks • Architecture: Feature Extraction followed by Classification • Different types of filters result in different feature maps • Convolutional Filters (low-level and high-level) are also learned automatically with Backprop • Subsampling = average or max (max pooling), for noise reduction
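The subsampling step can be sketched as 2x2 pooling over a feature map. The window size, the stride of 2 and the example values are assumptions for illustration; `pool2x2` and `average` are hypothetical names.

```python
def pool2x2(feature_map, reduce_fn):
    """Subsample a 2D feature map over non-overlapping 2x2 windows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def average(values):
    return sum(values) / len(values)


# Made-up 4x4 feature map.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]

max_pooled = pool2x2(feature_map, max)       # [[4, 2], [2, 8]]
avg_pooled = pool2x2(feature_map, average)   # [[2.5, 1.0], [0.75, 6.5]]
```

Either choice halves each spatial dimension; max pooling keeps the strongest response in each window, while averaging smooths it.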
  26. 26. Techniques for feature learning 2. Deep Convolution Networks • Example: an input layer x1 … x6 and a 2x2 filter with weights W_1, W_2, W_3, W_4; the same four weights are applied at every filter position • Each layer is represented by connected neurons • Each convolution layer is connected to the previous layer sparsely and with shared weights
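Sparse connectivity and weight sharing can be made concrete by sliding one 2x2 filter over a small image. This is a sketch with assumed values: the 3x3 image, the filter weights and the function name `convolve2d` are illustrative, with stride 1, no padding and no kernel flip.

```python
def convolve2d(image, kernel):
    """Slide one kernel over the image; every output reuses the same weights."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Each output neuron sees only a k_rows x k_cols patch (sparse connectivity).
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k_rows) for j in range(k_cols)))
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]   # plays the role of W_1..W_4 (assumed values)

feature_map = convolve2d(image, kernel)   # [[-4, -4], [-4, -4]]

# Parameter count: 4 shared weights here, versus 9 inputs x 4 outputs = 36
# weights if the same two layers were fully connected.
shared_weights = len(kernel) * len(kernel[0])
dense_weights = (len(image) * len(image[0])) * (len(feature_map) * len(feature_map[0]))
```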
  27. 27. Techniques for feature learning 2. Deep Convolution Networks • Convolution and Subsampling (Pooling) lead to the detection of translation-invariant features • Also works for language (document classification, translation) and voice recognition
  28. 28. Motivations for Feature/Representation/Embedding Learning
  29. 29. Motivations for Feature/Representation learning 1. Cut down the effort of handcrafting features 2. Hierarchical, distributed, compositional knowledge representations in the Brain – Humans organize their concepts and ideas hierarchically – Humans first learn simple concepts and compose them together to represent complex ideas – Human problem solving/Engineering (multiple levels of abstraction) – Human language understanding – Pattern recognition in the brain, etc.
  30. 30. Motivations for Feature/Representation learning • Hierarchical, distributed, compositional knowledge representation/pattern recognition in the Brain • Pattern Recognition in the Brain vs. Pattern Recognition in Deep Learning
  31. 31. Motivations for Feature/Representation learning 3. Power of distributed, compositional representations • Concepts are represented as compositions of features at different levels • The number of concepts that can be represented grows exponentially with the size of the network • Input → Low-level representations (e.g. edges) → High-level representations
  32. 32. Motivations for Feature/Representation learning 4. Manifold Learning • Assumption: Input data has some structure (it is not 100% random), concentrated near a lower-dimensional manifold of the original feature space • Ex: most arbitrary pixel value configurations (e.g. of a 32*32 image) don’t form images of faces • The representation in each layer can be considered a learned manifold of the previous layer • For AI tasks: Manifold structure – Examples concentrate near a lower-dimensional “manifold” – Evidence: most input configurations are unlikely
  33. 33. Motivations for Feature/Representation learning 5. Transfer Learning – Generalization: the ability of a model to predict well on unseen test data – Representation of complex concepts -> Deep Networks – Good generalization of complex models like Deep Neural Networks relies on the availability of a large amount of labeled training data – Most of the available data is not labeled – In Transfer Learning 1. Train a Deep Network with unlabeled data in an unsupervised manner 2. Use the available labeled data to train the required model
  34. 34. Motivations for Feature/Representation learning 5. Transfer Learning Example: Image recognition model • Unsupervised pre-training with unlabeled data to learn representations at different levels of abstraction • Transfer the knowledge • Supervised Learning with the available labeled data (e.g. car, Human)
  35. 35. Variations of Transfer Learning • Multi-Instance Learning (when labels are not available at the instance level) – Ex: Document Classification Model based on the similarity of the sentence/word embeddings [Kotzias, Denil and de Freitas, 2014]
  36. 36. Variations of Transfer Learning • Max-margin Learning without labels [From machine learning to machine reasoning, Leon Bottou, 2014]
  37. 37. Variations of Transfer Learning • Max-margin Learning without labels [NLP almost from scratch, Ronan Collobert et al., 2011]
  38. 38. Other Advancements That Made Deep Learning Possible
  39. 39. Other advancements… • ‘Dropout’ regularization when training with Backpropagation, for better generalization • Rectified Linear activation functions (ReLU) instead of Sigmoid (avoids the vanishing gradient problem)
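Both ideas on this slide can be illustrated numerically. The sketch below makes simplifying assumptions that are not in the talk: the gradient through a stack of layers is approximated by the product of the activation derivatives alone (weights ignored, same pre-activation at every layer), and dropout is shown in its commonly used "inverted" form, where kept activations are rescaled during training. All function names and values are illustrative.

```python
import math
import random


def sigmoid_grad(a):
    s = 1.0 / (1.0 + math.exp(-a))
    return s * (1.0 - s)          # never larger than 0.25


def relu(a):
    return max(0.0, a)


def relu_grad(a):
    return 1.0 if a > 0 else 0.0  # exactly 1 for active units


# Vanishing gradient: multiply the activation derivative across 10 layers.
n_layers = 10
pre_activation = 1.0  # assumed identical at every layer for simplicity
sigmoid_chain = sigmoid_grad(pre_activation) ** n_layers   # shrinks towards 0
relu_chain = relu_grad(pre_activation) ** n_layers         # stays at 1.0


def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero each unit with probability drop_prob and
    rescale the survivors so the expected activation is unchanged."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
hidden = [relu(a) for a in [-1.0, 0.5, 2.0, 3.0]]
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
```

At test time dropout is switched off and the full set of hidden units is used; with the inverted form no further rescaling is needed.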
  40. 40. Other architectures… • Memory Networks (LSTM) – Question answering • Recurrent Networks – Detecting inputs with sequential relationships (voice recognition) • Combination of existing architectures
  41. 41. Improved Computing Power… GPU Computation – Parallel Neural Network Training on GPU clusters (ideal for simple Matrix/Vector operations, hence for backpropagation) – Reduced the training time of deep networks from weeks to days – NVIDIA CUDA Deep Neural Network library
  42. 42. Improved Computing Power… • Commodity Hardware – Multi-core single machines, clusters, GPU clusters • Open source software – Torch (open source ML library) – From Yoshua Bengio’s group – Caffe – Google TensorFlow
  43. 43. Industrial Applications of Deep Learning Techniques
  44. 44. Google Brain Project – Started by Andrew Ng (Stanford) in 2011 – In 2012: a Neural Network with 1 billion connections was trained across 16,000 CPU cores – They considered this ANN as simulating a very small-scale “newborn brain”: if shown YouTube video for a week, what will it learn? – Used an unsupervised (self-taught learning) approach, an Autoencoder, to learn features from unlabeled Google images – Exposed to frames from 10M YouTube videos over a week
  45. 45. Google Brain Project What Happened? • One of the artificial neurons learned to respond strongly to pictures of Cats.
  46. 46. Evolution of Deep Learning at Google – Google has been investing heavily in Deep Learning research – In 2013 Google hired Geoff Hinton and acquired his start-up company DNNResearch Inc. – In 2014 they purchased a UK-based Machine Learning company called DeepMind Technologies for an estimated $650 million
  47. 47. DeepMind: “Apollo Program for AI” • Working towards solving General AI with Deep Reinforcement Learning
  48. 48. DeepMind • Famous paper: Applying Deep RL to train agents to play classic Atari games
  49. 49. DeepMind Video
  50. 50. AlphaGo • Traditional Chinese game: Go • The most complex board game of all • AlphaGo beat the world champion Lee Sedol, winning 4 of 5 games
  51. 51. Deep Dream • What features will be picked up by Google’s Deep ANNs? • Original Image → Deep ANN → Original Image + Recognized Features
  52. 52. Deep Learning Products at Google • Google Voice Recognition (in Android and Search by Voice) • Google Search by Image (search for images similar to an uploaded image)
  53. 53. Facebook • Yann LeCun is the head of Facebook AI Research • Face Recognition: DeepFace • Claimed to have close to human-level performance • Personal Assistant: Facebook M
  54. 54. Other… • Microsoft Cortana, Skype Translator • NVIDIA Self-Driving Cars • Image Captioning Systems • Siemens Medical Image Diagnostics
  55. 55. Deep Learning in Robotics • Computer Vision, Speech Recognition and NLP have direct applications in Robotics • Training Robots to do specific tasks through Deep Learning – At UC Berkeley: training a robot to perform tasks via trial and error (e.g. screwing a cap onto a water bottle)
  56. 56. Deep Learning in Robotics • At Cornell: Deep Learning for detecting Robotic Grasps (using Baxter) Deep Learning for Detecting Robotic Grasps, Ian Lenz, Honglak Lee, Ashutosh Saxena. To appear in International Journal of Robotics Research (IJRR), 2014.
  57. 57. Challenges • So far it has worked only in pattern-recognition domains where there are strong structural patterns in the input data (Vision, Voice, Language) • With other kinds of datasets (finance, marketing, human behavior, biology), there are not yet any known applications
  58. 58. Resources • Key players for talks, lectures, papers, tutorials, datasets, etc.: Yann LeCun (NYU, Facebook AI Research); Geoff Hinton (Uni Toronto, Google); Yoshua Bengio (Uni Montreal); Andrew Ng (Stanford, Baidu); Nando de Freitas (Oxford, DeepMind)
  59. 59. Thank you!