
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop


Rich media is exploding all around us. From our personal usage to retailers monitoring store traffic for optimized associate placement, there is wide and growing application of rich media. Despite the pervasive usage, enterprises have had limited choice of generally available tools to analyze rich media. In this session we will look into leveraging deep learning algorithms for rich media analysis and provide practical hands on example of image recognition using Apache Hadoop and Spark.


  1. 1. Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deep Learning on HDP Dhruv Kumar – dkumar@hortonworks.com Solutions Engineer, Hortonworks 2015 Version 1.0
  2. 2. Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 2
  3. 3. Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Scientists See Promise in Deep-Learning Programs John Markoff November 23, 2012 Rick Rashid in Tianjin, October 25, 2012
  4. 4. Page 4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  5. 5. Page 5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Impact of deep learning in speech technology. Where is it?
  6. 6. Page 6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 6 ……Facebook’s foray into deep learning sees it following its competitors Google and Microsoft, which have used the approach to impressive effect in the past year. Google has hired and acquired leading talent in the field (see “ 10 Breakthrough Technologies 2013: Deep Learning”), and last year created software that taught itself to recognize cats and other objects by reviewing stills from YouTube videos. The underlying deep learning technology was later used to slash the error rate of Google’s voice recognition services (see “Google’s Virtual Brain Goes to Work”) ….Researchers at Microsoft have used deep learning to build a system that translates speech from English to Mandarin Chinese in real time (see “Microsoft Brings Star Trek’s Voice Translator to Life”). Chinese Web giant Baidu also recently established a Silicon Valley research lab to work on deep learning. September 20, 2013
  7. 7. Page 7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 7
  8. 8. Page 8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 8
  9. 9. Page 9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 9
  10. 10. Page 10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 10
  11. 11. Page 11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 11
  12. 12. Page 12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 12
  13. 13. Page 13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved “The word is spreading in all corners of the tech industry that the biggest part of big data, the unstructured part, possesses learnable patterns that we now have the computing power and algorithmic leverage to discern…This change marks a true disruption, and there are fortunes to be made. There are also tremendous social consequences to consider that require as much creativity and investment as the more immediately lucrative deep learning startups that are popping up all over…”
  14. 14. Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved “Using an artificial intelligence technique inspired by theories about how the brain recognizes patterns, technology companies are reporting startling gains in fields as diverse as computer vision, speech recognition and the identification of promising new molecules for designing drugs. The advances have led to widespread enthusiasm among researchers who design software to perform human activities like seeing, listening and thinking. They offer the promise of machines that converse with humans and perform tasks like driving cars and working in factories, raising the specter of automated robots that could replace human workers.”
  15. 15. Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Google Trends for “Deep Learning” keyword
  16. 16. Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Enterprise use cases
  17. 17. Page 17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deep Learning •  One of the many pattern recognition techniques in Data Science •  Excels at rich media applications: •  Image recognition •  Speech translation •  Voice recognition •  Loosely inspired by human brain models •  Synonymous with Artificial Neural Networks, Multi Layer Networks
  18. 18. Page 18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved In this workshop -  Fundamentals of Deep Learning -  Implementation and Libraries in Real Life -  Demo!
  19. 19. Page 19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So, 1. what exactly is deep learning ? And, 2. why is it generally better than other methods on image, speech and certain other types of data?
  20. 20. Page 20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So, 1. what exactly is deep learning ? And, 2. why is it generally better than other methods on image, speech and certain other types of data? The short answers 1. ‘Deep Learning’ means using a neural network with several layers of nodes between input and output 2. the series of layers between input & output do feature identification and processing in a series of stages, just as our brains seem to.
  21. 21. Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved but: 3. multilayer neural networks have been around for 25 years. What’s actually new?
  22. 22. Page 22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved but: 3. multilayer neural networks have been around for 25 years. What’s actually new? we have always had good algorithms for learning the weights in networks with 1 hidden layer but these algorithms are not good at learning the weights for networks with more hidden layers what’s new is: algorithms for training many-layer networks
  23. 23. Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Longer answers 1.  reminder/quick-explanation of how neural network weights are learned; 2.  the idea of unsupervised feature learning (why ‘intermediate features’ are important for difficult classification tasks, and how NNs seem to naturally learn them) 3.  The ‘breakthrough’ – the simple trick for training Deep neural networks
  24. 24. Page 24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Neuron Quick Look 24
  25. 25. Page 25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved W1 W2 W3 f(x) 1.4 -2.5 -0.06
  26. 26. Page 26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 2.7 -8.6 0.002 f(x) 1.4 -2.5 -0.06 x = (-0.06 × 2.7) + (-2.5 × -8.6) + (1.4 × 0.002) = 21.34
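To make the arithmetic on this slide concrete, here is a minimal plain-Java sketch of a single unit: it forms the weighted sum of its inputs and then applies the activation f(x). The values are the ones from the slide; the choice of a sigmoid for f(x) is an assumption made only for illustration.

```java
public class Neuron {
    // Weighted sum of the inputs followed by a nonlinear activation f(x).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    static double activate(double[] weights, double[] inputs) {
        double x = 0.0;
        for (int i = 0; i < weights.length; i++) {
            x += weights[i] * inputs[i];   // weighted sum
        }
        return sigmoid(x);                 // f(x); sigmoid assumed here
    }

    public static void main(String[] args) {
        double[] weights = {-0.06, -2.5, 1.4};    // values from the slide
        double[] inputs  = { 2.7, -8.6, 0.002};
        double x = (-0.06 * 2.7) + (-2.5 * -8.6) + (1.4 * 0.002); // ≈ 21.34
        System.out.println("x = " + x + ", f(x) = " + activate(weights, inputs));
    }
}
```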
  27. 27. Page 27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved A dataset Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc …
  28. 28. Page 28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training the neural network Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc …
  29. 29. Page 29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Initialise with random weights
  30. 30. Page 30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Present a training pattern 1.4 2.7 1.9
  31. 31. Page 31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Feed it through to get output 1.4 2.7 0.8 1.9
  32. 32. Page 32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Compare with target output 1.4 2.7 0.8 0 1.9 error 0.8
  33. 33. Page 33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Adjust weights based on error 1.4 2.7 0.8 0 1.9 error 0.8
  34. 34. Page 34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Present a training pattern 6.4 2.8 1.7
  35. 35. Page 35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Feed it through to get output 6.4 2.8 0.9 1.7
  36. 36. Page 36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Compare with target output 6.4 2.8 0.9 1 1.7 error -0.1
  37. 37. Page 37 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … Adjust weights based on error 6.4 2.8 0.9 1 1.7 error -0.1
  38. 38. Page 38 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Training data Fields class 1.4 2.7 1.9 0 3.8 3.4 3.2 0 6.4 2.8 1.7 1 4.1 0.1 0.2 0 etc … And so on …. 6.4 2.8 0.9 1 1.7 error -0.1 Repeat this thousands, maybe millions of times – each time taking a random training instance, and making slight weight adjustments Algorithms for weight adjustment are designed to make changes that will reduce the error
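The procedure the last few slides walk through can be written down directly. The sketch below trains a single sigmoid unit on the four example rows with the delta rule: pick a random instance, feed it through, compare with the target, and nudge the weights so the error shrinks. The learning rate, iteration count and the use of a single unit are illustrative assumptions; a real network applies the same kind of update to every weight in every layer.

```java
import java.util.Random;

public class TrainLoop {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public static void main(String[] args) {
        // The small dataset from the slides: three fields plus a 0/1 class label.
        double[][] inputs = {{1.4, 2.7, 1.9}, {3.8, 3.4, 3.2}, {6.4, 2.8, 1.7}, {4.1, 0.1, 0.2}};
        double[] targets  = {0, 0, 1, 0};

        Random rng = new Random(42);
        double[] w = {rng.nextGaussian(), rng.nextGaussian(), rng.nextGaussian()}; // initialise with random weights
        double bias = rng.nextGaussian();
        double lr = 0.05;                                // learning rate (assumed)

        for (int step = 0; step < 100_000; step++) {
            int i = rng.nextInt(inputs.length);          // take a random training instance
            double x = bias;
            for (int j = 0; j < w.length; j++) x += w[j] * inputs[i][j];
            double out = sigmoid(x);                     // feed it through to get output
            double error = targets[i] - out;             // compare with target output
            for (int j = 0; j < w.length; j++) {
                w[j] += lr * error * out * (1 - out) * inputs[i][j]; // adjust weights to reduce the error
            }
            bias += lr * error * out * (1 - out);
        }

        // After many tiny adjustments the outputs approach the targets.
        for (int i = 0; i < inputs.length; i++) {
            double x = bias;
            for (int j = 0; j < w.length; j++) x += w[j] * inputs[i][j];
            System.out.printf("target %.0f -> output %.3f%n", targets[i], sigmoid(x));
        }
    }
}
```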
  39. 39. Page 39 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The decision boundary perspective… Initial random weights
  40. 40. Page 40 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The decision boundary perspective… Present a training instance / adjust the weights
  41. 41. Page 41 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The decision boundary perspective… Present a training instance / adjust the weights
  42. 42. Page 42 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The decision boundary perspective… Present a training instance / adjust the weights
  43. 43. Page 43 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The decision boundary perspective… Present a training instance / adjust the weights
  44. 44. Page 44 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The decision boundary perspective… Eventually ….
  45. 45. Page 45 © Hortonworks Inc. 2011 – 2014. All Rights Reserved In essence •  weight-learning algorithms for NNs are dumb •  they work by making thousands and thousands of tiny adjustments, each making the network do better at the most recent pattern, but perhaps a little worse on many others •  but, by dumb luck, eventually this tends to be good enough to learn effective classifiers for many real applications
  46. 46. Page 46 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Some other points Detail of a standard NN weight learning algorithm – later If f(x) is non-linear, a network with 1 hidden layer can, in theory, learn perfectly any classification problem. A set of weights exists that can produce the targets from the inputs. The problem is finding them.
  47. 47. Page 47 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Some other ‘by the way’ points If f(x) is linear, the NN can only draw straight decision boundaries (even if there are many layers of units)
  48. 48. Page 48 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Some other ‘by the way’ points NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged
  49. 49. Page 49 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Some other ‘by the way’ points NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged; SVMs only draw straight lines, but they transform the data first in a way that makes that OK
  50. 50. Page 50 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Feature detectors
  51. 51. Page 51 © Hortonworks Inc. 2011 – 2014. All Rights Reserved What’s this unit doing?
  52. 52. Page 52 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Hidden layer units become self-organised feature detectors … [diagram: one hidden unit wired to 63 pixel inputs; strong +ve weight vs. low/zero weight]
  53. 53. Page 53 © Hortonworks Inc. 2011 – 2014. All Rights Reserved What does this unit detect? [diagram: strong +ve weight on a few of the 63 inputs, low/zero weight elsewhere]
  54. 54. Page 54 © Hortonworks Inc. 2011 – 2014. All Rights Reserved What does this unit detect? It will send a strong signal for a horizontal line in the top row, ignoring everywhere else
  55. 55. Page 55 © Hortonworks Inc. 2011 – 2014. All Rights Reserved What does this unit detect? [diagram: strong +ve weight on the top-left inputs, low/zero weight elsewhere]
  56. 56. Page 56 © Hortonworks Inc. 2011 – 2014. All Rights Reserved What does this unit detect? Strong signal for a dark area in the top left corner
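A small sketch of the "feature detector" idea: a unit whose incoming weights are strong and positive over the top row of a tiny pixel grid, and near zero everywhere else, responds strongly to a horizontal line in that row and ignores the same line elsewhere. The 9x7 grid and the weight values are made up for illustration; they are not the slide's actual numbers.

```java
public class FeatureDetector {
    static final int W = 9, H = 7; // a tiny 9x7 pixel grid: 63 inputs, chosen for illustration

    // A unit whose weights are strong over the top row and ~0 everywhere else.
    static double[] topRowDetector() {
        double[] weights = new double[W * H];
        for (int col = 0; col < W; col++) weights[col] = 1.0; // strong +ve weight on row 0
        return weights;                                        // low/zero weight elsewhere
    }

    static double response(double[] weights, double[] pixels) {
        double x = 0.0;
        for (int i = 0; i < weights.length; i++) x += weights[i] * pixels[i];
        return x; // large value means "my feature is present"
    }

    public static void main(String[] args) {
        double[] unit = topRowDetector();

        double[] topLine = new double[W * H];
        for (int col = 0; col < W; col++) topLine[col] = 1.0;                  // line in the top row

        double[] bottomLine = new double[W * H];
        for (int col = 0; col < W; col++) bottomLine[(H - 1) * W + col] = 1.0; // same line, bottom row

        System.out.println("top-row line:    " + response(unit, topLine));     // strong signal
        System.out.println("bottom-row line: " + response(unit, bottomLine));  // ignored (~0)
    }
}
```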
  57. 57. Page 57 © Hortonworks Inc. 2011 – 2014. All Rights Reserved What features might you expect a good NN to learn, when trained with data like this?
  58. 58. Page 58 © Hortonworks Inc. 2011 – 2014. All Rights Reserved [learned detectors over the 63 pixels] vertical lines
  59. 59. Page 59 © Hortonworks Inc. 2011 – 2014. All Rights Reserved [learned detectors over the 63 pixels] horizontal lines
  60. 60. Page 60 © Hortonworks Inc. 2011 – 2014. All Rights Reserved [learned detectors over the 63 pixels] small circles
  61. 61. Page 61 © Hortonworks Inc. 2011 – 2014. All Rights Reserved successive layers can learn higher-level features: earlier layers detect lines in specific positions; later layers hold higher-level detectors (“horizontal line”, “RHS vertical line”, “upper loop”, etc.)
  62. 62. Page 62 © Hortonworks Inc. 2011 – 2014. All Rights Reserved successive layers can learn higher-level features: earlier layers detect lines in specific positions; later layers hold higher-level detectors (“horizontal line”, “RHS vertical line”, “upper loop”, etc.) What does this unit detect?
  63. 63. Page 63 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So: multiple layers make sense
  64. 64. Page 64 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So: multiple layers make sense Your brain works that way
  65. 65. Page 65 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So: multiple layers make sense Many-layer neural network architectures should be capable of learning the true underlying features and ‘feature logic’, and therefore generalise very well …
  66. 66. Page 66 © Hortonworks Inc. 2011 – 2014. All Rights Reserved But, until very recently, weight-learning algorithms simply did not work on multi-layer architectures
  67. 67. Page 67 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Along came deep learning …
  68. 68. Page 68 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The new way to train multi-layer NNs…
  69. 69. Page 69 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The new way to train multi-layer NNs… Train this layer first
  70. 70. Page 70 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The new way to train multi-layer NNs… Train this layer first then this layer
  71. 71. Page 71 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The new way to train multi-layer NNs… Train this layer first then this layer then this layer
  72. 72. Page 72 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The new way to train multi-layer NNs… Train this layer first then this layer then this layer then this layer
  73. 73. Page 73 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The new way to train multi-layer NNs… Train this layer first then this layer then this layer then this layer finally this layer
  74. 74. Page 74 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The new way to train multi-layer NNs… EACH of the (non-output) layers is trained to be an auto-encoder Basically, it is forced to learn good features that describe what comes from the previous layer
  75. 75. Page 75 © Hortonworks Inc. 2011 – 2014. All Rights Reserved an auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input
  76. 76. Page 76 © Hortonworks Inc. 2011 – 2014. All Rights Reserved an auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input By making this happen with (many) fewer units than the inputs, this forces the ‘hidden layer’ units to become good feature detectors
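Structurally, that looks like the sketch below: squeeze the input through a hidden layer with fewer units, expand back out, and measure how far the reconstruction misses the input. Training (not shown here) would use the usual weight-adjustment algorithm to drive that reconstruction error down, which is what forces the hidden units to become useful feature detectors. The layer sizes and random initial weights are arbitrary illustration choices.

```java
import java.util.Random;

public class AutoEncoderSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public static void main(String[] args) {
        int nIn = 8, nHidden = 3;               // fewer hidden units than inputs: the bottleneck
        Random rng = new Random(0);
        double[][] encW = new double[nHidden][nIn];   // input -> hidden weights
        double[][] decW = new double[nIn][nHidden];   // hidden -> reconstruction weights
        for (double[] row : encW) for (int j = 0; j < nIn; j++) row[j] = rng.nextGaussian() * 0.1;
        for (double[] row : decW) for (int j = 0; j < nHidden; j++) row[j] = rng.nextGaussian() * 0.1;

        double[] input = {1, 0, 0, 1, 1, 0, 1, 0};

        double[] hidden = new double[nHidden];        // encode: the would-be "features"
        for (int h = 0; h < nHidden; h++) {
            double x = 0;
            for (int i = 0; i < nIn; i++) x += encW[h][i] * input[i];
            hidden[h] = sigmoid(x);
        }

        double[] recon = new double[nIn];             // decode: try to reproduce the input
        for (int i = 0; i < nIn; i++) {
            double x = 0;
            for (int h = 0; h < nHidden; h++) x += decW[i][h] * hidden[h];
            recon[i] = sigmoid(x);
        }

        double err = 0;                               // reconstruction error the training would minimise
        for (int i = 0; i < nIn; i++) err += (input[i] - recon[i]) * (input[i] - recon[i]);
        System.out.println("reconstruction error = " + err);
    }
}
```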
  77. 77. Page 77 © Hortonworks Inc. 2011 – 2014. All Rights Reserved intermediate layers are each trained to be auto encoders (or similar)
  78. 78. Page 78 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Final layer trained to predict class based on outputs from previous layers
  79. 79. Page 79 © Hortonworks Inc. 2011 – 2014. All Rights Reserved But, how does one train? 79 Overall, the NN is trying to minimize a cost function while adjusting the weights. To find the minimum, Gradient Descent (GD) is used. If the training set is very large, GD can be too slow, since every input is evaluated in the cost function; so one can sample a subset of the inputs when computing the gradient. If the sampling is done at random, this is called Stochastic Gradient Descent (SGD). This mathematical formulation is implemented by the error backpropagation algorithm.
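A minimal sketch of the stochastic part of that idea, kept deliberately simple: a linear model with a squared-error cost, where each step estimates the gradient from a small random mini-batch instead of the whole training set and moves the weights against it. The synthetic data, batch size and learning rate are assumptions for the demo; in a multi-layer network, backpropagation is what supplies the equivalent per-weight gradients.

```java
import java.util.Random;

public class SgdSketch {
    public static void main(String[] args) {
        Random rng = new Random(7);
        int n = 10_000, d = 3, batch = 32;
        double[][] X = new double[n][d];
        double[] y = new double[n];
        double[] trueW = {0.5, -1.2, 2.0};               // synthetic data, for demonstration only
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) X[i][j] = rng.nextGaussian();
            for (int j = 0; j < d; j++) y[i] += trueW[j] * X[i][j];
        }

        double[] w = new double[d];
        double lr = 0.01;
        for (int step = 0; step < 5_000; step++) {
            double[] grad = new double[d];
            for (int b = 0; b < batch; b++) {
                int i = rng.nextInt(n);                  // random sampling is the "stochastic" part
                double pred = 0;
                for (int j = 0; j < d; j++) pred += w[j] * X[i][j];
                double err = pred - y[i];
                for (int j = 0; j < d; j++) grad[j] += err * X[i][j] / batch;
            }
            for (int j = 0; j < d; j++) w[j] -= lr * grad[j]; // descend the estimated gradient
        }
        System.out.printf("learned w = [%.2f, %.2f, %.2f]%n", w[0], w[1], w[2]);
    }
}
```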
  80. 80. Page 80 © Hortonworks Inc. 2011 – 2014. All Rights Reserved And that’s that •  That’s the basic idea •  There are many many types of deep learning, •  Different kinds of autoencoder, variations on architectures and training algorithms, etc… •  Very fast growing area …
  81. 81. Page 81 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Doing this in real life
  82. 82. Page 82 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Typical Workflow 82 1. Ingest training data and store it 2. Split data set into: training, testing and validation sets 3. Vectorize and extract features to go into next step 4. Architect multi layer network, initialize 5. Feed data and train 6. Test and Validate 7. Repeat steps 4 and 5 until desired 8. Store model 9. Put model in app, start generalizing on real data.
  83. 83. Page 83 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So what do you get? 83 1. Ingest training data and store it using Kafka, Flume, good old web scraping 2. Split data set into: training, testing and validation sets 3. Vectorize and extract features to go into next step 4. Architect multi layer network, initialize 5. Feed data and train 6. Test and Validate 7. Repeat steps 4 and 5 until desired 8. Store model 9. Put model in app, start generalizing on real data. Steps 2, 3, 4 and 5: Use libraries such as Caffe, Theano, Deeplearning4j, H2O
  84. 84. Page 84 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Distributed Deep Learning on Hadoop •  In ASF: Apache Singa – new incubator project •  Two main partners of Hortonworks: Skymind and H2O •  Skymind is focused on Deep Learning exclusively (deeplearning4j); H2O includes other ML libraries. •  Both provide scale out on HDP + Spark •  Skymind has GPU Acceleration built in - uses CUDA for doing linear algebra operations •  Both are open source, Apache licensed. 84
  85. 85. Page 85 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deeplearning4j Architecture 85
  86. 86. Page 86 © Hortonworks Inc. 2011 – 2014. All Rights Reserved DL4J: Canova for Vectorization and Ingest •  Canova uses an input/output format system (similar to the InputFormat/OutputFormat system in Hadoop MapReduce) •  Supports all major types of input data (text, CSV, audio, image and video) •  Can be extended for specialized input formats •  Connects to Kafka 86
  87. 87. Page 87 © Hortonworks Inc. 2011 – 2014. All Rights Reserved ND4J: •  N-dimensional vector library •  Scientific computing for JVM •  DL4J uses it to do linear algebra for backpropagation •  Supports GPUs via CUDA and Native via Jblas •  Deploys on Android •  DL4J code remains unchanged whether using GPU or CPU 87
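As a flavour of what DL4J hands off to ND4J, here is a minimal sketch assuming the ND4J API of roughly this era (Nd4j.rand, INDArray.mmul, transpose); artifact names and method signatures may differ between versions.

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class Nd4jSketch {
    public static void main(String[] args) {
        // The kind of linear algebra DL4J leans on ND4J for during backpropagation:
        // a batch of activations multiplied by a weight matrix, then a transpose.
        INDArray activations = Nd4j.rand(4, 3);                      // 4 examples, 3 features
        INDArray weights = Nd4j.rand(3, 2);                          // 3 inputs -> 2 units
        INDArray output = activations.mmul(weights);                 // (4x3)·(3x2) = 4x2
        INDArray backprop = output.transpose().mmul(activations);    // (2x4)·(4x3) = 2x3
        System.out.println(output);
        System.out.println(backprop);
    }
}
```

The same code runs on CPU or GPU: swapping in the CUDA backend changes which ND4J implementation executes these operations, not the DL4J code itself.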
  88. 88. Page 88 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 88 How to choose a Neural Net in DL4J core?
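Whichever network type is chosen, the DL4J setup follows the same pattern: describe the layers in a MultiLayerConfiguration, wrap it in a MultiLayerNetwork, then fit it on a data iterator. The sketch below assumes the builder API of roughly the DL4J versions current when this deck was written; method names, defaults and activation handling have changed across releases, so treat it as a shape rather than a drop-in config. The layer sizes (784 inputs, 256 hidden units, 10 classes) are illustrative assumptions.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class Dl4jConfigSketch {
    public static void main(String[] args) {
        // A small feed-forward classifier: one hidden layer, then a softmax output.
        // String-based activations match the older DL4J builder style assumed here.
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                .list()
                .layer(0, new DenseLayer.Builder()
                        .nIn(784).nOut(256)
                        .activation("relu")
                        .build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(256).nOut(10)
                        .activation("softmax")
                        .build())
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        // model.fit(trainingIterator);   // feed data and train (iterator construction not shown)
        System.out.println(conf.toJson());
    }
}
```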
  89. 89. Page 89 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Demo!
  90. 90. Page 90 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Thank You hortonworks.com
