
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-12-14


This talk covers four configurations of deep learning that address different types of application needs. Strategies for speed-up and real-time scoring are also discussed.

Published in: Data & Analytics


  1. © 2015 ligaDATA, Inc. All Rights Reserved. Using Deep Learning to do Real-Time Scoring in Practical Applications. Deep Learning Applications Meetup, Monday, 12/14/2015, Mountain View, CA. http://www.meetup.com/Deep-Learning-Applications/events/227217853/ By Greg Makowski, www.Linkedin.com/in/GregMakowski, greg@LigaDATA.com. Community @ http://Kamanja.org
  2. Deep Learning - Outline •  Big Picture of 2016 Technology •  Neural Net Basics •  Deep Network Configurations for Practical Applications –  Auto-Encoder (i.e. data compression or Principal Components Analysis) –  Convolutional (shift invariance in time or space for voice, image or IoT) –  Real Time Scoring and Lambda Architecture –  Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja) –  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) –  Continuous Space Word Models (i.e. word2vec)
  3. Gartner’s Top 2016 Strategic Technology Trends – David Cearley
  4. Gartner’s Top 2016 Strategic Technology Trends – David Cearley
  5. Advantages of a Net over Regression. (Scatter plot: one target field, with values of $ or c, for data points graphed by source field values “field 1” and “field 2”.) A regression solution is “linear”: it fits one line. https://en.wikipedia.org/wiki/Regression_analysis
  6. Advantages of a Net over Regression. (Same scatter plot.) A neural net solution is “non-linear”: it can fit several regions which are not adjacent, and hidden nodes can be lines or circles. https://en.wikipedia.org/wiki/Artificial_neural_network
  7. A Comparison of a Neural Net and Regression. A logistic regression formula: Y = f(a0 + a1*X1 + a2*X2 + a3*X3), where the a* are coefficients. Backpropagation, cast in a similar form: H1 = f(w0 + w1*I1 + w2*I2 + w3*I3); H2 = f(w4 + w5*I1 + w6*I2 + w7*I3); ... Hn = f(w8 + w9*I1 + w10*I2 + w11*I3); O1 = f(w12 + w13*H1 + ... + w15*Hn); On = ... The w* are weights, AKA coefficients. I1..In are input nodes or input variables. H1..Hn are hidden nodes, which extract features of the data. O1..On are the outputs, which group disjoint categories. Look at the ratio of training records vs. free parameters (complexity, regularization). (Diagram: inputs X1..X3 / I1..I3 plus bias, one hidden layer, one output.)
  8. Think of Separating Land vs. Water. 1 line: regression (more errors). 5 hidden nodes: a neural network. 12 splits: a decision tree (more elements, less computation). Different algorithms use different basis functions: •  one line •  many horizontal & vertical lines •  many diagonal lines •  circles. Q) What is too detailed? “Memorizing the high-tide boundary” and applying it at all times.
  9. Deep Learning - Outline •  Big Picture of 2016 Technology •  Neural Net Basics •  Deep Network Configurations for Practical Applications –  Auto-Encoder (i.e. data compression or Principal Components Analysis) –  Convolutional (shift invariance in time or space for voice, image or IoT) –  Real Time Scoring and Lambda Architecture –  Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja) –  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) –  Continuous Space Word Models (i.e. word2vec) http://deeplearning.net/ http://www.kdnuggets.com/ http://www.analyticbridge.com/
  10. Leading up to an Auto Encoder •  Supervised Learning –  Regression, Tree or Net: 50 inputs → 1 output –  Possible nets: •  256 → 120 → 1 •  256 → 120 → 5 (trees, regressions and most others are limited to 1 output) •  256 → 120 → 60 → 1 •  256 → 180 → 120 → 60 → 1 (start getting into training stability problems, with old processes) •  Unsupervised Learning –  Clustering (traditional unsupervised): •  60 inputs (no target); produce 1-2 new fields (cluster ID & distance)
  11. Auto Encoder (like data compression): relate input to output, through a compressed middle. •  Supervised Learning –  Regression, Tree or Net: 50 inputs → 1 output –  Possible nets: •  256 → 120 → 1 •  256 → 120 → 5 (trees, regressions, SVD and most others are limited to 1 output) •  256 → 120 → 60 → 1 •  256 → 180 → 120 → 60 → 1 (start getting long training times to stabilize, or training may not finish; the BREAKTHROUGH provided by DEEP LEARNING) •  Unsupervised Learning –  Clustering (traditional unsupervised): •  60 inputs (no target); produce 1-2 new fields (cluster ID & distance) –  Unsupervised training of a net, assigning (target record == input record): AUTO-ENCODING –  Train the net in stages, one layer at a time: 256 → 180 → 256, then adding the next compressed layer (→ 120 → ...); 4 hidden layers with unsupervised training; because of symmetry, only need to update mirrored weights once –  Add a supervised layer at the end to forecast 10 target categories: → 10 (1 layer at the end with supervised training) https://en.wikipedia.org/wiki/Deep_learning
  12. Auto Encoder: how it can be generally used to solve problems. •  Add a supervised layer to forecast 10 target categories –  4 hidden layers trained with unsupervised training, –  1 new layer, trained with supervised learning: → 10 •  Outlier detection –  The “activation” at each of the 120 output nodes indicates the “match” to that cluster or compressed feature –  When scoring new records, can detect outliers with a process like: if (max_output_match < 0.333) then suspected outlier •  How is it like PCA? –  Individual hidden nodes in the same layer are “different” or “orthogonal”
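The outlier rule on this slide is simple enough to sketch. Assuming the scored record's feature-layer activations are already available as an array (the 0.333 threshold comes from the slide; everything else here is illustrative):

```python
import numpy as np

def max_output_match(activations):
    # activations: compressed/feature-layer outputs for one scored record
    return float(np.max(activations))

def is_suspected_outlier(activations, threshold=0.333):
    # slide's rule: if no learned feature "matches" strongly, flag the record
    return max_output_match(activations) < threshold

print(is_suspected_outlier(np.array([0.05, 0.10, 0.92, 0.03])))  # → False (strong match)
print(is_suspected_outlier(np.array([0.05, 0.10, 0.12, 0.03])))  # → True (no strong match)
```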
  13. How Transferable are Features in Deep Neural Networks? http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf
  14. Deep Learning - Outline •  Big Picture of 2016 Technology •  Neural Net Basics •  Deep Network Configurations for Practical Applications –  Auto-Encoder (i.e. data compression or Principal Components Analysis) –  Convolutional (shift invariance in time or space for voice, image or IoT) –  Real Time Scoring and Lambda Architecture –  Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja) –  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) –  Continuous Space Word Models (i.e. word2vec)
  15. Deep Learning Caused a 50% Reduction in Speech Recognition Error Rates in 4 Years. “The use of deep neural nets in production speech systems really started more like in 2011... I would estimate that from the time before deep neural nets were used until now, the error rate on production speech systems fell from about 20% down to below 10%, so more than a 50% reduction in error rate.” - Jeff Dean, Senior Fellow in the Knowledge Group, Google, email to Greg 12/13/2015. http://research.google.com/people/jeff/ (Chart: drop in speech recognition error rates; deep learning deployments started 2011.)
  16. Internet of Things (IoT) is heavily signal data http://www.datasciencecentral.com/profiles/blogs/the-internet-of-things-data-science-and-big-data
  17. Convolutional Neural Net (CNN): enables detecting shift-invariant patterns. In speech and image applications, patterns vary by size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern itself. Neural nets can be explicitly trained to provide an FFT (Fast Fourier Transform) to convert data from the time domain to the frequency domain – but typically an explicit FFT is used. (Internet of Things signal data.)
  18. Convolutional Neural Net (CNN): enables detecting shift-invariant patterns. In speech and image applications, patterns vary by size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern itself. Solution: use a sliding convolution to detect the pattern. CNNs can use very long observational windows, up to 400 ms, for long context.
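The "sliding convolution" idea can be shown with a toy 1-D signal. This sketch (signal and kernel values are made up) slides one small pattern detector over every offset; because the same weights are reused at every position, the detector fires wherever the pattern occurs, with no bounding box needed:

```python
import numpy as np

def slide_detect(signal, kernel):
    # slide the kernel over every position of the signal; the same
    # weights are reused at every offset -- that reuse is what makes
    # the detection shift-invariant
    n, k = len(signal), len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(n - k + 1)])

signal = np.array([0., 0., 1., 2., 1., 0., 0., 0., 1., 2., 1., 0.])
kernel = np.array([1., 2., 1.])  # the "bump" pattern we want to find
responses = slide_detect(signal, kernel)
print(responses.argmax())  # → 2 (first offset where the pattern occurs)
```

The pattern also occurs at offset 8, and the response there is equally strong; a real CNN would feed all of these responses to the next layer rather than taking the argmax.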
  19. Convolution https://en.wikipedia.org/wiki/Convolution
  20. Convolutional Neural Net: from LeNet-5. “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, Nov 1998. Yann LeCun (Director, Facebook AI Research), Léon Bottou, Yoshua Bengio and Patrick Haffner. http://yann.lecun.com/
  21. Auto Encoder (like data compression): relate input to output, through a compressed middle.
  22. Convolutional Neural Net (CNN) •  How is a CNN trained differently than a typical back propagation (BP) network? –  Parts of the training which are the same: •  Present an input record •  Forward pass through the network •  Back propagate error (i.e. per epoch) –  Different parts of training: •  Some connections are CONSTRAINED to the same value – the connections for the same pattern, sliding over all input space •  Error updates are averaged and applied equally to the one set of weight values •  End up with the same pattern detector feeding many nodes at the next level. “Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations,” 2009. http://www.cs.toronto.edu/~rgrosse/icml09-cdbn.pdf
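The constrained-weight update described above can be sketched as follows. This is a minimal illustration assuming plain gradient descent (the learning rate, kernel, and gradient values are made up): gradients computed at every sliding position are averaged and applied once to the single shared kernel.

```python
import numpy as np

def shared_weight_update(kernel, grads_per_position, lr=0.1):
    # the same kernel is applied at every input position ("constrained to
    # the same value"), so the error updates from all positions are
    # averaged and applied equally to the one set of weight values
    avg_grad = np.mean(grads_per_position, axis=0)
    return kernel - lr * avg_grad

kernel = np.array([0.5, -0.2, 0.1])
grads = np.array([[0.1, 0.0, -0.1],    # gradient from sliding position 1
                  [0.3, 0.2, -0.3]])   # gradient from sliding position 2
print(shared_weight_update(kernel, grads))
```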
  23. Convolutional Neural Net (CNN): same low-level features. http://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning
  24. The Mammalian Visual Cortex is Hierarchical (The Brain is a Deep Neural Net - Yann LeCun) http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf
  25. Convolutional Neural Net (CNN): Facebook example https://gigaom.com/2014/03/18/facebook-shows-off-its-deep-learning-skills-with-deepface/
  26. Convolutional Neural Net (CNN): Yahoo + Stanford example – find a face in a picture, even upside down http://www.dailymail.co.uk/sciencetech/article-2958597/Facial-recognition-breakthrough-Deep-Dense-software-spots-faces-images-partially-hidden-UPSIDE-DOWN.html
  27. Convolutional Neural Nets (CNN): Robotic Grasp Detection (IoT) http://pjreddie.com/media/files/papers/grasp_detection_1.pdf
  28. Deep Learning - Outline •  Big Picture of 2016 Technology •  Neural Net Basics •  Deep Network Configurations for Practical Applications –  Auto-Encoder (i.e. data compression or Principal Components Analysis) –  Convolutional (shift invariance in time or space for voice, image or IoT) –  Real Time Scoring and Lambda Architecture –  Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja) –  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) –  Continuous Space Word Models (i.e. word2vec)
  29. Real Time Scoring Optimizations •  Auto-Encoding nets –  Can grow to millions of connections, and start to get computational –  Can reduce connections by 5% to 25+% with pruning & retraining: •  Train with increased regularization settings •  Drop connections with near-zero weights, then retrain •  Drop nodes with fan-in connections which don’t get used much later, such as in your predictive problem •  Perform sensitivity analysis – delete possible input fields •  Convolutional Neural Nets –  With large enough data, can even skip the FFT preprocessing step –  Can use wider than 10 ms audio sampling rates for speed up •  Implement other preprocessing as lookup tables (i.e. Bayesian priors) •  Use cloud computing, do not limit to device computing •  Large models don’t fit → use model or data parallelism to train
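The "drop connections with near-zero weights" step above can be sketched as a simple magnitude threshold. This is an illustration only (the threshold value and weight matrix are made up); in practice the pruned net is then retrained, as the slide says:

```python
import numpy as np

def prune_near_zero(weights, threshold=0.01):
    # zero out connections whose weights are near zero; a sparser model
    # scores faster, and retraining afterwards recovers accuracy
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

W = np.array([[0.80, 0.003, -0.45],
              [0.001, -0.02, 0.009]])
pruned, mask = prune_near_zero(W)
print(mask.sum(), "of", mask.size, "connections kept")  # → 3 of 6 connections kept
```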
  30. © 2015 ligaDATA, Inc. All Rights Reserved. Real Time Scoring: Lambda Architecture – for both Batch and Real Time •  First architecture to really define how batch and stream processing can work together •  Founded on the concepts of immutability and re-computation, with human fault tolerance •  Pre-computes the results of batch & real-time processes as a set of views, & the query layer merges the views https://en.wikipedia.org/wiki/Lambda_architecture
  31. © 2015 ligaDATA, Inc. All Rights Reserved. Real Time Scoring: Lambda Architecture with Kamanja. (Diagram: real-time data → queue → Kamanja [decisions, transformations, enrichment, aggregations] → master dataset and real-time views & indexing → serving layer → queries.) •  Kamanja embraces and extends the Lambda architecture •  Transform and process messages in real time, combine messages with historical data, and compute real-time views to make real-time decisions based on the views
  32. © 2015 ligaDATA, Inc. All Rights Reserved. Real Time Computing: Kamanja Technology Stack. Kamanja (PMML, Java or Scala consumer) •  High-level languages / abstractions: PMML producers, MLlib •  Compute fabric: cloud, EC2, internal cloud •  Security: Kerberos •  Real-time streaming: Kafka, MQ, Spark* •  ligaDATA data store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others) •  Resource management: Zookeeper, Yarn*, Mesos*
  33. Deep Net Tools
  34. Deep Learning - Outline •  Big Picture of 2016 Technology •  Neural Net Basics •  Deep Network Configurations for Practical Applications –  Auto-Encoder (i.e. data compression or Principal Components Analysis) –  Convolutional (shift invariance in time or space for voice, image or IoT) –  Real Time Scoring and Lambda Architecture –  Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja) –  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) –  Continuous Space Word Models (i.e. word2vec)
  35. Deep Reinforcement Learning, Q-Learning. David Silver, Google DeepMind. http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf https://en.wikipedia.org/wiki/Reinforcement_learning https://en.wikipedia.org/wiki/Q-learning Think in terms of IoT: a device agent measures and infers the user’s action, maximizes future reward, and recommends to the user or system.
  36. Deep Reinforcement Learning, Q-Learning (think about IoT possibilities). David Silver, Google DeepMind. http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf (Uses 4 screen shots as input.)
  37. Deep Reinforcement Learning, Q-Learning. David Silver, Google DeepMind. http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf (Uses 4 screen shots as input; output actions: shift right fast, shift right, stay, shift left, shift left fast.) IoT challenge: how to replace the game score with an IoT score?
  38. Deep Reinforcement Learning, Q-Learning. David Silver, Google DeepMind. http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf Games with best Q-learning results: Video Pinball, Breakout, Star Gunner, Crazy Climber, Gopher
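The Q-learning update behind these results can be sketched with a toy table. This is the standard tabular update rule, not DeepMind's implementation (in the Atari work a deep net approximates the table); the state/action sizes, reward, and hyperparameters here are made up:

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # standard Q-learning: move Q(s, a) toward the observed reward plus
    # the discounted best future value from the next state
    target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# toy table: 3 states x 2 actions (e.g. shift left, shift right)
Q = np.zeros((3, 2))
Q = q_update(Q, s=0, a=1, reward=1.0, s_next=1)
print(Q[0, 1])  # → 0.1
```

For an IoT agent, the "game score" would be replaced by a reward signal derived from the device's objective, which is exactly the challenge the previous slide raises.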
  39. Deep Learning - Outline •  Big Picture of 2016 Technology •  Neural Net Basics •  Deep Network Configurations for Practical Applications –  Auto-Encoder (i.e. data compression or Principal Components Analysis) –  Convolutional (shift invariance in time or space for voice, image or IoT) –  Real Time Scoring –  Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja) –  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) –  Continuous Space Word Models (i.e. word2vec)
  40. Continuous Space Word Models (word2vec) •  Before (a predictive “Bag of Words” model): –  One row per document, paragraph or web page –  Binary word space: 10k to 200k columns, one per word or phrase: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 ... “This word space model is ...” –  The “Bag of Words” model relates an input record to a target category
  41. Continuous Space Word Models (word2vec) •  Before (a predictive “Bag of Words” model): –  One row per document, paragraph or web page –  Binary word space: 10k to 200k columns, one per word or phrase: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 ... “This word space model is ...” –  The “Bag of Words” model relates an input record to a target category •  New: –  One row per word (word2vec), possibly per sentence (sent2vec) –  Continuous word space: 100 to 300 columns, continuous values: .01 .05 .02 .00 .00 .68 .01 .01 .35 ... .00 → “King”; .00 .00 .05 .01 .49 .52 .00 .11 .84 ... .01 → “Queen” –  The deep net training resulted in an emergent property: •  Numeric geometric location relates to concept space •  “King” – “man” + “woman” = “Queen” (math to change the gender relation) •  “USA” – “Washington DC” + “England” = “London” (math for the capital relation)
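The vector arithmetic on this slide can be demonstrated directly. The 4-dim "word vectors" below are made up purely to illustrate the mechanics (real word2vec vectors have 100-300 dims and are learned, not hand-set); the analogy is answered by finding the vocabulary word nearest to the result by cosine similarity:

```python
import numpy as np

# toy hand-set vectors -- illustrative only, not trained embeddings
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "man":   np.array([0.9, 0.0, 0.1, 0.0]),
    "woman": np.array([0.1, 0.0, 0.9, 0.0]),
    "queen": np.array([0.1, 0.8, 0.9, 0.0]),
}

def nearest(v, vocab):
    # cosine similarity of v against every word in the vocabulary
    sims = {w: float(v @ u) / (np.linalg.norm(v) * np.linalg.norm(u))
            for w, u in vocab.items()}
    return max(sims, key=sims.get)

v = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(v, vectors))  # → queen
```

In a real model the result of the arithmetic is rarely exactly equal to a vocabulary vector, so the nearest-neighbor search (usually excluding the query words themselves) is essential.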
  42. Continuous Space Word Models (word2vec): how to SCALE to larger vocabularies? http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
  43. Training Continuous Space Word Models •  How to train these models? –  Raw data: “This example sentence shows the word2vec model training.”
  44. Training Continuous Space Word Models •  How to train these models? –  Raw data: “This example sentence shows the word2vec model training.” –  Training data (with target values underscored, and other words as input): “This example sentence shows word2vec” (prune “the”); “example sentence shows word2vec model”; “sentence shows word2vec model training” –  The context of the 2 to 5 prior and following words predicts the middle word –  Deep net model architecture, data compression to 300 continuous nodes: •  50k binary word input vector → ... → 300 → ... → 50k word target vector
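Building the (context → target) training examples described above can be sketched as follows. This is an illustration of the windowing step only, with a window of 2 and a toy stop-word list; real pipelines use larger corpora, richer tokenization, and subsampling:

```python
def training_pairs(sentence, window=2, stop_words=("the",)):
    # build (context -> target) examples: the `window` words before and
    # after each word serve as input to predict that middle word
    words = [w.strip(".").lower() for w in sentence.split()]
    words = [w for w in words if w not in stop_words]  # prune stop words
    pairs = []
    for i, target in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

for context, target in training_pairs(
        "This example sentence shows the word2vec model training."):
    print(context, "->", target)
```

The network then compresses the 50k-dim binary context vector down to the 300 continuous nodes, and those 300-dim hidden activations become the word vectors.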
  45. Training Continuous Space Word Models •  How to train these models? –  Raw data: “This example sentence shows the word2vec model training.” –  Training data (with target values underscored, and other words as input): “This example sentence shows word2vec” (prune “the”); “example sentence shows word2vec model”; “sentence shows word2vec model training” –  The context of the 2 to 5 prior and following words predicts the middle word –  Deep net model architecture, data compression to 300 continuous nodes: •  50k binary word input vector → ... → 300 → ... → 50k word target vector •  Use pre-trained models: https://code.google.com/p/word2vec/ –  Trained on 100 billion words from Google News –  300-dim vectors for 3 million words and phrases
  46. Training Continuous Space Word Models http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
  47. Applying Continuous Space Word Models. State of the art in machine translation: “Sequence to Sequence Learning with Neural Networks,” NIPS 2014. Applications: language translation, document summary, generating text captions for pictures. http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf
  48. “Greg’s Guts” on Deep Learning •  Some claim the need for preprocessing and knowledge representation has ended –  For most of the signal processing applications → yes, simplify –  I am VERY READY TO COMPETE in other applications, continuing: •  expressing explicit domain knowledge •  optimizing business value calculations •  Deep Learning gets big advantages from big data –  Why? Better populating high-dimensional space combination subsets –  Unsupervised feature extraction reduces the need for large labeled data •  However, “regular sized data” gets a big boost as well –  The “ratio of free parameters” (i.e. neurons) to training set records –  For regressions or regular nets, want 5-10 times as many records as free parameters –  Regularization and weight drop-out reduce this pressure –  Especially when only training “the next auto-encoding layer”
  49. Deep Learning Summary – IT’S EXCITING! •  Discussed Deep Learning architectures –  Auto encoder, convolutional, reinforcement learning, continuous word models •  Real-time speed up –  Train model, reduce complexity, retrain –  Simplify preprocessing with lookup tables –  Use cloud computing, do not be limited to device computing –  Lambda architecture like Kamanja, to combine real time and batch •  Applications –  Signal data: IoT, speech, images –  Control system models (like Atari game playing, IoT) –  Language models https://www.quora.com/Why-is-deep-learning-in-such-demand-now
  50. © 2015 ligaDATA, Inc. All Rights Reserved. Using Deep Learning to do Real-Time Scoring in Practical Applications. Deep Learning Applications Meetup, Monday, 12/14/2015, Mountain View, CA. http://www.meetup.com/Deep-Learning-Applications/events/227217853/ By Greg Makowski, www.Linkedin.com/in/GregMakowski, greg@LigaDATA.com. Community @ http://Kamanja.org
