Webinar: Deep Learning with H2O


Published on

Note: Make sure to download the slides to get the high-resolution version!

Also, you can find the webinar recording here (please also download for better quality):

Come hear how Deep Learning in H2O is unlocking never-before-seen performance for prediction!

H2O is a Google-scale open source machine learning engine for R & Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification & regression use cases on the eBay text dataset, MNIST handwritten digits and Cancer datasets demonstrate the power of this game-changing technology.

- Powered by the open source machine learning software Contributors welcome at:
- To view videos on H2O open source machine learning software, go to:

Published in: Technology, Education


  1. 1. Deep Learning with H2O: Scalable In-Memory Machine Learning. Webinar, 5/21/14. SriSatish Ambati, CEO and Co-Founder; Arno Candel, PhD, Physicist & Hacker
  2. 2. H2O Deep Learning, @ArnoCandel Outline
 Intro & Live Demo (5 mins)
 Methods & Implementation (10 mins)
 Results & Live Demo (10 mins): MNIST handwritten digits, text classification
 Q & A (10 mins)
  3. 3. H2O Deep Learning, @ArnoCandel About H2O (aka 0xdata) Pure Java, Apache v2 Open Source Join the! +1 Cyprien Noel for prior work
  4. 4. H2O Deep Learning, @ArnoCandel Customer Demands for Practical Machine Learning
 Requirement: Value
 In-Memory: Fast (Interactive)
 Distributed: Big Data (No Sampling)
 Open Source: Ownership of Methods
 API / SDK: Extensibility
 H2O was developed by 0xdata to meet these requirements
  5. 5. H2O Deep Learning, @ArnoCandel H2O Integration
 Deployment: Standalone, Over YARN, On MRv1 (Hadoop MR)
 Storage: HDFS
 Interfaces: R, Scala, JSON, Python, Java
  6. 6. H2O Deep Learning, @ArnoCandel H2O Architecture
 Distributed In-Memory K-V store, Col. compression, Memory manager
 MapReduce
 Machine Learning Algorithms, e.g. Deep Learning
 R Engine
 Nano fast Scoring Engine, Prediction Engine
  7. 7. H2O Deep Learning, @ArnoCandel H2O + R = Happy Data Scientist. Machine Learning on Big Data with R:
 Data resides on the H2O cluster!
  8. 8. H2O Deep Learning, @ArnoCandel H2O Deep Learning in Action
 MNIST = Digitized handwritten digits database (Yann LeCun)
 Data: 28x28=784 pixels with (gray-scale) values in 0…255
 Train: 60,000 rows, 784 integer columns, 10 classes
 Test: 10,000 rows, 784 integer columns, 10 classes
 Live Demo: Build an H2O Deep Learning model on MNIST train/test data
 Yann LeCun: “Yet another advice: don't get fooled by people who claim to have a solution to Artificial General Intelligence. Ask them what error rate they get on MNIST or ImageNet.”
  9. 9. H2O Deep Learning, @ArnoCandel What is Deep Learning?
 Wikipedia: Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.
 Example: Input data (image) -> Prediction (who?)
 Facebook's DeepFace (Yann LeCun) recognises faces as well as humans
  10. 10. H2O Deep Learning, @ArnoCandel Deep Learning is Trending (Google Trends, 2011 to 2013)
 Businesses are using Deep Learning techniques!
 Google Brain (Andrew Ng, Jeff Dean & Geoffrey Hinton)
 FBI FACE: $1 billion face recognition project
 Chinese Search Giant Baidu Hires Man Behind the “Google Brain” (Andrew Ng)
  11. 11. H2O Deep Learning, @ArnoCandel What is NOT Deep
 Linear models are not deep (by definition)
 Neural nets with 1 hidden layer are not deep (no feature hierarchy)
 SVMs and Kernel methods are not deep (2 layers: kernel + linear)
 Classification trees are not deep (operate on original input space)
  12. 12. H2O Deep Learning, @ArnoCandel Deep Learning in H2O
 1970s multi-layer feed-forward Neural Network (supervised learning with stochastic gradient descent using back-propagation)
 + distributed processing for big data (H2O in-memory MapReduce paradigm on distributed data)
 + multi-threaded speedup (H2O Fork/Join worker threads update the model asynchronously)
 + breakthrough algorithms for accuracy (weight initialization, adaptive learning, momentum, dropout, regularization)
 = Top-notch prediction engine!
  13. 13. H2O Deep Learning, @ArnoCandel Example Neural Network: “fully connected” directed graph of neurons
 Input layer (3 neurons: age, income, employment) -> Hidden layer 1 (4 neurons) -> Hidden layer 2 (3 neurons) -> Output layer (2 neurons: married, single)
 #connections between consecutive layers: 3x4, 4x3, 3x2
 Information flows from the input neurons through the hidden neurons to the output neurons
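The connection counts on this slide follow directly from the layer sizes. A minimal sketch in Python, assuming the 3-4-3-2 layout read off the slide; the function name `connections_per_layer` is illustrative and not part of H2O:

```python
# Layer sizes read off the example slide: input, hidden 1, hidden 2, output.
layer_sizes = [3, 4, 3, 2]


def connections_per_layer(sizes):
    """Number of weights between each pair of consecutive fully connected layers."""
    return [n_in * n_out for n_in, n_out in zip(sizes, sizes[1:])]


per_layer = connections_per_layer(layer_sizes)
total_weights = sum(per_layer)
total_biases = sum(layer_sizes[1:])  # one bias per non-input neuron

print(per_layer)      # [12, 12, 6]
print(total_weights)  # 30
print(total_biases)   # 9
```

Even this toy network has 39 trainable parameters, which hints at why the 784-input MNIST models discussed later are much larger.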
  14. 14. H2O Deep Learning, @ArnoCandel Prediction: Forward Propagation
 “neurons activate each other via weighted sums”
 Inputs $x_i$: age, income, employment. Outputs $p_l$: married, single.
 $y_j = \tanh\left(\sum_i x_i u_{ij} + b_j\right)$
 $z_k = \tanh\left(\sum_j y_j v_{jk} + c_k\right)$
 $p_l = \mathrm{softmax}\left(\sum_k z_k w_{kl} + d_l\right)$
 $\mathrm{softmax}(x_k) = \exp(x_k) / \sum_{k'} \exp(x_{k'})$
 per-class probabilities: $\sum_l p_l = 1$
 $b_j$, $c_k$, $d_l$: bias values (indep. of inputs)
 activation function: tanh; alternative: $x \mapsto \max(0, x)$ “rectifier”
 $p_l$ is a non-linear function of $x_i$: can approximate ANY function with enough layers!
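The forward pass on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch only, not H2O's implementation: the helper names (`dense`, `softmax`, `forward`) and all numeric weights, biases and inputs are made up for the example; only the formulas (tanh hidden layers followed by a softmax output) come from the slide.

```python
import math


def dense(inputs, weights, biases):
    """Weighted sums: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]


def softmax(values):
    """exp(v_k) / sum over k' of exp(v_k'), shifted by the max for stability."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, u, b, v, c, w, d):
    """y = tanh(x.u + b), z = tanh(y.v + c), p = softmax(z.w + d)."""
    y = [math.tanh(s) for s in dense(x, u, b)]
    z = [math.tanh(s) for s in dense(y, v, c)]
    return softmax(dense(z, w, d))


# Hypothetical inputs and parameters for the 3-4-3-2 example network.
x = [0.5, -1.0, 2.0]  # age, income, employment (already scaled)
u = [[0.1, -0.2, 0.3, 0.0], [0.4, 0.1, -0.3, 0.2], [-0.1, 0.2, 0.1, -0.4]]  # 3x4
b = [0.0, 0.1, -0.1, 0.0]
v = [[0.2, -0.1, 0.3], [0.0, 0.4, -0.2], [-0.3, 0.1, 0.2], [0.1, 0.0, -0.1]]  # 4x3
c = [0.05, 0.0, -0.05]
w = [[0.3, -0.3], [-0.2, 0.2], [0.1, -0.1]]  # 3x2
d = [0.0, 0.0]

p = forward(x, u, b, v, c, w, d)
print(p)  # two per-class probabilities (married, single) that sum to 1
```

Swapping `math.tanh` for `lambda s: max(0.0, s)` in the hidden layers gives the “rectifier” alternative mentioned on the slide.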
  15. 15. H2O Deep Learning, @ArnoCandel Training: Update Weights & Biases
 For each training row, we make a prediction and compare with the actual label (supervised learning):
 class: predicted, actual
 married: 0.8, 1
 single: 0.2, 0
 Objective: minimize prediction error (MSE or cross-entropy)
 Mean Square Error = $(0.2^2 + 0.2^2)/2$ “penalize differences per-class”
 Cross-entropy = $-\log(0.8)$ “strongly penalize non-1-ness”
 Stochastic Gradient Descent: Update weights and biases via gradient of the error (via back-propagation):
 $w \leftarrow w - \mathrm{rate} \cdot \partial E/\partial w$
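The two error measures and the update rule on this slide can be checked numerically. A minimal sketch, assuming the slide's example row (predicted 0.8/0.2 against actual 1/0); the function names and the weight, gradient and rate passed to `sgd_step` are hypothetical:

```python
import math

predicted = [0.8, 0.2]  # married, single
actual = [1.0, 0.0]


def mean_square_error(pred, target):
    """Average of the squared per-class differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(pred, target):
    """-sum_k target_k * log(pred_k); only the true class contributes for one-hot targets."""
    return -sum(t * math.log(p) for p, t in zip(pred, target) if t > 0)


def sgd_step(w, grad, rate):
    """w <- w - rate * dE/dw for a single weight."""
    return w - rate * grad


mse = mean_square_error(predicted, actual)  # (0.2^2 + 0.2^2) / 2 = 0.04
ce = cross_entropy(predicted, actual)       # -log(0.8), about 0.223

# Hypothetical weight, gradient and learning rate, just to show the update rule.
w_new = sgd_step(w=0.5, grad=2.0, rate=0.1)

print(mse, ce, w_new)
```

Cross-entropy looks only at the probability assigned to the true class, so it grows without bound as that probability approaches 0, whereas the squared error stays bounded.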
  16. 16. H2O Deep Learning, @ArnoCandel H2O Deep Learning Architecture
 nodes/JVMs: sync; threads: async; communication: HTTPD, H2O atomic in-memory K-V store
 initial model: weights and biases w
 map: each node trains a copy of the weights and biases with (some* or all of) its local data with asynchronous F/J threads
 reduce: model averaging: average weights and biases from all nodes (partial sums w1+w3 and w2+w4, then w* = (w1+w2+w3+w4)/4), speedup is at least #nodes/log(#rows) arxiv:1209.4129v3
 updated model: w*
 Keep iterating over the data (“epochs”), score from time to time
 Query & display the model via JSON, WWW
 *user can specify the number of total rows per MapReduce iteration
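The map/reduce loop on this slide can be illustrated with a toy, single-process sketch. It is not H2O code: the "nodes" are plain Python lists, the model is a single weight fitted to y = 2x, and `train_local`, `average_models` and `map_reduce_iteration` are hypothetical names. Only the overall scheme (each node trains its own copy, then the copies are averaged) comes from the slide.

```python
def train_local(weights, local_rows, rate=0.1):
    """Map step stand-in: plain SGD on a 1-D least-squares toy model y = w * x."""
    w = list(weights)  # each node works on its own copy
    for x, y in local_rows:
        error = w[0] * x - y
        w[0] -= rate * error * x
    return w


def average_models(models):
    """Reduce step: element-wise mean of the per-node weight vectors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def map_reduce_iteration(weights, shards):
    """One iteration: every node trains locally, then the copies are averaged."""
    return average_models([train_local(weights, shard) for shard in shards])


# Four hypothetical nodes, each holding a shard of rows generated from y = 2x.
shards = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (1.0, 2.0)],
    [(2.0, 4.0), (3.0, 6.0)],
    [(1.0, 2.0), (1.0, 2.0)],
]

w = [0.0]  # initial model
for _ in range(5):
    w = map_reduce_iteration(w, shards)

print(w)  # close to [2.0]
```

In this sketch the averaging is a simple mean over four copies, matching w* = (w1+w2+w3+w4)/4; a real cluster would combine partial sums pairwise across nodes.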
  17. 17. H2O Deep Learning, @ArnoCandel “Secret” Sauce to Higher Accuracy
 Adaptive learning rate - ADADELTA (Google): Automatically set learning rate for each neuron based on its training history
 Grid Search and Checkpointing: Run a grid search to scan many hyper-parameters, then continue training the most promising model(s)
 Regularization:
 L1: penalizes non-zero weights
 L2: penalizes large weights
 Dropout: randomly ignore certain inputs
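To make the adaptive learning rate idea concrete, here is a minimal sketch of the ADADELTA update rule for a single parameter, following the published method (Zeiler, 2012). It is not taken from H2O's source: the function name, the state dictionary, the values rho=0.95 and eps=1e-6, and the toy objective f(x) = x^2 are all assumptions for illustration.

```python
import math


def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA update for a single parameter.

    state["g2"]  : running average of squared gradients
    state["dx2"] : running average of squared updates
    """
    state["g2"] = rho * state["g2"] + (1 - rho) * grad * grad
    # The step size is the ratio of the two running RMS values: no global rate.
    delta = -math.sqrt(state["dx2"] + eps) / math.sqrt(state["g2"] + eps) * grad
    state["dx2"] = rho * state["dx2"] + (1 - rho) * delta * delta
    return x + delta


# Toy objective f(x) = x^2 with gradient 2x, starting from x = 1.
x = 1.0
state = {"g2": 0.0, "dx2": 0.0}
for _ in range(10):
    x = adadelta_step(x, 2.0 * x, state)

print(x)  # slightly below 1.0: early steps are small and adapt over time
```

Because each parameter keeps its own running averages, the effective step size depends on that parameter's training history, which is the property the slide highlights.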
  18. 18. H2O Deep Learning, @ArnoCandel MNIST: digits classification
 Standing world record: Without distortions or convolutions, the best-ever published error rate on test set: 0.83% (Microsoft)
 Time to check in on the demo! Let’s see how H2O did in the past 10 minutes!
  19. 19. H2O Deep Learning, @ArnoCandel H2O Deep Learning on MNIST: 0.87% test set error (so far)
 test set error: 1.5% after 10 mins, 1.0% after 1.5 hours, 0.87% after 4 hours
 Frequent errors: confuse 2/7 and 4/9
 World-class results! No pre-training, no distortions, no convolutions, no unsupervised training
 Running on 4 nodes with 16 cores each
  20. 20. H2O Deep Learning, @ArnoCandel Use Case: Text Classification
 Goal: Predict the item from seller’s text description
 Train: 578,361 rows, 8,647 cols, 467 classes
 Test: 64,263 rows, 8,647 cols, 143 classes
 Example: “Vintage 18KT gold Rolex 2 Tone in great condition”
 Data: Binary word vector 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 (set bits mark words such as vintage, gold, condition)
 Let’s see how H2O does on the eBay dataset!
  21. 21. H2O Deep Learning, @ArnoCandel Use Case: Text Classification
 Out-Of-The-Box: 11.6% test set error after 10 epochs! Predicts the correct class (out of 143) 88.4% of the time!
 Train: 578,361 rows, 8,647 cols, 467 classes
 Test: 64,263 rows, 8,647 cols, 143 classes
 Note 1: H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)
 Note 2: No tuning was done (results are for illustration only)
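The storage figures in Note 1 can be sanity-checked with simple arithmetic. A small sketch, assuming the "5 billion values" refers to the training set (rows times columns) and treating 18 GB as 18,000 MB in decimal units; both are assumptions, since the slide does not say:

```python
train_rows, cols = 578_361, 8_647

cells = train_rows * cols  # number of values in the training matrix
h2o_mb = 60                # compressed size quoted on the slide
csv_mb = 18 * 1000         # 18 GB as decimal MB (assumption)

bits_per_value = h2o_mb * 1e6 * 8 / cells
compression_ratio = csv_mb / h2o_mb

print(f"{cells:,} values")                       # about 5.0 billion
print(f"{bits_per_value:.3f} bits per value")    # well under one bit
print(f"{compression_ratio:.0f}x smaller than dense CSV")
```

Averaging less than one bit per value is only plausible because the binary word vectors are extremely sparse, which columnar compression can exploit.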
  22. 22. H2O Deep Learning, @ArnoCandel Parallel Scalability (for 64 epochs on MNIST, with “0.87%” parameters)
 [Plots: speedup and training time in minutes vs. number of H2O nodes, 1 to 63 (4 cores per node, 1 epoch per node per MapReduce); annotated training time: 2.7 mins]
  23. 23. H2O Deep Learning, @ArnoCandel Outlook for H2O Deep Learning
 Convolutional and Pooling Layers for General Image Recognition (ImageNet)
 Sparse Auto-Encoders for Dimensionality Reduction and Anomaly Detection
 Execution on GPU clusters for even faster training
  24. 24. H2O Deep Learning, @ArnoCandel H2O Steam: Scoring Platform
  25. 25. H2O Deep Learning, @ArnoCandel H2O Steam: More Coming Soon!
  26. 26. H2O Deep Learning, @ArnoCandel Key Take-Aways
 H2O is a distributed in-memory math platform for enterprise-grade machine learning applications.
 H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data!
 Join our Community and Meetups! git clone @hexadata