FEBRUARY 9, 2017, WARSAW
H2O Deep Water
Making Deep Learning Accessible to Everyone
Jo-fai (Joe) Chow – Data Scientist at H2O.ai
FEBRUARY 9, 2017, WARSAW
About Me
• Civil (Water) Engineer
• 2010 – 2015
• Consultant (UK)
• Utilities
• Asset Management
• Constrained Optimization
• Industrial PhD (UK)
• Infrastructure Design Optimization
• Machine Learning +
Water Engineering
• Discovered H2O in 2014
• Data Scientist
• From 2015
• Virgin Media (UK)
• Domino Data Lab (Silicon Valley)
• H2O.ai (Silicon Valley)
FEBRUARY 9, 2017, WARSAW
Agenda
• About H2O.ai
• Company
• Machine Learning Platform
• Deep Learning Tools
• TensorFlow, MXNet, Caffe, H2O
• Deep Water
• Motivation
• Benefits
• Interface
• Learning Resources
• Conclusions
FEBRUARY 9, 2017, WARSAW
About H2O.ai
FEBRUARY 9, 2017, WARSAW
Company Overview
Founded 2011 Venture-backed, debuted in 2012
Products • H2O Open Source In-Memory AI Prediction Engine
• Sparkling Water
• Steam
Mission Operationalize Data Science, and provide a platform for users to build beautiful data products
Team 70 employees
• Distributed Systems Engineers doing Machine Learning
• World-class visualization designers
Headquarters Mountain View, CA
FEBRUARY 9, 2017, WARSAW
Scientific Advisory Council
FEBRUARY 9, 2017, WARSAW
https://www.cbinsights.com/blog/artificial-intelligence-top-startups/
FEBRUARY 9, 2017, WARSAW
H2O Machine Learning Platform
FEBRUARY 9, 2017, WARSAW
Supervised Learning
• Generalized Linear Models: Binomial,
Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Statistical
Analysis
Ensembles
• Distributed Random Forest: Classification
or regression models
• Gradient Boosting Machine: Produces an
ensemble of decision trees with increasing
refined approximations
Deep Neural
Networks
• Deep learning: Create multi-layer feed
forward neural networks starting with an
input layer followed by multiple layers of
nonlinear transformations
Algorithms Overview
Unsupervised Learning
• K-means: Partitions observations into k
clusters/groups of the same spatial size.
Automatically detect optimal k
Clustering
Dimensionality
Reduction
• Principal Component Analysis: Linearly transforms
correlated variables to independent components
• Generalized Low Rank Models: extend the idea of
PCA to handle arbitrary data consisting of numerical,
Boolean, categorical, and missing data
Anomaly
Detection
• Autoencoders: Find outliers using a
nonlinear dimensionality reduction using
deep learning
FEBRUARY 9, 2017, WARSAW
HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
Flow (Web), R, Python API
Java for computation
FEBRUARY 9, 2017, WARSAW
11
H2O Flow (Web) Interface
FEBRUARY 9, 2017, WARSAW
12
H2O Flow (Web) Interface
FEBRUARY 9, 2017, WARSAW
H2O + R
FEBRUARY 9, 2017, WARSAW
Business Value of ML Platform
FEBRUARY 9, 2017, WARSAW
Top 10 Hot A.I. Technologies (Forbes 2017)
http://www.forbes.com/sites/gilpress/2017/01/23/top-10-hot-artificial-intelligence-ai-technologies/
FEBRUARY 9, 2017, WARSAW
Deep Learning Tools
TensorFlow, mxnet, Caffe and H2O Deep Learning
FEBRUARY 9, 2017, WARSAW
TensorFlow
• Open source machine learning
framework by Google
• Python / C++ API
• TensorBoard
• Data Flow Graph Visualization
• Multi CPU / GPU
• v0.8+ distributed machines support
• Multi devices support
• desktop, server and Android devices
• Image, audio and NLP applications
• HUGE Community
• Support for Spark, Windows …
https://github.com/tensorflow/tensorflow
FEBRUARY 9, 2017, WARSAW
https://github.com/dmlc/mxnet
https://www.zeolearn.com/magazine/amazon-to-use-mxnet-as-deep-learning-framework
FEBRUARY 9, 2017, WARSAW
Caffe
• Convolution Architecture For
Feature Extraction (CAFFE)
• Pure C++ / CUDA architecture for
deep learning
• Command line, Python and
MATLAB interface
• Model Zoo
• Open collection of models
https://docs.google.com/presentation/d/1UeKXVgRvvxg9OUdh_UiC5G71UMscNPlvArsWER41PsU/
FEBRUARY 9, 2017, WARSAW
Supervised Learning
• Generalized Linear Models: Binomial,
Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Statistical
Analysis
Ensembles
• Distributed Random Forest: Classification
or regression models
• Gradient Boosting Machine: Produces an
ensemble of decision trees with increasing
refined approximations
Deep Neural
Networks
• Deep learning: Create multi-layer feed
forward neural networks starting with an
input layer followed by multiple layers of
nonlinear transformations
H2O Deep Learning
Unsupervised Learning
• K-means: Partitions observations into k
clusters/groups of the same spatial size.
Automatically detect optimal k
Clustering
Dimensionality
Reduction
• Principal Component Analysis: Linearly transforms
correlated variables to independent components
• Generalized Low Rank Models: extend the idea of
PCA to handle arbitrary data consisting of numerical,
Boolean, categorical, and missing data
Anomaly
Detection
• Autoencoders: Find outliers using a
nonlinear dimensionality reduction using
deep learning
FEBRUARY 9, 2017, WARSAW
Both TensorFlow and H2O are widely used
FEBRUARY 9, 2017, WARSAW
22
FEBRUARY 9, 2017, WARSAW
TensorFlow , MXNet, Caffe and H2O
democratize the power of deep learning.
H2O platform democratizes artificial
intelligence & big data science.
There are other open source deep learning libraries like Theano and Torch too.
Let’s have a party, this will be fun!
FEBRUARY 9, 2017, WARSAW
Deep Water
Next-Gen Distributed Deep Learning with H2O
H2O integrates with existing GPU backends
for significant performance gains
One Interface - GPU Enabled - Significant Performance Gains
Inherits All H2O Properties in Scalability, Ease of Use and Deployment
Recurrent Neural Networks
enabling natural language processing,
sequences, time series, and more
Convolutional Neural Networks enabling
Image, video, speech recognition
Hybrid Neural Network Architectures
enabling speech to text translation, image
captioning, scene parsing and more
Deep Water
FEBRUARY 9, 2017, WARSAW
Deep Water Architecture
Node 1 Node N
Scala
Spark
H2O
Java
Execution Engine
TensorFlow/mxnet/Caffe
C++
GPU CPU
TensorFlow/mxnet/Caffe
C++
GPU CPU
RPC
R/Py/Flow/Scala client
REST API
Web server
H2O
Java
Execution Engine
grpc/MPI/RDMA
Scala
Spark
FEBRUARY 9, 2017, WARSAW
26
Using H2O Flow to train Deep
Water Model
FEBRUARY 9, 2017, WARSAW
Available Networks in Deep Water
• LeNet
• AlexNet
• VGGNet
• Inception (GoogLeNet)
• ResNet (Deep Residual
Learning)
• Build Your Own
ResNet
FEBRUARY 9, 2017, WARSAW
28
Choosing different network structures
FEBRUARY 9, 2017, WARSAW
Unified Interface for TF, MXNet and Caffe
Change backend to
“mxnet”, “caffe” or “auto”
FEBRUARY 9, 2017, WARSAW
Easy Stacking with other H2O Models
Ensemble of Deep Water, Gradient Boosting
Machine & Random Forest models
FEBRUARY 9, 2017, WARSAW
docs.h2o.ai
FEBRUARY 9, 2017, WARSAW
https://github.com/h2oai/h2o-3/tree/master/examples/deeplearning/notebooks
FEBRUARY 9, 2017, WARSAW
Conclusions
FEBRUARY 9, 2017, WARSAW
Project “Deep Water”
• H2O + TF + MXNet + Caffe
• a powerful combination of widely
used open source machine
learning libraries.
• All Goodies from H2O
• inherits all H2O properties in
scalability, ease of use and
deployment.
• Unified Interface
• allows users to build, stack and
deploy deep learning models from
different libraries efficiently.
• 100% Open Source
• the party will get bigger!
FEBRUARY 9, 2017, WARSAW
Deep Water – Current Contributors
FEBRUARY 9, 2017, WARSAW
• Organizers & Sponsors
• Adam Kawa
• Karolina Seliga
• Code, Slides & Documents
• bit.ly/h2o_meetups
• bit.ly/h2o_deepwater
• docs.h2o.ai
• Contact
• joe@h2o.ai
• @matlabulous
• github.com/woobe
Thanks!
Making Machine Learning
Accessible to Everyone
Photo credit: Virgin Media
FEBRUARY 9, 2017, WARSAW
https://techcrunch.com/2017/01/26/h2os-deep-water-puts-deep-learning-in-the-hands-of-enterprise-users/

H2O Deep Water - Making Deep Learning Accessible to Everyone

  • 1.
    FEBRUARY 9, 2017,WARSAW H2O Deep Water Making Deep Learning Accessible to Everyone Jo-fai (Joe) Chow – Data Scientist at H2O.ai
  • 2.
    FEBRUARY 9, 2017,WARSAW About Me • Civil (Water) Engineer • 2010 – 2015 • Consultant (UK) • Utilities • Asset Management • Constrained Optimization • Industrial PhD (UK) • Infrastructure Design Optimization • Machine Learning + Water Engineering • Discovered H2O in 2014 • Data Scientist • From 2015 • Virgin Media (UK) • Domino Data Lab (Silicon Valley) • H2O.ai (Silicon Valley)
  • 3.
    FEBRUARY 9, 2017,WARSAW Agenda • About H2O.ai • Company • Machine Learning Platform • Deep Learning Tools • TensorFlow, MXNet, Caffe, H2O • Deep Water • Motivation • Benefits • Interface • Learning Resources • Conclusions
  • 4.
    FEBRUARY 9, 2017,WARSAW About H2O.ai
  • 5.
    FEBRUARY 9, 2017,WARSAW Company Overview Founded 2011 Venture-backed, debuted in 2012 Products • H2O Open Source In-Memory AI Prediction Engine • Sparkling Water • Steam Mission Operationalize Data Science, and provide a platform for users to build beautiful data products Team 70 employees • Distributed Systems Engineers doing Machine Learning • World-class visualization designers Headquarters Mountain View, CA
  • 6.
    FEBRUARY 9, 2017,WARSAW Scientific Advisory Council
  • 7.
    FEBRUARY 9, 2017,WARSAW https://www.cbinsights.com/blog/artificial-intelligence-top-startups/
  • 8.
    FEBRUARY 9, 2017,WARSAW H2O Machine Learning Platform
  • 9.
    FEBRUARY 9, 2017,WARSAW Supervised Learning • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie • Naïve Bayes Statistical Analysis Ensembles • Distributed Random Forest: Classification or regression models • Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations Deep Neural Networks • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations Algorithms Overview Unsupervised Learning • K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k Clustering Dimensionality Reduction • Principal Component Analysis: Linearly transforms correlated variables to independent components • Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data Anomaly Detection • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning
  • 10.
    FEBRUARY 9, 2017,WARSAW HDFS S3 NFS Distributed In-Memory Load Data Loss-less Compression H2O Compute Engine Production Scoring Environment Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised Modeling Model Evaluation & Selection Predict Data & Model Storage Model Export: Plain Old Java Object Your Imagination Data Prep Export: Plain Old Java Object Local SQL High Level Architecture Flow (Web), R, Python API Java for computation
  • 11.
    FEBRUARY 9, 2017,WARSAW 11 H2O Flow (Web) Interface
  • 12.
    FEBRUARY 9, 2017,WARSAW 12 H2O Flow (Web) Interface
  • 13.
    FEBRUARY 9, 2017,WARSAW H2O + R
  • 14.
    FEBRUARY 9, 2017,WARSAW Business Value of ML Platform
  • 15.
    FEBRUARY 9, 2017,WARSAW Top 10 Hot A.I. Technologies (Forbes 2017) http://www.forbes.com/sites/gilpress/2017/01/23/top-10-hot-artificial-intelligence-ai-technologies/
  • 16.
    FEBRUARY 9, 2017,WARSAW Deep Learning Tools TensorFlow, mxnet, Caffe and H2O Deep Learning
  • 17.
    FEBRUARY 9, 2017,WARSAW TensorFlow • Open source machine learning framework by Google • Python / C++ API • TensorBoard • Data Flow Graph Visualization • Multi CPU / GPU • v0.8+ distributed machines support • Multi devices support • desktop, server and Android devices • Image, audio and NLP applications • HUGE Community • Support for Spark, Windows … https://github.com/tensorflow/tensorflow
  • 18.
    FEBRUARY 9, 2017,WARSAW https://github.com/dmlc/mxnet https://www.zeolearn.com/magazine/amazon-to-use-mxnet-as-deep-learning-framework
  • 19.
    FEBRUARY 9, 2017,WARSAW Caffe • Convolution Architecture For Feature Extraction (CAFFE) • Pure C++ / CUDA architecture for deep learning • Command line, Python and MATLAB interface • Model Zoo • Open collection of models https://docs.google.com/presentation/d/1UeKXVgRvvxg9OUdh_UiC5G71UMscNPlvArsWER41PsU/
  • 20.
    FEBRUARY 9, 2017,WARSAW Supervised Learning • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie • Naïve Bayes Statistical Analysis Ensembles • Distributed Random Forest: Classification or regression models • Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations Deep Neural Networks • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations H2O Deep Learning Unsupervised Learning • K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k Clustering Dimensionality Reduction • Principal Component Analysis: Linearly transforms correlated variables to independent components • Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data Anomaly Detection • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning
  • 21.
    FEBRUARY 9, 2017,WARSAW Both TensorFlow and H2O are widely used
  • 22.
  • 23.
    FEBRUARY 9, 2017,WARSAW TensorFlow , MXNet, Caffe and H2O democratize the power of deep learning. H2O platform democratizes artificial intelligence & big data science. There are other open source deep learning libraries like Theano and Torch too. Let’s have a party, this will be fun!
  • 24.
    FEBRUARY 9, 2017,WARSAW Deep Water Next-Gen Distributed Deep Learning with H2O H2O integrates with existing GPU backends for significant performance gains One Interface - GPU Enabled - Significant Performance Gains Inherits All H2O Properties in Scalability, Ease of Use and Deployment Recurrent Neural Networks enabling natural language processing, sequences, time series, and more Convolutional Neural Networks enabling Image, video, speech recognition Hybrid Neural Network Architectures enabling speech to text translation, image captioning, scene parsing and more Deep Water
  • 25.
    FEBRUARY 9, 2017,WARSAW Deep Water Architecture Node 1 Node N Scala Spark H2O Java Execution Engine TensorFlow/mxnet/Caffe C++ GPU CPU TensorFlow/mxnet/Caffe C++ GPU CPU RPC R/Py/Flow/Scala client REST API Web server H2O Java Execution Engine grpc/MPI/RDMA Scala Spark
  • 26.
    FEBRUARY 9, 2017,WARSAW 26 Using H2O Flow to train Deep Water Model
  • 27.
    FEBRUARY 9, 2017,WARSAW Available Networks in Deep Water • LeNet • AlexNet • VGGNet • Inception (GoogLeNet) • ResNet (Deep Residual Learning) • Build Your Own ResNet
  • 28.
    FEBRUARY 9, 2017,WARSAW 28 Choosing different network structures
  • 29.
    FEBRUARY 9, 2017,WARSAW Unified Interface for TF, MXNet and Caffe Change backend to “mxnet”, “caffe” or “auto”
  • 30.
    FEBRUARY 9, 2017,WARSAW Easy Stacking with other H2O Models Ensemble of Deep Water, Gradient Boosting Machine & Random Forest models
  • 31.
    FEBRUARY 9, 2017,WARSAW docs.h2o.ai
  • 32.
    FEBRUARY 9, 2017,WARSAW https://github.com/h2oai/h2o-3/tree/master/examples/deeplearning/notebooks
  • 33.
    FEBRUARY 9, 2017,WARSAW Conclusions
  • 34.
    FEBRUARY 9, 2017,WARSAW Project “Deep Water” • H2O + TF + MXNet + Caffe • a powerful combination of widely used open source machine learning libraries. • All Goodies from H2O • inherits all H2O properties in scalability, ease of use and deployment. • Unified Interface • allows users to build, stack and deploy deep learning models from different libraries efficiently. • 100% Open Source • the party will get bigger!
  • 35.
    FEBRUARY 9, 2017,WARSAW Deep Water – Current Contributors
  • 36.
    FEBRUARY 9, 2017,WARSAW • Organizers & Sponsors • Adam Kawa • Karolina Seliga • Code, Slides & Documents • bit.ly/h2o_meetups • bit.ly/h2o_deepwater • docs.h2o.ai • Contact • joe@h2o.ai • @matlabulous • github.com/woobe Thanks! Making Machine Learning Accessible to Everyone Photo credit: Virgin Media
  • 37.
    FEBRUARY 9, 2017,WARSAW https://techcrunch.com/2017/01/26/h2os-deep-water-puts-deep-learning-in-the-hands-of-enterprise-users/