This is my Deep Water talk for the TensorFlow Paris meetup.
Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability. ease of use and deployment.
5. About TensorFlow
• Open source machine learning
framework by Google
• Python / C++ API
• TensorBoard
• Data Flow Graph Visualization
• Multi CPU / GPU
• v0.8+ distributed machines support
• Multi devices support
• desktop, server and Android devices
• Image, audio and NLP applications
• HUGE Community
• Support for Spark, Windows …
5
6. TensorFlow Wrappers
• TFLearn – Simplified interface
• keras – TensorFlow + Theano
• tensorflow.rb – Ruby wrapper
• TensorFlow.jl – Julia wrapper
• TensorFlow for R – R wrapper
• … and many more!
• See: github.com/jtoy/awesome-
tensorflow
6
7. TensorFlow Use Cases
Very Cool TensorFlow Use Cases
Some of the cool things you can do with TensorFlow
7
9. Neural Style Transfer in TensorFlow
• Neural Style
• “… a technique to train a deep
neural network to separate artistic
style from image structure, and
combine the style of one image
with the structure of another”
• Original Paper
• “A Neural Algorithm of Artistic
Style”
• TensorFlow Implementation
• [Link]
9
10. Sorting Cucumbers
• Problem
• Sorting cucumbers is a laborious
process.
• In a Japanese farm, the farmer’s wife
can spend up to eight hours a day
sorting cucumbers during peak
harvesting period.
• Solution
• Farmer’s son (Makoto Koike) used
TensorFlow, Arduino and Raspberry Pi
to create an automatic cucumber
sorting system.
10
11. Sorting Cucumbers
• Classification Problem
• Input: cucumber photos (side, top,
bottom)
• Output: one of nine classes
• Google’s Blog Post [Link]
• YouTube Video [Link]
11
18. Company Overview
Founded 2011 Venture-backed, debuted in 2012
Products • H2O Open Source In-Memory AI Prediction Engine
• Sparkling Water
• Steam
Mission Operationalize Data Science, and provide a platform for users to build beautiful data products
Team 70 employees
• Distributed Systems Engineers doing Machine Learning
• World-class visualization designers
Headquarters Mountain View, CA
18
19. Bring AI To Business
Empower Transformation
Financial Services, Insurance and
Healthcare as Our Vertical Focus
Community as Our Foundation
H2O is an open source platform
empowering business transformation
20. Users In Various Verticals Adore H2O
Financial Insurance MarketingTelecom Healthcare
20
22. Cloud Integration
Big Data Ecosystem
H2O.ai Makes A Difference as an AI Platform
Open Source Flexible Interface
Scalability and Performance GPU EnablementRapid Model Deployment
Smart and Fast Algorithms
H2O Flow
• 100% open source
• Highly portable models
deployed in Java (POJO) and
Model Object Optimized
(MOJO)
• Automated and streamlined
scoring service deployment
with Rest API
• Distributed In-Memory
Computing Platform
• Distributed Algorithms
• Fine-Grain MapReduce
23. 0
10000
20000
30000
40000
50000
60000
70000
1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-16
# H2O Users
H2O Community Growth
Tremendous Momentum Globally
65,000+ users globally
(Sept 2016)
• 65,000+ users from
~8,000 companies in 140
countries. Top 5 from:
Large User Circle
* DATA FROM GOOGLE ANALYTICS EMBEDDED IN THE END USER PRODUCT
23
0
2000
4000
6000
8000
10000
1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-16
# Companies Using H2O ~8,000+ companies
(Sept 2016)
+127%
+60%
28. Our Open Source Products
100% Open Source. Big Data Science for Everyone!
28
29. H2O.ai Offers AI Open Source Platform
Product Suite to Operationalize Data Science with Visual Intelligence
In-Memory, Distributed
Machine Learning
Algorithms with Speed and
Accuracy
H2O Integration with Spark.
Best Machine Learning on
Spark.
100% Open Source
29
Visual Intelligence and UX Framework For Data Interpretation and
Story Telling on top of Beautiful Data Products
Operationalize and
Streamline Model Building,
Training and Deployment
Automatically and Elastically
State-of-the-art
Deep Learning on GPUs with
TensorFlow, MXNet or Caffe
with the ease of use of H2O
Deep
Water
30. H2O.ai Offers AI Open Source Platform
Product Suite to Operationalize Data Science with Visual Intelligence
In-Memory, Distributed
Machine Learning
Algorithms with Speed and
Accuracy
H2O Integration with Spark.
Best Machine Learning on
Spark.
100% Open Source
30
Visual Intelligence and UX Framework For Data Interpretation and
Story Telling on top of Beautiful Data Products
Operationalize and
Streamline Model Building,
Training and Deployment
Automatically and Elastically
State-of-the-art
Deep Learning on GPUs with
TensorFlow, MXNet or Caffe
with the ease of use of H2O
Deep
Water
31. HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
31
32. Supervised Learning
• Generalized Linear Models: Binomial,
Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Statistical
Analysis
Ensembles
• Distributed Random Forest: Classification
or regression models
• Gradient Boosting Machine: Produces an
ensemble of decision trees with increasing
refined approximations
Deep Neural
Networks
• Deep learning: Create multi-layer feed
forward neural networks starting with an
input layer followed by multiple layers of
nonlinear transformations
Algorithms Overview
Unsupervised Learning
• K-means: Partitions observations into k
clusters/groups of the same spatial size.
Automatically detect optimal k
Clustering
Dimensionality
Reduction
• Principal Component Analysis: Linearly transforms
correlated variables to independent components
• Generalized Low Rank Models: extend the idea of
PCA to handle arbitrary data consisting of numerical,
Boolean, categorical, and missing data
Anomaly
Detection
• Autoencoders: Find outliers using a
nonlinear dimensionality reduction using
deep learning
32
33. Distributed Algorithms
• Foundation for In-Memory Distributed Algorithm
Calculation - Distributed Data Frames and columnar
compression
• All algorithms are distributed in H2O: GBM, GLM, DRF, Deep
Learning and more. Fine-grained map-reduce iterations.
• Only enterprise-grade, open-source distributed algorithms
in the market
User Benefits
Advantageous Foundation
• “Out-of-box” functionalities for all algorithms (NO MORE
SCRIPTING) and uniform interface across all languages: R,
Python, Java
• Designed for all sizes of data sets, especially large data
• Highly optimized Java code for model exports
• In-house expertise for all algorithms
Parallel Parse into Distributed Rows
Fine Grain Map Reduce Illustration: Scalable
Distributed Histogram Calculation for GBM
FoundationforDistributedAlgorithms
33
41. H2O.ai Offers AI Open Source Platform
Product Suite to Operationalize Data Science with Visual Intelligence
In-Memory, Distributed
Machine Learning
Algorithms with Speed and
Accuracy
H2O Integration with Spark.
Best Machine Learning on
Spark.
100% Open Source
41
Visual Intelligence and UX Framework For Data Interpretation and
Story Telling on top of Beautiful Data Products
Operationalize and
Streamline Model Building,
Training and Deployment
Automatically and Elastically
State-of-the-art
Deep Learning on GPUs with
TensorFlow, MXNet or Caffe
with the ease of use of H2O
Deep
Water
43. TensorFlow democratizes the power of
deep learning.
H2O democratizes artificial intelligence &
big data science.
There are other open source libraries like MXNet and Caffe too.
Let’s have a party, this will be fun!
43
44. Deep Water
Next-Gen Distributed Deep Learning with H2O
H2O integrates with existing GPU backends
for significant performance gains
One Interface - GPU Enabled - Significant Performance Gains
Inherits All H2O Properties in Scalability, Ease of Use and Deployment
Recurrent Neural Networks
enabling natural language processing,
sequences, time series, and more
Convolutional Neural Networks enabling
Image, video, speech recognition
Hybrid Neural Network Architectures
enabling speech to text translation, image
captioning, scene parsing and more
Deep Water
44
45. Deep Water Architecture
Node 1 Node N
Scala
Spark
H2O
Java
Execution Engine
TensorFlow/mxnet/Caffe
C++
GPU CPU
TensorFlow/mxnet/Caffe
C++
GPU CPU
RPC
R/Py/Flow/Scala client
REST API
Web server
H2O
Java
Execution Engine
grpc/MPI/RDMA
Scala
Spark
45
52. Available Networks in Deep Water
• LeNet (This Demo)
• AlexNet
• VGGNet
• Inception (GoogLeNet)
• ResNet (Deep Residual
Learning)
• Build Your Own
52
ResNet
64. Want to try Deep Water?
64
• Build it
• github.com/h2oai/deepwater
• Ubuntu 16.04
• CUDA 8
• cuDNN 5
• …
• Pre-built Amazon Machine
Images (AMIs)
• Info to be confirmed
65. H2O.ai Offers AI Open Source Platform
Product Suite to Operationalize Data Science with Visual Intelligence
In-Memory, Distributed
Machine Learning
Algorithms with Speed and
Accuracy
H2O Integration with Spark.
Best Machine Learning on
Spark.
100% Open Source
65
Visual Intelligence and UX Framework For Data Interpretation and
Story Telling on top of Beautiful Data Products
Operationalize and
Streamline Model Building,
Training and Deployment
Automatically and Elastically
State-of-the-art
Deep Learning on GPUs with
TensorFlow, MXNet or Caffe
with the ease of use of H2O
Deep
Water
66. Want to find out more about Sparkling Water
and Steam?
• R Addicts Paris
• Tomorrow 6:30pm
• Three H2O Talks:
• Introduction to H2O
• Demos: H2O + R + Steam
• Auto Machine Learning using H2O
• Sparkling Water 2.0
66
68. Project “Deep Water”
• H2O + TensorFlow
• a powerful combination of two
widely used open source machine
learning libraries.
• All Goodies from H2O
• inherits all H2O properties in
scalability, ease of use and
deployment.
• Unified Interface
• allows users to build, stack and
deploy deep learning models from
different libraries efficiently.
68
• 100% Open Source
• the party will get bigger!
Open Source: 100% open source. Deep learning is in public domain and users should have full access to it in open source.
Seamless integration with Big Data frameworks, such as Apache Spark. We have Sparkling Water. Also, H2O co-located in the company’s Hadoop cluster, so analysts can discover insights in the data without extracting it or taking samples
Flexible interface: R, Python and Scala, Java
Smart Algorithms: Deep Learning differs in the depth of features they support. Our support is the capabilities listed below save time and manual programming for the data scientist, which leads to be better models.
• Automatic standardization on of the predictors• Automatic initialization of model parameters (weights & biases)• Automatic adaptive learning rates• Ability to specify manual learning rate with annealing and momentum • Automatic handling of categorical and missing data• Regularization on techniques to manage complexity• Automatic parameter optimization on with grid search• Early stopping based on validation on datasets• Automatic cross-validation
Distributed Computing Platform
Highly scalable
Superior performance
Rapid Model Deployment: POJO is very portable, and makes model deployment very fast. Steam has a streamlined scoring server deployment mechanism, which makes model deployment so much faster.
GPU Enablement: Deep Learning algorithms can be so much faster on GPU’s than on CPU’s. This enablement will give H2O implementations a boost in performance over other implementations.
Cloud integration: Amazon Web Services, Microsoft Azure, Google Cloud