H2O’s AI platform provides an open source machine learning framework that works with sparklyr and PySpark. H2O’s Sparkling Water allows users to combine the fast, scalable machine learning algorithms of H2O with the capabilities of Spark. With Sparkling Water, users can drive computation from Scala, R, or Python and use the H2O Flow UI, providing an ideal machine learning platform for application developers. H2O's open AutoML also automates the process of training ML algorithms, tuning their parameters, and building ensemble models. Setting up an environment to perform advanced analytics on top of big data is hard, but with H2O Sparkling Water for HDInsight, customers can get started with just a few clicks. This solution installs Sparkling Water on an HDInsight Spark cluster so you can use the benefits of both Spark and H2O. The solution can access data from Azure Blob storage and/or Azure Data Lake Store in addition to all the standard data sources that H2O supports. It also provides Jupyter Notebooks with built-in examples for an easy jumpstart, and a user-friendly H2O Flow UI to monitor and debug applications.
4. H2O.ai at a Glance
Founded: 2012, Series C in November 2017
Products:
• H2O Open Source Machine Learning (Enterprise Support)
• Driverless AI – Automated Machine Learning
• Sparkling Water
What we do: The Open Leader in AI
Team: ~100 employees
• Distributed Systems Engineers doing Machine Learning
• World-class visualization designers
• 5 of the world’s top Kaggle Grandmasters
Global: Mountain View, London, Prague, India, Japan, New Zealand
5. • H2O.ai was recognized as a technology leader with the most completeness of vision
• H2O.ai was recognized for its mindshare, partner network and status as a quasi-industry standard for machine learning and AI.
• H2O.ai customers gave it the highest overall score among all the vendors for sales relationship and account management, customer support (onboarding, troubleshooting, etc.) and overall service and support.
10. Predictive Maintenance
• Battery Failure
• Resilient networks
Enhanced Offerings
• Personalized program recommendations
• Intelligent Ad placements
• In-Context Promotion
Customer Service
• Avoidable Truck-roll
• Customer Churn Prediction
• Improved customer viewing experience (TV)
IT Infrastructure
• Security Cyberlake
• DoS Detection and Protection
• Master Data Management
11. In-Memory, Distributed Machine Learning Algorithms with H2O Flow GUI
H2O AI Open Source Engine
Integration with Spark
Lightning Fast machine learning on GPUs
Automatic feature engineering, machine learning and interpretability
• 100% open source – Apache V2 licensed
• Built for data scientists – interface using R, Python, or H2O Flow (interactive notebook interface)
• We offer Enterprise Support subscriptions
• Commercial Licensed (closed source)
• Built for domain users, analysts & data scientists – GUI based interface for end-to-end data science
• Fully automated machine learning from ingest to deployment
• We offer user licenses on a per seat basis (annual subscription)
14. HDFS, S3, NFS, Local, SQL
Load Data: Distributed In-Memory, Loss-less Compression
H2O Compute Engine
• Exploratory & Descriptive Analysis
• Feature Engineering & Selection
• Supervised & Unsupervised Modeling
• Model Evaluation & Selection
• Predict
Data & Model Storage
Model Export: Plain Old Java Object
Model Export: Model Object, Optimized
Production Scoring Environment
Your Imagination
15. Supervised Learning
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Ensembles
• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations
Deep Neural Networks
• Deep learning: Creates multi-layer feed-forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
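The Gradient Boosting Machine bullet above describes an ensemble whose approximations are refined step by step. The idea can be sketched in plain Python as follows. This is a minimal illustration, not H2O's implementation: it assumes squared-error loss, one numeric feature, and one-split trees (stumps), and the names `fit_stump`, `fit_gbm`, `predict` and `mse` are hypothetical helpers introduced here.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Find the one-threshold split on x that best fits the residuals (least squares)."""
    best = None
    # Candidate thresholds: every distinct x except the largest, so both sides are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm

def predict(model, x):
    """Prediction = base value + learning_rate * sum of all stump outputs."""
    base, rate, stumps = model
    return base + rate * sum(lm if x <= t else rm for t, lm, rm in stumps)

def fit_gbm(xs, ys, n_trees=50, learning_rate=0.3):
    """Each new stump is fitted to the residuals left by the current ensemble."""
    base = mean(ys)
    stumps = []
    for _ in range(n_trees):
        model = (base, learning_rate, stumps)
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, learning_rate, stumps

def mse(model, xs, ys):
    """Mean squared error of the model on the given data."""
    return mean([(y - predict(model, x)) ** 2 for x, y in zip(xs, ys)])

if __name__ == "__main__":
    xs = list(range(10))
    ys = [x * x for x in xs]
    for n in (0, 1, 10, 50):
        print(n, "trees -> training MSE", mse(fit_gbm(xs, ys, n_trees=n), xs, ys))
```

Running the script prints the training error for ensembles of growing size; with this setup each added stump can only lower the training error, which is the "increasingly refined" behaviour the bullet refers to.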
Unsupervised Learning
Clustering
• K-means: Partitions observations into k clusters/groups of the same spatial size. Can automatically detect the optimal k
Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables into uncorrelated components
• Generalized Low Rank Models: Extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data
Anomaly Detection
• Autoencoders: Find outliers using nonlinear dimensionality reduction with deep learning
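As an illustration of the K-means bullet above, the partitioning step can be sketched in plain Python. This is a minimal sketch of the standard Lloyd iteration, not H2O's distributed implementation: it assumes a fixed k (no automatic detection of k), a simple deterministic initialization, and a hypothetical helper name `kmeans`.

```python
from math import dist

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm: alternate nearest-center assignment and center recomputation."""
    # Assumption: deterministic initialization from evenly spaced input points.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        # New center = coordinate-wise mean of the cluster; keep the old center if it is empty.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
    return centers, labels

if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(points, k=2)
    print(centers)
    print(labels)
```

The function returns the final cluster centers and one cluster label per input point, in input order.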
16. High Level Architecture
• Enable all capabilities of H2O on top of Spark
• H2O code executes directly in the Spark Executor JVM
• Spark RDDs and H2O Frames share the same memory space. Data can freely transfer between Spark and H2O without any overhead.
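The hand-off between Spark and H2O described above might look like the following from PySpark. This is a hedged sketch rather than a definitive recipe: it assumes the `pysparkling` package from a recent Sparkling Water release, where the calls are `H2OContext.getOrCreate()`, `asH2OFrame` and `asSparkFrame` (older releases used differently spelled methods such as `as_h2o_frame`), and `spark_h2o_round_trip` is a hypothetical helper name.

```python
def spark_h2o_round_trip(spark_df):
    """Convert a Spark DataFrame to an H2OFrame and back (hypothetical helper)."""
    # Imported lazily so this module can be loaded where Sparkling Water is not installed.
    from pysparkling import H2OContext

    # Start H2O inside the running Spark application, or attach to it if already started.
    hc = H2OContext.getOrCreate()

    # Spark DataFrame -> H2OFrame, usable by H2O algorithms and visible in H2O Flow.
    h2o_frame = hc.asH2OFrame(spark_df)

    # H2OFrame -> Spark DataFrame, usable by the rest of the Spark pipeline.
    spark_again = hc.asSparkFrame(h2o_frame)
    return h2o_frame, spark_again
```

On a cluster with Sparkling Water installed, the helper would be called with any existing Spark DataFrame, for example one read from Azure Blob storage.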
23. • To get started on H2O from the Azure marketplace, click here
• To learn more: check out our documentation site
• To learn more about Sparkling Water, check out the documentation and booklet.
• Blog: Developing & Operationalizing H2O models on Azure
• Blog post on using H2O on HDInsight
Editor's Notes
6
Gartner Predicts 2017:
According to the report, “by 2019, startups will overtake Amazon, Google, IBM and Microsoft in driving the artificial intelligence economy with disruptive business solutions.”