An introduction to Sparkling Water
Michal Malohlava
h2o.ai
Who Am I?
Background
• PhD in CS from Charles University in Prague,
Czech Republic
• Postdoc at Purdue University experimenting with
algos for large-scale computation
• Now at H2O.ai


Experience with domain-specific languages,
distributed systems, software engineering,
and big data.
H2O.ai
H2O team
Co-Founders: Sri Ambati, Cliff Click
Scientific Advisory Council: Stephen Boyd, Rob Tibshirani, Trevor Hastie
H2O
Open-Source In-Memory Data Science Platform
• Highly optimized Java code (in-house)
• Distributed in-memory K-V store and map/reduce
computation framework
• Data parser (HDFS, S3, NFS, HTTP, local
drives, etc.)
• Read/write access to distributed data
frames (R/Pandas-style)
• ML algos - Deep Learning, GBM, DRF,
GLM, GLRM, K-Means, PCA, CoxPH,
Ensembles
• REST API with clients: interactive UI, R, Python
Sparkling Water
Provides
• Transparent integration of H2O into the Spark
ecosystem

• Use H2O Frames and algorithms with the Spark API



Excels in existing Spark workflows requiring
advanced Machine Learning algorithms
Where to use Sparkling Water?
Model building
[Diagram: Data Source → Data munging → Modelling → Prediction processing]
Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
Where to use Sparkling Water?
Data load/munging/exploration
[Diagram: Data Source → Data parsing/munging → Modelling]
• Load and parse data directly into H2OFrame
• Ad hoc data transformation
Where to use Sparkling Water?
[Diagram labels: Data Source, Off-line model training, Data munging, Modelling, Deploy the model, Stream processing, Data Stream, Model prediction]
• Export model in a binary format or as code
WHAT IS INSIDE?
[Diagram: cluster architecture]
• Driver node: Scala/Py main program with SparkContext and H2OContext
• Cluster manager
• Worker nodes (three shown): Spark executor with H2O services
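[Diagram: data sharing inside the Spark cluster]
• Spark Cluster: Spark executors with H2O services
• A Data Source can be loaded as a Spark DataFrame or as an H2OFrame
• DataFrame to H2OFrame: h2oContext.asH2OFrame
• H2OFrame to DataFrame: h2oContext.asDataFrame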
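Speaker note: the two conversion calls named on this slide can be wrapped in a small helper. The sketch below is in Python and assumes PySparkling, where the counterparts of the Scala calls are, in recent releases, asH2OFrame and asSparkFrame (older releases used different names, so check your version). The names round_trip and demo are hypothetical and introduced only for illustration.

```python
def round_trip(hc, spark_df):
    """Convert a Spark DataFrame to an H2OFrame and back again.

    `hc` is assumed to be a PySparkling H2OContext, or any object that
    exposes asH2OFrame and asSparkFrame (assumed method names).
    """
    h2o_frame = hc.asH2OFrame(spark_df)   # Spark DataFrame -> H2OFrame
    return hc.asSparkFrame(h2o_frame)     # H2OFrame -> Spark DataFrame


def demo():
    """Hypothetical usage; needs pyspark and pysparkling installed."""
    # Imports are local so the helper above stays importable without them.
    from pyspark.sql import SparkSession
    from pysparkling import H2OContext

    spark = SparkSession.builder.getOrCreate()
    hc = H2OContext.getOrCreate()  # assumption: no-argument form of recent releases
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    round_trip(hc, df).show()
```

demo() is not called automatically; run it only inside a Spark session with Sparkling Water on the classpath.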
TIME FOR DEMO
Key Points to Remember
Sparkling Water integrates H2O into Spark
• Enables using advanced machine learning algorithms
inside Spark workflows

• Offers an eager computation model and
a mutable data structure, H2OFrame
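Speaker note: "eager" and "mutable" are easiest to see next to Spark's lazy transformations on immutable datasets. The following is a toy, pure-Python sketch of the two models. It uses neither the Spark nor the H2O API; the classes LazyColumn and EagerColumn are hypothetical stand-ins written only to illustrate the contrast.

```python
class LazyColumn:
    """Toy stand-in for a lazy, immutable dataset (Spark-style)."""

    def __init__(self, values, ops=()):
        self._values = tuple(values)
        self._ops = tuple(ops)

    def map(self, fn):
        # Records the operation and returns a NEW object; nothing runs yet.
        return LazyColumn(self._values, self._ops + (fn,))

    def collect(self):
        # Work happens only when a result is requested.
        out = list(self._values)
        for fn in self._ops:
            out = [fn(v) for v in out]
        return out


class EagerColumn:
    """Toy stand-in for an eager, mutable dataset (H2OFrame-style)."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, fn):
        # Runs immediately and overwrites the stored data in place.
        self.values = [fn(v) for v in self.values]
        return self


lazy = LazyColumn([1, 2, 3])
doubled = lazy.map(lambda v: v * 2)   # original `lazy` is unchanged
eager = EagerColumn([1, 2, 3])
eager.apply(lambda v: v * 2)          # `eager` itself now holds the result
```

In the lazy model the original object never changes and work is deferred until collect(); in the eager model each call does its work at once and updates the same object.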
THANK YOU.
@h2oai @mmalohlava
h2o.ai/download

github.com/h2oai/sparkling-water



Visit our booth for live demos and more!
