SlideShare a Scribd company logo
Spark + H2O = Machine Learning at scale
Mateusz Dymczyk
Software Engineer
Machine Learning with Spark Tokyo
30.06.2016
Agenda
• Spark introduction
• H2O introduction
• Spark + H2O = Sparkling Water
• Demos
Spark
What is Spark?
• Fast and general engine for large-scale data processing.
• API in Java, Scala, Python and R
• Batch and streaming APIs
• Based on immutable data structure
*	http://spark.apache.org/
Architecture
*	http://spark.apache.org/docs/latest/cluster-overview.html
Why Spark?
• In-memory computation (fast)
• Ability to cache (intermediate) results in memory (or on
disk)
• Easy API
• Plenty of out-of-the box libraries
*	http://spark.apache.org/docs/latest/mllib-guide.html
MLlib
• Spark’s machine learning library
• Supports:
• basic statistics
• classification and regression
• clustering
• dimensionality reduction
• evaluations
• … *	http://spark.apache.org/docs/latest/mllib-guide.html
Linear regression demo
// imports
//V1,V2,V3,R
//1,1,1,0.1
//1,0,1,0.5
val sc: SparkContext = initContext()
val data = sc.textFile(...)
val parsedData: RDD[LabeledPoint] = data.map { line =>
// parsing
}.cache()
// Building the model
val numIterations = 100
val stepSize = 0.00000001
val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)
// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
val prediction: Double = model.predict(point.features)
(point.label, prediction)
}
val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
*	http://spark.apache.org/docs/latest/mllib-linear-methods.html
Linear regression demo
// imports
//V1,V2,V3,R
//1,1,1,0.1
//1,0,1,0.5
val sc: SparkContext = initContext()
val data = sc.textFile(...)
val parsedData: RDD[LabeledPoint] = data.map { line =>
// parsing
}.cache()
// Building the model
val numIterations = 100
val stepSize = 0.00000001
val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)
// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
val prediction: Double = model.predict(point.features)
(point.label, prediction)
}
val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
*	http://spark.apache.org/docs/latest/mllib-linear-methods.html
Linear regression demo
// imports
//V1,V2,V3,R
//1,1,1,0.1
//1,0,1,0.5
val sc: SparkContext = initContext()
val data = sc.textFile(...)
val parsedData: RDD[LabeledPoint] = data.map { line =>
// parsing
}.cache()
// Building the model
val numIterations = 100
val stepSize = 0.00000001
val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)
// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
val prediction: Double = model.predict(point.features)
(point.label, prediction)
}
val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
*	http://spark.apache.org/docs/latest/mllib-linear-methods.html
Linear regression demo
// imports
//V1,V2,V3,R
//1,1,1,0.1
//1,0,1,0.5
val sc: SparkContext = initContext()
val data = sc.textFile(...)
val parsedData: RDD[LabeledPoint] = data.map { line =>
// parsing
}.cache()
// Building the model
val numIterations = 100
val stepSize = 0.00000001
val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)
// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
val prediction: Double = model.predict(point.features)
(point.label, prediction)
}
val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
*	http://spark.apache.org/docs/latest/mllib-linear-methods.html
But…
• Are the implementations fast enough?
• Are the implementations accurate enough?
• What about other algorithms (i.e. where’s my
DeepLearning!)?
• What about visualisations?
*	http://spark.apache.org/docs/latest/mllib-guide.html
H2O
Math platform
What is H2O?
• Open source
• Set of math and predictive algorithms
• GLM, Random Forest, GBM, Deep Learning etc.
• Written in high performance Java - native Java API
• Drivers for R, Python, Excel, Tableau
• REST API
Math platform
API
What is H2O?
• Open source
• Set of math and predictive algorithms
• GLM, Random Forest, GBM, Deep Learning etc.
• Written in high performance Java - native Java API
• Drivers for R, Python, Excel, Tableau
• REST API
• Highly paralleled and distributed implementation
• Fast in-memory computation on highly compressed data
• Allows you to use all your data without sampling
• Based on mutable data structures
Math platform
API
Big data
focused
What is H2O?
• Open source
• Set of math and predictive algorithms
• GLM, Random Forest, GBM, Deep Learning etc.
FlowUI
• Notebook style open
source interface for H2O
• Allows you to combine
code execution, text,
mathematics, plots, and
rich media in a single
document
Why H2O?
• Speed and accuracy
• Algorithms/functionality not present in MLlib
• Access to FlowUI
• Possibility to generate dependency free (Java) models
• Option to checkpoint models (though not all) and continue
learning in the future
Sparkling Water
What is Sparkling Water?
• Framework integrating Spark and H2O
• Transparent use of H2O data structures and algorithms
with Spark API and vice versa
Common use-cases
Modeling
ETL
Data
Source
Modelling Predictions
Deep learning,
GBM, DRF, GLM,
PCA, Ensembles
etc.
ETL
ETL
Data
Source
Modelling Predictions
Stream Processing
ETL
Data
Source
Modelling
Predictions
Data
Stream
Spark Streaming/
Storm/Flink etc.
Demo #1
Sparkling Shell
REQUIREMENTS
• Windows/Linux/MacOS
• Java 1.7+
• Spark 1.3+
• SPARK_HOME set
INSTALLATION
1. http://www.h2o.ai/download
2. set MASTER env
3. unzip
4. run bin/sparkling-shell
DEV FLOW
1. create a script file containing application code
2. run with bin/sparkling-shell -i script_name.script.scala
OR
1. run bin/sparkling-shell and simply use the REPL
import org.apache.spark.h2o._
// sc - SparkContext already provided by the shell
val h2oContext = new H2OContext(sc).start()
import h2oContext._
// Application logic
Airline delay classification
Model
predicting flight
delays
ETL Modelling Predictions
• load data from CSVs
• use Spark APIs to filter
and join data
Model using
H2O’s GBM
*	https://github.com/h2oai/sparkling-water/tree/master/examples/scripts
Gradient Boosting Machines
• Classification and regression predictive modelling
• Ensemble of multiple weak models (usually decision trees)
• Iteratively solves residuals (gradient boosted)
• Stochastic
Demo #2
FlowUI
Demo #3
Standalone app
REQUIREMENTS
• git
• editor of choice (IntelliJ/eclipse support)
BOOTSTRAP
1. git clone https://github.com/h2oai/h2o-droplets.git
2. cd h2o-droplets/sparkling-water-droplet
3. if using IntelliJ or Eclipse:
– ./gradlew idea
– ./gradlew eclipse
– import project in the IDE
4. develop your app
DEPLOYMENT
1. build ./gradlew build shadowJar
2. submit with:
$SPARK_HOME/bin/spark-submit 
--class water.droplets.SWTokyoDemo 
--master local[*] 
--packages ai.h2o:sparkling-water-core_2.10:1.6.5 
build/libs/sparkling-water-droplet-app.jar
Open Source
• Github:
https://github.com/h2oai/sparkling-water
• JIRA:
http://jira.h2o.ai
• Google groups:
https://groups.google.com/forum/?hl=en#!forum/h2ostream
More Info
• Documentation and booklets:
http://www.h2o.ai/docs/
• H2O.ai blog:
http://h2o.ai/blog
• H2O.ai YouTube channel:
https://www.youtube.com/user/0xdata
@h2oai
http://www.h2o.ai
Thank you!
@mdymczyk
Mateusz Dymczyk
mateusz@h2o.ai
Q&A

More Related Content

What's hot

Scalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2OScalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2O
Sri Ambati
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
Sri Ambati
 
H2O intro at Dallas Meetup
H2O intro at Dallas MeetupH2O intro at Dallas Meetup
H2O intro at Dallas Meetup
Sri Ambati
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Sri Ambati
 
H2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SFH2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SF
Sri Ambati
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
Jo-fai Chow
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
Sri Ambati
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara University
Sri Ambati
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14Sri Ambati
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Sri Ambati
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
Sri Ambati
 
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...
Sri Ambati
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
Sri Ambati
 
ISAX
ISAXISAX
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
Databricks
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
Bryan Yang
 
AI Development with H2O.ai
AI Development with H2O.aiAI Development with H2O.ai
AI Development with H2O.ai
Yalçın Yenigün
 
H2O 3 REST API Overview
H2O 3 REST API OverviewH2O 3 REST API Overview
H2O 3 REST API Overview
Raymond Peck
 
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard PafkaH2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
Sri Ambati
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
Avkash Chauhan
 

What's hot (20)

Scalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2OScalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2O
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
H2O intro at Dallas Meetup
H2O intro at Dallas MeetupH2O intro at Dallas Meetup
H2O intro at Dallas Meetup
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
H2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SFH2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SF
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara University
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
 
ISAX
ISAXISAX
ISAX
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
 
AI Development with H2O.ai
AI Development with H2O.aiAI Development with H2O.ai
AI Development with H2O.ai
 
H2O 3 REST API Overview
H2O 3 REST API OverviewH2O 3 REST API Overview
H2O 3 REST API Overview
 
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard PafkaH2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
H2O World - Benchmarking Open Source ML Platforms - Szilard Pafka
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 

Viewers also liked

Экзамены Cambridge english в СПбГУ
Экзамены Cambridge english в СПбГУЭкзамены Cambridge english в СПбГУ
Экзамены Cambridge english в СПбГУ
Aleksey Konovalenkov
 
H20: A platform for big math
H20: A platform for big math H20: A platform for big math
H20: A platform for big math
DataWorks Summit/Hadoop Summit
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Spark Summit
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark Summit
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Spark Summit
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Spark Summit
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Spark Summit
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Spark Summit
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Spark Summit
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Spark Summit
 
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
Spark Summit
 
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Spark Summit
 

Viewers also liked (12)

Экзамены Cambridge english в СПбГУ
Экзамены Cambridge english в СПбГУЭкзамены Cambridge english в СПбГУ
Экзамены Cambridge english в СПбГУ
 
H20: A platform for big math
H20: A platform for big math H20: A platform for big math
H20: A platform for big math
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
Real-time Platform for Second Look Business Use Case Using Spark and Kafka: S...
 
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
Building a Real-Time Fraud Prevention Engine Using Open Source (Big Data) Sof...
 

Similar to Spark + H20 = Machine Learning at scale

Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
Stepan Pushkarev
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
Vienna Data Science Group
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Yao Yao
 
Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferret
Andrii Gakhov
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)
Databricks
 
Apache Spark & MLlib
Apache Spark & MLlibApache Spark & MLlib
Apache Spark & MLlib
Grigory Sapunov
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Apache spark
Apache sparkApache spark
Apache spark
TEJPAL GAUTAM
 
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupApache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Hyderabad Scalability Meetup
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentation
Ramesh Mudunuri
 
Spark core
Spark coreSpark core
Spark core
Prashant Gupta
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
Ahmet Bulut
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
Ned Shawa
 
R4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning FrameworkR4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning Framework
Alok Singh
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
MapR Technologies
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
Karan Alang
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs Pig
Lester Martin
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internal
David Lauzon
 

Similar to Spark + H20 = Machine Learning at scale (20)

Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferret
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)
 
Apache Spark & MLlib
Apache Spark & MLlibApache Spark & MLlib
Apache Spark & MLlib
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache spark
Apache sparkApache spark
Apache spark
 
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupApache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentation
 
Spark core
Spark coreSpark core
Spark core
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
R4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning FrameworkR4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning Framework
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs Pig
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internal
 

Recently uploaded

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 

Recently uploaded (20)

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 

Spark + H20 = Machine Learning at scale

  • 1. Spark + H2O = Machine Learning at scale Mateusz Dymczyk Software Engineer Machine Learning with Spark Tokyo 30.06.2016
  • 2. Agenda • Spark introduction • H2O introduction • Spark + H2O = Sparkling Water • Demos
  • 4. What is Spark? • Fast and general engine for large-scale data processing. • API in Java, Scala, Python and R • Batch and streaming APIs • Based on immutable data structure * http://spark.apache.org/
  • 6. Why Spark? • In-memory computation (fast) • Ability to cache (intermediate) results in memory (or on disk) • Easy API • Plenty of out-of-the box libraries * http://spark.apache.org/docs/latest/mllib-guide.html
  • 7. MLlib • Spark’s machine learning library • Supports: • basic statistics • classification and regression • clustering • dimensionality reduction • evaluations • … * http://spark.apache.org/docs/latest/mllib-guide.html
  • 8. Linear regression demo // imports //V1,V2,V3,R //1,1,1,0.1 //1,0,1,0.5 val sc: SparkContext = initContext() val data = sc.textFile(...) val parsedData: RDD[LabeledPoint] = data.map { line => // parsing }.cache() // Building the model val numIterations = 100 val stepSize = 0.00000001 val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize) // Evaluate model on training examples and compute training error val valuesAndPreds = parsedData.map { point => val prediction: Double = model.predict(point.features) (point.label, prediction) } val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean() * http://spark.apache.org/docs/latest/mllib-linear-methods.html
  • 9. Linear regression demo // imports //V1,V2,V3,R //1,1,1,0.1 //1,0,1,0.5 val sc: SparkContext = initContext() val data = sc.textFile(...) val parsedData: RDD[LabeledPoint] = data.map { line => // parsing }.cache() // Building the model val numIterations = 100 val stepSize = 0.00000001 val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize) // Evaluate model on training examples and compute training error val valuesAndPreds = parsedData.map { point => val prediction: Double = model.predict(point.features) (point.label, prediction) } val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean() * http://spark.apache.org/docs/latest/mllib-linear-methods.html
  • 10. Linear regression demo // imports //V1,V2,V3,R //1,1,1,0.1 //1,0,1,0.5 val sc: SparkContext = initContext() val data = sc.textFile(...) val parsedData: RDD[LabeledPoint] = data.map { line => // parsing }.cache() // Building the model val numIterations = 100 val stepSize = 0.00000001 val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize) // Evaluate model on training examples and compute training error val valuesAndPreds = parsedData.map { point => val prediction: Double = model.predict(point.features) (point.label, prediction) } val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean() * http://spark.apache.org/docs/latest/mllib-linear-methods.html
  • 11. Linear regression demo // imports //V1,V2,V3,R //1,1,1,0.1 //1,0,1,0.5 val sc: SparkContext = initContext() val data = sc.textFile(...) val parsedData: RDD[LabeledPoint] = data.map { line => // parsing }.cache() // Building the model val numIterations = 100 val stepSize = 0.00000001 val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize) // Evaluate model on training examples and compute training error val valuesAndPreds = parsedData.map { point => val prediction: Double = model.predict(point.features) (point.label, prediction) } val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean() * http://spark.apache.org/docs/latest/mllib-linear-methods.html
  • 12. But… • Are the implementations fast enough? • Are the implementations accurate enough? • What about other algorithms (i.e. where’s my DeepLearning!)? • What about visualisations? * http://spark.apache.org/docs/latest/mllib-guide.html
  • 13. H2O
  • 14. Math platform What is H2O? • Open source • Set of math and predictive algorithms • GLM, Random Forest, GBM, Deep Learning etc.
  • 15. • Written in high performance Java - native Java API • Drivers for R, Python, Excel, Tableau • REST API Math platform API What is H2O? • Open source • Set of math and predictive algorithms • GLM, Random Forest, GBM, Deep Learning etc.
  • 16. • Written in high performance Java - native Java API • Drivers for R, Python, Excel, Tableau • REST API • Highly paralleled and distributed implementation • Fast in-memory computation on highly compressed data • Allows you to use all your data without sampling • Based on mutable data structures Math platform API Big data focused What is H2O? • Open source • Set of math and predictive algorithms • GLM, Random Forest, GBM, Deep Learning etc.
  • 17.
  • 18.
  • 19.
  • 20. FlowUI • Notebook style open source interface for H2O • Allows you to combine code execution, text, mathematics, plots, and rich media in a single document
  • 21. Why H2O? • Speed and accuracy • Algorithms/functionality not present in MLlib • Access to FlowUI • Possibility to generate dependency free (Java) models • Option to checkpoint models (though not all) and continue learning in the future
  • 23. What is Sparkling Water? • Framework integrating Spark and H2O • Transparent use of H2O data structures and algorithms with Spark API and vice versa
  • 24.
  • 25.
  • 26.
  • 32. REQUIREMENTS • Windows/Linux/MacOS • Java 1.7+ • Spark 1.3+ • SPARK_HOME set INSTALLATION 1. http://www.h2o.ai/download 2. set MASTER env 3. unzip 4. run bin/sparkling-shell
  • 33. DEV FLOW 1. create a script file containing application code 2. run with bin/sparkling-shell -i script_name.script.scala OR 1. run bin/sparkling-shell and simply use the REPL import org.apache.spark.h2o._ // sc - SparkContext already provided by the shell val h2oContext = new H2OContext(sc).start() import h2oContext._ // Application logic
  • 34. Airline delay classification Model predicting flight delays ETL Modelling Predictions • load data from CSVs • use Spark APIs to filter and join data Model using H2O’s GBM * https://github.com/h2oai/sparkling-water/tree/master/examples/scripts
  • 35. Gradient Boosting Machines • Classification and regression predictive modelling • Ensemble of multiple weak models (usually decision trees) • Iteratively solves residuals (gradient boosted) • Stochastic
  • 38. REQUIREMENTS • git • editor of choice (IntelliJ/eclipse support)
  • 39. BOOTSTRAP 1. git clone https://github.com/h2oai/h2o-droplets.git 2. cd h2o-droplets/sparkling-water-droplet 3. if using IntelliJ or Eclipse: – ./gradlew idea – ./gradlew eclipse – import project in the IDE 4. develop your app
  • 40. DEPLOYMENT 1. build ./gradlew build shadowJar 2. submit with: $SPARK_HOME/bin/spark-submit --class water.droplets.SWTokyoDemo --master local[*] --packages ai.h2o:sparkling-water-core_2.10:1.6.5 build/libs/sparkling-water-droplet-app.jar
  • 41. Open Source • Github: https://github.com/h2oai/sparkling-water • JIRA: http://jira.h2o.ai • Google groups: https://groups.google.com/forum/?hl=en#!forum/h2ostream
  • 42. More Info • Documentation and booklets: http://www.h2o.ai/docs/ • H2O.ai blog: http://h2o.ai/blog • H2O.ai YouTube channel: https://www.youtube.com/user/0xdata @h2oai http://www.h2o.ai
  • 44. Q&A