Michal Malohlava, Alex Tellez, Amy Wang and H2O.ai
Building Machine Learning Applications with Sparkling Water Series
04/22/2015 Meetup
Deep Learning for Public Safety: Fighting Crime with Open City Data
Sparkling Water Download
http://h2o.ai/download/
http://h2o-release.s3.amazonaws.com/sparkling-water/master/95/index.html
Where is the code?
https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/README.md
Scalable Machine Learning for Smarter Applications
Smarter Applications
Scalable Applications
Distributed
Able to process huge amounts of data from different sources
Easy to develop and experiment
Powerful machine learning engine inside
But how do we build them?
Build an application with … ?
…with Spark and H2O
Spark
Open-source distributed execution platform
User-friendly API for data transformation based on RDDs
Platform components: SQL, MLlib, text mining
Multitenancy
Large and active community
H2O
Open-source scalable machine learning platform
Tuned for efficient computation and memory use
Production-ready machine learning algorithms
R, Python, Java, Scala APIs
Interactive UI, robust data parser
Sparkling Water
Provides
Transparent integration of H2O with the Spark ecosystem
Transparent use of H2O data structures and algorithms with the Spark API
Excels in existing Spark workflows requiring advanced machine learning algorithms
Platform for building Smarter Applications
Sparkling Water Design
[Diagram: spark-submit launches the application against the Spark Master JVM, which manages three Spark Worker JVMs. Each worker runs a Spark Executor JVM with H2O inside; together these executors form the Sparkling Water Cluster.]
Sparkling App: a regular Spark application that also contains Sparkling Water classes
Data Distribution
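[Diagram: data from a data source (e.g. HDFS) is loaded into the Sparkling Water Cluster of three Spark Executor JVMs, each hosting H2O. Inside an executor the data is held as an H2O RDD or a Spark RDD; toRDD and toH2OFrame convert between the two.]
RDDs and DataFrames share the same memory space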
Let's build an application!
Deep Learning for Public Safety: Fighting Crime with Open City Data
Predict probability of arrest
CHICAGO
OPEN CRIME DATA
Crime Dataset: Crimes from 2001 - Present Day
~ 4.6 million crimes
THE WINDY CITY
Harvest Chicago Weather data since 2001
SOCIOECONOMIC FACTORS
Crimes segmented into Community Area IDs
Percent of households below poverty, unemployed, etc.
H2O.ai

Machine Intelligence
Crime("02/08/2015 11:43:58 PM",
1811,
"NARCOTICS",
"STREET",
false, 422, 4, 7, 46, 18)
Crime("02/08/2015 11:00:39 PM",
1150,
"DECEPTIVE PRACTICE",
"RESIDENCE",
false, 923, 9, 14, 63, 11)
Arrest?
Predict the arrest probability for crime events
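The two Crime(...) records above are purely positional, so it may help to see one possible shape for them. The following is a minimal sketch, not the application's actual class: every field name is an assumption inferred from the values (timestamp, numeric crime code, crime type, location, a boolean flag, and five numeric area or code identifiers), and the CrimeFeatures helper is hypothetical. It shows how time-derived features such as hour of day or weekend could be computed from the timestamp string.

```scala
import java.time.{DayOfWeek, LocalDateTime}
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical field names: the slides show only positional values.
case class Crime(
  date: String,                 // e.g. "02/08/2015 11:43:58 PM"
  iucr: Int,                    // numeric crime code (assumed meaning)
  primaryType: String,          // e.g. "NARCOTICS"
  locationDescription: String,  // e.g. "STREET"
  domestic: Boolean,            // assumed meaning of the boolean flag
  beat: Int,
  district: Int,
  ward: Int,
  communityArea: Int,
  fbiCode: Int
)

// Hypothetical helper that derives time features from the timestamp.
object CrimeFeatures {
  // Pattern for timestamps such as "02/08/2015 11:43:58 PM".
  private val fmt =
    DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.US)

  def timestamp(c: Crime): LocalDateTime = LocalDateTime.parse(c.date, fmt)

  // Hour in 24-hour form (0 to 23).
  def hourOfDay(c: Crime): Int = timestamp(c).getHour

  // True when the crime happened on a Saturday or Sunday.
  def isWeekend(c: Crime): Boolean = {
    val day = timestamp(c).getDayOfWeek
    day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY
  }
}
```

With this sketch, the first record above can be built exactly as written on the slide and passed to CrimeFeatures.hourOfDay or CrimeFeatures.isWeekend.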
ML Workflow
Inputs: Crimes, Census, Weather
Data munging
Spark SQL join
Split table
Train models: Deep Learning, GBM
Collect model metrics
Evaluate models and score new crimes
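The join and split steps of the workflow can be illustrated without a cluster. The sketch below is a stand-in, not the application's code: it uses plain Scala collections in place of Spark SQL and H2O, and the row types, field names, and the WorkflowSketch object are all hypothetical. It joins crimes to census rows on the community area ID and then splits the joined table into training and validation parts.

```scala
import scala.util.Random

// Hypothetical, heavily simplified row types (not the real table schemas).
case class CrimeRow(communityArea: Int, primaryType: String, arrest: Boolean)
case class CensusRow(communityArea: Int, percentBelowPoverty: Double)
case class JoinedRow(
  communityArea: Int,
  primaryType: String,
  percentBelowPoverty: Double,
  arrest: Boolean
)

object WorkflowSketch {
  // Inner join on community area; stands in for the Spark SQL join.
  // Crimes without a matching census row are dropped.
  def join(crimes: Seq[CrimeRow], census: Seq[CensusRow]): Seq[JoinedRow] = {
    val byArea = census.map(c => c.communityArea -> c).toMap
    for {
      crime <- crimes
      area  <- byArea.get(crime.communityArea).toSeq
    } yield JoinedRow(
      crime.communityArea,
      crime.primaryType,
      area.percentBelowPoverty,
      crime.arrest
    )
  }

  // Seeded random split; stands in for the "Split table" step.
  // Returns (training rows, validation rows).
  def split[A](rows: Seq[A], trainRatio: Double, seed: Long): (Seq[A], Seq[A]) = {
    val shuffled = new Random(seed).shuffle(rows)
    val trainSize = math.round(rows.size * trainRatio).toInt
    shuffled.splitAt(trainSize)
  }
}
```

In the real application the joined table would be handed to the Deep Learning and GBM model builders; here the arrest field simply marks the column a model would predict.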
Application
environment
sparkling-shell
Where is the code?
https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/
More info about app
https://kddnuggets.com/2015/04/deep-learning-fight-crime.html
Complete app code at GitHub
https://github.com/h2oai/sparkling-water/
Check out H2O.ai Training Books
http://learn.h2o.ai/

Check out the H2O.ai Blog
http://h2o.ai/blog/

Check out the H2O.ai YouTube Channel
https://www.youtube.com/user/0xdata

Check out GitHub
https://github.com/h2oai/sparkling-water
Meetups
https://meetup.com/
More info
Learn more at h2o.ai
Follow us at @h2oai
Thank you!
Sparkling Water is an open-source ML application platform combining the power of Spark and H2O

Sparkling Water Meetup: Deep Learning for Public Safety