OCT 29th, 2014 WEBINAR 
H2O 
Fast. Scalable. Machine Learning 
For Smarter Applications 
“Fluids are In, Animals are Out.” 
~ Svetlana Sicular, Gartner
Speakers 
Joel Horwitz 
Joel is a caffeine-, data-, and laughter-driven product strategist. He is an 
active community member: he founded Bay Area Analytics, tweets 
regularly as @JSHorwitz, blogs regularly at joelshorwitz.com, and speaks 
regularly at industry events. Always eager to learn and lend a helping hand, 
he is an invaluable asset to 0xdata. 
Michal Malohlava 
Michal is a geek, developer, and Java, Linux, and programming languages 
enthusiast who has been developing software for over 10 years. He obtained 
his PhD from Charles University in Prague in 2012 and did a post-doc at 
Purdue University. 
H2O World Register at http://www.0xdata.com/h2o-world
Time is the only non-renewable resource. 
Speed Matters!
Today 
• Why Are We Here 
• Who We Are 
• How Do We Do It 
• Who We Work With 
• What We Believe 
• Demo and Q&A
A New Interpretation of Moore’s Law 
“Like the physical universe, the digital universe is large - by 2020 containing nearly as 
many digital bits as there are stars in the universe. It is doubling in size every two 
years, and by 2020 the digital universe - the data we create and copy annually - will 
reach 44 zettabytes, or 44 trillion gigabytes.” 
- IDC 2014
An Evolving Market to Meet the Demand 
[Diagram: RDBMS, MPP, Business Intelligence, Data Science, Distributed File System, Machine Learning, H2O]
Decreasing Cost of Data is Driving Demand 
[Chart: H2O on a timeline with ticks at 1970, 1980, 1990, 2000, 2010, 2020]
H2O is the First Dedicated 
Machine Learning Open Source Platform 
H2O is for application developers and analysts who need 
scalable and fast machine learning. H2O is an open source 
predictive analytics platform. Unlike traditional analytics tools, 
H2O provides a combination of extraordinary math, a high 
performance parallel architecture, and unrivaled ease of use.
Who Are We? 
H2O
H2O Awards and Accolades 
• Top R Project of useR! Conference 2014 
• Fortune Big Data All-Stars 2014, Arno Candel 
• 100+ Meetups 
• 6000+ Users
H2O is Built for Speed and Scale 
• Open Source 
• REST API 
• Native R Support 
• NanoFast™ Scoring Engine 
• Sophisticated Algorithms
H2O Seamlessly Integrates with Your Workflow 
• 20X Faster Imports and 3X 
Compression w/ .hex format. 
• 4 Billion Row Regression in 
Seconds. 
• Deploy in POJO or with our 
REST API
Code is incomplete without Community! 
Open Source Drives Innovation.
Law of Large Numbers Triumphs!
Every Generation Needs to Invent Its Own Math. 
ML is the new SQL!
What do our customers say about us? 
"The platform can generate Jar files to deploy models into production. This 
alone is a milestone!" - Hassan Namarvar, ShareThis 
“I have to give credit to H2O. They have a very complete way of showing which 
algorithm is the best.” – Nachum Shacham, PayPal 
“I analyzed 1 million rows training set, fitting a logistic regression with elastic 
penalty, and doing a grid search on parameters with 10-fold cross validation for 
each parameter combination… doing this analysis was a breeze… orders of 
magnitude faster than R.” - Antonio Molins, Netflix 
“Never have we had such a quick, simple, scalable and cost effective 
deployment solution for predictive modeling.” – Lou Carvalheira, Cisco
Advertising 
Better Conversions 
Brand Conversion Reach ROI 
Overall, I would say that the H2O platform is the most elegant open source in-memory ~ Hassan Namarvar, Principal Data Scientist
Fraud 
Better Detections 
Purchase 
Shopping Theft Passwords 
I have to give credit to H2O. 
They have a very complete way of showing which algorithm is the best. 
~ Nachum Shacham, Principal Data Scientist
Marketing 
Better Spend 
ROI 
Network Segments Measure 
H2O has established a new equilibrium point for performance, 
accuracy and cost for statistics and machine learning. 
~ Lou Carvalheira, Principal Data Scientist
Select Customers Powered by H2O
@hexadata & @mmalohlava 
present 
Sparkling Water 
“Killer App for Spark”
Memory efficient 
Performance of computation 
Machine learning algorithms 
Parser, GUI, R-interface 
User-friendly API 
Large and active community 
Platform components - SQL 
Multitenancy
Sparkling Water 
+ 
RDD 
immutable 
world 
DataFrame 
mutable 
world
Sparkling Water 
RDD DataFrame
Sparkling Water 
Provides 
Transparent integration into Spark ecosystem 
Pure H2ORDD encapsulating H2O DataFrame 
Transparent use of H2O data structures and 
algorithms with Spark API 
Excels in Spark workflows requiring advanced 
Machine Learning algorithms
Sparkling Water Design 
[Diagram: spark-submit hands the Sparkling App jar file (contains application and Sparkling Water classes) to the Spark Master JVM, shown with three Spark Worker JVMs. The Sparkling Water Cluster consists of three Spark Executor JVMs, each with H2O running inside.]
Data Distribution 
[Diagram: a data source (e.g. HDFS) feeds the Sparkling Water Cluster of three Spark Executor JVMs, each running H2O; labels: H2O RDD, Spark RDD.]
RDDs and DataFrames share the same memory space
Demo Time
Flight delay prediction 
“Build a model using weather and 
flight data to predict delays of flights 
arriving to Chicago O’Hare 
International Airport”
Example Outline 
Load & Parse CSV data from 2 data sources 
Use Spark API to filter data, do SQL query for join 
Create a regression model 
Use model for delay prediction 
Draw a residual plot in R
Sparkling Water 
Requirements 
Linux or Mac OS X 
Oracle Java 1.7
Download 
http://0xdata.com/download/
Install and Launch 
Unpack zip file 
and 
Point SPARK_HOME to your Spark installation 
and 
Launch h2o-examples/sparkling-shell
What is Sparkling Shell? 
Standard spark-shell 
With additional Sparkling Water classes 
export MASTER="local-cluster[3,2,1024]"   # Spark Master address
spark-shell \
  --jars sparkling-water.jar              # JAR containing Sparkling Water
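The three numbers in the MASTER value can be read as follows, assuming the usual meaning of Spark's local-cluster master URL: number of workers, cores per worker, and memory per worker in MB. The sketch below only parses such a string; the object name LocalClusterSpec and its Spec class are hypothetical and not part of Spark or Sparkling Water.

```scala
object LocalClusterSpec {
  // Hypothetical holder for the three values inside local-cluster[...].
  case class Spec(workers: Int, coresPerWorker: Int, memoryPerWorkerMb: Int)

  // Matches strings such as local-cluster[3,2,1024], allowing spaces around the numbers.
  private val Pattern = """local-cluster\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]""".r

  // Returns None for any master string that is not in local-cluster form.
  def parse(master: String): Option[Spec] = master match {
    case Pattern(n, c, m) => Some(Spec(n.toInt, c.toInt, m.toInt))
    case _                => None
  }

  def main(args: Array[String]): Unit =
    println(parse("local-cluster[3,2,1024]"))
}
```

Under that reading, the value on the slide asks for 3 workers with 2 cores and 1024 MB each, which lines up with the H2O cloud of size 3 started later in the demo.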
Let's play with Sparkling 
shell…
Create H2O Client 
import org.apache.spark.h2o._ 
import org.apache.spark.examples.h2o._ // Demo specific classes 
// sc: regular Spark context provided by Spark shell 
// 3: size of demanded H2O cloud 
val h2oContext = new H2OContext(sc).start(3) 
import h2oContext._ // Contains implicit utility functions 
Is Spark Running? 
Go to http://localhost:4040
Is H2O running? 
http://localhost:54321/steam/index.html
Load Data #1 
Load weather data into RDD 
val weatherDataFile = 
  "examples/smalldata/weather.csv" 
// Regular Spark API 
val wrawdata = sc.textFile(weatherDataFile,3) 
  .cache() 
// Ad-hoc parser 
val weatherTable = wrawdata 
  .map(_.split(",")) 
  .map(row => WeatherParse(row)) 
  .filter(!_.isWrongRow()) 
Weather Data 
case class Weather( val Year : Option[Int], 
val Month : Option[Int], 
val Day : Option[Int], 
val TmaxF : Option[Int], // Max temperature in F 
val TminF : Option[Int], // Min temperature in F 
val TmeanF : Option[Float], // Mean temperature in F 
val PrcpIn : Option[Float], // Precipitation (inches) 
val SnowIn : Option[Float], // Snow (inches) 
val CDD : Option[Float], // Cooling Degree Day 
val HDD : Option[Float], // Heating Degree Day 
val GDD : Option[Float]) // Growing Degree Day 
Simple POSO to hold one row of weather data
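The slides do not show the ad-hoc parser itself, so the following is only a minimal sketch of what such a parser could look like. WeatherRow, WeatherRowParser, and the assumed column order (Year, Month, Day, TmaxF, PrcpIn) are hypothetical stand-ins, trimmed down from the Weather class above; they are not the demo's WeatherParse.

```scala
import scala.util.Try

// Hypothetical, trimmed-down stand-in for the slide's Weather class.
case class WeatherRow(year: Option[Int], month: Option[Int], day: Option[Int],
                      tmaxF: Option[Int], prcpIn: Option[Float])

object WeatherRowParser {
  // Turn a raw cell into Some(value), or None when it is blank or malformed.
  private def int(s: String): Option[Int] = Try(s.trim.toInt).toOption
  private def float(s: String): Option[Float] = Try(s.trim.toFloat).toOption

  // Assumed column order: Year, Month, Day, TmaxF, PrcpIn.
  def parse(cells: Array[String]): WeatherRow = {
    def cell(i: Int): String = if (i < cells.length) cells(i) else ""
    WeatherRow(int(cell(0)), int(cell(1)), int(cell(2)), int(cell(3)), float(cell(4)))
  }

  // A row is unusable for a date-based join when any part of the date is missing.
  def isWrongRow(w: WeatherRow): Boolean =
    w.year.isEmpty || w.month.isEmpty || w.day.isEmpty

  def main(args: Array[String]): Unit = {
    val lines = Seq("Year,Month,Day,TmaxF,PrcpIn", "2005,1,1,41,0.31", "2005,1,2,,0.0")
    // split(",", -1) keeps trailing empty cells instead of dropping them.
    val rows = lines.map(_.split(",", -1)).map(parse).filterNot(isWrongRow)
    rows.foreach(println)
  }
}
```

Modeling every field as an Option lets a header line or a blank cell turn into None instead of throwing, which is what makes a filter like isWrongRow possible.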
Load Data #2 
Load flights data into H2O frame 
import java.io.File 
val dataFile = 
  "examples/smalldata/allyears2k_headers.csv.gz" 
val airlinesData = new DataFrame(new File(dataFile))
Where is the data? 
Go to http://localhost:54321/steam/index.html
Use Spark API for Data Filtering 
// Create RDD wrapper around DataFrame 
// (a cheap wrapper around H2O DataFrame) 
val airlinesTable : RDD[Airlines] = toRDD[Airlines](airlinesData) 
// And use Spark RDD API directly (regular Spark RDD call) 
val flightsToORD = airlinesTable 
  .filter( f => f.Dest == Some("ORD") ) 
Use Spark SQL to Join Data 
import org.apache.spark.sql.SQLContext 
// We need to create SQL context 
val sqlContext = new SQLContext(sc) 
import sqlContext._ 
flightsToORD.registerTempTable("FlightsToORD") 
weatherTable.registerTempTable("WeatherORD")
Join Data based on Flight Date 
val bigTable = sql( 
"""SELECT 
| f.Year,f.Month,f.DayofMonth, 
| f.CRSDepTime,f.CRSArrTime,f.CRSElapsedTime, 
| f.UniqueCarrier,f.FlightNum,f.TailNum, 
| f.Origin,f.Distance, 
| w.TmaxF,w.TminF,w.TmeanF, 
| w.PrcpIn,w.SnowIn,w.CDD,w.HDD,w.GDD, 
| f.ArrDelay 
| FROM FlightsToORD f 
| JOIN WeatherORD w 
| ON f.Year=w.Year AND f.Month=w.Month 
| AND f.DayofMonth=w.Day""".stripMargin)
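To make the join condition concrete, here is a rough equivalent on plain Scala collections, with no Spark involved. Flight, DailyWeather, Joined, and DateJoin are hypothetical, trimmed-down stand-ins for the demo's tables; the sketch only illustrates matching on (Year, Month, Day).

```scala
// Hypothetical, trimmed-down records; the real tables carry many more columns.
case class Flight(year: Option[Int], month: Option[Int], dayOfMonth: Option[Int],
                  dest: Option[String], arrDelay: Option[Int])
case class DailyWeather(year: Option[Int], month: Option[Int], day: Option[Int],
                        tmaxF: Option[Int])
case class Joined(year: Int, month: Int, day: Int, tmaxF: Option[Int], arrDelay: Option[Int])

object DateJoin {
  def join(flights: Seq[Flight], weather: Seq[DailyWeather]): Seq[Joined] = {
    // Index weather by full date; rows with a missing date part are skipped,
    // much like NULL keys never match in an SQL inner join.
    val byDate: Map[(Int, Int, Int), Seq[DailyWeather]] =
      weather
        .collect { case w @ DailyWeather(Some(y), Some(m), Some(d), _) => (y, m, d) -> w }
        .groupBy(_._1)
        .map { case (date, pairs) => date -> pairs.map(_._2) }

    // Emit one output row per (flight, weather) pair that shares a date.
    for {
      f <- flights
      y <- f.year.toSeq
      m <- f.month.toSeq
      d <- f.dayOfMonth.toSeq
      w <- byDate.getOrElse((y, m, d), Seq.empty)
    } yield Joined(y, m, d, w.tmaxF, f.arrDelay)
  }

  def main(args: Array[String]): Unit = {
    val flights = Seq(
      Flight(Some(2005), Some(1), Some(1), Some("ORD"), Some(12)),
      Flight(Some(2005), Some(1), Some(3), Some("SFO"), Some(5)))
    val weather = Seq(DailyWeather(Some(2005), Some(1), Some(1), Some(41)))
    // Same order as the demo: filter flights to ORD first, then join on date.
    join(flights.filter(_.dest == Some("ORD")), weather).foreach(println)
  }
}
```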
Launch H2O Algorithms 
import hex.deeplearning._ 
import hex.deeplearning.DeepLearningModel.DeepLearningParameters 
// Setup deep learning parameters 
val dlParams = new DeepLearningParameters() 
dlParams._train = bigTable // Result of SQL query 
dlParams._response_column = 'ArrDelay 
dlParams._classification = false 
// Create a new model builder 
val dl = new DeepLearning(dlParams) 
val dlModel = dl.train.get // Blocking call 
Make a prediction 
// Use model to score data (here the joined table built above) 
val prediction = dlModel.score(bigTable)('predict) 
// Collect predicted values via RDD API 
val predictionValues = toRDD[DoubleHolder](prediction) 
.collect 
.map ( _.result.getOrElse("NaN") )
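One detail worth noting in the collect step: falling back to the string "NaN" widens the element type to Any. A minimal sketch of a numeric fallback follows; PredictionHolder and PredictionValues are hypothetical stand-ins for the holder class used on the slide, assumed to wrap one optional Double per row.

```scala
// Hypothetical stand-in for the holder class used on the slide:
// one optional prediction per row.
case class PredictionHolder(result: Option[Double])

object PredictionValues {
  // Missing predictions become Double.NaN, so the result stays an Array[Double]
  // (a String default such as "NaN" would widen the element type to Any).
  def values(rows: Seq[PredictionHolder]): Array[Double] =
    rows.map(_.result.getOrElse(Double.NaN)).toArray

  def main(args: Array[String]): Unit = {
    val collected = Seq(PredictionHolder(Some(12.5)), PredictionHolder(None))
    println(values(collected).mkString(", "))
  }
}
```

Keeping the values as Double means later arithmetic, such as computing residuals, needs no casts.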
Generate Residuals Plot 
# Import H2O library and initialize H2O client 
library(h2o) 
h = h2o.init() 
# Fetch prediction and actual data, use remembered keys 
pred = h2o.getFrame(h, "dframe_b5f449d0c04ee75fda1b9bc865b14a69") 
act = h2o.getFrame (h, "frame_rdd_14_b429e8b43d2d8c02899ccb61b72c4e57") 
# Select right columns 
predDelay = pred$predict 
actDelay = act$ArrDelay 
# Make sure that number of rows is same 
nrow(actDelay) == nrow(predDelay) 
# Compute residuals 
residuals = predDelay - actDelay 
# Plot residuals 
compare = cbind( 
as.data.frame(actDelay$ArrDelay), 
as.data.frame(residuals$predict)) 
plot( compare[,1:2] ) 
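The same residual computation can be sketched on the Scala side, assuming the predicted and actual delays have already been collected into local sequences. The Residuals object and the RMSE summary are illustrative additions, not part of the demo.

```scala
object Residuals {
  // Residual per row, with the same sign convention as the R code: predicted - actual.
  def residuals(predicted: Seq[Double], actual: Seq[Double]): Seq[Double] = {
    require(predicted.length == actual.length, "row counts must match")
    predicted.zip(actual).map { case (p, a) => p - a }
  }

  // Root mean squared error, a single-number summary of the residuals.
  def rmse(res: Seq[Double]): Double =
    math.sqrt(res.map(r => r * r).sum / res.length)

  def main(args: Array[String]): Unit = {
    val res = residuals(Seq(10.0, 0.0), Seq(7.0, 4.0))
    println(res)
    println(rmse(res))
  }
}
```

The require call plays the role of the nrow comparison in the R script: it fails fast when the two columns do not line up.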
References 
of data
More info 
Check out the 0xdata blog for Sparkling Water tutorials 
http://0xdata.com/blog/ 
Check out the 0xdata YouTube channel 
https://www.youtube.com/user/0xdata 
Check out GitHub 
https://github.com/0xdata/sparkling-water
Thank you! 
Learn more about H2O at 
0xdata.com 
or 
neo> for r in sparkling-water; do 
git clone "git@github.com:0xdata/$r.git" 
done 
Follow us at @hexadata

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Sparkling Water Webinar October 29th, 2014

  • 1. OCT 29th, 2014 WEBINAR H2O Fast. Scalable. Machine Learning For Smarter Applications “Fluids are In, Animals are Out.” ~ Svetlana Sicular, Gartner
  • 2. Speakers Joel Horwitz Joel is a caffeine-, data-, and laughter-driven product strategist. He is an active community member, having founded Bay Area Analytics; he tweets regularly as @JSHorwitz, blogs regularly at joelshorwitz.com, and speaks regularly at industry events. Always eager to learn and lend a helping hand, he is an invaluable asset to 0xdata. Michal Malohlava Michal is a geek, developer, and Java, Linux, and programming languages enthusiast who has been developing software for over 10 years. He obtained a PhD from Charles University in Prague in 2012 and was a post-doc at Purdue University. H2O World Register at http://www.0xdata.com/h2o-world
  • 3. Time is the only non-renewable resource. Speed Matters!
  • 4. Today • Why Are We Here • Who We Are • How Do We Do It • Who We Work With • What We Believe • Demo and Q&A
  • 5. A New Interpretation of Moore’s Law “Like the physical universe, the digital universe is large - by 2020 containing nearly as many digital bits as there are stars in the universe. It is doubling in size every two years, and by 2020 the digital universe - the data we create and copy annually - will reach 44 zettabytes, or 44 trillion gigabytes.” - IDC 2014
  • 6. An Evolving Market to Meet the Demand (diagram labels: RDBMS, MPP, Business Intelligence, Data Science, Distributed File System, Machine Learning, H2O)
  • 7. Decreasing Cost of Data is Driving Demand (chart: time axis from 1970 to 2020, H2O marked)
  • 8. H2O is the First Dedicated Machine Learning Open Source Platform H2O is for application developers and analysts who need scalable and fast machine learning. H2O is an open source predictive analytics platform. Unlike traditional analytics tools, H2O provides a combination of extraordinary math, a high performance parallel architecture, and unrivaled ease of use.
  • 9. Who Are We? H2O
  • 10. H2O Awards and Accolades • Top R Project of useR! Conference 2014 • Fortune Big Data All-Stars 2014, Arno Candel • 100+ Meetups • 6000+ Users
  • 11. H2O is Built for Speed and Scale • Open Source • REST API • Native R Support • NanoFast™ Scoring Engine • Sophisticated Algorithms
  • 12. H2O Seamlessly Integrates with Your Workflow • 20X Faster Imports and 3X Compression w/ .hex format. • 4 Billion Row Regression in Seconds. • Deploy in POJO or with our REST API
  • 13. Code is incomplete without Community! Open Source Drives Innovation.
  • 14. Law of Large Numbers Triumphs!
  • 15. Every Generation Needs to Invent Its Own Math. ML is the new SQL!
  • 16. What do our customers say about us? "The platform can generate Jar files to deploy models into production. This alone is a milestone!" - Hassan Namarvar, ShareThis “I have to give credit to H2O. They have a very complete way of showing which algorithm is the best.” – Nachum Shacham, Paypal “I analyzed 1 million rows training set, fitting a logistic regression with elastic penalty, and doing a grid search on parameters with 10-fold cross validation for each parameter combination… doing this analysis was a breeze… orders of magnitude faster than R.” - Antonio Molins, Netflix “Never have we had such a quick, simple, scalable and cost effective deployment solution for predictive modeling.” – Lou Carvalheira, Cisco
  • 17. Advertising Better Conversions Brand Conversion Reach ROI Overall, I would say that the H2O platform is the most elegant open source in-memory ~ Hassan Namarvar, Principal Data Scientist
  • 18. Fraud Better Detections Purchase Shopping Theft Passwords I have to give credit to H2O. They have a very complete way of showing which algorithm is the best. ~ Nachum Shacham, Principal Data Scientist
  • 19. Marketing Better Spend ROI Network Segments Measure H2O has established a new equilibrium point for performance, accuracy and cost for statistics and machine learning. ~ Lou Carvalheira, Principal Data Scientist
  • 21. @hexadata & @mmalohlava present Sparkling Water “Killer App for Spark”
  • 22. Memory efficient Performance of computation Machine learning algorithms Parser, GUI, R-interface User-friendly API Large and active community Platform components - SQL Multitenancy
  • 23. Sparkling Water + RDD immutable world DataFrame mutable world
  • 24. Sparkling Water RDD DataFrame
  • 25. Sparkling Water Provides • Transparent integration into Spark ecosystem • Pure H2ORDD encapsulating H2O DataFrame • Transparent use of H2O data structures and algorithms with Spark API • Excels in Spark workflows requiring advanced Machine Learning algorithms
  • 26. Sparkling Water Design implements spark-submit Spark Master JVM Spark Worker JVM Spark Worker JVM Spark Worker JVM Sparkling Water Cluster Spark Executor JVM H2O Spark Executor JVM H2O Spark Executor JVM H2O Sparkling App jar file Contains application and Sparkling Water classes
  • 27. Data Distribution Sparkling Water Cluster H2O H2O H2O Spark Executor JVM Data Source (e.g. HDFS) H2O RDD Spark RDD Spark Executor JVM Spark Executor JVM RDDs and DataFrames share same memory space
  • 29. Flight delay prediction “Build a model using weather and flight data to predict delays of flights arriving to Chicago O’Hare International Airport”
  • 30. Example Outline • Load & Parse CSV data from 2 data sources • Use Spark API to filter data, do SQL query for join • Create a regression model • Use model for delay prediction • Plot residual plot from R
  • 31. Sparkling Water Requirements • Linux or Mac OS X • Oracle Java 1.7
  • 33. Install and Launch: unpack the zip file, point SPARK_HOME to your Spark installation, and launch h2o-examples/sparkling-shell
  • 34. What is Sparkling Shell? Standard spark-shell with additional Sparkling Water classes
      export MASTER="local-cluster[3,2,1024]"   # Spark Master address
      spark-shell --jars sparkling-water.jar    # JAR containing Sparkling Water
  • 35. Let's play with Sparkling shell…
  • 36. Create H2O Client
      import org.apache.spark.h2o._
      import org.apache.spark.examples.h2o._         // Demo specific classes
      // sc: regular Spark context provided by Spark shell; 3: size of demanded H2O cloud
      val h2oContext = new H2OContext(sc).start(3)
      import h2oContext._                            // Contains implicit utility functions
  • 37. Is Spark Running? Go to http://localhost:4040
  • 38. Is H2O running? http://localhost:54321/steam/index.html
  • 39. Load Data #1: load weather data into RDD
      val weatherDataFile = "examples/smalldata/weather.csv"
      // Regular Spark API
      val wrawdata = sc.textFile(weatherDataFile, 3).cache()
      val weatherTable = wrawdata
        .map(_.split(","))
        .map(row => WeatherParse(row))   // Ad-hoc parser
        .filter(!_.isWrongRow())
  • 40. Weather Data: simple POSO to hold one row of weather data
      case class Weather(
        val Year   : Option[Int],
        val Month  : Option[Int],
        val Day    : Option[Int],
        val TmaxF  : Option[Int],    // Max temperature in F
        val TminF  : Option[Int],    // Min temperature in F
        val TmeanF : Option[Float],  // Mean temperature in F
        val PrcpIn : Option[Float],  // Precipitation (inches)
        val SnowIn : Option[Float],  // Snow (inches)
        val CDD    : Option[Float],  // Cooling Degree Day
        val HDD    : Option[Float],  // Heating Degree Day
        val GDD    : Option[Float])  // Growing Degree Day
  • 41. Load Data #2: load flights data into H2O frame
      import java.io.File
      val dataFile = "examples/smalldata/allyears2k_headers.csv.gz"
      val airlinesData = new DataFrame(new File(dataFile))
  • 42. Where is the data? Go to http://localhost:54321/steam/index.html
  • 43. Use Spark API for Data Filtering
      // Create RDD wrapper around DataFrame (a cheap wrapper around the H2O DataFrame)
      val airlinesTable : RDD[Airlines] = toRDD[Airlines](airlinesData)
      // And use Spark RDD API directly (regular Spark RDD call)
      val flightsToORD = airlinesTable
        .filter( f => f.Dest == Some("ORD") )
  • 44. Use Spark SQL to Join Data
      import org.apache.spark.sql.SQLContext
      // We need to create SQL context
      val sqlContext = new SQLContext(sc)
      import sqlContext._
      flightsToORD.registerTempTable("FlightsToORD")
      weatherTable.registerTempTable("WeatherORD")
  • 45. Join Data based on Flight Date
      val bigTable = sql(
        """SELECT
          | f.Year,f.Month,f.DayofMonth,
          | f.CRSDepTime,f.CRSArrTime,f.CRSElapsedTime,
          | f.UniqueCarrier,f.FlightNum,f.TailNum,
          | f.Origin,f.Distance,
          | w.TmaxF,w.TminF,w.TmeanF,
          | w.PrcpIn,w.SnowIn,w.CDD,w.HDD,w.GDD,
          | f.ArrDelay
          | FROM FlightsToORD f
          | JOIN WeatherORD w
          | ON f.Year=w.Year AND f.Month=w.Month
          | AND f.DayofMonth=w.Day""".stripMargin)
  • 46. Launch H2O Algorithms
      import hex.deeplearning._
      import hex.deeplearning.DeepLearningModel.DeepLearningParameters
      // Setup deep learning parameters
      val dlParams = new DeepLearningParameters()
      dlParams._train = bigTable              // Result of SQL query
      dlParams._response_column = 'ArrDelay
      dlParams._classification = false
      // Create a new model builder
      val dl = new DeepLearning(dlParams)
      val dlModel = dl.train.get              // Blocking call
  • 47. Make a prediction
      // Use model to score data
      val prediction = dlModel.score(bigTable)('predict)
      // Collect predicted values via RDD API
      val predictionValues = toRDD[DoubleHolder](prediction)
        .collect
        .map ( _.result.getOrElse("NaN") )
  • 48. Generate Residuals Plot
      # Import H2O library and initialize H2O client
      library(h2o)
      h = h2o.init()
      # Fetch prediction and actual data, use remembered keys (references of data)
      pred = h2o.getFrame(h, "dframe_b5f449d0c04ee75fda1b9bc865b14a69")
      act = h2o.getFrame(h, "frame_rdd_14_b429e8b43d2d8c02899ccb61b72c4e57")
      # Select right columns
      predDelay = pred$predict
      actDelay = act$ArrDelay
      # Make sure that number of rows is same
      nrow(actDelay) == nrow(predDelay)
      # Compute residuals
      residuals = predDelay - actDelay
      # Plot residuals
      compare = cbind(
        as.data.frame(actDelay$ArrDelay),
        as.data.frame(residuals$predict))
      plot( compare[,1:2] )
  • 49. More info • Check out the 0xdata blog for Sparkling Water tutorials: http://0xdata.com/blog/ • Check out the 0xdata YouTube channel: https://www.youtube.com/user/0xdata • Check out GitHub: https://github.com/0xdata/sparkling-water
  • 50. Thank you! Learn more about H2O at 0xdata.com or
      neo> for r in sparkling-water; do
             git clone "git@github.com:0xdata/$r.git"
           done
    Follow us at @hexadata
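
The demo's parse, filter, and join steps (slides 39 to 45) can be sketched in plain Scala collections, without Spark or H2O on the classpath. This is only an illustration under stated assumptions: the trimmed-down `Weather` and `Flight` classes, `parseWeather`, `joinOnDate`, and the sample rows are hypothetical stand-ins, and the slides do not show how the real `WeatherParse` is implemented.

```scala
import scala.util.Try

// Hypothetical, trimmed-down row classes (the demo's classes carry more columns).
case class Weather(year: Option[Int], month: Option[Int], day: Option[Int],
                   tmeanF: Option[Float]) {
  // A row is "wrong" when any of the date fields failed to parse (e.g. a header line).
  def isWrongRow: Boolean = year.isEmpty || month.isEmpty || day.isEmpty
}

case class Flight(year: Option[Int], month: Option[Int], dayOfMonth: Option[Int],
                  dest: Option[String], arrDelay: Option[Int])

object WeatherJoinSketch {
  private def int(s: String): Option[Int] = Try(s.trim.toInt).toOption
  private def float(s: String): Option[Float] = Try(s.trim.toFloat).toOption

  // Ad-hoc parser: one CSV line becomes a Weather row;
  // missing or malformed fields become None instead of throwing.
  def parseWeather(line: String): Weather = {
    val f = line.split(",", -1)
    def at(i: Int): String = if (i < f.length) f(i) else ""
    Weather(int(at(0)), int(at(1)), int(at(2)), float(at(3)))
  }

  // Inner join on (year, month, day), mirroring the ON clause of the SQL query.
  // Assumes at most one weather row per date; a later duplicate would win.
  def joinOnDate(flights: Seq[Flight], weather: Seq[Weather]): Seq[(Flight, Weather)] = {
    val byDate = weather
      .filterNot(_.isWrongRow)
      .map(w => (w.year, w.month, w.day) -> w)
      .toMap
    for {
      fl <- flights
      if fl.year.isDefined && fl.month.isDefined && fl.dayOfMonth.isDefined
      w <- byDate.get((fl.year, fl.month, fl.dayOfMonth))
    } yield (fl, w)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows: a valid line, a header line, and a line with a missing value.
    val weather = Seq("2005,1,3,28.5", "Year,Month,Day,TmeanF", "2005,1,4,").map(parseWeather)
    val flights = Seq(
      Flight(Some(2005), Some(1), Some(3), Some("ORD"), Some(12)),
      Flight(Some(2005), Some(1), Some(9), Some("ORD"), Some(-4)),
      Flight(Some(2005), Some(1), Some(4), Some("SFO"), Some(0)))
    // Same predicate as the slide's RDD filter.
    val toORD = flights.filter(_.dest == Some("ORD"))
    joinOnDate(toORD, weather).foreach(println)
  }
}
```

Keeping every field as an `Option` lets a header or malformed line parse to a row that `isWrongRow` can reject, which is the same pattern the slides apply with `.filter(!_.isWrongRow())`.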

Editor's Notes

  1. Immutable vs. mutable approach, racy updates. Strong points of H2O: column compression, parser, small nobles based on customers' feedback/knowledge, tuned algorithms