Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Sparkling Water 2.0 - Michal Malohlava

518 views

Published on

Michal Malohlava from H2O.ai talks about the new features in Sparkling Water 2.0 and the future roadmap.

- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Sparkling Water 2.0 - Michal Malohlava

  1. 1. Sparkling Water 2.0 Michal Malohlava michal@h2o.ai H2O Open Tour 2016, Dallas, TX
 October 26, 2016
  2. 2. H2O.ai
 Machine Intelligence H2O + Spark = 
 Sparkling Water
  3. 3. Transparent integration of H2O machine learning platform with Spark ecosystem Transparent use of H2O data structures (H2OFrame) and algorithms with Spark API Excels in existing Spark workflows requiring advanced Machine Learning algorithms Sparkling Water provides Functionality missing in H2O can be replaced by Spark and vice versa
  4. 4. Data Munging & Modeling Data
 Source
  5. 5. Data Munging & Modeling Data
 Source
  6. 6. Data Munging & Modeling Data
 Source
  7. 7. Data Munging & Modeling Data
 Source Data load/munging/ exploration (H2O Flow UI)
  8. 8. Data Munging & Modeling Data
 Source Data load/munging/ exploration (H2O Flow UI) Modelling
  9. 9. Stream processing
  10. 10. Stream processing Data
 Source Off-line model training
  11. 11. Stream processing Data
 Source Off-line model training Data munging
  12. 12. Stream processing Data
 Source Off-line model training Data munging Modelling
  13. 13. Stream processing Data
 Source Off-line model training Data munging Stream processing Data Stream Spark Streaming/Storm/Flink Modelling
  14. 14. Stream processing Data
 Source Off-line model training Data munging Stream processing Data Stream Spark Streaming/Storm/Flink Export model
 Modelling
  15. 15. Stream processing Data
 Source Off-line model training Data munging Stream processing Data Stream Spark Streaming/Storm/Flink Model prediction Deploy the model Export model
 Modelling
  16. 16. Stream processing Data
 Source Off-line model training Data munging Stream processing Data Stream Spark Streaming/Storm/Flink Model prediction Deploy the model Export model
 Modelling Steam Model management
  17. 17. H2O.ai
 Machine Intelligence Interested in more details? Track Two
  18. 18. H2O.ai
 Machine Intelligence What’s new in 2.0 ?
  19. 19. Releasing Schema 1.5 1.6 2.0 + + + Latest Latest Latest 1.5.x 1.6.x 2.0.x Back porting
 new features Back porting
 new features
  20. 20. ✓Support of Spark Session based API
 ✓More aggressive optimization during Spark DataFrame to H2OFrame transformation
 ✓Scala 2.11 support • useful if you are building Spark based applications Spark 2.0 API Support
  21. 21. ✓Support of new Spark data structure DataSet • DataSet: fusion of strongly typed RDD[X] and DataFrame query API Datasets Support Available in Sparkling Water 1.6! case class SamplePerson(name: String, age: Int, email: String)
 
 val dataSource = ...
 
 val samplePeople: List[SamplePerson] = dataSource map SamplePerson.tupled
 // Create Dataset
 val df: Dataset[SamplePerson] = sqlContext.createDataset(samplePeople) // Publish Dataset as H2OFrame
 val hf: H2OFrame = h2oContext.asH2OFrame(df)
  22. 22. ✓We can now execute Scala code directly from H2O Flow UI! Scala Code from BrowserAvailableinSparklingWater1.6!
  23. 23. Spark algorithms can be called from H2O Flow UI ✓Quick visual feedback ✓Model performance reports ✓Model export (as code) Spark Models in Flow UI REST API Facade REST API Adapter REST API H2O Flow Web UI Mllib Impl Spark Execution Engine MLlib SVM Algorithm API Call
  24. 24. ✓ A New Member of Sparkling Water Ecosystem RSparkling PySpark sparklyrSpark
  25. 25. ✓ A New Member of Sparkling Water Ecosystem RSparkling Scala: Sparkling Water val sc = SparkContext.getOrCreate(…) val df = sc.parallelize(1 to 10).toDF val h2oContext = H2OContext.getOrCreate(sc) val hf = h2oContext.asH2OFrame(df) PySpark sparklyrSpark
  26. 26. ✓ A New Member of Sparkling Water Ecosystem RSparkling Scala: Sparkling Water val sc = SparkContext.getOrCreate(…) val df = sc.parallelize(1 to 10).toDF val h2oContext = H2OContext.getOrCreate(sc) val hf = h2oContext.asH2OFrame(df) Python: PySparkling Water sc = SparkContext(…) df = sc.parallelize(range(1,11))
 .toDF(“int”) h2o_context = H2OContext.getOrCreate(sc)
 
 hf = h2o_context.as_h2o_frame(df) PySpark sparklyrSpark
  27. 27. ✓ A New Member of Sparkling Water Ecosystem RSparkling Scala: Sparkling Water val sc = SparkContext.getOrCreate(…) val df = sc.parallelize(1 to 10).toDF val h2oContext = H2OContext.getOrCreate(sc) val hf = h2oContext.asH2OFrame(df) Python: PySparkling Water sc = SparkContext(…) df = sc.parallelize(range(1,11))
 .toDF(“int”) h2o_context = H2OContext.getOrCreate(sc)
 
 hf = h2o_context.as_h2o_frame(df) R: RSparkling Water sc <- spark_connect(…) tbl <- data_frame(c(1:10)) df <- copy_to(sc, tbl) hc <- h2o_context(sc) hf <- as_h2o_frame(sc, df) PySpark sparklyrSpark
  28. 28. ✓ A New Member of Sparkling Water Ecosystem RSparkling Scala: Sparkling Water val sc = SparkContext.getOrCreate(…) val df = sc.parallelize(1 to 10).toDF val h2oContext = H2OContext.getOrCreate(sc) val hf = h2oContext.asH2OFrame(df) Track two Python: PySparkling Water sc = SparkContext(…) df = sc.parallelize(range(1,11))
 .toDF(“int”) h2o_context = H2OContext.getOrCreate(sc)
 
 hf = h2o_context.as_h2o_frame(df) R: RSparkling Water sc <- spark_connect(…) tbl <- data_frame(c(1:10)) df <- copy_to(sc, tbl) hc <- h2o_context(sc) hf <- as_h2o_frame(sc, df) Available in Sparkling Water 1.6! PySpark sparklyrSpark
  29. 29. H2O.ai
 Machine Intelligence Coming soon…
  30. 30. API Convergence • H2O model builders compatibility with Spark pipelines • Scala/Python level • Reuse existing Spark functionality • e.g., applying Spark UDFs directly on H2OFrame • Heterogenous (Spark/H2O) model ensembles/pipelines • More Spark Algorithms in UI • e.g. ALS Bringing both platforms together Coming soon!
  31. 31. • Exposing DeepWater Algorithms • Integration with Steam • Steam as Sparkling Water launcher • Steam as model manager More Tooling From H2O Ecosystem Coming soon!
  32. 32. High Availability Cluster Worker node Spark executor Spark executorSpark executor Driver node SparkContext Worker nodeWorker node H2OContext NOW
  33. 33. High Availability Cluster Worker node Spark executor Spark executorSpark executor Driver node SparkContext Worker nodeWorker node H2OContext Coming soon!
  34. 34. H2O.ai
 Machine Intelligence Learn more at h2o.ai Follow us at @h2oai Thank you! Sparkling Water is open-source
 ML application platform combining
 power of Spark and H2O

×