Michal Malohlava's meetup on H2O Rains with Databricks Cloud at Parisoma SF, 02.02.16
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
2. Who Am I?
Background
• PhD in CS from Charles University in Prague, Czech Republic
• Postdoc at Purdue University experimenting with algos for large-scale computation
• Now SW engineer at H2O.ai
Experience with domain-specific languages, distributed systems, software engineering, and big data.
6. Open-source distributed execution platform
User-friendly API for data transformation based on RDDs, DataFrames (from 1.4), and Datasets (from 1.6)
Platform components: SQL, MLlib, text mining, Avro, Redshift, Kinesis.
Easily extensible with third-party packages
Interactive shell
Current release 1.6
Supported releases 1.3, 1.4, 1.5
7. Databricks
Databricks
• founded by the creators of Apache Spark
• still contribute 75% of the code to the Spark project
• cloud platform for running Spark in your AWS account
Databricks Platform
• integrated collaborative data science workspace
• notebook interface inspired by IPython and Zeppelin but purpose-built for Spark
• self-service cluster manager and job scheduler for production Spark workloads
8. Sparkling Water
Provides
Transparent integration of H2O with Spark ecosystem
Transparent use of H2O data structures and algorithms with Spark API
Platform for building Smarter Applications
Excels in existing Spark workflows requiring advanced Machine Learning algorithms
Functionality missing in H2O can be provided by Spark and vice versa
17. What do we need?
Databricks account (14-day free trial at www.databricks.com)
AWS account
Sparkling Water coordinates:
ai.h2o:sparkling-water-examples_2.10:1.5.10
And some cool machine learning idea!
21. Machine Learning Workflow
1. Extract data
2. Transform, tokenize messages
3. Build TF-IDF model
4. Create and evaluate Deep Learning model
5. Use the model to detect spam
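Steps 2 and 3 of the workflow can be illustrated with a minimal, standard-library Python sketch. It is not the Spark MLlib or H2O code used in the talk: the function names `tokenize` and `tf_idf`, the sample messages, and the smoothed IDF formula log((N + 1) / (df + 1)) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: relative term frequency times a smoothed
    inverse document frequency, log((N + 1) / (df + 1)).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample messages, not taken from the talk's data set.
messages = [
    "Free prize! Call now",
    "Are we meeting for lunch",
    "Call me after lunch",
]
tokenized = [tokenize(m) for m in messages]
vectors = tf_idf(tokenized)

# Terms that are rare across messages get a higher weight than common ones.
for message, vector in zip(messages, vectors):
    top_term = max(vector, key=vector.get)
    print(message, "->", top_term)
```

The resulting weight dictionaries play the role of the feature vectors that a classifier, such as the Deep Learning model in step 4, would be trained on. In a Spark workflow the same idea would be expressed with the platform's own feature transformers rather than hand-written functions.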
22. More info
Check out H2O.ai Training Books
http://learn.h2o.ai/
Check out H2O.ai Blog
http://h2o.ai/blog/
Check out H2O.ai YouTube Channel
https://www.youtube.com/user/0xdata
Check out GitHub
https://github.com/h2oai/sparkling-water
Meetups
https://meetup.com/
23. Learn more at h2o.ai
Follow us at @h2oai
Thank you!
Sparkling Water is an open-source ML application platform combining the power of Spark and H2O.