H2O World - PySparkling Water - Nidhi Mehta

•

3 likes•1,476 views

Sri Ambati

H2O World 2015 - Nidhi Mehta

Software

Hands On : PySparkling Water
- By Nidhi Mehta

What is PySparkling Water
PySparkling Water = Python + Spark + H2O
Sparkling WaterPython +

Py4J
H2O Context
Spark Context
H2O Python
h2o.init ( ip, port )
Driver Python
Cluster Manager
Executor
H2O
Executor
H2O
H2O Rest API
Master Workers
PySparkling Architecture

Aim: Build a model to predict Arrest for Chicago
crime dataset
● Import Chicago Crime Dataset
● Combine Crime data with Census and Weather
data.
● Build a model to predict whether an arrest was
made
● Predict on a test dataset
Demo Workflow

- Install Spark-1.5.1
- Install and Build Sparkling Water-1.5.6
( ./gradlew build -x check )
- Install H2O-3.6.0.3
- Install H2O-python
( sudo pip install h2o-3.6.0.3-py2.py3-none-any.whl )
Pre Requisites to run the demo

1)
Set spark environment by specifying SPARK_HOME and Master
export SPARK_HOME =Path_to_Spark_dir
export MASTER ='local-cluster[2,8,6040]'
2)
- To run from Python notebook-
IPYTHON_OPTS="notebook" Path_to_Sparkling_dir/bin/pysparkling
- To run from regular Python shell
Path_to_Sparkling_dir/bin/pysparkling
Command to Start/Access PySparking Water Cluster

Why use PySparkling
● Automatic Parallelization and less lines of code
● Much Faster on big data - uses H2O's rest API calls to connect
to H2O Cluster

What do these stickers mean?
I have Sparkling
Water Installed
I have Python
installed
I have H2O
installed
I have the
H2O World
data sets
Pick up stickers or get install help at the information
booth

What's hot

Building an analytics workflow using Apache AirflowYohei Onishi

Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementBurasakorn Sabyeying

Managing data workflows with LuigiTeemu Kurppa

Airflow and supervisorRafael Roman Otero

Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi

Airflow presentationIlias Okacha

Workflow Engines + LuigiVladislav Supalov

Data Pipelines with Apache AirflowManning Publications

Apache Airflow at DailymotionGermain Tanguy

PythonBrasil[8] - CPython for dummiesTatiana Al-Chueyr

Luigi presentation NYC Data ScienceErik Bernhardsson

What is SparkBruno Faria

Statsd introductionRick Chang

Spark summit2014 techtalk - testing sparkAnu Shetty

Apache AirflowSumit Maheshwari

Introduction to Apache Airflow - Data Day Seattle 2016Sid Anand

Airflow introductionChandler Huang

(CMP310) Data Processing Pipelines Using Containers & Spot InstancesAmazon Web Services

Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...Red Hat Developers

Yahoo! Mail antispam - Bay area Hadoop user groupHadoop User Group

What's hot (20)

Building an analytics workflow using Apache Airflow

Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management

Managing data workflows with Luigi

Airflow and supervisor

Building a Data Pipeline using Apache Airflow (on AWS / GCP)

Airflow presentation

Workflow Engines + Luigi

Data Pipelines with Apache Airflow

Apache Airflow at Dailymotion

PythonBrasil[8] - CPython for dummies

Luigi presentation NYC Data Science

What is Spark

Statsd introduction

Spark summit2014 techtalk - testing spark

Apache Airflow

Introduction to Apache Airflow - Data Day Seattle 2016

Airflow introduction

(CMP310) Data Processing Pipelines Using Containers & Spot Instances

Jupyter Notebooks for machine learning on Kubernetes & OpenShift | DevNation ...

Yahoo! Mail antispam - Bay area Hadoop user group

Viewers also liked

Nvidia Deep Learning Solutions - Alex SabatierSri Ambati

H2O World - ML Could Solve NLP Challenges: Ontology Management - Erik HuddlestonSri Ambati

Introduction to Sparkling Water - Spark Summit East 2016Sri Ambati

Sparkling Waterh2oworld

H2O World - What Do Companies Need to do to Stay Ahead - Michael MarksSri Ambati

H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...Sri Ambati

H2O World - Collaborative, Reproducible Research with H2O - Nick ElprinSri Ambati

H2O World - Migrating from Proprietary Analytics Software - Fonda IngramSri Ambati

Data & Data Alliances - Scott MclellanSri Ambati

Cybersecurity with AI - Ashrith BarthurSri Ambati

H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...Sri Ambati

Building Machine Learning Applications with Sparkling WaterSri Ambati

Sparkling Water 2.0 - Michal MalohlavaSri Ambati

Deep Learning with MXNet - Dmitry LarkoSri Ambati

Skutil - H2O meets Sklearn - Taylor SmithSri Ambati

Deep Water - GPU Deep Learning for H2O - Arno CandelSri Ambati

H2O PySparkling WaterSri Ambati

H2O AutoML roadmap - Ray PeckSri Ambati

Machine Learning with H2O, Spark, and Python at Strata 2015Sri Ambati

H2O Deep Learning at Next.MLSri Ambati

Viewers also liked (20)

Nvidia Deep Learning Solutions - Alex Sabatier

H2O World - ML Could Solve NLP Challenges: Ontology Management - Erik Huddleston

Introduction to Sparkling Water - Spark Summit East 2016

Sparkling Water

H2O World - What Do Companies Need to do to Stay Ahead - Michael Marks

H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...

H2O World - Collaborative, Reproducible Research with H2O - Nick Elprin

H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram

Data & Data Alliances - Scott Mclellan

Cybersecurity with AI - Ashrith Barthur

H2O World - A Look Under Progressive's Big Data Hood - Pawan Divakarla & Bria...

Building Machine Learning Applications with Sparkling Water

Sparkling Water 2.0 - Michal Malohlava

Deep Learning with MXNet - Dmitry Larko

Skutil - H2O meets Sklearn - Taylor Smith

Deep Water - GPU Deep Learning for H2O - Arno Candel

H2O PySparkling Water

H2O AutoML roadmap - Ray Peck

Machine Learning with H2O, Spark, and Python at Strata 2015

H2O Deep Learning at Next.ML

Similar to H2O World - PySparkling Water - Nidhi Mehta

Docker to the Rescue of an Ops TeamDocker, Inc.

Docker to the Rescue of an Ops TeamRachid Zarouali

Heroku pyconSabatino Severino

Docker for data scienceCalvin Giles

Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...Jian-Hong Pan

Building a Django App on HerokuJoe Fusaro

Python para equipos de ciberseguridad Jose Manuel Ortega Candel

OpenStack API's and WSGIMike Pittaro

PyParis2018 - Python tooling for continuous deploymentArthur Lutz

Projeto-web-services-Spring-Boot-JPA.pdfAdrianoSantos888423

Data Science Workflows using Docker ContainersAly Sivji

Akash rajguru project report sem vAkash Rajguru

Python maya 2018 setup noteLee Jungpyo

Scheduling Apps in the Cloud - Glenn Renfro & Roy ClarksonVMware Tanzu

2012 coscup - Build your PHP application on Herokuronnywang_tw

“warpdrive”, making Python web application deployment magically easy.Graham Dumpleton

Chris Swan ONUG Academy - Container Networks TutorialCohesive Networks

How to create a basic website with Python on DjangoArmağan Ersöz

Hacking pokemon go [droidcon tel aviv 2016]Guy Lis

ContainerDayVietnam2016: Django Development with DockerDocker-Hanoi

Similar to H2O World - PySparkling Water - Nidhi Mehta (20)

Docker to the Rescue of an Ops Team

Heroku pycon

Docker for data science

Package a PyApp as a Flatpak Package: An HTTP Server for Example @ PyCon APAC...

Building a Django App on Heroku

Python para equipos de ciberseguridad

OpenStack API's and WSGI

PyParis2018 - Python tooling for continuous deployment

Projeto-web-services-Spring-Boot-JPA.pdf

Data Science Workflows using Docker Containers

Akash rajguru project report sem v

Python maya 2018 setup note

Scheduling Apps in the Cloud - Glenn Renfro & Roy Clarkson

2012 coscup - Build your PHP application on Heroku

“warpdrive”, making Python web application deployment magically easy.

Chris Swan ONUG Academy - Container Networks Tutorial

How to create a basic website with Python on Django

Hacking pokemon go [droidcon tel aviv 2016]

ContainerDayVietnam2016: Django Development with Docker

Recently uploaded

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.

Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions

Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin

why an Opensea Clone Script might be your perfect match.pdfjoe51371421

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700

Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure

Professional Resume Template for Software DevelopersVinodh Ram

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3

Asset Management Software - InfographicHr365.us smith

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave

HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai

Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden

Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08

DNT_Corporate presentation know about usDynamic Netsoft

A Secure and Reliable Document Management System is Essential.docxComplianceQuest1

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS

chapter--4-software-project-planning.pptkotipi9215

Recently uploaded (20)

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...

Advancing Engineering with AI through the Next Generation of Strategic Projec...

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide

why an Opensea Clone Script might be your perfect match.pdf

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...

Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...

Professional Resume Template for Software Developers

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data

Asset Management Software - Infographic

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...

HR Software Buyers Guide in 2024 - HRSoftware.com

Engage Usergroup 2024 - The Good The Bad_The Ugly

Unit 1.1 Excite Part 1, class 9, cbse...

DNT_Corporate presentation know about us

A Secure and Reliable Document Management System is Essential.docx

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...

chapter--4-software-project-planning.ppt

H2O World - PySparkling Water - Nidhi Mehta

1. Hands On : PySparkling Water - By Nidhi Mehta

2. What is PySparkling Water PySparkling Water = Python + Spark + H2O Sparkling WaterPython +

3. Py4J H2O Context Spark Context H2O Python h2o.init ( ip, port ) Driver Python Cluster Manager Executor H2O Executor H2O H2O Rest API Master Workers PySparkling Architecture

4. Aim: Build a model to predict Arrest for Chicago crime dataset ● Import Chicago Crime Dataset ● Combine Crime data with Census and Weather data. ● Build a model to predict whether an arrest was made ● Predict on a test dataset Demo Workflow

5. - Install Spark-1.5.1 - Install and Build Sparkling Water-1.5.6 ( ./gradlew build -x check ) - Install H2O-3.6.0.3 - Install H2O-python ( sudo pip install h2o-3.6.0.3-py2.py3-none-any.whl ) Pre Requisites to run the demo

6. 1) Set spark environment by specifying SPARK_HOME and Master export SPARK_HOME =Path_to_Spark_dir export MASTER ='local-cluster[2,8,6040]' 2) - To run from Python notebook- IPYTHON_OPTS="notebook" Path_to_Sparkling_dir/bin/pysparkling - To run from regular Python shell Path_to_Sparkling_dir/bin/pysparkling Command to Start/Access PySparking Water Cluster

7. Let's Run the Demo!

8. Why use PySparkling ● Automatic Parallelization and less lines of code ● Much Faster on big data - uses H2O's rest API calls to connect to H2O Cluster

9. Thank You

10. What do these stickers mean? I have Sparkling Water Installed I have Python installed I have H2O installed I have the H2O World data sets Pick up stickers or get install help at the information booth

H2O World - PySparkling Water - Nidhi Mehta

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to H2O World - PySparkling Water - Nidhi Mehta

Similar to H2O World - PySparkling Water - Nidhi Mehta (20)

More from Sri Ambati

More from Sri Ambati (20)

Recently uploaded

Recently uploaded (20)

H2O World - PySparkling Water - Nidhi Mehta