Machine Learning with H2O
Sudalai Rajkumar @ SRK
Data Scientist
Nikhil Shekhar
Machine Learning Engineer
What is Machine Learning?
Sunny or Rainy?
Machine Learning Algorithm
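To make the "Sunny or Rainy?" question concrete, here is a minimal sketch of what a learning algorithm does: it takes labeled examples and derives a rule from them. The humidity readings below are made-up toy values, and the single-threshold learner is only an illustration, not an algorithm taken from H2O.

```python
# Toy, made-up observations: (humidity in %, observed weather).
observations = [
    (30, "sunny"), (45, "sunny"), (55, "sunny"),
    (70, "rainy"), (80, "rainy"), (90, "rainy"),
]


def predict(cut, humidity):
    """Apply the learned rule: humidity at or above the cut-off means rain."""
    return "rainy" if humidity >= cut else "sunny"


def learn_threshold(data):
    """Pick the humidity cut-off that misclassifies the fewest training rows."""
    best_cut, best_errors = None, len(data) + 1
    for cut in sorted(h for h, _ in data):
        errors = sum(predict(cut, h) != label for h, label in data)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


cut = learn_threshold(observations)
print(cut, predict(cut, 85), predict(cut, 40))
```

The rule is not written by hand; it is whatever cut-off fits the examples best, which is the core idea behind the algorithms listed later in the deck.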
Types of Machine Learning
Applications of Machine Learning
• Fraud detection
• Text classification
• Translation
• Image classification
• Recommendation
About H2O.ai
H2O
H2O - Key Features
• Leading Algorithms
• Access from Python, R and Flow
• AutoML
• Distributed, In-memory processing
• Simple deployment
Sparkling Water
Sparkling Water - Key Features
• Access to H2O algorithms
• Drive computation from Scala, R, Python, Flow
• Simple deployment
H2O4GPU
H2O4GPU - Key Features
• Optimized for GPU performance
• Broad selection of GPU enabled algorithms
• Builds on scikit-learn python API
• Available R API
Driverless AI
Driverless AI - Key Features
• Automatic feature engineering
• Automatic model tuning
• Machine Learning Interpretability
• Automatic visualization
• Automatic scoring pipeline
• GPU acceleration
• Time series support
Installing H2O
Import & Initialize
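A minimal sketch of the import and initialize step, assuming the `h2o` Python package has been installed (for example with `pip install h2o`) and a Java runtime is available. The helper name `start_h2o` and the `"2G"` memory setting are illustrative choices, not values from the slides.

```python
def start_h2o(max_mem_size="2G"):
    """Import h2o and start (or connect to) a local H2O cluster.

    The import lives inside the function so that this file can be loaded
    even on a machine where h2o is not installed yet.
    """
    import h2o

    # h2o.init() connects to a running local cluster or launches a new one.
    # max_mem_size caps the memory of the Java process (assumed value).
    h2o.init(max_mem_size=max_mem_size)
    return h2o
```

Calling `start_h2o()` once at the top of a notebook is usually enough; later calls reuse the cluster that is already running.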
Pima Indians Diabetes Data
• This dataset is originally from the National Institute of Diabetes and
Digestive and Kidney Diseases.
• All patients are women of Pima Indian heritage, at least 21 years old.
Machine Learning Workflow
Dataset Description
Column Name                  Description
Pregnancies                  Number of times pregnant
Glucose                      Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Blood Pressure               Diastolic blood pressure (mm Hg)
Skin Thickness               Triceps skin fold thickness (mm)
Insulin                      2-hour serum insulin (mu U/ml)
BMI                          Body mass index (weight in kg / (height in m)^2)
Diabetes Pedigree Function   Diabetes pedigree function
Age                          Age in years
Outcome                      Whether diabetes is present (1) or not (0)
Exploratory Data Analysis
Correlation Heat Map
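A correlation heat map is a colored rendering of the pairwise Pearson correlations between columns. The sketch below computes that matrix in plain Python so the calculation is visible; the function names are illustrative, and in practice a data frame library would do this in one call.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant column has zero
    variance and would cause a division by zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its correlation.

    `columns` is a dict of column name -> list of values.
    """
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Each cell of the heat map corresponds to one entry of this nested dict: values near 1 or -1 indicate strongly related columns, values near 0 indicate little linear relationship.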
Data Preparation
• Data cleaning
• Data standardization
• Outlier analysis
• Missing value treatment
• Categorical encoding
• Cross validation
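Three of the steps above (missing value treatment, data standardization, cross validation) can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: missing values are represented as `None`, imputation uses the median, and folds are assigned round-robin rather than at random.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def k_fold_indices(n_rows, k):
    """Return k (training_rows, held_out_rows) index pairs.

    Rows are dealt to folds round-robin; each fold is held out exactly once.
    """
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    splits = []
    for i, held_out in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        splits.append((training, held_out))
    return splits
```

For a real project, the imputation and scaling statistics should be computed on the training rows of each fold only, so that no information from the held-out rows leaks into training.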
Modeling Algorithms in H2O
• Generalized Linear Models
• Naive Bayes Classifier
• Distributed Random Forest
• Gradient Boosting Machine
• XGBoost
• Deep Learning
• Stacked Ensembles
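As one example from this list, the sketch below trains a Gradient Boosting Machine on the diabetes data with the H2O Python API. The CSV path, the target column name `Outcome` as it appears in the file, the 80/20 split, and the hyperparameter values are all assumptions made for illustration, not settings taken from the slides.

```python
def predictor_columns(columns, target):
    """Every column except the target is used as a predictor."""
    return [c for c in columns if c != target]


def train_gbm(csv_path, target="Outcome", seed=42):
    """Train an H2O GBM classifier and return (model, AUC on held-out rows).

    Requires the h2o package and a Java runtime; imports are local so the
    helper above can be used without them.
    """
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    h2o.init()
    frame = h2o.import_file(csv_path)

    # Convert the 0/1 target to a factor so H2O treats this as classification.
    frame[target] = frame[target].asfactor()
    predictors = predictor_columns(frame.columns, target)

    # Hold out 20% of the rows for evaluation (assumed split).
    train, test = frame.split_frame(ratios=[0.8], seed=seed)

    # Hyperparameter values are illustrative starting points only.
    model = H2OGradientBoostingEstimator(
        ntrees=100, max_depth=3, learn_rate=0.1, seed=seed
    )
    model.train(x=predictors, y=target, training_frame=train)

    performance = model.model_performance(test_data=test)
    return model, performance.auc()
```

A call such as `train_gbm("diabetes.csv")` (hypothetical file name) follows the same train and evaluate pattern that applies to the other estimators in the list, with a different estimator class swapped in.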
Code
Getting started with H2O
References
DZone AI - Machine Learning with H2O
H2O Documentation
GBM Tuning
AutoML Demo
Q&A
Thank you!

Intro to ML with H2O