Introduction to Automatic
Machine Learning
Meetup
“AI to do AI”
Rafael Coss (Rafael.Coss@h2o.ai)
Director of Community
@racoss
@h2oai
Chris Carpenter (Chris.Carpenter@h2o.ai)
Leobardo Morales (lmorales@mx1.ibm.com)
H2O.ai Meetup Groups
Contact Rafael Coss
community@h2o.ai
If you want to …
- Give a talk about AI /
machine learning use case (it
is a great opportunity to
promote your work)
- Host a joint meetup with
H2O.ai
https://www.meetup.com/pro/h2oai
H2O.ai Community Slack Workspace
•Join the H2O.ai Community Slack Workspace today!
•https://www.h2o.ai/community/driverless-ai-community/#chat
•Use emoji to tag messages
•:question :use_case :mli :get_started :bugs …
•Reply to message using threads
•Check out Community Guide for more info:
•https://tinyurl.com/hac-community-guide
Online Chat to ask questions, discuss use cases, give feedback and more
H2O WORLD SAN FRANCISCO
February 4-5, 2019
Hilton San Francisco Union Square
world.h2o.ai
AI is Transforming the IT Industry
“AI is the fastest growing workload on
the planet”
300%
Increase in AI spend year
over year
“Demand for AI Talent
on the Rise”
200%
Increase in jobs requiring
AI skills
“Businesses are preparing for
the widespread adoption of
machine learning”
9/10
CIOs planning to use
machine learning
H2O.ai HQ
Mountain View
H2O.ai Company Overview
Company Founded in Silicon Valley in 2012
Series C Investors: Wells Fargo, NVIDIA, Nexus Ventures, Paxion Ventures
Products • H2O Open Source Machine Learning (14,000 organizations)
• H2O Driverless AI – Automatic Machine Learning
Leadership Market Leader recognized by Gartner, Forrester, InfoWorld,
Constellation Research
Team 130+ AI experts (Kaggle Grandmasters, Distributed Computing and
Visualization experts)
Global Mountain View, London, Prague, Chennai
World's largest data science community
(over 2 million members)
AI and ML education
Best known for AI competitions
Public datasets
Code and analysis sharing
http://www.kaggle.com
1st, 4th, 48th, 33rd, 25th, 13th
Grandmasters
(their highest ranking)
CONFIDENTIAL
14,000 Companies
using H2O
155,000
data scientists
130K Meet-up Members
H2O World
NYC, London, SF
Growing Worldwide Open Source Community
“Confidential and property of H2O.ai. All rights reserved”
Partner Ecosystem
Strategic Partners
Cloud Providers
HW Vendors
System Integrators
Value Added Resellers
Data Stores
H2O.ai Product Suite
GPU-accelerated
machine learning package
Automatic feature
engineering, machine learning
and interpretability
• 100% open source – Apache V2 licensed
• Built for data scientists – interface using R, Python
on H2O Flow (interactive notebook interface)
• Enterprise Support subscriptions
• Built for domain users, analysts and
data scientists – GUI based interface
for end-to-end data science
• Fully automated machine learning
from ingest to deployment
• Licensed on a per seat basis
(annual subscription)
Open Source
In-memory, distributed
machine learning algorithms
with H2O Flow GUI
-3
H2O AI open source engine
integration with Spark
H2O.ai is a Recognized Leader in AI and ML
2018 Gartner Magic Quadrant
for Data Science and
Machine Learning Platforms
Forrester Wave: Notebook-Based
Predictive Analytics And Machine
Learning Solutions, Q3 2018
Top 3 Artificial Intelligence (AI)
and Machine Learning (ML)
Software Solution
“Technology leadership …
with a distinguished vision”
“the quasi-industry standard”
“its vision of creating an AI and
ML tool that ultimately aims to
allow almost everyone within the
business to create their own
predictive models”
“H2O.ai’s future is automated
machine learning”
“its bright future is in
Driverless AI”
Highly Regarded
by Customers
Dr. Robert Coop
AI and ML Manager
Stanley Black & Decker
“H2O Driverless AI feature
engineering is better than anything
I've seen out there right now. And
the scoring pipeline generation is
probably one of the bigger pluses for
me. These features alone have
provided us with a true competitive
edge in agile manufacturing. It's a
massive time saver.”
1st Generation
Automatic Machine
Learning
What is Data Science?
Clean, transform, filter, aggregate, impute
Convert into X and Y
Problem
Formulation
Data
Processing
Machine
Learning
• Identify a data task or prediction problem
• Collect relevant data
• Train models
• Evaluate models
The Data Science Venn
Diagram
Drew Conway (2010)
The Data Scientist “Unicorn”
What is Automatic Machine Learning
“the automated process of algorithm selection,
feature generation, hyperparameter tuning,
iterative modeling, and model assessment.”
Enabled by advances in computing power at lower cost that
make it possible for machines to try thousands of possible
combinations to find the best one.
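The idea of trying many combinations and keeping the best one can be sketched as a random search. This is a minimal illustration, not H2O's implementation: `validation_error` is a made-up stand-in for training and scoring a model, and the parameter names and ranges are assumptions chosen for the example.

```python
import random

def validation_error(params):
    """Toy stand-in for 'train a model and score it on held-out data'.

    The made-up error surface is lowest near depth=6, learning_rate=0.1.
    """
    return (params["depth"] - 6) ** 2 + 100 * (params["learning_rate"] - 0.1) ** 2

def random_search(n_trials, seed=0):
    """Sample n_trials random parameter combinations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_trials):
        params = {
            "depth": rng.randint(1, 12),
            "learning_rate": rng.uniform(0.01, 0.5),
        }
        error = validation_error(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

best_params, best_error = random_search(n_trials=1000)
print(best_params, round(best_error, 4))
```

With a fixed seed, more trials can only match or improve the best error found, which is why cheaper compute makes this kind of brute-force exploration practical.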
The Evolving Space of Automatic Machine Learning
First Gen (2014-15)
Open source model showdown with feature encoding, automatic hyperparameter tuning, ensembles and model leaderboard
Second Gen (2017-18)
HPC powered evolutionary model development with advanced feature engineering, extensive model explainability
Challenges in AI Model Development
Basic Encoding
Feature Generation
Advanced Encoding
Feature Engineering
Algorithm Selection
Parameter Tuning
Model Building
Model Ensembles
Pipeline Generation
Model Explainability
Model Deployment
Model Documentation
• Time consuming
• Requires advanced
skillset
• Creating new feature
combinations requires
advanced skill
• Time consuming
• Requires advanced
knowledge of
algorithms and
parameters
• Creating ensembles is
an advanced skill
• Time consuming
• Requires different set of skills to
deploy models
• Explaining how models make
decisions is critical to building trust
with business stakeholders and
regulators
The entire process is highly iterative and can take weeks or months to develop a single production-ready model.
H2O AutoML
Different Flavors of AutoML
https://www.h2o.ai/blog/the-different-flavors-of-automl
The Challenges of Enterprise AI Adoption
Time to Insights Slow
Weeks to
Months
Lack of AI Talent
~100
Data Science
“Grandmasters” in the World
Time for a data scientist
to build a model
Lack of Trust in AI
Black box models
“US alone faces a shortage of 190,000 people with analytical expertise.”
2nd Generation
Automatic Machine
Learning
Why Next Generation
Automatic Machine Learning
for the Enterprise
Time to Insight
Months down
to Hours
7 Kaggle Grandmasters
Top 10
Data Science Experts
Automated
GPU-accelerated ML
with IBM AC922
Explainability & Transparency
Trust
In AI
Supervised Learning
Problems Addressed by Driverless AI
• Supervised Learning
• Regression
• Classification
• Tabular Structured Data
• Numeric
• Categorical
• Time / Date
• Text
• Missing Values
• Independent and identically distributed
(iid) rows
• Time-series
• Single time-series
• Grouped time-series
• e.g. Store - Department - Item
• Time-series with gaps between
training and test set to account for time
to deploy
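A gap between the training and test periods can be sketched as follows. This is an illustrative helper, not the Driverless AI API: `split_with_gap`, the dates, and the 3-day gap are hypothetical.

```python
from datetime import date, timedelta

def split_with_gap(rows, train_end, gap_days):
    """Split (date, value) rows into train and test with a gap in between.

    Train keeps dates up to and including train_end. Test keeps dates
    strictly after train_end + gap_days, mimicking the delay between
    training a model and having it deployed.
    """
    test_start = train_end + timedelta(days=gap_days)
    train = [r for r in rows if r[0] <= train_end]
    test = [r for r in rows if r[0] > test_start]
    return train, test

start = date(2019, 1, 1)
rows = [(start + timedelta(days=i), i) for i in range(30)]  # 30 daily observations
train, test = split_with_gap(rows, train_end=date(2019, 1, 20), gap_days=3)
print(len(train), len(test), train[-1][0], test[0][0])
```

Rows that fall inside the gap are used for neither training nor testing, so the evaluation reflects forecasts made some time after the last training observation.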
H2O Driverless AI – Simple, Fast, Accurate, Interpretable
Easy Deployment for
Low Latency Models
• Stand-alone scoring pipeline
that is easy for IT to deploy
and manage
• Easy to update when a new
model version is available
• Streamlined scoring code to
deploy on any device: on the
edge, mobile, …
• Very fast (milliseconds) to
satisfy today’s real-time apps
Fast and
Accurate Results
• “Data Scientist in a Box”
• Simple interface
• Automatic feature engineering
to increase accuracy
• Automatic recipes for solving
wide variety of use-cases
• Automatic tuning to
find and tune the right
ensemble of models
Industry Leading
Interpretability
• Trusted results with
explainability and
transparency
• Interpretability for debugging,
not just for regulators
• Get reason codes and model
interpretability in plain English
• K-Lime, LOCO, partial
dependence and more
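Of the techniques listed, partial dependence is simple enough to sketch in a few lines. The `model` function and the feature names below are hypothetical; this shows the general technique, not how Driverless AI computes it.

```python
def model(row):
    """Hypothetical scoring function standing in for a trained model."""
    return 0.5 * row["age"] + (10.0 if row["late_payments"] > 2 else 0.0)

def partial_dependence(model, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value.

    All other features keep the values they have in `rows`.
    """
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve

rows = [
    {"age": 20, "late_payments": 0},
    {"age": 40, "late_payments": 1},
    {"age": 60, "late_payments": 5},
]
pd_curve = partial_dependence(model, rows, "late_payments", grid=[0, 1, 2, 3, 4])
for value, avg_pred in pd_curve:
    print(value, avg_pred)
```

The resulting curve jumps once `late_payments` exceeds 2, which exposes the threshold hidden inside the toy model without reading its code.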
Automatic Data
Visualization
• Automatic generation of
visualizations and graphs to
explore your data before the
model-building process
• Most relevant graphs shown
for the given data set
• Identify outliers and
missing values
H2O Driverless AI
Customer Use Cases
H2O Driverless AI Delivers Value in Every Industry
Matched 10 years of
machine learning expertise
Financial Services
+6%
Accuracy
Increased customer
satisfaction
Healthcare
Near
perfect
scores
Outperforms alternative
digital marketing
Marketing
2.5x
performance
Accurately predicting supplies
& materials for future orders
Manufacturing
25%
time savings
“Driverless AI is giving
amazing results in terms of
feature and model
performance”
“Driverless AI powers our data
science team to operate at
scale. We have the opportunity
to impact care at large.”
“Driverless AI helped us gain
an edge for our clients. AI to
do AI, truly is improving our
system on a daily basis.”
“H2O Driverless AI feature
engineering is better than
anything I've seen out there
right now.”
Venkatesh Ramanathan
Sr. Data Scientist, PayPal
Martin Stein
Chief Product Officer, G5
Bharath Sudarshan
Dir. of Data Science, ArmadaHealth
Robert Coop
Sr. Data Scientist, SB&D
www.h2o.ai/customer-stories/
www.h2o.ai/company/news/h2o-ai-ibm-vision-banco-machine-learning
Financial Fraud Detection
“Driverless AI is giving
amazing results in terms of
feature and model
performance”
Venkatesh Ramanathan
Senior Data Scientist, PayPal
• Driverless AI matched
10 years of expert
feature engineering
• Increased accuracy
from 0.89 to 0.947 (6%)
in detecting fraudulent
activity
• 6X speed up when
running on an IBM Power
GPU-based server
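The slide does not say whether the 6% is an absolute or a relative improvement. A quick calculation, using only the two scores quoted above, shows that both readings round to roughly 6:

```python
before, after = 0.89, 0.947

absolute_gain = after - before          # difference in score points
relative_gain = absolute_gain / before  # improvement relative to the baseline

print(f"absolute: {absolute_gain:.3f}")
print(f"relative: {relative_gain:.1%}")
```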
Connecting Patients to
Specialists for Better Healthcare
• Companies have seen
“skyrocketing” net promoter
scores and “near perfect”
customer satisfaction rates
• Customer loyalty and
premium retention rates
have increased
• Reduces costs, while
patients receive care faster
“Driverless AI powers our data
science team to operate
efficiently and experiment at
scale. With this latest innovation,
we have the opportunity to
impact care at large.”
Bharath Sudarshan
Director of Data Science and Innovation
Armada Health
Marketing Optimization
for the Real Estate Market
“Driverless AI helped us gain
an edge with our Intelligent
Marketing Cloud for our clients.
AI to do AI, truly is improving our
system on a daily basis.”
Martin Stein
Chief Product Officer
• Outperforms other real
estate digital marketing
solutions by 2.5X
• A G5 client saved $500K
annual digital spend while
increasing web traffic 3X
• 10X faster model creation
Improve Manufacturing
Sales and Forecasting
“H2O Driverless AI feature
engineering is better than anything
I've seen out there right now. And
the scoring pipeline generation is
probably one of the bigger pluses
for me. It's a massive time saver.”
Robert Coop
Sr. Data Scientist
Stanley Black & Decker
• Time savings of 25%
with 1 data scientist
• Saved 1 month of time in
model tuning and training
for industrial product line
• Accurately predicted
supplies and materials
for a future client order
increasing forecast
accuracy
IBM & H2O Driverless AI
Simplifying and Accelerating
Enterprise AI Initiatives
H2O Driverless AI Benefits from the
Power Systems Advantage
High Speed Data Transfer
9.5x
Big Data Scale
2.6x
More RAM Max I/O bandwidth
GPU Accelerated ML
Integrated Systems Approach
Faster on GPUs
30x
H2O Driverless AI on IBM Power Systems
A Winning Combination
High Speed Data Transfer
1.5x
Big Data Scale
2x
Data Ingest
Faster Feature
Engineering
GPU Accelerated ML
Time Series
5x
Integrated Systems Approach
PowerAI
Deep Learning Impact
(DLI) Module
Data & Model
Management, ETL,
Visualize, Advise
IBM Spectrum Conductor with Spark
Cluster Virtualization,
Auto Hyper-Parameter Optimization
PowerAI: Open Source ML Frameworks
Large Model Support (LMS)
Distributed Deep Learning
(DDL)
Auto ML
PowerAI
Enterprise
PowerAI Vision
Auto-DL for Images & Video
Label Train Deploy
Accelerated
Infrastructure
Accelerated Servers Storage
AI for
Data Scientists and
non-Data Scientists
H2O Driverless AI
Auto-ML for Text & Numeric Data, NLP
Import Experiment Deploy
H2O Driverless AI Complements IBM PowerAI Vision
Sensors
Log
Transactional
IBM PowerAI delivers
Deep Learning for Images
H2O Driverless AI is
Automatic Machine Learning
NLP
The H2O Driverless AI
Experience
Driverless AI: Automates Data Science
and Machine Learning Workflows
Driverless AI
H2O Driverless AI: How it Works
Local
Amazon S3
HDFS
X Y
Automatic
Scoring Pipeline
Machine learning
Interpretability
Deploy
Low-latency
Scoring to
Production
Modelling
Dataset
Model Recipes:
• IID data
• Time-series
• More on the way
Advanced
Feature
Engineering
Algorithm Model
Tuning
+ +
Survival of the Fittest
Automatic Machine Learning
Powered by GPU Acceleration
1. Drag and drop data: ingest data from cloud, big data and desktop systems
2. Automatic Visualization: understand the data shape, outliers, missing values, etc.
3. Automatic Machine Learning: use best practice model recipes and the power of high performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning
4. Automatic Scoring Pipelines: deploy ultra-low latency Python or Java Automatic Scoring Pipelines that include feature transformations and models
Google BigQuery
Azure Blob Storage
Snowflake
Model
Documentation
The Driverless AI Experience
1. Import Data
2. Review Auto-Visualizations
3. Start Experiment
4. Review Winning Model
5. Review Model Interpretations
6. Deploy Model
The Driverless AI Experience 2. Review Auto-Visualizations
Quickly start an experiment and benefit
from built-in automation:
• Feature Engineering
• Model Tuning
• Model Selection
The Driverless AI Experience 3. Start Experiment
Feature Engineering
Model Tuning
Quickly Start Experiment
Model Selection
These are the only required
settings – all others are optional
depending on the scenario
It’s Easy to Start an Experiment
Dataset being
used to train
the models
What column
are we trying
to predict?
Should certain
rows of data
have a
higher weight?
Data used to
calculate metrics
for the final model;
not used
during training
Is this a time-
series forecasting
exercise?
Columns to
exclude from
experiment
Data used for
parameter tuning
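The three data roles described above (training data, validation data for parameter tuning, and test data held back for final metrics) can be sketched as a simple random split. The function name, fractions, and seed are assumptions for illustration; Driverless AI takes these datasets as inputs in its interface, and this is not its API.

```python
import random

def three_way_split(rows, valid_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]                    # final metrics only, never trained on
    valid = shuffled[n_test:n_test + n_valid]   # used for parameter tuning
    train = shuffled[n_test + n_valid:]         # used to fit the models
    return train, valid, test

rows = list(range(100))
train, valid, test = three_way_split(rows)
print(len(train), len(valid), len(test))
```

A random split like this only suits iid rows; a time-series experiment would split by date instead.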
Experiment
Settings
• Relative time for completing
the experiment
• Higher settings mean:
• More iterations are
performed to find the
best set of features
• Longer “early stopping”
threshold
Time
• Relative accuracy – higher
values should lead to higher
confidence in model
performance (accuracy)
• Impacts things such as level
of data sampling, how many
models are used in the final
ensemble, parameter tuning
level, among others
Accuracy
• Relative interpretability –
higher values favor more
interpretable models
• The higher the interpretability
setting, the lower the
complexity of the engineered
features and of the
final model(s).
Interpretability
Auto Feature Generation
Kaggle Grandmaster Out of the Box
• Automatic Text Handling
• Frequency Encoding
• Cross Validation
Target Encoding
• Truncated Singular
Value Decomposition
• Clustering and more
Feature Transformations
Examples of
Original Features
Examples of
Generated
Features
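Two of the transformations named above, frequency encoding and cross-validated target encoding, can be sketched in plain Python. The column values, the interleaved fold assignment, and the fallback to the global mean are assumptions made for this example, not a description of Driverless AI's internals.

```python
from collections import Counter

def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]

def cv_target_encode(values, targets, n_folds=2):
    """Out-of-fold target encoding.

    Each row's category is replaced by the mean target of that category,
    computed only from rows in the other folds, so a row never sees its
    own target. Unseen categories fall back to the global mean.
    """
    n = len(values)
    global_mean = sum(targets) / n
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = {}, {}
        for i in range(n):
            if i % n_folds != fold:  # statistics from the other folds
                sums[values[i]] = sums.get(values[i], 0.0) + targets[i]
                counts[values[i]] = counts.get(values[i], 0) + 1
        for i in range(n):
            if i % n_folds == fold:  # encode rows of the held-out fold
                v = values[i]
                encoded[i] = sums[v] / counts[v] if v in counts else global_mean
    return encoded

cities = ["SF", "NY", "SF", "SF", "NY", "LA"]
defaulted = [1, 0, 0, 1, 1, 0]
freq = frequency_encode(cities)
te = cv_target_encode(cities, defaulted)
print(freq)
print(te)
```

Computing the target statistics out of fold is what keeps target encoding from leaking the label into the feature.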
The Driverless AI Experience 4. Review Winning Model
Live Demo
16th (of 2926)
http://h2o.ai
21-day free trial
Easy installation:
Native and Dockerized deployment options
H2O.ai in the Cloud
EMR
KubeFlow
Dataproc
H2O Driverless AI on the Cloud
• Easy setup on any cloud or on premise.
Support for Azure, AWS and Google Cloud
with marketplace offerings.
• Develop more models using H2O Driverless
AI automatic machine learning using high-
performance computing and evolutionary
algorithms to perform time-consuming
data science tasks like feature engineering
and model hyperparameter tuning.
• Leverage your existing ML workbench to
create and deploy streamlined production
models based on insights from Driverless
AI
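The evolutionary approach mentioned above can be illustrated with a toy "survival of the fittest" loop over feature subsets. Everything here is hypothetical: the `fitness` function, the set of useful features, and the mutation scheme are assumptions for the sketch and do not describe Driverless AI's actual algorithm.

```python
import random

USEFUL = frozenset({1, 4, 7})  # hypothetical: features that actually help

def fitness(mask, useful=USEFUL):
    """Toy score: +2 per useful feature selected, -0.5 per feature used."""
    gain = sum(2 for i, bit in enumerate(mask) if bit and i in useful)
    return gain - 0.5 * sum(mask)

def evolve(n_features=10, pop_size=20, generations=30, seed=0):
    """Keep the fitter half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            flip = rng.randrange(n_features)  # mutate one random feature bit
            child[flip] = 1 - child[flip]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the survivors are always carried over unchanged, the best score in the population never decreases from one generation to the next.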
H2O Driverless AI Delivers Automatic ML for the Enterprise
21 day free trial for Driverless AI
• Performs the function of an expert data
scientist
• Create models quickly with GPUs and
Machine Learning automation
• Delivers insights and interpretability
• Created and supported by world
renowned AI experts from H2O.ai
• Award-winning software
Getting Started
• Get the 21 day free trial for Driverless AI
• Don’t have the hardware? Try the Qwiklab cloud training environment
• Go to your favorite cloud: AWS, Azure, Google
• Try the video tutorial or follow the booklet
• Learn how Driverless AI delivers Trust & Explainable AI
• Learn more about NLP and Time-Series in Driverless AI
• Watch Replays from H2O World London 2018
• Watch “Democratizing Intelligence” by Sri Ambati, CEO & Founder
• Learn how PayPal is solving fraud with Driverless AI
• Docs
• H2O Community Slack
Thank you
Rafael Coss (Rafael.Coss@h2o.ai)
Director of Community
@racoss
@h2oai
Chris Carpenter (Chris.Carpenter@h2o.ai)
Leobardo Morales (lmorales@mx1.ibm.com)
Additional sources of information
Docs
• Docs: http://docs.h2o.ai
Slack Community (Public)
• Register
• https://www.h2o.ai/community/driverless-ai-community/#chat
• Guide: http://tinyurl.com/hac-community-guide
Ask questions, discuss use cases, feedback, …
• Driverless AI Trial:
• Walkthrough: http://ibm.biz/H2O-DAI-Power-Video
• Tutorials: https://www.youtube.com/watch?v=5jSU3CUReXY
• Booklet: http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/DriverlessAIBooklet.pdf
BACKUP SLIDES
Video Tutorial on YouTube
https://www.youtube.com/watch?v=5jSU3CUReXY
Hands-on Experiment:
Credit Card Example
• Dataset:
• information on default payments, demographic factors, credit data, history of payment, etc.
• Source: www.kaggle.com/uciml/default-of-credit-card-clients-dataset
• File System:
• CreditCard-train.csv (for training models)
• CreditCard-test.csv (for making new predictions)
• Our Goal:
• Predict whether someone will default on their credit card payment.
• Tutorial:
• http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/DriverlessAIBooklet.pdf
• http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/MLIBooklet.pdf
Learning from Credit Card Data
Features: Education, Marriage, Age, Sex, Repayment Status, Limit Balance ...
Target: Default Payment Next Month (Binary)
Learn the Pattern
Predictions: Probability (0...1)
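How a set of features becomes a probability between 0 and 1 for a binary target can be sketched with a logistic function. The weights, bias, and feature names below are made up for illustration; a real model learns them from the training data, and Driverless AI may use entirely different model types.

```python
import math

# Hypothetical weights for illustration only.
WEIGHTS = {"age": -0.01, "limit_balance_10k": -0.05, "repayment_status": 0.8}
BIAS = -1.0

def default_probability(features):
    """Map a weighted sum of features to a probability in (0, 1)."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

on_time = default_probability(
    {"age": 35, "limit_balance_10k": 5, "repayment_status": 0})
late = default_probability(
    {"age": 35, "limit_balance_10k": 5, "repayment_status": 2})
print(round(on_time, 3), round(late, 3))
```

With these made-up weights, the customer who is two months behind on repayment gets a higher predicted probability of default than the otherwise identical customer who pays on time.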

Introducción al Machine Learning Automático

  • 1.
    Introducción al Machine LearningAutomático Meetup “AI to do AI” Rafael Coss (Rafael.Coss@h2o.ai) Director of Community @racoss @h2oai Chris Carpenter (Chris.Carpenter@h2o.ai) Leobardo Morales (lmorales@mx1.ibm.com)
  • 2.
    H2O.ai Meetup Groups ContactRafael Coss community@h2o.ai If you want to … - Give a talk about AI / machine learning use case (it is a great opportunity to promote your work) - Host a joint meetup with H2O.ai https://www.meetup.com/pro/h2oai
  • 3.
    H2O.ai Community SlackWorkspace •Join the H2O.ai Community Slack Workspace today! •https://www.h2o.ai/community/driverless-ai-community/#chat •Use emoji to tag messages •:question :use_case :mli :get_started :bugs … •Reply to message using threads •Check out Community Guide for more info: •https://tinyurl.com/hac-community-guide Online Chat to ask questions, discuss use cases, give feedback and more
  • 4.
    H2O WORLD SANFRANCISCO February 4-5, 2019 Hilton San Francisco Union Square world.h2o.ai
  • 5.
    AI is Transformingthe IT Industry "AI is the fastest growing workload on the planet” 300% Increase in AI spend year over year “Demand for AI Talent on the Rise” 200% Increase in jobs requiring AI skills “Businesses are preparing for the widespread adoption of machine learning” 9/10 CIOs planning to use machine learning
  • 6.
  • 7.
    H2O.ai Company Overview CompanyFounded in Silicon Valley in 2012 Series C Investors: Wells Fargo, NVIDIA, Nexus Ventures, Paxion Ventures Products • H2O Open Source Machine Learning (14,000 organizations) • H2O Driverless AI – Automatic Machine Learning Leadership Market Leader recognized by Gartner, Forrester, InfoWorld, Constellation Research Team 130+ AI experts (Kaggle Grandmasters, Distributed Computing and Visualization experts) Global Mountain View, London, Prague, Chennai
  • 8.
    Worlds largest datascience community (over 2 million members) AI and ML education Best known for AI competitions Public datasets Code and analysis sharing http://www.kaggle.com 1st 4th 48th33rd25th 13th Grandmasters (their highest ranking)
  • 9.
    CONFIDENTIAL 14,000 Companies using H2O 155,000 datascientists 130K Meet-up Members H2O World NYC, London, SF Growing Worldwide Open Source Community
  • 10.
    “Confidential and propertyof H2O.ai. All rights reserved” Partner Ecosystem Strategic Partners Cloud ProvidersHW Vendors System Integrators Value Added Resellers Data Stores
  • 11.
    H2O.ai Product Suite GPU-accelerated machinelearning package Automatic feature engineering, machine learning and interpretability • 100% open source – Apache V2 licensed • Built for data scientists – interface using R, Python on H2O Flow (interactive notebook interface) • Enterprise Support subscriptions • Built for domain users, analysts and data scientists – GUI based interface for end-to-end data science • Fully automated machine learning from ingest to deployment • Licensed on a per seat basis (annual subscription) Open Source In-memory, distributed machine learning algorithms with H2O Flow GUI -3 H2O AI open source engine integration with Spark
  • 12.
    H2O.ai is aRecognized Leader in AI and ML 2018 Gartner Magic Quadrant for Data Science and Machine Learning Platforms Forrester Wave: Notebook-Based Predictive Analytics And Machine Learning Solutions, Q3 2018 Top 3 Artificial Intelligence (AI) and Machine Learning (ML) Software Solution “Technology leadership … with a distinguished vision” “the quasi-industry standard” “its vision of creating an AI and ML tool that ultimately aims to allow almost everyone within the business to create their own predictive models” “H2O.ai’s future is automated machine learning” “its bright future is in Driverless AI”
  • 13.
    Highly Regarded by Customers Dr.Robert Coop AI and ML Manager Stanley Black & Decker “H2O Driverless AI feature engineering is better than anything I've seen out there right now. And the scoring pipeline generation is probably one of the bigger pluses for me. These features alone have provided us with a true competitive edge in agile manufacturing. It's a massive time saver.”
  • 14.
  • 15.
    What is DataScience? Clean, transform, filter, aggregate, impute Convert into X and Y Problem Formulation Data Processing Machine Learning • Identify a data task or prediction problem • Collect relevant data • Train models • Evaluate models
  • 16.
    The Data ScienceVenn Diagram Drew Conway (2010)
  • 17.
    The Data Scientist“Unicorn”
  • 18.
    What is AutomaticMachine Learning “the automated process of algorithm selection, feature generation, hyperparameter tuning, iterative modeling, and model assessment.” Enabled by advances in computing power at lower cost that make it possible for machines to try thousands of possible combinations to find the best one. Confidential and property of H2O.ai. All rights reserved
  • 19.
    The Evolving Spaceof Automatic Machine Learning 01 02 Open source model showdown with feature encoding, automatic hyper- parameter tuning, ensembles and model leader board First Gen 01 HPC powered evolutionary model development with advanced feature engineering, extensive model explainabilty Second Gen 02 2014-15 2017-18 The picture can't be displayed. Confidential and property of H2O.ai. All rights reserved
  • 20.
    Challenges in AIModel Development Basic Encoding Feature Generation Advanced Encoding Feature Engineering Algorithm Selection Parameter Tuning Model Building Model Ensembles Pipeline Generation Model Explainabilty Model Deployment Model Documentation • Time consuming • Requires advanced skillset • Creating new feature combinations requires advanced skill • Time consuming • Requires advanced knowledge of algorithms and parameters • Creating ensembles is an advanced skill • Time consuming • Requires different set of skills to deploy models • Explaining how models make decisions is critical to building trust with business stakeholders and regulators The entire process is highly iterative and can take weeks or months to develop a single production-ready model. Confidential and property of H2O.ai. All rights reserved
  • 21.
  • 22.
    Different Flavors ofAutoML https://www.h2o.ai/blog/the-different-flavors-of-automl
  • 23.
    The Challenges ofEnterprise AI Adoption Time to Insights Slow Weeks to Months Lack of AI Talent ~100 Data Science “Grandmasters” in the World Time for a data scientist to build a model Lack of Trust in AI Black box models ”US alone faces a shortage of 190,000 people with analytical expertise.”
  • 24.
  • 25.
    Challenges in AIModel Development Basic Encoding Feature Generation Advanced Encoding Feature Engineering Algorithm Selection Parameter Tuning Model Building Model Ensembles Pipeline Generation Model Explainabilty Model Deployment Model Documentation • Time consuming • Requires advanced skillset • Creating new feature combinations requires advanced skill • Time consuming • Requires advanced knowledge of algorithms and parameters • Creating ensembles is an advanced skill • Time consuming • Requires different set of skills to deploy models • Explaining how models make decisions is critical to building trust with business stakeholders and regulators The entire process is highly iterative and can take weeks or months to develop a single production-ready model. Confidential and property of H2O.ai. All rights reserved
  • 26.
    Why Next Generation AutomaticMachine Learning for the Enterprise Time to Insight Months down to Hours 7 Kaggle Grandmasters Top 10 Data Science Experts Automated GPU-accelerated ML with IBM AC922 Explainability & Transparency Trust In AI
  • 27.
  • 28.
    Problems Addressed byDriverless AI 28 • Supervised Learning • Regression • Classification • Tabular Structured Data • Numeric • Categorical • Time / Date • Text • Missing Values • Identically and Independently Distributed (iid) rows • Time-series • Single time-series • Grouped time-series • e.g. Store - Department - Item • Time-series with gaps between training and test set to account for time to deploy
  • 29.
    H2O Driverless AI– Simple, Fast, Accurate, Interpretable Easy Deployment for Low Latency Models • Stand-alone scoring pipeline that is easy for IT to deploy and manage • Easy to update when a new model version is available • Streamlined scoring code to deploy on any device: on the edge, mobile, … • Very fast (milliseconds) to satisfy today’s real-time apps Fast and Accurate Results • “Data Scientist in a Box” • Simple interface • Automatic feature engineering to increase accuracy • Automatic recipes for solving wide variety of use-cases • Automatic tuning to find and tune the right ensemble of models Industry Leading Interpretability • Trusted results with explainability and transparency • Interpretability for debugging, not just for regulators • Get reason codes and model interpretability in plain English • K-Lime, LOCO, partial dependence and more Automatic Data Visualization • Automatic generation of visualizations and graphs to explore your data before the model-building process • Most relevant graphs shown for the given data set • Identify outliers and missing values
  • 30.
  • 31.
    H2O Driverless AIDelivers Value in Every Industry Matched 10 years of machine learning expertise Financial Services +6% Accuracy Increased customer satisfaction Healthcare Near perfect scores Outperforms alternative digital marketing Marketing 2.5x performance Accurately predicting supplies & materials for future orders Manufacturing 25% time savings “Driverless AI is giving amazing results in terms of feature and model performance “ “Driverless AI powers our data science team to operate at scale. We have the opportunity to impact care at large.” “Driverless AI helped us gain an edge for our clients. AI to do AI, truly is improving our system on a daily basis.” “H2O Driverless AI feature engineering is better than anything I've seen out there right now.” Venkatesh Ramanathan Sr. Data Scientist, PayPal Martin Stein Chief Product Officer, G5 Bharath Sudarshan Dir. of Data Science, ArmadaHealth Robert Coop Sr. Data Scientist, SB&D
  • 32.
  • 33.
    Financial Fraud Detection “DriverlessAI is giving amazing results in terms of feature and model performance “ Venkatesh Ramanathan Senior Data Scientist, PayPal • Driverless AI matched 10 years of expert feature engineering • Increased accuracy from 0.89 to 0.947 (6%) in detecting fraudulent activity • 6X speed up when running on an IBM Power GPU-based server
  • 34.
    Connecting Patients to Specialistsfor Better Healthcare • Companies have seen “skyrocketing” net promoter scores and “near perfect” customer satisfaction rates • Customer loyalty and premium retention rates have increased • Reduces costs, while patients receive care faster “Driverless AI powers our data science team to operate efficiently and experiment at scale. With this latest innovation, we have the opportunity to impact care at large.” Bharath Sudarshan Director of Data Science and Innovation Armada Health
  • 35.
    Marketing Optimization for theReal Estate Market “Driverless AI helped us gain an edge with our Intelligent Marketing Cloud for our clients. AI to do AI, truly is improving our system on a daily basis.” Martin Stein Chief Product Officer • Outperforms other real estate digital marketing solutions by 2.5X • A G5 client saved $500K annual digital spend while increasing web traffic 3X • 10X faster model creation
  • 36.
    Improve Manufacturing Sales andForecasting “H2O Driverless AI feature engineering is better than anything I've seen out there right now. And the scoring pipeline generation is probably one of the bigger pluses for me. It's a massive time saver.” Robert Coop Sr. Data Scientist Stanley Black & Decker • Time savings of 25% with 1 data scientist • Saved 1 month of time in model tuning and training for industrial product line • Accurately predicted supplies and materials for a future client order increasing forecast accuracy
  • 37.
    IBM & H2ODriverless AI Simplifying and Accelerating Enterprise AI Initiatives
  • 38.
    H2O Driverless AI Benefits from the Power Systems Advantage
    • High Speed Data Transfer: 9.5x max I/O bandwidth
    • Big Data Scale: 2.6x more RAM
    • GPU Accelerated ML: 30x faster on GPUs
    • Integrated Systems Approach
  • 39.
    H2O Driverless AI on IBM Power Systems: A Winning Combination High Speed Data Transfer 1.5x Big Data Scale 2x Data Ingest Faster Feature Engineering GPU Accelerated ML Time Series 5x Integrated Systems Approach
  • 40.
    PowerAI Enterprise: AI for Data Scientists and non-Data Scientists
    • PowerAI Deep Learning Impact (DLI) Module: Data & Model Management, ETL, Visualize, Advise
    • IBM Spectrum Conductor with Spark: Cluster Virtualization, Auto Hyper-Parameter Optimization
    • PowerAI: Open Source ML Frameworks: Large Model Support (LMS), Distributed Deep Learning (DDL), Auto ML
    • PowerAI Vision: Auto-DL for Images & Video (Label, Train, Deploy)
    • H2O Driverless AI: Auto-ML for Text & Numeric Data, NLP (Import, Experiment, Deploy)
    • Accelerated Infrastructure: Accelerated Servers, Storage
  • 41.
    H2O Driverless AI Complements IBM PowerAI Vision Sensors Log Transactional IBM PowerAI delivers Deep Learning for Images H2O Driverless AI is Automatic Machine Learning NLP
  • 42.
    The H2O Driverless AI Experience
  • 43.
    Driverless AI: Automates Data Science and Machine Learning Workflows
  • 44.
    H2O Driverless AI: How it Works
    Data sources: Local, Amazon S3, HDFS, Google BigQuery, Azure Blob Storage, Snowflake
    1. Drag and drop data: ingest data from cloud, big data and desktop systems.
    2. Automatic Visualization: understand the data shape, outliers, missing values, etc.
    3. Automatic Machine Learning: use best practice model recipes and the power of high performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning. Powered by GPU acceleration.
    4. Automatic Scoring Pipelines: deploy ultra-low latency Python or Java Automatic Scoring Pipelines that include feature transformations and models.
    Diagram labels: Modelling Dataset (X, Y) · Model Recipes (IID data, Time-series, more on the way) · Advanced Feature Engineering + Algorithm + Model Tuning · Survival of the Fittest · Machine Learning Interpretability · Model Documentation · Deploy Low-latency Scoring to Production
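    The "Survival of the Fittest" label refers to an evolutionary style of search: score a population of candidates, keep the best, and refill the population with mutated copies. The following is a minimal toy sketch of that general idea only. It is not Driverless AI's algorithm; the feature count, the set of "useful" features and the scoring rule are all hypothetical stand-ins for a real model-validation score.

```python
import random

# Hypothetical setup: 8 candidate features, of which indices 0, 2 and 5 are "useful".
N_FEATURES = 8
USEFUL = {0, 2, 5}

def fitness(mask):
    """Toy score: reward each useful feature, penalise every other selected one."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & USEFUL) - 0.25 * len(chosen - USEFUL)

def mutate(mask, rng):
    """Return a copy of the mask with one randomly chosen feature flipped."""
    i = rng.randrange(len(mask))
    return mask[:i] + (1 - mask[i],) + mask[i + 1:]

def evolve(generations=40, population_size=12, seed=0):
    """Evolve feature-selection masks and return the fittest one found."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(N_FEATURES))
                  for _ in range(population_size)]
    for _ in range(generations):
        # Keep the fitter half, then refill with mutated copies of survivors.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

    Because the survivors are carried over unchanged, the best score in the population can never get worse from one generation to the next; in a real system the fitness function would be a validation metric rather than a fixed rule.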
  • 45.
    The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
  • 47.
    The Driverless AI Experience 2. Review Auto-Visualizations
  • 48.
    The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
    Quickly start an experiment and benefit from built-in automation:
    • Feature Engineering
    • Model Tuning
    • Model Selection
  • 49.
    The Driverless AI Experience 3. Start Experiment
    Quickly Start Experiment: Feature Engineering, Model Tuning, Model Selection
  • 50.
    It’s Easy to Start an Experiment
    These are the only required settings; all others are optional, depending on the scenario.
    • Dataset being used to train the models
    • What column are we trying to predict?
    • Should certain rows of data have a higher weight?
    • Data used to calculate metrics for the final model; not used during training
    • Is this a time-series forecasting exercise?
    • Columns to exclude from experiment
    • Data used for parameter tuning
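    The settings above can be pictured as a small configuration record. The sketch below is purely illustrative: `ExperimentSetup` and its field names are hypothetical and are not part of any Driverless AI API. It also assumes that the training dataset and the target column are the required pair, which the slide indicates only visually. The file names and target come from the credit card example later in the deck; the `ID` column is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExperimentSetup:
    """Hypothetical container mirroring the settings listed on the slide."""
    training_dataset: str                     # data used to train the models (assumed required)
    target_column: str                        # column to predict (assumed required)
    weight_column: Optional[str] = None       # rows with a higher weight count more
    test_dataset: Optional[str] = None        # final-model metrics only, never trained on
    validation_dataset: Optional[str] = None  # data used for parameter tuning
    time_column: Optional[str] = None         # set only for time-series forecasting
    dropped_columns: List[str] = field(default_factory=list)  # excluded from the experiment

    def is_time_series(self) -> bool:
        """An experiment is treated as time-series when a time column is given."""
        return self.time_column is not None

setup = ExperimentSetup(
    training_dataset="CreditCard-train.csv",
    target_column="Default Payment Next Month",
    test_dataset="CreditCard-test.csv",
    dropped_columns=["ID"],  # assumed identifier column, for illustration
)
print(setup.is_time_series())
```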
  • 51.
    Experiment Settings
    • Time: relative time for completing the experiment. Higher settings mean:
      • More iterations are performed to find the best set of features
      • Longer “early stopping” threshold
    • Accuracy: relative accuracy. Higher values should lead to higher confidence in model performance (accuracy).
      • Impacts things such as level of data sampling, how many models are used in the final ensemble, parameter tuning level, among others
    • Interpretability: relative interpretability. Higher values favor more interpretable models.
      • The higher the interpretability setting, the lower the complexity of the engineered features and of the final model(s)
  • 52.
    Auto Feature Generation: Kaggle Grandmaster Out of the Box
    Feature Transformations:
    • Automatic Text Handling
    • Frequency Encoding
    • Cross Validation Target Encoding
    • Truncated Singular Value Decomposition
    • Clustering and more
    Examples of Original Features · Examples of Generated Features
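    Two of the transformations named above are easy to show by hand. The sketch below is a plain-Python illustration of the general techniques on made-up data. It does not reproduce how Driverless AI implements them, and the alternating fold assignment is chosen only to keep the example short.

```python
from collections import Counter

def frequency_encode(values):
    """Replace each category with the fraction of rows in which it occurs."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

def cv_target_encode(values, targets, n_folds=2):
    """Out-of-fold target encoding.

    Each row is encoded with the mean target of rows that share its category
    but sit in the other folds, so a row never sees its own label. Categories
    with no rows in the other folds fall back to the global mean.
    """
    global_mean = sum(targets) / len(targets)
    encoded = []
    for i, value in enumerate(values):
        fold = i % n_folds  # simple alternating fold assignment
        other = [t for j, (v, t) in enumerate(zip(values, targets))
                 if j % n_folds != fold and v == value]
        encoded.append(sum(other) / len(other) if other else global_mean)
    return encoded

# Hypothetical toy data: a categorical column and a binary target.
cities = ["A", "B", "A", "A", "B", "C"]
defaulted = [1, 0, 0, 1, 1, 0]

print(frequency_encode(cities))
print(cv_target_encode(cities, defaulted))  # [1.0, 1.0, 1.0, 0.5, 0.0, 0.5]
```

    Computing the encoding out of fold is what separates it from a naive per-category mean, which would leak each row's own label into its feature.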
  • 53.
    The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
  • 54.
    The Driverless AI Experience 4. Review Winning Model
  • 55.
    The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
  • 63.
    http://h2o.ai
    21-day free trial
    Easy installation: Native and Dockerized deployment options
  • 64.
    CONFIDENTIAL
    H2O.ai in the Cloud: EMR, KubeFlow, Dataproc
  • 65.
    H2O Driverless AI on the Cloud
    • Easy setup on any cloud or on premises. Support for Azure, AWS and Google Cloud with marketplace offerings.
    • Develop more models with H2O Driverless AI automatic machine learning, which uses high-performance computing and evolutionary algorithms to perform time-consuming data science tasks such as feature engineering and model hyperparameter tuning.
    • Leverage your existing ML workbench to create and deploy streamlined production models based on insights from Driverless AI.
  • 66.
    H2O Driverless AI Delivers Automatic ML for the Enterprise
    21-day free trial for Driverless AI
    • Performs the function of an expert data scientist
    • Create models quickly with GPUs and Machine Learning automation
    • Delivers insights and interpretability
    • Created and supported by world-renowned AI experts from H2O.ai
    • Award-winning software
  • 67.
    Getting Started
    • Get the 21-day free trial for Driverless AI
    • Don’t have the hardware? Try the Qwiklab cloud training environment
    • Go to your favorite cloud: AWS, Azure, Google
    • Try the video tutorial or follow the booklet
    • Learn how Driverless AI delivers Trust & Explainable AI
    • Learn more about NLP and Time-Series in Driverless AI
    • Watch replays from H2O World London 2018
    • Watch “Democratizing Intelligence” by Sri Ambati, CEO & Founder
    • Learn how PayPal is solving fraud with Driverless AI
    • Docs
    • H2O Community Slack
  • 68.
    Thank you
    Rafael Coss (Rafael.Coss@h2o.ai), Director of Community, @racoss @h2oai
    Chris Carpenter (Chris.Carpenter@h2o.ai)
    Leobardo Morales (lmorales@mx1.ibm.com)
  • 69.
    Additional sources of information
    Docs
    • Docs: http://docs.h2o.ai
    Slack Community (Public): ask questions, discuss use cases, feedback, …
    • Register: https://www.h2o.ai/community/driverless-ai-community/#chat
    • Guide: http://tinyurl.com/hac-community-guide
    Driverless AI Trial:
    • Walkthrough: http://ibm.biz/H2O-DAI-Power-Video
    • Tutorials: https://www.youtube.com/watch?v=5jSU3CUReXY
    • Booklet: http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/DriverlessAIBooklet.pdf
  • 74.
    Credit Card Example
    • Dataset:
      • Information on default payments, demographic factors, credit data, history of payment, etc.
      • Source: www.kaggle.com/uciml/default-of-credit-card-clients-dataset
    • File System:
      • CreditCard-train.csv (for training models)
      • CreditCard-test.csv (for making new predictions)
    • Our Goal:
      • Predict whether someone will default on their credit card payment.
    • Tutorial:
      • http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/DriverlessAIBooklet.pdf
      • http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/MLIBooklet.pdf
  • 77.
    Learning from Credit Card Data
    • Features: Education, Marriage, Age, Sex, Repayment Status, Limit Balance ...
    • Learn the Pattern
    • Target: Default Payment Next Month (Binary)
    • Predictions: Probability (0...1)
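    To make the step from features to a probability between 0 and 1 concrete, here is a minimal sketch using a logistic model, the simplest classifier with that output. It is an illustration only: Driverless AI selects and tunes its own algorithms, and the weights, bias and two example customers below are hand-picked, hypothetical values rather than anything learned from the credit card dataset.

```python
import math

def predict_default_probability(features, weights, bias):
    """Logistic model: squash a weighted sum of features into the range (0, 1)."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical hand-picked parameters; a real model learns these from training data.
# Feature order: [months of repayment delay, limit balance in units of 100k].
weights = [0.8, -0.5]
bias = -1.0

on_time_payer = [0, 2.0]  # no delay, high credit limit
late_payer = [3, 0.5]     # three months late, low credit limit

p_on_time = predict_default_probability(on_time_payer, weights, bias)
p_late = predict_default_probability(late_payer, weights, bias)
print(round(p_on_time, 3), round(p_late, 3))
```

    The binary target (default or no default) is what the model is trained against, while the prediction is a probability that can be thresholded or ranked as the use case requires.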