Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow
YouTube Video URL: https://youtu.be/gB0bTH-L6DE
Deploying Machine Learning models to the edge presents significant ML/IoT challenges, centered around the need for low-latency, accurate scoring in resource-constrained environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work well together to solve this challenge. Driverless AI automates the building of accurate Machine Learning models, which are deployed as light-footprint, low-latency Java or C++ artifacts, also known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework that can host MOJOs to make predictions on data moving at the edge.
3. Confidential3
A Sample of Machine Learning Use Cases
Machine Learning predictive algorithms are beginning to “eat the world”
Financial Services: Wholesale / Commercial Banking
• Know Your Customers (KYC)
• Anti-Money Laundering (AML)
Financial Services: Card / Payments Business
• Transaction fraud
• Collusion fraud
• Real-time targeting
• Credit risk scoring
• In-context promotion
Financial Services: Retail Banking
• Deposit fraud
• Customer churn prediction
• Auto-loan
Healthcare and Life Science
• Early cancer detection
• Product recommendations
• Personalized prescription matching
• Medical claim fraud detection
• Flu season prediction
• Drug discovery
• ER and hospital management
• Remote patient monitoring
• Medical test predictions
Telecom
• Predictive maintenance
• Avoidable truck-rolls
• Customer churn prediction
• Improved customer viewing experience
• Master data management
• In-context promotions
• Intelligent ad placements
• Personalized program recommendations
Marketing and Retail
• Funnel predictions
• Personalized ads
• Credit scoring
• Fraud detection
• Next best offer
• Next best customer
• Smart profiling
• Prediction
• Customer recommendations
• Ad predictions and spend
4. Confidential4 Confidential and property of H2O.ai. All rights reserved
ML Model Lifecycle (super high level)
• Data acquisition and prep (Data engineer)
• Model Building (Data scientist): ML algos & techniques → build → predictive model
• Model Deployment (IT / DevOps): predictive model → deploy → software applications
• Business Value: predictive analytics, actionable responses
Callout: significant challenge
5.
ML Challenges: Model Building and Deployment
Model Building (Data scientist)
• Time: Numerous iterations across experiments (coding, algos, hyperparams, feature engineering, scoring metric, imbalanced data, etc.)
• Talent: Skill shortage
• Trust: Is the model biased? Is it overfit? etc.
Model Deployment (IT / DevOps)
• Diverse Targets: Diverse target environments
– Java, C++, Python
– REST server, relational DB, Kafka queue, IoT device, batch, streaming, etc.
• Hand-off of the predictive model: Data scientist to DevOps: what do I do with this? Does Dev need to write logic or data pipeline? Repeatable?
• Latency & Throughput: How many predictions per second can this make?
6.
H2O and the ML Challenge
Model Building (Data scientist) → predictive model → Model Deployment (IT / DevOps)
AutoML: Find best model in shortest amount of time while retaining control
MOJO generated by AutoML
Deployed MOJO
MOJO: Let’s drill down!
Your infrastructure
Your software / integration
Compute / AI heuristics / genetic algorithm based
Code based against massive datasets (TBs)
A different meetup :)
MOJO = Flexible, easy to deploy, low-latency scoring software artifact
Demo today: Deploy to
7.
MOJO: Highly Flexible Deployment Ready Artifact
“Train once, deploy anywhere”
• Flexible: same MOJO deployable to:
– Infrastructure layer: Cloud, On-Prem, Edge, Device
– Runtime: Java, C++, Python
– Data speed: Batch, Realtime/Streaming
– Target deployment: See list at right for examples
• Fast: Low Latency Scoring (typically < 1 ms)
• Familiar Algos: Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), XGBoost, Stacked Ensembles ...
• Can integrate into SDLC tooling & process
MOJO export: Java example
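The latency figure above, and the earlier question of how many predictions per second a model can make, are best checked on the target hardware itself. The following is a minimal sketch of such a measurement. It does not load a real MOJO: `score`, `WEIGHTS`, `BIAS`, and the feature names `x1` and `x2` are hypothetical stand-ins (a tiny GLM-style logistic model) used only so the timing harness has something to call.

```python
import math
import statistics
import time

# Hypothetical stand-in for a deployed scorer (NOT a real MOJO):
# a tiny logistic, GLM-style model with made-up weights.
WEIGHTS = {"x1": 0.8, "x2": -0.3}
BIAS = 0.1


def score(row):
    """Return a probability for one input row (dict of feature -> float)."""
    z = BIAS + sum(WEIGHTS[name] * row[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def benchmark(score_fn, rows, repeats=5):
    """Time every single-row call and summarize latency and throughput."""
    latencies = []
    for _ in range(repeats):
        for row in rows:
            start = time.perf_counter()
            score_fn(row)
            latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "median_latency_ms": statistics.median(latencies) * 1000.0,
        "max_latency_ms": max(latencies) * 1000.0,
        "predictions_per_second": len(latencies) / total if total > 0 else float("inf"),
    }


rows = [{"x1": i / 100.0, "x2": 1.0 - i / 100.0} for i in range(100)]
report = benchmark(score, rows)
print(report)
```

Swapping `score` for a call into the actual scoring runtime would give numbers that are meaningful for a given device; the stand-in only demonstrates the measurement itself.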
8.
Challenges: Deploying Models to the Edge
Edge (more challenging)
● Low compute resources (CPU, memory, storage)
● Minimal higher-level frameworks to tie into (simplicity / barebones)
● Often high-throughput data
● Typical need for fast scoring
Server
● High compute resources (CPU, memory, storage)
● Higher-level frameworks to tie into (e.g. web server, Spark Streaming, UDF)
● Diverse data speeds
● Diverse latency requirements (e.g. low for batch)
Train once, deploy anywhere: the MOJO created by model building deploys flexibly to the full spectrum of targets
9.
MOJO on the edge for Full ML Lifecycle
Model Training: Load Data → Run AutoML → Winning Model Generated (MOJO)
Model Deployment: deploy the MOJO as a direct plugin to Edge & IoT Production Cases (Smartphone, Device, Manufacturing step, Engine Predictive Maintenance)
Scoring: data input → MOJO API → prediction / response
Learning Feedback Loop: Return data (inputs, prediction, Shapley values, etc.) → Scoring history → Analytics (Drift detection) → Retraining
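The feedback loop on this slide names drift detection but not a method. One common choice, used here purely as an illustration and not taken from the slides, is the Population Stability Index (PSI), which compares the distribution of a feature or score at training time with its distribution in recent scoring history. In the sketch below, the function names, the bin count, and the 0.25 retraining threshold (a commonly quoted rule of thumb) are all assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample and a recent sample of one numeric value."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each fraction at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(training_values, recent_values, threshold=0.25):
    """Flag drift when PSI exceeds an assumed rule-of-thumb threshold."""
    return population_stability_index(training_values, recent_values) > threshold


training_values = [i / 100 for i in range(100)]      # baseline from training
recent_same = list(training_values)                  # scoring history, unchanged
recent_shifted = [0.5 + i / 200 for i in range(100)]  # scoring history, shifted upward

print(needs_retraining(training_values, recent_same))
print(needs_retraining(training_values, recent_shifted))
```

In a real loop the two samples would come from the training data and from the inputs or predictions returned by the edge device, and a positive result would trigger the retraining step.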
10.
H2O.ai: AI Platforms
Open Source H2O
In-memory, distributed machine learning algorithms with H2O Flow GUI
• 100% open source – Apache V2 Licensed
• Integration with Apache Spark
• Enterprise support subscriptions
• Interface using R, Python on H2O Flow
H2O Driverless AI
• Automatic feature engineering, machine learning and interpretability
• Fully automated machine learning from ingest to deployment
• User licenses on a per seat basis annually
• GUI-based interface for end-to-end data science
H2O Q
• A new and innovative platform to make your own AI apps
• Enterprise commercial software
• Easy and intuitive platform to have AI answer your question
12.
Model Deployment with H2O + CDF
• CDF can execute embedded H2O ML models to make predictions
• CDF can execute H2O ML models via REST calls to make predictions
• H2O: Driverless AI MOJO Scoring Pipeline, H2O-3 MOJO
• Cloudera: CDF, NiFi, MiNiFi C++, Kafka, Flink, Spark Streaming
• Use cases: real-time scoring, batch scoring
H2O
MOJO
Inside
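The two integration styles on this slide, an embedded model versus a model behind a REST call, can be contrasted in a few lines. This is only a sketch under stated assumptions: `score` is a hypothetical stand-in for a MOJO (a made-up temperature threshold), the `/score` path and JSON payload are invented for illustration, and the tiny HTTP server from the Python standard library stands in for whatever scoring service would actually be deployed.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(row):
    """Hypothetical stand-in for a MOJO: a made-up temperature threshold."""
    return {"label": "degraded" if float(row["temperature"]) > 50.0 else "ok"}


class ScoreHandler(BaseHTTPRequestHandler):
    """Minimal REST wrapper: POST a JSON row, receive a JSON prediction."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        row = json.loads(self.rfile.read(length))
        body = json.dumps(score(row)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example output quiet


def score_embedded(row):
    """Embedded style: the model runs inside the data flow process."""
    return score(row)


def score_via_rest(row, url):
    """REST style: the data flow sends each row to a separate scoring service."""
    request = urllib.request.Request(
        url,
        data=json.dumps(row).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


server = HTTPServer(("127.0.0.1", 0), ScoreHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/score"

row = {"temperature": 61.5}
embedded_result = score_embedded(row)
rest_result = score_via_rest(row, url)
server.shutdown()
server.server_close()
print(embedded_result, rest_result)
```

Both paths return the same prediction; the trade-off is that the embedded call avoids a network hop, while the REST call keeps the model in a separately managed service.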
13.
Model Deployment with Driverless AI + NiFi
• Custom NiFi Processor executes Driverless AI MOJO Scoring Pipeline in Java Runtime to make predictions
• Capable of doing real-time and batch scoring
• Ingest any data source supported by NiFi’s Record Reader
• Output any data format supported by NiFi’s Record Writer
• Example Use Case: Classify Hydraulic Cooling Condition
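The value of the record reader / record writer design described above is that the scoring step never needs to know the input or output format. The actual processor is Java code written against the NiFi API; the sketch below is only a plain-Python analogy of that shape. Every name in it (`run_processor`, `csv_reader`, `json_lines_writer`, `classify_cooling`, the `temperature` and `cool_cond` fields, and the threshold) is hypothetical.

```python
import csv
import io
import json


def classify_cooling(record):
    """Hypothetical stand-in for the MOJO: a made-up temperature threshold."""
    label = "degraded" if float(record["temperature"]) > 50.0 else "ok"
    return {"cool_cond": label}


def csv_reader(text):
    """Pluggable reader: turn CSV content into an iterable of dict records."""
    return csv.DictReader(io.StringIO(text))


def json_lines_writer(records):
    """Pluggable writer: serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def run_processor(content, reader, writer, score_fn):
    """Read records, append the prediction to each, and write them back out."""
    scored = ({**record, **score_fn(record)} for record in reader(content))
    return writer(scored)


incoming = "sensor_id,temperature\ns1,61.5\ns2,40.0\n"
outgoing = run_processor(incoming, csv_reader, json_lines_writer, classify_cooling)
print(outgoing)
```

Supporting another format means passing a different reader or writer function; `run_processor` and the scoring function stay unchanged.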
15.
Model Deployment with Driverless AI + MiNiFi C++
• Custom MiNiFi Processor executes Driverless AI MOJO Scoring Pipeline in Python runtime to make predictions
• Capable of doing real-time and batch scoring
• Ingest any data source supported by H2O’s Python datatable reader
• Output in pandas data format
• Example Use Case: Classify Hydraulic Cooling Condition
16.
Deployment with Driverless AI + Apache Flink
• Custom Flink DataStream Job will execute Driverless AI MOJO Scoring Pipeline in Java Runtime to do real-time scoring
• Custom Flink DataSet Job will execute Driverless AI MOJO Scoring Pipeline in Java Runtime to do batch scoring
• Ingest CSV data source
• Write predictions to CSV
• Example Use Case: Classify Hydraulic Cooling Condition
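The real-time and batch jobs listed above differ in how records arrive, not in how they are scored. The sketch below illustrates that split in plain Python rather than with the Flink API: a generator plays the role of the unbounded stream job and a CSV-to-CSV function plays the role of the bounded batch job. `classify_cooling`, the `temperature` column, the `cool_cond` output column, and the threshold are hypothetical stand-ins for the MOJO and its schema.

```python
import csv
import io


def classify_cooling(row):
    """Hypothetical stand-in for the MOJO: a made-up temperature threshold."""
    return "degraded" if float(row["temperature"]) > 50.0 else "ok"


def score_stream(rows):
    """Real-time style: emit one prediction as soon as each record arrives."""
    for row in rows:
        yield classify_cooling(row)


def score_batch(csv_in, csv_out):
    """Batch style: read a bounded CSV source and write predictions to CSV."""
    writer = csv.DictWriter(csv_out, fieldnames=["cool_cond"], lineterminator="\n")
    writer.writeheader()
    for row in csv.DictReader(csv_in):
        writer.writerow({"cool_cond": classify_cooling(row)})


data = "temperature\n61.5\n40.0\n"

# Streaming: predictions are consumed one at a time from the generator.
stream_predictions = list(score_stream(csv.DictReader(io.StringIO(data))))

# Batch: the whole file is processed and a predictions file is produced.
batch_output = io.StringIO()
score_batch(io.StringIO(data), batch_output)

print(stream_predictions)
print(batch_output.getvalue())
```

Keeping the scoring call identical in both paths mirrors the "train once, deploy anywhere" idea from earlier in the deck: only the surrounding job changes.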
17.
Resources
• dai-deployment-examples/
• GitHub: Apache NiFi
• GitHub: Apache NiFi - MiNiFi C++
• Driverless AI Tutorials
• Driverless AI MOJO Docs
• GitHub: H2O-3
• H2O-3 MOJO Docs
• YouTube: MiNiFi Custom Processor for Running the MOJO in MiNiFi Data Flow
• NiFi Contributor Guide
• MiNiFi C++ Contributor Guide
• Contributing to H2O.ai Tutorials
• Contributing to H2O-3
18.
H2O.ai Learning Center
What?
• Self-paced tutorials
• Instructor-led courses
– AI and ML Foundations (Free)
• Knowledge Achievement: Badges
H2O.ai Aquarium
• Cloud H2O.ai learning environments
• Driverless AI, H2O-3, Sparkling Water, DataTable