Accelerate Your Machine Learning Pipeline with AutoML and MLflow
Elena Boiarskaia, Senior Data Scientist at H2O.ai
Eric Gudgion, Senior Principal Solutions Architect at H2O.ai
Building ML models is a time-consuming endeavor that requires a thorough understanding of feature engineering, selecting useful features, choosing an appropriate algorithm, and performing hyperparameter tuning. Extensive experimentation is required to arrive at a robust and performant model. Additionally, keeping track of the models that have been developed and deployed may be complex. Solving these challenges is key for successfully implementing end-to-end ML pipelines at scale.

In this talk, we will present a seamless integration of automated machine learning within a Databricks notebook, thus providing a truly unified analytics lifecycle for data scientists and business users with improved speed and efficiency. Specifically, we will show an app that automatically generates and executes a Databricks notebook to train an ML model with H2O's Driverless AI. The resulting model will be automatically tracked and managed with MLflow. Furthermore, we will show several deployment options to score new data on a Databricks cluster or with an external REST server, all within the app.
1. Accelerate Your Machine Learning Pipeline with AutoML and MLflow
Elena Boiarskaia, Senior Data Scientist at H2O.ai
Eric Gudgion, Senior Principal Solutions Architect at H2O.ai
2. Agenda
§ Challenges with ML
§ Integration solution
§ Data Scientist workflow
§ Live Demo
§ Dev Ops workflow
3. Challenges with Machine Learning
• Feature Engineering
▪ Feature selection
▪ Feature transformation
▪ Select which feature transformers work
▪ Reuse engineered features
▪ Select scoring metric
• Model Training
▪ Algorithm selection
▪ Hyperparameter tuning
▪ Ensemble methods
• Model Deployment
▪ Track experiments
▪ Select which model to deploy
▪ Deploy models in a variety of environments
• Presenting Results
▪ User-friendly interface
▪ Build visualizations and dashboards
▪ Update visuals with model predictions
4. Solution via Integration
• Feature Engineering (Driverless AI)
▪ Automated feature selection
▪ Automated feature engineering
▪ Custom feature transformers, models, and scoring metrics
• Model Training (Driverless AI)
▪ Genetic algorithm
▪ Algorithm selection
▪ Hyperparameter tuning
▪ Explainability
▪ Standalone model object (MOJO)
• Model Deployment (MLflow)
▪ Track experiments
▪ Reproducible projects
▪ Model deployment
▪ Model registry
• Presenting Results (Wave)
▪ Rapid app development
▪ Python based
▪ Create dashboards and visualizations
▪ Real-time apps connected to models and data sources
5. Data Scientist Workflow
6. Example Notebook Workflow
▪ Manage: store and manage data with Databricks Delta
▪ Prepare: prepare data with Spark on Databricks
▪ Train: send data to Driverless AI and train a model
▪ Log: log the Driverless AI model in MLflow
▪ Score: score new data with the Driverless AI model on Databricks
▪ Update: update the table with Databricks Delta
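The six steps above run as one linear notebook. A stub sketch of the control flow only; every function here is a placeholder, not the Databricks, MLflow, or Driverless AI client API:

```python
# Placeholder implementations so the sequence is runnable end to end.
def load_delta_table(name):            return [{"x": 1, "y": 0}, {"x": 2, "y": 1}]     # Manage
def prepare_with_spark(rows):          return [r for r in rows if r["x"] is not None]  # Prepare
def train_driverless_ai(rows):         return {"model_id": "dai-experiment-1"}         # Train
def log_to_mlflow(model):              return {"run_id": "run-1", **model}             # Log
def score_on_databricks(model, rows):  return [{**r, "pred": r["x"] > 1} for r in rows]  # Score
def update_delta_table(name, rows):    return len(rows)                                # Update

raw = load_delta_table("loans")
clean = prepare_with_spark(raw)
model = train_driverless_ai(clean)
run = log_to_mlflow(model)
scored = score_on_databricks(model, clean)
written = update_delta_table("loan_scores", scored)
print(run["run_id"], written)  # run-1 2
```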
7. Live Demo
8. Example Wave App Workflow
▪ Import data
▪ Set up Driverless AI experiment
▪ Automatically generate a Databricks notebook to run and log the experiment
▪ Send the notebook to a Databricks cluster to run
▪ The notebook trains the Driverless AI model and logs it in MLflow
9. Dev Ops Workflow
10. Model Deployment Questions to Consider
• Type: batch or real-time
• Scoring latency: SLA, data size, concurrent requests
• Monitoring: errors, metrics
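The three questions above can be read as a rough decision rule. A minimal sketch; the thresholds and the function name are illustrative assumptions, not guidance from the talk:

```python
def choose_deployment(max_latency_ms: float, rows_per_request: int,
                      concurrent_requests: int) -> str:
    """Pick a scoring pattern from rough SLA / data-size / concurrency answers.

    Illustrative thresholds only; real choices depend on the workload.
    """
    # Tight per-request SLAs rule out spinning up batch jobs.
    if max_latency_ms < 1000:
        return "real-time (internal or REST scorer)"
    # Very large row counts with no interactive caller favor batch scoring.
    if rows_per_request > 1_000_000 and concurrent_requests <= 1:
        return "batch (external batch scorer)"
    return "real-time (REST scorer behind a load balancer)"

print(choose_deployment(max_latency_ms=50, rows_per_request=1, concurrent_requests=200))
print(choose_deployment(max_latency_ms=3_600_000, rows_per_request=10_000_000, concurrent_requests=1))
```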
11. Import Notebook
• Auto-generated
• Integrates with an existing pipeline
• Easy four-step process
• Notebook can be scheduled via the UI
12. Simple Template Example
• Library provided for different environments
• Executes scoring using C++ code
13. Internal Scorer
• Invoke call from Databricks worker nodes
• Notebook-friendly API
• Create a DataFrame
• Specify the model to score
• Returns a DataFrame with predictions
[Diagram: Data Warehouse feeds the MOJO scorer, which returns predictions]
14. Example Execution
• Fast…
• Data loaded into a DataFrame
• Model specified
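The DataFrame-in / DataFrame-out call shape of the internal scorer can be mocked in plain Python. `load_mojo` and `score_frame` below are illustrative stand-ins, not the real H2O scoring API, and the model logic and path are invented:

```python
from typing import Callable

def load_mojo(path: str) -> Callable[[dict], float]:
    """Stand-in for loading a MOJO pipeline; the real scorer runs
    compiled C++ code on each Spark worker."""
    return lambda row: 0.8 * row["credit_score"] / 850 + 0.2 * (row["income"] > 50_000)

def score_frame(rows: list, model: Callable[[dict], float]) -> list:
    """Return the input rows with a 'prediction' column appended,
    mirroring the DataFrame-in / DataFrame-out API on the slide."""
    return [{**row, "prediction": model(row)} for row in rows]

mojo = load_mojo("/models/pipeline.mojo")  # hypothetical path
scored = score_frame(
    [{"credit_score": 700, "income": 60_000},
     {"credit_score": 500, "income": 30_000}],
    mojo,
)
print(scored[0]["prediction"])
```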
15. External REST API Scorer
• Invoke call from Databricks worker nodes
• Notebook-friendly API
• Create a DataFrame
• Specify the model to score
• Call the endpoint
• Call multiple models in one call
• Returns a DataFrame per model
• Handy to test a new vs. an old model
[Diagram: Data Warehouse and other model users call a REST endpoint backed by the MOJO, which returns predictions]
16. Example Execution
• REST endpoint runs in the customer's VPC
• Feeds into model monitoring
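Calling the REST scorer from a worker amounts to posting rows as JSON and reading one result set back per model. A stdlib-only sketch; the endpoint URL, payload shape, and field names are assumptions rather than the documented API:

```python
import json
from urllib.request import Request

ENDPOINT = "https://scorer.example.internal/score"  # hypothetical URL inside the customer's VPC

def build_request(rows: list, models: list) -> Request:
    """Build a POST asking the endpoint to score rows with several models
    at once (handy for new-vs-old model comparisons)."""
    body = json.dumps({"models": models, "rows": rows}).encode()
    return Request(ENDPOINT, data=body, headers={"Content-Type": "application/json"})

def parse_response(body: bytes) -> dict:
    """Split the response into one list of scored rows per model."""
    return json.loads(body)["predictions"]

req = build_request([{"age": 41}], models=["champion", "challenger"])
# Canned bytes standing in for urlopen(req).read():
canned = json.dumps({"predictions": {"champion": [{"age": 41, "prediction": 0.7}],
                                     "challenger": [{"age": 41, "prediction": 0.65}]}}).encode()
per_model = parse_response(canned)
print(sorted(per_model))  # ['challenger', 'champion']
```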
17. External Batch Scorer
• Use Databricks as a JDBC data source
• Supports Delta tables and any tables accessible with SQL
• Connects via a JDBC connection
[Diagram: a batch job reads from the Data Warehouse over JDBC and scores with the MOJO across multiple CPUs]
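The batch loop (read a chunk over JDBC, score it, write predictions back) can be sketched with sqlite3 standing in for the JDBC-accessible table; the table and column names and the scoring logic are invented:

```python
import sqlite3

# In-memory database standing in for the Delta/SQL table reached over JDBC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO loans (id, amount) VALUES (?, ?)",
                 [(1, 1000.0), (2, 250.0), (3, 9000.0)])
conn.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, score REAL)")

def score(amount: float) -> float:
    # Stand-in for the MOJO: flag large loans.
    return 1.0 if amount > 5000 else 0.0

# Read in fixed-size batches, score each batch, write results back.
cursor = conn.execute("SELECT id, amount FROM loans ORDER BY id")
while True:
    batch = cursor.fetchmany(2)  # small batch size to show the loop
    if not batch:
        break
    conn.executemany("INSERT INTO predictions (id, score) VALUES (?, ?)",
                     [(row_id, score(amount)) for row_id, amount in batch])

print(conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0])  # 3
```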
18. Saving Results
§ Update an existing table or insert into a new table
§ Update method
▪ Single row
▪ Bulk upload
§ Catching runtime data changes
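Both update styles reduce to the same SQL write. A sketch using sqlite3 with invented table names, showing a bulk upsert plus a row-count check as a crude guard against runtime data changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, score REAL)")
conn.execute("INSERT INTO results VALUES (1, 0.2)")  # stale score to be updated

scored = [(1, 0.9), (2, 0.4)]  # id 1 already exists, id 2 is new

# Bulk upload: one executemany writes every row; a single-row update method
# would instead loop and call execute() once per row.
conn.executemany("INSERT OR REPLACE INTO results (id, score) VALUES (?, ?)", scored)

# Crude check for runtime data changes: every scored row should have landed.
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
assert count == 2, "row count drifted while scoring"
print(conn.execute("SELECT score FROM results WHERE id = 1").fetchone()[0])  # 0.9
```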
19. Conclusion
Databricks and H2O.ai integration offers:
• End-to-end pipeline from data management to model deployment
• Highly scalable model training and scoring
• Leverage advanced automated ML with Driverless AI
• Advanced feature engineering and feature selection
• Highly accurate and explainable models
20. Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.