MACHINE LEARNING MODEL DEPLOYMENT
From Strategy to Implementation
2 © Cloudera, Inc. All rights reserved.
ABOUT ME
Justin Norman
Director, Data Science & Research Services
• Head of Cloudera’s Fast Forward Labs ML research and consulting team
• Built and scaled numerous production ML systems and teams spanning government, B2B, and consumer organizations
• Tech blogger. Musician. Twitter: @justinJDN
ABOUT ME
Sagar Kewalramani
Solutions Architect, Professional Services
• Cloudera Strategic Solutions Architect focused on data science and machine learning
• Developed and deployed models across diverse verticals such as finance, healthcare, and more
• Frequent speaker at big data conferences, including O’Reilly Strata
ML IS EVERYWHERE
• Google predicts commute times.
• Apple predicts facial matches.
• Dozens of other ML-powered models in your phone today.
Google didn’t set out to make a traffic tool. Apple isn’t in the facial recognition business.
ML IS AT THE HEART OF TRANSFORMATION
[Figure: a stack of layers from "Big Data" at the base through analytics, data science, and machine learning up to AI. The base is deterministic ("What happened?"); the top is probabilistic ("What could happen?").]
WHAT IS PRODUCTION ML?
[Figure: business inputs flow through data engineering and data science into production machine learning, which covers packaging, pipeline hardening (data engineering), model hardening, deployment, and monitoring. Model security, model governance, and the data, model, and feature catalogs span the whole flow.]
WHICH TEAM ROLES ARE INVOLVED?
[Figure: data engineering, data science, and production ML roles spread across these activities: data prep, pipelines, data modeling, data transformation, data ingest, job monitoring, training, data discovery, job tuning, experimentation, prototyping, model deployment, model monitoring, and data monitoring.]
WHAT ARE THE KEY SKILLS?
• Big data platform
• ML/AI frameworks
• Container infrastructure
• Orchestration
WHAT IS A MODEL ANYWAY?
A model, which can take many forms, is an algorithm designed to make predictions from input data.
[Figure: upstream systems feed the model in batch or as a stream; the model emits {key, value} pairs (a prediction plus metadata) to business systems and to monitoring.]
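The {key, value} contract in the figure can be made concrete with a minimal Python sketch. It assumes each upstream record is a dict with hypothetical `id` and `features` fields, and `toy_predict` is a hypothetical stand-in for a trained model. The same generator scores a list (batch) or an unbounded iterator (stream) and attaches metadata that monitoring can use later.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, Iterator, List


@dataclass
class ScoredRecord:
    key: str                  # identifier carried over from the upstream record
    prediction: float         # the model's output (the "value")
    metadata: Dict[str, Any]  # e.g. model version and scoring time, for monitoring


def score_stream(records: Iterable[Dict[str, Any]],
                 predict: Callable[[List[float]], float],
                 model_version: str) -> Iterator[ScoredRecord]:
    """Score records one at a time; a list acts as a batch, a generator as a stream."""
    for rec in records:
        yield ScoredRecord(
            key=rec["id"],
            prediction=predict(rec["features"]),
            metadata={"model_version": model_version,
                      "scored_at": datetime.now(timezone.utc).isoformat()},
        )


def toy_predict(features: List[float]) -> float:
    """Hypothetical model: a fixed linear function of two features."""
    return 0.5 * features[0] + 2.0 * features[1]


batch = [{"id": "a", "features": [2.0, 1.0]}, {"id": "b", "features": [0.0, 3.0]}]
results = list(score_stream(batch, toy_predict, model_version="v1"))
print([(r.key, r.prediction) for r in results])  # [('a', 3.0), ('b', 6.0)]
```

Keeping the model behind a plain `predict` callable means the same wrapper works whether records arrive from a nightly file or a message queue.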
HIDDEN TECHNICAL DEBT IN ML SYSTEMS
From the Google paper "Hidden Technical Debt in Machine Learning Systems"
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Only a small fraction of a real-world ML system is ML code; in the paper’s figure it is the small black box in the middle. The surrounding infrastructure it requires is vast and complex.
SAMPLE DATA SCIENCE / ML WORKFLOW
From Data Exploration to Action
CHALLENGES
Tools, Platforms, Data
CHALLENGES
Recipes, not Cakes
Recode
Deployment expectations:
• Support A/B testing
• Support experiments
• Support measuring and evaluating model performance
• Deployment should be fast and adaptive to business needs
SUMMARY OF CHALLENGES
• Access: For sensitive data, secure clusters are difficult to access, and there is no shared security.
• Flexibility: IT typically doesn’t want arbitrary packages installed on a secure cluster.
• Tools: Popular open-source tools don’t easily connect to these environments or don’t always support Hadoop data formats. Nothing supports the full workflow.
• Scale: Laptops rarely have capacity for medium data, let alone big data, which leads to a lot of sampling.
• Parallelism: Popular frameworks don’t easily parallelize on a cluster, so code typically has to be rewritten for production.
• Security: Data gets pulled onto laptops.
• Developer experience: Notebooks, while awesome, don’t easily support virtual environments and dependency management, especially for teams.
• Collaboration: There is no easy way to share code between teams.
• Deployment: Notebooks are also challenging to “put into production.”
MACHINE LEARNING AT UBER, NETFLIX, AND FACEBOOK
Industrialized AI requires new supporting tools and platforms
• Facebook: FBLearner
• Uber: Michelangelo
• Netflix: Recommendation Platform
ML AT SCALE REQUIRES A UNIFIED DATA STRATEGY
[Figure: streaming and batch ingest feed machine learning, data engineering, data warehouse, and operational database workloads, which serve machine learning tools, BI tools and SQL editors, and data products. A shared layer provides data, metadata, security, governance, and workload management.]
YOU’VE GOT OPTIONS…
Model Dev, Training, Deployment & Monitoring
MODEL DEVELOPMENT
EVERYONE HAS AN OPINION
• Should enable collaboration and code reuse (Git integration)
• Should support open-source frameworks and libraries
• Must handle dependencies and isolate the development environment for each individual session
• Can scale compute resources up or down when needed
• Doesn’t require you to move data to use it!
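Per-session dependency isolation can be sketched with Python’s standard-library `venv` module. This is a minimal sketch under stated assumptions: the `session-<id>` directory layout and the `create_session_env` helper are hypothetical, and a real platform would typically isolate sessions at the container level as well and then install each session’s own packages into its environment.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_session_env(session_id: str, root: Path) -> Path:
    """Create a fresh, isolated virtual environment for one session (hypothetical helper)."""
    env_dir = root / f"session-{session_id}"
    builder = venv.EnvBuilder(
        clear=True,                  # start from an empty environment every time
        symlinks=(os.name != "nt"),  # symlink the interpreter on POSIX, copy on Windows
        with_pip=False,              # skip bootstrapping pip to keep the sketch fast
    )
    builder.create(env_dir)
    return env_dir


root = Path(tempfile.mkdtemp())
env_a = create_session_env("alice-1", root)
env_b = create_session_env("bob-1", root)
# Each session gets its own interpreter directory and its own site-packages.
print(sorted(p.name for p in root.iterdir()))  # ['session-alice-1', 'session-bob-1']
```

With `with_pip=True`, each environment could install its own pinned packages without affecting any other session.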
TRAINING & EXPERIMENTS
A/B TESTING & MULTIVARIATE TESTING FOR THE MODEL
Is the best trained model indeed the best model, or does a different model perform better on new, unseen data?
[Figure: incoming traffic is split between model variation A and model variation B.]
Data scientists need:
• A framework to identify the best performers among a set of competing models
• A way to evaluate which models maximize business KPIs
• Tracking of specified model metrics, performance, and model artifacts
• The ability to inspect and compare deployed models
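The traffic split in the figure can be sketched in Python, assuming users are identified by a string ID and the business KPI is a yes/no conversion. `assign_variant`, `ABTest`, and the two toy models are hypothetical. Hashing the user ID keeps each user on the same variant across requests; a real test would also check statistical significance before declaring a winner.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def assign_variant(user_id: str, share_b: float = 0.5) -> str:
    """Deterministically route a user to variant "A" or "B" by hashing their ID."""
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 10_000
    return "B" if bucket < share_b * 10_000 else "A"


class ABTest:
    def __init__(self, models: Dict[str, Callable[[float], float]], share_b: float = 0.5):
        self.models = models
        self.share_b = share_b
        self.outcomes: Dict[str, List[int]] = defaultdict(list)

    def predict(self, user_id: str, features: float) -> Tuple[str, float]:
        """Serve the request with the user's assigned model variant."""
        variant = assign_variant(user_id, self.share_b)
        return variant, self.models[variant](features)

    def record(self, variant: str, converted: bool) -> None:
        """Log the business outcome (the KPI) observed after serving a prediction."""
        self.outcomes[variant].append(1 if converted else 0)

    def kpi(self) -> Dict[str, float]:
        """Conversion rate per variant."""
        return {v: sum(o) / len(o) for v, o in self.outcomes.items() if o}


ab = ABTest({"A": lambda x: 1.0 * x, "B": lambda x: 1.1 * x})
variant, score = ab.predict("user-42", 10.0)
for v, converted in [("A", True), ("A", False), ("B", True), ("B", True)]:
    ab.record(v, converted)
print(ab.kpi())  # {'A': 0.5, 'B': 1.0}
```

Setting `share_b` below 0.5 lets a challenger model receive only a small slice of traffic while the incumbent serves the rest.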
EXPERIMENT MANAGEMENT
Versioned, reproducible model training & evaluation runs
Data scientists need to:
• Create a snapshot of the model code, dependencies, and configuration necessary to train the model
• Build and execute the training run in an isolated container
• Track specified model metrics, performance, and model artifacts
• Inspect, compare, or deploy prior models
Many options exist, of varying maturity, and they don’t all play well with other ecosystem tools. They range from proprietary to open-source (e.g., Sacred).
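The snapshot, track, and compare loop can be sketched in plain Python, assuming a run is described by its code text, a config dict, and a metrics dict. `record_run` and `best_run` are hypothetical helpers that write JSON files to a local directory. Running the build inside an isolated container, and tools such as Sacred, are out of scope for this sketch.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def record_run(runs_dir: Path, code: str, config: dict, metrics: dict) -> Path:
    """Write one versioned, self-describing record of a training run."""
    snapshot = {
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),  # exact code version
        "config": config,    # hyperparameters, data paths, dependency versions, ...
        "metrics": metrics,  # whatever the run chose to track
        "created_at": time.time(),
    }
    run_id = hashlib.sha256(
        json.dumps(snapshot, sort_keys=True).encode("utf-8")
    ).hexdigest()[:12]
    path = runs_dir / f"run-{run_id}.json"
    path.write_text(json.dumps(snapshot, indent=2, sort_keys=True))
    return path


def best_run(runs_dir: Path, metric: str) -> dict:
    """Load every prior run and return the one with the highest value of `metric`."""
    runs = [json.loads(p.read_text()) for p in runs_dir.glob("run-*.json")]
    return max(runs, key=lambda run: run["metrics"][metric])


runs_dir = Path(tempfile.mkdtemp())
train_code = "model = fit(data, lr=config['lr'])"  # placeholder for the real training script
record_run(runs_dir, train_code, {"lr": 0.1}, {"accuracy": 0.81})
record_run(runs_dir, train_code, {"lr": 0.01}, {"accuracy": 0.86})
print(best_run(runs_dir, "accuracy")["config"])  # {'lr': 0.01}
```

Because every record carries the code hash and full config, the winning run can be traced back to exactly what produced it before it is deployed.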
MODEL DEPLOYMENT
MODEL DEPLOYMENT PATTERNS
Knowing how business metrics will be improved helps guide deployment options
• Managers use data to make better decisions → Batch scoring, hosted
• Centrally automate internal decisions → Real-time scoring, hosted
• Centrally automate customer-facing decisions → Real-time scoring, data flow + custom monitoring
• Automate decisions at the edge → Real-time scoring, device embedded
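The batch pattern in the first row can be sketched as a function that reads a file of records, scores each one, and writes the results for downstream reports. This minimal sketch assumes CSV input with hypothetical `id` and `support_calls` columns, and `churn_score` is a hypothetical rule standing in for a trained model. The real-time patterns would call the same scoring logic per request instead of per file.

```python
import csv
import io
from typing import Callable, Dict, TextIO


def batch_score(in_file: TextIO, out_file: TextIO,
                predict: Callable[[Dict[str, str]], float]) -> int:
    """Score every CSV row and write `id,prediction` rows; returns the number scored."""
    reader = csv.DictReader(in_file)
    writer = csv.writer(out_file)
    writer.writerow(["id", "prediction"])
    scored = 0
    for row in reader:
        writer.writerow([row["id"], predict(row)])
        scored += 1
    return scored


def churn_score(row: Dict[str, str]) -> float:
    """Hypothetical rule standing in for a trained churn model."""
    return 1.0 if int(row["support_calls"]) > 3 else 0.0


# A scheduled job would open real files; StringIO keeps the sketch self-contained.
source = io.StringIO("id,support_calls\nc1,5\nc2,1\n")
sink = io.StringIO()
count = batch_score(source, sink, churn_score)
print(count, sink.getvalue().splitlines())  # 2 ['id,prediction', 'c1,1.0', 'c2,0.0']
```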
MODEL DEPLOYMENT APPROACH: TECHNOLOGICAL VS. COST BENEFITS
Different model deployment formats:
• Native Java/C++ model: faster; limited available algorithm/data science libraries
• Hybrid approach (PMML): compatible across multiple tools; not agile; not flexible in terms of deployment
• Python stack: PMML files are big; unit testing is tricky
• API-powered model: agile; scalable; usable by both backend and frontend; faster
[Chart: native Java/C++ models, rebuilding the whole stack in Python, the hybrid PMML approach, and API-powered models, plotted by cost ($) against technological benefits.]
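The API-powered option can be sketched with only the Python standard library: a handler that accepts a JSON body, calls the model, and returns JSON, so any backend or frontend client that speaks HTTP can use it. The `predict` function, the `x1`/`x2` fields, and the single-threaded `http.server` setup are assumptions for illustration. A production service would add input validation, a production-grade server, and monitoring.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def predict(features: dict) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    return 0.5 * features["x1"] + 2.0 * features["x2"]


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # JSON in: read and parse the request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # JSON out: return the prediction as a JSON object.
        body = json.dumps({"prediction": predict(payload)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


server = HTTPServer(("127.0.0.1", 0), ScoringHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/"
req = Request(url, data=json.dumps({"x1": 2.0, "x2": 1.0}).encode("utf-8"),
              headers={"Content-Type": "application/json"}, method="POST")
with urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'prediction': 3.0}

server.shutdown()
server.server_close()
```

Because the contract is just JSON over HTTP, the model can be retrained or swapped behind the endpoint without changing its callers.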
MONITORING
MONITORING STATS
SCHEDULE & MONITOR
Production ML needs...
● A monitoring mechanism that is model-agnostic
● Instrumentation of both the data flowing in and the model performance metrics flowing out
● Collection of performance metrics (e.g., accuracy, RMSE, mean absolute error (MAE))
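A model-agnostic monitor needs only logged inputs, predictions, and eventually observed actuals, never the model itself. This minimal Python sketch computes the metrics named above plus a simple input-drift signal. The `mean_shift` check (how far a feature’s live mean has moved, in units of its training-time standard deviation) is an assumed, deliberately simple stand-in for a proper drift test, and the baseline numbers are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error of regression predictions."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error of regression predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of classification predictions that match the actual label."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def mean_shift(values: Sequence[float], baseline_mean: float, baseline_std: float) -> float:
    """How far the live mean of an input feature has moved, in baseline standard deviations."""
    return (mean(values) - baseline_mean) / baseline_std


# Metrics out: predictions compared with actuals once they are known.
print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # about 1.19
print(mae([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))   # about 0.833
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))    # 0.5
# Data in: a feature whose training mean was 10 with standard deviation 2.
print(mean_shift([12.0, 14.0, 13.0], baseline_mean=10.0, baseline_std=2.0))  # 1.5
```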
CLOUDERA ML APPROACH
Modern enterprise platform, tools, and expert guidance to add SPEED and SCALE
• Agile platform to build, train, and deploy many scalable ML applications
• Enterprise data science tools to accelerate team productivity
• Expert guidance, services & training to fast-track value & scale
ACCELERATING THREE STAGES OF MACHINE LEARNING
Enterprise AI platform supporting model development, training, and deployment
• Develop: explore data, develop models, share results
• Train: optimize parameters, track experiments, compare performance
• Deploy: manage models, deploy models, monitor performance
ACCELERATING MACHINE LEARNING
Lego block for ML: like a containerized edge node
• Sessions: interactive sessions for exploration and development
• Experiments: initiate and track runs, like a lab notebook; export artifacts to the project
• Models: wrap with a REST endpoint for online scoring (JSON in, JSON out)
• Jobs: scheduled; run a particular piece of code end to end
• Snapshots: point-in-time Git snapshots; new snapshots retain history
Runtime
• Engine: kernels (R/Python/Scala) and common libraries
• File system mounts: CDH parcel directory; RPM Hadoop config files
• Project directory: code, files, libraries, dependencies
DEMO
SELF-SERVICE
CLOUDERA DATA SCIENCE WORKBENCH
CLOUDERA DATA SCIENCE WORKBENCH
Bringing the data scientists TO the data in a way that they want to work
For data scientists
• Experiment faster: Use R, Python, or Scala with on-demand compute and secure CDH/HDP data access
• Work together: Share reproducible research with your whole team
• Deploy with confidence: Get to production repeatably and without recoding
For IT professionals
• Bring data science to the data: Give your data science team more freedom while reducing the risk and cost of silos
• Secure by default: Leverage common security and governance across workloads
• Run anywhere: On-premises or in the cloud
CDSW MODELS
Machine learning models as one-click microservices (REST APIs)
1. Choose a file, e.g. score.py
2. Choose a function, e.g. forecast
    import pickle

    # Load the trained model once, when the model container starts.
    with open('model.pk', 'rb') as f:
        model = pickle.load(f)
    def forecast(data):
        # Called for each request; returns the model's prediction for the input.
        return model.predict(data)
3. Choose resources
4. Deploy!
Running model containers also have access to CDH for data lookups.
CDSW EXPERIMENTS
Versioned model training runs for evaluation and reproducibility
Data scientists can ...
• Create a snapshot of model code, dependencies, and configuration necessary to train the model
• Build and execute the training run in an isolated container
• Track specified model metrics, performance, and model artifacts
• Inspect, compare, or deploy prior models
MODEL MANAGEMENT
View, test, monitor, and update models by team or project
CDSW JOBS TO ORCHESTRATE BATCH SCORING
Schedule reports & scoring to run on a periodic basis
Scheduling is easy and powerful:
● Execute arbitrary scripts
● Schedule on a recurring basis
● Create dependencies on other jobs for complex pipelines
● Send output to recipients via email
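How job dependencies turn into an execution order can be sketched with the standard library’s `graphlib.TopologicalSorter` (Python 3.9+). The job names and the `run_pipeline` helper are hypothetical, and recurring scheduling and email delivery are left out. The point is only that each job runs after everything it depends on.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9
from typing import Callable, Dict, List, Set


def run_pipeline(jobs: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run every job exactly once, each only after the jobs it depends on."""
    sorter = TopologicalSorter()
    for name in jobs:
        sorter.add(name, *depends_on.get(name, set()))  # node, then its predecessors
    order = list(sorter.static_order())  # raises graphlib.CycleError on circular dependencies
    for name in order:
        jobs[name]()
    return order


log: List[str] = []
jobs = {
    "report": lambda: log.append("email report"),
    "score": lambda: log.append("batch scoring"),
    "features": lambda: log.append("build features"),
    "ingest": lambda: log.append("ingest data"),
}
depends_on = {"features": {"ingest"}, "score": {"features"}, "report": {"score"}}
print(run_pipeline(jobs, depends_on))  # ['ingest', 'features', 'score', 'report']
```

Even though the jobs are declared in reverse, the dependency graph alone determines the order they run in.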
SUMMARY OF FEATURES
End-to-end workflow support
• Development
• Training
• Deployment
Collaboration
• Teams
• Sharing
• Good coding practices (Git)
Security and governance
• Transparent
• Leverages underlying frameworks
• No data movement
• Reproducibility
Openness and self-service
• Any framework
• Isolated for individual effectiveness
• Simplified dependency management
THANK YOU

HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 

Recently uploaded (20)

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Machine Learning Model Deployment: Strategy to Implementation

  • 1. MACHINE LEARNING MODEL DEPLOYMENT: From Strategy to Implementation
  • 2. ABOUT ME: Justin Norman, Director, Data Science & Research Services • Head of Cloudera’s Fast Forward Labs ML research and consulting team • Built and scaled numerous production ML systems and teams spanning government, B2B, and consumer organizations • Tech blogger. Musician. Twitter: @justinJDN
  • 3. ABOUT ME: Sagar Kewalramani, Solutions Architect, Professional Services • Cloudera Strategic Solutions Architect focused on data science and machine learning • Developed and deployed models across diverse verticals such as finance and healthcare • Frequent speaker at big data conferences, including O'Reilly Strata
  • 4. ML IS EVERYWHERE • Google predicts commute times. • Apple predicts facial matches. • Dozens of other ML-powered models run on your phone today. Google didn’t set out to make a traffic tool. Apple isn’t in the facial recognition business.
  • 5. ML IS AT THE HEART OF TRANSFORMATION (diagram: layers rising from "Big Data" through analytics, data science, and machine learning up to AI, on a scale from deterministic, "What happened?", to probabilistic, "What could happen?")
  • 6. WHAT IS PRODUCTION ML? (diagram) Data engineering, business inputs, and data science feed production machine learning, which covers packaging*, pipeline hardening (data engineering), model hardening, deployment, and monitoring. Cross-cutting concerns: model security, model governance, data catalog, model catalog, and feature catalog.
  • 7. WHICH TEAM ROLES ARE INVOLVED? Roles: data engineering, data science, and production ML. Activities across them: data prep, pipelines, data modeling, data transformation, data ingest, job monitoring, training, data discovery, job tuning, experimentation, prototyping, model deployment, model monitoring, and data monitoring.
  • 8. WHAT ARE THE KEY SKILLS? • Big data platform • ML/AI frameworks • Container infrastructure • Orchestration
  • 9. WHAT IS A MODEL ANYWAY? A model, which can take many forms, is an algorithm designed to make predictions from input data. (diagram: upstream systems send {key, value} records to the model in batch or as a stream; the model returns a prediction plus metadata to business systems, and its output feeds monitoring.)
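Slide 9 treats a model as something that receives {key, value} input from upstream systems, in batch or as a stream, and returns a prediction plus metadata. A minimal Python sketch of that contract, assuming the model is exposed as a plain predict function: `ScoredRecord`, `score_records`, `toy_predict`, and the metadata fields are hypothetical names chosen for illustration, not part of any Cloudera API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Iterator, Tuple

Features = Dict[str, float]

@dataclass
class ScoredRecord:
    """Output record: input key, prediction, and metadata (hypothetical schema)."""
    key: str
    prediction: float
    metadata: Dict[str, str] = field(default_factory=dict)

def score_records(
    records: Iterable[Tuple[str, Features]],
    predict: Callable[[Features], float],
    model_version: str,
) -> Iterator[ScoredRecord]:
    """Lazily score (key, value) pairs; works for a batch (list) or a stream (iterator)."""
    for key, value in records:
        yield ScoredRecord(
            key=key,
            prediction=predict(value),
            metadata={
                "model_version": model_version,
                "scored_at": datetime.now(timezone.utc).isoformat(),
            },
        )

def toy_predict(features: Features) -> float:
    """Stand-in for a trained model (hypothetical linear rule)."""
    return 0.5 * features.get("x", 0.0) + 1.0

if __name__ == "__main__":
    batch = [("cust-1", {"x": 2.0}), ("cust-2", {"x": 4.0})]
    for rec in score_records(batch, toy_predict, model_version="v1"):
        print(rec.key, rec.prediction, rec.metadata["model_version"])
```

Because `score_records` is a generator, the same code handles a finite batch (a list) and an unbounded stream (any iterator, such as a message-queue consumer); the metadata travels with each prediction so downstream monitoring can tell which model version produced it.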
  • 10. HIDDEN TECHNICAL DEBT IN ML SYSTEMS (Google paper; source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf). Only a small fraction of real-world ML systems is composed of ML code, as shown by the small black box in the middle of the figure. The required surrounding infrastructure is vast and complex.
  • 11. SAMPLE DATA SCIENCE / ML WORKFLOW: From Data Exploration to Action
  • 12. CHALLENGES: Tools, Platforms, Data?
  • 13. CHALLENGES: Recipes, not cakes; models must be recoded for deployment. Expectations: • Support A/B testing • Support experiments • Support measuring and evaluating model performance • Deployment should be fast and adaptive to business needs
  • 14. SUMMARY OF CHALLENGES • Access: For sensitive data, secure clusters are difficult to access, and there is no shared security. • Flexibility: IT typically doesn’t want arbitrary packages installed on a secure cluster. • Tools: Popular open-source tools don’t easily connect to these environments or don't always support Hadoop data formats, and nothing supports the full workflow. • Scale: Laptops rarely have capacity for medium data, let alone big data, which leads to a lot of sampling. • Parallelism: Popular frameworks don’t easily parallelize on a cluster, so code typically has to be rewritten for production. • Security: Data is pulled onto laptops. • Developer experience: Notebooks, while awesome, don’t easily support virtual environments and dependency management, especially for teams. • Collaboration: There is no easy way to share code between teams. • Deployment: Notebooks are also challenging to put into production.
  • 15. MACHINE LEARNING AT UBER, NETFLIX, AND FACEBOOK: Industrialized AI requires new supporting tools and platforms: Facebook FBLearner, Uber Michelangelo, and the Netflix recommendation platform.
  • 16. ML AT SCALE REQUIRES A UNIFIED DATA STRATEGY (diagram) Streaming and batch ingest feed a shared layer for data, metadata, security, governance, and workload management. That layer supports machine learning, data engineering, data warehouse, and operational database workloads, which in turn serve machine learning tools, BI tools and SQL editors, and data products.
  • 17. YOU’VE GOT OPTIONS… Model development, training, deployment & monitoring
  • 18. MODEL DEVELOPMENT
  • 19. EVERYONE HAS AN OPINION • Should enable collaboration and code reuse (Git integration) • Should support open-source frameworks and libraries • Must handle dependencies and isolate the development environment for an individual session • Can scale compute resources up/down when needed • Doesn’t require you to move data to use it!
  • 20. TRAINING & EXPERIMENTS
  • 21. A/B TESTING & MULTIVARIATE TESTING FOR THE MODEL: Is the best trained model indeed the best model, or does a different model perform better on new, unseen data? (diagram: incoming traffic split between model variation A and model variation B) Data scientists need... • A framework to identify the best performers among a competing set of models • To evaluate models against the business KPIs they should maximize • To track specified model metrics, performance, and model artifacts • To inspect and compare deployed models
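Slide 21 shows incoming traffic split between model variations A and B but does not say how requests are routed. One common approach, sketched here under the assumption that each request carries a stable user ID, is to hash that ID into a bucket so a user always sees the same variation. `assign_variant`, `share_b`, and `conversion_rate` are hypothetical names, and conversion rate merely stands in for whatever business KPI is being compared.

```python
import hashlib
from collections import Counter
from typing import Sequence

def assign_variant(user_id: str, share_b: float = 0.5) -> str:
    """Deterministically route a user to model variation "A" or "B".

    Hashing the ID, instead of drawing a random number per request, keeps
    each user on the same variation, so their outcomes can be compared fairly.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact: bucket is uniform in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "B" if bucket < share_b else "A"

def conversion_rate(outcomes: Sequence[int]) -> float:
    """Fraction of positive (1) outcomes; 0.0 when there is no data yet."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

if __name__ == "__main__":
    split = Counter(assign_variant(f"user-{i}") for i in range(10_000))
    print(split)  # roughly an even split between "A" and "B"
```

Raising `share_b` gradually shifts traffic toward variation B, which is one way to roll out a challenger model cautiously; a multivariate test would extend the same idea to more than two buckets.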
  • 22. EXPERIMENT MANAGEMENT: Versioned, reproducible model training & evaluation runs. Data scientists need to... • Create a snapshot of the model code, dependencies, and configuration necessary to train the model • Build and execute the training run in an isolated container • Track specified model metrics, performance, and model artifacts • Inspect, compare, or deploy prior models. Many options exist, of varying maturity, and they don't all play well with other ecosystem tools (options shown include proprietary and open-source tools, such as Sacred).
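The experiment-management needs on slide 22 can be illustrated without any tracking tool. The following sketch, which does not use Sacred or any other tool named on the slide, records a code fingerprint, the Python version, the configuration, and the metrics of each run as an immutable JSON file, then compares runs. The directory layout, function names, and AUC values are hypothetical toy inputs; a real tracker would also capture dependency versions and model artifacts.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path

def code_fingerprint(source: str) -> str:
    """Short content hash that identifies the exact training code of a run."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]

def record_run(run_dir: Path, source: str, config: dict, metrics: dict) -> Path:
    """Write one experiment record: code hash, environment, config, metrics."""
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite a prior run
    record = {
        "code_sha": code_fingerprint(source),
        "python": platform.python_version(),
        "config": config,
        "metrics": metrics,
    }
    path = run_dir / "run.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path

def best_run(root: Path, metric: str) -> dict:
    """Load all prior runs under `root` and return the one with the highest metric."""
    runs = [json.loads(p.read_text()) for p in root.glob("*/run.json")]
    return max(runs, key=lambda r: r["metrics"][metric])

if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    record_run(root / "run-001", "train(lr=0.1)", {"lr": 0.1}, {"auc": 0.81})
    record_run(root / "run-002", "train(lr=0.01)", {"lr": 0.01}, {"auc": 0.84})
    print(best_run(root, "auc")["config"])  # {'lr': 0.01}
```

Writing each run to its own directory and refusing to overwrite it is what makes the history reproducible: any earlier run can be inspected, compared, or redeployed later.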
  • 23. MODEL DEPLOYMENT
  • 24. MODEL DEPLOYMENT PATTERNS: Knowing how business metrics will be improved helps guide deployment options. • Managers use data to make better decisions: batch scoring, hosted • Centrally automate internal decisions: real-time scoring, hosted • Centrally automate customer-facing decisions: real-time scoring, data flow + custom monitoring • Automate decisions at the edge: real-time scoring, device embedded
  • 25. MODEL DEPLOYMENT APPROACH: TECHNOLOGICAL VS. COST BENEFITS. Different model deployment formats, plotted by cost ($) against technological benefit: native Java/C++ models, rebuilding the whole stack in Python, a hybrid approach using PMML, and API-powered models. • Native Java/C++ model: faster; limited available algorithm/data science libraries • Hybrid approach (PMML): compatible across multiple tools; not agile; not flexible in terms of deployment; PMML files are big; unit testing is tricky • API-powered model: agile; scalable; can be used by both backend & frontend; faster
  • 26. MONITORING
  • 27. MONITORING: Stats, schedule & monitor. Production ML needs... ● A monitoring mechanism that is model-agnostic ● Instrumentation of both the data flowing in and the model performance metrics flowing out ● Collection of performance metrics (e.g., accuracy, RMSE, mean absolute error (MAE))
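The metrics named on slide 27 are straightforward to compute. The sketch below implements accuracy, RMSE, and MAE in plain Python and wraps an arbitrary predict function so that inputs and predictions are logged for later comparison with ground truth, which is one simple way to keep monitoring model-agnostic. `MonitoredModel` and its methods are hypothetical names; a production system would ship these logs to a metrics store rather than keep them in memory.

```python
import math
from typing import Callable, List, Sequence

def _check(y_true: Sequence, y_pred: Sequence) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label and prediction lists")

def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Classification: fraction of predictions that equal the label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Regression: root mean squared error (penalizes large misses more)."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Regression: mean absolute error (average size of a miss)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

class MonitoredModel:
    """Model-agnostic wrapper: instruments any predict callable."""

    def __init__(self, predict: Callable[[float], float]):
        self._predict = predict
        self.inputs: List[float] = []       # data flowing in
        self.predictions: List[float] = []  # predictions flowing out

    def predict(self, x: float) -> float:
        y = self._predict(x)
        self.inputs.append(x)
        self.predictions.append(y)
        return y

    def report(self, y_true: Sequence[float]) -> dict:
        """Score logged predictions once ground-truth labels arrive."""
        return {"rmse": rmse(y_true, self.predictions),
                "mae": mae(y_true, self.predictions)}

if __name__ == "__main__":
    model = MonitoredModel(lambda x: 2 * x)  # stand-in model
    for x in [1, 2, 3]:
        model.predict(x)
    print(model.report([2, 5, 9]))
```

Because the wrapper only sees inputs and outputs, it works unchanged for any framework's model; the logged inputs can also be compared against training data to spot drift in the data flowing in.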
  • 28. CLOUDERA ML APPROACH: Modern enterprise platform, tools, and expert guidance to add SPEED and SCALE • Agile platform to build, train, and deploy many scalable ML applications • Enterprise data science tools to accelerate team productivity • Expert guidance, services & training to fast-track value & scale
  • 29. ACCELERATING THREE STAGES OF MACHINE LEARNING: Enterprise AI platform supporting model development, training, and deployment • DEVELOP: Explore data, develop models, share results • TRAIN: Optimize parameters, track experiments, compare performance • DEPLOY: Manage models, deploy models, monitor performance
  • 30. ACCELERATING MACHINE LEARNING • SESSIONS: Interactive sessions for exploration and development • EXPERIMENTS: Initiate and track, like a lab notebook; export artifacts to the project • MODELS: Wrap with a REST endpoint for online scoring, JSON in, JSON out • JOBS: Scheduled; run a particular piece of code end-to-end. Runtime engine (a Lego block for ML, like a containerized edge node): kernels (R/Python/Scala) and common libraries; FS mounts: CDH parcel dir or RPM, Hadoop config files; project dir: code files, libraries, and dependencies. New snapshots retain history as point-in-time Git snapshots.
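Slide 30 describes models wrapped with a REST endpoint for online scoring, JSON in and JSON out. CDSW performs this wrapping for you; the sketch below is not CDSW's implementation but shows, using only Python's standard library, roughly what the wrapping involves. `forecast`, `ScoringHandler`, `call`, and the `{"x": ...}` payload are hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def forecast(data: dict) -> dict:
    """Hypothetical scoring function; replace with a real model's predict call."""
    return {"prediction": 2 * data["x"]}

class ScoringHandler(BaseHTTPRequestHandler):
    """Accepts a JSON body via POST and replies with the JSON-encoded result."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body, status = forecast(json.loads(self.rfile.read(length))), 200
        except (ValueError, KeyError, TypeError) as exc:  # bad JSON or payload
            body, status = {"error": str(exc)}, 400
        out = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def call(url: str, payload: dict) -> dict:
    """Minimal client: POST `payload` as JSON and decode the JSON reply."""
    req = Request(url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ScoringHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(call(f"http://127.0.0.1:{server.server_port}/", {"x": 21}))
    server.shutdown()
    server.server_close()
```

Keeping the scoring logic in a plain function and the HTTP handling separate is what lets the same `forecast` be reused for batch jobs; only the thin transport layer differs between online and offline scoring.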
  • 31. DEMO
  • 32. SELF-SERVICE CLOUDERA DATA SCIENCE WORKBENCH
  • 33. CLOUDERA DATA SCIENCE WORKBENCH: Bringing the data scientists TO the data in a way that they want to work. For data scientists: • Experiment faster: Use R, Python, or Scala with on-demand compute and secure CDH/HDP data access • Work together: Share reproducible research with your whole team • Deploy with confidence: Get to production repeatably and without recoding. For IT professionals: • Bring data science to the data: Give your data science team more freedom while reducing the risk and cost of silos • Secure by default: Leverage common security and governance across workloads • Run anywhere: On-premises or in the cloud
  • 34. CDSW MODELS: Machine learning models as one-click microservices (REST APIs). 1. Choose a file, e.g. score.py 2. Choose a function, e.g. forecast 3. Choose resources 4. Deploy! Running model containers also have access to CDH for data lookups. Example score.py:
```python
import pickle

# Load the trained model once, when the model container starts.
with open('model.pk', 'rb') as f:
    model = pickle.load(f)

def forecast(data):
    # `data` holds the request's JSON input; shape it the way model.predict expects,
    # and return something JSON-serializable (e.g. call .tolist() on a NumPy array).
    return model.predict(data)
```
  • 35. CDSW EXPERIMENTS: Versioned model training runs for evaluation and reproducibility. Data scientists can... • Create a snapshot of the model code, dependencies, and configuration necessary to train the model • Build and execute the training run in an isolated container • Track specified model metrics, performance, and model artifacts • Inspect, compare, or deploy prior models
  • 36. MODEL MANAGEMENT: View, test, monitor, and update models by team or project
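Slide 36 describes viewing, testing, monitoring, and updating models by team or project. CDSW provides this through its own interface; as a minimal sketch of the underlying bookkeeping, assuming models are plain predict functions, here is a hypothetical in-memory registry (`ModelRegistry`, `register`, `promote`, and `live` are illustrative names, not a CDSW API).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Predict = Callable[[float], float]

@dataclass
class ModelVersion:
    version: int
    predict: Predict
    notes: str = ""

@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: versioned models grouped by project."""
    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)
    live_version: Dict[str, int] = field(default_factory=dict)

    def register(self, project: str, predict: Predict, notes: str = "") -> int:
        """Add a new version; earlier versions are kept so rollback is possible."""
        history = self.versions.setdefault(project, [])
        history.append(ModelVersion(len(history) + 1, predict, notes))
        return len(history)

    def get(self, project: str, version: int) -> ModelVersion:
        """Fetch any version, e.g. to test it before promotion."""
        for mv in self.versions.get(project, []):
            if mv.version == version:
                return mv
        raise KeyError(f"{project!r} has no version {version}")

    def promote(self, project: str, version: int) -> None:
        """Point live traffic at `version` (an update or a rollback)."""
        self.get(project, version)  # raises KeyError if it does not exist
        self.live_version[project] = version

    def live(self, project: str) -> ModelVersion:
        return self.get(project, self.live_version[project])

if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("churn", lambda x: x + 1, notes="baseline")
    registry.register("churn", lambda x: 2 * x, notes="retrained")
    registry.promote("churn", 2)
    print(registry.live("churn").predict(3))  # 6
    registry.promote("churn", 1)  # roll back
    print(registry.live("churn").predict(3))  # 4
```

Keeping every version and moving only a "live" pointer makes an update and a rollback the same cheap operation.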
  • 37. CDSW JOBS TO ORCHESTRATE BATCH SCORING: Schedule reports & scoring to run on a periodic basis. Scheduling is easy and powerful: ● Execute arbitrary scripts ● Schedule on a recurring basis ● Create dependencies on other jobs for complex pipelines ● Send output via email to recipients
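Slide 37 mentions creating dependencies between jobs for complex pipelines. CDSW handles this itself; the sketch below only illustrates the idea of dependency ordering, using Python's standard `graphlib` module (Python 3.9+). The job names are hypothetical, and recurring schedules (e.g. cron) are out of scope.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
JOBS: Dict[str, Set[str]] = {
    "ingest": set(),
    "build_features": {"ingest"},
    "batch_score": {"build_features"},
    "refresh_report": {"batch_score"},
    "email_report": {"refresh_report"},
}

def run_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order jobs so each one runs only after all of its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"job dependencies form a cycle: {exc.args[1]}") from exc

def run_pipeline(dependencies: Dict[str, Set[str]],
                 actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each job's action in dependency order.

    An exception raised by a job propagates and stops every job after it.
    """
    completed = []
    for name in run_order(dependencies):
        actions[name]()
        completed.append(name)
    return completed

if __name__ == "__main__":
    actions = {name: (lambda n=name: print("running", n)) for name in JOBS}
    run_pipeline(JOBS, actions)
```

Rejecting cycles up front matters in practice: a circular dependency between scheduled jobs would otherwise mean that none of them ever becomes ready to run.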
  • 38. SUMMARY OF FEATURES • End-to-end workflow support: development, training, deployment • Collaboration: teams, sharing, good coding practices (Git) • Security and governance: transparent, leverages underlying frameworks, no data movement, reproducibility • Openness and self-service: any framework, isolated for individual effectiveness, simplified dependency management
  • 39. THANK YOU