SlideShare a Scribd company logo
1 of 24
MACHINE LEARNING MODELS:
FROM RESEARCH TO PRODUCTION
Tristan Zajonc | Head of Machine Learning Engineering
2 © Cloudera, Inc. All rights reserved.
WHY ARE MACHINE LEARNING MODELS IMPORTANT?
PROTECT
business
CONNECT
products &
services (IoT)
DRIVE
customer insights
The world is filled with prediction problems that require a new approach to software
● Predictive maintenance
● Logistics optimization
● Self-driving cars
● Medical diagnostics
● Marketing effectiveness
● Next best action
● Insider threat
prevention
● Fraud prevention
● Payment integrity
3 © Cloudera, Inc. All rights reserved.
SOFTWARE 1.0
Traditional development relies on hand-coded programs that map inputs to outputs
FUNCTION OUTPUTINPUT
x f(x) y
This approach is intractable or performs poorly for many problems that are easy for humans.
4 © Cloudera, Inc. All rights reserved.
SOFTWARE 2.0 - MACHINE LEARNING
FUNCTION OUTPUTINPUT
x f(θ,x) y
Machine learning searches for programs that map inputs to outputs effectively
Machine Learning
Model
5 © Cloudera, Inc. All rights reserved.
THE POWER OF MACHINE LEARNING
FUNCTION TEXTPIXELS
Machine learning enables software to address entirely new applications
Source: https://cs.stanford.edu/people/karpathy/deepimagesent/
6 © Cloudera, Inc. All rights reserved.
7 © Cloudera, Inc. All rights reserved.
MACHINE LEARNING AT FACEBOOK, UBER, AND GOOGLE
Efficient, reliable, large-scale machine learning requires new supporting tools
Facebook
FBLearner
Uber
Michelangelo
Google
TFX
8 © Cloudera, Inc. All rights reserved.
ACCELERATING THREE STAGES OF MACHINE LEARNING
Manage models
Deploy models
Monitor performance
DEPLOYDEVELOP
Explore data
Develop models
Share results
TRAIN
Optimize parameters
Track experiments
Compare performance
Machine learning platforms can accelerate model development, training, and deployment
9 © Cloudera, Inc. All rights reserved.
CLOUDERA DATA SCIENCE WORKBENCH
Enables self-service data science at scale in secure environments
For data scientists
• Open data science
Use R, Python, or Scala with
your favorite libraries and on-
demand compute
• No need to sample
Directly access secure data
via Spark, Impala, or HDFS
• Reproducible,
collaborative research
Share with your whole team
For IT professionals
• Bring analysis to the data
Give data science team the
freedom to work how they
want, when they want
• Secure by default
Stay compliant with out-of-the-
box Hadoop security
• Flexible deployment
On-premises or in the cloud
10 © Cloudera, Inc. All rights reserved.
NEW IN CLOUDERA DATA SCIENCE WORKBENCH 1.4
Accelerate machine learning from research to production
DEVELOP MODELS
• Explore data securely and
develop models as a team
TRAIN MODELS
• Train, track, and compare
reproducible experiments
DEPLOY MODELS
• Deploy and monitor models
as APIs to serve predictions
NEW! NEW!
11 © Cloudera, Inc. All rights reserved.
RUN AND TRACK EXPERIMENTS
12 © Cloudera, Inc. All rights reserved.
CHALLENGE: REPRODUCIBLE RESEARCH
How do you know what model is better? How can you repeat a result?
• Model development is iterative
• Try different data, features, libraries, algorithms,
hyperparameters, etc.
• Reproducing a model means you need (at
least)...
• Training data
• Data/feature pipeline code
• Model training code + dependencies
• Runtime environment (CPU, GPU, memory, …)
• Any results or performance metrics
• This is a lot to keep track of!
13 © Cloudera, Inc. All rights reserved.
HOW THIS WORKS TODAY
WHY THIS IS A PROBLEM
• Wasted time and effort keeping
track of model/environment
changes
• Wasted time and effort trying to
recreate a result, especially for junior
data scientists / new team members
• Compliance risk due to inability to
explain the modeling process
• Source control?
• Unless you forget
• And maybe the library changes
• Or this…
• mymodel.py
• mymodel.final.py
• mymodel.final.final2.py
• mymodel.final.final2.noreallyfinal.py
• Post-it notes or notebook
• To keep track of performance metrics
14 © Cloudera, Inc. All rights reserved.
INTRODUCING EXPERIMENTS
Versioned model training runs for evaluation and reproducibility
Data scientists can now...
• Create a snapshot of model code,
dependencies, and configuration
necessary to train the model
• Build and execute the training run in an
isolated container
• Track specified model metrics,
performance, and model artifacts
• Inspect, compare, or deploy prior models
15 © Cloudera, Inc. All rights reserved.
DEMO
16 © Cloudera, Inc. All rights reserved.
DEPLOY AND MANAGE MODELS
17 © Cloudera, Inc. All rights reserved.
CHALLENGE: GETTING TO PRODUCTION
So you’ve got a trained model. Now what?
• Data scientists want to rapidly expose
candidate models to serve predictions
• REST APIs are the most requested approach
• Development and production are very different
• Owners: Data Scientists vs. Data Engineers
• Languages: Python/R vs. Java/Scala/C++
• Policy Controls: Approved code, packages, etc.
• Vocabulary: Data Science vs. DevOps
• Data scientists do not often have the skills (or
entitlements) to deploy models
18 © Cloudera, Inc. All rights reserved.
HOW THIS WORKS TODAY
WHY THIS IS A PROBLEM
• Can take days to months, raising the
bar for what’s worth deploying,
leading to stale models and reduced
innovation
• Many opportunities to introduce
errors and compliance risk
• Workarounds are difficult to
maintain while increasing the skills
gap
• Models get handed off to an
engineering team for re-coding
• Can you reproduce the model?
• Can you prove it’s the same?
• Or data science teams try to hack it
• Requires an entirely new set of tools (e.g.
Flask, Docker, Kubernetes, …)
• Getting an API up is different from
maintaining it (e.g. managing, monitoring,
testing, updating, scaling, …)
19 © Cloudera, Inc. All rights reserved.
INTRODUCING MODELS
Machine learning models as one-click microservices (REST APIs)
Model APIs made easy!
1. Choose Python/R file, e.g. score.py
2. Choose function, e.g. forecast
f = open('model.pk', 'rb')
model = pickle.load(f)
def forecast(data):
return model.predict(data)
3. Choose resources
4. Deploy!
20 © Cloudera, Inc. All rights reserved.
DEMO
21 © Cloudera, Inc. All rights reserved.
HOW IT WORKS
Experiments and Models leverage a new way of building images from source
• When running and experiment or deploying a model:
• Provides declarative pathway from version control to experiment or model
SOURCE
• Stage 1: Git snapshot of
source, respecting .gitignore
(before sure to ignore any local
Python/R environment)
IMAGE
• Stage 2: Docker build from
source; cdsw-build.sh
defines build steps, e.g.:
RUN
• Stage 3: Run versioned image
as Experiment (batch) or
Model (online) in Kubernetes
#!/bin/bash
pip3 install -r requirements.txt
22 © Cloudera, Inc. All rights reserved.
Not all machine learning applications have the same requirements
AS ALWAYS, USE THE RIGHT TOOL FOR THE JOB
• Not all models need online scoring -- batch scoring is simple and reliable
• Latency sensitive models are often better deployed to the edge
• mobile applications
• autonomous vehicles
• high-frequency trading
• Self-service deployment is not always appropriate
• Facebook scale (200 trillion predictions per day)
• Prediction services with high SLAs
But self-service model deployment enables rapid delivery for many ML use cases
and should be an available tool in every agile enterprise.
Confidential-Restricted – For Discussion Purposes Only23 © Cloudera, Inc. All rights reserved.
BENEFITS OF AN UNIFIED MACHINE LEARNING PLATFORM
RESEARCH EXPERIENCE
✓ Faster iteration, with confidence
✓ Easier to identify best models
✓ Easier to reproduce/explain work
✓ Easier to onboard team members
DEPLOYMENT EXPERIENCE
✓ Faster business impact
✓ Easier to deploy more models
✓ Easier to reproduce/explain work
✓ Easier to access CDH data/compute
OPERATIONAL EXPERIENCE
✓ Lower cost and risk for managing models
✓ Easier to support through self-service
✓ Easier to scale a shared environment
✓ Easier to control access
THANK YOU

More Related Content

What's hot

Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureCloudera, Inc.
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionCloudera, Inc.
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18Cloudera, Inc.
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloudera, Inc.
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceCloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera, Inc.
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019   Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019 Timothy Spann
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningCloudera, Inc.
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enoughCloudera, Inc.
 

What's hot (20)

Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft Azure
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR compliance
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019   Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enough
 

Similar to Machine Learning Models: From Research to Production 6.13.18

Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionFlorian Wilhelm
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
 
Manoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentManoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentAgile Impact Conference
 
Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprisedoppenhe
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-useltonrodriguez11
 
Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)
Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)
Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)VMware Tanzu
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsDatabricks
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018Adam Gibson
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the EnterpriseThe Hive
 
Transforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsNicolas (Nick) Barcet
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudCarter Wickstrom
 
Securing your Machine Learning models
Securing your Machine Learning modelsSecuring your Machine Learning models
Securing your Machine Learning modelsPhilipBasford
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedCloudera, Inc.
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWDataWorks Summit
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...DataWorks Summit
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science PlatformDecision Science Community
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitRafael Arana
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime PlatformAlexey Kharlamov
 

Similar to Machine Learning Models: From Research to Production 6.13.18 (20)

Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Manoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentManoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning Development
 
Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprise
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)
Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)
Lo Scenario Cloud-Native (Pivotal Cloud-Native Workshop: Milan)
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the Enterprise
 
Transforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOps
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the Cloud
 
Securing your Machine Learning models
Securing your Machine Learning modelsSecuring your Machine Learning models
Securing your Machine Learning models
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: Exposed
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science Platform
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks Summit
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime Platform
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Cloudera, Inc.
 
Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18Cloudera, Inc.
 

More from Cloudera, Inc. (19)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
 
Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Machine Learning Models: From Research to Production 6.13.18

  • 1. MACHINE LEARNING MODELS: FROM RESEARCH TO PRODUCTION Tristan Zajonc | Head of Machine Learning Engineering
  • 2. 2 © Cloudera, Inc. All rights reserved. WHY ARE MACHINE LEARNING MODELS IMPORTANT? PROTECT business CONNECT products & services (IoT) DRIVE customer insights The world is filled with prediction problems that require a new approach to software ● Predictive maintenance ● Logistics optimization ● Self-driving cars ● Medical diagnostics ● Marketing effectiveness ● Next best action ● Insider threat prevention ● Fraud prevention ● Payment integrity
  • 3. 3 © Cloudera, Inc. All rights reserved. SOFTWARE 1.0 Traditional development relies on hand-coded programs that map inputs to outputs FUNCTION OUTPUTINPUT x f(x) y This approach is intractable or performs poorly for many problems that are easy for humans.
  • 4. 4 © Cloudera, Inc. All rights reserved. SOFTWARE 2.0 - MACHINE LEARNING FUNCTION OUTPUTINPUT x f(θ,x) y Machine learning searches for programs that map inputs to outputs effectively Machine Learning Model
  • 5. 5 © Cloudera, Inc. All rights reserved. THE POWER OF MACHINE LEARNING FUNCTION TEXTPIXELS Machine learning enables software to address entirely new applications Source: https://cs.stanford.edu/people/karpathy/deepimagesent/
  • 6. 6 © Cloudera, Inc. All rights reserved.
  • 7. 7 © Cloudera, Inc. All rights reserved. MACHINE LEARNING AT FACEBOOK, UBER, AND GOOGLE Efficient, reliable, large-scale machine learning requires new supporting tools Facebook FBLearner Uber Michelangelo Google TFX
  • 8. 8 © Cloudera, Inc. All rights reserved. ACCELERATING THREE STAGES OF MACHINE LEARNING Manage models Deploy models Monitor performance DEPLOYDEVELOP Explore data Develop models Share results TRAIN Optimize parameters Track experiments Compare performance Machine learning platforms can accelerate model development, training, and deployment
  • 9. 9 © Cloudera, Inc. All rights reserved. CLOUDERA DATA SCIENCE WORKBENCH Enables self-service data science at scale in secure environments For data scientists • Open data science Use R, Python, or Scala with your favorite libraries and on- demand compute • No need to sample Directly access secure data via Spark, Impala, or HDFS • Reproducible, collaborative research Share with your whole team For IT professionals • Bring analysis to the data Give data science team the freedom to work how they want, when they want • Secure by default Stay compliant with out-of-the- box Hadoop security • Flexible deployment On-premises or in the cloud
  • 10. 10 © Cloudera, Inc. All rights reserved. NEW IN CLOUDERA DATA SCIENCE WORKBENCH 1.4 Accelerate machine learning from research to production DEVELOP MODELS • Explore data securely and develop models as a team TRAIN MODELS • Train, track, and compare reproducible experiments DEPLOY MODELS • Deploy and monitor models as APIs to serve predictions NEW! NEW!
  • 11. 11 © Cloudera, Inc. All rights reserved. RUN AND TRACK EXPERIMENTS
  • 12. 12 © Cloudera, Inc. All rights reserved. CHALLENGE: REPRODUCIBLE RESEARCH How do you know what model is better? How can you repeat a result? • Model development is iterative • Try different data, features, libraries, algorithms, hyperparameters, etc. • Reproducing a model means you need (at least)... • Training data • Data/feature pipeline code • Model training code + dependencies • Runtime environment (CPU, GPU, memory, …) • Any results or performance metrics • This is a lot to keep track of!
  • 13. 13 © Cloudera, Inc. All rights reserved. HOW THIS WORKS TODAY WHY THIS IS A PROBLEM • Wasted time and effort keeping track of model/environment changes • Wasted time and effort trying to recreate a result, especially for junior data scientists / new team members • Compliance risk due to inability to explain the modeling process • Source control? • Unless you forget • And maybe the library changes • Or this… • mymodel.py • mymodel.final.py • mymodel.final.final2.py • mymodel.final.final2.noreallyfinal.py • Post-it notes or notebook • To keep track of performance metrics
  • 14. 14 © Cloudera, Inc. All rights reserved. INTRODUCING EXPERIMENTS Versioned model training runs for evaluation and reproducibility Data scientists can now... • Create a snapshot of model code, dependencies, and configuration necessary to train the model • Build and execute the training run in an isolated container • Track specified model metrics, performance, and model artifacts • Inspect, compare, or deploy prior models
  • 15. 15 © Cloudera, Inc. All rights reserved. DEMO
  • 16. 16 © Cloudera, Inc. All rights reserved. DEPLOY AND MANAGE MODELS
  • 17. 17 © Cloudera, Inc. All rights reserved. CHALLENGE: GETTING TO PRODUCTION So you’ve got a trained model. Now what? • Data scientists want to rapidly expose candidate models to serve predictions • REST APIs are the most requested approach • Development and production are very different • Owners: Data Scientists vs. Data Engineers • Languages: Python/R vs. Java/Scala/C++ • Policy Controls: Approved code, packages, etc. • Vocabulary: Data Science vs. DevOps • Data scientists do not often have the skills (or entitlements) to deploy models
  • 18. 18 © Cloudera, Inc. All rights reserved. HOW THIS WORKS TODAY WHY THIS IS A PROBLEM • Can take days to months, raising the bar for what’s worth deploying, leading to stale models and reduced innovation • Many opportunities to introduce errors and compliance risk • Workarounds are difficult to maintain while increasing the skills gap • Models get handed off to an engineering team for re-coding • Can you reproduce the model? • Can you prove it’s the same? • Or data science teams try to hack it • Requires an entirely new set of tools (e.g. Flask, Docker, Kubernetes, …) • Getting an API up is different from maintaining it (e.g. managing, monitoring, testing, updating, scaling, …)
  • 19. 19 © Cloudera, Inc. All rights reserved. INTRODUCING MODELS Machine learning models as one-click microservices (REST APIs) Model APIs made easy! 1. Choose Python/R file, e.g. score.py 2. Choose function, e.g. forecast f = open('model.pk', 'rb') model = pickle.load(f) def forecast(data): return model.predict(data) 3. Choose resources 4. Deploy!
  • 20. 20 © Cloudera, Inc. All rights reserved. DEMO
  • 21. 21 © Cloudera, Inc. All rights reserved. HOW IT WORKS Experiments and Models leverage a new way of building images from source • When running and experiment or deploying a model: • Provides declarative pathway from version control to experiment or model SOURCE • Stage 1: Git snapshot of source, respecting .gitignore (before sure to ignore any local Python/R environment) IMAGE • Stage 2: Docker build from source; cdsw-build.sh defines build steps, e.g.: RUN • Stage 3: Run versioned image as Experiment (batch) or Model (online) in Kubernetes #!/bin/bash pip3 install -r requirements.txt
  • 22. 22 © Cloudera, Inc. All rights reserved. Not all machine learning applications have the same requirements AS ALWAYS, USE THE RIGHT TOOL FOR THE JOB • Not all models need online scoring -- batch scoring is simple and reliable • Latency sensitive models are often better deployed to the edge • mobile applications • autonomous vehicles • high-frequency trading • Self-service deployment is not always appropriate • Facebook scale (200 trillion predictions per day) • Prediction services with high SLAs But self-service model deployment enables rapid delivery for many ML use cases and should be an available tool in every agile enterprise.
  • 23. Confidential-Restricted – For Discussion Purposes Only23 © Cloudera, Inc. All rights reserved. BENEFITS OF AN UNIFIED MACHINE LEARNING PLATFORM RESEARCH EXPERIENCE ✓ Faster iteration, with confidence ✓ Easier to identify best models ✓ Easier to reproduce/explain work ✓ Easier to onboard team members DEPLOYMENT EXPERIENCE ✓ Faster business impact ✓ Easier to deploy more models ✓ Easier to reproduce/explain work ✓ Easier to access CDH data/compute OPERATIONAL EXPERIENCE ✓ Lower cost and risk for managing models ✓ Easier to support through self-service ✓ Easier to scale a shared environment ✓ Easier to control access