MACHINE LEARNING MODEL DEPLOYMENT
From Strategy to Implementation
2 © Cloudera, Inc. All rights reserved.
ABOUT ME
Justin Norman
Director DS & Research Svcs
• Head of Cloudera's Fast Forward Labs ML research and consulting team
• Built and scaled numerous production ML systems and teams spanning government, B2B, and consumer organizations
• Tech blogger. Musician. Twitter: @justinJDN
ABOUT ME
Sagar Kewalramani
Solutions Architect, Professional Services
• Cloudera Strategic Solutions Architect focused on data science and machine learning
• Developed and deployed models across diverse verticals such as finance and healthcare
• Frequent speaker at big data conferences, including O'Reilly Strata
ML IS EVERYWHERE
• Google predicts commute times.
• Apple predicts facial matches.
• Dozens of other ML-powered models run on your phone today.
Google didn't set out to make a traffic tool. Apple isn't in the facial recognition business.
ML IS AT THE HEART OF TRANSFORMATION
[Diagram] A stack running from AI at the probabilistic end ("What could happen?") through machine learning, data science, and analytics to "big data" at the deterministic end ("What happened?").
WHAT IS PRODUCTION ML?
[Diagram] Data Engineering, Business Inputs, and Data Science feed into Production Machine Learning.
Production Machine Learning: Packaging*, Pipeline Hardening (Data Engineering), Model Hardening, Deploy, Monitoring
Supporting layers: MODEL SECURITY, MODEL GOVERNANCE, DATA CATALOG, MODEL CATALOG, FEATURE CATALOG
WHICH TEAM ROLES ARE INVOLVED?
Roles: Data Engineering, Data Science, Production ML
Activities across these roles: data prep, pipelines, data modeling, data transformation, data ingest, job monitoring, training, data discovery, job tuning, experimentation, prototyping, model deployment, model monitoring, data monitoring
WHAT ARE THE KEY SKILLS?
• Big Data Platform
• ML/AI Frameworks
• Container Infrastructure
• Orchestration
WHAT IS A MODEL ANYWAY?
A model can take many forms, but at its core it is an algorithm designed to make predictions from input data.
[Diagram] Upstream Systems send data (batch or stream) to the Model, which emits {key, value} output (Prediction, Metadata) to Business Systems and Monitoring.
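To make the diagram concrete, here is a minimal Python sketch of a model sitting between upstream and business systems. It assumes each input record is a {key, value} pair and that each output carries the prediction plus metadata for monitoring; `score_record`, `MODEL_VERSION`, and the toy linear `predict` are hypothetical stand-ins for a real trained model.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator

# Hypothetical version tag; a real deployment would take this from a model registry.
MODEL_VERSION = "demo-0.1"


def predict(value: Dict[str, float]) -> float:
    # Toy stand-in for a trained model: a fixed linear score.
    return 0.5 * value.get("x1", 0.0) + 2.0 * value.get("x2", 0.0)


def score_record(key: str, value: Dict[str, float]) -> Dict[str, Any]:
    """Score one {key, value} record and attach metadata for monitoring."""
    return {
        "key": key,
        "prediction": predict(value),
        "metadata": {
            "model_version": MODEL_VERSION,
            "scored_at": datetime.now(timezone.utc).isoformat(),
        },
    }


def score_stream(records: Iterable[tuple]) -> Iterator[Dict[str, Any]]:
    # The same function serves a batch (a list) or a stream (any iterator).
    for key, value in records:
        yield score_record(key, value)


if __name__ == "__main__":
    batch = [("cust-1", {"x1": 2.0, "x2": 1.0}), ("cust-2", {"x1": 0.0, "x2": 3.0})]
    for out in score_stream(batch):
        print(out["key"], out["prediction"], out["metadata"]["model_version"])
```

Because `score_stream` only needs an iterable, the same code path covers both the "batch" and "stream" arrows in the diagram.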
HIDDEN TECHNICAL DEBT IN ML SYSTEMS
Google paper: "Hidden Technical Debt in Machine Learning Systems" (Sculley et al., NIPS 2015)
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle of the paper's figure. The required surrounding infrastructure is vast and complex.
SAMPLE DATA SCIENCE / ML WORKFLOW
From Data Exploration to Action
CHALLENGES
Tools, Platforms, Data
CHALLENGES
Recipes, not Cakes: data science hands off code that has to be recoded for production
Deployment Expectations
• Support A/B testing
• Support experiments
• Support measuring and evaluating model performance
• Deployment should be fast and adaptive to business needs
SUMMARY OF CHALLENGES
• Access: Secure clusters holding sensitive data are difficult to access, and there is no shared security model.
• Flexibility: IT typically doesn't want arbitrary packages installed on a secure cluster.
• Tools: Popular open-source tools don't easily connect to these environments or don't always support Hadoop data formats. Nothing supports the full workflow.
• Scale: Laptops rarely have capacity for medium data, let alone big data, which leads to a lot of sampling.
• Parallelism: Popular frameworks don't easily parallelize on a cluster, so code typically has to be rewritten for production.
• Security: Data gets pulled onto laptops.
• Developer Experience: Notebooks, while awesome, don't easily support virtual environments and dependency management, especially for teams.
• Collaboration: There is no easy way to share code between teams.
• Deployment: Notebooks are also challenging to "put into production."
MACHINE LEARNING AT UBER, NETFLIX, AND FACEBOOK
Industrialized AI requires new supporting tools and platforms
• Facebook: FBLearner
• Uber: Michelangelo
• Netflix: Recommendation Platform
ML AT SCALE REQUIRES A UNIFIED DATA STRATEGY
[Diagram]
• Ingest: Streaming Ingest, Batch Ingest
• Workloads: Machine Learning, Data Engineering, Data Warehouse, Operational Database
• Shared layer: Data, Metadata, Security, Governance, Workload Management
• Consumers: Machine Learning Tools, BI Tools and SQL Editors, Data Products
YOU’VE GOT OPTIONS…
Model Dev, Training, Deployment & Monitoring
MODEL DEVELOPMENT
EVERYONE HAS AN OPINION
• Should enable collaboration and code reuse (Git integration)
• Should support open-source frameworks and libraries
• Must handle dependencies and isolate the dev environment for an individual session
• Can scale compute resources up/down when needed
• Doesn't require you to move data to use it!
TRAINING & EXPERIMENTS
A/B TESTING & MULTIVARIATE TESTING FOR THE MODEL
Is the best-trained model indeed the best model, or does a different model perform better on new, unseen data?
[Diagram] Incoming traffic is split between Model Variation A and Model Variation B.
Data scientists need ...
• A framework to identify the best performers among a competing set of models
• A way to evaluate which models maximize business KPIs
• To track specified model metrics, performance, and model artifacts
• To inspect and compare deployed models
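One common way to split incoming traffic between Model Variation A and Model Variation B is deterministic hashing, so the same user always lands on the same variant. The Python sketch below assumes a hypothetical 50/50 split keyed on a user ID, with a per-variant success counter standing in for a business KPI; `assign_variant` and `ABTracker` are illustrative names, and a real test would also apply a significance test before declaring a winner.

```python
import hashlib
from collections import defaultdict


def assign_variant(user_id: str, share_a: float = 0.5) -> str:
    """Deterministically route a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "A" if bucket < share_a else "B"


class ABTracker:
    """Counts requests and successes (e.g. conversions) per variant."""

    def __init__(self) -> None:
        self.requests = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, variant: str, success: bool) -> None:
        self.requests[variant] += 1
        if success:
            self.successes[variant] += 1

    def rate(self, variant: str) -> float:
        # Success rate for one variant; 0.0 if it has seen no traffic yet.
        n = self.requests[variant]
        return self.successes[variant] / n if n else 0.0
```

Hashing rather than random assignment keeps each user's experience consistent across requests, which matters when the KPI is measured over a session or over several visits.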
EXPERIMENT MANAGEMENT
Versioned, reproducible model training & evaluation runs
Data scientists need to ...
• Create a snapshot of model code, dependencies, and configuration necessary to train the model
• Build and execute the training run in an isolated container
• Track specified model metrics, performance, and model artifacts
• Inspect, compare, or deploy prior models
Many options exist, of varying maturity, and they don't all play well with other ecosystem tools. They span proprietary and open-source offerings (e.g., Sacred).
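The core loop of experiment management (snapshot the code and configuration, run training, record metrics, compare runs later) can be sketched with the Python standard library alone. This is not how CDSW or Sacred implement it: the directory layout, file names, and the `run_experiment` / `compare_runs` helpers are hypothetical, and the "code snapshot" is approximated by storing and hashing the training script's text.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_experiment(root: Path, code: str, config: Dict,
                   train: Callable[[Dict], Dict[str, float]]) -> Path:
    """Snapshot code + config, run training, and store metrics in a run directory."""
    # Identify the run by a hash of the exact code and configuration used.
    snapshot = json.dumps({"code": code, "config": config}, sort_keys=True)
    run_id = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()[:12]
    run_dir = root / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "train.py").write_text(code)
    (run_dir / "config.json").write_text(json.dumps(config, sort_keys=True))
    metrics = train(config)
    (run_dir / "metrics.json").write_text(json.dumps(metrics, sort_keys=True))
    return run_dir


def compare_runs(root: Path, metric: str) -> list:
    """Return (run_id, metric value) pairs sorted best-first (higher is better)."""
    rows = []
    for metrics_file in root.glob("*/metrics.json"):
        value = json.loads(metrics_file.read_text())[metric]
        rows.append((metrics_file.parent.name, value))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    def toy_train(cfg: Dict) -> Dict[str, float]:
        # Hypothetical training function: pretend accuracy improves with depth.
        return {"accuracy": min(0.99, 0.7 + 0.05 * cfg["max_depth"])}

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for depth in (2, 4):
            run_experiment(root, "print('train')", {"max_depth": depth}, toy_train)
        print(compare_runs(root, "accuracy"))
```

Deriving the run ID from the code and config means re-running an identical experiment lands in the same directory, which is a cheap form of reproducibility; real tools also capture library versions and the container image.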
MODEL DEPLOYMENT
MODEL DEPLOYMENT PATTERNS
Knowing how business metrics will be improved helps guide deployment options.
• Managers use data to make better decisions: Batch Scoring, Hosted
• Centrally automate internal decisions: Real-Time Scoring, Hosted
• Centrally automate customer-facing decisions: Real-Time Scoring, Data Flow + Custom Monitoring
• Automate decisions at the edge: Real-Time Scoring, Device Embedded
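The patterns differ mainly in when and where the same scoring logic runs. The minimal Python sketch below assumes a hypothetical `churn_score` function as the model: batch scoring applies it to a whole file on a schedule, while hosted real-time scoring applies it to one request at a time. Keeping one scoring function behind both paths is one way to avoid recoding when a use case moves from one pattern to another.

```python
import csv
import io
import json
from typing import Dict


def churn_score(features: Dict[str, float]) -> float:
    # Hypothetical model: a toy capped linear rule standing in for a trained classifier.
    return min(1.0, 0.1 * features["support_calls"] + 0.02 * features["months_inactive"])


def batch_score(csv_text: str) -> str:
    """Batch scoring: read all rows, score each, and return a scored CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["customer_id", "score"])
    writer.writeheader()
    for row in reader:
        features = {k: float(v) for k, v in row.items() if k != "customer_id"}
        writer.writerow({"customer_id": row["customer_id"], "score": churn_score(features)})
    return out.getvalue()


def realtime_score(request_body: str) -> str:
    """Real-time scoring: one JSON request in, one JSON response out."""
    features = json.loads(request_body)
    return json.dumps({"score": churn_score(features)})
```

In practice, `batch_score` would be triggered by a scheduler and `realtime_score` would sit behind an HTTP endpoint; the edge pattern would instead ship the scoring function (or an exported model) onto the device.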
MODEL DEPLOYMENT APPROACH: TECHNOLOGICAL VS. COST BENEFITS
DIFFERENT MODEL DEPLOYMENT FORMATS
NATIVE JAVA/C++ MODEL
• Faster
• Limited availability of algorithm/data science libraries
HYBRID APPROACH (PMML)
• Compatibility across multiple tools
• Not agile
• Not flexible in terms of deployment
PYTHON STACK
• PMML files are big
• Unit testing is tricky
API-POWERED MODEL
• Agile
• Scalable
• Can be used by both backend and frontend
• Faster
[Chart] The four options (API-powered model, hybrid approach with PMML, rebuilding the whole stack in Python, native Java/C++ models) plotted by cost ($) against technological benefits.
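To illustrate the API-powered option, here is a minimal Python sketch that wraps a model behind an HTTP endpoint taking JSON in and returning JSON out, using only the standard library's `http.server`. The `/predict` path, the toy `predict` function, and the `start_server` / `call_predict` helpers are hypothetical; a production service would add input validation, authentication, and a proper application server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict


def predict(features: Dict[str, float]) -> float:
    # Hypothetical stand-in for a trained model.
    return 3.0 * features.get("x", 0.0) + 1.0


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"prediction": predict(features)}).encode("utf-8")
            status = 200
        except (ValueError, TypeError, AttributeError):
            # Malformed JSON or a payload that is not an object of features.
            body = json.dumps({"error": "invalid request body"}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


def start_server(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_predict(server: HTTPServer, features: Dict[str, float]) -> Dict:
    """Client side: POST JSON to the running server and decode the JSON reply."""
    host, port = server.server_address
    req = urllib.request.Request(
        f"http://{host}:{port}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any environment proxy settings for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())
```

Because any client that speaks HTTP and JSON can call the endpoint, the same model serves backend jobs and frontend applications alike, which is the flexibility the slide credits to this approach.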
MONITORING
MONITORING STATS
SCHEDULE & MONITOR
Production ML needs...
● A monitoring mechanism that is model-agnostic
● Instrumentation of both the data flowing in and the model performance metrics flowing out
● Collection of performance metrics (e.g., accuracy, RMSE, mean absolute error (MAE))
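A model-agnostic monitor can wrap any prediction callable, log what flows in and out, and compute metrics once ground truth arrives. The Python sketch below is a hypothetical illustration (`MonitoredModel` is not a CDSW API) using the standard definitions: accuracy is the fraction of exact matches, RMSE is the square root of the mean squared error, and MAE is the mean absolute error.

```python
import math
from typing import Any, Callable, List, Sequence


def accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class MonitoredModel:
    """Wraps any predict callable and records inputs and outputs."""

    def __init__(self, predict: Callable[[Any], Any]) -> None:
        self._predict = predict
        self.inputs: List[Any] = []
        self.outputs: List[Any] = []

    def __call__(self, x: Any) -> Any:
        y = self._predict(x)
        self.inputs.append(x)   # data flowing in
        self.outputs.append(y)  # predictions flowing out
        return y

    def report(self, y_true: Sequence[float]) -> dict:
        """Compare logged predictions with later-arriving ground truth."""
        return {"rmse": rmse(y_true, self.outputs), "mae": mae(y_true, self.outputs)}
```

The wrapper never inspects the model itself, only its inputs and outputs, so the same monitoring code works for a scikit-learn estimator, an R model behind an API, or a simple rule.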
CLOUDERA ML APPROACH
Modern enterprise platform, tools and expert guidance to add SPEED and SCALE
• Agile platform to build, train, and deploy many scalable ML applications
• Enterprise data science tools to accelerate team productivity
• Expert guidance, services & training to fast-track value & scale
ACCELERATING THREE STAGES OF MACHINE LEARNING
Enterprise AI platform supporting model development, training, and deployment
DEVELOP: Explore data, develop models, share results
TRAIN: Optimize parameters, track experiments, compare performance
DEPLOY: Manage models, deploy models, monitor performance
ACCELERATING MACHINE LEARNING
Lego Block for ML: Like a containerized edge node
SESSIONS
• Interactive session for exploration and development
EXPERIMENTS
• Initiate and track
• Like a lab notebook
• Export artifacts to project
MODELS
• Wrap with REST endpoint
• Online scoring
• JSON in, JSON out
JOBS
• Scheduled
• Run a particular piece of code end-to-end
Runtime
• Engine: Kernels (R/Python/Scala), Common Libraries
• FS Mounts: CDH - Parcel Dir; RPM - Hadoop Config Files
• Project Dir: Code, Files, Libraries, Dependencies
New snapshots retain history (point-in-time Git snapshot)
DEMO
SELF-SERVICE
CLOUDERA DATA SCIENCE WORKBENCH
CLOUDERA DATA SCIENCE WORKBENCH
Bringing data scientists TO the data, in the way they want to work
For data scientists
• Experiment faster: Use R, Python, or Scala with on-demand compute and secure CDH/HDP data access
• Work together: Share reproducible research with your whole team
• Deploy with confidence: Get to production repeatably and without recoding
For IT professionals
• Bring data science to the data: Give your data science team more freedom while reducing the risk and cost of silos
• Secure by default: Leverage common security and governance across workloads
• Run anywhere: On-premises or in the cloud
CDSW MODELS
Machine learning models as one-click microservices (REST APIs)
1. Choose file, e.g. score.py
2. Choose function, e.g. forecast
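import pickle
# Load the serialized model once, when the model container starts
with open('model.pk', 'rb') as f:
    model = pickle.load(f)
def forecast(data):
    # data: the input passed in by the model endpoint
    return model.predict(data)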
3. Choose resources
4. Deploy!
Running model containers also have access to CDH
for data lookups.
CDSW EXPERIMENTS
Versioned model training runs for evaluation and reproducibility
Data scientists can ...
• Create a snapshot of model code, dependencies, and configuration necessary to train the model
• Build and execute the training run in an isolated container
• Track specified model metrics, performance, and model artifacts
• Inspect, compare, or deploy prior models
MODEL MANAGEMENT
View, test, monitor, and update models by team or project
CDSW JOBS TO ORCHESTRATE BATCH SCORING
Schedule reports & scoring to run on a periodic basis
Scheduling is easy and powerful
● Execute arbitrary scripts
● Schedule on a recurring basis
● Create dependencies on other jobs for complex pipelines
● Send output via email to recipients
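Job dependencies boil down to running scripts in topological order. Here is a minimal Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the job names are hypothetical, and plain callables stand in for the scripts a scheduler such as CDSW Jobs would launch, with no scheduling, retries, or email.

```python
import graphlib
from typing import Callable, Dict, List, Set


def run_pipeline(jobs: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run each job only after all of its upstream jobs have finished.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    # Every job becomes a node, even if it has no upstream dependencies.
    graph = {name: depends_on.get(name, set()) for name in jobs}
    order = list(graphlib.TopologicalSorter(graph).static_order())
    for name in order:
        jobs[name]()
    return order


if __name__ == "__main__":
    log: List[str] = []
    jobs = {n: (lambda n=n: log.append(n)) for n in ["report", "score", "ingest", "features"]}
    deps = {"features": {"ingest"}, "score": {"features"}, "report": {"score"}}
    print(run_pipeline(jobs, deps))
```

Computing the full order before running anything means a cyclic configuration fails fast, before any job has touched production data.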
SUMMARY OF FEATURES
End-to-End Workflow Support
• Development
• Training
• Deployment
Collaboration
• Teams
• Sharing
• Good coding practices (Git)
Security and Governance
• Transparent
• Leverages underlying frameworks
• No data movement
• Reproducibility
Openness and Self-service
• Any framework
• Isolated for individual effectiveness
• Simplified dependency management
THANK YOU
