Machine Learning Model Deployment: Strategy to Implementation

This talk will introduce participants to the theory and practice of machine learning in production. The talk will begin with an intro on machine learning models and data science systems and then discuss data pipelines, containerization, real-time vs. batch processing, change management and versioning.

As part of this talk, the audience will learn more about:
• How data scientists can gain complete self-service capability to rapidly build, train, and deploy machine learning models.
• How organizations can accelerate machine learning from research to production while preserving the flexibility and agility that data scientists and modern business use cases demand.

A small demo will showcase how to rapidly build, train, and deploy machine learning models in R, Python, and Spark, followed by a discussion of API services, RESTful wrappers/Docker, PMML/PFA, ONNX, SQL Server embedded models, and lambda functions.
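
To make the "RESTful wrapper" idea concrete, here is a minimal sketch of a JSON-in, JSON-out scoring endpoint built only on the Python standard library. It is not the code from the talk's demo: the names `predict`, `score_request`, `ScoringHandler`, and `serve`, the `"features"` request key, and port 8080 are all assumptions made for illustration, and `predict` is a stand-in for a real trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


def score_request(body: bytes):
    """Turn a raw JSON request body into (HTTP status, JSON response bytes)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("features must be non-empty")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing key, or non-numeric input is a client error.
        return 400, json.dumps({"error": str(exc)}).encode("utf-8")
    return 200, json.dumps({"prediction": predict(features)}).encode("utf-8")


class ScoringHandler(BaseHTTPRequestHandler):
    """Thin HTTP layer: all scoring logic lives in score_request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = score_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(host="127.0.0.1", port=8080):
    """Start the server; host and port are arbitrary choices for local testing."""
    HTTPServer((host, port), ScoringHandler).serve_forever()
```

Keeping the scoring logic in a plain function, separate from the HTTP handler, means the same function could be unit-tested or reused in a batch job without starting a server. Calling `serve()` and posting `{"features": [1, 2, 3]}` to it would exercise the full path.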

Speakers
Sagar Kewalramani, Solutions Architect
Cloudera
Justin Norman, Director, Research and Data Science Services
Cloudera Fast Forward Labs

Published in: Technology

Machine Learning Model Deployment: Strategy to Implementation

  1. MACHINE LEARNING MODEL DEPLOYMENT: From Strategy to Implementation
  2. © Cloudera, Inc. All rights reserved. ABOUT ME: Justin Norman, Director, DS & Research Svcs • Head of Cloudera’s Fast Forward Labs ML research and consulting team • Built and scaled numerous production ML systems and teams spanning government, B2B and consumer organizations • Tech blogger. Musician. Twitter: @justinJDN
  3. ABOUT ME: Sagar Kewalramani, Solutions Architect, Professional Services • Cloudera Strategic Solutions Architect focused on Data Science and Machine Learning • Developed and deployed models across diverse verticals such as Finance and Healthcare • Frequent speaker at Big Data conferences, including O'Reilly Strata
  4. ML IS EVERYWHERE • Google predicts commute times. • Apple predicts facial matches. • Dozens of other ML-powered models in your phone today. Google didn’t set out to make a traffic tool. Apple isn’t in the facial recognition business.
  5. ML IS AT THE HEART OF TRANSFORMATION [Diagram labels: AI, Machine Learning, Data Science, Analytics, "Big Data"; Probabilistic ("What could happen?"), Deterministic ("What happened?")]
  6. WHAT IS PRODUCTION ML? Data Engineering Business Inputs Data Science Production Machine Learning Packaging* Pipeline Hardening (Data Engineering) Model Hardening Deploy Monitoring MODEL SECURITY MODEL GOVERNANCE DATA CATALOG MODEL CATALOG FEATURE CATALOG
  7. WHICH TEAM ROLES ARE INVOLVED? DATA ENGINEERING DATA SCIENCE PRODUCTION ML DATA PREP PIPELINES DATA MODELING DATA TRANSFORMATION DATA INGEST JOB MONITORING TRAINING DATA DISCOVERY JOB TUNING EXPERIMENTATION PROTOTYPING MODEL DEPLOYMENT MODEL MONITORING DATA MONITORING
  8. WHAT ARE THE KEY SKILLS? Big Data Platform ML/AI Frameworks Container Infrastructure Orchestration
  9. WHAT IS A MODEL ANYWAY? An algorithm, taking many forms, designed to make predictions based on data input. [Diagram labels: Upstream Systems, Batch or Stream, Model, {key, value}, Prediction, Metadata, Monitoring, Business Systems]
  10. HIDDEN TECHNICAL DEBT IN ML SYSTEMS (Google paper) Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
  11. SAMPLE DATA SCIENCE / ML WORKFLOW From Data Exploration to Action
  12. CHALLENGES Tools, Platforms, Data ?
  13. CHALLENGES Recipes, not Cakes Recode Deployment Expectations • Support A/B testing • Support experiments • Support measuring and evaluating model performance • Deployment should be fast and adaptive to business needs
  14. SUMMARY OF CHALLENGES • Access: For sensitive data, secure clusters are difficult to access. No shared security. • Flexibility: IT typically doesn’t want random packages installed on a secure cluster. • Tools: Popular open source tools don’t easily connect to these environments or always support Hadoop data formats. Nothing supports the full workflow. • Scale: Laptops rarely have capacity for medium, let alone big, data. This leads to a lot of sampling. • Parallelism: Popular frameworks don’t easily parallelize on a cluster. Typically code has to get rewritten for production. • Security: Data is pulled onto laptops. • Developer Experience: Notebooks, while awesome, don’t easily support virtual environments and dependency management, especially for teams. • Collaboration: No easy way to share code between teams. • Deployment: Notebooks are also challenging to “put into production.”
  15. MACHINE LEARNING AT UBER, NETFLIX, AND FACEBOOK Industrialized AI requires new supporting tools and platforms • Facebook FBLearner • Uber Michelangelo • Netflix Recommendation Platform
  16. ML AT SCALE REQUIRES A UNIFIED DATA STRATEGY [Diagram labels: Streaming Ingest, Batch Ingest, Machine Learning Tools, BI Tools and SQL Editors, Data Products; Data, Metadata, Security, Governance, Workload Management; Machine Learning, Data Engineering, Data Warehouse, Operational Database]
  17. YOU’VE GOT OPTIONS… Model Dev, Training, Deployment & Monitoring
  18. MODEL DEVELOPMENT
  19. EVERYONE HAS AN OPINION • Should enable collaboration and code reuse (git integration) • Should support open-source frameworks and libraries • Must handle dependencies and isolate the dev environment for an individual session • Can scale compute resources up/down when needed • Doesn’t require you to move data to use it!
  20. TRAINING & EXPERIMENTS
  21. A/B TESTING & MULTIVARIATE TESTING FOR THE MODEL Is the best trained model indeed the best model, or does a different model perform better on new, unseen data? [Diagram labels: Incoming Traffic, Model Variation A, Model Variation B] Data scientists need ... • A framework to identify the best performers among a competing set of models • To evaluate which models can maximize business KPIs • To track specified model metrics, performance, and model artifacts • To inspect and compare deployed models
  22. EXPERIMENT MANAGEMENT Versioned, reproducible model training & evaluation runs Data scientists need to ... • Create a snapshot of model code, dependencies, and configuration necessary to train the model • Build and execute the training run in an isolated container • Track specified model metrics, performance, and model artifacts • Inspect, compare, or deploy prior models Many options exist, of varying maturity, and they don’t all play well with other ecosystem tools. [Diagram labels: Sacred, Proprietary, Open-Source]
  23. MODEL DEPLOYMENT
  24. MODEL DEPLOYMENT PATTERNS Knowing how business metrics will be improved helps guide deployment options. Decision types: • Managers use data to make better decisions • Centrally automate internal decisions • Centrally automate customer-facing decisions • Automate decisions at the edge. Deployment patterns: • Batch Scoring, Hosted • Real Time Scoring, Hosted • Real Time Scoring, Data Flow + Custom Monitoring • Real Time Scoring, Device Embedded
  25. MODEL DEPLOYMENT APPROACH: TECHNOLOGICAL VS COST BENEFITS DIFFERENT MODEL DEPLOYMENT FORMATS NATIVE JAVA/C++ MODEL • Faster • Limitation of Available Algo/DS Libraries HYBRID APPROACH PMML: • Compatibility across multiple tools • Non Agile • Not flexible in terms of deployment PYTHON STACK • PMML files are big • Unit testing is tricky API POWERED MODEL: • Agile • Scalable • Can be used by both backend & frontend • Faster [Chart labels: Cost ($) vs. Technological Benefits; API Powered Model, Hybrid Approach PMML, Rebuild the Whole Stack to Python, Native Java/C++ Models]
  26. MONITORING
  27. MONITORING STATS SCHEDULE & MONITOR Production ML needs... ● A monitoring mechanism that is model-agnostic ● Instrumentation of both the data flow in and the model performance metrics out ● To collect performance metrics (e.g., accuracy, RMSE, mean absolute error (MAE))
  28. CLOUDERA ML APPROACH Modern enterprise platform, tools and expert guidance to add SPEED and SCALE • Agile platform to build, train, and deploy many scalable ML applications • Enterprise data science tools to accelerate team productivity • Expert guidance, services & training to fast track value & scale
  29. ACCELERATING THREE STAGES OF MACHINE LEARNING Enterprise AI platform supporting model development, training, and deployment • DEVELOP: Explore data, develop models, share results • TRAIN: Optimize parameters, track experiments, compare performance • DEPLOY: Manage models, deploy models, monitor performance
  30. ACCELERATING MACHINE LEARNING Lego Block for ML: Like a containerized edge node • SESSIONS: Interactive session for exploration and development • EXPERIMENTS: Initiate and track; like a lab notebook; export artifacts to project • MODELS: Wrap with REST endpoint; online scoring; JSON in, JSON out • JOBS: Scheduled; run a particular code end-to-end; new snapshots retain history • Runtime Engine: Kernels (R/Python/Scala), Common Libraries • FS Mounts: CDH - Parcel Dir, RPM - Hadoop Config Files • Project Dir: Code Files, Libraries, Dependencies • Point in time Git snapshot
  31. DEMO
  32. SELF-SERVICE CLOUDERA DATA SCIENCE WORKBENCH
  33. CLOUDERA DATA SCIENCE WORKBENCH Bringing the data scientists TO the data in a way that they want to work. For data scientists • Experiment faster: Use R, Python, or Scala with on-demand compute and secure CDH/HDP data access • Work together: Share reproducible research with your whole team • Deploy with confidence: Get to production repeatably and without recoding. For IT professionals • Bring data science to the data: Give your data science team more freedom while reducing the risk and cost of silos • Secure by default: Leverage common security and governance across workloads • Run anywhere: On-premises or in the cloud
  34. CDSW MODELS Machine learning models as one-click microservices (REST APIs) 1. Choose file, e.g. score.py 2. Choose function, e.g. forecast 3. Choose resources 4. Deploy! Running model containers also have access to CDH for data lookups. Example score.py:

        import pickle

        # Load the trained model once, when the file is first run.
        with open('model.pk', 'rb') as f:
            model = pickle.load(f)

        def forecast(data):
            return model.predict(data)
  35. CDSW EXPERIMENTS Versioned model training runs for evaluation and reproducibility Data scientists can ... • Create a snapshot of model code, dependencies, and configuration necessary to train the model • Build and execute the training run in an isolated container • Track specified model metrics, performance, and model artifacts • Inspect, compare, or deploy prior models
  36. MODEL MANAGEMENT View, test, monitor, and update models by team or project
  37. CDSW JOBS TO ORCHESTRATE BATCH SCORING Schedule reports & scoring to run on a periodic basis Scheduling is easy and powerful ● Execute arbitrary scripts ● Schedule on a recurring basis ● Create dependencies on other jobs for complex pipelines ● Allow output to be sent via email to recipients
  38. SUMMARY OF FEATURES • End-to-End Workflow Support: Development, Training, Deployment • Collaboration: Teams, Sharing, Good coding practices (Git) • Security and Governance: Transparent, Leverages underlying frameworks, No data movement, Reproducibility • Openness and Self-service: Any framework, Isolated for individual effectiveness, Simplified dependency management
  39. THANK YOU
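
The monitoring slide (slide 27) asks for a model-agnostic way to collect performance metrics such as accuracy, RMSE, and MAE. A minimal sketch of what that could look like follows. It only needs paired actual and predicted values, so it does not depend on how the model was built. The function names `regression_metrics` and `accuracy` are illustrative assumptions, not part of any tool named in the slides.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE and MAE for paired actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    # MAE: average absolute error. RMSE: square root of the average squared error.
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"rmse": rmse, "mae": mae}


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that exactly match the actual labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In practice a scheduled job could call functions like these on each batch of scored records once the actual outcomes arrive, and record the results over time so that a drop in performance becomes visible.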
