
Train, predict, serve: How to go into production your machine learning model


Aki Ariga gave this talk at Strata Data Conference Singapore. It covers typical machine learning deployment patterns for production.

Published in: Technology


  1. © Cloudera, Inc. All rights reserved.
     Train, Predict, Serve: How to go into production your Machine Learning model
     Aki Ariga | Field Data Scientist
  2. Aki Ariga
     ● Field Data Scientist at Cloudera
     ● Previously research engineer at Toshiba, Rails developer at Cookpad
     ● Contributor to sparklyr and creator of tabula-py
     ● Co-author of Machine Learning at Work (in Japanese)
     ● Twitter: @chezou
     ● LinkedIn: https://www.linkedin.com/in/aki-ariga/
     ● GitHub: https://github.com/chezou/
     ● Slideshare: https://www.slideshare.net/chezou
  3. What does machine learning do?
  4. What does machine learning do?
     ● Classification
        ○ Classify data into categories, e.g. spam detection, image classification
     ● Regression
        ○ Predict a continuous value, e.g. power consumption prediction
     ● Clustering
        ○ Group similar data together, e.g. exploratory analysis
     ● Anomaly detection
        ○ Detect anomalous data, e.g. fraud detection
     ● Recommendation
        ○ Suggest content the user is likely to be interested in, e.g. Amazon, Netflix
     ● Reinforcement learning
        ○ Learn a strategy that maximizes reward, e.g. AlphaGo, self-driving cars
     ● etc.
  5. Example: Spam detection
     Classify each email as spam or non-spam.
  6. Training phase of a machine learning pipeline (supervised learning)
     ● Documents
     ● Feature extraction, e.g. from an image: array of RGB values; from text: word frequency
     ● Feature vectors: [0, 1, 0, 2.5, 0, -1, ...], [1, 0.5, 0.1, -2, 3, 2, ...], [1, 0, 1.0, 1.1, 0, 0, ...]
     ● Model training: algorithm & parameters (Logistic Regression, SVM, Random Forest, NN...)
     ● Predictive model: w1=1, w2=-1, w3=0 ...
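The training phase above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the speaker's code: the toy documents, the whitespace tokenizer, and the function names (`extract_features`, `train_logistic_regression`, `predict_spam_probability`) are all hypothetical, and plain stochastic gradient descent stands in for whatever training algorithm a real system would use.

```python
import math
from collections import Counter


def extract_features(text, vocabulary):
    """Feature extraction: turn a document into a word-frequency vector."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(vectors, labels, epochs=200, learning_rate=0.5):
    """Model training: fit weights and a bias with stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_spam_probability(text, vocabulary, weights, bias):
    """Apply the predictive model (the learned weights) to a new document."""
    x = extract_features(text, vocabulary)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy labelled documents (1 = spam, 0 = non-spam); illustrative only.
DOCUMENTS = [
    ("win free money now", 1),
    ("free prize click now", 1),
    ("meeting agenda for monday", 0),
    ("lunch with the team monday", 0),
]
VOCABULARY = sorted({word for text, _ in DOCUMENTS for word in text.split()})
VECTORS = [extract_features(text, VOCABULARY) for text, _ in DOCUMENTS]
LABELS = [label for _, label in DOCUMENTS]
WEIGHTS, BIAS = train_logistic_regression(VECTORS, LABELS)
```

The learned `WEIGHTS` and `BIAS` play the role of the "w1=1, w2=-1, w3=0 ..." predictive model on the slide; `predict_spam_probability("free money", VOCABULARY, WEIGHTS, BIAS)` scores a new document with them.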
  7. Sample data science/machine learning workflow: From data to exploration to action
     Diagram labels: Data Engineering; Data Science (Exploratory); Production (Operational); Data Wrangling; Visualization and Analysis; Model Training & Testing; Production Data Pipelines; Batch Scoring; Online Scoring; Serving; Data Governance; Governance; Processing; Acquisition; Reports, Dashboards; Data; Models; Predictions; Business value
  8. What is “production” of an ML system?
  9. Production of ML systems
     ● Reports
     ● Dashboards
     ● Scoring
  10. Patterns of ML scoring systems
  11. Patterns of ML scoring systems
      1. Train by batch, predict on the fly, serve via REST API
      2. Train by batch, predict by batch, serve through the shared DB
      3. Train, predict, serve by streaming
      4. Train by batch, predict on mobile app
  12. Patterns of ML scoring systems
      1. Train by batch, predict on the fly, serve via REST API
      2. Train by batch, predict by batch, serve through the shared DB
      3. Train, predict, serve by streaming
      4. Train by batch, predict on mobile app
  13. Pattern 1: Train by batch, predict on the fly, serve via REST API
      Diagram labels: Web Application; DB; Activity log / Contents data; ML System; Batch System; Extract feature; Feature; Execute training; Training result; Trained Model; API Server; REST API; User ID / Item ID; Prediction result
  14. What is the deployment target?
      Diagram labels: Web Application; DB; Activity log / Contents data; ML System; Development (Extract feature, Execute training, Training result, Validated Model); Deploy; Production (REST API, Extract feature, Validated Model); User ID / Item ID; Feature; Prediction result
  15. Pattern 1-a: Training phase
      Train using the activity logs in the DB and store the model in storage.
      Diagram labels: Web Application; DB; Activity log / Contents data; ML System; Batch System; Extract feature; Feature; Execute training; Training result; Trained Model; API Server; REST API; User ID / Item ID; Prediction result
  16. Pattern 1-b,c: Prediction & serving phase
      Predict from logs/contents when triggered by a REST API call.
      Diagram labels: Web Application; DB; Activity log / Contents data; ML System; Batch System; Extract feature; Feature; Execute training; Training result; Trained Model; API Server; REST API; User ID / Item ID; Prediction result
  17. Example architecture 1: PMML + OpenScoring
      ● Model building layer (CDSW): Extract feature & Train/update model; Activity log (HDFS); Trained Model; Export model as PMML
      ● Predicting & serving layer: Load model; Updated model; Request to predict; Extract feature & Predict; Prediction results
  18. Example architecture 2: Docker based API Server
      ● Model building layer (CDSW): Extract feature & Train/update model; Activity log (HDFS); Trained Model; Save model on object storage; Pack the runtime env with Docker
      ● Predicting & serving layer: Load model from object storage; Updated model; Request to predict; Extract feature & Predict; Prediction results
  19. Demo: Docker based prediction API
      https://github.com/chezou/cdsw-serve-docker
  20. Demo architecture
      Diagram labels: CDSW; Source code; Trained model; Docker image; Docker Hub; Amazon S3; Amazon ECS; Application Load Balancer; Prediction request
  21. Pattern 1: Train by batch, predict on the fly, serve via REST API
      ● Pros
         ○ Able to choose a different system configuration for the front-end web application and the ML system
            ■ Choose favorite languages for both systems
            ■ Use different machine specs for both systems
            ■ Deploy independently
         ○ Able to serve with low latency
         ○ Rapid prototyping with flexible ML libraries
         ○ Easy to A/B test
         ○ Avoids reimplementing the ML algorithm
      ● Cons
         ○ Complex to implement a scalable API server
         ○ Unable to use slow algorithms
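As a rough sketch of the predicting and serving side of Pattern 1, the following uses only Python's standard `http.server` module. Everything specific here is an assumption for illustration: the `MODEL` dictionary stands in for a trained model loaded from storage, the `/predict` path and the JSON request shape are hypothetical, and a real deployment would more likely use a web framework packed with Docker, as in the demo above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model artifact written by the batch training job:
# one weight per feature name plus a bias term.
MODEL = {"weights": {"free": 1.2, "meeting": -0.8}, "bias": -0.1}


def score(model, features):
    """Linear score for a feature dict; unknown features are ignored."""
    total = model["bias"]
    for name, value in features.items():
        total += model["weights"].get(name, 0.0) * value
    return total


def make_handler(model):
    """Build a request handler class bound to an already-loaded model."""

    class PredictionHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                features = json.loads(self.rfile.read(length))["features"]
                body = json.dumps({"score": score(model, features)}).encode()
            except (ValueError, KeyError, TypeError, AttributeError):
                self.send_error(400, "expected a JSON object with a features field")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return PredictionHandler


def make_server(model, host="127.0.0.1", port=0):
    """Build the API server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), make_handler(model))
```

Calling `make_server(MODEL, port=8000).serve_forever()` would start the server; the web application then POSTs a body such as `{"features": {"free": 2.0}}` to `/predict`. The model is loaded once and prediction happens per request, which is why slow algorithms are a poor fit for this pattern.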
  22. Patterns of ML scoring systems
      1. Train by batch, predict on the fly, serve via REST API
      2. Train by batch, predict by batch, serve through the shared DB
      3. Train, predict, serve by streaming
      4. Train by batch, predict on mobile app
  23. Pattern 2: Train by batch, predict by batch, serve through the shared DB
      Diagram labels: Web Application; DB; Activity log / Contents data; Batch System; Training Batch; Prediction Batch; Extract feature; Feature; Execute training; Training result; Trained Model; Prediction result; Serve prediction
  24. Pattern 2-a: Training phase
      Train using the activity logs in the DB and store the model in storage.
      Diagram labels: Web Application; DB; Activity log / Contents data; Batch System; Training Batch; Prediction Batch; Extract feature; Feature; Execute training; Training result; Trained Model; Prediction result; Serve prediction
  25. Pattern 2-b: Prediction phase
      Predict from the contents data in the DB and store the prediction results.
      Diagram labels: Web Application; DB; Activity log / Contents data; Batch System; Training Batch; Prediction Batch; Extract feature; Feature; Execute training; Training result; Trained Model; Prediction result; Serve prediction
  26. Pattern 2-c: Serving phase
      Serve the prediction results stored in the shared DB.
      Diagram labels: Web Application; DB; Activity log / Contents data; Batch System; Training Batch; Prediction Batch; Extract feature; Feature; Execute training; Training result; Trained Model; Prediction result; Serve prediction
  27. Example architecture: Serving by HBase/Kudu
      ● Model building & predicting layer (CDSW): Historical data (HDFS); Activity log; Extract feature & Train/update model; Trained Model; Updated model; Load trained model; Extract feature & Predict; Prediction results
      ● Serving layer: Kudu/HBase; Activity log; Prediction results
  28. Pattern 2: Train by batch, predict by batch, serve through the shared DB
      ● Pros
         ○ Able to choose a different system configuration for the front-end web application and the batch system
            ■ Choose favorite languages for both systems
            ■ Use different machine specs for both systems
            ■ Deploy independently
         ○ Able to use slow and complex algorithms
         ○ Easy to manage: versioning of models and prediction results
         ○ Rapid prototyping with flexible ML libraries
         ○ Avoids reimplementing the ML algorithm
      ● Cons
         ○ Unable to predict when triggered by a certain event (e.g. PV, purchase)
         ○ Unable to avoid a time lag between prediction and serving
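The prediction and serving phases of Pattern 2 can be sketched with an in-memory SQLite database standing in for the shared DB (the slides name HBase, Kudu, or an RDB). The table names, the columns, and the `score_item` stand-in model are hypothetical and exist only to make the sketch runnable.

```python
import sqlite3


def score_item(view_count):
    """Hypothetical stand-in for a slow model: more views, higher score."""
    return view_count / (view_count + 10.0)


def run_prediction_batch(conn, model_version):
    """Prediction batch: score every item and upsert into the shared table."""
    rows = conn.execute("SELECT item_id, view_count FROM activity_log").fetchall()
    conn.executemany(
        "INSERT OR REPLACE INTO predictions (item_id, score, model_version) "
        "VALUES (?, ?, ?)",
        [(item_id, score_item(views), model_version) for item_id, views in rows],
    )
    conn.commit()
    return len(rows)


def serve_prediction(conn, item_id):
    """Serving: the web application only reads precomputed results."""
    row = conn.execute(
        "SELECT score, model_version FROM predictions WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    return None if row is None else {"score": row[0], "model_version": row[1]}


def setup_demo_db():
    """Create the two tables and a couple of toy activity rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity_log (item_id TEXT PRIMARY KEY, view_count INTEGER)"
    )
    conn.execute(
        "CREATE TABLE predictions "
        "(item_id TEXT PRIMARY KEY, score REAL, model_version TEXT)"
    )
    conn.executemany(
        "INSERT INTO activity_log VALUES (?, ?)", [("a", 10), ("b", 90)]
    )
    conn.commit()
    return conn
```

Until `run_prediction_batch` has run, `serve_prediction` returns `None` for every item, which is the time lag listed under Cons. Storing `model_version` next to each score is one simple way to get the versioning of prediction results listed under Pros.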
  29. Patterns of ML scoring systems
      1. Train by batch, predict on the fly, serve via REST API
      2. Train by batch, predict by batch, serve through the shared DB
      3. Train, predict, serve by streaming
      4. Train by batch, predict on mobile app
  30. Pattern 3: Train, predict, serve by streaming
      ● Web Application
         ○ Querying for prediction
         ○ Showing or sending alerts
         ○ This component may work with a message queue like Kafka
      ● Message queue (e.g. Kafka): Log data; Prediction results
      ● Stream-based ML System (e.g. Spark Streaming): Recent log data; Extract feature; Feature; Train & Predict; Model updates; Model; Trained Model; Prediction results
  31. Example architecture: Lambda architecture with Oryx
  32. Pattern 3: Train, predict, serve by streaming
      ● Pros
         ○ Predict and serve with low latency
         ○ Update the model interactively
         ○ Able to predict when triggered by a certain event (e.g. purchase, PV)
         ○ Strong capability for streamed data, especially for anomaly detection and recommendation
      ● Cons
         ○ Complex to manage the model and the system
            ■ Hard to version models
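To make the "train and predict on the same stream" idea concrete, here is a small sketch of streaming anomaly detection. It is an illustration under assumptions, not the Oryx or Spark Streaming API: a `collections.deque` stands in for the message queue, and the hypothetical `StreamingAnomalyDetector` class keeps a running mean and variance with Welford's method instead of a real model.

```python
import math
from collections import deque


class StreamingAnomalyDetector:
    """Keeps a running mean/variance (Welford's method) and flags outliers."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a value is anomalous
        self.warmup = warmup        # events to observe before predicting
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def predict(self, value):
        """Return True if the value looks anomalous under the current model."""
        if self.count < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0.0:
            return value != self.mean
        return abs(value - self.mean) / std > self.threshold

    def update(self, value):
        """Online model update with a single new observation."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)


def process_stream(queue, detector):
    """Consume events in arrival order: predict first, then update the model."""
    alerts = []
    while queue:
        value = queue.popleft()
        if detector.predict(value):
            alerts.append(value)    # would be sent back through the queue
        else:
            detector.update(value)  # only normal events update the model
    return alerts
```

Because the model changes with every event, there is no single trained artifact to version, which illustrates the "hard to version models" item under Cons. Skipping the update for flagged events is a design choice of this sketch, not something the slides prescribe.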
  33. Patterns of ML scoring systems
      1. Train by batch, predict on the fly, serve via REST API
      2. Train by batch, predict by batch, serve through the shared DB
      3. Train, predict, serve by streaming
      4. Train by batch, predict on mobile app
  34. Pattern 4: Train by batch, predict on a mobile app
      Diagram labels: Mobile Application (DB; Activity logs / Contents data; Extract feature; Feature; Trained Model; Request for prediction; Prediction result); Batch System (DB; Activity log / Contents data; Extract feature; Feature; Execute training; Training result; Trained Model); Convert model
  35. Example architecture: Serving on a mobile app
      ● Model building layer (CDSW): Extract feature & Train/update model; Activity log (HDFS); Trained Model; Convert model to TFLite/CoreML
      ● Predicting & serving layer: Storage in a smartphone; Load model; Updated model; Request to predict; Extract feature & Predict; Prediction results
  36. Comparison of the 4 patterns
      Item | Pattern 1 (REST API) | Pattern 2 (Shared DB) | Pattern 3 (Streaming) | Pattern 4 (Mobile app)
      Training | by batch | by batch | by streaming | by batch
      Prediction | on the fly | by batch | by streaming | on the fly
      Prediction result delivery | via REST API | through the shared DB | by streaming via MQ | via in-process API on mobile
      Latency for prediction after getting new data | Moderate | Moderate to long | Very low | Low
      Required time to predict | Short | Long | Short | Short
      Tight/loose coupling with app | Loose | Loose | Loose | Tight
      Language dependency | Independent | Independent | Independent | Depends on frameworks
      System management difficulty | Moderate | Easy | Very hard | Moderate
  37. Good to read/watch for related topics
      ● Sculley et al., “Hidden Technical Debt in Machine Learning Systems”, NIPS, 2015.
      ● Lucy Park, “My model has higher BLEU, can I ship it? The Joel Test for machine learning systems”, ACML-MLAIP, 2017.
      ● David, “When models go rogue: Hard-earned lessons about using machine learning in production”, Strata Data NYC, 2017.
  38. Model versioning & monitoring
      ● Monitor model performance alongside recurring model updates with CI
      Diagram labels: DB; Activity log / Contents data; Batch System; Extract feature; Feature; Execute/update training; Training result; Trained Model; Gold standard data with labels; Evaluate model; Evaluated performance; Validated models; Object storage; Versioned model (x3); BI; Show dashboard of evaluation results
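One way to read the versioning and monitoring slide is as an evaluation gate: each retrained model is scored against gold standard data, stored under a new version, and marked as validated only if it clears a threshold. The sketch below assumes a local directory in place of object storage and a hypothetical one-parameter threshold model; the function names and the `min_accuracy` default are illustrative, not taken from the talk.

```python
import json
from pathlib import Path


def accuracy(predict, gold_examples):
    """Share of gold-standard examples that the model labels correctly."""
    correct = sum(1 for features, label in gold_examples if predict(features) == label)
    return correct / len(gold_examples)


def register_model(store_dir, params, gold_examples, make_predictor, min_accuracy=0.8):
    """Evaluate a candidate, save it as the next version, record if it passed."""
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    version = len(list(store.glob("model-v*.json"))) + 1
    score = accuracy(make_predictor(params), gold_examples)
    record = {
        "version": version,
        "params": params,
        "accuracy": score,            # the number a dashboard would plot
        "validated": score >= min_accuracy,
    }
    (store / f"model-v{version}.json").write_text(json.dumps(record))
    return record


def latest_validated(store_dir):
    """Pick the newest model that passed evaluation, or None if none did."""
    records = [json.loads(p.read_text()) for p in Path(store_dir).glob("model-v*.json")]
    validated = [r for r in records if r["validated"]]
    return max(validated, key=lambda r: r["version"]) if validated else None


def make_threshold_predictor(params):
    """Hypothetical model: label 1 when the feature exceeds a threshold."""
    return lambda x: 1 if x > params["threshold"] else 0


# Toy gold standard data with labels; illustrative only.
GOLD = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.7, 1)]
```

A recurring CI job could call `register_model` after each retraining run and deploy whatever `latest_validated` returns, so that a model that regresses on the gold data is stored for inspection but never served.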
  39. Thank you
      Aki Ariga
      aki@cloudera.com
  40. Appendix
  41. How does CDSW help with each deployment pattern?
      ● Reports
         ○ Send email via a CDSW Job
      ● Dashboards
         ○ N/A (a web app can run while the container is running)
      ● Scoring
         ○ Pattern 1
            ■ Export the model as PMML and serve it with OpenScoring, or use MLeap, etc.
            ■ Create an API server with a web framework and pack the environment with Docker
            ■ Users need to manage the API server themselves
         ○ Pattern 2
            ■ Store prediction results into HBase/Kudu/RDB from CDSW
         ○ Pattern 3
            ■ You can develop and deploy jobs working on the same cluster
         ○ Pattern 4
            ■ Convert to a TFLite/CoreML model and bring the model into your app
