Clipper at UC Berkeley RISECamp 2017

  1. Clipper: A Low-Latency Online Prediction Serving System. Dan Crankshaw (crankshaw@cs.berkeley.edu). http://clipper.ai, https://github.com/ucbrise/clipper. September 7, 2017
  2. Collaborators: Joseph Gonzalez, Ion Stoica, Michael Franklin, Xin Wang, Alexey Tumanov, Corey Zumar, Nishad Singh, Yujia Luo, Feynman Liang
  3. Big Data Training Learning Complex Model
  4. Learning Produces a Trained Model “CAT” Query Decision Model
  5. Big Data Training Learning Application Decision Query ? Serving Model
  6. Big Data Training Learning Application Decision Query Model Prediction-Serving for interactive applications Timescale: ~10s of milliseconds Serving
  7. In this talk…  Challenges of prediction serving  Clipper overview and architecture  Deploying and serving models with Clipper
  8. Prediction-Serving Raises New Challenges
  9. Prediction-Serving Challenges: (1) Support low-latency, high-throughput serving workloads; (2) Support a large and growing ecosystem of ML models and frameworks (Caffe, VW, and others)
  10. Models are getting more complex: 10s of GFLOPs per prediction [1]. Support low-latency, high-throughput serving workloads using specialized hardware for predictions. Deployed on the critical path: must maintain SLOs under heavy load. [1] Deep Residual Learning for Image Recognition. He et al., CVPR 2016
  11. Google Translate Serving: 140 billion words a day [1], 82,000 GPUs running 24/7. Invented new hardware: the Tensor Processing Unit (TPU). [1] https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html
  12. Big Companies Build One-Off Systems Problems:  Expensive to build and maintain  Highly specialized and require ML and systems expertise  Tightly-coupled model and application  Difficult to change or update model  Only supports single ML framework
  13. Prediction-Serving Challenges: (1) Support low-latency, high-throughput serving workloads; (2) Support a large and growing ecosystem of ML models and frameworks (Caffe, VW, and others)
  14. Large and growing ecosystem of ML models and frameworks ??? Fraud Detection
  15. Large and growing ecosystem of ML models and frameworks ??? Fraud Detection. Different programming languages: TensorFlow is Python and C++, Spark is JVM-based. Different hardware: CPUs and GPUs. Different costs: e.g., DNNs in TensorFlow are generally much slower to evaluate than simpler models
  16. Large and growing ecosystem of ML models and frameworks ??? Fraud Detection, Content Rec., Personal Asst., Robotic Control, Machine Translation (Caffe, VW, and others)
  17. But building new serving systems for each framework is expensive…
  18. Big Data Batch Analytics Model Training Use existing systems: Offline Scoring
  19. Big Data Model Training Batch Analytics Scoring X Y Datastore Use existing systems: Offline Scoring
  20. Application Decision Query Look up decision in datastore Low-Latency Serving X Y Use existing systems: Offline Scoring
  21. Application Decision Query Look up decision in datastore Low-Latency Serving X Y Problems:  Requires the full set of queries ahead of time  Small and bounded input domain  Wasted computation and space  Can compute and store unneeded predictions  Costly to update  Must re-run the batch job Use existing systems: Offline Scoring
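A minimal sketch of the offline-scoring pattern just described (not Clipper code; all names are hypothetical): predictions are precomputed in a batch job, stored in a datastore, and looked up at serving time.

    # Offline scoring sketch: precompute, store, look up.
    predictions = {}  # stands in for a datastore such as Redis or Cassandra

    def batch_score(model, all_possible_queries):
        # Must be re-run from scratch whenever the model is updated.
        for query in all_possible_queries:
            predictions[query] = model.predict(query)

    def serve(query):
        # Returns None (or must fall back) for any query not scored ahead of time.
        return predictions.get(query)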
  22. Prediction-Serving Challenges: (1) Support low-latency, high-throughput serving workloads; (2) Support a large and growing ecosystem of ML models and frameworks (Caffe, VW, and others)
  23. Prediction-Serving Today Highly specialized systems for specific problems Decision Query X Y Offline scoring with existing frameworks and systems Clipper aims to unify these approaches
  24. What would we like this to look like? ??? Content Rec. Fraud Detection Personal Asst. Robotic Control Machine Translation (Caffe, VW, and others)
  25. Encapsulate each system in an isolated environment ??? Content Rec. Fraud Detection Personal Asst. Robotic Control Machine Translation (Caffe, VW, and others)
  26. Decouple models and applications Content Rec. Fraud Detection Personal Asst. Robotic Control Machine Translation (Caffe, ???)
  27. Clipper Decouples Applications and Models. [Diagram: Applications call Predict on Clipper; Clipper talks RPC to Model Containers (MCs), e.g. Caffe]
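Applications query Clipper over its REST interface. A hedged sketch, assuming the query frontend is on localhost:1337 and an application named "example-app" with input type "doubles" has been registered (names are illustrative):

    import requests

    # One prediction request; Clipper responds with JSON.
    response = requests.post(
        "http://localhost:1337/example-app/predict",
        json={"input": [1.1, 2.2, 3.3]},
    )
    print(response.json())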
  28. Clipper Implementation Clipper MC MC MC RPC RPC RPC RPC Model Abstraction Layer Caching Adaptive Batching Model Container (MC) Core system: 10,000 lines of C++ Open source system: https://github.com/ucbrise/clipper Goal: Community-owned platform for model deployment and serving Applications Predict
  29. Clipper Implementation. [Diagram: Applications → Predict → Clipper Query Processor; Clipper Management (clipper admin) backed by Redis; Model Containers]
  30. Driver Program SparkContext Worker Node Executor Task Task Worker Node Executor Task Task Worker Node Executor Task Task Web Server Database Cache Clipper Clipper Connects Analytics and Serving
  31. Getting Started with Clipper. Docker images are available on DockerHub. The Clipper admin tool is distributed as a pip package: pip install clipper_admin. Get up and running without cloning or compiling!
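A minimal startup sketch using the clipper_admin package (API as of the 0.2-era releases; exact names may differ in other versions):

    from clipper_admin import ClipperConnection, DockerContainerManager

    # Launch Clipper (query frontend, management, Redis) in local Docker containers.
    clipper_conn = ClipperConnection(DockerContainerManager())
    clipper_conn.start_clipper()

    # Register an application: becomes the REST endpoint /example-app/predict.
    clipper_conn.register_application(
        name="example-app",
        input_type="doubles",
        default_output="-1.0",   # returned when a query misses its latency SLO
        slo_micros=100000,       # 100 ms latency objective
    )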
  32.  Containerized frameworks: unified abstraction and isolation  Cross-framework caching and batching: optimize throughput and latency  Cross-framework model composition: improved accuracy through ensembles and bandits Technologies inside Clipper
  33. Clipper Predict Feedback RPC/REST Interface Caffe MC MC MC RPC RPC RPC RPC Clipper Decouples Applications and Models Applications Model Container (MC)
  34. Clipper Caffe MC MC MC RPC RPC RPC RPC Model Container (MC)
  35. Clipper Caffe MC MC MC RPC RPC RPC RPC Model Container (MC) Common Interface  Simplifies Deployment:  Evaluate models using original code & systems
  36. Container-based Model Deployment. Implement the Model API: class ModelContainer: def __init__(self, model_data): ... def predict_batch(self, inputs): ...
  37. Implement the Model API: class ModelContainer: def __init__(self, model_data): ... def predict_batch(self, inputs): ...  API support for many programming languages:  Python  Java  C/C++ Container-based Model Deployment
  38. Container-based Model Deployment. Package the model implementation and its dependencies into a Model Container (MC). class ModelContainer: def __init__(self, model_data): ... def predict_batch(self, inputs): ...
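A runnable sketch of the container API shown above (the class and method names follow the slides; the trivial model body is hypothetical):

    class ModelContainer:
        def __init__(self, model_data):
            # Load weights/parameters from the serialized model data.
            self.model_data = model_data

        def predict_batch(self, inputs):
            # Receive a batch of inputs; return one prediction per input.
            return [str(len(x)) for x in inputs]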
  39. Clipper Caffe MC MC RPC RPC RPC RPC Model Container (MC) Container-based Model Deployment MC
  40. Clipper Caffe MC MC MC RPC RPC RPC RPC Model Container (MC) Common Interface  Simplifies Deployment:  Evaluate models using original code & systems  Models run in separate processes as Docker containers  Resource isolation: Cutting edge ML frameworks can be buggy
  41. Clipper Caffe MC RPC RPC RPC Container (MC) Common Interface  Simplifies Deployment:  Evaluate models using original code & systems  Models run in separate processes as Docker containers  Resource isolation: Cutting edge ML frameworks can be buggy  Scale-out MC RPCRPC RPC MC MC MC
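Scale-out in practice: a hedged sketch of adding model-container replicas with clipper_admin (0.2-era API; clipper_conn as created earlier, model name hypothetical):

    # Run three replicas of the model's container to increase throughput.
    clipper_conn.set_num_replicas(name="example-model", num_replicas=3)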
  42. Overhead of decoupled architecture: TensorFlow-Serving (monolithic Predict RPC Interface) vs. Clipper (Predict/Feedback RPC/REST Interface with separate Model Containers)
  43. Overhead of decoupled architecture. [Plots: Throughput (QPS, higher is better) and P99 Latency (ms, lower is better)] Model: AlexNet trained on CIFAR-10. *Prototype version of Clipper [NSDI 17]
  44.  Containerized frameworks: unified abstraction and isolation  Cross-framework caching and batching: optimize throughput and latency  Cross-framework model composition: improved accuracy through ensembles and bandits Technologies inside Clipper
  45. Clipper Architecture Clipper Applications Predict, Observe RPC/REST Interface MC MC MC RPC RPC RPC RPC Model Abstraction Layer Caching Latency-Aware Batching Model Container (MC)
  46. Batching to Improve Throughput. A single page load may generate many queries.  Why batching helps: throughput-optimized frameworks yield higher throughput at larger batch sizes.  Optimal batch size depends on:  hardware configuration  model and framework  system load
  47. Latency-Aware Batching to Improve Throughput. Optimal batch size depends on hardware configuration, model and framework, and system load. Clipper Solution: adaptively trade off latency and throughput…  Increase batch size until the latency objective is exceeded (Additive Increase)  If latency exceeds the SLO, cut the batch size by a multiplicative factor (Multiplicative Decrease)
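A sketch of the additive-increase/multiplicative-decrease (AIMD) policy described above, assuming a measured per-batch latency; the constants are illustrative, not Clipper's actual tuning:

    def update_batch_size(batch_size, observed_latency_s, slo_s,
                          additive_step=1, backoff=0.9):
        # Latency objective exceeded: back off multiplicatively.
        if observed_latency_s > slo_s:
            return max(1, int(batch_size * backoff))
        # Under the SLO: probe for more throughput with a small additive increase.
        return batch_size + additive_step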
  48. Latency-Aware Batching to Improve Throughput. [Plot: Throughput (QPS), higher is better]
  49. Overhead of decoupled architecture Clipper Predict Feedback RPC/REST Interface Caffe MC MC MC RPC RPC RPC RPC Applications MC
  50.  Containerized frameworks: unified abstraction and isolation  Cross-framework caching and batching: optimize throughput and latency  Cross-framework model composition: improved accuracy through ensembles and bandits Technologies inside Clipper
  51. Clipper Architecture Clipper Applications Predict, Observe RPC/REST Interface MC MC MC RPC RPC RPC RPC Model Abstraction Layer Model Container (MC) Model Selection Layer (Selection Policy)
  52. Model Selection Layer (Selection Policy): periodic retraining (Version 1, Version 2, Version 3); experiment with new models and frameworks
  53. Selection Policy: Estimate confidence. All models agree: “CAT” “CAT” “CAT” “CAT” → “CAT”, CONFIDENT
  54. Selection Policy: Estimate confidence. Models disagree: “CAT” “MAN” “CAT” “SPACE” → “CAT”, UNSURE
  55. Clipper Model Selection LayerSelection Policy Exploit multiple models to estimate confidence  Use multi-armed bandit algorithms to learn optimal model-selection online  Online personalization across ML frameworks *See paper for details [NSDI 2017]
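A sketch of the agreement-based confidence estimate the preceding slides illustrate (simplified; not Clipper's exact selection policy): query several models, take the majority label, and use the fraction of agreeing models as a confidence score.

    from collections import Counter

    def predict_with_confidence(models, x):
        labels = [m.predict(x) for m in models]
        label, votes = Counter(labels).most_common(1)[0]
        confidence = votes / len(models)  # 1.0 = all models agree (CONFIDENT)
        return label, confidence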
  56. Deploying Models with Clipper
  57. Driver Program Predict function Worker Node Executor Task Task Worker Node Executor Task Task Worker Node Executor Task Task Web Server Database Cache Clipper Goal: Connect Analytics and Serving Analytics Cluster
  58. You can always implement your own model container… Model Container (MC): class ModelContainer: def __init__(self, model_data): ... def predict_batch(self, inputs): ...
  59. Can we do better?
  60. Driver Program SparkContext Worker Node Executor Task Task Worker Node Executor Task Task Worker Node Executor Task Task Web Server Database Cache Clipper Deploy Models from Spark to Clipper
  61. Driver Program SparkContext Worker Node Executor Task Task Worker Node Executor Task Task Worker Node Executor Task Task Web Server Database Cache Deploy Models from Spark to Clipper Clipper
  62. MC Driver Program SparkContext Worker Node Executor Task Task Worker Node Executor Task Task Worker Node Executor Task Task Web Server Database Cache Deploy Models from Spark to Clipper Clipper
  63. Problem: Models don’t run in isolation Must extract model plus pre- and post-processing logic
  64. Clipper provides a library of model deployers  Deployer automatically and intelligently saves all prediction code  Captures both framework-specific models and arbitrary serializable code  Replicates training environment and loads prediction code in a Clipper model container You will use these in the Clipper exercise
  65. Clipper provides a (growing) library of model deployers.  Python: combine framework-specific models with external featurization, post-processing, and business logic; currently supports Scikit-Learn and PySpark, with TensorFlow, Caffe2, and Keras coming soon  Scala and Java with Spark: both MLlib and Pipelines APIs  R support coming in 0.2 (soon!)
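A hedged sketch of the Python deployer (clipper_admin 0.2-era API): deploy_python_closure captures the prediction function and its dependencies and ships them in a Clipper model container; argument names may vary by version.

    from clipper_admin.deployers import python as python_deployer

    def predict(inputs):
        # Arbitrary Python: featurization, a framework model, post-processing.
        return [str(sum(x)) for x in inputs]

    python_deployer.deploy_python_closure(
        clipper_conn, name="example-model", version=1,
        input_type="doubles", func=predict)

    # Route the application's queries to the deployed model.
    clipper_conn.link_model_to_app(app_name="example-app", model_name="example-model")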
  66. Driver Program Predict function Worker Node Executor Task Task Worker Node Executor Task Task Worker Node Executor Task Task Web Server Database Cache Clipper Goal: Connect Analytics and Serving Analytics Cluster
  67. Run on Kubernetes alongside other applications. [Diagram: Clipper and its Model Containers deployed next to the web server, database, cache, and other applications]
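A hedged sketch of pointing clipper_admin at an existing Kubernetes cluster instead of local Docker (the KubernetesContainerManager name is from the clipper_admin package of that era; check the docs for your version):

    from clipper_admin import ClipperConnection, KubernetesContainerManager

    # Assumes kubectl/kubeconfig is already configured for the target cluster.
    clipper_conn = ClipperConnection(KubernetesContainerManager())
    clipper_conn.start_clipper()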
  68. Status of the project  0.2.0 will be released right after RISE Camp  Actively working with early users  Post issues and questions on GitHub and subscribe to our mailing list clipper-dev@googlegroups.com http://clipper.ai
  69. What’s next?  Expand library of model deployers  Improved model and application reliability and debugging  Joint work with the Ground project  Bandit methods, online personalization, and cascaded predictions  Studied in research prototypes but need to integrate into system  Actively looking for contributors: https://github.com/ucbrise/clipper

Editor's Notes

  1. focus of systems for AI/ML has been training cite paper logos instead of text
  2. ”Abstract away the model behind a uniform API”
  3. FLOPs, numbers for models,
  4. Model scoring that drove Google to develop new hardware, not training
  5. Different models, different frameworks have different strengths Give example of why all are needed Multiple applications with different needs within an org Even a single application may need to support multiple frameworks
  6. Different models, different frameworks have different strengths Give example of why all are needed Multiple applications with different needs within an org Even a single application may need to support multiple frameworks
  7. Different models, different frameworks have different strengths Give example of why all are needed Multiple applications with different needs within an org Even a single application may need to support multiple frameworks
  8. Different models, different frameworks have different strengths Give example of why all are needed Multiple applications with different needs within an org Even a single application may need to support multiple frameworks
  9. Different models, different frameworks have different strengths Give example of why all are needed Multiple applications with different needs within an org Even a single application may need to support multiple frameworks
  10. Different models, different frameworks have different strengths Give example of why all are needed Multiple applications with different needs within an org Even a single application may need to support multiple frameworks
  11. Put github link
  12. sits adjacent to analytics frameworks Clipper provides the mechanism to bridge analytics and serving
  13. so all you need to do to launch a Clipper cluster is pip install
  14. Why a system? Because system challenges not AI challenges
  15. ”Abstract away the model behind a uniform API”
  16. ”Abstract away the model behind a uniform API”
  17. What is a model? What is an application?
  18. ”Abstract away the model behind a uniform API”
  19. Black-box abstraction
  20. Data scientist doesn’t have to learn any new skills
  21. From the frontend dev perspective Clipper
  22. adopts a more monolithic architecture with both the serving and model-evaluation code in a single process
  23. The performance gains from a monolithic design are minimal, but the advantages of containerized system allow us to do a lot more across different frameworks Don’t have to re-implement all this stuff for each new framework
  24. because model-abstraction layer lets you plug in many models, model selection layer chooses between them introduce learning in selection layer later
  25. Make clearer that batching navigates tradeoff between throughput and latency
  26. Make clearer that batching navigates tradeoff between throughput and latency
  27. Two plots, throughput on top, latency on bottom
  28. Two plots, throughput on top, latency on bottom
  29. Two plots, throughput on top, latency on bottom
  30. ”Abstract away the model behind a uniform API”
  31. because model-abstraction layer lets you plug in many models, model selection layer chooses between them introduce learning in selection layer later
  32. because model-abstraction layer lets you plug in many models, model selection layer chooses between them introduce learning in selection layer later
  33. By combining predictions from multiple models we can improve application accuracy and quantify confidence in predictions
  34. “Not a cat picture”
  35. example registering endpoints endpoints get attached to different models
  36. Special sauce
  37. http://clipper.ai/examples/spark-demo/
  38. put batching on bottom