Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East talk by Dan Crankshaw

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment.

In this talk, we present Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluated Clipper on four common machine learning benchmark datasets and demonstrated its ability to meet the latency, accuracy, and throughput demands of online serving applications. We also compared Clipper to the TensorFlow Serving system and demonstrated comparable prediction throughput and latency on a range of models, while enabling new functionality, improved accuracy, and robustness.

Published in: Data & Analytics
  • Hello! High Quality And Affordable Essays For You. Starting at $4.99 per page - Check our website! https://vk.cc/82gJD2
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Transcript:

  1. Daniel Crankshaw. Spark Summit East, February 2017. Clipper: A Low-Latency Online Prediction Serving System
  2. Big Data, Big Model: Training (Learning). Timescale: minutes to days. Systems: offline and batch optimized. Heavily studied ... major focus of the AMPLab
  3. Big Data, Big Model: Training (Learning); Application, Query, Decision (Inference); ? connecting the two
  4. Big Data: Training (Learning); Big Model; Application, Query, Decision (Inference). Timescale: ~20 milliseconds. Systems: online and latency optimized. Less studied ...
  5. Big Data, Big Model: Training (Learning); Application, Query, Decision (Inference); Feedback
  6. Big Data: Training (Learning); Application, Decision (Inference); Feedback. Timescale: hours to weeks. Systems: combination of systems. Less studied ...
  7. Big Data, Big Model: Training (Learning); Application, Query, Decision (Inference); Feedback
  8. Big Data, Big Model: Training (Learning); Application, Query, Decision (Inference); Feedback. Responsive (~10 ms), Adaptive (~1 second)
  9. Example: Fraud Detection
  10. Serving Predictions Today. Big Data, Big Model, Training: Offline Batch System
  11. Serving Predictions Today: Offline Scoring. Big Data, Big Model, Training; Offline Batch System; Scoring: X → Y
  12. Serving Predictions Today: Offline Scoring. X → Y; Application, Query, Decision: look up decision in KV-Store; Online Serving System
  13. Serving Predictions Today: Offline Scoring. Problems: Ø Requires full set of queries ahead of time Ø Small and bounded input domain Ø Wasted computation and space Ø Can render and store unneeded predictions Ø No feedback and costly to update
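
A minimal sketch of this offline-scoring pattern, which makes the problems listed on slide 13 concrete; the scoring function and query set here are hypothetical stand-ins, not from the talk.

    # Minimal sketch of offline scoring (hypothetical model and query set).
    def score(x):
        # Stand-in for an expensive trained model.
        return x > 900

    # Offline batch job: requires enumerating the full input domain up front,
    # and wastes computation and space on queries that are never issued.
    kv_store = {x: score(x) for x in range(1000)}

    def serve(query):
        # Online serving is just a KV lookup; queries outside the precomputed
        # set miss, and feedback cannot be incorporated without a full re-run.
        return kv_store.get(query)

    print(serve(950))   # True
    print(serve(1500))  # None: this query was never precomputed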
  14. Serving Predictions Today: Online Scoring. Application, Query, Decision: render prediction with model in real-time; Online Serving System
  15. Fraud Dataset, Big Model: Training (Learning); Application, Query, Decision (Inference); Feedback
  16. Many applications and many models: ???, Content Rec., Fraud Detection, Personal Asst., Robotic Control, Machine Translation; Create VW Caffe
  17. Many applications and many models: ???, Content Rec., Fraud Detection, Personal Asst., Robotic Control, Machine Translation; Create VW Caffe
  18. Can we decouple models and applications? ???, Content Rec., Fraud Detection, Personal Asst., Robotic Control, Machine Translation; Create VW Caffe
  19. Requirements • System cannot stand in the way of independent evolution of applications and models; empowers • enables separate evolution, development • From the perspective of the data scientist • Ease of application evolution • model rollout • application deployment • support for the wide range of frameworks that data scientists use • improve accuracy, use cutting-edge techniques, frameworks • experiment with models in predictions • Don't have to worry about applications (performance) • Frontend developer • Stable, reliable, performant APIs (need systems that meet their SLOs) • scale system, hardware to meet application demands • Don't worry about models (oblivious to underlying)
  20. Requirements • Decouple applications from models and allow them to evolve independently from each other • The Data Scientist perspective: focus on making accurate predictions • Support many models, frameworks • Simple deployment and online experimentation • (Mostly) oblivious to system performance and workload demands • The Frontend Dev perspective: focus on building reliable, low-latency applications • Provide stable, reliable, performant APIs (need systems that meet their SLOs) • Scale system, hardware to meet application demands • Oblivious to the implementations of the underlying models
  21. Requirements for a Prediction-Serving System: Ø Decouple applications from models and allow them to evolve independently from each other Ø The Frontend Dev perspective: focus on building reliable, low-latency applications Ø Provide stable, reliable, performant APIs to meet SLAs Ø Scale system, hardware to meet application demands Ø Oblivious to the implementations of the underlying models Ø The Data Scientist perspective: focus on making accurate predictions Ø Support many models and frameworks simultaneously Ø Simple deployment and online experimentation Ø (Mostly) oblivious to system performance and workload demands
  22. From the Frontend Dev perspective. Clipper: Predict / Feedback RPC/REST Query Interface; Applications. Management REST API: create_application(), deploy_model(), replicate_model(), inspect_instance()
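
For illustration, a hypothetical client session against the management API named on slide 22; the endpoint paths, field names, and admin address below are assumptions for this sketch, not Clipper's documented REST API.

    import requests

    CLIPPER_ADMIN = "http://localhost:1338"  # assumed admin address

    def create_application(name, input_type, latency_slo_micros):
        # Register a frontend application that will issue Predict/Feedback
        # queries. Endpoint path and JSON fields are illustrative.
        return requests.post(CLIPPER_ADMIN + "/admin/add_app", json={
            "name": name,
            "input_type": input_type,
            "latency_slo_micros": latency_slo_micros,
        })

    def deploy_model(name, version, container_image):
        # Launch a model container and make it available for serving.
        return requests.post(CLIPPER_ADMIN + "/admin/add_model", json={
            "model_name": name,
            "model_version": version,
            "image": container_image,
        })

    create_application("fraud-detector", "doubles", latency_slo_micros=20000)
    deploy_model("fraud-rf", version=1, container_image="my-org/fraud-rf:1")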
  23. From the Data Scientist perspective. Implement Model API: class ModelContainer: def __init__(model_data); def predict_batch(inputs)
  24. From the Data Scientist perspective. Implement Model API: class ModelContainer: def __init__(model_data); def predict_batch(inputs). Ø Implemented in many languages Ø Python Ø Java Ø C/C++ Ø R Ø …
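
As one example of filling in the Model API above, a sketch wrapping a pickled scikit-learn estimator; the pickle serialization format is an assumption here, and any model that can answer batched predictions would do.

    import pickle

    class ModelContainer:
        def __init__(self, model_data):
            # model_data: path to a serialized estimator (assumed pickle).
            with open(model_data, "rb") as f:
                self.model = pickle.load(f)

        def predict_batch(self, inputs):
            # inputs: a batch of feature vectors. Clipper forms the batches
            # itself (see the adaptive batching slides later in the talk).
            return list(self.model.predict(inputs))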
  25. From the Data Scientist perspective: model implementation packaged in a container. Model Container (MC)
  26. From the Data Scientist perspective. Clipper; model containers (Caffe MC, MC, MC) connected over RPC; Model Container (MC)
  27. From the Data Scientist perspective. Applications; Clipper: Predict / Feedback RPC/REST Interface; model containers (Caffe MC, MC, MC) over RPC; Model Container (MC)
  28. Clipper Decouples Applications and Models. Applications; Clipper: Predict / Feedback RPC/REST Interface; model containers (Caffe MC, MC, MC) over RPC; Model Container (MC)
  29. Clipper Generalizes Models Across ML Frameworks. Clipper; Content Rec., Fraud Detection, Personal Asst., Robotic Control, Machine Translation; Create VW Caffe
  30. DEMO
  31. Clipper. Key Insight: the challenges of prediction serving can be addressed between end-user applications and machine learning frameworks. As a result, Clipper is able to: Ø hide complexity Ø by providing a common prediction interface to applications Ø bound latency and maximize throughput Ø through caching, adaptive batching, and model scaleout Ø enable robust online learning and personalization Ø through model selection and ensemble algorithms, without modifying machine learning frameworks or end-user applications
  32. Clipper Decouples Applications and Models. As a result, Clipper is able to: Ø hide complexity Ø by providing a common prediction interface to applications Ø bound latency and maximize throughput Ø through caching, adaptive batching, and model scaleout Ø enable robust online learning and personalization Ø through model selection and ensemble algorithms, without modifying machine learning frameworks or end-user applications
  33. Challenges Ø Managing heterogeneity everywhere Ø different types of models (different software, different resource requirements) in a production environment Ø Different application performance requirements Ø workloads, latencies Ø Scheduling (space-time resource management) Ø Where and when to send prediction queries to models Ø Latency-accuracy tradeoffs Ø Marginal utility of allocating additional resources Ø How to use feedback to improve accuracy in real-time
  34. Clipper Architecture. Applications; Clipper: Predict / Observe RPC/REST Interface; Model Selection Layer: improve accuracy through bandit methods and ensembles, online learning, and personalization; Model Abstraction Layer: provide a common interface to models while bounding latency and maximizing throughput; model containers (Caffe MC, MC, MC) over RPC; Model Container (MC)
  35. Clipper Architecture. Applications; Clipper: Predict / Observe RPC/REST Interface; Model Selection Layer: Selection Policy; Model Abstraction Layer: Caching, Adaptive Batching; model containers (Caffe MC, MC, MC) over RPC; Model Container (MC)
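
A minimal sketch of the prediction cache named in the Model Abstraction Layer above; keying on (model, version, input) and the compute callback are assumptions for illustration, not Clipper's exact design.

    class PredictionCache:
        # Memoizes predictions so repeated queries skip the model container.
        def __init__(self):
            self._store = {}

        def fetch(self, model, version, x, compute):
            key = (model, version, x)
            if key not in self._store:
                self._store[key] = compute(x)  # miss: query the container
            return self._store[key]

    cache = PredictionCache()
    print(cache.fetch("fraud-rf", 1, (0.3, 0.7), compute=lambda x: sum(x) > 0.9))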
  36. Model Abstraction Layer (Caching, Adaptive Batching): provide a common interface to models while bounding latency and maximizing throughput. Correction Layer: Correction Policy; model containers (Caffe MC, MC, MC) over RPC; Model Container (MC)
  37. Correction Layer: Correction Policy; Model Abstraction Layer: Caching, Adaptive Batching; model containers (Caffe MC, MC, MC) over RPC; Model Container (MC). Common Interface → Simplifies Deployment: Ø Evaluate models using original code & systems Ø Models run in separate processes as Docker containers Ø Resource isolation
  38. Correction Layer: Correction Policy; Model Abstraction Layer: Caching, Adaptive Batching; model containers (Caffe MC, MC, MC, MC, MC) over RPC; Model Container (MC). Common Interface → Simplifies Deployment: Ø Evaluate models using original code & systems Ø Models run in separate processes as Docker containers Ø Resource isolation Ø Scale-out. Problem: frameworks optimized for batch processing, not latency
  39. Adaptive Batching to Improve Throughput. A single page load may generate many queries. Ø Optimal batch size depends on: Ø hardware configuration Ø model and framework Ø system load. Clipper solution: be as slow as allowed… Ø Increase batch size until the latency objective is exceeded (Additive Increase) Ø If latency exceeds the SLO, cut batch size by a fraction (Multiplicative Decrease) Ø Why batching helps: hardware acceleration; helps amortize system overhead
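
A sketch of the additive-increase/multiplicative-decrease rule described on slide 39; the step sizes and the latency probe are illustrative choices, not the talk's exact parameters.

    def aimd_batch_size(measure_latency, slo, batch_size=1,
                        additive_step=4, backoff=0.9, steps=200):
        # measure_latency(b) -> observed seconds to serve a batch of size b.
        for _ in range(steps):
            if measure_latency(batch_size) <= slo:
                batch_size += additive_step                     # additive increase
            else:
                batch_size = max(1, int(batch_size * backoff))  # multiplicative decrease
        return batch_size

    # Toy probe: 1 ms fixed overhead plus 0.1 ms per query in the batch.
    print(aimd_batch_size(lambda b: 0.001 + 0.0001 * b, slo=0.020))
    # Oscillates just under the largest batch that meets the 20 ms SLO.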
  40. Batching Results (plot, SLO marked): up to 25.5x throughput increase from batching
  41. Clipper Architecture. Applications; Clipper: Predict / Observe RPC/REST Interface; Model Selection Layer: Selection Policy; Model Abstraction Layer: Caching, Adaptive Batching; model containers (Caffe MC, MC, MC) over RPC; Model Container (MC)
  42. Big Data: Learning (slow); Slow Changing Model; Application: Inference; Feedback; Clipper: real-time model selection and ensembles
  43. Bring Learning into the Serving Tier. Clipper: Model Selection Layer (Selection Policy); Slow Changing Model; real-time model selection and ensembles. What can we learn? Ø Dynamically weight mixture of experts Ø Select best model for each user Ø Use ensemble to estimate prediction confidence Ø Don't try to retrain models
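
To illustrate "dynamically weight mixture of experts" from slide 43, a toy exponentiated-weights selection policy; this is a classic online-learning update and not necessarily Clipper's exact algorithm. Note it adapts the ensemble weights from feedback without retraining the underlying models.

    import math

    class EnsemblePolicy:
        def __init__(self, models, eta=0.5):
            self.models = models           # callables: x -> prediction
            self.weights = [1.0] * len(models)
            self.eta = eta                 # learning rate

        def predict(self, x):
            preds = [m(x) for m in self.models]
            total = sum(self.weights)
            # Weighted vote; disagreement across models can double as a
            # prediction-confidence estimate, as the slide suggests.
            return sum(w * p for w, p in zip(self.weights, preds)) / total

        def feedback(self, x, y):
            # Exponentially down-weight models by their squared error.
            for i, m in enumerate(self.models):
                self.weights[i] *= math.exp(-self.eta * (m(x) - y) ** 2)

    policy = EnsemblePolicy([lambda x: 0.0, lambda x: x])
    policy.feedback(1.0, 1.0)   # second model was right; its weight grows
    print(policy.predict(1.0))  # prediction now leans toward the second model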
  44. Road Map Ø Open source on GitHub: https://github.com/ucbrise/clipper Ø Kick the tires, try out our tutorial Ø Alpha release in mid-April Ø Focused on reliability and performance for serving single-model applications Ø First-class support for Scikit-Learn and Spark models, arbitrary Python functions Ø Coordinating initial set of features with RISE Lab sponsors and collaborators Ø After alpha release Ø Support for selection policies and multi-model applications Ø Model performance monitoring to detect and correct accuracy degradation Ø New task scheduler design to leverage model and resource heterogeneity. “Clipper: A Low-Latency Online Prediction Serving System” [NSDI ’17] https://arxiv.org/abs/1612.03079 crankshaw@cs.berkeley.edu
