Model Monitoring at Scale with Apache Spark and Verta

For any organization whose core product or business depends on ML models (think Slack search, Twitter feed ranking, or Tesla Autopilot), ensuring that production ML models perform with high efficacy is crucial. In fact, according to the McKinsey report on model risk, defective models have led to revenue losses of hundreds of millions of dollars in the financial sector alone. Yet despite the significant harm defective models can cause, tools to detect and remedy performance issues in production ML models are missing.

Based on our experience building ML debugging and robustness tools at MIT CSAIL and managing large-scale model inference services at Twitter, Nvidia, and now at Verta, we developed a generalized model monitoring framework that can monitor a wide variety of ML models, work unchanged in batch and real-time inference scenarios, and scale to millions of inference requests. In this talk, we focus on how this framework applies to monitoring ML inference workflows built on top of Apache Spark and Databricks. We describe how we can supplement the massively scalable data processing capabilities of these platforms with statistical processors to support the monitoring and debugging of ML models.

Learn how ML Monitoring is fundamentally different from application performance monitoring or data monitoring. Understand what model monitoring must achieve for batch and real-time model serving use cases. Then dig in with us as we focus on the batch prediction use case for model scoring and demonstrate how we can leverage the core Apache Spark engine to easily monitor model performance and identify errors in serving pipelines.
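
To make the batch-scoring case concrete, the sketch below shows the kind of per-batch statistics such monitoring computes, written in plain PySpark rather than any specific Verta API; the input path, column names ("amount", "prediction"), and the baseline values and alert threshold are hypothetical placeholders.

```python
# Minimal sketch, assuming a Parquet batch of scored records with a numeric
# feature "amount" and a "prediction" column; not Verta's API, just plain Spark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("model-monitoring-sketch").getOrCreate()

scored = spark.read.parquet("/path/to/todays_predictions")  # hypothetical path

profile = scored.agg(
    F.count("*").alias("n_rows"),
    F.mean("amount").alias("amount_mean"),
    F.stddev("amount").alias("amount_std"),
    F.mean(F.col("amount").isNull().cast("int")).alias("amount_null_rate"),
    F.mean("prediction").alias("prediction_mean"),
).first()

# Compare against a stored training-time baseline and flag anomalies.
baseline = {"amount_null_rate": 0.001, "prediction_mean": 0.12}  # hypothetical
if profile["amount_null_rate"] > 10 * baseline["amount_null_rate"]:
    print("ALERT: missing-value rate for 'amount' spiked vs. the training baseline")
if abs(profile["prediction_mean"] - baseline["prediction_mean"]) > 0.05:
    print("ALERT: prediction distribution shifted vs. the training baseline")
```

In practice, each batch's summary would be stored with metadata (model version, dataset, date) so it can be compared across runs and against training-time statistics.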

  1. Model Monitoring at Scale with Apache Spark and Verta. Manasi Vartak, Ph.D., Founder and CEO, Verta Inc. www.verta.ai | @DataCereal
  2. About: https://github.com/VertaAI/modeldb - Ph.D. thesis at MIT CSAIL on model management and diagnosis - Created ModelDB: open-source ML model management & versioning - Released at Spark Summit 2017! - ML @ Twitter, Google, Facebook. https://www.verta.ai/product - End-to-end MLOps platform for ML model delivery, operations and monitoring - Serving models for some of the top tech companies, finance, insurance, etc.
  3. Agenda ▴ Why Model Monitoring? ▴ What is Model Monitoring? ▴ Generalized Framework for Model Monitoring ▴ Monitoring at scale with Apache Spark ▴ Wrap up
  4. Feedback: Your feedback is important to us. Don’t forget to rate and review the sessions.
  5. ML models are used across all functions.
  6. AI/ML doesn’t always work as expected: "...models used to predict delinquencies suddenly stopped working as the pandemic hit since data used to build them was simply no longer relevant." -- Head of Consumer Banking, Top US Bank. https://www.globalbankingandfinance.com/a-framework-for-analytics-operational-risk-management/
  7. What we are hearing in the field: "Our ad-serving system saw a revenue loss of $20K in 10 minutes and we had no idea why that happened. We had to dig through all kinds of logs to piece together what had happened." (Head of DS, US Ad-tech Company) "Our model results are used to make automated pricing decisions. So preempting bad model predictions can save us millions of dollars." (ML Team Manager, Silicon Valley Unicorn) "Engineers with minimal ML expertise consume these models, so they are black-boxes to them. We try our best to tell the product team when something is wrong with the model, but that’s really hard to do." (Top-5 US E-Commerce Retailer)
  8. How do we solve these problems? Enter, Model Monitoring.
  9. What is Model Monitoring?
  10. Ensuring model results are consistently of high quality: ▴ Know when models are failing ▴ Quickly find the root cause ▴ Close the loop by fast recovery. (*We refer to latency, throughput, etc. as model service health.)
  11. I. How can we know when a model fails? (Diagram: ground-truth labels may arrive roughly 30 days after a prediction, so feedback is not instantaneous; in the meantime, compare training vs. test/live distributions of inputs, featurized data, and outputs. A hedged drift-detection sketch follows after the slide transcript.)
  12. II. How can we find the root cause of model failures? (Diagram: a jungle of interconnected databases, ETL jobs, and models, e.g. DB1/DB2 → ETL1/ETL2 → Model1 → Pred1, where a failure in any upstream stage can surface as questionable downstream predictions.)
  13. III. How can we close the loop for fast recovery? ▴ Know the problem before it happens so you can take action ○ E.g., missing feature? Impute or fall back to a different model ○ E.g., set alerts on upstream data so that defects do not propagate downstream ▴ Close the loop by integrating into the rest of the ML pipeline ○ Re-train the model ○ Send data to labeling software ○ Fall back to a previous version of the model (a minimal guardrail sketch follows after the slide transcript)
  14. What’s the alternative? (Diagram: per-model logs feeding a custom analysis pipeline and monitoring, repeated roughly 100x.) ▴ Custom analysis pipelines for each model type (maintenance burden) ▴ Difficult to get a global view (vs. a per-model view) required for root cause analysis ▴ Takes more than a quarter to get something basic set up
  15. Challenges with ML Monitoring ▴ Measurement. Measuring quality in the absence of ground truth is challenging ▴ Customization. Quality metrics are unique to each model type and domain ▴ Pipeline Jungles. Convoluted model lineage and data pipelines make root cause analysis extremely hard ▴ Accessibility. For non-experts to consume models, monitoring must be easy to plug in and interpret ▴ Scale. Must scale to large datasets, a large number of statistics, and to live + batch inference
  16. Introducing a Generalized Framework for Model Monitoring
  17. Goals ▴ Make it flexible ○ Monitor models running on any serving platform, any ML framework ○ Monitor data pipelines, batch and live models ▴ Make it customizable ○ Use out-of-the-box statistics, or ○ Define your own custom functions and statistical properties to monitor & visualize ▴ Close the loop ○ Automate the recovery and alert resolution process
  18. How does it work? (Diagram: data pipelines and models (batch and live) ingest data, inputs, outputs, and ground truth; users configure profilers and alerts, get insights, visualize and debug, get notified, and take automated remediation actions such as retraining, rollback, or bringing a human into the loop.)
  19. How does data ingest work? (Diagram: each dataset Data1...DataN is run through one or more profilers Profiler1...ProfilerN; each profiler emits summary samples plus metadata, which roll up into summaries Summary1...SummaryN. An illustrative profiler sketch follows after the slide transcript.)
  20. But what about real-time? summary.enable_live(profiler)
  21. Demo: Monitoring Spark ML Pipelines
  22. Demo Setup ▴ Batch prediction pipeline with Spark ▴ New data arrives daily (Diagram: the same DB1/DB2 → ETL1/ETL2 → Model → Pred pipeline runs on each day's data, until one run produces questionable predictions.)
  23. Demo Setup (Diagram: Spark ML pipeline stages: CSV → StringIndexer (0) → StringIndexer (1) → StringIndexer (2) → VectorAssembler (3) → GBDT (4) → Pred. A reconstruction of this pipeline in Spark ML code follows after the slide transcript.)
  24. But what if? ▴ 3 interconnected pipelines with model dependencies ▴ What happens when DB2 is broken? ▴ What happens when ETL4 is broken? (Diagram: DB1/DB2 → ETL1/ETL2 → Model1 → Pred1; DB3 → ETL3/ETL4 → Model2 → Pred2; DB4 → ETL5 → Model3 → Pred3, with the downstream predictions in question.)
  25. Summary ▴ ML models drive key user experiences and business decisions ▴ Model Monitoring ensures model results are consistently of high quality ▴ When done right, Model Monitoring can: ○ Save $20K in 10 minutes ○ Identify failing models before social media does! ○ Safely democratize AI
  26. Thank you. Intrigued? Check out: https://monitoring.verta.ai
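
The sketches below expand on a few of the slides above. Slide 11 notes that ground truth can lag predictions by weeks, which is why monitoring compares training vs. live distributions instead. One common way to do that, shown here as a hedged PySpark sketch rather than the talk's actual implementation, is the Population Stability Index over quantile bins; the feature name, bin count, and 0.2 alert threshold are illustrative assumptions.

```python
# Illustrative drift check (assumptions: numeric feature, PSI over quantile bins);
# not Verta's implementation, just one way to compare train vs. live data in Spark.
import math

from pyspark.ml.feature import Bucketizer

def population_stability_index(train_df, live_df, feature, bins=10):
    # Derive bin edges from the training data's quantiles.
    edges = train_df.approxQuantile(feature, [i / bins for i in range(1, bins)], 0.01)
    splits = [-float("inf")] + sorted(set(edges)) + [float("inf")]
    bucketizer = Bucketizer(splits=splits, inputCol=feature,
                            outputCol="bucket", handleInvalid="skip")

    def bucket_fractions(df):
        total = df.count()
        rows = bucketizer.transform(df).groupBy("bucket").count().collect()
        return {row["bucket"]: row["count"] / total for row in rows}

    p, q = bucket_fractions(train_df), bucket_fractions(live_df)
    eps = 1e-6  # avoid log(0) for empty buckets
    return sum(
        (p.get(b, 0.0) - q.get(b, 0.0)) * math.log((p.get(b, 0.0) + eps) / (q.get(b, 0.0) + eps))
        for b in set(p) | set(q)
    )

# Example usage (hypothetical DataFrames, threshold, and alert helper):
# if population_stability_index(train_features, todays_features, "amount") > 0.2:
#     raise_alert("input drift on 'amount' detected before any ground truth arrived")
```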
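
For slide 13's close-the-loop examples (impute a missing feature, or fall back to a previous model), a minimal pre-scoring guard might look like the sketch below. The feature name, the 5% threshold, the zero imputation constant, and the model handles are all hypothetical; this is not Verta's remediation API.

```python
# Illustrative guardrail sketch: impute when missingness is modest, otherwise
# fall back to the previous model version. All names and thresholds are hypothetical.
from pyspark.sql import functions as F

def score_with_guardrails(batch_df, current_model, fallback_model,
                          feature="amount", max_null_rate=0.05):
    null_rate = batch_df.select(
        F.mean(F.col(feature).isNull().cast("int")).alias("null_rate")
    ).first()["null_rate"] or 0.0

    if null_rate <= max_null_rate:
        # Impute with a training-time constant and score with the current model.
        return current_model.transform(batch_df.fillna({feature: 0.0}))

    # Too much missing data: alert and fall back to the previous model version.
    print(f"ALERT: '{feature}' null rate {null_rate:.2%} exceeds {max_null_rate:.0%}; "
          "falling back to the previous model")
    return fallback_model.transform(batch_df.fillna({feature: 0.0}))
```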
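
Slide 19 describes profilers that turn incoming data into summary samples plus metadata. As a rough illustration only (not the actual profiler interface), a numeric-column profiler in PySpark could emit a summary like the following:

```python
# Sketch of a numeric-column profiler: returns a small summary dict that could be
# stored as a summary sample alongside metadata (dataset, model version, date).
from pyspark.sql import functions as F

def numeric_profiler(df, column):
    summary = df.agg(
        F.count(column).alias("non_null"),
        F.count("*").alias("total"),
        F.mean(column).alias("mean"),
        F.stddev(column).alias("std"),
        F.min(column).alias("min"),
        F.max(column).alias("max"),
    ).first().asDict()
    summary["missing_rate"] = 1.0 - summary["non_null"] / summary["total"]
    summary["quartiles"] = df.approxQuantile(column, [0.25, 0.5, 0.75], 0.01)
    return summary
```

Running the same profiler over the training data and over each day's batch yields comparable summaries, which is what drift checks like the one shown earlier operate on.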
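
Slide 23's demo pipeline (three StringIndexers feeding a VectorAssembler and a GBDT) corresponds to standard Spark ML stages. The sketch below is a hedged reconstruction with placeholder column names, since the demo's actual schema is not shown in the deck.

```python
# Placeholder reconstruction of the slide's pipeline; column names are invented.
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import GBTClassifier

categorical = ["cat_a", "cat_b", "cat_c"]
indexers = [                                            # stages (0), (1), (2)
    StringIndexer(inputCol=c, outputCol=f"{c}_idx", handleInvalid="keep")
    for c in categorical
]
assembler = VectorAssembler(                            # stage (3)
    inputCols=[f"{c}_idx" for c in categorical] + ["num_a"],
    outputCol="features",
)
gbdt = GBTClassifier(featuresCol="features", labelCol="label")  # stage (4)

pipeline = Pipeline(stages=indexers + [assembler, gbdt])

# df = spark.read.csv("/path/to/daily_batch.csv", header=True, inferSchema=True)
# model = pipeline.fit(df)
# predictions = model.transform(df)
```

Because each stage consumes and produces a DataFrame, the same profilers can be attached between stages, catching, for example, a StringIndexer that starts seeing unexpected categories before the GBDT ever scores them.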
