A “Real-Time” Architecture for Machine Learning Execution with MLeap
Noah Pritikin, Site Reliability Engineer
Spark+AI Summit 2019 | April 24, 2019
Machine Learning Applications
Detecting credit-card fraud
Financial markets
Online advertising
Recommender systems
Robotics
…
Agriculture
Automated medical diagnosis
Computer vision
Insurance
Marketing
Sentiment analysis
User behavior analytics
Weather forecasting
…
I am defining “Real-Time” as <100ms for the context of this presentation.
Not “Real-Time” “Real-Time”
Agenda
What is Kount?
Data Pipeline Context
“Real-Time” Architecture / Model Governance
Statistical Metrics and Monitoring
Q&A
What is Kount?
Fighting Fraud, Boosting Revenue
Industry-Leading Technology & Experience
Developing fraud-fighting technology since 1999
AI/Machine Learning Implemented in 2007
Dozens of Patented Technologies
Continuous Innovation
A SaaS-Based, All-in-One Fraud Mitigation Platform
Safeguarding Some of the World’s Largest
Merchants
Payment Service Providers
Ecommerce Platforms
$80M Investment from CVC Growth Partners
Data Pipeline Context
Highly-available Client-facing Infrastructure / Services
Kount Data Lake
Data Science
Magical Fairy Dust!
Machine Learning Model (MLeap Pipeline)
Machine Learning Execution Platform
MLeap API Servers
“Real-Time” Architecture / Model Governance
First iteration was our baseline for improvement.
We were faced with a technical problem to solve…
Kount Boost Technology™ was released to production in October 2017.
First iteration of the architecture based on Python3 / Scikit-learn worked, but…
• Lacked portability
• Challenging to scale into the future
• Lacked multiple model support
• Limited model governance
Built in-house Apache Spark cluster in January 2018.
• Began iterating on Boost Technology™ model improvements (e.g. feature engineering, tuning model hyperparameters, etc.).
Spark ML-generated models depend on a SparkContext, but “real-time” predictions were required!
“Real-Time” Architecture Overview
Feature Extraction separated from Transaction Prediction
Hosting multiple models allows for blue-green deployments
Centralized model governance
Load balancer deployed as a “sidecar proxy”, allowing for a simpler Feature Extraction instance design
• Backend health checks make a prediction on a test transaction
MLeap API instances run a GC-optimized Java 8 configuration
JVM metrics (e.g. Jolokia, etc.)
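The health-check bullet above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `predict` callable stands in for a request to an MLeap API instance, the test transaction fields are invented, the score is assumed to be a float in [0, 1], and the 100 ms budget simply reuses the talk's definition of “real-time”.

```python
import time
from typing import Callable, Mapping

# Hypothetical test transaction; the real feature names are not given in the talk.
TEST_TRANSACTION = {"amount": 42.0, "country": "US"}


def health_check(predict: Callable[[Mapping[str, object]], float],
                 budget_ms: float = 100.0) -> bool:
    """Return True if the backend scores the test transaction within the budget."""
    start = time.perf_counter()
    try:
        score = predict(TEST_TRANSACTION)
    except Exception:
        # Any failure to produce a prediction marks the backend unhealthy.
        return False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Assumption: a valid score is a float between 0 and 1.
    return isinstance(score, float) and 0.0 <= score <= 1.0 and elapsed_ms <= budget_ms


def healthy_backend(transaction: Mapping[str, object]) -> float:
    """Stub backend that always returns a plausible score."""
    return 0.12


def broken_backend(transaction: Mapping[str, object]) -> float:
    """Stub backend that behaves as if no model were loaded."""
    raise RuntimeError("model not loaded")
```

A check of this kind exercises the whole prediction path, so a backend that accepts connections but cannot score a transaction is taken out of rotation.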
Dark Production Infrastructure
An entirely separate parallel infrastructure in production
NO customer impact
NO “real-time” requirements
Parallelization is implemented via a message bus (e.g. Kafka, Kinesis, ZeroMQ, etc.)
Optimizes cost by processing only a fraction of production traffic (e.g. 1/3)
Only logs the raw predictions returned from MLeap, for later analysis
Dark production infrastructure enables model governance / validation.
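One way to process only a fraction of production traffic is to sample deterministically by transaction ID. The Python sketch below is an illustration under stated assumptions, not the implementation described in the talk: the message shape (`transaction_id`, `features`), the hash-based sampling, and the `predict` callable standing in for a dark MLeap API instance are all hypothetical.

```python
import hashlib
import json
from typing import Callable, Iterable, List, Mapping


def in_dark_sample(transaction_id: str, numerator: int = 1, denominator: int = 3) -> bool:
    """Deterministically select roughly numerator/denominator of all transactions."""
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % denominator
    return bucket < numerator


def dark_consume(messages: Iterable[Mapping[str, object]],
                 predict: Callable[[object], float],
                 log: List[str]) -> None:
    """Score the sampled messages and record only the raw prediction."""
    for message in messages:
        transaction_id = str(message["transaction_id"])
        if not in_dark_sample(transaction_id):
            continue  # outside the sampled fraction, skipped to save cost
        raw_prediction = predict(message["features"])
        # Nothing is returned to a customer; the result is only logged for later analysis.
        log.append(json.dumps({"transaction_id": transaction_id,
                               "raw_prediction": raw_prediction}))
```

Hashing the ID rather than drawing a random number means the same transaction is always either in or out of the sample, which makes later comparisons against production predictions repeatable.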
Tools Enabling Model Governance
Centrally track state of machine learning models – end-to-end!
Train model & verify quality
→ Add model to governance data store
→ Deploy model to dark production infrastructure MLeap API instances
→ Dark production infrastructure test? (Bad / Good)
→ Good: Deploy to available production MLeap API instances
→ Migrate production traffic to MLeap API instances hosting new model
→ Replaced model? (No / Yes)
→ Yes: Unload retired model from MLeap API instances
→ End
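The workflow above can be sketched as a small state machine that a governance data store might enforce. This is a hypothetical Python illustration: the state names, the `GovernanceStore` class, and its methods are invented here, and treating a failed dark production test as a terminal `REJECTED` state is an assumption, since the slide does not say where the “Bad” branch leads.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class State(Enum):
    REGISTERED = "registered"                # trained, verified, added to the store
    DARK = "dark"                            # deployed to dark production
    REJECTED = "rejected"                    # assumption: failed the dark production test
    PRODUCTION_LOADED = "production_loaded"  # loaded on production instances, no traffic yet
    SERVING = "serving"                      # receiving production traffic
    RETIRED = "retired"                      # unloaded after being replaced


ALLOWED = {
    State.REGISTERED: {State.DARK},
    State.DARK: {State.PRODUCTION_LOADED, State.REJECTED},
    State.PRODUCTION_LOADED: {State.SERVING},
    State.SERVING: {State.RETIRED},
    State.REJECTED: set(),
    State.RETIRED: set(),
}


class GovernanceStore:
    """Hypothetical central record of every model's lifecycle state."""

    def __init__(self) -> None:
        self._states: Dict[str, State] = {}
        self.history: List[Tuple[str, State]] = []

    def add_model(self, model_id: str) -> None:
        self._states[model_id] = State.REGISTERED
        self.history.append((model_id, State.REGISTERED))

    def state(self, model_id: str) -> State:
        return self._states[model_id]

    def advance(self, model_id: str, new_state: State) -> None:
        current = self._states[model_id]
        if new_state not in ALLOWED[current]:
            raise ValueError(f"{model_id}: {current.value} -> {new_state.value} not allowed")
        self._states[model_id] = new_state
        self.history.append((model_id, new_state))

    def migrate_traffic(self, new_id: str, replaced_id: Optional[str] = None) -> None:
        """Move traffic to the new model, then retire the model it replaced, if any."""
        self.advance(new_id, State.SERVING)
        if replaced_id is not None:
            self.advance(replaced_id, State.RETIRED)
```

Rejecting disallowed transitions keeps a model from reaching production traffic without first passing through dark production.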
Statistical Metrics and Monitoring
“Real-Time” Architecture Performance – Transforming LEAP frames
This is NOT machine learning model performance (e.g. TOC curve, ROC curve, PR curve, etc.)
A “real-time” system requires metrics that measure the performance of the system itself.
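Measuring system performance starts with recording how long each transform takes. The following is a minimal Python sketch under stated assumptions: `LatencyRecorder` and `fake_transform` are hypothetical names, and `fake_transform` is only a placeholder for the call that transforms a LEAP frame.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List


class LatencyRecorder:
    """Collects per-request latencies in milliseconds."""

    def __init__(self) -> None:
        self.samples_ms: List[float] = []

    @contextmanager
    def measure(self) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the wrapped call raises.
            self.samples_ms.append((time.perf_counter() - start) * 1000.0)


def fake_transform(frame: Dict[str, float]) -> Dict[str, float]:
    """Placeholder for transforming a LEAP frame; returns a constant prediction."""
    return {**frame, "prediction": 0.5}


recorder = LatencyRecorder()
for _ in range(5):
    with recorder.measure():
        fake_transform({"amount": 42.0})
```

Keeping the raw samples, rather than a running average, is what later makes percentiles and standard deviation available.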
Averages + Distributions!
Due to “real-time” requirements, averages don’t cut it (by themselves…)
Distributions provide critical visibility when monitoring low-latency systems.
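A small synthetic example shows why an average alone can mislead. The latencies below are invented for illustration and are not Kount measurements. The `percentile` helper uses the nearest-rank definition, which is one of several common conventions.

```python
import math
import statistics
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic data: 97 fast requests at 5 ms and 3 slow requests at 150 ms.
latencies_ms = [5.0] * 97 + [150.0] * 3

mean_ms = statistics.fmean(latencies_ms)  # 9.35 ms, comfortably under a 100 ms budget
p95_ms = percentile(latencies_ms, 95)     # 5.0 ms
p99_ms = percentile(latencies_ms, 99)     # 150.0 ms, over the budget
```

The mean and the 95th percentile both look healthy here, yet 3 in every 100 requests miss the <100ms target; only the tail of the distribution reveals it.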
Applied Statistics – Improvement with MLeap!

                                 Average   95th Percentile   99th Percentile   Standard Deviation
Boost without MLeap (previous)   19.27ms   24ms              37ms              5.31ms
Boost with MLeap (current)       7.00ms    9ms               16ms              2.41ms

99th percentile saw a ~56% improvement!
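The improvement figure follows from the relative reduction in latency, (previous − current) / previous. The short Python sketch below applies that formula to the four reported metrics; the dictionary keys and function name are illustrative choices, and the values are the ones from the slide.

```python
# Latency figures in milliseconds, as reported on the slide.
previous = {"average": 19.27, "p95": 24.0, "p99": 37.0, "stdev": 5.31}
current = {"average": 7.00, "p95": 9.0, "p99": 16.0, "stdev": 2.41}


def relative_improvement(before: float, after: float) -> float:
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


improvements = {name: relative_improvement(previous[name], current[name])
                for name in previous}

for name, value in improvements.items():
    print(f"{name}: {value:.1%}")
```

For the 99th percentile this is (37 − 16) / 37 = 21 / 37, a little under 57%, in line with the slide's rounded figure; the 95th percentile works out to exactly 62.5%.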
Consider Improvements to Your “Real-Time” Architecture!
MLeap…
Model governance…
Dark Production Infrastructure (assisting with model testing)…
Latency Metrics (emphasize the use of distributions)…
Further reading…
• “Deploying Apache Spark Supervised Machine Learning Models to Production with MLeap” - https://medium.com/@combust/9e0fb57f79db
• MLeap GitHub repo - https://github.com/combust/mleap
• MLeap documentation - http://mleap-docs.combust.ml/
Thank you! … and, Q&A?
