Serve ML Models (low-latency prediction systems) at scale in the Cloud and at the Edge
Srinivasa Rao Aravilli
Senior Engineering Manager
Cisco Systems
Aravilli
About me
Name : Srinivasa Rao Aravilli
Experience : 23 years (I wish that were my age now :) )
Interests : Distributed Computing, AI/ML, Security and Cloud
Patent : Reinforcement Learning based software recommendations for network devices
Papers : arXiv: VEDAR (Anomaly Detection), Advaita (Bug Duplicity Detection System)
SOA Journal: Various papers related to SOAP, UDDI, JAX-RPC …
Speaker at various conferences : AI/ML Talks
Coach/Mentor : Advanced Certification in Machine Learning and Cloud, a course from IIT Madras and upGrad
Advaita – Flow Diagram/ML Pipeline (Offline/Online Mode)
Offline - detecting, for a list of new bugs, duplicates of bugs already filed for a given product
Online - detecting duplicates while a new bug is being filed in the bug system
[Flow diagram with numbered steps 1-6. Boxes: New Bug/Bugs, Bugs, Preprocessing (two boxes), Feature Extraction (two boxes), ML Model, Probable Duplicate Bugs]
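The offline and online modes can share one scoring path. The following is a minimal Python sketch of that idea, not Advaita's actual code: the function names, the threshold, and the token-overlap score (a stand-in for the real feature set and trained model) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List, Tuple


def preprocess(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (a deliberately simple stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of two token lists; stands in for feature extraction + ML model."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def probable_duplicates(
    new_bug: str,
    existing: Dict[str, str],
    score_pair: Callable[[List[str], List[str]], float] = token_overlap,
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[str, float]]:
    """Online mode: score one new bug against every existing bug."""
    new_tokens = preprocess(new_bug)
    scored = [(bug_id, score_pair(new_tokens, preprocess(text)))
              for bug_id, text in existing.items()]
    hits = [(bug_id, score) for bug_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def probable_duplicates_batch(
    new_bugs: Dict[str, str], existing: Dict[str, str], **kwargs
) -> Dict[str, List[Tuple[str, float]]]:
    """Offline mode: run the same scoring path over a list of new bugs."""
    return {bug_id: probable_duplicates(text, existing, **kwargs)
            for bug_id, text in new_bugs.items()}


# Made-up example data
existing = {
    "B1": "Crash when opening PDF in new tab",
    "B2": "Bookmarks toolbar missing after update",
}
online_result = probable_duplicates("crash when opening a pdf in a new tab", existing)
offline_result = probable_duplicates_batch({"N1": "toolbar missing"}, existing, threshold=0.3)
```

Keeping the online path as the single-bug case of the offline path means both modes exercise the same preprocessing and scoring code.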
Use case : Bug Duplicity Detection System
Data Set
§ Open Source Systems Bugs
§ Number of Bugs (Firefox) = ~37,000
ML Model
§ Framework : XGBoost
§ Classification : Binary
§ Features : Syntax, Semantic, Edit Distances, word embeddings, fastText
Predictions
§ New Bugs (for Online)
§ Existing Bugs (Batch)
How to serve the predictions at scale with low latency?
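As an illustration of the syntactic and edit-distance features listed for this use case, here is a small Python sketch that turns a pair of bug titles into a feature dictionary. The feature names and the choice of Levenshtein distance and token Jaccard overlap are assumptions; the semantic and word-embedding features are left out, and the real feature set is not described in enough detail to reproduce.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def pair_features(title_a: str, title_b: str) -> Dict[str, float]:
    """Feature dictionary for one (bug, candidate duplicate) pair."""
    a, b = title_a.lower(), title_b.lower()
    distance = levenshtein(a, b)
    longest = max(len(a), len(b)) or 1
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return {
        "edit_distance": float(distance),
        "edit_similarity": 1.0 - distance / longest,
        "token_jaccard": len(tokens_a & tokens_b) / len(union) if union else 0.0,
    }


features = pair_features("Crash on startup", "crash on start-up")
```

A vector built from such pair features could then be passed to a binary classifier such as the XGBoost model named on the slide, with the label "duplicate" or "not duplicate".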
One possible solution…
[Diagram: each model has its own serving system, and the Business App calls all of them]
§ Bug Duplicate Detection System (XGBoost) : Bug Duplicate Serving System (XGBoost)
§ Network Failure (Spark) : N/F Server System (Spark)
§ PII (MXNet) : PII Serving System (MXNet)
§ Phishing (Scikit) : Phishing Serving System
§ New System : ????
Use case : Phishing websites – Detection
Data Set
§ Phishing Websites Data Set
§ Data Set size ~2500
§ Number of attributes = 30
§ Classification = Binary
https://archive.ics.uci.edu/ml/datasets/Phishing+Websites
ML Model
§ Framework : scikit-learn
§ Classifier = Random Forest
§ Features : 30
§ Model Persistence : Joblib or Pickle
https://github.com/aravilli/Medha-AI/blob/master/Phishing-RF.ipynb
Predictions
How to serve the model predictions at scale with low latency?
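The model creation and persistence step for this use case might look like the sketch below. It is not the code from the linked notebook: the training data is synthetic (random values in {-1, 0, 1} across 30 columns, with a made-up label rule), and the file name and hyperparameters are assumptions. Only the general pattern, fit a Random Forest and persist it with Joblib, comes from the slide.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 30-attribute data set (NOT the real data).
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(200, 30))        # values drawn from {-1, 0, 1}
y = (X[:, :5].sum(axis=1) > 0).astype(int)     # made-up binary label rule

# Train the classifier; hyperparameters are illustrative only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Persist the fitted model, then load it back as a serving process would.
model_path = os.path.join(tempfile.mkdtemp(), "phishing_rf.joblib")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

predictions = restored.predict(X[:5])
```

Loading with `joblib.load` should only be done on files from a trusted source, since Joblib and Pickle files can execute code when deserialized.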
Use case : Device Failures Detection
Data Set
§ Syslogs and Config Files
§ Billions of historic syslogs
ML Model
§ Framework : Spark ML
§ Unsupervised Learning (Clustering), Association Mining
Predictions
How to serve the predictions at scale with low latency at the edge?
APTA : Context Aware Automatic Detection of Sensitive Terms in documents
Let's add some context
[Figure: four panels for the term "name", shown alone and with the context words "user", "support", and "product"; each panel lists the labels Public, Confidential, Highly Confidential, Restricted]
APTA : Personally identifiable information - Detection
Data Set
§ SQL Files
§ Documents
ML Model
§ Framework : MXNet
§ Classification : Multi Class
Predictions
§ Streaming and Batch
How to serve the predictions at scale with low latency at the edge?
Challenges to serve these models ….
• Building and maintaining a separate serving system for each framework is expensive
[Diagram: Multiple Models, each with its own serving system, all called by the Business App]
§ Bug Duplicate Detection System (XGBoost) : Bug Duplicate Serving System (XGBoost)
§ Network Failure (Spark) : N/F Server System (Spark)
§ PII (MXNet) : PII Serving System (MXNet)
§ Phishing (Scikit) : Phishing Serving System
Challenges to serve these models ….
• Building and serving pre-materialized predictions has significant computation and space costs, makes updates costly, and may not be possible in all use cases
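To make the space and update cost concrete, here is a rough back-of-the-envelope sketch for the bug duplicate use case. It assumes that pre-materializing means storing one score for every unordered pair of the ~37,000 bugs, at 4 bytes per score and ignoring keys and indexes; these assumptions are mine, not figures from the talk.

```python
def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


n_bugs = 37_000                      # approximate Firefox bug count from the use case
bytes_per_score = 4                  # assumption: one float32 score per pair

pairs = pairwise_count(n_bugs)
storage_gb = pairs * bytes_per_score / 1e9

# Each newly filed bug must be scored against every existing bug before
# the table is complete again.
new_scores_per_new_bug = n_bugs

print(f"{pairs:,} pairs, about {storage_gb:.2f} GB of raw scores")
print(f"{new_scores_per_new_bug:,} new scores needed per newly filed bug")
```

Under these assumptions the table holds roughly 684 million scores. The online mode also cannot be served from the table at all, because a bug that is still being filed has no precomputed row.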
Clipper (A low-latency prediction-serving system)
Developed by RISELab @ UC Berkeley
GitHub : https://github.com/ucbrise/clipper
https://www.usenix.org/sites/default/files/conference/protected-files/nsdi17_slides_crankshaw.pdf
http://learningsys.org/nips17/assets/slides/clipper-nips17.pdf
Clipper Architecture
Source : http://clipper.ai/tutorials/basic_concepts/
Model Deployment, Versioning, Replication
Let us run through an example
• Model Creation & Persistence
• Clipper Installation
pip install clipper_admin
• Starting Cluster
• Model Load & Deployment
• App Registration & Model Linking
• Model Serving
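The steps of the walkthrough can be sketched in Python as follows. The `clipper_admin` calls follow the Clipper quickstart as I recall it and may differ between versions; the application name, model name, latency objective, and the `feature_sum` stand-in model are hypothetical. The Clipper calls sit inside a function with local imports, because they need Docker and a running cluster, so the rest of the block runs without them.

```python
import json

APP_NAME = "phishing-app"    # hypothetical application name
MODEL_NAME = "phishing-rf"   # hypothetical model name


def feature_sum(xs):
    """Stand-in model: takes a list of inputs, returns one string per input."""
    return [str(sum(x)) for x in xs]


def build_request(features):
    """JSON body for a single-input prediction request."""
    return json.dumps({"input": list(features)})


def deploy_and_query(features):
    """Start a local cluster, deploy the function, link it, and send one query."""
    # Local imports: these need clipper_admin, requests, and a Docker daemon.
    import requests
    from clipper_admin import ClipperConnection, DockerContainerManager
    from clipper_admin.deployers import python as python_deployer

    clipper_conn = ClipperConnection(DockerContainerManager())
    clipper_conn.start_clipper()                                   # Starting Cluster

    clipper_conn.register_application(                             # App Registration
        name=APP_NAME, input_type="doubles",
        default_output="-1.0", slo_micros=100000)

    python_deployer.deploy_python_closure(                         # Model Deployment
        clipper_conn, name=MODEL_NAME, version=1,
        input_type="doubles", func=feature_sum)

    clipper_conn.link_model_to_app(                                # Model Linking
        app_name=APP_NAME, model_name=MODEL_NAME)

    url = "http://%s/%s/predict" % (clipper_conn.get_query_addr(), APP_NAME)
    response = requests.post(                                      # Model Serving
        url, headers={"Content-type": "application/json"},
        data=build_request(features))
    return response.json()


sample_body = build_request([1.0, 0.0, -1.0])
```

In a real deployment `feature_sum` would be replaced by a function that calls the persisted model's `predict` method on the batch of inputs.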
Live Demo
Models Support in Clipper
• Clipper provides the following deployer modules:
  • Arbitrary Python functions
  • PySpark Models
  • PyTorch Models
  • TensorFlow Models
  • MXNet Models
  • PyTorch Models exported as ONNX file with Caffe2 Serving Backend
Clipper – Adaptive batching
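The Clipper NSDI '17 paper describes choosing the batch size with an additive-increase, multiplicative-decrease (AIMD) scheme: grow the batch while latency stays within the objective, and back off when it does not. The sketch below is a simplified illustration of that idea, not Clipper's implementation; the step size, the backoff factor, and the linear latency model used in the simulation are assumptions.

```python
from typing import Callable, List


def next_batch_size(batch_size: int, observed_latency_ms: float, slo_ms: float,
                    step: int = 1, backoff: float = 0.9) -> int:
    """AIMD update: add `step` while within the latency objective, else shrink."""
    if observed_latency_ms <= slo_ms:
        return batch_size + step
    return max(1, int(batch_size * backoff))


def simulate(latency_fn: Callable[[int], float], slo_ms: float,
             rounds: int, start: int = 1) -> List[int]:
    """Record the batch size used in each round under a given latency model."""
    size = start
    history = []
    for _ in range(rounds):
        history.append(size)
        size = next_batch_size(size, latency_fn(size), slo_ms)
    return history


# Hypothetical model container: 2 ms fixed overhead plus 0.5 ms per input.
history = simulate(lambda batch: 2 + 0.5 * batch, slo_ms=20.0, rounds=60)
```

With this latency model the batch size climbs until a batch first exceeds the 20 ms objective, drops back, and then oscillates just below that limit, which is the behaviour adaptive batching aims for.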
Ray Serve: A serving system for any scale
Source: https://risecamp.berkeley.edu/
Thank You
Questions ?
