Serve ML Models (low-latency prediction systems) at scale in the Cloud and at the Edge
Srinivasa Rao Aravilli
Senior Engineering Manager
Cisco Systems
About me
Name : Srinivasa Rao Aravilli
Experience : 23 years (wish that were my age now :) )
Interests : Distributed Computing, AI/ML, Security and Cloud
Patent : Reinforcement Learning based software recommendations for network devices
Papers : arXiv: VEDAR (Anomaly Detection), Advaita (Bug Duplicity Detection System)
SOA Journal: Various papers related to SOAP, UDDI, JAX-RPC …
Speaker at various conferences : AI/ML talks
Coach/Mentor : Advanced Certification in Machine Learning and Cloud, a course from IIT Madras and upGrad
Advaita – Flow Diagram/ML Pipeline (Offline /Online Mode)
Offline : detecting duplicates among a list of new bugs that have already been filed for a given product
Online : detecting duplicates while a new bug is being filed in the bug system
[Figure: pipeline with numbered stages 1 to 6. Labels: New Bug/Bugs, Bugs, Preprocessing, Feature Extraction, ML Model, Probable Duplicate Bugs]
Use case : Bug Duplicity Detection System
Data Set
§ Open Source Systems Bugs
§ Number of Bugs (Firefox) = ~37,000
ML Model
§ Framework : XGBoost
§ Classification : Binary
§ Features : Syntax, Semantic, Edit Distances, word embeddings, fastText
Predictions
§ New Bugs (for Online)
§ Existing Bugs (Batch)
How to serve the predictions at scale with low latency?
One possible solution …
[Figure: each model sits behind its own serving system in front of the Business APP. Models: Bug Duplicate Detection System (XGBoost), Network Failure (Spark), PII (MXNet), Phishing (Scikit), New System. Serving systems: Bug Duplicate Serving System (XGBoost), N/F Server System (Spark), PII Serving System (MXNet), Phishing Serving System. One box is marked "????".]
Use case : Phishing websites – Detection
Data Set
§ Phishing Websites Data Set
§ Data Set size ~2500
§ Number of attributes = 30
§ Classification = Binary
https://archive.ics.uci.edu/ml/datasets/Phishing+Websites
ML Model
§ Framework : scikit-learn
§ Classifier = Random Forest
§ Features : 30
§ Model Persistence : Joblib or Pickle
https://github.com/aravilli/Medha-AI/blob/master/Phishing-RF.ipynb
Predictions
How to serve the model predictions at scale with low latency?
Use case : Device Failures Detection
Data Set
§ Syslogs and Config Files
§ Billions of historic syslogs
ML Model
§ Framework : Spark ML
§ Unsupervised Learning (Clustering), Association Mining
Predictions
How to serve the predictions at scale with low latency at the edge?
APTA : Context Aware Automatic Detection of Sensitive Terms in documents
Let's add some context
[Figure: the term "name" entered on its own and as "name user", "name support", and "name product"; each entry offers the labels Public, Confidential, Highly Confidential, Restricted.]
APTA : Personally identifiable information – Detection
Data Set
§ SQL Files
§ Documents
ML Model
§ Framework : MXNet
§ Classification : Multi Class
Predictions
§ Streaming and Batch
How to serve the predictions at scale with low latency at the edge?
Challenges in serving these models …
• Building and maintaining a separate serving system for each framework is expensive and a maintenance burden
[Figure: multiple models (Bug Duplicate Detection System on XGBoost, Network Failure on Spark, PII on MXNet, Phishing on Scikit), each with its own serving system, all consumed by the Business APP.]
Challenges in serving these models …
• Building and serving pre-materialized predictions incurs significant computation and storage costs, makes updates costly, and may not be possible in every use case
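To get a feel for the space cost, here is a rough sketch (an illustration, not from the talk). Assume each of the 30 phishing attributes took only two values: a table holding a prediction for every possible input would already need 2^30 rows. The alternative is to predict on demand and cache only the inputs that actually arrive. `table_size`, `predict_on_demand`, and the stand-in scoring rule are hypothetical.

```python
from functools import lru_cache


def table_size(num_features, values_per_feature=2):
    """Rows needed to pre-materialize a prediction for every possible input."""
    return values_per_feature ** num_features


@lru_cache(maxsize=10_000)
def predict_on_demand(features):
    """Compute a prediction when asked and remember recent answers.

    `features` must be a tuple so it can be used as a cache key.
    The rule below is a stand-in for a real model, for illustration only.
    """
    return int(sum(features) > 0)


if __name__ == "__main__":
    print("rows for 30 binary features:", table_size(30))
    print("prediction:", predict_on_demand((1, -1, 1)))
    print("cache stats:", predict_on_demand.cache_info())
```

The cache holds at most 10,000 entries regardless of how large the input space is, and a model update only requires clearing the cache rather than recomputing the whole table.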
Clipper (a low-latency prediction-serving system)
Developed by RISELab @ UC Berkeley
GitHub : https://github.com/ucbrise/clipper
https://www.usenix.org/sites/default/files/conference/protected-files/nsdi17_slides_crankshaw.pdf
http://learningsys.org/nips17/assets/slides/clipper-nips17.pdf
Clipper Architecture
Source : http://clipper.ai/tutorials/basic_concepts/
Model Deployment, Versioning, Replication
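A toy, in-process sketch of what deployment, versioning, and linking look like from the caller's side. It is not Clipper's implementation: Clipper runs each model in its own container, and replication is left out here. `ModelRegistry` and its methods are hypothetical names.

```python
class ModelRegistry:
    """Toy stand-in for a serving layer: one predict interface for every framework."""

    def __init__(self):
        self._versions = {}  # model name -> {version: predict function}
        self._active = {}    # model name -> version currently served
        self._links = {}     # application name -> model name

    def deploy(self, model, version, predict_fn):
        # A newly deployed version becomes the active one.
        self._versions.setdefault(model, {})[version] = predict_fn
        self._active[model] = version

    def set_version(self, model, version):
        # Roll back (or forward) to a version that was deployed earlier.
        if version not in self._versions.get(model, {}):
            raise KeyError("unknown version {} of {}".format(version, model))
        self._active[model] = version

    def link(self, app, model):
        # The application only knows its own name, not which model serves it.
        self._links[app] = model

    def predict(self, app, inputs):
        model = self._links[app]
        predict_fn = self._versions[model][self._active[model]]
        return predict_fn(inputs)


registry = ModelRegistry()
registry.deploy("phishing", 1, lambda xs: ["legit" for _ in xs])
registry.deploy("phishing", 2,
                lambda xs: ["phishing" if sum(x) < 0 else "legit" for x in xs])
registry.link("business-app", "phishing")
print(registry.predict("business-app", [[-1, -1, 1], [1, 1, 0]]))

registry.set_version("phishing", 1)  # roll back without touching the application
print(registry.predict("business-app", [[-1, -1, 1]]))
```

The point of the indirection is that the Business APP keeps calling the same application name while models are upgraded or rolled back behind it.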
Let us run through an example
• Model Creation & Persistence
• Clipper Installation : pip install clipper_admin
• Starting Cluster
• Model Load & Deployment
• App Registration & Model Linking
• Model Serving
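The steps named on these slides (model creation and persistence, starting the cluster, deployment, app registration and linking, serving) can be sketched as below. This follows the Clipper quickstart as far as I recall it and is not the code from the live demo. The names `phishing-rf`, `phishing-app`, and the file `phishing_rf.joblib` are hypothetical, and the training data is synthetic. Running `train_and_persist` and `deploy` needs Docker, `clipper_admin`, scikit-learn, and joblib, so those imports are kept inside the functions.

```python
import json

CLIPPER_QUERY_ADDRESS = "localhost:1337"  # query address used in the Clipper quickstart


def train_and_persist(model_path="phishing_rf.joblib"):
    """Model creation and persistence (needs scikit-learn and joblib)."""
    import joblib
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the 30-attribute phishing data set.
    X, y = make_classification(n_samples=500, n_features=30, random_state=0)
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    joblib.dump(model, model_path)
    return model_path


def deploy(model_path="phishing_rf.joblib"):
    """Start the cluster, deploy the model, register an app, link the two."""
    import joblib
    from clipper_admin import ClipperConnection, DockerContainerManager
    from clipper_admin.deployers import python as python_deployer

    model = joblib.load(model_path)

    def predict(inputs):
        # Clipper passes a batch of inputs and expects one string per input.
        return [str(label) for label in model.predict(inputs)]

    conn = ClipperConnection(DockerContainerManager())
    conn.start_clipper()
    # Assumption: the model container image must also have scikit-learn
    # available; how to add it depends on the Clipper version in use.
    python_deployer.deploy_python_closure(
        conn, name="phishing-rf", version=1, input_type="doubles", func=predict)
    conn.register_application(
        name="phishing-app", input_type="doubles",
        default_output="-1", slo_micros=100000)
    conn.link_model_to_app(app_name="phishing-app", model_name="phishing-rf")
    return conn


def build_predict_request(app_name, features, address=CLIPPER_QUERY_ADDRESS):
    """Model serving: build the REST call for a single prediction."""
    url = "http://{}/{}/predict".format(address, app_name)
    headers = {"Content-type": "application/json"}
    body = json.dumps({"input": [float(v) for v in features]})
    return url, headers, body
```

With the cluster running, the tuple returned by `build_predict_request("phishing-app", features)` can be sent with any HTTP client, for example `requests.post(url, headers=headers, data=body)`.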
Live Demo
Model support in Clipper
• Clipper provides the following deployer modules:
• Arbitrary Python functions
• PySpark models
• PyTorch models
• TensorFlow models
• MXNet models
• PyTorch models exported as ONNX files, with Caffe2 as the serving backend
Clipper – Adaptive batching
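The Clipper paper describes tuning the batch size with an additive-increase, multiplicative-decrease (AIMD) rule: grow the batch while latency stays within the objective, back off when it is exceeded. A minimal sketch of that idea follows. The step size, the latency model (`2.0 + 0.5 * size` milliseconds), and the function names are assumptions for illustration, not values taken from Clipper's code.

```python
def next_batch_size(batch_size, observed_latency_ms, slo_ms, step=1, backoff=0.9):
    """AIMD update: add `step` while under the latency objective, else shrink by `backoff`."""
    if observed_latency_ms > slo_ms:
        return max(1, int(batch_size * backoff))
    return batch_size + step


def simulate(rounds, slo_ms=20.0):
    """Run the controller against a made-up linear latency model."""
    size, history = 1, []
    for _ in range(rounds):
        latency_ms = 2.0 + 0.5 * size  # hypothetical cost of one batch of `size` inputs
        size = next_batch_size(size, latency_ms, slo_ms)
        history.append(size)
    return history


if __name__ == "__main__":
    sizes = simulate(200)
    print("last ten batch sizes:", sizes[-10:])
```

Under this latency model the batch size climbs one step at a time, overshoots the 20 ms objective, drops by about 10%, and then keeps cycling just below the largest batch that still meets the objective.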
Ray Serve: A serving system for any scale
Source: https://risecamp.berkeley.edu/
Thank You
Questions ?
