Serve and Scale ML Models (Low-Latency Prediction Systems)
1. Serve ML Models (low-latency prediction
systems) at scale in the Cloud and at the Edge
Srinivasa Rao Aravilli
Senior Engineering Manager
Cisco Systems
2. About me
Name: Srinivasa Rao Aravilli
Experience: 23 years (wish that were my age now!)
Interests: Distributed Computing, AI/ML, Security, and Cloud
Patent: Reinforcement learning-based software recommendations for network devices
Papers: arXiv: VEDAR (anomaly detection), Advaita (bug duplicity detection system);
SOA Journal: various papers related to SOAP, UDDI, JAX-RPC …
Speaker at various conferences: AI/ML talks
Coach/Mentor: Advanced Certification in Machine Learning and Cloud (course from IIT Madras and upGrad)
3. Advaita – Flow Diagram / ML Pipeline (Offline/Online Mode)
Offline - detecting duplicates for a list of new bugs already filed for a given product
Online - detecting duplicates while filing a new bug in the bug system
[Pipeline diagram: (1) new bug/bugs (online) or (2) existing bugs (batch) -> (3) preprocessing -> (4) feature extraction -> (5) ML model -> (6) probable duplicate bugs]
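The stages in the diagram above can be sketched as plain functions so the same chain serves one new bug (online) or a batch of existing bugs (offline). This is an illustrative sketch only: the function names and the toy token-overlap scoring are assumptions, not the actual Advaita code, which uses edit distances and embeddings.

```python
def preprocess(bug_text):
    # Lowercase and tokenize the bug report text.
    return bug_text.lower().split()

def extract_features(tokens):
    # Toy feature: bag-of-words counts (the real pipeline uses edit
    # distances, word embeddings, fastText, etc.).
    features = {}
    for tok in tokens:
        features[tok] = features.get(tok, 0) + 1
    return features

def predict_duplicates(features, known_bugs):
    # Toy scoring: shared-token overlap with previously filed bugs,
    # standing in for the trained ML model.
    scored = []
    for bug_id, bug_feats in known_bugs.items():
        overlap = sum(min(features.get(t, 0), c) for t, c in bug_feats.items())
        scored.append((overlap, bug_id))
    return [bug_id for overlap, bug_id in sorted(scored, reverse=True) if overlap > 0]

def pipeline(bug_text, known_bugs):
    # Steps 3-6 of the diagram chained together.
    return predict_duplicates(extract_features(preprocess(bug_text)), known_bugs)
```

Because each stage is a pure function, the online path can call `pipeline` per request while the batch path maps it over all existing bugs.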
4. Use case: Bug Duplicity Detection System

Data Set:
§ Open-source systems' bugs
§ Number of bugs (Firefox) = ~37,000
§ New bugs (for online)
§ Existing bugs (batch)

ML Model:
§ Framework: XGBoost
§ Classification: binary
§ Features: syntax, semantic, edit distances, word embeddings, fastText

Predictions:
§ How to serve the predictions at scale with low latency?
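One common answer to the slide's question is to load the trained model once at process startup and expose it behind a lightweight HTTP endpoint, so each online request pays only for inference. The sketch below uses only the Python standard library; the model class, route, and port are illustrative assumptions, with a stub in place of the real XGBoost classifier.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class StubDuplicateModel:
    """Stand-in for the trained XGBoost binary classifier."""
    def predict(self, features):
        # Toy rule: treat highly similar bug pairs as duplicates.
        return 1 if features.get("similarity", 0.0) > 0.5 else 0

MODEL = StubDuplicateModel()  # loaded once, reused for every request

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON feature vector, score it, return JSON.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)
        payload = json.dumps({"duplicate": MODEL.predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve (threaded, so concurrent requests don't queue behind one another):
# ThreadingHTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```

Keeping the model resident in memory avoids per-request load/deserialize cost, which is usually the dominant latency term for small feature vectors.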
5. One of the possible solutions …

[Diagram: a dedicated serving system per model: Bug Duplicate Detection System (XGBoost) -> Bug Duplicate Serving System (XGBoost); Network Failure (Spark) -> N/F Serving System (Spark); PII (MXNet) -> PII Serving System (MXNet); Phishing (Scikit) -> Phishing Serving System; the Business App must integrate with all of them (????)]
6. Use case: Phishing Websites Detection

Data Set:
§ Phishing websites data set: https://archive.ics.uci.edu/ml/datasets/Phishing+Websites
§ Data set size: ~2,500
§ Number of attributes = 30
§ Classification = binary

ML Model:
§ Framework: Scikit-Learn
§ Classifier = Random Forest
§ Features: 30
§ Model persistence: joblib or pickle
§ Notebook: https://github.com/aravilli/Medha-AI/blob/master/Phishing-RF.ipynb

Predictions:
§ How to serve the model predictions at scale with low latency?
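The "model persistence: joblib or pickle" step above looks like the following in outline. A stub classifier stands in for the trained Random Forest so the snippet stays self-contained; with a real scikit-learn estimator you would typically prefer `joblib.dump`/`joblib.load`, which handle large NumPy arrays more efficiently.

```python
import os
import pickle
import tempfile

class StubPhishingClassifier:
    """Stand-in for the trained scikit-learn RandomForestClassifier."""
    def __init__(self, threshold=15):
        self.threshold = threshold
    def predict(self, rows):
        # Toy rule: flag a site when enough of its 30 binary
        # attributes look suspicious.
        return [1 if sum(row) >= self.threshold else 0 for row in rows]

# Train once, persist to disk, then load in the serving process.
model = StubPhishingClassifier()
path = os.path.join(tempfile.mkdtemp(), "phishing-rf.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)          # joblib.dump(model, path) with scikit-learn
with open(path, "rb") as f:
    served_model = pickle.load(f)  # joblib.load(path)
```

Persisting the fitted model decouples training (done offline, once) from serving (many processes can load the same artifact), but note that pickle files should only ever be loaded from trusted sources.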
7. Use case: Device Failures Detection

Data Set:
§ Syslogs and config files
§ Billions of historic syslogs

ML Model:
§ Framework: Spark ML
§ Unsupervised learning (clustering), association mining

Predictions:
§ How to serve the predictions at scale with low latency at the edge?
9. APTA: Personally Identifiable Information (PII) Detection

Data Set:
§ SQL files
§ Documents
§ Streaming and batch

ML Model:
§ Framework: MXNet
§ Classification: multi-class

Predictions:
§ How to serve the predictions at scale with low latency at the edge?
10. Challenges to serve these models …
• Building and maintaining a separate serving system for each framework is expensive to develop and maintain

[Diagram: multiple models (Bug Duplicate Detection System (XGBoost), Network Failure (Spark), PII (MXNet), Phishing (Scikit)), each with its own serving system, all consumed by the Business App]
11. Challenges to serve these models …
• Building and serving pre-materialized predictions carries significant computation and space costs, makes updates expensive, and may not be possible in all use cases
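The cost the bullet above describes can be made concrete with a toy sketch (all names and the stand-in model are illustrative): pre-materialization scores every possible input up front so serving becomes a lookup, but the table grows exponentially with the number of features and must be fully recomputed on every model update.

```python
import itertools

def predict(features):
    # Stand-in for a real model's scoring function.
    return 1 if sum(features) >= 2 else 0

# Pre-materialization: score every possible input ahead of time.
# Even for just 30 binary features the table would need 2**30
# (~1 billion) entries, and each model update means recomputing
# all of them -- the computation, space, and update costs the
# slide is pointing at.
N_FEATURES = 3  # kept tiny here so the sketch actually runs
table = {bits: predict(bits)
         for bits in itertools.product((0, 1), repeat=N_FEATURES)}

def serve(features):
    return table[tuple(features)]  # O(1) lookup, no model in the loop
```

It also only works when the input space is enumerable at all, which is why the slide notes it "may not be possible in all use cases" (e.g. free-text bug reports cannot be enumerated).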
12. Clipper (a low-latency prediction-serving system)
Developed by RISELab @ UC Berkeley
GitHub: https://github.com/ucbrise/clipper
https://www.usenix.org/sites/default/files/conference/protected-files/nsdi17_slides_crankshaw.pdf
http://learningsys.org/nips17/assets/slides/clipper-nips17.pdf
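Clipper's central idea is a model abstraction layer: framework-specific models (XGBoost, scikit-learn, Spark, MXNet, …) sit behind containers that all expose one prediction interface, so the business app talks to a single frontend instead of one serving system per framework. The toy sketch below shows that layering only; the class names are illustrative and are not Clipper's actual API, and real Clipper adds adaptive batching and caching on top.

```python
class ModelContainer:
    """Common interface every framework-specific model is wrapped in."""
    def predict_batch(self, inputs):
        raise NotImplementedError

class XGBoostContainer(ModelContainer):
    # Would wrap a real xgboost.Booster; stubbed here.
    def predict_batch(self, inputs):
        return [1 if x.get("similarity", 0) > 0.5 else 0 for x in inputs]

class SklearnContainer(ModelContainer):
    # Would wrap a scikit-learn estimator; stubbed here.
    def predict_batch(self, inputs):
        return [1 if sum(x["attrs"]) >= 15 else 0 for x in inputs]

class PredictionFrontend:
    """Single entry point: apps call one API regardless of framework."""
    def __init__(self):
        self.apps = {}
    def register(self, app_name, container):
        self.apps[app_name] = container
    def predict(self, app_name, inputs):
        # Real Clipper also batches queued requests adaptively and
        # caches recent results; here we only dispatch to the right
        # container.
        return self.apps[app_name].predict_batch(inputs)

frontend = PredictionFrontend()
frontend.register("bug-duplicates", XGBoostContainer())
frontend.register("phishing", SklearnContainer())
```

With this shape, adding a new model (e.g. the Spark or MXNet use cases above) means writing one container, not standing up another bespoke serving system.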