Notes on Deploying Machine-Learning Models at Scale
Or, What they might not teach you in Data Science school!
Deep Kayal
11.10.2019 @
“The sexiest job of the 21st century” [1]
[1] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
So we’re sorted... or are we?
Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper
argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical
debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. [2]
--- Hidden Technical Debt in Machine Learning Systems, Google, NeurIPS 2015.
[2] https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
How did we get here?
[3] https://www.bastagroup.nl/wp-content/uploads/2019/01/the-state-of-machine-learning-adoption-in-the-enterprise.pdf
The need for engineering in Data Science [4]
[4] https://www.oreilly.com/radar/what-are-machine-learning-engineers/
[5] https://www.oreilly.com/radar/we-need-to-build-machine-learning-tools-to-augment-machine-learning-engineers/
[6] https://www.youtube.com/watch?v=mJHvE2JLN3Q
Making useful models useful
- Let’s say you’re tasked with making a text classifier. You design a classifier..
- You fit it to your labeled dataset
- You check your cross-validation performance; it is very high, so you celebrate!
- What’s next?
Text -> Stem / Lemmatize -> TF-IDF -> SVM -> Label
Why not do a small-scale sanity check?
- Not an A/B test, but a sanity check
- Offline cross-validation performance is a great start
- But the first real acid test is to take completely unseen real data..
- ..pass it through the trained model..
- ..and get it fact-checked by an expert
- Example: a classifier to check if a paragraph is talking about a company’s annual report or not
- The expert may be a risk analyst or an underwriter
- You can do this using something as simple as Excel:
- The expert selects data points to validate
- You pass them through the model
- You write the data and labels to Excel
- The expert checks whether they are correct or wrong
- But this is uncontrolled, hand-held and chaotic
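The spreadsheet round trip described above can be sketched with the standard library alone. This is a minimal illustration, not the author's tooling: it writes CSV (which Excel can open) rather than a native Excel file, and `predict_label`, the column names and the sample texts are all hypothetical placeholders.

```python
import csv
import io


def predict_label(text):
    """Hypothetical stand-in for the trained model's predict call."""
    return "annual_report" if "annual report" in text.lower() else "other"


def write_review_sheet(texts, out):
    """Write one row per text with the model's label and an empty verdict column."""
    writer = csv.writer(out)
    writer.writerow(["text", "predicted_label", "expert_verdict"])
    for text in texts:
        writer.writerow([text, predict_label(text), ""])


def agreement_rate(sheet):
    """Share of reviewed rows the expert marked 'Y'; unreviewed rows are skipped."""
    verdicts = [row["expert_verdict"].strip().upper() for row in csv.DictReader(sheet)]
    reviewed = [v for v in verdicts if v in ("Y", "N")]
    return sum(v == "Y" for v in reviewed) / len(reviewed) if reviewed else None


# Texts the expert picked for validation (made-up examples).
samples = [
    "The annual report for 2018 shows revenue growth.",
    "Our office will be closed on Friday.",
]
buffer = io.StringIO()
write_review_sheet(samples, buffer)
sheet_text = buffer.getvalue()
```

Once the expert has filled in the `expert_verdict` column with Y or N, `agreement_rate` gives a rough first number. Every step is still manual, which is the weakness the next slide addresses.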
A better alternative
[Diagram: an expert enters data into a simple form, sees the predicted label and verifies it (Y/N). Behind the form: an API wrapping the pipeline, an HTTP server on a VM, and a DB.]
[7] https://towardsdatascience.com/publishing-machine-learning-api-with-python-flask-98be46fb2440
[8] https://pythonspot.com/flask-web-forms/
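A minimal sketch of the API and DB pieces of this setup follows. It is only one possible shape: references [7] and [8] use Flask, whereas this version swaps in Python's standard `http.server` and `sqlite3` so that it has no dependencies. The `/predict` and `/verify` routes, the `reviews` table and `predict_label` are hypothetical names chosen for illustration, and there is no input validation.

```python
import json
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# One shared in-memory database; a lock serialises access across request threads.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT, label TEXT, verdict TEXT)")
db_lock = threading.Lock()


def predict_label(text):
    """Hypothetical stand-in for the trained pipeline."""
    return "annual_report" if "annual report" in text.lower() else "other"


def handle_predict(payload):
    """Score the text, store it, and return the row id with the predicted label."""
    label = predict_label(payload["text"])
    with db_lock:
        cur = db.execute("INSERT INTO reviews (text, label) VALUES (?, ?)", (payload["text"], label))
        db.commit()
    return {"id": cur.lastrowid, "label": label}


def handle_verify(payload):
    """Record the expert's Y/N verdict for a previously scored row."""
    with db_lock:
        db.execute("UPDATE reviews SET verdict = ? WHERE id = ?", (payload["verdict"], payload["id"]))
        db.commit()
    return {"id": payload["id"], "verdict": payload["verdict"]}


ROUTES = {"/predict": handle_predict, "/verify": handle_verify}


class ApiHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        handler = ROUTES.get(self.path)
        if handler is None:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.dumps(handler(json.loads(self.rfile.read(length)))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def serve(port=8000):
    """Start the API in the foreground; call this from a script entry point."""
    ThreadingHTTPServer(("127.0.0.1", port), ApiHandler).serve_forever()
```

The web form would POST the entered text to `/predict`, show the returned label, and POST the expert's Y or N to `/verify`. Keeping the two handlers as plain functions means the model call can be replaced without touching the HTTP layer.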
And with that, you’re on your way!
- Now you have a reusable template for a minimalist API that wraps your intelligent model
- The web form and the objects exchanged over REST calls might change
- But the core can stay the same
- With our sanity check done, let’s move forward and deploy our model for incoming data
Deploying models at scale: A blueprint
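[Diagram: an external gateway API and a load balancer (NGINX) sit in front of several services. Each service registers with a registry (Consul), which the load balancer uses to discover services. Query/Result messages flow between the components. Also labelled: infrastructure as code, state information.]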
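The register and discover flow in the blueprint can be illustrated with a toy, in-memory version. This sketch does not use Consul's or NGINX's actual interfaces. `Registry`, `RoundRobinBalancer` and the addresses are hypothetical, and round-robin is just one assumed balancing policy.

```python
import itertools


class Registry:
    """In-memory stand-in for a service registry: maps a service name to its instances."""

    def __init__(self):
        self._instances = {}

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)

    def deregister(self, name, address):
        self._instances.get(name, []).remove(address)

    def discover(self, name):
        return list(self._instances.get(name, []))


class RoundRobinBalancer:
    """Looks up registered instances on every call and rotates through them."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def pick(self, name):
        instances = self._registry.discover(name)
        if not instances:
            raise LookupError(f"no instances registered for {name!r}")
        return instances[next(self._counter) % len(instances)]


registry = Registry()
registry.register("classifier", "10.0.0.1:8000")  # made-up addresses
registry.register("classifier", "10.0.0.2:8000")
balancer = RoundRobinBalancer(registry)
picks = [balancer.pick("classifier") for _ in range(3)]
```

Because the balancer asks the registry on every call, a newly registered instance starts receiving traffic without any change to the balancer itself. This is what makes horizontal scaling cheap in this kind of design.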
Types of deployment
- You’ve deployed a scalable API for streaming data
- This handles incoming requests
- And can be scaled horizontally
- But what about historical data, which needs to be processed?
- So there are two types of deployment to consider:
- A stream processor
- And a batch processor
- We can use the stream API as a batch processor, or…
Something for batch processing
Pipeline
Serialize
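from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, DoubleType

# model: a fitted scikit-learn model; sc: the active SparkContext
spark_model = sc.broadcast(model)  # ship the model to each executor once

@udf(returnType=ArrayType(DoubleType()))
def udf_predict(feature):
    # predict_proba expects a batch, so wrap the single row; return plain floats for Spark
    return spark_model.value.predict_proba([feature])[0].tolist()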
This way, you can leverage Spark’s built-in scalability and fault tolerance!
[9] https://towardsdatascience.com/deploy-a-python-model-more-efficiently-over-spark-497fc03e0a8d
Recap
- You have developed a machine learning model
- You have wrapped it up as an API
- You’ve tested it with real users
- Using best-practice blueprints from software engineering, you can now deploy the model seamlessly on forward-flow content
- You’ve also created a smart batch processor by wrapping your model in a Spark UDF
- With these two services, you can build an MLaaS for almost any project and task!
- Plus, all of this code (and effort) is reusable for new tasks. Just replace the model with a new one!
Final words
- Treating ML software like any other software is our best bet moving forward
- But it’s hard!
- ML services are hard to unit test, for one
- They can be non-deterministic
- They produce a lot of binary artifacts to store and maintain
- …
- Consider metamorphic testing [10]
- Use data science version control systems such as DVC or MLflow [11, 12]
- Version the data, and you can recreate a model at any point in history
- Or version the model, to roll back easily
- Or better yet, version both!
[10] https://en.wikipedia.org/wiki/Metamorphic_testing
[11] https://dvc.org/
[12] https://mlflow.org/
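Metamorphic testing [10] checks relations between the outputs for related inputs instead of asserting one exact expected output. That suits models whose exact predictions are hard to pin down. The sketch below is a toy illustration: `classify` is a hypothetical stand-in for a deployed classifier, and the three label-preserving transformations are assumptions picked for this example, not relations taken from the talk.

```python
def classify(text):
    """Hypothetical stand-in for a deployed text classifier."""
    return "annual_report" if "annual report" in text.lower() else "other"


# Each transformation should leave the predicted label unchanged (the assumed relations).
TRANSFORMS = {
    "upper_case": str.upper,
    "extra_whitespace": lambda t: "  " + t + "\n",
    "trailing_sentence": lambda t: t + " Thank you.",
}


def metamorphic_violations(model, inputs):
    """Return (transform name, input) pairs where the label changed after the transform."""
    violations = []
    for text in inputs:
        baseline = model(text)
        for name, transform in TRANSFORMS.items():
            if model(transform(text)) != baseline:
                violations.append((name, text))
    return violations


inputs = ["See the annual report for details.", "Lunch is at noon."]
violations = metamorphic_violations(classify, inputs)
```

The test never states what the correct label is. It only requires the label to stay the same under each transformation, so the same check can be rerun unchanged whenever the model is retrained or replaced.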
Deep Kayal
Reach out if there’s anything I can help with:
deep.kayal@pm.me
https://www.linkedin.com/in/subhradeepk/
