SlideShare a Scribd company logo

AZConf 2023 - Considerations for LLMOps: Running LLMs in production

With the recent explosion in development and interest in large language, vision and speech models, it has become apparent that running large models in production will be a key driver in enterprise adoption of ML. Traditional MLOps, i.e. running machine learning models in production, already has so many variabilities to address starting from data integrity, data drift and model optimization. Running a large model (language or vision) in production keeping in mind business requirements is different altogether. In this talk, I will try to explain the general framework for LLMOps and certain considerations while designing a system for inferencing a large model. This talk will be covered in sub-topics: 1. Model Optimization 2. Model fine-tuning 3. Model Editing 4. Model Serving and deployment 5. Model metrics monitoring 6. Embedding and artifact management In each sub-topic, a brief understanding of the current open-source tool sets will also be mentioned so that tool-chain selection is a bit easier.

1 of 16
Download to read offline
Cloud & AI
Conference 2023
Asia's Largest
17 - 18, November 2023
IIT Madras Research Park, Chennai
Saradindu Sengupta
Considerations for LLMOps: Running LLMs in production
Senior ML Engineer @Nunam
Where I work on building learning systems to forecast health and failure of Li-ion
batteries.
Why such a rush ?
Image Source: WandB Article
Release of ChatGPT in 2022
What is not there in MLOps ? Why Another Ops
Image Source: Kubeflow
Overview of an ML System
Overview of an LLM System

Recommended

Regulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyRegulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyDebmalya Biswas
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1DianaGray10
 
Introduction to LLMs
Introduction to LLMsIntroduction to LLMs
Introduction to LLMsLoic Merckel
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!taozen
 
Understanding LLMOps-Large Language Model Operations
Understanding LLMOps-Large Language Model OperationsUnderstanding LLMOps-Large Language Model Operations
Understanding LLMOps-Large Language Model OperationsMy Gen Tec
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
 

More Related Content

What's hot

The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021Steve Omohundro
 
Generative AI Risks & Concerns
Generative AI Risks & ConcernsGenerative AI Risks & Concerns
Generative AI Risks & ConcernsAjitesh Kumar
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
An Introduction to Generative AI - May 18, 2023
An Introduction  to Generative AI - May 18, 2023An Introduction  to Generative AI - May 18, 2023
An Introduction to Generative AI - May 18, 2023CoriFaklaris1
 
Generative AI and law.pptx
Generative AI and law.pptxGenerative AI and law.pptx
Generative AI and law.pptxChris Marsden
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfDavid Rostcheck
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scaleMaxim Salnikov
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesDianaGray10
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfAnastasiaSteele10
 
Generative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdfGenerative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdfSaeed Al Dhaheri
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYAndre Muscat
 
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfDung Hoang
 
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...DianaGray10
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfPremNaraindas1
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures
 
Generative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfGenerative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfLiming Zhu
 

What's hot (20)

The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
Generative AI Risks & Concerns
Generative AI Risks & ConcernsGenerative AI Risks & Concerns
Generative AI Risks & Concerns
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
Generative models
Generative modelsGenerative models
Generative models
 
An Introduction to Generative AI - May 18, 2023
An Introduction  to Generative AI - May 18, 2023An Introduction  to Generative AI - May 18, 2023
An Introduction to Generative AI - May 18, 2023
 
Generative AI and law.pptx
Generative AI and law.pptxGenerative AI and law.pptx
Generative AI and law.pptx
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scale
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
 
Generative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdfGenerative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdf
 
Webinar on ChatGPT.pptx
Webinar on ChatGPT.pptxWebinar on ChatGPT.pptx
Webinar on ChatGPT.pptx
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
 
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdf
 
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AI
 
Generative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfGenerative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdf
 
Generative AI
Generative AIGenerative AI
Generative AI
 

Similar to AZConf 2023 - Considerations for LLMOps: Running LLMs in production

Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTechgeetachauhan
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
Accelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudAccelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudSamantha Guerriero
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics HeroTechWell
 
Cheapest Analysis Essay
Cheapest Analysis EssayCheapest Analysis Essay
Cheapest Analysis EssayGloria Moore
 
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021StreamNative
 
Deep learning on mobile
Deep learning on mobileDeep learning on mobile
Deep learning on mobileAnirudh Koul
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Intel® Software
 
End of a trend
End of a trendEnd of a trend
End of a trendmml2000
 
1 SDEV 460 – Homework 4 Input Validation and Busine
1  SDEV 460 – Homework 4 Input Validation and Busine1  SDEV 460 – Homework 4 Input Validation and Busine
1 SDEV 460 – Homework 4 Input Validation and BusineVannaJoy20
 
IRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance AnalysisIRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance AnalysisIRJET Journal
 
Challenges in Embedded Development
Challenges in Embedded DevelopmentChallenges in Embedded Development
Challenges in Embedded DevelopmentSQABD
 

Similar to AZConf 2023 - Considerations for LLMOps: Running LLMs in production (20)

Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
COBOL to Apache Spark
COBOL to Apache SparkCOBOL to Apache Spark
COBOL to Apache Spark
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTech
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Accelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudAccelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google Cloud
 
Streaming is a Detail
Streaming is a DetailStreaming is a Detail
Streaming is a Detail
 
Isat06 Rev2
Isat06 Rev2Isat06 Rev2
Isat06 Rev2
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Higher Homework
Higher HomeworkHigher Homework
Higher Homework
 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics Hero
 
Cheapest Analysis Essay
Cheapest Analysis EssayCheapest Analysis Essay
Cheapest Analysis Essay
 
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
 
Deep learning on mobile
Deep learning on mobileDeep learning on mobile
Deep learning on mobile
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture
 
End of a trend
End of a trendEnd of a trend
End of a trend
 
1 SDEV 460 – Homework 4 Input Validation and Busine
1  SDEV 460 – Homework 4 Input Validation and Busine1  SDEV 460 – Homework 4 Input Validation and Busine
1 SDEV 460 – Homework 4 Input Validation and Busine
 
IRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance AnalysisIRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance Analysis
 
Challenges in Embedded Development
Challenges in Embedded DevelopmentChallenges in Embedded Development
Challenges in Embedded Development
 

More from SARADINDU SENGUPTA

Solar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid StationSolar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid StationSARADINDU SENGUPTA
 
An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...SARADINDU SENGUPTA
 
Pydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn somethingPydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn somethingSARADINDU SENGUPTA
 
GDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in productionGDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in productionSARADINDU SENGUPTA
 
PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...SARADINDU SENGUPTA
 
GDG Cloud Community Day 2022 - Managing data quality in Machine Learning
GDG Cloud Community Day 2022 -  Managing data quality in Machine LearningGDG Cloud Community Day 2022 -  Managing data quality in Machine Learning
GDG Cloud Community Day 2022 - Managing data quality in Machine LearningSARADINDU SENGUPTA
 

More from SARADINDU SENGUPTA (6)

Solar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid StationSolar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
 
An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...
 
Pydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn somethingPydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn something
 
GDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in productionGDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in production
 
PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...
 
GDG Cloud Community Day 2022 - Managing data quality in Machine Learning
GDG Cloud Community Day 2022 -  Managing data quality in Machine LearningGDG Cloud Community Day 2022 -  Managing data quality in Machine Learning
GDG Cloud Community Day 2022 - Managing data quality in Machine Learning
 

Recently uploaded

Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensKondapi V Siva Rama Brahmam
 
What is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptxWhat is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptxJose Briones
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referencepriyansabari355
 
Tips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsTips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsDataArchiva
 
data analytics and tools from in2inglobal.pdf
data analytics  and tools from in2inglobal.pdfdata analytics  and tools from in2inglobal.pdf
data analytics and tools from in2inglobal.pdfdigimartfamily
 
Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaAdrian Sanabria
 
Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)CUO VEERANAN VEERANAN
 
AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for usersStephenEfange3
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfAustraliaChapterIIBA
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023stephizcoolio
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referencepriyansabari355
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxMdRafiqulIslam403212
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)UNCResearchHub
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Thibaud Le Douarin
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxVighnesh Shashtri
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Cyber Security Experts
 
ppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxHizkiaJastis
 

Recently uploaded (18)

Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample Screens
 
What is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptxWhat is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptx
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a reference
 
Tips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsTips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data Goals
 
data analytics and tools from in2inglobal.pdf
data analytics  and tools from in2inglobal.pdfdata analytics  and tools from in2inglobal.pdf
data analytics and tools from in2inglobal.pdf
 
Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix Enigma
 
Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)
 
AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for users
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as reference
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptx
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptx
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
 
ppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptx
 
Electricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptxElectricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptx
 

AZConf 2023 - Considerations for LLMOps: Running LLMs in production

  • 1. Cloud & AI Conference 2023 Asia's Largest 17 - 18, November 2023 IIT Madras Research Park, Chennai
  • 2. Saradindu Sengupta Considerations for LLMOps: Running LLMs in production Senior ML Engineer @Nunam Where I work on building learning systems to forecast health and failure of Li-ion batteries.
  • 3. Why such a rush ? Image Source: WandB Article Release of ChatGPT in 2022
  • 4. What is not there in MLOps ? Why Another Ops Image Source: Kubeflow
  • 5. Overview of an ML System
  • 6. Overview of an LLM System
  • 7. What are the primary challenges ? Building deterministic systems with stochastic algorithm Ambiguity Inconsistency Prompt Generation Versioning Evaluation Optimization Cost Latency Known Issues Unknown Issues
  • 8. Ambiguity and Inconsistency Any deterministic downstream application expects output in a certain format. Consider the case for a weather application using weather API. The responses are non-ambiguous in nature such that the format of the response doesn’t change with same given input. On a different use-case, where scoring an essay is the primary task. Solution 1. Engineering deterministic response; OpenAI Cookbook 2. Changing design mindset to accommodate the stochastic nature
  • 9. Cost Prompt Engineering Prompt: 10k tokens ($0.06/1k tokens) Output: 200 tokens ($0.12/1k tokens) Evaluate on 20 examples Experiment with 25 different versions of prompts Cost: $300 - One time Inferencing With GPT-4 Input Token: 10K Output Token: 200 Cost: $0.624 / inference With 1 million inference / day: $624,000 / day With GPT-3.5-turbo Input Token: 4K Output Token: 4K Cost: $0.004 / inference With 1 million inference / day: $4,000 / day Prompt Engineering Cost Inference Cost Source: Inferencing benchmark
  • 10. Latency 1. No SLA [Ref: Running into a Fire: OpenAI Has No SLA (aragonresearch.com)] a. Huge networking overhead b. Exponentially volatile development speed 2. No clear indication of latency due to model size, networking or engineering overhead a. With newer bigger models model size would be impact factor for latency b. Networking and engineering overhead will be easier with newer releases 3. Difficult to estimate cost and latency for LLMs a. Build vs buy estimates are tricky and outdated very quickly b. Open-source vs Proprietary model comparison for enterprise use case are difficult Latency for short token size: Input token: 51 Output token: 1 Latency for GPT-3.5-turbo: 500 ms Input Token Size Output Token Size Latency (Sec) for 90th Percentile 51 1 0.75 232 1 0.64 228 26 1.62 Source: Inferencing benchmark
  • 11. Prompting vs Fine-tuning 3 main Factor to consider 1. Availability of data a. It is not straightforward to estimate b. For a few it is better to stick to prompts 2. Performance increase 3. Cost reduction
  • 12. Prompting vs Fine-tuning 1. A prompt is approximately 100 examples. 2. As the number of examples are increased, fine-tuning will always give better result. Although the performance gain will saturate Why finetune anyways: 1. With more data, model performance will always be better 2. Will be cheaper to run a. Reducing 1K input token in GPT-3.5-turbo will save $2000 on 1 million inferences
  • 13. Embeddings and Vector Databases The cost for embeddings using the smaller model text-embedding-ada-002 is $0.0004/1k tokens. If each item averages 250 tokens (187 words), this pricing means $1 for every 10k items or $100 for 1 million items. It is still cheaper 1. Only need to generate once 2. Easy to generate embeddings in real-time for queries Image Source: Andreessen Horowitz
  • 14. References 1. MLOps guide (huyenchip.com) 2. How many data points is a prompt worth [How Many Data Points is a Prompt Worth? | Abstract (arxiv.org)] 3. OpenAI GPT-3 Text Embeddings - Really a new state-of-the-art in dense text embeddings? | by Nils Reimers | Medium 4. llama-chat [randaller/llama-chat: Chat with Meta's LLaMA models at home made easy (github.com)] 5. Understanding LLMOps: Large Language Model Operations - Weights & Biases (wandb.ai) 6. OpenAI Cookbook [openai/openai-cookbook: Examples and guides for using the OpenAI API (github.com)] Image Source: Andreessen Horowitz