SlideShare a Scribd company logo
1 of 16
Download to read offline
Cloud & AI
Conference 2023
Asia's Largest
17 - 18, November 2023
IIT Madras Research Park, Chennai
Saradindu Sengupta
Considerations for LLMOps: Running LLMs in production
Senior ML Engineer @Nunam
Where I work on building learning systems to forecast health and failure of Li-ion
batteries.
Why such a rush ?
Image Source: WandB Article
Release of ChatGPT in 2022
What is not there in MLOps ? Why Another Ops
Image Source: Kubeflow
Overview of an ML System
Overview of an LLM System
What are the primary challenges ?
Building deterministic systems with
stochastic algorithm
Ambiguity Inconsistency
Prompt Generation
Versioning Evaluation Optimization
Cost
Latency
Known Issues
Unknown
Issues
Ambiguity and Inconsistency
Any deterministic downstream application expects output in a certain format.
Consider the case for a weather application using weather API. The responses are non-ambiguous in nature such that the format of the response doesnโ€™t
change with same given input.
On a different use-case, where scoring an essay is the primary task.
Solution
1. Engineering deterministic response;
OpenAI Cookbook
2. Changing design mindset to accommodate the stochastic nature
Cost
Prompt Engineering
Prompt: 10k tokens ($0.06/1k tokens)
Output: 200 tokens ($0.12/1k tokens)
Evaluate on 20 examples
Experiment with 25 different versions of prompts
Cost: $300 - One time
Inferencing
With GPT-4
Input Token: 10K
Output Token: 200
Cost: $0.624 / inference
With 1 million inference / day: $624,000 / day
With GPT-3.5-turbo
Input Token: 4K
Output Token: 4K
Cost: $0.004 / inference
With 1 million inference / day: $4,000 / day
Prompt
Engineering Cost
Inference Cost
Source: Inferencing benchmark
Latency
1. No SLA [Ref: Running into a Fire: OpenAI Has No SLA (aragonresearch.com)]
a. Huge networking overhead
b. Exponentially volatile development speed
2. No clear indication of latency due to model size, networking or engineering overhead
a. With newer bigger models model size would be impact factor for latency
b. Networking and engineering overhead will be easier with newer releases
3. Difficult to estimate cost and latency for LLMs
a. Build vs buy estimates are tricky and outdated very quickly
b. Open-source vs Proprietary model comparison for enterprise use case are difficult
Latency for short token size:
Input token: 51
Output token: 1
Latency for GPT-3.5-turbo: 500 ms
Input Token Size Output Token Size Latency (Sec) for 90th Percentile
51 1 0.75
232 1 0.64
228 26 1.62
Source: Inferencing benchmark
Prompting vs Fine-tuning
3 main Factor to consider
1. Availability of data
a. It is not straightforward to estimate
b. For a few it is better to stick to prompts
2. Performance increase
3. Cost reduction
Prompting vs Fine-tuning
1. A prompt is approximately 100 examples.
2. As the number of examples are increased, fine-tuning will always give better result. Although the performance gain
will saturate
Why finetune anyways:
1. With more data, model performance will always be better
2. Will be cheaper to run
a. Reducing 1K input token in GPT-3.5-turbo
will save $2000 on 1 million inferences
Embeddings and Vector Databases
The cost for embeddings using the smaller model text-embedding-ada-002 is $0.0004/1k tokens. If each item averages
250 tokens (187 words), this pricing means $1 for every 10k items or $100 for 1 million items.
It is still cheaper
1. Only need to generate once
2. Easy to generate embeddings in real-time for queries
Image Source: Andreessen Horowitz
References
1. MLOps guide (huyenchip.com)
2. How many data points is a prompt worth [How Many Data Points is a Prompt Worth? | Abstract (arxiv.org)]
3. OpenAI GPT-3 Text Embeddings - Really a new state-of-the-art in dense text embeddings? | by Nils Reimers |
Medium
4. llama-chat [randaller/llama-chat: Chat with Meta's LLaMA models at home made easy (github.com)]
5. Understanding LLMOps: Large Language Model Operations - Weights & Biases (wandb.ai)
6. OpenAI Cookbook [openai/openai-cookbook: Examples and guides for using the OpenAI API (github.com)]
Image Source: Andreessen Horowitz
Thank You
LinkedIn
Saradindu Sengupta
Twitter GitHub
Platinum Partner
Gold Partner
Silver Partner

More Related Content

What's hot

AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1DianaGray10
ย 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021Steve Omohundro
ย 
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...DianaGray10
ย 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesDianaGray10
ย 
AIDRC_Generative_AI_TL_v5.pdf
AIDRC_Generative_AI_TL_v5.pdfAIDRC_Generative_AI_TL_v5.pdf
AIDRC_Generative_AI_TL_v5.pdfThierry Lestable
ย 
Responsible Generative AI
Responsible Generative AIResponsible Generative AI
Responsible Generative AICMassociates
ย 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGISynaptonIncorporated
ย 
ChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for BusinessChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for BusinessDion Hinchcliffe
ย 
Generative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGenerative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGene Leybzon
ย 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...ssuser4edc93
ย 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGene Leybzon
ย 
An Introduction To Using ChatGPT For Business
An Introduction To Using ChatGPT For BusinessAn Introduction To Using ChatGPT For Business
An Introduction To Using ChatGPT For BusinessPaul Nguyen
ย 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfDavid Rostcheck
ย 
Regulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyRegulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyDebmalya Biswas
ย 
Let's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersLet's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersSteven Van Vaerenbergh
ย 
Generative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdfGenerative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdfSaeed Al Dhaheri
ย 
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfDung Hoang
ย 

What's hot (20)

AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
ย 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
ย 
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
AI and ML Series - Leveraging Generative AI and LLMs Using the UiPath Platfor...
ย 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
ย 
AIDRC_Generative_AI_TL_v5.pdf
AIDRC_Generative_AI_TL_v5.pdfAIDRC_Generative_AI_TL_v5.pdf
AIDRC_Generative_AI_TL_v5.pdf
ย 
Responsible Generative AI
Responsible Generative AIResponsible Generative AI
Responsible Generative AI
ย 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGI
ย 
ChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for BusinessChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for Business
ย 
Generative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGenerative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second Session
ย 
ChatGPT Use- Cases
ChatGPT Use- Cases ChatGPT Use- Cases
ChatGPT Use- Cases
ย 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
ย 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
ย 
An Introduction To Using ChatGPT For Business
An Introduction To Using ChatGPT For BusinessAn Introduction To Using ChatGPT For Business
An Introduction To Using ChatGPT For Business
ย 
Content In The Age of AI
Content In The Age of AIContent In The Age of AI
Content In The Age of AI
ย 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
ย 
Regulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyRegulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with Transparency
ย 
Let's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersLet's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchers
ย 
Generative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdfGenerative AI - Responsible Path Forward.pdf
Generative AI - Responsible Path Forward.pdf
ย 
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdf
ย 
Generative AI
Generative AIGenerative AI
Generative AI
ย 

Similar to AZConf 2023 - Considerations for LLMOps: Running LLMs in production

Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
ย 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
ย 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTechgeetachauhan
ย 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intelยฎ Software
ย 
Accelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudAccelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudSamantha Guerriero
ย 
Streaming is a Detail
Streaming is a DetailStreaming is a Detail
Streaming is a DetailHostedbyConfluent
ย 
Isat06 Rev2
Isat06 Rev2Isat06 Rev2
Isat06 Rev2Rajesh Gupta
ย 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
ย 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
ย 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
ย 
AWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdfAWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdfAyyanar Jeyakrishnan
ย 
Higher Homework
Higher HomeworkHigher Homework
Higher Homeworkmrsmackenzie
ย 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics HeroTechWell
ย 
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021StreamNative
ย 
Deep learning on mobile
Deep learning on mobileDeep learning on mobile
Deep learning on mobileAnirudh Koul
ย 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Intelยฎ Software
ย 
End of a trend
End of a trendEnd of a trend
End of a trendmml2000
ย 
1 SDEV 460 โ€“ Homework 4 Input Validation and Busine
1  SDEV 460 โ€“ Homework 4 Input Validation and Busine1  SDEV 460 โ€“ Homework 4 Input Validation and Busine
1 SDEV 460 โ€“ Homework 4 Input Validation and BusineVannaJoy20
ย 
IRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance AnalysisIRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance AnalysisIRJET Journal
ย 

Similar to AZConf 2023 - Considerations for LLMOps: Running LLMs in production (20)

Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
ย 
COBOL to Apache Spark
COBOL to Apache SparkCOBOL to Apache Spark
COBOL to Apache Spark
ย 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
ย 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTech
ย 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
ย 
Accelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudAccelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google Cloud
ย 
Streaming is a Detail
Streaming is a DetailStreaming is a Detail
Streaming is a Detail
ย 
Isat06 Rev2
Isat06 Rev2Isat06 Rev2
Isat06 Rev2
ย 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
ย 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
ย 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
ย 
AWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdfAWS_Meetup_BLR_July_22_Social.pdf
AWS_Meetup_BLR_July_22_Social.pdf
ย 
Higher Homework
Higher HomeworkHigher Homework
Higher Homework
ย 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics Hero
ย 
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021
ย 
Deep learning on mobile
Deep learning on mobileDeep learning on mobile
Deep learning on mobile
ย 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture
ย 
End of a trend
End of a trendEnd of a trend
End of a trend
ย 
1 SDEV 460 โ€“ Homework 4 Input Validation and Busine
1  SDEV 460 โ€“ Homework 4 Input Validation and Busine1  SDEV 460 โ€“ Homework 4 Input Validation and Busine
1 SDEV 460 โ€“ Homework 4 Input Validation and Busine
ย 
IRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance AnalysisIRJET- Security, Issues and Algorithm and their Performance Analysis
IRJET- Security, Issues and Algorithm and their Performance Analysis
ย 

More from SARADINDU SENGUPTA

Solar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid StationSolar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid StationSARADINDU SENGUPTA
ย 
An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...SARADINDU SENGUPTA
ย 
Pydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn somethingPydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn somethingSARADINDU SENGUPTA
ย 
GDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in productionGDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in productionSARADINDU SENGUPTA
ย 
PyData Global 2022 - Lightning Talk - Bessel's Correction
PyData Global 2022 - Lightning Talk - Bessel's CorrectionPyData Global 2022 - Lightning Talk - Bessel's Correction
PyData Global 2022 - Lightning Talk - Bessel's CorrectionSARADINDU SENGUPTA
ย 
PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...SARADINDU SENGUPTA
ย 
GDG Cloud Community Day 2022 - Managing data quality in Machine Learning
GDG Cloud Community Day 2022 -  Managing data quality in Machine LearningGDG Cloud Community Day 2022 -  Managing data quality in Machine Learning
GDG Cloud Community Day 2022 - Managing data quality in Machine LearningSARADINDU SENGUPTA
ย 

More from SARADINDU SENGUPTA (7)

Solar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid StationSolar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
Solar Energy Output Forecasting from SolarGIS Data for Connected Grid Station
ย 
An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...An Analytical Comparison of Different Regularization Parameter Selection Meth...
An Analytical Comparison of Different Regularization Parameter Selection Meth...
ย 
Pydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn somethingPydata Global 2023 - How can a learnt model unlearn something
Pydata Global 2023 - How can a learnt model unlearn something
ย 
GDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in productionGDG Community Day 2023 - Interpretable ML in production
GDG Community Day 2023 - Interpretable ML in production
ย 
PyData Global 2022 - Lightning Talk - Bessel's Correction
PyData Global 2022 - Lightning Talk - Bessel's CorrectionPyData Global 2022 - Lightning Talk - Bessel's Correction
PyData Global 2022 - Lightning Talk - Bessel's Correction
ย 
PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...PyData Global 2022 - Things I learned while running neural networks on microc...
PyData Global 2022 - Things I learned while running neural networks on microc...
ย 
GDG Cloud Community Day 2022 - Managing data quality in Machine Learning
GDG Cloud Community Day 2022 -  Managing data quality in Machine LearningGDG Cloud Community Day 2022 -  Managing data quality in Machine Learning
GDG Cloud Community Day 2022 - Managing data quality in Machine Learning
ย 

Recently uploaded

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
ย 
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Callshivangimorya083
ย 
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
ย 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
ย 
๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Call
๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Call๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Call
๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Callshivangimorya083
ย 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
ย 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
ย 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
ย 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
ย 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
ย 
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...soniya singh
ย 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
ย 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
ย 
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...Florian Roscheck
ย 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
ย 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
ย 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
ย 
Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”
Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”
Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”soniya singh
ย 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
ย 

Recently uploaded (20)

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
ย 
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
ย 
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
ย 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
ย 
๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Call
๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Call๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Call
๊งโค Greater Noida Call Girls Delhi โค๊ง‚ 9711199171 โ˜Ž๏ธ Hard And Sexy Vip Call
ย 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
ย 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
ย 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
ย 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
ย 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
ย 
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi ๐Ÿ”8264348440๐Ÿ” Independent Escort...
ย 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
ย 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
ย 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
ย 
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
ย 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
ย 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
ย 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
ย 
Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”
Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”
Call Girls in Defence Colony Delhi ๐Ÿ’ฏCall Us ๐Ÿ”8264348440๐Ÿ”
ย 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
ย 

AZConf 2023 - Considerations for LLMOps: Running LLMs in production

  • 1. Cloud & AI Conference 2023 Asia's Largest 17 - 18, November 2023 IIT Madras Research Park, Chennai
  • 2. Saradindu Sengupta Considerations for LLMOps: Running LLMs in production Senior ML Engineer @Nunam Where I work on building learning systems to forecast health and failure of Li-ion batteries.
  • 3. Why such a rush ? Image Source: WandB Article Release of ChatGPT in 2022
  • 4. What is not there in MLOps ? Why Another Ops Image Source: Kubeflow
  • 5. Overview of an ML System
  • 6. Overview of an LLM System
  • 7. What are the primary challenges ? Building deterministic systems with stochastic algorithm Ambiguity Inconsistency Prompt Generation Versioning Evaluation Optimization Cost Latency Known Issues Unknown Issues
  • 8. Ambiguity and Inconsistency Any deterministic downstream application expects output in a certain format. Consider the case for a weather application using weather API. The responses are non-ambiguous in nature such that the format of the response doesnโ€™t change with same given input. On a different use-case, where scoring an essay is the primary task. Solution 1. Engineering deterministic response; OpenAI Cookbook 2. Changing design mindset to accommodate the stochastic nature
  • 9. Cost Prompt Engineering Prompt: 10k tokens ($0.06/1k tokens) Output: 200 tokens ($0.12/1k tokens) Evaluate on 20 examples Experiment with 25 different versions of prompts Cost: $300 - One time Inferencing With GPT-4 Input Token: 10K Output Token: 200 Cost: $0.624 / inference With 1 million inference / day: $624,000 / day With GPT-3.5-turbo Input Token: 4K Output Token: 4K Cost: $0.004 / inference With 1 million inference / day: $4,000 / day Prompt Engineering Cost Inference Cost Source: Inferencing benchmark
  • 10. Latency 1. No SLA [Ref: Running into a Fire: OpenAI Has No SLA (aragonresearch.com)] a. Huge networking overhead b. Exponentially volatile development speed 2. No clear indication of latency due to model size, networking or engineering overhead a. With newer bigger models model size would be impact factor for latency b. Networking and engineering overhead will be easier with newer releases 3. Difficult to estimate cost and latency for LLMs a. Build vs buy estimates are tricky and outdated very quickly b. Open-source vs Proprietary model comparison for enterprise use case are difficult Latency for short token size: Input token: 51 Output token: 1 Latency for GPT-3.5-turbo: 500 ms Input Token Size Output Token Size Latency (Sec) for 90th Percentile 51 1 0.75 232 1 0.64 228 26 1.62 Source: Inferencing benchmark
  • 11. Prompting vs Fine-tuning 3 main Factor to consider 1. Availability of data a. It is not straightforward to estimate b. For a few it is better to stick to prompts 2. Performance increase 3. Cost reduction
  • 12. Prompting vs Fine-tuning 1. A prompt is approximately 100 examples. 2. As the number of examples are increased, fine-tuning will always give better result. Although the performance gain will saturate Why finetune anyways: 1. With more data, model performance will always be better 2. Will be cheaper to run a. Reducing 1K input token in GPT-3.5-turbo will save $2000 on 1 million inferences
  • 13. Embeddings and Vector Databases The cost for embeddings using the smaller model text-embedding-ada-002 is $0.0004/1k tokens. If each item averages 250 tokens (187 words), this pricing means $1 for every 10k items or $100 for 1 million items. It is still cheaper 1. Only need to generate once 2. Easy to generate embeddings in real-time for queries Image Source: Andreessen Horowitz
  • 14. References 1. MLOps guide (huyenchip.com) 2. How many data points is a prompt worth [How Many Data Points is a Prompt Worth? | Abstract (arxiv.org)] 3. OpenAI GPT-3 Text Embeddings - Really a new state-of-the-art in dense text embeddings? | by Nils Reimers | Medium 4. llama-chat [randaller/llama-chat: Chat with Meta's LLaMA models at home made easy (github.com)] 5. Understanding LLMOps: Large Language Model Operations - Weights & Biases (wandb.ai) 6. OpenAI Cookbook [openai/openai-cookbook: Examples and guides for using the OpenAI API (github.com)] Image Source: Andreessen Horowitz