SlideShare a Scribd company logo
1 of 16
Download to read offline
Retrieval Augmented Generation in Practice
Scalable GenAI platforms with k8s, LangChain, HuggingFace and Vector
DBs
Mihai Criveti, Principal Architect, CKA, RHCA III
September 7, 2023
1
Large Language Models and their Limitations
Retrieval Augmented Generation or Conversation with your Documents
2
Introduction
Mihai Criveti, Principal Architect, Platform Engineering
• Responsible for large scale Cloud Native and AI Solutions
• Red Hat Certified Architect III, CKA/CKS/CKAD
• Driving the development of Retrieval Augmentation Generation platforms, and
solutions for Generative AI at IBM that leverage WatsonX, Vector databases,
LangChain, HuggingFace and open source AI models.
Abstract
• Large Language Models: use cases and limitations
• Scaling Large Language Models: Retrieval Augmented Generation
• LLAMA2, HuggingFace TGIS, SentenceTransformers, Python, LangChain, Weaviate,
ChromaDB vector databases, deployment to k8s
3
Large Language Models and
their Limitations
GenAI and Large Language Models Explained
Think of LLMs like mathematical functions, or your phone’s autocomplete
f(x) = x'
• Where the input (x) and the output (x') are strings
A more accurate representation
f(training_data, model_parameters, input_string) = output_string
• training_data represents the data it was trained.
• model_parameters represent things like “temperature”
• input_string is the combination of prompt and context you give to the model. Ex:
“What is Kubernetes” or “Summarize the following document: ${DOCUMENT}”
• the ‘prompt’ is usually an optional instruction like “summarize”, “extract”,
“translate”, “classify” etc. but more complex prompts are usually used. “Be a helpful
assistant that responds to my question.. etc.”
• The function can process a maximum of TOKEN_LIMIT (total input and output),
usually ~4096 tokens (~3000 words in English, fewer in say.. Japanese).
4
What Large Language Models DON’T DO
Learn
A model will not ‘learn’ from interactions (unless specifically trained/fine-tuned).
Remember
A model doesn’t remember previous prompts. In fact, it’s all done with prompt trickery:
previous prompts are injected. The API does a LOT of of filtering and heavy lifting!
Reason
Think of LLMs like your phone’s autocomplete, it doesn’t reason, or do math.
Use your data
LLMs don’t provide responses based on YOUR data (databases or files), unless it’s
include in the training dataset, or the prompt (ex: RAG).
Use the Internet
• A LLM doesn’t have the capacity to ‘search the internet’, or make API calls.
• In fact, a model does not perform any activity other than converting one string of text
into another string of text.
5
GenAI Use Cases
Figure 1: Use Cases
6
Think of adding this to your architecture
7
In fact, even a 9600 baud modem is much faster. Think teletype!
LLMs are really slow
• A WPM = ((BPS / 10) / 5) * 60, a 9600 baud modem will generate 11520 words /
minute.
• At an average 30 tokens / second (20 words) for LLAMA-70B, you’re getting 1200
words / minute!
• This is slower than a punch card reader :-)
LLMs are also expensive to run
• Running your own LLAMA2 70B might cost as much as 20K / month if you’re using a
dedicated GPU instance!
8
Model Limitations in Practice
Latency and Bandwidth: Tokens per second
• Large models (70B) such as LLAMA2 can be painfully slow
• Smaller models (20B, 13B, 7B) are faster, and can perform inference on a cheaper
GPU (less VRAM)
• Even so, models will perform at anywhere between 10 - 100 tokens / second.
Token Limits
• Models have a token limit as well
• Usually 4096 tokens (roughly ~3000 words) of total input and output
9
Getting LLMs to work with our data
Training
• Very Expensive, takes a long time
Fine Tuning
• Expensive, takes considerable time as well, but achievable
Retrieval Augmented Generation
• Insert your data into prompts every time
• Cheap, and can work with vast amounts of data
• While LLMs are SLOW, Vector Databases are FAST!
• Can help overcome model limitations (such as token limits) - as you’re only feeding
‘top search results’ to the LLM, instead of whole documents.
10
Retrieval Augmented
Generation or Conversation with
your Documents
RAG Explained
Figure 3: RAG Explained 11
Loading Documents
12
Scaling factor for RAG
• Vector Database: consider sharding and High Availability
• Fine Tuning: collecting data to be used for fine tuning
• Governance and Model Benchmarking: how are you testing your model performance
over time, with different prompts, one-shot, and various parameters
• Chain of Reasoning and Agents
• Caching embeddings and responses
• Personalization and Conversational Memory Database
• Streaming Responses and optimizing performance. A fine tuned 13B model may
perform better than a poor 70B one!
• Calling 3rd party functions or APIs for reasoning or other type of data (ex: LLMs are
terrible at reasoning and prediction, consider calling other models)
• Fallback techniques: fallback to a different model, or default answers
• API scaling techniques, rate limiting, etc.
• Async, streaming and parallelization, multiprocessing, GPU acceleration (including
embeddings), generating your API using OpenAPI, etc.
13
Contact
This talk can be found on GitHub
• https://github.com/crivetimihai/shipitcon-scaling-retrieval-augmented-generation
Social media
• https://twitter.com/CrivetiMihai - follow for more LLM content
• https://youtube.com/CrivetiMihai - more LLM videos to follow
• https://www.linkedin.com/in/crivetimihai/
14

More Related Content

What's hot

How ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyHow ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyPekka Abrahamsson / Tampere University
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdfcaa28steve
 
Active Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdfActive Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdfPo-Chuan Chen
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language ModelsLeon Dohmen
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1DianaGray10
 
LangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AILangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AIOzgurOscarOzkan
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfDavid Rostcheck
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxSaiPragnaKancheti
 
Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfM Waleed Kadous
 
generative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language modelsgenerative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language modelsAdventureWorld5
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021Steve Omohundro
 
Generative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfGenerative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfLiming Zhu
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfStephenAmell4
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsOVHcloud
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationKnoldus Inc.
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!taozen
 

What's hot (20)

How ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyHow ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundly
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdf
 
Active Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdfActive Retrieval Augmented Generation.pdf
Active Retrieval Augmented Generation.pdf
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
 
LangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AILangChain Intro by KeyMate.AI
LangChain Intro by KeyMate.AI
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptx
 
Generative models
Generative modelsGenerative models
Generative models
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
 
Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdf
 
generative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language modelsgenerative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language models
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
Generative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfGenerative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdf
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
 

Similar to Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s, LangChain, HuggingFace and Vector

10 Limitations of Large Language Models and Mitigation Options
10 Limitations of Large Language Models and Mitigation Options10 Limitations of Large Language Models and Mitigation Options
10 Limitations of Large Language Models and Mitigation OptionsMihai Criveti
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha Talagala
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr TsapFwdays
 
LangChain + Docugami Webinar
LangChain + Docugami WebinarLangChain + Docugami Webinar
LangChain + Docugami WebinarTaqi Jaffri
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfAmazon Web Services
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Apache Con 2008 Top 10 Mistakes
Apache Con 2008 Top 10 MistakesApache Con 2008 Top 10 Mistakes
Apache Con 2008 Top 10 MistakesJohn Coggeshall
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLTriNimbus
 
Building Data Lakes and Analytics on AWS. IPExpo Manchester.
Building Data Lakes and Analytics on AWS. IPExpo Manchester.Building Data Lakes and Analytics on AWS. IPExpo Manchester.
Building Data Lakes and Analytics on AWS. IPExpo Manchester.javier ramirez
 
The Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesThe Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesAlice Zheng
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordicsjavier ramirez
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfAmazon Web Services
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Coursejimliddle
 
From raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakeFrom raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakejavier ramirez
 

Similar to Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s, LangChain, HuggingFace and Vector (20)

10 Limitations of Large Language Models and Mitigation Options
10 Limitations of Large Language Models and Mitigation Options10 Limitations of Large Language Models and Mitigation Options
10 Limitations of Large Language Models and Mitigation Options
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
 
LangChain + Docugami Webinar
LangChain + Docugami WebinarLangChain + Docugami Webinar
LangChain + Docugami Webinar
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Apache Con 2008 Top 10 Mistakes
Apache Con 2008 Top 10 MistakesApache Con 2008 Top 10 Mistakes
Apache Con 2008 Top 10 Mistakes
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
 
Building Data Lakes and Analytics on AWS. IPExpo Manchester.
Building Data Lakes and Analytics on AWS. IPExpo Manchester.Building Data Lakes and Analytics on AWS. IPExpo Manchester.
Building Data Lakes and Analytics on AWS. IPExpo Manchester.
 
The Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the MassesThe Challenges of Bringing Machine Learning to the Masses
The Challenges of Bringing Machine Learning to the Masses
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordics
 
moveMountainIEEE
moveMountainIEEEmoveMountainIEEE
moveMountainIEEE
 
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdfBuilding_a_Modern_Data_Platform_in_the_Cloud.pdf
Building_a_Modern_Data_Platform_in_the_Cloud.pdf
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
 
From raw data to business insights. A modern data lake
From raw data to business insights. A modern data lakeFrom raw data to business insights. A modern data lake
From raw data to business insights. A modern data lake
 

More from Mihai Criveti

Ansible Workshop for Pythonistas
Ansible Workshop for PythonistasAnsible Workshop for Pythonistas
Ansible Workshop for PythonistasMihai Criveti
 
Mihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate EverythingMihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate EverythingMihai Criveti
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
ShipItCon - Continuous Deployment and Multicloud with Ansible and Kubernetes
ShipItCon - Continuous Deployment and Multicloud with Ansible and KubernetesShipItCon - Continuous Deployment and Multicloud with Ansible and Kubernetes
ShipItCon - Continuous Deployment and Multicloud with Ansible and KubernetesMihai Criveti
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
OpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image LifecycleOpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image LifecycleMihai Criveti
 
Kubernetes Story - Day 3: Deploying and Scaling Applications on OpenShift
Kubernetes Story - Day 3: Deploying and Scaling Applications on OpenShiftKubernetes Story - Day 3: Deploying and Scaling Applications on OpenShift
Kubernetes Story - Day 3: Deploying and Scaling Applications on OpenShiftMihai Criveti
 
Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...
Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...
Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...Mihai Criveti
 
Kubernetes Story - Day 1: Build and Manage Containers with Podman
Kubernetes Story - Day 1: Build and Manage Containers with PodmanKubernetes Story - Day 1: Build and Manage Containers with Podman
Kubernetes Story - Day 1: Build and Manage Containers with PodmanMihai Criveti
 
Container Technologies and Transformational value
Container Technologies and Transformational valueContainer Technologies and Transformational value
Container Technologies and Transformational valueMihai Criveti
 
OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...
OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...
OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...Mihai Criveti
 
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...Mihai Criveti
 

More from Mihai Criveti (12)

Ansible Workshop for Pythonistas
Ansible Workshop for PythonistasAnsible Workshop for Pythonistas
Ansible Workshop for Pythonistas
 
Mihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate EverythingMihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate Everything
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
ShipItCon - Continuous Deployment and Multicloud with Ansible and Kubernetes
ShipItCon - Continuous Deployment and Multicloud with Ansible and KubernetesShipItCon - Continuous Deployment and Multicloud with Ansible and Kubernetes
ShipItCon - Continuous Deployment and Multicloud with Ansible and Kubernetes
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
OpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image LifecycleOpenShift Virtualization - VM and OS Image Lifecycle
OpenShift Virtualization - VM and OS Image Lifecycle
 
Kubernetes Story - Day 3: Deploying and Scaling Applications on OpenShift
Kubernetes Story - Day 3: Deploying and Scaling Applications on OpenShiftKubernetes Story - Day 3: Deploying and Scaling Applications on OpenShift
Kubernetes Story - Day 3: Deploying and Scaling Applications on OpenShift
 
Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...
Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...
Kubernetes Story - Day 2: Quay.io Container Registry for Publishing, Building...
 
Kubernetes Story - Day 1: Build and Manage Containers with Podman
Kubernetes Story - Day 1: Build and Manage Containers with PodmanKubernetes Story - Day 1: Build and Manage Containers with Podman
Kubernetes Story - Day 1: Build and Manage Containers with Podman
 
Container Technologies and Transformational value
Container Technologies and Transformational valueContainer Technologies and Transformational value
Container Technologies and Transformational value
 
OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...
OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...
OpenShift Commons - Adopting Podman, Skopeo and Buildah for Building and Mana...
 
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
 

Recently uploaded

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 

Recently uploaded (20)

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 

Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s, LangChain, HuggingFace and Vector

  • 1. Retrieval Augmented Generation in Practice Scalable GenAI platforms with k8s, LangChain, HuggingFace and Vector DBs Mihai Criveti, Principal Architect, CKA, RHCA III September 7, 2023 1
  • 2. Large Language Models and their Limitations Retrieval Augmented Generation or Conversation with your Documents 2
  • 3. Introduction Mihai Criveti, Principal Architect, Platform Engineering • Responsible for large scale Cloud Native and AI Solutions • Red Hat Certified Architect III, CKA/CKS/CKAD • Driving the development of Retrieval Augmentation Generation platforms, and solutions for Generative AI at IBM that leverage WatsonX, Vector databases, LangChain, HuggingFace and open source AI models. Abstract • Large Language Models: use cases and limitations • Scaling Large Language Models: Retrieval Augmented Generation • LLAMA2, HuggingFace TGIS, SentenceTransformers, Python, LangChain, Weaviate, ChromaDB vector databases, deployment to k8s 3
  • 4. Large Language Models and their Limitations
  • 5. GenAI and Large Language Models Explained Think of LLMs like mathematical functions, or your phone’s autocomplete f(x) = x' • Where the input (x) and the output (x') are strings A more accurate representation f(training_data, model_parameters, input_string) = output_string • training_data represents the data it was trained. • model_parameters represent things like “temperature” • input_string is the combination of prompt and context you give to the model. Ex: “What is Kubernetes” or “Summarize the following document: ${DOCUMENT}” • the ‘prompt’ is usually an optional instruction like “summarize”, “extract”, “translate”, “classify” etc. but more complex prompts are usually used. “Be a helpful assistant that responds to my question.. etc.” • The function can process a maximum of TOKEN_LIMIT (total input and output), usually ~4096 tokens (~3000 words in English, fewer in say.. Japanese). 4
  • 6. What Large Language Models DON’T DO Learn A model will not ‘learn’ from interactions (unless specifically trained/fine-tuned). Remember A model doesn’t remember previous prompts. In fact, it’s all done with prompt trickery: previous prompts are injected. The API does a LOT of of filtering and heavy lifting! Reason Think of LLMs like your phone’s autocomplete, it doesn’t reason, or do math. Use your data LLMs don’t provide responses based on YOUR data (databases or files), unless it’s include in the training dataset, or the prompt (ex: RAG). Use the Internet • A LLM doesn’t have the capacity to ‘search the internet’, or make API calls. • In fact, a model does not perform any activity other than converting one string of text into another string of text. 5
  • 7. GenAI Use Cases Figure 1: Use Cases 6
  • 8. Think of adding this to your architecture 7
  • 9. In fact, even a 9600 baud modem is much faster. Think teletype! LLMs are really slow • A WPM = ((BPS / 10) / 5) * 60, a 9600 baud modem will generate 11520 words / minute. • At an average 30 tokens / second (20 words) for LLAMA-70B, you’re getting 1200 words / minute! • This is slower than a punch card reader :-) LLMs are also expensive to run • Running your own LLAMA2 70B might cost as much as 20K / month if you’re using a dedicated GPU instance! 8
  • 10. Model Limitations in Practice Latency and Bandwidth: Tokens per second • Large models (70B) such as LLAMA2 can be painfully slow • Smaller models (20B, 13B, 7B) are faster, and can perform inference on a cheaper GPU (less VRAM) • Even so, models will perform at anywhere between 10 - 100 tokens / second. Token Limits • Models have a token limit as well • Usually 4096 tokens (roughly ~3000 words) of total input and output 9
  • 11. Getting LLMs to work with our data Training • Very Expensive, takes a long time Fine Tuning • Expensive, takes considerable time as well, but achievable Retrieval Augmented Generation • Insert your data into prompts every time • Cheap, and can work with vast amounts of data • While LLMs are SLOW, Vector Databases are FAST! • Can help overcome model limitations (such as token limits) - as you’re only feeding ‘top search results’ to the LLM, instead of whole documents. 10
  • 12. Retrieval Augmented Generation or Conversation with your Documents
  • 13. RAG Explained Figure 3: RAG Explained 11
  • 15. Scaling factor for RAG • Vector Database: consider sharding and High Availability • Fine Tuning: collecting data to be used for fine tuning • Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters • Chain of Reasoning and Agents • Caching embeddings and responses • Personalization and Conversational Memory Database • Streaming Responses and optimizing performance. A fine tuned 13B model may perform better than a poor 70B one! • Calling 3rd party functions or APIs for reasoning or other type of data (ex: LLMs are terrible at reasoning and prediction, consider calling other models) • Fallback techniques: fallback to a different model, or default answers • API scaling techniques, rate limiting, etc. • Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc. 13
  • 16. Contact This talk can be found on GitHub • https://github.com/crivetimihai/shipitcon-scaling-retrieval-augmented-generation Social media • https://twitter.com/CrivetiMihai - follow for more LLM content • https://youtube.com/CrivetiMihai - more LLM videos to follow • https://www.linkedin.com/in/crivetimihai/ 14