SlideShare a Scribd company logo
Scaling RAG
applications to serve
millions of users
Lessons learned scaling LLMs, Vector databases and more…
Powered by
What we’ll cover today…
📝 Agenda - Who is SlideSpeak? What do we do?
- Our Growth so far
- Mistakes we’ve made
- Architecture Overview
- Challenges of scaling RAG applications
- Challenges of scaling Vector databases
- What we’re currently busy with…
- Q&A
Create presentations from
Word, PDF or Excel files
Create presentations for any
topic
Some of our key features:
SlideSpeak is an AI-powered platform
to create and automate presentations
What we do
We save people hours of work
by speeding up manual
presentation workflows
OUR GROWTH
SO FAR
We’ve launched 6 months ago,
here’s what we’ve achieved. >2m
Files upload
>2m
LLM Tokens
consumed per
minute
>1000
LLM calls
per minute
250k
MAUs
User
Vector DB
Context
LLM
Response Query
Inference Retrieval
Prompt
Quick recap
on RAG
Using LLMs for everything
Storing vector data forever
No monitoring
Downtimes cost us thousands of $$$
Mistakes we’ve
made… so far…
��
��
��
Storing vectors is expensive, like
really expensive…
If not absolutely necessary use avoid
using LLMs
Architecture
Prompts, JSON Mode, Regression
Scaling Vector Databases
Scaling LLM providers
Rate limits, Downtimes, …
Challenges
scaling RAG
Applications
��
��
��
Slow queries, index building, …
LLM specific prompts, Pydantic being
unreliable, …
- Rate limits are not as high as
they seem
- Difficult to balance RPM and
TPM
- 40 page document has on avg.
24k tokens, with 2m limit thats
83 documents per minute or 1.5
per second
OpenAI Rate Limits (6/10/24)
Scaling LLM
providers
The problem
- We’ve migrated to Azure
OpenAI
- Not because the rate limits are
higher, but you can load
balance 🤯
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
Scaling LLM
providers
The solution
Source:
https://techcommunity.microsoft.com/t5/fasttrack-for-azure/smart-load-balancing-for-openai-end
points-and-azure-api/ba-p/3991616
Scaling LLM providers
Load Balancing in Azure
Scaling LLM
providers
The solution
- Find similar information in a lot of data
- Feed that to the LLM as context
Why do we need Vector Databases?
5 Problems when scaling Postgres with PGVector
- 🦥 Slow queries
- 🧠 Memory intense
- 🔍 Challenge of combining vector search and
metadata search (also called hybrid search)
needs careful query optimization
Challenges of scaling Vector Databases
Challenges of scaling Vector Databases
A Vector is an array of 4-byte floating point numbers.
Number of Vectors Total Size
7 Thousand 7,000 × 1536 × 4 = 43 Megabyte
1 Million 1,000,000 × 1536 × 4 = 5.7 Gigabyte
10 Millions 10,000,000 × 1536 × 4 = 57 Gigabyte
1 Billion 1000,000,000 × 1536 × 4 = 5.5 Terabyte
Build an index over the vector data
Challenges of scaling Vector Databases
HNSW: Efficient nearest neighbor search algorithm (ANN) for high-dimensional data.
ef_construction: Defines how many similarity candidates to look for
m: Defines how many of the closest neighbors to pick from the ef_construction list
** This might be more tricky if you use hybrid search (metadata + vector data)
Define what could be a good partition value for you (date, filetype, category, …)
Use partitioning
What we’ve done to scale PGVector
General tips when
working with
PGVector
Make sure to delete unnecessary
vector data
01
Never use the same Postgres
database for Vectors and other data
02
If nothing helps… Scale horizontally
What we’ve done to scale PGVector
What we’re
currently busy
with…
System Extension
Azure AI Postgres Extension
RAG Evaluation
– Implementing robust evaluation testing methods
– Exploring Ragas for advanced system performance
✅
��
��
– Enabling direct creation of embeddings within Azure
– Extend system to cover images and other non-textual data
Here’s what currently keeps
us up at night…
Questions?
And now it’s your turn…

More Related Content

Similar to "Scaling RAG Applications to serve millions of users", Kevin Goedecke

ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time Series
Sigmoid
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
ivan provalov
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
NEWYORKSYS-IT SOLUTIONS
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
Bill Liu
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Lucidworks
 
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Amazon Web Services
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon Web Services
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Grier Johnson
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
Grega Kespret
 
Grandata
GrandataGrandata
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Demi Ben-Ari
 
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
dbpublications
 
IBM Netezza Training.pdf
IBM Netezza Training.pdfIBM Netezza Training.pdf
IBM Netezza Training.pdf
NisaTrainings6
 
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
Amazon Web Services
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
Christopher Williams
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
Anant Corporation
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01
Giridhar Addepalli
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuning
Yosuke Mizutani
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Rio Info
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
Stylight
 

Similar to "Scaling RAG Applications to serve millions of users", Kevin Goedecke (20)

ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time Series
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
 
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
Big Data in the Cloud: How the RISElab Enables Computers to Make Intelligent ...
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
 
Grandata
GrandataGrandata
Grandata
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
 
IBM Netezza Training.pdf
IBM Netezza Training.pdfIBM Netezza Training.pdf
IBM Netezza Training.pdf
 
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuning
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 

More from Fwdays

"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
"Microservices and multitenancy - how to serve thousands of databases in one ...
"Microservices and multitenancy - how to serve thousands of databases in one ..."Microservices and multitenancy - how to serve thousands of databases in one ...
"Microservices and multitenancy - how to serve thousands of databases in one ...
Fwdays
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko
"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko
"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko
Fwdays
 
"Reaching 3_000_000 HTTP requests per second — conclusions from participation...
"Reaching 3_000_000 HTTP requests per second — conclusions from participation..."Reaching 3_000_000 HTTP requests per second — conclusions from participation...
"Reaching 3_000_000 HTTP requests per second — conclusions from participation...
Fwdays
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
"What I learned through reverse engineering", Yuri Artiukh
"What I learned through reverse engineering", Yuri Artiukh"What I learned through reverse engineering", Yuri Artiukh
"What I learned through reverse engineering", Yuri Artiukh
Fwdays
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
"Micro frontends: Unbelievably true life story", Dmytro Pavlov
"Micro frontends: Unbelievably true life story", Dmytro Pavlov"Micro frontends: Unbelievably true life story", Dmytro Pavlov
"Micro frontends: Unbelievably true life story", Dmytro Pavlov
Fwdays
 
"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak
"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak
"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak
Fwdays
 
"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi
"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi
"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi
Fwdays
 
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
Fwdays
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
Fwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Fwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
Fwdays
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets
Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
Fwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
Fwdays
 

More from Fwdays (20)

"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
"Microservices and multitenancy - how to serve thousands of databases in one ...
"Microservices and multitenancy - how to serve thousands of databases in one ..."Microservices and multitenancy - how to serve thousands of databases in one ...
"Microservices and multitenancy - how to serve thousands of databases in one ...
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko
"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko
"Black Monday: The Story of 5.5 Hours of Downtime", Dmytro Dziubenko
 
"Reaching 3_000_000 HTTP requests per second — conclusions from participation...
"Reaching 3_000_000 HTTP requests per second — conclusions from participation..."Reaching 3_000_000 HTTP requests per second — conclusions from participation...
"Reaching 3_000_000 HTTP requests per second — conclusions from participation...
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
"What I learned through reverse engineering", Yuri Artiukh
"What I learned through reverse engineering", Yuri Artiukh"What I learned through reverse engineering", Yuri Artiukh
"What I learned through reverse engineering", Yuri Artiukh
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
"Micro frontends: Unbelievably true life story", Dmytro Pavlov
"Micro frontends: Unbelievably true life story", Dmytro Pavlov"Micro frontends: Unbelievably true life story", Dmytro Pavlov
"Micro frontends: Unbelievably true life story", Dmytro Pavlov
 
"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak
"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak
"Objects validation and comparison using runtime types (io-ts)", Oleksandr Suhak
 
"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi
"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi
"JavaScript. Standard evolution, when nobody cares", Roman Savitskyi
 
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Recently uploaded

Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
CEPTES Software Inc
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
Ivanti
 
The Evolution of Remote Server Management
The Evolution of Remote Server ManagementThe Evolution of Remote Server Management
The Evolution of Remote Server Management
Bert Blevins
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
Bert Blevins
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 

Recently uploaded (20)

Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
 
The Evolution of Remote Server Management
The Evolution of Remote Server ManagementThe Evolution of Remote Server Management
The Evolution of Remote Server Management
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 

"Scaling RAG Applications to serve millions of users", Kevin Goedecke

  • 1. Scaling RAG applications to serve millions of users Lessons learned scaling LLMs, Vector databases and more… Powered by
  • 2. What we’ll cover today… 📝 Agenda - Who is SlideSpeak? What do we do? - Our Growth so far - Mistakes we’ve made - Architecture Overview - Challenges of scaling RAG applications - Challenges of scaling Vector databases - What we’re currently busy with… - Q&A
  • 3. Create presentations from Word, PDF or Excel files Create presentations for any topic Some of our key features: SlideSpeak is an AI-powered platform to create and automate presentations What we do
  • 4. We save people hours of work by speeding up manual presentation workflows
  • 5. OUR GROWTH SO FAR We’ve launched 6 months ago, here’s what we’ve achieved. >2m Files upload >2m LLM Tokens consumed per minute >1000 LLM calls per minute 250k MAUs
  • 6. User Vector DB Context LLM Response Query Inference Retrieval Prompt Quick recap on RAG
  • 7. Using LLMs for everything Storing vector data forever No monitoring Downtimes cost us thousands of $$$ Mistakes we’ve made… so far… �� �� �� Storing vectors is expensive, like really expensive… If not absolutely necessary use avoid using LLMs
  • 9. Prompts, JSON Mode, Regression Scaling Vector Databases Scaling LLM providers Rate limits, Downtimes, … Challenges scaling RAG Applications �� �� �� Slow queries, index building, … LLM specific prompts, Pydantic being unreliable, …
  • 10. - Rate limits are not as high as they seem - Difficult to balance RPM and TPM - 40 page document has on avg. 24k tokens, with 2m limit thats 83 documents per minute or 1.5 per second OpenAI Rate Limits (6/10/24) Scaling LLM providers The problem
  • 11. - We’ve migrated to Azure OpenAI - Not because the rate limits are higher, but you can load balance 🤯 https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models Scaling LLM providers The solution
  • 14. - Find similar information in a lot of data - Feed that to the LLM as context Why do we need Vector Databases?
  • 15. 5 Problems when scaling Postgres with PGVector - 🦥 Slow queries - 🧠 Memory intense - 🔍 Challenge of combining vector search and metadata search (also called hybrid search) needs careful query optimization Challenges of scaling Vector Databases
  • 16. Challenges of scaling Vector Databases A Vector is an array of 4-byte floating point numbers. Number of Vectors Total Size 7 Thousand 7,000 × 1536 × 4 = 43 Megabyte 1 Million 1,000,000 × 1536 × 4 = 5.7 Gigabyte 10 Millions 10,000,000 × 1536 × 4 = 57 Gigabyte 1 Billion 1000,000,000 × 1536 × 4 = 5.5 Terabyte
  • 17. Build an index over the vector data Challenges of scaling Vector Databases HNSW: Efficient nearest neighbor search algorithm (ANN) for high-dimensional data. ef_construction: Defines how many similarity candidates to look for m: Defines how many of the closest neighbors to pick from the ef_construction list ** This might be more tricky if you use hybrid search (metadata + vector data)
  • 18. Define what could be a good partition value for you (date, filetype, category, …) Use partitioning What we’ve done to scale PGVector
  • 19. General tips when working with PGVector Make sure to delete unnecessary vector data 01 Never use the same Postgres database for Vectors and other data 02
  • 20. If nothing helps… Scale horizontally What we’ve done to scale PGVector
  • 21. What we’re currently busy with… System Extension Azure AI Postgres Extension RAG Evaluation – Implementing robust evaluation testing methods – Exploring Ragas for advanced system performance ✅ �� �� – Enabling direct creation of embeddings within Azure – Extend system to cover images and other non-textual data Here’s what currently keeps us up at night…