Loading your Life into a Vector Database

@bnchrch
Ben Church
Who am I?
● Live 3 flights away on the northern part of
Vancouver Island
● Engineer at Airbyte
● Like to Tinker
● Love to Teach
● A masochist who hits the potholes first so you don’t have to
Stubbing your toe on
Token Limits
RAG Against the Machine
(Retrieval Augmented Generation)
Stealing my friend’s Strava
Data so I don’t have to
exercise as much as I told my
wife I would.
New Systems,
New Possibilities,
New Boundaries,
New Constraints,
New Considerations.
Agenda
● Constraints of a modern LLM-based
system
● Considerations that result and how to
work with this new technology
● The Second Order Effects and the
Future
● Through the lens of a simple
application
● Vectors and Vector Databases
● (simple) LLM Application Architecture
● Context Stuffing
● Token Limits
● GraphQL Types
● GraphQL Introspection
Step 0: Share Context
What does it mean to be a “simple” LLM application?
Credit: Nicole Choi and Github https://github.blog/2023-10-30-the-architecture-of-todays-llm-applications/
Retrieval Augmented Generation
source: https://www.ml6.eu/blogpost/leveraging-llms-on-your-domain-specific-knowledge-base
Retrieval Augmented Generation
(with GQL)
Constraint: Unstructured to Structured Data
Vectors
● Embeddings == Vectors used in ML/AI
● An Array of Numbers (Weights)
● Representation of similarity
● Used for similarity searches
● Not Unique to LLMs or AI
● Vector DBs make use of Vector Indexes
Credit https://stackoverflow.blog/2023/10/09/from-prototype-to-production-vector-databases-in-generative-ai-applications/
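"Representation of similarity" can be made concrete with cosine similarity, one common way to compare two embeddings. The sketch below is a minimal illustration, not the method used in the talk; the three-dimensional vectors are made up, and real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only.
morning_run = [0.9, 0.1, 0.0]
evening_run = [0.8, 0.2, 0.1]
bike_ride = [0.1, 0.9, 0.2]

runs_score = cosine_similarity(morning_run, evening_run)
mixed_score = cosine_similarity(morning_run, bike_ride)
```

Here `runs_score` comes out much closer to 1 than `mixed_score`, which is the sense in which nearby vectors stand for similar things.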
Vector Database
Credit https://dejanualex.medium.com/llm-and-vector-databases-d2530f03f6be
● Vector DBs make use of Vector Indexes
● Optimized for similarity searches
● Often built with the LLM use case in mind
● Specifically in Context Retrieval Systems
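To show what "optimized for similarity searches" is optimizing, here is a deliberately naive in-memory store that scans every record. `TinyVectorStore` and its sample records are hypothetical; a real vector database replaces the linear scan with a vector index, commonly an approximate nearest-neighbor structure such as HNSW.

```python
import math
from dataclasses import dataclass, field

def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

@dataclass
class TinyVectorStore:
    """Hypothetical brute-force store: O(n) comparisons per search."""
    records: list = field(default_factory=list)

    def add(self, vector, payload):
        self.records.append((vector, payload))

    def search(self, query_vector, k=3):
        # Score every record, then keep the k most similar.
        scored = [(_cosine(query_vector, vec), payload) for vec, payload in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "Morning run, 5 km")
store.add([0.8, 0.2, 0.1], "Evening run, 8 km")
store.add([0.1, 0.9, 0.2], "Bike ride, 40 km")

top = store.search([1.0, 0.0, 0.0], k=2)
top_payloads = [payload for _, payload in top]
```

The scan is fine for a handful of records and becomes the bottleneck as the collection grows, which is the gap a vector index fills.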
Consider: Building on top of a Vector DB
Consider: A Unified Datastore and API
Step 1: Load your Data
Calling All Sources
Setting up Weaviate Cloud
Setting up a Destination
Setting up a Source
Sync it!
Thank you Melissa!
Step 2: Query for Context
Weaviate? GraphQL? Why?
Introspection, Descriptions, Types
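Introspection is a standard GraphQL feature: the schema, with its types and descriptions, can be queried like any other data. The sketch below pairs a standard introspection query with a hypothetical helper, `summarize_schema`, that squeezes the result into a few lines of text suitable for a prompt. The `Activity` type and its fields are invented sample data, not the schema from the talk.

```python
# Standard GraphQL introspection query (trimmed to names, descriptions, types).
INTROSPECTION_QUERY = """
{
  __schema {
    types {
      name
      description
      fields {
        name
        description
        type { name kind ofType { name kind } }
      }
    }
  }
}
"""

def type_name(type_ref):
    """Resolve a field's type name, unwrapping one LIST or NON_NULL wrapper."""
    if type_ref.get("name"):
        return type_ref["name"]
    inner = type_ref.get("ofType") or {}
    name = inner.get("name") or "Unknown"
    return f"[{name}]" if type_ref.get("kind") == "LIST" else name

def summarize_schema(result):
    """Turn an introspection response into compact, prompt-friendly text."""
    lines = []
    for gql_type in result["data"]["__schema"]["types"]:
        # Skip GraphQL's built-in meta types and types without fields.
        if gql_type["name"].startswith("__") or not gql_type.get("fields"):
            continue
        lines.append(f'{gql_type["name"]}: {gql_type.get("description") or ""}'.rstrip())
        for field_def in gql_type["fields"]:
            description = field_def.get("description") or ""
            lines.append(
                f'  {field_def["name"]} ({type_name(field_def["type"])}): {description}'.rstrip()
            )
    return "\n".join(lines)

# Invented sample response, shaped like a real introspection result.
sample = {"data": {"__schema": {"types": [
    {"name": "__Type", "description": None, "fields": []},
    {"name": "Activity", "description": "A single Strava activity", "fields": [
        {"name": "name", "description": "Title of the activity",
         "type": {"name": "String", "kind": "SCALAR", "ofType": None}},
        {"name": "distance", "description": "Distance in meters",
         "type": {"name": "Float", "kind": "SCALAR", "ofType": None}},
    ]},
]}}}

summary = summarize_schema(sample)
```

Good descriptions matter here because they are the only documentation the LLM sees when it writes a query.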
Complex Filters and Aggregation
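As a rough sketch of what filters and aggregation look like in practice, the snippet below builds two query strings in the style of Weaviate's GraphQL `Get` and `Aggregate` functions. The `Activity` class and its `name` and `distance` properties are hypothetical, and the argument names should be checked against the Weaviate documentation for the version in use.

```python
import json

def build_filtered_query(vector, min_distance_m, limit=5):
    """Similarity search restricted to activities longer than min_distance_m.

    Assumes a hypothetical `Activity` class with a numeric `distance` property.
    """
    return f"""
{{
  Get {{
    Activity(
      nearVector: {{ vector: {json.dumps(vector)} }}
      where: {{ path: ["distance"], operator: GreaterThan, valueNumber: {min_distance_m} }}
      limit: {limit}
    ) {{
      name
      distance
    }}
  }}
}}
""".strip()

# Let the database do the arithmetic instead of the LLM.
AGGREGATE_QUERY = """
{
  Aggregate {
    Activity {
      distance { sum mean }
    }
  }
}
""".strip()

query = build_filtered_query([0.1, 0.2], 10000, limit=3)
```

Filtering narrows the context before it reaches the prompt, and aggregation returns one number where a raw dump would return many records.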
It Solves Two Major Constraints
Constraint: Token Limits
● There is a Limit on the amount of Context
you can Stuff into a Query
● That limit is set by the model
● It can be Low
● It can be High
○ Latest ChatGPT
○ Anthropic
● Tension between Token Limits and
meaningful Vectors
○ Makes Time Series Questions Difficult
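One simple way to live within a token limit is to pack the most relevant chunks first and stop adding once the budget is spent. The sketch below assumes the chunks are already ranked by relevance and uses a crude word count as a stand-in for a real tokenizer; `estimate_tokens` and `pack_context` are hypothetical helpers.

```python
def estimate_tokens(text):
    """Very rough estimate: one token per whitespace-separated word.

    Real tokenizers count differently; this is an assumption for the sketch.
    """
    return len(text.split())

def pack_context(chunks, budget):
    """Greedily keep ranked chunks that fit within the token budget.

    `chunks` must already be sorted from most to least relevant.
    """
    selected, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            continue  # too big for the remaining budget; a later chunk may fit
        selected.append(chunk)
        used += cost
    return selected, used

ranked_chunks = [
    "Ran 5 km on Monday morning",
    "Cycled 40 km around the island on Saturday with friends",
    "Swam 1 km",
]
selected, used = pack_context(ranked_chunks, budget=10)
```

With a budget of 10, the long cycling record is skipped and the two short records are kept, using 9 tokens by this estimate. A time series question needs many records at once, which is why it strains this kind of budget.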
Constraint: Too Much Context
● As Context Size increases:
○ Hallucinations increase
○ Answer Quality decreases
● The balancing act of relevancy
Consider: A Flexible, Accessible, Schema & Query Language
Step 3: Build
Not that complicated…
1. Get a question
2. Ask the LLM to Transform it to a Vector
3. Get the GraphQL Schema
4. Ask the LLM to generate a Retrieval Query for our
Contextual Data
5. Use the Retrieval Query to get our Contextual
Data
6. Ask the LLM to generate the final answer
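The six steps above can be sketched as one function that wires the pieces together. Every callable passed in is a hypothetical stand-in: in a real application `embed`, `generate_query` and `generate_answer` would call a model, and `fetch_schema` and `run_query` would call the vector database.

```python
def answer_question(question, embed, fetch_schema, generate_query, run_query, generate_answer):
    """Run steps 2 to 6 for a question that has already been received (step 1)."""
    vector = embed(question)                                    # 2. question -> vector
    schema = fetch_schema()                                     # 3. GraphQL schema
    retrieval_query = generate_query(question, vector, schema)  # 4. LLM writes the query
    context = run_query(retrieval_query)                        # 5. fetch contextual data
    return generate_answer(question, context)                   # 6. LLM writes the answer

# Stand-ins so the sketch runs without any network calls.
def fake_embed(question):
    return [0.1, 0.2, 0.3]

def fake_fetch_schema():
    return "Activity: name (String), distance (Float)"

def fake_generate_query(question, vector, schema):
    return "{ Get { Activity(limit: 1) { name distance } } }"

def fake_run_query(query):
    return [{"name": "Morning run", "distance": 5000.0}]

def fake_generate_answer(question, context):
    return f"Answered {question!r} using {len(context)} record(s)"

result = answer_question(
    "How far did I run?",
    fake_embed, fake_fetch_schema, fake_generate_query, fake_run_query, fake_generate_answer,
)
```

Passing the steps in as functions keeps the flow testable with fakes, which helps when the real steps are non-deterministic.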
Question
Vector
Introspection
Example Record
Retrieval Query
Answer
Other Interesting Queries
Constraint: Bad at Math
Constraint: Non-Deterministic
For the next project
● LLMs are
○ Non-deterministic
○ Bad at Math
○ Limited by their tokens
● Context and meaningful vectors are at odds with your
model’s token limit
● You need a strategy for pulling in the most
context in the least text
● Your Data Models, APIs and Error Messages matter
more in this paradigm
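One possible reading of "error messages matter more": because the LLM is non-deterministic, a generated query may fail, and a clear error message can be fed back so the next attempt improves. The sketch below is an assumption about how that loop might look, not something shown in the slides; `QueryError`, `retrieve_with_retries` and the stub functions are all hypothetical.

```python
class QueryError(Exception):
    """Raised by the (hypothetical) query runner when a query is rejected."""

def retrieve_with_retries(question, generate_query, run_query, max_attempts=3):
    """Ask for a query, run it, and feed any error text back into the next try."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        query = generate_query(question, feedback)
        try:
            return run_query(query), attempt
        except QueryError as err:
            # The error message becomes part of the next prompt.
            feedback = f"Previous query failed: {err}"
    raise QueryError(f"no valid query after {max_attempts} attempts; last feedback: {feedback}")

# Stubs: the first generated query is wrong, the second uses the feedback.
def stub_generate_query(question, feedback):
    return "{ Get { Activty { name } } }" if feedback is None else "{ Get { Activity { name } } }"

def stub_run_query(query):
    if "Activty" in query:
        raise QueryError('Unknown class "Activty". Did you mean "Activity"?')
    return [{"name": "Morning run"}]

records, attempts = retrieve_with_retries("What did I do today?", stub_generate_query, stub_run_query)
```

The more specific the error text, the more the model has to work with on the retry.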
Thoughts for the Future
Will Tom Cruise
Live Forever?
Thank You