SlideShare a Scribd company logo
1 of 40
Download to read offline
Prompt Design
04: Structured Data, Assistants, & RAG
1. Importance of Structured Data
2. How to Generate Structured Data from LLMs
3. Importance of Consistency in LLM Outputs
4. How to Generate Consistent Responses
5. Vector Databases and Semantic Search
6. Retrieval Augmented Generation
7. Assistants
Goals
John went to Paris on 1 August 2023.
Named Entity Recognition
John went to Paris on 1 August 2023.
● John => PERSON
● Paris => LOCATION
● 1 August 2023 => DATE
Traditional Approaches
● Rules-Based
● Task-Specific Machine Learning Model
Zero Shot NER with LLMs
Structured Data
Structured Data
Types of Data
Important Structures
● CSV
● JSON
● HTML/XML
Important Questions:
1. Should the data be hierarchical (nested).
2. Do I want to preserve the input data? If
so, how?
3. What is the intended usage of the data?
4. How much data will I have (scalability)?
CSV Comma Separated Value
CSV
JSON JavaScript Object Notation
JSON
HTML HyperText Markup Language
HTML
<p>
Not that <span class="person">Belladonna
Took</span> ever had any adventures after she
became Mrs. <span class="person">Bungo
Baggins</span>.
<span class="person">Bungo</span>, that was
<span class="person">Bilbo</span>’s father, built
the most luxurious hobbit-hole for her
(and partly with her money) that was to be found
either under <span class="place">The Hill</span>
or over <span class="place">The Hill</span>
or across <span class="place">The Water</span>,
and there they remained to the end of their days.
</p>
XML eXtensible Markup Language
XML
<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>
Exercise 1 (10 min): Generate Structured
Data Output for “John went to Paris on 1
August 2023.”
Importance of Structured Output
Exercise 2 (10 min): Create your Own Texts
and Try to get the Same Output each time,
first in the same chat, then in different chats.
Few-Shot NER.
Practical Applications with Real World Data
An ANCYL member who was shot
and severely injured by SAP
members at Lephoi, Bethulie,
Orange Free State (OFS) on 17
April 1991. Police opened fire on a
gathering at an ANC supporter's
house following a dispute between
two neighbours, one of whom was
linked to the ANC and the other to
the SAP and a councillor.
Assistants
Vector Databases
Representing
Texts
Digitally
Embeddings
● The apple is in the tree.
○ 1-[0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
○ 2-different vector
○ 3-different vector
○ 4-different vector
○ 1-[0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
○ 5-different vector
Vector
Database
What is it?
● It holds vectors in a database
as storage.
● Similar vectors are stored
closer.
Vector
Database
How do we use a vector
database?
● We populate a vector database
with by using a machine
learning model to vectorize
data and send them to the
database.
Vector
Database
Why use a vector database?
Vector
Database
Why use a vector database?
● Vector databases allow users
to store vector data in a way
that allows users to query it
and find similarity based on a
vector-level similarity, rather
than explicit human-defined
similarity.
Vector
Database
What is it?
● A vector database holds
numerous vectors or
embeddings of data.
Sometimes, the database will
also store the original data
alongside these vectors.
Vector Database Stacks
Vector Database Stacks
Vector Database
Stacks
What is available to us?
● Python, Annoy, Streamlit
○ Cheap, easy to deploy, great for
smaller datasets, but requires a
little bit of knowledge to build from
scratch
○ Best for smaller databases (under
10,000 data)
● Python, txtAI
○ Cheap and easy to use, more
resource intensive but easy to
deploy
○ Allows for easy interpretability (via
highlighting)
Multi-Modal
How does it work?
Retrieval-Augmented Generation
How tall is Wookie?
How tall is Wookie?
RAG
What is it?
● RAG allows for you to combine
the strengths of large language
models (LLMs) with vector
databases
● It limits the chances for an LLM
to hallucinate (generate fake
information)
● It uses a vector database to
find relevant material to a query
RAG
What is it?
● RAG allows for you to combine
the strengths of large language
models (LLMs) with vector
databases
● It limits the chances for an LLM
to hallucinate (generate fake
information)
● It uses a vector database to
find relevant material to a query
1
2
3
4
5 6

More Related Content

Similar to Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"

Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
Carmen Sanborn
 
How IKANOW uses MongoDB to help organizations solve really big problems
How IKANOW uses MongoDB to help organizations solve really big problemsHow IKANOW uses MongoDB to help organizations solve really big problems
How IKANOW uses MongoDB to help organizations solve really big problems
ikanow
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notes
Bernadette Hyland-Wood
 
MongoDC - Ikanow April 2012 Meetup
MongoDC - Ikanow April 2012 MeetupMongoDC - Ikanow April 2012 Meetup
MongoDC - Ikanow April 2012 Meetup
ikanow
 

Similar to Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG" (20)

Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 
How IKANOW uses MongoDB to help organizations solve really big problems
How IKANOW uses MongoDB to help organizations solve really big problemsHow IKANOW uses MongoDB to help organizations solve really big problems
How IKANOW uses MongoDB to help organizations solve really big problems
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Big Data with IOT approach and trends with case study
Big Data with IOT approach and trends with case studyBig Data with IOT approach and trends with case study
Big Data with IOT approach and trends with case study
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
Knowledge Technologies: Opportunities and Challenges
Knowledge Technologies: Opportunities and ChallengesKnowledge Technologies: Opportunities and Challenges
Knowledge Technologies: Opportunities and Challenges
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
 
Polyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4jPolyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4j
 
Real-time Generation of Topic Maps from Speech Streams
Real-time Generation of Topic Maps from Speech StreamsReal-time Generation of Topic Maps from Speech Streams
Real-time Generation of Topic Maps from Speech Streams
 
Ten things to consider for interactive analytics on write once workloads
Ten things to consider for interactive analytics on write once workloadsTen things to consider for interactive analytics on write once workloads
Ten things to consider for interactive analytics on write once workloads
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notes
 
MongoDC - Ikanow April 2012 Meetup
MongoDC - Ikanow April 2012 MeetupMongoDC - Ikanow April 2012 Meetup
MongoDC - Ikanow April 2012 Meetup
 
Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
 
Chapter 1 Priliminaries.pptx
Chapter 1 Priliminaries.pptxChapter 1 Priliminaries.pptx
Chapter 1 Priliminaries.pptx
 

More from National Information Standards Organization (NISO)

More from National Information Standards Organization (NISO) (20)

Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
 
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
 
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
 
Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"
 
Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"
 

Recently uploaded

SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
Peter Brusilovsky
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
EADTU
 

Recently uploaded (20)

Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community PartnershipsSpring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptx
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
 
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
Major project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesMajor project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategies
 
Sternal Fractures & Dislocations - EMGuidewire Radiology Reading Room
Sternal Fractures & Dislocations - EMGuidewire Radiology Reading RoomSternal Fractures & Dislocations - EMGuidewire Radiology Reading Room
Sternal Fractures & Dislocations - EMGuidewire Radiology Reading Room
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024
 
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
 
Improved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio AppImproved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio App
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
Graduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxGraduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptx
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
 
Rich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdfRich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdf
 
The Liver & Gallbladder (Anatomy & Physiology).pptx
The Liver &  Gallbladder (Anatomy & Physiology).pptxThe Liver &  Gallbladder (Anatomy & Physiology).pptx
The Liver & Gallbladder (Anatomy & Physiology).pptx
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopal
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17
 
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of TransportBasic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"

  • 1. Prompt Design 04: Structured Data, Assistants, & RAG
  • 2. 1. Importance of Structured Data 2. How to Generate Structured Data from LLMs 3. Importance of Consistency in LLM Outputs 4. How to Generate Consistent Responses 5. Vector Databases and Semantic Search 6. Retrieval Augmented Generation 7. Assistants Goals
  • 3. John went to Paris on 1 August 2023.
  • 4. Named Entity Recognition John went to Paris on 1 August 2023. ● John => PERSON ● Paris => LOCATION ● 1 August 2023 => DATE
  • 5. Traditional Approaches ● Rules-Based ● Task-Specific Machine Learning Model
  • 6. Zero Shot NER with LLMs
  • 8. Structured Data Types of Data Important Structures ● CSV ● JSON ● HTML/XML Important Questions: 1. Should the data be hierarchical (nested). 2. Do I want to preserve the input data? If so, how? 3. What is the intended usage of the data? 4. How much data will I have (scalability)?
  • 10. CSV
  • 12. JSON
  • 14. HTML <p> Not that <span class="person">Belladonna Took</span> ever had any adventures after she became Mrs. <span class="person">Bungo Baggins</span>. <span class="person">Bungo</span>, that was <span class="person">Bilbo</span>’s father, built the most luxurious hobbit-hole for her (and partly with her money) that was to be found either under <span class="place">The Hill</span> or over <span class="place">The Hill</span> or across <span class="place">The Water</span>, and there they remained to the end of their days. </p>
  • 16. XML <text> <sentence> Not that <person>Belladonna Took</person> ever had any adventures after she became Mrs. <person>Bungo Baggins</person>. </sentence> <sentence> <person>Bungo</person>, that was <person>Bilbo</person>’s father, built the most luxurious hobbit-hole for her (and partly with her money) that was to be found either under <place>The Hill</place> or over <place>The Hill</place> or across <place>The Water</place>, and there they remained to the end of their days. </sentence> </text>
  • 17. Exercise 1 (10 min): Generate Structured Data Output for “John went to Paris on 1 August 2023.”
  • 19. Exercise 2 (10 min): Create your Own Texts and Try to get the Same Output each time, first in the same chat, then in different chats.
  • 21. Practical Applications with Real World Data An ANCYL member who was shot and severely injured by SAP members at Lephoi, Bethulie, Orange Free State (OFS) on 17 April 1991. Police opened fire on a gathering at an ANC supporter's house following a dispute between two neighbours, one of whom was linked to the ANC and the other to the SAP and a councillor.
  • 24. Representing Texts Digitally Embeddings ● The apple is in the tree. ○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...] ○ 2-different vector ○ 3-different vector ○ 4-different vector ○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...] ○ 5-different vector
  • 25. Vector Database What is it? ● It holds vectors in a database as storage. ● Similar vectors are stored closer.
  • 26.
  • 27. Vector Database How do we use a vector database? ● We populate a vector database with by using a machine learning model to vectorize data and send them to the database.
  • 28. Vector Database Why use a vector database?
  • 29. Vector Database Why use a vector database? ● Vector databases allow users to store vector data in a way that allows users to query it and find similarity based on a vector-level similarity, rather than explicit human-defined similarity.
  • 30. Vector Database What is it? ● A vector database holds numerous vectors or embeddings of data. Sometimes, the database will also store the original data alongside these vectors.
  • 33. Vector Database Stacks What is available to us? ● Python, Annoy, Streamlit ○ Cheap, easy to deploy, great for smaller datasets, but requires a little bit of knowledge to build from scratch ○ Best for smaller databases (under 10,000 data) ● Python, txtAI ○ Cheap and easy to use, more resource intensive but easy to deploy ○ Allows for easy interpretability (via highlighting)
  • 36. How tall is Wookie?
  • 37.
  • 38. How tall is Wookie?
  • 39. RAG What is it? ● RAG allows for you to combine the strengths of large language models (LLMs) with vector databases ● It limits the chances for an LLM to hallucinate (generate fake information) ● It uses a vector database to find relevant material to a query
  • 40. RAG What is it? ● RAG allows for you to combine the strengths of large language models (LLMs) with vector databases ● It limits the chances for an LLM to hallucinate (generate fake information) ● It uses a vector database to find relevant material to a query 1 2 3 4 5 6