Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"

Prompt Design
04: Structured Data, Assistants, & RAG

Goals
1. Importance of Structured Data
2. How to Generate Structured Data from LLMs
3. Importance of Consistency in LLM Outputs
4. How to Generate Consistent Responses
5. Vector Databases and Semantic Search
6. Retrieval Augmented Generation
7. Assistants

John went to Paris on 1 August 2023.

Named Entity Recognition
John went to Paris on 1 August 2023.
● John => PERSON
● Paris => LOCATION
● 1 August 2023 => DATE

Traditional Approaches
● Rules-Based
● Task-Specific Machine Learning Model
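
To make the rules-based approach concrete, here is a minimal sketch that tags the example sentence with a date pattern and two tiny lookup lists. The pattern, the lookup lists, and the function name `tag_entities` are assumptions made for illustration; a real rules-based system needs far broader rules.

```python
import re

# Hypothetical hand-written rules (assumptions for illustration only).
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_PATTERN = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")
KNOWN_PEOPLE = {"John"}   # toy lookup list
KNOWN_PLACES = {"Paris"}  # toy lookup list


def tag_entities(text):
    """Return (entity, label) pairs found by the rules, in order of appearance."""
    found = []
    # Rule 1: "day Month year" is a DATE.
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Rule 2: capitalized words are looked up in the toy lists.
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        word = match.group()
        if word in KNOWN_PEOPLE:
            found.append((match.start(), word, "PERSON"))
        elif word in KNOWN_PLACES:
            found.append((match.start(), word, "LOCATION"))
    # Sort by position in the text, then drop the position.
    return [(entity, label) for _, entity, label in sorted(found)]


entities = tag_entities("John went to Paris on 1 August 2023.")
for entity, label in entities:
    print(f"{entity} => {label}")
```

The sketch only recognizes names it has been told about, which is the main limitation of rules: every new name or date format needs a new rule.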

Structured Data
Types of Data
Important Structures
● CSV
● JSON
● HTML/XML
Important Questions:
1. Should the data be hierarchical (nested)?
2. Do I want to preserve the input data? If so, how?
3. What is the intended usage of the data?
4. How much data will I have (scalability)?

JSON: JavaScript Object Notation
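
As an illustration, the entities from the example sentence could be stored as nested JSON that also preserves the input text. This is a sketch using Python's standard `json` module; the field names `text`, `entities`, `value`, and `label` are assumptions, not a fixed schema.

```python
import json

# Hypothetical record layout: the original text plus a nested list of entities.
record = {
    "text": "John went to Paris on 1 August 2023.",
    "entities": [
        {"value": "John", "label": "PERSON"},
        {"value": "Paris", "label": "LOCATION"},
        {"value": "1 August 2023", "label": "DATE"},
    ],
}

# Serialize to a JSON string, then parse it back to check nothing is lost.
json_text = json.dumps(record, indent=2)
restored = json.loads(json_text)

print(json_text)
```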

HTML: HyperText Markup Language

HTML

Not that Belladonna
Took ever had any adventures after she
became Mrs. Bungo
Baggins.
Bungo, that was
Bilbo’s father, built
the most luxurious hobbit-hole for her
(and partly with her money) that was to be found
either under The Hill
or over The Hill
or across The Water,
and there they remained to the end of their days.

XML: eXtensible Markup Language

XML
<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>
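
Because the XML above marks each entity with a tag, the people and places can be pulled out by tag name. The sketch below uses Python's standard `xml.etree.ElementTree`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml_text = """<text>
<sentence>
Not that <person>Belladonna Took</person> ever had any
adventures after she became Mrs. <person>Bungo Baggins</person>.
</sentence>
<sentence>
<person>Bungo</person>, that was <person>Bilbo</person>’s
father, built the most luxurious hobbit-hole for her
(and partly with her money) that was to be found either under
<place>The Hill</place> or over <place>The Hill</place>
or across <place>The Water</place>, and there they remained to
the end of their days.
</sentence>
</text>"""

root = ET.fromstring(xml_text)

# Collect the text of every tagged entity, in document order.
people = [element.text for element in root.iter("person")]
places = [element.text for element in root.iter("place")]

# The untagged input text is still recoverable: join all text and tidy whitespace.
plain = " ".join("".join(root.itertext()).split())

print(people)
print(places)
```

The markup adds structure without discarding the input text, which answers the earlier question about preserving the input data.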

Exercise 1 (10 min): Generate Structured Data Output for “John went to Paris on 1 August 2023.”
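
One possible starting point for this exercise is to state the expected structure in the prompt and then check the reply against it. In the sketch below, the prompt wording, the key names, and the sample reply are all assumptions written by hand; no model is called.

```python
import json

# Hypothetical prompt template that spells out the expected structure.
PROMPT_TEMPLATE = (
    "Extract the named entities from the text below.\n"
    'Return only JSON: a list of objects with the keys "entity" and "label".\n'
    "Allowed labels: PERSON, LOCATION, DATE.\n\n"
    "Text: {text}"
)
ALLOWED_LABELS = {"PERSON", "LOCATION", "DATE"}


def parse_entities(response_text):
    """Parse a model reply and check it has the structure the prompt asked for."""
    data = json.loads(response_text)  # raises ValueError if the reply is not JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON list")
    for item in data:
        if not isinstance(item, dict) or set(item) != {"entity", "label"}:
            raise ValueError(f"unexpected item: {item!r}")
        if item["label"] not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {item['label']!r}")
    return data


prompt = PROMPT_TEMPLATE.format(text="John went to Paris on 1 August 2023.")

# Hand-written reply for illustration (not real model output).
sample_reply = (
    '[{"entity": "John", "label": "PERSON"}, '
    '{"entity": "Paris", "label": "LOCATION"}, '
    '{"entity": "1 August 2023", "label": "DATE"}]'
)
entities = parse_entities(sample_reply)
print(entities)
```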

Importance of Structured Output

Exercise 2 (10 min): Create your own texts and try to get the same output each time, first in the same chat, then in different chats.
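
To judge whether outputs are "the same", it can help to compare them after normalizing formatting. The sketch below is one way to do that for JSON replies; the function names `canonical` and `consistency` and the sample replies are hypothetical.

```python
import json
from collections import Counter


def canonical(response_text):
    """Reduce a JSON reply to a canonical string so formatting differences vanish."""
    parsed = json.loads(response_text)
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


def consistency(responses):
    """Return the share of replies that match the most common canonical form."""
    counts = Counter(canonical(reply) for reply in responses)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(responses)


# Hand-written replies for illustration (not real model output).
replies = [
    '{"person": "John", "place": "Paris"}',
    '{"place": "Paris", "person": "John"}',     # same content, different key order
    '{"person": "John", "location": "Paris"}',  # different key name
]
score = consistency(replies)
print(f"{score:.2f}")
```

Key order does not count as a difference here, but a changed key name does, since downstream code that reads `place` would break on `location`.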

Practical Applications with Real World Data
An ANCYL member who was shot and severely injured by SAP members at Lephoi, Bethulie, Orange Free State (OFS) on 17 April 1991. Police opened fire on a gathering at an ANC supporter's house following a dispute between two neighbours, one of whom was linked to the ANC and the other to the SAP and a councillor.

Representing Texts Digitally
Embeddings
● The apple is in the tree.
○ 1-[0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
○ 2-different vector
○ 1-[0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
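
Once texts are vectors, they can be compared numerically. The sketch below uses a toy word-count embedding, which is an assumption standing in for a learned embedding model, to show that a fixed embedding function gives the same vector for the same text, and that cosine similarity is lower for a different sentence. The vocabulary and the second sentence are made up for illustration.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding: one word count per vocabulary entry (not a learned model)."""
    counts = Counter(text.lower().strip(".").split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vocabulary = ["the", "apple", "is", "in", "tree", "car", "garage"]
v1 = embed("The apple is in the tree.", vocabulary)
v2 = embed("The apple is in the tree.", vocabulary)   # same text again
v3 = embed("The car is in the garage.", vocabulary)   # different text

print(v1 == v2)
print(cosine_similarity(v1, v3) < cosine_similarity(v1, v2))
```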

Vector Database
What is it?
● It is a database that stores vectors.
● Similar vectors are stored closer together.

Vector Database
How do we use a vector database?
● We populate a vector database by using a machine learning model to vectorize data and sending the resulting vectors to the database.

Vector Database
Why use a vector database?
● Vector databases let users store vector data so that it can be queried by vector-level similarity, rather than by explicit human-defined similarity.

Vector Database
What is it?
● A vector database holds numerous vectors or embeddings of data. Sometimes, the database will also store the original data alongside these vectors.
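
The idea can be sketched with a tiny in-memory store that keeps each vector next to its original text and answers queries by similarity. This is a brute-force illustration, not how a production vector database is built; the class name `TinyVectorStore`, the three-number vectors, and the third sentence are all made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Brute-force, in-memory stand-in for a vector database (illustration only)."""

    def __init__(self):
        self._rows = []  # (vector, original_text) pairs

    def add(self, vector, text):
        """Store a vector together with the original data it was made from."""
        self._rows.append((list(vector), text))

    def query(self, vector, k=1):
        """Return the k stored texts whose vectors are most similar to `vector`."""
        scored = [(cosine(vector, stored), text) for stored, text in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
# Hand-made vectors; a real system would get these from an embedding model.
store.add([0.9, 0.1, 0.3], "The apple is in the tree.")
store.add([0.1, 0.9, 0.3], "The car is in the garage.")
store.add([0.8, 0.0, 0.5], "Pears grow in the orchard.")

results = store.query([1.0, 0.0, 0.2], k=2)
print(results)
```

The query never mentions apples or pears by name; the match comes only from the vectors being close, which is the point of vector-level similarity.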

Vector Database Stacks
What is available to us?
● Python, Annoy, Streamlit
○ Cheap, easy to deploy, and great for smaller datasets, but requires a little knowledge to build from scratch
○ Best for smaller databases (under 10,000 items)
● Python, txtai
○ Cheap and easy to use; more resource-intensive, but easy to deploy
○ Allows for easy interpretability (via highlighting)

Retrieval-Augmented Generation

RAG
What is it?
● RAG lets you combine the strengths of large language models (LLMs) with vector databases
● It reduces the chance that an LLM will hallucinate (generate false information)
● It uses a vector database to find material relevant to a query
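
The retrieval and prompt-building steps can be sketched as follows. The word-count embedding is a toy stand-in for a learned embedding model and a vector database, and the function names, prompt wording, and sample documents are assumptions. The final step, sending the prompt to an LLM, is left out because it depends on the provider.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: similarity(query_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


documents = [
    "John went to Paris on 1 August 2023.",
    "The apple is in the tree.",
    "Bungo built the most luxurious hobbit-hole for Belladonna.",
]
question = "When did John go to Paris?"
passages = retrieve(question, documents, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

Grounding the prompt in retrieved text is what lowers the chance of hallucination: the model is asked to answer from the supplied context rather than from memory alone.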

