This presentation was provided by William Mattingly of the Smithsonian Institution during the sixth segment of the NISO training series "AI & Prompt Design." Session Six, "Text Classification with LLMs," was held on May 9, 2024.
2. Goals
1. GLiNER
2. NER with Large Language Models
3. Vector Databases
4. Semantic Searching
5. RAG
3. Machine Learning: Zero-Shot NER
● GLiNER: a transformer-based architecture that lets you pass a text and your own labels to a model without first training it on those labels.
Example: https://huggingface.co/spaces/tomaarsen/gliner_medium-v2.1
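The following is a minimal sketch of zero-shot NER with the open-source `gliner` Python package. It assumes the package is installed (`pip install gliner`) and that `urchade/gliner_medium-v2.1` is the checkpoint behind the demo Space above. The helper names `load_gliner` and `extract_entities` are hypothetical. The labels are arbitrary strings chosen at inference time.

```python
from collections import defaultdict


def load_gliner(model_name: str = "urchade/gliner_medium-v2.1"):
    """Load a GLiNER checkpoint (assumed name) from the Hugging Face Hub."""
    # Imported lazily so the helper below can be used without the package.
    from gliner import GLiNER

    return GLiNER.from_pretrained(model_name)


def extract_entities(
    model, text: str, labels: list[str], threshold: float = 0.5
) -> dict[str, list[str]]:
    """Run zero-shot NER and group the predicted spans by label."""
    # predict_entities returns dicts with keys such as "text", "label", "score".
    entities = model.predict_entities(text, labels, threshold=threshold)
    grouped: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)
```

Usage: `extract_entities(load_gliner(), "Dr. Jessica Davis works at the Smithsonian.", ["person", "organization"])`. What comes back depends on the model and the threshold, so review the spans rather than assuming they are complete.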
6. LLMs: Limitations
● Resource Intensity (and Cost)
● Data Privacy Concerns
● Black Box Models
● Training Data Bias
● Generalization Challenges
● Latency Issues
● Hallucinations
● Consistency
7. LLMs: How to Use Them
● Thinking through your methodology for NER
● Assisting with certain steps of NER (e.g., writing RegEx)
● Zero-Shot NER
● Few-Shot NER
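The sketch below shows one way to assemble each prompt style, so the difference between zero-shot and few-shot NER is concrete. The prompt wording, the JSON answer format, and the function name `build_ner_prompt` are illustrative assumptions. The resulting string would be sent to whichever LLM you use.

```python
import json
from typing import Optional


def build_ner_prompt(
    text: str,
    labels: list[str],
    examples: Optional[list[tuple[str, dict[str, list[str]]]]] = None,
) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    lines = [
        "Extract named entities from the text.",
        f"Allowed labels: {', '.join(labels)}.",
        "Answer with JSON mapping each label to a list of strings.",
    ]
    # Few-shot: prepend worked input/answer pairs for the model to imitate.
    for example_text, example_answer in examples or []:
        lines.append(f"Text: {example_text}")
        lines.append(f"Answer: {json.dumps(example_answer)}")
    # The new input always comes last, with the answer left for the model.
    lines.append(f"Text: {text}")
    lines.append("Answer:")
    return "\n".join(lines)
```

Calling it with no `examples` gives a zero-shot prompt. Passing one or two hand-labelled pairs turns it into a few-shot prompt, for example `[("Mrs. Thompson met Col. Jackson.", {"person": ["Mrs. Thompson", "Col. Jackson"]})]`.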
8. Mrs. Jessica Monica Kapitan works at the office. Mrs. Kapitan is a lawyer. She is also friends with Mrs. Thompson and Miss. Smith. Sometimes Miss. Smith will miss her train.
9. Exercise 1: Use an LLM to generate a RegEx that captures every instance of "Miss." and "Mrs." in the text, together with the corresponding name.
https://regex101.com/r/TLfbGE/1
10. Exercise 1: One Solution
\b(Mrs\.|Miss\.)\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)
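You can check the solution outside regex101 with Python's built-in `re` module. This sketch applies the pattern, written as a raw string so the backslashes survive, to the slide 8 text. Because the pattern has two capture groups, `re.findall` returns one (honorific, name) tuple per match.

```python
import re

# Exercise 1 pattern: an honorific, whitespace, then one or more capitalized words.
PATTERN = re.compile(r"\b(Mrs\.|Miss\.)\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)")

TEXT = (
    "Mrs. Jessica Monica Kapitan works at the office. Mrs. Kapitan is a lawyer. "
    "She is also friends with Mrs. Thompson and Miss. Smith. "
    "Sometimes Miss. Smith will miss her train."
)

for honorific, name in PATTERN.findall(TEXT):
    print(honorific, name)
```

This prints five matches: Mrs. Jessica Monica Kapitan, Mrs. Kapitan, Mrs. Thompson, and Miss. Smith twice. The lowercase verb "miss" is not captured because the pattern is case-sensitive.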
11. Mr. Thomas and Dr. Jessica Davis went to the store. They met Mrs. Stevens who works at a nearby office. They are all friends with Colonel Jackson. Col. Jackson is known to her friends by her first name, Terry. They all know Mr. and Mrs. Kapitan.
12. Exercise 2: Use an LLM to generate a RegEx that captures every instance of [Honorific Entity] in the text, together with the corresponding name.
https://regex101.com/r/FYcO8C/1
13. Exercise 2: One Solution
\b(Mr\.|Mrs\.|Miss\.|Dr\.|Colonel|Col\.)\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)
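The alternation can also be generated from a list of titles instead of being edited by hand each time a new title appears. This is an assumption, not part of the exercise. `build_honorific_pattern` is a hypothetical helper that escapes each title with `re.escape`. It sorts longer titles first as a general precaution against one title being a prefix of another.

```python
import re


def build_honorific_pattern(titles: list[str]) -> re.Pattern:
    """Compile an honorific + capitalized-name pattern from a list of titles."""
    # Longest first, so a title that is a prefix of another cannot win early.
    alternation = "|".join(re.escape(t) for t in sorted(titles, key=len, reverse=True))
    return re.compile(rf"\b({alternation})\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)")


TITLES = ["Mr.", "Mrs.", "Miss.", "Dr.", "Colonel", "Col."]
TEXT = (
    "Mr. Thomas and Dr. Jessica Davis went to the store. They met Mrs. Stevens "
    "who works at a nearby office. They are all friends with Colonel Jackson. "
    "Col. Jackson is known to her friends by her first name, Terry. "
    "They all know Mr. and Mrs. Kapitan."
)

for honorific, name in build_honorific_pattern(TITLES).findall(TEXT):
    print(honorific, name)
```

The last sentence exposes a limitation worth discussing. In "Mr. and Mrs. Kapitan" only "Mrs. Kapitan" is captured, because "Mr." is not directly followed by a capitalized name.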
14. Exercise 3: Use an LLM to identify the people in the following text. Think through an ethical way to use an LLM to identify potential women in these contexts.
Dr. Tracey Jordan works at the Smithsonian where he develops methods to identify named entities. Mrs. Alex Jackson leads the team.
She was trained in machine learning at Stanford. While Tracey functions as the domain expert, Alex Jackson designs the experiments.
They have another colleague, Leslie Peters.
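One possible approach, offered as an assumption rather than as the presenter's answer, is to ask the LLM for evidence instead of a guess. The model returns every person together with the exact words in the text (honorifics, pronouns) that support any gender label, and must answer "unknown" when there are none, since first names such as Tracey, Alex, or Leslie are not reliable signals. The prompt wording, the JSON schema, and the function names below are hypothetical. The check is only a cheap sanity test that the quoted evidence exists in the text, leaving the final judgment to a human reviewer.

```python
import json

PROMPT_TEMPLATE = (
    "List every person mentioned in the text below as a JSON array of objects "
    'with keys "name", "gender", and "evidence". Set "gender" to "unknown" '
    'unless the text itself states it; "evidence" must quote the exact words '
    "(for example an honorific or pronoun) you relied on, or be empty.\n\n"
    "Text: {text}"
)


def build_people_prompt(text: str) -> str:
    """Fill the (hypothetical) evidence-first prompt with the source text."""
    return PROMPT_TEMPLATE.format(text=text)


def validate_people(response: str, text: str) -> list[dict]:
    """Parse the model's JSON and downgrade unsupported gender labels to 'unknown'."""
    people = json.loads(response)
    for person in people:
        evidence = person.get("evidence", "")
        # A label only stands if its quoted evidence really appears in the text.
        if not evidence or evidence not in text:
            person["gender"] = "unknown"
    return people
```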
19. Vector Database
How do we use a vector database?
● We populate a vector database by using a machine learning model to vectorize data (turn it into embeddings) and then sending the resulting vectors to the database.
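The populate step can be sketched without any external service. In this illustration the "machine learning model" is a deliberately crude stand-in, a hashed bag-of-words vector (`toy_embed`, hypothetical). The "database" is an in-memory list that stores each vector next to its original text. In practice you would swap in a real embedding model and a real vector store.

```python
import math
import re
import zlib

DIM = 64  # assumed embedding size for the toy model


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Stand-in for an ML embedding model: hashed bag of words, L2-normalized."""
    vector = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        # crc32 is stable across runs, unlike the salted built-in hash().
        vector[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


class InMemoryVectorDB:
    """Minimal store: one (id, vector, original text) record per document."""

    def __init__(self) -> None:
        self.records: list[tuple[int, list[float], str]] = []

    def add(self, text: str) -> int:
        record_id = len(self.records)
        self.records.append((record_id, toy_embed(text), text))
        return record_id


db = InMemoryVectorDB()
for doc in ["Vector databases store embeddings.", "GLiNER performs zero-shot NER."]:
    db.add(doc)
```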
21. Vector Database
Why use a vector database?
● Vector databases let users store vector data so that it can be queried by vector-level similarity, rather than by explicit, human-defined similarity.
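Querying by vector-level similarity usually means ranking the stored vectors by a metric such as cosine similarity to the query vector. The brute-force sketch below uses tiny, made-up 3-dimensional vectors in place of real embeddings. Libraries such as Annoy replace the linear scan with an approximate nearest-neighbor index for speed.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k stored items most similar to the query (linear scan)."""
    scored = [(name, cosine(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-d vectors standing in for real embeddings.
ITEMS = {
    "letter from Mrs. Kapitan": [0.9, 0.1, 0.0],
    "map of Washington, D.C.": [0.1, 0.9, 0.0],
    "Python regex tutorial": [0.0, 0.1, 0.9],
}

print(top_k([0.8, 0.2, 0.1], ITEMS, k=1))
```

The query vector points in nearly the same direction as the first item, so that item ranks first even though no keywords were compared.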
22. Vector Database
What is it?
● A vector database holds numerous vectors, or embeddings, of data. Sometimes the database also stores the original data alongside these vectors.
25. Vector Database Stacks
What is available to us?
● Python, Annoy, Streamlit
○ Cheap, easy to deploy, and great for smaller datasets, but requires a little knowledge to build from scratch
○ Best for smaller databases (under 10,000 items)
● Python, txtai
○ Cheap and easy to use; more resource-intensive, but easy to deploy
○ Allows for easy interpretability (via highlighting)
31. RAG
What is it?
● RAG (retrieval-augmented generation) lets you combine the strengths of large language models (LLMs) with vector databases
● It reduces, but does not eliminate, the chance that an LLM will hallucinate (generate false information)
● It uses a vector database to find material relevant to a query, which is then given to the LLM as context
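Here is a short sketch of the retrieve-then-generate loop, under stated assumptions. Retrieval scores passages by Jaccard word overlap, a simple stand-in for a vector-database similarity search. `build_rag_prompt` only assembles the prompt. The call to an actual LLM is omitted because it depends on the model or API you use.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set used by the toy retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Stand-in for a vector-database search: rank passages by Jaccard word overlap."""
    q = tokens(query)

    def score(passage: str) -> float:
        p = tokens(passage)
        return len(q & p) / len(q | p) if q | p else 0.0

    return sorted(passages, key=score, reverse=True)[:k]


def build_rag_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Ground the LLM by placing the retrieved passages in the prompt as context."""
    hits = retrieve(query, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(hits))
    return (
        "Answer the question using only the numbered context. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


PASSAGES = [
    "Mrs. Alex Jackson leads the team.",
    "Leslie Peters is another colleague.",
    "Annoy is a library for approximate nearest neighbor search.",
]
print(build_rag_prompt("Who leads the team?", PASSAGES, k=1))
```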