Text and Data Mining
Building Data-Driven Apps
1. Refresher on Vector Databases
2. Large Language Models
3. Retrieval-Augmented Generation (RAG)
4. Streamlit
5. Gradio
6. R Shiny
Goals
Multi-Modal
How does it work?
Large Language Models
What are they?
● Large language models (LLMs) like GPT-4 are advanced AI systems trained on extensive datasets to understand and generate human-like text.
● Natural Language Understanding (NLU)
● Natural Language Generation (NLG)
Training
● LLMs are trained on vast amounts of text from the internet, books, and other sources. This training lets them learn language patterns, context, and knowledge across many domains.
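To make "learning language patterns" concrete, here is a minimal sketch of a toy bigram model that counts which word tends to follow which. This is only an illustration: real LLMs use neural networks trained on far more data, and the function names and the tiny corpus below are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text, invented for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigrams(corpus)

print(predict_next(model, "the"))     # most common word seen after "the"
print(predict_next(model, "dragon"))  # unseen word, so no prediction
```

The second call hints at a limitation: the model can only reproduce patterns that were present in its training data.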
Where do they excel?
● Generating new data based on their knowledge of existing data.
○ Code
○ Essays
○ Images
Limitations
● Hallucinations - Generating incorrect data
● Ethics and Biases
● Copyright Infringement
Retrieval-Augmented Generation
How tall is a Wookiee?
RAG
What is it?
● RAG lets you combine the strengths of large language models (LLMs) with vector databases
● It reduces the chance that an LLM will hallucinate (generate fake information)
● It uses a vector database to find material relevant to a query
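The retrieval step described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the "embedding" is a plain bag-of-words count rather than a real embedding model, the in-memory list stands in for a vector database, the three documents are made up, and the final call to an LLM is left out. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': bag-of-words counts (assumption; not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stand-in for a vector database: (document, vector) pairs held in memory.
documents = [
    "A Wookiee is a tall furry species from the planet Kashyyyk.",
    "Streamlit is a Python library for building data apps.",
    "Gradio builds web demos for machine learning models.",
]
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, k=1):
    """Put the retrieved context in front of the question for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


print(build_prompt("How tall is a Wookiee?"))
```

Because the prompt instructs the model to answer from the retrieved context, the LLM has less room to invent an answer, which is the mechanism behind the reduced hallucination risk.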

Mattingly "Text and Data Mining: Building Data Driven Applications"