SlideShare a Scribd company logo
Prompt Design
LLMs with Text Classification and Open Source
1. GPT-4o
2. Multimodal LLMs
3. Vector Databases and Semantic Search
4. What is Text Classification?
5. How is it useful?
6. Traditional Approaches
7. LLMs and Text Classification
8. Open Source LLMs
Goals
GPT-4o
GPT-4o
A New Model
● Pricing: GPT-4o is 50% cheaper than
GPT-4 Turbo, coming in at $5/M input
and $15/M output tokens).
● Rate limits: GPT-4o’s rate limits are 5x
higher than GPT-4 Turbo—up to 10
million tokens per minute.
● Speed: GPT-4o is 2x as fast as GPT-4
Turbo.
● Vision: GPT-4o’s vision capabilities
perform better than GPT-4 Turbo in
evals related to vision capabilities.
● Multilingual: GPT-4o has improved
support for non-English languages over
GPT-4 Turbo.
● GPT-4o currently has a context window
of 128k and has a knowledge cut-off
date of October 2023.
GPT-4o
A New Model
● Released This week
● Purely Multimodal
● Exceptionally fast (low latency)
● Cheaper
● Available via the API and Chat
GPT-4o
Multimodal
“GPT-4o is OpenAI's new flagship
model that can reason across
audio, vision, and text in real
time.” - OpenAI’s Docs
GPT-4o
Multimodal
Text, Audio, and Video are all
vectorized by the same model and
treated the same way. In other
words, a text that describes a
beach would be very similar in
vector space to an image of a
beach.
Vector Databases and Semantic Search
Representing
Texts
Digitally
Embeddings
● The apple is in the tree.
○ 1-[0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
○ 2-different vector
○ 3-different vector
○ 4-different vector
○ 1-[0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
○ 5-different vector
Vector
Database
What is it?
● It holds vectors in a database
as storage.
● Similar vectors are stored
closer.
Vector
Database
How do we use a vector
database?
● We populate a vector database
with by using a machine
learning model to vectorize
data and send them to the
database.
Vector
Database
Why use a vector database?
Vector
Database
Why use a vector database?
● Vector databases allow users
to store vector data in a way
that allows users to query it
and find similarity based on a
vector-level similarity, rather
than explicit human-defined
similarity.
Vector
Database
What is it?
● A vector database holds
numerous vectors or
embeddings of data.
Sometimes, the database will
also store the original data
alongside these vectors.
Vector Database Stacks
Vector Database Stacks
Vector Database
Stacks
What is available to us?
● Python, Annoy, Streamlit
○ Cheap, easy to deploy, great for
smaller datasets, but requires a
little bit of knowledge to build from
scratch
○ Best for smaller databases (under
10,000 data)
● Python, txtAI
○ Cheap and easy to use, more
resource intensive but easy to
deploy
○ Allows for easy interpretability (via
highlighting)
Multi-Modal
How does it work?
Text Classification
Text
Classification
Overview
Assign a text to a specific category
or categories.
Categories == labels.
Text
Classification
Emails
"Congratulations! You've won a
$1,000 Walmart gift card. Click here
to claim your prize."
"Limited time offer: Buy one get one
free on all items in our store."
"Dear customer, your account has
been temporarily suspended.
Please update your information to
restore access."
Text
Classification
Sentiment
"I love this product! It works exactly
as described."
"The product arrived late and was
damaged. Very disappointed."
"It's okay, not great but not terrible
either."
"Excellent service and quick
delivery. Highly recommend!"
Text
Classification
Types
Binary Classification
Multiclass Classification
Multilabel Classification
Hierarchical Classification
Text
Classification
Binary Classification
Classifies text into one of two
categories.
Spam detection in emails, where
emails are classified as either
"spam" or "not spam."
Text
Classification
Multiclass Classification
Classifies text into one of three or
more categories.
Sentiment analysis with categories
such as "positive," "negative," and
"neutral."
Text
Classification
Multilabel Classification
Assigns multiple (or single) labels to
a single text instance, where each
label represents a different
category.
News categorization where an
article can belong to multiple
categories such as "politics,"
"economy," and "health."
Text
Classification
Hierarchical Classification
Classifies text into a hierarchy of
categories, where categories are
structured in a tree-like hierarchy.
Document classification in a library
where documents are classified into
categories like "science," "arts,"
"technology," with subcategories
under each (e.g., "science" can
have "physics," "chemistry,"
"biology").
Open Source Machine Learning
Open Source
ML
Overview
Open source machine learning, like
open source software (OSS), is
driven by the public. It has several
components: open source datasets,
open source machine learning
models, and open source
applications.
The best resource: HuggingFace
Open Source
ML
Datasets
● Datasets for training task-
specific models
○ NER
○ Text Classification
○ Image Classification
○ Object Detection
● Datasets for training language
models
○ Unannotated collections of texts
● Dataset Cards
○ Task
○ Language
○ Biases
Open Source
ML
Models
● Trained Machine Learning
Models for specific Tasks
○ NER
○ Text Classification
○ Image Classification
○ Object Detection
○ ASR
○ HTR
○ OCR
● Trained machine learning
language models (including
LLMs)
● Dataset Cards
○ Task
○ Language
○ Biases
Open Source
ML
Benefits and Limitations
● Benefits
○ Open, meaning they are freely
available to use (though
sometimes with commercial
limitations)
○ Publicly Critiqued
○ Understanding of the Data
● Limitations
○ Closed models are better in many
cases (BUT!!! That gap is closing).

More Related Content

Similar to Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"

3.Implementation with NOSQL databases Document Databases (Mongodb).pptx
3.Implementation with NOSQL databases Document Databases (Mongodb).pptx3.Implementation with NOSQL databases Document Databases (Mongodb).pptx
3.Implementation with NOSQL databases Document Databases (Mongodb).pptx
RushikeshChikane2
 
Python Programming
Python ProgrammingPython Programming
Python Programming
SheikAllavudeenN
 
Data Oriented Programming in Java.pdf
Data Oriented Programming in Java.pdfData Oriented Programming in Java.pdf
Data Oriented Programming in Java.pdf
AnjaliYadav764696
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docx
smile790243
 
Modern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discoveryModern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discovery
All Things Open
 
2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptx2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptx
Mohamed Essam
 
Topic-oriented writing at McAfee
Topic-oriented writing at McAfeeTopic-oriented writing at McAfee
Topic-oriented writing at McAfee
John Sarr
 
Learn advanced java programming
Learn advanced java programmingLearn advanced java programming
Learn advanced java programming
TOPS Technologies
 
Slawek Korea
Slawek KoreaSlawek Korea
Slawek Korea
Slawek
 
AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...
AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...
AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...
cseij
 
Longwell final ppt
Longwell final pptLongwell final ppt
Longwell final ppt
Kuldeep Singh
 
Fake news detection
Fake news detection Fake news detection
Fake news detection
shalushamil
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
Scott Abel
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
KU Leuven
 
Web crawling and text classification
Web crawling and text classificationWeb crawling and text classification
Web crawling and text classification
Shubham Patil
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
petrknoth
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLib
David Nzoputa Ofili
 
Content Repositories vs Knowledge Bases
Content Repositories vs Knowledge BasesContent Repositories vs Knowledge Bases
Content Repositories vs Knowledge Bases
gokcebanu
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
Demi Ben-Ari
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
National Information Standards Organization (NISO)
 

Similar to Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source" (20)

3.Implementation with NOSQL databases Document Databases (Mongodb).pptx
3.Implementation with NOSQL databases Document Databases (Mongodb).pptx3.Implementation with NOSQL databases Document Databases (Mongodb).pptx
3.Implementation with NOSQL databases Document Databases (Mongodb).pptx
 
Python Programming
Python ProgrammingPython Programming
Python Programming
 
Data Oriented Programming in Java.pdf
Data Oriented Programming in Java.pdfData Oriented Programming in Java.pdf
Data Oriented Programming in Java.pdf
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docx
 
Modern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discoveryModern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discovery
 
2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptx2.Data_Strucures_and_modules.pptx
2.Data_Strucures_and_modules.pptx
 
Topic-oriented writing at McAfee
Topic-oriented writing at McAfeeTopic-oriented writing at McAfee
Topic-oriented writing at McAfee
 
Learn advanced java programming
Learn advanced java programmingLearn advanced java programming
Learn advanced java programming
 
Slawek Korea
Slawek KoreaSlawek Korea
Slawek Korea
 
AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...
AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...
AN ELABORATION OF TEXT CATEGORIZATION AND AUTOMATIC TEXT CLASSIFICATION THROU...
 
Longwell final ppt
Longwell final pptLongwell final ppt
Longwell final ppt
 
Fake news detection
Fake news detection Fake news detection
Fake news detection
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
 
Web crawling and text classification
Web crawling and text classificationWeb crawling and text classification
Web crawling and text classification
 
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...COAR Next Generation Repositories WG - Text mining and Recommender system sto...
COAR Next Generation Repositories WG - Text mining and Recommender system sto...
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLib
 
Content Repositories vs Knowledge Bases
Content Repositories vs Knowledge BasesContent Repositories vs Knowledge Bases
Content Repositories vs Knowledge Bases
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
 

More from National Information Standards Organization (NISO)

Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
National Information Standards Organization (NISO)
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
National Information Standards Organization (NISO)
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
National Information Standards Organization (NISO)
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
National Information Standards Organization (NISO)
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
National Information Standards Organization (NISO)
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
National Information Standards Organization (NISO)
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
National Information Standards Organization (NISO)
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
National Information Standards Organization (NISO)
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
National Information Standards Organization (NISO)
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
National Information Standards Organization (NISO)
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
National Information Standards Organization (NISO)
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
National Information Standards Organization (NISO)
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
National Information Standards Organization (NISO)
 

More from National Information Standards Organization (NISO) (20)

Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
 

Recently uploaded

FinalSD_MathematicsGrade7_Session2_Unida.pptx
FinalSD_MathematicsGrade7_Session2_Unida.pptxFinalSD_MathematicsGrade7_Session2_Unida.pptx
FinalSD_MathematicsGrade7_Session2_Unida.pptx
JennySularte1
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
Kalna College
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
TechSoup
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
Kalna College
 
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
nitinpv4ai
 
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
andagarcia212
 
The basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptxThe basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptx
heathfieldcps1
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
Mohammad Al-Dhahabi
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
deepaannamalai16
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
nitinpv4ai
 
Information and Communication Technology in Education
Information and Communication Technology in EducationInformation and Communication Technology in Education
Information and Communication Technology in Education
MJDuyan
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Henry Hollis
 
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
indexPub
 
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGHKHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
shreyassri1208
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
deepaannamalai16
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
Payaamvohra1
 
adjectives.ppt for class 1 to 6, grammar
adjectives.ppt for class 1 to 6, grammaradjectives.ppt for class 1 to 6, grammar
adjectives.ppt for class 1 to 6, grammar
7DFarhanaMohammed
 
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT KanpurDiversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Quiz Club IIT Kanpur
 
Ch-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdfCh-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdf
lakshayrojroj
 

Recently uploaded (20)

FinalSD_MathematicsGrade7_Session2_Unida.pptx
FinalSD_MathematicsGrade7_Session2_Unida.pptxFinalSD_MathematicsGrade7_Session2_Unida.pptx
FinalSD_MathematicsGrade7_Session2_Unida.pptx
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
 
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
 
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
 
The basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptxThe basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptx
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
 
Information and Communication Technology in Education
Information and Communication Technology in EducationInformation and Communication Technology in Education
Information and Communication Technology in Education
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
 
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGHKHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
 
adjectives.ppt for class 1 to 6, grammar
adjectives.ppt for class 1 to 6, grammaradjectives.ppt for class 1 to 6, grammar
adjectives.ppt for class 1 to 6, grammar
 
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT KanpurDiversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
 
Ch-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdfCh-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdf
 

Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"

  • 1. Prompt Design LLMs with Text Classification and Open Source
  • 2. 1. GPT-4o 2. Multimodal LLMs 3. Vector Databases and Semantic Search 4. What is Text Classification? 5. How is it useful? 6. Traditional Approaches 7. LLMs and Text Classification 8. Open Source LLMs Goals
  • 4. GPT-4o A New Model ● Pricing: GPT-4o is 50% cheaper than GPT-4 Turbo, coming in at $5/M input and $15/M output tokens). ● Rate limits: GPT-4o’s rate limits are 5x higher than GPT-4 Turbo—up to 10 million tokens per minute. ● Speed: GPT-4o is 2x as fast as GPT-4 Turbo. ● Vision: GPT-4o’s vision capabilities perform better than GPT-4 Turbo in evals related to vision capabilities. ● Multilingual: GPT-4o has improved support for non-English languages over GPT-4 Turbo. ● GPT-4o currently has a context window of 128k and has a knowledge cut-off date of October 2023.
  • 5. GPT-4o A New Model ● Released This week ● Purely Multimodal ● Exceptionally fast (low latency) ● Cheaper ● Available via the API and Chat
  • 6. GPT-4o Multimodal “GPT-4o is OpenAI's new flagship model that can reason across audio, vision, and text in real time.” - OpenAI’s Docs
  • 7. GPT-4o Multimodal Text, Audio, and Video are all vectorized by the same model and treated the same way. In other words, a text that describes a beach would be very similar in vector space to an image of a beach.
  • 8. Vector Databases and Semantic Search
  • 9. Representing Texts Digitally Embeddings ● The apple is in the tree. ○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...] ○ 2-different vector ○ 3-different vector ○ 4-different vector ○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...] ○ 5-different vector
  • 10. Vector Database What is it? ● It holds vectors in a database as storage. ● Similar vectors are stored closer.
  • 11.
  • 12. Vector Database How do we use a vector database? ● We populate a vector database with by using a machine learning model to vectorize data and send them to the database.
  • 13. Vector Database Why use a vector database?
  • 14. Vector Database Why use a vector database? ● Vector databases allow users to store vector data in a way that allows users to query it and find similarity based on a vector-level similarity, rather than explicit human-defined similarity.
  • 15. Vector Database What is it? ● A vector database holds numerous vectors or embeddings of data. Sometimes, the database will also store the original data alongside these vectors.
  • 18. Vector Database Stacks What is available to us? ● Python, Annoy, Streamlit ○ Cheap, easy to deploy, great for smaller datasets, but requires a little bit of knowledge to build from scratch ○ Best for smaller databases (under 10,000 data) ● Python, txtAI ○ Cheap and easy to use, more resource intensive but easy to deploy ○ Allows for easy interpretability (via highlighting)
  • 21. Text Classification Overview Assign a text to a specific category or categories. Categories == labels.
  • 22. Text Classification Emails "Congratulations! You've won a $1,000 Walmart gift card. Click here to claim your prize." "Limited time offer: Buy one get one free on all items in our store." "Dear customer, your account has been temporarily suspended. Please update your information to restore access."
  • 23. Text Classification Sentiment "I love this product! It works exactly as described." "The product arrived late and was damaged. Very disappointed." "It's okay, not great but not terrible either." "Excellent service and quick delivery. Highly recommend!"
  • 25. Text Classification Binary Classification Classifies text into one of two categories. Spam detection in emails, where emails are classified as either "spam" or "not spam."
  • 26. Text Classification Multiclass Classification Classifies text into one of three or more categories. Sentiment analysis with categories such as "positive," "negative," and "neutral."
  • 27. Text Classification Multilabel Classification Assigns multiple (or single) labels to a single text instance, where each label represents a different category. News categorization where an article can belong to multiple categories such as "politics," "economy," and "health."
  • 28. Text Classification Hierarchical Classification Classifies text into a hierarchy of categories, where categories are structured in a tree-like hierarchy. Document classification in a library where documents are classified into categories like "science," "arts," "technology," with subcategories under each (e.g., "science" can have "physics," "chemistry," "biology").
  • 30. Open Source ML Overview Open source machine learning, like open source software (OSS), is driven by the public. It has several components: open source datasets, open source machine learning models, and open source applications. The best resource: HuggingFace
  • 31. Open Source ML Datasets ● Datasets for training task- specific models ○ NER ○ Text Classification ○ Image Classification ○ Object Detection ● Datasets for training language models ○ Unannotated collections of texts ● Dataset Cards ○ Task ○ Language ○ Biases
  • 32. Open Source ML Models ● Trained Machine Learning Models for specific Tasks ○ NER ○ Text Classification ○ Image Classification ○ Object Detection ○ ASR ○ HTR ○ OCR ● Trained machine learning language models (including LLMs) ● Dataset Cards ○ Task ○ Language ○ Biases
  • 33. Open Source ML Benefits and Limitations ● Benefits ○ Open, meaning they are freely available to use (though sometimes with commercial limitations) ○ Publicly Critiqued ○ Understanding of the Data ● Limitations ○ Closed models are better in many cases (BUT!!! That gap is closing).