Prompt Design
LLMs with Text Classification and Open Source
Goals
1. GPT-4o
2. Multimodal LLMs
3. Vector Databases and Semantic Search
4. What is Text Classification?
5. How is it useful?
6. Traditional Approaches
7. LLMs and Text Classification
8. Open Source LLMs
GPT-4o
GPT-4o
A New Model
● Pricing: GPT-4o is 50% cheaper than
GPT-4 Turbo, coming in at $5/M input
and $15/M output tokens.
● Rate limits: GPT-4o’s rate limits are 5x
higher than GPT-4 Turbo—up to 10
million tokens per minute.
● Speed: GPT-4o is 2x as fast as GPT-4
Turbo.
● Vision: GPT-4o’s vision capabilities
perform better than GPT-4 Turbo in
evals related to vision capabilities.
● Multilingual: GPT-4o has improved
support for non-English languages over
GPT-4 Turbo.
● GPT-4o currently has a context window
of 128k and has a knowledge cut-off
date of October 2023.
GPT-4o
A New Model
● Released This week
● Purely Multimodal
● Exceptionally fast (low latency)
● Cheaper
● Available via the API and Chat
GPT-4o
Multimodal
“GPT-4o is OpenAI's new flagship
model that can reason across
audio, vision, and text in real
time.” - OpenAI’s Docs
GPT-4o
Multimodal
Text, Audio, and Video are all
vectorized by the same model and
treated the same way. In other
words, a text that describes a
beach would be very similar in
vector space to an image of a
beach.
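A minimal sketch of what "similar in vector space" can mean, assuming cosine similarity as the measure. The three vectors are made up for illustration and do not come from GPT-4o or any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional embeddings (assumption: real ones are much longer).
text_beach = [0.9, 0.1, 0.8, 0.0]    # a text describing a beach
image_beach = [0.8, 0.2, 0.9, 0.1]   # an image of a beach
text_invoice = [0.0, 0.9, 0.1, 0.8]  # an unrelated text

beach_pair = cosine_similarity(text_beach, image_beach)
unrelated_pair = cosine_similarity(text_beach, text_invoice)

# The text/image pair about the same subject scores much higher.
print(f"beach text vs beach image: {beach_pair:.3f}")
print(f"beach text vs invoice text: {unrelated_pair:.3f}")
```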
Vector Databases and Semantic Search
Representing
Texts
Digitally
Embeddings
● The apple is in the tree.
○ The → 1: [0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
○ apple → 2: different vector
○ is → 3: different vector
○ in → 4: different vector
○ the → 1: [0.01234, -0.23456, 0.87654,
0.45678, -0.56123, 0.65432,
0.12345, -0.77123, 0.08456,
0.34567, ...]
○ tree → 5: different vector
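The word-to-vector mapping on this slide can be sketched as a lookup table. This assumes static word embeddings, where a repeated word always gets the same vector; contextual models can assign different vectors to each occurrence. The function `embed_word` and its numbers are hypothetical.

```python
# Toy lookup table: one made-up vector per distinct (lowercased) word.
vocab = {}

def embed_word(word):
    """Return the vector for a word, creating a placeholder on first sight."""
    key = word.lower()
    if key not in vocab:
        idx = len(vocab) + 1
        # Placeholder values; a trained model would supply real numbers.
        vocab[key] = [round(idx * 0.1 + d * 0.01, 2) for d in range(4)]
    return vocab[key]

sentence = "The apple is in the tree"
vectors = [embed_word(w) for w in sentence.split()]

# Six words, but only five distinct vectors: both "the" tokens share one.
for word, vec in zip(sentence.split(), vectors):
    print(word, vec)
```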
Vector
Database
What is it?
● It stores vectors in a database.
● Similar vectors sit closer
together in vector space.
Vector
Database
How do we use a vector
database?
● We populate a vector database
by using a machine learning
model to vectorize data and
send the vectors to the
database.
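A minimal sketch of that populate step, using an in-memory list in place of a real vector database. The `embed` function is a hypothetical stand-in for a machine learning model (a hashed bag of words), and `ToyVectorStore` is an illustrative name, not a real library class.

```python
import zlib

DIM = 8  # tiny vector size, chosen only for illustration

def embed(text):
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # crc32 gives a hash that is stable across Python runs.
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    return vec

class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # list of (vector, original_text) pairs

    def add(self, text):
        # Vectorize the data, then send the vector (and the text) to storage.
        self.rows.append((embed(text), text))

store = ToyVectorStore()
for doc in [
    "The apple is in the tree.",
    "Buy one get one free.",
    "Your account has been suspended.",
]:
    store.add(doc)

print(len(store.rows), "vectors stored")
```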
Vector
Database
Why use a vector database?
● Vector databases let users
store vector data so that they
can query it by vector-level
similarity, rather than by
explicit human-defined
similarity.
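A rough sketch of querying by vector-level similarity, assuming cosine similarity and a brute-force scan over a tiny in-memory index. The vectors are made up; a real system would get them from an embedding model and use an approximate nearest-neighbor index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# (vector, original text) pairs with made-up vectors.
index = [
    ([0.9, 0.1, 0.0], "A day at the seaside"),
    ([0.8, 0.3, 0.1], "Waves and sand on the coast"),
    ([0.0, 0.2, 0.9], "Quarterly tax filing deadlines"),
]

def search(query_vector, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda row: cosine(query_vector, row[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Pretend this vector came from embedding the query "beach".
results = search([1.0, 0.2, 0.0])
print(results)
```

Neither of the two top results contains the word "beach"; they rank first only because their vectors are close to the query vector.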
Vector
Database
What is it?
● A vector database holds
numerous vectors or
embeddings of data.
Sometimes, the database will
also store the original data
alongside these vectors.
Vector Database Stacks
Vector Database
Stacks
What is available to us?
● Python, Annoy, Streamlit
○ Cheap, easy to deploy, great for
smaller datasets, but requires a
little bit of knowledge to build from
scratch
○ Best for smaller databases (under
10,000 items)
● Python, txtai
○ Cheap and easy to use, more
resource intensive but easy to
deploy
○ Allows for easy interpretability (via
highlighting)
Multi-Modal
How does it work?
Text Classification
Text
Classification
Overview
Assign a text to a specific category
or categories.
Categories == labels.
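One way to hand this task to an LLM is to list the allowed labels in the prompt and then check the reply against them. The sketch below only builds the prompt and parses a reply; it does not call any model, and the function names and prompt wording are assumptions for illustration.

```python
LABELS = ("spam", "not spam")  # example label set

def build_prompt(text, labels=LABELS):
    """Build a classification prompt that restricts the answer to known labels."""
    return (
        f"Classify the text below as exactly one of: {', '.join(labels)}.\n"
        "Answer with the label only.\n\n"
        f"Text: {text}\n"
        "Label:"
    )

def parse_label(model_reply, labels=LABELS):
    """Map a free-text reply to one of the labels, or None if it matches none."""
    reply = model_reply.strip().lower().rstrip(".")
    return reply if reply in labels else None

prompt = build_prompt("Congratulations! You've won a $1,000 gift card.")
print(prompt)

# Replies here are hard-coded examples, not real model output.
print(parse_label(" Spam.\n"))
print(parse_label("I am not sure"))
```

Returning None for an unexpected reply keeps a stray answer from being silently treated as a valid category.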
Text
Classification
Emails
"Congratulations! You've won a
$1,000 Walmart gift card. Click here
to claim your prize."
"Limited time offer: Buy one get one
free on all items in our store."
"Dear customer, your account has
been temporarily suspended.
Please update your information to
restore access."
Text
Classification
Sentiment
"I love this product! It works exactly
as described."
"The product arrived late and was
damaged. Very disappointed."
"It's okay, not great but not terrible
either."
"Excellent service and quick
delivery. Highly recommend!"
Text
Classification
Types
Binary Classification
Multiclass Classification
Multilabel Classification
Hierarchical Classification
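The four types differ mainly in the shape of the answer. The sketch below shows one possible representation of each; the scores, the 0.5 threshold, and the small taxonomy are made-up assumptions, not the output of a real classifier.

```python
# Binary: exactly one of two labels.
binary_label = "spam"  # the alternative would be "not spam"

# Multiclass: exactly one of three or more labels (here, the top score).
sentiment_scores = {"positive": 0.12, "negative": 0.71, "neutral": 0.17}
multiclass_label = max(sentiment_scores, key=sentiment_scores.get)

# Multilabel: any number of labels (here, every score at or above 0.5).
news_scores = {"politics": 0.82, "economy": 0.64, "health": 0.08}
multilabel = sorted(c for c, s in news_scores.items() if s >= 0.5)

# Hierarchical: a path through a tree of categories.
TAXONOMY = {
    "science": {"physics": {}, "chemistry": {}, "biology": {}},
    "arts": {},
    "technology": {},
}

def find_path(label, tree=TAXONOMY, prefix=()):
    """Return the path from the root to a label, or None if it is absent."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == label:
            return path
        found = find_path(label, children, path)
        if found:
            return found
    return None

hierarchical_label = find_path("chemistry")

print(binary_label, multiclass_label, multilabel, hierarchical_label)
```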
Text
Classification
Binary Classification
Classifies text into one of two
categories.
Spam detection in emails, where
emails are classified as either
"spam" or "not spam."
Text
Classification
Multiclass Classification
Classifies text into one of three or
more categories.
Sentiment analysis with categories
such as "positive," "negative," and
"neutral."
Text
Classification
Multilabel Classification
Assigns one or more labels to
a single text instance, where each
label represents a different
category.
News categorization where an
article can belong to multiple
categories such as "politics,"
"economy," and "health."
Text
Classification
Hierarchical Classification
Classifies text into a hierarchy of
categories, where categories are
structured in a tree-like hierarchy.
Document classification in a library
where documents are classified into
categories like "science," "arts,"
"technology," with subcategories
under each (e.g., "science" can
have "physics," "chemistry,"
"biology").
Open Source Machine Learning
Open Source
ML
Overview
Open source machine learning, like
open source software (OSS), is
driven by the public. It has several
components: open source datasets,
open source machine learning
models, and open source
applications.
The best resource: HuggingFace
Open Source
ML
Datasets
● Datasets for training task-specific
models
○ NER
○ Text Classification
○ Image Classification
○ Object Detection
● Datasets for training language
models
○ Unannotated collections of texts
● Dataset Cards
○ Task
○ Language
○ Biases
Open Source
ML
Models
● Trained machine learning
models for specific tasks
○ NER
○ Text Classification
○ Image Classification
○ Object Detection
○ ASR
○ HTR
○ OCR
● Trained machine learning
language models (including
LLMs)
● Model Cards
○ Task
○ Language
○ Biases
Open Source
ML
Benefits and Limitations
● Benefits
○ Open, meaning they are freely
available to use (though
sometimes with commercial
limitations)
○ Publicly Critiqued
○ Understanding of the Data
● Limitations
○ Closed models are better in many
cases (BUT!!! That gap is closing).

Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"

  • 1. Prompt Design LLMs with Text Classification and Open Source
  • 2. 1. GPT-4o 2. Multimodal LLMs 3. Vector Databases and Semantic Search 4. What is Text Classification? 5. How is it useful? 6. Traditional Approaches 7. LLMs and Text Classification 8. Open Source LLMs Goals
  • 4. GPT-4o A New Model ● Pricing: GPT-4o is 50% cheaper than GPT-4 Turbo, at $5/M input tokens and $15/M output tokens. ● Rate limits: GPT-4o’s rate limits are 5x higher than GPT-4 Turbo’s, up to 10 million tokens per minute. ● Speed: GPT-4o is 2x as fast as GPT-4 Turbo. ● Vision: GPT-4o outperforms GPT-4 Turbo in vision-related evals. ● Multilingual: GPT-4o has improved support for non-English languages over GPT-4 Turbo. ● Context: GPT-4o currently has a context window of 128k tokens and a knowledge cut-off date of October 2023.
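As a rough illustration of what the pricing bullet means in practice, the sketch below turns the two quoted prices into a per-request estimate. The helper name `estimate_cost` and the token counts are assumptions made for this example; they are not part of any OpenAI library, and current prices may differ from the ones on the slide.

```python
# Prices quoted on the slide, in USD per one million tokens.
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of a single request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Example request: 2,000 prompt tokens and 500 completion tokens (made-up sizes).
cost = estimate_cost(2_000, 500)
print(f"${cost:.4f}")
```

Output tokens cost three times as much as input tokens at these prices, so long completions tend to dominate the bill.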
  • 5. GPT-4o A New Model ● Released this week ● Purely multimodal ● Exceptionally fast (low latency) ● Cheaper ● Available via the API and Chat
  • 6. GPT-4o Multimodal “GPT-4o is OpenAI's new flagship model that can reason across audio, vision, and text in real time.” - OpenAI’s Docs
  • 7. GPT-4o Multimodal Text, audio, and video are all vectorized by the same model and treated the same way. In other words, a text that describes a beach would be very close in vector space to an image of a beach.
  • 8. Vector Databases and Semantic Search
  • 9. Representing Texts Digitally Embeddings ● The apple is in the tree. ○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...] ○ 2-different vector ○ 3-different vector ○ 4-different vector ○ 1-[0.01234, -0.23456, 0.87654, 0.45678, -0.56123, 0.65432, 0.12345, -0.77123, 0.08456, 0.34567, ...] ○ 5-different vector
  • 10. Vector Database What is it? ● It is a database that stores vectors. ● Similar items have vectors that lie close together.
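One common way to measure how close two vectors are is cosine similarity. The sketch below is a minimal illustration using made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions, and the variable names (`beach_text`, `beach_image`, `tax_form`) are invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only; not produced by any model.
beach_text = [0.9, 0.1, 0.2]
beach_image = [0.8, 0.2, 0.25]
tax_form = [0.1, 0.9, 0.0]

# The two beach vectors point in nearly the same direction; the tax form does not.
print(cosine_similarity(beach_text, beach_image))
print(cosine_similarity(beach_text, tax_form))
```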
  • 12. Vector Database How do we use a vector database? ● We populate a vector database by using a machine learning model to vectorize the data and then sending the resulting vectors to the database.
  • 14. Vector Database Why use a vector database? ● Vector databases store vector data so that users can query it and find matches by vector-level similarity, rather than by explicit, human-defined similarity.
  • 15. Vector Database What is it? ● A vector database holds numerous vectors or embeddings of data. Sometimes, the database will also store the original data alongside these vectors.
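The populate-then-query workflow described in the slides above can be sketched with a tiny in-memory store. This is only a toy under stated assumptions: `TinyVectorStore`, `toy_embed`, and `VOCAB` are hypothetical names, the word-count "embedding" stands in for a real machine learning model, and the brute-force scan stands in for the index that libraries such as Annoy build.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store that keeps each vector next to its original text."""

    def __init__(self, embed):
        self.embed = embed  # function mapping text -> list of floats
        self.rows = []      # list of (vector, original_text) pairs

    def add(self, text):
        # Populate: vectorize the data, then store vector and original together.
        self.rows.append((self.embed(text), text))

    def query(self, text, k=1):
        # Query: vectorize the question and rank stored rows by similarity.
        q = self.embed(text)
        scored = [(_cosine(q, vec), original) for vec, original in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [original for _, original in scored[:k]]


# Stand-in for a real embedding model: counts words from a tiny vocabulary.
VOCAB = ["apple", "tree", "fruit", "invoice", "payment"]


def toy_embed(text):
    words = text.lower().replace(".", "").split()
    return [float(words.count(term)) for term in VOCAB]


store = TinyVectorStore(toy_embed)
store.add("The apple is in the tree.")
store.add("Your invoice payment is overdue.")
print(store.query("fruit tree with an apple"))
```

The query never has to match the stored sentence word for word; it only has to land near it in vector space, which is the point of similarity search.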
  • 18. Vector Database Stacks What is available to us? ● Python, Annoy, Streamlit ○ Cheap and easy to deploy, but requires some knowledge to build from scratch ○ Best for smaller datasets (under 10,000 records) ● Python, txtai ○ Cheap and easy to use and deploy, but more resource intensive ○ Allows for easy interpretability (via highlighting)
  • 21. Text Classification Overview Assign a text to a specific category or categories. Categories == labels.
  • 22. Text Classification Emails "Congratulations! You've won a $1,000 Walmart gift card. Click here to claim your prize." "Limited time offer: Buy one get one free on all items in our store." "Dear customer, your account has been temporarily suspended. Please update your information to restore access."
  • 23. Text Classification Sentiment "I love this product! It works exactly as described." "The product arrived late and was damaged. Very disappointed." "It's okay, not great but not terrible either." "Excellent service and quick delivery. Highly recommend!"
  • 25. Text Classification Binary Classification Classifies text into one of two categories. Example: spam detection, where emails are classified as either "spam" or "not spam."
  • 26. Text Classification Multiclass Classification Classifies text into one of three or more categories. Example: sentiment analysis with categories such as "positive," "negative," and "neutral."
  • 27. Text Classification Multilabel Classification Assigns one or more labels to a single text instance, where each label represents a different category. Example: news categorization, where an article can belong to multiple categories such as "politics," "economy," and "health."
  • 28. Text Classification Hierarchical Classification Classifies text into categories structured as a tree-like hierarchy. Example: document classification in a library, where documents are classified into categories like "science," "arts," and "technology," with subcategories under each (e.g., "science" can have "physics," "chemistry," "biology").
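The four classification types differ mainly in the shape of the output. The sketch below makes that concrete with deliberately naive keyword rules; the keyword lists and function names are invented for illustration, and a real system would use a trained model or an LLM instead.

```python
def classify_binary(text):
    """Binary: exactly one of two labels."""
    spam_terms = ("you've won", "claim your prize", "gift card")
    lowered = text.lower()
    return "spam" if any(term in lowered for term in spam_terms) else "not spam"


def classify_multiclass(text):
    """Multiclass: exactly one of three or more labels."""
    lowered = text.lower()
    positive = sum(w in lowered for w in ("love", "excellent", "recommend"))
    negative = sum(w in lowered for w in ("late", "damaged", "disappointed"))
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


def classify_multilabel(text):
    """Multilabel: a list of labels (empty here if no keyword matches)."""
    topics = {
        "politics": ("election", "parliament"),
        "economy": ("inflation", "budget"),
        "health": ("hospital", "vaccine"),
    }
    lowered = text.lower()
    return [label for label, terms in topics.items()
            if any(term in lowered for term in terms)]


def classify_hierarchical(text):
    """Hierarchical: a path from a top-level category to a subcategory."""
    tree = {"science": {"physics": ("quantum",),
                        "chemistry": ("molecule",),
                        "biology": ("cell",)}}
    lowered = text.lower()
    for parent, children in tree.items():
        for child, terms in children.items():
            if any(term in lowered for term in terms):
                return (parent, child)
    return None


print(classify_binary("Congratulations! You've won a $1,000 Walmart gift card."))
print(classify_multiclass("It's okay, not great but not terrible either."))
print(classify_multilabel("The budget debate in parliament focused on hospital funding."))
print(classify_hierarchical("A new result in quantum mechanics."))
```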
  • 30. Open Source ML Overview Open source machine learning, like open source software (OSS), is driven by the public. It has several components: open source datasets, open source machine learning models, and open source applications. The best resource: Hugging Face
  • 31. Open Source ML Datasets ● Datasets for training task-specific models ○ NER ○ Text Classification ○ Image Classification ○ Object Detection ● Datasets for training language models ○ Unannotated collections of texts ● Dataset Cards ○ Task ○ Language ○ Biases
  • 32. Open Source ML Models ● Trained Machine Learning Models for specific Tasks ○ NER ○ Text Classification ○ Image Classification ○ Object Detection ○ ASR ○ HTR ○ OCR ● Trained machine learning language models (including LLMs) ● Dataset Cards ○ Task ○ Language ○ Biases
  • 33. Open Source ML Benefits and Limitations ● Benefits ○ Open, meaning they are freely available to use (though sometimes with commercial limitations) ○ Publicly Critiqued ○ Understanding of the Data ● Limitations ○ Closed models are better in many cases (BUT!!! That gap is closing).