Alessandro Benedetti, Director @ Sease
21/02/2023
London Information Retrieval Meetup
How ChatGPT works: an Information
Retrieval Perspective
‣ Born in Tarquinia (ancient Etruscan city in Italy)
‣ R&D Software Engineer
‣ Director
‣ Master's degree in Computer Science
‣ PC member for ECIR, SIGIR and Desires
‣ Apache Lucene/Solr PMC member/committer
‣ Elasticsearch expert
‣ Passionate about Semantic, NLP and
Machine Learning technologies
‣ Beach Volleyball player and Snowboarder
Who I am
Alessandro Benedetti
● Headquarters in London/distributed
● Open Source Enthusiasts
● Apache Lucene/Solr/Elasticsearch experts
● Community Contributors
● Active Researchers
● Hot Trends :
Neural Search,
Learning To Rank,
Document Similarity,
Search Quality Evaluation,
Relevance Tuning
www.sease.io
Search Services
Sease
● Website: www.sease.io
● Blog: https://sease.io/blog
● Github: https://github.com/SeaseLtd
● Twitter: https://twitter.com/SeaseLtd
The AI techniques in ChatGPT
Supervised Fine Tuning (SFT) Model
Reward Model
Proximal Policy Optimisation (PPO)
What’s the impact on Information Retrieval?
Overview
ChatGPT: what is it?
● Generative Pre-trained Transformer
● product capable of generating text in a wide range of styles and
for different purposes in response to a prompt
● (based on) generative AI Large Language Models
● sibling model of InstructGPT
most of our explanations come from
here
ChatGPT: main tech behind it
From https://openai.com/blog/chatgpt/ :
“We trained this model using Reinforcement Learning from Human
Feedback (RLHF), using the same methods as InstructGPT, but with
slight differences in the data collection setup. ”
● Supervised Learning
● Deep Learning
● Pre-trained Large Language Models
● (Deep) Reinforcement Learning from Human Feedback
(RLHF)
AI, Machine learning and Deep Learning
https://sease.io/2021/07/artificial-intelligence-applied-to-search-introduction.html
Pre-trained Large Language Models
● Transformers
● Next-token prediction (GPT-style) and
masked language modeling (BERT-style)
● estimate the likelihood of each possible
word (in its vocabulary) given the
previous sequence
● learn the statistical structure of
language
● pre-trained on huge quantities of text
https://towardsdatascience.com/how-chatgpt-works-the-models-behind-the-bot-1ce5fca96286
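A minimal sketch of the "likelihood of each possible word given the previous sequence" idea. The toy vocabulary, the context and the scores below are made up for illustration; a real model would produce the scores from a transformer over a vocabulary of tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and made-up scores for the context "the cat sat on the"
vocab = ["mat", "dog", "moon", "sofa"]
logits = [3.0, 0.5, -1.0, 2.0]

probs = softmax(logits)
next_token_probs = dict(zip(vocab, probs))

# Greedy decoding picks the most likely token; sampling from probs is the alternative
most_likely = max(next_token_probs, key=next_token_probs.get)
```

Generating text then amounts to appending the chosen token to the context and repeating the estimate.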
Deep Reinforcement Learning
● Input state -> vector
● Policy network: a neural network (the
policy) estimates a probability for
each action
● An action is sampled from the
probability distribution
● The action is performed on the real
system
● The reward is observed
● Policy Gradients: the reward weights the
gradient back-propagated through the policy (to
affect the next probability estimations)
http://karpathy.github.io/2016/05/31/rl/
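The loop above (estimate probabilities, sample an action, observe the reward, update the policy) can be sketched on a toy problem. This is a plain REINFORCE update on a hypothetical three-action environment with made-up rewards, with one logit per action standing in for the neural network; it is an illustration of the idea, not the setup used for ChatGPT.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train_policy(true_rewards, steps=2000, lr=0.1, seed=0):
    """Policy-gradient training of a softmax policy over a fixed set of actions."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_rewards)  # policy parameters: one logit per action
    for _ in range(steps):
        probs = softmax(prefs)                                    # policy output
        action = rng.choices(range(len(prefs)), weights=probs)[0] # sample an action
        reward = true_rewards[action] + rng.gauss(0.0, 0.1)       # observe noisy reward
        # d log pi(action) / d logit_i = 1[i == action] - probs[i]
        for i in range(len(prefs)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad                        # reward-weighted update
    return softmax(prefs)

# Hypothetical average rewards for three actions; action 1 is the best one
final_probs = train_policy([0.1, 0.9, 0.3])
```

After training, most of the probability mass should sit on the action with the highest average reward.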
Reinforcement Learning from Human Feedback
1. Supervised fine-tuning step
a pre-trained language model is fine-tuned on a relatively small human-curated dataset, to
learn a supervised policy (the SFT model) that generates text from a prompt
2. Reward estimation step
a pre-trained language model is fine-tuned on a relatively large human-curated dataset, to
learn a reward function that generates a rating from a prompt and a response
3. Proximal Policy Optimization (PPO) step
the reward model is used to fine-tune the SFT model. The outcome of this step is the final model
(that can be iteratively improved).
● Steps 2 and 3 are repeated iteratively
Supervised Fine-Tuning (SFT) Model
● training sample <prompt, text> ->
human-curated
○ directly from Human labellers
○ from GPT-3 clients
○ roughly 10,000-15,000 samples
● starting from the GPT-3.5 series.
○ Presumably the baseline model used
is the latest one, text-davinci-003, a
GPT-3.5 model from a series trained on
a blend of text and programming code.
● expensive -> scaling this up is not a
viable way to improve the model
Reward model
● Goal: fine-tune a model that estimates a score for a <prompt, text> pair
● A list of prompts is selected and the SFT model generates multiple
outputs (4…9) for each prompt.
● Training Set: Humans rank the outputs. The size of this dataset is
approximately 10 times bigger than the dataset used for the SFT model.
● The fine-tuned model scores each SFT model output for a prompt, so that the
outputs can be ranked in order of preference. (Learning to Rank, sounds familiar?)
● easier for humans to rank outputs than to write text
● the reward function can be further updated with users’ feedback
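One common way to learn a scorer from human rankings is a pairwise loss, which the InstructGPT paper listed in the references also describes: every pair of outputs for the same prompt becomes one comparison, and the model is penalised when the preferred output does not get the higher score. A minimal sketch, with made-up scores standing in for the reward model's outputs:

```python
import math
from itertools import combinations

def pairwise_ranking_loss(scores_best_to_worst):
    """Average -log(sigmoid(s_preferred - s_other)) over all pairs of one ranking.

    scores_best_to_worst: model scores for the outputs of one prompt,
    listed in the order the human labeller ranked them (best first).
    """
    pairs = list(combinations(scores_best_to_worst, 2))  # (better, worse) pairs
    # -log(sigmoid(d)) == log(1 + exp(-d))
    losses = [math.log(1.0 + math.exp(-(better - worse))) for better, worse in pairs]
    return sum(losses) / len(pairs), len(pairs)

# Hypothetical scores for 4 outputs of one prompt
loss_agree, n_pairs = pairwise_ranking_loss([2.0, 1.0, 0.0, -1.0])   # model agrees with humans
loss_disagree, _ = pairwise_ranking_loss([-1.0, 0.0, 1.0, 2.0])      # model inverts the ranking
```

A ranking of K outputs yields K(K-1)/2 comparisons, so 4 outputs give 6 pairs and 9 outputs give 36, which is part of why ranking is a cheap way to collect a large training set.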
Fine-tuning the SFT model via Proximal Policy Optimization (PPO)
● PPO is a reinforcement learning algorithm.
● "On-policy":
PPO continuously adapts the current policy
according to the actions that the agent is
taking (sampling) and the rewards it is receiving
● PPO uses a trust region optimization method -> it
constrains the change in the policy to be within a
certain distance of the previous policy in order to
ensure stability
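A widely used way to keep the new policy close to the previous one is PPO's clipped surrogate objective. The sketch below shows it for a single sample; the value epsilon = 0.2 is an assumption (a commonly used default), and the slides do not state which PPO variant or settings were used for ChatGPT.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO-Clip objective for one sample.

    ratio:     pi_new(action | state) / pi_old(action | state)
    advantage: how much better the action was than expected
    Returns min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

# Good action, policy already moved far towards it: the gain is capped at 1.2 * A
capped_gain = clipped_surrogate(ratio=1.5, advantage=1.0)
# Bad action, policy already moved far away from it: the objective is held at 0.8 * A
capped_penalty = clipped_surrogate(ratio=0.5, advantage=-1.0)
# Policy unchanged: the objective is just the advantage
unchanged = clipped_surrogate(ratio=1.0, advantage=2.0)
```

Once the ratio leaves the interval [1 - epsilon, 1 + epsilon] in the direction that would improve the objective, the gradient vanishes, so a single update cannot push the policy arbitrarily far.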
Fine-tuning the SFT model via Proximal Policy Optimization (PPO)
● The PPO policy is initialized from the SFT model
● The value function is initialized from the reward model
● The environment presents a random prompt and expects a
response
● Given the prompt and response, the reward model produces a reward
● The policy gets updated and the episode ends
● During fine-tuning many episodes happen
Proximal Policy Optimisation 2
● PPO2 is simply an updated version of the algorithm
● It is optimized for GPUs and better supports parallel training
● It has a number of other differences (e.g., advantages are normalized
automatically and value functions are clipped as well), but uses the same
mathematical foundations
● OpenAI Baselines implementation -> simply remember that the original PPO (PPO1)
is obsolete and you should use PPO2.
https://openai.com/blog/openai-baselines-ppo/
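The two differences mentioned above can be sketched in a few lines. This is an illustration of the ideas (batch-wise advantage normalization and a clipped value loss), written in plain Python under the assumption of a clip range of 0.2; it is not a copy of the Baselines code.

```python
def normalize_advantages(advantages, eps=1e-8):
    """Rescale a batch of advantages to zero mean and (roughly) unit variance."""
    n = len(advantages)
    mean = sum(advantages) / n
    std = (sum((a - mean) ** 2 for a in advantages) / n) ** 0.5
    return [(a - mean) / (std + eps) for a in advantages]

def clipped_value_loss(new_value, old_value, target, clip_range=0.2):
    """Value loss where the new estimate may move at most clip_range from the old one.

    Takes the larger of the clipped and unclipped squared errors, so moving
    beyond the clip range is never rewarded with a smaller loss.
    """
    delta = max(-clip_range, min(clip_range, new_value - old_value))
    clipped_value = old_value + delta
    return max((new_value - target) ** 2, (clipped_value - target) ** 2)

normed = normalize_advantages([1.0, 2.0, 3.0])
loss = clipped_value_loss(new_value=2.0, old_value=1.0, target=2.0)
```

In the last call the new estimate hits the target exactly, but because it jumped by more than the clip range the loss is still computed from the clipped estimate of 1.2.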
What’s the impact on Information Retrieval?
● start from one of the fine-tuned models available online
● build datasets from your own data to fine-tune them further, e.g.
○ from a query and the top-k documents, write a snippet summarizing them
○ fine-tune a reward model just to re-rank results
○ integrate it out of the box as an extra layer on top of your results
○ … be creative!
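The re-ranking idea can be sketched as a second stage on top of any search engine: take the top-k results and re-order them by a model score for each <query, document> pair. The function `overlap_score` below is a hypothetical stand-in for a fine-tuned reward model, and the documents are made up.

```python
def rerank(query, documents, score_fn):
    """Re-order first-stage results by a model score, highest first."""
    return sorted(documents, key=lambda doc: score_fn(query, doc), reverse=True)

def overlap_score(query, doc):
    """Hypothetical stand-in for a reward model: share of query terms found in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

# Made-up top-k results as returned by a first-stage retriever
top_k = [
    "apache solr tutorial",
    "neural search in apache solr",
    "beach volleyball rules",
]
reranked = rerank("neural search solr", top_k, overlap_score)
```

Swapping `overlap_score` for a call to a learned model keeps the rest of the pipeline unchanged, which is the same pattern used by Learning to Rank re-rankers.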
References
Reinforcement Learning
http://karpathy.github.io/2016/05/31/rl/
https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b
Short Blogs
https://openai.com/blog/chatgpt/
https://www.assemblyai.com/blog/how-chatgpt-actually-works/
https://towardsdatascience.com/how-chatgpt-works-the-models-behind-the-bot-1ce5fca96286
Detailed Resources
https://gist.github.com/veekaybee/6f8885e9906aa9c5408ebe5c7e870698
InstructGPT:
https://openai.com/blog/instruction-following/
https://arxiv.org/pdf/2203.02155.pdf
THANK YOU!
@seaseltd @sease-ltd
@seaseltd @sease_ltd

Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 

Recently uploaded (20)

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 

How does ChatGPT work: an Information Retrieval perspective

  • 1. How ChatGPT works: an Information Retrieval Perspective
      Alessandro Benedetti, Director @ Sease
      London Information Retrieval Meetup, 21/02/2023
  • 2. Who I am: Alessandro Benedetti
      ‣ Born in Tarquinia (ancient Etruscan city in Italy)
      ‣ R&D Software Engineer
      ‣ Director
      ‣ Master's degree in Computer Science
      ‣ PC member for ECIR, SIGIR and Desires
      ‣ Apache Lucene/Solr PMC member/committer
      ‣ Elasticsearch expert
      ‣ Passionate about Semantic, NLP and Machine Learning technologies
      ‣ Beach Volleyball player and Snowboarder
  • 3. Search Services (www.sease.io)
      ● Headquartered in London, distributed team
      ● Open Source Enthusiasts
      ● Apache Lucene/Solr/Elasticsearch experts
      ● Community Contributors
      ● Active Researchers
      ● Hot Trends: Neural Search, Learning To Rank, Document Similarity, Search Quality Evaluation, Relevance Tuning
  • 4. Sease
      ● Website: www.sease.io
      ● Blog: https://sease.io/blog
      ● Github: https://github.com/SeaseLtd
      ● Twitter: https://twitter.com/SeaseLtd
  • 5. Overview
      ● The AI techniques in ChatGPT
      ● Supervised Fine-Tuning (SFT) Model
      ● Reward Model
      ● Proximal Policy Optimisation (PPO)
      ● What’s the impact on Information Retrieval?
  • 6. ChatGPT: what is it?
      ● GPT stands for Generative Pre-trained Transformer
      ● a product capable of generating text in a wide range of styles and for different purposes, in response to a prompt
      ● based on generative AI Large Language Models
      ● sibling model of InstructGPT (the source of most of the explanations in these slides)
  • 7. ChatGPT: main tech behind it
      From https://openai.com/blog/chatgpt/ :
      “We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup.”
      ● Supervised Learning
      ● Deep Learning
      ● Pre-trained Large Language Models
      ● (Deep) Reinforcement Learning from Human Feedback (RLHF)
  • 8. AI, Machine learning and Deep Learning https://sease.io/2021/07/artificial-intelligence-applied-to-search-introduction.html
  • 9. Pre-trained Large Language Models
      ● Transformers
      ● Next-token prediction and masked language modeling
      ● estimate the likelihood of each possible word (in the model's vocabulary) given the previous sequence
      ● learn the statistical structure of language
      ● pre-trained on huge quantities of text
      Source: https://towardsdatascience.com/how-chatgpt-works-the-models-behind-the-bot-1ce5fca96286
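
To make "estimate the likelihood of each possible word given the previous sequence" concrete, here is a minimal sketch using a bigram count model. This is a deliberate simplification and not how a Transformer works: it conditions only on the last token instead of the whole sequence, and the corpus and the function names `train_bigram` and `next_token_distribution` are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, prev):
    """Turn raw counts into a probability for each candidate next token."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}  # token never seen as a context
    return {tok: c / total for tok, c in counts[prev].items()}


# Toy corpus (an assumption, purely illustrative).
corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigram(corpus)
dist = next_token_distribution(model, "the")
print(dist)
```

A real language model replaces the count table with a neural network and the single previous token with the full context, but the output has the same shape: one probability per vocabulary entry, summing to 1.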
  • 10. Deep Reinforcement Learning
      ● Input state -> vector
      ● Policy network: a policy (neural network) estimates a probability for each action
      ● An action is sampled from the probability distribution
      ● The action is performed on the real system
      ● The reward is observed
      ● Policy Gradients: the reward is back-propagated to the policy (to affect the next probability estimations)
      Source: http://karpathy.github.io/2016/05/31/rl/
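
The loop on this slide (estimate action probabilities, sample, act, observe the reward, update the policy) can be sketched with a REINFORCE-style update on a two-armed bandit. This is a toy under stated assumptions: the "policy network" is just one logit per action, there is no input state, and the reward values, learning rate and the name `train_policy` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(true_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a bandit: one logit per action, no state."""
    rng = random.Random(seed)
    logits = [0.0] * len(true_rewards)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Perform it and observe a noisy reward.
        reward = true_rewards[action] + rng.gauss(0.0, 0.1)
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i];
        # scaling it by the reward makes rewarded actions more likely.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


# Action 1 pays more on average, so the policy should come to prefer it.
final_probs = train_policy([0.2, 1.0])
print(final_probs)
```

In deep reinforcement learning the logits come from a neural network fed with the state vector, and the same gradient is back-propagated through its weights.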
  • 11. Reinforcement Learning from Human Feedback
      1. Supervised fine-tuning step: a pre-trained language model is fine-tuned on a relatively small human-curated dataset, to learn a supervised policy (the SFT model) that generates text from a prompt.
      2. Reward estimation step: a pre-trained language model is fine-tuned on a relatively large human-curated dataset, to learn a reward function that generates a rating from a prompt and a response.
      3. Proximal Policy Optimization (PPO) step: the reward model is used to fine-tune the SFT model. The outcome of this step is the final model (which can be iteratively improved).
      ● Steps 2 and 3 are repeated iteratively.
  • 12. Supervised Fine-Tuning (SFT) Model
      ● training sample <prompt, text> -> human-curated
          ○ written directly by human labellers
          ○ collected from GPT-3 clients
          ○ roughly 10,000-15,000 samples
      ● starting from the GPT-3.5 series
          ○ Presumably the baseline model is the latest one, text-davinci-003, a GPT-3.5 series model trained on a blend of text and programming code.
      ● expensive -> scaling this step up is not a practical way to improve the model
  • 13. Reward model
      ● Goal: fine-tune a model that estimates a score for a <prompt, text> pair
      ● A list of prompts is selected and the SFT model generates multiple outputs (between 4 and 9) for each prompt.
      ● Training set: humans rank the outputs. This dataset is approximately 10 times bigger than the one used for the SFT model.
      ● The fine-tuned model takes as input a few of the SFT model outputs and ranks them in order of preference. (Learning to Rank, sounds familiar?)
      ● Ranking outputs is easier for humans than writing text
      ● The reward function can be further updated with users’ feedback
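
One common way to turn a human ranking into a training signal is a pairwise logistic loss: for every pair of outputs, the score of the preferred one should exceed the score of the other. The slide does not state the loss, so treat the choice below as an assumption (it is the form described in the InstructGPT paper); the function name and the example scores are hypothetical.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores_in_rank_order):
    """Mean of -log(sigmoid(preferred - rejected)) over all ranked pairs.

    `scores_in_rank_order` holds the reward model's scores for the outputs
    of one prompt, listed from most preferred to least preferred by the
    human labeller.
    """
    # combinations() keeps input order, so the first element of each pair
    # is always the one the labeller ranked higher.
    pairs = list(combinations(scores_in_rank_order, 2))
    # -log(sigmoid(x)) == log(1 + exp(-x))
    losses = [math.log1p(math.exp(-(win - lose))) for win, lose in pairs]
    return sum(losses) / len(losses)


# Scores that agree with the human ranking give a low loss...
agreeing = pairwise_ranking_loss([2.0, 1.0, 0.0, -1.0])
# ...while scores that invert it give a high loss.
inverted = pairwise_ranking_loss([-1.0, 0.0, 1.0, 2.0])
print(agreeing, inverted)
```

This is the same pairwise idea used in Learning to Rank: K ranked outputs yield K*(K-1)/2 training pairs, which is part of why ranking is such an efficient way to collect labels.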
  • 14. Fine-tuning the SFT model via Proximal Policy Optimization (PPO)
      ● PPO is a reinforcement learning algorithm.
      ● PPO is "on-policy": it continuously adapts the current policy according to the actions the agent takes (samples) and the rewards it receives.
      ● PPO uses a trust-region-style optimization method -> it constrains each policy change to stay within a certain distance of the previous policy, to ensure stability.
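
The "stay close to the previous policy" constraint is most often implemented with PPO's clipped surrogate objective. The slide does not say which PPO variant is meant, so this is an assumption; the sketch below evaluates the objective for a single sample, and the name `clipped_surrogate` and the default `epsilon=0.2` are illustrative choices.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for one sample.

    ratio     = pi_new(action | state) / pi_old(action | state)
    advantage = how much better the action was than expected
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio
    # outside [1 - epsilon, 1 + epsilon].
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (positive advantage) stops earning extra credit once the
# new policy is already more than 20% likelier to take it.
print(clipped_surrogate(1.5, 1.0))
# A bad action (negative advantage) keeps its full penalty if the ratio grows.
print(clipped_surrogate(1.5, -1.0))
```

Training maximises the average of this quantity over sampled actions; because the objective goes flat outside the clip range, a single update cannot move the policy far from the one that generated the samples.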
  • 15. Fine-tuning the SFT model via Proximal Policy Optimization (PPO)
      ● The PPO policy is initialized from the SFT model.
      ● The value function is initialized from the reward model.
      ● The environment presents a random prompt and expects a response.
      ● Given the prompt and the response, the environment produces a reward (determined by the reward model).
      ● The policy gets updated and the episode ends.
      ● Many episodes happen during fine-tuning.
  • 16. Proximal Policy Optimisation 2
      ● PPO2 is simply an updated version of the algorithm.
      ● It is optimized for GPU and supports parallel training better.
      ● It has a number of other differences (e.g., advantages are normalized automatically and value functions are clipped as well), but uses the same mathematical foundations.
      ● OpenAI implementation -> simply remember that PPO is obsolete and you should use PPO2.
      Source: https://openai.com/blog/openai-baselines-ppo/
  • 17. What’s the impact on Information Retrieval?
      ● start from one of the fine-tuned models available online
      ● build datasets from your own data to fine-tune them further, e.g.:
          ○ from a query and its top-k documents, write a snippet summarizing them
          ○ fine-tune a reward model, just to re-rank results
          ○ integrate a model out of the box, as an extra layer on top of your results
          ○ … be creative!
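
As an illustration of the re-ranking idea above, the sketch below re-orders the top-k results of a first-stage search using a pluggable scoring function. Everything here is hypothetical: `rerank` is not an API of Solr, Elasticsearch or any library, and `toy_score` is a word-overlap stand-in for a fine-tuned reward model that would score a <query, document> pair.

```python
from typing import Callable, List


def rerank(query: str,
           candidates: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 10) -> List[str]:
    """Re-order only the first top_k candidates by score_fn, best first."""
    head, tail = candidates[:top_k], candidates[top_k:]
    rescored = sorted(head, key=lambda doc: score_fn(query, doc), reverse=True)
    # Documents beyond top_k keep their original first-stage order.
    return rescored + tail


def toy_score(query: str, doc: str) -> float:
    """Stand-in for a reward model: fraction of query words found in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# First-stage results in their original order (made-up titles).
results = [
    "solr release notes",
    "how chatgpt works",
    "chatgpt and information retrieval",
]
reranked = rerank("chatgpt information retrieval", results, toy_score, top_k=3)
print(reranked)
```

Limiting the model to the top-k keeps the expensive scoring step off the full result set, which is the usual reason re-ranking is done as a second stage.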