Webinar on ChatGPT
Abhilash Majumder (Intel SCG)
ChatGPT
• ChatGPT is trained using Reinforcement Learning from
Human Feedback (RLHF).
• ChatGPT uses the same methods as InstructGPT, but with
slight differences in the data collection setup.
• An initial model is first trained with supervised fine-tuning:
human AI trainers provided conversations in which they
played both sides, the user and an AI assistant.
• After supervised fine-tuning, a reward model is trained on human
rankings of model responses, and the model is then optimized
against that reward with PPO, an on-policy algorithm, to improve
the generated sequences.
ChatGPT- GPT3
• GPT-3 is an autoregressive
transformer model with 175
billion parameters. It uses
the same architecture/model
as GPT-2, including the
modified initialization,
pre-normalization, and
reversible tokenization,
with the exception that GPT-3
uses alternating dense and
locally banded sparse
attention patterns in the
layers of the transformer,
similar to the Sparse
Transformer.
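
To make "dense" versus "locally banded sparse" attention concrete, here is a minimal sketch of the two causal masks. It is an illustration only: the function name and the `window` size are hypothetical, and the slide does not give the exact pattern or window that GPT-3 uses.

```python
def local_band_causal_mask(seq_len, window):
    """Return mask[i][j] == True when position i may attend to position j.

    Causal: j <= i. Banded: only the most recent `window` positions are visible.
    `window` is an illustrative parameter, not a value from the slides.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# A dense causal mask is the special case where the window covers the whole sequence.
dense = local_band_causal_mask(4, window=4)
banded = local_band_causal_mask(4, window=2)

for row in banded:
    print("".join("x" if visible else "." for visible in row))
```

In the banded mask each token sees only itself and its nearest predecessors, which is what reduces the cost of those layers relative to dense attention.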
ChatGPT- PPO(A2C)
• There are two primary variants of PPO: PPO-Penalty
and PPO-Clip.
• PPO-Penalty approximately solves a KL-constrained
update like TRPO, but penalizes
the KL-divergence in the objective function
instead of making it a hard constraint, and
automatically adjusts the penalty coefficient
over the course of training so that it’s
scaled appropriately.
• PPO-Clip doesn’t have a KL-divergence term in
the objective and doesn’t have a constraint
at all. Instead, it relies on specialized
clipping in the objective function to remove
incentives for the new policy to get far from
the old policy.
• PPO is an on-policy algorithm.
• PPO can be used for environments with either
discrete or continuous action spaces.
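
The PPO-Clip idea described above can be sketched in a few lines. This is a minimal illustration of the standard clipped surrogate objective, not the code used for ChatGPT; the function name, the argument names, and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    logp_new / logp_old: log-probabilities of each action under the new and
    old policy. advantages: advantage estimate for each action.
    clip_eps is an illustrative default, not a value from the slides.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any gain from pushing the ratio
        # outside [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The new policy raised the probability of a good action by a factor of e,
# but the objective only credits a factor of 1 + clip_eps.
print(ppo_clip_objective([1.0], [0.0], [2.0]))
```

Because the objective stops improving once the ratio leaves the clip range, gradient ascent has no incentive to move the new policy far from the old one in a single update.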
ChatGPT
• In ChatGPT, the PPO stage
is guided by human
supervision: human
labelers rank several
responses sampled from
the initial LLM (GPT).
A reward model is
trained on these
rankings, and its scores
serve as the reward
that PPO maximizes.
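
One common way to turn human rankings into a reward function is a pairwise ranking loss, which InstructGPT-style reward models use. The sketch below assumes that setup; the function name and the scalar-reward inputs are hypothetical and are not taken from the slides.

```python
import math


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """Compute -log(sigmoid(r_preferred - r_rejected)) in a numerically stable way.

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large otherwise.
    """
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Agreeing with the human ranking gives a lower loss than disagreeing.
print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.0, 2.0))
```

A full ranking of several responses can be broken into all of its preferred/rejected pairs and the loss averaged over them.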
ChatGPT
• Both models, the initial
LLM and the copy being
tuned, are given the same
prompt and each produces
a response. The tuned
LLM's response is scored
with the reward function,
and that score is used to
update the parameters of
the tuned LLM to maximize
the reward (PPO rewards).
ChatGPT
• But we also don't want
the tuned LLM to deviate
too much from the
initial model's
response, which is what
the KL penalty is used
for. Otherwise the
optimization might
result in an LLM that
produces gibberish but
maximizes the reward
model score.
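
The KL penalty can be sketched as a correction to the reward-model score. This is an illustration under stated assumptions: the function and argument names are hypothetical, the KL term is estimated from per-token log-probabilities of the sampled response, and `beta=0.02` is only an example coefficient.

```python
def kl_penalized_reward(reward_score, logp_tuned, logp_initial, beta=0.02):
    """Reward-model score minus beta times a sampled KL estimate.

    logp_tuned / logp_initial: per-token log-probabilities of the same
    response under the tuned LLM and the frozen initial LLM.
    beta is an illustrative coefficient, not a value from the slides.
    """
    # Summed log-ratio over tokens: a single-sample estimate of
    # KL(tuned || initial) for this response.
    kl_estimate = sum(lt - li for lt, li in zip(logp_tuned, logp_initial))
    return reward_score - beta * kl_estimate


# A response that the tuned LLM finds far more likely than the initial LLM
# does has part of its reward taken away.
print(kl_penalized_reward(1.0, [-0.5, -0.5], [-1.5, -1.5], beta=0.1))
```

When both models assign the same probabilities the penalty is zero, so the tuned LLM only pays a cost for drifting away from the initial model.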
ChatGPT
• OpenAI Blog: https://openai.com/blog/chatgpt/
• InstructGPT: https://t.co/2VXhz0kK1o
• Minimalist Repository (in progress) :
https://github.com/abhilash1910/Minimalist-ChatGPT
• Other Repositories in RL/LLM :
https://github.com/abhilash1910/
ChatGPT
• Twitter: https://twitter.com/abhilash1396
• GitHub: https://github.com/abhilash1910/
• LinkedIn: https://www.linkedin.com/in/abhilash-majumder-1aa7b9138/
