Olivier Caelen & Marie-Alice Blete

Developing Apps with GPT-4 and ChatGPT
Build Intelligent Chatbots, Content Generators, and More

DATA
“Through practical examples and step-by-step instructions, the authors set forth the path to cutting-edge app development.”
—Tom Taulli, author of Generative AI (Apress)

“Blends theory and hands-on practice seamlessly, making the intricacies of GPT-4 and ChatGPT accessible.”
—Lucas Soares, ML engineer, Biometrid
This minibook is a comprehensive guide for Python developers
who want to learn how to build applications with large language
models. Authors Olivier Caelen and Marie-Alice Blete cover
the main features and benefits of GPT-4 and ChatGPT and
explain how they work. You’ll also get a step-by-step guide
for developing applications using the GPT-4 and ChatGPT
Python library, including text generation, Q&A, and content
summarization tools.
Written in clear and concise language, Developing Apps
with GPT-4 and ChatGPT includes easy-to-follow examples
to help you understand and apply the concepts to your
projects. Python code examples are available in a GitHub
repository, and the book includes a glossary of key terms.
Ready to harness the power of large language models in your
applications? This book is a must.
You’ll learn:
• The fundamentals and benefits of ChatGPT and GPT-4
and how they work
• How to integrate these models into Python-based
applications for NLP tasks
• How to develop applications using GPT-4 or ChatGPT
APIs in Python for text generation, question answering,
and content summarization, among other tasks
• Advanced GPT topics including prompt engineering,
fine-tuning models for specific tasks, plug-ins, LangChain,
and more
Olivier Caelen is an ML researcher
at paytech pioneer Worldline. He has
a PhD in machine learning and also
teaches ML courses at the University
of Brussels.
Marie-Alice Blete is a software
architect and data engineer in
Worldline’s R&D department. She
is also a developer advocate and
technology speaker.
US $59.99 CAN $74.99
ISBN: 978-1-098-15248-2
Olivier Caelen and Marie-Alice Blete
Developing Apps with
GPT-4 and ChatGPT
Build Intelligent Chatbots,
Content Generators, and More
Beijing • Boston • Farnham • Sebastopol • Tokyo
Table of Contents

Preface

1. GPT-4 and ChatGPT Essentials
   Introducing Large Language Models
   Exploring the Foundations of Language Models and NLP
   Understanding the Transformer Architecture and Its Role in LLMs
   Demystifying the Tokenization and Prediction Steps in GPT Models
   A Brief History: From GPT-1 to GPT-4
   GPT-1
   GPT-2
   GPT-3
   From GPT-3 to InstructGPT
   GPT-3.5, Codex, and ChatGPT
   LLM Use Cases and Example Products
   Be My Eyes
   Morgan Stanley
   Khan Academy
   Duolingo
   Yabble
   Waymark
   Inworld AI
   Beware of AI Hallucinations: Limitations and Considerations
   Optimizing GPT Models with Plug-ins and Fine-Tuning
   Summary

2. A Deep Dive into the GPT-4 and ChatGPT APIs
   Essential Concepts
   Models Available in the OpenAI API
   Trying GPT Models with the OpenAI Playground
   Getting Started: The OpenAI Python Library
   OpenAI Access and API Key
   “Hello World” Example
   Using ChatGPT and GPT-4
   Input Options for the Chat Completion Endpoint
   Output Result Format for the Chat Completion Endpoint
   From Text Completions to Functions
   Using Other Text Completion Models
   Input Options for the Text Completion Endpoint
   Output Result Format for the Text Completion Endpoint
   Considerations
   Pricing and Token Limitations
   Security and Privacy: Caution!
   Other OpenAI APIs and Functionalities
   Embeddings
   Moderation Model
   Whisper and DALL-E
   Summary (and Cheat Sheet)

3. Building Apps with GPT-4 and ChatGPT
   App Development Overview
   API Key Management
   Security and Data Privacy
   Software Architecture Design Principles
   LLM-Powered App Vulnerabilities
   Analyzing Inputs and Outputs
   The Inevitability of Prompt Injection
   Example Projects
   Project 1: Building a News Generator Solution
   Project 2: Summarizing YouTube Videos
   Project 3: Creating an Expert for Zelda BOTW
   Project 4: Voice Control
   Summary

4. Advanced GPT-4 and ChatGPT Techniques
   Prompt Engineering
   Designing Effective Prompts
   Thinking Step by Step
   Implementing Few-Shot Learning
   Improving Prompt Effectiveness
   Fine-Tuning
   Getting Started
   Fine-Tuning with the OpenAI API
   Fine-Tuning Applications
   Generating and Fine-Tuning Synthetic Data for an Email Marketing Campaign
   Cost of Fine-Tuning
   Summary

5. Advancing LLM Capabilities with the LangChain Framework and Plug-ins
   The LangChain Framework
   Dynamic Prompts
   Agents and Tools
   Memory
   Embeddings
   GPT-4 Plug-ins
   Overview
   The API
   The Plug-in Manifest
   The OpenAPI Specification
   Descriptions
   Summary

Conclusion

Glossary of Key Terms

Index
Preface
Within a mere five days of its release, ChatGPT reached an impressive one million
users, sending shock waves throughout the tech industry and beyond. As a side
effect, the OpenAI API for AI-powered text generation was suddenly brought to light,
despite having been available for three years. The ChatGPT interface showcased the
potential of such language models, and suddenly developers and inventors began to
realize the incredible possibilities available at their fingertips.
The field of natural language processing has made incredible technical progress over
the years, but until recently, use of the technology was limited to an elite few. The
OpenAI API and its accompanying libraries provide a ready-to-use solution for
anyone seeking to build AI-powered applications. There is no need to have powerful
hardware or deep knowledge of artificial intelligence; with just a few lines of code,
developers can integrate incredible features into their projects at a reasonable cost.
We combine our knowledge and experience, Olivier as a data scientist and Marie-Alice as a software engineer, to give you a broad understanding of how to develop applications with GPT-4 and ChatGPT. In these pages, you will find clear and detailed explanations of AI concepts, as well as user-friendly guidelines on how to integrate the OpenAI services effectively, securely, and cost-consciously.
This book is designed to be accessible to all, but some basic Python knowledge is preferred. Through clear explanations, example projects, and step-by-step instructions, we invite you to discover with us how GPT-4 and ChatGPT can transform the way we interact with machines.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program
elements such as variable or function names, databases, data types, environment
variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
Using Code Examples
Supplemental material (code examples, exercises, etc.) is available for download at
https://oreil.ly/DevAppsGPT_GitHub.
If you have a technical question or a problem using the code examples, please send
email to support@oreilly.com.
This book is here to help you get your job done. In general, if example code is offered
with this book, you may use it in your programs and documentation. You do not
need to contact us for permission unless you’re reproducing a significant portion of the
code. For example, writing a program that uses several chunks of code from this book
does not require permission. Selling or distributing examples from O’Reilly books does
require permission. Answering a question by citing this book and quoting example
code does not require permission. Incorporating a significant amount of example code
from this book into your product’s documentation does require permission.
Acknowledgments
Writing a book on one of the fastest-moving AI topics would not have been possible
without the help of many people. We would like to thank the incredible O’Reilly team
for their support, advice, and on-point comments; especially Corbin Collins, Nicole
Butterfield, Clare Laylock, Suzanne Huston, and Audrey Doyle.
The book also benefited from the help of exceptional reviewers who took a lot of time
to provide invaluable feedback. Many thanks to Tom Taulli, Lucas Soares, and Leonie
Monigatti.
Many thanks to our Worldline Labs colleagues for their insights and never-ending
discussions on ChatGPT and the OpenAI services; especially Liyun He Guelton,
Guillaume Coter, Luxin Zhang, and Patrik De Boe. A huge thank you as well to
Worldline’s team of developer advocates who provided support and encouragement
from the start; especially Jean-Francois James and Fanilo Andrianasolo.
And finally, we thank our friends and family for bearing with us during our ChatGPT
craze, allowing us to release this book in such a short time.
CHAPTER 1
GPT-4 and ChatGPT Essentials
Imagine a world where you can communicate with computers as quickly as you can
with your friends. What would that look like? What applications could you create?
This is the world that OpenAI is helping to build with its GPT models, bringing
human-like conversational capabilities to our devices. As the latest advancements
in AI, GPT-4 and other GPT models are large language models (LLMs) trained on
massive amounts of data, enabling them to recognize and generate human-like text
with very high accuracy.
The implications of these AI models go far beyond simple voice assistants. Thanks to
OpenAI’s models, developers can now exploit the power of natural language processing (NLP) to create applications that understand our needs in ways that were once
science fiction. From innovative customer support systems that learn and adapt to
personalized educational tools that understand each student’s unique learning style,
GPT-4 and ChatGPT open up a whole new world of possibilities.
But what are GPT-4 and ChatGPT? The goal of this chapter is to take a deep dive into
the foundations, origins, and key features of these AI models. By understanding the
basics of these models, you will be well on your way to building the next generation of
LLM-powered applications.
Introducing Large Language Models
This section lays down the fundamental building blocks that have shaped the development of GPT-4 and ChatGPT. We aim to provide a comprehensive understanding of language models and NLP, the role of transformer architectures, and the tokenization and prediction processes within GPT models.
Exploring the Foundations of Language Models and NLP
As LLMs, GPT-4 and ChatGPT are the latest type of model obtained in the field of
NLP, which is itself a subfield of machine learning (ML) and AI. Before delving into
GPT-4 and ChatGPT, it is essential to take a look at NLP and its related fields.
There are different definitions of AI, but one of them, more or less the consensus,
says that AI is the development of computer systems that can perform tasks that
typically require human intelligence. With this definition, many algorithms fall under
the AI umbrella. Consider, for example, the traffic prediction task in GPS applications
or the rule-based systems used in strategic video games. In these examples, seen from
the outside, the machine seems to require intelligence to accomplish these tasks.
ML is a subset of AI. In ML, we do not try to directly implement the decision rules
used by the AI system. Instead, we try to develop algorithms that allow the system to
learn by itself from examples. Since the 1950s, when ML research began, many ML
algorithms have been proposed in the scientific literature.
Among them, deep learning algorithms have come to the fore. Deep learning is a
branch of ML that focuses on algorithms inspired by the structure of the brain. These
algorithms are called artificial neural networks. They can handle very large amounts of
data and perform very well on tasks such as image and speech recognition and NLP.
GPT-4 and ChatGPT are based on a particular type of deep learning algorithm called
transformers. Transformers are like reading machines. They pay attention to different
parts of a sentence or block of text to understand its context and produce a coherent
response. They can also understand the order of words in a sentence and their context. This makes them highly effective at tasks such as language translation, question
answering, and text generation. Figure 1-1 illustrates the relationships among these
terms.
Figure 1-1. A nested set of technologies from AI to transformers
NLP is a subfield of AI focused on enabling computers to process, interpret, and generate natural human language. Modern NLP solutions are based on ML algorithms.
The goal of NLP is to allow computers to process natural language text. This goal
covers a wide range of tasks:
Text classification
Categorizing input text into predefined groups. This includes, for example, sentiment analysis and topic categorization. Companies can use sentiment analysis
to understand customers’ opinions about their services. Email filtering is an
example of topic categorization in which email can be put into categories such as
“Personal,” “Social,” “Promotions,” and “Spam.”
Automatic translation
Automatic translation of text from one language to another. Note that this can
include areas like translating code from one programming language to another,
such as from Python to C++.
Question answering
Answering questions based on a given text. For example, an online customer
service portal could use an NLP model to answer FAQs about a product, or
educational software could use NLP to provide answers to students’ questions
about the topic being studied.
Text generation
Generating a coherent and relevant output text based on a given input text, called
a prompt.
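All of these tasks share a common shape: text goes in, text comes out. A rough sketch of how each one can be framed as a prompt to the same model is shown below; the template wordings are our own illustrative examples, not anything prescribed by OpenAI:

```python
# Illustrative only: each classic NLP task reduces to a text-in/text-out prompt.
# The template wordings here are hypothetical, not from the OpenAI documentation.

def build_prompt(task: str, text: str) -> str:
    templates = {
        "classification": "Classify the sentiment of this review as Positive or Negative:\n{text}",
        "translation": "Translate the following English sentence into French:\n{text}",
        "question_answering": "Answer the question using only the context below.\n{text}",
        "generation": "{text}",  # a plain prompt: the model simply continues the text
    }
    return templates[task].format(text=text)

prompt = build_prompt("translation", "Alice enjoyed the sunny weather in Brussels.")
print(prompt)
```

The point of the sketch is that no task-specific model is needed: only the prompt changes.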
As mentioned earlier, LLMs are ML models trying to solve text generation tasks,
among others. LLMs enable computers to process, interpret, and generate human
language, allowing for more effective human–machine communication. To be able to
do this, LLMs analyze or train on vast amounts of text data and thereby learn patterns
and relationships between words in sentences. A variety of data sources can be used
to perform this learning process. This data can include text from Wikipedia, Reddit,
the archive of thousands of books, or even the archive of the internet itself. Given
an input text, this learning process allows the LLMs to make predictions about the
likeliest following words and, in this way, can generate meaningful responses to the
input text. The modern language models, published in the past few months, are so
large and have been trained on so many texts that they can now directly perform
most NLP tasks, such as text classification, machine translation, question answering,
and many others. The GPT-4 and ChatGPT models are modern LLMs that excel at
text generation tasks.
The development of LLMs goes back several years. It started with simple language
models such as n-grams, which tried to predict the next word in a sentence based
on the previous words. N-gram models use frequency to do this. The predicted
next word is the most frequent word that follows the previous words in the text
the n-gram model was trained on. While this approach was a good start, n-gram models’ limited grasp of context and grammar resulted in inconsistent text generation.
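To make the n-gram idea concrete, here is a minimal bigram (2-gram) model, a toy written from scratch for illustration: it predicts the next word purely from the frequency of what followed the previous word in its training text.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts: dict, word: str) -> str:
    """Return the most frequent follower of `word` in the training text."""
    return counts[word.lower()].most_common(1)[0][0]

corpus = (
    "the weather is nice today so i decided to go for a walk "
    "the weather is bad today so i decided to stay home"
)
model = train_bigram(corpus)
print(predict_next(model, "weather"))  # → "is" ("is" follows "weather" in both sentences)
```

The model sees only one word of context, which is exactly why its output degrades on anything longer than a short phrase.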
To improve the performance of n-gram models, more advanced learning algorithms
were introduced, including recurrent neural networks (RNNs) and long short-term
memory (LSTM) networks. These models could learn longer sequences and analyze
the context better than n-grams, but they still needed help processing large amounts
of data efficiently. These types of recurrent models were the most efficient ones for
a long time and therefore were the most used in tools such as automatic machine
translation.
Understanding the Transformer Architecture and Its Role in LLMs
The Transformer architecture revolutionized NLP, primarily because transformers
effectively address one of the critical limitations of previous NLP models such as
RNNs: their struggle with long text sequences and maintaining context over these
lengths. In other words, while RNNs tended to forget the context in longer sequences
(the infamous “catastrophic forgetting”), transformers came with the ability to handle
and encode this context effectively.
The central pillar of this revolution is the attention mechanism, a simple yet powerful
idea. Instead of treating all words in a text sequence as equally important, the model
“pays attention” to the most relevant terms for each step of its task. Cross-attention
and self-attention are two architectural blocks based on this attention mechanism,
and they are often found in LLMs. The Transformer architecture makes extensive use
of these cross-attention and self-attention blocks.
Cross-attention helps the model determine the relevance of the different parts of
the input text for accurately predicting the next word in the output text. It’s like a
spotlight that shines on words or phrases in the input text, highlighting the relevant
information needed to make the next word prediction while ignoring less important
details.
To illustrate this, let’s take an example of a simple sentence translation task. Imagine
we have an input English sentence, “Alice enjoyed the sunny weather in Brussels,”
which should be translated into French as “Alice a profité du temps ensoleillé à
Bruxelles.” In this example, let us focus on generating the French word ensoleillé,
which means sunny. For this prediction, cross-attention would give more weight to
the English words sunny and weather since they are both relevant to the meaning
of ensoleillé. By focusing on these two words, cross-attention helps the model generate an accurate translation for this part of the sentence. Figure 1-2 illustrates this
example.
Figure 1-2. Cross-attention uses the attention mechanism to focus on essential parts of
the input text (English sentence) to predict the next word in the output text (French
sentence)
Self-attention refers to the ability of a model to focus on different parts of its input
text. In the context of NLP, the model can evaluate the importance of each word in
a sentence with the other words. This allows it to better understand the relationships
between the words and helps the model build new concepts from multiple words in
the input text.
As a more specific example, consider the following: “Alice received praise from her
colleagues.” Assume that the model is trying to understand the meaning of the word
her in the sentence. The self-attention mechanism assigns different weights to the words
in the sentence, highlighting the words relevant to her in this context. In this example,
self-attention would place more weight on the words Alice and colleagues. Self-attention
helps the model build new concepts from these words. In this example, one of the
concepts that could emerge would be “Alice’s colleagues,” as shown in Figure 1-3.
Figure 1-3. Self-attention allows the emergence of the “Alice’s colleagues” concept
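The weighting that self-attention performs can be sketched numerically. The following is a toy scaled dot-product attention over made-up word vectors; the vectors and their dimension are invented for illustration, whereas a real model learns them during training:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention_weights(query, keys):
    """Scaled dot-product attention: how much the query word attends to each word."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    return softmax(scores)

# Hypothetical 2-D vectors for "Alice received praise from her colleagues"
vectors = {
    "Alice":      [1.0, 0.1],
    "received":   [0.1, 0.2],
    "praise":     [0.2, 0.3],
    "from":       [0.0, 0.1],
    "her":        [0.9, 0.2],   # deliberately made similar to "Alice"
    "colleagues": [0.3, 0.9],
}
words = list(vectors)
weights = self_attention_weights(vectors["her"], [vectors[w] for w in words])
for w, a in zip(words, weights):
    print(f"{w:<10} {a:.2f}")
```

Because the vector for her was chosen to resemble the one for Alice, the word her attends most strongly to Alice, mirroring the example above.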
Unlike the recurrent architecture, transformers also have the advantage of being
easily parallelized. This means the Transformer architecture can process multiple
parts of the input text simultaneously rather than sequentially. This allows faster
computation and training because different parts of the model can work in parallel
without waiting for previous steps to complete, unlike recurrent architectures, which
require sequential processing. The parallel processing capability of transformer models fits perfectly with the architecture of graphics processing units (GPUs), which are
designed to handle multiple computations simultaneously. Therefore, GPUs are ideal
for training and running these transformer models because of their high parallelism
and computational power. This advance allowed data scientists to train models on
much larger datasets, paving the way for developing LLMs.
The Transformer architecture, introduced in 2017 by Vaswani et al. from Google
in the paper “Attention Is All You Need”, was originally developed for sequence-to-sequence tasks such as machine translation. A standard transformer consists of two
primary components: an encoder and a decoder, both of which rely heavily on
attention mechanisms. The task of the encoder is to process the input text, identify
valuable features, and generate a meaningful representation of that text, known as
embedding. The decoder then uses this embedding to produce an output, such as a
translation or summary. This output effectively interprets the encoded information.
Generative pre-trained transformers, commonly known as GPT, are a family of models that are based on the Transformer architecture and that specifically utilize the
decoder part of the original architecture. In GPT, the encoder is not present, so there
is no need for cross-attention to integrate the embeddings produced by an encoder.
As a result, GPT relies solely on the self-attention mechanism within the decoder to
generate context-aware representations and predictions. Note that other well-known
models, such as BERT (Bidirectional Encoder Representations from Transformers),
are based on the encoder part. We don’t cover this type of model in this book.
Figure 1-4 illustrates the evolution of these different models.
Figure 1-4. The evolution of NLP techniques from n-grams to the emergence of LLMs
Demystifying the Tokenization and Prediction Steps in GPT Models
LLMs in the GPT family receive a prompt as input, and in response they generate a
text. This process is known as text completion. For example, the prompt could be “The
weather is nice today, so I decided to” and the model output might be “go for a walk”.
You may be wondering how the LLM model builds this output text from the input
prompt. As you will see, it’s mostly just a question of probabilities.
When a prompt is sent to an LLM, it first breaks the input into smaller pieces
called tokens. These tokens represent single words, parts of words, or spaces and
punctuation. For example, the preceding prompt could be broken like this: [“The”,
“wea”, “ther”, “is”, “nice”, “today”, “,”, “so”, “I”, “de”, “ci”, “ded”, “to”]. Each language
model comes with its own tokenizer. The GPT-4 tokenizer is not available at the time
of this writing, but you can test the GPT-3 tokenizer.
A rule of thumb for understanding tokens in terms of word length
is that 100 tokens equal approximately 75 words for an English
text.
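GPT tokenizers are built with byte-pair encoding (BPE): starting from single characters, the most frequent adjacent pair is repeatedly merged into a new token. The following is a minimal sketch of that merge loop written from scratch on an invented toy corpus; it is not OpenAI’s actual tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Find the adjacent token pair that occurs most often."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("the weather whether")  # start from single characters
for _ in range(4):                    # perform a few BPE merge steps
    pair = most_frequent_pair(tokens)
    tokens = merge_pair(tokens, pair)
print(tokens)
```

After a few merges, frequent fragments such as “the” and “ther” emerge as single tokens, which is why real GPT tokens often correspond to parts of words rather than whole words.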
Thanks to the attention principle and the Transformer architecture introduced earlier, the LLM processes these tokens and can interpret the relationships between them
and the overall meaning of the prompt. The Transformer architecture allows a model
to efficiently identify the critical information and the context within the text.
To create a new sentence, the LLM predicts the tokens most likely to follow, based
on the context of the prompt. OpenAI produced two versions of GPT-4, with context
windows of 8,192 tokens and 32,768 tokens. Unlike the previous recurrent models,
which had difficulty handling long input sequences, the Transformer architecture
with the attention mechanism allows the modern LLM to consider the context as a
whole. Based on this context, the model assigns a probability score for each potential
subsequent token. The token with the highest probability is then selected as the next
token in the sequence. In our example, after “The weather is nice today, so I decided
to”, the next best token could be “go”.
This process is then repeated, but now the context becomes “The weather is nice
today, so I decided to go”, where the previously predicted token “go” is added to the
original prompt. The second token that the model might predict could be “for”. This
process is repeated until a complete sentence is formed: “go for a walk”. This process
relies on the LLM’s ability to learn the next most probable word from massive text
data. Figure 1-5 illustrates this process.
Figure 1-5. The completion process is iterative, token by token
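The greedy loop just described can be sketched in a few lines. The probability table here is a hypothetical stand-in for a real model, which would score every token in its vocabulary from the full context at each step:

```python
# Hypothetical next-token probabilities; a real LLM computes these from
# the entire context using its Transformer layers, not a lookup table.
NEXT_TOKEN_PROBS = {
    "to":   {"go": 0.6, "stay": 0.3, "eat": 0.1},
    "go":   {"for": 0.7, "home": 0.3},
    "for":  {"a": 0.8, "the": 0.2},
    "a":    {"walk": 0.5, "run": 0.4, "swim": 0.1},
}

def complete(prompt: str, max_tokens: int = 4) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS.get(tokens[-1])
        if probs is None:
            break  # no prediction available for this context
        # Greedy decoding: always append the highest-probability token.
        tokens.append(max(probs, key=probs.get))
    return " ".join(tokens)

print(complete("The weather is nice today, so I decided to"))
# → "The weather is nice today, so I decided to go for a walk"
```

Note that this sketch always picks the top token (greedy decoding); real deployments usually sample from the distribution, which is what the temperature parameter controls.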
A Brief History: From GPT-1 to GPT-4
In this section, we will review the evolution of the OpenAI GPT models from GPT-1
to GPT-4.
GPT-1
In mid-2018, just one year after the invention of the Transformer architecture,
OpenAI published a paper titled “Improving Language Understanding by Generative
Pre-Training”, by Radford et al., in which the company introduced the Generative
Pre-trained Transformer, also known as GPT-1.
Before GPT-1, the common approach to building high-performance NLP neural
models relied on supervised learning. These learning techniques use large amounts of
manually labeled data. For example, in a sentiment analysis task where the goal is to
classify whether a given text has positive or negative sentiment, a common strategy
would require collecting thousands of manually labeled text examples to build an
effective classification model. However, the need for large amounts of well-annotated,
supervised data has limited the performance of these techniques because such datasets are both difficult and expensive to generate.
In their paper, the authors of GPT-1 proposed a new learning process in which an
unsupervised pre-training step is introduced. In this pre-training step, no labeled
data is needed. Instead, the model is trained to predict what the next token is.
Thanks to the use of the Transformer architecture, which allows parallelization, this
pre-training was performed on a large amount of data. For the pre-training, the
GPT-1 model used the BookCorpus dataset, which contains the text of approximately
11,000 unpublished books. This dataset was initially presented in 2015 in the scientific paper “Aligning Books and Movies: Towards Story-Like Visual Explanations by
Watching Movies and Reading Books” by Zhu et al., and was initially made available
on a University of Toronto web page. However, today the official version of the
original dataset is no longer publicly accessible.
The GPT-1 model was found to be effective in a variety of basic completion tasks.
In the unsupervised learning phase, the model learned to predict the next item in
the texts of the BookCorpus dataset. However, since GPT-1 is a small model, it was
unable to perform complex tasks without fine-tuning. Therefore, fine-tuning was
performed as a second supervised learning step on a small set of manually labeled
data to adapt the model to a specific target task. For example, in a classification task
such as sentiment analysis, it may be necessary to retrain the model on a small set of
manually labeled text examples to achieve reasonable accuracy. This process allowed
the parameters learned in the initial pre-training phase to be modified to better fit the
task at hand.
Despite its relatively small size, GPT-1 showed remarkable performance on several
NLP tasks using only a small amount of manually labeled data for fine-tuning. The
GPT-1 architecture consisted of a decoder similar to the original transformer, which
was introduced in 2017 and had 117 million parameters. This first GPT model paved
the way for more powerful models with larger datasets and more parameters to take
better advantage of the potential of the Transformer architecture.
GPT-2
In early 2019, OpenAI proposed GPT-2, a scaled-up version of the GPT-1 model that
increased the number of parameters and the size of the training dataset tenfold. The
number of parameters of this new version was 1.5 billion, trained on 40 GB of text. In
November 2019, OpenAI released the full version of the GPT-2 language model.
GPT-2 is publicly available and can be downloaded from Hugging
Face or GitHub.
GPT-2 showed that training a larger language model on a larger dataset improves
the ability of a language model to process tasks and outperforms the state of the art
on many jobs. It also showed that even larger language models can process natural
language better.
GPT-3
OpenAI released version 3 of GPT in June 2020. The main differences between
GPT-2 and GPT-3 are the size of the model and the quantity of data used for the
training. GPT-3 is a much larger model than GPT-2, with 175 billion parameters,
allowing it to capture more complex patterns. In addition, GPT-3 was trained on a
more extensive dataset. This includes Common Crawl, a large web archive containing
text from billions of web pages and other sources, such as Wikipedia. This training
dataset, which includes content from websites, books, and articles, allows GPT-3 to
develop a deeper understanding of the language and context. As a result, GPT-3
demonstrates improved performance on a variety of linguistics tasks. It also demonstrates superior coherence and creativity in its generated texts. It is even capable of
writing code snippets, such as SQL queries, and performing other intelligent tasks.
Furthermore, GPT-3 eliminates the need for a fine-tuning step, which was mandatory
for its predecessors.
However, with GPT-3 there is a problem of misalignment between the tasks given
by end users and what the model has seen during its training. As we have seen,
language models are trained to predict the next token based on the input context.
This training process is not necessarily directly aligned with the tasks end users
want the model to perform. In addition, increasing the size of language models does
not inherently make them better at following user intent or instructions. Moreover,
models like GPT-3 were trained on data from various sources on the internet.
Although a cleanup was made in the selection of sources, the training data may
contain false or problematic text, including racist or sexist language, misinformation,
and disinformation. As a result, the model may sometimes produce incorrect or even
toxic output. In 2021 a new release of the GPT-3 model was published, called
the Instruct series. Unlike the original GPT-3 base models, the instruct models are
optimized by reinforcement learning from human feedback, meaning that they use
human feedback to learn and to improve over time. This allows the models to learn
from human instructions while making them more truthful and less toxic.
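The next-token objective mentioned above can be made concrete with a toy example. The sketch below is only an analogy (GPT models use neural networks over huge corpora, not bigram counts): it "trains" by counting which token follows which, then predicts the most frequent continuation.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count how often each token follows each preceding token."""
    tokens = corpus.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, token: str) -> str:
    """Return the most frequent continuation seen during 'training'."""
    return model[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran on the grass"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

Note that this objective only captures what typically comes next in the training text; nothing in it guarantees the continuation is what a user actually wanted, which is exactly the misalignment discussed above.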
10 | Chapter 1: GPT-4 and ChatGPT Essentials
To illustrate the difference, we input the prompt “Explain what is meant by time
complexity”, and we receive the following:
• With the standard GPT-3 model, we obtain the output: “Explain what is meant by
space complexity. Explain what is meant by the big-O notation.”
• With the InstructGPT model, we obtain: “Time complexity is a way of measuring
the amount of time it takes for an algorithm to run and complete its task. It
is usually expressed using Big O notation, which measures the complexity of an
algorithm in terms of the number of operations it performs. The time complexity of
an algorithm is important because it determines how efficient the algorithm is and
how well it scales with larger inputs.”
We can see that for the same input, the first model cannot answer the question (the
answer is even weird), whereas the second model does answer the question. It is,
of course, possible to obtain the desired response with a standard GPT-3 model.
However, unlike with the instruction models, it is necessary to apply specific prompt
design and optimization techniques to obtain the desired output from a standard
GPT-3 model. This technique, called prompt engineering, will be detailed in the
coming chapters.
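As a preview, prompt engineering often amounts to carefully assembling an instruction, a few worked examples, and the user's query into the text sent to the model. A minimal sketch (the helper name and Q/A format here are our own illustration, not part of the OpenAI library):

```python
def build_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble an instruction, few-shot examples, and the user's query
    into a single prompt string for a completion-style model."""
    parts = [instruction.strip(), ""]
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}\n")
    # End with an unanswered question so the model completes the answer.
    parts.append(f"Q: {query}\nA:")
    return "\n".join(parts)

prompt = build_prompt(
    "Answer the question concisely.",
    [("What is 2 + 2?", "4")],
    "Explain what is meant by time complexity",
)
print(prompt)
```

The few-shot examples steer a base model toward answering rather than merely continuing the text, which is what the instruction models achieve without such scaffolding.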
From GPT-3 to InstructGPT
OpenAI explains how the Instruct series was constructed in the scientific paper
“Training Language Models to Follow Instructions with Human Feedback” by
Ouyang et al.
The training recipe has two main stages to go from a GPT-3 model to an instructed
GPT-3 model: supervised fine-tuning (SFT) and reinforcement learning from human
feedback (RLHF). In each stage, the results of the prior stage are fine-tuned. That is,
the SFT stage receives the GPT-3 model and returns a new model, which is sent to the
RLHF stage to obtain the instructed version.
Figure 1-6, from the scientific paper from OpenAI, details the entire process.
Figure 1-6. The steps to obtain the instructed models (redrawn from an image by
Ouyang et al.)
We will step through these stages one by one.
In the SFT stage, the original GPT-3 model is fine-tuned with straightforward super‐
vised learning (step 1 in Figure 1-6). OpenAI has a collection of prompts made by
end users. The process starts with the random selection of a prompt from the set of
available prompts. A human (called a labeler) is then asked to write an example of an
ideal answer to this prompt. This process is repeated thousands of times to obtain a
supervised training set composed of prompts and the corresponding ideal responses.
This dataset is then used to fine-tune the GPT-3 model to give more consistent
answers to user requests. The resulting model is called the SFT model.
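The supervised dataset described here can be pictured as prompt/ideal-answer pairs, commonly stored one example per line as JSON Lines. A hedged sketch (the field names and examples are illustrative, not OpenAI's exact format):

```python
import json

# Illustrative (prompt, ideal answer) pairs as written by human labelers.
records = [
    {"prompt": "Explain what is meant by time complexity",
     "ideal_answer": "Time complexity measures how an algorithm's running "
                     "time grows with the size of its input."},
    {"prompt": "Summarize the water cycle in one sentence",
     "ideal_answer": "Water evaporates, condenses into clouds, and falls "
                     "back to Earth as precipitation."},
]

# Serialize to JSON Lines: one training example per line.
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl.splitlines()[0])
```

In practice this step is repeated thousands of times; the value of the dataset comes from the quality and coverage of the labelers' answers, not its format.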
The RLHF stage is divided into two substeps. First a reward model (RM) is built
(step 2 in Figure 1-6), and then the RM is used for reinforcement learning (step 3 in
Figure 1-6).
The goal of the RM is to automatically give a score to a response to a prompt. When
the response matches what is indicated in the prompt, the RM score should be high;
when it doesn’t match, it should be low. To construct the RM, OpenAI begins by
randomly selecting a question and using the SFT model to produce several possible
answers. As we will see later, it is possible to produce many responses with the same
input prompt via a parameter called temperature. A human labeler is then asked to
rank the responses based on criteria such as fit with the prompt and toxicity of the
response. After running this procedure many times, the resulting dataset of ranked
responses is used to fine-tune the SFT model for scoring. This RM will be used to
build the final InstructGPT model.
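The ranking data is typically turned into a pairwise objective: the RM should score the labeler's preferred response higher than the rejected one. A toy sketch of such a loss (a Bradley-Terry-style objective; the real RM is a fine-tuned neural network, and these scalar scores are stand-ins):

```python
import math

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-sigmoid of the score margin: small when the chosen
    response is scored higher than the rejected one, large otherwise."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# An RM that ranks the preferred answer higher incurs a low loss...
print(pairwise_loss(2.0, -1.0))
# ...while one that ranks it lower incurs a high loss.
print(pairwise_loss(-1.0, 2.0))
```

Minimizing this loss over many labeled pairs pushes the RM to reproduce the human labelers' preferences.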
The final step in training InstructGPT models involves reinforcement learning, which
is an iterative process. It starts with an initial generative model, such as the SFT
model. Then a random prompt is selected, and the model predicts an output, which
the RM evaluates. Based on the reward received, the generative model is updated
accordingly. This process can be repeated countless times without human interven‐
tion, providing a more efficient and automated approach to adapting the model for
better performance.
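The loop just described can be sketched in a few lines. This toy version is our own drastic simplification (OpenAI uses PPO over a full language model, not a bandit): the "policy" is a set of sampling weights over canned responses, nudged toward whatever a stub reward model scores highly.

```python
import random

random.seed(0)

responses = ["toxic rant", "helpful answer", "off-topic text"]
weights = [1.0, 1.0, 1.0]  # the "policy": sampling weight per response

def reward_model(response: str) -> float:
    """Stub RM: rewards helpfulness, penalizes toxicity."""
    return {"toxic rant": -1.0, "helpful answer": 1.0, "off-topic text": 0.0}[response]

for step in range(200):
    # Sample a response in proportion to the current policy weights.
    idx = random.choices(range(len(responses)), weights=weights)[0]
    # Score it with the reward model and update the policy accordingly.
    weights[idx] = max(0.01, weights[idx] + 0.1 * reward_model(responses[idx]))

best = responses[max(range(len(responses)), key=lambda i: weights[i])]
print(best)  # the policy drifts toward the highly rewarded response
```

As in the real pipeline, no human appears inside the loop: the RM stands in for human judgment, which is what makes the process cheap to repeat at scale.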
InstructGPT models are better at producing accurate completions for the instructions
people give in their prompts. OpenAI recommends using the InstructGPT series
rather than the original series.
GPT-3.5, Codex, and ChatGPT
In March 2022, OpenAI made available new versions of GPT-3. These new models
can edit text or insert content into text. They have been trained on data through
June 2021 and are described as more powerful than previous versions. At the end
of November 2022, OpenAI began referring to these models as belonging to the
GPT-3.5 series.