Olivier Caelen &
Marie-Alice Blete
Developing Apps
with GPT-4 and
ChatGPT
Build Intelligent Chatbots, Content Generators,
and More
DATA
“Through practical
examples and step-
by-step instructions,
the authors set
forth the path to
cutting-edge app
development.”
—Tom Taulli
Author of Generative AI (Apress)
“Blends theory and
hands-on practice
seamlessly, making
the intricacies of
GPT-4 and ChatGPT
accessible.”
—Lucas Soares
ML engineer, Biometrid
Developing Apps with GPT-4 & ChatGPT
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
This minibook is a comprehensive guide for Python developers
who want to learn how to build applications with large language
models. Authors Olivier Caelen and Marie-Alice Blete cover
the main features and benefits of GPT-4 and ChatGPT and
explain how they work. You’ll also get a step-by-step guide
for developing applications using the GPT-4 and ChatGPT
Python library, including text generation, Q&A, and content
summarization tools.
Written in clear and concise language, Developing Apps
with GPT-4 and ChatGPT includes easy-to-follow examples
to help you understand and apply the concepts to your
projects. Python code examples are available in a GitHub
repository, and the book includes a glossary of key terms.
Ready to harness the power of large language models in your
applications? This book is a must.
You’ll learn:
• The fundamentals and benefits of ChatGPT and GPT-4
and how they work
• How to integrate these models into Python-based
applications for NLP tasks
• How to develop applications using GPT-4 or ChatGPT
APIs in Python for text generation, question answering,
and content summarization, among other tasks
• Advanced GPT topics including prompt engineering,
fine-tuning models for specific tasks, plug-ins, LangChain,
and more
Olivier Caelen is an ML researcher
at paytech pioneer Worldline. He has
a PhD in machine learning and also
teaches ML courses at the University
of Brussels.
Marie-Alice Blete is a software
architect and data engineer in
Worldline’s R&D department. She
is also a developer advocate and
technology speaker.
US $59.99 CAN $74.99
ISBN: 978-1-098-15248-2
Olivier Caelen and Marie-Alice Blete
Developing Apps with
GPT-4 and ChatGPT
Build Intelligent Chatbots,
Content Generators, and More
Beijing • Boston • Farnham • Sebastopol • Tokyo
Developing Apps with GPT-4 and ChatGPT
by Olivier Caelen and Marie-Alice Blete
Copyright © 2023 Olivier Caelen and Marie-Alice Blete. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional
sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Nicole Butterfield
Development Editor: Corbin Collins
Production Editor: Clare Laylock
Copyeditor: Audrey Doyle
Proofreader: Heather Walley
Indexer: Sue Klefstad
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
September 2023: First Edition
Revision History for the First Edition
2023-08-29: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098152482 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Developing Apps with GPT-4 and
ChatGPT, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors and do not represent the publisher’s views.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use
of or reliance on this work. Use of the information and instructions contained in this work is at your
own risk. If any code samples or other technology this work contains or describes is subject to open
source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
1. GPT-4 and ChatGPT Essentials. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Introducing Large Language Models 1
Exploring the Foundations of Language Models and NLP 2
Understanding the Transformer Architecture and Its Role in LLMs 4
Demystifying the Tokenization and Prediction Steps in GPT Models 7
A Brief History: From GPT-1 to GPT-4 8
GPT-1 8
GPT-2 9
GPT-3 10
From GPT-3 to InstructGPT 11
GPT-3.5, Codex, and ChatGPT 13
LLM Use Cases and Example Products 15
Be My Eyes 15
Morgan Stanley 16
Khan Academy 16
Duolingo 17
Yabble 17
Waymark 18
Inworld AI 18
Beware of AI Hallucinations: Limitations and Considerations 19
Optimizing GPT Models with Plug-ins and Fine-Tuning 22
Summary 23
2. A Deep Dive into the GPT-4 and ChatGPT APIs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Essential Concepts 26
Models Available in the OpenAI API 26
Trying GPT Models with the OpenAI Playground 28
Getting Started: The OpenAI Python Library 33
OpenAI Access and API Key 34
“Hello World” Example 35
Using ChatGPT and GPT-4 37
Input Options for the Chat Completion Endpoint 38
Output Result Format for the Chat Completion Endpoint 40
From Text Completions to Functions 41
Using Other Text Completion Models 43
Input Options for the Text Completion Endpoint 44
Output Result Format for the Text Completion Endpoint 45
Considerations 46
Pricing and Token Limitations 46
Security and Privacy: Caution! 47
Other OpenAI APIs and Functionalities 47
Embeddings 48
Moderation Model 50
Whisper and DALL-E 53
Summary (and Cheat Sheet) 53
3. Building Apps with GPT-4 and ChatGPT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
App Development Overview 55
API Key Management 55
Security and Data Privacy 57
Software Architecture Design Principles 58
LLM-Powered App Vulnerabilities 58
Analyzing Inputs and Outputs 59
The Inevitability of Prompt Injection 60
Example Projects 61
Project 1: Building a News Generator Solution 61
Project 2: Summarizing YouTube Videos 63
Project 3: Creating an Expert for Zelda BOTW 66
Project 4: Voice Control 72
Summary 78
4. Advanced GPT-4 and ChatGPT Techniques. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Prompt Engineering 79
Designing Effective Prompts 80
Thinking Step by Step 86
Implementing Few-Shot Learning 88
Improving Prompt Effectiveness 89
Fine-Tuning 91
Getting Started 91
Fine-Tuning with the OpenAI API 94
Fine-Tuning Applications 97
Generating and Fine-Tuning Synthetic Data for an Email
Marketing Campaign 99
Cost of Fine-Tuning 105
Summary 105
5. Advancing LLM Capabilities with the LangChain Framework and Plug-ins. . . . . . . . . 109
The LangChain Framework 109
Dynamic Prompts 111
Agents and Tools 112
Memory 115
Embeddings 116
GPT-4 Plug-ins 120
Overview 122
The API 122
The Plug-in Manifest 123
The OpenAPI Specification 124
Descriptions 126
Summary 127
Conclusion 127
Glossary of Key Terms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Preface
Within a mere five days of its release, ChatGPT reached an impressive one million
users, sending shock waves throughout the tech industry and beyond. As a side
effect, the OpenAI API for AI-powered text generation was suddenly brought to light,
despite having been available for three years. The ChatGPT interface showcased the
potential of such language models, and suddenly developers and inventors began to
realize the incredible possibilities available at their fingertips.
The field of natural language processing has made incredible technical progress over
the years, but until recently, use of the technology was limited to an elite few. The
OpenAI API and its accompanying libraries provide a ready-to-use solution for
anyone seeking to build AI-powered applications. There is no need to have powerful
hardware or deep knowledge of artificial intelligence; with just a few lines of code,
developers can integrate incredible features into their projects at a reasonable cost.
We combine our knowledge and experience, Olivier as a data scientist and Marie-
Alice as a software engineer, to give you a broad understanding of how to develop
applications with GPT-4 and ChatGPT. In these pages, you will find clear and
detailed explanations of AI concepts, as well as user-friendly guidelines on how to
integrate the OpenAI services effectively, securely, and cost-consciously.
This book is designed to be accessible to all, but some basic Python knowledge is pre‐
ferred. Through clear explanations, example projects, and step-by-step instructions,
we invite you to discover with us how GPT-4 and ChatGPT can transform the way we
interact with machines.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program
elements such as variable or function names, databases, data types, environment
variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values deter‐
mined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
Using Code Examples
Supplemental material (code examples, exercises, etc.) is available for download at
https://oreil.ly/DevAppsGPT_GitHub.
If you have a technical question or a problem using the code examples, please send
email to support@oreilly.com.
This book is here to help you get your job done. In general, if example code is offered
with this book, you may use it in your programs and documentation. You do not
need to contact us for permission unless you’re reproducing a significant portion of the
code. For example, writing a program that uses several chunks of code from this book
does not require permission. Selling or distributing examples from O’Reilly books does
require permission. Answering a question by citing this book and quoting example
code does not require permission. Incorporating a significant amount of example code
from this book into your product’s documentation does require permission.
Acknowledgments
Writing a book on one of the fastest-moving AI topics would not have been possible
without the help of many people. We would like to thank the incredible O’Reilly team
for their support, advice, and on-point comments; especially Corbin Collins, Nicole
Butterfield, Clare Laylock, Suzanne Huston, and Audrey Doyle.
The book also benefited from the help of exceptional reviewers who took a lot of time
to provide invaluable feedback. Many thanks to Tom Taulli, Lucas Soares, and Leonie
Monigatti.
Many thanks to our Worldline Labs colleagues for their insights and never-ending
discussions on ChatGPT and the OpenAI services; especially Liyun He Guelton,
Guillaume Coter, Luxin Zhang, and Patrik De Boe. A huge thank you as well to
Worldline’s team of developer advocates who provided support and encouragement
from the start; especially Jean-Francois James and Fanilo Andrianasolo.
And finally, we thank our friends and family for bearing with us during our ChatGPT
craze, allowing us to release this book in such a short time.
CHAPTER 1
GPT-4 and ChatGPT Essentials
Imagine a world where you can communicate with computers as quickly as you can
with your friends. What would that look like? What applications could you create?
This is the world that OpenAI is helping to build with its GPT models, bringing
human-like conversational capabilities to our devices. As the latest advancements
in AI, GPT-4 and other GPT models are large language models (LLMs) trained on
massive amounts of data, enabling them to recognize and generate human-like text
with very high accuracy.
The implications of these AI models go far beyond simple voice assistants. Thanks to
OpenAI’s models, developers can now exploit the power of natural language process‐
ing (NLP) to create applications that understand our needs in ways that were once
science fiction. From innovative customer support systems that learn and adapt to
personalized educational tools that understand each student’s unique learning style,
GPT-4 and ChatGPT open up a whole new world of possibilities.
But what are GPT-4 and ChatGPT? The goal of this chapter is to take a deep dive into
the foundations, origins, and key features of these AI models. By understanding the
basics of these models, you will be well on your way to building the next generation of
LLM-powered applications.
Introducing Large Language Models
This section lays down the fundamental building blocks that have shaped the devel‐
opment of GPT-4 and ChatGPT. We aim to provide a comprehensive understanding
of language models and NLP, the role of transformer architectures, and the tokeniza‐
tion and prediction processes within GPT models.
Exploring the Foundations of Language Models and NLP
As LLMs, GPT-4 and ChatGPT are the latest type of model obtained in the field of
NLP, which is itself a subfield of machine learning (ML) and AI. Before delving into
GPT-4 and ChatGPT, it is essential to take a look at NLP and its related fields.
There are different definitions of AI, but one of them, representing more or less the
consensus, says that AI is the development of computer systems that can perform tasks that
typically require human intelligence. With this definition, many algorithms fall under
the AI umbrella. Consider, for example, the traffic prediction task in GPS applications
or the rule-based systems used in strategic video games. In these examples, seen from
the outside, the machine seems to require intelligence to accomplish these tasks.
ML is a subset of AI. In ML, we do not try to directly implement the decision rules
used by the AI system. Instead, we try to develop algorithms that allow the system to
learn by itself from examples. Since the 1950s, when ML research began, many ML
algorithms have been proposed in the scientific literature.
Among them, deep learning algorithms have come to the fore. Deep learning is a
branch of ML that focuses on algorithms inspired by the structure of the brain. These
algorithms are called artificial neural networks. They can handle very large amounts of
data and perform very well on tasks such as image and speech recognition and NLP.
GPT-4 and ChatGPT are based on a particular type of deep learning algorithm called
transformers. Transformers are like reading machines. They pay attention to different
parts of a sentence or block of text to understand its context and produce a coherent
response. They can also understand the order of words in a sentence and their con‐
text. This makes them highly effective at tasks such as language translation, question
answering, and text generation. Figure 1-1 illustrates the relationships among these
terms.
Figure 1-1. A nested set of technologies from AI to transformers
NLP is a subfield of AI focused on enabling computers to process, interpret, and gen‐
erate natural human language. Modern NLP solutions are based on ML algorithms.
The goal of NLP is to allow computers to process natural language text. This goal
covers a wide range of tasks:
Text classification
Categorizing input text into predefined groups. This includes, for example, senti‐
ment analysis and topic categorization. Companies can use sentiment analysis
to understand customers’ opinions about their services. Email filtering is an
example of topic categorization in which email can be put into categories such as
“Personal,” “Social,” “Promotions,” and “Spam.”
Automatic translation
Automatic translation of text from one language to another. Note that this can
include areas like translating code from one programming language to another,
such as from Python to C++.
Question answering
Answering questions based on a given text. For example, an online customer
service portal could use an NLP model to answer FAQs about a product, or
educational software could use NLP to provide answers to students’ questions
about the topic being studied.
Text generation
Generating a coherent and relevant output text based on a given input text, called
a prompt.
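To make the text classification task concrete, here is a minimal illustrative sketch that asks a chat model to perform sentiment analysis. It assumes the 0.x series of the openai Python package is installed and that an OPENAI_API_KEY environment variable is set; Chapter 2 covers this setup in detail:

import openai  # pip install openai (0.x series); reads OPENAI_API_KEY automatically

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Classify the sentiment of the user's message "
                                      "as Positive, Negative, or Neutral."},
        {"role": "user", "content": "The delivery was late and the box was damaged."},
    ],
)
print(response["choices"][0]["message"]["content"])  # e.g., "Negative"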
As mentioned earlier, LLMs are ML models trying to solve text generation tasks,
among others. LLMs enable computers to process, interpret, and generate human
language, allowing for more effective human–machine communication. To be able to
do this, LLMs analyze or train on vast amounts of text data and thereby learn patterns
and relationships between words in sentences. A variety of data sources can be used
to perform this learning process. This data can include text from Wikipedia, Reddit,
the archive of thousands of books, or even the archive of the internet itself. Given
an input text, this learning process allows the LLMs to make predictions about the
likeliest following words and, in this way, can generate meaningful responses to the
input text. The modern language models, published in the past few months, are so
large and have been trained on so many texts that they can now directly perform
most NLP tasks, such as text classification, machine translation, question answering,
and many others. The GPT-4 and ChatGPT models are modern LLMs that excel at
text generation tasks.
The development of LLMs goes back several years. It started with simple language
models such as n-grams, which tried to predict the next word in a sentence based
on the previous words. N-gram models use frequency to do this. The predicted
next word is the most frequent word that follows the previous words in the text
the n-gram model was trained on. While this approach was a good start, n-gram
models' weak grasp of context and grammar resulted in inconsistent text generation.
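To give an idea of how simple these early models were, here is a toy bigram model in a few lines of Python. It is an illustrative sketch only, not how modern LLMs work:

from collections import Counter, defaultdict

# Toy bigram model: the predicted next word is simply the most frequent word
# that followed the previous word in the training text.
corpus = ("the weather is nice today so I decided to go for a walk "
          "the weather is cold today so I decided to stay home").split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("weather"))  # "is"
print(predict_next("to"))       # "go" (ties resolve to the first word observed)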
To improve the performance of n-gram models, more advanced learning algorithms
were introduced, including recurrent neural networks (RNNs) and long short-term
memory (LSTM) networks. These models could learn longer sequences and analyze
context better than n-grams could, but they still struggled to process large amounts
of data efficiently. These types of recurrent models were the most effective ones for
a long time and therefore were the most used in tools such as automatic machine
translation.
Understanding the Transformer Architecture and Its Role in LLMs
The Transformer architecture revolutionized NLP, primarily because transformers
effectively address one of the critical limitations of previous NLP models such as
RNNs: their struggle with long text sequences and maintaining context over these
lengths. In other words, while RNNs tended to forget the context in longer sequences
(the infamous “catastrophic forgetting”), transformers came with the ability to handle
and encode this context effectively.
The central pillar of this revolution is the attention mechanism, a simple yet powerful
idea. Instead of treating all words in a text sequence as equally important, the model
“pays attention” to the most relevant terms for each step of its task. Cross-attention
and self-attention are two architectural blocks based on this attention mechanism,
and they are often found in LLMs. The Transformer architecture makes extensive use
of these cross-attention and self-attention blocks.
Cross-attention helps the model determine the relevance of the different parts of
the input text for accurately predicting the next word in the output text. It’s like a
spotlight that shines on words or phrases in the input text, highlighting the relevant
information needed to make the next word prediction while ignoring less important
details.
To illustrate this, let’s take an example of a simple sentence translation task. Imagine
we have an input English sentence, “Alice enjoyed the sunny weather in Brussels,”
which should be translated into French as “Alice a profité du temps ensoleillé à
Bruxelles.” In this example, let us focus on generating the French word ensoleillé,
which means sunny. For this prediction, cross-attention would give more weight to
the English words sunny and weather since they are both relevant to the meaning
of ensoleillé. By focusing on these two words, cross-attention helps the model gener‐
ate an accurate translation for this part of the sentence. Figure 1-2 illustrates this
example.
Figure 1-2. Cross-attention uses the attention mechanism to focus on essential parts of
the input text (English sentence) to predict the next word in the output text (French
sentence)
Self-attention refers to the ability of a model to focus on different parts of its input
text. In the context of NLP, the model can evaluate the importance of each word in
a sentence with the other words. This allows it to better understand the relationships
between the words and helps the model build new concepts from multiple words in
the input text.
As a more specific example, consider the following: “Alice received praise from her
colleagues.” Assume that the model is trying to understand the meaning of the word
her in the sentence. The self-attention mechanism assigns different weights to the words
in the sentence, highlighting the words relevant to her in this context. In this example,
self-attention would place more weight on the words Alice and colleagues. Self-attention
helps the model build new concepts from these words. In this example, one of the
concepts that could emerge would be “Alice’s colleagues,” as shown in Figure 1-3.
Figure 1-3. Self-attention allows the emergence of the “Alice’s colleagues” concept
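At its core, the attention computation is compact. The following is a minimal NumPy sketch of single-head self-attention that omits the learned query, key, and value projection matrices a real transformer uses; it illustrates the idea rather than implementing GPT:

import numpy as np

def self_attention(X):
    # X holds one embedding vector per token (one row per token).
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise relevance scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each token's weights sum to 1
    return weights @ X                               # context-aware mix of the embeddings

# Three made-up 4-dimensional embeddings, e.g., for "Alice", "her", "colleagues"
X = np.random.rand(3, 4)
print(self_attention(X).shape)  # (3, 4): one blended vector per input token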
Unlike the recurrent architecture, transformers also have the advantage of being
easily parallelized. This means the Transformer architecture can process multiple
parts of the input text simultaneously rather than sequentially. This allows faster
computation and training because different parts of the model can work in parallel
without waiting for previous steps to complete, unlike recurrent architectures, which
require sequential processing. The parallel processing capability of transformer mod‐
els fits perfectly with the architecture of graphics processing units (GPUs), which are
designed to handle multiple computations simultaneously. Therefore, GPUs are ideal
for training and running these transformer models because of their high parallelism
and computational power. This advance allowed data scientists to train models on
much larger datasets, paving the way for developing LLMs.
The Transformer architecture, introduced in 2017 by Vaswani et al. from Google
in the paper “Attention Is All You Need”, was originally developed for sequence-to-
sequence tasks such as machine translation. A standard transformer consists of two
primary components: an encoder and a decoder, both of which rely heavily on
attention mechanisms. The task of the encoder is to process the input text, identify
valuable features, and generate a meaningful representation of that text, known as
embedding. The decoder then uses this embedding to produce an output, such as a
translation or summary. This output effectively interprets the encoded information.
Generative pre-trained transformers, commonly known as GPT, are a family of mod‐
els that are based on the Transformer architecture and that specifically utilize the
decoder part of the original architecture. In GPT, the encoder is not present, so there
is no need for cross-attention to integrate the embeddings produced by an encoder.
As a result, GPT relies solely on the self-attention mechanism within the decoder to
generate context-aware representations and predictions. Note that other well-known
models, such as BERT (Bidirectional Encoder Representations from Transformers),
are based on the encoder part. We don’t cover this type of model in this book.
Figure 1-4 illustrates the evolution of these different models.
Figure 1-4. The evolution of NLP techniques from n-grams to the emergence of LLMs
Demystifying the Tokenization and Prediction Steps in GPT Models
LLMs in the GPT family receive a prompt as input, and in response they generate a
text. This process is known as text completion. For example, the prompt could be “The
weather is nice today, so I decided to” and the model output might be “go for a walk”.
You may be wondering how the LLM builds this output text from the input
prompt. As you will see, it's mostly just a question of probabilities.
When a prompt is sent to an LLM, it first breaks the input into smaller pieces
called tokens. These tokens represent single words, parts of words, or spaces and
punctuation. For example, the preceding prompt could be broken like this: [“The”,
“wea”, “ther”, “is”, “nice”, “today”, “,”, “so”, “I”, “de”, “ci”, “ded”, “to”]. Each language
model comes with its own tokenizer. The GPT-4 tokenizer is not available at the time
of this writing, but you can test the GPT-3 tokenizer.
A rule of thumb for understanding tokens in terms of word length
is that 100 tokens equal approximately 75 words for an English
text.
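If you want to inspect tokenization locally, OpenAI's open source tiktoken library exposes the encodings used by its models. A minimal sketch, assuming tiktoken is installed with pip:

import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by the GPT-3.5-turbo and GPT-4 models.
encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("The weather is nice today, so I decided to")
print(len(tokens))                               # number of tokens in the prompt
print([encoding.decode([t]) for t in tokens])    # the individual token strings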
Thanks to the attention principle and the Transformer architecture introduced ear‐
lier, the LLM processes these tokens and can interpret the relationships between them
and the overall meaning of the prompt. The Transformer architecture allows a model
to efficiently identify the critical information and the context within the text.
To create a new sentence, the LLM predicts the tokens most likely to follow, based
on the context of the prompt. OpenAI produced two versions of GPT-4, with context
windows of 8,192 tokens and 32,768 tokens. Unlike the previous recurrent models,
which had difficulty handling long input sequences, the Transformer architecture
with the attention mechanism allows the modern LLM to consider the context as a
whole. Based on this context, the model assigns a probability score for each potential
subsequent token. The token with the highest probability is then selected as the next
token in the sequence. In our example, after “The weather is nice today, so I decided
to”, the next best token could be “go”.
This process is then repeated, but now the context becomes “The weather is nice
today, so I decided to go”, where the previously predicted token “go” is added to the
original prompt. The second token that the model might predict could be “for”. This
process is repeated until a complete sentence is formed: “go for a walk”. This process
relies on the LLM’s ability to learn the next most probable word from massive text
data. Figure 1-5 illustrates this process.
Figure 1-5. The completion process is iterative, token by token
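To make this iterative loop concrete, here is a toy sketch in which a hand-written probability table stands in for the model; at each step the most probable next token is appended to the context. A real LLM scores every token in its vocabulary and may sample rather than always taking the top token:

# A toy "language model": hand-written next-token probabilities for a few
# two-token contexts, used only to illustrate the completion loop.
TOY_MODEL = {
    ("decided", "to"): {"go": 0.6, "stay": 0.3, "eat": 0.1},
    ("to", "go"): {"for": 0.8, "home": 0.2},
    ("go", "for"): {"a": 0.9, "the": 0.1},
    ("for", "a"): {"walk": 0.7, "run": 0.3},
}

def complete(tokens, max_new_tokens=4):
    tokens = list(tokens)
    for _ in range(max_new_tokens):
        probs = TOY_MODEL.get(tuple(tokens[-2:]))   # context = the last two tokens
        if not probs:
            break
        next_token = max(probs, key=probs.get)      # pick the most probable token
        tokens.append(next_token)                   # the prediction joins the context
    return tokens

print(complete(["I", "decided", "to"]))
# ['I', 'decided', 'to', 'go', 'for', 'a', 'walk']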
A Brief History: From GPT-1 to GPT-4
In this section, we will review the evolution of the OpenAI GPT models from GPT-1
to GPT-4.
GPT-1
In mid-2018, just one year after the invention of the Transformer architecture,
OpenAI published a paper titled “Improving Language Understanding by Generative
Pre-Training”, by Radford et al., in which the company introduced the Generative
Pre-trained Transformer, also known as GPT-1.
Before GPT-1, the common approach to building high-performance NLP neural
models relied on supervised learning. These learning techniques use large amounts of
manually labeled data. For example, in a sentiment analysis task where the goal is to
classify whether a given text has positive or negative sentiment, a common strategy
would require collecting thousands of manually labeled text examples to build an
effective classification model. However, the need for large amounts of well-annotated,
supervised data has limited the performance of these techniques because such data‐
sets are both difficult and expensive to generate.
In their paper, the authors of GPT-1 proposed a new learning process in which an
unsupervised pre-training step is introduced. In this pre-training step, no labeled
data is needed. Instead, the model is trained to predict what the next token is.
Thanks to the use of the Transformer architecture, which allows parallelization, this
pre-training was performed on a large amount of data. For the pre-training, the
GPT-1 model used the BookCorpus dataset, which contains the text of approximately
11,000 unpublished books. This dataset was initially presented in 2015 in the scien‐
tific paper “Aligning Books and Movies: Towards Story-Like Visual Explanations by
Watching Movies and Reading Books” by Zhu et al., and was initially made available
on a University of Toronto web page. However, today the official version of the
original dataset is no longer publicly accessible.
The GPT-1 model was found to be effective in a variety of basic completion tasks.
In the unsupervised learning phase, the model learned to predict the next item in
the texts of the BookCorpus dataset. However, since GPT-1 is a small model, it was
unable to perform complex tasks without fine-tuning. Therefore, fine-tuning was
performed as a second supervised learning step on a small set of manually labeled
data to adapt the model to a specific target task. For example, in a classification task
such as sentiment analysis, it may be necessary to retrain the model on a small set of
manually labeled text examples to achieve reasonable accuracy. This process allowed
the parameters learned in the initial pre-training phase to be modified to better fit the
task at hand.
Despite its relatively small size, GPT-1 showed remarkable performance on several
NLP tasks using only a small amount of manually labeled data for fine-tuning. The
GPT-1 architecture consisted of a decoder, similar to that of the original transformer
introduced in 2017, and had 117 million parameters. This first GPT model paved
the way for more powerful models with larger datasets and more parameters to take
better advantage of the potential of the Transformer architecture.
GPT-2
In early 2019, OpenAI proposed GPT-2, a scaled-up version of the GPT-1 model that
increased the number of parameters and the size of the training dataset tenfold. The
number of parameters of this new version was 1.5 billion, trained on 40 GB of text. In
November 2019, OpenAI released the full version of the GPT-2 language model.
GPT-2 is publicly available and can be downloaded from Hugging
Face or GitHub.
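As a quick illustration, GPT-2 can be loaded with a few lines of Python via the Hugging Face transformers library. This is a sketch assuming transformers and its PyTorch or TensorFlow backend are installed; the checkpoint is downloaded on first use:

from transformers import pipeline  # pip install transformers

generator = pipeline("text-generation", model="gpt2")   # the 124M-parameter GPT-2
result = generator("The weather is nice today, so I decided to",
                   max_new_tokens=10, do_sample=False)  # greedy decoding
print(result[0]["generated_text"])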
GPT-2 showed that training a larger language model on a larger dataset improves
its ability to process tasks and lets it outperform the state of the art
on many of them. It also suggested that even larger language models could process
natural language even better.
GPT-3
OpenAI released version 3 of GPT in June 2020. The main differences between
GPT-2 and GPT-3 are the size of the model and the quantity of data used for the
training. GPT-3 is a much larger model than GPT-2, with 175 billion parameters,
allowing it to capture more complex patterns. In addition, GPT-3 was trained on a
more extensive dataset. This includes Common Crawl, a large web archive containing
text from billions of web pages and other sources, such as Wikipedia. This training
dataset, which includes content from websites, books, and articles, allows GPT-3 to
develop a deeper understanding of the language and context. As a result, GPT-3
demonstrates improved performance on a variety of linguistics tasks. It also demon‐
strates superior coherence and creativity in its generated texts. It is even capable of
writing code snippets, such as SQL queries, and performing other intelligent tasks.
Furthermore, GPT-3 eliminates the need for a fine-tuning step, which was mandatory
for its predecessors.
However, with GPT-3 there is a problem of misalignment between the tasks given
by end users and what the model has seen during its training. As we have seen,
language models are trained to predict the next token based on the input context.
This training process is not necessarily directly aligned with the tasks end users
want the model to perform. In addition, increasing the size of language models does
not inherently make them better at following user intent or instructions. Moreover,
models like GPT-3 were trained on data from different sources on the internet.
Although a cleanup has been made in the selection of sources, the learning data may
contain false or problematic text, including racist text, sexist text, misinformation,
or disinformation. As a result, the model may sometimes say the wrong things or
even toxic things. In 2021 a new release of the GPT-3 model was published, called
the Instruct series. Unlike the original GPT-3 base model, the instruct models are
optimized by reinforcement learning with human feedback, meaning that they use the
feedback to learn and to improve over time. This allows the models to learn from
human instructions while making them more truthful and less toxic.
To illustrate the difference, we input the prompt “Explain what is meant by time
complexity”, and we receive the following:
• With the standard GPT-3, we obtain the output: “Explain what is meant by space complexity. Explain what is meant by the big-O notation.”
• With the instructGPT-3 model, we obtain: “Time complexity is a way of measuring the amount of time it takes for an algorithm to run and complete its task. It is usually expressed using Big O notation, which measures the complexity of an algorithm in terms of the number of operations it performs. The time complexity of an algorithm is important because it determines how efficient the algorithm is and how well it scales with larger inputs.”
We can see that for the same input, the first model cannot answer the question (the
answer is even weird), whereas the second model does answer the question. It is,
of course, possible to obtain the desired response with a standard GPT-3 model.
However, unlike with instruction models, it is necessary to apply specific prompt
design and optimization techniques to obtain the desired output from the GPT-3
model. This technique is called prompt engineering and will be detailed in the coming
chapters.
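To reproduce this comparison yourself with the 0.x series of the OpenAI Python library, you can call the completion endpoint with a base model and an instruct model. The model names below ("davinci" and "text-davinci-003") were the identifiers available at the time of writing and may since have been renamed or deprecated; Chapter 2 discusses the models exposed by the API:

import openai  # assumes OPENAI_API_KEY is set in the environment

prompt = "Explain what is meant by time complexity"

# "davinci" is a base GPT-3 model; "text-davinci-003" is an instruct-series model.
for model in ("davinci", "text-davinci-003"):
    response = openai.Completion.create(model=model, prompt=prompt, max_tokens=100)
    print(f"--- {model} ---")
    print(response["choices"][0]["text"].strip())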
From GPT-3 to InstructGPT
OpenAI explains how the Instruct series was constructed in the scientific paper
“Training Language Models to Follow Instructions with Human Feedback” by
Ouyang et al.
The training recipe has two main stages to go from a GPT-3 model to an instructed
GPT-3 model: supervised fine-tuning (SFT) and reinforcement learning from human
feedback (RLHF). In each stage, the results of the prior stage are fine-tuned. That is,
the SFT stage receives the GPT-3 model and returns a new model, which is sent to the
RLHF stage to obtain the instructed version.
Figure 1-6, from the scientific paper from OpenAI, details the entire process.
Figure 1-6. The steps to obtain the instructed models (redrawn from an image by
Ouyang et al.)
We will step through these stages one by one.
In the SFT stage, the original GPT-3 model is fine-tuned with straightforward super‐
vised learning (step 1 in Figure 1-6). OpenAI has a collection of prompts made by
end users. The process starts with the random selection of a prompt from the set of
available prompts. A human (called a labeler) is then asked to write an example of an
ideal answer to this prompt. This process is repeated thousands of times to obtain a
supervised training set composed of prompts and the corresponding ideal responses.
This dataset is then used to fine-tune the GPT-3 model to give more consistent
answers to user requests. The resulting model is called the SFT model.
The RLHF stage is divided into two substeps. First a reward model (RM) is built
(step 2 in Figure 1-6), and then the RM is used for reinforcement learning (step 3 in
Figure 1-6).
The goal of the RM is to automatically give a score to a response to a prompt. When
the response matches what is indicated in the prompt, the RM score should be high;
when it doesn’t match, it should be low. To construct the RM, OpenAI begins by
randomly selecting a question and using the SFT model to produce several possible
answers. As we will see later, it is possible to produce many responses with the same
input prompt via a parameter called temperature. A human labeler is then asked to
rank the responses based on criteria such as fit with the prompt and toxicity of the
response. After running this procedure many times, the resulting dataset of ranked
responses is used to fine-tune the SFT model so that it learns to score responses. This
RM will be used to build the final InstructGPT model.
The final step in training InstructGPT models involves reinforcement learning, which
is an iterative process. It starts with an initial generative model, such as the SFT
model. Then a random prompt is selected, and the model predicts an output, which
the RM evaluates. Based on the reward received, the generative model is updated
accordingly. This process can be repeated countless times without human interven‐
tion, providing a more efficient and automated approach to adapting the model for
better performance.
InstructGPT models are better at producing accurate completions for what people
give as input in the prompt. OpenAI recommends using the InstructGPT series
rather than the original series.
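The temperature parameter mentioned in the reward-model step is easy to see in action with the 0.x series of the OpenAI Python library. A minimal sketch, assuming OPENAI_API_KEY is set: with a nonzero temperature the model samples tokens instead of always taking the most probable one, so a single call can return several different completions.

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me a slogan for a bakery."}],
    temperature=1.0,   # higher values make the sampling more random
    n=3,               # ask for three independent completions in one call
)
for choice in response["choices"]:
    print(choice["message"]["content"])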
GPT-3.5, Codex, and ChatGPT
In March 2022, OpenAI made available new versions of GPT-3. These new models
can edit text or insert content into text. They have been trained on data through
June 2021 and are described as more powerful than previous versions. At the end
of November 2022, OpenAI began referring to these models as belonging to the
GPT-3.5 series.
DMCC Future of Trade Web3 - Special Edition
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Developing Apps with GPT-4 and ChatGPT: Build Intelligent Chatbots, Content Generators, and More

Table of Contents

Preface

1. GPT-4 and ChatGPT Essentials
   Introducing Large Language Models
   Exploring the Foundations of Language Models and NLP
   Understanding the Transformer Architecture and Its Role in LLMs
   Demystifying the Tokenization and Prediction Steps in GPT Models
   A Brief History: From GPT-1 to GPT-4
   GPT-1
   GPT-2
   GPT-3
   From GPT-3 to InstructGPT
   GPT-3.5, Codex, and ChatGPT
   LLM Use Cases and Example Products
   Be My Eyes
   Morgan Stanley
   Khan Academy
   Duolingo
   Yabble
   Waymark
   Inworld AI
   Beware of AI Hallucinations: Limitations and Considerations
   Optimizing GPT Models with Plug-ins and Fine-Tuning
   Summary

2. A Deep Dive into the GPT-4 and ChatGPT APIs
   Essential Concepts
   Models Available in the OpenAI API
   Trying GPT Models with the OpenAI Playground
   Getting Started: The OpenAI Python Library
   OpenAI Access and API Key
   "Hello World" Example
   Using ChatGPT and GPT-4
   Input Options for the Chat Completion Endpoint
   Output Result Format for the Chat Completion Endpoint
   From Text Completions to Functions
   Using Other Text Completion Models
   Input Options for the Text Completion Endpoint
   Output Result Format for the Text Completion Endpoint
   Considerations
   Pricing and Token Limitations
   Security and Privacy: Caution!
   Other OpenAI APIs and Functionalities
   Embeddings
   Moderation Model
   Whisper and DALL-E
   Summary (and Cheat Sheet)

3. Building Apps with GPT-4 and ChatGPT
   App Development Overview
   API Key Management
   Security and Data Privacy
   Software Architecture Design Principles
   LLM-Powered App Vulnerabilities
   Analyzing Inputs and Outputs
   The Inevitability of Prompt Injection
   Example Projects
   Project 1: Building a News Generator Solution
   Project 2: Summarizing YouTube Videos
   Project 3: Creating an Expert for Zelda BOTW
   Project 4: Voice Control
   Summary

4. Advanced GPT-4 and ChatGPT Techniques
   Prompt Engineering
   Designing Effective Prompts
   Thinking Step by Step
   Implementing Few-Shot Learning
   Improving Prompt Effectiveness
   Fine-Tuning
   Getting Started
   Fine-Tuning with the OpenAI API
   Fine-Tuning Applications
   Generating and Fine-Tuning Synthetic Data for an Email Marketing Campaign
   Cost of Fine-Tuning
   Summary

5. Advancing LLM Capabilities with the LangChain Framework and Plug-ins
   The LangChain Framework
   Dynamic Prompts
   Agents and Tools
   Memory
   Embeddings
   GPT-4 Plug-ins
   Overview
   The API
   The Plug-in Manifest
   The OpenAPI Specification
   Descriptions
   Summary
   Conclusion

Glossary of Key Terms

Index
Preface

Within a mere five days of its release, ChatGPT reached an impressive one million users, sending shock waves throughout the tech industry and beyond. As a side effect, the OpenAI API for AI-powered text generation was suddenly brought to light, despite having been available for three years. The ChatGPT interface showcased the potential of such language models, and suddenly developers and inventors began to realize the incredible possibilities available at their fingertips.

The field of natural language processing has made incredible technical progress over the years, but until recently, use of the technology was limited to an elite few. The OpenAI API and its accompanying libraries provide a ready-to-use solution for anyone seeking to build AI-powered applications. There is no need to have powerful hardware or deep knowledge of artificial intelligence; with just a few lines of code, developers can integrate incredible features into their projects at a reasonable cost.

We combine our knowledge and experience, Olivier as a data scientist and Marie-Alice as a software engineer, to give you a broad understanding of how to develop applications with GPT-4 and ChatGPT. In these pages, you will find clear and detailed explanations of AI concepts, as well as user-friendly guidelines on how to integrate the OpenAI services effectively, securely, and cost-consciously.

This book is designed to be accessible to all, but some basic Python knowledge is preferred. Through clear explanations, example projects, and step-by-step instructions, we invite you to discover with us how GPT-4 and ChatGPT can transform the way we interact with machines.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/DevAppsGPT_GitHub.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
Acknowledgments

Writing a book on one of the fastest-moving AI topics would not have been possible without the help of many people. We would like to thank the incredible O'Reilly team for their support, advice, and on-point comments; especially Corbin Collins, Nicole Butterfield, Clare Laylock, Suzanne Huston, and Audrey Doyle.

The book also benefited from the help of exceptional reviewers who took a lot of time to provide invaluable feedback. Many thanks to Tom Taulli, Lucas Soares, and Leonie Monigatti.

Many thanks to our Worldline Labs colleagues for their insights and never-ending discussions on ChatGPT and the OpenAI services; especially Liyun He Guelton, Guillaume Coter, Luxin Zhang, and Patrik De Boe. A huge thank you as well to Worldline's team of developer advocates who provided support and encouragement from the start; especially Jean-Francois James and Fanilo Andrianasolo.

And finally, we thank our friends and family for bearing with us during our ChatGPT craze, allowing us to release this book in such a short time.
CHAPTER 1
GPT-4 and ChatGPT Essentials

Imagine a world where you can communicate with computers as quickly as you can with your friends. What would that look like? What applications could you create? This is the world that OpenAI is helping to build with its GPT models, bringing human-like conversational capabilities to our devices. As the latest advancements in AI, GPT-4 and other GPT models are large language models (LLMs) trained on massive amounts of data, enabling them to recognize and generate human-like text with very high accuracy.

The implications of these AI models go far beyond simple voice assistants. Thanks to OpenAI's models, developers can now exploit the power of natural language processing (NLP) to create applications that understand our needs in ways that were once science fiction. From innovative customer support systems that learn and adapt to personalized educational tools that understand each student's unique learning style, GPT-4 and ChatGPT open up a whole new world of possibilities.

But what are GPT-4 and ChatGPT? The goal of this chapter is to take a deep dive into the foundations, origins, and key features of these AI models. By understanding the basics of these models, you will be well on your way to building the next generation of LLM-powered applications.

Introducing Large Language Models

This section lays down the fundamental building blocks that have shaped the development of GPT-4 and ChatGPT. We aim to provide a comprehensive understanding of language models and NLP, the role of transformer architectures, and the tokenization and prediction processes within GPT models.
Exploring the Foundations of Language Models and NLP

As LLMs, GPT-4 and ChatGPT are the latest type of model in the field of NLP, which is itself a subfield of machine learning (ML) and AI. Before delving into GPT-4 and ChatGPT, it is essential to take a look at NLP and its related fields.

There are different definitions of AI, but one of them, more or less the consensus, says that AI is the development of computer systems that can perform tasks that typically require human intelligence. With this definition, many algorithms fall under the AI umbrella. Consider, for example, the traffic prediction task in GPS applications or the rule-based systems used in strategic video games. In these examples, seen from the outside, the machine seems to require intelligence to accomplish these tasks.

ML is a subset of AI. In ML, we do not try to directly implement the decision rules used by the AI system. Instead, we try to develop algorithms that allow the system to learn by itself from examples. Since the 1950s, when ML research began, many ML algorithms have been proposed in the scientific literature.

Among them, deep learning algorithms have come to the fore. Deep learning is a branch of ML that focuses on algorithms inspired by the structure of the brain. These algorithms are called artificial neural networks. They can handle very large amounts of data and perform very well on tasks such as image and speech recognition and NLP.

GPT-4 and ChatGPT are based on a particular type of deep learning algorithm called transformers. Transformers are like reading machines. They pay attention to different parts of a sentence or block of text to understand its context and produce a coherent response. They can also understand the order of words in a sentence and their context. This makes them highly effective at tasks such as language translation, question answering, and text generation. Figure 1-1 illustrates the relationships among these terms.

Figure 1-1. A nested set of technologies from AI to transformers
NLP is a subfield of AI focused on enabling computers to process, interpret, and generate natural human language. Modern NLP solutions are based on ML algorithms. The goal of NLP is to allow computers to process natural language text. This goal covers a wide range of tasks:

Text classification
    Categorizing input text into predefined groups. This includes, for example, sentiment analysis and topic categorization. Companies can use sentiment analysis to understand customers' opinions about their services. Email filtering is an example of topic categorization in which email can be put into categories such as "Personal," "Social," "Promotions," and "Spam."

Automatic translation
    Automatic translation of text from one language to another. Note that this can include areas like translating code from one programming language to another, such as from Python to C++.

Question answering
    Answering questions based on a given text. For example, an online customer service portal could use an NLP model to answer FAQs about a product, or educational software could use NLP to provide answers to students' questions about the topic being studied.

Text generation
    Generating a coherent and relevant output text based on a given input text, called a prompt.

As mentioned earlier, LLMs are ML models trying to solve text generation tasks, among others. LLMs enable computers to process, interpret, and generate human language, allowing for more effective human–machine communication. To be able to do this, LLMs analyze or train on vast amounts of text data and thereby learn patterns and relationships between words in sentences. A variety of data sources can be used to perform this learning process. This data can include text from Wikipedia, Reddit, the archive of thousands of books, or even the archive of the internet itself. Given an input text, this learning process allows the LLMs to make predictions about the likeliest following words and, in this way, can generate meaningful responses to the input text. The modern language models, published in the past few months, are so large and have been trained on so many texts that they can now directly perform most NLP tasks, such as text classification, machine translation, question answering, and many others. The GPT-4 and ChatGPT models are modern LLMs that excel at text generation tasks.

The development of LLMs goes back several years. It started with simple language models such as n-grams, which tried to predict the next word in a sentence based on the previous words. N-gram models use frequency to do this.
The predicted next word is the most frequent word that follows the previous words in the text the n-gram model was trained on. While this approach was a good start, n-gram models' limited grasp of context and grammar resulted in inconsistent text generation.

To improve the performance of n-gram models, more advanced learning algorithms were introduced, including recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. These models could learn longer sequences and analyze the context better than n-grams, but they still struggled to process large amounts of data efficiently. These types of recurrent models were the most efficient ones for a long time and therefore were the most used in tools such as automatic machine translation.

Understanding the Transformer Architecture and Its Role in LLMs

The Transformer architecture revolutionized NLP, primarily because transformers effectively address one of the critical limitations of previous NLP models such as RNNs: their struggle with long text sequences and maintaining context over these lengths. In other words, while RNNs tended to forget the context in longer sequences (the infamous "catastrophic forgetting"), transformers came with the ability to handle and encode this context effectively.

The central pillar of this revolution is the attention mechanism, a simple yet powerful idea. Instead of treating all words in a text sequence as equally important, the model "pays attention" to the most relevant terms for each step of its task. Cross-attention and self-attention are two architectural blocks based on this attention mechanism, and they are often found in LLMs. The Transformer architecture makes extensive use of these cross-attention and self-attention blocks.

Cross-attention helps the model determine the relevance of the different parts of the input text for accurately predicting the next word in the output text. It's like a spotlight that shines on words or phrases in the input text, highlighting the relevant information needed to make the next word prediction while ignoring less important details.

To illustrate this, let's take an example of a simple sentence translation task. Imagine we have an input English sentence, "Alice enjoyed the sunny weather in Brussels," which should be translated into French as "Alice a profité du temps ensoleillé à Bruxelles." In this example, let us focus on generating the French word ensoleillé, which means sunny. For this prediction, cross-attention would give more weight to the English words sunny and weather since they are both relevant to the meaning of ensoleillé. By focusing on these two words, cross-attention helps the model generate an accurate translation for this part of the sentence. Figure 1-2 illustrates this example.
Figure 1-2. Cross-attention uses the attention mechanism to focus on essential parts of the input text (English sentence) to predict the next word in the output text (French sentence)

Self-attention refers to the ability of a model to focus on different parts of its input text. In the context of NLP, the model can evaluate the importance of each word in a sentence relative to the other words. This allows it to better understand the relationships between the words and helps the model build new concepts from multiple words in the input text.

As a more specific example, consider the following: "Alice received praise from her colleagues." Assume that the model is trying to understand the meaning of the word her in the sentence. The self-attention mechanism assigns different weights to the words in the sentence, highlighting the words relevant to her in this context. In this example, self-attention would place more weight on the words Alice and colleagues. Self-attention helps the model build new concepts from these words. In this example, one of the concepts that could emerge would be "Alice's colleagues," as shown in Figure 1-3.

Figure 1-3. Self-attention allows the emergence of the "Alice's colleagues" concept
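To make the attention mechanism more concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. It is purely illustrative: the toy token embeddings and projection matrices are random rather than taken from a trained GPT model, but the mechanics are the same. Each token is projected into a query, a key, and a value, and the resulting weights express how much each token "pays attention" to every other token.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, W_q, W_k, W_v):
        """Scaled dot-product self-attention over a sequence of token vectors X."""
        Q, K, V = X @ W_q, X @ W_k, X @ W_v        # project each token into query, key, value
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every token with every other token
        weights = softmax(scores, axis=-1)         # one attention distribution per token
        return weights @ V, weights                # context-aware representations + attention weights

    # Toy example: 5 tokens ("Alice", "received", "praise", "from", "her"), embedding size 4
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 4))
    W_q, W_k, W_v = [rng.normal(size=(4, 4)) for _ in range(3)]
    _, weights = self_attention(X, W_q, W_k, W_v)
    print(weights.round(2))  # row i shows how much token i attends to each token in the sentence

In a real transformer, these projection matrices are learned during training and many such attention "heads" run in parallel inside every layer; this sketch shows only a single head.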
Unlike the recurrent architecture, transformers also have the advantage of being easily parallelized. This means the Transformer architecture can process multiple parts of the input text simultaneously rather than sequentially. This allows faster computation and training because different parts of the model can work in parallel without waiting for previous steps to complete, unlike recurrent architectures, which require sequential processing. The parallel processing capability of transformer models fits perfectly with the architecture of graphics processing units (GPUs), which are designed to handle multiple computations simultaneously. Therefore, GPUs are ideal for training and running these transformer models because of their high parallelism and computational power. This advance allowed data scientists to train models on much larger datasets, paving the way for developing LLMs.

The Transformer architecture, introduced in 2017 by Vaswani et al. from Google in the paper "Attention Is All You Need", was originally developed for sequence-to-sequence tasks such as machine translation. A standard transformer consists of two primary components: an encoder and a decoder, both of which rely heavily on attention mechanisms. The task of the encoder is to process the input text, identify valuable features, and generate a meaningful representation of that text, known as embedding. The decoder then uses this embedding to produce an output, such as a translation or summary. This output effectively interprets the encoded information.

Generative pre-trained transformers, commonly known as GPT, are a family of models that are based on the Transformer architecture and that specifically utilize the decoder part of the original architecture. In GPT, the encoder is not present, so there is no need for cross-attention to integrate the embeddings produced by an encoder. As a result, GPT relies solely on the self-attention mechanism within the decoder to generate context-aware representations and predictions. Note that other well-known models, such as BERT (Bidirectional Encoder Representations from Transformers), are based on the encoder part. We don't cover this type of model in this book. Figure 1-4 illustrates the evolution of these different models.

Figure 1-4. The evolution of NLP techniques from n-grams to the emergence of LLMs
Demystifying the Tokenization and Prediction Steps in GPT Models

LLMs in the GPT family receive a prompt as input, and in response they generate a text. This process is known as text completion. For example, the prompt could be "The weather is nice today, so I decided to" and the model output might be "go for a walk". You may be wondering how the LLM model builds this output text from the input prompt. As you will see, it's mostly just a question of probabilities.

When a prompt is sent to an LLM, it first breaks the input into smaller pieces called tokens. These tokens represent single words, parts of words, or spaces and punctuation. For example, the preceding prompt could be broken like this: ["The", "wea", "ther", "is", "nice", "today", ",", "so", "I", "de", "ci", "ded", "to"].

Each language model comes with its own tokenizer. The GPT-4 tokenizer is not available at the time of this writing, but you can test the GPT-3 tokenizer.

A rule of thumb for understanding tokens in terms of word length is that 100 tokens equal approximately 75 words for an English text.

Thanks to the attention principle and the Transformer architecture introduced earlier, the LLM processes these tokens and can interpret the relationships between them and the overall meaning of the prompt. The Transformer architecture allows a model to efficiently identify the critical information and the context within the text.

To create a new sentence, the LLM predicts the tokens most likely to follow, based on the context of the prompt. OpenAI produced two versions of GPT-4, with context windows of 8,192 tokens and 32,768 tokens. Unlike the previous recurrent models, which had difficulty handling long input sequences, the Transformer architecture with the attention mechanism allows the modern LLM to consider the context as a whole. Based on this context, the model assigns a probability score for each potential subsequent token. The token with the highest probability is then selected as the next token in the sequence. In our example, after "The weather is nice today, so I decided to", the next best token could be "go". This process is then repeated, but now the context becomes "The weather is nice today, so I decided to go", where the previously predicted token "go" is added to the original prompt. The second token that the model might predict could be "for". This process is repeated until a complete sentence is formed: "go for a walk". This process relies on the LLM's ability to learn the next most probable word from massive text data. Figure 1-5 illustrates this process.
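If you want to see tokenization for yourself, the short sketch below uses the open source tiktoken package (OpenAI's tokenizer library) to split the example prompt into tokens, and then outlines the iterative "pick the likeliest token" loop just described. The next_token_distribution function is a hypothetical placeholder standing in for the model's forward pass, not a real API call.

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by recent OpenAI models
    prompt = "The weather is nice today, so I decided to"
    token_ids = enc.encode(prompt)
    print(len(token_ids), [enc.decode([t]) for t in token_ids])  # token count and text of each token

    # Sketch of greedy completion: repeatedly append the most probable next token.
    # next_token_distribution is a hypothetical stand-in for the LLM itself; it would
    # return a mapping {token_id: probability} over the whole vocabulary.
    def complete(token_ids, next_token_distribution, max_new_tokens=5):
        for _ in range(max_new_tokens):
            probs = next_token_distribution(token_ids)
            token_ids = token_ids + [max(probs, key=probs.get)]  # keep the likeliest token
        return enc.decode(token_ids)

Note that the exact split depends on the tokenizer: the cl100k_base encoding will not necessarily cut the sentence into the same pieces as the illustration above, which is one reason to count tokens programmatically when you later deal with context windows and pricing.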
Figure 1-5. The completion process is iterative, token by token

A Brief History: From GPT-1 to GPT-4

In this section, we will review the evolution of the OpenAI GPT models from GPT-1 to GPT-4.

GPT-1

In mid-2018, just one year after the invention of the Transformer architecture, OpenAI published a paper titled "Improving Language Understanding by Generative Pre-Training", by Radford et al., in which the company introduced the Generative Pre-trained Transformer, also known as GPT-1.

Before GPT-1, the common approach to building high-performance NLP neural models relied on supervised learning. These learning techniques use large amounts of manually labeled data. For example, in a sentiment analysis task where the goal is to classify whether a given text has positive or negative sentiment, a common strategy would require collecting thousands of manually labeled text examples to build an effective classification model. However, the need for large amounts of well-annotated, supervised data has limited the performance of these techniques because such datasets are both difficult and expensive to generate.
In their paper, the authors of GPT-1 proposed a new learning process in which an unsupervised pre-training step is introduced. In this pre-training step, no labeled data is needed. Instead, the model is trained to predict what the next token is. Thanks to the use of the Transformer architecture, which allows parallelization, this pre-training was performed on a large amount of data. For the pre-training, the GPT-1 model used the BookCorpus dataset, which contains the text of approximately 11,000 unpublished books. This dataset was initially presented in 2015 in the scientific paper "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books" by Zhu et al., and was initially made available on a University of Toronto web page. However, today the official version of the original dataset is no longer publicly accessible.

The GPT-1 model was found to be effective in a variety of basic completion tasks. In the unsupervised learning phase, the model learned to predict the next item in the texts of the BookCorpus dataset. However, since GPT-1 is a small model, it was unable to perform complex tasks without fine-tuning. Therefore, fine-tuning was performed as a second supervised learning step on a small set of manually labeled data to adapt the model to a specific target task. For example, in a classification task such as sentiment analysis, it may be necessary to retrain the model on a small set of manually labeled text examples to achieve reasonable accuracy. This process allowed the parameters learned in the initial pre-training phase to be modified to better fit the task at hand.

Despite its relatively small size, GPT-1 showed remarkable performance on several NLP tasks using only a small amount of manually labeled data for fine-tuning. The GPT-1 architecture consisted of a decoder, similar to the one in the original 2017 transformer, with 117 million parameters. This first GPT model paved the way for more powerful models with larger datasets and more parameters to take better advantage of the potential of the Transformer architecture.

GPT-2

In early 2019, OpenAI proposed GPT-2, a scaled-up version of the GPT-1 model that increased the number of parameters and the size of the training dataset tenfold. The number of parameters of this new version was 1.5 billion, trained on 40 GB of text. In November 2019, OpenAI released the full version of the GPT-2 language model.
GPT-2 is publicly available and can be downloaded from Hugging Face or GitHub.

GPT-2 showed that training a larger language model on a larger dataset improves the ability of a language model to process tasks and outperforms the state of the art on many tasks. It also showed that even larger language models can process natural language better.

GPT-3

OpenAI released version 3 of GPT in June 2020. The main differences between GPT-2 and GPT-3 are the size of the model and the quantity of data used for the training. GPT-3 is a much larger model than GPT-2, with 175 billion parameters, allowing it to capture more complex patterns. In addition, GPT-3 was trained on a more extensive dataset. This includes Common Crawl, a large web archive containing text from billions of web pages, and other sources, such as Wikipedia. This training dataset, which includes content from websites, books, and articles, allows GPT-3 to develop a deeper understanding of the language and context. As a result, GPT-3 demonstrates improved performance on a variety of linguistic tasks. It also demonstrates superior coherence and creativity in its generated texts. It is even capable of writing code snippets, such as SQL queries, and performing other intelligent tasks. Furthermore, GPT-3 eliminates the need for a fine-tuning step, which was mandatory for its predecessors.

However, with GPT-3 there is a problem of misalignment between the tasks given by end users and what the model has seen during its training. As we have seen, language models are trained to predict the next token based on the input context. This training process is not necessarily directly aligned with the tasks end users want the model to perform. In addition, increasing the size of language models does not inherently make them better at following user intent or instructions. Moreover, models like GPT-3 were trained on data from different sources on the internet. Although some cleanup was applied when selecting the sources, the learning data may contain false or problematic text, including racist text, sexist text, misinformation, or disinformation. As a result, the model may sometimes say the wrong things or even toxic things.

In 2021 a new release of the GPT-3 model was published, called the Instruct series. Unlike the original GPT-3 base model, the instruct models are optimized by reinforcement learning with human feedback, meaning that they use the feedback to learn and to improve over time. This allows the models to learn from human instructions while making them more truthful and less toxic.
To illustrate the difference, we input the prompt "Explain what is meant by time complexity", and we receive the following:

• With the standard GPT-3, we obtain the output: "Explain what is meant by space complexity. Explain what is meant by the big-O notation."

• With the instructGPT-3 model, we obtain: "Time complexity is a way of measuring the amount of time it takes for an algorithm to run and complete its task. It is usually expressed using Big O notation, which measures the complexity of an algorithm in terms of the number of operations it performs. The time complexity of an algorithm is important because it determines how efficient the algorithm is and how well it scales with larger inputs."

We can see that for the same input, the first model cannot answer the question (the answer is even weird), whereas the second model does answer the question. It is, of course, possible to obtain the desired response with a standard GPT-3 model. However, contrary to instruction models, it is necessary to apply specific prompt design and optimization techniques to obtain the desired output from the GPT-3 model. This technique is called prompt engineering and will be detailed in the coming chapters.

From GPT-3 to InstructGPT

OpenAI explains how the Instruct series was constructed in the scientific paper "Training Language Models to Follow Instructions with Human Feedback" by Ouyang et al.

The training recipe has two main stages to go from a GPT-3 model to an instructed GPT-3 model: supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). In each stage, the results of the prior stage are fine-tuned. That is, the SFT stage receives the GPT-3 model and returns a new model, which is sent to the RLHF stage to obtain the instructed version. Figure 1-6, from the scientific paper from OpenAI, details the entire process.
Figure 1-6. The steps to obtain the instructed models (redrawn from an image by Ouyang et al.)
We will step through these stages one by one.

In the SFT stage, the original GPT-3 model is fine-tuned with straightforward supervised learning (step 1 in Figure 1-6). OpenAI has a collection of prompts made by end users. The process starts with the random selection of a prompt from the set of available prompts. A human (called a labeler) is then asked to write an example of an ideal answer to this prompt. This process is repeated thousands of times to obtain a supervised training set composed of prompts and the corresponding ideal responses. This dataset is then used to fine-tune the GPT-3 model to give more consistent answers to user requests. The resulting model is called the SFT model.

The RLHF stage is divided into two substeps. First a reward model (RM) is built (step 2 in Figure 1-6), and then the RM is used for reinforcement learning (step 3 in Figure 1-6).

The goal of the RM is to automatically give a score to a response to a prompt. When the response matches what is indicated in the prompt, the RM score should be high; when it doesn't match, it should be low. To construct the RM, OpenAI begins by randomly selecting a question and using the SFT model to produce several possible answers. As we will see later, it is possible to produce many responses with the same input prompt via a parameter called temperature (a short sketch of how temperature works follows at the end of this section). A human labeler is then asked to rank the responses based on criteria such as fit with the prompt and toxicity of the response. After running this procedure many times, the resulting dataset is used to fine-tune the SFT model for scoring. This RM will be used to build the final InstructGPT model.

The final step in training InstructGPT models involves reinforcement learning, which is an iterative process. It starts with an initial generative model, such as the SFT model. Then a random prompt is selected, and the model predicts an output, which the RM evaluates. Based on the reward received, the generative model is updated accordingly. This process can be repeated countless times without human intervention, providing a more efficient and automated approach to adapting the model for better performance.

InstructGPT models are better at producing accurate completions for what people give as input in the prompt. OpenAI recommends using the InstructGPT series rather than the original series.

GPT-3.5, Codex, and ChatGPT

In March 2022, OpenAI made available new versions of GPT-3. These new models can edit text or insert content into text. They have been trained on data through June 2021 and are described as more powerful than previous versions. At the end of November 2022, OpenAI began referring to these models as belonging to the GPT-3.5 series.
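To illustrate the temperature parameter mentioned above, here is a minimal, self-contained sketch; the tiny vocabulary and scores are invented for illustration and do not come from a real model. Lower temperatures concentrate the probability mass on the likeliest token (approaching greedy decoding), while higher temperatures flatten the distribution so that repeated runs produce different completions, which is what allows a model like the SFT model to generate several candidate answers to the same prompt.

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        """Sample a next-token index from raw model scores (logits), scaled by temperature."""
        rng = rng or np.random.default_rng()
        scaled = np.array(logits) / max(temperature, 1e-6)  # temperature near 0 approaches greedy decoding
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()                                 # softmax over the (toy) vocabulary
        return rng.choice(len(probs), p=probs)

    # Invented scores for tokens that might follow "The weather is nice today, so I decided to"
    vocab = ["go", "take", "stay", "walk"]
    logits = [2.0, 1.2, 0.3, 0.1]
    for t in (0.2, 1.0, 2.0):
        samples = [vocab[sample_next_token(logits, t)] for _ in range(10)]
        print(f"temperature={t}: {samples}")

At temperature 0.2 nearly every sample is "go"; at 2.0 the samples spread across the whole toy vocabulary.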