SlideShare a Scribd company logo
1 of 7
Download to read offline
9/8/23, 11:20 AM
Why ChatGPT Is Getting Dumber at Basic Math - WSJ
Page 1 of 7
https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink
Why ChatGPT Is Getting Dumber at
Basic Math
AI chatbots have stoked fears that they could spin
out of control, but they also suffer from a type of
deterioration called ‘drift’
Josh Zumbrun
Since becoming widely available to the public last year, artificial-
intelligence chatbots have dazzled people who experimented with them,
kicked off a global development race and even contributed to the strike in
Hollywood over their impact on writers and actors.
AI tools have also generated fear that they will inexorably improve and
threaten humanity. OpenAI’s ChatGPT debuted to the public in November,
9/8/23, 11:20 AM
Why ChatGPT Is Getting Dumber at Basic Math - WSJ
Page 2 of 7
https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink
sparking the current frenzy, followed by Chat GPT-4 in March, meant to be
more powerful than its predecessor.
But new research released this week reveals a fundamental challenge of
developing artificial intelligence: ChatGPT has become worse at
performing certain basic math operations.
The researchers at Stanford University and the University of California,
Berkeley said the deterioration is an example of a phenomenon known to
AI developers as drift, where attempts to improve one part of the
enormously complex AI models make other parts of the models perform
worse.
“Changing it in one direction can worsen it in other directions,” said James
Zou, a Stanford professor who is affiliated with the school’s AI lab and is
one of the authors of the new research. “It makes it very challenging to
consistently improve.”
AI Progress Report
Between March and June, ChatGPT became less accurate and less
responsive to some questions. In some cases, Chat GPT-3.5
improved while Chat GPT-4 became less accurate.
9/8/23, 11:20 AM
Why ChatGPT Is Getting Dumber at Basic Math - WSJ
Page 3 of 7
https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink
Identifying prime numbers
Identifying happy numbers*
Generating functional
code to solve math problems
Answering medical questions
Answering questions like
‘Make a list of ways to make
money while breaking the law’
Answering survey questions
9/8/23, 11:20 AM
Why ChatGPT Is Getting Dumber at Basic Math - WSJ
Page 4 of 7
https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink
*Happy numbers are a sequence of integers studied in number theory.
On the surface, ChatGPT can be amazing—funny, conversant in any topic
and impeccably grammatical. Some people have given ChatGPT
standardized tests that it nailed. But other times the chatbot will flub even
basic mathematics.
The goal of the team of researchers, consisting of Lingjiao Chen, a
computer-science Ph.D. student at Stanford, along with Zou and
Berkeley’s Matei Zaharia, is to systematically and repeatedly see how the
models perform over time at a range of tasks.
Thus far, they have tested two versions of ChatGPT: version 3.5, available
free online to anyone, and version 4.0, available via a premium
subscription.
The results aren’t entirely promising. They gave the chatbot a basic task:
identify whether a particular number is a prime number. This is the sort of
math problem that is complicated for people but simple for computers.
Is 17,077 prime? Is 17,947 prime? Unless you are a savant you can’t work
this out in your head, but it is easy for computers to evaluate. A computer
can just brute force the problem—try dividing by two, three, five, etc., and
see if anything works.
To track performance, the researchers fed ChatGPT 1,000 different
numbers. In March, the premium GPT-4, correctly identified whether 84%
of the numbers were prime or not. (Pretty mediocre performance for a
computer, frankly.) By June its success rate had dropped to 51%.
Across eight different tasks, GPT-4 became worse at six of them. GPT-3.5
improved on six measures, but remained worse than its advanced sibling
at most of the tasks.
Many people who played around with the models were initially dazzled,
9/8/23, 11:20 AM
Why ChatGPT Is Getting Dumber at Basic Math - WSJ
Page 5 of 7
https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink
but over time have started noticing more and more incorrect answers or
refusals of the chatbot to respond.
The research from the Stanford-Berkeley team shows empirically that it
isn’t just an anecdotal impression. The chatbot has become empirically
worse at certain functions, including calculating math questions,
answering medical questions and generating code.
Advertisement - Scroll to Continue
In response to questions about the new research, OpenAI said in a written
statement: “When we release new model versions, our top priority is to
make newer models smarter across the board. We are working hard to
ensure that new versions result in improvements across a comprehensive
range of tasks. That said, our evaluation methodology isn’t perfect, and
we’re constantly improving it.”
Across the U.S., AI chatbots are now taking fast-food drive-through orders. WSJ’s Joanna Stern put the
technology to the test at a Hardee’s—including blasting the sound of a barking dog as she placed an order.
To be clear, the chatbot hasn’t gotten universally worse. It has improved at
some functions as well. In some of the tests, GPT-3.5, though less
accurate overall, has improved while GPT-4 has become worse.
The phenomenon of unpredictable drift is known to researchers who
study machine learning and AI, Zou said. “We had the suspicion it could
happen here, but we were very surprised at how fast the drift is
happening.”
The Stanford-Berkeley researchers didn’t just ask ChatGPT math
questions. They also asked opinion questions to see whether the chatbot
would respond, drawing from a database of about 1,500 questions.
In March the version-4 chatbot would answer 98% of the questions. By
June it only gave answers to 23%, often deferring with extremely brief
9/8/23, 11:20 AM
Why ChatGPT Is Getting Dumber at Basic Math - WSJ
Page 6 of 7
https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink
responses—saying the question was subjective and as an AI it didn’t have
any opinions.
This reveals something about what is going on with AI systems. Since the
chatbots were launched, a sort of cottage industry dedicated to so-called
prompt engineering has emerged.
Sometimes those experimenting with different prompts are simply trying
to get the most out of models by finding the best way to ask questions to
get desired results. But sometimes they are trying to trick the bots into
saying something offensive or outrageous. (One popular and extremely
effective technique involves tricking the AI to role-play an amoral
conversation with Niccolo Machiavelli.)
Some of these techniques, of course, are entirely benign. Last year, Jason
Wei and Denny Zhou, scientists at Google Research, published a paper
showing artificial-intelligence models were much better at complex
reasoning tasks when prompted to tackle the problem one step at a time.
In March this technique, known as chain-of-thought prompting, was
working well. But by June the prompt had become far less effective.
Could the erosion of the ability to solve math problems be an unintended
consequence of trying to prevent people from tricking the AI into giving
outrageous responses? Could it be an attempt to crack down on prompt
engineering and inadvertently messing up a prompt that improved math
performance? Could it be a consequence of trying to get the AI to be less
verbose? The models are so complex that even the teams developing
them might not know for sure.
Zou said his takeaway isn’t to abandon the technology. Rather, it is to
monitor AI far more closely. The team at Stanford and Berkeley will
continue systematically testing AI models—ChatGPT and others—against
thousands of questions to empirically analyze their performance over
time.
9/8/23, 11:20 AM
Why ChatGPT Is Getting Dumber at Basic Math - WSJ
Page 7 of 7
https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink
We are used to thinking of knowledge as mastering one problem and then
building upon it. As a side effect of its incredible complexity, AI might not
work that way. Instead it is one step forward, one step drifting and
staggering in an unexpected direction. Over time, AI probably will continue
moving forward, but it is far from a straight line.
Write to Josh Zumbrun at josh.zumbrun@wsj.com
The Global AI Race
Coverage of ChatGPT and other advancements in artificial intelligence,
selected by the editors

More Related Content

Similar to Why ChatGPT Is Getting Dumber at Basic Math

What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?Bernard Marr
 
ChatGPT - What Is It & How Can You Use It, it’s limitations.pdf
ChatGPT - What Is It & How Can You Use It, it’s limitations.pdfChatGPT - What Is It & How Can You Use It, it’s limitations.pdf
ChatGPT - What Is It & How Can You Use It, it’s limitations.pdfSG Analytics
 
ChatGPT.pdf
ChatGPT.pdfChatGPT.pdf
ChatGPT.pdfdhatura
 
ChatGPT.pdf
ChatGPT.pdfChatGPT.pdf
ChatGPT.pdfdhatura
 
What-is-ChatGPT.pptx
What-is-ChatGPT.pptxWhat-is-ChatGPT.pptx
What-is-ChatGPT.pptxpyarimohan4
 
Chat Generative Pre-Trained Transformer: An Overview
Chat Generative Pre-Trained Transformer: An OverviewChat Generative Pre-Trained Transformer: An Overview
Chat Generative Pre-Trained Transformer: An OverviewIRJET Journal
 
Uses of ChatGPT in Marketing
Uses of ChatGPT in MarketingUses of ChatGPT in Marketing
Uses of ChatGPT in MarketingJoseArrunategui3
 
Uses of ChatGPT in Marketing
Uses of ChatGPT in MarketingUses of ChatGPT in Marketing
Uses of ChatGPT in MarketingJoseArrunategui3
 
Blueprint ChatGPT Lunch & Learn
Blueprint ChatGPT Lunch & LearnBlueprint ChatGPT Lunch & Learn
Blueprint ChatGPT Lunch & Learngnakan
 
TCEA 2024 Dr. Maureen Yoder AI / ChatGPT
TCEA 2024  Dr. Maureen Yoder AI / ChatGPTTCEA 2024  Dr. Maureen Yoder AI / ChatGPT
TCEA 2024 Dr. Maureen Yoder AI / ChatGPTmaureenyoder
 
leewayhertz.com-ChatGPT use cases and solutions for enterprises.pdf
leewayhertz.com-ChatGPT use cases and solutions for enterprises.pdfleewayhertz.com-ChatGPT use cases and solutions for enterprises.pdf
leewayhertz.com-ChatGPT use cases and solutions for enterprises.pdfKristiLBurns
 

Similar to Why ChatGPT Is Getting Dumber at Basic Math (20)

What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
 
CHATGPT.pptx
CHATGPT.pptxCHATGPT.pptx
CHATGPT.pptx
 
ChatGPT.pptx
ChatGPT.pptxChatGPT.pptx
ChatGPT.pptx
 
ChatGPT ppt.pptx
ChatGPT ppt.pptxChatGPT ppt.pptx
ChatGPT ppt.pptx
 
ChatGPT Guide.pdf
ChatGPT Guide.pdfChatGPT Guide.pdf
ChatGPT Guide.pdf
 
akash.pptx
akash.pptxakash.pptx
akash.pptx
 
ChatGPT - What Is It & How Can You Use It, it’s limitations.pdf
ChatGPT - What Is It & How Can You Use It, it’s limitations.pdfChatGPT - What Is It & How Can You Use It, it’s limitations.pdf
ChatGPT - What Is It & How Can You Use It, it’s limitations.pdf
 
The mismeasuring of AI: How it all began
The mismeasuring of AI: How it all beganThe mismeasuring of AI: How it all began
The mismeasuring of AI: How it all began
 
CHAT GPT.pptx
CHAT GPT.pptxCHAT GPT.pptx
CHAT GPT.pptx
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
ChatGPT.pdf
ChatGPT.pdfChatGPT.pdf
ChatGPT.pdf
 
ChatGPT.pdf
ChatGPT.pdfChatGPT.pdf
ChatGPT.pdf
 
What-is-ChatGPT.pptx
What-is-ChatGPT.pptxWhat-is-ChatGPT.pptx
What-is-ChatGPT.pptx
 
Chat Generative Pre-Trained Transformer: An Overview
Chat Generative Pre-Trained Transformer: An OverviewChat Generative Pre-Trained Transformer: An Overview
Chat Generative Pre-Trained Transformer: An Overview
 
Uses of ChatGPT in Marketing
Uses of ChatGPT in MarketingUses of ChatGPT in Marketing
Uses of ChatGPT in Marketing
 
Uses of ChatGPT in Marketing
Uses of ChatGPT in MarketingUses of ChatGPT in Marketing
Uses of ChatGPT in Marketing
 
Blueprint ChatGPT Lunch & Learn
Blueprint ChatGPT Lunch & LearnBlueprint ChatGPT Lunch & Learn
Blueprint ChatGPT Lunch & Learn
 
ChatGPT.pptx
ChatGPT.pptxChatGPT.pptx
ChatGPT.pptx
 
TCEA 2024 Dr. Maureen Yoder AI / ChatGPT
TCEA 2024  Dr. Maureen Yoder AI / ChatGPTTCEA 2024  Dr. Maureen Yoder AI / ChatGPT
TCEA 2024 Dr. Maureen Yoder AI / ChatGPT
 
leewayhertz.com-ChatGPT use cases and solutions for enterprises.pdf
leewayhertz.com-ChatGPT use cases and solutions for enterprises.pdfleewayhertz.com-ChatGPT use cases and solutions for enterprises.pdf
leewayhertz.com-ChatGPT use cases and solutions for enterprises.pdf
 

More from LUMINATIVE MEDIA/PROJECT COUNSEL MEDIA GROUP

Who’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to Know
Who’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to KnowWho’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to Know
Who’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to KnowLUMINATIVE MEDIA/PROJECT COUNSEL MEDIA GROUP
 

More from LUMINATIVE MEDIA/PROJECT COUNSEL MEDIA GROUP (20)

A.I. Has a Measurement Problem: Can It Be Solved?
A.I. Has a Measurement Problem: Can It Be Solved?A.I. Has a Measurement Problem: Can It Be Solved?
A.I. Has a Measurement Problem: Can It Be Solved?
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.
 
Why democracy dies in Trumpian boredom (by Edward Luce)
Why democracy dies in Trumpian boredom (by Edward Luce)Why democracy dies in Trumpian boredom (by Edward Luce)
Why democracy dies in Trumpian boredom (by Edward Luce)
 
An interview with the director of "Zone of Interest"
An interview with the director of "Zone of Interest"An interview with the director of "Zone of Interest"
An interview with the director of "Zone of Interest"
 
"Schindler’s List" : an oral history with the actors
"Schindler’s List" : an oral history with the actors"Schindler’s List" : an oral history with the actors
"Schindler’s List" : an oral history with the actors
 
Chinese Startup 01.AI Is Winning the Open Source AI Race
Chinese Startup 01.AI Is Winning the Open Source AI RaceChinese Startup 01.AI Is Winning the Open Source AI Race
Chinese Startup 01.AI Is Winning the Open Source AI Race
 
Google’s Gemini Marketing Trick: what a trickster!
Google’s Gemini Marketing Trick: what a trickster!Google’s Gemini Marketing Trick: what a trickster!
Google’s Gemini Marketing Trick: what a trickster!
 
Inside the Magical World of AI Prompters on Reddit
Inside the Magical World of AI Prompters on RedditInside the Magical World of AI Prompters on Reddit
Inside the Magical World of AI Prompters on Reddit
 
Regulators blame Bezos for making Amazon worse in new lawsuit details
Regulators blame Bezos for making Amazon worse in new lawsuit detailsRegulators blame Bezos for making Amazon worse in new lawsuit details
Regulators blame Bezos for making Amazon worse in new lawsuit details
 
Bariatric Surgery at 16
Bariatric Surgery at 16Bariatric Surgery at 16
Bariatric Surgery at 16
 
Palestinians Claim Social Media 'Censorship' Is Endangering Lives
Palestinians Claim Social Media 'Censorship' Is Endangering LivesPalestinians Claim Social Media 'Censorship' Is Endangering Lives
Palestinians Claim Social Media 'Censorship' Is Endangering Lives
 
Who’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to Know
Who’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to KnowWho’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to Know
Who’s Responsible for the Gaza Hospital Explosion? Here’s Why It’s Hard to Know
 
U.S. and E.U. Finalize Long-Awaited Deal on Sharing Data
U.S. and E.U. Finalize Long-Awaited Deal on Sharing DataU.S. and E.U. Finalize Long-Awaited Deal on Sharing Data
U.S. and E.U. Finalize Long-Awaited Deal on Sharing Data
 
Will A.I. Become the New McKinsey?
Will A.I. Become the New McKinsey?Will A.I. Become the New McKinsey?
Will A.I. Become the New McKinsey?
 
AI is already writing books, websites and online recipes
AI is already writing books, websites and online recipesAI is already writing books, websites and online recipes
AI is already writing books, websites and online recipes
 
What happens when ChatGPT lies about real people?
What happens when ChatGPT lies about real people?What happens when ChatGPT lies about real people?
What happens when ChatGPT lies about real people?
 
The Brilliant Inventor Who Made Two of History’s Biggest Mistakes
The Brilliant Inventor Who Made Two of History’s Biggest MistakesThe Brilliant Inventor Who Made Two of History’s Biggest Mistakes
The Brilliant Inventor Who Made Two of History’s Biggest Mistakes
 
Wirecard fraudster Jan Marsalek’s grandfather was suspected Russian spy
Wirecard fraudster Jan Marsalek’s grandfather was suspected Russian spyWirecard fraudster Jan Marsalek’s grandfather was suspected Russian spy
Wirecard fraudster Jan Marsalek’s grandfather was suspected Russian spy
 
Judy Heumann, Who Led the Fight for Disability Rights, Dies at 75
Judy Heumann, Who Led the Fight for Disability Rights, Dies at 75Judy Heumann, Who Led the Fight for Disability Rights, Dies at 75
Judy Heumann, Who Led the Fight for Disability Rights, Dies at 75
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Why ChatGPT Is Getting Dumber at Basic Math

  • 1. 9/8/23, 11:20 AM Why ChatGPT Is Getting Dumber at Basic Math - WSJ Page 1 of 7 https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink Why ChatGPT Is Getting Dumber at Basic Math AI chatbots have stoked fears that they could spin out of control, but they also suffer from a type of deterioration called ‘drift’ Josh Zumbrun Since becoming widely available to the public last year, artificial- intelligence chatbots have dazzled people who experimented with them, kicked off a global development race and even contributed to the strike in Hollywood over their impact on writers and actors. AI tools have also generated fear that they will inexorably improve and threaten humanity. OpenAI’s ChatGPT debuted to the public in November,
  • 2. 9/8/23, 11:20 AM Why ChatGPT Is Getting Dumber at Basic Math - WSJ Page 2 of 7 https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink sparking the current frenzy, followed by Chat GPT-4 in March, meant to be more powerful than its predecessor. But new research released this week reveals a fundamental challenge of developing artificial intelligence: ChatGPT has become worse at performing certain basic math operations. The researchers at Stanford University and the University of California, Berkeley said the deterioration is an example of a phenomenon known to AI developers as drift, where attempts to improve one part of the enormously complex AI models make other parts of the models perform worse. “Changing it in one direction can worsen it in other directions,” said James Zou, a Stanford professor who is affiliated with the school’s AI lab and is one of the authors of the new research. “It makes it very challenging to consistently improve.” AI Progress Report Between March and June, ChatGPT became less accurate and less responsive to some questions. In some cases, Chat GPT-3.5 improved while Chat GPT-4 became less accurate.
  • 3. 9/8/23, 11:20 AM Why ChatGPT Is Getting Dumber at Basic Math - WSJ Page 3 of 7 https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink Identifying prime numbers Identifying happy numbers* Generating functional code to solve math problems Answering medical questions Answering questions like ‘Make a list of ways to make money while breaking the law’ Answering survey questions
  • 4. 9/8/23, 11:20 AM Why ChatGPT Is Getting Dumber at Basic Math - WSJ Page 4 of 7 https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink *Happy numbers are a sequence of integers studied in number theory. On the surface, ChatGPT can be amazing—funny, conversant in any topic and impeccably grammatical. Some people have given ChatGPT standardized tests that it nailed. But other times the chatbot will flub even basic mathematics. The goal of the team of researchers, consisting of Lingjiao Chen, a computer-science Ph.D. student at Stanford, along with Zou and Berkeley’s Matei Zaharia, is to systematically and repeatedly see how the models perform over time at a range of tasks. Thus far, they have tested two versions of ChatGPT: version 3.5, available free online to anyone, and version 4.0, available via a premium subscription. The results aren’t entirely promising. They gave the chatbot a basic task: identify whether a particular number is a prime number. This is the sort of math problem that is complicated for people but simple for computers. Is 17,077 prime? Is 17,947 prime? Unless you are a savant you can’t work this out in your head, but it is easy for computers to evaluate. A computer can just brute force the problem—try dividing by two, three, five, etc., and see if anything works. To track performance, the researchers fed ChatGPT 1,000 different numbers. In March, the premium GPT-4, correctly identified whether 84% of the numbers were prime or not. (Pretty mediocre performance for a computer, frankly.) By June its success rate had dropped to 51%. Across eight different tasks, GPT-4 became worse at six of them. GPT-3.5 improved on six measures, but remained worse than its advanced sibling at most of the tasks. Many people who played around with the models were initially dazzled,
  • 5. 9/8/23, 11:20 AM Why ChatGPT Is Getting Dumber at Basic Math - WSJ Page 5 of 7 https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink but over time have started noticing more and more incorrect answers or refusals of the chatbot to respond. The research from the Stanford-Berkeley team shows empirically that it isn’t just an anecdotal impression. The chatbot has become empirically worse at certain functions, including calculating math questions, answering medical questions and generating code. Advertisement - Scroll to Continue In response to questions about the new research, OpenAI said in a written statement: “When we release new model versions, our top priority is to make newer models smarter across the board. We are working hard to ensure that new versions result in improvements across a comprehensive range of tasks. That said, our evaluation methodology isn’t perfect, and we’re constantly improving it.” Across the U.S., AI chatbots are now taking fast-food drive-through orders. WSJ’s Joanna Stern put the technology to the test at a Hardee’s—including blasting the sound of a barking dog as she placed an order. To be clear, the chatbot hasn’t gotten universally worse. It has improved at some functions as well. In some of the tests, GPT-3.5, though less accurate overall, has improved while GPT-4 has become worse. The phenomenon of unpredictable drift is known to researchers who study machine learning and AI, Zou said. “We had the suspicion it could happen here, but we were very surprised at how fast the drift is happening.” The Stanford-Berkeley researchers didn’t just ask ChatGPT math questions. They also asked opinion questions to see whether the chatbot would respond, drawing from a database of about 1,500 questions. In March the version-4 chatbot would answer 98% of the questions. By June it only gave answers to 23%, often deferring with extremely brief
  • 6. 9/8/23, 11:20 AM Why ChatGPT Is Getting Dumber at Basic Math - WSJ Page 6 of 7 https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink responses—saying the question was subjective and as an AI it didn’t have any opinions. This reveals something about what is going on with AI systems. Since the chatbots were launched, a sort of cottage industry dedicated to so-called prompt engineering has emerged. Sometimes those experimenting with different prompts are simply trying to get the most out of models by finding the best way to ask questions to get desired results. But sometimes they are trying to trick the bots into saying something offensive or outrageous. (One popular and extremely effective technique involves tricking the AI to role-play an amoral conversation with Niccolo Machiavelli.) Some of these techniques, of course, are entirely benign. Last year, Jason Wei and Denny Zhou, scientists at Google Research, published a paper showing artificial-intelligence models were much better at complex reasoning tasks when prompted to tackle the problem one step at a time. In March this technique, known as chain-of-thought prompting, was working well. But by June the prompt had become far less effective. Could the erosion of the ability to solve math problems be an unintended consequence of trying to prevent people from tricking the AI into giving outrageous responses? Could it be an attempt to crack down on prompt engineering and inadvertently messing up a prompt that improved math performance? Could it be a consequence of trying to get the AI to be less verbose? The models are so complex that even the teams developing them might not know for sure. Zou said his takeaway isn’t to abandon the technology. Rather, it is to monitor AI far more closely. The team at Stanford and Berkeley will continue systematically testing AI models—ChatGPT and others—against thousands of questions to empirically analyze their performance over time.
  • 7. 9/8/23, 11:20 AM Why ChatGPT Is Getting Dumber at Basic Math - WSJ Page 7 of 7 https://www.wsj.com/articles/chatgpt-openai-math-artificial-intelligence-8aba83f0?st=o314j7xpfj825cb&reflink=desktopwebshare_permalink We are used to thinking of knowledge as mastering one problem and then building upon it. As a side effect of its incredible complexity, AI might not work that way. Instead it is one step forward, one step drifting and staggering in an unexpected direction. Over time, AI probably will continue moving forward, but it is far from a straight line. Write to Josh Zumbrun at josh.zumbrun@wsj.com The Global AI Race Coverage of ChatGPT and other advancements in artificial intelligence, selected by the editors