SlideShare a Scribd company logo
A Neural Network Solves, Explains, and
Generates University Math Problems by Program
Synthesis and Few-Shot Learning at Human Level
Iddo Drori1,a,b
, Sarah Zhanga
, Reece Shuttlewortha
, Leonard Tangc
, Albert Lua
, Elizabeth Kea
, Kevin Liua
, Linda Chena
,
Sunny Trana
, Newman Chengb
, Roman Wangb
, Nikhil Singha
, Taylor L. Pattic
, Jayson Lynchd
, Avi Shporera
, Nakul Vermab
,
Eugene Wub
, and Gilbert Stranga
a
Massachusetts Institute of Technology; b
Columbia University; c
Harvard University; d
University of Waterloo
This manuscript was compiled on April 7, 2022
We demonstrate that a neural network pre-trained on text and fine-
tuned on code solves mathematics course problems, explains solu-
tions, and generates new questions at human level. We automatically
synthesize programs using few-shot learning and OpenAI’s Codex
transformer and execute them to solve course problems at 80% auto-
matic accuracy. We curate a new dataset of questions from MIT’s
large mathematics courses (Single Variable and Multivariable Cal-
culus, Differential Equations, Introduction to Probability and Statis-
tics, Linear Algebra, and Mathematics for Computer Science) and
Columbia University’s Computational Linear Algebra course. We
solve questions from a MATH dataset (on Prealgebra, Algebra, Count-
ing and Probability, Intermediate Algebra, Number Theory, and Pre-
calculus), the latest benchmark of advanced mathematics problems
designed to assess mathematical reasoning. We randomly sample
questions from each course and benchmark topic and generate so-
lutions with multiple modalities, including numbers, equations, and
plots. The latest GPT-3 language model pre-trained on text auto-
matically solves only 18% of these university questions. In con-
trast, using program synthesis and few-shot learning with Codex that
is fine tuned on code generates programs that automatically solve
80% of these questions. Our approach improves upon the previous
state-of-the-art automatic solution accuracy on the benchmark top-
ics from 8.8% to 81.1%. We perform a survey to evaluate the quality
and difficulty of generated questions. This work is the first to au-
tomatically solve university-level mathematics course questions at
a human level. This is also the first work to explain, and generate
university-level mathematics course questions at scale, a milestone
for higher education.
Neural networks | Mathematics courses | Answering, explaining, and
generating questions |
Introduction
Until this work, it was widely believed that neural net-
works could not solve advanced mathematics problems
(1). However, the previous unsuccessful studies used only
text-based pre-training. We now demonstrate that a neural
network, OpenAI Codex, that is pre-trained on text and fine-
tuned on code automatically answers 80% of university-level
mathematics problems by program synthesis using few-shot
learning.
Figure 1 illustrates several example problems: computing
the volume generated by rotating the graph of a single variable
function around an axis, computing the Lorenz attractor and
its projection, and computing and demonstrating the geometry
of a singular value decomposition (SVD). For the first time, we
demonstrate that not only can a single machine learning model
solve these example problems, it can solve a wide variety of
mathematics courses at scale.
Related Work. Transformers are deep learning architectures
based only on attention mechanisms (2) that do not use re-
current neural networks or convolutional neural networks.
Transformer-based language models have enjoyed tremendous
success across a variety of natural language processing (NLP)
tasks, including zero-shot and few-shot language tasks (3).
However, these models have largely failed to solve math prob-
lems (4–6). In particular, previous work using transformers,
such as GPT-3 (3), have failed to solve mathematics courses
problems because the transformers were pre-trained on text
alone.
Pre-training a transformer is computationally expensive
and most often involves vast amounts of unlabeled data. The
most common optimization objectives for pre-training language
models are (i) masked word prediction: predicting a random
deleted word in a sentence or predicting the next word, or
(ii) classifying whether two sentences follow each other or not.
This computationally expensive step is usually done once and
followed by a relatively fast fine-tuning step. In fine-tuning,
Significance Statement
We provide the first demonstration that a neural network auto-
matically solves, explains, and generates university-level prob-
lems from the largest MIT mathematics courses at human level.
Our methods combine three innovations: (i) using recent neural
networks pre-trained on text and fine-tuned on code, rather
than pre-trained on text, (ii) few shot-learning to automatically
synthesize programs that correctly solve course problems, and
(iii) a pipeline to solve questions, explain solutions, and gener-
ate new questions indistinguishable by students from course
questions. Our work improves upon state-of-the-art increas-
ing automatic accuracy on randomly sampled questions on a
benchmark by an order of magnitude. Implications for higher
education include new roles of AI in automated course evalua-
tion and content generation.
1. Wrote the paper. 2. Created figures. 3. Provided ideas. 4. Curated datasets. 5. Gen-
erated questions. 6. Generated solutions. 7. Run GPT-3 and Codex. 8. Executed programs.
9. Evaluated solutions. 10. Analyzed data. 11. Wrote code. 12. Designed survey. I.D.
1,2,3,4,5,6,7,8,9,10,11,12; S.T. 1,4,5,6,7,8,9; R.W. 4,5,6,7,8,9; K.L. 1,4,5,6,8,9; N.C. 4,5,6,7,8,9;
L.T. 1,4,5,6,7,8,9; E.K. 4,7,8,9; N.S. 1,2,3,10,11,12; T.L.P. 1,2,3,7,8,9; J.L. 1,3,4,5,6,7,8,9; A.S.
1,10; N.V. 1,3,10,12; E.W. 1,2,3,12; G.S. 1,3
There are no competing interests.
1
Iddo Drori. E-mail: idrori@mit.edu
www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX PNAS | April 7, 2022 | vol. XXX | no. XX | 1–180
arXiv:2112.15594v3
[cs.LG]
6
Apr
2022
Input (Problems) Output (Answers, Visualizations, Explanations)
MIT Courses
Columbia U Course
Zero-Shot Learning
70% automatic solve rate
Calculus
Differential Eq.
Intro to Prob.
Linear Algebra
Math for CS
Comp. Lin. Alg.
Prealgebra
Algebra
Number Theory
Counting and Prob.
Inter. Algebra
Precalculus
MATH Topics
Few-Shot Learning
80% automatic solve rate
Program Synthesis
As-is / write a program / use sympy / use simulations
Codex
Calculus
Differential Equations Linear Algebra
Introduction to Probability and Statistics
1. Generate a random number x from a uniform distribution on the interval [0, θ]
2. Test the hypothesis that θ = 2 by rejecting H0 if x ≤ 0.1 or x ≥ 1.9
3. Simulate the probability of a type I error Explanation
Fig. 1. We apply a neural network, OpenAI Codex, to solve, explain, and generate mathematics problems. Our input math problems are randomly sampled from MIT and
Columbia University courses and from the MATH dataset (left). We use zero-shot and few-shot learning to automatically generate programs, solving 80% of the questions. We
then use Codex to explain the synthesized programs. The modality of the program outputs take diverse forms depending on the course problems, such as numerical responses
or plots generated from text by program synthesis (right). For example, in Calculus 18.01-2: the volume generated by rotating the finite 2-dimensional region bounded by two
2-dimensional graphs about the plotted axis (top right); in Differential Equations 18.03: the Lorenz strange attractor (bottom left); In Linear Algebra 18.06: the geometry of the
singular value decomposition (SVD) (bottom right).
the pre-trained model is tuned using a specific dataset or task.
This work demonstrates that OpenAI’s Codex (7), a trans-
former that has been pre-trained on text and then fine-tuned
on code, generates programs (i.e., conducts program synthesis)
that solve math course problems at scale, and that few-shot
learning automatically solves 80% of the math course problems.
Previous work has seen modest success on simpler or spe-
cialized mathematics problem benchmarks. Techniques based
on co-training output to verify (8, 9) or predict expression trees
(10–15) are able to solve elementary school-level math prob-
lems, such as MAWPS and Math23k, with over 80% accuracy.
However, these approaches do not extend to high-school, math
Olympiad, nor university-level courses. Co-training paired
with graph neural networks (GNNs) to predict arithmetic
expression trees is able to solve university-level problems in
Machine Learning (16) with up to 95% accuracy. However,
that work is limited to numeric answers, and is moreover over-
fit to a specific course, such that it does not generalize to other
courses.
Major Contributions. Our main contribution, as shown in Fig-
ure 2, is demonstrating that a single neural network model,
OpenAI Codex, automatically solves 80% of randomly selected
university-level mathematics problems (six MIT mathematics
courses and one Columbia University course) by program syn-
thesis and few-shot learning. We also automatically explain
the solutions and generate new questions, a process requiring
only seconds per problem. The courses are listed in Table 1.
We randomly sample 25 questions per course and the prob-
lems are solved as-is or with minor contextual information
that is automatically applied. If applicable, the neural net-
work outputs an executable program that answers the problem
and visualizes the result. Furthermore, our method explains
the solutions and generates new problems that are nearly
indistinguishable from human-written problems.
This methodology increases the solution accuracy on the
MATH benchmark (5) from 8.8% accuracy using previously
state-of-the-art methods to 81.1% accuracy using automatic
few-shot learning. The MATH benchmark measures the math-
ematical problem-solving ability of neural network models
with challenging problems sourced from high school math
competitions, such as the AMC 10∗
, AMC 12, and AIME†
.
The methods we propose are simple and broadly applica-
ble. The first is using a transformer model pre-trained on
text and fine-tuned on code, so that it is adept at synthesiz-
ing programmatic solutions. The second is to use zero-shot
learning of the questions as-is or with automatically added
contextual information about the problem or program. The
third is to embed the questions and use few-shot learning
based on question-code pairs of nearest embedded zero-shot
questions. In our work, we use the latest version of Open AI
Codex (7), namely code-davinci-002.
Methods
Dataset. We randomly sample 25 questions from each of seven
courses: MIT’s 18.01 Single Variable Calculus, 18.02 Multi-
variable Calculus, 18.03 Differential Equations, 18.05 Intro-
duction to Probability and Statistics, 18.06 Linear Algebra,
6.042 Mathematics for Computer Science, and Columbia Uni-
versity’s COMS3251 Computational Linear Algebra. For the
MATH dataset, we randomly sample 15 questions from each of
six topics in the dataset. We validate that our results are not
merely overfitting training data by solving a new Computa-
tional Linear Algebra course COMS3251 which is not available
∗
American Mathematics Competitions
†
American Invitational Mathematics Examination
2 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Randomly Sampled Questions
(100% Solved)
Zero-Shot Successful
(70% Automatic Solve Rate)
Few-Shot Successful
(+10% Automatic Solve Rate)
Manual Modification (20%)
Few-Shot Unsuccessful
Zero-Shot
Unsuccessful
Zero-Shot Learning:
Acceptable input is:
• no change to original question
• add “write a program”
• add “using sympy”
• add “using simulations”
Few-Shot Learning:
If zero-shot does not work, per-
form few-shot learning using
(question, code) pairs:
1. Embed all questions
2. Compute pairwise similarity
between embeddings
3. Rank the questions that were
zero-shot to select examples for
few-shot
Fig. 2. We select a random sample of questions from each course or topic that do not contain input images or require proofs. A language model pre-trained on text (GPT-3
text-davinci-002) automatically solves only 18% (for courses) and 25.5% (for the MATH benchmark topics) of these questions. In contrast, using zero-shot learning with
a network pre-trained on text and fine tuned on code (OpenAI Codex code-davinci-002) we synthesize programs that automatically solve 70% (for courses) and 72.2%
(for the MATH benchmark topics) of the questions, and using few-shot learning automatically solve 80% (for courses) and 81.1% (for the MATH benchmark topics) of the
questions. We use the nearest embedded zero-shot questions and their synthesized code for few-shot learning. The remaining 20% of the course questions and 18.9% of
MATH benchmark topic questions are manually prompted to solve the question.
online, and is therefore unseen during Codex training. We
automatically obtain correct answers for 80% of the randomly
sampled university math course questions and 81.1% of the
randomly sampled MATH benchmark questions. The previous
state-of-the-art on this benchmark before this work was 8.8%
(4).
Workflow. Our method takes a course problem as input and
synthesizes a program that, when run, outputs the solution.
Figure 4 compares the percent of automatically solved ques-
tions for each course using our zero-shot learning and few-shot
learning approaches with the latest GPT-3 (text-davinci-002)
and Codex (code-davinci-002) versions. The error bars on the
totals are standard errors.
Figures 3 shows examples of automatic workflows for solv-
ing course questions and generating explanations using Codex.
The panels show the original question, the automatic aug-
mentation with context, the resulting synthesized program,
the executed output answer that is the solution, and the ex-
planation of the solution program. Questions are given to
Codex either as-is, or by automatically adding minor context,
as described below.
Automatic Context.
Programming Language Context. Best results are obtained
when the Codex prompt specifies that a program should be
written and specifies which programming language should be
used. We add the text “write a program” before the question
and focus on the Python programming language by placing
the text within Pythonic triple quotes.
Library Context. Likewise, best results are obtained when the
Codex prompt specifies which programming package should
be used. For instance, we may add the Python library SymPy
as context (see Figure 3 top panel 18.01), specifying that
the program synthesized to solve the problem should use this
package.
Figure 5 shows the Python programming packages used by
each course. Each colored stacked bar represents the number
of questions in the course using that package. NumPy and
Sympy are used by all courses. Matplotlib is used by courses
with questions that require plotting. Math, Random, and
SciPy are used by around half of the courses. The usage
patterns of these courses is incorporated automatically in our
approach; as we only specify SymPy or plot related imports,
these other package imports are automatically synthesized.
Multi-modal Output. The output answer may be of numerous
modalities. In the examples featured in Figure 3, the output
answers are: an equation (18.01), a Boolean value (18.02), a
plot (18.03), a numerical value (18.05), and a vector (18.03
and 18.06).
Simulation. Figure 3 (18.05) shows an example from Prob-
ability and Statistics where the question is turned into a
probabilistic programming task that generates simulations in
order to compute an empirical statistic.
Automatic Zero-Shot and Few-Shot Learning. Zero-shot learn-
ing synthesizes a program from the original question or from
the automatically augmented question without using any exam-
ples. This method automatically solves 70% of the questions.
Few-shot learning begins by embedding all the questions
that are solved by zero-shot learning. Then, separately for
each course, we select the nearest zero-shot questions based
on the cosine similarity with the target question. We use the
nearest zero-shot (question, code) pairs as examples for few-
shot learning. This increases the number of questions that are
automatically solved from 70% by zero-shot learning to 80%
by few-shot learning. Figure 3 (18.02) demonstrates few-shot
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 3
ID Course Question Solution
1 18.01
Single Variable Calculus
A bacteria population is 4000 at time t = 0 and its rate of growth is 1000 ∗ 2t
bacteria per hour after t hours. What is the population after one hour?
4000 + 1000
log(2)
2 18.02
Multi-variable Calculus
Describe the graph of the function f:
f(x, y) = 10 −
p
x2 + y2
3 18.03
Differential Equations
Find general solutions of the differential equations. If an initial condition is given,
find the corresponding particular solution. Throughout, primes denote derivatives
with respect to x. y′
+ y = 2, y(0) = 0
y(x) = 2(1 − e−x
)
4 18.05
Introduction to Probability
and Statistics
Calculate the probability of getting a three-of-a-kind poker hand. 0.021128
5 18.06
Linear Algebra
Find a combination x1w1 + x2w2 + x3w3 that gives the zero vector with x1 = 1.
w1 is the vector (1;2;3). w2 is the vector (4; 5; 6). w3 is the vector (7; 8; 9).
x1 = 1, x2 = −2, x3 = 1
6 6.042
Mathematics for
Computer Science
Find a number x ∈ {0, 1, ..., 112} such that 11x ≡ 1 (mod 113). 72
7 COMS3251
Computational
Linear Algebra
Given a d-dimensional non-zero vector v, compute the rank of the matrix vv′
1
8 MATH
Prealgebra
What is the greatest common factor of 84, 112 and 210? 14
9 MATH
Algebra
Let N, O be functions such that N(x) = 2
√
x, and O(x) = x2
. What is
N(O(N(O(N(O(3))))))?
24
10 MATH
Number Theory
How many four-digit numbers whose digits add up to 9 are divisible by 11? 0
11 MATH
Counting and Probability
A standard six-sided fair die is rolled four times. The probability that the product
of all four numbers rolled is a perfect square is m
n
, where m and n are relatively
prime positive integers. Find m + n.
187
12 MATH
Intermediate Algebra
Given that x2
+ y2
= 14x + 6y + 6, find the largest possible value of 3x + 4y. 73
13 MATH
Precalculus
If the six solutions of x6
= −64 are written in the form a + bi, where a and b are
real, find the product of those solutions with a > 0.
4
Table 1. Example questions and solutions from six MIT courses (18.01, 18.02, 18.03, 18.05, 18.06, 6.042), one Columbia University course
(COMS3251), and six topics of the MATH dataset (Pre-Algebra, Algebra, Intermediate Algebra, Counting and Probability, Precalculus, Number
Theory). The solutions can contain numerical answers, equations, and plots, among other modalities.
learning. Given the original question, the nearest embedded
question is found among the question that have been zero-shot.
This (question, code) pair along with the automatic input are
given to Codex which results in generated code that correctly
solves the problem.
Manual Prompts.
Question Tidying. While 80% of the question are automati-
cally solved by zero-shot and few-shot learning; 20% of the
questions may require manual prompts. These questions may
be vague or contain superfluous information (e.g., reference
movie characters or current events), and require tidying to
extract the essence of the question. Question tidying primarily
involves removing superfluous information, breaking down long
sentence structures into smaller components, and converting
prompts into a programming format.
Interaction for Visualization. Another form of manual prompt-
ing occurs when an answer plot requires multiple steps to
generate a visually pleasing and clear plot. These special
cases, which are among the remaining 20% of the questions re-
quire interactively prompting Codex until reaching the desired
visualizations.
Automatic Explanation. Explanations are generated automati-
cally using a prompt consisting of three quotes followed by the
text “Here’s what the above code is doing: 1.”. This prompt is
given after the question and the generated code, and results
in a step-by-step explanation of the solution code.
Automatic Questions Generation and their Human Evaluation.
We use Codex to generate new questions for each course. This
is done by creating a numbered list of questions from the
dataset. This list is cut off after a random number of questions,
and the result is used to prompt Codex to generate the next
question. This process is repeated to generate many new
questions for each course. Questions may be generated based
on course topic and given student performance may also be
generated based on level of difficulty.
To evaluate the generated questions, we survey MIT stu-
dents who have taken these courses or their equivalents to
compare the quality and difficulty of machine-generated ques-
4 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Ans
2*x/(2*x-3) -
2*(x**2-1)/(2*x-3)**2
Find the derivative
of the function
using the definition
of a derivative.
f(x)= (x**2-1)/(2*x-3)
Question
18.01 Single Variable Calculus Program
Using sympy find the
derivative of the func-
tion using the definition
of the derivative.
f(x)= (x**2-1)/(2*x-3)
Input
Codex Codex
Explanation
1. We import sympy as sp
2. We create a symbol x
3. We create a function f
4. We print the derivative
of f with respect to x
Ans
R =1/2[ 2,−√2;√2,√2]
What 2 by 2 matrix R rotates
every vector through 45 degrees?
Example: the vector [1,0] goes to
[sqrt(2)/2, sqrt(2)/2].
Question
18.06 Linear Algebra Program
Codex Codex
Explanation
1. We define a function
called rotation_matrix
which takes in a parame-
ter theta.
2. We return a 2D numpy
array which is the rota-
tion matrix.
3. We call the function
with theta = pi/4.
4. We print the result.
Ans
0.1
One generates a number x
from a uniform distribution on
the interval [0, θ]. One de-
cides to test H0 : θ = 2
against HA : θ = 2 by reject-
ing H0 if x ≤ 0.1 or x ≥ 1.9.
Compute the probability of a
type I error.
Question
18.05 Introduction to Probability and Statistics Program
One generates a
number x from a uni-
form distribution on the
interval [0, θ]. One de-
cides to test H0 : θ = 2
against HA : θ = 2 by
rejecting H0 if x ≤ 0.1 or
x ≥ 1.9. Using simula-
tions, compute the
probability of a type I
error.
Input
Codex Codex
Explanation
1. Generate a random
number x from a uniform
distribution on the in-
terval [0, θ]
2. Test the hypothesis
that θ = 2 by rejecting
H0 if x ≤ 0.1 or x ≥ 1.9
3. Simulate the probabil-
ity of a type I error
Lorem ipsum
Ans
(0,0)
Find the critical
point or points of
the given autono-
mous system.
dx/dt=2*x-y,
dy/dt=x-3y
Question
18.03 Differential Equations Program
Use Sympy to find the criti-
cal point or points of the
given autonomous
system. dx/dt=2*x-y,
dy/dt=x-3y
Plot using streamplot
Input
Codex Codex
Explanation
1. We create a grid of x
and y values.
2. We create a function
for dx/dt and dy/dt.
3. We plot dx/dt and
dy/dt.
4. We find the critical
points.
5. We plot the critical
points.
Determine
whether the alter-
nating series con-
verge or diverge.
Question
18.02 Multivariate Calculus Program
Use comparison tests to determine
whether the infinite series converge
or diverge.
from sympy import Sum,
Symbol, oo, limit,
init_printing
init_printing()
n = Symbol('n')
s = Sum(1/(n**2 + n + 1),
(n, 1, oo))
limit(s, n, oo)
"""
The series converges.
"""
Few Shot (Question, Code)
Codex
Explanation
Codex
1. The first line imports the Sum
function from the sympy module.
2. The second line imports the
Symbol function from the sympy
module.
3. The third line imports the oo
function from the sympy module.
4. The fourth line imports the
limit function from the sympy
module.
5. The fifth line imports the
init_printing function from the
sympy module.
6. The sixth line creates a vari-
able called n and assigns it the
value of the Symbol function.
7. The seventh line creates a vari-
able called s and assigns it the
value of the Sum function.
8. The eighth line prints the limit
of the variable s as n approaches
infinity.
Ans
Converges
Write a program using
sympy to determine
whether the alternat-
ing series
converges or diverges.
Input
Fig. 3. Example pipelines automatically solving questions from MIT mathematics courses and explaining the solutions. 18.01 Single Variable Calculus Zero-Shot example:
Given a question and the automatically generated prefix "using SymPy", Codex is prompted and outputs a program. Running the program results in equations which are the
correct answer. The program is then fed to Codex again with an automatic prompt which results in a generated explanation of the code. 18.02 Multi-variable Calculus Few-Shot
example: Given a question, the prefix "write a program using SymPy" is automatically generated. The question is embedded along with the other questions in the course that
are zero-shot. The nearest zero-shot question and its corresponding code are used as a few-shot example. Together the few-shot example pair and the input question are fed
into Codex, which generates a program that solves the question. The question, program, and a prompt for explanation are fed into Codex to generate the explanation. 18.03
Differential Equations Zero-Shot example: In this example the answer is both a vector and a plot. 18.05 Introduction to Probability and Statistics Zero-Shot example: Given the
question, a probabilistic program is generated by using simulations. 18.06 Linear Algebra Zero-Shot example: The output answer is the correct vector.
tions with human-written questions for each of the courses ‡
‡
The IRB that approved the survey is MIT IRB Exempt Id E-3792. The survey was optional, included
informed consent, with the following description: “We are conducting a survey to assess the quality
and difficulty of automatically generated questions for STEM courses. You will be presented with
a series of blocks consisting of questions, either human-written (taken from an actual course) or
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 5
Codex Few−Shot Codex Zero−Shot GPT−3
0.00
0.25
0.50
0.75
1.00
1
8
.
0
1
1
8
.
0
2
1
8
.
0
3
1
8
.
0
5
1
8
.
0
6
6
.
0
4
2
C
O
M
S
3
2
5
1
T
o
t
a
l
MIT Mathematics Courses and a New Columbia Course
Automatic
Solve−Rate
A
A
l
g
e
b
r
a
C
o
u
n
t
i
n
g
&
P
r
o
b
a
b
i
l
i
t
y
I
n
t
e
r
m
e
d
i
a
t
e
A
l
g
e
b
r
a
N
u
m
b
e
r
T
h
e
o
r
y
P
r
e
a
l
g
e
b
r
a
P
r
e
c
a
l
c
u
l
u
s
T
o
t
a
l
MATH Benchmark
B
Fig. 4. Comparison between automatic solve rates on MIT mathematics courses and a new Columbia University course (A), and a MATH benchmark dataset (B). The latest
OpenAI GPT-3 (text-davinci-002), a transformer pre-trained on text, achieves 18% (A) and 25.5% (B) automatic solve rates. In contrast, program synthesis with zero-shot and
few-shot learning using the latest OpenAI Codex (code-davinci-002), a transformer pre-trained on text and fine tuned on code, achieves an automatic solve rate of 80% (A) and
81.1% (B).
We randomly sample five original questions and five generated
questions for each MIT course. In the survey, students are
asked to read ten questions from each course, which are a
mixture of randomly ordered human-written and machine-
generated questions.
For each of the 60 course questions, the students are asked
three survey questions: (i) is the question human-written or
machine-generated? (ii) is the question appropriate or not
appropriate for the specific course? and (iii) what is the
question’s difficulty level on a scale between 1 (easiest) and 5
(hardest)? An example of this survey format is given in Figure
6. The students are asked to simply provide their ratings and
not to solve the questions. The survey is conducted online
and anonymously.
generated with machine learning, but you will not be told the source of a given question. For
each question, you will be asked (a) whether you think the question is human-written or machine-
generated, (b) whether the question is appropriate for the given course, and finally (c) how you
would rate the difficulty of the question. Please carefully read each question and answer to the
best of your ability”.
0
10
20
30
18.03 18.01 18.02 COMS3251 18.05 18.06 6.042
Course
Count
Library
math
matplotlib
mpl_toolkits
networkx
numpy
random
scipy
sympy
Fig. 5. Imported Python programming libraries by course: NumPy is used by nearly
all courses. Matplotlib is used by courses with questions that involve plotting. Sympy
is used by most of the courses, and SciPy by half of the courses. Introduction to
Probability and Statistics (18.05) uses the statistics package.
Results
Questions Solved. We solve a total of 265 questions, 213
of them automatically, as described in the Supplemen-
tary Information. These 265 questions include 25 ran-
domly sampled questions from each of the seven courses
(18.01/18.02/18.03/18.05/18.06/6.042/COMS3251), and 15
randomly sampled questions for each of the six topics in the
MATH dataset (Prealgebra/Algebra/Number Theory/Count-
ing and Probability/Intermediate Algebra/Precalculus). The
breakdown of the questions solved automatically by zero-shot
and few-shot learning as compared with GPT-3 is shown in
Figure 4.
Visualization of Embedded Questions. We embed the 175
mathematics course questions onto a 2,048 dimensional
space using OpenAI’s (text-similarity-babbage-001) embed-
ding which captures semantic similarity between texts. We
then use uniform manifold approximation and projection
(UMAP) (17) reducing dimensionality to two dimensions. Fig-
ure 8 shows that the embedded questions are clustered by
course topics. On the top right we see clusters of questions
from MIT’s 18.06 Linear Algebra (black plus) and Columbia’s
COMS3251 Computational Linear Algebra (yellow plus). On
the left side we see a cluster of the questions from MIT’s 18.01
(red circles), 18.02 (green circles), and 18.03 (blue circles). On
the bottom right we see a cluster of the questions from MIT’s
18.05 Introduction to Probability and Statistics (magenta x)
and 6.042 Mathematics for Computer Science (cyan x), both
courses cover probability and statistics.
Automatically Generating New Questions. We use the curated
questions to generate new questions for each of the courses
and topics. This is done by prompting Codex with numbered
human-written questions and automatically generating the
next question. We generate 130 new questions as shown in the
Supplementary Information. These include ten new questions
for each of the seven courses and each of the six MATH topics.
6 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Fig. 6. Student survey question: For each of 60 questions, students are asked if (i) the question is human-written or machine-generated, (ii) the question is appropriate or
inappropriate for the course, and (iii) to rate the difficulty level of each question on a scale between 1 (easiest) and 5 (hardest).
Fig. 7. Student survey results: Panel A compares between the level of difficulty of human-written questions and questions generated by our approach for each course based on
the student ratings. The plot shows the means of the difficulty ratings between 1 (easiest) and 5 (hardest), and their 95% confidence intervals. Panel B shows the percentage of
human-written and machine-generated questions rated as appropriate and not appropriate for the course. Panel C shows the percentage of human-written questions rated as
human-written or machine-generated (left) and the percentage of machine-generated questions rated as human-written or machine-generated (right).
Table 2 shows one generated question for each course and
MATH topic. Generating a question takes less than a second
and we can generate an arbitrarily large number of questions.
We create prompts of 25 randomly selected problems for which
Codex generates correct answers, remove the questions after a
randomly chosen question in the list, and have Codex complete
the next new question.
Student Survey Results. Fifteen participants completed our
survey answering questions about all 60 questions, taking a
median of 40 minutes. Figure 7 summarizes the results of
the student survey comparing human-written and machine-
generated questions. Panel A compares between the level of
difficulty of human-written questions and questions generated
by our approach for each course based on the student ratings.
The plot shows the means of the difficulty ratings between 1
(easiest) and 5 (hardest), and their 95% confidence intervals.
Panel B shows the percentage of human-written and machine-
generated questions rated by students as appropriate or not
appropriate for the courses. Panel C shows the percentage of
human-written questions rated as human-written or machine-
generated (left) and the percentage of machine-generated ques-
tions rated as human-written or machine-generated (right).
Summarizing the student survey results:
• Survey participants rated our machine-generated ques-
tions and human-written questions to be at a similar level
of difficulty within confidence intervals.
• Survey participants rated human-written questions
slightly more appropriate for the courses than machine-
generated ones.
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 7
Fig. 8. Visualization of embeddings of course questions: We embed the course questions into a 2,048 dimensional space using OpenAI’s (text-similarity-babbage-001)
embedding which captures semantic similarity between texts. We then use uniform manifold approximation and projection to reduce dimensionality to two dimensions. This
shows distinctive clusters based on topics. On the top right we see clusters of questions from MIT’s 18.06 Linear Algebra and Columbia’s COMS3251 Computational Linear
Algebra. On the left side we see a cluster of the questions from MIT’s 18.01, 18.02, and 18.03. On the bottom right we see a cluster of the questions from MIT’s 18.05
Introduction to Probability and Statistics and 6.042 Mathematics for Computer Science, both course cover probability and statistics.
• Survey participants rated human-written questions
slightly more likely to be human-written as shown in the
left side of Panel C. Survey participants rated machine-
generated questions equally likely to be machine-generated
and human-written as shown in the right side of Panel C.
Human Level. We achieve 80% automatic accuracy on solving
mathematics course problems at MIT and Columbia. This is
comparable to typical student performance on these problem
sets in our courses at MIT and Columbia University. We au-
tomatically generate new questions that are indistinguishable
by students from human course questions.
Implementation Details. We use the latest version of OpenAI’s
GPT-3 text-davinci-002 and Codex codex-davinci-002 engine
for all of our experiments. We fix all of Codex’s hyperparame-
ters to be the same for all solution and explanation experiments
to yield deterministic and reproducible results. Specifically,
top P which controls diversity is set to 0 and sampling tem-
perature, which controls randomness, is also set to 0. Both
frequency and presence penalty are set to 0, and we do not
halt on any stop sequences. For all new question generation
experiments we allow diversity and randomness by setting top
P and temperature to 0.1. Each prompt is structured as a
Python documentation comment surrounded by triple quota-
tions and line breaks. We evaluate the solution by running
the output using a Python interpreter. Answers are correct
if a printed output is the correct solution, or if the program
function returns the correct solution, and the explanation of
the solution is correct.
Types of Problems the Model Cannot Solve. There are a few
different types of problems the model is incapable of solving: (i)
any problem for which images or other non-text modalities of
input are critical to the question, (ii) questions with solutions
that require proofs, and (iii) problems that are computationally
intractable, such as factoring very large primes. This last
category is not expected in any math course assignment, as
students themselves would also be unable to answer them.
That being said, many questions that students can answer
have generalizations that are computationally intractable.
Conclusion
We demonstrate that program synthesis and few-shot learning
using OpenAI Codex is able to solve, explain, and generate
university-level mathematics problems at human level. In con-
trast, previous methods use transformers pre-trained only on
text, such as GPT-3, and fail at these tasks. We verify that our
strong results are not overfitting the training data by solving a
new course that is not available online. We also generate and
analyze new problem sets. The success of this work confirms
that programs serve as a good representation and computation
environment for solving math problems. Since our approach
requires no additional training, it has been scaled-up to over
thirty STEM courses. This work addresses major pedagogical
challenges, bringing substantial benefits to higher education
such as tools for curriculum design and analysis, and automatic
content generation.
We show that neural network synthesis with modern pro-
gramming languages is more dynamic and widely applicable
than expression trees, and is therefore likely to solve a wider
range of problems. Although any finite computation could
8 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
ID Course Machine-generated question Closest question in the dataset Similarity
1 18.01
Single-Variable Calculus
Find the area of the region bounded by the curve and
the x-axis. y = x2
sin(x), 0 ≤ x ≤ π
Find the area of the region under the given curve from
1 to 2. y = (x2
+ 1)/(3x − x2
)
0.61
2 18.02
Multi-Variable Calculus
Find a × b. a = h9, −2, 1i, b = h−2, 1, 1i Find a × b. a = h5, −1, −2i, b = h−3, 2, 4i 0.87
3 18.03
Differential Equations
Use the method of separable variables to solve the
initial-value problem dy
dx
= 5ex
, y(2) = 12 when x =
2
Separate variables and use partial fractions to solve
the initial value problems. Use either the exact so-
lution or a computer-generated slope field to sketch
the graphs of several solutions of the given differen-
tial equation, and highlight the indicated particular so-
lution.
f′
(x) = 3f(x)(5 − f(x)), f(0) = 8
0.21
4 18.05
Introduction to Probability
and Statistics
Let X be a uniformly distributed random variable over
the interval [0, 1). Find E[X2
]
Let X be the result of rolling a fair 4-sided die. Let Y
be the result of rolling a fair 6-sided die. You win 2X
dollars if X > Y and lose 1 dollar otherwise. After
playing this game 60 times, what is your expected total
gain?
0.29
5 18.06
Linear Algebra
Write a Matlab code to determine if the given matrix
A = [1, 1; 4, 4] is positive semidefinite and if it is neg-
ative semidefinite.
Find A′
A if the columns of A are unit vectors, all mu-
tually perpendicular.
0.21
6 6.042
Mathematics for
Computer Science
A student is taking a test consisting of n multiple-
choice questions. Each question has five possible an-
swers, and only one of them is correct. The student
knows that the probability that any particular question
is answered correctly is 1
5
. Let X be the number of
questions answered correctly by the student. What is
E(X)?
MIT students sometimes delay laundry for a few days.
Assume all random values described below are mu-
tually independent. A busy student must complete 3
problem sets before doing laundry. Each problem set
requires 1 day with probability 2
3
and 2 days with prob-
ability 1
3
. Let B be the number of days a busy student
delays laundry. What is E(B)?
0.47
7 COMS3251
Computational
Find a combination of the vectors
"
1 2 3
4 5 6
7 8 9
#
that
gives the vector

1 2 3

.
Find a combination of the vectors

1 2 3
4 5 6
7 8 9
#
that
give the zero vector.
0.90
8 MATH
Pre-Algebra
How many four-digit positive integers are there with
hundreds digit 2?
How many four-digit positive integers are there with
thousands digit 2?
0.90
9 MATH
Algebra
Find the distance between the points (0, 0) and (3, 4). Find the distance between the points (0, 4) and (3, 0). 0.99
10 MATH
Number Theory
Find the smallest positive integer n such that n2
is di-
visible by 210
and n3
is divisible by 310
.
How many four-digit numbers whose digits add up to 9
are divisible by 11?
0.25
11 MATH
Counting and Probability
How many ways are there to divide a set of 10 objects
into two sets of equal size?
Compute
8
4

. 0.12
12 MATH
Intermediate Algebra
Let x and y be positive real numbers such that x2
+
y2
= 1. Find the maximum value of xy.
Given that x2
+ y2
= 14x + 6y + 6, find the largest
possible value of 3x + 4y.
0.59
13 MATH
Precalculus
Let A be the matrix

1 2 3
4 5 6
7 8 9
#
Find the determinant
of A2
+ A3
.
If det(A) = 2 and det(B) = 12, then find det(AB). 0.41
Table 2. Examples of new questions automatically generated by Codex for each course and the closest question from the course.
be expressed as a sufficiently large expression tree, one may
see an arbitrarily large expansion in the size of expression
tree needed, as opposed to a Turing-complete language. This
flexibility is bolstered by the massive corpus of programs that
already exist, which eclipses the amount of labeled expression
trees available. Program outputs are also inherently more
human-readable, as the ability to use abstraction, modularity,
and high-level logic lead to clearer illustrations of the path to
solution. Furthermore, program synthesis can convey logical
deductions directly through explanatory comments, as well
as function and variable names. In particular, we see such
explanatory text and derivations in a number of the Codex
outputs. The unification of such formal and informal language
is an inherent advantage of our methodology. We emphasize
that the outputs may be complex and multi-modal. For ex-
ample, using Python packages such as Matplotlib to produce
graphs of equations. This advanced and unique ability is
time-consuming for humans and offers a major pedagogical
benefit.
In summary, we automatically solve, explain, and gener-
ate university-level mathematics course questions in real-time
at human level. Students rated machine-generated questions
as equally likely to be human-written as machine-generated,
although rated human-written questions more likely to be
human-written. Students rated machine-generated questions
similarly difficult to human-written questions, as well as mostly
appropriate for their respective courses. Finally, we have suc-
ceeded in scaling-up this work to over thirty STEM courses
across 13 departments in the schools of science and engineer-
ing at MIT, Harvard, and the Ivy League universities, with
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 9
excellent results.
1. CQ Choi, 7 revealing ways AIs fail: Neural networks can be disastrously brittle, forgetful, and
surprisingly bad at math. IEEE Spectr. 58, 42–47 (2021).
2. A Vaswani, et al., Attention is all you need in Proceedings of Advances in neural information
processing systems. Vol. 30, (2017).
3. TB Brown, et al., Language models are few-shot learners in Proceedings of Advances in
Neural Information Processing Systems. (2020).
4. D Hendrycks, et al., Measuring massive multitask language understanding. arXiv preprint
arXiv:2009.03300 (2020).
5. D Hendrycks, et al., Measuring mathematical problem solving with the math dataset in Pro-
ceedings of Advances in Neural Information Processing Systems: Datasets and Benchmarks.
(2021).
6. JW Rae, et al., Scaling language models: Methods, analysis  insights from training gopher.
arXiv preprint arXiv:2112.11446 (2021).
7. M Chen, , et al., Evaluating large language models trained on code. arXiv preprint
arXiv:2107.03374 (2021).
8. J Shen, et al., Generate  rank: A multi-task framework for math word problems. arXiv
preprint arXiv:2109.03034 (2021).
9. K Cobbe, et al., Training verifiers to solve math word problems. arXiv preprint
arXiv:2110.14168 (2021).
10. Z Xie, S Sun, A goal-driven tree-structured neural model for math word problems in Proceed-
ings of International Joint Conference on Artificial Intelligence. pp. 5299–5305 (2019).
11. Q Wu, Q Zhang, J Fu, XJ Huang, A knowledge-aware sequence-to-tree network for math
word problem solving in Proceedings of Conference on Empirical Methods in Natural Lan-
guage Processing. pp. 7137–7146 (2020).
12. J Qin, L Lin, X Liang, R Zhang, L Lin, Semantically-aligned universal tree-structured solver
for math word problems in Proceedings of Conference on Empirical Methods in Natural Lan-
guage Processing. (2020).
13. J Zhang, et al., Graph-to-tree learning for solving math word problems in Proceedings of the
Annual Meeting of the Association for Computational Linguistics. pp. 3928–3937 (2020).
14. S Li, et al., Graph-to-tree neural networks for learning structured input-output translation with
applications to semantic parsing and math word problem in Proceedings of Conference on
Empirical Methods in Natural Language Processing. pp. 2841–2852 (2020).
15. Z Liang, J Zhang, J Shao, X Zhang, MWP-BERT: A strong baseline for math word problems.
arXiv preprint arXiv:2107.13435 (2021).
16. S Tran, et al., Solving machine learning problems in Proceedings of Asian Conference on
Machine Learning. (2021).
17. L McInnes, J Healy, J Melville, Umap: Uniform manifold approximation and projection for
dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
10 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
A. Solution of MIT 18.01: Single Variable Calculus
Prereq None
Units 5-0-7
Syllabus Differentiation and integration of functions of one variable, with applications. Informal treatment of limits and
continuity. Differentiation: definition, rules, application to graphing, rates, approximations, and extremum problems. Indefinite
integration; separable first-order differential equations. Definite integral; fundamental theorem of calculus. Applications of
integration to geometry and science. Elementary functions. Techniques of integration. Polar coordinates. L’Hopital’s rule.
Improper integrals. Infinite series: geometric, p-harmonic, simple comparison tests, power series for some elementary functions.
Table 3. MIT 18.01, Question 1: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Functions and Models
Original question Sketch the graph of the function. f(x) = x + |x|
Codex input Same as original question
Codex code
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-10, 10, 100)
y = x + np.abs(x)
plt.plot(x, y)
plt.xlabel(’x’)
plt.ylabel(’y’)
plt.grid()
plt.show()
Program solution
Solution type Automatic Zero-Shot
Codex code explanation 1. We import the numpy and matplotlib.pyplot libraries.
2. We create an array of 100 evenly spaced numbers from -10 to 10.
3. We create an array of the same size as x, where each element is the absolute value of the correspond-
ing element in x.
4. We plot the graph of y against x.
5. We label the x-axis and y-axis.
6. We display the grid.
7. We display the graph.
GPT-3 response The graph of the function is a line with a slope of 1 and a y-intercept of 0.
GPT-3 evaluation Incorrect
Table 4. MIT 18.01, Question 2: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Functions and Models
Original question Find f ◦ g ◦ h. f(x) = 3x − 2, g(x) = sin(x), h(x) = x2
Codex input Same as original question
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 11
Codex code
import math
def f(x):
return 3*x-2
def g(x):
return math.sin(x)
def h(x):
return x**2
def fgh(x):
return f(g(h(x)))
print(fgh(2))
Program solution 3 sin(x2
) − 2
Solution type Automatic Zero-Shot
Codex code explanation 1. fgh(2)
2. h(2)
3. g(4)
4. f(sin(4))
5. 3 sin 4 − 2
GPT-3 response f◦g◦h = f(g(h(x))) = f(g(x**2)) = f(sin(x**2)) = 3*sin(x**2)-2
GPT-3 evaluation Correct
Table 5. MIT 18.01, Question 3: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Functions and Models
Original question Solve each equation for x. ln(x2
− 1) = 3
Codex input Same as original question
Codex code
import sympy
x = sympy.Symbol(’x’)
eq = sympy.log(x**2-1) - 3
print(sympy.solve(eq, x))
Program solution −
√
1 + e3,
√
1 + e3
Solution type Automatic Zero-Shot
Codex code explanation 1. We import the sympy module.
2. We create a symbol x.
3. We create an equation.
4. We solve the equation for x.
GPT-3 response x**2-1=e**3
x**2=e**3+1
x=sqrt(e**3+1)
GPT-3 evaluation Correct
Table 6. MIT 18.01, Question 4: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Limits and Derivatives
Original question Determine the infinite limit. limx→5+ (x + 1)/(x − 5)
Codex input Using sympy determine the infinite limit. lim_x_5+ (x+1)/(x-5)
12 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code
from sympy import limit, Symbol, S
x = Symbol(’x’)
limit((x+1)/(x-5), x, 5)
Program solution ∞
Solution type Automatic Zero-Shot
Codex code explanation 1. We import the limit function from the sympy module.
2. We create a Symbol object called x.
3. We call the limit function and pass in the expression (x+1)/(x-5) and the variable x.
4. We pass in the point 5 as the third argument.
GPT-3 response The limit does not exist because the function is not defined when x = 5.
GPT-3 evaluation Incorrect
Table 7. MIT 18.01, Question 5: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Limits and Derivatives
Original question Find the limits as x → ∞ and as x → −∞. Use this information, together with intercepts, to give a
rough sketch of the graph as in Example 12. y = x2
(x2
− 1)2
(x + 2)
Codex input Same as original question
Codex code
import sympy as sym
from sympy import Symbol
from sympy import pprint
sigma = Symbol(’sigma’)
x = Symbol(’x’)
mu = Symbol(’mu’)
pprint(2*sym.pi*sigma)
pprint(sym.sqrt(2*sym.pi*sigma))
part_1 = 1/(sym.sqrt(2*sym.pi*sigma))
part_2 = sym.exp(-1*((x-mu)**2)/(2*sigma))
my_gauss_function = part_1*part_2
pprint(my_gauss_function)
sym.plot(my_gauss_function.subs({mu:10,sigma:30}),(x,0,100),title=’gauss
distribution’)
x_values = []
y_values = []
for value in range(-5,5):
y = my_gauss_function.subs({mu:10,sigma:30,x:value}).evalf()
y_values.append(y)
x_values.append(value)
print(value,y)
import matplotlib.pyplot as plt
plt.plot(x_values,y_values)
plt.show()
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 13
Program solution ∞; -∞;
Solution type Automatic Zero-Shot
Codex code explanation 1. We create a list of x values from -5 to 5.
2. We create an empty list to store the y values.
3. We loop through the x values.
4. For each x value, we calculate the y value.
5. We append the y value to the list of y values.
6. We plot the x values against the y values.
GPT-3 response As x - infinity, y - infinity. As x - - infinity, y - infinity.
The graph will look like a parabola that opens up.
GPT-3 evaluation Incorrect
Table 8. MIT 18.01, Question 6: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Limits and Derivatives
Original question Find the derivative of the function using the definition of a derivative. f(x) = (x2
− 1)/(2x − 3)
Codex input Using Sympy find the derivative of the function using the definition of a derivative. f(x)=(x**2-1)/(2*x-3)
Codex code
import sympy as sp
x = sp.Symbol(’x’)
f = (x**2-1)/(2*x-3)
print(sp.diff(f,x))
Program solution 2x
2x−3
−
2(x2
−1)
2x−3
Solution type Automatic Zero-Shot
Codex code explanation 1. We import sympy as sp
2. We create a symbol x
3. We create a function f
4. We print the derivative of f with respect to x
GPT-3 response (2*x**2-1)/(2*x-3)**2
GPT-3 evaluation Correct
Table 9. MIT 18.01, Question 7: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Differentiation Rules
Original question Find the limit. limx→0 csc(x) sin(sin(x))
Codex input Same as original question
Codex code
from sympy import limit, csc, sin, Symbol
x = Symbol(’x’)
print(limit(csc(x)*sin(sin(x)), x, 0))
14 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Program solution 1
Solution type Automatic Zero-Shot
Codex code explanation 1. We are finding the limit of csc(x)*sin(sin(x)) as x approaches 0.
2. We are using the limit function from the sympy library.
3. We are passing in the function csc(x)*sin(sin(x)) as the first argument.
4. We are passing in the variable x as the second argument.
5. We are passing in the number 0 as the third argument.
GPT-3 response The limit does not exist.
GPT-3 evaluation Incorrect
Table 10. MIT 18.01, Question 8: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Differentiation Rules
Original question The half-life of cesium-137 is 30 years. Suppose we have a 100-mg sample. How much of the sample
remains after 100 years?
Codex input Write a program. The half-life of cesium-137 is 30 years. Suppose we have a 100-mg sample. How much
of the sample remains after 100 years?
Codex code
import math
def half_life(t, h):
return 100 * math.pow(0.5, t/h)
print(half_life(100, 30))
Program solution 9.92 mg
Solution type Automatic Zero-Shot
Codex code explanation 1. We import the math module.
2. We define a function called half_life that takes two arguments: t and h.
3. We return the value of 100 * 0.5ˆ(t/h).
4. We print the result of half_life(100, 30).
GPT-3 response After 100 years, 50 mg of the sample would remain.
GPT-3 evaluation Incorrect
Table 11. MIT 18.01, Question 9: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Differentiation Rules
Original question The radius of a sphere is increasing at a rate of 4 mm/s. How fast is the volume increasing when the
diameter is 80 mm?
Codex input Same as original question
Codex code
import sympy as sp
r = sp.symbols(’r’)
d = 4*r
v = 4/3*sp.pi*r**3
dv_dr = sp.diff(v,r)
print(dv_dr.subs(r,80))
Program solution 25600π
Solution type Manual
Codex code explanation 1. We define the radius as a symbol.
2. We define the diameter as a function of the radius.
3. We define the volume as a function of the radius.
4. We take the derivative of the volume with respect to the radius.
5. We substitute the radius with 80 and print the result.
GPT-3 response The volume of a sphere is increasing at a rate of approximately 201.06 mm3/s.
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 15
GPT-3 evaluation Incorrect
Table 12. MIT 18.01, Question 10: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Applications of Differentiation
Original question The water level, measured in feet above mean sea level, of Lake Lanier in Georgia, USA, during 2012
can be modeled by the function L(t) = 0.01441t3
− 0.4177t2
+ 2.703t + 1060.1 where t is measured
in months since January 1, 2012. Estimate when the water level was highest during 2012.
Codex input Same as original question
Codex code
import math
def L(t):
return 0.01441*t**3-0.4177*t**2+2.703*t+1060.1
def dL(t):
return 0.04323*t**2-0.8354*t+2.703
def ddL(t):
return 0.08646*t-0.8354
def newton(f, df, x0, epsilon):
x = x0
while True:
x1 = x - f(x)/df(x)
if abs(x1 - x)  epsilon:
return x1
x = x1
print(newton(dL, ddL, 0, 0.00001))
Program solution
Solution type Automatic Zero-Shot
Codex code explanation 1. We define the function L(t)
2. We define the derivative of L(t)
3. We define the second derivative of L(t)
4. We define a function that uses Newton’s method to find the root of a function f(x)
5. We print the result of using Newton’s method to find the root of the derivative of L(t)
GPT-3 response The water level was maximum at t=2.
GPT-3 evaluation Incorrect
Table 13. MIT 18.01, Question 11: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Applications of Differentiation
Original question If a resistor of R ohms is connected across a battery of E volts with internal resistance r ohms, then the
power (in watts) in the external resistor is P = E2
R/(R + r)2
. If E and r are fixed but R varies, what is
the maximum value of the power?
Codex input Same as original question
16 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code
import sympy as sym
from sympy import Symbol
from sympy import pprint
R = Symbol(’R’)
E = Symbol(’E’)
r = Symbol(’r’)
P = E**2*R/(R+r)**2
print(P)
dP = P.diff(R)
print(dP)
R_sol = sym.solve(dP,R)
print(R_sol)
R_max = R_sol[0].evalf()
print(R_max)
P_max = P.subs({R:R_max})
print(P_max)
Program solution E2
4r
Solution type Automatic Few-Shot (5)
Codex code explanation 1. Define the variables
2. Define the function
3. Take the derivative of the function
4. Solve the derivative for R
5. Evaluate the solution
6. Substitute the solution into the original function
GPT-3 response The maximum value of the power is P = Eˆ2R/(R+r)ˆ2 when R = r.
GPT-3 evaluation Incorrect
Table 14. MIT 18.01, Question 12: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Applications of Differentiation
Original question Use Newton’s method to find all the solutions of the equation correct to eight decimal places. Start by
drawing a graph to find initial approximations. x5
− 3x4
+ x3
− x2
− x + 6 = 0
Codex input Use Newton’s method to find all the solutions between -100 and 100 with integer initial values. x**5-
3*x**4+x**3-x**2-x+6=0. Use initial points between -100 and 100. Round to 8 decimal places.
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 17
Codex code
import numpy as np
import matplotlib.pyplot as plt
def f(x):
return x**5-3*x**4+x**3-x**2-x+6
def df(x):
return 5*x**4-12*x**3+3*x**2-2*x-1
def newton(x, f, df, epsilon=1e-8, max_iter=1000):
for i in range(max_iter):
x_new = x - f(x)/df(x)
if abs(x_new - x)  epsilon:
return x_new
x = x_new
return x_new
def main():
x = np.linspace(-100, 100, 1000)
y = f(x)
plt.plot(x, y)
plt.grid()
plt.show()
x_sols = []
for i in range(-100, 100):
x_sols.append(newton(i, f, df))
print(x_sols)
print(np.round(x_sols, 8))
if __name__ == ’__main__’:
main()
Program solution −1.044503071109637, 2.7055120932482875
Solution type Manual
Codex code explanation 1. Plotting the function f(x) = x**5-3*x**4+x**3-x**2-x+6
2. Finding the derivative of f(x)
3. Using Newton’s method to find the roots of f(x)
4. Plotting the roots of f(x)
GPT-3 response There are five solutions.
-1.46557123
-0.53728871
0.36082474
1.12403607
2.09132432
GPT-3 evaluation Incorrect
Table 15. MIT 18.01, Question 13: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Integrals
Original question Evaluate
R 1
1
√
1 + x4 dx.
Codex input Write a program to evaluate the integral from 1 to 1 of sqrt(1 + xˆ4) dx.
18 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code
from sympy import Symbol, sqrt, Integral, pprint
x = Symbol(’x’)
k = Symbol(’k’)
p = sqrt(1 + x**4)
pprint(Integral(p, (x, 1, 1)).doit())
Program solution 0.0
Solution type Automatic Zero-Shot
Codex code explanation 1. We create a Symbol object called x.
2. We create a Symbol object called k.
3. We create a symbolic expression called p.
4. We create an Integral object called p_integral.
5. We call the doit() method on the Integral object.
GPT-3 response The integral from 1 to 1 of sqrt(1 + xˆ4) dx is equal to 2.
GPT-3 evaluation Incorrect
Table 16. MIT 18.01, Question 14: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Integrals
Original question Sketch the region enclosed by the given curves and calculate its area. y = 4 − x2
, y = 0
Codex input Using numpy, graph the lines y=0 and y=4-x**2. Use fill_between().
Then using Sympy, find the points of intersection between and y=0 and y=4-x**2.
Integrate between lines y=4-x**2 and y=0, over the points of intersection to calculate the area of the
region. x should go from -5 to 5.
Codex code
import numpy as np
import matplotlib.pyplot as plt
import sympy as sp
x = np.linspace(-5, 5, 100)
y1 = 0*x
y2 = 4-x**2
plt.plot(x, y1, ’-b’, label=’y=0’)
plt.plot(x, y2, ’-r’, label=’y=4-x**2’)
plt.fill_between(x, y1, y2, color=’grey’, alpha=0.5)
plt.legend(loc=’best’)
plt.show()
x, y = sp.symbols(’x y’)
f = 4-x**2
g = 0*x
sols = sp.solve([f, g], [x, y])
x1 = sols[0][0]
y1 = sols[0][1]
x2 = sols[1][0]
y2 = sols[1][1]
area = sp.integrate(f, (x, x1, x2)) + sp.integrate(g, (x, x1, x2))
print(area)
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 19
Program solution 32
3
Solution type Manual
Codex code explanation 1. We are using numpy to graph the lines y=0 and y=4-x**2. We are using fill_between() to fill in the area
between the two lines.
2. We are using Sympy to find the points of intersection between the two lines.
3. We are using Sympy to integrate between the two lines, over the points of
intersection, to calculate the area of the region.
GPT-3 Response The region is enclosed by the curves y=4-xˆ2, y=0, and x=2. Its area is 16.
GPT-3 evaluation Incorrect
Table 17. MIT 18.01, Question 15: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Integrals
Original question A bacteria population is 4000 at time t = 0 and its rate of growth is 1000 · 2t
bacteria per hour after t
hours. What is the population after one hour?
Codex input A bacteria population is 4000 at time t = 0 and its rate of growth is y = 1000 * 2ˆx bacteria per hour after
t hours. Write a function to calculate the population after one hour. Use sympy to integrate the symbol x
from 0 to 1. Then add the initial population to the result.
Codex code
import sympy as sp
x = sp.Symbol(’x’)
def bacteria_population(x):
return 1000 * 2**x
print(sp.integrate(bacteria_population(x), (x, 0, 1)) + 4000)
Program solution 1000
log 2
+ 4000
Solution type Manual
Codex code explanation 1. We import sympy as sp.
2. We define x as a symbol.
3. We define a function bacteria_population(x) that returns the population of bacteria after x hours.
4. We print the result of integrating the function bacteria_population(x) from 0 to 1 and adding the initial
population of 4000.
GPT-3 response 5000
GPT-3 evaluation Incorrect
Table 18. MIT 18.01, Question 16: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Applications of Integration
Original question Graph the region between the curves and use your calculator to compute the area correct to five decimal
places. y = tan2
(x), y =
√
x
Codex input Graph the region between the curves y = tanˆ2(x) and y = sqrt(x) to find the two points where they
intersect. Then, write a program to compute the area between the curves where they cross. Use sympy.
20 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code
import sympy
import math
x = sympy.symbols(’x’)
y1 = sympy.tan(x)**2
y2 = sympy.sqrt(x)
# find the intersection points
intersection_points = sympy.solve(y1 - y2)
# find the area between the two curves
area = sympy.integrate(y1 - y2, (x, intersection_points[0],
intersection_points[1]))
print(area)
Program solution 0.25142
Solution type Manual
Codex code explanation 1. We import sympy and math.
2. We create a variable x and set it equal to sympy.symbols(’x’). This is how we tell sympy that we want
to use x as a variable.
3. We create two variables, y1 and y2, and set them equal to sympy.tan(x)**2 and sympy.sqrt(x), respec-
tively.
4. We create a variable called intersection_points and set it equal to sympy.solve(y1 - y2). This is how
we tell sympy to find the intersection points of the two curves.
5. We create a variable called area and set it equal to sympy.integrate(y1 - y2, (x, intersection_points[0],
intersection_points[1])). This is how we tell sympy to find the area between the two curves.
6. We print the area.
GPT-3 response The region between the curves is shaded in the graph.
The area is 1.04719.
GPT-3 evaluation Incorrect
Table 19. MIT 18.01, Question 17: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Applications of Integration
Original question A force of 10 lb is required to hold a spring stretched 4 in. beyond its natural length. How much work is
done in stretching it from its natural length to 6 in. beyond its natural length?
Codex input A force of 10 lb is required to hold a spring stretched delta_x = 1/3 foot. Write a function to calculate
the work done in stretching it to delta_x = 1/2 foot. Use Hooke’s Law F = k*delta_x where delta_x is a
sympy.symbols to find the spring constant k = 10/(1/3) and integrate F with respect to delta_x from 0 to
the stretched ratio 1/2. Use sympy for the integral. Then plug in for delta_x = 1/2.
Codex code
import sympy
def work_done(delta_x):
k = 10/(1/3)
F = k*delta_x
return sympy.integrate(F, (delta_x, 0, 1/2))
print(work_done(sympy.symbols(’delta_x’)))
Program solution 3.750
Solution type Manual
Codex code explanation 1. We define a function called work_done that takes a variable delta_x as an argument.
2. We define the spring constant k = 10/(1/3)
3. We define the force F = k*delta_x
4. We return the integral of F with respect to delta_x from 0 to 1/2.
5. We print the result of calling work_done with sympy.symbols(’delta_x’) as the argument.
GPT-3 Response The work done in stretching the spring from its natural length to 6 in. beyond its natural length is 60 lb-in.
GPT-3 evaluation Incorrect
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 21
Table 20. MIT 18.01, Question 18: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Techniques of Integration
Original question Evaluate the integral.
R 0.5
0
x
√
1 − 4x2 dx
Codex input Using sympy evaluate the integral. 0 to 0.5 of x*sqrt(1-4xˆ2) dx.
Codex code
from sympy import *
x = Symbol(’x’)
integral = integrate(x*sqrt(1-4*x**2), (x, 0, 0.5))
print(integral)
Program solution 1
12
Solution type Automatic Zero-Shot
Codex code explanation 1. We import the sympy module.
2. We create a symbol x.
3. We create an integral object.
4. We print the integral object.
GPT-3 response The integral is 0.
GPT-3 evaluation Incorrect
Table 21. MIT 18.01, Question 19: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Techniques of Integration
Original question Find the area of the region under the given curve from 1 to 2. y = (x2
+ 1)/(3x − x2
)
Codex input Find the area of the region under the given curve from 1 to 2. y = (xˆ2+1)/(3x-xˆ2)
22 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code
import numpy as np
import matplotlib.pyplot as plt
def f(x):
return (x**2+1)/(3*x-x**2)
def trapezoidal(f, a, b, n):
h = (b-a)/n
s = 0.5*(f(a) + f(b))
for i in range(1, n):
s = s + f(a + i*h)
return h*s
def simpson(f, a, b, n):
h = (b-a)/n
s = f(a) + f(b)
for i in range(1, n, 2):
s = s + 4*f(a + i*h)
for i in range(2, n-1, 2):
s = s + 2*f(a + i*h)
return h*s/3
def gauss(f, a, b, n):
x, A = np.polynomial.legendre.leggauss(n)
return 0.5*(b-a)*np.dot(A, f(0.5*(b-a)*x + 0.5*(b+a)))
a = 1
b = 2
n = 10
print(Trapezoidal:, trapezoidal(f, a, b, n))
print(Simpson:, simpson(f, a, b, n))
print(Gauss:, gauss(f, a, b, n))
x = np.linspace(a, b, 100)
y = f(x)
plt.plot(x, y)
plt.show()
Program solution 1.5415321620760492
Solution type Automatic Zero-Shot
Codex code explanation 1. Define the function f(x) = (xˆ2+1)/(3x-xˆ2)
2. Define the trapezoidal rule for numerical integration
3. Define the Simpson’s rule for numerical integration
4. Define the Gauss-Legendre quadrature rule for numerical integration
5. Define the limits of integration
6. Define the number of intervals
7. Print the results of the three methods
8. Plot the function
GPT-3 response The area of the region under the given curve from 1 to 2 is approximately 1.4.
GPT-3 evaluation Incorrect
Table 22. MIT 18.01, Question 20: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Techniques of Integration
Original question Evaluate the integral. ln(1 + x2
)
Codex input Evaluate the integral of ln(1+x**2). Cast to a string and add  + C.
Codex code
from sympy import *
x = Symbol(’x’)
print(str(integrate(ln(1+x**2), x)) +  + C)
Program solution x log(x2
+ 1) − 2x + 2 arctan x + C
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 23
Solution type Automatic Zero-Shot
Codex code explanation 1. We import the sympy module.
2. We create a symbol x.
3. We create a variable integral and assign it the value of the integral of ln(1+x**2) with respect to x.
4. We print the value of the integral.
GPT-3 response The integral of ln(1+x**2) is x*ln(1+x**2)-x
GPT-3 evaluation Incorrect
Table 23. MIT 18.01, Question 21: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Parametric Equations and Polar Coordinates
Original question Graph the curve x = y − 2 sin(πy).
Codex input Graph the curve x = y - 2sin(pi*y) by sampling points and swapping the (x, y) pair locations.
Codex code
import numpy as np
import matplotlib.pyplot as plt
def f(y):
return y - 2*np.sin(np.pi*y)
y = np.linspace(-2, 2, 100)
x = f(y)
plt.plot(x, y)
plt.xlabel(’x’)
plt.ylabel(’y’)
plt.title(’x = y - 2sin(pi*y)’)
plt.show()
Program solution
Solution type Automatic Zero-Shot
Codex code explanation 1. Define the function f(y) = y - 2sin(pi*y).
2. Create an array of 100 evenly spaced numbers between -2 and 2.
3. Create an array of the values of f(y) for each y in the array.
4. Plot the curve x = f(y).
24 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
GPT-3 response A curve is defined by a set of parametric equations if the coordinates of each point on the curve are given
by specific functions of a variable. The variable is called a parameter.
The parametric equations for a curve are usually written as x = f(t) and y = g(t). The parameter t is usually
restricted to a certain interval, such as 0 ≤ t ≤ 1.
The graph of the parametric equations x = f(t) and y = g(t) is the set of points (f(t), g(t)) for all values of t
in the interval.
Example 1: Find the graph of the parametric equations x = t2 and y = t3.
Solution: The graph of the parametric equations x = t2 and y = t3 is the set of points (t2, t3) for all values
of t.
GPT-3 evaluation Incorrect
Table 24. MIT 18.01, Question 22: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Parametric Equations and Polar Coordinates
Original question Find the distance between the points with polar coordinates (4, 4π/3) and (6, 5π/3).
Codex input Using sympy find the distance between the points with polar coordinates (4, 4pi/3) and (6, 5pi/3).
Codex code
from sympy import *
r1, r2, theta1, theta2 = symbols(’r1 r2 theta1 theta2’)
x1 = r1 * cos(theta1)
y1 = r1 * sin(theta1)
x2 = r2 * cos(theta2)
y2 = r2 * sin(theta2)
d = sqrt((x2 - x1)**2 + (y2 - y1)**2)
print(d.subs({r1: 4, theta1: 4*pi/3, r2: 6, theta2: 5*pi/3}))
Program solution 2 ∗ sqrt(7)
Solution type Automatic Zero-Shot
Codex code explanation 1. We define the variables r1, r2, theta1, theta2 as symbols.
2. We define x1, y1, x2, y2 as the cartesian coordinates of the points.
3. We define d as the distance between the points.
4. We substitute the values of r1, theta1, r2, theta2 and print the result.
GPT-3 response The distance between the points is 2.
GPT-3 evaluation Incorrect
Table 25. MIT 18.01, Question 23: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Infinite Sequences and Series
Original question Determine whether the series is convergent or divergent.
P∞
n=1
1/n
√
2
Codex input Using Sympy, calculate the value sum_n=1ˆinf 1/nˆsqrt(2) if it converges. Check if it converges.
Codex code
from sympy import Sum, Symbol, oo, pprint, init_printing
def test_convergence(n):
x = Symbol(’x’)
s = Sum(1/x**(2**0.5), (x, 1, n))
pprint(s)
pprint(s.doit())
if __name__ == ’__main__’:
init_printing(order=’rev-lex’)
test_convergence(10)
test_convergence(oo)
NOTE: ∞ in SymPy is oo. This is because oo looks like ∞.
Program solution
P∞
n=1
n−
√
2
; True
Solution type Automatic Zero-Shot
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 25
Codex code explanation 1. We import the Sum function from sympy.
2. We create a function that takes in a number n and creates a Sum object.
3. We print the Sum object and the result of the summation.
4. We call the function with n = 10 and n = oo.
GPT-3 response The series is divergent.
GPT-3 evaluation Incorrect
Table 26. MIT 18.01, Question 24: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Infinite Sequences and Series
Original question Find the sum of the series.
P∞
n=1
(−1)n−1
∗ (3n
/(n · 5n
))
Codex input Find the sum of the series by running a large number of iterations. sum_{n=1}ˆ{inf} (-1)ˆ{n-1}*(3ˆn/(n*5ˆn)).
Return 0 if n  1.
Codex code
def sum_series(n):

Find the sum of the series. sum n to infinity (-1)^{n-1}*(3^n/(n*5^n))

sum = 0
for i in range(1, n+1):
sum += ((-1)**(i-1))*(3**i)/(i*(5**i))
return sum
def main():

Find the sum of the series. sum n to infinity (-1)^{n-1}*(3^n/(n*5^n))

n = int(input())
print(sum_series(n))
if __name__ == ’__main__’:
main()
Program solution 0.47000362924573563
Solution type Automatic Zero-Shot
Codex code explanation 1. We are taking input from the user.
2. We are calling the function sum_series() with the input as the argument.
3. The function sum_series() is returning the sum of the series.
4. We are printing the sum of the series.
GPT-3 response The sum of the series is -1/4.
GPT-3 evaluation Incorrect
Table 27. MIT 18.01, Question 25: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Further Applications of Integration
Original question If revenue flows into a company at a rate of f(t) = 9000*sqrt(1+2t), where t is measured in years and f(t)
is measured in dollars per year, find the total revenue obtained in the first four years.
Codex input Using sympy if revenue flows into a company at a rate of f(t) = 9000*sqrt(1+2t), where t is measured in
years and f(t) is measured in dollars per year, find the total revenue obtained in the first four years.
Codex code
from sympy import *
t = symbols(’t’)
f = 9000*sqrt(1+2*t)
print(integrate(f, (t, 0, 4)))
Program solution 78000
Solution type Automatic Zero-Shot
26 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code explanation 1. We import the sympy library.
2. We define the variable t as a symbol.
3. We define the function f(t) = 9000*sqrt(1+2t).
4. We integrate f(t) from t=0 to t=4.
5. We print the result.
GPT-3 response The total revenue obtained in the first four years is f(4) = 9000*sqrt(1+2*4) = 9000*sqrt(9) = 9000*3 =
27000.
GPT-3 evaluation Incorrect
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 27
B. Solution of MIT 18.02: Multivariable Calculus
Prereq Calculus I (GIR)
Units 5-0-7
Syllabus Calculus of several variables. Vector algebra in 3-space, determinants, matrices. Vector-valued functions of one
variable, space motion. Scalar functions of several variables: partial differentiation, gradient, optimization techniques. Double
integrals and line integrals in the plane; exact differentials and conservative fields; Green’s theorem and applications, triple
integrals, line and surface integrals in space, Divergence theorem, Stokes’ theorem; applications.
Textbook Edwards, C.; Penney, David Multivariable Calculus(6th ed.) Pearson Education, ISBN 9780130339676.
Table 28. MIT 18.02, Question 1: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Polar Coordinates
Original question Sketch the graph of the polar equation. Indicate any symmetries around either coordinate axis or the
origin. r = (2 + 7 sin(θ))(cos(5θ))
Codex input Write a function using matplotlib to plot the graph of the polar equation r = (2 + 7*sin(theta))*(cos(5*theta))
Codex code
import numpy as np
import matplotlib.pyplot as plt
def polar_eq(theta):
return (2 + 7*np.sin(theta))*(np.cos(5*theta))
theta = np.linspace(0, 2*np.pi, 100)
plt.polar(theta, polar_eq(theta))
plt.show()
Program solution
Solution type Automatic Zero-Shot
Codex code explanation 1. Importing numpy and matplotlib.pyplot
2. Defining a function that takes theta as an argument
3. Creating a list of 100 evenly spaced values between 0 and 2pi
4. Plotting the polar equation
GPT-3 response The graph has symmetry around the origin and the y-axis.
GPT-3 evaluation Incorrect
Table 29. MIT 18.02, Question 2: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Analytical Geometry and Polar Coordinates
Original question Find the center and radius of the circle described in the given equation. x2
+ 2x + y2
= 4
Codex input Find the center and radius of the circle described in the given equation. x2
+ 2x + y2
= 4.
28 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code
import sympy as sp
x, y = sp.symbols(’x y’)
# Equation of the circle
eq = x**2 + 2*x + y**2 - 4
# Solving the equation
sol = sp.solve(eq, y)
# Printing the solution
print(sol)
# Finding the center
center = sp.solve(eq, x)
print(center)
# Finding the radius
radius = sp.sqrt(center[0]**2 + sol[0]**2)
print(radius)
Program solution Center: (-1,0); Radius:
√
5
Solution type Automatic Zero-Shot
Codex code explanation 1. We import the sympy library and the symbols function.
2. We define the equation we want to find the radius and center of.
3. We create a Circle object from the equation.
4. We print the radius and center of the circle.
GPT-3 response The center of the circle is at (1, 0) and the radius is
√
3.
GPT-3 evaluation Incorrect
Table 30. MIT 18.02, Question 3: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Polar Coordinates
Original question Sketch the graph of the polar equation. Indicate any symmetries around either coordinate axis or the
origin. r = 3θ (spiral of Archimedes)
Codex input Write a function using matplotlib to plot the graph of the polar equation r = 3*theta (spiral of Archimedes).
Codex code
import numpy as np
import matplotlib.pyplot as plt
theta = np.arange(0, 2*np.pi, 0.01)
r = 3*theta
plt.polar(theta, r)
plt.show()
Program solution
Solution type Automatic Zero-Shot
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 29
Codex code explanation 1. Importing the numpy and matplotlib.pyplot modules.
2. Creating a range of values for theta from 0 to 2π with an increment of 0.01.
3. Creating a range of values for r from 0 to 30 with an increment of 0.01.
4. Creating a polar plot using the polar() function.
5. Showing the plot using the show() function.
GPT-3 response The graph of the polar equation is a spiral of Archimedes. There is no symmetry around either coordinate
axis or the origin.
GPT-3 evaluation Incorrect
Table 31. MIT 18.02, Question 4: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Area Computations in Polar Coordinates
Original question In Problems 7 through 16. find the area bounded by the given curve. r = 2 ∗ cos(θ)
Codex input f(theta)=2*cos(theta)
Using Sympy, starting from 0, find the first two points that f(theta)=0 and set teh solutions to a and b.
Use the solution to Integrate f(theta)**2/2 from a to b.
Codex code
from sympy import *
theta = Symbol(’theta’)
f = 2*cos(theta)
a = solve(f, theta)
print(a)
b = integrate(f**2/2, (theta, a[0], a[1]))
print(b)
Program solution π
Solution type Manual
Codex code explanation 1. The first line imports the sympy library.
2. The second line defines the function f(theta) as 2*cos(theta).
3. The third line defines the symbol theta.
4. The fourth line uses the solve() function to find the two values of theta that make f(theta)=0.
5. The fifth line uses the integrate() function to find the area under the curve from a to b.
GPT-3 response The area is 8π.
GPT-3 evaluation Incorrect
Table 32. MIT 18.02, Question 5: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Parametric Curves
Original question Eliminate the parameter and then sketch the curve. x = et
, y = 4e2
t
Codex input Plot the parametric curve. x = et
, y = 4e2
t
Codex code
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(-2, 2, 100)
x = np.exp(t)
y = 4 * np.exp(2 * t)
plt.plot(x, y)
plt.show()
30 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Program solution
Solution type Manual
Codex code explanation 1. Importing the numpy and matplotlib.pyplot modules.
2. Defining the t array.
3. Defining the x and y arrays.
4. Plotting the curve.
GPT-3 response y = 4e2
t
GPT-3 evaluation Incorrect
Table 33. MIT 18.02, Question 6: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Integral Computations with Parametric Curves
Original question Find the arc length of the given curve. x = 2t, y = 2
3
∗ t
3
2 , 5 ≤ t ≤ 12
Codex input Use sympy to integrate from 5 to 12 to find the arc length of x = 2t, y = 2/3*t**(3/2)
Codex code
from sympy import *
x = 2*t
y = 2/3*t**(3/2)
dx = diff(x, t)
dy = diff(y, t)
ds = sqrt(dx**2 + dy**2)
print(integrate(ds, (t, 5, 12)))
Program solution 74
3
Solution type Automatic Zero-Shot
Codex code explanation 1. Define x and y as functions of t
2. Define dx and dy as the derivatives of x and y with respect to t
3. Define ds as the magnitude of dx and dy
4. Integrate ds from 5 to 12
GPT-3 response The arc length of the given curve is approximately 33.4.
GPT-3 evaluation Incorrect
Table 34. MIT 18.02, Question 7: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Parabola
Original question Sketch the parabola with the given equation. Show and label its vertex, focus, axis, and directrix. y2
=
12x
Codex input Use least squares to find a second order polynomial passing through points (2. 3), (4, 3), and (6,-5).
Use polynomial matrix.
Then use python to print(f’c[0]+c[1]x+c[2]x**2’)
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 31
Codex code
import numpy as np
x = np.array([2, 4, 6])
y = np.array([3, 3, -5])
A = np.array([[1, x[0], x[0]**2],
[1, x[1], x[1]**2],
[1, x[2], x[2]**2]])
b = np.array([y[0], y[1], y[2]])
c = np.linalg.solve(A, b)
print(f’{c[0]}+{c[1]}x+{c[2]}x**2’)
Program solution −5.0 + 6.0x + −x2
Solution type Manual
Codex code explanation 1. Create a numpy array of x values
2. Create a numpy array of y values
3. Create a numpy array of ones with the same length as x and y
4. Create a numpy array of x values squared
5. Create a numpy array of x values cubed
6. Create a numpy array of the values of the matrix A
7. Create a numpy array of the values of the vector b
8. Solve the system of equations
9. Print the values of the coefficients
GPT-3 response The parabola has a vertex at (0,0), a focus at (0,6), an axis at y=0, and a directrix at y=-6.
GPT-3 evaluation Incorrect
Table 35. MIT 18.02, Question 8: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Ellipse
Original question Find an equation of the ellipse specified. Foci (-2, 2) and (4, 2), eccentricity 1/3
Codex input Write a function using sympy to find the equation of the ellipse with foci at (-2, 2) and (4, 2). The
eccentricity of the ellipse is 1/3.
Codex code
import sympy as sp
x, y = sp.symbols(’x y’)
foci_1 = sp.Point(-2, 2)
foci_2 = sp.Point(4, 2)
eccentricity = sp.Rational(1, 3)
ellipse = sp.Ellipse(sp.Point(0, 0), sp.sqrt(5), sp.sqrt(5 * (1 -
eccentricity ** 2)))
print(ellipse.equation(x, y))
Program solution
(x−1)2
81
+
(y−2)2
72
= 1
Solution type Automatic Zero-Shot
Codex code explanation 1. Find the center of the ellipse
2. Find the distance between the foci
3. Find the length of the major axis
4. Find the length of the minor axis
5. Find the equation of the ellipse
GPT-3 response (x − 2)2
/9 + (y − 2)2
/4 = 1
GPT-3 evaluation Incorrect
32 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Table 36. MIT 18.02, Question 9: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Functions of Several Variables
Original question Describe the graph of the function f.f(x, y) = 10 − 2
p
x2 + y2
Codex input Plot the 3-D graph of the function f(x, y) = 10 − 2
p
x2 + y2
Codex code
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection=’3d’)
x = np.arange(-10, 10, 0.1)
y = np.arange(-10, 10, 0.1)
X, Y = np.meshgrid(x, y)
Z = 10 - np.sqrt(X**2 + Y**2)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=’rainbow’)
plt.show()
Program solution
Solution type Automatic Zero-Shot
Codex code explanation 1. Create a figure object
2. Add a subplot to the figure object
3. Create a meshgrid of x and y values
4. Calculate the z values for the meshgrid
5. Plot the graph
GPT-3 response The graph of the function f is a parabola.
GPT-3 evaluation Incorrect
Table 37. MIT 18.02, Question 10: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Infinite Series and Convergence
Original question Determine whether the given infinite series converges or diverges. If it converges, find its sum. 1 + 3 + 5
+ 7 + ... + (2n-1) + ....
Codex input Using sympy, determine if the sequence sn = 2n − 1 diverges by showing the value of the limit as n
goes to infinity. If the limit is infinity, then print diverges. Otherwise print converges.
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 33
Codex code
from sympy import limit, oo
def sequence_diverges(sequence):

Determine if the sequence s_n = 2n-1 diverges by showing the value of
the limit as n goes to infinity.
If the limit is infinity, then print diverges. Otherwise print converges.

limit_value = limit(sequence, n, oo)
if limit_value == oo:
print(diverges)
else:
print(converges)
sequence_diverges(2*n-1)
Program solution Diverges
Solution type Manual
Codex code explanation 1. The limit function takes a sequence and a value of n and returns the value of the limit as n goes to
infinity.
2. The limit function returns oo if the sequence diverges.
3. If the limit is oo, then the sequence diverges.
4. If the limit is not oo, then the sequence converges.
GPT-3 response The series diverges.
GPT-3 evaluation Incorrect
Table 38. MIT 18.02, Question 11: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Taylor Series and Taylor Polynomials
Original question Find the Taylor polynomial with remainder by using the given values of a and n. f(x) = ex
; a = 1, n = 4
Codex input Using Sympy, find the taylor series of f(x) = ex
where a = 1 and n = 4
Codex code
import sympy as sp
x = sp.Symbol(’x’)
f = sp.exp(x)
print(sp.series(f, x, x0=1, n=4))
Program solution E + E(x − 1) +
E(x−1)2
2
+
E(x−1)3
6
+ O((x − 1)4
, (x, 1))
Solution type Automatic Zero Shot
Codex code explanation 1. We define the function f(x) = ex
2. We define the function taylor(function, a, n) which takes in a function, a value a, and a value n.
3. We define a variable i and set it to 0.
4. We define a variable p and set it to 0.
5. We create a while loop that runs while i is less than or equal to n.
6. We set p equal to p plus the derivative of the function with respect to x, evaluated at x=a, divided by
the factorial of i, times (x − a)i
.
7. We increment i by 1.
8. We return p.
9. We print the value of taylor(f(x), 1, 4).
GPT-3 response T4(x) = 1 + x + x2
/2! + x3
/3! + x4
/4! + R4(x)
GPT-3 evaluation Incorrect
Table 39. MIT 18.02, Question 12: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Taylor Series and Taylor Polynomials
Original question Find the Maclaurin series of the given function f by substitution. f(x) = sin(2x)
Codex input Use Sympy to find the maclaurin series of f(x) = sin(2x)
34 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code
from sympy import *
x = symbols(’x’)
f = sin(2*x)
print(f.series(x, 0, 10))
Program solution 2x − 4x3
3
+ 4x5
15
+ 8x7
315
+ · · ·
Solution type Automatic Zero-Shot
Codex code explanation 1. Define the variable x as a symbol
2. Define the function f(x) = sin(2x)
3. Print the expression f(x)
4. Print the series expansion of f(x) about x=0 up to 10 terms
GPT-3 response sin(2x) = 2 sin(x) cos(x)
sin(2x) = 2(x − x3
/3! + x5
/5! − ...)(cos(x) + x2
/2! − x4
/4! + ...)
sin(2x) = 2x cos(x) − 2x3
/3! cos(x) + 2x5
/5! cos(x) − ... + 2x3
/3! − 2x5
/5! + ...
sin(2x=2x cos(x) − 2x3
/3!(cos(x) + x2
/2!) + 2x5
/5!(cos(x) + x2
/2! − x4
/4!) − ...
GPT-3 evaluation Incorrect
Table 40. MIT 18.02, Question 13: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Integral Test
Original question Use the integral test to test the given series for convergence.
P∞
1
1/(2n
).
Codex input Using Sympy, integrate
P∞
1
1/(2n
) from 0 to infinity. if it is finite print(’converges’) and if it is infinite
print(’diverges’)
Codex code
from sympy import *
x = Symbol(’x’)
def f(x):
return 1/(2**x)
def integrate_f(x):
return integrate(f(x), (x, 0, oo))
print(integrate_f(x))
if integrate_f(x).is_infinite:
print(’diverges’)
else:
print(’converges’)
Program solution Converges
Solution type Automatic Zero-Shot
Codex code explanation 1. We import sympy as sp.
2. We create a symbol n.
3. We integrate 1/(2**n) from n=1 to infinity.
GPT-3 response
P∞
1
1/(2n
) converges because
R ∞
1
1/(2x
)dx converges.
GPT-3 evaluation Correct
Table 41. MIT 18.02, Question 14: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Comparison Tests for Positive-Term Series
Original question Use comparison tests to determine whether the infinite series converge or diverge.
P∞
1
1/(n2
+n+1).
Codex input Write a function using sympy to determine if the infinite series
P∞
1
1/(n2
+ n + 1) converges.
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 35
Codex code
from sympy import Sum, Symbol, oo, limit, init_printing
init_printing()
n = Symbol(’n’)
s = Sum(1/(n**2 + n + 1), (n, 1, oo))
limit(s, n, oo)

The series converges.

Program solution Converges.
Solution type Automatic Zero-Shot
Codex code explanation 1. We create a symbol n to represent the variable in the summation.
2. We create a summation of 1/(nˆ2 + n + 1) from n = 1 to infinity.
3. We use the limit function to determine if the summation converges.
GPT-3 response
P∞
1
1/(n2
+ n + 1) diverges.
GPT-3 evaluation Incorrect
Table 42. MIT 18.02, Question 15: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Alternating Series and Absolute Convergence
Original question Determine whether the alternating series converge or diverge.
P∞
1
((−1)(
n + 1))/n2
Codex input Write a function using sympy to determine if the alternating series
P∞
1
((−1)(
n + 1))/n2
converges or
diverges. 
Codex code
from sympy import Sum, Symbol, oo, sympify, simplify, limit, init_printing
def alternating_series(n):

This function takes a number n and determines if the alternating series
converges or diverges.

# Initialize sympy printing
init_printing()
# Create a symbol for the summation
n = Symbol(’n’)
# Create the summation
s = Sum((-1)**(n+1)/n**2, (n, 1, oo))
# Simplify the summation
s = simplify(s)
# Determine if the series converges or diverges
if s.is_infinite:
print(The series diverges.)
else:
print(The series converges.)
# Determine the limit of the series
lim = limit(s, n, oo)
# Print the limit
print(The series converges to:  + str(lim))
if __name__ == __main__:
alternating_series(10)
Program solution Converges
Solution type Automatic Few-Shot (1)
36 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
Codex code explanation 1. The first line imports the Sum function from the sympy module.
2. The second line imports the Symbol function from the sympy module.
3. The third line imports the oo function from the sympy module.
4. The fourth line imports the limit function from the sympy module.
5. The fifth line imports the init_printing function from the sympy module.
6. The sixth line creates a variable called n and assigns it the value of the Symbol function.
7. The seventh line creates a variable called s and assigns it the value of the Sum function.
8. The eighth line prints the limit of the variable s as n approaches infinity.
GPT-3 response The series converges.
GPT-3 evaluation Incorrect
Table 43. MIT 18.02, Question 16: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Power Series Computations
Original question Use power series to approximate the values of the given integrals accurate to four decimal places.
R 1
0
sin(x)/xdx.
Codex input Write a function using scipy to compute the values of the given integral
R 1
0
sin(x)/xdx.
Codex code
from scipy.integrate import quad
def f(x):
return np.sin(x)/x
I = quad(f, 0, 1)
print(I)
Program solution 0.9461
Solution type Automatic Zero-Shot
Codex code explanation 1. Importing the integrate module from scipy
2. Defining a function f(x)
3. Calling the integrate.quad function, which performs the integral
4. Printing the result
GPT-3 response 
R 1
0
sin(x)/xdx ≈
P∞
n=0
(−1)n x2n+1
(2n+1)!
R 1
0
sin(x)/xdx ≈
P3
n=0
(−1)n x2n+1
(2n+1)!
R 1
0
sin(x)/xdx ≈ x
1!
− x3
3!
+ x5
5!
− x7
7!
R 1
0
sin(x)/xdx ≈ 1
1
− 1
6
+ 1
120
− 1
5040
R 1
0
sin(x)/xdx ≈ 0.8415
GPT-3 evaluation Incorrect
Table 44. MIT 18.02, Question 17: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Cross Product of Two Vectors
Original question Find a x b. a = 5,-1,-2, b = -3,2,4
Codex input Find the vector product of a x b. a = 5,-1,-2, b = -3,2,4
Codex code
a = [5,-1,-2]
b = [-3,2,4]
def vector_product(a,b):
return [a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]]
print(vector_product(a,b))
Program solution h0, −14, 7i
Solution type Automatic Zero-Shot
Codex code explanation 1. The first line creates a list called a.
2. The second line creates a list called b.
3. The third line creates a list called c and stores the result of the vector product of a and b in c.
4. The fourth line prints the result of the vector product of a and b.
GPT-3 response 5,-1,-2 * -3,2,4 = -14
Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 37
GPT-3 evaluation Incorrect
Table 45. MIT 18.02, Question 18: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Curves and Motion in Space
Original question Show that the graph of the curve with parametric equations x = t ∗ (sin(6t)), y = t ∗ (cos(6t)), z = t
lies on the cone z = 2
p
x2 + y2 with its vertex at the origin and opening upward.
Codex input Plot the graph of the curve with parametric equations: x = t*(sin6t). y = t*(cos6t), z = t.
Then plot the cone z =
p
x2 + y2 with its vertex at the origin and opening upward.
Codex code
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
t = np.linspace(-10, 10, 100)
x = t*(np.sin(6*t))
y = t*(np.cos(6*t))
z = t
fig = plt.figure()
ax = fig.add_subplot(111, projection=’3d’)
ax.plot(x, y, z)
r = np.sqrt(x**2 + y**2)
ax.plot(x, y, r, ’r’)
plt.show()
Program solution
Solution type Automatic Few-Shot (5)
Codex code explanation 1. Generate 100 points between -10 and 10 using the linspace function from numpy.
2. Calculate x, y, and z using the parametric equations.
3. Plot the graph of the curve with parametric equations.
4. Plot the cone z = 2
p
x2 + y2 with its vertex at the origin and opening upward.
GPT-3 response The graph of the curve with parametric equations x = t*(sin6t). y = t*(cos6t), z = t lies on the cone z =
2
p
x2 + y2 with its vertex at the origin and opening upward.
GPT-3 evaluation Incorrect
Table 46. MIT 18.02, Question 19: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3
response and evaluation
Topic Double Integrals
Original question Evaluate the iterated integral.
R 2
0
R 4
0
(3x + 4y)dxdy
Codex input Write a function using scipy.integrate.dblquad to compute the value of the double integral:
R 2
0
R 4
0
(3x +
4y)dxdy.
38 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf
A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf

More Related Content

Similar to A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf

Teaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra SystemsTeaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra Systems
inventionjournals
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learningbutest
 
A Proposition on Memes and Meta-Memes in Computing for Higher ...
A Proposition on Memes and Meta-Memes in Computing for Higher ...A Proposition on Memes and Meta-Memes in Computing for Higher ...
A Proposition on Memes and Meta-Memes in Computing for Higher ...butest
 
A stochastic algorithm for solving the posterior inference problem in topic m...
A stochastic algorithm for solving the posterior inference problem in topic m...A stochastic algorithm for solving the posterior inference problem in topic m...
A stochastic algorithm for solving the posterior inference problem in topic m...
TELKOMNIKA JOURNAL
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
mathsjournal
 
chapter 1
chapter 1chapter 1
chapter 1
yatheesha
 
Calculation of the Minimum Computational Complexity Based on Information Entropy
Calculation of the Minimum Computational Complexity Based on Information EntropyCalculation of the Minimum Computational Complexity Based on Information Entropy
Calculation of the Minimum Computational Complexity Based on Information Entropy
ijcsa
 
A Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine LearningA Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine Learning
nep_test_account
 
LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...
LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...
LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...
IRJET Journal
 
COMBINED CHARGES DIFFICULTY EVALUATION AND TESTEES LEVEL FOR INTELLECT APPRA...
COMBINED CHARGES DIFFICULTY EVALUATION AND  TESTEES LEVEL FOR INTELLECT APPRA...COMBINED CHARGES DIFFICULTY EVALUATION AND  TESTEES LEVEL FOR INTELLECT APPRA...
COMBINED CHARGES DIFFICULTY EVALUATION AND TESTEES LEVEL FOR INTELLECT APPRA...
IJTRET-International Journal of Trendy Research in Engineering and Technology
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
kavinilavuG
 
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Gina Rizzo
 
00 - 30 Dec - Introduction
00 - 30 Dec - Introduction00 - 30 Dec - Introduction
00 - 30 Dec - Introduction
Neeldhara Misra
 
Role of Mathematics in Computer Science.pptx
Role of Mathematics in Computer Science.pptxRole of Mathematics in Computer Science.pptx
Role of Mathematics in Computer Science.pptx
AshishPandey502
 
Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...butest
 
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on HmmEquirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
International Journal of Engineering Inventions www.ijeijournal.com
 
Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
Computational Aptitude of Handheld Calculator
Computational Aptitude of Handheld CalculatorComputational Aptitude of Handheld Calculator
Computational Aptitude of Handheld Calculator
mathsmasters
 

Similar to A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf (20)

Teaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra SystemsTeaching Mathematics Concepts via Computer Algebra Systems
Teaching Mathematics Concepts via Computer Algebra Systems
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learning
 
A Proposition on Memes and Meta-Memes in Computing for Higher ...
A Proposition on Memes and Meta-Memes in Computing for Higher ...A Proposition on Memes and Meta-Memes in Computing for Higher ...
A Proposition on Memes and Meta-Memes in Computing for Higher ...
 
A stochastic algorithm for solving the posterior inference problem in topic m...
A stochastic algorithm for solving the posterior inference problem in topic m...A stochastic algorithm for solving the posterior inference problem in topic m...
A stochastic algorithm for solving the posterior inference problem in topic m...
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
An Application of Assignment Problem in Laptop Selection Problem Using MATLAB
 
chapter 1
chapter 1chapter 1
chapter 1
 
Calculation of the Minimum Computational Complexity Based on Information Entropy
Calculation of the Minimum Computational Complexity Based on Information EntropyCalculation of the Minimum Computational Complexity Based on Information Entropy
Calculation of the Minimum Computational Complexity Based on Information Entropy
 
A Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine LearningA Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine Learning
 
LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...
LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...
LSTM Model for Semantic Clustering of User-Generated Content Using AI Geared ...
 
COMBINED CHARGES DIFFICULTY EVALUATION AND TESTEES LEVEL FOR INTELLECT APPRA...
COMBINED CHARGES DIFFICULTY EVALUATION AND  TESTEES LEVEL FOR INTELLECT APPRA...COMBINED CHARGES DIFFICULTY EVALUATION AND  TESTEES LEVEL FOR INTELLECT APPRA...
COMBINED CHARGES DIFFICULTY EVALUATION AND TESTEES LEVEL FOR INTELLECT APPRA...
 
Machine learning interview questions and answers
Machine learning interview questions and answersMachine learning interview questions and answers
Machine learning interview questions and answers
 
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic Analysis
 
00 - 30 Dec - Introduction
00 - 30 Dec - Introduction00 - 30 Dec - Introduction
00 - 30 Dec - Introduction
 
Role of Mathematics in Computer Science.pptx
Role of Mathematics in Computer Science.pptxRole of Mathematics in Computer Science.pptx
Role of Mathematics in Computer Science.pptx
 
Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...Incorporating Prior Domain Knowledge Into Inductive Machine ...
Incorporating Prior Domain Knowledge Into Inductive Machine ...
 
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on HmmEquirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Computational Aptitude of Handheld Calculator
Computational Aptitude of Handheld CalculatorComputational Aptitude of Handheld Calculator
Computational Aptitude of Handheld Calculator
 
Academic paper - Final
Academic paper - FinalAcademic paper - Final
Academic paper - Final
 

More from Sarah Pollard

Three Parts Of Essay - Clip Art Library. Online assignment writing service.
Three Parts Of Essay - Clip Art Library. Online assignment writing service.Three Parts Of Essay - Clip Art Library. Online assignment writing service.
Three Parts Of Essay - Clip Art Library. Online assignment writing service.
Sarah Pollard
 
Essay On My First Day At College In English With Quotations - YouTube
Essay On My First Day At College In English With Quotations - YouTubeEssay On My First Day At College In English With Quotations - YouTube
Essay On My First Day At College In English With Quotations - YouTube
Sarah Pollard
 
Psy 104 Week 5 Assignment Developmental Psychology
Psy 104 Week 5 Assignment Developmental PsychologyPsy 104 Week 5 Assignment Developmental Psychology
Psy 104 Week 5 Assignment Developmental Psychology
Sarah Pollard
 
10 Lines On How We Can Help Poor People In English
10 Lines On How We Can Help Poor People In English10 Lines On How We Can Help Poor People In English
10 Lines On How We Can Help Poor People In English
Sarah Pollard
 
Sample Essay For Leadership Program The. Online assignment writing service.
Sample Essay For Leadership Program The. Online assignment writing service.Sample Essay For Leadership Program The. Online assignment writing service.
Sample Essay For Leadership Program The. Online assignment writing service.
Sarah Pollard
 
Ap Language Synthesis Essay.. Online assignment writing service.
Ap Language Synthesis Essay.. Online assignment writing service.Ap Language Synthesis Essay.. Online assignment writing service.
Ap Language Synthesis Essay.. Online assignment writing service.
Sarah Pollard
 
How To Descriptive Essay. How To Write A Descriptive
How To Descriptive Essay. How To Write A DescriptiveHow To Descriptive Essay. How To Write A Descriptive
How To Descriptive Essay. How To Write A Descriptive
Sarah Pollard
 
A For And Against Essay Essay W. Online assignment writing service.
A For And Against Essay Essay W. Online assignment writing service.A For And Against Essay Essay W. Online assignment writing service.
A For And Against Essay Essay W. Online assignment writing service.
Sarah Pollard
 
Educational Argumentative Topics. The. Online assignment writing service.
Educational Argumentative Topics. The. Online assignment writing service.Educational Argumentative Topics. The. Online assignment writing service.
Educational Argumentative Topics. The. Online assignment writing service.
Sarah Pollard
 
Custom Term Paper Writing Service - YouTube
Custom Term Paper Writing Service - YouTubeCustom Term Paper Writing Service - YouTube
Custom Term Paper Writing Service - YouTube
Sarah Pollard
 
How To Write A Good Biology Research Paper Artofit
How To Write A Good Biology Research Paper ArtofitHow To Write A Good Biology Research Paper Artofit
How To Write A Good Biology Research Paper Artofit
Sarah Pollard
 
HOW TO WRITE AN ABSTRACT. Online assignment writing service.
HOW TO WRITE AN ABSTRACT. Online assignment writing service.HOW TO WRITE AN ABSTRACT. Online assignment writing service.
HOW TO WRITE AN ABSTRACT. Online assignment writing service.
Sarah Pollard
 
Writing A Self Reflective Essay How To Write A Re
Writing A Self Reflective Essay How To Write A ReWriting A Self Reflective Essay How To Write A Re
Writing A Self Reflective Essay How To Write A Re
Sarah Pollard
 
Chinese Essay Writing Service. Online assignment writing service.
Chinese Essay Writing Service. Online assignment writing service.Chinese Essay Writing Service. Online assignment writing service.
Chinese Essay Writing Service. Online assignment writing service.
Sarah Pollard
 
Outer Space Astronaut Writing Paper - Lined And
Outer Space Astronaut Writing Paper - Lined AndOuter Space Astronaut Writing Paper - Lined And
Outer Space Astronaut Writing Paper - Lined And
Sarah Pollard
 
Top Funny Essay Titles By Menskool - Issuu
Top Funny Essay Titles By Menskool - IssuuTop Funny Essay Titles By Menskool - Issuu
Top Funny Essay Titles By Menskool - Issuu
Sarah Pollard
 
Essay Examples Life Experience Essay. Online assignment writing service.
Essay Examples Life Experience Essay. Online assignment writing service.Essay Examples Life Experience Essay. Online assignment writing service.
Essay Examples Life Experience Essay. Online assignment writing service.
Sarah Pollard
 
Essay About Can Money Buy Ha. Online assignment writing service.
Essay About Can Money Buy Ha. Online assignment writing service.Essay About Can Money Buy Ha. Online assignment writing service.
Essay About Can Money Buy Ha. Online assignment writing service.
Sarah Pollard
 
Definitive Argument Topics. Definition Essay Topics T
Definitive Argument Topics. Definition Essay Topics TDefinitive Argument Topics. Definition Essay Topics T
Definitive Argument Topics. Definition Essay Topics T
Sarah Pollard
 
My College Essay Paragraph On My College Why I Love My College
My College Essay Paragraph On My College Why I Love My CollegeMy College Essay Paragraph On My College Why I Love My College
My College Essay Paragraph On My College Why I Love My College
Sarah Pollard
 

More from Sarah Pollard (20)

Three Parts Of Essay - Clip Art Library. Online assignment writing service.
Three Parts Of Essay - Clip Art Library. Online assignment writing service.Three Parts Of Essay - Clip Art Library. Online assignment writing service.
Three Parts Of Essay - Clip Art Library. Online assignment writing service.
 
Essay On My First Day At College In English With Quotations - YouTube
Essay On My First Day At College In English With Quotations - YouTubeEssay On My First Day At College In English With Quotations - YouTube
Essay On My First Day At College In English With Quotations - YouTube
 
Psy 104 Week 5 Assignment Developmental Psychology
Psy 104 Week 5 Assignment Developmental PsychologyPsy 104 Week 5 Assignment Developmental Psychology
Psy 104 Week 5 Assignment Developmental Psychology
 
10 Lines On How We Can Help Poor People In English
10 Lines On How We Can Help Poor People In English10 Lines On How We Can Help Poor People In English
10 Lines On How We Can Help Poor People In English
 
Sample Essay For Leadership Program The. Online assignment writing service.
Sample Essay For Leadership Program The. Online assignment writing service.Sample Essay For Leadership Program The. Online assignment writing service.
Sample Essay For Leadership Program The. Online assignment writing service.
 
Ap Language Synthesis Essay.. Online assignment writing service.
Ap Language Synthesis Essay.. Online assignment writing service.Ap Language Synthesis Essay.. Online assignment writing service.
Ap Language Synthesis Essay.. Online assignment writing service.
 
How To Descriptive Essay. How To Write A Descriptive
How To Descriptive Essay. How To Write A DescriptiveHow To Descriptive Essay. How To Write A Descriptive
How To Descriptive Essay. How To Write A Descriptive
 
A For And Against Essay Essay W. Online assignment writing service.
A For And Against Essay Essay W. Online assignment writing service.A For And Against Essay Essay W. Online assignment writing service.
A For And Against Essay Essay W. Online assignment writing service.
 
Educational Argumentative Topics. The. Online assignment writing service.
Educational Argumentative Topics. The. Online assignment writing service.Educational Argumentative Topics. The. Online assignment writing service.
Educational Argumentative Topics. The. Online assignment writing service.
 
Custom Term Paper Writing Service - YouTube
Custom Term Paper Writing Service - YouTubeCustom Term Paper Writing Service - YouTube
Custom Term Paper Writing Service - YouTube
 
How To Write A Good Biology Research Paper Artofit
How To Write A Good Biology Research Paper ArtofitHow To Write A Good Biology Research Paper Artofit
How To Write A Good Biology Research Paper Artofit
 
HOW TO WRITE AN ABSTRACT. Online assignment writing service.
HOW TO WRITE AN ABSTRACT. Online assignment writing service.HOW TO WRITE AN ABSTRACT. Online assignment writing service.
HOW TO WRITE AN ABSTRACT. Online assignment writing service.
 
Writing A Self Reflective Essay How To Write A Re
Writing A Self Reflective Essay How To Write A ReWriting A Self Reflective Essay How To Write A Re
Writing A Self Reflective Essay How To Write A Re
 
Chinese Essay Writing Service. Online assignment writing service.
Chinese Essay Writing Service. Online assignment writing service.Chinese Essay Writing Service. Online assignment writing service.
Chinese Essay Writing Service. Online assignment writing service.
 
Outer Space Astronaut Writing Paper - Lined And
Outer Space Astronaut Writing Paper - Lined AndOuter Space Astronaut Writing Paper - Lined And
Outer Space Astronaut Writing Paper - Lined And
 
Top Funny Essay Titles By Menskool - Issuu
Top Funny Essay Titles By Menskool - IssuuTop Funny Essay Titles By Menskool - Issuu
Top Funny Essay Titles By Menskool - Issuu
 
Essay Examples Life Experience Essay. Online assignment writing service.
Essay Examples Life Experience Essay. Online assignment writing service.Essay Examples Life Experience Essay. Online assignment writing service.
Essay Examples Life Experience Essay. Online assignment writing service.
 
Essay About Can Money Buy Ha. Online assignment writing service.
Essay About Can Money Buy Ha. Online assignment writing service.Essay About Can Money Buy Ha. Online assignment writing service.
Essay About Can Money Buy Ha. Online assignment writing service.
 
Definitive Argument Topics. Definition Essay Topics T
Definitive Argument Topics. Definition Essay Topics TDefinitive Argument Topics. Definition Essay Topics T
Definitive Argument Topics. Definition Essay Topics T
 
My College Essay Paragraph On My College Why I Love My College
My College Essay Paragraph On My College Why I Love My CollegeMy College Essay Paragraph On My College Why I Love My College
My College Essay Paragraph On My College Why I Love My College
 

Recently uploaded

S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
NelTorrente
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
MERN Stack Developer Roadmap By ScholarHat PDF
MERN Stack Developer Roadmap By ScholarHat PDFMERN Stack Developer Roadmap By ScholarHat PDF
MERN Stack Developer Roadmap By ScholarHat PDF
scholarhattraining
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Ashish Kohli
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 

Recently uploaded (20)

S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
MERN Stack Developer Roadmap By ScholarHat PDF
MERN Stack Developer Roadmap By ScholarHat PDFMERN Stack Developer Roadmap By ScholarHat PDF
MERN Stack Developer Roadmap By ScholarHat PDF
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 

A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level.pdf

  • 1. A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level Iddo Drori1,a,b , Sarah Zhanga , Reece Shuttlewortha , Leonard Tangc , Albert Lua , Elizabeth Kea , Kevin Liua , Linda Chena , Sunny Trana , Newman Chengb , Roman Wangb , Nikhil Singha , Taylor L. Pattic , Jayson Lynchd , Avi Shporera , Nakul Vermab , Eugene Wub , and Gilbert Stranga a Massachusetts Institute of Technology; b Columbia University; c Harvard University; d University of Waterloo This manuscript was compiled on April 7, 2022 We demonstrate that a neural network pre-trained on text and fine- tuned on code solves mathematics course problems, explains solu- tions, and generates new questions at human level. We automatically synthesize programs using few-shot learning and OpenAI’s Codex transformer and execute them to solve course problems at 80% auto- matic accuracy. We curate a new dataset of questions from MIT’s large mathematics courses (Single Variable and Multivariable Cal- culus, Differential Equations, Introduction to Probability and Statis- tics, Linear Algebra, and Mathematics for Computer Science) and Columbia University’s Computational Linear Algebra course. We solve questions from a MATH dataset (on Prealgebra, Algebra, Count- ing and Probability, Intermediate Algebra, Number Theory, and Pre- calculus), the latest benchmark of advanced mathematics problems designed to assess mathematical reasoning. We randomly sample questions from each course and benchmark topic and generate so- lutions with multiple modalities, including numbers, equations, and plots. The latest GPT-3 language model pre-trained on text auto- matically solves only 18% of these university questions. In con- trast, using program synthesis and few-shot learning with Codex that is fine tuned on code generates programs that automatically solve 80% of these questions. Our approach improves upon the previous state-of-the-art automatic solution accuracy on the benchmark top- ics from 8.8% to 81.1%. We perform a survey to evaluate the quality and difficulty of generated questions. This work is the first to au- tomatically solve university-level mathematics course questions at a human level. This is also the first work to explain, and generate university-level mathematics course questions at scale, a milestone for higher education. Neural networks | Mathematics courses | Answering, explaining, and generating questions | Introduction Until this work, it was widely believed that neural net- works could not solve advanced mathematics problems (1). However, the previous unsuccessful studies used only text-based pre-training. We now demonstrate that a neural network, OpenAI Codex, that is pre-trained on text and fine- tuned on code automatically answers 80% of university-level mathematics problems by program synthesis using few-shot learning. Figure 1 illustrates several example problems: computing the volume generated by rotating the graph of a single variable function around an axis, computing the Lorenz attractor and its projection, and computing and demonstrating the geometry of a singular value decomposition (SVD). For the first time, we demonstrate that not only can a single machine learning model solve these example problems, it can solve a wide variety of mathematics courses at scale. Related Work. Transformers are deep learning architectures based only on attention mechanisms (2) that do not use re- current neural networks or convolutional neural networks. Transformer-based language models have enjoyed tremendous success across a variety of natural language processing (NLP) tasks, including zero-shot and few-shot language tasks (3). However, these models have largely failed to solve math prob- lems (4–6). In particular, previous work using transformers, such as GPT-3 (3), have failed to solve mathematics courses problems because the transformers were pre-trained on text alone. Pre-training a transformer is computationally expensive and most often involves vast amounts of unlabeled data. The most common optimization objectives for pre-training language models are (i) masked word prediction: predicting a random deleted word in a sentence or predicting the next word, or (ii) classifying whether two sentences follow each other or not. This computationally expensive step is usually done once and followed by a relatively fast fine-tuning step. In fine-tuning, Significance Statement We provide the first demonstration that a neural network auto- matically solves, explains, and generates university-level prob- lems from the largest MIT mathematics courses at human level. Our methods combine three innovations: (i) using recent neural networks pre-trained on text and fine-tuned on code, rather than pre-trained on text, (ii) few shot-learning to automatically synthesize programs that correctly solve course problems, and (iii) a pipeline to solve questions, explain solutions, and gener- ate new questions indistinguishable by students from course questions. Our work improves upon state-of-the-art increas- ing automatic accuracy on randomly sampled questions on a benchmark by an order of magnitude. Implications for higher education include new roles of AI in automated course evalua- tion and content generation. 1. Wrote the paper. 2. Created figures. 3. Provided ideas. 4. Curated datasets. 5. Gen- erated questions. 6. Generated solutions. 7. Run GPT-3 and Codex. 8. Executed programs. 9. Evaluated solutions. 10. Analyzed data. 11. Wrote code. 12. Designed survey. I.D. 1,2,3,4,5,6,7,8,9,10,11,12; S.T. 1,4,5,6,7,8,9; R.W. 4,5,6,7,8,9; K.L. 1,4,5,6,8,9; N.C. 4,5,6,7,8,9; L.T. 1,4,5,6,7,8,9; E.K. 4,7,8,9; N.S. 1,2,3,10,11,12; T.L.P. 1,2,3,7,8,9; J.L. 1,3,4,5,6,7,8,9; A.S. 1,10; N.V. 1,3,10,12; E.W. 1,2,3,12; G.S. 1,3 There are no competing interests. 1 Iddo Drori. E-mail: idrori@mit.edu www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX PNAS | April 7, 2022 | vol. XXX | no. XX | 1–180 arXiv:2112.15594v3 [cs.LG] 6 Apr 2022
  • 2. Input (Problems) Output (Answers, Visualizations, Explanations) MIT Courses Columbia U Course Zero-Shot Learning 70% automatic solve rate Calculus Differential Eq. Intro to Prob. Linear Algebra Math for CS Comp. Lin. Alg. Prealgebra Algebra Number Theory Counting and Prob. Inter. Algebra Precalculus MATH Topics Few-Shot Learning 80% automatic solve rate Program Synthesis As-is / write a program / use sympy / use simulations Codex Calculus Differential Equations Linear Algebra Introduction to Probability and Statistics 1. Generate a random number x from a uniform distribution on the interval [0, θ] 2. Test the hypothesis that θ = 2 by rejecting H0 if x ≤ 0.1 or x ≥ 1.9 3. Simulate the probability of a type I error Explanation Fig. 1. We apply a neural network, OpenAI Codex, to solve, explain, and generate mathematics problems. Our input math problems are randomly sampled from MIT and Columbia University courses and from the MATH dataset (left). We use zero-shot and few-shot learning to automatically generate programs, solving 80% of the questions. We then use Codex to explain the synthesized programs. The modality of the program outputs take diverse forms depending on the course problems, such as numerical responses or plots generated from text by program synthesis (right). For example, in Calculus 18.01-2: the volume generated by rotating the finite 2-dimensional region bounded by two 2-dimensional graphs about the plotted axis (top right); in Differential Equations 18.03: the Lorenz strange attractor (bottom left); In Linear Algebra 18.06: the geometry of the singular value decomposition (SVD) (bottom right). the pre-trained model is tuned using a specific dataset or task. This work demonstrates that OpenAI’s Codex (7), a trans- former that has been pre-trained on text and then fine-tuned on code, generates programs (i.e., conducts program synthesis) that solve math course problems at scale, and that few-shot learning automatically solves 80% of the math course problems. Previous work has seen modest success on simpler or spe- cialized mathematics problem benchmarks. Techniques based on co-training output to verify (8, 9) or predict expression trees (10–15) are able to solve elementary school-level math prob- lems, such as MAWPS and Math23k, with over 80% accuracy. However, these approaches do not extend to high-school, math Olympiad, nor university-level courses. Co-training paired with graph neural networks (GNNs) to predict arithmetic expression trees is able to solve university-level problems in Machine Learning (16) with up to 95% accuracy. However, that work is limited to numeric answers, and is moreover over- fit to a specific course, such that it does not generalize to other courses. Major Contributions. Our main contribution, as shown in Fig- ure 2, is demonstrating that a single neural network model, OpenAI Codex, automatically solves 80% of randomly selected university-level mathematics problems (six MIT mathematics courses and one Columbia University course) by program syn- thesis and few-shot learning. We also automatically explain the solutions and generate new questions, a process requiring only seconds per problem. The courses are listed in Table 1. We randomly sample 25 questions per course and the prob- lems are solved as-is or with minor contextual information that is automatically applied. If applicable, the neural net- work outputs an executable program that answers the problem and visualizes the result. Furthermore, our method explains the solutions and generates new problems that are nearly indistinguishable from human-written problems. This methodology increases the solution accuracy on the MATH benchmark (5) from 8.8% accuracy using previously state-of-the-art methods to 81.1% accuracy using automatic few-shot learning. The MATH benchmark measures the math- ematical problem-solving ability of neural network models with challenging problems sourced from high school math competitions, such as the AMC 10∗ , AMC 12, and AIME† . The methods we propose are simple and broadly applica- ble. The first is using a transformer model pre-trained on text and fine-tuned on code, so that it is adept at synthesiz- ing programmatic solutions. The second is to use zero-shot learning of the questions as-is or with automatically added contextual information about the problem or program. The third is to embed the questions and use few-shot learning based on question-code pairs of nearest embedded zero-shot questions. In our work, we use the latest version of Open AI Codex (7), namely code-davinci-002. Methods Dataset. We randomly sample 25 questions from each of seven courses: MIT’s 18.01 Single Variable Calculus, 18.02 Multi- variable Calculus, 18.03 Differential Equations, 18.05 Intro- duction to Probability and Statistics, 18.06 Linear Algebra, 6.042 Mathematics for Computer Science, and Columbia Uni- versity’s COMS3251 Computational Linear Algebra. For the MATH dataset, we randomly sample 15 questions from each of six topics in the dataset. We validate that our results are not merely overfitting training data by solving a new Computa- tional Linear Algebra course COMS3251 which is not available ∗ American Mathematics Competitions † American Invitational Mathematics Examination 2 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 3. Randomly Sampled Questions (100% Solved) Zero-Shot Successful (70% Automatic Solve Rate) Few-Shot Successful (+10% Automatic Solve Rate) Manual Modification (20%) Few-Shot Unsuccessful Zero-Shot Unsuccessful Zero-Shot Learning: Acceptable input is: • no change to original question • add “write a program” • add “using sympy” • add “using simulations” Few-Shot Learning: If zero-shot does not work, per- form few-shot learning using (question, code) pairs: 1. Embed all questions 2. Compute pairwise similarity between embeddings 3. Rank the questions that were zero-shot to select examples for few-shot Fig. 2. We select a random sample of questions from each course or topic that do not contain input images or require proofs. A language model pre-trained on text (GPT-3 text-davinci-002) automatically solves only 18% (for courses) and 25.5% (for the MATH benchmark topics) of these questions. In contrast, using zero-shot learning with a network pre-trained on text and fine tuned on code (OpenAI Codex code-davinci-002) we synthesize programs that automatically solve 70% (for courses) and 72.2% (for the MATH benchmark topics) of the questions, and using few-shot learning automatically solve 80% (for courses) and 81.1% (for the MATH benchmark topics) of the questions. We use the nearest embedded zero-shot questions and their synthesized code for few-shot learning. The remaining 20% of the course questions and 18.9% of MATH benchmark topic questions are manually prompted to solve the question. online, and is therefore unseen during Codex training. We automatically obtain correct answers for 80% of the randomly sampled university math course questions and 81.1% of the randomly sampled MATH benchmark questions. The previous state-of-the-art on this benchmark before this work was 8.8% (4). Workflow. Our method takes a course problem as input and synthesizes a program that, when run, outputs the solution. Figure 4 compares the percent of automatically solved ques- tions for each course using our zero-shot learning and few-shot learning approaches with the latest GPT-3 (text-davinci-002) and Codex (code-davinci-002) versions. The error bars on the totals are standard errors. Figures 3 shows examples of automatic workflows for solv- ing course questions and generating explanations using Codex. The panels show the original question, the automatic aug- mentation with context, the resulting synthesized program, the executed output answer that is the solution, and the ex- planation of the solution program. Questions are given to Codex either as-is, or by automatically adding minor context, as described below. Automatic Context. Programming Language Context. Best results are obtained when the Codex prompt specifies that a program should be written and specifies which programming language should be used. We add the text “write a program” before the question and focus on the Python programming language by placing the text within Pythonic triple quotes. Library Context. Likewise, best results are obtained when the Codex prompt specifies which programming package should be used. For instance, we may add the Python library SymPy as context (see Figure 3 top panel 18.01), specifying that the program synthesized to solve the problem should use this package. Figure 5 shows the Python programming packages used by each course. Each colored stacked bar represents the number of questions in the course using that package. NumPy and Sympy are used by all courses. Matplotlib is used by courses with questions that require plotting. Math, Random, and SciPy are used by around half of the courses. The usage patterns of these courses is incorporated automatically in our approach; as we only specify SymPy or plot related imports, these other package imports are automatically synthesized. Multi-modal Output. The output answer may be of numerous modalities. In the examples featured in Figure 3, the output answers are: an equation (18.01), a Boolean value (18.02), a plot (18.03), a numerical value (18.05), and a vector (18.03 and 18.06). Simulation. Figure 3 (18.05) shows an example from Prob- ability and Statistics where the question is turned into a probabilistic programming task that generates simulations in order to compute an empirical statistic. Automatic Zero-Shot and Few-Shot Learning. Zero-shot learn- ing synthesizes a program from the original question or from the automatically augmented question without using any exam- ples. This method automatically solves 70% of the questions. Few-shot learning begins by embedding all the questions that are solved by zero-shot learning. Then, separately for each course, we select the nearest zero-shot questions based on the cosine similarity with the target question. We use the nearest zero-shot (question, code) pairs as examples for few- shot learning. This increases the number of questions that are automatically solved from 70% by zero-shot learning to 80% by few-shot learning. Figure 3 (18.02) demonstrates few-shot Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 3
  • 4. ID Course Question Solution 1 18.01 Single Variable Calculus A bacteria population is 4000 at time t = 0 and its rate of growth is 1000 ∗ 2t bacteria per hour after t hours. What is the population after one hour? 4000 + 1000 log(2) 2 18.02 Multi-variable Calculus Describe the graph of the function f: f(x, y) = 10 − p x2 + y2 3 18.03 Differential Equations Find general solutions of the differential equations. If an initial condition is given, find the corresponding particular solution. Throughout, primes denote derivatives with respect to x. y′ + y = 2, y(0) = 0 y(x) = 2(1 − e−x ) 4 18.05 Introduction to Probability and Statistics Calculate the probability of getting a three-of-a-kind poker hand. 0.021128 5 18.06 Linear Algebra Find a combination x1w1 + x2w2 + x3w3 that gives the zero vector with x1 = 1. w1 is the vector (1;2;3). w2 is the vector (4; 5; 6). w3 is the vector (7; 8; 9). x1 = 1, x2 = −2, x3 = 1 6 6.042 Mathematics for Computer Science Find a number x ∈ {0, 1, ..., 112} such that 11x ≡ 1 (mod 113). 72 7 COMS3251 Computational Linear Algebra Given a d-dimensional non-zero vector v, compute the rank of the matrix vv′ 1 8 MATH Prealgebra What is the greatest common factor of 84, 112 and 210? 14 9 MATH Algebra Let N, O be functions such that N(x) = 2 √ x, and O(x) = x2 . What is N(O(N(O(N(O(3))))))? 24 10 MATH Number Theory How many four-digit numbers whose digits add up to 9 are divisible by 11? 0 11 MATH Counting and Probability A standard six-sided fair die is rolled four times. The probability that the product of all four numbers rolled is a perfect square is m n , where m and n are relatively prime positive integers. Find m + n. 187 12 MATH Intermediate Algebra Given that x2 + y2 = 14x + 6y + 6, find the largest possible value of 3x + 4y. 73 13 MATH Precalculus If the six solutions of x6 = −64 are written in the form a + bi, where a and b are real, find the product of those solutions with a > 0. 4 Table 1. Example questions and solutions from six MIT courses (18.01, 18.02, 18.03, 18.05, 18.06, 6.042), one Columbia University course (COMS3251), and six topics of the MATH dataset (Pre-Algebra, Algebra, Intermediate Algebra, Counting and Probability, Precalculus, Number Theory). The solutions can contain numerical answers, equations, and plots, among other modalities. learning. Given the original question, the nearest embedded question is found among the question that have been zero-shot. This (question, code) pair along with the automatic input are given to Codex which results in generated code that correctly solves the problem. Manual Prompts. Question Tidying. While 80% of the question are automati- cally solved by zero-shot and few-shot learning; 20% of the questions may require manual prompts. These questions may be vague or contain superfluous information (e.g., reference movie characters or current events), and require tidying to extract the essence of the question. Question tidying primarily involves removing superfluous information, breaking down long sentence structures into smaller components, and converting prompts into a programming format. Interaction for Visualization. Another form of manual prompt- ing occurs when an answer plot requires multiple steps to generate a visually pleasing and clear plot. These special cases, which are among the remaining 20% of the questions re- quire interactively prompting Codex until reaching the desired visualizations. Automatic Explanation. Explanations are generated automati- cally using a prompt consisting of three quotes followed by the text “Here’s what the above code is doing: 1.”. This prompt is given after the question and the generated code, and results in a step-by-step explanation of the solution code. Automatic Questions Generation and their Human Evaluation. We use Codex to generate new questions for each course. This is done by creating a numbered list of questions from the dataset. This list is cut off after a random number of questions, and the result is used to prompt Codex to generate the next question. This process is repeated to generate many new questions for each course. Questions may be generated based on course topic and given student performance may also be generated based on level of difficulty. To evaluate the generated questions, we survey MIT stu- dents who have taken these courses or their equivalents to compare the quality and difficulty of machine-generated ques- 4 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 5. Ans 2*x/(2*x-3) - 2*(x**2-1)/(2*x-3)**2 Find the derivative of the function using the definition of a derivative. f(x)= (x**2-1)/(2*x-3) Question 18.01 Single Variable Calculus Program Using sympy find the derivative of the func- tion using the definition of the derivative. f(x)= (x**2-1)/(2*x-3) Input Codex Codex Explanation 1. We import sympy as sp 2. We create a symbol x 3. We create a function f 4. We print the derivative of f with respect to x Ans R =1/2[ 2,−√2;√2,√2] What 2 by 2 matrix R rotates every vector through 45 degrees? Example: the vector [1,0] goes to [sqrt(2)/2, sqrt(2)/2]. Question 18.06 Linear Algebra Program Codex Codex Explanation 1. We define a function called rotation_matrix which takes in a parame- ter theta. 2. We return a 2D numpy array which is the rota- tion matrix. 3. We call the function with theta = pi/4. 4. We print the result. Ans 0.1 One generates a number x from a uniform distribution on the interval [0, θ]. One de- cides to test H0 : θ = 2 against HA : θ = 2 by reject- ing H0 if x ≤ 0.1 or x ≥ 1.9. Compute the probability of a type I error. Question 18.05 Introduction to Probability and Statistics Program One generates a number x from a uni- form distribution on the interval [0, θ]. One de- cides to test H0 : θ = 2 against HA : θ = 2 by rejecting H0 if x ≤ 0.1 or x ≥ 1.9. Using simula- tions, compute the probability of a type I error. Input Codex Codex Explanation 1. Generate a random number x from a uniform distribution on the in- terval [0, θ] 2. Test the hypothesis that θ = 2 by rejecting H0 if x ≤ 0.1 or x ≥ 1.9 3. Simulate the probabil- ity of a type I error Lorem ipsum Ans (0,0) Find the critical point or points of the given autono- mous system. dx/dt=2*x-y, dy/dt=x-3y Question 18.03 Differential Equations Program Use Sympy to find the criti- cal point or points of the given autonomous system. dx/dt=2*x-y, dy/dt=x-3y Plot using streamplot Input Codex Codex Explanation 1. We create a grid of x and y values. 2. We create a function for dx/dt and dy/dt. 3. We plot dx/dt and dy/dt. 4. We find the critical points. 5. We plot the critical points. Determine whether the alter- nating series con- verge or diverge. Question 18.02 Multivariate Calculus Program Use comparison tests to determine whether the infinite series converge or diverge. from sympy import Sum, Symbol, oo, limit, init_printing init_printing() n = Symbol('n') s = Sum(1/(n**2 + n + 1), (n, 1, oo)) limit(s, n, oo) """ The series converges. """ Few Shot (Question, Code) Codex Explanation Codex 1. The first line imports the Sum function from the sympy module. 2. The second line imports the Symbol function from the sympy module. 3. The third line imports the oo function from the sympy module. 4. The fourth line imports the limit function from the sympy module. 5. The fifth line imports the init_printing function from the sympy module. 6. The sixth line creates a vari- able called n and assigns it the value of the Symbol function. 7. The seventh line creates a vari- able called s and assigns it the value of the Sum function. 8. The eighth line prints the limit of the variable s as n approaches infinity. Ans Converges Write a program using sympy to determine whether the alternat- ing series converges or diverges. Input Fig. 3. Example pipelines automatically solving questions from MIT mathematics courses and explaining the solutions. 18.01 Single Variable Calculus Zero-Shot example: Given a question and the automatically generated prefix "using SymPy", Codex is prompted and outputs a program. Running the program results in equations which are the correct answer. The program is then fed to Codex again with an automatic prompt which results in a generated explanation of the code. 18.02 Multi-variable Calculus Few-Shot example: Given a question, the prefix "write a program using SymPy" is automatically generated. The question is embedded along with the other questions in the course that are zero-shot. The nearest zero-shot question and its corresponding code are used as a few-shot example. Together the few-shot example pair and the input question are fed into Codex, which generates a program that solves the question. The question, program, and a prompt for explanation are fed into Codex to generate the explanation. 18.03 Differential Equations Zero-Shot example: In this example the answer is both a vector and a plot. 18.05 Introduction to Probability and Statistics Zero-Shot example: Given the question, a probabilistic program is generated by using simulations. 18.06 Linear Algebra Zero-Shot example: The output answer is the correct vector. tions with human-written questions for each of the courses ‡ ‡ The IRB that approved the survey is MIT IRB Exempt Id E-3792. The survey was optional, included informed consent, with the following description: “We are conducting a survey to assess the quality and difficulty of automatically generated questions for STEM courses. You will be presented with a series of blocks consisting of questions, either human-written (taken from an actual course) or Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 5
  • 6. Codex Few−Shot Codex Zero−Shot GPT−3 0.00 0.25 0.50 0.75 1.00 1 8 . 0 1 1 8 . 0 2 1 8 . 0 3 1 8 . 0 5 1 8 . 0 6 6 . 0 4 2 C O M S 3 2 5 1 T o t a l MIT Mathematics Courses and a New Columbia Course Automatic Solve−Rate A A l g e b r a C o u n t i n g & P r o b a b i l i t y I n t e r m e d i a t e A l g e b r a N u m b e r T h e o r y P r e a l g e b r a P r e c a l c u l u s T o t a l MATH Benchmark B Fig. 4. Comparison between automatic solve rates on MIT mathematics courses and a new Columbia University course (A), and a MATH benchmark dataset (B). The latest OpenAI GPT-3 (text-davinci-002), a transformer pre-trained on text, achieves 18% (A) and 25.5% (B) automatic solve rates. In contrast, program synthesis with zero-shot and few-shot learning using the latest OpenAI Codex (code-davinci-002), a transformer pre-trained on text and fine tuned on code, achieves an automatic solve rate of 80% (A) and 81.1% (B). We randomly sample five original questions and five generated questions for each MIT course. In the survey, students are asked to read ten questions from each course, which are a mixture of randomly ordered human-written and machine- generated questions. For each of the 60 course questions, the students are asked three survey questions: (i) is the question human-written or machine-generated? (ii) is the question appropriate or not appropriate for the specific course? and (iii) what is the question’s difficulty level on a scale between 1 (easiest) and 5 (hardest)? An example of this survey format is given in Figure 6. The students are asked to simply provide their ratings and not to solve the questions. The survey is conducted online and anonymously. generated with machine learning, but you will not be told the source of a given question. For each question, you will be asked (a) whether you think the question is human-written or machine- generated, (b) whether the question is appropriate for the given course, and finally (c) how you would rate the difficulty of the question. Please carefully read each question and answer to the best of your ability”. 0 10 20 30 18.03 18.01 18.02 COMS3251 18.05 18.06 6.042 Course Count Library math matplotlib mpl_toolkits networkx numpy random scipy sympy Fig. 5. Imported Python programming libraries by course: NumPy is used by nearly all courses. Matplotlib is used by courses with questions that involve plotting. Sympy is used by most of the courses, and SciPy by half of the courses. Introduction to Probability and Statistics (18.05) uses the statistics package. Results Questions Solved. We solve a total of 265 questions, 213 of them automatically, as described in the Supplemen- tary Information. These 265 questions include 25 ran- domly sampled questions from each of the seven courses (18.01/18.02/18.03/18.05/18.06/6.042/COMS3251), and 15 randomly sampled questions for each of the six topics in the MATH dataset (Prealgebra/Algebra/Number Theory/Count- ing and Probability/Intermediate Algebra/Precalculus). The breakdown of the questions solved automatically by zero-shot and few-shot learning as compared with GPT-3 is shown in Figure 4. Visualization of Embedded Questions. We embed the 175 mathematics course questions onto a 2,048 dimensional space using OpenAI’s (text-similarity-babbage-001) embed- ding which captures semantic similarity between texts. We then use uniform manifold approximation and projection (UMAP) (17) reducing dimensionality to two dimensions. Fig- ure 8 shows that the embedded questions are clustered by course topics. On the top right we see clusters of questions from MIT’s 18.06 Linear Algebra (black plus) and Columbia’s COMS3251 Computational Linear Algebra (yellow plus). On the left side we see a cluster of the questions from MIT’s 18.01 (red circles), 18.02 (green circles), and 18.03 (blue circles). On the bottom right we see a cluster of the questions from MIT’s 18.05 Introduction to Probability and Statistics (magenta x) and 6.042 Mathematics for Computer Science (cyan x), both courses cover probability and statistics. Automatically Generating New Questions. We use the curated questions to generate new questions for each of the courses and topics. This is done by prompting Codex with numbered human-written questions and automatically generating the next question. We generate 130 new questions as shown in the Supplementary Information. These include ten new questions for each of the seven courses and each of the six MATH topics. 6 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 7. Fig. 6. Student survey question: For each of 60 questions, students are asked if (i) the question is human-written or machine-generated, (ii) the question is appropriate or inappropriate for the course, and (iii) to rate the difficulty level of each question on a scale between 1 (easiest) and 5 (hardest). Fig. 7. Student survey results: Panel A compares between the level of difficulty of human-written questions and questions generated by our approach for each course based on the student ratings. The plot shows the means of the difficulty ratings between 1 (easiest) and 5 (hardest), and their 95% confidence intervals. Panel B shows the percentage of human-written and machine-generated questions rated as appropriate and not appropriate for the course. Panel C shows the percentage of human-written questions rated as human-written or machine-generated (left) and the percentage of machine-generated questions rated as human-written or machine-generated (right). Table 2 shows one generated question for each course and MATH topic. Generating a question takes less than a second and we can generate an arbitrarily large number of questions. We create prompts of 25 randomly selected problems for which Codex generates correct answers, remove the questions after a randomly chosen question in the list, and have Codex complete the next new question. Student Survey Results. Fifteen participants completed our survey answering questions about all 60 questions, taking a median of 40 minutes. Figure 7 summarizes the results of the student survey comparing human-written and machine- generated questions. Panel A compares between the level of difficulty of human-written questions and questions generated by our approach for each course based on the student ratings. The plot shows the means of the difficulty ratings between 1 (easiest) and 5 (hardest), and their 95% confidence intervals. Panel B shows the percentage of human-written and machine- generated questions rated by students as appropriate or not appropriate for the courses. Panel C shows the percentage of human-written questions rated as human-written or machine- generated (left) and the percentage of machine-generated ques- tions rated as human-written or machine-generated (right). Summarizing the student survey results: • Survey participants rated our machine-generated ques- tions and human-written questions to be at a similar level of difficulty within confidence intervals. • Survey participants rated human-written questions slightly more appropriate for the courses than machine- generated ones. Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 7
  • 8. Fig. 8. Visualization of embeddings of course questions: We embed the course questions into a 2,048 dimensional space using OpenAI’s (text-similarity-babbage-001) embedding which captures semantic similarity between texts. We then use uniform manifold approximation and projection to reduce dimensionality to two dimensions. This shows distinctive clusters based on topics. On the top right we see clusters of questions from MIT’s 18.06 Linear Algebra and Columbia’s COMS3251 Computational Linear Algebra. On the left side we see a cluster of the questions from MIT’s 18.01, 18.02, and 18.03. On the bottom right we see a cluster of the questions from MIT’s 18.05 Introduction to Probability and Statistics and 6.042 Mathematics for Computer Science, both course cover probability and statistics. • Survey participants rated human-written questions slightly more likely to be human-written as shown in the left side of Panel C. Survey participants rated machine- generated questions equally likely to be machine-generated and human-written as shown in the right side of Panel C. Human Level. We achieve 80% automatic accuracy on solving mathematics course problems at MIT and Columbia. This is comparable to typical student performance on these problem sets in our courses at MIT and Columbia University. We au- tomatically generate new questions that are indistinguishable by students from human course questions. Implementation Details. We use the latest version of OpenAI’s GPT-3 text-davinci-002 and Codex codex-davinci-002 engine for all of our experiments. We fix all of Codex’s hyperparame- ters to be the same for all solution and explanation experiments to yield deterministic and reproducible results. Specifically, top P which controls diversity is set to 0 and sampling tem- perature, which controls randomness, is also set to 0. Both frequency and presence penalty are set to 0, and we do not halt on any stop sequences. For all new question generation experiments we allow diversity and randomness by setting top P and temperature to 0.1. Each prompt is structured as a Python documentation comment surrounded by triple quota- tions and line breaks. We evaluate the solution by running the output using a Python interpreter. Answers are correct if a printed output is the correct solution, or if the program function returns the correct solution, and the explanation of the solution is correct. Types of Problems the Model Cannot Solve. There are a few different types of problems the model is incapable of solving: (i) any problem for which images or other non-text modalities of input are critical to the question, (ii) questions with solutions that require proofs, and (iii) problems that are computationally intractable, such as factoring very large primes. This last category is not expected in any math course assignment, as students themselves would also be unable to answer them. That being said, many questions that students can answer have generalizations that are computationally intractable. Conclusion We demonstrate that program synthesis and few-shot learning using OpenAI Codex is able to solve, explain, and generate university-level mathematics problems at human level. In con- trast, previous methods use transformers pre-trained only on text, such as GPT-3, and fail at these tasks. We verify that our strong results are not overfitting the training data by solving a new course that is not available online. We also generate and analyze new problem sets. The success of this work confirms that programs serve as a good representation and computation environment for solving math problems. Since our approach requires no additional training, it has been scaled-up to over thirty STEM courses. This work addresses major pedagogical challenges, bringing substantial benefits to higher education such as tools for curriculum design and analysis, and automatic content generation. We show that neural network synthesis with modern pro- gramming languages is more dynamic and widely applicable than expression trees, and is therefore likely to solve a wider range of problems. Although any finite computation could 8 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 9. ID Course Machine-generated question Closest question in the dataset Similarity 1 18.01 Single-Variable Calculus Find the area of the region bounded by the curve and the x-axis. y = x2 sin(x), 0 ≤ x ≤ π Find the area of the region under the given curve from 1 to 2. y = (x2 + 1)/(3x − x2 ) 0.61 2 18.02 Multi-Variable Calculus Find a × b. a = h9, −2, 1i, b = h−2, 1, 1i Find a × b. a = h5, −1, −2i, b = h−3, 2, 4i 0.87 3 18.03 Differential Equations Use the method of separable variables to solve the initial-value problem dy dx = 5ex , y(2) = 12 when x = 2 Separate variables and use partial fractions to solve the initial value problems. Use either the exact so- lution or a computer-generated slope field to sketch the graphs of several solutions of the given differen- tial equation, and highlight the indicated particular so- lution. f′ (x) = 3f(x)(5 − f(x)), f(0) = 8 0.21 4 18.05 Introduction to Probability and Statistics Let X be a uniformly distributed random variable over the interval [0, 1). Find E[X2 ] Let X be the result of rolling a fair 4-sided die. Let Y be the result of rolling a fair 6-sided die. You win 2X dollars if X > Y and lose 1 dollar otherwise. After playing this game 60 times, what is your expected total gain? 0.29 5 18.06 Linear Algebra Write a Matlab code to determine if the given matrix A = [1, 1; 4, 4] is positive semidefinite and if it is neg- ative semidefinite. Find A′ A if the columns of A are unit vectors, all mu- tually perpendicular. 0.21 6 6.042 Mathematics for Computer Science A student is taking a test consisting of n multiple- choice questions. Each question has five possible an- swers, and only one of them is correct. The student knows that the probability that any particular question is answered correctly is 1 5 . Let X be the number of questions answered correctly by the student. What is E(X)? MIT students sometimes delay laundry for a few days. Assume all random values described below are mu- tually independent. A busy student must complete 3 problem sets before doing laundry. Each problem set requires 1 day with probability 2 3 and 2 days with prob- ability 1 3 . Let B be the number of days a busy student delays laundry. What is E(B)? 0.47 7 COMS3251 Computational Find a combination of the vectors " 1 2 3 4 5 6 7 8 9 # that gives the vector 1 2 3 . Find a combination of the vectors 1 2 3 4 5 6 7 8 9 # that give the zero vector. 0.90 8 MATH Pre-Algebra How many four-digit positive integers are there with hundreds digit 2? How many four-digit positive integers are there with thousands digit 2? 0.90 9 MATH Algebra Find the distance between the points (0, 0) and (3, 4). Find the distance between the points (0, 4) and (3, 0). 0.99 10 MATH Number Theory Find the smallest positive integer n such that n2 is di- visible by 210 and n3 is divisible by 310 . How many four-digit numbers whose digits add up to 9 are divisible by 11? 0.25 11 MATH Counting and Probability How many ways are there to divide a set of 10 objects into two sets of equal size? Compute 8 4 . 0.12 12 MATH Intermediate Algebra Let x and y be positive real numbers such that x2 + y2 = 1. Find the maximum value of xy. Given that x2 + y2 = 14x + 6y + 6, find the largest possible value of 3x + 4y. 0.59 13 MATH Precalculus Let A be the matrix 1 2 3 4 5 6 7 8 9 # Find the determinant of A2 + A3 . If det(A) = 2 and det(B) = 12, then find det(AB). 0.41 Table 2. Examples of new questions automatically generated by Codex for each course and the closest question from the course. be expressed as a sufficiently large expression tree, one may see an arbitrarily large expansion in the size of expression tree needed, as opposed to a Turing-complete language. This flexibility is bolstered by the massive corpus of programs that already exist, which eclipses the amount of labeled expression trees available. Program outputs are also inherently more human-readable, as the ability to use abstraction, modularity, and high-level logic lead to clearer illustrations of the path to solution. Furthermore, program synthesis can convey logical deductions directly through explanatory comments, as well as function and variable names. In particular, we see such explanatory text and derivations in a number of the Codex outputs. The unification of such formal and informal language is an inherent advantage of our methodology. We emphasize that the outputs may be complex and multi-modal. For ex- ample, using Python packages such as Matplotlib to produce graphs of equations. This advanced and unique ability is time-consuming for humans and offers a major pedagogical benefit. In summary, we automatically solve, explain, and gener- ate university-level mathematics course questions in real-time at human level. Students rated machine-generated questions as equally likely to be human-written as machine-generated, although rated human-written questions more likely to be human-written. Students rated machine-generated questions similarly difficult to human-written questions, as well as mostly appropriate for their respective courses. Finally, we have suc- ceeded in scaling-up this work to over thirty STEM courses across 13 departments in the schools of science and engineer- ing at MIT, Harvard, and the Ivy League universities, with Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 9
  • 10. excellent results. 1. CQ Choi, 7 revealing ways AIs fail: Neural networks can be disastrously brittle, forgetful, and surprisingly bad at math. IEEE Spectr. 58, 42–47 (2021). 2. A Vaswani, et al., Attention is all you need in Proceedings of Advances in neural information processing systems. Vol. 30, (2017). 3. TB Brown, et al., Language models are few-shot learners in Proceedings of Advances in Neural Information Processing Systems. (2020). 4. D Hendrycks, et al., Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300 (2020). 5. D Hendrycks, et al., Measuring mathematical problem solving with the math dataset in Pro- ceedings of Advances in Neural Information Processing Systems: Datasets and Benchmarks. (2021). 6. JW Rae, et al., Scaling language models: Methods, analysis insights from training gopher. arXiv preprint arXiv:2112.11446 (2021). 7. M Chen, , et al., Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021). 8. J Shen, et al., Generate rank: A multi-task framework for math word problems. arXiv preprint arXiv:2109.03034 (2021). 9. K Cobbe, et al., Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021). 10. Z Xie, S Sun, A goal-driven tree-structured neural model for math word problems in Proceed- ings of International Joint Conference on Artificial Intelligence. pp. 5299–5305 (2019). 11. Q Wu, Q Zhang, J Fu, XJ Huang, A knowledge-aware sequence-to-tree network for math word problem solving in Proceedings of Conference on Empirical Methods in Natural Lan- guage Processing. pp. 7137–7146 (2020). 12. J Qin, L Lin, X Liang, R Zhang, L Lin, Semantically-aligned universal tree-structured solver for math word problems in Proceedings of Conference on Empirical Methods in Natural Lan- guage Processing. (2020). 13. J Zhang, et al., Graph-to-tree learning for solving math word problems in Proceedings of the Annual Meeting of the Association for Computational Linguistics. pp. 3928–3937 (2020). 14. S Li, et al., Graph-to-tree neural networks for learning structured input-output translation with applications to semantic parsing and math word problem in Proceedings of Conference on Empirical Methods in Natural Language Processing. pp. 2841–2852 (2020). 15. Z Liang, J Zhang, J Shao, X Zhang, MWP-BERT: A strong baseline for math word problems. arXiv preprint arXiv:2107.13435 (2021). 16. S Tran, et al., Solving machine learning problems in Proceedings of Asian Conference on Machine Learning. (2021). 17. L McInnes, J Healy, J Melville, Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018). 10 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 11. A. Solution of MIT 18.01: Single Variable Calculus Prereq None Units 5-0-7 Syllabus Differentiation and integration of functions of one variable, with applications. Informal treatment of limits and continuity. Differentiation: definition, rules, application to graphing, rates, approximations, and extremum problems. Indefinite integration; separable first-order differential equations. Definite integral; fundamental theorem of calculus. Applications of integration to geometry and science. Elementary functions. Techniques of integration. Polar coordinates. L’Hopital’s rule. Improper integrals. Infinite series: geometric, p-harmonic, simple comparison tests, power series for some elementary functions. Table 3. MIT 18.01, Question 1: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Functions and Models Original question Sketch the graph of the function. f(x) = x + |x| Codex input Same as original question Codex code import numpy as np import matplotlib.pyplot as plt x = np.linspace(-10, 10, 100) y = x + np.abs(x) plt.plot(x, y) plt.xlabel(’x’) plt.ylabel(’y’) plt.grid() plt.show() Program solution Solution type Automatic Zero-Shot Codex code explanation 1. We import the numpy and matplotlib.pyplot libraries. 2. We create an array of 100 evenly spaced numbers from -10 to 10. 3. We create an array of the same size as x, where each element is the absolute value of the correspond- ing element in x. 4. We plot the graph of y against x. 5. We label the x-axis and y-axis. 6. We display the grid. 7. We display the graph. GPT-3 response The graph of the function is a line with a slope of 1 and a y-intercept of 0. GPT-3 evaluation Incorrect Table 4. MIT 18.01, Question 2: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Functions and Models Original question Find f ◦ g ◦ h. f(x) = 3x − 2, g(x) = sin(x), h(x) = x2 Codex input Same as original question Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 11
  • 12. Codex code import math def f(x): return 3*x-2 def g(x): return math.sin(x) def h(x): return x**2 def fgh(x): return f(g(h(x))) print(fgh(2)) Program solution 3 sin(x2 ) − 2 Solution type Automatic Zero-Shot Codex code explanation 1. fgh(2) 2. h(2) 3. g(4) 4. f(sin(4)) 5. 3 sin 4 − 2 GPT-3 response f◦g◦h = f(g(h(x))) = f(g(x**2)) = f(sin(x**2)) = 3*sin(x**2)-2 GPT-3 evaluation Correct Table 5. MIT 18.01, Question 3: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Functions and Models Original question Solve each equation for x. ln(x2 − 1) = 3 Codex input Same as original question Codex code import sympy x = sympy.Symbol(’x’) eq = sympy.log(x**2-1) - 3 print(sympy.solve(eq, x)) Program solution − √ 1 + e3, √ 1 + e3 Solution type Automatic Zero-Shot Codex code explanation 1. We import the sympy module. 2. We create a symbol x. 3. We create an equation. 4. We solve the equation for x. GPT-3 response x**2-1=e**3 x**2=e**3+1 x=sqrt(e**3+1) GPT-3 evaluation Correct Table 6. MIT 18.01, Question 4: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Limits and Derivatives Original question Determine the infinite limit. limx→5+ (x + 1)/(x − 5) Codex input Using sympy determine the infinite limit. lim_x_5+ (x+1)/(x-5) 12 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 13. Codex code from sympy import limit, Symbol, S x = Symbol(’x’) limit((x+1)/(x-5), x, 5) Program solution ∞ Solution type Automatic Zero-Shot Codex code explanation 1. We import the limit function from the sympy module. 2. We create a Symbol object called x. 3. We call the limit function and pass in the expression (x+1)/(x-5) and the variable x. 4. We pass in the point 5 as the third argument. GPT-3 response The limit does not exist because the function is not defined when x = 5. GPT-3 evaluation Incorrect Table 7. MIT 18.01, Question 5: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Limits and Derivatives Original question Find the limits as x → ∞ and as x → −∞. Use this information, together with intercepts, to give a rough sketch of the graph as in Example 12. y = x2 (x2 − 1)2 (x + 2) Codex input Same as original question Codex code import sympy as sym from sympy import Symbol from sympy import pprint sigma = Symbol(’sigma’) x = Symbol(’x’) mu = Symbol(’mu’) pprint(2*sym.pi*sigma) pprint(sym.sqrt(2*sym.pi*sigma)) part_1 = 1/(sym.sqrt(2*sym.pi*sigma)) part_2 = sym.exp(-1*((x-mu)**2)/(2*sigma)) my_gauss_function = part_1*part_2 pprint(my_gauss_function) sym.plot(my_gauss_function.subs({mu:10,sigma:30}),(x,0,100),title=’gauss distribution’) x_values = [] y_values = [] for value in range(-5,5): y = my_gauss_function.subs({mu:10,sigma:30,x:value}).evalf() y_values.append(y) x_values.append(value) print(value,y) import matplotlib.pyplot as plt plt.plot(x_values,y_values) plt.show() Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 13
  • 14. Program solution ∞; -∞; Solution type Automatic Zero-Shot Codex code explanation 1. We create a list of x values from -5 to 5. 2. We create an empty list to store the y values. 3. We loop through the x values. 4. For each x value, we calculate the y value. 5. We append the y value to the list of y values. 6. We plot the x values against the y values. GPT-3 response As x - infinity, y - infinity. As x - - infinity, y - infinity. The graph will look like a parabola that opens up. GPT-3 evaluation Incorrect Table 8. MIT 18.01, Question 6: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Limits and Derivatives Original question Find the derivative of the function using the definition of a derivative. f(x) = (x2 − 1)/(2x − 3) Codex input Using Sympy find the derivative of the function using the definition of a derivative. f(x)=(x**2-1)/(2*x-3) Codex code import sympy as sp x = sp.Symbol(’x’) f = (x**2-1)/(2*x-3) print(sp.diff(f,x)) Program solution 2x 2x−3 − 2(x2 −1) 2x−3 Solution type Automatic Zero-Shot Codex code explanation 1. We import sympy as sp 2. We create a symbol x 3. We create a function f 4. We print the derivative of f with respect to x GPT-3 response (2*x**2-1)/(2*x-3)**2 GPT-3 evaluation Correct Table 9. MIT 18.01, Question 7: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Differentiation Rules Original question Find the limit. limx→0 csc(x) sin(sin(x)) Codex input Same as original question Codex code from sympy import limit, csc, sin, Symbol x = Symbol(’x’) print(limit(csc(x)*sin(sin(x)), x, 0)) 14 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 15. Program solution 1 Solution type Automatic Zero-Shot Codex code explanation 1. We are finding the limit of csc(x)*sin(sin(x)) as x approaches 0. 2. We are using the limit function from the sympy library. 3. We are passing in the function csc(x)*sin(sin(x)) as the first argument. 4. We are passing in the variable x as the second argument. 5. We are passing in the number 0 as the third argument. GPT-3 response The limit does not exist. GPT-3 evaluation Incorrect Table 10. MIT 18.01, Question 8: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Differentiation Rules Original question The half-life of cesium-137 is 30 years. Suppose we have a 100-mg sample. How much of the sample remains after 100 years? Codex input Write a program. The half-life of cesium-137 is 30 years. Suppose we have a 100-mg sample. How much of the sample remains after 100 years? Codex code import math def half_life(t, h): return 100 * math.pow(0.5, t/h) print(half_life(100, 30)) Program solution 9.92 mg Solution type Automatic Zero-Shot Codex code explanation 1. We import the math module. 2. We define a function called half_life that takes two arguments: t and h. 3. We return the value of 100 * 0.5ˆ(t/h). 4. We print the result of half_life(100, 30). GPT-3 response After 100 years, 50 mg of the sample would remain. GPT-3 evaluation Incorrect Table 11. MIT 18.01, Question 9: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Differentiation Rules Original question The radius of a sphere is increasing at a rate of 4 mm/s. How fast is the volume increasing when the diameter is 80 mm? Codex input Same as original question Codex code import sympy as sp r = sp.symbols(’r’) d = 4*r v = 4/3*sp.pi*r**3 dv_dr = sp.diff(v,r) print(dv_dr.subs(r,80)) Program solution 25600π Solution type Manual Codex code explanation 1. We define the radius as a symbol. 2. We define the diameter as a function of the radius. 3. We define the volume as a function of the radius. 4. We take the derivative of the volume with respect to the radius. 5. We substitute the radius with 80 and print the result. GPT-3 response The volume of a sphere is increasing at a rate of approximately 201.06 mm3/s. Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 15
  • 16. GPT-3 evaluation Incorrect Table 12. MIT 18.01, Question 10: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Applications of Differentiation Original question The water level, measured in feet above mean sea level, of Lake Lanier in Georgia, USA, during 2012 can be modeled by the function L(t) = 0.01441t3 − 0.4177t2 + 2.703t + 1060.1 where t is measured in months since January 1, 2012. Estimate when the water level was highest during 2012. Codex input Same as original question Codex code import math def L(t): return 0.01441*t**3-0.4177*t**2+2.703*t+1060.1 def dL(t): return 0.04323*t**2-0.8354*t+2.703 def ddL(t): return 0.08646*t-0.8354 def newton(f, df, x0, epsilon): x = x0 while True: x1 = x - f(x)/df(x) if abs(x1 - x) epsilon: return x1 x = x1 print(newton(dL, ddL, 0, 0.00001)) Program solution Solution type Automatic Zero-Shot Codex code explanation 1. We define the function L(t) 2. We define the derivative of L(t) 3. We define the second derivative of L(t) 4. We define a function that uses Newton’s method to find the root of a function f(x) 5. We print the result of using Newton’s method to find the root of the derivative of L(t) GPT-3 response The water level was maximum at t=2. GPT-3 evaluation Incorrect Table 13. MIT 18.01, Question 11: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Applications of Differentiation Original question If a resistor of R ohms is connected across a battery of E volts with internal resistance r ohms, then the power (in watts) in the external resistor is P = E2 R/(R + r)2 . If E and r are fixed but R varies, what is the maximum value of the power? Codex input Same as original question 16 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 17. Codex code import sympy as sym from sympy import Symbol from sympy import pprint R = Symbol(’R’) E = Symbol(’E’) r = Symbol(’r’) P = E**2*R/(R+r)**2 print(P) dP = P.diff(R) print(dP) R_sol = sym.solve(dP,R) print(R_sol) R_max = R_sol[0].evalf() print(R_max) P_max = P.subs({R:R_max}) print(P_max) Program solution E2 4r Solution type Automatic Few-Shot (5) Codex code explanation 1. Define the variables 2. Define the function 3. Take the derivative of the function 4. Solve the derivative for R 5. Evaluate the solution 6. Substitute the solution into the original function GPT-3 response The maximum value of the power is P = Eˆ2R/(R+r)ˆ2 when R = r. GPT-3 evaluation Incorrect Table 14. MIT 18.01, Question 12: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Applications of Differentiation Original question Use Newton’s method to find all the solutions of the equation correct to eight decimal places. Start by drawing a graph to find initial approximations. x5 − 3x4 + x3 − x2 − x + 6 = 0 Codex input Use Newton’s method to find all the solutions between -100 and 100 with integer initial values. x**5- 3*x**4+x**3-x**2-x+6=0. Use initial points between -100 and 100. Round to 8 decimal places. Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 17
  • 18. Codex code import numpy as np import matplotlib.pyplot as plt def f(x): return x**5-3*x**4+x**3-x**2-x+6 def df(x): return 5*x**4-12*x**3+3*x**2-2*x-1 def newton(x, f, df, epsilon=1e-8, max_iter=1000): for i in range(max_iter): x_new = x - f(x)/df(x) if abs(x_new - x) epsilon: return x_new x = x_new return x_new def main(): x = np.linspace(-100, 100, 1000) y = f(x) plt.plot(x, y) plt.grid() plt.show() x_sols = [] for i in range(-100, 100): x_sols.append(newton(i, f, df)) print(x_sols) print(np.round(x_sols, 8)) if __name__ == ’__main__’: main() Program solution −1.044503071109637, 2.7055120932482875 Solution type Manual Codex code explanation 1. Plotting the function f(x) = x**5-3*x**4+x**3-x**2-x+6 2. Finding the derivative of f(x) 3. Using Newton’s method to find the roots of f(x) 4. Plotting the roots of f(x) GPT-3 response There are five solutions. -1.46557123 -0.53728871 0.36082474 1.12403607 2.09132432 GPT-3 evaluation Incorrect Table 15. MIT 18.01, Question 13: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Integrals Original question Evaluate R 1 1 √ 1 + x4 dx. Codex input Write a program to evaluate the integral from 1 to 1 of sqrt(1 + xˆ4) dx. 18 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 19. Codex code from sympy import Symbol, sqrt, Integral, pprint x = Symbol(’x’) k = Symbol(’k’) p = sqrt(1 + x**4) pprint(Integral(p, (x, 1, 1)).doit()) Program solution 0.0 Solution type Automatic Zero-Shot Codex code explanation 1. We create a Symbol object called x. 2. We create a Symbol object called k. 3. We create a symbolic expression called p. 4. We create an Integral object called p_integral. 5. We call the doit() method on the Integral object. GPT-3 response The integral from 1 to 1 of sqrt(1 + xˆ4) dx is equal to 2. GPT-3 evaluation Incorrect Table 16. MIT 18.01, Question 14: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Integrals Original question Sketch the region enclosed by the given curves and calculate its area. y = 4 − x2 , y = 0 Codex input Using numpy, graph the lines y=0 and y=4-x**2. Use fill_between(). Then using Sympy, find the points of intersection between and y=0 and y=4-x**2. Integrate between lines y=4-x**2 and y=0, over the points of intersection to calculate the area of the region. x should go from -5 to 5. Codex code import numpy as np import matplotlib.pyplot as plt import sympy as sp x = np.linspace(-5, 5, 100) y1 = 0*x y2 = 4-x**2 plt.plot(x, y1, ’-b’, label=’y=0’) plt.plot(x, y2, ’-r’, label=’y=4-x**2’) plt.fill_between(x, y1, y2, color=’grey’, alpha=0.5) plt.legend(loc=’best’) plt.show() x, y = sp.symbols(’x y’) f = 4-x**2 g = 0*x sols = sp.solve([f, g], [x, y]) x1 = sols[0][0] y1 = sols[0][1] x2 = sols[1][0] y2 = sols[1][1] area = sp.integrate(f, (x, x1, x2)) + sp.integrate(g, (x, x1, x2)) print(area) Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 19
  • 20. Program solution 32 3 Solution type Manual Codex code explanation 1. We are using numpy to graph the lines y=0 and y=4-x**2. We are using fill_between() to fill in the area between the two lines. 2. We are using Sympy to find the points of intersection between the two lines. 3. We are using Sympy to integrate between the two lines, over the points of intersection, to calculate the area of the region. GPT-3 Response The region is enclosed by the curves y=4-xˆ2, y=0, and x=2. Its area is 16. GPT-3 evaluation Incorrect Table 17. MIT 18.01, Question 15: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Integrals Original question A bacteria population is 4000 at time t = 0 and its rate of growth is 1000 · 2t bacteria per hour after t hours. What is the population after one hour? Codex input A bacteria population is 4000 at time t = 0 and its rate of growth is y = 1000 * 2ˆx bacteria per hour after t hours. Write a function to calculate the population after one hour. Use sympy to integrate the symbol x from 0 to 1. Then add the initial population to the result. Codex code import sympy as sp x = sp.Symbol(’x’) def bacteria_population(x): return 1000 * 2**x print(sp.integrate(bacteria_population(x), (x, 0, 1)) + 4000) Program solution 1000 log 2 + 4000 Solution type Manual Codex code explanation 1. We import sympy as sp. 2. We define x as a symbol. 3. We define a function bacteria_population(x) that returns the population of bacteria after x hours. 4. We print the result of integrating the function bacteria_population(x) from 0 to 1 and adding the initial population of 4000. GPT-3 response 5000 GPT-3 evaluation Incorrect Table 18. MIT 18.01, Question 16: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Applications of Integration Original question Graph the region between the curves and use your calculator to compute the area correct to five decimal places. y = tan2 (x), y = √ x Codex input Graph the region between the curves y = tanˆ2(x) and y = sqrt(x) to find the two points where they intersect. Then, write a program to compute the area between the curves where they cross. Use sympy. 20 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 21. Codex code import sympy import math x = sympy.symbols(’x’) y1 = sympy.tan(x)**2 y2 = sympy.sqrt(x) # find the intersection points intersection_points = sympy.solve(y1 - y2) # find the area between the two curves area = sympy.integrate(y1 - y2, (x, intersection_points[0], intersection_points[1])) print(area) Program solution 0.25142 Solution type Manual Codex code explanation 1. We import sympy and math. 2. We create a variable x and set it equal to sympy.symbols(’x’). This is how we tell sympy that we want to use x as a variable. 3. We create two variables, y1 and y2, and set them equal to sympy.tan(x)**2 and sympy.sqrt(x), respec- tively. 4. We create a variable called intersection_points and set it equal to sympy.solve(y1 - y2). This is how we tell sympy to find the intersection points of the two curves. 5. We create a variable called area and set it equal to sympy.integrate(y1 - y2, (x, intersection_points[0], intersection_points[1])). This is how we tell sympy to find the area between the two curves. 6. We print the area. GPT-3 response The region between the curves is shaded in the graph. The area is 1.04719. GPT-3 evaluation Incorrect Table 19. MIT 18.01, Question 17: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Applications of Integration Original question A force of 10 lb is required to hold a spring stretched 4 in. beyond its natural length. How much work is done in stretching it from its natural length to 6 in. beyond its natural length? Codex input A force of 10 lb is required to hold a spring stretched delta_x = 1/3 foot. Write a function to calculate the work done in stretching it to delta_x = 1/2 foot. Use Hooke’s Law F = k*delta_x where delta_x is a sympy.symbols to find the spring constant k = 10/(1/3) and integrate F with respect to delta_x from 0 to the stretched ratio 1/2. Use sympy for the integral. Then plug in for delta_x = 1/2. Codex code import sympy def work_done(delta_x): k = 10/(1/3) F = k*delta_x return sympy.integrate(F, (delta_x, 0, 1/2)) print(work_done(sympy.symbols(’delta_x’))) Program solution 3.750 Solution type Manual Codex code explanation 1. We define a function called work_done that takes a variable delta_x as an argument. 2. We define the spring constant k = 10/(1/3) 3. We define the force F = k*delta_x 4. We return the integral of F with respect to delta_x from 0 to 1/2. 5. We print the result of calling work_done with sympy.symbols(’delta_x’) as the argument. GPT-3 Response The work done in stretching the spring from its natural length to 6 in. beyond its natural length is 60 lb-in. GPT-3 evaluation Incorrect Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 21
  • 22. Table 20. MIT 18.01, Question 18: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Techniques of Integration Original question Evaluate the integral. R 0.5 0 x √ 1 − 4x2 dx Codex input Using sympy evaluate the integral. 0 to 0.5 of x*sqrt(1-4xˆ2) dx. Codex code from sympy import * x = Symbol(’x’) integral = integrate(x*sqrt(1-4*x**2), (x, 0, 0.5)) print(integral) Program solution 1 12 Solution type Automatic Zero-Shot Codex code explanation 1. We import the sympy module. 2. We create a symbol x. 3. We create an integral object. 4. We print the integral object. GPT-3 response The integral is 0. GPT-3 evaluation Incorrect Table 21. MIT 18.01, Question 19: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Techniques of Integration Original question Find the area of the region under the given curve from 1 to 2. y = (x2 + 1)/(3x − x2 ) Codex input Find the area of the region under the given curve from 1 to 2. y = (xˆ2+1)/(3x-xˆ2) 22 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 23. Codex code import numpy as np import matplotlib.pyplot as plt def f(x): return (x**2+1)/(3*x-x**2) def trapezoidal(f, a, b, n): h = (b-a)/n s = 0.5*(f(a) + f(b)) for i in range(1, n): s = s + f(a + i*h) return h*s def simpson(f, a, b, n): h = (b-a)/n s = f(a) + f(b) for i in range(1, n, 2): s = s + 4*f(a + i*h) for i in range(2, n-1, 2): s = s + 2*f(a + i*h) return h*s/3 def gauss(f, a, b, n): x, A = np.polynomial.legendre.leggauss(n) return 0.5*(b-a)*np.dot(A, f(0.5*(b-a)*x + 0.5*(b+a))) a = 1 b = 2 n = 10 print(Trapezoidal:, trapezoidal(f, a, b, n)) print(Simpson:, simpson(f, a, b, n)) print(Gauss:, gauss(f, a, b, n)) x = np.linspace(a, b, 100) y = f(x) plt.plot(x, y) plt.show() Program solution 1.5415321620760492 Solution type Automatic Zero-Shot Codex code explanation 1. Define the function f(x) = (xˆ2+1)/(3x-xˆ2) 2. Define the trapezoidal rule for numerical integration 3. Define the Simpson’s rule for numerical integration 4. Define the Gauss-Legendre quadrature rule for numerical integration 5. Define the limits of integration 6. Define the number of intervals 7. Print the results of the three methods 8. Plot the function GPT-3 response The area of the region under the given curve from 1 to 2 is approximately 1.4. GPT-3 evaluation Incorrect Table 22. MIT 18.01, Question 20: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Techniques of Integration Original question Evaluate the integral. ln(1 + x2 ) Codex input Evaluate the integral of ln(1+x**2). Cast to a string and add + C. Codex code from sympy import * x = Symbol(’x’) print(str(integrate(ln(1+x**2), x)) + + C) Program solution x log(x2 + 1) − 2x + 2 arctan x + C Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 23
  • 24. Solution type Automatic Zero-Shot Codex code explanation 1. We import the sympy module. 2. We create a symbol x. 3. We create a variable integral and assign it the value of the integral of ln(1+x**2) with respect to x. 4. We print the value of the integral. GPT-3 response The integral of ln(1+x**2) is x*ln(1+x**2)-x GPT-3 evaluation Incorrect Table 23. MIT 18.01, Question 21: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Parametric Equations and Polar Coordinates Original question Graph the curve x = y − 2 sin(πy). Codex input Graph the curve x = y - 2sin(pi*y) by sampling points and swapping the (x, y) pair locations. Codex code import numpy as np import matplotlib.pyplot as plt def f(y): return y - 2*np.sin(np.pi*y) y = np.linspace(-2, 2, 100) x = f(y) plt.plot(x, y) plt.xlabel(’x’) plt.ylabel(’y’) plt.title(’x = y - 2sin(pi*y)’) plt.show() Program solution Solution type Automatic Zero-Shot Codex code explanation 1. Define the function f(y) = y - 2sin(pi*y). 2. Create an array of 100 evenly spaced numbers between -2 and 2. 3. Create an array of the values of f(y) for each y in the array. 4. Plot the curve x = f(y). 24 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 25. GPT-3 response A curve is defined by a set of parametric equations if the coordinates of each point on the curve are given by specific functions of a variable. The variable is called a parameter. The parametric equations for a curve are usually written as x = f(t) and y = g(t). The parameter t is usually restricted to a certain interval, such as 0 ≤ t ≤ 1. The graph of the parametric equations x = f(t) and y = g(t) is the set of points (f(t), g(t)) for all values of t in the interval. Example 1: Find the graph of the parametric equations x = t2 and y = t3. Solution: The graph of the parametric equations x = t2 and y = t3 is the set of points (t2, t3) for all values of t. GPT-3 evaluation Incorrect Table 24. MIT 18.01, Question 22: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Parametric Equations and Polar Coordinates Original question Find the distance between the points with polar coordinates (4, 4π/3) and (6, 5π/3). Codex input Using sympy find the distance between the points with polar coordinates (4, 4pi/3) and (6, 5pi/3). Codex code from sympy import * r1, r2, theta1, theta2 = symbols(’r1 r2 theta1 theta2’) x1 = r1 * cos(theta1) y1 = r1 * sin(theta1) x2 = r2 * cos(theta2) y2 = r2 * sin(theta2) d = sqrt((x2 - x1)**2 + (y2 - y1)**2) print(d.subs({r1: 4, theta1: 4*pi/3, r2: 6, theta2: 5*pi/3})) Program solution 2 ∗ sqrt(7) Solution type Automatic Zero-Shot Codex code explanation 1. We define the variables r1, r2, theta1, theta2 as symbols. 2. We define x1, y1, x2, y2 as the cartesian coordinates of the points. 3. We define d as the distance between the points. 4. We substitute the values of r1, theta1, r2, theta2 and print the result. GPT-3 response The distance between the points is 2. GPT-3 evaluation Incorrect Table 25. MIT 18.01, Question 23: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Infinite Sequences and Series Original question Determine whether the series is convergent or divergent. P∞ n=1 1/n √ 2 Codex input Using Sympy, calculate the value sum_n=1ˆinf 1/nˆsqrt(2) if it converges. Check if it converges. Codex code from sympy import Sum, Symbol, oo, pprint, init_printing def test_convergence(n): x = Symbol(’x’) s = Sum(1/x**(2**0.5), (x, 1, n)) pprint(s) pprint(s.doit()) if __name__ == ’__main__’: init_printing(order=’rev-lex’) test_convergence(10) test_convergence(oo) NOTE: ∞ in SymPy is oo. This is because oo looks like ∞. Program solution P∞ n=1 n− √ 2 ; True Solution type Automatic Zero-Shot Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 25
  • 26. Codex code explanation 1. We import the Sum function from sympy. 2. We create a function that takes in a number n and creates a Sum object. 3. We print the Sum object and the result of the summation. 4. We call the function with n = 10 and n = oo. GPT-3 response The series is divergent. GPT-3 evaluation Incorrect Table 26. MIT 18.01, Question 24: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Infinite Sequences and Series Original question Find the sum of the series. P∞ n=1 (−1)n−1 ∗ (3n /(n · 5n )) Codex input Find the sum of the series by running a large number of iterations. sum_{n=1}ˆ{inf} (-1)ˆ{n-1}*(3ˆn/(n*5ˆn)). Return 0 if n 1. Codex code def sum_series(n): Find the sum of the series. sum n to infinity (-1)^{n-1}*(3^n/(n*5^n)) sum = 0 for i in range(1, n+1): sum += ((-1)**(i-1))*(3**i)/(i*(5**i)) return sum def main(): Find the sum of the series. sum n to infinity (-1)^{n-1}*(3^n/(n*5^n)) n = int(input()) print(sum_series(n)) if __name__ == ’__main__’: main() Program solution 0.47000362924573563 Solution type Automatic Zero-Shot Codex code explanation 1. We are taking input from the user. 2. We are calling the function sum_series() with the input as the argument. 3. The function sum_series() is returning the sum of the series. 4. We are printing the sum of the series. GPT-3 response The sum of the series is -1/4. GPT-3 evaluation Incorrect Table 27. MIT 18.01, Question 25: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Further Applications of Integration Original question If revenue flows into a company at a rate of f(t) = 9000*sqrt(1+2t), where t is measured in years and f(t) is measured in dollars per year, find the total revenue obtained in the first four years. Codex input Using sympy if revenue flows into a company at a rate of f(t) = 9000*sqrt(1+2t), where t is measured in years and f(t) is measured in dollars per year, find the total revenue obtained in the first four years. Codex code from sympy import * t = symbols(’t’) f = 9000*sqrt(1+2*t) print(integrate(f, (t, 0, 4))) Program solution 78000 Solution type Automatic Zero-Shot 26 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 27. Codex code explanation 1. We import the sympy library. 2. We define the variable t as a symbol. 3. We define the function f(t) = 9000*sqrt(1+2t). 4. We integrate f(t) from t=0 to t=4. 5. We print the result. GPT-3 response The total revenue obtained in the first four years is f(4) = 9000*sqrt(1+2*4) = 9000*sqrt(9) = 9000*3 = 27000. GPT-3 evaluation Incorrect Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 27
  • 28. B. Solution of MIT 18.02: Multivariable Calculus Prereq Calculus I (GIR) Units 5-0-7 Syllabus Calculus of several variables. Vector algebra in 3-space, determinants, matrices. Vector-valued functions of one variable, space motion. Scalar functions of several variables: partial differentiation, gradient, optimization techniques. Double integrals and line integrals in the plane; exact differentials and conservative fields; Green’s theorem and applications, triple integrals, line and surface integrals in space, Divergence theorem, Stokes’ theorem; applications. Textbook Edwards, C.; Penney, David Multivariable Calculus(6th ed.) Pearson Education, ISBN 9780130339676. Table 28. MIT 18.02, Question 1: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Polar Coordinates Original question Sketch the graph of the polar equation. Indicate any symmetries around either coordinate axis or the origin. r = (2 + 7 sin(θ))(cos(5θ)) Codex input Write a function using matplotlib to plot the graph of the polar equation r = (2 + 7*sin(theta))*(cos(5*theta)) Codex code import numpy as np import matplotlib.pyplot as plt def polar_eq(theta): return (2 + 7*np.sin(theta))*(np.cos(5*theta)) theta = np.linspace(0, 2*np.pi, 100) plt.polar(theta, polar_eq(theta)) plt.show() Program solution Solution type Automatic Zero-Shot Codex code explanation 1. Importing numpy and matplotlib.pyplot 2. Defining a function that takes theta as an argument 3. Creating a list of 100 evenly spaced values between 0 and 2pi 4. Plotting the polar equation GPT-3 response The graph has symmetry around the origin and the y-axis. GPT-3 evaluation Incorrect Table 29. MIT 18.02, Question 2: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Analytical Geometry and Polar Coordinates Original question Find the center and radius of the circle described in the given equation. x2 + 2x + y2 = 4 Codex input Find the center and radius of the circle described in the given equation. x2 + 2x + y2 = 4. 28 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 29. Codex code import sympy as sp x, y = sp.symbols(’x y’) # Equation of the circle eq = x**2 + 2*x + y**2 - 4 # Solving the equation sol = sp.solve(eq, y) # Printing the solution print(sol) # Finding the center center = sp.solve(eq, x) print(center) # Finding the radius radius = sp.sqrt(center[0]**2 + sol[0]**2) print(radius) Program solution Center: (-1,0); Radius: √ 5 Solution type Automatic Zero-Shot Codex code explanation 1. We import the sympy library and the symbols function. 2. We define the equation we want to find the radius and center of. 3. We create a Circle object from the equation. 4. We print the radius and center of the circle. GPT-3 response The center of the circle is at (1, 0) and the radius is √ 3. GPT-3 evaluation Incorrect Table 30. MIT 18.02, Question 3: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Polar Coordinates Original question Sketch the graph of the polar equation. Indicate any symmetries around either coordinate axis or the origin. r = 3θ (spiral of Archimedes) Codex input Write a function using matplotlib to plot the graph of the polar equation r = 3*theta (spiral of Archimedes). Codex code import numpy as np import matplotlib.pyplot as plt theta = np.arange(0, 2*np.pi, 0.01) r = 3*theta plt.polar(theta, r) plt.show() Program solution Solution type Automatic Zero-Shot Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 29
  • 30. Codex code explanation 1. Importing the numpy and matplotlib.pyplot modules. 2. Creating a range of values for theta from 0 to 2π with an increment of 0.01. 3. Creating a range of values for r from 0 to 30 with an increment of 0.01. 4. Creating a polar plot using the polar() function. 5. Showing the plot using the show() function. GPT-3 response The graph of the polar equation is a spiral of Archimedes. There is no symmetry around either coordinate axis or the origin. GPT-3 evaluation Incorrect Table 31. MIT 18.02, Question 4: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Area Computations in Polar Coordinates Original question In Problems 7 through 16. find the area bounded by the given curve. r = 2 ∗ cos(θ) Codex input f(theta)=2*cos(theta) Using Sympy, starting from 0, find the first two points that f(theta)=0 and set teh solutions to a and b. Use the solution to Integrate f(theta)**2/2 from a to b. Codex code from sympy import * theta = Symbol(’theta’) f = 2*cos(theta) a = solve(f, theta) print(a) b = integrate(f**2/2, (theta, a[0], a[1])) print(b) Program solution π Solution type Manual Codex code explanation 1. The first line imports the sympy library. 2. The second line defines the function f(theta) as 2*cos(theta). 3. The third line defines the symbol theta. 4. The fourth line uses the solve() function to find the two values of theta that make f(theta)=0. 5. The fifth line uses the integrate() function to find the area under the curve from a to b. GPT-3 response The area is 8π. GPT-3 evaluation Incorrect Table 32. MIT 18.02, Question 5: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Parametric Curves Original question Eliminate the parameter and then sketch the curve. x = et , y = 4e2 t Codex input Plot the parametric curve. x = et , y = 4e2 t Codex code import numpy as np import matplotlib.pyplot as plt t = np.linspace(-2, 2, 100) x = np.exp(t) y = 4 * np.exp(2 * t) plt.plot(x, y) plt.show() 30 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 31. Program solution Solution type Manual Codex code explanation 1. Importing the numpy and matplotlib.pyplot modules. 2. Defining the t array. 3. Defining the x and y arrays. 4. Plotting the curve. GPT-3 response y = 4e2 t GPT-3 evaluation Incorrect Table 33. MIT 18.02, Question 6: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Integral Computations with Parametric Curves Original question Find the arc length of the given curve. x = 2t, y = 2 3 ∗ t 3 2 , 5 ≤ t ≤ 12 Codex input Use sympy to integrate from 5 to 12 to find the arc length of x = 2t, y = 2/3*t**(3/2) Codex code from sympy import * x = 2*t y = 2/3*t**(3/2) dx = diff(x, t) dy = diff(y, t) ds = sqrt(dx**2 + dy**2) print(integrate(ds, (t, 5, 12))) Program solution 74 3 Solution type Automatic Zero-Shot Codex code explanation 1. Define x and y as functions of t 2. Define dx and dy as the derivatives of x and y with respect to t 3. Define ds as the magnitude of dx and dy 4. Integrate ds from 5 to 12 GPT-3 response The arc length of the given curve is approximately 33.4. GPT-3 evaluation Incorrect Table 34. MIT 18.02, Question 7: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Parabola Original question Sketch the parabola with the given equation. Show and label its vertex, focus, axis, and directrix. y2 = 12x Codex input Use least squares to find a second order polynomial passing through points (2. 3), (4, 3), and (6,-5). Use polynomial matrix. Then use python to print(f’c[0]+c[1]x+c[2]x**2’) Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 31
  • 32. Codex code import numpy as np x = np.array([2, 4, 6]) y = np.array([3, 3, -5]) A = np.array([[1, x[0], x[0]**2], [1, x[1], x[1]**2], [1, x[2], x[2]**2]]) b = np.array([y[0], y[1], y[2]]) c = np.linalg.solve(A, b) print(f’{c[0]}+{c[1]}x+{c[2]}x**2’) Program solution −5.0 + 6.0x + −x2 Solution type Manual Codex code explanation 1. Create a numpy array of x values 2. Create a numpy array of y values 3. Create a numpy array of ones with the same length as x and y 4. Create a numpy array of x values squared 5. Create a numpy array of x values cubed 6. Create a numpy array of the values of the matrix A 7. Create a numpy array of the values of the vector b 8. Solve the system of equations 9. Print the values of the coefficients GPT-3 response The parabola has a vertex at (0,0), a focus at (0,6), an axis at y=0, and a directrix at y=-6. GPT-3 evaluation Incorrect Table 35. MIT 18.02, Question 8: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Ellipse Original question Find an equation of the ellipse specified. Foci (-2, 2) and (4, 2), eccentricity 1/3 Codex input Write a function using sympy to find the equation of the ellipse with foci at (-2, 2) and (4, 2). The eccentricity of the ellipse is 1/3. Codex code import sympy as sp x, y = sp.symbols(’x y’) foci_1 = sp.Point(-2, 2) foci_2 = sp.Point(4, 2) eccentricity = sp.Rational(1, 3) ellipse = sp.Ellipse(sp.Point(0, 0), sp.sqrt(5), sp.sqrt(5 * (1 - eccentricity ** 2))) print(ellipse.equation(x, y)) Program solution (x−1)2 81 + (y−2)2 72 = 1 Solution type Automatic Zero-Shot Codex code explanation 1. Find the center of the ellipse 2. Find the distance between the foci 3. Find the length of the major axis 4. Find the length of the minor axis 5. Find the equation of the ellipse GPT-3 response (x − 2)2 /9 + (y − 2)2 /4 = 1 GPT-3 evaluation Incorrect 32 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 33. Table 36. MIT 18.02, Question 9: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Functions of Several Variables Original question Describe the graph of the function f.f(x, y) = 10 − 2 p x2 + y2 Codex input Plot the 3-D graph of the function f(x, y) = 10 − 2 p x2 + y2 Codex code import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D fig = plt.figure() ax = fig.add_subplot(111, projection=’3d’) x = np.arange(-10, 10, 0.1) y = np.arange(-10, 10, 0.1) X, Y = np.meshgrid(x, y) Z = 10 - np.sqrt(X**2 + Y**2) ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=’rainbow’) plt.show() Program solution Solution type Automatic Zero-Shot Codex code explanation 1. Create a figure object 2. Add a subplot to the figure object 3. Create a meshgrid of x and y values 4. Calculate the z values for the meshgrid 5. Plot the graph GPT-3 response The graph of the function f is a parabola. GPT-3 evaluation Incorrect Table 37. MIT 18.02, Question 10: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Infinite Series and Convergence Original question Determine whether the given infinite series converges or diverges. If it converges, find its sum. 1 + 3 + 5 + 7 + ... + (2n-1) + .... Codex input Using sympy, determine if the sequence sn = 2n − 1 diverges by showing the value of the limit as n goes to infinity. If the limit is infinity, then print diverges. Otherwise print converges. Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 33
  • 34. Codex code from sympy import limit, oo def sequence_diverges(sequence): Determine if the sequence s_n = 2n-1 diverges by showing the value of the limit as n goes to infinity. If the limit is infinity, then print diverges. Otherwise print converges. limit_value = limit(sequence, n, oo) if limit_value == oo: print(diverges) else: print(converges) sequence_diverges(2*n-1) Program solution Diverges Solution type Manual Codex code explanation 1. The limit function takes a sequence and a value of n and returns the value of the limit as n goes to infinity. 2. The limit function returns oo if the sequence diverges. 3. If the limit is oo, then the sequence diverges. 4. If the limit is not oo, then the sequence converges. GPT-3 response The series diverges. GPT-3 evaluation Incorrect Table 38. MIT 18.02, Question 11: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Taylor Series and Taylor Polynomials Original question Find the Taylor polynomial with remainder by using the given values of a and n. f(x) = ex ; a = 1, n = 4 Codex input Using Sympy, find the taylor series of f(x) = ex where a = 1 and n = 4 Codex code import sympy as sp x = sp.Symbol(’x’) f = sp.exp(x) print(sp.series(f, x, x0=1, n=4)) Program solution E + E(x − 1) + E(x−1)2 2 + E(x−1)3 6 + O((x − 1)4 , (x, 1)) Solution type Automatic Zero Shot Codex code explanation 1. We define the function f(x) = ex 2. We define the function taylor(function, a, n) which takes in a function, a value a, and a value n. 3. We define a variable i and set it to 0. 4. We define a variable p and set it to 0. 5. We create a while loop that runs while i is less than or equal to n. 6. We set p equal to p plus the derivative of the function with respect to x, evaluated at x=a, divided by the factorial of i, times (x − a)i . 7. We increment i by 1. 8. We return p. 9. We print the value of taylor(f(x), 1, 4). GPT-3 response T4(x) = 1 + x + x2 /2! + x3 /3! + x4 /4! + R4(x) GPT-3 evaluation Incorrect Table 39. MIT 18.02, Question 12: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Taylor Series and Taylor Polynomials Original question Find the Maclaurin series of the given function f by substitution. f(x) = sin(2x) Codex input Use Sympy to find the maclaurin series of f(x) = sin(2x) 34 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 35. Codex code from sympy import * x = symbols(’x’) f = sin(2*x) print(f.series(x, 0, 10)) Program solution 2x − 4x3 3 + 4x5 15 + 8x7 315 + · · · Solution type Automatic Zero-Shot Codex code explanation 1. Define the variable x as a symbol 2. Define the function f(x) = sin(2x) 3. Print the expression f(x) 4. Print the series expansion of f(x) about x=0 up to 10 terms GPT-3 response sin(2x) = 2 sin(x) cos(x) sin(2x) = 2(x − x3 /3! + x5 /5! − ...)(cos(x) + x2 /2! − x4 /4! + ...) sin(2x) = 2x cos(x) − 2x3 /3! cos(x) + 2x5 /5! cos(x) − ... + 2x3 /3! − 2x5 /5! + ... sin(2x=2x cos(x) − 2x3 /3!(cos(x) + x2 /2!) + 2x5 /5!(cos(x) + x2 /2! − x4 /4!) − ... GPT-3 evaluation Incorrect Table 40. MIT 18.02, Question 13: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Integral Test Original question Use the integral test to test the given series for convergence. P∞ 1 1/(2n ). Codex input Using Sympy, integrate P∞ 1 1/(2n ) from 0 to infinity. if it is finite print(’converges’) and if it is infinite print(’diverges’) Codex code from sympy import * x = Symbol(’x’) def f(x): return 1/(2**x) def integrate_f(x): return integrate(f(x), (x, 0, oo)) print(integrate_f(x)) if integrate_f(x).is_infinite: print(’diverges’) else: print(’converges’) Program solution Converges Solution type Automatic Zero-Shot Codex code explanation 1. We import sympy as sp. 2. We create a symbol n. 3. We integrate 1/(2**n) from n=1 to infinity. GPT-3 response P∞ 1 1/(2n ) converges because R ∞ 1 1/(2x )dx converges. GPT-3 evaluation Correct Table 41. MIT 18.02, Question 14: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Comparison Tests for Positive-Term Series Original question Use comparison tests to determine whether the infinite series converge or diverge. P∞ 1 1/(n2 +n+1). Codex input Write a function using sympy to determine if the infinite series P∞ 1 1/(n2 + n + 1) converges. Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 35
  • 36. Codex code from sympy import Sum, Symbol, oo, limit, init_printing init_printing() n = Symbol(’n’) s = Sum(1/(n**2 + n + 1), (n, 1, oo)) limit(s, n, oo) The series converges. Program solution Converges. Solution type Automatic Zero-Shot Codex code explanation 1. We create a symbol n to represent the variable in the summation. 2. We create a summation of 1/(nˆ2 + n + 1) from n = 1 to infinity. 3. We use the limit function to determine if the summation converges. GPT-3 response P∞ 1 1/(n2 + n + 1) diverges. GPT-3 evaluation Incorrect Table 42. MIT 18.02, Question 15: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Alternating Series and Absolute Convergence Original question Determine whether the alternating series converge or diverge. P∞ 1 ((−1)( n + 1))/n2 Codex input Write a function using sympy to determine if the alternating series P∞ 1 ((−1)( n + 1))/n2 converges or diverges. Codex code from sympy import Sum, Symbol, oo, sympify, simplify, limit, init_printing def alternating_series(n): This function takes a number n and determines if the alternating series converges or diverges. # Initialize sympy printing init_printing() # Create a symbol for the summation n = Symbol(’n’) # Create the summation s = Sum((-1)**(n+1)/n**2, (n, 1, oo)) # Simplify the summation s = simplify(s) # Determine if the series converges or diverges if s.is_infinite: print(The series diverges.) else: print(The series converges.) # Determine the limit of the series lim = limit(s, n, oo) # Print the limit print(The series converges to: + str(lim)) if __name__ == __main__: alternating_series(10) Program solution Converges Solution type Automatic Few-Shot (1) 36 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.
  • 37. Codex code explanation 1. The first line imports the Sum function from the sympy module. 2. The second line imports the Symbol function from the sympy module. 3. The third line imports the oo function from the sympy module. 4. The fourth line imports the limit function from the sympy module. 5. The fifth line imports the init_printing function from the sympy module. 6. The sixth line creates a variable called n and assigns it the value of the Symbol function. 7. The seventh line creates a variable called s and assigns it the value of the Sum function. 8. The eighth line prints the limit of the variable s as n approaches infinity. GPT-3 response The series converges. GPT-3 evaluation Incorrect Table 43. MIT 18.02, Question 16: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Power Series Computations Original question Use power series to approximate the values of the given integrals accurate to four decimal places. R 1 0 sin(x)/xdx. Codex input Write a function using scipy to compute the values of the given integral R 1 0 sin(x)/xdx. Codex code from scipy.integrate import quad def f(x): return np.sin(x)/x I = quad(f, 0, 1) print(I) Program solution 0.9461 Solution type Automatic Zero-Shot Codex code explanation 1. Importing the integrate module from scipy 2. Defining a function f(x) 3. Calling the integrate.quad function, which performs the integral 4. Printing the result GPT-3 response R 1 0 sin(x)/xdx ≈ P∞ n=0 (−1)n x2n+1 (2n+1)! R 1 0 sin(x)/xdx ≈ P3 n=0 (−1)n x2n+1 (2n+1)! R 1 0 sin(x)/xdx ≈ x 1! − x3 3! + x5 5! − x7 7! R 1 0 sin(x)/xdx ≈ 1 1 − 1 6 + 1 120 − 1 5040 R 1 0 sin(x)/xdx ≈ 0.8415 GPT-3 evaluation Incorrect Table 44. MIT 18.02, Question 17: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Cross Product of Two Vectors Original question Find a x b. a = 5,-1,-2, b = -3,2,4 Codex input Find the vector product of a x b. a = 5,-1,-2, b = -3,2,4 Codex code a = [5,-1,-2] b = [-3,2,4] def vector_product(a,b): return [a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]] print(vector_product(a,b)) Program solution h0, −14, 7i Solution type Automatic Zero-Shot Codex code explanation 1. The first line creates a list called a. 2. The second line creates a list called b. 3. The third line creates a list called c and stores the result of the vector product of a and b in c. 4. The fourth line prints the result of the vector product of a and b. GPT-3 response 5,-1,-2 * -3,2,4 = -14 Drori et al. PNAS | April 7, 2022 | vol. XXX | no. XX | 37
  • 38. GPT-3 evaluation Incorrect Table 45. MIT 18.02, Question 18: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Curves and Motion in Space Original question Show that the graph of the curve with parametric equations x = t ∗ (sin(6t)), y = t ∗ (cos(6t)), z = t lies on the cone z = 2 p x2 + y2 with its vertex at the origin and opening upward. Codex input Plot the graph of the curve with parametric equations: x = t*(sin6t). y = t*(cos6t), z = t. Then plot the cone z = p x2 + y2 with its vertex at the origin and opening upward. Codex code import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D t = np.linspace(-10, 10, 100) x = t*(np.sin(6*t)) y = t*(np.cos(6*t)) z = t fig = plt.figure() ax = fig.add_subplot(111, projection=’3d’) ax.plot(x, y, z) r = np.sqrt(x**2 + y**2) ax.plot(x, y, r, ’r’) plt.show() Program solution Solution type Automatic Few-Shot (5) Codex code explanation 1. Generate 100 points between -10 and 10 using the linspace function from numpy. 2. Calculate x, y, and z using the parametric equations. 3. Plot the graph of the curve with parametric equations. 4. Plot the cone z = 2 p x2 + y2 with its vertex at the origin and opening upward. GPT-3 response The graph of the curve with parametric equations x = t*(sin6t). y = t*(cos6t), z = t lies on the cone z = 2 p x2 + y2 with its vertex at the origin and opening upward. GPT-3 evaluation Incorrect Table 46. MIT 18.02, Question 19: Original question, Codex input and code and code explanation, Program solution, Solution type, GPT-3 response and evaluation Topic Double Integrals Original question Evaluate the iterated integral. R 2 0 R 4 0 (3x + 4y)dxdy Codex input Write a function using scipy.integrate.dblquad to compute the value of the double integral: R 2 0 R 4 0 (3x + 4y)dxdy. 38 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Drori et al.