Multiple-Choice Questions in Programming Courses: Can
We Use Them and Are Students Motivated by Them?
PEDRO HENRIQUES ABREU, Faculty of Sciences and Technology of the University of Coimbra
DANIEL CASTRO SILVA, Faculty of Engineering of the University of Porto
ANABELA GOMES, Coimbra Institute of Engineering
Low performance of nontechnical engineering students in programming courses is a problem that remains
unsolved. Over the years, many authors have tried to identify the multiple causes of that failure, and there
is consensus that motivation is a key factor in students' acquisition of knowledge.
To better understand motivation, a new evaluation strategy was adopted in a second programming
course of a nontechnical degree, with 91 students enrolled. The goals of the study were to identify whether those
students felt more motivated to answer multiple-choice questions than development questions,
and which type of question better tests student knowledge acquisition.
Possibilities around the motivational qualities of multiple-choice questions in programming courses will
be discussed in light of the results.
In conclusion, it seems clear that student performance varies according to the type of question. Our study
points out that multiple-choice questions can be seen as a motivational factor for engineering students and
that they might also be a good way to test acquired programming concepts. Therefore, this type of question
could be further explored in evaluation points.
CCS Concepts: • Social and professional topics → Computer science education; Student assessment;
Additional Key Words and Phrases: Students' motivation, evaluation methodologies, pedagogical issues, programming and programming languages
ACM Reference format:
Pedro Henriques Abreu, Daniel Castro Silva, and Anabela Gomes. 2018. Multiple-Choice Questions in Pro-
gramming Courses: Can We Use Them and Are Students Motivated by Them? ACM Trans. Comput. Educ. 19,
14, Article 6 (November 2018), 16 pages.
https://doi.org/10.1145/3243137
1 INTRODUCTION
The difficulties of the learning process of students in a programming course have been debated
for several years, not only in engineering courses [12] but also in other areas [20]. Nevertheless,
Authors' addresses: P. H. Abreu, Department of Informatics Engineering, Faculty of Sciences and Technology of the
University of Coimbra/Centre for Informatics and Systems, Pólo II, Pinhal de Marrocos, 3030-290, Coimbra, Portugal;
email: pha@dei.uc.pt; D. C. Silva, Department of Informatics Engineering, Faculty of Engineering of the University of
Porto/Artificial Intelligence and Computer Science Laboratory, Rua Dr. Roberto Frias s/n 4200-465 Porto, Portugal; email:
dcs@fe.up.pt; A. Gomes, Department of Informatics Engineering and Systems, Coimbra Institute of Engineering/Centre
for Informatics and Systems, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal; email: anabela@isec.pt.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2018 Association for Computing Machinery.
1946-6226/2018/11-ART6 $15.00
https://doi.org/10.1145/3243137
ACM Transactions on Computing Education, Vol. 19, No. 14, Article 6. Publication date: November 2018.
the low performance achieved by students still remains an important issue. Multiple factors can
contribute to these results, with the most cited in the literature being the following [8, 12, 30]:
(1) Teaching methods that do not attend to different student profiles and are often so tied
to the programming language that the main goal of helping students solve problems is
sometimes forgotten
(2) Study methods that are not always effective, combined with student effort that can
be considerably low
(3) The students' abilities and attitudes, related to a lack of knowledge, especially
in vital areas such as math and logic
(4) The nature of programming, which encompasses the complexity of the syntax and
sometimes leads to the qualitative effect of scale: the code cannot be tested before the
team, program, or problem reaches a sufficient size (more critical in object-oriented
programming)
(5) Psychological effects, especially related to student motivation. On this particular
point, Guzdial [13] claims that all the listed problems can be easily solved if the students
recognize value in the course. Students will learn, regardless of the learning curve, when
they recognize value in that subject.
In the higher education context, programming courses can be divided into levels as follows:
introductory, especially for students up to the second year who have taken only a small
number of programming courses; intermediate, for students in the third year; and advanced, for
master's or PhD students in an engineering course. For each level, many different strategies
can be used. At the introductory level (the subject of analysis in this article),
the majority of approaches are based on games, with the goal of promoting interaction between
students and consequently their motivation throughout the learning of programming concepts [9, 24]. At
the other levels, new concepts can emerge based on programming paradigms [1, 39] and the influence
of those paradigms on students' performance [22]. There are also approaches that can
be considered transversal to all these levels. Wang et al. [50] presented an approach that combines
an online programming platform offering real-time feedback with a practical project that stimulates
programming knowledge acquisition.
Following the classification used by van Merriënboer [48], there are two kinds of strategies: the
generation strategy of programming learning, which consists of creating program code from
scratch, and the completion strategy, which consists, for instance, of modifying or completing
existing program code. Within the second type, multiple-choice questions (MCQs) have been disregarded
in comparison with other kinds of exercises in the engineering context. A multiple-choice question
consists of two parts: a stem and several options or alternatives. The stem usually asks a question
or makes a statement. The student has to identify either the correct or best option to go with the
stem. The incorrect options are called distractors and the correct option is often referred to as
the key. In contrast, a development question (DQ) is a question where the student must produce
code that solves a specific problem using a programming language. In our context, the hypothesis
that emerges is whether multiple-choice questions are suitable for assessing knowledge in
programming courses and whether they motivate students from a nontechnical degree. Studies found in
the literature (see Section 2) have approached some of these issues in different contexts but focus
on courses within the same area of knowledge as that of the students (for instance, a programming
course in the context of a computer science degree). We consider that the last part of the hypothesis
(motivation related to MCQ assessment) is encompassed in the fifth factor, "psychological effects,"
mentioned above. The hypothesis can then be materialized in two research questions:
(1) Are multiple-choice questions capable of accurately assessing students' acquired program-
ming knowledge?
(2) Do multiple-choice questions motivate programming students more than other question
types?
To answer these questions, and in the context of a bachelor's degree in Industrial and Management
Engineering, a multiple-choice question strategy encompassed in the evaluation points was
used to try to promote students' motivation for the learning subject. In brief, the two
main goals of this research work can be specified as follows:
(1) To evaluate the acquisition of programming concepts by students, comparing the perfor-
mance obtained in each type of question (MCQ and DQ)
(2) To evaluate students' motivation related to the different types of questions (MCQ and DQ)
To achieve that, a Python programming course in the second semester of the first year of the
bachelor's degree in Industrial and Management Engineering was used as a case study, and all the
evaluation points (mini-tests and exams) were formulated according to the recommendations pro-
posed by Simon et al. [44].
Also, to avoid the loss of structured thinking by the students, we created evaluation points
where 50% of the questions are multiple-choice questions and the other 50% are development questions,
following the recommendations of Răduţă [38]. A total of 91 students were enrolled in the
study, with five evaluation points being used. Multiple-choice questions proved to be a good strat-
egy to promote student motivation in evaluation points in the programming course; however, the
performance of students seems to be only slightly better than their performance in equivalent
classical development questions.
To the best of our knowledge, this is the first time that this strategy has been used to measure the
acquisition of concepts and students' motivation in a programming course.
The remainder of this article is structured as follows: Section 2 presents a review of related work.
Section 3 presents a brief description of the course used to test these hypotheses. In Section 4, the
results are presented and discussed. Finally, in Section 5, conclusions about the work produced
are drawn and some future lines of research are outlined.
2 LITERATURE REVIEW
There are many reasons instructors like multiple-choice tests. First of all, they are easy to correct
in relatively short periods of time, which is especially important in mass lecture classes. Another
reason is that these tests can also help control cheating through the creation of multiple versions
of the same test with either different questions or different orderings of the same set of questions
[29]. It is also possible to use more questions than in other formats of tests such as essay tests,
enabling one to ask questions that cover a wider range of material [4]. Another often-cited advantage
is their perceived objectivity [23, 36, 52]. In [26], the authors also point out multiple reasons
to consider using MCQs. According to the literature, most students also prefer MCQs [21, 52]. If
the majority of both students and teachers prefer MC tests to "traditional" test formats, why are
they not widely used? Generally, there are doubts about whether multiple-choice questions examine
student cognitive understanding of class materials. Most objections center on their inability
to measure higher-order thinking skills such as problem solving and all the underlying reasoning
involved (analytical reasoning, quantitative reasoning, analogical reasoning, and combinatory
reasoning skills).
Over the years, many studies have tried to find a connection between multiple-choice ques-
tions and students' performance. Kuechler and Simkin [21, 42] tried to detect whether or
not multiple-choice questions can replace constructed questions in capturing students'
knowledge of a subject. For that, Visual Basic was used as the programming language, to capture
and validate concepts related to Form/GUI construction. One hundred and fifty students from
two consecutive years were enrolled in the experiment, and the results show a correlation between
the two types of questions (multiple choice and constructed).
In a programming course, Clark [6] developed a study that tried to predict the performance of
students in the final exam using previous tests composed of different kinds of questions following
Bloom's cognitive levels [3]. Considering the results achieved, and despite the fact that one
of the conclusions is that application questions encompassing multiple choice are good predictors
for the exam, the authors conclude that a single form of assessment is not sufficient to evaluate all
learning outcomes.
Čisar et al. [49] developed a study to understand students' knowledge acquisition in an object-
oriented programming course. For that, they defined two groups: a control group evaluated
using traditional paper-and-pencil tests, and a second group evaluated using Computer Adaptive
Tests (CATs). In brief, CATs are software-administered tests that match question difficulty to
student performance. For example, if the student performs well on a question of medium complexity,
the platform will propose a new question with a higher complexity level. On the other hand, if
the student fails to answer such questions, the platform can maintain the level of complexity for
the new questions or eventually downgrade it.
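The adaptive rule just described can be sketched as follows. This is a minimal illustration of the general CAT idea, not the platform used in the cited study; the three-level scale and the threshold of two consecutive failures are our own illustrative assumptions.

```python
# Minimal sketch of a Computer Adaptive Test (CAT) difficulty rule:
# a correct answer raises the difficulty of the next question; a failure
# keeps it, and repeated failures downgrade it.

LEVELS = ["low", "medium", "high"]

def next_level(current, answered_correctly, failures_in_a_row=0):
    """Return the difficulty level proposed for the next question."""
    i = LEVELS.index(current)
    if answered_correctly:
        return LEVELS[min(i + 1, len(LEVELS) - 1)]  # move up, capped at "high"
    if failures_in_a_row >= 2:
        return LEVELS[max(i - 1, 0)]  # downgrade after repeated failures
    return current  # otherwise maintain the current level
```

For example, a student who answers a medium-level question correctly is promoted to a high-level question, mirroring the platform behavior described above.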
In this study, 199 students were enrolled, and the authors found statistical significance in the
results obtained by the two groups. However, such conclusions require further validation, since
the question structure of the tests used in the two groups was not the same, and neither was the
medium used to test the students' knowledge, which encompasses psychological and physiological
factors. For instance, a student answering a paper-and-pencil test will likely show fatigue
and decreasing motivation and will take longer to answer the test. Also, in theoretical terms, a
student will be more motivated answering a test through a software application than in
the traditional way.
Several authors describe the effective use of MCQs in computer science contexts. In [25], the au-
thors describe alternative approaches to assessing first-year programming. They describe a model
based on a well-known taxonomy of learning stages. They consider that assessment by MCQs is an
effective means of assessing the intermediate levels of programming skills, provided that the MCQs
are consciously designed to do so. The authors also exemplify poorly constructed programming
MCQs.
Lister and colleagues [28] describe a suite of MCQs that can be used to evaluate a student's
ability to read code, an essential part of the ability to program.
In [51], the authors discuss the major criticism of multiple-choice questions, namely the idea that
they test nothing more than straight recall of facts, and examine ways of overcoming
this misconception.
In [27], a multiple-choice question exam used to test students completing their first semester
of programming is presented. The author provides a detailed analysis of how students performed on
the questions. However, the main intent was to begin a community process of identifying the criteria
that define an effective multiple-choice exam for testing novice programmers.
Simon and colleagues [43] have undertaken exhaustive descriptive studies of the types of ques-
tions used in examinations in programming courses. Other studies propose how MCQs can be
validly used in computing assessment [27, 37, 41, 45].
In [6], the authors report a study intending to investigate the relationship between results on
multiple-choice tests and the final exam. This study provided insights into the types of questions
that are good discriminators for moderate students whose main goal is to pass and for
stronger students who aspire to higher grades. They state that no single form of assessment is suf-
ficient for all learning outcomes, and multiple-choice tests in particular have their weaknesses and
strengths. However, the authors note that the tests in that study were useful in predicting students'
performance on the final exam, showing their importance in summative assessment.
Other authors state that MCQs can support the process of formative assessment and feedback
[15], as described by the "seven principles of good feedback practice" [32]. A related study [31],
which focused on effective e-assessment using MCQs, highlighted the importance of not just the
questions themselves, but the context in which they were deployed within modules in order to
promote the development of self-directed learning. In [41], formative and summative assessments
to evaluate programming learning are discussed. The authors verified that novice programming
ability is usually assessed through an arrangement of formative and summative assessments, with the
latter typically represented by a final examination. That paper presents instructors' perspectives
on multiple-choice questions in summative assessment for novice programmers. The authors
analyzed the MCQs along four measures: Syntax Knowledge, Semantic Knowledge, Problem Solving
Skill, and the Level of Difficulty of the problem. They identified some gaps between the instructors'
perspectives on the questions' difficulty and what is required to be assessed to determine whether
students have achieved the goals of their course.
In [35], the authors stated that in contexts where formative feedback is desired, standard
multiple-choice questions may lead students to a false sense of confidence as a result of their small
solution space and the temptation to guess. Therefore, these authors propose the adoption of an
uncommon form of MCQ, the multiple-answer multiple-choice question (MAMCQ), to provide
formative feedback to students in programming courses.
The importance and timeliness of the use of MCQ tests is shown in more recent studies such as
those in the BRACElet project, where multiple-choice and free-text code-explaining questions
were put in the same exam, and the performances of individual students on the two sets of
questions were compared [45]. The authors found that students perform substantially and
significantly better on the multiple-choice questions. The authors called attention to the importance
of MCQs in some phases of students' programming learning, raising several questions for reflection:
when students cannot correctly describe the purpose of a small piece of code, is it because they
do not understand the code, because they understand its detail but are unable to abstract that detail
to determine the purpose, or because they understand the purpose but are unable to express it?
Regarding programming teaching and learning, we agree with Parsons et al. [34] when they
suggest that perhaps we are not using correct methods of assessment; possibly we are teaching
successfully but assessing badly.
More recent studies are considering the possibilities of MCQs. In [15], the authors analyze
the relationship between students' use of PeerWise, an online tool that facilitates peer learning
through student-generated content in the form of MCQs, and achievement, as measured by their
performance in the end-of-module examinations. The study suggests that engaging with the
production and discussion of student-generated content in the form of MCQs can support student
learning in a way that is not critically dependent on the course, institution, instructor, or student.
In [5], the Concept Inventory (CI), a set of multiple-choice questions, is used to reveal students'
misconceptions related to some topic. Each available choice (besides the correct one) is a
distractor that is carefully developed to address a specific misunderstanding. In computer science
introductory programming courses, the development of CIs is still in its early stages, with many topics
requiring further study and analysis. The authors discuss the difficulty of assessing introductory
programming misconceptions independently of the syntax of a language and present a detailed dis-
cussion of two pilot CIs related to parameters: an open-ended question (to help identify new mis-
understandings) and a multiple-choice question using some distractors identified by the authors.
Having compared the analyzed studies with the goal established for this article of determining
the impact of MCQs on student motivation, it is important to state that none
of these studies tried to analyze students' motivation regarding multiple-choice questions.
Some studies [40] have tried to analyze student motivation in collaborative environments,
but that again diverges from the problem addressed in this article, since motivation as well
as student performance must be analyzed at an individual level in a programming course.
Regarding the goal of comparing the capability of MCQs to accurately assess student knowledge,
although this has been studied by many authors, to the best of our knowledge none of those
studies has focused on the evaluation of knowledge in an area of expertise different from that of
the student. In this article, we try to extend the conclusions from those studies to a context of
nontechnical students in a programming course.
3 METHODOLOGY
The course used for this study appears in the second semester of the first year of the bachelor's
degree in Industrial and Management Engineering, and it follows an introductory programming
course in the first semester. The programming language used in both courses is Python, due to its
simple syntax and the ease of validating a small piece of code.
There are approximately 100 students per year in this course (see Table 1 for more details), with
an average of 57% of students having successfully attended the programming course in the first
semester.
The course is composed solely of theory-practical classes, used both for presenting the concepts
from a more theoretical point of view, allied with examples, and for solving practical exercises,
putting the theory into practice. The students were made aware that there would be MCQs in
the tests, but the course underwent no other changes in recent years. The exercises solved during
classes have also remained the same, so as to introduce only one new variable in the study: the
presence of MCQs in the evaluations. This was a planned decision so as not to introduce a bias
toward multiple-choice questions that could influence the results, in terms of both total and correct
number of questions answered.
3.1 Student Characterization
The characterization of the class for the year under analysis (2015) is presented in Table 1, as
well as for the two previous years. The number of attending students (those who attend classes
at least until the first mini-test) is usually lower than the number of enrolled students, revealing
percentages of "ghost students" (those who enroll in the course but don't even appear for the first
mini-test) of around 15% to 30%.
The sample consisted of 55 male students (60%) and 36 female students (40%), with ages
between 18 and 34 and an average age of 20.75 years. To further characterize the sample, the
number of students in each year who had successfully attended the first course on programming
(Success in Previous Course) is also presented (note that, on average, only approximately 57% of
students successfully concluded that first course), as is the number of students who are enrolled in
the course under analysis for the first time (First Enrollment) (it can be easily seen that a high per-
centage of students had previous enrollments in the course, especially in the year under analysis).
The first programming course is a basic introductory course to informatics in general, including
some basic programming using Python. While it does not constitute an enforced prerequisite for
the course under study, some of the concepts are revisited in this course, so previous knowledge
may prove to be an asset.
Table 1. Student Characterization
2015 2014 2013
Enrolled Students 91 104 95
Attending Students 79 80 64
Success in Previous Course 55 56 54
First Enrollment 48 77 N/A
3.2 Experimental Setup
The evaluation method in this course was composed of three tests that took place during the
semester (Mini-Test 1, Mini-Test 2, and Mini-Test 3) and an exam (Exam 1 and/or Exam 2).
In order to successfully complete this course, students have two choices:
• Three mini-tests plus Exam 1
• Exam 2
Students must have a minimum grade of 35% in each of the mini-tests to be eligible to go to
Exam 1. Each of the mini-tests counts for 10% of the final grade and Exam 1 for the remaining 70%. If,
for some reason, students wish to improve the classification obtained via the first method, they
can go to Exam 2, which counts for 100% of the final grade. Also, students who fail to obtain the
minimum grade in any of the mini-tests can go to Exam 2 (despite being excluded from Exam 1).
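The grading scheme above can be expressed as a small computation. This is a sketch of the rules exactly as stated in the text; the function names are our own.

```python
# Sketch of the course's grading rules: three mini-tests at 10% each plus
# Exam 1 at 70%, with a 35% minimum on every mini-test; Exam 2 alone is 100%.

MIN_MT_GRADE = 35  # minimum mini-test grade (percent) required for Exam 1

def eligible_for_exam1(mini_tests):
    """A student may take Exam 1 only if every mini-test reaches the minimum."""
    return all(mt >= MIN_MT_GRADE for mt in mini_tests)

def final_grade_method1(mini_tests, exam1):
    """Three mini-tests (10% each) plus Exam 1 (70%)."""
    if not eligible_for_exam1(mini_tests):
        raise ValueError("student excluded from Exam 1; must take Exam 2")
    return 0.10 * sum(mini_tests) + 0.70 * exam1

def final_grade_method2(exam2):
    """Exam 2 counts for 100% of the final grade."""
    return exam2

# Example: mini-tests of 40, 50, 60 and Exam 1 of 50 give 0.1*150 + 0.7*50 = 50.
```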
To construct those evaluation points, some guidelines were followed. For MCQs, several guidelines
have been published over the years [14, 16]. In our study, we followed the CAD Guidelines,
which comprise 12 points [33]:
⢠Make sure the wording of your questions is suitable for the reading level of your students.
⢠Make your questions concise, clear, and grammatically consistent.
⢠Write stems that clearly state the topic of the question.
⢠Only one option should be correct or best (according to experts in the field).
⢠Avoid negative statements if possible. If you must use a negative, write the negative in
capital letters or underline it. Never use double negatives.
⢠Provide plausible distractors.
⢠Donât give clues to the correct choice.
⢠Donât use overlapping alternatives.
⢠Donât ask students to express an opinion.
⢠Never use âall of the above.â Be careful when using ânone of the above.â
⢠Validate your questions after each use.
⢠Always have your questions checked by a colleague.
At the same time, we tried to follow some of the recommendations on how to write an intro-
ductory programming test [44]: keep the questions as simple as possible; simplify their preambles
as much as possible; ensure that students are familiar with the type of questions; avoid variable
names that are easily confused with one another or with other symbols; include questions with
different levels of difficulty; include some multiple-choice questions; include some code-reading
questions; include questions of different forms.
To better illustrate how these concepts were put into practice, one of the mini-tests is presented
in Appendix A.
Concerning the evaluation scheme previously described, only students who attained a minimum
average classification of 35% on the mini-tests could attend the first exam, while all students
were able to attend the second exam.
Table 2. Evaluation Topics for Each Question Group (QG)
MT 1 MT 2 MT 3 Exam 1 Exam 2
QG 1 CYC, STR CYC, LST CYC, MTX, LST CYC, MTX CYC, LST
QG 2 CYC, STR CYC, MTX CYC, LST, FLS CYC, LST, FLS CYC, LST, MTX
QG 3 N/A N/A N/A CYC, LST CYC, FLS
CYC - Cycles; STR - String; LST - Lists; MTX - Matrices; FLS - Files; N/A - Not Applicable.
In each mini-test and exam, questions were grouped in pairs, each pair consisting of an MCQ
and a DQ, both evaluating the same concepts. Table 2 shows the organization of the topics under
evaluation for all mini-tests and exams, pointing out the subject under evaluation for each question
group. Mini-tests had only two groups, while exams were composed of three groups. In each mini-
test and exam, the first question group is considered to have a low level of complexity, the second
question group a medium level of complexity, and the third group (only present in exams) a higher
level of complexity.
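As an illustration of such a pairing, the sketch below shows a hypothetical MCQ and its companion DQ on the same concept (cycles over lists). It is invented for illustration and is not one of the actual test questions (an actual mini-test appears in Appendix A).

```python
# A hypothetical MCQ/DQ pair evaluating the same concept (summing a list with
# a cycle). The structure (stem, distractors, key) follows the definitions in
# the text; the specific question and grading helper are our own invention.

mcq = {
    "stem": ("What does the following code print?\n"
             "total = 0\n"
             "for x in [1, 2, 3]:\n"
             "    total += x\n"
             "print(total)"),
    "options": {"a": "123", "b": "6", "c": "3", "d": "0"},  # a, c, d: distractors
    "key": "b",
}

def grade_mcq(answer):
    """MCQs are graded on a binary scale: 0 or 100%."""
    return 100 if answer == mcq["key"] else 0

# The paired DQ asks the student to *produce* equivalent code from scratch:
dq_statement = ("Write a function sum_list(values) that returns the sum of a "
                "list of numbers, using a cycle.")

def sum_list(values):  # one possible student solution to the DQ
    total = 0
    for x in values:
        total += x
    return total
```

The key difference the study exploits is visible here: the MCQ tests recognition of what the code does, while the DQ requires generating the code itself.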
4 RESULTS
The total number of students who attended the mini-tests and exams is shown in Table 3. As
previously stated, not all the enrolled students attended the first mini-test (MT1): 12 of the 91
enrolled students (13%) are "ghost" students who attended none or just a few classes but never
showed up for evaluation. It is important to note that, as the semester progressed, a decreasing
number of students attended the evaluation points (MTs). This is explained by the existence of a
minimum grade for the MTs and the fact that some students did not achieve this grade. Also,
despite the fact that all students could attend the second exam, students who obtain approval
in the first exam typically do not show up for the second exam, and the students who fail to obtain
the minimum grade during the semester appear only in small numbers.
Table 3 also shows the number of students who answered each individual question. One can
easily note that MCQs were answered by all students, while some students opted not to
answer one or more of the DQs. In general, this seems to indicate that students might be more
motivated to answer these questions, especially the multiple-choice ones, and that even though
they might not feel comfortable developing a program from scratch, they feel confident enough
to answer questions that focus on the same concepts without requiring them to code. When
referring to motivation, it is not to be perceived as an intrinsic or extrinsic motivation, but rather
as an achievement motivation (as defined by [17]). By using MCQs, students feel more confident
and motivated in getting better results, which translates into more positive attitudes, with less
stress, ultimately even relaxing students and improving their concentration in the tests. Note that
MTs have only two groups of questions, while exams have three groups, hence the N/A cells.
Table 4 presents the results obtained by students in the mini-tests and exams, with the average
result for each question as well as the standard deviation. Several conclusions can be drawn by
analyzing this table:
⢠While MCQs donât always present a better result than the pairing DQ (Exam 2 being a no-
table example of that), on average MCQs attained a score 8.5% better than the corresponding
DQ, which may indicate that students feel more motivated to answer these questions, and
that students actually understood the concepts underlying these questions and canât as eas-
ily create a program that demonstrates such knowledge. Another possible explanation to
this phenomenon is the non-null chance of students guessing the correct answer in MCQs,
but in this specific course, the teachers always give students a minimum grade in DQs, as
ACM Transactions on Computing Education, Vol. 19, No. 14, Article 6. Publication date: November 2018.
9. Multiple-Choice Questions in Programming Courses 6:9
Table 3. Number of Students Who Answered Each Question
MT 1 MT 2 MT 3 Exam 1 Exam 2
Total Number of Students 79 66 44 28 33
QG 1
Multiple Choice 79 66 44 28 33
Development 70 59 39 28 32
QG 2
Multiple Choice 79 66 44 28 33
Development 66 60 38 28 29
QG 3
Multiple Choice N/A N/A N/A 28 33
Development N/A N/A N/A 26 29
Number of students with at
least one unanswered question
22 12 10 2 7
Table 4. Average Results for Each Individual Question (Mean / Standard Deviation)

                MT 1        MT 2        MT 3        Exam 1      Exam 2
QG 1  MCQ   29% / 46%   52% / 50%   41% / 49%   57% / 50%   48% / 51%
      DQ    39% / 36%   26% / 35%   51% / 35%   46% / 25%   49% / 30%
QG 2  MCQ   38% / 49%   45% / 50%   57% / 50%   46% / 50%   39% / 50%
      DQ    21% / 30%   35% / 36%   35% / 31%   48% / 29%   53% / 36%
QG 3  MCQ      N/A         N/A         N/A      79% / 41%   28% / 45%
      DQ       N/A         N/A         N/A      16% / 17%   38% / 35%
Table 5. Aggregated Results by Question Type and Year

      Development   Compiler   Concept   Multiple Choice
2015      38%          N/A        N/A         47%
2014      28%          N/A        42%         N/A
2013      34%          52%        47%         N/A
long as they try to solve them with minimal reasoning, which somewhat decreases the probability of this being the cause.
• Results are not always proportional to the complexity of the questions. Some questions with a higher degree of complexity present better results when compared to a question deemed to be simpler. One such case is the MCQ in Question Group 3 in Exam 1: the results show a surprisingly high rate of correct answers when compared to other questions; however, the corresponding DQ has a very low average score, which may again be an indication that the students were actually capable of understanding the underlying concepts but are not able to use them to write code that demonstrates such knowledge.
• MCQs present a higher standard deviation than DQs. This is to be expected, given that MCQs have a binary classification of 0 or 100%, while the other questions are scored on a continuous scale in that same interval.
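The effect of the binary scale on dispersion is easy to verify numerically. The sketch below uses made-up scores (not the course data), chosen so that both question types have the same mean:

```python
import statistics

# Hypothetical scores with equal means: MCQs are all-or-nothing,
# while DQs earn partial credit on a continuous 0-100% scale.
mcq_scores = [0, 0, 100, 100, 0, 100, 0, 100]   # mean 50
dq_scores = [40, 55, 60, 45, 50, 35, 65, 50]    # mean 50

# Despite identical means, the binary MCQ scores spread much wider.
print(statistics.mean(mcq_scores), statistics.stdev(mcq_scores))
print(statistics.mean(dq_scores), statistics.stdev(dq_scores))
```

With equal means of 50%, the all-or-nothing scores have a standard deviation above 50 points while the partial-credit scores stay around 10, mirroring the pattern in Table 4.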
Fig. 1. Scatter graph of student results in DQs and MCQs.

Table 5 presents the aggregated results by year and question type across all evaluation moments. Development questions are those where students are asked to write a piece of code. Compiler questions are those where students must identify errors in a piece of code and correct them. Concept questions are more theoretical, asking students to describe or explain a concept related to programming. It is easy to see that DQs have lower performance rates when compared to other types of questions that also evaluate programming concepts through means other than asking students to write code. Note that not every question type was present in every evaluation year; absent combinations are marked N/A (for instance, compiler questions were only present in the 2013 exams). The results fluctuate over the years, which is to be expected from different (yet similar) groups of students. These values also seem to point to the fact that different question types can be used to assess programming knowledge, and that using only DQs may be prejudicial for students with less aptitude for programming. Also, these results seem to show that MCQs are in line with other types of questions (namely, compiler and concept) in terms of general results.
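As an illustration of the compiler-question format (ours, not taken from the course materials), such a question could present a short defective function and ask students to find and fix the errors:

```python
# A compiler question might show defective code such as (errors flagged):
#
#     def mean(values)              <- missing colon
#         total = 0
#         for v in values:
#             total += v
#         return total / len(vals)  <- undefined name 'vals'
#
# The corrected version students would be expected to produce:
def mean(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)

print(mean([2, 4, 6]))  # 4.0
```

The format tests the same underlying concepts (definitions, loops, scoping) while asking students to read and repair code rather than write it from scratch.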
As multiple-choice questions are not suitable for correlation analysis due to their score being
binary in nature, they were grouped together and compared to the results of the development
questions.
Figure 1 shows the scatter graph for 2015 student results in both MCQs and DQs. The values for each student are obtained by averaging the scores for all MCQs and for all DQs he or she answered. A linear regression was also performed, and the results are shown in the same graph. A significant regression was found (F(1,78) = 22.42, p < 0.001), with an R² of 0.223.
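This kind of analysis is straightforward to reproduce with standard tooling. The sketch below uses synthetic per-student averages (the paper only publishes the summary statistics, so the numbers here are illustrative, not the 2015 data):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 80  # roughly the size of the 2015 cohort

# Hypothetical per-student average scores (fractions of full marks).
dq = rng.uniform(0.0, 1.0, n)
mcq = np.clip(0.24 + 0.5 * dq + rng.normal(0.0, 0.2, n), 0.0, 1.0)

# Ordinary least-squares fit: mcq = slope * dq + intercept.
slope, intercept = np.polyfit(dq, mcq, 1)

# R^2 and the F statistic for a simple regression (df = 1 and n - 2).
pred = intercept + slope * dq
ss_res = float(np.sum((mcq - pred) ** 2))
ss_tot = float(np.sum((mcq - mcq.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot
f_stat = r2 / ((1.0 - r2) / (n - 2))

print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r2:.3f} F={f_stat:.1f}")
```

With n = 80 students the F statistic has (1, 78) degrees of freedom, matching the F(1,78) reported above.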
These results are very similar to those of [21] regarding the slope and offset of the regression line, which validates these results in a different setting: using a procedural programming language rather than a GUI-oriented one, and with students from a single course unrelated to programming (Industrial and Management Engineering).
The intercept of the regression line with the y-axis, at a value of 0.24, seems to show that even students with very low classifications in the development questions are not only responding to the MCQs but are also answering correctly. This can also be attributed to the chance of guessing the correct answer in an MCQ, which is not a factor in DQs. However, as mentioned above, when grading DQs, teachers in this course also award a minimum grade to those who tried to answer the questions, which can decrease the influence of the "luck" factor. It is important to note that, in order to obtain this minimum grade, students must present initial but logical and structured reasoning about the problem.
5 DISCUSSION
In this article, the authors try to assess the adequacy of multiple-choice questions to evaluate
programming knowledge in the context of a nontechnological degree, in this case Industrial and
Management Engineering.
From the results, one can conclude that question types other than development questions can be used to assess programming knowledge, and specifically that multiple-choice questions may lead to better results, in terms of both motivation (as seen by the fact that all students respond to these questions, while several students don't even try to answer development questions) and performance (as these questions have better results than development questions).
An empirical inquiry of the students was performed at the end of the semester to evaluate the type of student motivation, based on the four types defined in [17]: extrinsic, intrinsic, social, and achievement motivation. Students stated that MCQs enable them to obtain better results and increase their own satisfaction regarding programming learning. They also pointed out that with this approach they expect to obtain better results, not because they can guess the answers, but because they can better reflect on the subject (with all the answers given), and it can be easier to examine each option and determine whether it is correct than to work out an answer from scratch. Consequently, through the different answers, students can think more deeply about each concept involved.
These results fall in line with studies carried out over the years that attempted to link evaluation format with student motivational levels. Traub and MacRury [46], for example, reported that students have more positive attitudes toward multiple-choice tests than toward free-response tests because they think that these tests are easier to prepare for, are easier to take, reduce stress and test anxiety, and thus will bring relatively higher scores. Another example is presented by Clarke et al. [7], who started from the research question "Is there a difference between student preferences and assessment types?" and confirmed that there was a substantial difference in student preferences, with MCQ tests being, on average, the preferred choice. Other studies can be found where MCQs and continuous assessment methods were preferred by students, such as [47] or [10], so it is hoped that these methods will encourage engagement and increase motivation and learning. Nicol [31] advocates that the use of MCQs in a continuous learning environment can be motivating and a supporting factor for learner self-regulation, also increasing students' self-confidence when taking exams. Bandarage et al. [2] also reported MCQs in a continuous assessment context to be a source of motivation for students. These studies, along with others, have shown that motivation can be related to assessment type. Most (if not all) of these studies, however, were conducted in courses tightly related to the students' area of studies. In this article, we attempt to extend that conclusion to programming courses in nontechnical degrees. This is an important contextual difference, since students are usually poorly motivated toward courses that do not fall in line with their general area of studies, and a more suited evaluation method and assessment type can contribute to
higher motivational levels. In a previous work by the same authors [11], an inquiry was conducted at the end of the semester to evaluate the students' opinions about the evaluation format. All the students noted that they felt more motivated answering multiple-choice questions, or even code-completion questions, than developing a full program. Of note, the inquiry was only a small part of that previous work, whose main focus was the analysis and description of different assessment methodologies, in order to evaluate the most suitable ones for assessing programming knowledge. The results showed that it is possible to use different question types as a pedagogical strategy to assess student difficulty levels and programming skills, and that different forms of questions are equivalent for assessing the same knowledge. Even though that work had already provided some indicators regarding the importance of the evaluation format for the learner's motivational levels, it differs significantly from the present one: the first gives some indication of the assessment format that most motivates the students, from their own point of view, whereas the present one draws pointers about motivation from the teachers' analyses, by comparing the students' performance on two different question formats.
One point that was not expected was that results were not proportional to question complexity.
It would be expected that questions with a low complexity level would be correctly answered by
more students than questions with a higher complexity level. However, this did not happen, as can
be seen in Table 4.
5.1 Limitations
This work has a number of limitations, including:
• The study was conducted with a small sample size (only one occurrence of a course); this limits the generalizability of the achieved results.
• The study involved only one context: a programming course in a management degree at a Portuguese university; this limits the generalizability of the results, as distinct contexts and realities (due to cultural differences, dissimilarities in degree and course areas of study and subjects, or others) can produce different results.
• Motivation was not assessed using a formal instrument.
• The questions (both MCQs and DQs) used in the evaluation points were only validated empirically by the course professors; although we trust their experience was enough to create the questions as intended (in terms of desired difficulty level, cognitive level to evaluate, and so on), a more formal validation strategy could have been used, including peer reviews of the questions.
• Only two types of questions were used in the tests: MCQs and DQs; additional question types could have been used, so as to allow a better understanding of how different question types relate to each other in terms of evaluating student knowledge.
5.2 Future Work
In the future, further experiments need to be conducted in other programming courses, both in
technological and nontechnological degrees, in order to determine if there are differences between
the results in these two realities, and also to verify if these results are representative of nontech-
nological students enrolled in a programming course.
Another future experiment is to use the same MT structure (both MCQs and DQs), but with a
reduced number of evaluation points over the semester, in order to test whether or not a higher
number of tests is a factor for decreasing motivation over the semester.
Finally, we plan to further study motivation using an instrument based on the ARCS (Attention, Relevance, Confidence, and Satisfaction) motivational model designed by John M. Keller [18, 19]. Even though this is an old theory, it is well established, and more recent motivational instruments are based on it. For now, we consider that MCQ tests develop the Confidence category, as students have positive expectations of achieving success when MCQ tests are used. In other words, confidence helps students believe they can succeed. A student may be highly motivated, but if he or she lacks the confidence to carry out a particular activity, he or she may not engage in it. Consequently, we consider that the MCQ approach can help students succeed and allow them to control their success. Satisfaction will also be gained by reinforcing accomplishment with rewards (internal and/or external), helping students feel good about their results and fostering the desire to continue progressing.
APPENDIX
A EXAMPLE MINI-TEST
Multiple-Choice Question 1 (20 Points)
Consider the temperature function below, which receives an integer value and prints the corre-
sponding heat classification.
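The code listing this question refers to did not survive extraction. The following is only a hypothetical reconstruction, written to be consistent with the answer options below; the thresholds and the line numbering are our guesses, not the original listing:

```python
def temperature(t):
    # Hypothetical thresholds, chosen only to match the answer options.
    if t > 10:
        print("mild temperature")
    if t > 25:
        print("starting to get hot")
    if t > 40:
        return "hot"
        print("very hot")  # unreachable: the return above exits first
    return 0  # "line 12" in the original numbered listing (our guess)

result = temperature(45)
```

Under this reconstruction, calling the function with 45 prints the first two messages and returns "hot" before the "very hot" print is ever reached.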
What does the function do when called with 45 as an argument?
(a) The function prints nothing and returns 0, according to line 12;
(b) The function prints only "very hot," since the temperature is higher than 40 degrees;
(c) The function prints "mild temperature," "starting to get hot," "very hot" and returns "hot";
(d) The function prints "mild temperature," "starting to get hot" and returns "hot."
Multiple-Choice Question 2 (20 Points)
To play the famous Euro-Millions lottery, the player must choose a key composed of five different
main numbers from 1 to 50 and two different lucky stars from 1 to 12. We want to create a Python
function able to generate a valid set of the five different main numbers for the lottery.
When the function euro_million_key is called, what will be its returned value?
(a) The function will return nothing and will present an infinite cycle;
(b) The function will return an invalid key, since the range in the randint function is not a valid one;
(c) The function will have a stochastic behavior that can produce an invalid key with fewer than five numbers;
(d) The function will return an invalid key with repeated numbers.
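The listing for euro_million_key was also lost in extraction. Independently of whatever defect the original contained, a correct generator of five distinct main numbers is a one-liner with random.sample (our sketch, not the course's listing):

```python
import random

def euro_million_key():
    # Five distinct main numbers between 1 and 50, inclusive.
    # random.sample draws without replacement, so repeats are impossible.
    return random.sample(range(1, 51), 5)

key = euro_million_key()
print(sorted(key))
```

Because sampling is without replacement, the duplicate-number and short-key failure modes described in the options cannot occur here.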
Development Question 1 (30 Points)
We want to create a function validate(license_plate) that receives a license plate from a car and
returns a Boolean value indicating whether or not that plate is valid. In Portugal, a license plate
of a car can follow one of two templates: if the license is between 1992 and 2005, the template is
(00-00-AA), and if it is more recent than 2005, the template is (00-AA-00). The numbers and the
letters are separated by "-", the letters must be upper case and cannot contain "Y", and the numbers must be in the range of 0 to 99.
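A possible solution sketch for this question (ours, not the official solution key); it assumes the numbers are always written with exactly two digits, as the (00-00-AA) and (00-AA-00) templates suggest:

```python
import re

# Upper-case letter pairs excluding 'Y'; two-digit number groups (00-99).
LETTERS = "[A-XZ]{2}"
DIGITS = "[0-9]{2}"
OLD = re.compile(f"{DIGITS}-{DIGITS}-{LETTERS}$")  # 1992-2005 template
NEW = re.compile(f"{DIGITS}-{LETTERS}-{DIGITS}$")  # post-2005 template

def validate(license_plate):
    """Return True if the plate matches either Portuguese template."""
    return bool(OLD.match(license_plate) or NEW.match(license_plate))

print(validate("12-34-AB"))  # True  (old template)
print(validate("12-AB-34"))  # True  (new template)
print(validate("12-YB-34"))  # False (letters cannot contain 'Y')
```

The character class [A-XZ] encodes "upper case, no Y" directly, so no separate letter check is needed.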
Development Question 2 (30 Points)
We want to create a function sudoku() that prints a valid 3x3 board of the famous game Sudoku.
This board must contain, without repetition, all the numbers in the range 1 to 9.
Output Example: sudoku()
3 2 8
6 1 7
4 5 9
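A minimal solution sketch (again ours, not the official one): since a 3x3 board holding 1 to 9 without repetition is just a shuffled permutation, random.sample does the work:

```python
import random

def sudoku():
    # A random permutation of 1..9, printed three numbers per row.
    numbers = random.sample(range(1, 10), 9)
    for row in range(3):
        print(*numbers[3 * row:3 * row + 3])

sudoku()
```

Each call prints a different arrangement, but every board contains each of the nine numbers exactly once, as the statement requires.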
REFERENCES
[1] Giora Alexandron, Michal Armoni, Michal Gordon, and David Harel. 2016. Teaching nondeterminism through programming. Informatics in Education 15, 1 (2016), 1–23.
[2] Gunadya Bandarage, Marie N. Kumudinee de Zoysa, and Lankani Priyangika Wijesinghe. 2009. Effect of continuous assessment tests with multiple choice questions on motivation for meaningful learning. In Proceedings of the Annual Academic Sessions of the Open University of Sri Lanka. 8–10.
[3] Benjamin S. Bloom. 1956. Taxonomy of Educational Objectives: Book 1, Cognitive Domain. Longman, New York.
[4] Brent Bridgeman and Charles Lewis. 1994. The relationship of essay and multiple-choice scores with grades in college courses. Journal of Educational Measurement 31, 1 (1994), 37–50.
[5] Ricardo Caceffo, Steve Wolfman, Kellogg S. Booth, and Rodolfo Azevedo. 2016. Developing a computer science concept inventory for introductory programming. In 47th ACM Technical Symposium on Computing Science Education (SIGCSE '16). 364–369.
[6] David Clark. 2004. Testing programming skills with multiple choice questions. Informatics in Education 3, 2 (2004), 161–178.
[7] Peter David Clarke, Joo-Gim Heaney, and Terry John Gatfield. 2005. Multiple choice testing: A preferred assessment procedure that is fair to all our business students? In ANZMAC 2005 Conference: Marketing Education. 51–57.
[8] William R. Cook. 2008. High-level problems in teaching undergraduate programming languages. ACM SIGPLAN Notices 43, 11 (2008), 55–58.
[9] José M. R. Corral, Antón C. Balcells, Arturo M. Estévez, Gabriel J. Moreno, and Maria J. F. Ramos. 2014. A game-based approach to the teaching of object-oriented programming languages. Computers & Education 73 (2014), 83–92.
[10] Adrian Furnham, Mark Batey, and Neil Martin. 2011. How would you like to be evaluated? The correlates of students' preferences for assessment methods. Personality and Individual Differences 50, 2 (2011), 259–263.
[11] Anabela Gomes, Fernanda B. Correia, and Pedro H. Abreu. 2016. Types of assessing student-programming knowledge. In Proceedings of the International Conference on Frontiers in Education. 8 pages.
[12] Anabela Gomes and António J. Mendes. 2007. Learning to program - Difficulties and solutions. In Proceedings of the International Conference on Engineering Education. 5 pages.
[13] Mark Guzdial. 2014. The difficulty of teaching programming languages, and the benefits of hands-on learning. Communications of the ACM 57, 7 (2014), 10–11.
[14] James D. Hansen and Lee Dexter. 1997. Quality multiple-choice test questions: Item-writing guidelines and an analysis of auditing testbanks. Journal of Education for Business 73, 2 (1997), 94–97.
[15] Judy Hardy, Simon Bates, Morag Casey, Kyle Galloway, Ross Galloway, Alison Kay, Peter Kirsop, and Heather McQueen. 2014. Student-generated content: Enhancing learning through sharing multiple-choice questions. International Journal of Science Education 36, 13 (2014), 2180–2194.
[16] Geoff Isaacs. 1994. Multiple Choice Testing. HERDSA Green Guide No. 16. Higher Education Research and Development Society of Australasia.
[17] Tony Jenkins. 2001. The Motivation of Students of Programming. Master's thesis. University of Kent at Canterbury.
[18] John M. Keller. 1987. Strategies for stimulating the motivation to learn. Performance + Instruction 26, 8 (1987), 1–7.
[19] John M. Keller. 1987. The systematic process of motivational design. Performance + Instruction 26, 9 (1987), 1–8.
[20] Gabor Kiss. 2013. Teaching programming in the higher education not for engineering students. Procedia - Social and Behavioral Sciences 103 (2013), 922–927.
[21] William L. Kuechler and Mark G. Simkin. 2003. How well do multiple choice tests evaluate student understanding in computer programming classes? Journal of Information Systems Education 14, 4 (2003), 389–399.
[22] Wanda M. Kunkle and Robert B. Allen. 2016. The impact of different teaching approaches and languages on student learning of introductory programming concepts. ACM Transactions on Computing Education 16, 1 (2016), 26 pages.
[23] Mila Kwiatkowska. 2016. Measuring the difficulty of test items in computing science education. In 21st Western Canadian Conference on Computing Education. 5:1–5:6.
[24] Scott Leutenegger and Jeffrey Edgington. 2007. A games first approach to teaching introductory programming. In Proceedings of the 38th SIGCSE Technical Symposium on Computer Science Education. 115–118.
[25] Raymond Lister. 2000. On blooming first year programming, and its blooming assessment. In Australasian Conference on Computing Education. 158–162.
[26] Raymond Lister. 2001. Objectives and objective assessment in CS1. ACM SIGCSE Bulletin 33, 1 (2001), 292–296.
[27] Raymond Lister. 2005. One small step toward a culture of peer review and multi-institutional sharing of educational resources: A multiple choice exam for first semester programming students. In 7th Australasian Computing Education Conference. 155–164.
[28] Raymond Lister, Elizabeth Adams, Sue Fitzgerald, William Fone, John Hamer, Morten Lindholm, Robert McCartney, Jan Moström, Kate Sanders, Otto Seppälä, Beth Simon, and Lynda Thomas. 2004. A multi-national study of reading and tracing skills in novice programmers. ACM SIGCSE Bulletin 36, 4 (2004), 119–150.
[29] D. B. Marx and David E. Longer. 1986. Cheating on multiple choice exams is difficult to assess quantitatively. NACTA Journal 30, 1 (March 1986), 23–26.
[30] Ioana T. Mow. 2008. Issues and difficulties in teaching novice computer programming. In Innovative Techniques in Instruction Technology, E-Learning, E-Assessment, and Education, Magued Iskander (Ed.). Springer Netherlands, 199–204.
[31] David Nicol. 2007. E-assessment by design: Using multiple-choice tests to good effect. Journal of Further and Higher Education 31, 1 (2007), 53–64.
[32] David Nicol and Debra Macfarlane-Dick. 2006. Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education 31, 2 (2006), 199–218.
[33] Victoria University of Wellington. 2013. A guide for developing multiple choice and other objective style questions. CAD Guidelines. Retrieved from http://www.victoria.ac.nz/learning-teaching/support/approach/guides/developing-questions2/developing-questions.pdf.
[34] Dale Parsons, Krissi Wood, and Patricia Haden. 2015. What are we doing when we assess programming? In 17th Australasian Computing Education Conference. 119–127.
[35] Andrew Petersen, Michelle Craig, and Paul Denny. 2016. Employing multiple-answer multiple choice questions. In ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE '16). 252–253.
[36] Kreig Randall and Bulent Uyar. 2001. Student performance in business and economics statistics: Does exam structure matter? Journal of Economics and Finance 25, 2 (2001), 229–240.
[37] Tim Roberts. 2006. The use of multiple choice tests for formative and summative assessment. In 8th Australasian Computing Education Conference. 175–180.
[38] Cristian M. Răduță. 2013. Consequences the extensive use of multiple-choice questions might have on student's reasoning structure. Romanian Journal of Physics 58, 9 (2013), 1363–1380.
[39] Selvakumar Samuel. 2014. Teaching programming subjects with emphasis on programming paradigms. In Proceedings of the International Conference on Advances in Education Technology. 94–97.
[40] Luis M. Serrano-Cámara, Maximiliano Paredes-Velasco, Carlos-Maria Alcover, and Ángel Velázquez-Iturbide. 2014. An evaluation of students' motivation in computer-supported collaborative learning of programming concepts. Computers in Human Behavior 31 (2014), 499–508.
[41] Shuhaida Shuhidan, Margaret Hamilton, and Daryl D'Souza. 2010. Instructor perspectives of multiple-choice questions in summative assessment for novice programmers. Computer Science Education 20, 3 (2010), 229–259.
[42] Mark G. Simkin and William L. Kuechler. 2005. Multiple-choice tests and student understanding: What is the connection? Decision Sciences Journal of Innovative Education 3, 1 (2005), 73–98.
[43] Simon, Judy Sheard, Angela Carbone, Donald Chinn, Mikko-Jussi Laakso, Tony Clear, Michael de Raadt, Daryl D'Souza, Raymond Lister, Anne Philpott, James Skene, and Geoff Warburton. 2012. Introductory programming: Examining the exams. In 14th Australasian Computing Education Conference. 61–70.
[44] Simon, Judy Sheard, Daryl D'Souza, Mike Lopez, Andrew Luxton-Reilly, Iwan H. Putro, Phil Robbins, Donna Teague, and Jacqueline Whalley. 2015. How (not) to write an introductory programming exam. In Proceedings of the 17th International Australasian Computing Education Conference. 137–146.
[45] Simon and Susan Snowdon. 2014. Multiple-choice vs free-text code-explaining examination questions. In Koli Calling. 91–97.
[46] Ross E. Traub and Katherine MacRury. 1990. Multiple-Choice vs. Free-Response in the Testing of Scholastic Achievement. Ontario Institute for Studies in Education.
[47] Eileen Trotter. 2006. Student perceptions of continuous summative assessment. Assessment & Evaluation in Higher Education 31, 5 (2006), 505–521.
[48] Jeroen van Merriënboer. 1990. Strategies for programming instruction in high school: Program completion vs. program generation. Journal of Educational Computing Research 6, 3 (1990), 265–285.
[49] Sanja M. Čisar, Peter Čisar, and Robert Pinter. 2016. Evaluation of knowledge in object oriented programming course with computer adaptive tests. Computers & Education 92–93 (2016), 142–160.
[50] Gui Ping Wang, Shu Yu Chen, Xin Yang, and Rui Feng. 2016. OJPOT: Online judge practice oriented teaching idea in programming courses. European Journal of Engineering Education 41, 3 (2016), 304–319.
[51] Karyn Woodford and Peter Bancroft. 2005. Multiple choice questions not considered harmful. In 7th Australasian Computing Education Conference (ACE '05). 109–115.
[52] Moshe Zeidner. 1987. Essay versus multiple-choice type classroom exams: The student's perspective. Journal of Educational Research 80, 6 (1987), 352–358.
Received August 2017; revised June 2018; accepted July 2018