Pros and Cons of Multiple-Choice Questions in Language Testing

A literature review of multiple-choice questions in language testing.



The Use of Multiple-Choice Questions in Language Testing

A paper assignment for Language Testing and Evaluation (ENGL 6201)

By: Ihsan Ibadurrahman (G1025429)
I. Introduction

The multiple-choice question (MCQ), also referred to as an objective test, is one of the test techniques in language testing in which candidates are typically asked to respond by giving one correct answer from the available options (Hughes, 2003). In designing MCQs, most test designers use Bloom's taxonomy to check whether specific instructional objectives have been met during the course (Samad, 2010). Bloom's taxonomy is a set of six hierarchically ordered learning objectives proposed by Benjamin Bloom in 1956 (Wikipedia, 2012). This paper aims to illustrate how MCQs are created using Bloom's taxonomy. Seven sample MCQs will be presented based on a reading passage entitled Smart US dog learns more than 1,000 words, written by Kerry Sheridan (2012). A justification of how these MCQs were created will also be provided.

The MCQ is the most commonly used testing technique in many academic achievement tests. However, it is also the most criticized (Lee, 2011). The second aim of this paper is thus to elaborate the advantages and disadvantages of using MCQs in language testing by reviewing the currently available literature in the field. A discussion and conclusion are given at the end of the paper.

II. A Sample of Multiple-Choice Questions

Below is a sample of multiple-choice questions based on Bloom's taxonomy, which orders instructional objectives in six hierarchical levels: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. Knowledge refers to the ability to recall information from the text. Comprehension refers to the ability to understand the text. Application is the ability to apply what is understood from the text to a new situation. Analysis is the ability to compare and contrast different information.
Synthesis is the ability to create new information based on the given text. And lastly, Evaluation is the ability to judge a piece of information. Due to the constraints of the type of text, however, the questions could only be targeted up to the fifth level of Bloom's learning objectives: Synthesis.

The following questions relate to the reading passage. For each question, choose the correct answer by marking A, B, C, or D on your answer sheet.

1. What does the passage mainly discuss?
A. A comparison between two different dogs: Chaser and Rico.
B. A study of how to increase dogs' memory capability.
C. A report of a herding dog that learns a mass of vocabulary.
D. A report of how trainers can train their dogs differently.

2. After three years of training, Chaser stopped learning new words because ____
A. her memory capacity was at its limit.
B. her trainers seldom rewarded her with food anymore.
C. her trainers felt that it was enough.
D. her trainers thought that there was not enough time.

3. "By the time the pup was five months old, language training began in earnest." (par. 9)
The underlined word may be replaced with _____
A. a careful manner
B. a serious manner
C. a slow manner
D. a quick manner

4. Which of the following is NOT one of Chaser's abilities?
A. distinguishing between different toys
B. recognizing objects from loads of toys
C. separating verbs from nouns
D. understanding the word order
5. Rico fails to distinguish the difference between 'ball' and 'get-the-ball' because he can't ____
A. store any more vocabulary
B. recognize noun phrases
C. separate nouns from commands
D. respond to requests

6. "Hasan has got a Siberian husky dog called Maxi. She is 5 years old and has never been language-trained." Which of the following statements is supported by the passage?
A. Learning words is not possible because Maxi is not a border collie.
B. Language training should have begun when Maxi was 5 months old.
C. Training Maxi is still possible, even without treating her to nice food.
D. None of the above statements is supported by the passage.

7. Study the types of human memory in the text below.

"There are many types of memory. In general terms, memory can be divided into two: procedural and declarative. Procedural memory is based on implicit learning and is primarily employed in learning motor skills. Declarative memory, on the other hand, requires conscious recall, in that some conscious process must call back the information. Another type of memory is called visual memory, which is the part of memory preserving some characteristics of our senses pertaining to visual experience; one is able to place in memory information that resembles objects, places, animals, or people in a sort of mental image. Topographic memory is the ability to orient oneself in space, to recognize and follow an itinerary, or to recognize familiar places. Getting lost when traveling alone is an example of the failure of topographic memory." (Wikipedia, 2012)

From the information above, it can be said that Chaser has an excellent ______ memory.
A. Procedural
B. Declarative
C. Visual
D. Topographic
III. Comments

Judging from the difficulty of the text, it is deemed most suitable for secondary school students. The language, the types, and the difficulty of the questions outlined above therefore followed accordingly. I shall use the word "students" throughout the rest of this section to refer to any test-takers or candidates taking this particular test. All seven multiple-choice questions above strictly followed the guidelines given by Samad (2010) and Woodford & Bancroft (2005), in which all the options/alternatives in each question were written to be balanced in sentence length and similar in part of speech. As stated in the guidelines, a chronological order was used in sorting the questions so that students would not get cognitively overloaded. Elaboration for each question is given as follows.

Item one is a synthesis question, as it attempts to test students' ability to synthesize all the information from the passage to decide the best possible answer of the four. Although it may seem a bit too early to jump into a higher-order cognitive skill such as synthesis, this question needs to be placed at the beginning since it actually discusses the 'title', the beginning of a story. Assuming that the question is used in a real test and not just for the sake of this paper, the title of the passage would be removed. The key to this question is thus clearly C. A report of a herding dog that learns a mass of vocabulary, which sounds like the title itself: Smart US dog learns more than 1,000 words. Distractors were written in a way that would give students a somewhat vague answer. This is a case of "correct versus best answer" as recommended by Tamir (1991, p. 188), where instead of having one correct answer and everything else being outright wrong, students need to select the answer that best summarizes what the text is about. This clearly sets the first question at a higher cognitive level than the rest of the questions.
Item two is a type of knowledge question, since it requires the students only to recall the information given in the text in order to answer the key D. her trainers thought that there was not
enough time. However, in order to make the answer less obvious, exact wording from the text was avoided. Item three clearly falls into the comprehension category, as it tests students' ability to understand the meaning of the underlined word from the context. Item four is yet another knowledge question, in which students are asked to recall information from the text to rule out which information is not considered true. It therefore seems logically necessary to put the key in the last position, to avoid the possibility of students eliminating the distractors in alphabetical order; if the key were put in the initial position as option A, students with knowledge would not read the rest of the options.

The alternatives in item 5 were written in such a way as to avoid redundancy and unnecessary confusion (Samad, 2010; Woodford & Bancroft, 2005). Redundancy could have been caused by placing the same words he can't in all of the alternatives, while confusion could have been caused by a possible double negative in the stem, as in "Rico can't ... because he can't". Thus, to avoid this confusion, the word fail was used in the stem. This item is a comprehension question, since it tests students' ability to retain specific information from the text as well as to understand the meaning of discern (par. 17) and look for its closest meaning among the alternatives.

Higher-order thinking skills are used for the last two questions of the test. Item 6 is an application question, asking students to study a new situation and apply what they learn from the text in order to solve it. Another difficulty in this question lies in the use of none of the above as a distractor. Woodford and Bancroft (2005) mention that it is commonly regarded as an effective option compared to its counterpart, all of the above, since it requires students to eliminate all the other options before it can finally be chosen. The last question, item 7, goes up a level further in the taxonomy.
The level of an analysis question is achieved by asking students to break down all the crucial information from the new text, as well as to compare and contrast the different types of memory, in order to come up with the answer.
IV. Literature Review

The following is a brief overview of the literature on MCQs. This section is divided into two parts: arguments that support the MCQ, and those that go against it. Before going further, however, there are four particularly relevant journal articles worth mentioning. The first is a quantitative study by Cheng (2004), who compares the use of three different test techniques in a listening performance test: a multiple-choice (MC) test, a multiple-choice cloze (MCC) test, and an open-ended (OE) test. From the 159 Mandarin Chinese technical college students who participated in the study, it is revealed that the participants score highest in MCC (mean = 35.33), followed by MC (mean = 33.84), and lowest in OE (mean = 25.23). This suggests that listening performance is enhanced when a selected-response format such as multiple choice is used.

A somewhat similar study was conducted by Ajideh & Rajab (2009), who compare the use of MCQs and cloze tests in measuring vocabulary proficiency. Using 21 Iranian undergraduate EFL students as its sample, the study reveals that, at a figure of .57, there is no significant correlation between a discrete-point item test (MCQ) and an integrative cloze test in a vocabulary test. This indicates that those who perform in a cloze test might have a similar result if they take a discrete-point item test. It also suggests the possibility of substituting the MCQ for the traditional cloze test in vocabulary tests.

Although the MCQ might have its place in a vocabulary test, its use in a grammar test is still in question. Currie and Chiramanee (2010) compare the difference between responses in MCQ and constructed-response formats in a grammar test. In their study, 152 undergraduate Thai students take the grammar test first and later take a similar multiple-choice test that uses students' errors on the first test as distractors. The study reveals that scores increase in the second test, with only 26% of the responses found to be similar to the first.
It is assumed that guessing could be one of the factors that affected the success rate in the second test.
Although not directly related to language testing, McCoubrie (2004) seeks to bring to light the unfairness that the MCQ has generally been criticized for, particularly in the area of medical assessment. This is especially relevant for medical students, since they need to be professionally able to think quickly on their feet in solving patients' problems, and the MCQ gives these students the chance to make life-and-death decisions swiftly in a timed test. In his literature review, he concludes that a fair MCQ exam is one that: (1) is related to the syllabus, (2) integrates practical competence testing, (3) fairly represents all the important material in the syllabus, (4) is error-free, (5) uses a criterion-referenced test, (6) exercises caution when questions are reused, and (7) uses the extended-matching question (EMQ) format for more reliable and time-efficient testing.

a. Arguments supporting multiple-choice items

From the available literature, the MCQ is generally favored for tests that measure receptive skills. For instance, Cheng's (2004) study highlights the preference for using MCQs over short-answer items in a listening test. In the study, participants' scores are higher in MCQ mainly due to guessing, memory constraints, and test-takers' ability to predict what is coming before listening. Another possible reason is that the MCQ strips away the ambiguity found in most open-ended questions (Linn & Miller, 2005). MCQs appeal for use in testing receptive skills since they do not require the candidate to produce the language (Samad, 2010). Hence, the MCQ makes it possible to obtain accurate information on a candidate's receptive skills and abilities, which would otherwise be difficult.
The MCQ can be very useful for large-scale, high-stakes testing, where scoring thousands of essays would seem practically impossible (Hughes, 2003; McCoubrie, 2004; Woodford & Bancroft, 2005). Considering this benefit, Ajideh and Rajab (2009) suggest that the MCQ could substitute for the traditional cloze-test format in vocabulary tests when the number of test-takers needs to be taken into consideration. Most large-scale tests rely on MCQs because of their practicality and reliability (Cheng, 2004;
Lee, 2011). They are practical because scoring can be done quickly using a computer scanner. Furthermore, with the increasing number of computer-based MCQs nowadays, testing time has been reported to be significantly shorter (McCoubrie, 2004). The MCQ is also reliable, since no judgment has to be made by the assessor of the test. High reliability is what sets the MCQ apart from other test techniques. In scoring hundreds of essays, for instance, there is a concern that the last few essays are not scored as reliably as the first few due to fatigue (Samad, 2010).

The MCQ also allows those who have the knowledge but are poor writers to excel, since all they are asked to do is mark their answers on the test (Tamir, 1991). Since it is time-efficient, testers may be able to cover a wide range of topics with more questions included in the test, compared to other test techniques (McCoubrie, 2004). Using computer-generated software, testers are now even able to create a test item bank which can be instantly produced on demand, or even reused for future testing (Hoshino & Nakagawa, 2005).

b. Arguments against multiple-choice items

As mentioned previously, the MCQ might be favorable for measuring listening skills. However, it might fall short in accurately measuring language competence such as grammar (Currie & Chiramanee, 2010). Candidates may be able to pick the right answer in an MCQ test, but being able to perform it in written or spoken language is an entirely different story. Hughes (2003) also questions the validity of one part of the TOEFL paper-based test which aims to measure candidates' writing ability but merely asks them to identify grammatically incorrect language items in MCQ format. Most MCQ tests thus fall short of achieving validity for the sake of reliability.
A scathing criticism that has often been leveled at the MCQ is that there is always an element of guessing that causes noise in language testing (Currie & Chiramanee, 2010: 487). In a four-option multiple-choice item, for example, there is a 25% possibility of getting the correct answer through
guessing. In this sort of guessing game, candidates may be able to pick the correct answer without having to read the passage or understand its meaning (Lee, 2011: 31). Similarly, a correct answer can also be the result of eliminating incorrect answers, without the candidate knowing the right answer in the first place (Woodford & Bancroft, 2005).

Another commonly cited disadvantage of the MCQ is the difficulty of writing one (Burton et al., 1991; Linn & Miller, 2005). Hughes (2003) explains that MCQs are very difficult to write; many careful steps need to be accounted for in writing a good MCQ. It needs to be well written, retested, and statistically analyzed. Schutz et al. (2008) assert that coming up with good distractors (i.e., those that could potentially be chosen by many candidates) contributes to the difficulty of writing MCQs. This means that a great deal of effort and time needs to go into the construction of an MCQ.

There are many other disadvantages of using the MCQ in a test. Hughes (2003) mentions some of these shortcomings:

1. It measures only recognition knowledge: the MCQ cannot directly measure the gap between a candidate's productive and receptive skills. For instance, a candidate who gets the correct answer in a grammar test may not be able to use the grammar item correctly in speaking or writing. In other words, it gives only half the picture of someone's ability in language: the knowledge, not the use.

2. It limits what can be tested: the MCQ may be limited to language items that have a range of available distractors. For example, while it is possible to test a candidate's knowledge of English prepositions using many different variants of them as distractors, it would be difficult to find distractors for a question about the difference between the present perfect and the past simple. This may lead to MCQs with poor distractors, which in turn may give hints to the correct answer, without the candidates even having to guess.
The MCQ also lends itself very easily to
applying only the lower-level thinking skills of Bloom's taxonomy in its items (Paxton, 2000; Vleuten, 1999, as cited in McCoubrie, 2004), which in turn encourages students' rote learning (Roediger & Marsh, 2005; Tamir, 1991).

3. It may create a harmful backwash effect: high-stakes testing that relies heavily on MCQs may have a negative effect on teaching and learning in class. Teachers would concentrate more on exam preparation, such as training students how to answer MCQs with educated guesses, rather than actually getting students to improve their language.

4. It may facilitate cheating: it would be very easy for students to pass the answers to others using non-verbal language such as body gestures.

V. Discussion and Conclusion

From the reviewed articles, it can be inferred that the MCQ has both strengths and weaknesses. The MCQ's benefits lie in its high reliability, accuracy for receptive skills, practicality, quick scoring, and the ability to include many question items. Its shortcomings include harmful backwash, facilitating guesswork and cheating, lack of validity, limits on what can be tested, and the exclusion of higher-order thinking skills. Just as there is no universally correct or best 'methodology' in language teaching (Brown, 2001), in language testing we may never have one best 'test technique'. The pros and cons weighed so far mean that we have to choose a different test technique for each given situation. Thus, as pointed out by Ory (n.d., as cited in Samad, 2010: 32), the MCQ may be most appropriate when:

1. A large number of students are concerned
2. The test questions are to be reused
3. The scores need to be produced quickly
4. A wide test coverage needs to be included
5. Lower thinking skills are of greater importance

Having had the experience of writing MCQs as outlined earlier in this paper, the writer has to agree with the last point above: aiming for higher learning objectives such as application, synthesis, and evaluation is possible but can be difficult. This is probably what leads to a situation where higher learning objectives are deemed necessary but are omitted due to the difficulty of writing them (Paxton, 2000). As mentioned earlier, many MCQ tests are poorly produced because of the difficulty involved and, sadly, the little care and attention given to them. This is clearly something to be avoided. The MCQ is not a shortcut to an easy and cheap way of testing. That is to say, although the MCQ still has its potential in language testing, all kinds of 'excessive, indiscriminate, and potentially harmful use of the technique' should be avoided (Hughes, 2003: 78).
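As an aside, the 25%-per-item guessing figure discussed in the literature review extends naturally to whole tests: under a simple binomial model, the chance of reaching a given score by blind guessing alone can be computed directly. The sketch below is illustrative only; the function name and the example pass mark are hypothetical and not taken from any of the reviewed studies.

```python
from math import comb

def p_pass_by_guessing(n_items: int, n_options: int, pass_mark: int) -> float:
    """Probability of getting at least `pass_mark` items right on an
    n_items-item MCQ test by blind guessing (binomial model)."""
    p = 1 / n_options  # chance of guessing one item correctly, e.g. 0.25 for 4 options
    return sum(comb(n_items, k) * p ** k * (1 - p) ** (n_items - k)
               for k in range(pass_mark, n_items + 1))

# One four-option item: a 25% chance of a correct blind guess.
print(p_pass_by_guessing(1, 4, 1))  # 0.25
# Guessing rarely scales, however: reaching 12/20 on a 20-item,
# four-option test by guessing alone is very unlikely.
print(p_pass_by_guessing(20, 4, 12))
```

On this model the noise introduced by guessing shrinks quickly as the number of items grows, which is one reason test length matters when MCQs are used.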
References

Ajideh, P. & Esfandiari, R. (2009). 'A Close Look at the Relationship between Multiple Choice Vocabulary Test and Integrative Cloze Test of Lexical Words in Iranian Context', English Language Teaching, Vol. 2(3), pp. 163-170.

Bloom's Taxonomy. (2012, March 9). In Wikipedia, The Free Encyclopedia. Retrieved March 10, 2012, from http://en.wikipedia.org/w/index.php?title=Bloom%27s_Taxonomy&oldid=481064754

Brown, H. (2001). Teaching by Principles, 2nd edn., New York: Pearson Education.

Burton, S., Sudweeks, R., Merrill, P. & Wood, B. (1991). How to Prepare Better Multiple-Choice Test Items: Guidelines for University Faculty. Retrieved from: http://testing.byu.edu/info/handbooks/betteritems.pdf

Cheng, H. (2004). 'A Comparison of Multiple-Choice and Open-Ended Response Formats for the Assessment of Listening Performance', Foreign Language Annals, Vol. 37(4), pp. 544-555.

Currie, M. & Chiramanee, T. (2010). 'The effect of the multiple-choice item format on the measurement of knowledge of language structure', Language Testing, Vol. 27, pp. 471-491.

Hoshino, A. & Nakagawa, H. (2005). 'A real-time multiple-choice question generation for language testing', Proceedings of the 2nd Workshop on Building Educational Applications Using NLP, pp. 17-20.

Hughes, A. (2003). Testing for Language Teachers, 2nd edn., Cambridge: Cambridge University Press.

Lee, J. (2011). Second Language Reading Topic Familiarity and Test Score: Test-Taking Strategies for Multiple-Choice Comprehension Questions (PhD thesis). Retrieved from ProQuest Dissertations and Theses. (Accession Order No. 3494064).

Linn, R. & Miller, M. (2005). Measurement and Assessment in Teaching, 9th edn., New Jersey: Pearson Education.

McCoubrie, P. (2004). 'Improving the fairness of multiple-choice questions: a literature review', Medical Teacher, Vol. 26(8), pp. 709-712.

Memory. (2012, March 8). In Wikipedia, The Free Encyclopedia. Retrieved March 10, 2012, from http://en.wikipedia.org/w/index.php?title=Memory&oldid=480781062

Paxton, M. (2000). 'A linguistic perspective on multiple choice questioning', Assessment and Evaluation in Higher Education, Vol. 25(2), pp. 109-119.

Roediger, H. & Marsh, E. (2005). 'The Positive and Negative Consequences of Multiple-Choice Testing', Journal of Experimental Psychology: Learning, Memory, and Cognition, Vol. 31(5), pp. 1155-1159.

Samad, A. (2010). Essentials of Language Testing for Malaysian Teachers. Selangor: Universiti Putra Malaysia Press.

Schutz, L., Rivers, K., Schutz, J. & Proctor, A. (2008). 'Preventing Multiple-Choice Tests From Impeding Educational Advancement After Acquired Brain Injury', Language, Speech & Hearing Services in Schools, Vol. 39(1), pp. 104-109.

Sheridan, K. (2011, January 7). 'Smart US dog learns more than 1,000 words'. In MNN, Mother Nature Network. Retrieved March 10, 2012, from http://www.mnn.com/family/pets/stories/smart-dog-learns-more-than-1000-words

Tamir, P. (1991). 'Multiple Choice Items: How to Gain the Most Out of Them', Biochemical Education, Vol. 19(4), pp. 188-192.

Woodford, K. & Bancroft, P. (2005). 'Multiple Choice Questions Not Considered Harmful', Australasian Computing Education Conference 2005 – Research and Practice in Information Technology, Vol. 42, pp. 1-8.
