EXAMSOFT WHITE PAPER

Design of a Tagged Electronic Database of Exam Questions (TEDEQ) as a Tool for Assessment Management within an Undergraduate Medical Curriculum

Dr. Dale D. Vandre, Department of Physiology and Cell Biology, College of Medicine, The Ohio State University, Columbus, OH
Eric Ermie, Office of Medical Education, College of Medicine, The Ohio State University, Columbus, OH

www.examsoft.com | firstname.lastname@example.org | 866.429.8889
Abstract

An aspect of curriculum mapping that is often overlooked is exam blueprinting, which provides a necessary link between instructional content and examination items. Computerized testing not only increases the efficiency and reliability of examination delivery, it also provides an effective tool for exam blueprinting, item banking, and management of examination content and quality. We designed a method to categorize the exam items used in our preclinical medical curriculum using a unique identifying tag to create a tagged electronic database of exam questions (TEDEQ) using the SofTeach module of the ExamSoft test management system. Utilizing the TEDEQ output, a detailed report of exam performance is now provided to students following completion of each examination. This enables students to better evaluate their performance in relevant subject areas after each examination, and follow their cumulative performance in subject areas spread longitudinally across the integrated medical curriculum. These same reports can be used by faculty and staff to aid in academic advising and counseling of students. The TEDEQ system provides a flexible tool for curricular management and examination blueprinting that is relatively easy to implement in a medical curriculum. The information retrieved from the TEDEQ enables students and faculty to better evaluate course performance.
Introduction

The importance of assessment in medical education is well established, not only for guiding learning and ensuring competence, but also as a means to provide feedback to students as they negotiate the curriculum. Perhaps the most critical aspect of assessment is the licensure exam, which has a significant impact on determining career options. Assessment of knowledge and evaluation of the acquisition of competencies by students during their undergraduate medical education rely heavily upon multiple choice question based examinations. Despite the extensive use of multiple choice questions in medical curricula, comparatively little faculty time is dedicated to the construction of the exam in comparison to the time involved in the design, preparation, and delivery of curricular content.1 Instructors often do not devote sufficient effort to the preparation of questions, and the exam tends to be assembled at the last minute with little or no time for adequate review of the questions or evaluation of the overall balance and quality of the exam as a whole.2 As a result, the quality of in-house multiple choice questions, especially in the pre-clinical curriculum, may suffer from being too reliant on questions that focus on simple recall and comprehension of knowledge without effectively testing higher order thinking skills.3,4

Curriculum mapping programs have been designed to facilitate the management of integrated medical school curricula in order to keep track of institutional objectives, course objectives, content areas, learning resources, learning events, assessments, and outcomes. Many of these approaches focus on generating a database that includes a taxonomy of subjects or concepts included in the curriculum with varying degrees of granularity. Examples include databases such as KnowledgeMap,5 Topics for Indexing Medical Education (TIME),6,7 and CurrMIT.8 Documentation of where specific topics are covered in the curriculum is an important component of the accreditation process for medical schools, and utilization of curriculum management tools provides an efficient mechanism to aid in addressing accreditation standards.

One component of effective curriculum mapping is exam blueprinting, which is often overlooked in medical education.3,9 A test blueprint is used to link the content delivered during a period of instruction to the items appearing on the corresponding examination, and is a measure of how representative the test items are of the subject matter.10,11 To improve the quality of written examinations it is essential that the examination questions reflect the content of the curriculum. Therefore, test blueprinting is a critical component of improving examination quality, but linking curricular topics with those in the exam questions is not sufficient to ensure that the examination has content validity.9,12 In addition to measuring whether the examination adequately represents the learning objectives, content validity ensures that the examination is comprehensive and does not reflect biased and/or under-sampling of the curriculum. Moreover, test blueprinting ensures that the questions are balanced with regard to degree of difficulty, that the items are clearly written and the format is not flawed, and that the examination measures higher order thinking skills and not just factual recall.1,3,9,13 Therefore, additional information, beyond that provided by the subject taxonomy, is required of individual test questions in order to effectively determine the content validity of assessments.

The introduction of computerized testing to medical education provides an opportunity to increase the efficiency and reliability of the assessment process. When compared to paper-and-pencil examinations, no difference was found in the performance of medical students on computer-based exams.14 In addition to delivery of examinations by computer, programs designed to maintain a database of exam questions, or item bank, from which examinations could be assembled were described nearly 30 years ago.15 However, testing software must be extremely flexible to meet the demands of a medical school curriculum. In addition to facilitating item banking, the software must be user friendly, be able to collect item statistics, provide immediate scoring feedback, have the capability of presenting items using various multimedia formats, and be able to deliver the examination in a secure mode. Because of these various demands, suitable commercial software products were unavailable until recently, and as a result medical schools that were early adopters of computerized testing developed in-house solutions such as the ItemBanker program developed at the University of Iowa.16 As part of the ItemBanker system, each exam question is identified by a unique serial number, and the database allows for a breakdown of question topic taxonomy and provides statistics on performance and item difficulty.

On-line administration of licensure examinations is becoming more commonplace in professional education. The United States Medical Licensing Examination (USMLE) Step 1, Step 2, and Step 3 tests have been delivered in a computerized format since 1998, and the National Board of Medical Examiners provides an increasing number of examinations in an on-line format. Similarly, the bar examination is used to determine whether a student is qualified to practice law in a given jurisdiction. Unlike the USMLE, which is a national examination, bar examinations are administered by each state in the United States, and 38 states currently use ExamSoft Worldwide, Inc., as the provider of secure on-line computer-based testing software for the administration of the state bar examination.17 Based upon this utilization, we evaluated ExamSoft among other commercial testing software products for the administration of examinations to pre-clinical medical students, and adopted ExamSoft for the administration of multiple-choice examinations in 2009.

The construction of a well-written exam is required to effectively measure student competencies throughout the curriculum. Ultimately, exam performance is used to assess the success of the educational program in preparing the student for licensure exams and more advanced training. Therefore, a quality exam must not only measure the student's application of knowledge, but it is also essential that the questions adequately reflect the course content and objectives. We describe the development of a tool to generate a tagged electronic database of exam questions (TEDEQ) that can be used for the categorization of multiple choice examination questions. TEDEQ provides the information necessary to link existing questions to curricular objectives using a taxonomy of instructional objectives, identifies question characteristics, and helps ascertain the level of knowledge required to address the question. The TEDEQ tool is easy to implement and integrate into existing curricula and can be customized using the SofTeach module of the ExamSoft test management system to derive the maximum amount of information from assessments. We have integrated the TEDEQ tool as part of the computerized administration of our exams using ExamSoft, and the information is being used to help improve examination quality, provide input into curricular management, supplement curricular mapping documentation for accreditation, and make content area specific performance feedback available to our students.
Methods

The preclinical Integrated Pathway (IP) program at Ohio State University College of Medicine is broken down into organ system blocks, which are subdivided into divisions ranging in length from three to five weeks. At the end of each division, an examination of 100-125 multiple choice questions is administered to assess whether the outlined learning objectives have been achieved. We have amassed an item bank of over 3,500 multiple choice questions distributed across 22+ exams during the Med 1 and Med 2 years of the IP curriculum. Previously, little or no information was gathered linking course learning objectives with the items included on the examination; rather, test items were simply grouped according to the division test they were part of. Additionally, the only performance feedback provided to the students, administrators, or faculty following an examination was the overall test scores and an item analysis of each exam question.

We set out to design a system that would provide more thorough feedback to the students regarding their performance in content specific areas of a particular exam as well as across longitudinal topics that span the curriculum. In addition, we wanted to generate information that would help faculty guide curricular management and enable improved examination quality, and give administrators an additional measure by which to compare internal course performance against student performance on the USMLE Step 1 exam. To accomplish these goals we developed a simple coding system for each question, the Tagged Electronic Database of Exam Questions, which utilizes specific data markers to categorize, organize, and track the use of items. We accomplished this goal using features of the question categorization tool developed within ExamSoft, which is the software currently used for secure computerized delivery of examinations within the IP program.

While the question categorization system contained within the ExamSoft software allowed an unlimited number of categories to be assigned to each question, one of our design objectives was to limit the number of fields applied to each question, as well as the granularity of the topic categories, in order to obtain data that would be most meaningful. For students, useful information would include performance feedback in subject areas of the curriculum, a study guide indicating areas of deficiency for those students requiring remediation, and an aid in planning their USMLE Step 1 preparation. Further, these limitations allowed us to create a system that was not overly complex or difficult for faculty to implement. This structure ensured greater faculty buy-in without compromising the impact of the information collected, which was required to meet our goals with regard to curricular management and test construction.

The method used for item tagging/question categorization consisted of six categories, as outlined in Table 1. Each of these categories provides distinct information that will have different significance depending on the recipient audience. The content for categories one and two associates the exam question with the block and division of the IP curriculum, and provides a link to the faculty member responsible for writing the item. In most cases, the identified faculty member is also responsible for the design and delivery of the learning materials used to meet the learning objective the question is designed to assess. Therefore, the tagging system links the question to the appropriate content expert if questions or issues arise over the validity or accuracy of the question. The "options" for each of the first two categories are simply the names of the blocks in the curriculum and the names of the faculty members.

Table 1 – Design of the Tagged Electronic Database of Exam Questions (TEDEQ) Categories
Table 1 contains a detailed list of all the categorization options within the first five categories of the TEDEQ system. This table is distributed to faculty for use in categorizing questions.
For category three we defined four component choices that would classify the question type with regard to level of cognitive complexity: 1) Recall of Factual Knowledge - memory recall/factoid questions; 2) Interpretation or Analysis of Information - questions that require the interpretation of data from a table or graph and use of that information to answer the question; 3) Application: Basic Science Vignette - questions pertaining to foundational science that contain a vignette of patient information that must be applied, requiring multiple steps of knowledge application to deduce the correct answer; and 4) Application: Clinical Science Vignette - questions pertaining to clinical science that contain a vignette of patient information that must be applied, requiring multiple steps of knowledge application to deduce the correct answer. To code the question, a number was assigned to each category component. For example, a recall question generated by Dr. Smith in the first division of the Neuroscience block would be coded Neuro1.Smith.1. Subsequent categories were assigned numeric values and added to the end of the code sequentially, separated by periods.

Categories four, five, and six are used to map the question to content areas. The categorization was designed to meet the components of our curriculum, but is also flexible enough to be applied to other medical curricula. To make the tagging system as widely applicable as possible we focused on the subject categories of the USMLE Step 1 exam.18 We first tag the question into broad categories with regard to process and focus. These include normal process, abnormal process, therapeutics, or gender, ethnic, and behavioral considerations.

All medical schools whose students take the USMLE Step 1 exam receive a report from the USMLE that breaks down the performance of their students into 20 specific categories in comparison to the national average for each of those categories. These are the same 20 categories into which content is broken down in the USMLE Step 1 study guide provided to medical students for Step 1 exam preparation. Since these major subject areas are comparable to those used either in the block design of our curriculum or as longitudinal subject areas that run across the IP curriculum, they were adopted as the 20 subject areas for category five. The study guide also contains specific sub-topics for each of the 20 subject areas. We reviewed the sub-topics from the USMLE Step 1 study guide and compared them with an internal set of learning objectives and topics that we use within our curriculum. These two lists were combined and modified as necessary to create the sub-categories that comprise category six. A sampling of those sub-categories is presented in Table 2. A set of sub-categories was created for and matched to each of the 20 subject areas of category five.
Table 2 contains a sampling of the sub-categories used in the TEDEQ system. In total there are 290 sub-categories within the system that are associated with one of the 20 USMLE subject areas. Faculty were required to select one sub-category for each subject area they associated with a question.
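To make the coding scheme above concrete, the following is a minimal sketch of how a TEDEQ tag could be assembled in code. It is an illustration only: the field, class, and function names are ours, and the numeric codes assumed for categories four through six are hypothetical placeholders rather than the actual TEDEQ option lists.

```python
# Minimal sketch of assembling a TEDEQ tag from the six categories
# described above. The "Neuro1.Smith.1" example in the text covers
# categories one through three; the remaining numeric codes here are
# illustrative placeholders, not the real option lists.
from dataclasses import dataclass


@dataclass
class TedeqTag:
    block_division: str   # category 1, e.g. "Neuro1"
    faculty: str          # category 2, e.g. "Smith"
    question_type: int    # category 3: 1=recall, 2=interpretation/analysis,
                          #             3=basic science vignette,
                          #             4=clinical science vignette
    process_focus: int    # category 4: broad process/focus code
    subject_area: int     # category 5: one of the 20 USMLE subject areas
    sub_category: int     # category 6: sub-topic within that subject area

    def code(self) -> str:
        """Join the category values with periods, as described in the text."""
        parts = (
            self.block_division,
            self.faculty,
            self.question_type,
            self.process_focus,
            self.subject_area,
            self.sub_category,
        )
        return ".".join(str(p) for p in parts)


# Example: a recall question by Dr. Smith in Neuroscience division 1,
# with hypothetical codes for the remaining categories.
tag = TedeqTag("Neuro1", "Smith", 1, 1, 12, 3)
print(tag.code())  # -> "Neuro1.Smith.1.1.12.3"
```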
Faculty buy-in was key to the successful implementation of the system. As such, we met with the faculty members who serve as block leaders within the curriculum to explain the categorization system, how it worked, and what it could do for them. We provided all block leaders with instructions detailing the guidelines for applying the categorization to their exams. Faculty members were instructed that they could assign only one option from each category for categories one through four. A question could be assigned to up to three subject categories (category five); however, for each designation in category five a corresponding sub-category designation in category six is required. The number of sub-categories per subject category ranged from 5 to 25.

Rather than attempt to force a curriculum-wide application of the TEDEQ system simultaneously, we chose to work the application into the existing framework of the exam process. Therefore, categorization was required for each examination as it was generated during the academic year. As a part of normal preparation and revision, application of the TEDEQ categorization was added to the examination development process. A report was generated using ExamSoft that served as a template for faculty members to review all questions appearing on the exam. The report included the item analysis for each question (if available from previous assessment records), and a section was provided for assignment of the TEDEQ code to each question (Figure 1). The TEDEQ database was generated based upon the codes assigned to each exam item. The database could be used retroactively, since any question used on a previous assessment in ExamSoft would be recognized and assigned TEDEQ categories. This allowed second year medical students to review category performance on previous exams from their first year of medical school.

Figure 1 - Sample Question with TEDEQ Categorization
Figure 1 contains an example from the report faculty members use to categorize exam questions. It displays the item analysis of the question (from its use on the previous year's exam), the question text, the image (if any) associated with the question, and the categories associated with each question.
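As a rough illustration of these assignment rules (one option each for categories one through four, one to three subject areas, and a matching sub-category for every chosen subject area), a validation check could look something like the sketch below. All names and the sub-category lookup structure are assumptions for illustration; they do not reflect the actual ExamSoft categorization interface.

```python
# Hedged sketch of the TEDEQ assignment rules described above.
from typing import Dict, List, Set, Tuple


def validate_assignment(
    block_division: str,
    faculty: str,
    question_type: int,
    process_focus: int,
    subjects: List[Tuple[str, str]],          # (subject area, sub-category) pairs
    subcategories_by_subject: Dict[str, Set[str]],
) -> List[str]:
    """Return a list of rule violations (empty if the tagging is valid)."""
    errors: List[str] = []
    # Categories one through four: exactly one value each.
    for name, value in [
        ("block/division", block_division),
        ("faculty", faculty),
        ("question type", question_type),
        ("process/focus", process_focus),
    ]:
        if value in (None, ""):
            errors.append(f"category '{name}' requires exactly one option")
    # Category five: between one and three subject areas per question.
    if not 1 <= len(subjects) <= 3:
        errors.append("a question must carry one to three subject areas")
    # Category six: every chosen subject area needs a matching sub-category.
    for subject, sub in subjects:
        if sub not in subcategories_by_subject.get(subject, set()):
            errors.append(f"'{sub}' is not a sub-category of '{subject}'")
    return errors


# Example usage with a hypothetical sub-category table:
subcats = {"Pharmacology": {"Autonomic drugs", "Antimicrobials"}}
print(validate_assignment("Neuro1", "Smith", 1, 1,
                          [("Pharmacology", "Autonomic drugs")], subcats))  # -> []
```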
Results

The TEDEQ method has been implemented successfully across the Med 1 and Med 2 IP curriculum for the 2010-2011 academic year. It has been well received by both faculty and students, and we have successfully gathered data on all of our division assessments using this system. While data collection is continuing, there have been some immediate applications of the information provided by TEDEQ. As a feature of the ExamSoft software, students can instantly view a breakdown report of their individual performance in every category applied to exam items (Table 3).

Table 3 - Sample Exam Performance Breakdown
Table 3 is a sample of the TEDEQ report generated following each exam, which breaks down the performance of each student on that exam. The same report is also generated for faculty and staff; however, it also contains the breakdown of overall class performance for comparison purposes.
The students can access these reports at any time after an exam by logging into a website, and can view and compare their results from exam to exam throughout the course of the year. Course leaders receive an identical report, but the results represent an aggregate for the class as a whole rather than an individual student. An individual student's performance data can also be readily plotted in comparison to class average performance for any coded category, which serves as an additional aid for students in evaluating their academic success in the IP program (Figure 2). For example, a student would be able to evaluate their performance on recall questions in comparison to clinical vignettes, or in particular subject areas such as pharmacology. This information can be provided to the student for each exam, as shown in Figure 2, as well as a cumulative average across exams as the student progresses through the curriculum. These individual student performance reports (Table 3 and Figure 2) can be generated for faculty and staff review as necessary for academic advising.

Figure 2 - Sample Student Performance Analysis
Figure 2 illustrates comparative student performance in four of the six TEDEQ categories against the performance of the whole class. This report is used by students and faculty to distinguish areas of strength and weakness in student performance.
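The per-category breakdowns in Table 3 and Figure 2 amount to computing, for each TEDEQ category, the percentage of tagged items a student answered correctly and the corresponding class average. A minimal sketch of that calculation is shown below; the data layout and function name are assumptions for illustration and are not the ExamSoft report format.

```python
# Sketch: percent-correct per TEDEQ category for each student, plus a
# class average per category, from item-level responses tagged with a
# category. Illustrative only; not the ExamSoft report implementation.
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def category_breakdown(
    responses: Iterable[Tuple[str, str, bool]],  # (student_id, category, correct)
) -> Dict[str, Dict[str, float]]:
    """Return {category: {student_id: percent correct}} over the given responses."""
    correct = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for student, category, is_correct in responses:
        total[category][student] += 1
        if is_correct:
            correct[category][student] += 1
    return {
        cat: {s: 100.0 * correct[cat][s] / n for s, n in students.items()}
        for cat, students in total.items()
    }


# Example: two students, items tagged with a question-type category.
responses = [
    ("A", "Recall", True), ("A", "Recall", False), ("A", "Clinical Vignette", True),
    ("B", "Recall", True), ("B", "Recall", True), ("B", "Clinical Vignette", False),
]
breakdown = category_breakdown(responses)
class_avg = {cat: sum(v.values()) / len(v) for cat, v in breakdown.items()}
print(breakdown["Recall"]["A"], class_avg["Recall"])  # -> 50.0 75.0
```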
Discussion

The TEDEQ method we developed for creating a linked database of information relating exam items to curricular content was implemented with several specific applications in mind. At the time of development, we did not have the ability to accurately monitor how well examination content addressed instructional objectives, nor to track student performance on specific longitudinal foundational science subjects that cross blocks of the curriculum and appear on multiple assessments. Therefore, the initial goal was to create a centralized repository of information that allows for both improved quality control of assessments and more detailed tracking of student performance in selected subject areas that span the curriculum. The preclinical IP program is organized around organ systems, and integrates normal structure/function with pathophysiology and clinical aspects of disease.

The TEDEQ coding system begins by identifying the temporal location and source of each question within the curriculum. This is followed by two broad categories that define general properties of each question with relation to the cognitive skills and process that the item is addressing. Each item is then assigned to a subject area corresponding to the classification of topics used by the USMLE both to guide student preparation for the Step 1 licensure examination and to break down student performance. The final level of granularity in the tagging code indicates the most relevant sub-categories within each subject area based upon curricular learning objectives. In many, but not all, cases this final level of classification is directly related to the categories also defined by the USMLE for Step 1. Utilizing a coding system that closely aligns with the USMLE Step 1 categories not only allows for the collection of the information necessary to link internal learning objectives with the assessment content, but also provides an opportunity to directly compare and analyze student performance in discipline or subject specific components of the integrated curriculum with performance on the Step 1 examination. This provides an opportunity to identify areas of curricular content that excel in preparing students for the licensure examinations, or those areas that may need attention and improvement in order to better prepare the students.

The TEDEQ database provides a critical component necessary to blueprint the medical curriculum, namely an exam blueprint,3,9 and the corresponding content validity.10,11 Another important aspect of aligning content between the curriculum and the examination is that it provides greater relevance to the assessment.19 For example, the feedback report TEDEQ enables us to generate provides the student with immediate information relevant to their success in the curriculum. In addition, these performance reports allow the student learner to more easily visualize how the knowledge base builds upon itself as they progress through the curriculum, especially in longitudinal subject areas, providing academic relevance. Since the TEDEQ subject breakdown relates their current performance in the curriculum to topics they will encounter in future licensure examinations, the report provides an additional tool the student can use to gauge readiness and develop their study plans for USMLE Step 1 preparation. Thus the current curricular examination gains future relevance to the student. Together, feedback reports generated by TEDEQ contribute to providing greater authentic relevance to the assessment process.19

The TEDEQ reports have already been used by the teaching faculty to identify selected areas of curricular content in which students are not performing as well as expected on the assessments. For example, we have identified sub-categories within the gross anatomy content that indicate students are having difficulty with specific anatomical regions. Having identified these topics, anatomy faculty are designing additional e-learning objects targeting these topics that will be available to the incoming class of students to supplement the material currently presented in the gross anatomy component of the curriculum.

In summary, the TEDEQ database provides a powerful tool for curricular management that is easy for faculty to implement. The system was developed within the ExamSoft software, which has been used to deliver computerized medical examinations at our institution for the past two years. Information provided by the TEDEQ database allows for exam blueprinting, which serves as a source of additional information necessary to meet accreditation guidelines for curricular content management. The exam item breakdown reports provide useful performance feedback to the students as well as faculty instructors. The feedback information can be used to guide student remediation, inform student study habits, and direct curricular modification. In the future, we plan to use the TEDEQ database to guide the design and assembly of higher quality examinations within the preclinical medical curriculum.
References

1. Wallach PM, Crespo LM, Holtzman KZ, Galbraith RM, Swanson DB. Use of a committee review process to improve the quality of course examinations. Adv Health Sci Educ 2006;11:61-8.
2. Jozefowicz RF, Koeppen BM, Case S, Galbraith R, Swanson D, Glew RH. The quality of in-house medical school examinations. Acad Med 2002;77:156-61.
3. Hamdy H. Blueprinting for the assessment of health care professionals. Clin Teach 2006;3:175-9.
4. Chandratilake MN, Davis MH, Ponnamperuma G. Evaluating and designing assessments for medical education: the utility formula. Internet J Med Educ 2010;1:1-9.
5. Denny JC, Smithers JD, Armstrong B, Spickard III A. "Where do we teach what?" Finding broad concepts in the medical school curriculum. J Gen Intern Med 2005;20:943-6.
6. Willett TG, Marshall KC, Broudo M, Clarke M. TIME as a generic index for outcome-based medical education. Med Teach 2007;29:655-9.
7. Willett TG, Marshall KC, Broudo M, Clarke M. It's about TIME: a general-purpose taxonomy of subjects in medical education. Med Educ 2008;42:432-8.
8. Salas AA, Anderson MB, LaCourse L, Allen R, Candler CS, Cameron T, Lafferty D. CurrMIT: a tool for managing medical school curricula. Acad Med 2003;78:275-9.
9. Bridge PD, Musial J, Frank R, Roe T, Sawilowsky S. Measurement practices: methods for developing content-valid student examinations. Med Teach 2003;25:414-21.
10. McLaughlin K, Coderre S, Woloschuk W, Mandin H. Does blueprint publication affect students' perception of validity of the evaluation process? Adv Health Sci Educ 2005;10:15-22.
11. Coderre S, Woloschuk W, McLaughlin K. Twelve tips for blueprinting. Med Teach 2009;31:322-4.
12. Lynn MR. Determination and quantification of content validity. Nurs Res 1986;35:382-5.
13. Yaghmale F. Content validity and its estimation. J Med Educ 2003;3:25-7.
14. Kies SM, Williams BD, Freund GC. Gender plays no role in student ability to perform on computer-based examinations. BMC Med Educ 2006;6:57.
15. Hall JR, Weitz FI. Question database and program for generation of examinations in national board of medical examiner format. Proc Annu Symp Comput Appl Med Care 1983;26:454-6.
16. Peterson MW, Gordon J, Elliott S, Kreiter C. Computer-based testing: initial report of extensive use in a medical school curriculum. Teach Learn Med 2004;16:51-9.
17. ExamSoft [http://examsoft.com/main/index.php?option=com_content&view=article&id=33&Itemid=7#NEWS12].
18. United States Medical Licensing Examination Website [http://www.usmle.org/examinations/step1/2011step1.pdf].
19. D'Eon M, Crawford R. The elusive content of the medical-school curriculum: a method to the madness. Med Teach 2005;27:699-703.