Competence based exam for prospective driving instructors



This paper describes a recently designed exam for prospective Dutch driving instructors. What does instructor competence encompass? What types of tests were employed? What level of proficiency is required?

Published in: Automotive

Development and Evaluation of a Competence Based Exam for Prospective Driving Instructors

Erik Roelofs (1), Maria Bolsinova (1), Marieke van Onna (1), Jan Vissers (2)
(1) Cito, National Institute for Educational Measurement, The Netherlands
(2) Royal Haskoning DHV, Amersfoort, The Netherlands

Introduction

A growing consensus among driver training and road safety researchers is that driver training should place greater emphasis on higher-order, cognitive and motivational functions underlying driving behaviour (Mayhew and Simpson, 2002; Hatakka et al., 2002). This changed conception of driver training has been laid down in the Goals for Driver Education matrix (Hatakka et al., 2002). Recent research seems to support this consensus (Isner, Starkey, & Sheppard, 2011; Beanland, Goode, Salmon, & Lenné, 2013). Innovative training initiatives appear to counteract overconfidence and address motivational factors such as driving anger, sensation seeking, and boredom (e.g. Isner et al., 2009).

Parallel to the doubts raised about the quality of driver training, the quality of driver instructor preparation programs has been criticized. The MERIT review study (Bartl, Gregersen, & Sanders, 2005) showed that huge variations existed in the quality of driver instructor education throughout Europe. The content did not cover higher order skills. Most programs relied on teacher-focused approaches, which seem to fall short in developing higher order skills.

In many European countries the quality of the education of driving instructors is regulated by means of the instructor exam. One may view this as a problem, but on the other hand it also offers opportunities for improvement. A valid and reliable exam that only allows proficient prospective instructors to enter the profession may have a positive backwash effect on driver instructor education programs, as in other fields of education: teachers teach and students learn what will be tested (Crooks, 1988; Fredericksen & Collins, 1989; Madaus, 1988).

In the Netherlands, the first steps in this direction have been made in the last ten years. As part of a new law on driving education in 2003, competence-based outcome standards for prospective driving instructors have been formulated (Nägele, Vissers, & Roelofs, 2006). The most far-reaching change underlying these standards has been the emphasis on performance in critical job situations with real learner drivers. In addition, supporting knowledge was defined in terms of relevant concepts, principles and decision making skills to be applied in authentic instructional situations.

Based on the standards, a two-stage competence-based exam was designed and put into action in the fall of 2009. Since then, over 4000 prospective driving instructors (PDIs) have gone through one or more tests. The question is whether the assessments have resulted in valid and fair decisions about PDIs. Regarding the tenability of decisions, this paper focuses on the separate theoretical assessments that make up stage 1. In addition, their predictive value for instructor performance as demonstrated at the final performance assessment lesson (stage 2) is studied.
The fairness question concentrates on the comparability of different versions of assessments. In the exam under study, item banks are used, from which different sets are drawn to compose exam versions to prevent effects of item exposure and cheating. The question then arises whether one cut-off score implies the same level of required proficiency for different versions. To answer it, psychometric equating methods are commonly used to determine how scores on two different tests can be projected onto one (latent) scale (Kolen & Brennan, 1995).

In summary, four research questions are addressed:
1. To what extent are the individual parts of the exam psychometrically reliable?
2. To what extent do the different theoretical tests intercorrelate?
3. To what extent do results on theoretical tests and performance assessment for instructional ability correlate?
4. Does the cut-off score used across different versions of the theoretical tests reflect equivalent required levels of proficiency?

Design Features of the Competence Based Exam

The exam consists of two stages. The first stage comprises the assessment of the theoretical knowledge base of prospective instructors regarding driving and driving pedagogy. After having passed the first stage, PDIs receive a provisional instructor license enabling them to enrol in a half-year internship at a (certified) professional driving school. In the second stage, after having finished their internships, PDIs are judged on their professional instructional abilities during a masterpiece lesson involving one of their own learner drivers, whom they have been teaching as an intern. If they pass, they will get a full license for the next five years. Below the design features are described. A summary of all exam parts is provided in Table 1.

The Design Process

All individual assessments of the exam were designed by means of the evidence-centred design model (ECD; Almond, Steinberg, & Mislevy, 2002; Mislevy & Haertel, 2006).
The ECD model identifies five layers in the design process: domain analysis, domain modelling, conceptual assessment framework, assessment implementation, and assessment delivery. These design layers were worked through successively, in a continuous dialogue between the assessment designers and different stakeholders: a board of instructor educators, exam institutes, ICT specialists, psychometricians, educational scholars, academic teacher educators, driving examiners, and driving instructors.

The Conceptual Assessment Framework

In the exam, the layer of the conceptual assessment framework was of central importance for assessment task design. The conceptual assessment framework helps to sort out the relationships among attributes of a candidate's competence, observations which show competence and situations which elicit relevant driver performance. The central models for task design are the Competence or Student Model, the Evidence Model, and the Task Model.

The driving instructor competence model. The Competence Model encompasses variables representing the aspects of instructor competence that are the targets of inference in the assessment, and their inter-relationships. Starting from a literature search on what comprises good teaching in general and, more specifically, driving instruction, a competence model was constructed. This resulted in the formulation of four domains of competence, summarized in Table 1: 1) Conscious traffic participation as first and second driver, 2) Lesson preparation, 3) Instruction and coaching, 4) Evaluation, reflection and revision.

A model of competent task performance formed the basis for two competence models: driving competence (Roelofs, Van Onna, & Vissers, 2010) and instructional competence (Roelofs & Sanders, 2007). A basic tenet in the model (see Figure 1) is that instructor competence is reflected in the consequences of an instructor's actions. The most important consequences are students' learning activities during delivery of instruction or coaching, and safety, flow and comfort in traffic during driving.

Starting from the consequences, the remaining elements of the model can be mapped backwards. First, the component 'actions' refers to professional activities, e.g. delivering instruction or providing coaching to students. Second, any instructor activity takes place within a specific context in which many decisions need to be made, on a long-term basis (planning ahead) or immediately during an in-car situation. For instance, instructors will have to plan their instruction and adapt it depending on differing circumstances (e.g. different learning paces, traffic situations). Third, when making decisions and performing activities, teachers will have to draw from a professional knowledge base (e.g. pedagogical principles, psychology of driving, rules and regulations).

[Insert Figure 1]

Assessment task models and exam assembly. The Task Model describes the kinds of assessment tasks (items) that embody an assessment (test). They follow directly from the cognitive activity and interactive activity described in the competence model. Three types of assessment tasks were designed.

1) Case based items (Schuwirth et al., 2000; Norman, Swanson, & Case, 1996).
These items address knowledge of concepts and cause-effect rules in cases embedded in a rich driving instruction context. These questions were used for theoretical assessment in all four task domains. An example of an item is: "An instructor starts a lesson with an explanation on the first topic. He does not preview what the learner driver is going to be able to do at the end of the lesson. What is the most likely consequence of his approach?" To respond, the PDI would choose out of four options: A. the learner will learn less than desirable, B. the learner will not fully understand your explanation, C. the learner will have less time to practice new driving tasks, D. the learner cannot direct his attention to the essential parts of the lesson (correct).

2) Situational judgment items (Whetzel & McDaniel, 2009). These items address decision making skills in a rich driving instruction context. An example regarding lesson planning is: "The learner driver is starting to learn manoeuvring in traffic situations with low traffic intensity as illustrated on the picture. During the upcoming lesson you are going to instruct how to park backwards into a parking bay. Which of the parking bays [pictures shown with or without other vehicle parked alongside an empty bay] can you choose best for a learner driver in this stage of driver education?" To respond, the PDI would choose one out of four pictures. One option is optimal, one other is suboptimal, and the remaining two are unacceptable.

3) Performance assessment assignments. To address the PDI's own driving competence and his ability to verbalize mental task processes (perception, anticipation, decision, action execution), the PDI had to complete a 60 minute drive, on which his performance was judged by a trained assessor using a standardized scoring form. Five performance criteria were employed on this form: 1) safety, 2) aiding traffic flow, 3) driving socially considerately, 4) eco-friendly driving, and 5) vehicle control (Roelofs, Van Onna, & Vissers, 2010). At two intermediate stops the PDI was asked to verbalize the mental processes that went on while solving the previous traffic situations.

Finally, professional actions regarding instruction, coaching and evaluation were judged by having the PDI carry out a lesson with a real learner driver. The lesson performance was judged by trained assessors. To this end a 34 item scoring form was used to judge the quality of coaching and instruction.

Evidence model. The Evidence Model describes how to extract the key items of evidence from instructor behaviour, and the relationship of these observable variables to the competence model variables. All three theory tests consisted of 60 multiple choice items. The Theory of Driving Test consisted of one-best-answer type items, scored dichotomously (0, 1). The cut-off score for passing the test was 42 items correct. Both the Theory of Lesson Preparation Test and the Theory of Instruction and Coaching Test included situational judgment items, which were scored with partial credit. The best answer yielded 7 points, a suboptimal answer yielded 3 points, while the distractors yielded no points. The case-based knowledge items had one best answer; these items were scored 0 or 7, for incorrect and correct answers respectively. The cut-off score for passing each of these tests was 266 points (out of 420 points).

The items on the Performance Assessment Lesson were scored on a three point scale, representing 'counterproductive performance', 'beginning productive performance' and 'optimal performance'. A detailed scoring guide was available for assessors.
Initial rater agreement scores using Gower's similarity index (Gower, 1971) showed acceptable levels of agreement, .67 for instruction and .75 for coaching. The cut-off score for passing the performance assessment was 71 points (out of 102 points), under the condition that on no more than one out of seven categories the results were below the specified cut-off score (Mean=2.0).

Methods

Subjects and Data

All data from candidates who enrolled in the program between January 1st 2010 and October 1st 2012 were selected. Data from 2009 were discarded, because the computer-based assessment platform was not completely stable at that time. This resulted in assessment data for 4741 prospective driving instructors: 79 per cent were male and 21 per cent female. The mean age was 34.9 years (SD=10.9). Of them, 3079 (74.4%) were born in the Netherlands. The remaining 25.6% originally came from 79 different countries. The majority of them were immigrants from Morocco (n=199), Suriname (n=190), Turkey (n=151), Afghanistan (n=112), Iraq (n=89), and Iran (n=46).

A total of 4644 PDIs completed at least one of the theory tests. Of them, 2977 passed all their theory exams, and 1941 of these took part in the Performance Assessment Lesson for instruction and coaching. Of the remaining PDIs, about half (n=508) had not participated in the Performance Assessment Lesson more than a year after their last successful theory test. The other part (n=528) did not finish their internship. In addition, 368 PDIs received dispensation to participate in the performance assessment although they had failed one of the theory tests. In total, 2315 PDIs participated at least once in the Performance Assessment Lesson.

Analyses

IRT analyses were performed on the data of the three theory tests. Each test had been administered in many different versions, each based on a representative sample of 60 items drawn from an item bank. Available software does not allow IRT analysis of a very large number of versions of a test (there were more than 700 versions of each test), especially if some of these versions were taken by only one PDI. Therefore, for each of the theory tests only versions with a substantial number of participants were selected. Table 2 shows the number of test versions, the total number of items and the sample size chosen for analysis.

[Insert Table 2]

For the tests which included items with partial scoring (0, 3, 7), a partial credit model was fitted first. The model had a very poor fit in both cases. Responses to all items in the Theory of Instruction and Coaching Test and the Theory of Lesson Preparation Test were therefore dichotomized, because this makes it possible to fit more flexible IRT models that can fit the data better. All responses for which 7 points were given were recoded as 1 (correct response), and all others as 0 (incorrect response). The number of items correct had a very high correlation with the original number of points (.986 for both tests). A new cut-off score for the number of items correct was chosen in such a way that the number of misclassified subjects was minimal, that is, subjects having less than 266 points but passing the test under the new cut-off score, plus subjects having more than 266 points but failing the test under the new criterion.
For both tests, the new cut-off score was 35 items correct, which gave 4.65% misclassifications for the Theory of Instruction and Coaching Test and 3.95% for the Theory of Lesson Preparation Test.

A one parameter logistic model (OPLM; Verhelst & Glas, 1995; Verhelst, Glas, & Verstralen, 1995) was fitted to the data of each theory test. In the Theory of Driving Test, one of the items was excluded from the analysis because it was answered correctly by all PDIs. In the Theory of Instruction and Coaching Test, two items were excluded because they had strong negative correlations with the test score. The model had a reasonable fit for all three tests.

From the perspective of IRT, the concept of reliability differs from that in Classical Test Theory. The measurement precision depends on the position on the latent ability scale. The ability of a subject for whom all test items are too difficult will be measured with a larger error than the ability of someone for whom the difficulty of the items is appropriate. In the case of the three theory tests, it is important to measure accurately at the ability level corresponding to the cut-off score, which represents the pass-fail boundary. The standard error of the ability estimate corresponding to the required number of items correct (SE) was chosen as a measure of reliability. To enable interpretation of this value, it has to be compared with the standard deviation of ability in the population (SD). As a measure, we look at the so-called proportion of true variance: (SD² − SE²)/SD². Additionally, as a measure of global accuracy of measurement, the MAcc index reported by the OPLM software was used, which is the expected reliability coefficient alpha as defined in classical test theory for a given population of PDIs.
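The recoding and cut-off search described in this section can be illustrated with a minimal Python sketch. It is not the authors' procedure or software: the helper names and the four toy candidates are hypothetical, a candidate is assumed to pass at 266 points or more, and ties between equally good cut-offs are broken by taking the lowest one.

```python
from typing import List, Sequence, Tuple

POINTS_CUTOFF = 266  # pass mark in points (out of 60 * 7 = 420), taken from the text
FULL_CREDIT = 7      # best answer; 3 = suboptimal answer, 0 = distractor


def make_responses(n_best: int, n_suboptimal: int, n_items: int = 60) -> List[int]:
    """Hypothetical helper: build one candidate's item scores for a 60-item test."""
    n_zero = n_items - n_best - n_suboptimal
    return [7] * n_best + [3] * n_suboptimal + [0] * n_zero


def total_points(responses: Sequence[int]) -> int:
    """Original partial-credit score (sum of 0/3/7 item scores)."""
    return sum(responses)


def items_correct(responses: Sequence[int]) -> int:
    """Dichotomized score: only full-credit answers count as correct."""
    return sum(1 for r in responses if r == FULL_CREDIT)


def misclassified(candidates: Sequence[Sequence[int]], item_cutoff: int) -> int:
    """Count candidates whose pass/fail outcome differs between the two rules.

    Assumption: passing on points means reaching at least POINTS_CUTOFF.
    """
    return sum(
        (total_points(r) >= POINTS_CUTOFF) != (items_correct(r) >= item_cutoff)
        for r in candidates
    )


def best_item_cutoff(candidates: Sequence[Sequence[int]], n_items: int = 60) -> Tuple[int, int]:
    """Return (cut-off in items correct, number misclassified); lowest cut-off wins ties."""
    best = min(range(n_items + 1), key=lambda c: (misclassified(candidates, c), c))
    return best, misclassified(candidates, best)


# Toy data, invented for illustration only.
toy = [
    make_responses(38, 0),  # 266 points, 38 items correct
    make_responses(35, 7),  # 266 points, 35 items correct
    make_responses(34, 0),  # 238 points, 34 items correct
    make_responses(36, 0),  # 252 points, 36 items correct
]
print(best_item_cutoff(toy))  # (35, 1) for this toy data
```

With real response data the same search would be run per test, and the share of misclassified candidates would correspond to the percentages reported above.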
Using an IRT model makes it possible to locate all different versions of the test on the same scale, and therefore to compare different versions by their difficulty and the level of ability needed to pass the test. For each test version the level of ability at the cut-off score was estimated. Unlike the raw test scores, the latent estimates from different test versions are directly comparable. All latent abilities were scaled in such a way that the population distribution had a mean of 100 and a standard deviation of 15.

To analyse relations between the three theory tests, a subsample of PDIs (n=1980) who had taken all three tests was selected. Correlations between the three latent ability scores were computed.

To determine the reliability of the final Performance Assessment Lesson, all available raw data (n=580) for PDIs were obtained and processed into data files. The exam institute only kept the final pass/fail outcome in their data files, but still had score forms of 580 PDIs. Principal component analysis was carried out to reduce the data to a limited number of interpretable factors, yielding maximally reliable criterion variables. Correlations between the three latent abilities and the scores on the resulting scales for instruction and coaching were computed.

Results

Figure 2 shows a normal distribution of ability scores (mean=100; SD=15). In this figure the cut-off scores for the 14 most frequently administered versions of the Theory of Driving Test are plotted. The dots represent the cut-off scores for each test version expressed in terms of ability (theta). The lines represent the standard errors around the cut-off score. Two things can be noted. First, the cut-off scores for all test versions fall below the average ability in the total population. The mean cut-off score (M=85.3, see Table 2) is almost one standard deviation below the ability mean of the population. This means that relatively low ability (M=86.5) was needed to pass the test. Second, there are small differences between the required ability levels for different test versions, but the variation of the cut-off levels (SD=3.22) across versions is small compared to the standard error.

[Insert Figure 2]

Similarly, Figure 3 shows the plotted cut-off ability scores for the 15 most frequently administered versions of the Theory of Lesson Preparation Test. As can be noted from the figure, the differences between the cut-off scores of versions are larger than for the Theory of Driving Test. Versions 1 and 6 differ considerably: version 1 requires an ability level of 78, whereas version 6 requires 92. The mean cut-off score (M=85.3) is again below the ability mean of the population.

[Insert Figure 3]

[Insert Figure 4]

The results regarding the Theory of Instruction and Coaching Test show a similar pattern (see Figure 4): the required ability levels are below the mean ability (M=90.2, see Table 3), but less markedly so than for the other theory tests. The different test versions show different levels of required ability, but these differences are small compared to the standard error.
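To make the position of these cut-offs on the ability scale more concrete, the short Python sketch below converts a cut-off into the share of the population expected to fall below it. It assumes that ability is exactly normally distributed with mean 100 and SD 15, as in the scaling described above; the function names are illustrative, and the real candidate distribution may deviate from this assumption.

```python
import math


def normal_cdf(x: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def share_below(cutoff: float) -> float:
    """Expected share of the population with ability below the cut-off.

    Assumption: ability follows N(100, 15) exactly.
    """
    return normal_cdf(cutoff)


# Mean cut-off abilities as reported in the text.
for label, cutoff in [("mean cut-off of 85.3", 85.3), ("mean cut-off of 90.2", 90.2)]:
    z = (cutoff - 100.0) / 15.0
    print(f"{label}: z = {z:.2f}, share below = {share_below(cutoff):.1%}")
```

Under this assumption a cut-off of 85.3 lies 0.98 standard deviations below the mean, so roughly one in six candidates would be expected to fall below it, and a cut-off of 90.2 leaves about a quarter of the population below it.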
For the Theory of Driving Test the standard error at the cut-off score amounts to 10.01, whereas the values for the Theory of Lesson Preparation Test and the Theory of Instruction and Coaching Test amount to 9.56 and 7.74 respectively (see Table 3). All values fall within the region of good reliability. The MAcc coefficients reflect reasonable overall reliability for the whole test, whereas the average proportion of true variance at the cut-off score reflects questionable reliability for the first two tests and reasonable reliability for the Theory of Instruction and Coaching Test.

[Insert Table 3]

The correlation coefficients between the ability scores on the three theory tests show a moderate correlation of .56 between the Theory of Driving Test and the Theory of Lesson Preparation Test. Ability scores on the Theory of Driving Test correlate .43 with the ability scores on the Theory of Instruction and Coaching Test. Finally, ability scores on the Theory of Lesson Preparation Test correlate .29 with ability scores on the Theory of Instruction and Coaching Test.

Table 4 shows a psychometric report for the Final Performance Assessment Lesson.

[Insert Table 4]

Principal component analyses resulted in three clearly interpretable factors. Three fairly reliable scale scores could be composed (see Table 4): two representing aspects of coaching, Motivational Support (alpha=.77) and Diagnosis and Task Support (alpha=.79), and one representing Instruction (alpha=.83). The Motivational Support scale correlated .46 (p<.001) with Diagnosis and Task Support and .45 (p<.001) with Instructional Skill. Diagnosis and Task Support correlated .66 (p<.001) with Instructional Skill.

[Insert Table 5]

Table 5 shows the correlations between the ability scores on the theory tests and the scores on the Performance Assessment Lesson, for the three subscales and the overall assessment score. The correlation coefficients for the Theory of Driving Test do not differ significantly from zero. The ability scores for Lesson Preparation and Instruction/Coaching show six low but significant correlations with the subscales and the overall scale for the final performance assessment lesson (between .12 and .14; p<.05).

Discussion

The central question in this study was whether decisions made about prospective driving instructors, as they follow from the results on the theory tests of the innovated exam, are valid and fair.

First, it can be concluded that the overall reliability of estimated ability scores on the theory tests shows acceptable levels. The reliability around the cut-off scores was also acceptable, which seems most important, because this is where the pass/fail decisions are made. The Final Performance Assessment (lesson) also shows acceptable reliability in terms of the alpha value used in classical test theory.
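As a rough check on the reliability around the cut-off scores, the proportion of true variance, (SD² − SE²)/SD², can be computed from the standard errors reported in the Results. The Python sketch below assumes that those standard errors are expressed on the same scale as the population SD of 15; the function name is illustrative. Table 3 reports an average over test versions, which need not equal the value obtained from an average standard error, so the figures are indicative only.

```python
def proportion_true_variance(se: float, sd: float = 15.0) -> float:
    """(SD^2 - SE^2) / SD^2: share of ability variance not due to measurement error."""
    return (sd ** 2 - se ** 2) / sd ** 2


# Standard errors at the cut-off score as reported in the Results section.
reported_se = {
    "Theory of Driving Test": 10.01,
    "Theory of Lesson Preparation Test": 9.56,
    "Theory of Instruction and Coaching Test": 7.74,
}

for test_name, se in reported_se.items():
    print(f"{test_name}: {proportion_true_variance(se):.2f}")
```

Under these assumptions the three values come out at about .55, .59 and .73, which is in line with the reading that measurement near the pass-fail boundary is weakest for the first two tests.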
Second, the IRT models showed an acceptable fit, suggesting that the tests represent separable one-dimensional abilities. The moderate correlation between the Theory of Driving Test and the Theory of Lesson Preparation Test shows that knowledge of the traffic task is an important predictor of knowledge and decision making regarding lesson preparation. To a lesser extent, high (or poor) achievement on lesson preparation is related to similar performance on instruction and coaching.

Third, the predictive value of theory test performance for in-car instructional and coaching performance was very low. The ability scores for lesson planning and instruction and coaching show only very low, although significant, correlations with in-car performance for coaching and instruction. An explanation for this finding may be that only those who passed the stage one theory exams are allowed to go through the final assessment, so the range of theory scores among them is restricted. In addition, the effect of half a year of internship may have washed out initial differences between PDIs.

Regarding fairness, the question was whether the different versions of the theory tests required the same level of proficiency to pass. A first finding was that the cut-off scores for pass-fail decisions for the theory tests were well below the average ability level of the population, implying relatively low ability requirements. The Theory of Instruction and Coaching Test and the Theory of Driving Test had comparable cut-off scores across versions and were hence equivalent in their ability requirements. For Lesson Preparation there were larger differences in required ability across test versions, although the differences fell within the range of the standard errors.

As far as the construction and delivery process is concerned, some problems need further attention. Many of these are related to the way the assessments are delivered.
The exam is administered by computer at an exam office, involving item banks from which different versions are drawn.

To reach representativeness, all versions need to reflect all sub domains (for each test at least 9), mental activities (perception, decision making, action, cause-effect reasoning, concept recognition), and critical situations (learner characteristics, stage of acquisition, traffic situation) that are distinguished. The relatively small size of the item bank resulted in the frequent reuse of items, which may have led to overexposure of the items and, in turn, to lower item difficulties.

In addition, we observed that part of the items had poor quality, i.e. low or highly negative item-test correlation coefficients, and extreme p-values (near zero or one). In the current examination practice poor items were not excluded a posteriori from the tests, because shorter test versions would not have been accepted by stakeholders. However, it would have been defensible to estimate ability levels based on a smaller 'cleaned' subset of items, yielding a more reliable and still representative score.

An optimal approach to warrant acceptable item quality is to pre-test all items on a representative sample of target candidates before putting them into item banks. This, however, seems problematic because of the risk of early item exposure. In addition, exam costs would rise. In general, however, it can be recommended to use exam data to redesign and improve the exam on the fly. Following the evidence-centred design model of Mislevy and colleagues, many questions can be answered on the fly: does the competence model reflected in the IRT model fit? Are any changes needed? Do the cut-off scores represent what we want PDIs to know and be able to do? Can certain item characteristics be traced back to the way the item was designed? In short, using an evidence-centred design model in combination with on the fly improvements can improve our decisions about prospective driving instructors.

In follow-up research, we intend to take a closer look at other parts of the exam, the functioning of different item types, the way items are presented, the stimuli used in items, the responses that are asked and the way these are related to estimates of PDIs' abilities.

To evaluate the long term effects of the innovated exam on instructional practice, learner driver gain and crash involvement, longitudinal research will be necessary. In such a study one should take into account the quality of all subsequent educational interventions and related driver activities to determine whether there is a case for driver training (Beanland et al., 2013).

References

Almond, R. G., Steinberg, L. S., & Mislevy, R. J. (2002). Enhancing the design and delivery of assessment systems: A four-process architecture. Journal of Technology, Learning, and Assessment, 1(5).
Bartl, G., Gregersen, N. P., & Sanders, N. (2005). EU MERIT Project: Minimum Requirements for Driving Instructor Training. Final Report. Vienna: Institut Gute Fahrt.
Beanland, V., Goode, N., Salmon, P. M., & Lenné, M. G. (2013). Is there a case for driver training? A review of the efficacy of pre- and post-licence driver training. Safety Science, 51, 127-137.
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438-481.
Fredericksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27-32.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 857-871.
Hatakka, M., Keskinen, E., Gregersen, N. P., Glad, A., & Hernetkoski, K. (2002). From control of the vehicle to personal self-control; broadening the perspectives of driver education. Transportation Research Part F: Traffic Psychology and Behaviour, 5(3), 201-215.
Isner, R. B., Starkey, N. J., & Sheppard, P. (2011). Effects of higher-order driving skill training on young, inexperienced drivers' on-road driving performance. Accident Analysis & Prevention, 43, 1818-1827.
Isner, R. B., Starkey, N. J., & Williamson, A. R. (2009). Video-based road commentary training improves hazard perception of young drivers in a dual task. Accident Analysis & Prevention, 41, 445-452.
Kolen, M. J., & Brennan, R. L. (1995). Test Equating. New York: Springer.
Madaus, G. F. (1988). The influences of testing on the curriculum. In L. N. Tanner (Ed.), Critical issues in curriculum (pp. 83-121). Eighty-Seventh Yearbook of the National Society for the Study of Education, Part I. Chicago: University of Chicago Press.
Mayhew, D. R., & Simpson, H. M. (2002). The safety value of driver education and training. Injury Prevention, 8, 3-8.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3-67.
Nägele, R., Vissers, J., & Roelofs, E. C. (2006). Herziening WRM: een model voor competentiegericht examineren. Eindrapport [Revision Law on motor vehicle education: a model for a competence based exam]. Dossier: X4691.01.001; Registratienummer: MV-SE20060477. Amersfoort/Arnhem: DHV, Cito.
Norman, G., Swanson, D. B., & Case, S. M. (1996). Conceptual and methodological issues in studies comparing assessment formats. Teaching and Learning in Medicine, 8(4), 208-216.
Roelofs, E., & Vissers, J. (2008). Toetsspecificaties theoretische deeltoetsen WRM-examen [Specifications of the theoretical exams for driver instructor candidates]. Arnhem: Cito; Amersfoort: DHV groep.
Roelofs, E. C., Onna, M. van, & Vissers, J. (2010). Development of the Driver Performance Assessment: Informing learner drivers of their driving progress. In L. Dorn (Ed.), Driver behavior and training, volume IV (pp. 37-50). Hampshire: Ashgate Publishing Limited.
Roelofs, E., & Sanders, P. (2007). Towards a framework for assessing teacher competence. European Journal for Vocational Training, 40(1), 123-139.
Schuwirth, L. W. T., Verheggen, M. M., van der Vleuten, C. P. M., Boshuizen, H. P. A., & Dinant, G. J. (2000). Validation of short case-based testing using a cognitive psychological methodology. Medical Education, 35, 348-356.
Verhelst, N. D., & Glas, C. A. W. (1995). The one parameter logistic model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 215-239). New York: Springer.
Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1995). OPLM: One Parameter Logistic Model. Computer program and manual. Arnhem: Cito.
Wang, M., Haertel, G., & Walberg, H. (1993). Toward a knowledge base for school learning. Review of Educational Research, 63, 249-294.
Whetzel, D. L., & McDaniel, M. A. (2009). Situational judgment tests: An overview of current research. Human Resource Management Review, 19, 188-202.
Table 1. Instructor competence profile and tests used for the Dutch prospective driver training exam

Assessment methods:
   Stage 1 (computer based): knowledge base and cognitive skills
   Stage 2 (on the job): performance assessment

1. Competent in conscious traffic participation
   1.1 Driving responsibly as a first driver. The driving instructor is able to drive a vehicle safely, smoothly, and in a socially considerate and eco-friendly way according to Dutch driving standards.
   1.2 Verbalizing mental processes of driving. The driving instructor is able to verbalize the mental task processes that take place when carrying out driving tasks in different traffic situations.
   Stage 1: Theory of driving test (60 items): traffic participation rules knowledge, case-based and situational judgment items (task domain 1.1).
   Stage 2: Performance assessment drive as a first and a second driver (all task domains cluster 1).

2. Competent in lesson preparation
   2.1 Adaptive planning. The driving instructor is able to construct an educational program for the long term (curriculum) and for the short term (lesson design) adapted to the needs of the individual learner driver (LD).
   2.2 Elaborating driving pedagogy. The driving instructor is able to prepare a driving specific pedagogical learning environment for learner drivers.
   2.3 Organizing learning. The driving instructor is able to organize lessons in such a way that activities run smoothly and without interruptions, ensuring a maximum amount of productive learning time.
   Stage 1: Theory of lesson preparation test (60 items): case-based concept application, reasoning and situational judgment items (all task domains cluster 2).

3. Competent in instruction and coaching
   3.1 Providing instruction. The driving instructor is able to provide instruction that is geared to the actual developmental level of the learner driver. It enables the LD to progress towards self-regulated performance in increasingly complex tasks.
   3.2 Providing coaching. The driving instructor is able to monitor learner driver development and guide the LD towards self-regulation in solving driving tasks and driving related tasks.
   Stage 1: Theory of instruction and coaching test (60 items): case-based concept application, reasoning and situational judgment items (all task domains cluster 3).

4. Competent in evaluation, reflection and revision
   4.1 Assessing learner progress. The driving instructor is able to assess the progress in driver competence by judging the level of performance himself and by using expertise of professional colleagues.
   4.2 Reflection and revision. The driving instructor is able to reflect on his own actions and use the results of this reflection for adapting his approach.

Stage 2 for competence domains 2, 3 and 4 (all task domains cluster 2, 3 and 4):
   1. Performance assessment lesson with real learner driver
   2. Self-reflection report internship
   3. Reflective interview internship
Table 1. Number of test versions, total number of items and sample size chosen

Test                                 Number of      Minimum number  Maximum number  Total number  Sample size
                                     test versions  of responses    of responses    of items      selected
                                                    per version     per version
Theory of driving                    14             99              484             211           3013
Theory of Lesson Preparation         15             38              586             201           2524
Theory of Instruction and Coaching   15             32              551             148           2771

Table 2. Cut-off scores and cut-off reliability for the three theoretical tests

                                     Cut-off score                  Reliability
Test                                 Mean   SD     Min     Max      Average SE at the  Average proportion of true     MAcc
                                                                    cut-off score      variance at the cut-off score
Theory of driving                    86.5   3.22   81.9    92.9     10.01              .55                            .70
Theory of Lesson Preparation         85.3   5.47   77.3    93.1     9.56               .59                            .75
Theory of Instruction and Coaching   90.2   2.52   86.93   95.5     7.74               .73                            .83
Table 3. Psychometric report for the Final Performance Assessment Lesson

(Sub)scale                                       N     Minimum  Maximum  Mean   SD    Alpha
Coaching: Motivational Support (6 items)         580   9.00     18.0     14.3   2.2   0.77
Coaching: Diagnosis and Task Support (8 items)   580   9.00     24.0     16.9   2.8   0.79
Instructional skill (15 items)                   580   21.00    44.0     34.7   4.3   0.77
Exam score (34 items)                            580   50.00    99.0     78.2   8.9   0.88

Table 4. Correlations between performance on theory tests and Performance Assessment Lesson

(Sub)scale Final Performance                               Theory of  Theory of Lesson  Theory of Instruction
Assessment Lesson                                          driving    Preparation       and Coaching
Coaching: motivational support (6 items)                   .01        .06               .13*
Coaching: diagnosis and support of task process (8 items)  .07        .12*              .09
Instructional skill (15 items)                             .10        .12*              .11*
Exam score (34 items)                                      .07        .12*              .14**
Figure 1. Model of competent task performance. [Figure not reproduced; labels shown: Universe of task situations; task conditions; 1. Basis (knowledge, skills, attitudes, moods, emotions, personality traits); 2. Perception & decision making; 3. Action execution; 4. Consequences.]

Figure 2. Cut-off scores and standard errors for 13 versions of the Theory of Driving Test.
Figure 3. Cut-off scores and standard errors for 15 versions of the Theory of Lesson Preparation Test.

Figure 4. Cut-off scores and standard errors for 15 versions of the Theory of Instruction and Coaching Test.