Language Testing and Assessment
Summary of Units B4 and C4
Unit B4: Optimal Specification Design
The central question in this unit is: “What is the optimal design for a test specification and what elements should it include?”
The Difference between Prompt Attributes ‘PA’ & Response Attributes ‘RA’
In Popham’s model, the Prompt Attribute describes the input to the examinee, while the Response Attribute describes what the examinee does as a result. Bachman and Palmer (1996) phrased this same distinction in different terms, using ‘characteristics of the input’ versus ‘characteristics of the response.’
Clearly, the author disagrees with the majority. It’s also clear that he’s open to change of perspective. How do we know this?
(a) The author states precisely a willingness to change.
(b) In line 10 ‘there may be some value to the opposing view if...’ suggests a willingness to change.
(c) The title suggests a willingness to change.
(d) The comment near the end indicates willingness to change: ‘I may be persuaded if...’
1- The RA for this item might simply read: “The student will select the correct answer from among the choices given.” If that were the case, then the spec writer has decided on a rather minimalist approach to the PA/RA distinction, describing the actual action performed by the examinee.
2- An alternative RA might read like this:
a- The student will study all four choices.
b- If a particular choice references a particular line in the passage, the student will study that line carefully.
c- He or she will reread the passage to eliminate three choices.
d- Then the student will select the correct answer from among the choices.
Either of these RAs could work in conjunction with a PA similar to the following:
“1- The item stem poses a question about the author’s viewpoints, which will require inference from the text.
2- Choices ‘a’, ‘b’, and ‘d’ are distracters that attribute to the passage a comment that the author didn’t make, or which is taken out of context and misinterpreted. Choice ‘a’ refers to a comment the author made, without actual reference in the text, while choices ‘b’ and ‘d’ refer to some part of the text (e.g., a line number, a paragraph, a header, a title).
3- Choice ‘c’ will be the key or correct response; it may use any of the locator features given above (line number, paragraph, header, title, etc.), or it can simply refer to the whole passage.”
This PA/RA formula is a classical model of a spec for multiple-choice items. In this formula, all guidelines about the item are in the PA: the entire description of its stem, its choices, why the incorrect choices are incorrect, and why the key is correct is considered to be part of the prompt and not the response.
The choices themselves seem to be part of the examinee’s thinking. In our multiple-choice item, the examinee will probably double-check whether the author did indeed say what is claimed in line 10 or near the end and, if so, whether it is being interpreted correctly. In effect, the item itself is a kind of outline of the examinee’s answering strategy: a layout of the response.
Guidance about both the prompt and the response is important in a test specification. It is possible to fuse the PA and RA and simply give clear specification guidance on both; in fact, we could create a new spec element (the ‘PARA’) in which to put all this guidance.
The basic element of spec design is producing samples and the guiding language that goes with them. Guiding language and samples constitute a minimalist definition of a specification, in an attempt to disentangle prompt from response.
‘Event’ vs. ‘Procedure’ & Specplates as a universal design
Event versus Procedure
A testing event is a single task or test item, such as a multiple-choice item. A procedure is a set of events or tasks, such as an oral interview or a portfolio assessment for teacher observation.
Test developers organize items into a test using a ‘table of specs’ that presents information at a very global level:
- How many of each item type and skill are needed?
- What special materials are required to develop the test?
Specplates
A ‘specplate’ is a combination of the words ‘specification’ and ‘template’: a model for a specification, and a generative blueprint which itself produces blueprints.
Over time, certain specs fuse into a higher-order specification. A specplate is a guiding tool to ensure that the new specifications meet a common standard established by the existing specs. One type of information that might appear in a specplate is guidance on task type.
PA (excerpt)
For an M.C. task on verb tense and voice agreement: Each incorrect choice (distracter) in the item must be incorrect according to the focus of the item. One distracter should be incorrect in tense, another incorrect in voice, and the third incorrect in both tense and voice.
PA Specplate (excerpt)
“When specifying the distracters, the PA should contain the following language: ‘Each incorrect distracter in the item must be incorrect according to the focus of the item.’ Immediately following that sentence, the PA should clearly specify how each of the three distracters is incorrect.”
“You are encouraged to employ (if feasible) the dual-feature model of multiple-choice item creation, namely:
Key: both of two features of the item are correct (tense/voice)
Distracter 1: one of two key features of the item is incorrect (tense/voice)
Distracter 2: the other of two key features of the item is incorrect (tense/voice)
Distracter 3: both of two key features of the item are incorrect (tense/voice).”
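The dual-feature model can be written out as a small correctness table. The sketch below is only an illustration: the function names and the True/False encoding are assumptions made for the example, not part of the specplate. It lists which of the two features each choice gets right and checks that a drafted item covers all four combinations exactly once.

```python
def dual_feature_pattern(feature_1="tense", feature_2="voice"):
    """Return the correctness pattern for the four choices of one item.

    True means the choice is correct on that feature. The function name
    and the encoding are hypothetical, chosen only for illustration.
    """
    return {
        "key": {feature_1: True, feature_2: True},
        "distracter 1": {feature_1: False, feature_2: True},
        "distracter 2": {feature_1: True, feature_2: False},
        "distracter 3": {feature_1: False, feature_2: False},
    }


def follows_dual_feature_model(choices, feature_1="tense", feature_2="voice"):
    """Check that a drafted item uses each correct/incorrect combination once."""
    patterns = sorted((c[feature_1], c[feature_2]) for c in choices.values())
    return patterns == [(False, False), (False, True), (True, False), (True, True)]


item = dual_feature_pattern()
print(follows_dual_feature_model(item))
```

A draft in which two choices are wrong in the same way, for example two distracters that are both incorrect only in tense, would fail this check.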
The ‘magic formula’ model of M.C. item creation is crafting an item for which, in order to get the item right, examinees must do two things correctly.
Once the specplate has been written, it can serve as the starting point for new specs that require those features. Rather than starting from scratch each time, the spec writer lets the specplate generate the specification shell, and important details follow somewhat automatically.
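One way to picture a specplate generating a specification shell is as a template that is filled in for each new spec. The sketch below is a loose illustration under stated assumptions: the function name, its arguments, and the layout of the generated text are hypothetical; only the required sentence is taken from the PA specplate excerpt.

```python
# Sentence the PA specplate excerpt requires every PA to contain.
REQUIRED_LANGUAGE = (
    "Each incorrect distracter in the item must be incorrect "
    "according to the focus of the item."
)


def spec_shell(task_focus, distracter_errors):
    """Generate a PA shell for a new spec (hypothetical helper).

    task_focus: what the item tests, supplied by the spec writer.
    distracter_errors: how each of the three distracters is incorrect.
    """
    if len(distracter_errors) != 3:
        raise ValueError("the specplate asks for exactly three distracters")
    lines = [f"PA for an M.C. task on {task_focus}:", REQUIRED_LANGUAGE]
    for number, error in enumerate(distracter_errors, start=1):
        lines.append(f"Distracter {number} is incorrect in {error}.")
    return "\n".join(lines)


print(spec_shell("verb tense and voice agreement",
                 ["tense", "voice", "both tense and voice"]))
```

The shell fixes the shared wording, and the spec writer supplies only the details that differ from one spec to the next.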
Ownership
Spec ownership is part of human nature because of a sense of investment in the test-crafting process. However, a well-crafted test is never owned by a single individual. Thus, a simple historical record of contributions is the best way to attribute a spec to its various authors.
Disagreement is sometimes inevitable in spec design; yet a compromise between opposing positions is possible. There is consensus that the faculty will observe the test in action and decide after a while whether more changes are needed.
Summary of Unit B4
The central focus of this unit was the nature of test specs and their elements. We have raised and tried to answer the question: what are the essential minimum components of specs beyond the bare minimum of guiding language and samples?
Unit C4: Evolution in Action
In the conclusion to Unit A4 we listed the following elements of a specification-driven testing theory:
■ Specs exist.
■ Specs evolve.
■ The specs are not launched until ready.
■ Discussion leads to transparency.
■ All are welcome to discussion.
We saw in Unit B4 that all specs share two common features: 1- spec-generated sample items, 2- related guiding language.
In this Unit, we will focus on some design considerations that arise as a spec evolves.
[V. 1: Guiding language on the scoring scale]
The objective of this spec is for students to perform a role-play task on the pragmatics of making a complaint in a simple everyday situation. In a role-play with the teacher, students are asked to plan and render a complaint about something that has gone wrong.
Scoring of the interaction will be as follows:
1- not competent – the student displayed little command of the situation pragmatics.
2- minimally competent – the student used language of complaint, but the interaction was hesitant and/or impolite.
3- competent – the student’s interactions were smooth and generally fluent, and there was no use of impolite language.
4- superb – the student’s interactions were smooth and very fluent, and in addition, the student displayed subtle command of nuance.
[Version 1, sample one]
You’ve recently purchased a radio; back home, you discover that a part is missing from the box.
[Version 1, sample two]
After getting back home from shopping, you discover that a jar of peanut butter is open and its seal is punctured, so you’re worried that it may be unsafe to eat.
In both cases, you want to return to the store to resolve the situation with the manager.
(a) Write out a plan of what you will say; then
(b) role-play the conversation with your teacher.
[V. 2: Guiding language on the scoring scale]
1- not competent – the student displayed little command of the pragmatics of the situation. If the student wrote a plan, it was inadequate or not implemented.
2- minimally competent – the student used language of complaint, but the interaction was hesitant and/or impolite. The student’s plan may have been adequate, but the student was unable to implement it.
3- competent – the student’s interactions were smooth and generally fluent, and there was no evidence of impolite language use. The student wrote a viable plan and generally followed it during the interaction.
4- superb – the student’s interactions were smooth and very fluent, and in addition, the student displayed subtle command of nuance. The student wrote a viable plan and generally followed it during the interaction.
After some time, the descriptor for Level 4 is revised again, with Levels 1-3 unchanged:
[V. 3: Guiding language on the scoring scale]
4- superb – the student’s interactions were smooth and very fluent, and the student displayed subtle command of nuance. He/she wrote a viable plan and generally followed it during the interaction. Alternatively, the student wrote little (or no) plan, but seemed to be able to execute the interaction in a commanding and nuanced manner.
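The way a scale evolves while most of it stays unchanged can be pictured as a change log: each version records only the descriptors it rewrote, and a lookup falls back to the most recent earlier version. The sketch below is an illustration only. The data structure and function name are assumptions, and the descriptors are heavily abbreviated paraphrases of the three versions above, not the full guiding language.

```python
# Abbreviated paraphrases; each version stores only the levels it changed.
SCALE_CHANGES = {
    1: {
        1: "not competent: little command of the pragmatics",
        2: "minimally competent: complaint language, hesitant and/or impolite",
        3: "competent: smooth, generally fluent, no impolite language",
        4: "superb: smooth, very fluent, subtle command of nuance",
    },
    2: {
        1: "not competent: as V. 1; any plan inadequate or not implemented",
        2: "minimally competent: as V. 1; plan adequate but not implemented",
        3: "competent: as V. 1; viable plan, generally followed",
        4: "superb: as V. 1; viable plan, generally followed",
    },
    3: {
        4: "superb: as V. 2, or little/no plan but commanding, nuanced interaction",
    },
}


def descriptor(level, version):
    """Return the descriptor in force for a level at a given version."""
    for v in range(version, 0, -1):
        if level in SCALE_CHANGES.get(v, {}):
            return SCALE_CHANGES[v][level]
    raise KeyError(f"no descriptor for level {level} at version {version}")


print(descriptor(3, 3))  # Level 3 is unchanged since Version 2
print(descriptor(4, 3))  # Level 4 was rewritten in Version 3
```

Keeping the history in this form also gives the simple record of changes that the discussion of ownership in Unit B4 recommends.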
There are some interesting questions that arise:
- What is the role of the written plan?
- Why have the instructors adapted the scoring scale to reflect alternative use of the plan?
- Do you suspect that any changes might be coming for level 3 on the scale?
- Do you suspect that the plan may prove to be an optional testing task, in general?
- Do you think that the plan may prove unworkable?
Planning causes debate, which in turn causes change
A newcomer arrives at the faculty at a point in time between Version 2 and Version 3: an energetic instructor who plays the role of a productive debater in meetings. This new instructor asks, “Do we plan when we make complaints in real life, and if yes, do we write the plan down?”
The newcomer causes the teachers to watch carefully the use of this task in the next test administration, and sure enough, there are high-level students for whom the plan is irrelevant and a waste of time.
New questions arise here:
- What obligations do teachers have to challenge each other and help make tests better?
- What ownership should be given to this new teacher or to any new teacher?
However… change stagnates
Gradually, teachers stop teaching the written plan in their lessons, and most students do not produce one during the test. The instructors simply stop looking at the spec, they stop using a written plan, and the task evolves beyond reference to the spec.
Then, one teacher remembers to teach written plans and the students feel they did better on the test thanks to plan writing.
- Should students be welcome to discussions of test evolution and change?
- Should teachers re-visit and re-affirm the wording of the spec, which does permit a plan?
- Or should they follow their own instinct and ignore this student feedback, encouraging role-plays without written plans?
- Should teachers continue to heed the advice of their ‘energetic colleague’ and teach their students to do such tasks without written plans, because that is more authentic?
Application
Conduct a day-long reverse engineering workshop with your colleagues on a test task.
1- Introduction and Welcome:
Orient your colleagues to selected tasks. The goal of this part is not to revise the tasks but to make sure they know what the tasks are. Orient the participants to the basic design of specs: samples and guiding language. Don’t show actual specs, because people will think that the spec samples you show are how all specs should be written. In addition to the critical analysis that is the target of the day, you want an organic, bottom-up growth of specs.
2- Group Phase 1:
Divide the participants into groups or pairs, each being assigned the same set of tasks. Ask each group to do straight reverse engineering and write out what they think is the guiding language for the tasks without recommending any changes. This should be followed by a report back.
3- Vent Your Spleen:
In the whole group, allow people to vent about test tasks they have never liked – tasks they did not analyze in Phase 1. Based on the judgmental, splenetic discussion that will certainly result, select a new set of tasks, and proceed to the next step.
4- Group Phase 2:
Divide the participants into groups, each having to do critical reverse engineering of some tasks about which they feel particularly splenetic. The goal is a set of specs that improve testing in your situation. A report back should follow.
5- ‘What’s Next?’
The group discusses which specs stand a reasonable chance of implementation. Not everything that arises will be feasible. Some things will be difficult to implement. But some should survive.
Summary
This Unit was a practical application of Units A4 and B4, a way to drill all the theoretical notions and concepts that we have studied in both units. The Unit proposes more exercises related to validity, as in Unit C1.