This unit aims to outline the purposes which assessment serves and to describe current practices and trends in assessment in different EFL contexts. Thus, we will distinguish between evaluation, assessment and testing.
The concepts of evaluation, assessment and testing seem to differ for different authors. For the purposes of this unit, we will use the terms as defined by Harris and McCann (1994).
Evaluation:
This concept involves looking at all the factors that influence the learning process, e.g. syllabus objectives, course design, materials, methodology, teacher performance and assessment.
Assessment:
It involves measuring the performance of our students and the progress that they are making. It helps us to diagnose the problems they have and to provide them with useful feedback.
This assessment can be of three kinds:
1) Informal assessment
2) Formal assessment (testing)
3) Self-assessment
Informal assessment:
It is the observation of everyday performance. It is a way of collecting information about our students' performance under normal classroom conditions, without establishing test conditions as in the case of formal assessment. We intuitively assess students when they are speaking, writing, reading or listening. We can see which students are doing well and which are having difficulties. We are also aware of their attitudes and effort.
Formal Assessment:
This is synonymous with "testing", and there are two possible interpretations of it.
1) It refers to what are often called examinations. These examinations are often external (KET, PET, etc.). They are administered to many students under standardized conditions. They assess a broad range of language. They are marked objectively or under standardized subjective marking schemes, and are likely to be administered at the end of a course.
2) Other authors include all types of language tests under this term. These tests include the kind commonly administered in class by the teacher in order to assess learning. They are not as formal as the examinations of external bodies, and their scope is limited to the context in hand. These tests are often administered to one class, for purposes internal to the class; they focus on a narrow range of language; they are assessed either objectively or subjectively; they are done to assist teaching and are often backward-looking.
Self-Assessment:
This refers to the students themselves assessing their own progress. Dickinson (1997) says it is particularly appropriate:
a) as a complement to self-instruction;
b) to build autonomous and self-directed language learners;
c) to give learners an opportunity to reflect on their learning in order to improve it.
Both Dickinson and McCann point out the problems associated with the use of self-assessment in classrooms:
a) It cannot work in situations where marks have great intrinsic value and there is competition.
b) The time required to train students to use self-assessment can be significant.
c) Reliability problems: can students make adequate, fair assessments of their performance? Will many students be tempted to give themselves unfairly high assessments of their performance?
In formal assessment, we also have the terms summative and formative, introduced by Scriven (1967:43).
a) Formative: this refers to forms of assessment which aim to evaluate the effectiveness of learning at a given time during the course (quizzes), in order to make future learning more effective.
b) Summative: the administration of this test may result in some judgement on the learner, such as "pass" or "fail". It usually assesses a wide range of contents.
Formal assessment can also refer to test types according to purpose. The main types are listed below:
1) aptitude tests
2) placement tests
3) diagnostic tests
4) progress tests
5) achievement tests
6) proficiency tests
Aptitude Tests:
These are designed to predict who will be a successful language learner and are based on the factors which are thought to determine an individual's ability to acquire a second or foreign language. They are usually large-scale tests, taking a long time to administer and containing a number of components, each testing a different facet of language. They are also forward-looking tests, concerned with future language learning.
Placement Tests:
These tests are used to make decisions regarding students' placement into appropriate groups. They tend to be quick to administer and to mark. They are usually administered at the start of a new phase or language course. As a result, students are often put into homogeneous groups for language study according to their present language ability.
Diagnostic Tests:
These tests are usually syllabus-based and aim to determine the students' areas of strength and weakness in relation to the contents to be covered in the course.
Progress Tests:
These tests are usually written and administered by the class teacher, and look back over recent work, perhaps the work of the last lesson or week. They therefore usually test a small range of language (e.g. pop quizzes).
Achievement Tests:
These tests come at the end of a relatively long period of learning, and their content derives from the syllabus that has been taught over that period. They are usually large-scale tests, covering a wide range of language and skills. These tests can be used for a variety of purposes, including promotion to a more advanced course, certification, or as an entry qualification for a job.
Proficiency Tests:
These tests are based on a theory of language proficiency and the specific language abilities thought to constitute language proficiency. They are often related to specific academic or professional situations where English is needed (PET, FCE, CAE, IELTS, TOEFL, etc.).
Three phases are used to categorize formal tests and compare them:
1) First generation tests.
2) Second generation tests.
3) Third generation tests.
First generation tests:
These are broadly associated with the grammar-translation approach to language learning. Candidates are asked to complete various questions such as compositions, translations, or simple question-and-answer activities devoid of context. Ex: Write about a holiday you enjoyed (200 words). These tests evaluate grammar, vocabulary, punctuation, spelling and discourse structure. They require subjective scoring, which can lead to problems of reliability in marking.
The degree of agreement between two examiners about a mark for the same language sample is known as inter-rater reliability. The degree of agreement between one single examiner marking the same sample on two separate occasions is known as intra-rater reliability. Both inter- and intra-rater reliability are low in first generation tests.
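Inter-rater agreement can be quantified. The sketch below (with invented band scores, not data from this unit) computes raw percent agreement between two examiners and Cohen's kappa, a common statistic that corrects raw agreement for the agreement expected by chance:

```python
# Illustrative sketch: two examiners each assign band scores (1-5) to the
# same ten writing samples. The scores are invented for the example.
from collections import Counter

rater_a = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3]
rater_b = [4, 3, 4, 4, 2, 5, 3, 5, 4, 2]

n = len(rater_a)

# Raw agreement: proportion of samples given the same band by both raters.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Cohen's kappa subtracts the agreement expected if both raters assigned
# bands at random with their observed frequencies.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum(count_a[band] * count_b[band] for band in count_a) / n**2
kappa = (observed - expected) / (1 - expected)

print(f"observed agreement = {observed:.2f}")   # 0.70 for these scores
print(f"Cohen's kappa      = {kappa:.2f}")      # 0.58 for these scores
```

The same calculation applied to one examiner's marks on two occasions would measure intra-rater reliability.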
Second generation tests:
Whereas first generation testing techniques were marked subjectively, with the associated problems of standardizing marking to ensure fairness, language items could now be assessed objectively through multiple-choice testing of discrete language items. The test could be marked by a non-expert, by different people, or by the same person more than once, and the result would always be the same.
Questions in second generation testing normally measure one item of language, known as a discrete point. Since each question tests one tiny aspect of language (e.g. verb forms, prepositions, etc.), tests are often very long. These tests are criticized because they do not sample integrative language as first generation tests did.
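The objectivity described above can be made concrete: because every discrete-point item has exactly one right answer, marking reduces to applying an answer key, and any marker obtains the same score. A minimal sketch, with hypothetical items:

```python
# Illustrative sketch (invented data): objective marking of a
# discrete-point multiple-choice test. Applying the key mechanically
# yields the same score regardless of who (or what) does the marking --
# the source of second generation tests' high reliability.
answer_key = {1: "b", 2: "a", 3: "d", 4: "c"}   # hypothetical correct options
candidate  = {1: "b", 2: "c", 3: "d", 4: "c"}   # one learner's answers

score = sum(candidate[item] == key for item, key in answer_key.items())
print(f"{score}/{len(answer_key)} correct")     # 3/4 correct
```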
Third generation tests:
The testing of integrative language, with the use of both objective and subjective testing formats, has come together in third generation tests. These are the tests which have come along on the back of developments in communicative language teaching. Thus, communicative tests aim to emulate real-life language use. Recent models of communicative language ability propose that it consists of both knowledge of language and the capacity for implementing that knowledge in communicative language use.
Examples of these tests could be authentic reading with some transfer of information, such as correcting some notes taken from it; writing a note with instructions about some aspect of household organization; listening to an airport announcement to find the arrival time of a plane; or giving someone spoken instructions on how to get to a certain place. Third generation techniques are contextualized by their very nature as authentic: candidates are asked to do tasks which have a clear reference in reality. These tests assess integrative language, so they have to be assessed subjectively.
West (1990) gives a good summary of these principles of testing. The principles can be described in pairs:
1) Competence v/s Performance
2) Usage v/s Use
3) Direct v/s Indirect Assessment
4) Discrete Point v/s Integrative Assessment
5) Objective v/s Subjective Assessment
6) Receptive v/s Productive Skills
7) Backward- and Forward-looking Assessment
8) Contextualised v/s Disembodied Language
9) Criterion-Referenced and Norm-Referenced Assessment
10) Reliability v/s Validity
The opposition between the members of a pair indicates some sort of tension that exists in language testing in general; generally, the more a test conforms to one member of the pair, the less likely it is to exhibit characteristics of the other. Thus, the more reliable a test (e.g. multiple choice), the less valid it is likely to be (it tests only discrete items). This opposition corresponds with the differences between second and third generation testing.
Competence v/s Performance:
Chomsky drew this distinction between the ideal knowledge all mature speakers hold in their minds (competence) and the flawed realization of it that comes out in language use (performance). Third generation testing is often called "performance testing".
Usage v/s Use:
Widdowson distinguished between language use and language usage. For example, learners whose instruction has consisted of grammatical rules will be required to produce sentences to illustrate the rules. These sentences are, for Widdowson, examples of usage. Examples of usage can show the learner's current state of competence, but will not necessarily indicate anything about the learner's possible performance. He argues that performance teaching and testing require examples of language use, not usage.
Direct v/s Indirect Assessment:
Testing that assesses competence without eliciting performance is known as indirect testing. Multiple-choice testing fits this description, since language is assessed without any production of language use from the learner. Conversely, direct tests use examples of performance as an indicator of communicative competence. These tests use testing tasks of the same type as language tasks in the real world.
Discrete Point v/s Integrative Assessment:
Indirect assessment is usually carried out through a battery of many items, each of which tests only one small part of the language. Each item is known as a discrete-point item. The theory is that if there are enough of them, they give a good indication of the learner's underlying competence. By contrast, testers also require items which test the ability to combine knowledge of different parts of the language; these items are known as integrative or global. Ex: answering a letter, filling in a form, etc.
Objective v/s Subjective Assessment:
Objective assessment refers to test items that can be marked clearly as right or wrong, as in a multiple-choice item. Subjective assessment requires that an assessor make a judgement according to some criteria and experience. Most integrative test elements require subjective assessment. The difficulty in subjective assessment lies in trying to achieve some agreement over marks, both between different markers and with the same marker at different times.
Receptive v/s Productive Skills:
The receptive skills (reading and listening) lend themselves to objective marking. The productive skills (speaking and writing) are generally resistant to objective marking. So third generation testers place great emphasis on achieving a high degree of standardisation between assessors through training in the application of band descriptors or rubrics.
Backward- and Forward-looking Assessment:
Competence-based tests look backwards at a usage-based syllabus to see to what degree it has been assimilated by the learner. Third generation tests are better linked to the future use of language (looking forward), and their assessment of real language use also shows mastery of a performance-based syllabus.
Contextualised v/s Disembodied Language:
Disembodied language has little or no context. This is most evident in multiple-choice items based on language usage. The items bear little relevance to each other and act as examples of disembodied language with no purpose other than as part of a test. Integrative items need a full context in order to function. The closer the items in an integrative test are to simulating real-world language tasks, the fuller the context must be.
Criterion-Referenced and Norm-Referenced Assessment:
Norm-referenced tests compare students with an average mark or a passing score, in order to make some type of pass/fail judgement about them. The problem with this type of testing is that it is not clear what the norm refers to. To know that a learner is a 4.0 in English and that this is a pass tells us nothing about what he or she can actually do with the language. The fact that a 3.9 student falls into the "fail" group, yet may know as much as or even more than the 4.0 student, is not taken into account.
Criterion-referenced assessment compares students not against each other, but against success in performing a task. The results of a criterion-referenced test can be expressed by continuing the sentence "he/she is able to…", where the ability may refer to some smaller or larger integrative language task. Often these tests lead to a profile of language ability, where the learner is seen as capable of completing certain tasks to the given standards, but not others.
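The contrast between the two readings of results can be sketched with invented data (the names, marks and tasks below are hypothetical, not from this unit):

```python
# Illustrative sketch: the same learners judged two ways.
# Norm-referenced: each learner is reduced to pass/fail against a cut score.
# Criterion-referenced: each learner keeps a "can do" profile of tasks.
scores = {"Ana": 4.0, "Ben": 3.9}            # hypothetical final marks
CUT = 4.0                                    # hypothetical passing score

# Norm-referenced judgement: nothing survives but the label.
norm = {name: ("pass" if mark >= CUT else "fail")
        for name, mark in scores.items()}
print(norm)   # 0.1 apart, yet opposite labels

# Criterion-referenced judgement: which tasks can each learner perform?
can_do = {
    "Ana": {"write a short note": True, "follow an announcement": False},
    "Ben": {"write a short note": True, "follow an announcement": True},
}
for name, profile in can_do.items():
    done = [task for task, ok in profile.items() if ok]
    print(f"{name} is able to: {', '.join(done)}")
```

The profile preserves exactly the information the pass/fail label throws away.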
Reliability v/s Validity:
Reliability refers to the consistency of the scoring of the test, both between different raters and between the same rater on different occasions. Objective testing should give perfect reliability. However, faulty tests (ambiguous multiple-choice items, wrong answers on an answer sheet, etc.) can reduce the reliability of even an objective test.
The subjective testing inevitably associated with testing the productive skills reduces reliability, but such tests are more valid because they test integrative knowledge of the language, giving the teacher the opportunity to see how students really use the language. The teacher can thus form a better view of students' language competence. So the higher the reliability, the lower the validity is likely to be, and vice versa.
Desirable characteristics for tests:
Apart from validity and reliability, there are three extra characteristics to pay attention to:
1) UTILITY
2) DISCRIMINATION
3) PRACTICALITY
Utility: a test has utility if it provides a lot of feedback to assist in the planning of the rest of a course or of future courses.
Discrimination: the ability of a test to discriminate between stronger and weaker students.
Practicality: the efficiency of the test in physical terms. (Does it require a lot of equipment? Does it take a lot of time to set, administer or mark?)
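Discrimination is routinely quantified in classical item analysis with the discrimination index D: the proportion of the strongest candidates who answer an item correctly minus the proportion of the weakest candidates who do. The sketch below uses invented counts, not data from this unit:

```python
# Illustrative sketch (invented data): the discrimination index
# D = p_upper - p_lower for one test item, where p_upper and p_lower are
# the proportions of correct answers in the top and bottom scoring groups.
# D near +1 means the item separates strong from weak students well;
# D near 0 means it does not discriminate at all.
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D for one item, from correct-answer counts in the two groups."""
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetical item: 9 of 10 strong students answered correctly, 3 of 10 weak.
d = discrimination_index(9, 10, 3, 10)
print(f"D = {d:.2f}")   # D = 0.60 -- a well-discriminating item
```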
Students' Assignment:
Make an assessment of the test given by your teacher (Appendix 3.1) by answering the following questions.
1) Does it test performance or competence?
2) Does it ask for language use or usage?
3) Is it direct or indirect testing?
4) Is it discrete point or integrative testing?
5) Is it objectively or subjectively marked?
6) What skills does it test?
7) Is it backward- or forward-looking?
8) Is the language contextualised or disembodied?
9) Is it criterion-referenced or norm-referenced?
10) Would it have low or high reliability?
11) Comment on its validity.
12) Comment on its utility, discrimination and practicality.