Principles of language testing, Rita Green


Published in: Education, Business, Technology

  1. Principles of language testing
     Dr. R. Green, Aug 2006
     EUROPOS SĄJUNGA
  2. Overview
     • What are the principles of language testing?
     • How can we define them?
     • What factors can influence them?
     • How can we measure them?
     • How do they interrelate?
  3. Reliability
     Related to accuracy, dependability and consistency, e.g. 20°C here today, 20°C in North Italy: are they the same?
     According to Henning [1987], reliability is a measure of accuracy, consistency, dependability, or fairness of scores resulting from the administration of a particular examination, e.g. 75% on a test today, 83% tomorrow: a problem with reliability.
  4. Validity: internal & external
     Construct validity [internal]
     • the extent to which evidence can be found to support the underlying theoretical construct on which the test is based
     Content validity [internal]
     • the extent to which the content of a test can be said to be sufficiently representative and comprehensive of the purpose for which it has been designed
  5. Validity [2]
     Response validity [internal]
     • the extent to which test takers respond in the way expected by the test developers
     Concurrent validity [external]
     • the extent to which test takers' scores on one test relate to those on another externally recognised test or measure
  6. Validity [3]
     Predictive validity [external]
     • the extent to which scores on test Y predict test takers' ability to do X, e.g. IELTS scores and success in academic studies at university
     Face validity [internal/external]
     • the extent to which the test is perceived to reflect the stated purpose, e.g. writing in a listening test: is this appropriate? It depends on the target language situation, i.e. the academic environment
  7. Validity [4]
     Validity is not a characteristic of a test, but a feature of the inferences made on the basis of test scores and the uses to which a test is put.
     Alderson [2002: 5]
  8. Practicality
     The ease with which:
     • test items can be replicated in terms of the resources needed, e.g. time, materials, people
     • the test can be administered
     • the test can be graded
     • the results can be interpreted
  9. Factors which can influence reliability, validity and practicality…
  10. Test [1]
     • quality of items
     • number of items
     • difficulty level of items
     • level of item discrimination
     • type of test methods
     • number of test methods
  11. Test [2]
     • time allowed
     • clarity of instructions
     • use of the test
     • selection of content
     • sampling of content
     • invalid constructs
  12. Test taker
     • familiarity with test method
     • attitude towards the test, i.e. interest, motivation, emotional/mental state
     • degree of guessing employed
     • level of ability
  13. Test administration
     • consistency of administration procedure
     • degree of interaction between invigilators and test takers
     • time of day the test is administered
     • clarity of instructions
     • test environment: light / heat / noise / space / layout of room
     • quality of equipment used, e.g. for listening tests
  14. Scoring
     • accuracy of the key, e.g. does it include all possible alternatives?
     • inter-rater reliability, e.g. in writing, speaking
     • intra-rater reliability, e.g. in writing, speaking
     • machine vs. human
  15. How can we measure reliability?
     Test-retest
     • same test administered to the same test takers following an interval of no more than 2 weeks
     Inter-rater reliability
     • two or more independent estimates on a test, e.g. written scripts marked by two raters independently and the results compared
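Comparing two raters' marks is commonly done with a correlation coefficient. The sketch below assumes a Pearson correlation, which the slides do not specify; the marks and the names `rater_a` and `rater_b` are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical marks awarded independently by two raters to the same six scripts.
rater_a = [12, 15, 9, 18, 14, 11]
rater_b = [13, 14, 10, 17, 15, 10]

inter_rater_r = pearson(rater_a, rater_b)
print(round(inter_rater_r, 3))
```

The same function could be applied to test-retest data by passing the scores from the first and second administrations. A high correlation shows that the raters rank the scripts similarly; it does not by itself show that they award the same absolute marks.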
  16. Measuring reliability [2]
     Internal consistency reliability estimates, e.g.
     • Split half reliability
     • Cronbach's alpha / Kuder Richardson 20 [KR20]
  17. Split half reliability
     • the test administered to a group of test takers is divided into halves, and the scores on each half are correlated with those on the other half
     • the resulting coefficient is then adjusted by the Spearman-Brown Prophecy Formula to allow for the fact that the total score is based on an instrument that is twice as long as its halves
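The two steps above can be sketched as follows. This assumes an odd/even split of the items (the slides do not say how the halves are formed) and uses an invented matrix of right/wrong responses.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(responses):
    """Return (half-test correlation, Spearman-Brown adjusted estimate).

    responses: one list of item scores per test taker.
    Assumption: the halves are the odd- and even-numbered items.
    """
    odd_half = [sum(row[0::2]) for row in responses]
    even_half = [sum(row[1::2]) for row in responses]
    r_half = pearson(odd_half, even_half)
    # Spearman-Brown adjustment for a test twice as long as each half.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical data: 6 test takers x 6 items, scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(round(r_half, 3), round(r_full, 3))
```

For a positive half-test correlation the adjusted value is always the larger of the two, which reflects the point that the full test is twice as long as either half.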
  18. Cronbach's Alpha [KR 20]
     • this approach looks at how test takers perform on each individual item and then compares that performance against their performance on the test as a whole
     • alpha normally falls between 0 and +1; unlike discrimination, which runs from -1 to +1, it has an upper limit of +1 but can turn negative when the items are poorly related to one another
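A minimal sketch of the calculation, using the standard variance-based formula for alpha and, for right/wrong items, the KR-20 form, which is the special case of alpha for dichotomous scoring. The response matrix is invented, and population variances are used throughout so that the two formulas agree exactly.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def kr20(responses):
    """KR-20 for items scored 0/1: each item variance becomes p * (1 - p)."""
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_pq / total_var)

# Hypothetical data: 6 test takers x 6 items, scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3), round(kr20(responses), 3))
```

On dichotomous data the two functions return the same value; `cronbach_alpha` also works for items with partial-credit scores, where KR-20 does not apply.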
  19. Reliability is influenced by…
     • the longer the test, the more reliable it is likely to be [though there is a point of no extra return]
     • items which discriminate will add to reliability; therefore, if the items are too easy / too difficult, reliability is likely to be lower
     • if there is a wide range of abilities amongst the test takers, the test is likely to have higher reliability
     • the more homogeneous the items are, the higher the reliability is likely to be
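The effect of test length, including the diminishing return, can be illustrated with the general form of the Spearman-Brown Prophecy Formula. The symbols are assumptions introduced here: ρ is the reliability of the existing test and n is the factor by which it is lengthened with comparable items. The starting value of 0.60 is invented.

```latex
\rho_n = \frac{n\,\rho}{1 + (n - 1)\,\rho}

% Worked example with \rho = 0.60:
% n = 2:  \rho_2 = 1.20 / 1.60 = 0.750   (gain of 0.150)
% n = 3:  \rho_3 = 1.80 / 2.20 \approx 0.818   (further gain of about 0.068)
% n = 4:  \rho_4 = 2.40 / 2.80 \approx 0.857   (further gain of about 0.039)
```

Each extra block of items adds less than the one before, which is one way to read the "point of no extra return".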
  20. How can we measure validity?
     According to Henning [1987]:
     • non-empirically, involving inspection, intuition and common sense
     • empirically, involving the collection and analysis of qualitative and quantitative data
  21. Construct validity
     • evidence is usually obtained through such statistical analyses as factor analysis [looks for items which group together] and discrimination; also through retrospection procedures
     Content validity
     • this type of validity cannot be measured statistically; experts need to be involved in an analysis of the test; detailed specifications should be drawn up to ensure the content is both representative and comprehensive
  22. Response validity
     • can be ascertained by means of interviewing test takers [Henning]; asking them to take part in introspection / retrospection procedures [Alderson]
     Concurrent validity
     • determined by correlating the results on the test with another externally recognised measure. Care needs to be taken that the two measures are measuring similar skills and using similar test methods
  23. Predictive validity
     • can be determined by investigating the relationship between a test taker's score, e.g. on IELTS/TOEFL, and his/her success in the academic program chosen
     • problem: other factors may influence success, e.g. life abroad, ability in chosen field, peers, tutors, personal issues, etc.; also the time factor
  24. Reliability vs. validity?
     • an observation can be reliable without being valid, but cannot be valid without first being reliable. In other words, reliability is a necessary, but not sufficient, condition for validity. [Hubley & Zumbo 1996]
     • ‘Of all the concepts in testing and measurement, it may be argued, validity is the most basic and far-reaching, for without validity, a test, measure or observation and any inferences made from it are meaningless’ [Hubley & Zumbo 1996, 207]
  25. Reliability vs. validity [2]
     • even an ideal test which is perfectly reliable and possessing perfect criterion-related validity will be invalid for some purposes [Henning 1987]
  26. Practicality
     Designing and developing good test items requires:
     • working with other colleagues
     • materials, i.e. paper, computer, printer etc.
     • time
     Some items look very attractive but this attraction has to be weighed against these factors.
  27. References
     • Alderson, J. C. 2002. Conceptions of validity and validation. Paper presented at a conference in Bucharest, June 2002.
     • Angoff 1988. Validity: An evolving concept. In H. Wainer & H. Braun [Eds.], Test validity [pp. 19-32]. Hillsdale, NJ: Erlbaum.
     • Bachman, L. F. 1990. Fundamental considerations in language testing. Oxford: O.U.P.
     • Cumming, A. & Berwick, R. [Eds.] 1996. Validation in Language Testing. Multilingual Matters.
     • Hatch, E. & Lazaraton, A. 1991. The Research Manual: Design & Statistics for Applied Linguistics. Newbury House.
  28. References [2]
     • Henning, G. 1987. A guide to language testing: Development, evaluation and research. Cambridge, Mass: Newbury House.
     • Hubley, A. M. & Zumbo, B. D. 1996. A dialectic on validity: where we have been and where we are going. The Journal of General Psychology, 123[3], 207-215.
     • Messick, S. 1988. The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. Braun [Eds.], Test validity [pp. 33-45]. Hillsdale, NJ: Erlbaum.
     • Messick, S. 1989. Validity. In R. L. Linn [Ed.], Educational measurement [3rd ed., pp. 13-103]. New York: Macmillan.
  29. Item-total Statistics

     Item   Corrected Item-Total   Alpha if Item
            Correlation            Deleted
     R01    .5259                  .7964
     R02    .6804                  .7594
     R03    .6683                  .7623
     R04    .5516                  .7940
     R05    .7173                  .7489
     R16    .3946                  .8288

     N of Cases = 194.0   N of Items = 6   Alpha = .8121
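The two columns in this kind of table can be computed along the following lines. This is a sketch under stated assumptions: the "corrected" correlation is taken to be the correlation between an item and the total of the remaining items, alpha uses population variances, and the score matrix is invented rather than taken from the 194 cases reported on the slide.

```python
from math import sqrt
from statistics import pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def item_total_statistics(rows):
    """For each item: (corrected item-total correlation, alpha if item deleted)."""
    stats = []
    for i in range(len(rows[0])):
        item = [row[i] for row in rows]
        rest = [row[:i] + row[i + 1:] for row in rows]  # scores with item i removed
        rest_totals = [sum(r) for r in rest]
        stats.append((pearson(item, rest_totals), cronbach_alpha(rest)))
    return stats

# Hypothetical scores: 6 test takers x 5 items; the last item behaves differently.
scores = [
    [3, 4, 3, 4, 1],
    [2, 2, 3, 2, 3],
    [4, 4, 4, 3, 2],
    [1, 1, 2, 1, 3],
    [3, 3, 2, 3, 0],
    [0, 1, 1, 0, 2],
]

overall_alpha = cronbach_alpha(scores)
stats = item_total_statistics(scores)
for number, (r, alpha_deleted) in enumerate(stats, start=1):
    print(f"item {number}: r = {r:.3f}, alpha if deleted = {alpha_deleted:.3f}")
```

In the invented data the last item correlates negatively with the rest, so removing it raises alpha above the overall value. The slide shows the same pattern in milder form for R16, whose "alpha if item deleted" (.8288) is higher than the overall alpha (.8121).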
  30. Item-total Statistics

     Item   Corrected Item-Total   Alpha if Item
            Correlation            Deleted
     R16    .5773                  .7909
     R17    .5995                  .7863
     R18    .7351                  .7553
     R19    .7920                  .7419
     R20    .6490                  .7753
     R01    .1939                  .8663

     N of Cases = 194.0   N of Items = 6   Alpha = .8185
  31. Component Matrix [a]

     Item   Component 1   Component 2
     R01    .502           .559
     R02    .690           .423
     R03    .683           .461
     R04    .571           .404
     R05    .750           .343
     R16    .670          -.223
     R17    .631          -.508
     R18    .770          -.368
     R19    .789          -.383
     R20    .646          -.494

     Extraction Method: Principal Component Analysis.
     a. 2 components extracted.
  32. Thank you for your attention!