Testing is a matter of using data to establish evidence of learning. Evidence, however, does not occur concretely in the natural state; it is an abstract inference, and so a matter of judgment.

  3. 3. THE PURPOSE OF VALIDATION  The purpose of validation in language testing is to ensure the defensibility and fairness of interpretations based on test performance.
  4. 4. THE PURPOSE OF VALIDATION  The scrutiny of such a procedure will involve both reasoning and examination of the facts. The reasoning may involve legal argumentation and appeals to the common sense, insight, and human understanding of the jury members, as well as careful examination of the evidence.
  5. 5. THE PURPOSE OF VALIDATION Test validation similarly involves thinking about the logic of the test, particularly its design and its intentions, and also involves looking at empirical evidence (the hard facts) emerging from data from test trials or operational administrations.
  6. 6. QUALITIES OF A GOOD TEST A good test has the following qualities: - It is valid - It is reliable - It is practical - It has no negative effects on the teaching program.
  7. 7. PRACTICALITY  A good test is practical: it stays within financial limitations and time constraints, and it is easy to administer, score, and interpret.
  8. 8. PRACTICALITY  A test that is prohibitively expensive is not practical. A test of language proficiency that takes a student ten hours to complete is impractical. A test that takes a few minutes for a student to take but several hours for an examiner to evaluate is impractical.
  9. 9. PRACTICALITY  A test that requires individual one-to-one proctoring is impractical.
  10. 10. PRACTICALITY The extent to which a test is practical sometimes hinges on whether it is designed to be norm-referenced or criterion-referenced. In norm-referenced tests, each test-taker’s score is interpreted in relation to a mean, median, standard deviation, and/or percentile rank. The purpose of such tests is to place test-takers along a mathematical continuum in rank order.
  11. 11. PRACTICALITY Typical of norm-referenced tests are standardized tests intended to be administered to large audiences, with results quickly disseminated to test-takers. Such tests must have fixed, predetermined responses in a form that can be electronically scanned. Practicality is a primary issue. The most important quality of any test is how practical it is to administer.
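The norm-referenced interpretation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are hypothetical, the function name `norm_referenced_report` is invented for the example, and the percentile rank counts tied scores as half below, which is one common convention among several.

```python
from statistics import mean, pstdev


def norm_referenced_report(scores):
    """Interpret each raw score relative to the group that took the test."""
    group_mean = mean(scores)
    group_sd = pstdev(scores)  # standard deviation of this group of test-takers
    report = []
    for s in scores:
        # z-score: distance from the group mean in standard deviation units
        z = (s - group_mean) / group_sd if group_sd else 0.0
        below = sum(1 for other in scores if other < s)
        same = sum(1 for other in scores if other == s)
        # Percentile rank: share of the group scoring below, with ties counted as half
        percentile_rank = 100.0 * (below + 0.5 * same) / len(scores)
        report.append({"score": s, "z": z, "percentile_rank": percentile_rank})
    # Rank order, highest score first
    return sorted(report, key=lambda row: row["score"], reverse=True)


scores = [52, 61, 61, 70, 75, 88]  # hypothetical raw scores
for row in norm_referenced_report(scores):
    print(f"{row['score']:>3}  z={row['z']:+.2f}  PR={row['percentile_rank']:.0f}")
```

The same raw score receives a different z-score and percentile rank in a different group, which is the defining feature of a norm-referenced interpretation.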
  12. 12. RELIABILITY Reliability is the ability of a person or system to perform and maintain its functions in routine circumstances, as well as in hostile or unexpected circumstances.
  13. 13. VALIDITY The most complex criterion of a good test is validity, the degree to which the test actually measures what it is intended to measure.
  14. 14. FACE VALIDITY  A test has face validity when it appears valid to the examinees who take it, the personnel who administer it, and other untrained observers.
  15. 15. RELIABILITY Test reliability can be expressed as a set of two probabilities, the definition of which varies by field. In medicine, sensitivity and specificity are conventionally used. In the field of , the probabilities of detection and false call are conventionally used.
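As a rough illustration of the two-probability view mentioned above, sensitivity and specificity can be computed from the four counts of a two-way decision table. The counts below are hypothetical, and treating a pass/fail decision on a test as the "positive/negative" outcome is an assumption made only for this sketch.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of actual positives that the test correctly identifies."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no actual positives to evaluate")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """Proportion of actual negatives that the test correctly identifies."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no actual negatives to evaluate")
    return true_negatives / total


# Hypothetical counts: 100 cases that truly have the trait, 100 that do not
print(sensitivity(true_positives=90, false_negatives=10))
print(specificity(true_negatives=80, false_positives=20))
```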
  16. 16. RELIABILITY If you give the same test to the same subject or matched subjects on two different occasions, the test itself should yield similar results; it should have test reliability.
  17. 17. RELIABILITY Means: - dependability - trustworthiness - precision
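One common way to check whether two administrations of the same test "yield similar results" is to correlate the two sets of scores. The sketch below computes a Pearson correlation by hand; the scores are hypothetical, and using this coefficient as the reliability index is an assumption of the example, not something the slides prescribe.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    covariation = sum((a - mean_first) * (b - mean_second)
                      for a, b in zip(first, second))
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores on one occasion do not vary")
    return covariation / (spread_first * spread_second)


# Hypothetical scores for the same five students on two occasions
occasion_1 = [55, 62, 70, 78, 90]
occasion_2 = [58, 60, 73, 75, 92]
print(round(pearson_r(occasion_1, occasion_2), 3))
```

A coefficient close to 1 indicates that students keep roughly the same relative standing on both occasions; it does not by itself show that the raw scores are identical.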
  18. 18. THREATS TO TEST VALIDITY Why is face validity not enough? What can threaten the validity of assessments (scores, ratings): - their meaningfulness - their interpretability - their fairness
  19. 19. THREATS TO TEST VALIDITY Possible problem areas: - Test content - Test method - Test construct
  20. 20. CONTENT VALIDITY  A test has content validity if it measures knowledge of the content domain it was designed to measure. Put another way, content validity primarily concerns how adequately and representatively the test items sample the content area to be measured.
  21. 21. CONTENT VALIDITY For example: a comprehensive math achievement test would lack content validity if good scores depended primarily on knowledge of English, or if it only had questions about one aspect of math (e.g., algebra). Content validity is primarily an issue for educational tests, certain industrial tests, and other tests of content knowledge like the Psychology Licensing Exam.
  22. 22. TEST METHOD A test method is a definitive procedure that produces a test result (ASTM definition). The test result can be qualitative (yes/no), categorical, or quantitative (a measured value). It can be a personal observation or the output of a precision measuring instrument.
  23. 23. TEST CONSTRUCT The test construct refers to those aspects of knowledge or skill possessed by the candidate which are being measured. Defining it involves being clear about what knowledge of language consists of, and how that knowledge is deployed in actual performance.
  24. 24. THREATS TO TEST VALIDITY Possible problem areas: Test content: What the test contains. Test method: The way in which the candidate is asked to engage with the materials and tasks in the test, and how these responses will be scored. Test construct: The underlying ability being captured by the test.
  25. 25. ESSAY TESTS Writing composition or essay tests seems very easy, much easier, for example, than writing multiple-choice questions. All one seems to have to do is write a topic and leave the student to compose an answer. The following prompt is very common: “HEALTHY FOOD.” Discuss.
  26. 26. ESSAY TESTS Format: - Introduction. Introduce your topic. - Background. Give historical or philosophical background data to orient the reader to the topic. - Thesis and arguments. State the main points, including causes and effects, methods used, dates, places, results. - Conclusion. Include the significance of each event and finish up with a summary.
  27. 27. INTRODUCTION The business practices of the Intel Corporation, a technology company best known for the production of microprocessors for computers, illustrate the importance of brand marketing. Intel was able to achieve a more than 1,500 percent increase in sales, moving from $1.2 billion in sales to more than $33 billion, in a little more than 10 years. Although the explosion of the home-computer market certainly accounted for some of this dramatic increase, the brilliance of its branding strategy also played a significant role.
  28. 28. BACKGROUND Intel became a major producer of microprocessor chips in 1978, when its 8086 chip was selected by IBM for use in its line of home computers. The 8086 chip and its successors soon became the industry standard, even as Intel’s competitors sought to break into this potentially lucrative market. Intel’s main problem in facing its competitors was its lack of trademark protection for its series of microchips. Competitors were able to exploit this lack by introducing clone products with similar sounding names, severely inhibiting Intel’s ability to create a brand identity.
  29. 29. THESIS AND ARGUMENTS In an effort to save its market share, Intel embarked on an ambitious branding program in 1991. The corporation’s decision to invest more than $100 million in this program was greeted with skepticism and controversy. Many within the company argued that the money could be better spent researching and developing new products, while others argued that a company that operated within such a narrow consumer niche had little need for such an aggressive branding campaign. Despite these misgivings, Intel went ahead with its strategy, which in a short time became a resounding success.
  30. 30. CONCLUSION Ironically, the success of Intel’s branding strategy led to a marketing dilemma for the company. In 1992, Intel was prepared to unveil its new line of microprocessors. However, the company faced a difficult decision: release the new product under the current brand logo and risk consumer apathy, or give the product a new name and brand and risk undoing all the work put into the branding strategy. In the end, Intel decided to move forward with a new brand identity. It was a testament to the strength of Intel’s earlier branding efforts that the new product line was seamlessly integrated into the public consciousness.
  31. 31. TOPICS  Some people like doing work by hand. Others prefer using machines. Which do you prefer? Use specific reasons and examples to support your answer.  Some people think that children should begin their formal education at a very early age and should spend most of their time on school studies. Others believe that young children should spend most of their time playing. Compare these two views. Which view do you agree with? Why?
  34. 34. ORAL INTERVIEWS J. B. Heaton explains that in real life the two skills of listening and speaking are fully integrated in most everyday situations involving communication. Consequently, an excellent way of testing speaking is the oral interview, since listening and speaking can be assessed in a natural situation.
  35. 35. SUMMARIES Summaries are used most often to test reading or listening comprehension and writing skills. Writing summaries may closely replicate many real-life activities.
  36. 36. INFORMATION GAP ACTIVITIES Work out what the differences are.
  37. 37. TESTING READING SKILLS Vocabulary tests often provide a good guide to reading ability. It is usually necessary for students to demonstrate not only knowledge of the meaning of a particular word but also an awareness of the other words with which it is generally used. In addition to their usefulness in proficiency tests, vocabulary tests are also useful in progress tests, as they lend themselves to follow-up work in class.
  38. 38. TRUE / FALSE ITEMS 1. ___ Children learn to recognize and produce the sounds of the language by listening to its spoken form. 2. ___ One remarkable thing about first language acquisition is the low degree of similarity which we see in the early language of children all over the world. 3. ___ Many sentences such as “Mummy juice” and “baby fall down” are known as telegraphic speech.
  39. 39. MULTIPLE-CHOICE ITEMS Writing multiple-choice items is not too difficult after you have had a little practice. For most purposes three options are enough. Remember that the distractors should appear correct to any students who are not sure of the answer. Avoid writing absurd distractors which everyone can easily see are wrong. At the same time, all the distractors should be written within the student’s range of proficiency and at the same level as the correct answer.
  40. 40. MULTIPLE-CHOICE ITEM Example: According to the author, one cause of mountain formation is the a. effect of climate change on sea level b. slowing down of volcanic activity c. force of Earth’s crustal plates hitting each other d. replacement of sedimentary rock with volcanic rock Correct answer: c
  41. 41. MATCHING ITEMS  Matching items are also very useful for testing vocabulary in context. It is necessary to instruct the students to write the correct word from the story at the side of each word listed below it.
  42. 42. MATCHING ITEM Example: Column A: 3. shy; 4. happy; 5. sad; 6. slim. Column B: a. cheerful; b. thin; c. become scared; d. sorrowful.
  43. 43. TESTING WRITING SKILLS  Jeremy Harmer explains that, like many other aspects of English language teaching, the type of writing we get students to do will depend on their age, interests and level.
  44. 44. TESTING WRITING SKILLS  GRAMMAR AND STRUCTURE - Multiple-choice - Error recognition - Re-arrangement - Changing words - Blank-filling
  45. 45. TESTING WRITING SKILLS  Controlled Writing - Transformation - Broken Sentences - Notes and Diaries - Free writing
  46. 46. TESTING WRITING SKILLS  GRAMMAR AND STRUCTURE - Multiple-choice items Multiple-choice items test an ability to recognize sentences which are grammatically correct.
  47. 47. TESTING WRITING SKILLS  ERROR RECOGNITION Students must choose the underlined word or phrase which is incorrect.
  48. 48. TESTING WRITING SKILLS  RE-ARRANGEMENT Students are required to unscramble sentences. They must write out each sentence, putting the words and phrases in their correct order. This type of item is useful for testing awareness of the order of adjectives, the position of adverbs, inversion and other areas of grammar.
  49. 49. TESTING WRITING SKILLS  CHANGING WORDS A completely different type of question requires students to put verbs into their correct tense or voice. This type of question is quite easy and straightforward to construct. However, it is important to provide an interesting context.
  50. 50. TESTING WRITING SKILLS  BLANK-FILLING Blank-filling items should consist of paragraphs providing an interesting and relevant context. It is important to choose the words to omit very carefully so that they are all grammatical words (e.g. to, in, is, the).
  51. 51. CENTRAL TENDENCY The Central Tendency of a distribution is an estimate of the “center” of a distribution of values.
  52. 52. CENTRAL TENDENCY  There are three major types of estimates of Central Tendency:  - Mean  - Median  - Mode
  53. 53. CENTRAL TENDENCY The Mean or average is probably the most commonly used method of describing central tendency.
  54. 54. CENTRAL TENDENCY  The Mean  To compute the mean, add up all the values and divide by the number of values.
  55. 55. CENTRAL TENDENCY  The Mean  For example:  20, 20, 20, 18, 17, 14, 14, 12  The sum of these 8 values is 135, so the mean is 135/8 = 16.875.
  56. 56. CENTRAL TENDENCY  The Median  The median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order, and then locate the score in the center of the sample.
  57. 57. CENTRAL TENDENCY  The Median  For example:  15, 15, 15, 15, 15, 17, 18, 20  There are 8 scores, and scores #4 and #5 represent the halfway point. Since both these scores are 15, the median is 15.
  58. 58. CENTRAL TENDENCY  The Median  If the two middle scores have different values, you would have to interpolate to determine the median.
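The three estimates of central tendency can be checked with Python's standard `statistics` module. The sketch below reuses the eight scores from the median example; the helper name `describe` and the second, four-score list are introduced only for illustration. Note that `statistics.median` handles two different middle scores by averaging them, which is the simplest form of the interpolation mentioned above.

```python
from statistics import mean, median, mode


def describe(scores):
    """Return the three common estimates of central tendency."""
    return {
        "mean": mean(scores),      # sum of the values divided by their count
        "median": median(scores),  # middle value of the sorted scores
        "mode": mode(scores),      # most frequently occurring value
    }


scores = [15, 15, 15, 15, 15, 17, 18, 20]  # scores from the median example
print(describe(scores))

# Hypothetical scores whose two middle values differ (15 and 16)
print(median([14, 15, 16, 18]))
```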
  59. 59. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  General Concerns:  - The candidate or the student should be able to answer the questions on the basis of what is in the passage; the questions should not require outside knowledge.  - Questions should cover all the important parts of the passage. Questions should not be asked exclusively about one section of the passage while other sections are neglected.  - Overlap among questions should be avoided. With many questions based on one passage, it is inevitable that more than one question may relate to a particular portion or aspect of the passage; care should be taken, however, that such questions explore different perspectives of the material.
  60. 60. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  The stem:  - The stem should formulate the question or problem as simply and directly as possible. Avoid irrelevant verbiage.  - The stem should be as directed as possible; that is, it should have a focus and should clearly identify the problem. The candidate should not have to read all of the options to see what the question is asking.
  61. 61. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  - Capitalize words such as NOT, LEAST, EXCEPT, etc., when they are used in the stem to call for a negative or unexpected response.  - If a word or phrase is used at the beginning of each option, move that word or phrase to the stem to avoid unnecessary repetition.
  62. 62. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  - Refer to the passage as such, not as “selection,” “excerpt,” etc.  - Use specific line references when questions refer to specific words, phrases, or arguments in the passage.
  63. 63. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  The Key:  - There should be one and only one correct or clearly best answer.  - The key should not be specifically determined in any way, e.g., by length, degree of precision, or language. Item writers often submit questions with the key so carefully qualified that it is twice as long as the distractors; it may help to write the key first so that the distractors can be tailored to be parallel.
  64. 64. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  The options:  - Do not use “All of the above” or “None of the above” as options. The need to use “All of the above” may be an indication that a Roman numeral format would be appropriate.  - All options should be as parallel as possible in grammatical structure, diction, and length.
  65. 65. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  Unacceptable Sample Options:  The passage implies that an advantage of adopting the author’s theories is that we would increase our knowledge of atmospheric processes  national survival  the formulation of a set of hypotheses regarding motion in space
  66. 66. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  The options:  - All options must fit the stem, e.g., they should not be easily identifiable as incorrect responses simply because they make no sense grammatically or idiomatically.  - Avoid options that overlap or subsume each other, or options that give away the answers to other questions.
  67. 67. DO’S AND DON’TS IN WRITING FOR READING COMPREHENSION  - Avoid using a pair of opposites in the options if one of the pair is the key. If such a pair of opposites is used, the item is likely to operate as a two-choice rather than a four-choice item, and the probability of guessing the correct answer is increased.  - Arrange options in logical order, if one exists, or according to length (for example, shortest to longest).
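The point about opposites can be made concrete with a little arithmetic: if a candidate guesses blindly among the options that still look plausible, the chance of hitting the key is one over the number of such options. The sketch below assumes purely random guessing and a hypothetical 40-item test; the function names are invented for the example.

```python
from fractions import Fraction


def chance_of_correct_guess(plausible_options):
    """Probability of guessing the key when choosing blindly among the plausible options."""
    if plausible_options < 1:
        raise ValueError("an item needs at least one plausible option")
    return Fraction(1, plausible_options)


def expected_chance_score(items, plausible_options):
    """Expected number of items answered correctly by blind guessing alone."""
    return items * chance_of_correct_guess(plausible_options)


four_choice = chance_of_correct_guess(4)  # item works as intended
two_choice = chance_of_correct_guess(2)   # a pair of opposites narrows the field to two
print(four_choice, two_choice)
print(expected_chance_score(40, 4), expected_chance_score(40, 2))
```

Under these assumptions, an item that effectively operates as two-choice doubles the candidate's chance of a correct guess compared with a genuine four-choice item.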
  68. 68. Computers and Language Testing  Rapid developments in computer technology have had a major impact on test delivery. Already, many important national and international language tests, including TOEFL, are moving to computer-based testing (CBT). Stimulus texts and prompts are presented not in examination booklets but on the screen, with candidates being required to key in their responses. The advent of CBT has not necessarily involved any change in the test content, which may remain quite conservative in its assumptions, but often simply represents a change in test method.
  69. 69. Computers and Language Testing  The proponents of computer-based testing can point to a number of advantages. First, scoring of fixed-response items can be done automatically, and the candidate can be given a score immediately. Second, the computer can deliver tests that are tailored to the particular abilities of the candidate.
  70. 70. Computers and Language Testing  It seems inefficient for all candidates to take all the questions on a test; clearly some are so easy for some candidates that they provide little information on their abilities; others are too hard to be of use. It makes sense to use the very limited time available for testing to focus on those items that are just within, and just beyond, a candidate’s threshold of ability.
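The idea of focusing on items near a candidate's threshold of ability can be illustrated with a deliberately simple sketch. Everything here is an assumption made for the example: the item bank, the arbitrary difficulty scale, the step-halving update rule, and the simulated candidate. Operational computer-adaptive tests rely on more sophisticated statistical models, and this is not a description of how any real test works.

```python
def run_adaptive_test(item_bank, answer_fn, n_items=5, start=0.0, step=1.0):
    """Administer items whose difficulty is closest to the current ability estimate.

    item_bank maps an item id to a difficulty on an arbitrary scale.
    answer_fn(item_id, difficulty) returns True if the candidate answers correctly.
    """
    estimate = start
    remaining = dict(item_bank)  # copy so the caller's bank is left untouched
    administered = []
    for _ in range(min(n_items, len(remaining))):
        # Choose the unused item nearest to the current estimate
        item_id = min(remaining, key=lambda i: abs(remaining[i] - estimate))
        difficulty = remaining.pop(item_id)
        correct = answer_fn(item_id, difficulty)
        # Move the estimate up after a correct answer, down after an error
        estimate += step if correct else -step
        step /= 2  # smaller adjustments as evidence accumulates
        administered.append((item_id, difficulty, correct))
    return estimate, administered


# Hypothetical bank of nine items with difficulties from -2.0 to 2.0
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
bank = {f"item{n}": d for n, d in enumerate(difficulties)}


def simulated_candidate(item_id, difficulty):
    """Hypothetical candidate who answers correctly up to difficulty 1.0."""
    return difficulty <= 1.0


estimate, log = run_adaptive_test(bank, simulated_candidate)
print(estimate)
for entry in log:
    print(entry)
```

In this run the very easy items are never administered, which mirrors the argument above: items far below or above the candidate's level provide little information.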
  71. 71. Computers and Language Testing The use of computers for the delivery of test materials raises questions of validity. For example, different levels of familiarity with computers will affect people’s performance with them, and interaction with the computer may be a stressful experience for some students or candidates. McNamara, Tim (2000, pages 79-81)
  72. 72. LEARNING THEORY: Intrinsic motivation / teacher extrinsic motivation  Structure: focused practice / lots of oral practice  Sequence: learn well before moving on to the next point.  Reinforcement: review, feedback, correction  [Diagram: INPUT, PROCESS and OUTPUT]
  75. 75. THANK YOU
