
Evaluation: The Many Faces


Discusses the rationale for Evaluation within Instruction.

Published in: Education, Technology

  1. 1. THE MANY FACES OF EVALUATION Source: Morrison, Gary R. Designing Effective Instruction, 6th Edition. John Wiley & Sons (2011) Prepared by: Leesha Roberts, Instructor II, University of Trinidad and Tobago
  2. 2. QUESTIONS TO CONSIDER • How can I determine whether this course is teaching what it is supposed to? • What are some ways to measure the accomplishment of performance skills besides observing a person at work on a job? • When is it appropriate to use a performance test instead of an objective test? • The questions on this test don’t relate to the objectives the teacher gave us at the beginning of the unit. Shouldn’t they? • Should I pretest my students? If so, how can that information be used to improve instruction?
  3. 3. PURPOSES OF EVALUATION • Evaluation is used to make judgments about the worth or success of people or things (e.g., lessons, programs, projects). • Before initiating an evaluation, you must determine its goals. • Depending on the stage of the instructional design process, one of three types of evaluation will be most useful: • The formative approach • The summative approach • The confirmative approach
  4. 4. Formative Evaluation • Its function is to inform the instructor or planning team how well the instructional program is serving the objectives as it progresses. • Formative evaluation is most valuable when conducted during development and tryouts. • It should be performed early in the process, before valuable time and resources are wasted on things that aren’t working. • If the instructional plan contains weaknesses, they can be identified and eliminated before full-scale implementation.
  5. 5. Formative Evaluation • Test results, reactions from learners, observations of learners at work, reviews by subject-matter experts, and suggestions from colleagues may indicate where there are deficiencies in the learning sequence, procedures, or materials. • Formative evaluation is the quality control of the development process. • Formative testing and revision (and retesting and further revision, if necessary) are important for the success of an instructional design plan. • They should relate not only to the suitability of objectives, subject content, instructional strategies, and materials but also to the roles of personnel, the use of facilities and equipment, the schedules, and other factors that together affect optimum performance in achieving objectives.
  6. 6. Formative Evaluation • Remember, the planning process is highly interactive—each element affects other elements. • "For instructors, the focus will be placed on the students. If students don’t perform up to expectations and the effectiveness of instruction has been previously demonstrated, the conclusion will be that the students, not the materials, are at fault. • The following questions might be used by designers to gather data during formative evaluation: 1. Given the objectives for the unit or lesson, is the level of learning acceptable? What weaknesses are apparent? 2. Are learners able to use the knowledge or perform the skills at an acceptable level? Are any weaknesses indicated? 3. How much time did the instruction and learning require? Is this acceptable?
  7. 7. Formative Evaluation 4. Did the activities seem appropriate and manageable to the instructor and learners? 5. Were the materials convenient and easy to locate, use, and file? 6. What were the learners’ reactions to the method of study, activities, materials, and evaluation methods? 7. Do the unit tests and other outcome measures satisfactorily assess the instructional objectives? 8. What revisions in the program seem necessary (content, format, etc.)? 9. Is the instructional context appropriate?"
  8. 8. Summative Evaluation • Summative evaluation is directed toward measuring the degree to which the major outcomes are attained by the end of the course. • Key information sources are therefore likely to be the results of both the unit post tests and the final examination for the course.
  9. 9. Summative Evaluation • In addition to measuring the effectiveness of student or trainee learning, summative evaluations frequently also measure the following: • Efficiency of learning (material mastered/time) • Cost of program development • Continuing expenses • Reactions toward the course or program • Long-term benefits of the program
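The efficiency measure on this slide (material mastered divided by time) can be sketched as a simple ratio. This is only an illustration: the function name, the units (objectives mastered per hour), and the sample figures are assumptions, not values from the source text.

```python
def learning_efficiency(material_mastered: float, time_spent: float) -> float:
    """Return material mastered per unit of time (e.g., objectives per hour)."""
    if time_spent <= 0:
        raise ValueError("time_spent must be positive")
    return material_mastered / time_spent


# Hypothetical figures for two versions of the same course.
self_paced = learning_efficiency(material_mastered=9, time_spent=6.0)
lecture = learning_efficiency(material_mastered=10, time_spent=8.0)

# The lecture version covers more material, but the self-paced version
# is more efficient per hour of learner time.
print(self_paced, lecture)
```

A ratio like this says nothing about cost or learner reactions, which the slide lists as separate summative measures.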
  10. 10. Confirmative Evaluation • This type of evaluation was originally introduced by Misanchuk (1978). • The rationale is that evaluation of instruction needs to be continuous and, therefore, extend beyond summative evaluation. • Similar to formative and summative evaluations, confirmative evaluations rely on multiple data-gathering instruments, such as questionnaires, interviews, performance assessments, self-reports, and knowledge tests (Moseley & Solomon, 1997). • Of special interest to the confirmative evaluator are questions such as these: • Do learners continue to perform successfully over time? • Do materials still meet their original objectives? • How can clients’ needs be best met over time? • If improvements are needed in the training or materials, how can they be made most effectively? • If the instruction isn’t working as well as it did originally, what are the reasons? Should the instruction be continued as is? Should it be revised? Should it be terminated?
  11. 11. Processes and Products • Formative evaluation asks, “How are we doing?” • Summative evaluation asks, “How did we do?” • Confirmative evaluation asks, “How are we still doing?” • To answer these questions, different types of measurement orientations are needed.
  12. 12. Processes and Products • Specifically, formative evaluation emphasizes the measurement of outcomes as instruction evolves (or “forms”). • Interest is as much with process as with product. • Summative evaluation stresses measurement of criterion outcomes that occur at the end of instruction. Here interest is more with products than with processes. • Confirmative evaluation stresses the measurement of criterion outcomes that occur after the instruction has been completed for some time. Its interest is, therefore, the long-term maintenance of products. • Key idea: Formative evaluation gives equal attention to processes and products. Summative and especially confirmative evaluations give greater weight to products.
  13. 13. Time of Testing • For formative evaluations, testing is important at all phases of instruction: pretesting (before), embedded testing (during), and post testing (after). • Although all three types of testing may be used in both summative and confirmative evaluation, post testing is clearly the most critical and the main basis for forming conclusions about the instruction. • Confirmative evaluation, however, should generally include repeated post testing to monitor performance over time. • Key idea: Formative evaluation gives equal attention to pretesting, embedded testing, and post testing. Summative and confirmative evaluation give greater weight to post testing.
  14. 14. When to Evaluate • Formative evaluations are most valuable before instruction is fully developed, when it is inexpensive to make changes. • They are also most valuable when used in a continuous manner, at different phases of the design process. • Some common modes of formative evaluation are connoisseur-based (i.e., expert-based) review, one-to-one trials, small-group testing, and field testing. All these are used to refine instruction at different developmental stages. • Summative and confirmative evaluations, in contrast, are designed to examine the effectiveness of completed versions of instruction. • Summative evaluation comes after the instruction is first used, but before sustained implementation. • Confirmative evaluation is used after implementation has occurred and the design has been used for a reasonable time (usually at least six months to a year).
  15. 15. Suggested Measures for Alternative Outcomes • When evaluating knowledge: • Objective tests: multiple choice, true/false, matching • Constructed-response tests: completion (fill in the blank), short essay, long essay
  16. 16. Suggested Measures for Alternative Outcomes • Problem solving • When evaluating behaviour: • Direct testing of performance outcomes: e.g., a test of baking a specialty cake. • Analysis of naturally occurring results: e.g., number of accidents, attendance, contribution to class discussions. • Ratings of behaviours based on direct observation: e.g., rate teacher clarity on a five-point scale. • Checklists of behaviour based on direct observation: e.g., check each safety precaution while students are wiring circuits. • Ratings or checklists of behaviour based on indirect measures: e.g., peer evaluation of the student’s communication skills while working in groups. • Authentic tests: e.g., portfolios or exhibitions that display students’ work in a meaningful context.
  17. 17. Suggested Measures for Alternative Outcomes • When evaluating attitudes: • Observation of instruction: e.g., does the student appear to be enjoying the lesson? Is the student attentive during the lesson? • Observation/assessment of behaviour: e.g., how many students observe the class rules? • Attitude surveys: e.g., ratings of instructor preparedness, lesson difficulty, clarity, and organization; open-ended evaluation by students. • Interviews: e.g., what appear to be the strengths of the instruction? Why? What appear to be the weaknesses? Why?
  18. 18. Validity and Reliability of Tests • Once you have determined the types of measures for assessing objectives, selecting or developing the instruments becomes the next major task. Whichever route is taken, it is important to ensure that those instruments possess two necessary qualities: validity and reliability. • A test is considered valid when it specifically measures what was learned, as specified by the instructional objectives for the unit or topic. • One way of ensuring a high degree of test validity is to devise a second table of specifications that relates test items to objectives. • Such a table can serve two purposes: • First, it helps verify that outcomes at the higher learning levels (e.g., application, analysis, synthesis, and evaluation) receive adequate attention. • Second, it shows the number of questions needed for measuring individual instructional objectives or groups of related objectives.
  19. 19. Validity and Reliability of Tests • These frequency values reflect the relative importance of each objective, or the proportion of emphasis it is given during instruction. • Validity is not always easy to measure or quantify. • Several different types exist and are discussed in most measurement texts (e.g., face, content, predictive, concurrent, and construct validity).
  20. 20. Validity and Reliability of Tests • The two most important types for the instructional designer are face validity and content validity, which both involve judgmental processes. • Face validity is supported by the judgment (often by an expert panel) that the measure appears (“on the face of it”) to assess the measure of interest. • Content validity is similar to face validity but typically involves a more specific examination of individual items or questions to ensure that each “content domain” is appropriately addressed. For example, a final achievement exam that draws 90% of its items from only one out of four primary course units would have questionable content validity.
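The content-validity example above (an exam drawing 90% of its items from one of four units) can be checked mechanically once each item is tagged with its unit. The sketch below is illustrative only: the function names, the unit labels, and the 50% cut-off are assumptions, and content validity itself remains a matter of expert judgment rather than a single number.

```python
from collections import Counter


def unit_shares(item_units):
    """Return the fraction of test items drawn from each course unit."""
    counts = Counter(item_units)
    total = len(item_units)
    return {unit: n / total for unit, n in counts.items()}


def overweighted_units(item_units, max_share=0.5):
    """List units that supply more than max_share of the items.

    max_share is an arbitrary illustrative threshold, not a standard.
    """
    shares = unit_shares(item_units)
    return sorted(unit for unit, share in shares.items() if share > max_share)


# Hypothetical 50-item final exam: 45 items (90%) come from unit 1.
exam = ["unit1"] * 45 + ["unit2"] * 2 + ["unit3"] * 2 + ["unit4"] * 1

print(unit_shares(exam))
print(overweighted_units(exam))
```

In practice the target shares would come from the table of specifications, so that each unit's share of items matches the emphasis it received during instruction.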
  21. 21. Validity and Reliability of Tests • Reliability refers to a test’s ability to produce consistent results whenever used. • If the same learners, without changes in their preparation, were to take the same test or an equal form of the test, there should be little variation in the scores. • Certain procedures can affect the reliability of a test: • The more questions used relating to each instructional objective, the more reliable the test will be. If only one question is asked about a major objective or an extensive content area, it can be difficult to ascertain whether a learner has acquired the knowledge or guessed the correct answer.
  22. 22. Validity and Reliability of Tests • The test should be administered in a standardized way. If more than one person directs testing, similar instructions must be given to each group of individuals who take the test over a period of time. • Everyone should be tested under the same conditions so that distractions do not contribute to discrepancies in the scores.
  23. 23. Validity and Reliability of Tests • Testing time should be the same length for all learners. • Possibly the most important factor that can affect test reliability is the scoring method, especially when marking an essay test or judging performance on a rating scale. Despite attempts to standardize how different persons score tests, criteria can be viewed in various ways, and variations are unavoidable. The less subjective the scoring, the more reliable the test results will be.
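The idea that the same learners should earn similar scores on a repeated test can be illustrated with a correlation between two sittings. Using Pearson's correlation for this is a common convention in measurement practice rather than something the slides prescribe, and the scores below are made up for illustration.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary within each sitting")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for the same five learners on two sittings of one test.
first_sitting = [72, 85, 90, 64, 78]
second_sitting = [70, 88, 91, 66, 75]

r = pearson_r(first_sitting, second_sitting)
print(round(r, 2))  # values near 1 suggest consistent (reliable) scores
```

A high value here speaks only to consistency. It does not show that the test measures the intended objectives, which is a question of validity.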
  24. 24. Relationship between Validity and Reliability • A final question to consider is the relationship between validity and reliability. • Does validity require reliability? Does reliability require validity? • The answers to these two questions are yes and no, respectively. • For an assessment to be valid, it must be reliable. Think about it: How could a test measure what it is supposed to if the scores vary from testing to testing (without any change in testing conditions or learner states)? • On the other hand, you could have reliability without validity. • For example, an instructor might attempt to assess students’ ability to design lessons by giving them a 50-item true/false test on learning theories. The scores might remain consistent from one testing to the next, but they would hardly reflect instructional design skills, the outcome of major interest.
  25. 25. THANK YOU