ASSESSMENT

Assessment is the act or result of judging the worth or value of something or someone in a given context. It is the process of gathering and documenting information about the achievement, skills, abilities, and personality variables of an individual. Assessment is used in both educational and psychological settings by teachers, psychologists, and counsellors to accomplish a range of objectives, including:

- to learn more about the competencies and deficiencies of the individual being tested
- to identify specific problem areas and/or needs
- to evaluate the individual's performance in relation to others
- to evaluate the individual's performance in relation to a set of standards or goals
- to provide teachers with feedback on the effectiveness of instruction
- to evaluate the impact of psychological or neurological abnormalities on learning and behaviour
- to predict an individual's aptitudes or future capabilities

EVALUATION

In essence, evaluation is used to determine the success level of something; this something is typically an individual or a product, such as a lesson, project, or program (although a process can also be evaluated). A properly designed and implemented evaluation provides an instructional designer with appropriate data that can be analysed to determine the success level of who or what is being evaluated.

HOW ARE THEY DIFFERENT

Assessment is the collection of information, artefacts, and samples of student work and performance, while evaluation is the analysis of those assessment pieces through some form of scoring or rating. Evaluation is related to, and often confused with, assessment. In the context of education, assessment is the process used to determine how much a student knows or how effectively a student can perform some task relative to a criterion or standard. Evaluation is intended to determine how well a project has achieved the criteria or goals established for it.
If a project's goals include “improved student learning”, however, then the evaluation would likely utilize assessment data. So while assessment and evaluation are different, evaluations often utilize assessment data to help make decisions about the value of the project under study. Many designers use the term evaluation to refer to a process that includes assessment and measurement, while the term assessment is often used to refer to the techniques used during evaluation to gather data.
Assessment that serves a pedagogical function enhances learning by creating awareness, cuing attention, or providing practice. It provides an instructional designer with appropriate data that can be analysed to determine the success level of who or what is being evaluated. Once this level is determined, the designer decides whether changes need to be made. These changes vary according to who or what is being evaluated and when the evaluation takes place; they could include learner remediation, the redesign of instructional materials, or abandonment of the entire instructional intervention. Changes are made to improve the likelihood that a learner will achieve a high level of success.

DEVELOPING PERFORMANCE MEASUREMENTS

An instructional designer should usually develop performance measurements during or immediately following the preparation of performance objectives. Measurements of all kinds, sometimes called metrics, have been commanding attention in recent years.
Instructional designers should be capable of developing tests, written questionnaires, interviews, and other methods of measuring performance. The performance measures should be written clearly and correspond to performance objectives, rely on appropriate methods of measuring learning outcomes, comply with time and instructional constraints, and meet requirements for validity and reliability. Instructional designers can develop performance measurements only when they are furnished with the necessary information on the characteristics of learners, the settings in which they are expected to perform, constraints on performance and instructional development, instructional objectives, and plans for analysing needs and evaluating results as applicable.
Approaches to assessment differ depending on what is being assessed. Cognitive learning is often assessed with paper-and-pencil tests. Performance, or demonstration of the ability to apply learning, is assessed through observation or through examination of products using checklists. Affective outcomes can be determined through inventories or self-report instruments, or through observation that yields indicators of valuing. Psychomotor objectives can be assessed through all three approaches: paper-and-pencil tests can determine knowledge of the cognitive component of a psychomotor skill, attitude inventories can determine feelings towards regular performance of the skill, and performance measures can collect data on the ability to execute the skill.
In order to develop assessment plans that include testing instruments, you need to understand the concepts of criterion-referenced testing, reliability, and validity. Instruction is designed to bring about learning, and tests or other means of assessment are used to determine whether learning occurred. Criterion-referenced tests (CRTs) refer to the relationship between the objective and the method of assessment, and to the level of performance required for mastery. A learner evaluation that is criterion-referenced indicates that a learner is being judged on the basis of his or her competency level. Competence is determined by specific criteria, such as being able to answer a specified number of questions or to demonstrate certain skills in a specific amount of time. Usually a cut-off score is established, and everyone reaching or exceeding that score passes the test. For example, in a driving test you must show competency in operating a vehicle; passing the test indicates competence as judged by the licensing office. Norm-referenced tests (NRTs) indicate that the results of the evaluation are being used to compare or rank learners. As a student you are familiar with numerous norm-referenced examinations, e.g. SEA, CXC, CAPE, SAT, GRE. Although both types of evaluation are useful, CRTs generally provide more useful information. This information can help instructors and instructional designers decide where a learner may have gaps in his or her learning, and knowing this allows changes to be made to instruction that will benefit the learner. NRT evaluation does not measure competence: knowing how a learner ranks in relation to other learners does not provide the information necessary to determine how competent the learner is or what changes need to be made to help improve instruction.
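The CRT/NRT distinction above is simple enough to sketch in a few lines of code. In this minimal sketch the learner names, scores, and the cut-off of 80 are invented for illustration, not taken from the text.

```python
# Hypothetical illustration of the CRT/NRT distinction:
# a CRT judges each learner against a fixed cut-off, while an NRT
# only ranks learners against one another.

def crt_pass(score, cutoff=80):
    """Criterion-referenced judgement: everyone at or above the cut-off passes."""
    return score >= cutoff

def nrt_rank(scores):
    """Norm-referenced judgement: learners are ordered best to worst."""
    return sorted(scores, key=scores.get, reverse=True)

scores = {"Ana": 85, "Ben": 72, "Cai": 91}  # invented scores

# CRT: each learner is judged independently against the standard.
passed = [name for name, s in scores.items() if crt_pass(s)]

# NRT: the same scores only tell us who did better than whom,
# not whether anyone is actually competent.
ranking = nrt_rank(scores)

print(passed)   # ['Ana', 'Cai']
print(ranking)  # ['Cai', 'Ana', 'Ben']
```

Note how the ranking alone cannot say whether Ben is competent; only the criterion-referenced comparison against the cut-off can.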
Reliability means that a test yields a dependable measure: if the test is repeated, the same results will be obtained. If the test is given to the same learner at different times, without the learner receiving additional preparation, and similar results occur, the test is considered reliable. Various statistical methods can be used to determine reliability; however, they are beyond the scope of this course and are covered in curriculum, testing and assessment, or research and evaluation courses. Reliable tests have consistency and temporal dependability.
Consistency exists in a test when a conclusion can be drawn confidently because a learner's performance is consistent (good or bad) across all items aimed at the same objective or outcome. For example, a student may correctly answer a question by guessing, or answer a question incorrectly because it was phrased misleadingly. A single item is therefore not sufficient evidence to conclude that a student has or has not mastered an objective. Basically, a student who has answered several test questions correctly is more likely to have mastered the objective than a student who can answer only one or two questions.
The following factors affect the number of items developed when ensuring consistency on a test:

- Consequences of misclassification: what is the cost of judging a master a non-master, and vice versa? The greater the cost, the greater the need for multiple items.
- Specificity of the competency: the more specific the competency, the smaller the number of test items needed to assess it. This is especially true of performance tests of simple motor skills, for example playing the C scale; once is usually enough.
- Resources available for testing: when it is impossible to test for long periods of time, develop a strategy for selecting competencies for testing to make maximum use of the time and money allocated for the instruction.
Each time a test is administered, it should produce similar results: the student's demonstration of mastery should be the same on Tuesday as it is on Friday. Temporal dependency is usually determined by administering the test on two occasions to the same group of students; a high degree of correspondence between scores suggests good reliability. Statistics employed for traditional norm-referenced reliability are not appropriate here; the comparison can be done with a simple percentage. Test-retest reliability is enhanced by constructing unambiguous items and by making scoring as objective as possible. The most objective tests are those that can be scored by machine or by anyone with a scoring key. By contrast, subjective items depend upon the judgement of experts who may have varying opinions about correctness, and subjective tests get varying results from the same students on different days.
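The "simple percentage" comparison described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the Tuesday and Friday score lists and the mastery cut-off of 70 are invented for the example.

```python
# Hypothetical sketch of temporal dependency as a simple percentage:
# classify each student as master/non-master on each administration,
# then report the percentage of students classified the same way both times.

def classify(scores, cutoff):
    """Mark each score as mastery (True) or non-mastery (False)."""
    return [s >= cutoff for s in scores]

def percent_agreement(first, second, cutoff=70):
    """Percentage of students whose mastery classification did not change."""
    a, b = classify(first, cutoff), classify(second, cutoff)
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100 * same / len(a)

tuesday = [82, 65, 74, 91, 58]   # invented scores, first administration
friday  = [79, 68, 71, 88, 73]   # same students, second administration

print(percent_agreement(tuesday, friday))  # 80.0
```

Here four of the five students keep the same classification across the two days, so agreement is 80 percent; a figure much lower than this would suggest the test lacks temporal dependency.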
Consider an example: if the test item says that an automotive repair student must remove a brake drum, then to be valid, the test must provide an automobile and the equipment to actually remove the drum. A multiple-choice test requiring students to recognize the correct procedure for removing the drum is not a valid test of the objective, nor would the test be valid if it required the student to recite from memory the procedures for removal. Neither test provides evidence of the student's ability to use the actual equipment; responses to such a test may be no more than a memorized verbal chain. Only when the test and the objective correspond is the test valid. Frequently it is not possible to test the actual performance of a task because it is difficult to obtain the resources to develop and administer the test; consider the problem of obtaining equipment for the test on automotive repair. Practical considerations may necessitate a scaled-down version of the actual performance. Therefore the objective must be written to take these constraints into consideration while still permitting a valid measure. In cases such as these, it is a good idea to obtain the judgement of others about whether the objective and the test truly reflect the purposes of the lesson.
This is because the assessment should be derived from the objectives. Take a look at the action (the verb) that is part of the instructional objective. For example:
The outcome of the instructional objective is for the learner to be able to identify when an individual might be having a heart attack. The action and the outcome have been identified, but what type of outcome is this? The taxonomies can be referred to: the action fits into the cognitive learning taxonomy, at the comprehension level. Since it fits into the cognitive domain, the outcome of the instructional objective is a change in the learner's knowledge. Knowing this can help you develop an appropriate assessment technique that will determine whether the learner has achieved the outcome.
Cognitive tests can be used for learning that requires the acquisition of knowledge; here the appropriateness of paper-and-pencil tests is self-evident. Verbal chains may be measured by recitation. Knowledge of facts and other types of information may be assessed by questions that require the student to make mastery explicit. Intellectual skills are assessed by having a student solve problems, apply rules, or classify objects.
Unobservable cognitive tasks are usually made visible by some form of a written test.
For example, a student's ability to perform a motor task is evaluated by observing and judging his behaviour. In the test, the student is directed to perform a task and his performance is evaluated against some predetermined standard. If the outcome is a process, performance is evaluated as it occurs; tasks evaluated this way include actions performed by athletes, performing artists, and equipment operators. Some feel that because performance tests directly measure capability, they are inherently more valid than written tests. But because they usually require judgement by the examiner, they tend to be less reliable than cognitive tests.
Process – Objectives can be process objectives, meaning that students are expected to learn ways of doing things, such as problem solving and discussing. Problem-based learning uses process objectives; in other words, students are expected to learn procedures for problem solving. Rubrics and observation checklists are used to assess such outcomes. A rubric is a table, list, or scale for scoring performance on assigned tasks; it can be designed to allow comparison between levels of achievement or aspects of a task. Another form of process assessment is using checklists to determine whether the learner can execute a procedure or demonstrate applying a rule. Product – the outcome of a procedure is a product, which is evaluated against a standard. Products may take many forms. A sample of the student's handwriting may be compared to an ideal sample of correct penmanship, or an apprentice cabinetmaker's work may be evaluated against certain workmanship standards. A learner may demonstrate the ability to do double-entry bookkeeping by recording debits and credits from paper records such as check stubs and invoices. The product can be in the form of written language, such as a report; in graphic form, such as a chart; in edible form, such as a cake; or it can be a dramatic performance or a speech. Portfolios – One of the alternative assessment practices increasingly used is portfolios. They have the advantage of providing a basis for both process and product review. Portfolios are a product because they contain examples of work that can be examined. An example of an objective for a portfolio might be: “Given 3 months, a research topic, and resources, the student will be able to write a formal paper that demonstrates mastery of the topic, including original interpretations and thoughts.
The paper will meet standards summarized in a rubric, and the portfolio will include evidence of growth and collaborative reflection on the experience and will meet standards set forth in a checklist”. The portfolio concept has been used before by models, artists and even instructional designers.
How does one set standards for portfolios, which are works in progress and involve both process and product approaches to performance assessment? One way is to set standards for parts as they evolve and use these standards in addition to a review of the portfolio as a whole and evidence of involvement and reflectivity. However, several problems arise from this approach:

- Personal standards must be integrated with criterion-referenced standards.
- Assignments must be relevant to the individual student.
- Time for evidence of growth and achievement must be provided on an individual basis.
- Time for collaborative reflection is required.

Portfolio generation is a process, not just a simple procedure. It requires many procedures, from decisions about what will be included, to when it will be reviewed periodically, to how reflection and feedback will occur. Rubrics present criteria in a graphic form that allows an evaluator to give feedback that stimulates reflection. Some teachers develop many forms, rubrics, and lists of questions or considerations to be used during the process, and they have to reflect continually on how well the process is working.
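A rubric of the kind described here, a table or scale for scoring performance against stated criteria, can be represented very simply in code. In this sketch the criterion names, the 1–4 achievement levels, and the sample ratings are all invented for illustration; a real rubric would describe each level in words for the evaluator.

```python
# Hypothetical sketch of a rubric as a scoring table for a formal paper.
# Each criterion is rated on an invented 1-4 achievement scale.

RUBRIC_MAX = {
    "thesis and originality": 4,  # top achievement level per criterion
    "use of evidence": 4,
    "organisation": 4,
    "mechanics": 4,
}

def score_paper(ratings):
    """Sum the evaluator's level ratings and report them against the maximum."""
    total = sum(ratings[criterion] for criterion in RUBRIC_MAX)
    maximum = sum(RUBRIC_MAX.values())
    return total, maximum

ratings = {  # one evaluator's invented judgement of a single paper
    "thesis and originality": 3,
    "use of evidence": 4,
    "organisation": 2,
    "mechanics": 3,
}

total, maximum = score_paper(ratings)
print(f"{total}/{maximum}")  # 12/16
```

Keeping the per-criterion ratings, rather than only the total, is what lets the evaluator give the kind of feedback that stimulates reflection: the learner can see that organisation, not evidence, is the weak aspect.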
Projects – The goal of a project is to learn more about the topic rather than to seek the right answers to questions posed by the teacher. This type of objective can be assessed using a rubric or a checklist.
Authentic assessment is currently generating great interest in areas such as science and mathematics. Wiggins compares authentic assessment with more traditional means of assessment, noting, among other points, that traditional testing standardizes objective items and hence the (one) right answer for each; that validity on most multiple-choice tests is determined merely by matching items to the curriculum content (or through sophisticated correlations with other test results); and that traditional tests are more like drills, assessing static and too-often arbitrarily discrete or simplistic elements of those activities (Wiggins, 1990, p. 1). The major problem with this type of assessment is that it is very labour-intensive and therefore costly: it can cost double the amount to use these tests as opposed to performance-based tests, although the benefits can be achieved in the long run. Process, product, portfolio, and project assessments can all be ways to assess authentically.
When you assess attitudes, it is desirable to place the learner in a voluntary situation in order to collect data; otherwise, the learner may indicate a preference just to please the examiner rather than being honest. This means that you measure preference for reading indirectly, by counting books signed out from the library, not by collecting a book report done to fulfil an assignment. Affective objectives are measured by criterion items that are often voluntary and indirect, with a flexible, personalized level for success. The following objective is an example of attitude assessment through a journal or diary kept on a voluntary basis: “Given a budget and the opportunity to participate in a research study, the teenage volunteer will show preference for a balanced diet in reports on a five-day eating period in restaurants. The balanced diet will be judged on a five-day rather than a meal-by-meal basis.”
You write criterion items consistent with the objective. The taxonomies of objectives include sample test items that can be used as models.
This is not to say that multiple-choice items cannot be used for knowledge and attitude assessment. However, it is difficult to measure the application of a psychomotor skill with multiple-choice items. Items can be constructed to be suitable for different learning outcomes; still, there are domains for which a given item type is most suitable.
To use an extreme example, one does not use observation in place of a written test of cognitive learning. However, some data collection methods appropriate for performance assessment can also be used for the assessment of attitude change; still, there are methods more suitable for one than the other. Self-report methods, with oral or written responses, are an effective way of assessing attitude change.
A successful learner evaluation provides sufficient data to help the individuals involved in the instructional intervention (e.g. the instructor, the instructional designer) make decisions about what took place and what should subsequently be done. Generally, what is being established by looking at the data is whether a learner was able to meet the instructional objectives, and to what degree. Finding out why will help establish the appropriate next steps for the learner: it could indicate that the learner needs remediation or that the instructional materials need to be improved. If adequate data were not collected during the learner evaluation to allow the learner to be properly evaluated, the learner evaluation itself should be examined. Such an examination could indicate that the assessment techniques used to gather data were inappropriate. Another possible issue is how the assessment was implemented: if the assessment techniques were not implemented properly, inadequate data may be collected.
Assessing learning in Instructional Design
Prepared by Leesha Roberts, Instructor II, University of Trinidad and Tobago – Valsayn Campus Evaluating Learner Success and the Instructional Design
Rationale for Evaluating the Learner in Instructional Design
Overview <ul><li>What is Assessment and Evaluation? </li></ul><ul><li>How do they differ? </li></ul><ul><li>What role do they play in the ID Process? </li></ul><ul><li>When should learner performance be assessed? </li></ul><ul><li>How can assessment be made reliable and valid? </li></ul><ul><li>Matching Assessment to Objectives </li></ul><ul><li>How does an instructional designer determine when a learner evaluation has been successful? </li></ul>
What is Assessment and Evaluation? <ul><li>What is Assessment? </li></ul><ul><ul><li>Procedures or techniques used to obtain data about a learner or a product. </li></ul></ul><ul><li>What is Evaluation? </li></ul><ul><ul><li>The process for determining the success level of an individual or a product on the basis of data </li></ul></ul><ul><li>How are they different? </li></ul><ul><ul><li>Assessment is the collection of information, while Evaluation is the analysis of the assessment pieces </li></ul></ul>
What is Assessment and Evaluation <ul><li>Measurement – refers to the data collected, which is typically expressed quantitatively (i.e. numbers) </li></ul><ul><li>Instruments – the physical devices used to collect the data (e.g. rating scales, observation sheets, checklists, objective tests) </li></ul>
What role does Assessment and Evaluation play in the ID Process <ul><li>Assessment serves a pedagogical function to: </li></ul><ul><ul><li>Measure </li></ul></ul><ul><ul><li>Diagnose </li></ul></ul><ul><ul><li>Instruct </li></ul></ul><ul><li>Information from Assessment can also serve a secondary function in evaluation. </li></ul>
Developing Performance Measurements <ul><li>Instructional Designers should be capable of developing: </li></ul><ul><ul><li>Tests </li></ul></ul><ul><ul><li>Written questionnaires </li></ul></ul><ul><ul><li>Interviews </li></ul></ul><ul><ul><li>Other methods of measuring performance </li></ul></ul>
Basic Principles of Measurement <ul><li>Tests that measure what a person has learned to do are called achievement tests. </li></ul><ul><li>There are two types of achievement tests </li></ul><ul><ul><li>Criterion-referenced tests (CRTs), also known as minimum competency or mastery. Basically it allows everyone to know exactly how well students stand relative to a standard. </li></ul></ul><ul><ul><li>Norm-referenced tests (NRTs). Basically these tests are designed to “reliably” select the best performers </li></ul></ul>
Reliability and Validity <ul><li>What is Reliability? </li></ul><ul><ul><li>Learner evaluation will provide similar results when it is conducted on multiple occasions. </li></ul></ul><ul><li>What is Validity? </li></ul><ul><ul><li>Determines whether the learners have achieved the intended outcomes of instruction (based on the intended outcomes of the instruction) </li></ul></ul>
Characteristics of Reliability in Tests <ul><li>Reliable tests have: </li></ul><ul><ul><li>Consistency </li></ul></ul><ul><ul><li>Temporal Dependency </li></ul></ul><ul><li>Consistency </li></ul><ul><ul><li>To increase the consistency of an NRT, developers simply increase the number of items on the test. </li></ul></ul><ul><ul><li>To increase the consistency of a CRT, assess each competency the test covers with multiple items. </li></ul></ul>
Characteristics of Reliability in Tests (Cont’d) <ul><li>The following are factors that affect the number of items developed when ensuring consistency on a test: </li></ul><ul><ul><li>Consequences of misclassification </li></ul></ul><ul><ul><li>Specificity of the competency </li></ul></ul><ul><ul><li>Resources available for testing </li></ul></ul>
Characteristics of Reliability in Tests (Cont’d) <ul><li>Temporal Dependency – each time a test is administered, it should produce similar results. </li></ul>
Characteristics of Validity in Tests <ul><li>There can be no validity without reliability </li></ul><ul><li>The performance on a CRT must be exactly the same as the performance specified by the objective. </li></ul><ul><li>Achieving validity is not always straightforward. </li></ul>
Matching Assessment to Objectives <ul><li>Instructional objectives are a key element in the development of effective learner assessment. </li></ul><ul><li>A direct relationship must exist between the instructional objectives and the learner assessment. </li></ul><ul><li>How can you determine whether the intended outcome of an instructional objective is a change in knowledge, skill or attitude? </li></ul>
Matching Assessment to Objectives <ul><li>Example: </li></ul><ul><li>Read the following sentence and identify the action. </li></ul><ul><ul><li>The learner will be able to list the three major warning signs of a heart attack. </li></ul></ul><ul><li>The action in the instructional objective is to list – more specifically, to list the three major warning signs of a heart attack. </li></ul>
Cognitive Tests <ul><li>Measures acquisition of knowledge </li></ul><ul><li>Paper and Pencil tests </li></ul><ul><li>Recitation </li></ul><ul><li>The six types of test that apply to cognitive tasks: </li></ul><ul><ul><li>Multiple-choice </li></ul></ul>
Performance Tests <ul><li>Measures a student’s ability to do something. </li></ul><ul><li>There are five types of Performance Assessment: </li></ul><ul><ul><li>Performance (examination of actions or behaviours, that can be directly observed) </li></ul></ul>
Performance Tests <ul><ul><li>Process (learning ways of doing things such as problem solving and discussing) </li></ul></ul><ul><ul><li>Product (outcome of a procedure is the product which is evaluated against a standard.) </li></ul></ul>
Performance Tests <ul><ul><li>Portfolios (provides the basis for a product and process review) </li></ul></ul><ul><ul><li>Projects (a product assessment which is an in-depth investigation of a topic worth learning more about, according to Katz (1994)) </li></ul></ul>
Performance Process Product Portfolios Projects
Authentic Assessment <ul><li>Focus is on “real” tasks </li></ul><ul><li>Achieves validity and reliability by emphasizing and standardizing the appropriate criteria for scoring such (varied) products </li></ul><ul><li>“Test validity” depends on whether the test simulates real-world tests of ability </li></ul><ul><li>Involves “ill-structured” challenges and roles that help students rehearse for the complex ambiguities of the “game” of adult and professional life. </li></ul>
Appropriateness of Items <ul><li>How do you go about writing valid criterion items? </li></ul><ul><li>The taxonomies of objectives include sample test items that can be used as models. </li></ul>
Appropriateness of Items <ul><li>An assessment procedure may be more appropriate for some learning outcomes than others. </li></ul><ul><li>There are several bases on which the logical consistency between assessment and other aspects of design can be determined </li></ul>
Appropriateness of Items <ul><li>These are: </li></ul><ul><ul><li>Matching the objectives to the criterion </li></ul></ul><ul><ul><li>Matching the type of assessment to the type of learning </li></ul></ul><ul><ul><li>Matching the data collection method to the purpose of the assessment </li></ul></ul>
Determination of the success of Learner Evaluation <ul><li>A successful learner evaluation provides: </li></ul><ul><ul><li>Data for instructional intervention </li></ul></ul><ul><ul><li>Data as to whether the learner has met the instructional objectives </li></ul></ul><ul><ul><li>Recommendations based on data gathered </li></ul></ul>