Test Scoring Approaches: Objective and Essay Tests
Presented by James Gororo
Test Scoring Defined
• "Scoring refers to the process of assigning numerical or
qualitative values to performance or response on a
task, with the aim of summarizing the quality or level
of achievement" (Angelo, 1995).
Key Thinkers in Scoring Assessments
• In the field of educational assessment, several
authorities have made significant contributions to
scoring methods:
• Benjamin Bloom: Bloom's taxonomy, developed in the
1950s, categorized learning objectives into different
cognitive levels. This taxonomy provided a framework
for designing assessments and scoring student
performance based on different levels of complexity.
• Robert Glaser: Glaser, an American psychologist,
contributed to the development of criterion-referenced
assessment, which involves scoring based
on predefined criteria and learning objectives. His
work emphasized the importance of aligning
assessments with instructional goals.
Classroom Test
• An assessment administered by teachers within the
classroom setting to evaluate students' knowledge,
understanding, and skills in a specific subject or course.
• Typically includes question types such as multiple-choice,
short-answer, or essay questions.
• Aims to measure students' comprehension and
mastery of the learning objectives for that particular
instructional period (Angelo & Cross, 1993).
Essay Test
• Type of assessment that requires students to construct
written responses to questions or prompts.
• Allows students to demonstrate their understanding of a
topic, apply critical thinking skills, and articulate their
thoughts in a coherent and well-structured manner.
• Assesses higher-order thinking skills and requires students
to provide detailed explanations and analysis, and to support
their arguments with evidence (Wiggins, 1998).
Testing Approaches for Essay Tests: The Point or Analytic Method
• Each answer is compared with a prepared ideal marking scheme
(scoring key) and marks are assigned according to the adequacy
of the answer.
• Provides a means of maintaining uniformity in scoring between
markers/scorers and between scripts, thus improving the
reliability of the scoring.
• Generally suited to scoring restricted-response questions, where
the expected qualities can be defined precisely enough to assign
point values to them.
• Particular weaknesses or strengths of an examinee can be
identified.
• Each aspect of the item is rated separately, providing greater
objectivity and increasing the diagnostic value of the result.
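The point method amounts to summing the marks earned over a predefined scoring key. A minimal sketch, assuming a hypothetical four-point marking scheme (the point names and mark values are illustrative, not from the source):

```python
# Hypothetical marking scheme: expected points and their mark values.
scoring_key = {
    "defines photosynthesis": 2,
    "names the reactants": 2,
    "explains the role of light": 3,
    "gives a balanced equation": 3,
}

def analytic_score(points_credited):
    """Sum the marks for each expected point the marker judged present.

    points_credited: set of keys from scoring_key found in the answer.
    """
    return sum(marks for point, marks in scoring_key.items()
               if point in points_credited)

# A script that covered two of the four expected points:
print(analytic_score({"defines photosynthesis", "names the reactants"}))  # 4 of 10
```

Because each expected point carries its own mark value, the per-point breakdown also shows a candidate's particular strengths and weaknesses, which is the diagnostic advantage noted above.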
Testing Approaches for Essay Tests: The Global/Holistic Rating Method
• Ideal for extended-response questions, where relative judgements
are made (no exact numerical scores) concerning the relevance of
ideas, the organization of the material, and similar qualities.
• The examiner first sorts the responses into categories of varying
quality based on a general or global impression, providing a
relative scale that forms the basis for ranking responses from the
poorest to the highest quality.
• Usually between five and ten categories are used, each
representing a degree of quality that determines the credit to be
assigned.
• Requires considerable skill and time in allocating responses to
categories.
• Each characteristic is rated separately, thus allowing for greater
objectivity and increasing the diagnostic value of the results.
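The holistic procedure reduces to a single category judgement per script, with the category determining the credit. A minimal sketch, with hypothetical categories and mark values:

```python
# Five hypothetical quality categories (A best, E poorest) and the
# credit each category carries.
CATEGORY_MARKS = {"A": 10, "B": 8, "C": 6, "D": 4, "E": 2}

def holistic_scores(judgements):
    """Map each script's global-impression category to its credit."""
    return {script: CATEGORY_MARKS[cat] for script, cat in judgements.items()}

# Categories assigned after sorting the scripts by overall impression:
judgements = {"script_01": "B", "script_02": "D", "script_03": "A"}
print(holistic_scores(judgements))
```

Note that all of the marker's skill lies in the sorting step, which the code does not capture; the mapping to credit is mechanical once the categories are assigned.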
Scoring Procedures for Essay Tests
• Prepare a marking scheme with ideal answers showing mark allocation.
• Use the scoring method most appropriate for the test item: the analytic or the
global method, as the requirements of the item demand.
• Decide how to handle factors that are irrelevant to the learning outcomes being
measured, e.g. legibility of handwriting, spelling, sentence structure, punctuation
and neatness.
• To preserve objectivity, score each item for the whole group separately.
• To control bias, score scripts anonymously, without knowledge of the candidates'
identities.
• Evaluate the marking scheme before actual scoring by scoring a random sample of
examinees' actual responses. This may call for a revision of the scoring key before
actual scoring.
• Write comments when marking each test item to provide feedback to
candidates.
• Obtain two or more independent ratings when important decisions are to be based
on the results, to make the results more reliable.
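The last procedure, obtaining two or more independent ratings, is commonly implemented by averaging the markers' scores. A minimal sketch, with hypothetical marks from two independent markers:

```python
from statistics import mean

# Hypothetical marks awarded to three anonymized scripts by two
# independent markers.
ratings = {
    "cand_01": [14, 16],
    "cand_02": [9, 11],
    "cand_03": [18, 17],
}

# Final mark is the mean of the independent ratings.
final_marks = {cand: mean(marks) for cand, marks in ratings.items()}
print(final_marks)
```

Large discrepancies between raters (rather than small ones like those above) would normally be resolved by a third rating instead of a simple average.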
Objective Test
• An assessment consisting of questions with
predetermined correct answers.
• Employs question formats such as multiple-choice, true
or false, or matching items, which allow for objective
scoring and measurement.
• Aims to assess specific factual knowledge,
comprehension, and application of concepts, as well as
the ability to analyse and evaluate information
(Gronlund, 2006).
Testing Approaches for Objective Tests
• Manual Scoring: Direct comparison of candidates’ answers to a
marking scheme.
• Stencil Scoring: A scoring stencil is prepared from a blank answer
sheet by punching holes where the correct answers should be. The
stencil is then placed over each answer sheet and the number of
answer marks appearing through the holes is counted. Each test
paper is scanned afterwards to eliminate possible errors due to
examinees supplying more than one answer.
• Machine Scoring: Ideal for multiple choice questions. Specially
prepared answer sheets are used where responses are shown through
shading. Answer sheets are then machine scored with computers or
scanners.
• Correction for Guessing: The most common correction formula is
Score = R - W/(n - 1), where R is the number of right responses, W
the number of wrong responses, and n the number of options per item.
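The correction formula can be checked with illustrative numbers: 30 right, 8 wrong, and 2 omitted items on a 40-item, four-option test (omitted items are neither rewarded nor penalized):

```python
def corrected_score(right, wrong, n_options):
    """Score = R - W/(n-1): wrong answers are penalized so that blind
    guessing yields zero expected gain; omitted items are ignored."""
    return right - wrong / (n_options - 1)

# Four-option items: 30 right, 8 wrong, 2 omitted.
print(corrected_score(30, 8, 4))  # 30 - 8/3, about 27.33
```

The divisor n - 1 makes the penalty equal the expected number of lucky guesses: a candidate who guesses blindly on k items gets about k/n right and k(n-1)/n wrong, and the two cancel.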
References
• Binet, A., & Simon, T. (1916). The development of intelligence in
children. Williams & Wilkins.
• Bloom, B. S. (1956). Taxonomy of educational objectives: The
classification of educational goals. Handbook I: Cognitive domain.
David McKay Company.
• Glaser, R. (1963). Instructional technology and the measurement of
learning outcomes: Some questions. American Psychologist, 18(8),
519-521.
• Holland, P. W., & Wainer, H. (1993). Differential item functioning.
Lawrence Erlbaum Associates.
• Skinner, B. F. (1953). Science and human behavior. Free Press.
• Terman, L. M. (1916). The measurement of intelligence: An
explanation of and a complete guide for the use of the Stanford
revision and extension of the Binet-Simon intelligence scale.
Houghton Mifflin.
