TEST DEVELOPMENT AND
EVALUATION (6462)
CLASSROOM TESTING AND HIGH-STAKES TESTING
Department of Secondary Teacher Education
ALLAMA IQBAL OPEN UNIVERSITY, ISLAMABAD
OBJECTIVES OF THE UNIT
After studying this unit, students will be able to:
1. understand the concept of classroom testing and its techniques
2. understand the need for and scope of high-stakes testing
3. differentiate between teacher-made tests (classroom/low-stakes tests) and standardized (high-stakes) tests
4. enumerate the advantages and limitations of low-stakes and high-stakes tests
5. prepare tests using Bloom’s Taxonomy and the SOLO Taxonomy
6. elaborate the procedure for test development
7. provide examples of standardized tests and their characteristics
8. list some current trends in high-stakes testing
3.1 CONCEPT OF CLASSROOM TESTING AND ITS TECHNIQUES
Classroom assessment is the process, usually conducted by teachers, of designing, collecting,
interpreting and applying information about student learning and attainment to make educational
decisions. There are four interrelated steps to the classroom assessment process.
 The first step is to define the purposes for the information. During this period, the teacher
considers how the information will be used and how the assessment fits in the students'
educational program.
 The next step in the assessment process is to measure student learning or attainment.
Measurement involves using tests, surveys, observation or interviews to produce either numeric
or verbal descriptions of the degree to which a student has achieved academic goals.
 The third step is to evaluate the measurement data, which entails making judgments about the
information. During this stage, the teacher interprets the measurement data to determine if
students have certain strengths or limitations or whether the student has sufficiently attained the
learning goals.
 In the last stage, the teacher applies the interpretations to fulfill the aims of assessment that
were defined in the first stage. The teacher uses the data to guide instruction, assign grades, or help
students with any particular learning deficiencies or barriers.
3.2 HIGH STAKE TESTING: ITS NATURE, NEED AND SCOPE
 High-stakes testing attaches consequences to test results. For example, high-stakes tests
can be used to determine students’ promotion from grade to grade or graduation from high
school (Resnick, 2004; Cizek, 2001).
 The use and misuse of high-stakes tests is a controversial topic in public education, in
advanced countries and even in Pakistan, because such tests are used not only to assess students
but also in attempts to increase teacher accountability.
More precisely, a high-stakes test is a test that:
o is a single, defined assessment,
o has a clear line drawn between those who pass and those who fail, and
o has direct consequences for passing or failing (something "at stake").
• What is the need for high-stakes testing?
• What is the nature of high-stakes testing?
Teacher-Made vs. Standardized Tests
Differences Between Standardized and Teacher-Made Tests
3.5.2 Advantages and Disadvantages of High-Stakes Testing
Advantages of High-Stakes Testing
 It holds teachers accountable for ensuring that all students learn what they are expected to learn.
 Motivates students to work harder, learn more, and take the tests more seriously, which can promote higher
student achievement.
 Establishes high expectations for both educators and students, which can help reverse the cycles of low
educational expectations, achievement, and attainment that have historically disadvantaged some student
groups, particularly students of color, and that have characterized some schools in poorer communities or
more troubled urban areas.
 Reveals areas of educational need that can be targeted for reform and improvement, such as programs for
students who may be underperforming academically or being underserved by schools.
 Provides easily understandable information about school and student performance in the form of numerical
test scores that reformers, educational leaders, elected officials and policy makers can use to develop new
laws, regulations, and school-improvement strategies.
 Gives parents, employers, colleges and others more confidence that students are learning at a high level or
that high school graduates have acquired the skills they will need to succeed in adulthood.
Disadvantages of High-Stakes Testing
 It forces educators to “teach to the test.”
 It promotes a narrower academic program in schools.
 It may contribute to higher, or even much higher, rates of cheating.
 It has been correlated in some research studies with increased failure rates,
lower graduation rates, and higher dropout rates.
 It may diminish the overall quality of teaching and learning.
 It can exacerbate negative stereotypes about the intelligence and academic
ability of minority students.
3.6 CONCEPT OF USE OF TAXONOMIES IN TEST
DEVELOPMENT
Using Bloom’s Taxonomy in Test Development
Using SOLO Taxonomy in Test Development
Bloom’s Taxonomy (1956) question samples:
•Knowledge: How many…? Who was it that…? Can you name the…?
•Comprehension: Can you write in your own words…? Can you write a brief outline…? What do you
think could have happened next…?
•Application: Choose the best statements that apply. Judge the effects of… What would result…?
•Analysis: Which events could have happened…? If … happened, how might the ending have been
different? How was this similar to…?
•Synthesis: Can you design a … to achieve …? Write a poem, song or creative presentation about…?
Can you see a possible solution to…?
•Evaluation: What criteria would you use to assess…? What data was used to evaluate…? How could
you verify…?
SOLO Taxonomy
 The SOLO taxonomy, developed by Biggs and Collis (1982), stands for Structure of Observed Learning Outcomes. It describes five levels of increasing complexity in students’ responses: prestructural, unistructural, multistructural, relational, and extended abstract.
3.7 PROCEDURE OR STEPS FOR A STANDARDIZED TEST
DEVELOPMENT PROCESS
1. Purpose
2. Specifications
3. Development
4. Review
5. Pilot
6. Forms, Scoring and Analysis
3.8 EXAMPLES OF STANDARDIZED TESTS WITH
CHARACTERISTICS
Standardized tests can be classified according to their functions as:
• Group and Individual Tests
• Norm-referenced
• Achievement Tests
• Criterion-referenced
• Aptitude
• Personality
• Projective
• Interest Inventories
• Intelligence tests
Reliability
Reliability refers to the consistency of scores obtained by the same individuals when
re-examined with the same test on different occasions, or with different sets of
equivalent items.
Types of Reliability
Inter-Rater or Inter-Observer Reliability
Inter-rater reliability is estimated by considering the similarity of the scores
awarded by two independent observers.
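As a minimal sketch (the essay scores and the `pearson` helper below are invented for illustration, not taken from the unit), two raters' similarity can be checked both by correlating their scores and by counting exact agreements:

```python
from statistics import mean, stdev

# Hypothetical scores (out of 10) that two raters awarded to the
# same eight student essays; the data are made up for illustration.
rater_a = [7, 5, 9, 6, 8, 4, 7, 6]
rater_b = [8, 5, 9, 5, 8, 4, 6, 6]

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Exact-agreement rate: proportion of essays given identical scores.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"Inter-rater correlation: {pearson(rater_a, rater_b):.2f}")
print(f"Exact agreement: {agreement:.0%}")
```

A high correlation with a low exact-agreement rate would suggest the raters rank students similarly but apply different severity; both indices are worth inspecting.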
Test-Retest Reliability
⚫ It is used to judge the consistency of
results across repeated administrations
of the same test.
⚫ We estimate test-retest reliability when
we administer the same test to the same
sample on two different occasions.
⚫ The amount of time allowed between
measures is critical.
⚫ In general, the shorter the time gap, the
higher the correlation; the longer the
time gap, the lower the correlation.
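The test-retest estimate is simply the correlation between scores from the two occasions. The sketch below uses invented scores for ten students tested twice; the data and helper function are illustrative only:

```python
from statistics import mean, stdev

# Hypothetical scores for the same ten students on the same test,
# administered two weeks apart (illustrative data only).
first_admin  = [55, 62, 48, 71, 66, 59, 80, 45, 68, 52]
second_admin = [58, 60, 50, 73, 64, 61, 78, 47, 70, 55]

def pearson(x, y):
    """Pearson correlation: the test-retest reliability estimate."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

r = pearson(first_admin, second_admin)
print(f"Test-retest reliability: {r:.2f}")
```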
Split-Half Reliability
⚫ In split-half reliability, we randomly divide all items that claim to
measure the same content into two sets.
⚫ The split-half reliability estimate is the correlation between the two
half-test total scores, usually adjusted upward with the Spearman-Brown
formula to estimate full-length reliability.
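The split-half procedure can be sketched as follows. The item responses are made up, the halves are formed by the common odd/even split rather than a random one, and the standard Spearman-Brown correction (2r / (1 + r)) is applied to estimate full-length reliability:

```python
from statistics import mean, stdev

# Hypothetical item scores (1 = correct, 0 = wrong) for six students
# on an eight-item test; rows are students, columns are items.
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0],
]

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Split the items into odd- and even-numbered halves, totalling each half.
odd_totals  = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

half_r = pearson(odd_totals, even_totals)

# Spearman-Brown correction: estimates the reliability of the
# full-length test from the half-test correlation.
full_r = 2 * half_r / (1 + half_r)
print(f"Half-test r: {half_r:.2f}, corrected: {full_r:.2f}")
```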
Parallel-Form Reliability
⚫ In parallel-form reliability, we create two different tests from
the same content to measure the same learning outcomes.
⚫ The correlation between the two parallel forms is the estimate of
reliability.
Internal Consistency Reliability
● It is the degree to which the items on an instrument are consistent among
themselves and with the instrument as a whole.
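A widely used index of internal consistency is Cronbach's alpha, which compares the sum of the individual item variances with the variance of the total scores. The sketch below computes it from scratch on invented Likert-scale responses:

```python
from statistics import pvariance

# Hypothetical Likert responses (1-5) from five students to four
# attitude items intended to measure the same construct.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
]

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(rows[0])
    items = list(zip(*rows))                      # one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var / total_var)

print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

When the items covary strongly (students who score high on one item score high on the others), the total-score variance dwarfs the summed item variances and alpha approaches 1.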
Validity
 The validity of an assessment tool is the degree to which it measures
what it is designed to measure.
 The concept refers to the appropriateness, meaningfulness, and
usefulness of the specific inferences made from test scores.
Methods of Measuring Validity
The main forms discussed below are content validity, face validity, construct
validity (including convergent validity), and criterion validity (including
concurrent and predictive validity).
Content Validity
 Content validity evidence involves the degree to which the content of the test
matches the content domain associated with the construct.
 The items in a test should cover the whole domain.
Face Validity
Face validity is an estimate of whether a test appears to measure a certain
criterion; it concerns the surface appearance of the test.
Construct Validity
 A construct is the concept or characteristic that a test is designed to measure.
 According to Howell (1992), construct validity is a test’s ability to measure
factors which are relevant to the field of study.
Convergent Validity
Convergent validity refers to the degree to which a measure is correlated with
other measures of the same construct.
Criterion Validity
 Criterion validity evidence involves
the correlation between the test and a
criterion variable (or variables) taken
as representative of the construct.
 It compares the test with other
measures or outcomes (the criteria)
already held to be valid.
Concurrent Validity
 Concurrent validity refers to the degree to which scores taken at one point
correlate with other measures (test, observation or interview) of the same
construct measured at the same time.
Predictive Validity
 Predictive validity indicates how well the
test predicts some future behaviour of the
examinee.
 If higher scores on the Board exams are
positively correlated with higher GPAs at
university, and lower scores with lower
GPAs, then the Board exams are said to
have predictive validity.
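The Board-exam example can be sketched numerically: predictive validity is estimated by correlating exam scores with the criterion measured later. All figures below are invented for illustration:

```python
from statistics import mean, stdev

# Hypothetical data: Board exam scores for eight students and the
# university GPAs they earned two years later (illustrative only).
board_scores = [720, 650, 810, 580, 700, 760, 620, 690]
later_gpa    = [3.1, 2.8, 3.8, 2.4, 3.0, 3.4, 2.6, 3.0]

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# A strong positive correlation would support the claim that the
# Board exam has predictive validity for university performance.
r = pearson(board_scores, later_gpa)
print(f"Predictive validity coefficient: {r:.2f}")
```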
Factors Affecting Validity
 Unclear Instructions for Taking the Test
 Difficult Language Structure
 Inappropriate Level of Difficulty
 Poorly Constructed Test Items
 Ambiguity in Items Statements
 Length of the Test
 Improper Arrangement of Items
 Identifiable Pattern of Answers
Relationship between Validity and Reliability
 Reliability is a necessary requirement for validity
 Establishing good reliability is only the first part of establishing validity
 Reliability is necessary but not sufficient for validity.
3.8.3 Usability of Tests
 Usability testing refers to evaluating a product or service by testing it with
representative users. Typically, during a test, participants try to complete
typical tasks while observers watch, listen and take notes. You should also
select tests based on how easy each test is to use. In addition to reliability and
validity, you need to consider how much time you have to create, administer
and grade a test, and how you will interpret and use the scores. Finally, check
that the test questions and directions are written clearly, that the test is short
enough not to overwhelm the students, that the questions do not include
stereotypes or personal biases, and that they are interesting and make the
students think.
Department of Secondary Teacher Education
ALLAMA IQBAL OPEN UNIVERSITY, ISLAMABAD
Dr. Hina Jalal
hinansari23@gmail.com
