Assessment and evaluation- A new perspective
Unit 2- Tests and Their Application
2. SYLLABUS OF UNIT 2
Testing- Concept and Nature
Developing and Administering Teacher Developed Tests
Characteristics of a good Test
Standardization of Test
Types of Tests- Psychological Test, Reference Test, Diagnostic Tests
3. 2.2.1. INTRODUCTION-
Teachers construct various tools to assess the traits of their students.
The most commonly used teacher-made tools are achievement tests, which are
constructed as per the requirements of the particular class and subject area
they teach.
Besides achievement tests, a teacher also observes students in the classroom,
on the playground and during other co-curricular activities in the school,
noting their social and emotional behaviour as well. For assessing these
traits, tools such as rating scales are constructed.
Evaluation tools used by the teacher may be either standardized or
non-standardized.
4. A standardized tool is one for which norms have been systematically developed for a population.
Its procedure, apparatus and scoring have been fixed so that precisely the
same test can be given at different times and places, as long as it pertains
to a similar type of population. Standardized tools are used in order to:
1. Compare achievement of different skills in different areas
2. Make comparisons between different classes and schools
They have norms for the particular population; they are norm-referenced.
On the other hand, teachers make tests as per the requirements of a particular class
and the subject area they teach. Hence, they are purposive and criterion referenced.
They want:
1. to assess how well students have mastered a unit of instruction;
2. to determine the extent to which objectives have been achieved;
3. to determine the basis for assigning course marks and find out how effective their
teaching has been.
So our syllabus here revolves around tests.
5. 2.2.2- DEVELOPING AND ADMINISTERING TEACHER DEVELOPED TESTS-
Steps of developing a test:
1. Defining the test
2. Defining the objectives
3. Determining the weightage for content and objectives
4. Developing the questions
Steps of administering a test:
1. Editing of tests
2. Exercise instructions to students
3. The examiner must be completely aware of minute details
Time for the test:
1. Answer list and scoring scheme
2. Question-wise analysis
3. Appropriate time: not too much, not too little
6. 2.2.3- CHARACTERISTICS OF A GOOD MEASURING INSTRUMENT-
Characteristics of a good test:
1. Validity
2. Reliability
3. Objectivity
4. Adequacy
5. Feasibility
6. Discrimination
7. Precision
8. Difficulty level
7. 1. VALIDITY-
Any measuring instrument must fulfil certain conditions. This is true in all
spheres, including educational evaluation.
Test validity refers to the degree to which a test accurately measures what it
claims to measure. It is a critical concept in the field of psychometrics and
is essential for ensuring that a test is meaningful and useful for its
intended purpose. If a test is meant to examine the understanding of a
scientific concept, it should do only that; it should not be affected by other
abilities such as the student's style of presentation, sentence patterns or
grammatical construction. Validity is a specific rather than a general
criterion of a good test. Validity is a matter of degree: it may be high,
moderate or low.
There are several types of validity, each addressing a different aspect of the
testing process:
1. Face validity, 2. Content validity, 3. Construct validity,
4. Concurrent validity, 5. Predictive validity
8. FACTORS AFFECTING THE VALIDITY OF A TEST-
1. Lack of clarity in directions.
2. Language difficulty.
3. Medium of expression.
4. Difficulty level of items.
5. Poor test items.
6. Time limit.
7. Inadequate coverage.
8. Halo effect.
9. Influence of extraneous factors.
9. 2. RELIABILITY-
Reliability is the trustworthiness of a test.
The reliability of a test may be defined as "the degree of consistency among
test scores".
The term "reliability" in evaluation refers to the consistency of scores
obtained by the same individuals on different occasions or with
different sets of equivalent items.
It is a statistical concept and is expressed as a correlation coefficient,
called the reliability coefficient.
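Because the reliability coefficient is simply a correlation between two sets of scores, a test-retest estimate can be sketched in a few lines of Python. The score lists below are invented for illustration:

```python
# Test-retest reliability: correlate the scores the same students obtain
# on two administrations of the same test. (Scores are invented.)
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

first_attempt  = [12, 15, 9, 18, 14, 11, 16, 10]
second_attempt = [13, 14, 10, 17, 15, 10, 17, 11]

reliability = pearson(first_attempt, second_attempt)
print(f"Reliability coefficient: {reliability:.2f}")
```

A coefficient close to 1 indicates highly consistent scores across the two occasions.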
10. FACTORS AFFECTING RELIABILITY-
1. Length of the test.
2. Objectivity in scoring.
3. Range of talent.
4. Difficulty of the test.
5. Ambiguous wording of questions.
6. Testing conditions.
7. Optional questions.
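The first factor, test length, has a classical quantitative form: the Spearman-Brown prophecy formula predicts the reliability of a test lengthened k times from its current reliability r. A minimal sketch, with an invented starting reliability:

```python
# Spearman-Brown prophecy formula: predicted reliability of a test
# lengthened k times, given its current reliability r.
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

# Doubling a test whose reliability is 0.60:
print(spearman_brown(0.60, 2))  # 1.2 / 1.6 = 0.75
```

This is why, other things being equal, a longer test tends to be more reliable.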
11. INTERRELATIONSHIP BETWEEN
VALIDITY AND RELIABILITY-
The interrelationship between validity and reliability can be understood through
their complementary roles in the assessment of measurement instruments, such
as tests.
Reliability is a prerequisite for validity: a test cannot be valid unless it
is reliable, although a reliable test is not necessarily valid.
While reliability establishes the consistency and dependability of
measurements, validity ensures that the measurements are accurate and
meaningful in relation to the construct of interest. Both concepts are critical
for building confidence in the interpretation and use of test scores.
In summary, reliability and validity work together to ensure the quality of a
measurement instrument.
12. 3. OBJECTIVITY-
A test is said to be objective when the examiner's personal opinion or judgment does not affect the
scoring. The objectivity of a test refers to "the degree to which equally competent scorers obtain the
same results". Objectivity is always relative, never absolute.
The objectivity of a test can be increased by:
1. Using more objective-type test items.
2. Making essay-type test items unambiguous and well-constructed, giving specific directions
that establish a framework within which students can operate.
3. Preparing a marking scheme.
4. Setting realistic standards.
5. Asking two independent examiners to evaluate the test and using the average of the two scores as
the final score.
Objectivity in a test eliminates the biased opinion or judgment of the person who scores it.
Objective judgments are accurate and hence tend to be reliable.
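Point 5 above, averaging the marks of two independent examiners, is simple to apply; a minimal sketch with invented marks:

```python
# Averaging two independent examiners' marks to reduce the influence
# of any single scorer's judgment. (Marks are invented.)
examiner_a = [14, 18, 11, 16, 9]
examiner_b = [15, 17, 12, 14, 10]

final_scores = [(a + b) / 2 for a, b in zip(examiner_a, examiner_b)]
print(final_scores)  # [14.5, 17.5, 11.5, 15.0, 9.5]
```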
13. 4. ADEQUACY-
A test is said to be adequate only if it is balanced and fair; the aim of
teaching is not merely to enable students to repeat what has been taught.
Apart from knowledge, there are several types of skills that can be
considered outcomes of teaching.
A test must examine these skills also.
The test constructor must measure all the educational attainments of students
by including test items that measure the various outcomes of teaching.
14. 5. PRACTICABILITY OR FEASIBILITY-
Practicability relates to the practical aspects of the test in respect of
administration, scoring and economy. The practicability of a test refers to
the extent to which the measuring instrument can be successfully
employed by teachers and school administrators without an
unnecessary expenditure of time and energy.
The practicability of a test depends upon the following factors:
a. Ease of administration
b. Ease of scoring
c. Ease of interpretation
d. Economy
e. Availability of equivalent forms of the test
15. 6. DISCRIMINATING POWER-
The basic function of all educational measurement is to place individuals on a defined
scale in accordance with differences in their achievements. A good measuring instrument
should detect or measure small differences in the achievement of students and thus
discriminate between the good and the weak students.
The test must bring out the difference between the bright mind and the average mind.
The more discriminating the test, the better the test.
The discriminating power of a test item refers to the degree to which it discriminates
between good and weak students in a group.

Discrimination Index = (RU - RL) / (N/2)

where RU = number of correct responses from the upper group, RL = number of correct
responses from the lower group, and N = total number of pupils who attempted the item.
It is usually expressed as a decimal.
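The discrimination index formula can be computed directly; the counts below are invented for illustration:

```python
# Discrimination index D = (RU - RL) / (N/2), where RU and RL are the
# numbers of correct responses to an item in the upper and lower
# scoring groups, and N is the total number of pupils who attempted it.
def discrimination_index(ru: int, rl: int, n: int) -> float:
    return (ru - rl) / (n / 2)

# 40 pupils split into upper/lower halves; 16 of the upper group and
# 6 of the lower group answered the item correctly (invented counts):
d = discrimination_index(ru=16, rl=6, n=40)
print(f"D = {d:.2f}")  # D = 0.50
```

A positive D means the item favours the stronger students, which is what a discriminating item should do.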
16. 7. DIFFICULTY LEVEL OR FACILITY LEVEL-
The difficulty of the items is also an important issue.
When questions are too easy or too difficult, they do not discriminate among
pupils, and thereby the validity of the test is lowered.
This also determines the discriminating power of the test: a test that is too
easy or too difficult as a whole does not make a good test.
Therefore it is necessary to check the difficulty of each item, and then to
check the proportion of easy and difficult items in the test.
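The difficulty (facility) value of an item is conventionally the proportion of pupils who answer it correctly; a minimal sketch with invented counts:

```python
# Facility (difficulty) value of an item: the proportion of pupils
# answering it correctly. Values near 0 or 1 mean the item barely
# discriminates among pupils.
def facility(correct: int, attempted: int) -> float:
    return correct / attempted

# 28 of 40 pupils answered the item correctly (invented counts):
p = facility(28, 40)
print(f"Facility value: {p:.2f}")  # 0.70
```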
17. 8. ACCURACY (PRECISION):
The measurements a test yields must be accurate, because if there are
mistakes in the measurements we make, the benchmarks based on those
measurements are prone to error. Accuracy is a prerequisite condition for
any measurement.
Also, if multiple-choice items are used in the test, we need to be very
careful about the options we provide. The options should be such that the
test taker is not left guessing, and the distractors should look equally
plausible.
18. 2.2.4-STANDARDIZATION OF TEST-
A standardized test is a method of assessment built on the principle of
consistency: all test takers are required to answer the same questions
and all answers are graded in the same, predetermined way.
Standardized tests are often used to select students for specific
programs. For example, the SAT (Scholastic Assessment Test) and ACT
(American College Test) are norm referenced tests used to help
determine if high school students are admitted to selective colleges.
The characteristics of a standardized test are as follows:
Validity − The test has to be devised to measure what it claims to
measure in order to be held as valid and usable.
Reliability − This refers to the consistency of scores obtained by an
individual on the same test on two different occasions.
19. STANDARDIZED TESTS-
Standardized testing can be defined as testing developed by experts in which
each student taking the test responds to either the same set of questions or
questions derived from a common bank of questions.
Standardized tests tend to use multiple-choice or true/false questions;
however, some standardized tests include essay questions, matching questions,
or even spoken items.
These questions are carefully selected and aim to offer an objective
measurement of the student's current level of achievement in a given area.
Different types of standardized tests include college-admissions tests,
aptitude tests, achievement tests, psychological (IQ) tests, and international
comparison tests intended to monitor trends in achievement across
participating nations.
20. PROS AND CONS OF STANDARDIZED TESTS-
What are the advantages of standardized tests?
They yield quantifiable results that give educators the opportunity to identify areas where students
are proficient or in need of remediation or advancement. Through regular
standardized testing, educators can view reports with information on a student's progress and
identify a trend of growth or decline. It should be noted that opinions on these perceived
advantages may differ among education stakeholders, including policymakers, parents,
and teachers.
What are some disadvantages of standardized testing?
Non-academic factors can influence a student's test score, including anxiety, fatigue,
and a lack of motivation. Standardized test questions may also fail to assess a student's
higher-level thinking skills, and educators may be tempted to "teach to the test" rather than focus on
the unique needs of the students in their classroom. Test questions can also be written in ways
that students without certain lived experiences are unable to understand. This creates a
system of inequity, one in which some students are rewarded for their ability to understand
a question's context while others are penalized through no fault of their own.
21. STEPS FOR STANDARDIZING A TEST-
Step 1: Defining objectives
Step 2: Preparation of a rough test draft
Step 3: Pilot study of the initial test
Step 4: Developing a manual for administering the test
Step 5: Making the test open for use by all
22. 2.2.5- TYPES OF TESTS
Psychological Test
Reference Test
Diagnostic Tests
23. PSYCHOLOGICAL TEST-
What is a psychological test?
A psychological test is one that measures intricate and in-depth information,
as opposed to the broad information of an assessment. A test is typically
conducted in the presence of an administrator. Psychological assessments or
psychological tests are verbal or written tests designed to evaluate a
person's behaviour. The many types of psychological tests help people
understand various dynamics of the human being: they help us understand why
one person is good at something while another is good at something else.
What is the purpose of psychological testing?
The purpose of psychological testing is to discover an individual's emotional
and psychological characteristics.
24. THERE ARE NINE TYPES OF PSYCHOLOGICAL TESTS:
1. Personality Tests.
2. Achievement Tests.
3. Attitude Tests.
4. Aptitude Tests.
5. Emotional Intelligence Tests.
6. Intelligence Tests.
7. Neuropsychological Tests.
8. Projective Tests.
9. Direct Observation Tests.
25. USES OF PSYCHOLOGICAL TESTING-
Psychological Tests are mainly used to analyze the mental abilities and attributes of an
individual, including personality, achievement, ability and neurological functioning.
Here are the central and most important uses of Psychological Testing:
1. Detection of Specific Behavior
2. Psychological Diagnosis
3. Tools in Academic Placements
4. Screening Job Candidates
5. Individual Differences
6. Research
7. To Promote Self-awareness and Understanding
8. Psychometrics
9. Organizational Development
10. Career Assessment Tests
26. ETHICAL CONSIDERATIONS:
Psychological tests are standardized and objective measures of an individual's
behavior, cognitive abilities, personality, or other psychological constructs.
These tests are designed to assess various aspects of an individual's psychological
functioning and provide valuable information for understanding behavior, making
diagnoses, guiding interventions, or making predictions about future behavior.
The use of psychological tests requires careful consideration of ethical guidelines
to ensure the fair and responsible treatment of individuals being tested. This
includes obtaining informed consent, maintaining confidentiality, and providing
feedback to test-takers.
Psychological testing is a valuable tool in various settings, including clinical
psychology, education, human resources, and research. Psychologists use these tests
to gain insights into an individual's psychological functioning, inform treatment
planning, and make informed decisions in various professional contexts.
27. NORM REFERENCED TEST-
A norm-referenced test is a type of assessment that compares an individual's performance to that of a
group of individuals, known as the norming or reference group. The primary goal of a norm-referenced
test is to rank individuals relative to each other, providing information about how a test-taker compares
to a larger population.
Key characteristics of norm-referenced tests include:
Norming Group: The test is administered to a representative sample, or norming group, that is considered
to be similar to the population for which the test is intended. The performance of individuals within this
group is used to establish norms or reference points.
Scores in Percentiles or Standard Scores: Norm-referenced tests often provide scores in percentiles
or standard scores (e.g., z-scores or T-scores).
Comparative Information: The primary purpose of norm-referenced testing is to compare individuals to
their peers. Test results help determine how well a person performed in comparison to others within the
norming group.
Ranking: Test-takers are ranked in relation to the norming group. For example, if an individual's score is
at the 75th percentile, it means they scored as well as or better than 75% of the norming group.
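The standard scores mentioned above (z-scores, T-scores, percentile ranks) can be computed from a norming group's raw scores; a minimal sketch in which the norming-group scores and the raw score are invented:

```python
# Converting a raw score into the standard scores used by NRTs, based
# on the norming group's mean and standard deviation. (Scores invented.)
from math import sqrt

scores = [42, 55, 48, 60, 51, 45, 58, 49, 53, 47]  # norming-group raw scores
mean = sum(scores) / len(scores)
sd = sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))  # population SD

raw = 58
z = (raw - mean) / sd      # z-score: distance from the mean in SD units
t = 50 + 10 * z            # T-score: rescaled to mean 50, SD 10
percentile = 100 * sum(s <= raw for s in scores) / len(scores)

print(f"z = {z:.2f}, T = {t:.1f}, percentile rank = {percentile:.0f}")
```

The percentile rank here says what fraction of the norming group scored at or below the raw score, which is exactly the comparative information an NRT reports.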
28. STEPS TO DEVELOP AN NRT-
Step 1: Planning the test
Step 2: Developing the test
Step 3: Administering the test
Step 4: Developing test norms
Step 5: Writing the test manual
29. USES AND LIMITATIONS OF NRT -
Uses in Education: Norm-referenced tests are commonly used in educational
settings for various purposes, such as admission to educational programs, placement
in classes, and identifying individuals who may need additional support or enrichment.
Critiques and Limitations: While norm-referenced tests provide valuable
comparative information, critics argue that they may not capture the full range of an
individual's abilities or provide detailed information about specific skills.
Additionally, some argue that an overemphasis on norm-referenced testing can lead
to a focus on competition rather than individual growth.
Examples of norm-referenced tests include standardized achievement tests (e.g.,
SAT, ACT), intelligence tests (e.g., Wechsler Intelligence Scale for Children, WISC),
and various psychological assessments.
It's important to note that norm-referenced tests are just one type of assessment,
and their appropriateness depends on the specific goals of the assessment and the
context in which it is used.
30. CRITERION REFERENCED TEST-
A criterion-referenced test is a type of assessment that evaluates an individual's performance against a
predetermined set of criteria or a specific standard rather than comparing their performance to that of a
group (as is done in norm-referenced tests). The key focus of a criterion-referenced test is to determine
what a test-taker knows and can do in relation to a defined set of skills or knowledge.
Key features of criterion-referenced tests include:
Objective Standards: Criterion-referenced tests are designed to assess whether an individual has
achieved a specific level of competence or mastery in a particular area. The standards or criteria are set in
advance, often based on educational or performance objectives.
Performance Indicators: The test includes specific performance indicators or behaviors that demonstrate
the mastery of a skill or the understanding of content. Test items are aligned with these indicators.
Absolute Measurement: Performance on a criterion-referenced test is not compared to the performance
of a norming group. Instead, it is measured against an absolute standard, indicating the extent to which a
test-taker has met specific criteria.
Score Interpretation: Scores on criterion-referenced tests are typically reported as the number or
percentage of items answered correctly, indicating the degree to which the individual has mastered the
content or skills being assessed.
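Criterion-referenced score interpretation can be sketched as follows; the 80% cut-off, the pupil names and the correct counts are all assumptions for illustration:

```python
# Criterion-referenced interpretation: each pupil is judged against a
# fixed mastery cut-off, not against peers. (Cut-off and data invented.)
CUTOFF = 0.80        # assumed mastery criterion: 80% of items correct
items_total = 25

results = {"Asha": 23, "Ravi": 18, "Meena": 21}  # items answered correctly

for pupil, correct in results.items():
    pct = correct / items_total
    status = "mastered" if pct >= CUTOFF else "not yet mastered"
    print(f"{pupil}: {pct:.0%} -> {status}")
```

Note that the verdict for each pupil is independent of how the others performed, which is the defining contrast with norm-referenced scoring.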
31. DEVELOPING A CRITERION-REFERENCED TEST (CRT) INVOLVES SEVERAL
SYSTEMATIC STEPS TO REFINE THE TEST; IT HAS TO BE REVIEWED REGULARLY-
1. Prepare a list of 20-25 experts related to the content on which the test
is to be prepared.
2. The experts will determine the minimum number of questions the target
students can solve.
3. Sum up all the scores of each test; the average is the minimum level or
standard expected.
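The standard-setting step above reduces to averaging the experts' judgments; a minimal sketch in which the expert figures are invented:

```python
# Standard setting for a CRT: each expert states the minimum number of
# questions target students should solve; the average of these
# judgments becomes the expected standard. (Judgments are invented.)
expert_minimums = [14, 16, 15, 13, 17, 15, 14, 16]

standard = sum(expert_minimums) / len(expert_minimums)
print(f"Minimum expected standard: {standard:.1f} questions")
```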
32. CRT-
Uses in Education: Criterion-referenced tests are commonly used in educational settings for
purposes such as grading, determining mastery of specific learning objectives, and providing
feedback for instructional purposes.
Individualized Feedback: Because criterion-referenced tests focus on specific skills or
knowledge areas, they can provide detailed feedback to test-takers, educators, and parents
about strengths and areas that may need improvement.
Examples: Examples of criterion-referenced tests include many classroom quizzes, end-of-
course exams, and standardized tests designed to measure specific skills or knowledge domains
(e.g., a spelling test, a mathematics proficiency test).
Criterion-referenced testing is often used in educational assessment to measure the
effectiveness of instructional programs and to determine whether students have achieved
specific learning objectives. It is particularly valuable in situations where it is important to
assess mastery of specific skills or knowledge rather than comparing individuals to each other.
In summary, while norm-referenced tests compare individuals to a norming group, criterion-
referenced tests focus on whether an individual has met predefined standards or criteria,
providing a more absolute measure of performance.
33. COMPARATIVE STUDY OF NRT AND CRT-
Purpose and Focus:
Norm-Referenced Tests (NRTs): NRTs are designed to compare an individual's
performance to that of a norming or reference group. The focus is on ranking
individuals relative to each other, providing information about how a
test-taker compares to a larger population.
Criterion-Referenced Tests (CRTs): CRTs, on the other hand, are designed to
determine whether an individual has achieved a specific level of competence
or mastery in a particular area. The focus is on absolute measurement against
predetermined criteria.
Score Interpretation:
NRTs: Scores on norm-referenced tests are often reported in percentiles or
standard scores, indicating the test-taker's rank relative to the norming
group.
CRTs: Scores on criterion-referenced tests are typically reported as the
number or percentage of items answered correctly, indicating the degree to
which the individual has mastered the specified criteria.
Comparative vs. Absolute Measurement:
NRTs: NRTs provide a relative measure of performance, indicating how a
test-taker compares to others in the norming group.
CRTs: CRTs provide an absolute measure of performance, indicating whether a
test-taker has met predetermined criteria.
Uses in Education:
NRTs: NRTs are often used for purposes such as ranking students, making
group-level comparisons, and identifying individuals for specific programs or
interventions.
CRTs: CRTs are commonly used for grading, determining mastery of specific
learning objectives, and providing detailed feedback for instructional
purposes.
Individualized and Group-Level Information:
NRTs: NRTs provide information about how an individual compares to a larger
population.
CRTs: CRTs provide information about whether an individual has achieved
specific learning objectives. They also offer detailed feedback that can
inform instructional decisions at the individual and group levels.
34. NRT AND CRT-
Norm-referenced tests (NRTs) and criterion-referenced tests (CRTs) are two distinct types
of assessments, but they serve different purposes and can be used in complementary ways.
Here's an overview of the interrelationship between norm-referenced and criterion-
referenced tests:
Complementary Use:
Interplay in Educational Assessment: In practice, educational assessments often include
both NRTs and CRTs. For example, a standardized achievement test (NRT) may be used to
compare a student's performance to a national sample, while a classroom quiz (CRT) may
assess mastery of specific lesson objectives.
In summary, while norm-referenced and criterion-referenced tests have different purposes
and measurement approaches, they are not mutually exclusive. Educational assessment
strategies often involve a combination of both types of tests to provide a comprehensive
understanding of a student's performance, comparing them to others (NRTs) and assessing
their mastery of specific criteria (CRTs). The interplay between these two types of
assessments contributes to a more holistic view of educational outcomes.
35. DIAGNOSTIC TEST-
A diagnostic test is an assessment tool designed to identify specific strengths and weaknesses in an
individual's knowledge or skills. The primary purpose of a diagnostic test is to diagnose areas of difficulty
and provide insights into the learner's abilities. These tests are often used in educational settings,
healthcare, psychology, and other fields to inform decision-making and guide interventions.
Key features of diagnostic tests include:
1. Identification of Areas of Need
2. Individualized Assessment
3. Informative Feedback
4. Basis for Intervention
5. Varied Formats
6. Early Intervention
36. STEPS TO DEVELOP A DIAGNOSTIC TEST-
1. Planning
2. Writing items
3. Assembling the test
4. Providing directions and preparing a scoring key
5. Reviewing the test
37. EXAMPLES OF DIAGNOSTIC TESTS
INCLUDE-
Reading Diagnostic Test: Assessing specific reading skills such as decoding, fluency, and
comprehension.
Mathematics Diagnostic Test: Evaluating mathematical skills in areas such as computation,
problem-solving, and understanding of mathematical concepts.
Language Skills Diagnostic Test: Assessing language skills such as vocabulary, grammar, and
language comprehension.
Cognitive Skills Diagnostic Test: Identifying strengths and weaknesses in cognitive
abilities, such as memory, attention, and processing speed.
In healthcare, diagnostic tests may refer to medical tests used to identify the presence or
absence of a disease or medical condition.
Overall, the goal of a diagnostic test is to provide targeted information that can guide
educational or intervention decisions and help individuals improve in specific areas of need.
38. QUESTIONS FOR SELF-STUDY-
1. Prepare a test of at least 20 MCQs. Administer the test to at least
50 students, and determine its discrimination ability and difficulty level.
2. With the help of any interest inventory, find out the interests of any
one student and write a report on them.
3. Visit any college of education and study the psychological tests available
there. Write a report on any one test.
4. With the help of an intelligence test, check the intelligence of any one
student and write a report on it.