This document discusses the key characteristics of a good measuring instrument or test, including validity, reliability, objectivity, norms, and usability. It defines validity as the accuracy with which a test measures what it claims to measure, and describes different types of validity including content validity, criterion-related validity, and construct validity. Reliability is defined as the consistency of measurement and different methods for estimating reliability are outlined. Objectivity refers to eliminating personal bias from scoring. Norms provide average scores for comparison. Usability factors like ease of administration, timing, cost, and scoring are also addressed.
Characteristics Of A Good Measuring Instrument (Test)
The most essential of these are:
1. Validity
2. Reliability
3. Objectivity
4. Norms
5. Usability.
1. Validity:
Test experts generally agree that the most
important quality of a test is its validity.
The word "validity" means effectiveness.
It refers to the accuracy with which a thing is
measured.
A test is said to be valid if it measures what it
claims to measure.
According to Gronlund
“Validity refers to the appropriateness
of the interpretations made from test
scores and other evaluation results,
with regard to a particular use”.
Nature/Characteristics of Validity:
When using the term validity in relation to testing
and evaluation, there are a number of cautions to
be borne in mind. These are:
a. Validity refers to the appropriateness of the
interpretation of the results of a test, not to the test
itself.
b. Validity is a matter of degree: Validity does not
exist on an all-or-none basis, so we should avoid
speaking of a valid or invalid test. Validity is best
considered in terms of categories that specify degree,
such as high validity, moderate validity and low
validity.
c. Validity is always specific to some particular use
or interpretation. No test is valid for all purposes.
When describing validity, it is necessary to consider
the specific interpretation or use to be made of the
results.
Types/Approaches to Test Validation:
According to Gronlund (1990), there are
three basic approaches to the validity of
tests. These are
1. Content validity
2. Criterion-related validity
3. Construct validity
1. Content Validity
According to Gronlund (1990) it refers
to the extent to which the test content
represents a specified universe of
content.
It means that the “test content” (Test
Items) should measure the “course
content” (Curriculum/objectives).
Johnson, B. & Christensen, L. (2008) stated that
when making your decision, try to answer these
three questions:
Do the items appear to represent the thing you
are trying to measure?
Does the set of items fully represent the
important content areas or topics?
Have you included all relevant items?
TABLE OF SPECIFICATION 1 (Content by Level of Bloom's Taxonomy):

Content    Knowledge   Comprehension   Application   Total
Grammar       10             5              1          16
Reading       10             5              1          16
Writing       12             5              1          18
Total         32            15              3          50
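A table of specification like the one above can be tallied from an item bank. The sketch below is illustrative only: the item list mirrors the counts in the table, but the data structure and labels are assumptions, not part of the source.

```python
from collections import Counter

# Hypothetical item bank: one (content_area, bloom_level) pair per test item.
items = (
    [("Grammar", "Knowledge")] * 10 + [("Grammar", "Comprehension")] * 5 + [("Grammar", "Application")] * 1
    + [("Reading", "Knowledge")] * 10 + [("Reading", "Comprehension")] * 5 + [("Reading", "Application")] * 1
    + [("Writing", "Knowledge")] * 12 + [("Writing", "Comprehension")] * 5 + [("Writing", "Application")] * 1
)

counts = Counter(items)
levels = ["Knowledge", "Comprehension", "Application"]
areas = ["Grammar", "Reading", "Writing"]

# Print the table of specification: one row per content area, one column per level.
print(f"{'Content':<10}" + "".join(f"{lvl:>15}" for lvl in levels) + f"{'Total':>8}")
for area in areas:
    row = [counts[(area, lvl)] for lvl in levels]
    print(f"{area:<10}" + "".join(f"{n:>15}" for n in row) + f"{sum(row):>8}")
col_totals = [sum(counts[(a, lvl)] for a in areas) for lvl in levels]
print(f"{'Total':<10}" + "".join(f"{n:>15}" for n in col_totals) + f"{sum(col_totals):>8}")
```

The row and column totals reproduce the table above, which is a quick consistency check when the blueprint is built from a real item bank.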
2. Criterion-Related Validity:
Criterion-related validity is used whenever we
need to predict students' future performance, or
to assess/estimate their present performance, on
some criterion (a valued measure other than the
test itself).
It refers to the extent to which the test
correlates with some criterion.
Types Of Criterion-Related Evidence (Validity):
According to Rizvi, A. (1973) there are two
types of criterion-related evidence (validity),
distinguished by the time factor. These are
i. Concurrent Evidence (Validity)
ii. Predictive Evidence (Validity)
i. Concurrent Evidence (Validity):
It refers to the extent to which the test
correlates with some criterion obtained at the
same time (i.e. concurrently).
For example, when maths test scores developed
by a classroom teacher correlate with another
maths test or with teachers' ratings, you have
concurrent evidence (validity).
Concurrent Validity
Concurrent validity is a type of criterion validity.
If you create some type of test, you want to
make sure it's valid: that it measures what it is
supposed to measure. Criterion validity is one
way of doing that. Concurrent validity
measures how well a new test compares to a
well-established test. It can also refer to the
practice of testing two groups at the same time,
or asking two different groups of people to take
the same test.
Advantages:
It is a fast way to validate your
data.
It is a highly appropriate way to
validate personal attributes (i.e.
depression, IQ, strengths and
weaknesses).
Disadvantages:
It is less effective than predictive validity to
predict future performance or potential, like
job performance or ability to succeed in
college.
If you are testing different groups, like
people who want jobs and people who have
jobs, responses may differ between groups.
For example, people who already have jobs
may be less inclined to put their best foot
forward.
Concurrent Validity Examples
Example 1: If you create a new test for depression levels,
you can compare its performance to previous depression
tests (like a 42-item depression level survey) that have
high validity. Concurrent means "at the same time", so
you would perform both tests at about the same time:
you could test depression level on one day with your test,
and on the next day with the established test.
A statistically significant correlation would mean that you
have achieved concurrent validity. If the tests are farther
apart (i.e. they aren't administered concurrently), then
they would fall into the category of predictive validity
instead of concurrent validity.
Example 2:
Concurrent validity can also occur between two
different groups. For example, let’s say a group of
nursing students take two final exams to assess their
knowledge. One exam is a practical test and the
second exam is a paper test. If the students who score
well on the practical test also score well on the paper
test, then concurrent validity has occurred. If, on the
other hand, students who score well on the practical
test score poorly on the paper test (and vice versa),
then you have a problem with concurrent validity. In
this particular example, you would question the ability
of either test to assess knowledge.
ii. Predictive Evidence (Validity):
According to Gronlund “it
refers to the extent to which
the test correlates with some
criterion obtained after a
stated interval of time”.
The Predictive Validity Of A Test Is Determined By
establishing the relationship between
scores on the test and some measure of
success in the situation of interest.
The test used to predict success is referred
to as the "predictor", and
the behavior predicted is referred to as
the "criterion".
Predictive Validity
The ability of a measure to predict future
outcomes.
Predictive validity is the ability of a survey
instrument to predict future occurrences.
Correlations with later measures of a validated
construct are used to generate predictive
validity coefficients. Predictive validity is a
type of criterion-related evidence or criterion
validity.
Construct Validity:
It refers to the extent to which the test
measures the construct it claims to measure.
A construct is a psychological trait or
quality that we assume to exist in order to
explain some aspect of behavior.
Mathematical reasoning, intelligence,
creativity, sociability, honesty and anxiety
are examples of constructs.
There are three basic steps to construct validation:
1. Identify the construct, e.g. intelligence.
2. From theory, derive predictions about the traits the construct
involves; e.g. intelligence involves:
a. General knowledge.
b. Reasoning power.
c. The ability to solve numerical problems.
d. Decision-making power.
3. Then measure these factors through test items.
2. Reliability
According to Gronlund “Reliability refers to
the consistency of measurement”
According to Ebel “The ability of the test
to measure the same quality when it is
administered to an individual on two
different occasions or by two different
testers or evaluators is known as reliability”.
Nature/Characteristics Of Reliability:
The meaning of reliability can be further
clarified by noting the following
general points:
1. Reliability refers to the results obtained
with an evaluation instrument (test),
and not to the instrument (test) itself.
2. Reliability Refers To Some Particular Type Of Consistency.
Test scores are not reliable in general. They
are reliable (or generalizable)
over different period of time,
over different samples of questions,
over different raters, and the like.
3. Reliability Is A Necessary But Not A Sufficient Condition For Validity.
A valid test must also be a reliable test, but
high reliability does not ensure that a
satisfactory degree of validity will be
present.
In summary, reliability merely provides the
consistency that makes validity possible.
Reliability Is Primarily Statistical
The two widely used methods of expressing
reliability are the "standard error of measurement"
and the "reliability coefficient".
A reliability coefficient "is a correlation co-
efficient that indicates the degree of
relationship between two sets of measures
obtained from the same instrument or procedure".
Methods Of Estimating Reliability
Lien, A.J. (1976) mentioned three basic
methods of estimating reliability:
1. Test-Re-Test Method
2. Alternative Form / Equivalent forms
Method
3. Split-Halves Method/Internal Consistency
Method
1. The Test-Retest Method
In this method the same test is
administered to the same group of
students on two different occasions,
and the two sets of scores are
correlated.
2. The Parallel Forms Method (Equivalent Forms)
In this method two
equivalent forms of a test
are given to the same group
of students and the scores
obtained on the two forms
are correlated.
3. The Split-Half Method
In this method the test is so
designed that it can be divided
into two equivalent halves, say
odd-numbered and even-
numbered items.
It is then administered as a whole,
only once. The scores on the even-
numbered and odd-numbered
items are correlated.
3. Practicality Or Usability
After constructing the test, we have to administer it, score it,
and improve its quality.
Usability involves the following factors:
1. EASE OF ADMINISTRATION:
It consists of two points.
a. Instructions: If the instructions are not clear, both teacher
and students will be in difficulty; e.g. how many questions
have to be solved, or how much time is available for
the test?
b. Timing: Fixing the time for the test is also a difficult job.
The season must be kept in mind in preparing the timetable
for the test.
Cost Of Testing
The test should be
inexpensive. It should not
be a burden on students or
the institute, but reliability and
validity must be kept in
mind.
Easy To Score
The test should be so constructed
that its scoring is easy.
There should be specific marks for
every part and every question.
A scoring key must be provided to the
evaluator. There should be objectivity
in scoring the test.
Objectivity
Gronlund and Linn (1995)
state that "objectivity of a test refers to the
degree to which equally competent scorers
obtain the same results". So a test is
considered objective when it makes for
the elimination of the scorer's personal
opinion and biased judgement.
What Is An Objective Type Of Test?
Objective tests require recognition and
recall of subject matter. The forms vary:
questions of fact, sentence completion,
true-false, analogy, multiple-choice, and
matching. They tend to cover more
material than essay tests. They have one,
and only one, correct answer to each
question.
Norms
Test "norms" — short for normative
scores — are scores from standardized
tests given to representative samples of
students who will later take the same test.
Norms provide a way for teachers to
know what scores are typical (or average)
for students in a given grade.
What Is A Norm-Referenced Test Used For?
Norm-referenced tests report whether test
takers performed better or worse than a
hypothetical average student, which is
determined by comparing scores against
the performance results of a statistically
selected group of test takers, typically of
the same age or grade level, who have
already taken the exam.
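Comparing a score against a norm group typically means reporting a percentile rank: the percentage of the norm group scoring below the student. A minimal sketch with an invented norm group (ties counted as half, one common convention among several):

```python
from bisect import bisect_left, bisect_right

# Hypothetical norm-group scores from a representative sample of students.
norm_scores = sorted([35, 42, 47, 50, 52, 55, 58, 60, 63, 68, 70, 75])

def percentile_rank(score, norms):
    """Percent of the norm group scoring below `score`,
    counting ties as half (norms must be sorted)."""
    below = bisect_left(norms, score)
    ties = bisect_right(norms, score) - below
    return 100.0 * (below + 0.5 * ties) / len(norms)

# A student scoring 60 is compared against the norm group.
print(f"percentile rank of 60: {percentile_rank(60, norm_scores):.1f}")
```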