2. Why do we need to determine the
reliability and validity of a test?
3. • Without validity and reliability, one
cannot test a hypothesis.
• Without hypothesis testing, one
cannot support a theory.
• Without a supported theory, one
cannot explain why events occur.
• Without adequate explanation, one
cannot develop effective material
and non-material technologies,
including programs designed for
positive social change.
4. Reliability
“Consistency”
A measure is considered reliable if it gives
us the same result over and over again.
Correlation
- measures how things are related to one another
Correlation Coefficient
- the degree of relationship
5. Correlation coefficients range from -1.00 to +1.00
+ (positive) correlation means that when one
variable goes up, so does the other one.
- (negative) correlation means that when one
variable goes up, the other one goes down.
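A minimal sketch of how a correlation coefficient can be computed, assuming the Pearson formula (the slides do not name a specific coefficient). The data lists and the function name pearson_r are hypothetical, chosen only to show one perfectly positive and one perfectly negative relationship.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for illustration only
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]      # rises as hours rise
score_down = [10, 8, 6, 4, 2]    # falls as hours rise

r_pos = pearson_r(hours, score_up)
r_neg = pearson_r(hours, score_down)
print(f"positive example: r = {r_pos:+.2f}")
print(f"negative example: r = {r_neg:+.2f}")
```

Real data rarely reach either extreme; values closer to 0 indicate a weaker relationship.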
6. “No TEST, no matter how it is
designed, is FREE from ERROR”
7. We have to consider the
Standard Error of Measurement
Random Error (e.g., mood)
Systematic Error (e.g., traffic)
8. But have you wondered what
the real importance of the
Standard Error of Measurement is?
9. The standard error of
measurement (SEm) estimates
how repeated measures of a
person on the same
instrument tend to be
distributed around his or her
“true” score
X1 = T + e1, X2 = T + e2
(each observed score X is the true score T plus an error e)
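The idea of repeated measures scattering around a true score can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true score of 100, the SEM of 10, and normally distributed random error are all hypothetical choices, not values taken from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_SCORE = 100  # hypothetical true score T
SEM = 10          # hypothetical standard error of measurement

# Each observed score is X_i = T + e_i, where e_i is random error
observed = [TRUE_SCORE + random.gauss(0, SEM) for _ in range(10_000)]

mean_observed = statistics.mean(observed)
sd_observed = statistics.stdev(observed)
print(f"mean of observed scores: {mean_observed:.1f}")
print(f"SD of observed scores:   {sd_observed:.1f}")
```

With many repetitions the mean of the observed scores settles near T and their standard deviation settles near the SEM. The sketch models random error only; a systematic error would shift every score in the same direction and move the mean away from T.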
10.
11. Let’s say a child in a class took an
individual intelligence test that yielded a
standard score of 88. The mean of this
test is 100 and the standard deviation is
15. At first glance this score suggests that
the child is in the average range of 85 to
115, where 68% of the normative
population would score.
SEM = 10
12. Most typical confidence intervals are
68%, 90%, or 95%. Respectively, these
bands may be interpreted as the range
within which a person’s “true” score can
be found 68%, 90%, or 95% of the time.
The 68% confidence level is the one most
typically reported in evaluation reports. It is
often stated in the following manner:
“Given the student’s obtained score of _______,
there are two out of three chances that the
individual’s true score would fall
between_______(low score in range)
and_______(high score in range).”
(Denise Bishop, 2006)
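13. So what?
With an obtained score of 88 and an SEM of 10, the
68% confidence band is 88 ± 10: there are 2 chances
out of 3 that this child's true score is between 78 and 98.
The true score may therefore lie below the average range of 85 to 115.
The smaller the SEM, the more reliable the test is.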
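The arithmetic behind the band can be sketched as follows. The 68% band uses one SEM on either side of the obtained score; the multipliers 1.645 and 1.96 for the 90% and 95% bands are the usual normal-curve values, and the formula SEM = SD × sqrt(1 - reliability) is the standard textbook relation, neither of which appears in the slides. The function names are hypothetical.

```python
from math import sqrt

# Normal-curve multipliers for common confidence levels (assumed, not from the slides)
Z_VALUES = {68: 1.0, 90: 1.645, 95: 1.96}

def confidence_band(obtained, sem, level=68):
    """Return (low, high) limits of the band around an obtained score."""
    z = Z_VALUES[level]
    return obtained - z * sem, obtained + z * sem

def sem_from_reliability(sd, reliability):
    """Standard textbook relation: SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

# The example from the slides: obtained score 88, SEM 10
low68, high68 = confidence_band(88, 10, level=68)
low95, high95 = confidence_band(88, 10, level=95)
print(f"68% band: {low68:.1f} to {high68:.1f}")
print(f"95% band: {low95:.1f} to {high95:.1f}")

# Higher reliability gives a smaller SEM (SD of 15, hypothetical reliabilities)
sem_low_rel = sem_from_reliability(15, 0.75)
sem_high_rel = sem_from_reliability(15, 0.91)
print(f"SEM at r = 0.75: {sem_low_rel:.1f}")
print(f"SEM at r = 0.91: {sem_high_rel:.1f}")
```

The second part shows why a smaller SEM goes with a more reliable test: as the reliability coefficient approaches 1, the SEM approaches 0.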
14. Types of Reliability
•Test-Retest Reliability
Used to assess the consistency of a measure from
one time to another.
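One common way to put a number on test-retest consistency is to correlate the scores from the two occasions. The sketch below assumes a Pearson correlation is used as the reliability coefficient; the scores for the five examinees and the function name are hypothetical.

```python
import statistics

def reliability_coefficient(first, second):
    """Pearson correlation between two score lists from the same examinees."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_1 = statistics.mean(first)
    mean_2 = statistics.mean(second)
    # Population covariance and standard deviations (same divisor, so it cancels)
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second)) / len(first)
    return cov / (statistics.pstdev(first) * statistics.pstdev(second))

# Hypothetical scores for the same five examinees on two occasions
time_1 = [85, 92, 100, 110, 118]
time_2 = [88, 90, 103, 108, 120]

r_retest = reliability_coefficient(time_1, time_2)
print(f"test-retest reliability: r = {r_retest:.2f}")
```

A coefficient close to +1.00 means examinees keep roughly the same rank order and spacing from one administration to the next.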
15. •Inter-Rater or Inter-Observer Reliability
Used to assess the degree to which different
raters/observers give consistent estimates of the
same phenomenon.
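Agreement between two raters is often summarized with Cohen's kappa, which corrects the raw percentage of agreement for the agreement expected by chance. The slides do not name a statistic, so kappa is an assumption here, and the ratings below are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category proportions
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of eight student essays
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")
```

Here the raters agree on 6 of 8 essays (75%), but because about 53% agreement would be expected by chance, kappa is considerably lower than the raw agreement.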
16. •Parallel-Forms Reliability
Used to assess the consistency of the results of two
tests constructed in the same way from the same
content domain.
18. Validity
The term validity refers to whether or not
a test measures what it intends to measure.
On a test with high validity the items will be
closely linked to the test’s intended focus. If a
test has poor validity then it does not measure
the competencies it ought to.
Like reliability, there are several ways to
estimate the validity of a test.
19. 1. CONTENT VALIDITY
Content validity refers to the connections
between the test items and the subject-related
tasks. The test should evaluate only the content
related to the field of study in a manner
sufficiently representative, relevant, and
comprehensible.
20. 2. CONSTRUCT VALIDITY
It implies using the construct correctly
(concepts, ideas, notions). Construct validity
seeks agreement between a theoretical concept
and a specific measuring device or procedure.
For example, a test of intelligence nowadays
must include measures of multiple intelligences,
rather than just logical-mathematical and
linguistic ability measures.
21. 3. CRITERION-RELATED VALIDITY
Also referred to as instrumental validity, it
states that the criteria should be clearly
defined by the teacher in advance. It has to
take into account other teachers' criteria in
order to be standardized, and it also needs to
demonstrate the accuracy of a measure or
procedure by comparing it with another measure
or procedure that has already been
demonstrated to be valid.
22. 4. CONCURRENT VALIDITY
Concurrent validity is a statistical method using correlation,
rather than a logical method.
Examinees who are known to be either masters or non-masters
on the content measured by the test are identified before the
test is administered. Once the tests have been scored, the
relationship between the examinees’ status as either masters
or non-masters and their performance (i.e., pass or fail) is
estimated based on the test. This type of validity provides
evidence that the test is classifying examinees correctly. The
stronger the correlation is, the greater the concurrent validity
of the test is.
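Because both variables here are two-category (master or non-master, pass or fail), the correlation can be computed as a phi coefficient from a 2x2 table. Phi is an assumption on my part, since the slides say only "correlation"; the examinee data and the function name are hypothetical.

```python
from math import sqrt

def phi_coefficient(known_master, passed):
    """Phi coefficient between known mastery status and pass/fail results."""
    pairs = list(zip(known_master, passed))
    a = sum(1 for m, p in pairs if m and p)          # masters who passed
    b = sum(1 for m, p in pairs if m and not p)      # masters who failed
    c = sum(1 for m, p in pairs if not m and p)      # non-masters who passed
    d = sum(1 for m, p in pairs if not m and not p)  # non-masters who failed
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("each variable needs examinees in both categories")
    return (a * d - b * c) / denominator

# Hypothetical group: five known masters, five known non-masters
known_master = [True, True, True, True, True, False, False, False, False, False]
passed = [True, True, True, True, False, False, False, False, True, False]

phi = phi_coefficient(known_master, passed)
print(f"phi = {phi:.2f}")
```

In this example the test misclassifies one master and one non-master, so the coefficient is positive but below 1; a test that classified all ten examinees correctly would give phi = 1.00.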
23. 5. PREDICTIVE VALIDITY
This is another statistical approach to validity that
estimates the relationship of test scores to an
examinee's future performance as a master or non-master.
Predictive validity considers the question,
"How well does the test predict examinees' future
status as masters or non-masters?" For this type of
validity, the correlation that is computed is based on
the test results and the examinee’s later
performance. This type of validity is especially useful
for test purposes such as selection or admissions.
24. 6. FACE VALIDITY
Like content validity, face validity is determined by a
review of the items and not through the use of
statistical analyses. Unlike content validity, face
validity is not investigated through formal procedures.
Instead, anyone who looks over the test, including
examinees, may develop an informal opinion as to
whether or not the test is measuring what it is
supposed to measure. While it is clearly of some value
to have the test appear to be valid, face validity alone
is insufficient for establishing that the test is
measuring what it claims to measure.
25. 7. Convergent Validity
It indicates whether the information from the
instrument is of sufficient quality to be helpful
in planning an intervention.
8. Treatment Validity
It indicates the degree to which the instrument
provides information that can lead to the
development of intervention strategies, including
developing goals and objectives, determining methods
and detecting progress.
26. 9. Social Validity
It represents the value and use of the information
obtained from the instrument.