Reliability and Validity 
Ma. Benilda L. Adcock
Why do we need to determine the
reliability and validity of a test?
• Without validity and reliability, one
cannot test a hypothesis.
• Without hypothesis testing, one
cannot support a theory.
• Without a supported theory, one
cannot explain why events occur.
• Without adequate explanation, one
cannot develop effective material
and non-material technologies,
including programs designed for
positive social change.
Reliability 
“consistency” 
A measure is considered reliable if it would give 
us the same result over and over again 
Correlation
- measures how things
are related to one
another
Correlation Coefficient
- the degree of
relationship
The coefficient ranges from -1.00 to +1.00
+ (positive) correlation means that when one
variable goes up, so does the other one
- (negative) correlation means that when one
variable goes up, the other one goes down
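The sign of the coefficient can be illustrated with a small sketch. It uses the Pearson correlation coefficient, which the slides do not name explicitly, and the data (hours studied, test scores, errors made) are invented purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, not taken from the slides
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 58, 65, 71, 80]   # rises with hours studied
errors_made = [20, 17, 13, 9, 4]    # falls with hours studied

r_positive = pearson_r(hours_studied, test_score)   # close to +1
r_negative = pearson_r(hours_studied, errors_made)  # close to -1
print(r_positive, r_negative)
```

A value near +1.00 or -1.00 indicates a strong relationship; a value near 0 indicates little or no linear relationship.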
“No TEST, no matter how it is 
designed, is FREE from ERROR”
We have to consider the
Standard Error of Measurement
Random Error (e.g., mood)
Systematic Error (e.g., traffic)
But have you wondered what
the real importance of the
Standard Error of Measurement is?
The standard error of 
measurement (SEm) estimates 
how repeated measures of a 
person on the same 
instrument tend to be 
distributed around his or her 
“true” score 
X1 = T + e1
X2 = T + e2
(each observed score X is the true score T plus an error e)
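The model X = T + e can be made concrete with a small simulation. This is only a sketch: the true score of 100, the SEm of 5, and the assumption that errors are normally distributed with mean zero are all illustrative choices, not values from the slides.

```python
import random
import statistics

def simulate_observed_scores(true_score, sem, n_administrations, seed=0):
    """Draw observed scores X_i = T + e_i, assuming e_i ~ Normal(0, sem)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0, sem) for _ in range(n_administrations)]

# Hypothetical values chosen for illustration only
observed = simulate_observed_scores(true_score=100, sem=5, n_administrations=10_000)

mean_observed = statistics.mean(observed)   # clusters around the true score
sd_observed = statistics.pstdev(observed)   # spread is approximately the SEm
print(mean_observed, sd_observed)
```

Over many repeated administrations the observed scores center on the true score, and their standard deviation approximates the SEm.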
Let’s say a child in a class took an 
individual intelligence test that yielded a 
standard score of 88. The mean of this 
test is 100 and the standard deviation is 
15. At first glance this score suggests that 
the child is in the average range of 85 to 
115, where 68% of the normative
population would score.
SEM = 10
The most typical confidence intervals are
68%, 90%, and 95%. Respectively, these
bands may be interpreted as the range
within which a person’s “true” score can
be found 68%, 90%, or 95% of the time.
The 68% confidence level is the one most
typically reported in evaluation reports. It is
often reported in the following manner:
“Given the student’s obtained score of _______,
there are two out of three chances that the
individual’s true score would fall
between _______ (low score in range)
and _______ (high score in range).”
By Denise Bishop (2006)
So what? 
The confidence band tells us that there are
2 chances out of 3 that this child’s true score
lies between 78 and 98 (88 ± 10)
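The band for the child's score of 88 with SEM = 10 can be reproduced with a short sketch. It assumes normally distributed measurement error and a band centered on the obtained score; the function name `confidence_band` is hypothetical.

```python
from statistics import NormalDist

def confidence_band(obtained_score, sem, level):
    """Symmetric band around the obtained score, assuming normal errors."""
    # z-value that leaves (1 - level) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return obtained_score - z * sem, obtained_score + z * sem

# Values from the example: obtained score 88, SEM 10
for level in (0.68, 0.90, 0.95):
    low, high = confidence_band(88, 10, level)
    print(f"{level:.0%} band: {low:.1f} to {high:.1f}")
```

At the 68% level the multiplier is very close to 1, which is why the band is usually reported simply as the obtained score plus or minus one SEM. The 90% and 95% bands are wider because they use larger multipliers (about 1.64 and 1.96).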
The smaller the SEM, the more reliable the test is.
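The link between SEM and reliability can be sketched with the classical test theory relation SEM = SD × sqrt(1 − reliability). This formula is not stated in the slides and is brought in here as a standard result; the function names are hypothetical.

```python
import math

def sem_from_reliability(sd, reliability):
    """SEM implied by a test's standard deviation and reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

def reliability_from_sem(sd, sem):
    """Reliability coefficient implied by a test's standard deviation and SEM."""
    return 1 - (sem / sd) ** 2

# Values from the example: SD = 15, SEM = 10
implied_reliability = reliability_from_sem(15, 10)

# A hypothetical, more reliable test with the same SD
smaller_sem = sem_from_reliability(15, 0.90)
print(implied_reliability, smaller_sem)
```

Under this relation, an SEM of 10 on a test with a standard deviation of 15 corresponds to a reliability of about 0.56, whereas a reliability of 0.90 would shrink the SEM to under 5 points.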
Types of Reliability 
• Test-Retest Reliability
Used to assess the consistency of a measure from 
one time to another.
• Inter-Rater or Inter-Observer Reliability
Used to assess the degree to which different 
raters/observers give consistent estimates of the 
same phenomenon.
• Parallel-Forms Reliability
Used to assess the consistency of the results of two 
tests constructed in the same way from the same 
content domain.
• Internal Consistency Reliability
Used to assess the consistency of results across 
items within a test.
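Test-retest and parallel-forms reliability are usually reported as a correlation between two sets of scores. For the other two types, the sketch below uses two common statistics that the slides do not name: Cronbach's alpha for internal consistency and Cohen's kappa for inter-rater agreement. All data are invented for illustration.

```python
from collections import Counter
from statistics import variance

def cronbach_alpha(item_scores):
    """Internal consistency. item_scores: one row per examinee, one column per item."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # scores grouped by item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical item responses (1 = correct, 0 = incorrect): 5 examinees, 4 items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)

# Hypothetical pass/fail ratings of the same 10 performances by two raters
rater_a = ["P", "P", "F", "P", "F", "F", "P", "P", "F", "P"]
rater_b = ["P", "P", "F", "P", "F", "P", "P", "P", "F", "F"]
kappa = cohen_kappa(rater_a, rater_b)
print(alpha, kappa)
```

Both statistics approach 1 when consistency is high. Kappa is lower than the raw agreement rate because it discounts the agreement that two raters would reach by chance alone.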
Validity 
The term validity refers to whether or not 
a test measures what it intends to measure. 
On a test with high validity the items will be 
closely linked to the test’s intended focus. If a 
test has poor validity then it does not measure 
the competencies it ought to. 
Like reliability, there are several ways to 
estimate the validity of a test.
1. CONTENT VALIDITY 
Content validity refers to the connections 
between the test items and the subject-related 
tasks. The test should evaluate only the content 
related to the field of study in a manner 
sufficiently representative, relevant, and 
comprehensible.
2. CONSTRUCT VALIDITY 
It concerns whether the construct (a concept,
idea, or notion) is measured correctly. Construct validity
seeks agreement between a theoretical concept
and a specific measuring device or procedure.
For example, under a multiple-intelligences view,
a test of intelligence would need to include
measures of multiple intelligences, rather than
just logical-mathematical and linguistic ability measures.
3. CRITERION-RELATED VALIDITY 
Also referred to as instrumental validity, it
requires that the criteria be clearly
defined by the teacher in advance. To be
standardized, it has to take into account
other teachers' criteria. It also needs to
demonstrate the accuracy of a measure or
procedure compared to another measure or
procedure that has already been
demonstrated to be valid.
4. CONCURRENT VALIDITY 
Concurrent validity is a statistical method using correlation, 
rather than a logical method. 
Examinees who are known to be either masters or non-masters 
on the content measured by the test are identified before the 
test is administered. Once the tests have been scored, the 
relationship between the examinees’ status as either masters 
or non-masters and their performance (i.e., pass or fail) is 
estimated based on the test. This type of validity provides 
evidence that the test is classifying examinees correctly. The 
stronger the correlation is, the greater the concurrent validity 
of the test is.
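When both variables are two-valued (master or non-master, pass or fail), the correlation described above can be computed as a phi coefficient from the 2×2 table of counts. The slides do not specify which coefficient to use, so phi is an assumption here, and the ten examinees are invented.

```python
import math

def phi_coefficient(known_master, passed):
    """Correlation between two yes/no variables, from the 2x2 table of counts."""
    pairs = list(zip(known_master, passed))
    a = sum(m and p for m, p in pairs)              # masters who passed
    b = sum(m and not p for m, p in pairs)          # masters who failed
    c = sum((not m) and p for m, p in pairs)        # non-masters who passed
    d = sum((not m) and (not p) for m, p in pairs)  # non-masters who failed
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical group: 5 known masters followed by 5 known non-masters
known_master = [True] * 5 + [False] * 5
passed = [True, True, True, True, False, False, False, False, False, True]

phi = phi_coefficient(known_master, passed)
print(phi)
```

Here four of five masters pass and four of five non-masters fail. The closer phi is to 1, the more consistently the test classifies examinees in line with their known status.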
5. PREDICTIVE VALIDITY 
This is another statistical approach to validity that 
estimates the relationship of test scores to an 
examinee's future performance as a master or non-master. 
Predictive validity considers the question, 
"How well does the test predict examinees' future 
status as masters or non-masters?" For this type of 
validity, the correlation that is computed is based on 
the test results and the examinee’s later 
performance. This type of validity is especially useful 
for test purposes such as selection or admissions.
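For a selection or admissions setting, the correlation between a numeric test score and a later master/non-master outcome can be sketched as a point-biserial correlation. That choice of statistic is an assumption, and the scores and outcomes below are invented.

```python
import math
from statistics import mean, pstdev

def point_biserial(scores, later_master):
    """Correlation between numeric scores and a later yes/no outcome."""
    masters = [s for s, m in zip(scores, later_master) if m]
    non_masters = [s for s, m in zip(scores, later_master) if not m]
    p = len(masters) / len(scores)   # proportion who later became masters
    # Standardized gap between group means, scaled by the group proportions
    gap = (mean(masters) - mean(non_masters)) / pstdev(scores)
    return gap * math.sqrt(p * (1 - p))

# Hypothetical admission scores and later status of the same 8 examinees
admission_scores = [55, 60, 62, 70, 75, 80, 85, 90]
later_master = [False, False, False, True, False, True, True, True]

r_pb = point_biserial(admission_scores, later_master)
print(r_pb)
```

A clearly positive value means that examinees with higher test scores were more likely to be classified as masters later on, which is the kind of evidence predictive validity looks for.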
6. FACE VALIDITY 
Like content validity, face validity is determined by a 
review of the items and not through the use of 
statistical analyses. Unlike content validity, face 
validity is not investigated through formal procedures. 
Instead, anyone who looks over the test, including 
examinees, may develop an informal opinion as to 
whether or not the test is measuring what it is 
supposed to measure. While it is clearly of some value 
to have the test appear to be valid, face validity alone 
is insufficient for establishing that the test is 
measuring what it claims to measure.
7. Convergent Validity 
It indicates whether the information from the
instrument is of sufficient quality to be helpful
in planning an intervention.
8. Treatment Validity 
It indicates the degree to which the instrument 
provides information that can lead to the 
development of intervention strategies, including 
developing goals and objectives, determining methods 
and detecting progress.
9. Social Validity 
It represents the value and use of the information 
obtained from the instrument.
