2. Reliability
Reliability is defined as the ability of an
instrument to produce reproducible results,
because without the ability to reproduce
results no truth can be known.
3. Reliability
“the reliability of an instrument is the degree of
consistency with which the instrument measures
the target attribute”
-Polit and Hungler
“reliability constitutes the ability of a measuring
instrument to produce the same answer on
successive occasions when no change has
occurred in the thing being measured”
-Burroughs
4. Features
Reliability concerns a measure’s accuracy
An instrument is reliable to the extent that its measures reflect true scores
Reliability depends on the extent to which errors of measurement are absent
from obtained scores
A reliable measure maximizes the true-score component and minimizes the
error component
Reliability testing must be performed on each instrument used in a study
Reliability exists in degrees
A correlation coefficient of 1.00 indicates perfect reliability, 0 indicates no
reliability, and negative values (down to -1.00) indicate negative reliability
6. Stability
Stability of an instrument is the extent to which
similar results or responses are obtained on two
or more separate occasions.
The assessment of an instrument’s stability
involves a procedure that evaluates its test-retest
reliability.
Investigators administer the same measure to a
sample twice and then compare the scores.
Usually, an interval of two weeks is kept between
the two assessments: too short a period (less
than two weeks) would influence the second set
of scores, while assessments too far apart in time
would risk loss of subjects or unexpected change
in the variable under study.
7. Stability
Test-retest:
Test-retest reliability is the degree to which
scores are consistent over time. It indicates
score variation that occurs from testing
session to testing session as a result of errors
of measurement.
Same test, different times
Only works if the phenomenon is unchanging
Example: administering the same
questionnaire at two different times
8. Stability
A coefficient of correlation is computed to
estimate the test's reliability.
The possible values for a correlation
coefficient range from −1.00 to +1.00.
A reliability coefficient above 0.80 is usually
considered good; however, a score of 0.70 can
be considered acceptable.
9. Reliability
There are a number of statistical formulae that
are used to compute reliability.
1. Stability (Test-retest) method
Pearson’s correlation coefficient formula for
estimation of reliability
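The test-retest (stability) estimate described above can be sketched in Python; the subject scores below are hypothetical, and Pearson's r is computed directly from its definition:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical scores for the same 6 subjects, tested two weeks apart
time1 = [12, 15, 11, 18, 14, 16]
time2 = [13, 14, 12, 17, 15, 16]
print(round(pearson_r(time1, time2), 2))  # → 0.95, i.e. good stability
```

A value this close to 1.00 would, per the interpretation on the next slide, indicate good test-retest reliability.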
10. Equivalence
The focus of equivalence is the comparison of
two versions of the same paper-and-pencil
instrument, or of two observers measuring
the same events.
11. Inter-item reliability: (internal
consistency)
The association among answers to a set of questions
designed to measure the same concept
Cronbach's alpha is a statistic commonly used
to measure inter-item reliability; it is based
on the average of all the possible correlations
of all the split halves of a set of questions on a
questionnaire.
13. Parallel form of Reliability
Split-Half Reliability:
This is especially appropriate when the test is
very long. The most commonly used method of
splitting the test into two is the odd-even
strategy.
Since longer tests tend to be more reliable,
and since the split-half correlation represents the
reliability of a test only half as long as the
actual test, a correction (the Spearman–Brown
formula) is applied to estimate the reliability of
the full-length test.
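The odd-even split-half procedure, with the Spearman–Brown correction 2r / (1 + r) applied to the half-test correlation, can be sketched as follows; the response data are hypothetical:

```python
def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(responses):
    """responses: one list of item scores per subject (odd-even split)."""
    odd = [sum(r[0::2]) for r in responses]   # items 1, 3, ...
    even = [sum(r[1::2]) for r in responses]  # items 2, 4, ...
    half_r = pearson_r(odd, even)
    return 2 * half_r / (1 + half_r)          # Spearman-Brown correction

# Hypothetical responses: 5 subjects x 4 items
responses = [
    [3, 3, 2, 3],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [4, 4, 3, 5],
]
print(round(split_half_reliability(responses), 2))  # → 0.9
```

Note how the corrected coefficient (about 0.90) is higher than the raw half-test correlation (about 0.82), reflecting the greater reliability of the full-length test.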
15. Inter-observer Reliability
Correspondence between measures made by
different observers
Inter-Rater or Inter-Observer Reliability:
Used to assess the degree to which different
raters/observers give consistent estimates of
the same phenomenon.
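One simple index of inter-rater reliability is the proportion of observations on which two raters agree exactly (more refined indices, such as Cohen's kappa, also correct for chance agreement). A sketch with hypothetical ratings:

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which two raters agree exactly."""
    agree = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agree / len(rater_a)

# Hypothetical ratings of the same 6 events by two observers
rater_a = ["yes", "no", "yes", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes"]
print(round(percent_agreement(rater_a, rater_b), 2))  # 5 of 6 agreements → 0.83
```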
16. Homogeneity (Internal
Consistency)
An instrument may be said to be internally
consistent or homogeneous to the extent that
its items measure the same trait.
Scales that are designed to measure an
attribute are ideally composed of items that
measure that attribute and nothing else.
On a scale to measure married women's
attitude towards family planning, it would be
inappropriate to include a few items that
measure their attitude towards breastfeeding.
17. Homogeneity (Internal
Consistency)
The most widely used method for evaluating internal
consistency is coefficient alpha (Cronbach’s alpha).
By using the split-half method, the internal
consistency of a tool can be calculated with the
help of the Spearman–Brown correction formula.
Most statistical software can be used to calculate
alpha.
Coefficient alpha can be interpreted like other
reliability coefficients: the normal range is between
0.00 and +1.00, and higher values reflect higher
internal consistency.
A high value means that all the items in the
instrument consistently measure the same construct.
18. Reliability interpretation
Coefficient      Interpretation             Action
0.90 and above   Excellent                  Best test
0.80-0.90        Very good                  -
0.70-0.80        Good                       A few items may need improvement
0.60-0.70        Somewhat low               Some items need improvement; more tests needed for grading
0.50-0.60        Need for revision          Test needs revision
0.50 and below   Questionable reliability   Complete revision of test
19. Validity
In general, validity is an indication of how
sound your research is.
Validity applies to both the design and
methods of research.
20. VALIDITY
“Any research can be affected by different kinds
of factors which, while extraneous to the
concerns of the research, can invalidate the
findings”
(Seliger and Shohamy, 1989)
21. Validity
“the degree to which an instrument measures,
what it is supposed to measure.”
The accuracy with which a test measures
whatever it is intended/supposed to measure.
22. Taxonomy of validity
Pertains to assessment
Pertains to causal
inference
Pertains to generalization
of findings to real-world
phenomena
26. Face validity
This is concerned with how a measure or
procedure appears.
Face validity does not depend upon
established theories for support.
27. Content validity
It is based on the extent to which a
measurement reflects the specific intended
domain of content.
All major aspects of the content must be
adequately covered by the test items, in
suitable proportion.
28. Criterion validity
It is also referred to as instrumental validity.
It is used to demonstrate the accuracy of a
measure or procedure by comparing it with
another measure or procedure, which has
been demonstrated to be valid.
29. Construct validity
It is the extent to which the test may be set to
measure a theoretical construct or trait.
Examples of constructs are anxiety,
intelligence, verbal fluency, and dominance.
Construct validity refers to the extent to which
a test reflects and seems to measure a
hypothesized trait.
30. Predictive validity
A test is correlated against a criterion that
becomes available in the future.
The extent to which a test can predict the
future performance of subjects.
31. Concurrent validity
It is determined by establishing the relationship
(or discrimination) between scores on the
measuring tool and a criterion available at the
same time, in the present situation.