RELIABILITY AND VALIDITY
Mrs. Bhaumika Sharma
• For a measure to be useful, it must be both
reliable and valid.
• Reliable = consistent in producing the same
results every time the measure is used
• Valid = measuring what it is supposed to measure
• An instrument’s reliability is the consistency with
which it measures the target attribute.
• If a scale weighed a person at 120 pounds one
minute and 150 pounds the next, we would
consider it unreliable.
• The less variation an instrument produces in
repeated measurements, the higher its reliability.
• Thus, reliability can be equated with a measure’s
stability, consistency, or dependability.
• A reliable measure maximizes the true score
component and minimizes the error component.
• The reliability of an instrument can be
assessed in various ways.
• The method chosen depends on the nature of
the instrument and on the aspect of reliability
of greatest concern.
• Three key aspects are stability, internal
consistency, and equivalence.
What in the world is a measurement instrument?
• Any tool that you use to measure with.
• What “instrument” might you use to measure the following?
1. How heavy the apples are
2. How hot the meat is
3. How much orange juice there is
4. How tall the wall is
• Example answers: a meat thermometer (item 2), a measuring cup (item 3)
Stability of an instrument
• The stability of an instrument is the extent to
which similar results are obtained on two
separate occasions.
• Assessments of an instrument’s stability
involve procedures that evaluate test–retest
reliability.
• Researchers administer the same measure to
a sample on two occasions and then compare
the scores.
• The comparison is performed objectively by
computing a reliability coefficient, which is a
numeric index of the magnitude of the test’s
reliability.
• The value of the reliability coefficient
theoretically can range between 0.00 and
1.00.
• The test–retest method is a relatively easy
approach to estimating reliability, and can be
used with all types of measures.
• A limitation is that attitudes, behaviors,
knowledge, physical condition, and so forth can
be modified by experiences between testing.
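The test–retest comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming the reliability coefficient is computed as a Pearson correlation between the two sets of scores; the helper name `pearson_r` and the scores are hypothetical, not taken from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for five people on the same scale, two weeks apart.
time_1 = [55, 49, 78, 37, 44]
time_2 = [57, 46, 74, 35, 46]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

A coefficient close to 1.00 would suggest that people kept roughly the same relative standing across the two administrations.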
• Scales and tests that involve summing items
are often evaluated for their internal
consistency.
• Scales designed to measure an attribute
ideally are composed of items that measure
that attribute and nothing else.
• Internal consistency reliability is the most
widely used reliability approach among nurse
researchers.
Split Half Technique
• One of the oldest methods for assessing
internal consistency is the split-half
technique.
• For this approach, items on a scale are split
into two groups and scored independently.
• A common split is odd items versus even items.
• Scores on the two half tests then are used to
compute a correlation coefficient.
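The split-half steps above can be sketched as follows. This is an illustrative example only: the response matrix is made up, and the helper names `split_half_scores` and `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half_scores(responses):
    """Score the odd-numbered and even-numbered items separately.

    `responses` holds one row per respondent and one column per item.
    """
    odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return odd_half, even_half

# Hypothetical answers from five respondents to a six-item scale (1 to 5).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]

odd_half, even_half = split_half_scores(responses)
split_half_r = pearson_r(odd_half, even_half)
print(odd_half, even_half, round(split_half_r, 2))
```

Each half contains only half of the items, so the correlation describes a shorter test than the full scale.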
• The most widely used method for evaluating
internal consistency is coefficient alpha (or
Cronbach’s alpha).
• The normal range of values is between .00
and 1.00, and higher values reflect a higher
internal consistency.
• Coefficient alpha is preferable to the Split-half
procedure because it gives an estimate of the
split-half correlation for all possible ways of
dividing the measure into two halves.
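Coefficient alpha can be computed directly from item-level data. The sketch below uses the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), which is not written out on the slides; the function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha for a respondents-by-items matrix of scores."""
    k = len(responses[0])  # number of items on the scale
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so that each tuple holds all respondents' scores on one item.
    item_columns = list(zip(*responses))
    item_variances = [variance(column) for column in item_columns]
    total_scores = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum(item_variances) / variance(total_scores))

# Hypothetical answers from five respondents to a six-item scale (1 to 5).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Sample variances are used for both the items and the totals; using population variances throughout would give the same ratio.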
• Equivalence (interrater reliability) is assessed for observational methods.
• Interrater (or interobserver) reliability is
estimated by having two or more trained
observers watching an event simultaneously,
and independently recording data according
to the instrument’s instructions.
• The data can then be used to compute an
index of equivalence or agreement between
observers.
• Another procedure is to compute reliability as
a function of agreements, using the following
equation:
Reliability = No. of agreements / (No. of agreements + No. of disagreements)
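As a small worked example of the agreement formula, the sketch below compares the codes recorded by two observers for the same ten events. The function name `proportion_agreement` and the observation codes are hypothetical.

```python
def proportion_agreement(rater_a, rater_b):
    """Agreements divided by (agreements + disagreements)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of events")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    disagreements = len(rater_a) - agreements
    return agreements / (agreements + disagreements)

# Hypothetical codes from two observers watching the same ten events.
observer_1 = ["cry", "calm", "calm", "cry", "calm", "cry", "calm", "calm", "cry", "calm"]
observer_2 = ["cry", "calm", "cry", "cry", "calm", "cry", "calm", "calm", "calm", "calm"]

agreement = proportion_agreement(observer_1, observer_2)
print(agreement)  # 8 agreements out of 10 events -> 0.8
```

This simple proportion does not adjust for agreement that would occur by chance.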
METHODS TO MAINTAIN RELIABILITY
1. Translating research instruments into the
common (local) language of the respondents,
so that the message is conveyed accurately and
the responses are accurate.
2. Applying test-retest method
3. Training the enumerators to prevent
ambiguity and misunderstanding, and providing
4. Alternative method of data collection: two
different types of measuring tools are used
with the same respondents to obtain similar
results.
• The two instruments are compared on an
item-by-item basis and the degree of similarity is
determined. The greater the differences, the
lower the reliability.
5. Split-half method: the instrument is used to
collect data.
• After collecting data, the instrument is divided
into two halves and the correlation between them is calculated.
• The correlation coefficient ranges from 0 to 1.
• A value of 0.6 or less is considered to indicate low reliability.
6. Pre-testing: pre-testing, or preliminary
testing, is the process of measuring, in advance,
the effectiveness of the instruments prepared to
gather data.
• After the tool is completed, it must be tested
on subjects who meet the criteria for the
study.
• The study area/setting used for the pre-testing
should match the population under study.
• Its objective is to detect discrepancies that
have crept in and to remove them through
necessary modifications to the
instrument.
• DEPENDENT VARIABLE: the variable used to
describe or measure the problem under study.
• The dependent variable is the effect.
• INDEPENDENT VARIABLE: the variable under
study that influences the problem (dependent
variable).
• The independent variable is the cause.
• A change in the dependent variable is presumed to be due
to a change in the independent variable.
• INTERVENING VARIABLES: Independent variables
that are not related to the purpose of the study, but
may affect the dependent variable.
• Example: Effect of nurses’ educational level on their
Nurses’ educational level
• CONFOUNDING VARIABLE: A confounding variable
(also called a third variable) is an extraneous variable
that DOES cause a problem because we know that it
DOES have a relationship with the independent and
dependent variables.
• A confounding variable is the type of extraneous
variable that systematically varies with or influences the
independent variable and also influences the
dependent variable.
• A confounding variable is the kind of extraneous
variable that we must be most concerned with.
[Figure: diagram with the labels “Nutritional Status of” and “Economic Status of”]
• An important criterion for evaluating a quantitative
instrument is its validity.
• Validity is the degree to which an instrument measures what
it is supposed to measure.
• Reliability and validity are not independent qualities of an
instrument.
• A measuring device that is unreliable cannot possibly be valid.
• Validity as a characteristic of research implies broad
acceptance of its findings: the research is
conducted systematically, and all concerned accept the
findings.
Types of Validity
1. Internal validity: refers to the extent to which it is
possible to infer that the independent
variable is truly causing or influencing the
dependent variable and that the relationship
between the two is not the spurious effect of an
extraneous variable. Also called causal validity.
2. External validity: refers to the extent to which the
results of the study can be generalized.
– A study is externally valid to the extent that the sample is
representative of the broader population and the study
setting and experimental arrangements are representative
of the environments.
Threats to Internal Validity
1. History threat refers to any event, other than
the planned treatment event, that occurs
between the pretest and posttest
measurement and has an influence on the
dependent variable.
2. Selection refers to selecting participants for
the various groups in the study.
• Selection is not a threat for the one group
design but it is a threat for the two group
design.
• If subjects were selected by random sampling
and random assignment, all had equal chance
of being in treatment or comparison groups,
and the groups are equivalent.
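Random assignment, as described in the bullet above, can be sketched with the Python standard library. This is an assumption-laden illustration: the function name `randomly_assign`, the participant IDs, and the equal split into two groups are all hypothetical choices.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (near) equal size."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    treatment_group = shuffled[:midpoint]
    comparison_group = shuffled[midpoint:]
    return treatment_group, comparison_group

# Hypothetical participant IDs 1 to 20.
treatment, comparison = randomly_assign(range(1, 21), seed=42)
print(sorted(treatment))
print(sorted(comparison))
```

Because the order is shuffled before the split, every participant has the same chance of ending up in either group.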
3. Maturation is present when a physical or mental
change occurs over time and it affects the participants'
performance on the dependent variable.
• For example, if we measure first grade students' ability
to perform arithmetic problems at the beginning of the
year and again at the end of the year, some of their
improvement will probably be due to their natural
maturation (and not just due to what you have taught
them during the year).
• Therefore, in the one group design, we will not know if
their improvement is due to the teacher or if it is due
to maturation.
• Maturation is not a threat in the two group
design because as long as the people in both
groups mature at the same rate, the
difference between the two groups will not
be due to maturation.
4. Testing refers to any change on the second
administration of a test as a result of having
previously taken the test.
• Did the pre-test affect the scores on the post-test?
• A pre-test may sensitize participants in
unanticipated ways and their performance on
the post-test may be due to the pre-test, not
to the treatment, or, more likely, an
interaction of the pre-test and treatment.
• This is a threat to the one group design.
• Not a threat to the two group (intervention
and control) design.
• Both groups are exposed to the pre-test and
so the difference between groups is not due
to testing.
5. Instrumentation refers to any change that
occurs in the way the dependent variable is
measured in the research study.
• Instrumentation is not a threat in the two
group design because as long as the people in
both groups are affected equally by the
instrumentation effect, the difference
between the two groups will not be due to
instrumentation.
6. Attrition (mortality) refers to the differential loss of
participants across groups.
• Did some participants drop out? Did this affect
the results?
• Did about the same number of participants
make it through the entire study in both
experimental and comparison groups?
• Is a threat for any design with more than one
group.
a. Design contamination
• Did the comparison group know (or find out)
about the experimental group?
• Did either group have a reason to want to
make the research succeed or fail?
• Often, investigators must interview subjects
after the experiment concludes in order to
find out if design contamination occurred.
b. Compensatory rivalry
• When subjects in some treatments receive
goods or services believed to be desirable and
this becomes known to subjects in other
groups, social competition may motivate the
latter to attempt to reverse or reduce the
anticipated effects of the desirable treatment.
c. Resentful demoralization
• If subjects learn that their group receives less
desirable goods or services, they may
experience feelings of resentment and
demoralization.
• Their response may be to perform at an
abnormally low level, thereby increasing the
magnitude of the difference between their
performance and that of groups that receive
the desirable goods or services.
Types of Measurement Validity
1. Content validity
2. Criterion related validity
3. Construct validity
• Content validity deals with whether the assessment content
and composition are appropriate given what is
being measured.
• The content validity of an instrument is
necessarily based on judgments.
• There are no totally objective methods for
ensuring the adequate content coverage of an
instrument.
• Experts in the content area are often called on
to analyze the items’ adequacy in
representing the content coverage of an
instrument.
• Face validity: subtype of content validity
which verifies basically that the instrument
gives the appearance of measuring concepts.
• Here, colleagues or subjects can give their
opinion about the instrument.
• Consensual validity: subtype of content
validity which is a process by which a panel of
experts judges the validity.
CRITERION RELATED VALIDITY
• It represents the relationship between one measure
and another measure of the same phenomenon.
• Criterion validity is usually measured using a
correlation coefficient – when the correlation is high,
the tool can be considered valid.
• It indicates to what degree the subject’s
performance on the measurement tool and the
subject’s actual behavior are related.
• A correlation coefficient is computed between scores
on the instrument and the criterion.
• Two forms of criterion validity: concurrent and
predictive.
• Concurrent validity uses an already existing and
well-accepted measure against which the new
measure can be compared.
• For example, if you were developing a new pain
assessment tool you would compare the ratings
obtained from the new tool with those obtained
using a previously validated tool.
• Predictive validity measures the extent to which a
tool can predict a future event of interest.
• For example, does a tool developed to measure the
risk of pressure sores in children in hospital in fact
identify the children at risk?
• Construct validity tests the link between a measure and the underlying
theory.
• If a test has construct validity, you would expect to see a
reasonable correlation with tests measuring related areas.
• Evidence of construct validity can be provided by comparing
the results obtained with the results obtained using other
tests, other (related) characteristics of the individual or
factors in the individual’s environment which would be
expected to affect test performance.
• Construct validity is usually measured using a correlation
coefficient – when the correlation is high, the tool can be
considered valid.
• Construct validity is based on the extent to
which a test measures a theoretical construct
• Constructs are specified and then interrelated
with others in empirical testing
• Empirical testing confirms or fails to confirm
the relationship that would be predicted
• A complex process involving several studies.
Maintaining Research Validity
• Consistency of instrument with statement of the
research problem, questions, objectives, hypothesis
(if stated) and variables under study.
• Using reliable instruments
• Using a random sampling method for data collection
• Selecting matched groups for intervention and
control
• Controlling extraneous variables strictly
• Adequate and representative sample size
• Considering threats to internal and external validity
• External validity is the extent to which the results of a study
can be generalized to other situations and to
other people.
THREATS TO EXTERNAL VALIDITY
• "A threat to external validity is an explanation
of how you might be wrong in making a
generalization."
• Generally, generalizability is limited when the
cause (i.e. the independent variable) depends
on other factors; therefore, all threats to
external validity interact with the independent
variable.
Threats to External Validity
• Pre-test treatment interaction
– When subjects’ reactions to a treatment are
affected by exposure to a pretest
• Multiple treatment interference
– When subjects receive multiple treatments,
effects from the first treatment may make
determining the impact of the second treatment
difficult
• Selection treatment interaction
– A problem when non-random samples are used
– Ex) When using volunteer subjects, what target
population do they represent?
• Specificity of variables
– Refers to the idea that experiments are conducted using
specific variables under specific conditions that may limit
generalizability
– A problem when variables are poorly operationalized
– Do the experimental conditions represent reality?
• Treatment diffusion
– Refers to unintended information sharing
– When information is shared between experimental groups
in a way that affects how treatments are implemented in each
group
• Experimenter effects
– Refers to a researcher’s influence on subjects or on how
procedures are followed (e.g., was the researcher more
enthusiastic with one group than another?)
• Reactive arrangements
– Artificial environment – responding differently to a “fake”
environment
– Hawthorne effect – acting differently because you know
you are a participant
– John Henry effect – when the control group tries to “beat
the treatment” because they know they are in the
control group
– Placebo effect – when control group subjects respond to
the placebo in a manner consistent with their expectations
– Novelty effect – increased response to a treatment
because it is different, not better
THREATS TO EXTERNAL VALIDITY
• Failure to describe independent variables explicitly
• Lack of representativeness of available and target populations
• Hawthorne effect: the alteration of behavior by the subjects of
a study due to their awareness of being observed.
• Inadequate operationalizing of dependent variables
• Sensitization/reactivity to experimental/research conditions
• Interaction effects of extraneous factors and experimental/
research treatments
• Ecological validity
• Invalidity or unreliability of instruments
• Multiple treatment interference
• Ecological validity has typically been taken to
refer to whether or not one can generalize
from observed behaviour in the laboratory to
natural behaviour in the world.
• Multiple treatment interference: When subjects
receive multiple treatments, effects from the
first treatment may make determining the
impact of the second treatment difficult.