2. • Reliability and Validity: Definition, importance of reliability and
validity during data collection
• Types of data and methods used for data collection, pros and
cons of various data collection methods
3. Psychometric Properties
What is test reliability?
Reliability is the consistency or
reproducibility of test measurements: the
degree to which repeated measurements
agree with one another.
4. Reliability
Internal consistency
The consistency of items that measure
the same general characteristic.
Inter-rater reliability
The consistency of measurements
obtained by different people.
5. Reliability
Intra-rater reliability
The agreement of the tester with
himself or herself when administering
the same test at different times.
Test-retest reliability
The consistency of the measurement
when the same test is repeated at
different times.
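For inter-rater reliability specifically, raw percent agreement overstates consistency, because two raters will agree some of the time by chance alone. Cohen's kappa corrects for that chance agreement. A minimal sketch in Python, with made-up pass/fail ratings (the data are illustrative only):

```python
def cohens_kappa(rater1, rater2):
    # Cohen's kappa: observed agreement corrected for the agreement
    # two raters would be expected to reach by chance.
    n = len(rater1)
    labels = set(rater1) | set(rater2)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    p_chance = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in labels
    )
    return (p_obs - p_chance) / (1 - p_chance)

# Two hypothetical raters classifying the same 10 cases:
r1 = ["pass", "pass", "fail", "pass", "fail",
      "pass", "pass", "fail", "pass", "fail"]
r2 = ["pass", "pass", "fail", "fail", "fail",
      "pass", "pass", "fail", "pass", "pass"]
print(round(cohens_kappa(r1, r2), 2))  # 0.58
```

Here the raters agree on 8 of 10 cases (80%), but since roughly half that agreement is expected by chance, kappa is a more modest 0.58.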
6. What is Validity?
The degree to which a measure represents what it intends to
measure.
7. Validity
Face Validity
The degree to which the measurement
seems to represent what it is supposed
to measure.
Content Validity
The degree to which the
measure covers the meaningful
elements of the construct being
measured.
8. Validity
Construct Validity
The degree to which a
measurement represents the
underlying theoretical construct.
Criterion-related Validity
Comparison with the “gold
standard” for measuring the
same construct.
9. Validity
Concurrent Validity
Comparison with the “gold standard”
measurement obtained at
approximately the same time.
Predictive Validity
Comparison with the “gold standard”
measurement obtained at a later point in
time.
10. Psychometric Properties
• Sensitivity
How well a test identifies people who truly have
the condition measured by the test.
• Specificity
How well the test identifies people who do NOT
have the condition being measured.
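Sensitivity and specificity fall directly out of the four cells of a 2×2 test-result-versus-true-condition table. A minimal sketch in Python, using made-up screening counts:

```python
def sensitivity(tp, fn):
    # True positive rate: TP / (TP + FN) --
    # the share of people WITH the condition whom the test flags.
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: TN / (TN + FP) --
    # the share of people WITHOUT the condition whom the test clears.
    return tn / (tn + fp)

# Hypothetical screening results (illustrative counts, not real data):
# 90 true positives, 10 false negatives,
# 160 true negatives, 40 false positives.
print(sensitivity(tp=90, fn=10))   # 0.9
print(specificity(tn=160, fp=40))  # 0.8
```

A highly sensitive test misses few true cases; a highly specific test raises few false alarms.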
11. Reliability
• Reliability refers to a measure’s ability to capture an
individual’s true score, i.e. to distinguish one person
from another accurately
• While a reliable measure will be consistent, consistency
is better seen as a by-product of reliability: in a case of
perfect consistency (everyone scores the same and gets
the same score repeatedly), reliability coefficients could
not be calculated at all
• No variance/covariance from which to compute a correlation
• The error in our analyses is due to individual differences,
but also to the measure not being perfectly reliable
12. Reliability
• Criteria of reliability
• Test-retest
• Test components (internal consistency)
• Test-retest reliability
• Consistency of measurement for individuals over time
• Scores should be similar, e.g. today and 6 months from now
• Issues
• Memory
• If too close in time the correlation between scores is due to memory of
item responses rather than true score captured
• Chance covariation
• In any sample, two variables will virtually never correlate exactly zero
• Reliability is not constant across subsets of a population
• General-population IQ scores: good reliability
• IQ scores among college students only: less reliable
• Restriction of range leaves fewer individual differences to detect
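The restriction-of-range point can be shown with a small simulation: test-retest correlation computed on a full, IQ-like population versus only on high scorers. All numbers here are made up for illustration:

```python
import random

def pearson_r(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)

# Simulated true scores (mean 100, SD 15) plus independent
# measurement error at two testing occasions.
true_scores = [random.gauss(100, 15) for _ in range(2000)]
time1 = [t + random.gauss(0, 5) for t in true_scores]
time2 = [t + random.gauss(0, 5) for t in true_scores]

# Test-retest reliability in the full population...
full_r = pearson_r(time1, time2)

# ...and in a range-restricted subgroup (only high scorers),
# where fewer individual differences remain to be captured.
subset = [(a, b) for t, a, b in zip(true_scores, time1, time2) if t > 115]
restricted_r = pearson_r([a for a, _ in subset], [b for _, b in subset])

# The full-sample r should clearly exceed the restricted r.
print(full_r, restricted_r)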
13. Internal Consistency
• We can compute a sort of average correlation among
items to assess the reliability of a measure
• As one would intuitively expect, having more
measures of something is better than having few
• Having more items that correlate with one
another will increase the test’s reliability
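A common internal-consistency index is Cronbach's alpha, which rises as items intercorrelate (and as correlated items are added). A minimal sketch in Python over a made-up 4-item, 6-respondent scale:

```python
def variance(xs):
    # Sample variance (n - 1 in the denominator).
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def cronbach_alpha(items):
    # items: one list of scores per item, all over the same respondents.
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical 4-item scale, 6 respondents (illustrative data):
items = [
    [3, 4, 2, 5, 4, 3],
    [3, 5, 2, 4, 4, 2],
    [2, 4, 3, 5, 5, 3],
    [3, 4, 2, 5, 3, 3],
]
print(round(cronbach_alpha(items), 2))  # 0.91
```

Because these toy items track each other closely, most of the variance in the total score is shared rather than item-specific, so alpha comes out high.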
14. What’s good reliability?
• While we have conventions, what counts as good really depends on the context
• As mentioned reliability of a measure may be different
for different groups of people
• What we may need to do is compare reliability to those
measures which are in place and deemed ‘good’ as well
as get interval estimates to provide an assessment of the
uncertainty in our reliability estimate
• Note also that reliability estimates are biased upward
and so are a bit optimistic
• Also, many of our techniques do not take into account
the reliability of our measures, and poor reliability can
result in lower statistical power i.e. an increase in type II
error
• Though technically, increasing reliability can potentially also
lower power
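One reason unreliability costs power: measurement error attenuates observed correlations. Spearman's classic correction for attenuation estimates what the correlation between two measures would be if both were perfectly reliable. A sketch with illustrative numbers:

```python
def disattenuate(r_obs, rel_x, rel_y):
    # Spearman's correction for attenuation:
    # r_true = r_observed / sqrt(reliability_x * reliability_y)
    return r_obs / (rel_x * rel_y) ** 0.5

# With reliabilities of .70 and .80, an observed correlation of .42
# implies a true-score correlation of about .56 (made-up values):
print(round(disattenuate(0.42, rel_x=0.70, rel_y=0.80), 2))  # 0.56
```

Read in the other direction, the same formula shows how a real true-score relationship of .56 shrinks to an observed .42 when measured with these unreliable instruments, making it harder to detect.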
15. Validity
• Validity refers to the question of whether our
measurements are actually capturing the construct
we think they are
• While we can obtain specific statistics for reliability
(even different types), validity is more of a global
assessment based on the evidence available
• We can have reliable measurements that are invalid
• Classic example: a scale that is consistent and able
to distinguish one person from the next, but is
always off by 5 pounds
16. Validity Criteria
• Content validity
• Criterion validity
• Concurrent
• Predictive
• Construct-related validity
• Convergent
• Discriminant
• Content validity
• Items represent the kinds of material (or content areas) they are
supposed to represent
• Do the questions cover all domains of a given construct?
• E.g. job satisfaction = salary, relationship with boss, relationship with
coworkers, etc.
17. • Criterion validity
• the degree to which the measure correlates with various
outcomes
• Does some new personality measure correlate with the Big
Five?
• Concurrent
• Criterion is in the present
• Measure of ADHD and current scholastic behavioral
problems
• Predictive
• Criterion in the future
• SAT scores and college GPA
18. • Construct-related validity
• How much is it an actual measure of the construct of
interest
• Convergent
• Correlates well with other measures of the construct
• A depression scale correlates well with other depression scales
• Discriminant
• Is distinguished from related but distinct constructs
• A depression scale should not simply duplicate a stress scale
19. Validity Criteria in Experimentation
• Statistical conclusion validity
• Is there a causal relationship between X and Y?
• Correlation is our starting point (correlation isn’t causation, but it
is where a causal argument begins)
• Related to this is the question of whether the study was sufficiently
sensitive to pick up on the correlation
• Internal validity
• Has the study been conducted so as to rule out other effects which
were controllable?
• Poor instruments, experimenter bias
• External validity
• Will the relationship be seen in other settings?
• Construct validity
• Same concerns as before
• Ex. Is reaction time an appropriate measure of learning?
24. IMPORTANCE OF RELIABILITY
AND VALIDITY
• It is also important that validity and reliability not be viewed
as independent qualities. A measurement cannot be valid
unless it is reliable; it must be both valid and reliable if it is to
be depended upon as an accurate representation of a concept
or attribute (Wan, 2002). A research study design that meets
standards for validity and reliability produces results that are
both accurate (validity) and consistent (reliability). The archery
metaphor is often used to illustrate the relationship between
validity and reliability.
• Knowledge of validity and reliability not only aids the
researcher in designing and judging one’s own work, it also
makes one a better consumer of research, better able to
evaluate the research literature and to choose among
alternative research designs and interventions (Gliner &
Morgan, 2000). Adopting these standards will ensure that
study results are credible to your key constituents.
26. Methods of data collection
• Primary data – data you collect
  • Surveys
  • Focus groups
  • Questionnaires
  • Personal interviews
  • Experiments and observational study
• Secondary data – data someone else has collected
  • County health departments
  • Vital statistics – birth, death certificates
  • Hospital, clinic, school nurse records
  • Private and foundation databases
  • City and county governments
  • Surveillance data from state government programs
  • Federal agency statistics – Census, NIH, etc.
27. Methods of data collection – limitations
• Primary data
  • Researcher error
  • Uniqueness
  • Time and money
• Secondary data
  • Time of collection
  • How was it collected?
  • Reliable source?
  • Confounding bias
  • Incomplete data set
  • Consistency