DATA COLLECTION
• Reliability and Validity: Definition, importance of reliability and
validity during data collection
• Types of data and methods used for data collection, pros and
cons of various data collection methods
Psychometric Properties
Reliability is the consistency or reproducibility
of test measurements. It is the degree of
agreement of the measurements with each
other after repeated tests.
What is test reliability?
Reliability
Internal consistency The consistency of items that measure
the same general characteristic.
Inter-rater reliability
The consistency of measurements
obtained by different people.
Reliability
Intra-rater reliability
The agreement of the tester with
himself or herself when administering
the same test at different times.
Test-retest reliability
The consistency of the measurement
when the same test is repeated at
different times.
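These forms of reliability can be quantified. For inter-rater reliability, a common statistic is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch, using made-up pass/fail ratings from two hypothetical examiners:

```python
# Inter-rater reliability via Cohen's kappa: agreement between two raters,
# corrected for the agreement expected by chance alone.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of cases where the raters match.
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters were statistically independent.
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical pass/fail ratings by two examiners on 10 candidates.
rater_a = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "pass", "fail", "pass", "pass",
           "pass", "pass", "fail", "pass", "fail"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # ≈ 0.78
```

Kappa of 1 means perfect agreement; 0 means agreement no better than chance.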
What is Validity?
The degree to which a measure represents what it intends to
measure.
Validity
Face Validity
The degree to which the measurement
seems to represent what it is supposed
to measure.
Content Validity
The degree to which the
measure covers the meaningful
elements of the construct being
measured.
Validity
Construct Validity
The degree to which a
measurement represents the
underlying theoretical construct.
Criterion-related Validity
Comparison with the “gold
standard” for measuring the
same construct.
Validity
Concurrent Validity
Comparison with the “gold standard”
measurement obtained at
approximately the same time.
Predictive Validity
Comparison with the “gold standard”
measurement obtained at a later point in
time.
Psychometric Properties
• Sensitivity
How well a test identifies people who truly have
the condition measured by the test.
• Specificity
How well the test identifies people who do NOT
have the condition being measured.
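Both quantities come straight from a 2×2 cross-tabulation of test results against true condition status. A small sketch with made-up screening data:

```python
# Sensitivity and specificity from test results against true condition
# status (made-up screening data; 1 = positive / has the condition).
def sensitivity_specificity(truth, test):
    tp = sum(1 for t, p in zip(truth, test) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, test) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, test) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, test) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # true positives / all who have it
    specificity = tn / (tn + fp)   # true negatives / all who do not
    return sensitivity, specificity

truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
test  = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(truth, test)
print(sens, spec)  # 0.75 0.8333...
```

Here the test catches 3 of the 4 true cases (sensitivity 0.75) and correctly clears 5 of the 6 non-cases (specificity ≈ 0.83).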
Reliability
• Reliability refers to a measure’s ability to capture an
individual’s true score, i.e. to distinguish accurately one
person from another
• While a reliable measure will be consistent, consistency is better seen as a by-product of reliability: with perfect consistency (everyone scores the same and gets the same score repeatedly), reliability coefficients could not even be calculated
• No variance/covariance to give a correlation
• The error in our analyses stems both from individual differences and from the measure not being perfectly reliable
Reliability
• Criteria of reliability
• Test-retest
• Test components (internal consistency)
• Test-retest reliability
• Consistency of measurement for individuals over time
• Scores are similar, e.g., today and 6 months from now
• Issues
• Memory
• If too close in time the correlation between scores is due to memory of
item responses rather than true score captured
• Chance covariation
• In any finite sample, two variables will almost always show some non-zero correlation by chance
• Reliability is not constant across subsets of a population
• IQ scores in the general population: good reliability
• IQ scores among college students: less reliable
• Restriction of range, fewer individual differences
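Test-retest reliability is usually reported as the correlation between the two administrations. A sketch with hypothetical scores (the data are invented for illustration):

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same test (scores are hypothetical).
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [88, 92, 75, 60, 81, 95, 70]   # scores today
time2 = [85, 94, 78, 62, 79, 96, 72]   # same people, 6 months later
print(round(pearson(time1, time2), 3))
```

Note how the variance terms (`sx`, `sy`) sit in the denominator: restricting the sample to a narrow range of scores shrinks them and typically lowers r, which is the restriction-of-range issue noted above.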
Internal Consistency
• We can compute a sort of average correlation among items to assess the reliability of a measure
• As one would intuitively assume, having more measures of something is better than having few
• It is the case that having more items which
correlate with one another will increase the test’s
reliability
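A standard summary of internal consistency is Cronbach's alpha, which grows as more intercorrelated items are added. A minimal sketch on hypothetical Likert-scale responses:

```python
# Internal consistency via Cronbach's alpha (hypothetical Likert
# responses; rows = respondents, columns = items of one scale).
def cronbach_alpha(rows):
    k = len(rows[0])                          # number of items
    def var(vals):                            # sample variance
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)
    item_vars = [var([r[j] for r in rows]) for j in range(k)]
    total_var = var([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # ≈ 0.96
```

When the items move together across respondents, the total-score variance dwarfs the summed item variances and alpha approaches 1.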
What’s good reliability?
• While we have conventions, it really depends on the context
• As mentioned reliability of a measure may be different
for different groups of people
• What we may need to do is compare reliability to those
measures which are in place and deemed ‘good’ as well
as get interval estimates to provide an assessment of the
uncertainty in our reliability estimate
• Note also that reliability estimates are biased upwardly
and so are a bit optimistic
• Also, many of our techniques do not take into account
the reliability of our measures, and poor reliability can
result in lower statistical power i.e. an increase in type II
error
• Though technically increasing reliability can potentially also lower power
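One way to get such an interval estimate is to bootstrap the reliability coefficient: resample respondents with replacement and recompute the statistic many times. A sketch using hypothetical test-retest score pairs:

```python
# Bootstrap interval estimate for a test-retest reliability coefficient:
# resample respondents (pairs of scores) with replacement and recompute r.
# The paired scores below are hypothetical.
import random

def pearson(pairs):
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = sum((a - mx) ** 2 for a, _ in pairs) ** 0.5
    sy = sum((b - my) ** 2 for _, b in pairs) ** 0.5
    return cov / (sx * sy)

scores = [(88, 85), (92, 94), (75, 78), (60, 62), (81, 79),
          (95, 96), (70, 72), (66, 65), (83, 86), (77, 74)]

rng = random.Random(0)                     # fixed seed for reproducibility
boot = []
for _ in range(2000):
    sample = [rng.choice(scores) for _ in scores]
    try:
        boot.append(pearson(sample))
    except ZeroDivisionError:              # degenerate resample; skip it
        continue
boot.sort()
lo = boot[int(0.025 * len(boot))]
hi = boot[int(0.975 * len(boot))]
print(f"r = {pearson(scores):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Resampling whole respondents (rather than individual scores) preserves the pairing, which is what the reliability coefficient depends on.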
Validity
• Validity refers to the question of whether our
measurements are actually hitting on the construct
we think they are
• While we can obtain specific statistics for reliability
(even different types), validity is more of a global
assessment based on the evidence available
• We can have reliable measurements that are invalid
• Classic example: a scale that is consistent and able to distinguish one person from the next, but consistently off by 5 pounds
Validity Criteria
• Content validity
• Criterion validity
• Concurrent
• Predictive
• Construct-related validity
• Convergent
• Discriminant
• Content validity
• Items represent the kinds of material (or content areas) they are
supposed to represent
• Do the questions cover all the meaningful domains of a given construct?
• E.g. job satisfaction = salary, relationship w/ boss, relationship w/ coworkers
etc.
• Criterion validity
• the degree to which the measure correlates with various
outcomes
• Does some new personality measure correlate with the Big Five?
• Concurrent
• Criterion is in the present
• Measure of ADHD and current scholastic behavioral
problems
• Predictive
• Criterion in the future
• SAT scores and college GPA
• Construct-related validity
• How much is it an actual measure of the construct of
interest
• Convergent
• Correlates well with other measures of the construct
• Depression scale correlates well with other depression scales
• Discriminant
• Is distinguished from related but distinct constructs
• Depression scale != stress scale
Validity Criteria in Experimentation
• Statistical conclusion validity
• Is there a causal relationship between X and Y?
• Correlation is our starting point (correlation isn’t causation, but a causal relationship will produce a correlation)
• Related to this is the question of whether the study was sufficiently
sensitive to pick up on the correlation
• Internal validity
• Has the study been conducted so as to rule out other effects which
were controllable?
• Poor instruments, experimenter bias
• External validity
• Will the relationship be seen in other settings?
• Construct validity
• Same concerns as before
• Ex. Is reaction time an appropriate measure of learning?
IMPORTANCE OF RELIABILITY
AND VALIDITY
• It is also important that validity and reliability not be viewed
as independent qualities. A measurement cannot be valid
unless it is reliable; it must be both valid and reliable if it is to
be depended upon as an accurate representation of a concept
or attribute (Wan, 2002). A research study design that meets
standards for validity and reliability produces results that are
both accurate (validity) and consistent (reliability). The archery
metaphor is often used to illustrate the relationship between
validity and reliability.
• Knowledge of validity and reliability not only aids researchers in designing and judging their own work; it also makes them better consumers of research, able to evaluate the research literature and to choose among alternative research designs and interventions (Gliner & Morgan, 2000). Adopting these standards will ensure that study results are credible to your key constituents.
Methods of data collection
• Primary data – data you collect
• Surveys
• Focus groups
• Questionnaires
• Personal interviews
• Experiments and observational studies
• Secondary data – data someone else has collected
• County health departments
• Vital statistics – birth and death certificates
• Hospital, clinic, and school nurse records
• Private and foundation databases
• City and county governments
• Surveillance data from state government programs
• Federal agency statistics – Census, NIH, etc.
Methods of data collection – limitations
• Primary data
• Researcher error
• Uniqueness
• Time and money
• Secondary data
• Time of collection
• How was it collected?
• Reliable source?
• Confounding bias
• Incomplete data set
• Consistency
Obtrusive vs. Unobtrusive Methods
IPDET © 2009
• Obtrusive – data collection methods that directly obtain information from those being evaluated
• e.g. interviews, surveys, focus groups
• Unobtrusive – data collection methods that do not collect information directly from evaluees
• e.g. document analysis, GoogleEarth, observation at a distance, trash of the stars
Data Collection Tools
• Participatory Methods
• Records and Secondary Data
• Observation
• Surveys and Interviews
• Focus Groups
• Diaries, Journals, Self-reported Checklists
• Expert Judgment
• Delphi Technique
• Other Tools
Records and Secondary Data
• Examples of sources:
• files/records
• computer data bases
• industry or government reports
• other reports or prior evaluations
• census data and household survey data
• electronic mailing lists and discussion groups
• documents (budgets, organizational charts, policies
and procedures, maps, monitoring reports)
• newspapers and television reports
THANK YOU
