Data Collection: Methods and Instruments
Professor Tarek Tawfik Amin
Epidemiology and Public Health, Faculty of Medicine, Cairo University
Geneva Foundation for Medical Education and Training
Asian Pacific Organization for Cancer Prevention
International Osteoporosis Foundation
Wiley Innovative Panel
amin55@myway.com dramin55@gmail.com
Basic Research Competency Program for Research Coordinators
August 2015, MEDC, Faculty Of Medicine, Cairo University, Cairo, Egypt.
Objectives
By the end of this session, research coordinators will be able to:
1- Identify the research data plan and its importance.
2- Differentiate between primary and secondary data sources.
3- Recognize the different data collection methods, tools and
techniques and the demerits of each.
4- Recognize the importance of reliability and validity of an
instrument.
• Data can be defined as the quantitative or qualitative value of a
variable (e.g. numbers, images, words, figures, facts or ideas).
• It is the lowest unit of information from which other
measurements and analyses can be made.
• Data are one of the most important and vital aspects of any
research study.
Introduction
Before starting
• Accurate and systematic data collection is
critical to conducting scientific research.
• Data collection allows the researcher to gather information
about study objects/subjects/participants.
• Includes documents review, observation,
questioning, measuring, or a combination of
different methods.
o Objectives and scope of the enquiry (research
question).
o Sources of information (type, accessibility).
o Quantitative expression (measurement/scale).
o Techniques of data collection.
o Unit of collection.
Factors to be Considered Before Collection of Data (plan)
Data collection plan (flow):
Research question / research hypothesis →
Identify types of data →
Types of measurements and variables →
Instruments, scales and methods →
Written permissions →
Pilot testing →
Revise (1- data collection forms; 2- operational procedures) →
Implementation
Sources of Data:
- Internal sources
- External sources: primary data, secondary data
Internal
o Many institutions and departments
have information about their
regular functions, for their own
internal purposes.
o When this information is used
in a survey, it is called an internal
source of data.
Examples: routine surveillance, hospital records.
External
o When information is collected
from an outside source.
o Such types of data are either
primary or secondary.
o This type of information can be
collected by census or sampling.
Internal vs. External Sources of Data
 Data collected from first-hand experience are known as primary
data. They are more reliable, authentic, and have not been published
anywhere.
 Primary data have not been changed or altered by human
beings; therefore their validity is greater than that of secondary data.
I- Primary Data
Methods of collecting primary data:
- Direct personal investigation (interviewing)
- Indirect oral investigation
- Case studies
- Measurements / lab results
- Experimentation
- Investigation through observation
Merits
Targeted issues are
addressed
Data interpretation is better
High accuracy of data
Addressing specific
research issues
Greater control
Demerits
Cost
Time
More personnel / resources
Inaccurate feedback
Requires training and skill; laborious
Primary Data
 Data already collected by others.
 Journals, periodicals, research publications, official
records, etc.
 May be available in published or unpublished
form.
 Resorted to when primary data sources/methods are
infeasible or inaccessible.
II-Secondary Data
Published: international, government, corporation, institutional
Unpublished
Method of collection: secondary data
Merits
Quick and cheap
Wider geographical area
Longer orientation period
Leading to primary data
Demerits
Not fulfilling specific
research needs
Poor accuracy
Not up to date
Poor accessibility in some
cases
Secondary Data
Primary data
 Real time
 Sure about the sources
 Can answer research
question.
 Cost and time
 Can avoid bias
 More flexible
Secondary data
o Past data
o Not sure about sources
o Refining the research
problem
o Cheap and quick
o Bias can't be ruled out
o Less flexible
Primary vs. secondary data
 Birth and death records
 Medical records at physician offices, hospitals,
nursing homes, etc.
 Medical databases within various agencies,
universities, and institutions.
 Physical exams and laboratory testing
 Disease registries
 Self-report measures: interviews and questionnaires
Data Sources for Health Research
Primary or secondary sources
Research Data: Considerations
Data collection vs. data analysis
• Poor data collection and management can render a
perfectly executed trial useless
• Bad data practices carry resource and ethical costs
• Good practices:
– What are data?
– How are they represented?
– How are they stored for retrieval and use?
Considerations in collecting clinical data:
Data are Surrogates
Data (specimens) are all that remain after the
active phase of a clinical trial
 Data are about the objects and events in the
trial
 Understanding how the data are captured
and recorded affects interpretation
 Improper collection or interpretation leads to
misleading conclusions
Indirect nature of Data
Hematocrit:
– an indicator of oxygen-carrying capacity; depends on chemical
alterations of hemoglobin and the concentration of hydrogen ions
– an indicator of blood or bone marrow health; falsely normal in the
setting of an acute hemorrhage.
Mortality:
– patients who died from unrelated causes
– accounting for patients who are lost to follow-up
Considerations Cont’:
Objectivity, Subjectivity and Reproducibility
Objectivity: the degree to which recorded data
are free from the influence of the individual
thoughts of the observer
• Data reported by subjects (such as symptoms of a disease)
can be objectively recorded (statements themselves are
subjective)
• Objective observations (such as signs of a disease) can be made
by outside observers
Episode of bleeding reported by a subject ≠ episode witnessed by a
researcher
Objectivity is a process by which the observation is represented
in the data
– Human observer is never completely free of influences
Controlling of subjectivity:
• Use an unbiased device to record information
• Employ rating systems and train the observers
• Consider limits to data precision
Reproducibility – corroborating findings in a
subsequent study requires knowing how observations were
made (metadata)
The concept of METADATA
“Data about the data” (e.g., method of collection,
relationship of data to the events in the research
protocol, etc.)
- Temporal metadata require particular attention
- Understand the implications of timing (e.g., if a
blood specimen is drawn to measure a drug level,
we must know the time that the specimen was
drawn and the time the drug was administered).
- Need to choose when to measure and how often
Types of Data
I- Quantitative data - measurements that can be manipulated
mathematically
• Precision - body temperature, serum chloride, absolute
eosinophil count.
II- Qualitative data - conceptual entities rather than numeric
values (subject gender and race, signs and symptoms,
diagnoses)
• May represent concepts that relate to quantitative data
[“blood pressure” is numeric, but the procedures themselves
are qualitative]
III- Ordinal data look like numbers (e.g., urine protein
measurements “0”, “1+”, “2+”, etc.)
IV- Signal data - quantitative in nature but are treated as
qualitative (e.g., electrocardiogram tracings)
Data Standards
A.Support data usefulness and exchange
B.Standards for quantitative data: units of
measurements.
C.Standards for qualitative data: controlled
terminologies (ICD-10)
D.Standards for data format
Primary Research Methods and Techniques

Quantitative data:
- Surveys: personal interview (intercepts); mail; in-house,
self-administered; telephone, fax, e-mail, Web
- Experiments
- Mechanical observation
- Simulation

Qualitative data:
- Case studies
- Human observation
- Individual in-depth interviews
- Focus groups
Differentiation between data collection
techniques and tools.
Technique → Tools
- Using available data → data compilation sheet
- Observation → checklist, eye, watch, scales, microscope, pen and paper
- Interviewing → schedule, agenda, questionnaire, recorder
- Self-administered questionnaire → questionnaire
1- Self-reported data
Some information can be gathered only by asking people
questions (i.e. not easily observable)
 Self-report measures are estimates of the true scores:
True score + Measurement error = Survey response
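The identity above can be illustrated with a short simulation. This is only a sketch: the true scores, the 0-100 scale, and the error standard deviation are invented for illustration.

```python
import random

random.seed(42)

# Hypothetical true scores for five respondents on a 0-100 scale
true_scores = [70, 55, 80, 45, 60]

# Observed survey response = true score + random measurement error
# (error drawn from a normal distribution with SD 5, an assumed value)
responses = [t + random.gauss(0, 5) for t in true_scores]

for t, r in zip(true_scores, responses):
    print(f"true={t}  observed={r:.1f}  error={r - t:+.1f}")
```

The observed responses scatter around the true scores; a reliable instrument is one whose error term stays small relative to the spread of true scores.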
Pitfalls of self-reported information
Susceptible to the respondent’s
A. Mood
B. Motivation
C. Memory
D. Understanding
Also susceptible to:
o Context: circumstances of the interview
o Social desirability: choosing answers that
are viewed favorably
Common Types of Questions
• Open-ended
What health conditions do you have?
• Closed-ended
Which of the following conditions do you currently
have? Say yes or no to each:
- Diabetes?
- Asthma?
- Hypertension?
Common Types of Questions
I- Response options
- Nominal – unordered response categories (e.g.
male, female)
- Ordinal – ordered response categories
(e.g. excellent, good, fair, poor)
II- Type of information
- Factual – objectively verifiable facts and
events
- Subjective – knowledge, perceptions,
feelings, judgment
Data collection Methods
I- Document Review
A qualitative (sometimes quantitative) research
project may require review of documents such
as:
– Course syllabi
– Faculty journals
– Meeting minutes
– Strategic plans
– Newspapers
Depending on the research question, the
researcher might utilize:
– Rating scale
– Checklist
– Content analysis
– Matrix analysis
II-Self reported data
Survey
Self-reported data collection methods:
- Interviewer administration (human): computerized (in person,
telephone) or paper-based (in person, telephone)
- Self-administration: computerized (Web, smartphone, tablet) or
paper-based (paper)
- Interactive voice response: computerized (telephone, Web);
no paper-based equivalent
Pros of computerized administration:
1- Faster data availability
2- Can handle complex skip patterns
3- Can be tailored to severity of symptoms or situation
(computerized adaptive testing)
Pros of interviewer administration:
1- Answer respondent questions, probe for adequate answers
2- Administer to illiterate/low reading level
3- Easier to reach poor, homeless, etc.
4- Build rapport
5- People feel more anonymous
6- Can use visual aids
Cons of computerized administration:
1- Data can get lost if system crashes
2- Requires power source
Cons of interviewer administration:
1- Expensive
2- Longer data collection period
3- Interviewer presence/technique can bias results
A-Personal Interview
Interviews consist of collecting data by asking questions.
• Data can be collected by listening to individuals, recording,
filming their responses, or a combination of methods.
There are four types of interview:
• Structured interview
• Semi-structured interview
• In-depth interview, and
• Focused group discussion
In structured interviews the questions, as well as
their order, are fixed in advance.
• Your additional intervention consists of giving
more explanation to clarify your question (if
needed), and to ask your respondent to provide
more explanation if the answer they provide is
vague (probing).
Semi-structured and in-depth interviews
• Semi-structured interviews include a number of
planned questions, but the interviewer has more
freedom to modify the wording and order of questions.
• In-depth interview is less formal and the least
structured, in which the wording and questions are not
predetermined. This type of interview is more
appropriate to collect complex information with a
higher proportion of opinion-based information.
Pros Cons
•Collect complete information with greater
understanding.
•It is more personal, as compared to
questionnaires, higher response rates.
•It allows more control over the order and
flow of questions.
•Necessary changes in the interview
schedule based on initial results can be
made (which is not possible in the case of a
questionnaire study/survey)
•Data analysis is demanding, especially when there
is a lot of qualitative data.
•Interviewing can be tiresome for
large numbers of participants.
• Risk of bias is high due to fatigue
and to becoming too involved with
interviewees
Interview
Pros Cons
1- Lower Costs
2- Can ensure uniform data
collection
3- Shorter data collection period
4- Cell phones are best way to
reach transient people
1-Omit persons without phones
2-Phone accessibility
3-Need complex statistical
framework
4- Cannot use visual aids
5- Many of us do not answer our
phone
B- Telephone Interview
2- Paper and pen Self Administered
Pros Cons
1- Anonymity
2- Can use longer, more complex
response categories
3- Can use visual aids
4- Consistent across respondents
5- Cover large geographic area
6- Length easy to see (plus or minus?)
1- Good reading and writing skills
required
2- Cannot have complex skip patterns
3- No quality control
4- Similar cost and response rates to
other methods
3- Web, smart phone administered survey
Pros Cons
1- Anonymity
2- Better for sensitive items
3- Timely data
4- Lower cost
5- Can use long list of response
categories
6- Can use visual aids
7- Any time/location
8- Cover large geographic area
9- Can use complex skip patterns
1- Varying degrees of computer skills,
access, connection speeds,
configurations
2- Challenge to verify informed consent
3- Concern about multiple responses
from same person
4- Difficult to track non responders
5- Could be biased sample
Focus Group Discussion
• A focus group is a structured discussion with the purpose of stimulating
conversation around a specific topic.
• Focus group discussion is led by a facilitator who poses questions and
the participants give their thoughts and opinions.
• Focus group discussion gives us the possibility to cross check one
individual’s opinion with other opinions gathered.
• A well organized and facilitated FGD is more than a question and
answer session.
• In a group situation, members tend to be more open and the dynamics
within the group and interaction can enrich the quality and quantity of
information needed.
FGD: practical issues
The ideal size of the Focus groups:
• 8-10 participants
• 1 Facilitator
• 1 Note-taker
Preparation for the Focus Group
• Identifying the purpose of the discussion
• Identifying the participants
• Develop the questions
Running the Focus Group
1) Opening the Discussion
2) Managing the discussion
3) Closing the focus group
4) Follow-up after the focus group
III- Observation
OBSERVATION is a technique that involves systematically
selecting, watching and recording behavior and characteristics of
living beings, objects or phenomena.
• Without training, our observations will heavily reflect our
personal choices of what to focus on and what to remember.
• You need to heighten your sensitivity to details that you
would normally ignore and at the same time to be able to
focus on phenomena of true interest to your study.
Observation of human behavior
• Participant observation: The observer takes part in the
situation he or she observes
– Example: a doctor hospitalized with a broken hip, who now observes
hospital procedures ‘from within’
• Non-participant observation: The observer
watches the situation, openly or concealed,
but does not participate
Types of observation
Open
– (e.g., ‘shadowing’ a health worker with his/her permission
during routine activities)
Concealed
– (e.g., ‘mystery clients’ trying to obtain antibiotics without
medical prescription)
Observations of objects
– For example, the presence or absence of operating room
hand-washing facilities and their state of cleanliness
1. General observation may be used as a starting point
to become familiar with the setting and the new context.
2. Focused observation may be used to evaluate
whether people really do what they say they do.
3. Access the unspoken knowledge of subjects, that
is, the subconscious knowledge that they would not be
able to verbalize in an interview setting.
4. Compare a phenomenon and its specific
components in greater detail.
1. Space (physical places)
2. Actors (people involved)
3. Activities (the set of related acts people do)
4. Object (the physical things that are present)
5. Time (the sequencing that takes place over time)
6. Goal (the things people are trying to accomplish)
7. Feeling (the emotions felt and expressed)
(Spradley 1979)
Dimensions of observation
Mixed Methods in data collection
Integrating or combining qualitative and quantitative methods to
draw on strengths of each:
Reasons for using mixed methods
View problems from multiple perspectives
Contextualize information
Develop more complete understanding of problem
Challenges
Teamwork, resources, sample size, interpretation
Basic Mixed Methods Designs
Qualitative →Quantitative: qualitative used to develop outcome
measure or intervention
Quantitative →Qualitative: qualitative used to explain
quantitative outcomes in-depth
Concurrent: qualitative used to understand participants'
experiences with the intervention/describe the process
Effect of data collection methods on response
Multiple methods increase response rates
Aural vs. visual (interviewer- vs. self-administered)
- Aural responses tend to be more positive
- Aural respondents give more agreeable answers
Questions often tailored to mode
- Yes/No popular with telephone; Long list of check
boxes popular for web
- Long scales often used for self-administered; shorter
scales for telephone
- Vast array of visual/graphic choices available for
computerized surveys
Techniques of data collection
(advantages and disadvantages)

Technique: Records and registries
- Advantages: 1. Inexpensive. 2. Permit examination of past trends.
- Disadvantages: 1. Not always easily accessible. 2. Ethical issues
(confidentiality). 3. Incomplete and imprecise.

Technique: Observation
- Advantages: A. More detailed information. B. Gives facts not
mentioned on questioning. C. Can test reliability of responses.
- Disadvantages: A. Ethical issues. B. Observer bias. C. Data
collector may influence results. D. Needs training.
Techniques of data collection
(advantages and disadvantages)

Technique: Personal interviewing
- Advantages: I. Suitable for illiterates. II. Permits clarification.
III. High response rate.
- Disadvantages: I. Interviewer may influence results. II. Less
accurate recording than observation. III. Needs trained personnel.

Technique: Self-administered questionnaire
- Advantages: 1. Less expensive. 2. Permits anonymity. 3. Less
personnel. 4. Eliminates interviewer bias.
- Disadvantages: 1. Not suitable for illiterates. 2. Low response
rate. 3. Problems of misunderstanding.
Techniques of data collection
(advantages and disadvantages)

Technique: Focus group discussion
- Advantages: collection of in-depth information and exploration.
- Disadvantages: 1. Interviewer may influence results. 2. Open-ended
questions. 3. Domination by some participants. 4. Non-response.
- Requirements: training; validity and accuracy; precision;
eliminating bias.
Measuring scale
- Quantitative methods to statistically assess the
reliability and validity of survey instruments; also a
way to establish scoring mechanisms
- Enables users to combine a set of items and come up
with a single score (e.g. level of depression or physical
functioning)
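Combining a set of items into a single score can be sketched as follows. The four-item scale, the 1-5 response options, and the choice of reverse-scored item are hypothetical:

```python
# Sketch: scoring a hypothetical 4-item scale with 1-5 Likert responses.

def score_scale(responses, reverse_items=(), min_option=1, max_option=5):
    """Sum item responses into a single scale score, reverse-scoring
    negatively worded items (1<->5, 2<->4, etc.)."""
    total = 0
    for i, r in enumerate(responses):
        if i in reverse_items:
            r = max_option + min_option - r
        total += r
    return total

one_respondent = [4, 5, 2, 4]  # answers to items 0-3
print(score_scale(one_respondent, reverse_items={2}))  # 4+5+4+4 = 17
```

Reverse-scoring before summing is what keeps a negatively worded item from cancelling out the rest of the scale.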
Psychometry

Classical Test Theory (old science): requires the use of EVERY
item in a set.

Modern Test Theory (current standard): focuses on the
contribution of each individual item in a set. To what extent
does each item measure the underlying construct?

Differential item functioning (DIF): detects error related to
subgroups of people; identifies items that introduce bias.

Computerized Adaptive Testing: combines item response theory
(IRT) and computer technology; each question is selected based
on the person's responses to previous questions.
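A minimal sketch of the IRT idea behind computerized adaptive testing, using the common two-parameter logistic model; the ability and item parameter values below are invented for illustration:

```python
import math

def p_endorse(theta, a, b):
    """Two-parameter logistic IRT model: probability that a respondent
    with ability/trait level theta endorses an item with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The same item is far more likely to be endorsed by a higher-trait
# person; a CAT picks each next item to be most informative near the
# current trait estimate.
print(p_endorse(theta=1.0, a=1.2, b=0.0))   # ~0.77
print(p_endorse(theta=-1.0, a=1.2, b=0.0))  # ~0.23
```

When theta equals the item difficulty b, the probability is exactly 0.5, which is where the item carries the most information about the respondent.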
Requirements of Measurement in Research
Definitions
CONSTRUCT: A theoretical concept
MEASUREMENT: A system of defining the
level of a construct
Operational Definition: The method used
for examining some domain
Examples
1. Depression
A. Hamilton Depression Rating Scale
B. Beck Depression Inventory
2. Tremor
A. Judge rated spirals
B. Computer evaluated spirals
3. Heart Disease
A. Cholesterol
B. C-Reactive Protein
Validity: how well does the measure reflect the construct?
1. Construct: A. Face; B. Content
2. Criterion-related: A. Convergent; B. Divergent

Reliability: consistency of measurement
1. Internal consistency
2. Inter-rater
3. Test-retest
Reliability is defined as the extent to which a questionnaire,
test, observation or any measurement procedure produces
the same results on repeated trials.
In short, it is the stability or consistency of scores over time
or across raters.
Reliability
Aspects of reliability:
equivalence,
stability and
internal consistency
(homogeneity )
The amount of agreement between
two or more instruments that are
administered at nearly the same point
in time.
Measured through a parallel-forms
procedure in which one administers
alternative forms of the same measure
to either the same group or a different
group of respondents.
The higher the degree of correlation
between the two forms, the more equivalent
they are. Seldom implemented, difficult to
verify that two tests are indeed parallel (i.e.,
have equal means, variances, and
correlations with other measures)
Equivalence is
demonstrated by
assessing inter-rater
reliability which refers to
the consistency with
which observers or
raters make judgments.
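Inter-rater reliability is often quantified as agreement corrected for chance, e.g. Cohen's kappa. A minimal sketch, with hypothetical ratings from two raters:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater1)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    return (observed - expected) / (1 - expected)

# Two hypothetical raters classifying 10 observations as yes/no
r1 = ["y", "y", "n", "y", "n", "y", "n", "n", "y", "y"]
r2 = ["y", "y", "n", "n", "n", "y", "n", "y", "y", "y"]
print(round(cohens_kappa(r1, r2), 2))  # 0.8 raw agreement shrinks to ~0.58
```

Kappa is lower than raw agreement because some matches would occur by chance even if the raters guessed independently.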
Aspects of reliability: stability
When the same or similar scores are
obtained with repeated testing with the same
group of respondents. In other words, the
scores are consistent from one time to the
next. Stability is assessed through a test-
retest procedure that involves administering
the same measurement instrument to the
same individuals under the same conditions
after some period of time. Test-retest
reliability is estimated with correlations
between the scores at Time 1 and those at
Time 2
Assumptions:
1-The characteristic that is
measured does not change
over the time period.
2-The time period is long
enough that the respondents'
memories of taking the test at
Time 1 do not influence their
scores at the second test
administration.
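The test-retest correlation described above is an ordinary Pearson correlation between the two administrations; the scores below are hypothetical:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for six respondents tested twice, some weeks apart
time1 = [12, 18, 15, 22, 9, 17]
time2 = [13, 17, 16, 21, 10, 18]
print(round(pearson_r(time1, time2), 2))  # ~0.98: scores are stable
```

A correlation this close to 1 would indicate good stability, provided the two assumptions above hold (the trait did not change, and memory effects are negligible).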
Aspects of reliability: internal consistency (homogeneity)
The extent to which items on the test or
instrument are measuring the same thing.
If the individual items are highly correlated
with each other you can be highly
confident in the reliability of the entire
scale. Internal consistency is estimated via
the split-half reliability index, coefficient
alpha (Cronbach, 1951) index or the
Kuder-Richardson formula 20 (KR-20)
(Kuder & Richardson, 1937) index.
The split- half estimate entails
dividing up the test into two
parts (e.g., odd/even
items or first half of the
items/second half of the
items), administering the two
forms to the same group of
individuals and correlating the
responses. Coefficient alpha
and KR - 20 both represent
Specifically, coefficient alpha is used during scale
development with items that have several
response options (i.e., 1 = strongly
disagree to 5 = strongly agree), whereas
KR-20 is used to estimate reliability for
dichotomous (i.e., Yes/No; True/False)
response scales.
The more items you have in
your scale to measure the
construct of interest the
more reliable your scale will
become. But at what cost?
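Coefficient alpha, and the effect of lengthening a scale on reliability (the Spearman-Brown prophecy formula), can be sketched as follows; the item responses are invented for illustration:

```python
def cronbach_alpha(items):
    """Coefficient alpha. items[i][j] = score of respondent j on item i."""
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var = sum(var(col) for col in items)
    totals = [sum(col[j] for col in items) for j in range(n)]
    return (k / (k - 1)) * (1 - item_var / var(totals))

def spearman_brown(r, factor):
    """Projected reliability if the scale were made `factor` times longer."""
    return factor * r / (1 + (factor - 1) * r)

# Three hypothetical 1-5 Likert items answered by five respondents
items = [[4, 3, 5, 2, 4],
         [5, 3, 4, 2, 5],
         [4, 2, 5, 3, 4]]
alpha = cronbach_alpha(items)
print(round(alpha, 2))                     # ~0.89
print(round(spearman_brown(alpha, 2), 2))  # ~0.94 if the scale were doubled
```

The Spearman-Brown projection quantifies the trade-off in the question above: more items do raise reliability, but with diminishing returns against respondent burden.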
The extent to which the instrument
measures what it purports to measure:
1- content validity,
2- face validity,
3- criterion-related validity (or
predictive validity),
4- construct validity,
5- factorial validity,
6- concurrent validity,
7- convergent and divergent validity.
Validity of an instrument
Content validity pertains to the degree to
which the instrument fully assesses or
measures the construct
of interest. The development of a content
valid instrument is typically achieved by a
rational analysis of the instrument by
raters (ideally 3 to 5) familiar with the
construct of interest.
Specifically, raters will review all of the
items for readability, clarity and
comprehensiveness.
Face validity is a component of
content validity and is established
when an individual reviewing the
instrument concludes that it
measures the characteristic or trait
of interest.
Criterion-related validity is assessed when
one is interested in determining the
relationship of scores on a test to a specific
criterion.
Construct validity is the degree to which an
instrument measures the trait or theoretical
construct that it is intended to measure.
Construct validation is very much an ongoing
process, as one refines a theory, if necessary,
in order to make predictions about test scores
in various settings and situations.
Thank you

Data collection

  • 1.
    Data Collection: Methodsand Instruments Professor Tarek Tawfik Amin Epidemiology and Public Health, Faculty of Medicine, Cairo University Geneva Foundation for Medical Education and Training Asian Pacific Organization for Cancer Prevention International Osteoporosis Foundation Wiley Innovative Panel amin55@myway.com dramin55@gmail.com Basic Research Competency Program for Research Coordinators August 2015, MEDC, Faculty Of Medicine, Cairo University, Cairo, Egypt.
  • 2.
    Objectives By the endof this session, research coordinator will be able to: 1- Indentify the research data plan and its importance. 2- Differentiate between primary and secondary data sources 3- Recognize the different data collection methods, tools and techniques and the demerits of each. 4- Recognize the importance of reliability and validity of an instrument.
  • 3.
    • Data canbe define as the quantitative or qualitative value of a variable (e.g. number, images, words, figures, facts or ideas) • It is a lowest unit of information from which other measurements and analysis can be done. • Data is one of the most important and vital aspect of any research study. Introduction
  • 4.
    Before starting • Accurateand systematic data collection is critical to conducting scientific research. • Data collection allows to collect information about study objects/subjects/participants. • Includes documents review, observation, questioning, measuring, or a combination of different methods.
  • 5.
    o Objectives andscope of the enquiry (research question). o Sources of information (type, accessibility). o Quantitative expression (measurement/scale). o Techniques of data collection. o Unit of collection. Factors to be Considered Before Collection of Data (plan)
  • 7.
    Data collection plan Identifytypes of data Types of measurements and variables Instrume nts Scales Methods Written permissions Pilot testing Revise 1- Data collection forms 2-Operational procedures Implementat ion Research question Research hypothesis
  • 8.
  • 9.
    Internal o Many institutionsand departments have information about their regular functions, for their own internal purposes. o When those information are used in any survey is called internal sources of data. Routine surveillance, hospital records . External o When information is collected from outside source. o Such types of data are either primary or secondary. o This type of information can be collected by census or sampling. Internal vs. External Sources of Data
  • 10.
     Collected fromfirst-hand experiences is known as primary data. More reliable, authentic and not been published anywhere.  Primary data has not been changed or altered by human being, therefore its validity is greater than secondary data. I- Primary Data
  • 11.
    Methods of collecting primary data DirectPersonal Investigation (interviewing) Indirect oral investigation Case studies Measurements Lab. results Experimentation Investigation through observation
  • 12.
    Merits Targeted issues are addressed Datainterpretation is better High accuracy of data Addressing specific research issues Greater control Demerits Cost Time More personnel / resources Inaccurate feedback Training, skill and laborious. Primary Data
  • 13.
     Already beencollected by others.  Journals, periodicals, research publication ,official record etc.  May be available in the published or unpublished form.  Resorted to when primary data sources/methods are infeasible -, inaccessible. II-Secondary Data
  • 14.
  • 15.
    Merits Quick and cheap Widergeographical area Longer orientation period Leading to primary data Demerits Not fulfilling specific research needs Poor accuracy Not up to date Poor accessibility in some cases Secondary Data
  • 16.
    Primary data  Realtime  Sure about the sources  Can answer research question.  Cost and time  Can avoid bias  More flexible Secondary data o Past data o Not sure about sources o Refining the research problem o Cheap and no time o Bias can’t be ruled out o Less flexible Primary vs. secondary data
  • 17.
     Birth anddeath records  Medical records at physician offices, hospitals, nursing homes, etc.  Medical databases within various agencies, universities, and institutions.  Physical exams and laboratory testing  Diseases registries  Self-report measures: interviews and questionnaires Data Sources for Health Research Primary or secondary sources
  • 18.
    Research Data: Considerations Datacollection vs. data analysis • Poor data collection and management can render a perfectly executed trial useless • Bad data practices carry resource and ethical costs • Good practices: – What are data? – How are they represented? – How are they stored for retrieval and use?
  • 19.
    Considerations in collectingclinical data: Data are Surrogates Data (specimens) are all that remain after the active phase of a clinical trial  Data are about the objects and events in the trial  Understanding how the data are captured and recorded affects interpretation  Improper collection or interpretation lead to
  • 20.
    Indirect nature ofData Hematocrit: – an indicator of oxygen carrying capacity -depends on chemical alterations of hemoglobin and concentration of hydrogen ions – an indicator of blood or bone marrow health -falsely normal in the setting of an acute hemorrhage. Mortality: – patients who died from unrelated causes – accounting for patients who are lost to follow-up
  • 21.
    Considerations Cont’: Objectivity, Subjectivityand Reproducibility Objectivity: the degree to which recorded data may be influenced by the individual thought of the observer • Data reported by subjects (such as symptoms of a disease) can be objectively recorded (statements themselves are subjective) • Objective observations (such as signs of a disease) can be made by outside observers Episode of bleeding reported by a subject ≠episode witnessed by a researcher
  • 22.
    Objectivity is aprocess by which the observation is represented in the data – Human observer is never completely free of influences Controlling of subjectivity: • Use an unbiased device to record information • Employ rating systems and train the observers • Consider limits to data precision Reproducibility – corroborating findings in a subsequent study requires knowing how observations were made (metadata)
  • 23.
    The concept ofMETADATA “Data about the data” (e.g., method of collection, relationship of data to the events in the research protocol, etc.) - Temporal metadata require particular attention - Understand the implications of a time (e.g., if a blood specimen is drawn to measure a drug level, we must know the time that the specimen was drawn and the time the drug was administered - Need to choose when to measure and how often
  • 24.
    Types of Data I-Quantitative data - measurements that can be manipulated mathematically • Precision - body temperature, serum chloride , absolute eosinophil count. II- Qualitative data - conceptual entities rather than numeric values (subject gender and race, signs and symptoms, diagnoses) • May represent concepts that relate to quantitative data [“blood pressure” is numeric, but the procedures themselves are qualitative] III- Ordinal data look like numbers (e.g., urine protein measurements “0”, “1+”, “2+”, etc.) IV- Signal data - quantitative in nature but are treated as qualitative (e.g., electrocardiogram tracings)
Data Standards
A. Support data usefulness and exchange.
B. Standards for quantitative data: units of measurement.
C. Standards for qualitative data: controlled terminologies (e.g., ICD-10).
D. Standards for data format.
Primary Research Methods and Techniques
Quantitative data:
• Surveys – personal interview (intercepts); mail; in-house, self-administered; telephone, fax, e-mail, Web
• Experiments
• Mechanical observation
• Simulation
Qualitative data:
• Case studies
• Human observation
• Individual in-depth interviews
• Focus groups
Differentiation between data collection techniques and tools
Technique – Tools:
• Using available data – data compilation sheet
• Observation – checklist, eye, watch, scales, microscope, pen and paper
• Interviewing – schedule, agenda, questionnaire, recorder
• Self-administered questionnaire – questionnaire
    1- Self-reported data Someinformation can be gathered only by asking people questions (i.e. not easily observable)  Self report measures are estimates of true scores True score + Measurement error = Survey response
Pitfalls of self-reported information
Self-reports are susceptible to the respondent’s:
A. Mood
B. Motivation
C. Memory
D. Understanding
Also susceptible to:
• Context – circumstances of the interview
• Social desirability – choosing answers that are viewed favorably
Common Types of Questions
• Open-ended: What health conditions do you have?
• Closed-ended: Which of the following conditions do you currently have? Say yes or no to each.
– Diabetes?
– Asthma?
– Hypertension?
Common Types of Questions
I- Response options
– Nominal – unordered response categories (e.g., male, female)
– Ordinal – ordered response categories (e.g., excellent, good, fair, poor)
II- Type of information
– Factual – objectively verifiable facts and events
– Subjective – knowledge, perceptions, feelings, judgment
I- Document Review
A qualitative (sometimes quantitative) research project may require review of documents such as:
– Course syllabi
– Faculty journals
– Meeting minutes
– Strategic plans
– Newspapers
Depending on the research question, the researcher might utilize:
– Rating scale
– Checklist
– Content analysis
– Matrix analysis
Self-reported data collection methods
Self-administration:
• Computerized – Web, smartphone, tablet; interactive voice response (telephone, Web)
• Paper-based – paper questionnaire
Interviewer administration (human):
• Computerized – in person, telephone
• Paper-based – in person, telephone
Computerized – Pros: 1- Faster data availability 2- Can handle complex skip patterns 3- Can be tailored to the severity of symptoms or the situation (computerized adaptive testing). Cons: 1- Data can be lost if the system crashes 2- Requires a power source.
Interviewer-administered – Pros: 1- Can answer respondent questions and probe for adequate answers 2- Can be administered to illiterate/low-reading-level respondents 3- Easier to reach the poor, homeless, etc. 4- Builds rapport 5- People feel more anonymous 6- Can use visual aids. Cons: 1- Expensive 2- Longer data collection period 3- Interviewer presence/technique can bias results.
A- Personal Interview
Interviews consist of collecting data by asking questions.
• Data can be collected by listening to individuals, recording or filming their responses, or a combination of methods.
There are four types of interview:
• Structured interview
• Semi-structured interview
• In-depth interview, and
• Focus group discussion
In structured interviews the questions, as well as their order, are already scheduled.
• Your additional intervention consists of giving more explanation to clarify your question (if needed), and of asking your respondent to provide more explanation if the answer they give is vague (probing).
Semi-structured and in-depth interviews
• Semi-structured interviews include a number of planned questions, but the interviewer has more freedom to modify the wording and order of questions.
• The in-depth interview is less formal and the least structured: the wording and questions are not predetermined. This type of interview is more appropriate for collecting complex information with a higher proportion of opinion-based information.
Interview
Pros:
• Collects complete information with greater understanding.
• More personal than a questionnaire; higher response rates.
• Allows more control over the order and flow of questions.
• Necessary changes to the interview schedule can be made based on initial results (not possible in a questionnaire study/survey).
Cons:
• Data analysis is demanding, especially when there is a lot of qualitative data.
• Interviewing can be tiresome for large numbers of participants.
• Risk of bias is high, due to fatigue and to becoming too involved with interviewees.
B- Telephone Interview
Pros:
1- Lower costs
2- Can ensure uniform data collection
3- Shorter data collection period
4- Cell phones are the best way to reach transient people
Cons:
1- Omits persons without phones
2- Phone accessibility
3- Needs a complex statistical framework
4- Cannot use visual aids
5- Many of us do not answer our phones
2- Paper-and-pen Self-Administered
Pros:
1- Anonymity
2- Can use longer, more complex response categories
3- Can use visual aids
4- Consistent across respondents
5- Covers a large geographic area
6- Length is easy to see (plus or minus?)
Cons:
1- Good reading and writing skills required
2- Cannot have complex skip patterns
3- No quality control
4- Similar cost and response rates to other methods
3- Web, Smartphone-Administered Survey
Pros:
1- Anonymity
2- Better for sensitive items
3- Timely data
4- Lower cost
5- Can use long lists of response categories
6- Can use visual aids
7- Any time/location
8- Covers a large geographic area
9- Can use complex skip patterns
Cons:
1- Varying degrees of computer skills, access, connection speeds, configurations
2- Challenge to verify informed consent
3- Concern about multiple responses from the same person
4- Difficult to track non-responders
5- Could be a biased sample
Focus Group Discussion
• A focus group is a structured discussion with the purpose of stimulating conversation around a specific topic.
• The discussion is led by a facilitator who poses questions, and the participants give their thoughts and opinions.
• Focus group discussion gives us the possibility to cross-check one individual’s opinion against the other opinions gathered.
• A well-organized and well-facilitated FGD is more than a question-and-answer session.
• In a group situation, members tend to be more open, and the dynamics and interaction within the group can enrich the quality and quantity of the information obtained.
FGD: practical issues
The ideal size of the focus group:
• 8-10 participants
• 1 facilitator
• 1 note-taker
Preparation for the focus group:
• Identify the purpose of the discussion
• Identify the participants
• Develop the questions
Running the focus group:
1) Opening the discussion
2) Managing the discussion
3) Closing the focus group
4) Follow-up after the focus group
III- Observation
OBSERVATION is a technique that involves systematically selecting, watching and recording the behavior and characteristics of living beings, objects or phenomena.
• Without training, our observations will heavily reflect our personal choices of what to focus on and what to remember.
• You need to heighten your sensitivity to details that you would normally ignore, and at the same time be able to focus on the phenomena of true interest to your study.
Types of observation
Observation of human behavior
• Participant observation: the observer takes part in the situation he or she observes
– Example: a doctor hospitalized with a broken hip, who now observes hospital procedures ‘from within’
• Non-participant observation: the observer watches the situation, openly or concealed, but does not participate
Open – e.g., ‘shadowing’ a health worker, with his/her permission, during routine activities
Concealed – e.g., ‘mystery clients’ trying to obtain antibiotics without a medical prescription
Observation of objects
– For example, the presence or absence of operating-room hand-washing facilities and their state of cleanliness
1. General observation may be used as a starting point, to become familiar with the setting and the new context.
2. Focused observation may be used to evaluate whether people really do what they say they do.
3. Access the unspoken knowledge of subjects, that is, the subconscious knowledge that they would not be able to verbalize in an interview setting.
4. Compare a phenomenon and its specific components in greater detail.
Dimensions of observation
1. Space (physical places)
2. Actors (the people involved)
3. Activities (the set of related acts people do)
4. Objects (the physical things that are present)
5. Time (the sequencing that takes place over time)
6. Goals (the things people are trying to accomplish)
7. Feelings (the emotions felt and expressed)
(Spradley 1979)
Mixed Methods in data collection
Integrating or combining qualitative and quantitative methods to draw on the strengths of each.
Reasons for using mixed methods:
• View problems from multiple perspectives
• Contextualize information
• Develop a more complete understanding of the problem
Challenges: teamwork, resources, sample size, interpretation
Basic Mixed Methods Designs
• Qualitative → Quantitative: qualitative data used to develop an outcome measure or intervention
• Quantitative → Qualitative: qualitative data used to explain quantitative outcomes in depth
• Concurrent: qualitative data used to understand participants’ experiences with the intervention and describe the process
Effect of data collection methods on response
• Multiple methods increase response rates.
• Aural vs. visual (interviewer vs. self-response): aural responses are more positive; aural respondents give more agreeable answers.
• Questions are often tailored to the mode:
– Yes/No popular with telephone; long lists of check boxes popular for the Web
– Long scales often used for self-administered surveys; shorter scales for telephone
– A vast array of visual/graphic choices is available for computerized surveys
Techniques of data collection (advantages and disadvantages)
Records and registries
• Advantages: 1. Inexpensive 2. Permits examination of past trends.
• Disadvantages: 1. Accessibility 2. Ethical issues 3. Incomplete and imprecise.
Observation
• Advantages: A. More detailed information B. Captures facts not mentioned by questioning C. Can test reliability.
• Disadvantages: A. Ethical issues B. Observer bias C. The data collector may influence results D. Needs training.
Techniques of data collection (advantages and disadvantages)
Personal interviewing
• Advantages: I. Suitable for illiterates II. Permits clarification III. High response rate.
• Disadvantages: I. The interviewer may influence results II. Less accurate recording than observation III. Needs trained personnel.
Self-administered questionnaire
• Advantages: 1. Less expensive 2. Permits anonymity 3. Requires less personnel 4. Eliminates interviewer bias.
• Disadvantages: 1. Not suitable for illiterates 2. Low response rate 3. Problem of misunderstanding.
Techniques of data collection (advantages and disadvantages)
Focus group discussion
• Advantages: collection of in-depth information and exploration.
• Disadvantages: 1. The interviewer may influence results 2. Open-ended questions 3. Domination by some participants 4. Non-response.
Measuring scale
• Advantages: precision; eliminates bias.
• Disadvantages: requires training; validity and accuracy must be established.
Psychometry
– Quantitative methods to statistically assess the reliability and validity of survey instruments; also a way to establish scoring mechanisms.
– Enables users to combine a set of items and come up with a single score (e.g., level of depression or physical functioning).
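A scoring mechanism of the kind described can be sketched as a small function that sums a set of Likert items into one scale score, reverse-coding oppositely worded items; the item values and scale below are hypothetical, not from any named instrument.

```python
def scale_score(item_responses, reverse_items=(), n_options=5):
    """Combine a set of Likert items into a single scale score.

    reverse_items: indices of items worded in the opposite direction,
    which are reverse-coded (n_options + 1 - response) before summing.
    """
    total = 0
    for i, response in enumerate(item_responses):
        total += (n_options + 1 - response) if i in reverse_items else response
    return total

# Hypothetical 4-item mood scale (1-5 options) where item 2 is reverse-worded:
score = scale_score([4, 5, 2, 3], reverse_items={2})  # 4 + 5 + (6 - 2) + 3 = 16
```

Real instruments publish their own scoring rules (weights, subscales, cut-offs); this sketch only shows the simple-summation case.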
Classical Test Theory (old science)
• Requires the use of EVERY item in a set.
Modern Test Theory (current standard)
• Focuses on the contribution of each individual item in a set: to what extent does each item measure the underlying construct?
• Differential item functioning (DIF) detects error related to subgroups of people and identifies items that introduce bias.
Computerized Adaptive Testing
• Combines item response theory (IRT) and computer technology: each question is selected based on the person’s responses to previous questions.
Definitions
CONSTRUCT: a theoretical concept
MEASUREMENT: a system of defining the level of a construct
OPERATIONAL DEFINITION: the method used for examining some domain
Examples
1. Depression
A. Hamilton Depression Rating Scale
B. Beck Depression Inventory
2. Tremor
A. Judge-rated spirals
B. Computer-evaluated spirals
3. Heart Disease
A. Cholesterol
B. C-Reactive Protein
Validity: how well does the measure reflect the construct?
1. Construct
A. Face
B. Content
2. Criterion-related
A. Convergent
B. Divergent
Reliability: consistency of measurement
1. Internal Consistency
2. Inter-Rater
3. Test-Retest
Reliability
Reliability is defined as the extent to which a questionnaire, test, observation or any measurement procedure produces the same results on repeated trials. In short, it is the stability or consistency of scores over time or across raters.
Aspects of reliability: equivalence, stability and internal consistency (homogeneity)
Equivalence: the amount of agreement between two or more instruments that are administered at nearly the same point in time. It is measured through a parallel-forms procedure, in which one administers alternative forms of the same measure to either the same group or a different group of respondents. The higher the degree of correlation between the two forms, the more equivalent they are. This procedure is seldom implemented, as it is difficult to verify that two tests are indeed parallel (i.e., have equal means, variances, and correlations with other measures). Equivalence is also demonstrated by assessing inter-rater reliability, which refers to the consistency with which observers or raters make judgments.
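Inter-rater reliability for categorical judgments is commonly quantified with Cohen’s kappa, which corrects raw percent agreement for agreement expected by chance. A minimal sketch (the two raters’ classifications below are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each category's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two observers classifying the same 10 subjects as "case"/"non-case":
a = ["case", "case", "non", "non", "case", "non", "non", "case", "non", "non"]
b = ["case", "case", "non", "non", "non", "non", "non", "case", "non", "case"]
kappa = cohens_kappa(a, b)  # 0.8 observed agreement, 0.52 expected → ≈ 0.58
```

Kappa of 1 means perfect agreement, 0 means agreement no better than chance; interpretive bands (e.g., “moderate”, “substantial”) vary by source.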
Stability: when the same or similar scores are obtained with repeated testing of the same group of respondents. In other words, the scores are consistent from one time to the next. Stability is assessed through a test-retest procedure that involves administering the same measurement instrument to the same individuals, under the same conditions, after some period of time. Test-retest reliability is estimated with the correlation between the scores at Time 1 and those at Time 2.
Assumptions:
1- The characteristic that is measured does not change over the time period.
2- The time period is long enough that the respondents’ memories of taking the test at Time 1 do not influence their scores at the second test administration.
Internal consistency: the extent to which items on the test or instrument measure the same thing. If the individual items are highly correlated with each other, you can be highly confident in the reliability of the entire scale. Internal consistency is estimated via the split-half reliability index, the coefficient alpha index (Cronbach, 1951) or the Kuder-Richardson formula 20 (KR-20) index (Kuder & Richardson, 1937). The split-half estimate entails dividing the test into two parts (e.g., odd/even items, or the first half of the items/the second half of the items), administering the two forms to the same group of individuals and correlating the responses. Coefficient alpha and KR-20 both represent the average of all possible split-half estimates. Specifically, coefficient alpha is used during scale development with items that have several response options (i.e., 1 = strongly disagree to 5 = strongly agree), whereas KR-20 is used to estimate reliability for dichotomous (i.e., Yes/No, True/False) response scales.
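Coefficient alpha can be computed directly from an item-by-respondent response matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance); the 5-respondent, 3-item data below are hypothetical.

```python
def cronbach_alpha(items):
    """Cronbach's alpha from a response matrix (rows = respondents,
    columns = items), using sample variances."""
    k = len(items[0])  # number of items

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    totals = [sum(row) for row in items]  # each respondent's total score
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

# Hypothetical 3-item scale answered by 5 respondents (1-5 Likert options):
data = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(data)  # ≈ 0.96: items track each other closely
```

A common rule of thumb treats alpha ≥ 0.70 as acceptable for research use, though the threshold depends on the stakes of the measurement.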
The more items you have in your scale to measure the construct of interest, the more reliable your scale will become. But at what cost?
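The length-versus-reliability trade-off hinted at here is usually quantified with the Spearman-Brown prophecy formula (not named on the slide, but standard in psychometrics): it predicts the reliability of a test lengthened by a given factor with comparable items, and shows the diminishing returns of adding more.

```python
def spearman_brown(reliability, length_factor):
    """Spearman-Brown prophecy formula: predicted reliability when a test
    is lengthened by length_factor (e.g. 2 = doubled) with similar items."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a scale with reliability 0.70:
doubled = spearman_brown(0.70, 2)     # ≈ 0.82
# Quadrupling it only gets to ≈ 0.90 — each extra item buys less,
# while respondent burden keeps growing linearly.
quadrupled = spearman_brown(0.70, 4)
```

That flattening curve is one answer to “at what cost?”: past a point, extra items mostly add respondent fatigue, not reliability.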
Validity of an instrument
The extent to which the instrument measures what it purports to measure: 1- content validity, 2- face validity, 3- criterion-related (predictive) validity, 4- construct validity, 5- factorial validity, 6- concurrent validity, 7- convergent and divergent validity.
Content validity pertains to the degree to which the instrument fully assesses or measures the construct of interest. The development of a content-valid instrument is typically achieved by a rational analysis of the instrument by raters (ideally 3 to 5) familiar with the construct of interest. Specifically, the raters review all of the items for readability, clarity and comprehensiveness.
Validity of an instrument (cont.)
Face validity is a component of content validity and is established when an individual reviewing the instrument concludes that it measures the characteristic or trait of interest.
Validity of an instrument (cont.)
Criterion-related validity is assessed when one is interested in determining the relationship of scores on a test to a specific criterion.
Validity of an instrument (cont.)
Construct validity: the degree to which an instrument measures the trait or theoretical construct that it is intended to measure. Construct validity is very much an ongoing process, as one refines a theory, if necessary, in order to make predictions about test scores in various settings and situations.