EREAN SH. (MPH/E)
Objectives
At the end of this section, the students will be able to:
• Identify the study design that is relevant to a specific research
question
• Identify the appropriate variables for a specific study
• Explain the study population, sample size calculation, and sampling
techniques for quantitative studies
Methodology
Possible sub-sections of the methodology:
• The study design
• Setting/Area
• Population of the study
• Sample size and sampling strategies
• Variables
Study area
• Location, Physical features (climate, altitude...), Population
size and composition
• Infrastructures -education, health, communication…
• Economy
Study period
Time required to conduct the study
Choosing study designs
A study design is a specific plan or protocol for
conducting the study, which allows the investigator to
translate the conceptual hypothesis into an operational
one.
• A study design is the process that guides researchers on
how to collect, analyze and interpret observations.
• It is a logical model that guides the investigator in the
various stages of the research.
How to Choose a Research Design
• Does it adequately test the hypotheses?
– Hypotheses determine the participants, the variables measured, and the
data analysis methods
• Are the results generalizable?
– Replicate in other samples and other contexts
– Random selection of participants
• Does it identify and control extraneous factors?
– Eliminate alternative explanations for the results to increase
confidence in cause-effect conclusions (internal validity)
– Control depends on the type of design
How to Choose a Research Design…
• Can the hypothesis be rejected or retained via
statistical means? (statistical conclusion validity)
– Need reliable measures
– Need a large enough sample to detect a true effect
• Is the design efficient in using available resources?
– Optimal balance between research design, time, resources,
and researcher expertise
How to Choose: problems call for specific approaches…
E.g., if the problem calls for:
a) the identification of factors that influence an outcome,
b) the utility of an intervention, or
c) knowing the prevalence of a disease,
then a quantitative approach is best.
• If a concept or phenomenon needs to be understood because
little research has been done on it, then a qualitative
approach is preferred.
• A mixed methods design is useful when either the
quantitative or the qualitative approach by itself is inadequate
to best understand a research problem.
Features of Descriptive Study
• Studies the occurrence of disease with respect to time,
place and person.
• Useful for health managers to allocate resources and to
plan effective prevention programs.
• Useful to generate epidemiological hypotheses in the
search for disease risk factors.
Features of Descriptive Study…
• Not aimed specifically to test a hypothesis
• No attempt to gather data on controls
• Inexpensive and less time-consuming: can use
information collected routinely.
• Most common type of epidemiological study in the
medical literature.
Case Report and Case Series
• Documentation of unusual medical occurrence with
detailed description
• First clue in the identification of new disease or adverse
effect of exposure
• The profile of a single patient is reported in detail by
one or more clinicians
Case Series
• A collection of individual case reports that occur within a
fairly short period of time.
• An individual case report that has been expanded to include
a number of patients with a given disease.
• Helps in identifying the beginning or presence of an epidemic.
• Helps in hypothesis formulation.
• Lacks a comparison group.
Case report/series: Limitations
–Lack of denominator to calculate rates of disease
–Lack of comparison group
–No selection of appropriate study populations
–Sampling variations
– No sampling employed, emerging cases are
reported.
A cross-sectional study (survey)
• Snapshot of the health status of a population at a certain
point in time.
• Compares the prevalence of disease in persons with and
without the exposure of interest.
• Cross-sectional studies must be done on representative
samples of the population.
Advantages of Cross-sectional Studies
• Provides prevalence estimates of exposure and
disease.
• Easier to perform than studies that require follow-
up (hence relatively inexpensive).
• Can evaluate multiple risk (and protective) factors
and health outcomes at the same point in time
Advantages…
• May identify groups of persons at high or low risk
of disease
• Can be used to generate hypotheses about
associations between predictive factors and disease
outcomes
Limitations of Cross-sectional Studies
• The temporal (time) sequence between exposure and disease
cannot be established,
i.e. the chicken-or-egg dilemma.
Example: In a study of knowledge of modern
contraceptives, did the women know about them and then
start to use them, or did they learn about them because they were
using them?
Analytic Studies
• Focus on identifying risk factors
• Always use comparison group
• Test hypotheses
• Relatively costly
• Less often used than descriptive studies
Case-control Studies
A case-control study is one in which persons with a
condition ("cases") and suitable comparison subjects
("controls") are identified, and then the two groups are
compared with respect to prior exposure.
– subjects are sampled by their outcome status.
– Exposure status is assessed retrospectively.
– Relatively cheap in terms of time and cost.
– The measure of association is the odds ratio.
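The odds ratio mentioned above can be sketched in a few lines. This is only an illustration: the function name, the argument names, and the counts are hypothetical and do not come from the slides. It uses the usual cross-product form for a 2x2 table of cases and controls by exposure.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio from a 2x2 case-control table."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    # Odds of exposure among cases divided by odds of exposure among controls.
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, for illustration only.
example_or = odds_ratio(exposed_cases=40, unexposed_cases=60,
                        exposed_controls=20, unexposed_controls=80)
print(f"Odds ratio: {example_or:.2f}")
```

An odds ratio above 1 suggests that exposure is more common among cases than among controls; a value near 1 suggests no association.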
Case control: numbers and ratio
• A single control group is optimal in most studies. Conditions for
multiple control groups:
– when a single control group is considered not to be appropriate.
– when the selected group has a specific deficiency that could be
avoided by inclusion of another control group.
Control-to-case ratio
• The statistical efficiency of a study increases with the control-to-case
ratio up to about 4:1; beyond that, additional controls add little.
Cohort Studies
• Cohort studies are epidemiologic designs that
identify comparison groups according to their
exposure status.
• Disease free subjects are sampled by exposure
status
Characteristics of a Cohort Study
• Groups of individuals defined on the basis of
presence/absence of exposure to the suspected risk
factor
• All potential subjects must be initially free of the
disease under investigation
• Eligible participants are then followed over time to
assess occurrence of disease
Types of Cohort Studies
• Classification of cohort studies depends on the
temporal relationship between the initiation of the
study and the occurrence of the outcome:
– Prospective cohort study
– Retrospective cohort study
Prospective Cohort Studies
• The investigator collects information on the exposure status of the
cohort members at the time the study begins, and identifies new
cases of disease (or deaths) from that time forward.
• The exposure may already have occurred when the study begins, BUT
the outcome has certainly not yet occurred.
• After the selection of the cohort, participants must be followed over
time to assess the incidence of disease.
E.g. identify oral contraceptive (OC) users and non-users, follow them over
the coming years, and assess their heart disease status.
Retrospective Cohort Studies
• Both the exposures and the outcomes have already
occurred when the study is initiated.
• Exposure status is established from information
recorded at some time in the past, and disease
incidence (or mortality) is determined from then until
the present.
Retrospective Cohort Studies…
• Either interview the participants, or use medical records,
to determine their subsequent history from that point to the
present in terms of developing outcome.
• Retrospective: deals with past events; can be done
quickly
• Cohort: the comparison is made between users and non-
users of OCs
Factors in Selection of Exposed Group
• Frequency of the exposure of interest: ability of
obtaining sufficient exposed individuals in a
reasonable period of time
• Ability to obtain complete and accurate exposure
and outcome information on all study subjects
Advantages of Cohort Design
• Valuable when the exposure is rare
• Allows direct measurement of risk
• Can elucidate temporal relationship
• Minimize bias in ascertainment of exposure
• Can examine multiple effects of a single
exposure
Disadvantages of Cohort Design
• Not suitable for rare diseases
• Inefficient for diseases with a long latency (incubation)
period
• Costly in terms of time and resources
• Difficulty in obtaining complete information for all
comparison groups
• Loss to follow-up
Experimental Studies
Experimental designs are epidemiologic studies in which:
1) The investigator manipulates the condition under study
2) The study is always prospective
Classification of Intervention Studies:
• Based on population:
– Clinical trial: usually performed in a clinical setting; the
subjects are patients.
– Field trial: used to test medicines for preventive purposes;
the subjects are healthy people.
– Community trial: a field trial in which the unit of study is a
group of people or a community.
Interventions that Can Be Evaluated
• New drugs and new treatment of diseases
• New medical and health care technology
• New methods of primary prevention
• New programs for screening
• New ways of organizing and delivering health services
• New behavioral intervention programs
Ethical Considerations in experimental studies
• Risks vs benefits
• Comparison: Standard care vs placebo
• Ethical approval
• Informed consent & confidentiality
• Freedom to withdraw
• Duty of care
• Stopping/Monitoring
• Reporting findings
• Quality: ‘Poor’ quality research is unethical!
Experimental studies: Advantages
• The major advantage of experimental studies lies in the
strength of the causal inference that can be made.
– It is very difficult to make causal inferences based on
observational studies.
• Experimental studies offer the best design for controlling
confounding variables.
• Gold standard for epidemiologic research
– Randomized Controlled Trials (RCTs)
The Quality of the "Gold Standard"
• Randomization
• Blinding
• Use of Placebo
Assignment
1. Review three articles that use one specific study design. From the
articles you reviewed:
• Write the characteristics of the study design used in those articles.
• Describe the strengths and limitations of the study designs reviewed.
• Which study design would be needed to address the limitations of
the designs reviewed, and how would it address them?
• How do you calculate the sample size for that study design?
• For question 1, write the full name and DOI (Digital Object
Identifier) of each article.
2. How do you develop data collection tools?
• What are the sources of the tool?
• How do you assess whether it measures what it is intended
to measure?
Sampling Methods
♣ Sampling involves the selection of a number of study
units from a defined study population.
♣ The population is too large for us to consider collecting
information from all its members.
♣ Instead we select a sample of individuals hoping that the
sample is representative of the population.
SAMPLING…
• Importance of sampling:
- To save time and money
- Measurements can be more accurate on a sample than on the entire
population (census)
Defining the population:
- Target population
- Study population
Sampling…
When taking a sample, we will be confronted with the
following questions:
• What is the group of people from which we want to draw a
sample?
• How many people do we need in our sample?
• How will these people be selected?
• What are the errors to be confronted with when taking a
random sample?
Definitions of Population
• Target population (reference population or source
population): The population about which an investigator
wishes to draw a conclusion.
• Study population: Population from which the sample
actually is drawn
• Sampling unit: The unit of selection in the sampling
process. For example, in a sample of districts, the sampling
unit is a district; in a sample of persons, a person etc.
Definitions of Population…
• Study unit: The unit on which the observations will be
collected. For example, persons in a study of disease
prevalence, or households in a study of family size.
N.B. The sampling unit is not necessarily the same as the
study unit.
• Sampling frame: The list of units from which the
sample is to be selected. The existence of an adequate
and up-to-date sampling frame often defines the study
population.
What is a defined population?
♣ The problem of obtaining a sample which is
representative of a larger population needs special
attention.
♣ The population under consideration should be clearly
defined.
♣ It is only after having such a clearly defined population
(i.e., in terms of geographical area, type of study
subjects, etc. ) that the selection of the random sample
could take place
♣ What are the main reasons for the necessity of such
“clear definitions of the population”?
How are the study subjects selected?
♣ An important issue influencing the choice of the most
appropriate sampling method is whether a sampling
frame, that is, a listing of all the units that compose the
population, is available (or can be compiled).
♣ Two broad categories: non-probability sampling methods
and probability sampling methods
Sampling methods…
a) Non-probability sampling methods
♣ Used when a sampling frame does not exist
Examples:
• Convenience sampling
• Quota sampling
• These sampling methods do not claim to be
representative of the entire population.
When do you use these techniques?
b) Probability sampling methods
♣ They involve random selection procedures to ensure
that each unit of the sample is chosen on the basis of
chance
♣ All units of the population should have an equal, or at
least a known, chance of being included in the sample.
♣ Sample findings can be generalized
b) Probability sampling methods
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified sampling
4. Cluster sampling (all selected clusters will be
considered –take care of clustering effect)
5. Multi-Stage Sampling (consider the design effect)
Simple random sampling (SRS)
• This is the most basic scheme of random sampling.
• Each unit in the sampling frame has an equal
chance of being selected
• The sample is expected to be representative of the population.
• However, it is costly to conduct SRS.
• Moreover, minority subgroups of interest in the
population may not be present in the sample in
sufficient numbers for study.
Simple random sampling (SRS)…
To select a simple random sample you need to:
• Make a numbered list of all the units in the population from
which you want to draw a sample.
• Number each unit on the list in sequence from 1
to N (where N is the size of the population).
• Decide on the size of the sample.
• Select the required number of study units, using a “lottery”
method or a table of random numbers.
Simple random sampling…
Lottery method : for a small population it may be possible to use the
“lottery” method: each unit in the population is represented by a
slip of paper, these are put in a box and mixed, and a sample of the
required size is drawn from the box.
Table of random numbers: if there are many units, however, the
above technique soon becomes laborious. Selection of the units is
greatly facilitated and made more accurate by using a set of random
numbers in which a large number of digits is set out in random
order. The property of a table of random numbers is that, whichever
way it is read, vertically in columns or horizontally in rows, the
order of the digits is random.
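Computer-generated random list: statistical software or a spreadsheet can draw the
required number of units at random from the numbered list.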
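A computer-generated random list could be produced as in the following minimal sketch. It assumes the units are numbered 1 to N as described earlier; the function name, the population size of 1200, the sample size of 100, and the seed are illustrative choices, not values prescribed by the slides.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers 1..N without replacement, each with equal chance."""
    rng = random.Random(seed)  # a fixed seed makes the list reproducible
    units = range(1, population_size + 1)
    return sorted(rng.sample(units, sample_size))


# Illustrative values only: N = 1200 numbered units, n = 100 to be selected.
selected = simple_random_sample(population_size=1200, sample_size=100, seed=42)
print(selected[:10])
```

Recording the seed alongside the sampling frame lets someone else regenerate exactly the same list later.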
Systematic Sampling
• Individuals are chosen at regular intervals (for
example, every kth) from the sampling frame.
• The first unit to be selected is taken at random from
among the first k units.
• For example, a systematic sample is to be selected
from 1200 students of a school.
• The sample size is decided to be 100. The sampling
fraction is 100/1200 = 1/12, so the sampling interval is k = 12.
Systematic Sampling…
• The number of the first student to be included in the
sample is chosen randomly, for example by blindly
picking one out of twelve pieces of paper, numbered 1 to
12.
• If number 6 is picked, every twelfth student will be
included in the sample, starting with student number 6,
until 100 students are selected.
• The numbers selected would be 6, 18, 30, 42, etc.
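The school example can be reproduced with a short sketch. The function and parameter names are hypothetical; the interval is taken as population size divided by sample size, and the random start is drawn from 1 to k unless one is supplied.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit, beginning at a random start within the first k units."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]


# The slide's example: 1200 students, sample of 100, random start of 6.
sample = systematic_sample(population_size=1200, sample_size=100, start=6)
print(sample[:4])  # [6, 18, 30, 42]
```

With a start of 6 and an interval of 12, the last selected student is number 1194, so the sample stays within the list of 1200.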
Merits
• Systematic sampling is usually less time consuming and
easier to perform than simple random sampling. It
provides a good approximation to SRS.
• Unlike SRS, systematic sampling can be conducted
without a sampling frame (useful in some situations
where a sampling frame is not readily available). E.g. In
patients attending a health center, where it is not possible
to predict in advance who will be attending.
Demerits
• If there is any sort of cyclic pattern in the ordering
of the subjects which coincides with the sampling
interval, the sample will not be representative of the
population.
Stratified Sampling
• Appropriate when the distribution of the characteristic to be
studied is strongly affected by a certain variable (heterogeneous
population).
• The population is first divided into groups (strata) according to a
characteristic of interest (e.g., geographic area, prevalence of
disease, etc.).
• A separate sample is taken independently from each stratum, by
simple random or systematic sampling.
• Proportional allocation: the same sampling fraction is used
for each stratum.
• Non-proportional allocation: a different sampling fraction is
used for each stratum, or the strata are unequal in size and a
fixed number of units is selected from each stratum.
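Proportional allocation can be illustrated with a small sketch. The stratum names and sizes below are invented for the example, and the function name is hypothetical. Each stratum receives the same sampling fraction (total sample divided by total population).

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate a total sample across strata using one common sampling fraction."""
    population = sum(stratum_sizes.values())
    return {
        name: round(size * total_sample / population)
        for name, size in stratum_sizes.items()
    }


# Hypothetical strata, for illustration only.
strata = {"urban": 6000, "semi-urban": 3000, "rural": 1000}
allocation = proportional_allocation(strata, total_sample=500)
print(allocation)
```

Because each stratum's share is rounded to a whole number, the allocated total can differ from the target by a unit or two when the shares are not exact; in that case the difference is usually assigned to the largest stratum.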
Stratified Sampling:
• Merit
- The representativeness of the sample is improved. That
is, adequate representation of minority subgroups of
interest can be ensured by stratification and by varying
the sampling fraction between strata as required.
• Demerit
- A sampling frame has to be
prepared separately for each stratum.
Cluster sampling
• The selection of groups of study units (clusters) instead of the
selection of study units individually
• The sampling unit is a cluster, and the sampling frame is a list of
these clusters.
• Procedure:
- the reference population (assumed homogeneous) is divided
into clusters; these clusters are often geographic units
(e.g. districts, villages, etc.)
- a sample of such clusters is selected
- all the units in the selected clusters are studied.
• It is preferable to select a large number of small clusters rather
than a small number of large clusters.
Cluster sampling…
• Merit - A list of all the individual study units in the
reference population is not required. It is sufficient to
have a list of clusters.
• Demerit - It is based on the assumption that the
characteristic to be studied is uniformly distributed
throughout the reference population, which may not
always be the case.
Multi-stage sampling
• This method is appropriate when the reference population is large
and widely scattered.
• Selection is done in stages until the final sampling units (e.g.,
households or persons) are reached.
• The primary sampling unit (PSU) is the sampling unit (usually
large) in the first sampling stage.
• The secondary sampling unit (SSU) is the sampling unit in the
second sampling stage, and so on.
• Example - The PSUs could be kebeles and the SSUs could be
households.
Multi-stage sampling
• Merit - Cuts the cost of preparing sampling frame
• Demerit - Sampling error is increased compared with a
simple random sample.
• Multistage sampling gives less precise estimates than
simple random sampling for the same sample size, but
the reduction in cost usually far outweighs this and
allows for a larger sample size.
• That is, a design effect needs to be considered.
What are the errors to be confronted with when
taking a random sample?
♣ When we take a sample, our results will not exactly
equal the correct results for the whole population. That
is, our results will be subject to error. This error has
two components:
a) Sampling error (i.e., random error)
b) Non-sampling error (i.e., bias)
Sampling error (i.e., random error)
•Random error consists of random deviations from
the true value, which can occur in any direction.
• Sampling error (random error) can be
minimized by increasing the size of the sample
Non-sampling error (i.e., bias)
• Bias consists of systematic deviations from the true
value, always in the same direction
• It is possible to eliminate or reduce the non-sampling
error (bias) by careful design of the sampling
procedure and by taking care of the errors that may
arise during data analysis.
2. Nonprobability Sampling
• Here, the sample is less likely to be representative of
the population, thus it is difficult to extrapolate from
the sample to the population.
• It is used when there is no sampling frame or when
probability sampling is not possible because of
cost and feasibility constraints.
Nonprobability Sampling Cont..
• Judgmental or Purposive Sampling: The researcher chooses the
sample based on who he/she thinks would be appropriate for the
study.
• Convenience Sampling: The selection of units from the
population is based on availability and/or accessibility.
• Quota Sampling: It starts with systematically setting “Quota” to
represent subgroups of a population. Then data is collected to
meet the predefined Quota.
• Snowball Sampling: The researcher begins by identifying
someone who meets the inclusion criteria of the study. Then the
study subject would be asked to recommend others who s/he may
know who also meet the criteria.
Sample size estimation for a study
How do we ensure enough precision to make good
programmatic decisions after the results are analysed?
• We calculate the sample size we will need in our study before we start
collecting data
• If a study has not calculated the sample size beforehand, it may have
wasted resources gathering more data than necessary.
• Supervision and training are more difficult with more teams and
study workers
• Not calculating sample size can lead to a greater chance of having
bias in the results.
Sample size determination for study
• Sample size: the number of study subjects
required to estimate a parameter in a population
• Any sample size will give you an estimate of the population
parameter
• However, the larger the sample, the greater the precision
• Always calculate the sample size that gives the required
precision!
• Too large a sample: too expensive and time-consuming
• Too small a sample: inadequate precision to give a
good estimate or to show a difference
Sample size depends on
1. Estimated variability (denoted by P)
2. The precision, or margin of error (denoted by d)
3. The sampling method (clustering; design effect, denoted by g)
4. Size of the population
5. Feasibility (cost)
6. Confidence level (Z value of certainty)
7. Non-response rate
Sampling variability and precision
• If the whole population is examined, then there is
no uncertainty.
• If a sample is taken, then sampling variability is
introduced, which determines the level of precision.
An estimate based on a sample depends on chance!
It depends on which of the many possible samples of study subjects is
selected.
1. Estimated variability
• This is the estimated proportion of the event in the population, as
estimated from a similar population
• It is usually taken from a similar study in the literature
• It can also be obtained from a pilot study
• In the absence of a way to estimate this variability, a proportion of
50%, which gives the maximum variability, is assumed.
• If there are two similar studies about the population estimate, it is
good to take the one nearer to 50%, because it yields the larger sample size.
2. What is precision (margin of error)
• It is the postulated range around the sample estimate within which the
true population value is expected to lie.
• It is based on the standard error of the sample mean or proportion
• The standard error is the standard deviation divided by the
square root of the sample size
• Sample size is inversely related to the margin of error: a larger sample
gives a smaller standard error and therefore greater precision
Mean: $SE = \dfrac{SD}{\sqrt{n}}$
Proportion: $SE = \sqrt{\dfrac{p(1-p)}{n}}$
Cont’d …
• Precision is a measure of how close an estimate is to the true
value of a population parameter.
• For example,
• a prevalence of 10% from a sample of size 20
- gives a 95% confidence interval of 1% to 31%,
- which is not very precise or informative.
• a prevalence of 10% from a sample of size 400
- gives a 95% confidence interval of 7% to 13%,
which is sufficiently precise.
• Sample size calculations help to avoid the imprecise
situation in the first example.
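The effect of sample size on interval width can be checked with a minimal sketch. It assumes the normal-approximation (Wald) interval with z = 1.96; the slides do not say which method produced their figures. For n = 20 this approximation is poor, so its result will not match the 1% to 31% quoted above, which presumably comes from an exact method. The function name is hypothetical.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion, clipped to [0, 1]."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return lower, upper


for n in (20, 400):
    low, high = wald_ci(0.10, n)
    print(f"n = {n}: {low:.1%} to {high:.1%}")
```

Whatever interval method is used, the pattern is the same: the interval for n = 400 is far narrower than the one for n = 20.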
Cont’d…
• The margin of error is chosen by the researcher according to how
closely the true population proportion needs to be estimated
• A margin of error of 1-3% is chosen when high precision is
required and resources allow
• It is usually set between 1% and 5% when the
estimate is expressed as a percentage
3. The sampling method (clustering)
• Sample size is also related to the sampling method
• If cluster sampling is used, the random error will
be larger, which lowers the power
• People within a cluster tend to be similar (homogeneous), while
clusters differ from one another (heterogeneous)
• This homogeneity within a cluster can produce greater
imprecision
• Therefore, taking a larger number of clusters and fewer
individuals within each cluster can lower the design effect
Cont’d…
• The design effect is the ratio of the variance under cluster
sampling to the variance under SRS with the same sample size
• It typically ranges from 1.5 to 10.
• If no correction is made, the random error is larger than planned
and the estimate may deviate further from the true value
• In the case of cluster sampling, the sample size is multiplied
by the estimated design effect
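The multiplication by the design effect can be sketched as follows. The relation DEFF = 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation, is a standard approximation that is not stated in the slides; it is included here as an assumption, and all names and numbers are illustrative.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate design effect from average cluster size and intracluster correlation."""
    return 1 + (cluster_size - 1) * icc


def inflate_for_clustering(n_srs, deff):
    """Multiply the SRS sample size by the design effect and round up."""
    return math.ceil(n_srs * deff)


# Illustrative values only: 20 people per cluster, ICC of 0.05, SRS size of 384.
deff = design_effect(cluster_size=20, icc=0.05)
n_cluster = inflate_for_clustering(n_srs=384, deff=deff)
print(f"Design effect: {deff:.2f}, adjusted sample size: {n_cluster}")
```

The formula also shows why many small clusters are preferable: a smaller cluster size m gives a smaller design effect for the same ICC.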
4. Size of a population
• The sample size for a finite population needs a finite
population correction
• A finite population is considered when the study is done
in a sub-population having no other reference
population
• The required sample size is reduced according to the size of the
sample relative to the population
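One commonly used form of the finite population correction is n = n0 / (1 + (n0 - 1) / N), where n0 is the uncorrected sample size and N is the population size. The slides do not give a formula, so this form, the function name, and the example numbers are assumptions made for illustration.

```python
import math


def finite_population_correction(n0, population_size):
    """Reduce an initial sample size n0 for a finite population of size N."""
    corrected = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(corrected)


# Illustrative values only: initial sample size of 384, population of 1000.
print(finite_population_correction(n0=384, population_size=1000))
```

When N is very large relative to n0, the correction has almost no effect and the sample size stays essentially unchanged.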
5. Feasibility (cost)
• Sample size also depends on the cost of sampling
• More people provide higher precision but at a higher cost
• Beyond the optimal sample size, further increases in sample size add
only a little precision
• Beyond the optimal sample size, the extra cost outweighs the gain in
precision
• Therefore, cost should be balanced against the required precision
6. Confidence level (Z value of certainty)
• The confidence level is based on extrapolating the value
of certainty in the sample to measure the population
parameter
• The population parameter is the true, but unknown, value in the
population
• It is estimated based on Central limit theorem
• Central limit theorem: The distribution of the sample
means will be nearly Normal regardless of how the
variable is distributed in the population as long as the
sample size is large enough.
SAMPLE SIZE FORMULA FOR PROPORTION
Sample size for a single population proportion:
• $n = \dfrac{Z_{1-\alpha}^{2}\, p(1-p)}{d^{2}}$
• Sample size calculation for two population
proportions:
$n = \dfrac{(Z_{1-\alpha} + Z_{1-\beta})^{2}\,[p_1(1-p_1) + p_2(1-p_2)]}{(p_1 - p_2)^{2}}$
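Both formulas can be evaluated with a short sketch. The function names are hypothetical. The defaults z = 1.96 (the conventional two-sided 95% confidence value) and z_beta = 0.84 (80% power) are assumptions, as are the example proportions; the two-proportion result is read here as the size of each group, which the slides do not state explicitly.

```python
import math


def n_single_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """n = (z_alpha + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2, rounded up."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Illustrative inputs: p = 50% with a 5% margin of error; 30% versus 20%.
print(n_single_proportion(p=0.5, d=0.05))
print(n_two_proportions(p1=0.3, p2=0.2))
```

The results are rounded up because a fraction of a subject cannot be recruited. Adjustments for design effect, finite population, or non-response would be applied to these figures afterwards.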
Erean Sh. (mph/Epi)
Variable
• What is a variable?
• What types of variables do you know?
• What is the importance of knowing the types of variables?
• Identify the outcome variable, its type, and the possible
independent variables for the title below:
• Exclusive breastfeeding practice and associated factors among
mothers of children under two years of age in ‘X’ town
Variable
• Variable: A characteristic which takes different
values in different persons, places, or things.
• Any aspect of an individual or object that is
measured (e.g. BP) or recorded (e.g. age, sex) and
can take different values.
• There may be one variable in a study or many.
• E.g. A study of treatment outcome of TB
Measurement scales: SUMMARY
Variable
• Qualitative or categorical
– Nominal (not ordered), e.g. ethnic group
– Ordinal (ordered), e.g. response to treatment
• Quantitative measurement
– Discrete (count data), e.g. number of admissions
– Continuous (real-valued), e.g. height
Depending on scales of measurement we have:
Four levels of measurement
• Nominal measures
• Ordinal measures
• Interval measures
• Ratio measures
1. Nominal scale:
• The simplest type of data, in which the values fall into
un-ordered categories or classes
• Uses names, labels or symbols to assign each
measurement.
• Examples: Blood type, sex, race, marital status
2. Ordinal scale:
• Assigns each measurement to one of a limited number of
categories that are ranked in terms of order.
• However, the distances between the categories are uneven
or unknown
• Although non-numerical, can be considered to have a
natural ordering
• Examples: Patient status, cancer stages, social class
3. Interval scale:
- Measured on a continuum
- Differences between any two numbers on the scale are of known size.
Example: Temperature in °F on 4 consecutive days
Days:        A   B   C   D
Temp. (°F):  50  55  60  65
For these data, not only is day A with 50°F cooler than day D with 65°F, but it is
15°F cooler.
- It has no true zero point. “0” is arbitrarily chosen and doesn’t reflect the absence
of temperature.
- Interval data differ from ordinal data because the differences between adjacent
scores are equal.
4. Ratio scale:
- Measurement begins at a true zero point and the scale has
equal intervals.
- In ratio scales, zero is an absolute absence of the variable
- The data can be categorized, ranked, evenly spaced, and
have a natural zero.
- Examples: Height, weight, BP, etc.
- A height of 40 cm is twice a height of 20 cm.
Why is Level of Measurement Important?
• Helps to decide how to interpret the data from that
variable.
• Helps to decide what statistical analysis is appropriate on
the values that were assigned.
• If a measure is nominal, then you know that you would
never average the data values or do a t-test on the data.
Exercises:
Give the correct scales of measurement for each variable
1. Temperature (Celsius)
2. Hair colour
3. Job satisfaction index (1-5)
4. Number of heart attacks
5. Calendar year
6. Serum uric acid (mg/100ml)
7. Number of accidents in a 3 - year period
8. Number of cases of each reportable disease reported by a
health worker
9. The average weight gain of six 1-year old dogs with a special
diet supplement was 950 grams last month.
Dependent and independent variables
• Because in health research we often look for
associations, it is important to make a distinction
between dependent and independent variables.
• Both the dependent and independent variables
together with their operational definitions (when
necessary) should be stated.
Dependent and independent variables …
• The variable that is used to describe or measure the
problem under study is called the dependent
variable.
• The variables that are used to describe or measure
the factors that are assumed to influence (or cause)
the problem are called independent variables.
Variables …
• For example, in a study of relationship between
smoking and lung cancer, "suffering from lung
cancer" (with the values yes, no) would be the
dependent variable and "smoking" (with the
values no, less than a packet/day, 1 to 2
packets/day, more than 2 packets/day) would be the
independent variable.
For each of the following research questions, identify the outcome
and independent variables
a. The prevalence of contraceptive use among HIV-positive women in
the reproductive age group in Y town
b. The incidence of COVID-19 infection among under-five children
in Kebele “X” in Y town
c. Is the double burden of malnutrition an emerging problem among
under-five children in Oromia region?
d. Factors associated with age at sexual initiation among youths
visiting HIV testing and counselling centres in North Shoa Zone,
Ethiopia
Background variables
• In almost every study involving human subjects, background
variables (usually demographic characteristics such as age,
sex, educational status, monthly family income, marital status and
religion) are included.
• These background variables are often related to a number of
independent variables, so that they influence the problem indirectly.
• Hence they are called background variables or background
characteristics.
• The researcher cannot manipulate them.
Operationalizing variables
• Operationalizing variables means that you make
them ‘measurable'.
• Example: In a study on VCT acceptance, you want
to determine the level of knowledge concerning
HIV in order to find out to what extent the factor
‘poor knowledge’ influences willingness to be
tested for HIV.
Cont’d
• The variable ‘level of knowledge’ cannot be
measured as such.
• You would need to develop a series of questions to
assess a person’s knowledge.
• The answers to these questions form an indicator of
someone’s knowledge on this issue, which can then
be categorized.
Cont’d …
If 10 questions were asked, you might decide that the knowledge of those
with:
0 to 3 correct answers is poor,
 4 to 6 correct answers is reasonable, and
 7 to 10 correct answers is good.
Operational definitions of variables are used in order to:
• Avoid ambiguity
• Make the variables more measurable
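As a minimal sketch of this operational definition, the cut-offs for the 10-question knowledge score could be applied as below. The function name and the three example scores are hypothetical and only serve as illustration.

```python
def knowledge_level(correct_answers: int) -> str:
    """Map the number of correct answers (0-10) to a knowledge category."""
    if not 0 <= correct_answers <= 10:
        raise ValueError("expected a score between 0 and 10")
    if correct_answers <= 3:
        return "poor"
    if correct_answers <= 6:
        return "reasonable"
    return "good"


# Hypothetical scores for three respondents
scores = [2, 5, 9]
levels = [knowledge_level(s) for s in scores]
print(levels)  # ['poor', 'reasonable', 'good']
```

Writing the cut-offs down in one place like this is one way to make sure every data collector categorizes the same score in the same way.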
Data collection
Introduction
• Data collection is a crucial stage in the planning and
implementation of a study.
• Data analysis becomes difficult when the data collection
has been
- superficial,
- biased or
- incomplete.
• Therefore, we should concentrate all possible efforts on
developing appropriate tools, and should test them
several times.
 Depending on the type of study, different data-collection
techniques may be used.
 In HSR studies we usually combine a number of
different techniques and look at problems from different
perspectives (triangulation).
Data collection….
The choice of methods of data collection is based on:
• The resources required
• Acceptability of the method
• Coverage of the method
• Familiarity with the procedure
• Relevance
• The accuracy of the information they will yield
• Practical considerations, such as the need for personnel, time,
equipment and other facilities, in relation to what is available
Data collection….
OVERVIEW OF DATA COLLECTION TECHNIQUES
Data-collection techniques allow us to systematically
collect information about our
- objects of study (people, objects, phenomena)
- the settings in which they occur.
 In the collection of data we have to be systematic.
If data are collected haphazardly, it will be difficult to
answer our research questions in a conclusive way.
DATA COLLECTION TECHNIQUES…
• Various data collection techniques can be used, such as:
➢Using available information
➢Observing
➢Interviewing (face-to-face)
➢Administering questionnaire
➢Focus group discussion
➢Physiological measurement (in vitro vs in vivo)
1. Using available information/documentary
sources
Locating the sources and retrieving the information is a
good starting point in any data collection effort.
These include:
• Health Information System Data,
• Census Data,
• Unpublished Reports
• Publications
• Clinical Records
• Personal Records,
• Death Certificates,
• Published Mortality Statistics,
• Census Publications, etc
• Key Informants
• Newspapers
Advantages:
Documents can provide ready made information
relatively easily
The best means of studying past events.
 Data collection is inexpensive
Disadvantages:
Problems of reliability and validity
There is a possibility that errors may occur when the
information is extracted from the records.
Since the records are maintained not for research
purposes, but for clinical, administrative or other ends,
the information required may not be recorded at all, or
only partly recorded.
2. Observing
 Is a technique that involves systematically selecting,
watching and recording behavior and characteristics of
living beings, objects or phenomena.
Observation of human behavior is a much-used data
collection technique.
It can be undertaken in different ways:
 Participant observation
 Non-participant observation
 Structured observation
 Unstructured observation
Participant observation
The observer takes part in the situation he or she
observes
 E.g., a nurse researcher observing how nurses
communicate with their patients while taking
part in patient care activities in the facility.
Non-participant observation
• The observer watches the situation, openly or concealed,
but does not participate in the situation being observed.
• Observation is commonly used in qualitative studies
• Phenomena amenable to observation in research include:
☞Activities and behaviour
☞Characteristics and conditions of individuals
☞Skill attainment and performance
☞Verbal and nonverbal communication
☞Environmental characteristics
Observation can be made by using structured or
unstructured tools or both
a) Unstructured observation:
➢Involves spontaneously observing & recording what is seen using field
diaries or field notes
Conducted in an open and free manner in a sense that there would be no
pre-determined variables or objectives.
b) Structured observation:
➢ The researcher carefully defines what is to be observed & how the
observations are to be made, recorded, & coded
Data collection is conducted using specific variables and according to a
pre-defined schedule.
cont’d…
• Involves the use of categorical system or checklist or rating
scales to guide observation & recording
• If observations are made using a calibrated scale they may be
called measurements
• Measurements often require additional tools, e.g., a weighing
scale to measure weight, a measuring tape to measure height,
a thermometer to measure body temperature.
• Observations can give additional, more accurate information
on behaviour of people than interviews or questionnaires
Advantages of observation
• Give additional, more accurate information on behavior of
people than interviews or questionnaires.
• Check on the information collected through interviews
especially on sensitive topics such as alcohol or drug use, or
stigmatizing diseases.
• They can also be made on objects. For example, the presence
or absence of a latrine and its state of cleanliness may be
observed.
• Observation can serve as the major research technique in a study.
Disadvantages of observation
• They are time consuming
• They are most often used in small-scale studies.
• The investigator's or observer's own biases, prejudices,
desires, etc. can influence the observations
• Needs more resources and skilled human power when
high-level machines are used.
• Ethical issues
3. Interviewing
• Is a data-collection technique that involves oral questioning of
respondents, either individually or as a group.
• Depending on whether the data are qualitative, quantitative or
both, the type of data collection can be:
- Face to face interview
- Telephone interview
- Self-reported/completed questionnaire
• Answers to questions posed during an interview can be
recorded by:
☞ Writing them down either during the interview itself or
immediately after the interview; or
☞ Tape-recording the responses; or
☞ A combination of both
Advantages of interviewing
• The interviewer can stimulate and maintain the respondent's
interest, which encourages frank answering of questions.
• If anxiety is aroused, the interviewer can allay it.
• The interviewer can repeat questions which are not understood,
and give standardized explanations where necessary.
• The interviewer can ask “follow-up” or “probing” questions to
clarify a response.
• The interviewer can make observations during the interview.
Disadvantages of interviewing
• Questions may be misunderstood
• Time consuming
• Need to set up interviews
• Can be expensive
• Respondent bias
• Needs a set of questions
4. Administering written
questionnaires
 Is a data collection technique in which written questions
are presented that are to be answered by the respondents
in written form
It can be administered in different ways, such as by:
• Sending questionnaires by mail
• Self-administered questionnaires
• Interviewer -administered questionnaires
Administering written
questionnaires cont’d…
Advantages:
• Less expensive
• Permits anonymity & may result in more honest
responses
• does not require research assistants
• Eliminates bias due to phrasing questions differently
with different respondents
Disadvantages:
• Cannot be used with illiterate respondents
• There is often a low rate of response
• Questions may be misunderstood
Administering written
questionnaires cont’d…
Rating scale
• The question in self-administered questionnaire can be open-ended
or closed (with pre-categorized answers)
• Closed questions can be composed of dichotomous questions,
multiple-choice questions, rank-order questions, & rating scales
• Rating scales elicit responses in terms of the degree of attitude,
perception, needs, or experiences
• Rating scales involve composite psychosocial scales used to
make fine quantitative discriminations among people with
different attitudes, perceptions, needs, or experiences
Rating scale…
• These psychosocial scales are:
☞Likert scales (summated rating scales)
☞Semantic differential scales
☞Visual analogue scale
Likert Scales
• Consist of several declarative statements (items)
expressing viewpoints or opinion or attitude of subjects
• Responses are on an agree/disagree continuum (usually
ranging from 4 - 7 response options) i.e., Strongly agree,
agree, uncertain, disagree, strongly disagree
Likert scale
• Values are placed on each response, with 1 on the most
negative response & highest (4-7) value on most positive
response
• Responses to items are summed to compute a total scale
score
• Example of Likert Scales
SD = Strongly Disagree; DG = Disagree; UN = Uncertain; AG = Agree;
SA = Strongly Agree
Semantic Differential Scales
• Used to measure attitudes & beliefs
• Require ratings of various concepts
• Rating scales involve bipolar adjective pairs, with 7-point
ratings
• Value of 1 denotes the most negative response & 7
denotes the most positive response
• Ratings for each dimension are summed to compute a total
score for each concept
Example of a Semantic Differential Scale
Visual analogue scale (VAS)
Differences between data collection techniques and data collection tools
Data collection technique | Data collection tools
Using available information | Checklist; data compilation forms
Observation | Eyes and other senses, pen/paper, watch, scales, microscope, etc.
Interviewing | Interview guide, checklist, questionnaire, tape recorder
Administering written questionnaire | Questionnaire
Data collection instruments
• Types of questions
 Depending on how questions are asked and recorded
we can distinguish three major possibilities:
1. Closed questions
2. Open-ended questions
3. Semi-opened questions
Closed questions
 A list of possible answers or options
 Commonly used for background variables
 Should be exhaustive & mutually exclusive
What is your marital status?
1. Single
2. Married
3. Divorced
4. Separated
5. Widowed
Open-ended questions
 Free to answer with fewer limits imposed by the
researcher
 Useful for exploring new areas
What is your opinion on the services provided in the
antenatal (AN) care?
_______________________________________
_____
_______________________________________
Semi-opened questions
What is your occupation?
(1) Dependent
(2) Manual labourer
(3) Government employee
(4) Private employee
(5) Owned business
(6) Others (please specify) _____________
1. Open-ended questions (allowing for completely open as well as
partially categorized answers).
• They permit free responses, which should be recorded in the
respondents' own words.
 Such questions are useful for obtaining in-depth information on:
• facts with which the researcher is not very familiar,
• opinions, attitudes and suggestions of informants
Examples;
1. 'At what age did the child start supplementary food?'
2. 'What is your opinion on the services provided in the ANC?'
(Explain why.)
3. 'What do you think are the reasons some adolescents in this area
start using drugs?'
Advantage of open-ended
questions…
Allow you to probe more deeply into issues of interest
being raised.
Information provided in the respondents' own words
might be useful
Providing valuable new insights on the problem.
Permit unlimited number of answers
Risks of completely open-ended
questions…
A big risk is incomplete recording of all relevant issues
covered in the discussion.
Analysis is time-consuming and requires experience;
otherwise important data may be lost.
Skilled interviewers are needed to get the discussion
started and focused on relevant issues and to record all
information collected.
2. Closed questions:
 Have a list of possible options or answers
from which the respondents must choose.
Example: closed ended question
What is the current breastfeeding status of mother ?
A. Exclusive breastfeeding
B. Partial breastfeeding
C. Not breastfeeding
Advantages of closed ended
questions
 It saves time
 Comparing responses of different groups, or of the same
group over time, becomes easier.
 Answers easier to analyze on computer
 Response choices make question clearer
Risks of closed ended questions:
• In the case of illiterate respondents, bias will be introduced
• Many choices can be confusing
• Can't tell if respondent misinterpreted the question
• Fine distinctions may be lost
Questionnaire Design
• Designing a questionnaire always takes several drafts.
• In the first draft we should concentrate on the content.
• In the second, we should look critically at the formulation and
sequencing of the questions.
• Then we should scrutinize the format of the questionnaire.
• Finally, we should do a test-run to check whether the
questionnaire gives us the information we require & whether both
the respondents & we feel at ease with it.
Steps in designing questionnaire
Step 1: Content
Step 2: Formulating questions
Step 3: Sequencing the questions
Step 4: Formatting the questionnaire
Step 5: Translation
Step 6: Pre-test
Step 1: Content:
• Take your objectives and variables as a starting point
• Decide what questions will be needed to measure or to
define your variables and reach your objectives.
Step 2: Formulating questions:
 Formulate one or more questions that will provide the
information needed for each variable.
 Check whether each question measures one thing at a time.
 Take care that questions are specific and precise enough
that different respondents do not interpret them differently.
 Avoid words with double or vaguely defined meanings or
that are emotionally laden e.g., omit concepts such as dirty
(clinics), lazy (patients), or unhealthy (foods)
 Ask sensitive questions in a socially acceptable way.
 Avoid leading questions.
A question is leading if it suggests a certain answer.
Step 3: Sequencing the questions:
• Design your interview schedule or questionnaire to be
'informant friendly'
• The sequence of questions must be logical for the
respondent & allow as much as possible for a “natural”
discussion.
• Organize the questions in a logical order & use simple,
everyday language
• Pose more sensitive questions as late as possible in the
interview.
Step 4: Formatting the
questionnaire
• When you finalize your questionnaire, be sure that:
• An introductory page explaining the purpose of the
study & confidentiality issue is attached to the
questionnaire
• Sufficient space is provided for answers to open-ended
questions
• Page layout & margins are properly formatted
Step 5: Translation
If interviews will be conducted in one or more local
languages, the questionnaire has to be translated to
standardize the way questions will be asked.
After having it translated you should have it
retranslated into the original language.
You can then compare the two versions for
differences and make a decision concerning the final
phrasing of difficult concepts.
Step 6: Pretest
A pretest usually refers to a small-scale trial of a
particular research component.
A pretest serves as a trial run that allows us to
identify potential problems in the proposed study.
As a result, a good deal of time, effort, and money
can be saved in the long run
Pre testing is :
 Simpler
 Less time consuming and less Costly
Pre test…
• A pretest determines whether the instrument is clearly
worded, free from major biases, and useful in
generating the desired information
• When do we carry out a pre-test?
• Pre-test the data collection tools 1-2 weeks before starting
the fieldwork so that you have time to make revisions.
Pre test…
Components to be assessed during the pre-test?
The reactions of respondents to:
• The research procedures and
• Questions related to sensitive issues.
 The appropriateness of format and wording of
questionnaires and the accuracy of the translations.
 The time needed to carry out interviews, observations or
measurements.
Pretesting and Pilot study
• Pretest – usually refers to a small-scale trial of particular
research components
• Pilot study – is the process of carrying out a preliminary
study, going through the entire research procedure with a
small sample
Measurement Properties
• Whatever the type of measurement, its performance
can be described in several ways:
• Validity
• Reliability
• Range
• Variation
• Responsiveness (an instrument’s ability to detect
change over time)
Measurement
• Involves rules for assigning numeric values to
qualities of objects to designate the quantity of the
attribute
• Advantage of measurement
–It removes guesswork in gathering information
Measurement error
Systematic error (also called "constant error" or "bias")
• Design or instrument errors that affect the data in a consistent
way.
–They pull either all the scores up or all the scores down.
Therefore, systematic errors affect the group mean score
• A systematic upward or downward distortion of the level of
measurement
Random Error/noise
• Transient aspects of the measurement situation that cause
variable errors.
• These errors cause greater variability within the data set, but do
not make the mean score higher or lower.
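The difference between the two kinds of error can be illustrated with a small sketch. All numbers below are made up: a constant +2 kg offset stands in for a systematic error, and a set of deviations that cancel out stands in for random error.

```python
from statistics import mean, pvariance

true_weights = [60, 65, 70, 75, 80]  # hypothetical true values (kg)

# Systematic error: a scale that always reads 2 kg too high
biased = [w + 2 for w in true_weights]

# Random error: made-up deviations that cancel out on average
noise = [-3, 3, 0, -3, 3]
noisy = [w + e for w, e in zip(true_weights, noise)]

print(mean(true_weights), pvariance(true_weights))
print(mean(biased), pvariance(biased))  # mean shifts, spread unchanged
print(mean(noisy), pvariance(noisy))    # mean unchanged, spread larger
```

In this toy example the biased readings shift the group mean without changing the spread, while the noisy readings leave the mean where it was and increase the variability.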
Reliability
• Is the consistency of measurement results across
persons, occasions, locations and instruments
• Consistency of responses to a question (if you get on
your scale and it tells you that you weigh 110 lbs
one minute, then you step on it again and it tells you
that you weigh 115 lbs, then it is not very reliable).
Reliability…
• Is the degree to which the same results are obtained when the
measurement is repeated.
• Repeated measurements of a stable phenomenon by different people
and instruments at different times and places get similar results.
• Reproducibility and precision are other words for this property
• Relates to the consistency of a measure, or the degree to which an
instrument measures the same way each time it is used under the
same condition with the same subjects
• It is unlikely that exactly the same results will be obtained every
time, because of differences in timing and changes in the population
and the sample
• There are three methods of testing the reliability of research
instruments:
1.Tests for the stability of the instruments (how stable it is over time)
2.Tests for equivalence (consistency of the results by different
investigators)
3.Internal consistency (the measurement of the concept is consistent
in all parts of the test).
Reliability…
• Stability: the same score is obtained when the instrument
is used with the same people on a separate occasion
–Test-Retest Reliability(stability): Administer the same
questionnaire at a later time
–Reliability coefficient
• Equivalence: the consistency of the instrument by
different observers/raters
–Interrater reliability
• Internal consistency: the extent to which all its subparts
measure the same characteristic
–Split-half reliability
–Cronbach’s alpha/coefficient alpha
Tests of Stability
• A stable research instrument is one that can be repeated on the same
individual more than once and achieve the same results.
• In observational methods, when the characteristic being observed is
expected to change over time, a test of stability cannot be used.
• Repeated observations and test/retest procedures are used to test
the stability of an instrument.
• The Pearson correlation coefficient, which takes on a value
between -1 and 1, is used to calculate the reliability coefficient
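A test-retest reliability coefficient can be sketched as a Pearson correlation between two administrations of the same instrument. The scores below are invented for illustration, and the helper function is written out by hand so the example needs only the standard library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of five respondents on two occasions
first_test = [12, 15, 9, 20, 17]
retest = [13, 14, 10, 19, 18]
r = pearson_r(first_test, retest)
print(round(r, 2))
```

A value close to 1 would suggest that respondents keep roughly the same relative position on both occasions, which is what a stable instrument is expected to show.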
Reliability...
Tests of Equivalence
• Tests of equivalence attempt to determine if the same results
can be obtained using different observers at the same time
or if similar tests given at the same time yield the same
results.
• The equivalence aspect considers how much error may get
introduced by different investigators or different samples of the
items being studied
Reliability...
Test of Internal consistency
• Internal consistency refers to the extent to which all parts of the
measurement technique are measuring the same concept
• The most common one is called Cronbach’s alpha
o Cronbach’s alpha can be calculated using SPSS, SAS or Stata
o Cronbach's alpha gives the lower bound for reliability.
o If it is high for the whole scale (>= 0.7), the scale is generally considered reliable
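For readers without a statistical package at hand, Cronbach's alpha can be sketched from its usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The item scores below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, one per item, scores in the same respondent order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical answers of five respondents to a 3-item scale
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # 0.89
```

With these made-up data the value is above the 0.7 threshold mentioned above; real scales usually need more items and far more respondents than this toy example.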
Reliability...
• Variability of observer ratings shows up as observer
disagreement
• It is indicated by how often observers classify the same
individual subjects into the same category on the measurement
scale
• Kappa coefficient is one of the most common
approaches.
• Ranges from -1 to +1
Reliability...
 Cohen's kappa measures the agreement between two raters who
each classify N items into C mutually exclusive categories.
 The equation for κ is:
$$K = \frac{P_o - P_e}{1 - P_e} = \frac{\text{actual agreement beyond chance}}{\text{potential agreement beyond chance}}$$
Po = the total proportion of observations on which there is agreement
Pe = the proportion of agreement expected by chance alone.
Reliability...
Agreement matrix for kappa statistic
(inter-rater agreement, 2 observers, dichotomous data)
                      OBSERVER B
OBSERVER A      Yes     No      TOTALS
Yes             a       b       f1
No              c       d       f2
TOTALS          n1      n2      N
Agreement matrix for kappa statistic
(2 observers, dichotomous data)
                      OBSERVER B
OBSERVER A      Yes     No      TOTALS
Yes             69      15      84
No              18      48      66
TOTALS          87      63      150
K (Cont’d)
• Observed agreement (Po) = (a+d)/N = (69 + 48)/150 = 0.78 or
78%.
• Agreement expected by chance (Pe) is calculated from the products
of the marginal totals
(Pe) = [(f1*n1)/N + (f2*n2)/N] * 1/N
84 x 87/150 = 48.72
66 x 63/150 = 27.72
Then divide the sum [76.44] by 150 to get Pe = 0.51 or 51%.
K (Cont’d)
• $K = \frac{P_o - P_e}{1 - P_e} = \frac{0.78 - 0.51}{1 - 0.51} = \frac{0.27}{0.49} = 0.55$ or 55%
Kappa varies from -1 to +1, with a value of zero denoting
agreement no better than chance (negative values denote
agreement worse than chance).
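The worked example can be checked with a short sketch that follows the same cell labels (a, b, c, d) and marginal totals (f1, f2, n1, n2) as the agreement matrix. The function name is an assumption made for this illustration.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Return (Po, Pe, kappa) for a 2x2 agreement matrix with cells a, b, c, d."""
    n = a + b + c + d
    f1, f2 = a + b, c + d  # row totals
    n1, n2 = a + c, b + d  # column totals
    po = (a + d) / n                    # observed agreement
    pe = (f1 * n1 + f2 * n2) / n ** 2   # agreement expected by chance
    kappa = (po - pe) / (1 - pe)        # undefined if pe equals 1
    return po, pe, kappa


po, pe, kappa = cohen_kappa_2x2(69, 15, 18, 48)
print(round(po, 2), round(pe, 2), round(kappa, 2))  # 0.78 0.51 0.55
```

According to the interpretation scale used in this section, a kappa of 0.55 falls in the "moderate agreement" band.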
Reliability...
Interpretation of Kappa
• Poor agreement = Less than 0.20
• Fair agreement = 0.20 to 0.40
• Moderate agreement = 0.40 to 0.60
• Good agreement = 0.60 to 0.80
• Very good agreement = 0.80 to 1.00
Validity
• Validity (accuracy) is the degree to which the results of a
measurement correspond to the true state of the
phenomenon being measured.
• For clinical observations that can be measured by physical
means, the observed measurement is compared with some
accepted standard.
• Thus, it is relatively easy to establish validity.
• Some other clinical measurements such as pain, nausea, dyspnea,
depression, and fear cannot be verified physically.
• In patient care, information about these phenomena is usually
obtained informally by “taking a history.”
• More formal and standardized approaches, used in research, are
structured interviews and questionnaires.
Validity…
• A valid measurement thus requires both a valid method
(instrument for measurement) and a valid observer
(measurer).
• Individual questions (items) are designed to measure
specific phenomena (e.g., symptoms, feelings, attitudes,
knowledge, beliefs) called constructs
• Three general strategies are used to establish the validity of
measurements that cannot be directly verified physically.
Validity…
A. Content Validity
• Is the extent to which a particular method of measurement includes all
of the dimensions of the construct one intends to measure and nothing
more.
• For example, a scale for measuring pain would have content validity if
it included questions about aching, throbbing, pressure, burning, and
stinging, but not about itching, nausea, and tingling.
• It looks at whether the instrument adequately covers all the content
that it should with respect to the variable.
• In other words, does the instrument cover the entire domain related to
the variable, or construct it was designed to measure?
B. Criterion Validity
• Also called "concurrent validity" has to do with the
correlation between:
• measurement items on the one hand and known and
accepted standard measures or criteria on the other.
• It is any other instrument that measures the same variable
• Criterion validity is measured in three ways:
1. Convergent validity: shows that an instrument is
highly correlated with instruments measuring similar
variables.
Criterion Validity…
2. Divergent validity: shows that an instrument is poorly correlated with instruments that
measure different variables.
• For example, there should be a low correlation between an instrument that measures
motivation and one that measures self-efficacy.
3. Predictive validity: means that the instrument should have high correlations with
future criteria.
• Predictive validity is like concurrent validity except that there is a time lapse
between the criterion and test measures
• For example, a high self-efficacy score related to performing a task should predict
the likelihood of a participant completing the task.
C. Construct validity
• It refers to whether you can draw inferences about test scores related
to the concept being studied.
• Construct validation is the accumulation of evidence to support the
interpretation of what a measure reflects
• For example, if a person has a high score on a survey that measures
anxiety, does this person truly have a high degree of anxiety?
• There are three types of evidence that can be used to demonstrate a
research instrument has construct validity:
• Homogeneity—meaning that the instrument measures one construct.
• Convergence—this occurs when the instrument measures concepts
similar to that of other instruments.
• However, if there are no similar instruments available this will not be
possible to do.
• Theory evidence—this is evident when behaviour is similar to
theoretical propositions of the construct measured in the instrument.
• For example, when an instrument measures anxiety, one would expect
to see that participants who score high on the instrument for anxiety
also demonstrate symptoms of anxiety in their day-to-day lives
Summary of validity
• Validity is defined as the extent to which a concept is
accurately measured in a quantitative study
• Content validity: The extent to which a research
instrument accurately measures all aspects of a construct
• Construct validity: The extent to which a research
instrument (or tool) measures the intended construct
• Criterion validity: The extent to which a research
instrument is related to other instruments
Reliability versus Validity
• To assess the accuracy of any particular measuring
'instrument', we should distinguish between the reliability
of the data collected and their validity.
• Reliability is essentially the extent of the agreement or
consistency between repeated measurements
• Validity is the extent to which a method of measurement
provides a true assessment of that which it purports to
measure
[Figure: Reliability versus validity (panel labels: Reliable, Valid, Not Reliable, Not Valid)]
Stages in the Data Collection Process
Three main stages can be distinguished:
Stage 1: Permission to proceed
Stage 2: Data collection
Stage 3: Data handling
DATA PROCESSING, ANALYZING
& INTERPRETATION
 Data are numbers which can be measured or can be
obtained by counting.
 Data are sources of facts or information from which
conclusion can be drawn after they are statistically
treated in some way.
 They are the raw material for statistics.
• Data processing and analysis should start in the field,
with checking for completeness of the data and
performing quality control checks, while sorting the data
by instrument used and by group of informants.
• Data of small samples may even be processed and
analyzed as soon as it is collected.
Data processing, analyzing & interpretation
 Data processing involves:
• Data entry
• Data coding
• Data categorizing
• Data cleaning
Analyzing & interpretation
 WHAT IS DATA PROCESSING?
• Data processing refers to:
• Data entry onto a computer
• Data coding
• Data categorizing
• Data checks and correction
• The aim of this process is to produce a relatively “clean”
data set which may be imported into a statistical package.
• When to start?
Data pro…
Why process data?
 It helps the researchers to assure that :-
• All the information one needs has been collected, and in a
standardized way;
• She/he has not collected unnecessary data which will never be
analyzed.
• Provide better insight into the feasibility of the analysis to be
performed as well as the resources that are required.
• It assures the appropriateness of the data collection tools that
he/she needs.
Data can be processed:-
• Manually, using data master sheets, manual sorting,
or tally counts.
• By computer, using existing software for data analysis
(e.g., SPSS, Epi…).
 Computer compilation consists of the following
steps:
1. Choosing an appropriate computer program
2. Data entry
3. Verification or validation of the data
4. Programming (if necessary)
5. Computer outputs/prints
I. Data entry
• Data entry concerns the transfer of data from a
questionnaire to a computer file.
• It is a process of entering raw data into a computer
• It is a process where raw data could be manipulated
and changed
• Data is coded and entered into a computer
Data entry….
• We can use any software
• EPI info
• EPI 6 (DOS format)
• EPI-data (Dutch format)
• SPSS for windows
• Excel (office)
• Access (office)
• etc
Selection of data entry software
• There is different computer software for data entry
• Software is selected based on:
• Its ability to protect entered data against unintended changes
• Its lower cost
• The presence of built-in consistency checks
• Non-visibility of the whole data set to the data entry clerk
• Its ability to support and validate double entry
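Double entry means that the same questionnaires are keyed in twice and the two files are compared. A minimal sketch of that comparison is shown below; the record layout (a dictionary keyed by questionnaire number) and the values are hypothetical and not tied to any particular data entry package.

```python
def entry_discrepancies(first_entry, second_entry):
    """Return (record_id, variable, value1, value2) for every mismatch."""
    problems = []
    for record_id, row1 in first_entry.items():
        row2 = second_entry.get(record_id, {})
        for variable, value1 in row1.items():
            value2 = row2.get(variable)
            if value1 != value2:
                problems.append((record_id, variable, value1, value2))
    return problems


# Hypothetical first and second entry of two questionnaires
first = {1: {"age": 34, "sex": 1}, 2: {"age": 27, "sex": 2}}
second = {1: {"age": 34, "sex": 1}, 2: {"age": 72, "sex": 2}}
print(entry_discrepancies(first, second))  # [(2, 'age', 27, 72)]
```

Each reported mismatch would then be checked against the paper questionnaire. Note that this sketch only walks through records present in the first entry, so records keyed only in the second file would need a separate check.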
DATA ENTRY...
 Who does data entry?
• Data are often entered into a computer by a clerk who may
not be familiar with how the research was designed &
how the data were collected.
• To facilitate data entry and minimize errors, the data
entry person should not make guesses, calculations,
coding etc.
• Data entry is quick and easy for the data entry person if
he/she simply types the information which is seen on the
answer sheet (i.e. direct data entry).
DATA ENTRY...
If the questionnaire is adequately designed, direct data entry is
possible if:
• Answers are put in separate column or separate answer sheet
• Documents are edited before data entry
• Closed ended questions are pre-coded etc.
When working with computers, remember to:
• Save your work frequently
• Keep back-ups (more than one copy)
• Share time with other users etc.
2. DATA CODING
• For computers to work their magic they must be able to read your
data. In general, computers work best with numbers.
 Alphabetic codes and open ended responses must be translated to
numbers through the process called “coding”.
Coding:
 is assigning a separate (non-overlapping) numerical code
for separate answers and missing values
 E.g. instead of using “Male” and “Female” for the
variable sex, it can be indicated as:
1= Male, 2= Female
Coding may be pre-coding, post-coding or recoding
• Pre-coding: done when the questionnaire is being written.
• Post-coding: done after respondents have answered the
questions.
• Used for open-ended questions for which
response categories cannot be anticipated.
• Recoding: changes earlier coding to facilitate
meaningful analysis.
Coding missing values:
• Missing values occur when measurements were not
taken, respondents did not answer, etc.
• In general, missing values should not be entered as a
“blank” because some statistical packages interpret
blanks as zeros.
• Ideally, a code should be chosen to denote a missing
value (e.g. code “9”, “99” or “999” is often used for
missing values).
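The handling of missing values can be sketched as follows. The choice of 999 as the missing code for age, the function names and the sample entries are assumptions made only for illustration; the point is that the code lies outside the range of real values and is excluded before any calculation.

```python
MISSING = 999  # assumed missing-value code, chosen outside the valid range of age

def encode_age(raw_value):
    """Return the age as an integer, or the missing code if nothing was recorded."""
    text = raw_value.strip()
    return int(text) if text else MISSING

def mean_ignoring_missing(values):
    """Average the recorded values, leaving the missing code out of the calculation."""
    recorded = [v for v in values if v != MISSING]
    return sum(recorded) / len(recorded)

raw_ages = ["34", "", "27", " ", "45"]   # made-up entries; blanks were not answered
ages = [encode_age(a) for a in raw_ages]
print(ages)
print(mean_ignoring_missing(ages))
```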
DATA CODING...
Who does the coding?
• The principal investigator should coordinate the coding
process, and ideally all the coding should be done by one
person.
• Certainly, no more than three different people should be
involved in this process.
• If the work is done by more than one person, they should
all use the same codebook.
• Codebook: essentially a list of the variables entered
in each column and the codes associated with the values of
those variables.
The codebook provides:
o A guide used in the coding process
o The location of the variables
o The assignment of values to each variable
o A list of the code assignments for the values of
each variable
o A means of decoding back to the original values when
reporting.
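A codebook can be pictured as a simple lookup structure. The sketch below is one possible layout, assuming hypothetical variable names (sex, ever_smoked) and column positions; it shows how stored codes are decoded back to labels when reporting.

```python
# Hypothetical codebook: variable name -> column position and value codes.
CODEBOOK = {
    "sex": {"column": 1, "codes": {1: "Male", 2: "Female", 9: "No response"}},
    "ever_smoked": {
        "column": 2,
        "codes": {1: "Yes", 2: "No", 8: "Don't know", 9: "No response"},
    },
}

def decode(variable, value):
    """Translate a stored code back to its original label for reporting."""
    return CODEBOOK[variable]["codes"][value]

record = {"sex": 2, "ever_smoked": 8}   # one made-up coded record
labels = {name: decode(name, value) for name, value in record.items()}
print(labels)
```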
DATA CODING...
Coding conventions
• Common responses should have the same code in each
question, as this minimizes mistakes by coders.
• For example:
• Yes (or positive response) code - Y or 1
• No (or negative response) code - N or 2
• Don’t know code - D or 8
• No response/unknown code - U or 9
3. Data categorizing
• Decisions have to be made concerning how to
categorize responses.
• For categorical variables that are investigated
through closed questions or observation, the categories
have been decided upon beforehand.
• In interviews, the answers to open-ended questions (for
example, ‘Why do you visit the health centre?’) can
be pre-categorised to a certain extent, depending on
the knowledge of possible answers that may be given.
• However, there should always be a category called ‘Others,
specify . . .’, which can only be categorised afterwards.
• For numerical variables, the data are often better collected
without any pre-categorisation, especially if you do not know
exactly the range and the dispersion of the different values of
these variables when you collect your sample.
Examples:
• Home-clinic distance for out-patients
• Income
• Age
• Weight
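Collecting the exact value first and categorising afterwards can be sketched as below. The cut-off points (15 and 50 years) and the sample ages are assumptions chosen only to illustrate the idea; in practice the categories would be decided after looking at the range and dispersion of the collected data.

```python
def age_group(age):
    """Assign an exact age in years to one of the illustrative categories."""
    if age < 15:
        return "<15"
    if age < 50:
        return "15-49"
    return "50+"

ages = [4, 15, 32, 49, 50, 71]   # made-up ages collected without pre-categorisation
groups = [age_group(a) for a in ages]
print(groups)
```

Since the exact ages are kept, the same data can later be regrouped with different cut-offs, which would not be possible if only the category had been recorded.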
4. Data cleaning
• Once data are entered, the next step is data cleaning.
• Data cleaning is the process of checking that the data entered in
a computer (soft copy) match the hard copy on
paper.
• The aim of this process is to produce a clean set of data
for statistical analysis.
• It involves checking for errors, impossible or implausible values
and inconsistencies that may be due to the coding or data
entry process.
• No matter how carefully the data have been entered, some
errors are inevitable.
DATA CLEANING…
Errors can result from:
• Incorrect reading
• Incorrect reporting
• Incorrect filling
• Incorrect sensing
• Incorrect coding
• Incorrect typing, etc.
Data cleaning takes place on three occasions:
1. During template formation
2. During data entry
3. After data is entered
(the more of the above cleaning processes we combine,
the more valid our data will be)
DATA CLEANING…
I. Cleaning during template formation
• Checks are programmed into the data entry template when it is built
• The program is set up:
• By limiting the values that can be entered for a variable
• By looking for consistency of values
• By providing a good skip pattern
• By controlling for must-enter fields
• By making the computer calculate and check for
consistency, etc.
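The kinds of checks listed above can be sketched in a few lines. This is not the syntax of any data entry package; the field names (id, sex, age, pregnant), the age range of 0 to 120 and the sample records are hypothetical and serve only to show a must-enter rule, legal values, a range limit and a skip pattern.

```python
def check_record(record):
    """Return a list of problems found in one entered record (empty if none)."""
    problems = []
    # Must-enter: the identification number may not be left empty.
    if record.get("id") in (None, ""):
        problems.append("id is required")
    # Legal values: sex accepts only the codes 1 and 2.
    if record.get("sex") not in (1, 2):
        problems.append("sex must be 1 or 2")
    # Range limit: age must fall within an assumed plausible range.
    age = record.get("age")
    if age is None or not 0 <= age <= 120:
        problems.append("age out of range")
    # Skip pattern: the pregnancy question applies only to females (sex == 2).
    if record.get("sex") == 1 and record.get("pregnant") is not None:
        problems.append("pregnant must be skipped for males")
    return problems

good = {"id": "001", "sex": 2, "age": 28, "pregnant": 1}
bad = {"id": "", "sex": 1, "age": 150, "pregnant": 1}
print(check_record(good))
print(check_record(bad))
```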
II. Cleaning during data entry
a. Using two computers
• Two data clerks enter the same data on two computers
• The data entered on the two computers are compared and validated for
similarity
• When there is a difference, a corrective measure (based
on the hard copy) is taken
[Diagram: the data from the two entries are compared to give validated data]
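The comparison of two independent entries can be sketched as follows. The record identifiers, variable names and values are made up, and the function compare_entries is a hypothetical helper rather than a feature of any package; each mismatch it reports would then be resolved against the hard copy.

```python
def compare_entries(first, second):
    """Return (record id, variable, first value, second value) for every mismatch."""
    mismatches = []
    for record_id in sorted(first):
        other_record = second.get(record_id, {})
        for variable, value in first[record_id].items():
            other = other_record.get(variable)
            if value != other:
                mismatches.append((record_id, variable, value, other))
    return mismatches

# Made-up entries of the same two questionnaires by two clerks.
entry_a = {"001": {"sex": 1, "age": 34}, "002": {"sex": 2, "age": 27}}
entry_b = {"001": {"sex": 1, "age": 43}, "002": {"sex": 2, "age": 27}}
differences = compare_entries(entry_a, entry_b)
print(differences)
```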
b. Double entry using a single computer
• It is also possible to do double entry using the same
computer
• In EpiData version 3.1, when the second entry differs from the
first, an alert sound is heard and a corrective measure can be taken
• Counter-checking of the entered data by the principal investigator
is another method
Other cleaning methods
• Counter-checking 5 to 10% of the data entered each day
is useful
III. Cleaning after data entry is completed
• It is done by:
• Running simple frequencies,
• Tabulating variables for consistency, and
• Sorting (in SPSS)
• Outliers and missing values are usually evaluated
(against the hard copy)
• Giving the same serial number to the hard and soft copies makes
things simple
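A simple frequency and a range check after entry can be sketched as below. The records, the planted typing errors and the plausible age range of 0 to 120 are assumptions for illustration; the serial numbers returned are the ones to look up in the hard copy.

```python
from collections import Counter

def frequency(values):
    """Simple frequency table: how often each code occurs."""
    return Counter(values)

def implausible(records, variable, low, high):
    """Serial numbers of records whose value lies outside [low, high]."""
    return [r["serial"] for r in records if not low <= r[variable] <= high]

records = [
    {"serial": 1, "sex": 1, "age": 34},
    {"serial": 2, "sex": 2, "age": 27},
    {"serial": 3, "sex": 3, "age": 270},   # typing errors planted for illustration
]
sex_table = frequency(r["sex"] for r in records)
print(sex_table)
print(implausible(records, "age", 0, 120))
```

The frequency table exposes the illegal code 3 for sex, and the range check points to serial number 3 for the implausible age.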
Analysis of epidemiological study
• Quantitative data analysis is making sense of the
numbers to permit meaningful interpretation
It involves:
1. organizing the data
2. doing the calculations
3. interpreting the information
• lessons learned
4. explaining limitations
Analysis of epidemiological study…
Prerequisites for analysis
1. Being well acquainted with the objectives of the study
2. Knowledge of the types of variables (dependent/
independent)
3. Knowledge of the measurement of variables
4. Knowledge of the type of analysis needed for each
objective (and design)
5. Knowledge of the statistics to be done
6. Selection of statistical software for analysis
Awareness of study objectives
• Research is carried out principally to answer the study
questions
• Our:
• Results should answer the objectives (study questions)
• Discussion should interpret what the results mean in
answering the objectives
• Conclusion should be based on the answers to the objectives
• Recommendations should also be based on the findings, not on
wishes
Knowledge of types of analysis and study design
• Each study design has a distinct type of analysis
• For descriptive designs, analysis may be based on data
summary (point estimates) and parameter estimation
(confidence intervals)
• For analytic studies, analysis is based on comparison
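For a descriptive design, a point estimate with its confidence interval can be sketched as below. The normal-approximation (Wald) interval is used here as one common choice, not as the method prescribed by these slides, and the survey counts (60 cases among 400 people) are made up.

```python
import math

def prevalence_with_ci(cases, n, z=1.96):
    """Point estimate of a proportion and its approximate 95% (Wald) interval."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)          # standard error of the proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

p, lower, upper = prevalence_with_ci(cases=60, n=400)   # made-up survey counts
print(round(p, 3), round(lower, 3), round(upper, 3))
```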
Components of Data Analysis
• Data processing
o Data entry
o Coding
o Cleaning
• Descriptive/exploratory
o Frequencies
o Tables and graphs
o Cross tabulations (chi-squares, Spearman’s correlation, …)
o Measures of central tendency and variations
o Proportions/percentages
• Analytic/inferential
o Estimation
o Confidence intervals (P-value, OR, …)
o Hypothesis testing
o Statistical models
Tadesse A., 2013
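Two of the measures named above, the chi-square statistic of a cross tabulation and the odds ratio (OR), can be sketched for a 2x2 table. The cell counts are made up, and the chi-square shown is the Pearson statistic without continuity correction, which is an assumption about which variant is wanted.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    return (a * d) / (b * c)

def chi_square(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

a, b, c, d = 30, 70, 15, 85   # made-up cell counts
or_value = odds_ratio(a, b, c, d)
chi2 = chi_square(a, b, c, d)
print(round(or_value, 2), round(chi2, 2))
```

The statistic is compared with the chi-square distribution with 1 degree of freedom to obtain a P-value, a step left to the statistical software.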
Statistical Inference
• A number of statistical models may be appropriate for the data
we have in hand, depending on different factors
• These factors are:
• Objective of the study
• Study design
• Nature of the variable
• Distribution of the variable
• The nature of the data
• Sample size
• The number of groups we want to compare
Different t-tests
• If we want to test whether there is a significant difference
between two independent groups:
• Independent sample t-test
• If our aim is to compare two dependent groups
(measurements before and after treatment, two
measurements taken from each individual in a group, …):
• Paired sample t-test
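The difference between the two tests can be sketched by computing their test statistics. The pooled-variance (Student) form is assumed for the independent sample t-test, and all measurements are made up; the P-value would be read from the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def paired_t(before, after):
    """Paired t statistic and degrees of freedom from the within-pair differences."""
    diffs = [a - b for a, b in zip(after, before)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Made-up data: two separate groups, then the same four people measured twice.
t_ind, df_ind = independent_t([5, 6, 7, 8], [1, 2, 3, 4])
t_pair, df_pair = paired_t(before=[120, 130, 125, 140], after=[115, 128, 120, 133])
print(round(t_ind, 3), df_ind)
print(round(t_pair, 3), df_pair)
```

The paired test works on the differences within each individual, which is why it needs the two measurements to be matched one to one.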
Regressions
• Linear regression
• If the response variable (y) is continuous
Simple linear regression
$y = \beta_0 + \beta_1 x + \varepsilon$
Multiple linear regression
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i + \varepsilon$
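How the intercept and slope of a simple linear regression are estimated can be sketched with ordinary least squares. The function name and the data are assumptions for illustration; the data were chosen to lie exactly on a line so that the estimates are easy to check.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates (intercept, slope) for a straight-line model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope = sum of cross-products of deviations / sum of squared x deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # made-up data lying exactly on y = 1 + 2x
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)
```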
Regression…
• When the response variable is categorical
• Logistic regression
Binary logistic regression
Bivariate logistic regression
$\mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = \alpha + \beta x$
Multiple logistic regression
$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
• Analysis of variance (ANOVA)
• Survival analysis
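The link between the log odds and the probability in a logistic model with one predictor can be sketched as below. The coefficient values (alpha = -2, beta = 0.5) are made up and are not estimates from any data set.

```python
import math

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))

def predicted_probability(alpha, beta, x):
    """Probability implied by a log odds of alpha + beta * x."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

alpha, beta = -2.0, 0.5                   # made-up coefficients
p4 = predicted_probability(alpha, beta, 4)
p6 = predicted_probability(alpha, beta, 6)
print(p4, round(p6, 3), round(math.exp(beta), 3))
```

Taking the logit of a predicted probability returns alpha + beta * x, and exp(beta) is the odds ratio for a one-unit increase in x.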
Thank You
!!

Study design & instrument

  • 2. Objectives At the end of this section, the students will be able to: • Differentiate study design that is relevant for specific research question • Differentiate appropriate variable for specific study • Explain population, sample size calculation and sampling techniques method for quantitative studies
  • 3. Methodology Possible sub-sections of the methodology: • The study design • Setting/Area • Population of the study • Sample size and sampling strategies • Variables 3
  • 4. Study area • Location, Physical features (climate, altitude...), Population size and composition • Infrastructures -education, health, communication… • Economy Study period Time required to conduct the study 4
  • 5. Choosing study designs A study design is a specific plan or protocol for conducting the study, which allows the investigator to translate the conceptual hypothesis into an operational one. • A study design is the process that guides researchers on how to collect, analyze and interpret observations. • It is a logical model that guides the investigator in the various stages of the research. 5
  • 6. • Does it adequately test the hypotheses? • Hypotheses determine participants, variables measured & data analysis methods • Are results generalizable? • Replicate to other samples and other contexts • Random selection of participants • Does it identify and control extraneous factors? • Eliminate alternative explanations for results to increase confidence in cause-effect conclusion (internal validity) • Control depends on type of design How to Choose a Research Design 6
  • 7. How to Choose a Research Design… • Can the hypothesis be rejected or retained via statistical means? (statistical conclusion validity) • Need reliable measures • Need large enough sample to detect true effect • Is the design efficient in using available resources? • Optimal balance between research design, time, resources and researcher expertise 7
  • 8. Eg. If the problems calls for: a) the identification of factors that influence an outcome b) the utility of an intervention, or c) knowing the prevalence of diseases Then a quantitative approach is best • If the research problem needs a concept or phenomenon needs to be understood because little research has been done on it, then qualitative approach is preferred • A mixed methods design is useful when either the quantitative or qualitative approach by itself is inadequate to best understand a research problem How to Choose problems call for specific approaches… 8
  • 9. 9
  • 10. 10
  • 11. Features of Descriptive Study • Studies occurrence of diseases with respect to time, place and person. • Useful for health managers to allocate resource and to plan effective prevention programs. • Useful to generate epidemiological hypothesis in the search for disease risk factors. 11
  • 12. Features of Descriptive Study… • Not aimed specifically to test a hypothesis • No attempt to gather data on controls • Inexpensive and less time-consuming: can use information collected routinely. • Most common type of epidemiological study in the medical literature. 12
  • 13. Case Report and Case Series • Documentation of unusual medical occurrence with detailed description • First clue in the identification of new disease or adverse effect of exposure • The profile of a single patient is reported in detail by one or more clinicians 13
  • 14. Case Series • Collection of individual case reports, which occurs with in fairly short period of time. • Is an individual case report that has been expanded to include a number of patients with a given disease • Helps in identifying the beginning or presence of epidemic. • Helps in hypothesis formulation • Lack comparison group 14
  • 15. Case report/series: Limitations –Lack of denominator to calculate rates of disease –Lack of comparison group –No selection of appropriate study populations –Sampling variations – No sampling employed, emerging cases are reported. 15
  • 16. A cross-sectional study (survey)  Snapshot of the health status of populations at a certain point in time.  Compare prevalence of disease in persons with and without the exposure of interest  Cross-sectional studies must be done on representative samples of the population. 16
  • 17. 17
  • 18. Advantage of Cross-sectional • Provides prevalence estimates of exposure and disease. • Easier to perform than studies that require follow- up (hence relatively inexpensive). • Can evaluate multiple risk (and protective) factors and health outcomes at the same point in time 18
  • 19. Advantage … • May identify groups of persons at high or low risk of disease • Can be used to generate hypotheses about associations between predictive factors and disease outcomes 19
  • 20. Limitation of cross-sectional  Temporal(time) sequence between exposure and disease cannot be established * i.e. chicken-or-egg dilemma. Example: In the study of knowledge of modern contraceptive, did the women know about it and then start to use it or did they learn about it because they were using it. 20
  • 21. Analytic Studies • Focus on identifying risk factors • Always use comparison group • Test hypotheses • Relatively costly • Less often used than descriptive studies 21
  • 22. Case-control Studies • A case-control study is one in which persons with a condition ("cases") and suitable comparison subjects ("controls") are identified, and then the two groups are compared with respect to prior exposure. • Subjects are sampled by their outcome status. • Exposure status is assessed retrospectively. • Relatively cheap in time and cost. • The measure of association is the odds ratio.
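As a minimal sketch of the odds ratio named on this slide, the function below computes it as the cross-product ratio of a 2x2 table. The function name and the counts are hypothetical and serve only as an illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product ratio: odds of exposure among cases / odds among controls."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, not taken from any study.
example_or = odds_ratio(exposed_cases=40, unexposed_cases=60,
                        exposed_controls=20, unexposed_controls=80)
print(round(example_or, 2))
```

An odds ratio above 1 suggests that exposure is more common among cases than among controls; a value near 1 suggests no association.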
  • 24. Case control: numbers and ratio • A single control group is optimal in most studies. Conditions for multiple control groups: – when a single control group is considered not appropriate. – when the selected group has a specific deficiency that could be avoided by including another control group. Control-to-case ratio • The efficiency of a study increases with the control-to-case ratio, but there is little gain beyond a ratio of about 4:1.
  • 27. Cohort Studies • Cohort studies are epidemiologic designs that identify comparison groups according to their exposure status. • Disease free subjects are sampled by exposure status 27
  • 29. Characteristics of a Cohort Study • Groups of individuals defined on the basis of presence/absence of exposure to the suspected risk factor • All potential subjects must be initially free of the disease under investigation • Eligible participants are then followed over time to assess occurrence of disease 29
  • 30. Types of Cohort Studies •Classification of cohort studies depends on the temporal relationship between the initiation of the study and the occurrence of the outcome Prospective Cohort Study Retrospective Cohort Study 30
  • 31. Prospective Cohort Studies •The investigator collects information on the exposure status of the cohort members at the time the study begins, and identifies new cases of disease (or deaths) from that time forward •The exposures may have occurred at the beginning of study BUT the outcome has certainly not yet occurred. •After the selection of the cohort, participants must be followed over time to assess incidence of disease. E.g. identify oral contraceptive users and non-users; follow for the years to come and assess heart disease status. 31
  • 33. Retrospective Cohort Studies • Both the exposures and the outcomes have already occurred when the study is initiated. • Exposure status is established from information recorded at some time in the past, and disease incidence (or mortality) is determined from then until the present. 33
  • 34. Retrospective Cohort Studies… • Either interview the participants, or use medical records, to determine their subsequent history from that point to the present in terms of developing outcome. • Retrospective: deals with past events; can be done quickly • Cohort: the comparison is made between users and non- users of OCs 34
  • 35. Factors in Selection of Exposed Group • Frequency of the exposure of interest: ability of obtaining sufficient exposed individuals in a reasonable period of time • Ability to obtain complete and accurate exposure and outcome information on all study subjects 35
  • 37. Advantages of Cohort Design • Valuable when the exposure is rare • Allows direct measurement of risk • Can elucidate temporal relationship • Minimize bias in ascertainment of exposure • Can examine multiple effects of a single exposure 37
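To illustrate what "direct measurement of risk" can look like, here is a small sketch that computes the risk (cumulative incidence) in exposed and unexposed groups and their ratio. The function names and the counts are hypothetical.

```python
def cumulative_incidence(new_cases, persons_at_risk):
    """Risk = new cases during follow-up / disease-free persons at the start."""
    return new_cases / persons_at_risk


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of the risk in the exposed group to the risk in the unexposed group."""
    return (cumulative_incidence(cases_exposed, n_exposed)
            / cumulative_incidence(cases_unexposed, n_unexposed))


# Hypothetical cohort: 30 cases among 1000 exposed, 10 among 1000 unexposed.
rr = relative_risk(30, 1000, 10, 1000)
print(round(rr, 2))
```

Because everyone is disease-free at baseline and followed forward, both risks can be estimated directly, which is not possible in a case-control design.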
  • 38. Disadvantages of Cohort Design • Not suitable for rare diseases • Inefficient for diseases with a long latency or incubation period • Costly in terms of time and resources • Difficulty in obtaining complete information for all comparison groups • Loss to follow-up
  • 40. Experimental Studies Experimental designs are epidemiologic studies where: 1) Investigator manipulates the condition under study 2) Always prospective 40
  • 44. Classification of Intervention Studies: • Based on population • Clinical trial - usually performed in clinical setting and the subjects are patients. • Field trial- used in testing medicine for preventive purpose and the subjects are healthy people. • Community trial - a field trial in which the unit of the study is group of people/ community. 44
  • 45. Interventions that Can Be Evaluated • New drugs and new treatment of diseases • New medical and health care technology • New methods of primary prevention • New programs for screening • New ways of organizing and delivering health services • New behavioral intervention programs 45
  • 46. Ethical Considerations in experimental studies • Risks vs benefits • Comparison: Standard care vs placebo • Ethical approval • Informed consent & confidentiality • Freedom to withdraw • Duty of care • Stopping/Monitoring • Reporting findings • Quality: ‘Poor’ quality research is unethical! 46
  • 47. Experimental studies: Advantages • The major advantage of experimental studies lie in the strength of causal inference that can be made. – it is very difficult to make causal inferences based on observational studies. • Experimental studies offer the best design for controlling confounding variables. • Gold standard for epidemiologic research – Randomized Controlled Trials (RCTs) 47
  • 48. The Quality of “Gold Standard" • Randomization • Blinding • Use of Placebo 48
  • 49. Assignment 1. Review three articles that use one specific study design. From the articles you reviewed: • Write the characteristics of the study design used in the articles. • Describe the strengths and limitations of the study designs reviewed. • Which study design would need to be used in a repeat study to address those limitations, and how would it address them? • How do you calculate the sample size for that study design? 2. How do you develop data collection tools? • What are the sources of the tool? • How do you assess whether it is measuring what it is intended to measure? • For question 1, write the full name and the DOI (Digital Object Identifier) of each article.
  • 50. Sampling Methods ♣ Sampling involves the selection of a number of study units from a defined study population. ♣ The population is too large for us to consider collecting information from all its members. ♣ Instead we select a sample of individuals hoping that the sample is representative of the population. 50
  • 51. Sampling… • Importance of sampling: - Saves time and money - Measurements can be more accurate on a sample than on the entire population (census) • Defining the population: - Target population - Study population
  • 52. Sampling… When taking a sample, we will be confronted with the following questions: • What is the group of people from which we want to draw a sample? • How many people do we need in our sample? • How will these people be selected? • What are the errors to be confronted with when taking a random sample? 52
  • 53. Definitions of Population • Target population (reference population or source population): Is that population about which an investigator wishes to draw a conclusion. • Study population: Population from which the sample actually is drawn • Sampling unit: The unit of selection in the sampling process. For example, in a sample of districts, the sampling unit is a district; in a sample of persons, a person etc. 53
  • 54. Definitions of Population… • Study unit: The unit on which the observations will be collected. For example, persons in a study of disease prevalence, or households in a study of family size. N.B. The sampling unit is not necessarily the same as the study unit. • Sampling frame: The list of units from which the sample is to be selected. The existence of an adequate and up-to-date sampling frame often defines the study population.
  • 55. What is a defined population? ♣ The problem of obtaining a sample which is representative of a larger population needs special attention. ♣ The population under consideration should be clearly defined. ♣ It is only after having such a clearly defined population (i.e., in terms of geographical area, type of study subjects, etc. ) that the selection of the random sample could take place ♣ What are the main reasons for the necessity of such “clear definitions of the population”? 55
  • 56. How are the study subjects selected? ♣ An important issue influencing the choice of the most appropriate sampling method is whether a sampling frame is available (can be maintained), that is, a listing of all the units that compose the population. ♣ Two broad areas: Non-probability sampling method and probability sampling method 56
  • 57. Sampling methods… ♣ Non-probability sampling methods - used when a sampling frame does not exist Examples: • Convenience sampling • Quota sampling • These sampling methods do not claim to be representative of the entire population. When do you use these techniques? 57
  • 58. b) Probability sampling methods ♣ They involve random selection procedures to ensure that each unit of the sample is chosen on the basis of chance. ♣ All units of the population should have an equal, or at least a known, chance of being included in the sample. ♣ Sample findings can be generalized.
  • 59. b) Probability sampling methods 1. Simple Random Sampling 2. Systematic Sampling 3. Stratified sampling 4. Cluster sampling (all selected clusters will be considered –take care of clustering effect) 5. Multi-Stage Sampling (consider the design effect) 59
  • 60. Simple random sampling (SRS) • This is the most basic scheme of random sampling. • Each unit in the sampling frame has an equal chance of being selected. • On average, the sample is representative of the population. • However, it is costly to conduct SRS. • Moreover, minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
  • 61. Simple random sampling (SRS)… To select a simple random sample you need to: • Make a numbered list of all the units in the population from which you want to draw a sample. • Number each unit on the list in sequence from 1 to N (where N is the size of the population). • Decide on the size of the sample. • Select the required number of study units, using a “lottery” method or a table of random numbers.
  • 62. Simple random sampling… Lottery method : for a small population it may be possible to use the “lottery” method: each unit in the population is represented by a slip of paper, these are put in a box and mixed, and a sample of the required size is drawn from the box. Table of random numbers: if there are many units, however, the above technique soon becomes laborious. Selection of the units is greatly facilitated and made more accurate by using a set of random numbers in which a large number of digits is set out in random order. The property of a table of random numbers is that, whichever way it is read, vertically in columns or horizontally in rows, the order of the digits is random. Computer generated random list: 62
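A computer-generated random list can be sketched with Python's standard `random` module. The function name, the population size of 1200 and the seed are assumptions chosen for illustration; the units are taken to be numbered 1 to N as described above.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Hypothetical frame of 1200 numbered units, sample of 100.
selected = simple_random_sample(population_size=1200, sample_size=100, seed=2024)
print(selected[:10])
```

`random.sample` draws without replacement, so no unit can be selected twice, which mirrors drawing slips from a box without putting them back.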
  • 63. Systematic Sampling • Individuals are chosen at regular intervals ( for example, every kth) from the sampling frame. • The first unit to be selected is taken at random from among the first k units. • For example, a systematic sample is to be selected from 1200 students of a school. • The sample size is decided to be 100. The sampling fraction is: 100 /1200 = 1/12. 63
  • 64. Systematic Sampling… • The number of the first student to be included in the sample is chosen randomly, for example by blindly picking one out of twelve pieces of paper, numbered 1 to 12. • If number 6 is picked, every twelfth student will be included in the sample, starting with student number 6, until 100 students are selected. • The numbers selected would be 6,18,30,42,etc. 64
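The school example can be sketched as follows. The function name is hypothetical, and the sampling interval is assumed to be the population size divided by the sample size, rounded down.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit, beginning at a random start among the first k units."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return [start + i * k for i in range(sample_size)]


# 1200 students, sample of 100, and student number 6 picked as the start.
students = systematic_sample(population_size=1200, sample_size=100, start=6)
print(students[:4])  # → [6, 18, 30, 42]
```

Passing `start` explicitly reproduces the slide's example; leaving it out lets the function pick the first unit at random, as the method requires.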
  • 65. Merits • Systematic sampling is usually less time consuming and easier to perform than simple random sampling. It provides a good approximation to SRS. • Unlike SRS, systematic sampling can be conducted without a sampling frame (useful in some situations where a sampling frame is not readily available). E.g. In patients attending a health center, where it is not possible to predict in advance who will be attending. 65
  • 66. Demerits • If there is any sort of cyclic pattern in the ordering of the subjects which coincides with the sampling interval, the sample will not be representative of the population. 66
  • 67. Stratified Sampling • Appropriate when the distribution of the characteristic to be studied is strongly affected by certain variable (heterogeneous population). • The population is first divided into groups (strata) according to a characteristic of interest (eg., geographic area, prevalence of disease, etc.) • A separate sample is taken independently from each stratum, by simple random or systematic sampling. • proportional allocation - if the same sampling fraction is used for each stratum. • non- proportional allocation - if a different sampling fraction is used for each stratum or if the strata are unequal in size and a fixed number of units is selected from each stratum. 67
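Proportional allocation can be sketched as below: each stratum receives a share of the sample equal to its share of the population. The stratum names and sizes are hypothetical, and the largest-remainder rounding rule is one reasonable choice rather than a prescribed method.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata using the same sampling fraction."""
    total = sum(strata_sizes.values())
    # Whole-number part of each stratum's proportional share.
    allocation = {name: sample_size * size // total
                  for name, size in strata_sizes.items()}
    # Units lost to rounding down go to the strata with the largest remainders.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(strata_sizes,
                          key=lambda name: sample_size * strata_sizes[name] % total,
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical strata for a population of 1200 and a sample of 100.
strata = {"urban": 600, "rural": 400, "peri_urban": 200}
print(proportional_allocation(strata, 100))
```

A separate simple random or systematic sample of the allocated size would then be drawn within each stratum.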
  • 68. Stratified Sampling: • Merit - The representativeness of the sample is improved. That is, adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required. • Demerit - sampling frame for the entire population has to be prepared separately for each stratum. 68
  • 69. Cluster sampling • The selection of groups of study units (clusters) instead of the selection of study units individually. • The sampling unit is a cluster, and the sampling frame is a list of these clusters. • Procedure: - the reference population (homogeneous) is divided into clusters; these clusters are often geographic units (e.g. districts, villages, etc.) - a sample of such clusters is selected - all the units in the selected clusters are studied. • It is preferable to select a large number of small clusters rather than a small number of large clusters.
  • 70. Cluster sampling… • Merit - A list of all the individual study units in the reference population is not required. It is sufficient to have a list of clusters. • Demerit - It is based on the assumption that the characteristic to be studied is uniformly distributed throughout the reference population, which may not always be the case. 70
  • 71. Multi-stage sampling • This method is appropriate when the reference population is large and widely scattered. • Selection is done in stages until the final sampling unit (eg., households or persons) are arrived at. • The primary sampling unit (PSU) is the sampling unit (usually large size) in the first sampling stage. • The secondary sampling unit (SSU) is the sampling unit in the second sampling stage. etc. • Example - The PSUs could be kebeles and the SSUs could be households. 71
  • 72. Multi-stage sampling • Merit - Cuts the cost of preparing a sampling frame. • Demerit - Sampling error is increased compared with a simple random sample. • Multistage sampling gives less precise estimates than simple random sampling for the same sample size, but the reduction in cost usually far outweighs this and allows for a larger sample size. • That is, a design effect needs to be considered.
  • 73. What are the errors to be confronted with when taking a random sample? ♣ When we take a sample, our results will not exactly equal the correct results for the whole population. That is, our results will be subject to errors. This error has two components. a) Sampling error (i.e., random error) b) Non Sampling error (i.e., bias) 73
  • 74. Sampling error (i.e., random error) •Random error consists of random deviations from the true value, which can occur in any direction. • Sampling error (random error) can be minimized by increasing the size of the sample 74
  • 75. Non Sampling error (i.e., bias) • Bias consists of systematic deviations from the true value, always in the same direction • It is possible to eliminate or reduce the non-sampling error (bias) by careful design of the sampling procedure and by taking care of the errors that may arise during data analysis. 75
  • 76. 2. Nonprobability Sampling • Here, the sample is less likely to be representative of the population, thus it is difficult to extrapolate from the sample to the population. • Is used when there is no sampling frame or when it is impossible to conduct probability sampling due to economical and feasibility factors. 76
  • 77. Nonprobability Sampling Cont.. • Judgmental or Purposive Sampling: The researcher chooses the sample based on who he/she thinks would be appropriate for the study. • Convenience Sampling: The selection of units from the population is based on availability and/or accessibility. • Quota Sampling: It starts with systematically setting a “quota” to represent subgroups of a population. Then data are collected to meet the predefined quota. • Snowball Sampling: The researcher begins by identifying someone who meets the inclusion criteria of the study. That study subject is then asked to recommend others he/she knows who also meet the criteria.
  • 78. Sample size estimation for a study 78
  • 79. How do we ensure enough precision to make good programmatic decisions after the results are analysed? • We calculate the sample size we will need in our study before we start collecting data • If a study has not calculated the sample size beforehand, it may have wasted resources gathering more data than necessary. • Supervision and training are more difficult with more teams and study workers • Not calculating sample size can lead to a greater chance of having bias in the results.
  • 80. Sample size determination for study • Sample size: the number of study subjects required to estimate a population parameter • Any sample size will give you an estimate of the population parameter • However, the larger the sample, the higher the precision • Always calculate the sample size that gives the required precision! • Too large a sample size: too expensive and time consuming • Too small a sample size: inadequate precision to give a good estimate or to show a difference
  • 81. Sample size depends on 1. Estimated variability (denoted by P) 2. The precision, or margin of error (denoted by d) 3. The sampling method (clustering; design effect) (denoted by g) 4. Size of the population 5. Feasibility (cost) 6. Confidence level (Z value of certainty) 7. Non-response rate
  • 82. Sampling variability and precision • If the whole population is examined, there is no uncertainty. • If a sample is taken, sampling variability is introduced, which determines the level of precision. • An estimate based on a sample depends on chance: it depends on which of the numerous possible samples of study subjects is selected.
  • 83. 1. Estimated variability • This is the expected proportion of the event in the population, estimated from a similar population • It is usually taken from a similar study in the literature • It can also be obtained from a pilot study • In the absence of a way to estimate this variability, the maximum variability of 50% is used • If there are two similar studies giving estimates for the population, it is good to take the one nearer to 50%, because it gives the larger sample size.
  • 84. 2. What is precision (margin of error)? • It is the range around the sample estimate within which the true population value is expected to lie at the postulated confidence level. • It is based on the standard error of the sample mean or proportion. • The standard error is the standard deviation divided by the square root of the sample size. • The standard error shrinks as the sample size grows, so a larger sample gives higher precision. • For a mean: $\mathrm{SE} = \dfrac{\mathrm{SD}}{\sqrt{n}}$ • For a proportion: $\mathrm{SE} = \sqrt{\dfrac{p(1-p)}{n}}$
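The two standard-error expressions on this slide (standard deviation over the square root of n for a mean, and the square root of p(1-p)/n for a proportion) can be sketched as below. The function names and the input values are hypothetical.

```python
from math import sqrt


def se_mean(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / sqrt(n)


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


# Quadrupling the sample size halves the standard error.
for n in (100, 400, 1600):
    print(n, round(se_proportion(0.10, n), 4))
```

The loop shows why precision is bought at an increasing cost: each halving of the standard error needs four times as many subjects.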
  • 85. Cont’d … • A measure of how close an estimate is to the true value of a population parameter. • For example: - a prevalence of 10% from a sample of size 20 gives a 95% confidence interval of 1% to 31%, which is not very precise or informative. - a prevalence of 10% from a sample of size 400 gives a 95% confidence interval of 7% to 13%, which is sufficiently accurate. • Sample size calculations help to avoid this situation.
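The two intervals quoted above can be approximated with an exact binomial (Clopper-Pearson) interval. The slide does not say which method produced its figures, so the choice of method here is an assumption, as are the function names; the sketch uses only the standard library and solves for the limits by bisection.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_increasing(g, target, iterations=60):
    """Find p in [0, 1] with g(p) = target, for g increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(cases, n, confidence=0.95):
    """Exact binomial confidence interval for a proportion cases / n."""
    alpha = 1 - confidence
    # Lower limit: the p at which P(X >= cases) equals alpha / 2.
    lower = 0.0 if cases == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(cases - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= cases) equals alpha / 2.
    upper = 1.0 if cases == n else solve_increasing(
        lambda p: 1 - binom_cdf(cases, n, p), 1 - alpha / 2)
    return lower, upper


for cases, n in ((2, 20), (40, 400)):  # both samples have 10% prevalence
    low, high = exact_ci(cases, n)
    print(n, f"{low:.1%} to {high:.1%}")
```

Both samples give the same point estimate, yet the interval from 20 subjects is several times wider than the one from 400.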
  • 86. Cont’d… • The precision is chosen by the researcher according to how closely the estimate should approach the true population proportion • A precision of 1-3% is chosen if resources allow and high precision is required • It is usually set between 1% and 5% when the variability estimate is expressed as a percentage
  • 87. 3. The sampling method (clustering) • Sample size is also related to the sampling method • If cluster sampling is used, the random error will be larger, which lowers the power • People within a cluster tend to be homogeneous, while clusters differ more from one another • This homogeneity within a cluster can produce higher imprecision • Therefore, taking a higher number of clusters with few individuals in each cluster can lower the design effect
  • 88. Cont’d… • The design effect is the ratio of the variance produced by cluster sampling to the variance produced by SRS • It typically ranges from 1.5 to 10 • If a correction is not made, the random error is larger than planned and the estimate may deviate from the true value • In cluster sampling, the sample size calculated for SRS is multiplied by the estimated design effect
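The correction described in the last bullet can be sketched as a one-line adjustment. The function name, the starting sample size of 384 and the design effect values are hypothetical.

```python
from math import ceil


def adjust_for_design_effect(n_srs, design_effect):
    """Inflate a sample size calculated for SRS by the estimated design effect."""
    if design_effect < 1:
        raise ValueError("a design effect below 1 is not expected for cluster sampling")
    return ceil(n_srs * design_effect)


# Hypothetical: 384 subjects needed under SRS, design effect assumed to be 2.
print(adjust_for_design_effect(384, 2.0))  # → 768
```

Rounding up keeps the adjusted sample at least as large as the calculation requires.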
  • 89. 4. Size of a population • Sample size for finite population needs finite population correction • Finite population is considered when the study is done in a sub-population having no other reference population • The sample size is reduced as a proportion of the population
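The slide does not give a formula for the finite population correction. One commonly used form, assumed here, is n_adj = n0 / (1 + (n0 - 1) / N), where n0 is the uncorrected sample size and N is the population size. The function name and the numbers are hypothetical.

```python
from math import ceil


def finite_population_correction(n0, population_size):
    """Reduce an initial sample size n0 for a finite population of size N.

    Uses n0 / (1 + (n0 - 1) / N), one common form of the correction.
    """
    return ceil(n0 / (1 + (n0 - 1) / population_size))


# Hypothetical: 384 subjects needed before correction, population of 2000.
print(finite_population_correction(384, 2000))  # → 323
```

The larger the population relative to n0, the smaller the reduction; for very large populations the correction makes almost no difference.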
  • 90. 5. Feasibility (cost) • Sample size also depends on the cost of sampling • More people provide higher precision but at a higher cost • Beyond the optimal sample size, a further increase in sample size adds only a little precision • Beyond the optimal sample size, each added subject costs more than the precision it adds is worth • Therefore, cost should be weighed against the required precision
  • 91. 6. Confidence level (Z value of certainty) • The confidence level expresses how certain we are when extrapolating from the sample to the population parameter • The population parameter is the true but unknown value in the population • It is estimated on the basis of the Central Limit Theorem • Central Limit Theorem: The distribution of the sample means will be nearly Normal regardless of how the variable is distributed in the population, as long as the sample size is large enough.
  • 92. SAMPLE SIZE FORMULA FOR PROPORTION • Sample size for a single population proportion: $n = \dfrac{Z_{1-\alpha/2}^{2}\, p(1-p)}{d^{2}}$ • Sample size (per group) for two population proportions: $n = \dfrac{\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^{2}\left[p_1(1-p_1) + p_2(1-p_2)\right]}{(p_1 - p_2)^{2}}$
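Both formulas can be sketched with the standard library. The sketch assumes a two-sided confidence level (so the Z value is about 1.96 for 95% confidence), reads the two-proportion result as the size of each group, and rounds up. The function names and the example inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_value(probability):
    """Standard normal quantile, e.g. z_value(0.975) is about 1.96."""
    return NormalDist().inv_cdf(probability)


def n_single_proportion(p, d, confidence=0.95):
    """n = Z^2 * p * (1 - p) / d^2 for estimating one proportion."""
    z = z_value(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / d**2)


def n_two_proportions(p1, p2, confidence=0.95, power=0.80):
    """Per-group n for comparing two proportions p1 and p2."""
    z_alpha = z_value(1 - (1 - confidence) / 2)
    z_beta = z_value(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta)**2 * variance_sum / (p1 - p2)**2)


# Maximum variability (p = 50%) with a 5% margin of error.
print(n_single_proportion(p=0.5, d=0.05))   # → 385
# Hypothetical proportions of 40% and 30%, 80% power.
print(n_two_proportions(p1=0.4, p2=0.3))    # → 354
```

Any design effect, finite population correction or allowance for non-response would be applied to these figures afterwards.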
  • 94. Variable • What is a variable? • What types of variables do you know? • What is the importance of knowing the types of variables? • For the title below, identify the outcome variable and its type, and list possible independent variables: • Exclusive breastfeeding practice and associated factors among mothers of children under two years in ‘X’ town
  • 95. Variable • Variable: A characteristic which takes different values in different persons, places, or things. • Any aspect of an individual or object that is measured (e.g. BP) or recorded (e.g. age, sex) and takes any value. • There may be one variable in a study or many. • E.g. A study of treatment outcome of TB
  • 96. Summary: variables and measurement scales • Qualitative or categorical: – Nominal (not ordered), e.g. ethnic group – Ordinal (ordered), e.g. response to treatment • Quantitative (measurement): – Discrete (count data), e.g. number of admissions – Continuous (real-valued), e.g. height
  • 97. Depending on scales of measurement we have: Four levels of measurement • Nominal measures • Ordinal measures • Interval measures • Ratio measures 97
  • 98. 1. Nominal scale: • The simplest type of data, in which the values fall into un-ordered categories or classes • Uses names, labels or symbols to assign each measurement. • Examples: Blood type, sex, race, marital status
  • 99. 2. Ordinal scale: • Assigns each measurement to one of a limited number of categories that are ranked in terms of order. • However, the distances between the categories are uneven or unknown • Although non-numerical, can be considered to have a natural ordering • Examples: Patient status, cancer stages, social class
  • 100. 3. Interval scale: - Measured on a continuum. - Differences between any two numbers on the scale are of known size. Example: temperature in °F on 4 consecutive days: day A 50 °F, day B 55 °F, day C 60 °F, day D 65 °F. For these data, not only is day A (50 °F) cooler than day D (65 °F), it is 15 °F cooler. - It has no true zero point: “0” is arbitrarily chosen and does not reflect the absence of temperature. - Interval data differ from ordinal data because the differences between adjacent scores are equal.
  • 101. 4. Ratio scale: - Measurement begins at a true zero point and the scale has equal spacing. - In ratio scales, zero is an absolute absence of the variable. - The data can be categorized, ranked, evenly spaced, and have a natural zero. - Examples: Height, weight, BP, etc. - A height of 40 cm is twice a height of 20 cm.
  • 102. Why is Level of Measurement Important? • Helps to decide how to interpret the data from that variable. • Helps to decide what statistical analysis is appropriate on the values that were assigned. • If a measure is nominal, then you know that you would never average the data values or do a t-test on the data.
  • 103. Exercises: Give the correct scales of measurement for each variable 1. Temperature (Celsius) 2. Hair colour 3. Job satisfaction index (1-5) 4. Number of heart attacks 5. Calendar year 6. Serum uric acid (mg/100ml) 7. Number of accidents in a 3 - year period 8. Number of cases of each reportable disease reported by a health worker 9. The average weight gain of six 1-year old dogs with a special diet supplement was 950 grams last month. 103
  • 104. Dependent and independent variables • Because in health research we often look for associations, it is important to make a distinction between dependent and independent variables. • Both the dependent and independent variables together with their operational definitions (when necessary) should be stated. 104
  • 105. Dependent and independent variables … • The variable that is used to describe or measure the problem under study is called the dependent variable. • The variables that are used to describe or measure the factors that are assumed to influence (or cause) the problem are called independent variables. 105
  • 106. Variables … • For example, in a study of relationship between smoking and lung cancer, "suffering from lung cancer" (with the values yes, no) would be the dependent variable and "smoking" (with the values no, less than a packet/day, 1 to 2 packets/day, more than 2 packets/day) would be the independent variable. 106
  • 107. For each of the following research questions identify the outcome and independent variables a. The prevalence of contraceptive use among HIV +VE women in the reproductive age group in Y town b. The incidence of COVID-19 infection among under five children in Kebelle “X” in Y town c. Is double burden malnutrition the emerging problem among under 5 children in Oromia region? d. Factors associated with Age at first sexual initiation among youths visiting HIV testing and counselling centres in North Shoa Zone, Ethiopia 107
  • 108. Background variables • In almost every study involving human subjects, background variables, usually demographic characteristics such as age, sex, educational status, monthly family income, marital status and religion, will be included. • These background variables are often related to a number of independent variables, so that they influence the problem indirectly. • Hence they are called background variables or background characteristics. • The researcher cannot manipulate them.
  • 109. Operationalizing variables • Operationalizing variables means that you make them ‘measurable'. • Example: In a study on VCT acceptance, you want to determine the level of knowledge concerning HIV in order to find out to what extent the factor ‘poor knowledge’ influences willingness to be tested for HIV. 109
  • 110. Cont’d • The variable ‘level of knowledge’ cannot be measured as such. • You would need to develop a series of questions to assess a person’s knowledge. • The answers to these questions form an indicator of someone’s knowledge on this issue, which can then be categorized. 110
  • 111. Cont’d … If 10 questions were asked, you might decide that the knowledge of those with: • 0 to 3 correct answers is poor, • 4 to 6 correct answers is reasonable, and • 7 to 10 correct answers is good. Operational definitions of variables are used in order to: • Avoid ambiguity • Make the variables more measurable
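The cut-offs on this slide amount to a small scoring rule, sketched below for a 10-question knowledge test. The function name is hypothetical; the category boundaries are the ones given on the slide.

```python
def knowledge_level(correct_answers):
    """Categorize a score out of 10 questions using the slide's cut-offs."""
    if not 0 <= correct_answers <= 10:
        raise ValueError("score must be between 0 and 10")
    if correct_answers <= 3:
        return "poor"
    if correct_answers <= 6:
        return "reasonable"
    return "good"


# Hypothetical scores from five respondents.
scores = [2, 5, 9, 6, 7]
print([knowledge_level(score) for score in scores])
```

Writing the rule down this explicitly is what makes the variable measurable: two data collectors given the same answers will assign the same category.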
  • 112. Data collection: Introduction • Data collection is a crucial stage in the planning and implementation of a study. • Data analysis becomes difficult when the data collection has been - superficial, - biased or - incomplete. • Therefore, we should concentrate all possible efforts on developing appropriate tools, and should test them several times.
  • 113. Data collection… • Depending on the type of study, different data-collection techniques may be used. • In HSR studies we usually combine a number of different techniques and look at problems from different perspectives (triangulation).
  • 114. Data collection… The choice of methods of data collection is based on: • The resources required • Acceptability of the method • Coverage of the method • Familiarity with the procedure • Relevance • The accuracy of the information they will yield • Practical considerations, such as the need for personnel, time, equipment and other facilities, in relation to what is available
  • 115. OVERVIEW OF DATA COLLECTION TECHNIQUES • Data-collection techniques allow us to systematically collect information about - our objects of study (people, objects, phenomena) - the settings in which they occur. • In the collection of data we have to be systematic. If data are collected haphazardly, it will be difficult to answer our research questions in a conclusive way.
  • 116. • Various data collection techniques can be used such as: ➢Using available information ➢Observing ➢Interviewing (face-to-face) ➢Administering questionnaire ➢Focus group discussion ➢Physiological measurement(in vitro vs in vivo) DATA COLLECTION TECHNIQUES…
  • 117. 1. Using available information/documentary sources • Locating the sources and retrieving the information is a good starting point in any data collection effort. These include: • Health Information System data • Census data • Unpublished reports • Publications • Clinical records • Personal records • Death certificates • Published mortality statistics • Census publications, etc. • Key informants • Newspapers
  • 118. Advantages: • Documents can provide ready-made information relatively easily. • The best means of studying past events. • Data collection is inexpensive.
  • 119. Disadvantages: • Problems of reliability and validity. • There is a possibility that errors may occur when the information is extracted from the records. • Since the records are maintained not for research purposes, but for clinical, administrative or other ends, the information required may not be recorded at all, or only partly recorded.
  • 120. 2. Observing • A technique that involves systematically selecting, watching and recording behavior and characteristics of living beings, objects or phenomena. • Observation of human behavior is a much-used data collection technique. It can be undertaken in different ways: - Participant observation - Non-participant observation - Structured observation - Unstructured observation
  • 121. Participant observation • The observer takes part in the situation he or she observes. • E.g., a nurse researcher observing how nurses communicate with their patients while taking part in patient care activities in the facility.
  • 122. Non-participant observation • The observer watches the situation, openly or concealed, but does not participate in the situation being observed. • Observation is commonly used in qualitative studies • Phenomena amenable to observation in research include: ☞Activities and behaviour ☞Characteristics and conditions of individuals ☞Skill attainment and performance ☞Verbal and nonverbal communication ☞Environmental characteristics
  • 123. Observation can be made using structured tools, unstructured tools, or both. a) Unstructured observation: ➢ Involves spontaneously observing and recording what is seen, using field diaries or field notes ➢ Is conducted in an open and free manner, in the sense that there are no pre-determined variables or objectives. b) Structured observation: ➢ The researcher carefully defines what is to be observed and how the observations are to be made, recorded, and coded ➢ Data collection uses specific variables and follows a pre-defined schedule.
  • 124. cont’d… • Structured observation involves the use of a category system, checklist or rating scale to guide observation and recording • If observations are made using a calibrated scale they may be called measurements • Measurements often require additional tools, e.g., a weighing scale to measure weight, a measuring tape to measure height, a thermometer to measure body temperature. • Observations can give additional, more accurate information on the behaviour of people than interviews or questionnaires
  • 125. Advantages of observation • Gives additional, more accurate information on the behavior of people than interviews or questionnaires. • Provides a check on information collected through interviews, especially on sensitive topics such as alcohol or drug use, or stigmatizing diseases. • Can also be made on objects; for example, the presence or absence of a latrine and its state of cleanliness may be observed. • Can serve as the major research technique in a study.
  • 126. Disadvantages of observation • It is time consuming • It is most often used in small-scale studies • The investigator's or observer's own biases, prejudices and desires may influence what is recorded • More resources and skilled human power are needed when sophisticated equipment is used • Ethical issues
  • 127. 3. Interviewing • Is a data-collection technique that involves oral questioning of respondents, either individually or as a group. • Depending on whether qualitative data, quantitative data, or both are collected, data collection can take the form of: ☞ Face-to-face interview ☞ Telephone interview ☞ Self-reported/completed questionnaire • Answers to questions posed during an interview can be recorded by: ☞ writing them down, either during the interview itself or immediately after it; or ☞ tape-recording the responses; or ☞ a combination of both
  • 128. Advantages of interviewing • The interviewer can stimulate and maintain the respondent's interest, which encourages frank answering of questions. • If anxiety is aroused, the interviewer can allay it. • The interviewer can repeat questions that are not understood and give standardized explanations where necessary. • The interviewer can ask “follow-up” or “probing” questions to clarify a response. • The interviewer can make observations during the interview.
  • 129. Disadvantages of interviewing • Questions may be misunderstood • Time consuming • Interviews need to be set up • Can be expensive • Respondent bias • Needs a set of questions
  • 130. 4. Administering written questionnaires • Is a data collection technique in which written questions are presented to respondents, who answer them in written form. • It can be administered in different ways, such as by: ☞ Sending questionnaires by mail ☞ Self-administered questionnaires ☞ Interviewer-administered questionnaires
  • 131. Administering written questionnaires cont’d… Advantages: • Less expensive • Permits anonymity and may result in more honest responses • Does not require research assistants • Eliminates bias due to phrasing questions differently with different respondents
  • 132. Administering written questionnaires cont’d… Disadvantages: • Cannot be used with illiterate respondents • There is often a low rate of response • Questions may be misunderstood
  • 133. Rating scale • The questions in a self-administered questionnaire can be open-ended or closed (with pre-categorized answers) • Closed questions can be dichotomous questions, multiple-choice questions, rank-order questions, or rating scales • Rating scales elicit responses in terms of the degree of attitude, perception, need, or experience • Rating scales include composite psychosocial scales, which are used to make fine quantitative discriminations among people with different attitudes, perceptions, needs, or experiences
  • 134. Rating scale… • These psychosocial scales are: ☞Likert scales (summated rating scales) ☞Semantic differential scales ☞Visual analogue scale Likert Scales • Consist of several declarative statements (items) expressing viewpoints or opinion or attitude of subjects • Responses are on an agree/disagree continuum (usually ranging from 4 - 7 response options) i.e., Strongly agree, agree, uncertain, disagree, strongly disagree
  • 135. Likert scale • A value is placed on each response, with 1 on the most negative response and the highest value (4-7, depending on the number of options) on the most positive response • Responses to items are summed to compute a total scale score • Example of Likert scales: SD = Strongly Disagree; DG = Disagree; UN = Uncertain; AG = Agree; SA = Strongly Agree
  • 136. Semantic Differential Scales • Used to measure attitudes & beliefs • Require ratings of various concepts • Rating scales involve bipolar adjective pairs, with 7-point ratings • Value of 1 denotes the most negative response & 7 denotes the most positive response • Ratings for each dimension are summed to compute a total score for each concept
  • 137. [Figure: example of a semantic differential scale]
  • 139. Differences between data collection techniques and data collection tools (technique: tools) • Using available information: checklist; data compilation forms • Observation: eyes and other senses, pen/paper, watch, scales, microscope, etc. • Interviewing: interview guide, checklist, questionnaire, tape recorder • Administering written questionnaire: questionnaire
  • 140. Data collection instruments • Types of questions: depending on how questions are asked and recorded, we can distinguish three major possibilities: 1. Closed questions 2. Open-ended questions 3. Semi-opened questions
  • 142. Closed questions • Offer a list of possible answers or options • Commonly used for background variables • The options should be exhaustive and mutually exclusive • Example: What is your marital status? 1. Single 2. Married 3. Divorced 4. Separated 5. Widowed
  • 143. Open-ended questions • Respondents are free to answer, with fewer limits imposed by the researcher • Useful for exploring new areas • Example: What is your opinion on the services provided in the antenatal (AN) care? _______________________________________
  • 144. Semi-opened questions • Example: What is your occupation? (1) Dependent (2) Manual labourer (3) Government employee (4) Private employee (5) Own business (6) Others (please specify) _____________
  • 145. 1. Open-ended questions (allowing for completely open as well as partially categorized answers) • They permit free responses, which should be recorded in the respondents' own words. • Such questions are useful for obtaining in-depth information on: ☞ facts with which the researcher is not very familiar, ☞ opinions, attitudes and suggestions of informants • Examples: 1. 'At what age did the child start supplementary food?' 2. 'What is your opinion on the services provided in the ANC?' (Explain why.) 3. 'What do you think are the reasons some adolescents in this area start using drugs?'
  • 146. Advantages of open-ended questions… • Allow you to probe more deeply into issues of interest being raised. • Information provided in the respondents' own words might be useful, providing valuable new insights on the problem. • Permit an unlimited number of answers.
  • 147. Risks of completely open-ended questions… • A big risk is incomplete recording of all relevant issues covered in the discussion. • Analysis is time-consuming and requires experience; otherwise important data may be lost. • Skilled interviewers are needed to get the discussion started and focused on relevant issues, and to record all information collected.
  • 148. 2. Closed questions • Have a list of possible options or answers from which the respondents must choose. • Example of a closed-ended question: What is the current breastfeeding status of the mother? A. Exclusive breastfeeding B. Partial breastfeeding C. Not breastfeeding
  • 149. Advantages of closed-ended questions • They save time • Comparing responses of different groups, or of the same group over time, becomes easier • Answers are easier to analyze on a computer • Response choices make the question clearer Risks of closed-ended questions: • With illiterate respondents, bias may be introduced • Many choices can be confusing • It is not possible to tell whether the respondent misinterpreted the question • Fine distinctions may be lost
  • 150. Questionnaire Design • Designing a questionnaire always takes several drafts. • In the first draft we should concentrate on the content. • In the second, we should look critically at the formulation and sequencing of the questions. • Then we should scrutinize the format of the questionnaire. • Finally, we should do a test-run to check whether the questionnaire gives us the information we require and whether both the respondents and we feel at ease with it.
  • 151. Steps in designing a questionnaire Step 1: Content Step 2: Formulating questions Step 3: Sequencing the questions Step 4: Formatting the questionnaire Step 5: Translation Step 6: Pre-test
  • 152. Step 1: Content • Take your objectives and variables as a starting point • Decide what questions will be needed to measure or define your variables and reach your objectives.
  • 153. Step 2: Formulating questions:  Formulate one or more questions that will provide the information needed for each variable.  Check whether each question measures one thing at a time.  Take care that questions are specific and precise enough that different respondents do not interpret them differently.  Avoid words with double or vaguely defined meanings or that are emotionally laden e.g., omit concepts such as dirty (clinics), lazy (patients), or unhealthy (foods)  Ask sensitive questions in a socially acceptable way.  Avoid leading questions. A question is leading if it suggests a certain answer.
  • 154. Step 3: Sequencing the questions • Design your interview schedule or questionnaire to be 'informant friendly' • The sequence of questions must be logical for the respondent and allow as much as possible for a “natural” discussion. • Organize the questions in a logical order and use simple, everyday language • Pose more sensitive questions as late as possible in the interview.
  • 155. Step 4: Formatting the questionnaire • When you finalize your questionnaire, be sure that: • An introductory page explaining the purpose of the study & confidentiality issue is attached to the questionnaire • Sufficient space is provided for answers to open-ended questions • Page layout & margins are properly formatted
  • 156. Step 5: Translation • If interviews will be conducted in one or more local languages, the questionnaire has to be translated to standardize the way questions will be asked. • After having it translated, you should have it retranslated into the original language. You can then compare the two versions for differences and decide on the final phrasing of difficult concepts.
  • 157. Step 6: Pretest • A pretest usually refers to a small-scale trial of a particular research component. • A pretest serves as a trial run that allows us to identify potential problems in the proposed study. As a result, a good deal of time, effort, and money can be saved in the long run. • Pre-testing is: ☞ Simpler ☞ Less time-consuming and less costly
  • 158. Pre-test… • A pretest determines whether the instrument is clearly worded, free from major biases, and useful in generating the desired information • When do we carry out a pre-test? Pre-test the data collection tools 1-2 weeks before starting the fieldwork, so that you have time to make revisions.
  • 159. Pre-test… Components to be assessed during the pre-test: • The reactions of respondents to the research procedures and to questions related to sensitive issues. • The appropriateness of the format and wording of questionnaires and the accuracy of the translations. • The time needed to carry out interviews, observations or measurements.
  • 160. Pretesting and Pilot study • Pretest – usually refers to a small-scale trial of particular research components • Pilot study – is the process of carrying out a preliminary study, going through the entire research procedure with a small sample
  • 161. Measurement Properties • Whatever the type of measurement, its performance can be described in several ways: ☞ Validity ☞ Reliability ☞ Range ☞ Variation ☞ Responsiveness (an instrument's ability to detect change over time)
  • 162. Measurement • Involves rules for assigning numeric values to qualities of objects to designate the quantity of the attribute • Advantage of measurement –It removes guesswork in gathering information
  • 163. Measurement error Systematic error (also called "constant error" or "bias") • Error in the design or instrument that affects the data in a consistent way: a systematic upward or downward distortion of the level of measurement. – It pulls either all of the scores up or all of the scores down; systematic errors therefore affect the group mean score. Random error (noise) • Transient aspects of the measurement situation that cause variable errors. • These errors cause greater variability within the data set, but do not make the mean score higher or lower.
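The contrast between the two kinds of error can be illustrated with a small numeric sketch. All numbers are made up for the example: the bias of 3 units and the list of random deviations (chosen so that they sum to zero) are assumptions, not data from the slides.

```python
from statistics import mean, pstdev

# Hypothetical true values for five subjects.
true_scores = [50, 52, 54, 56, 58]

# Systematic error: an instrument that reads 3 units too high every time.
biased = [x + 3 for x in true_scores]

# Random error: deviations in both directions that cancel out on average.
noise = [4, -4, 0, -3, 3]
noisy = [x + e for x, e in zip(true_scores, noise)]

# Bias shifts the mean but leaves the spread unchanged.
print(mean(true_scores), mean(biased), pstdev(true_scores), pstdev(biased))

# Noise leaves the mean unchanged but widens the spread.
print(mean(noisy), pstdev(noisy))
```

In real data random errors only cancel out approximately, so the mean of a noisy sample will be close to, rather than exactly equal to, the true mean.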
  • 164. Reliability • Is the consistency of measurement results across persons, occasions, locations and instruments • Consistency of responses to a question (if you get on your scale and it tells you that you weigh 110 lbs one minute, then you step on it again and it tells you that you weigh 115 lbs, then it is not very reliable).
  • 165. Reliability… • Is the degree to which the same results are obtained when the measurement is repeated. • Repeated measurements of a stable phenomenon by different people and instruments at different times and places get similar results. • Reproducibility and precision are other words for this property • Relates to the consistency of a measure, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects • It is unlikely that exactly the same results will be obtained every time, because of differences in timing and changes in the population and the sample
  • 166. Reliability... • There are three methods of testing the reliability of research instruments: 1. Tests for the stability of the instrument (how stable it is over time) 2. Tests for equivalence (consistency of the results obtained by different investigators) 3. Internal consistency (the measurement of the concept is consistent in all parts of the test).
  • 167. Reliability… • Stability: the same score is obtained when the instrument is used with the same people on a separate occasion – Test-retest reliability (stability): administer the same questionnaire at a later time – Reliability coefficient • Equivalence: the consistency of the instrument across different observers/raters – Inter-rater reliability • Internal consistency: the extent to which all of its subparts measure the same characteristic – Split-half reliability – Cronbach’s alpha (coefficient alpha)
  • 168. Reliability... Tests of Stability • A stable research instrument is one that can be used on the same individual more than once and achieve the same results. • In observational methods, when the characteristic being observed is expected to change over time, a test of stability cannot be used. • Repeated observations and test/retest procedures are used to test the stability of an instrument. • Stability is quantified with the Pearson correlation coefficient, which takes a value between -1 and 1
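As a rough illustration of a test-retest check, the Pearson correlation between two administrations of the same instrument can be computed as follows. The function name `pearson_r` and the two score lists are hypothetical, made up for this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up scores of the same five people on two occasions.
test_scores = [10, 12, 14, 16, 18]
retest_scores = [11, 12, 15, 15, 19]

r = pearson_r(test_scores, retest_scores)
print(round(r, 2))
```

A value close to 1 suggests that people keep roughly the same relative position on both occasions; it does not by itself show that the scores are accurate.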
  • 169. Reliability... Tests of Equivalence • Tests of equivalence attempt to determine if the same results can be obtained using different observers at the same time or if similar tests given at the same time yield the same results. • The equivalence aspect considers how much error may be introduced by different investigators or different samples of the items being studied
  • 170. Reliability... Test of Internal Consistency • Internal consistency refers to the extent to which all parts of the measurement technique are measuring the same concept • The most common measure is Cronbach’s alpha o Cronbach’s alpha can be calculated in SPSS, SAS or Stata o Cronbach's alpha gives a lower bound for reliability o If it is high for the whole scale (≥ 0.7), the scale is conventionally regarded as reliable
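For readers without a statistical package at hand, Cronbach's alpha can be sketched directly from its usual definition, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $k$ is the number of items, $\sigma_i^2$ the variance of item $i$ and $\sigma_T^2$ the variance of the total score. The formula is the standard one rather than something stated on the slide, and the function name and item scores below are made up for illustration.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item, with respondents
    in the same order in every list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_variances = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / pvariance(totals))


# Made-up responses of five people to a three-item scale.
item1 = [2, 3, 4, 4, 5]
item2 = [1, 3, 3, 4, 5]
item3 = [2, 2, 4, 5, 5]

alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 2))
```

Population variances are used throughout; using sample variances for both the items and the totals would give the same ratio and therefore the same alpha.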
  • 171. Reliability... • Variability between observers shows up as observer disagreement • Agreement is indicated by how often observers classify individual subjects into the same category on the measurement scale • The kappa coefficient is one of the most common ways of quantifying agreement • It ranges from -1 to +1
  • 172. Reliability... • Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. • The equation for κ is: $\kappa = \dfrac{P_o - P_e}{1 - P_e} = \dfrac{\text{actual agreement beyond chance}}{\text{potential agreement beyond chance}}$, where $P_o$ is the total proportion of observations on which there is agreement and $P_e$ is the proportion of agreement expected by chance alone.
  • 173. Agreement matrix for kappa statistic (inter-rater agreement, 2 observers, dichotomous data); rows are Observer A, columns are Observer B
                    B: Yes   B: No   Totals
        A: Yes        a        b       f1
        A: No         c        d       f2
        Totals        n1       n2      N
  • 174. Agreement matrix for kappa statistic (2 observers, dichotomous data); rows are Observer A, columns are Observer B
                    B: Yes   B: No   Totals
        A: Yes        69       15      84
        A: No         18       48      66
        Totals        87       63      150
  • 175. Kappa (cont’d) • Observed agreement: $P_o = (a+d)/N = (69+48)/150 = 0.78$, or 78%. • Agreement expected by chance, calculated from the products of the marginal totals: $P_e = \left[\dfrac{f_1 n_1}{N} + \dfrac{f_2 n_2}{N}\right]\dfrac{1}{N}$. Here $84 \times 87/150 = 48.72$ and $66 \times 63/150 = 27.72$; dividing their sum, 76.44, by 150 gives $P_e = 0.51$, or 51%.
  • 176. Kappa (cont’d) • $\kappa = \dfrac{P_o - P_e}{1 - P_e} = \dfrac{0.78 - 0.51}{1 - 0.51} = \dfrac{0.27}{0.49} = 0.55$, or 55%. • Kappa varies from -1 to +1; a value of zero denotes agreement no better than chance, and negative values denote agreement worse than chance.
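The worked example can be reproduced with a short script. The function name `cohen_kappa_2x2` is hypothetical; the cell labels a, b, c, d follow the agreement matrix on the earlier slide, and the counts are the ones used in the example.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Return (Po, Pe, kappa) for a 2x2 agreement table.

    a = both observers say Yes, d = both say No,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    po = (a + d) / n  # observed agreement
    # Chance agreement from the marginal totals:
    # row totals (a + b), (c + d) and column totals (a + c), (b + d).
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return po, pe, kappa


po, pe, kappa = cohen_kappa_2x2(69, 15, 18, 48)
print(round(po, 2), round(pe, 2), round(kappa, 2))
```

Keeping Po and Pe in the return value makes it easy to check each intermediate step against a hand calculation.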
  • 177. Reliability... Interpretation of Kappa • Poor agreement = Less than 0.20 • Fair agreement = 0.20 to 0.40 • Moderate agreement = 0.40 to 0.60 • Good agreement = 0.60 to 0.80 • Very good agreement = 0.80 to 1.00
  • 178. Validity • Validity (accuracy) is the degree to which the results of a measurement correspond to the true state of the phenomenon being measured. • For clinical observations that can be measured by physical means, the observed measurement is compared with some accepted standard. • Thus, it is relatively easy to establish validity.
  • 179. Validity… • Some other clinical measurements, such as pain, nausea, dyspnea, depression, and fear, cannot be verified physically. • In patient care, information about these phenomena is usually obtained informally by “taking a history.” • More formal and standardized approaches, used in research, are structured interviews and questionnaires.
  • 180. Validity… • A valid measurement thus requires both a valid method (instrument for measurement) and a valid observer (measurer). • Individual questions (items) are designed to measure specific phenomena (e.g., symptoms, feelings, attitudes, knowledge, beliefs) called constructs • Three general strategies are used to establish the validity of measurements that cannot be directly verified physically.
  • 181. A. Content Validity • Is the extent to which a particular method of measurement includes all of the dimensions of the construct one intends to measure and nothing more. • For example, a scale for measuring pain would have content validity if it included questions about aching, throbbing, pressure, burning, and stinging, but not about itching, nausea, and tingling. • It looks at whether the instrument adequately covers all the content that it should with respect to the variable. • In other words, does the instrument cover the entire domain related to the variable, or construct, it was designed to measure?
  • 182. B. Criterion Validity • Sometimes referred to as "concurrent validity", it concerns the correlation between measurement items on the one hand and known, accepted standard measures or criteria on the other. • A criterion is any other instrument that measures the same variable • Criterion validity is measured in three ways: 1. Convergent validity: shows that an instrument is highly correlated with instruments measuring similar variables.
  • 183. Criterion Validity… 2. Divergent validity: shows that an instrument is poorly correlated with instruments that measure different variables. • For example, there should be a low correlation between an instrument that measures motivation and one that measures self-efficacy. 3. Predictive validity: means that the instrument should have high correlations with future criteria. • Predictive validity is like concurrent validity, except that time elapses between the test and criterion measures • For example, a score of high self-efficacy related to performing a task should predict the likelihood of a participant completing the task.
  • 184. C. Construct validity • It refers to whether you can draw inferences about test scores related to the concept being studied. • Construct validation is the accumulation of evidence to support the interpretation of what a measure reflects • For example, if a person has a high score on a survey that measures anxiety, does this person truly have a high degree of anxiety? • There are three types of evidence that can be used to demonstrate that a research instrument has construct validity: • Homogeneity: meaning that the instrument measures one construct.
  • 185. Construct validity… • Convergence: this occurs when the instrument measures concepts similar to those of other instruments. However, if no similar instruments are available, this cannot be assessed. • Theory evidence: this is evident when behaviour is similar to theoretical propositions of the construct measured by the instrument. • For example, when an instrument measures anxiety, one would expect to see that participants who score high on the instrument for anxiety also demonstrate symptoms of anxiety in their day-to-day lives
  • 186. Summary of validity • Validity is defined as the extent to which a concept is accurately measured in a quantitative study • Content validity: The extent to which a research instrument accurately measures all aspects of a construct • Construct validity: The extent to which a research instrument (or tool) measures the intended construct • Criterion validity: The extent to which a research instrument is related to other instruments that measure the same variables
  • 187. Reliability versus Validity • To assess the accuracy of any particular measuring 'instrument', we should distinguish between the reliability of the data collected and their validity. • Reliability is essentially the extent of the agreement or consistency between repeated measurements • Validity is the extent to which a method of measurement provides a true assessment of that which it purports to measure
  • 189. [Figure: reliability versus validity; labels recovered from the image: Reliable, Valid; Not Reliable, Not Valid; Not Valid]
  • 190. Stages in the Data Collection Process Three main stages can be distinguished: Stage 1: Permission to proceed Stage 2: Data collection Stage 3: Data handling
  • 191. DATA PROCESSING, ANALYZING & INTERPRETATION
  • 192.  Data are numbers which can be measured or can be obtained by counting.  Data are sources of facts or information from which conclusion can be drawn after they are statistically treated in some way.  They are the raw material for statistics.
  • 193. • Data processing and analysis should start in the field, with checking for completeness of the data and performing quality control checks, while sorting the data by instrument used and by group of informants. • Data of small samples may even be processed and analyzed as soon as it is collected.
  • 194. Data processing, analyzing & interpretation  Data processing involves: • Data entry • Data coding • Data categorizing • Data cleaning Analyzing & interpretation
  • 195.  WHAT IS DATA PROCESSING? • Data processing refers to: • Data entry onto a computer • Data coding • Data categorizing • Data checks and correction • The aim of this process is to produce a relatively “clean” data set which may be imported into a statistical package. • When to start?
  • 196. Data pro… Why process data? It helps the researcher to ensure that: • All the information needed has been collected, and in a standardized way; • No unnecessary data, which will never be analyzed, have been collected. It also: • Provides better insight into the feasibility of the analysis to be performed and the resources required; • Confirms the appropriateness of the data collection tools.
  • 197. Data can be processed: • Manually, using data master sheets, manual sorting, or tally counts. • By computer, using existing software for data analysis (e.g., SPSS, Epi…).
  • 198.  Computer compilation consists of the following steps: 1. Choosing an appropriate computer program 2. Data entry 3. Verification or validation of the data 4. Programming (if necessary) 5. Computer outputs/prints
  • 199. I. Data entry • Data entry concerns the transfer of data from a questionnaire to a computer file. • It is a process of entering raw data into a computer • It is a process where raw data could be manipulated and changed • Data is coded and entered into a computer
  • 200. Data entry… • We can use any software, for example: • Epi Info • Epi 6 (DOS format) • EpiData (Dutch format) • SPSS for Windows • Excel (Office) • Access (Office) • etc.
  • 201. Selection of data entry software • Different computer software is available for data entry • Software is selected based on: • Its ability to protect entered data against unintended changes • Its lower cost • The presence of built-in consistency checks • The option of hiding the whole data set from the data entry clerk • Its support for double entry and validation
  • 202. DATA ENTRY... Who does data entry? • Data are often entered into a computer by a clerk who may not be familiar with how the research was designed and how the data were collected. • To facilitate data entry and minimize errors, the data entry person should not have to make guesses, calculations, coding decisions, etc. • Data entry is quick and easy if the data entry person simply types the information seen on the answer sheet (i.e., direct data entry).
  • 203. DATA ENTRY... If the questionnaire is adequately designed, direct data entry is possible, provided that: • Answers are put in a separate column or on a separate answer sheet • Documents are edited before data entry • Closed-ended questions are pre-coded, etc. When working with computers, remember to: • Save your work frequently • Keep back-ups (more than one copy) • Share time with other users, etc.
  • 204. 2. DATA CODING • For computers to work their magic they must be able to read your data. In general, computers work best with numbers. • Alphabetic codes and open-ended responses must be translated to numbers through a process called “coding”. Coding: • is assigning a separate (non-overlapping) numerical code to each separate answer and to missing values • E.g., instead of using “Male” and “Female” for the variable sex, it can be coded as: 1 = Male, 2 = Female
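A minimal sketch of this kind of coding, using the sex example from the slide. The names `SEX_CODES`, `MISSING_CODE` and `code_sex`, and the choice of 9 as the missing-value code, are assumptions made for the illustration.

```python
# Codes from the slide's example: 1 = Male, 2 = Female.
SEX_CODES = {"male": 1, "female": 2}
MISSING_CODE = 9  # hypothetical code for "no answer recorded"


def code_sex(answer):
    """Translate a written answer into its numeric code."""
    if answer is None or not answer.strip():
        return MISSING_CODE
    # An answer outside the codebook raises KeyError instead of being guessed.
    return SEX_CODES[answer.strip().lower()]


raw = ["Male", "female", "", "Female", None]
coded = [code_sex(a) for a in raw]
print(coded)
```

Letting unknown answers fail loudly mirrors the advice that the person entering data should not make guesses.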
  • 205. Coding may be pre-coding, post-coding or recoding • Pre-coding: done when the questionnaire is being written. • Post-coding: done after respondents have answered the questions; used for open-ended questions whose response categories cannot be anticipated. • Recoding: changes earlier coding to facilitate meaningful analysis.
  • 206. Coding missing values: • Missing values occur when measurements were not taken, respondents did not answer, etc. • In general, missing values should not be entered as a “blank”, because some statistical packages interpret blanks as zeros • Ideally, a code should be chosen to denote a missing value (e.g., “9”, “99” or “999” is often used for missing values).
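Whatever code is chosen, it has to be excluded again before any calculation, otherwise it is treated as a real value. The sketch below shows the effect on a mean; the ages and the code 99 are made up, and 99 is only a sensible choice when it cannot occur as a genuine age in the study population.

```python
MISSING_AGE = 99  # hypothetical missing-value code for the variable age

# Made-up entered data: two respondents have no recorded age.
ages = [23, 31, 99, 45, 99, 28]

# Drop the missing-value code before calculating anything.
valid_ages = [a for a in ages if a != MISSING_AGE]

mean_valid = sum(valid_ages) / len(valid_ages)
mean_naive = sum(ages) / len(ages)  # wrongly treats 99 as a real age

print(mean_valid, round(mean_naive, 2))
```

Most statistical packages let you declare such codes as missing so that this filtering happens automatically.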
  • 207. DATA CODING... Who does the coding? • The principal investigator should coordinate the coding process, and ideally all the coding should be done by one person. • Certainly, no more than three different people should be involved in this process. • If the work is done by more than one person, they should share a codebook • Codebook: essentially a list of each variable entered in a column and the codes associated with the values of that variable
  • 208. DATA CODING... A codebook provides: o A guide used in the coding process o The location of the variables o The assignment of values to each variable o A list of the code assignments for the values of each variable o A means of decoding back to the original variables when reporting.
  • 209. Coding conventions • Common responses should have the same code in each question, as this minimizes mistakes by coders. • For example: • Yes (or positive response) code - Y or 1 • No (or negative response) code - N or 2 • Don’t know code - D or 8 • No response/unknown code - U or 9
  • 210. 3. Data categorizing • Decisions have to be made concerning how to categorize responses. • For categorical variables that are investigated through closed questions or observation, the categories have been decided upon beforehand. • In interviews the answers to open-ended questions (for example, ‘Why do you visit the health centre?’) can be pre-categorised to a certain extent, depending on the knowledge of possible answers that may be given.
  • 211. • However, there should always be a category called ‘Others, specify . . .’, which can only be categorised afterwards. • For numerical variables, the data are often better collected without any pre-categorisation, because you may not know exactly the range and the dispersion of the values of these variables when you collect your sample. Examples: • Home-clinic distance for out-patients • Income • Age • Weight
  • 212. 4. Data cleaning • Once data are entered, the next step is data cleaning • Data cleaning is the process of checking that the data entered in the computer (soft copy) match the hard copy on paper • The aim of this process is to produce a clean set of data for statistical analysis. • It involves checking for errors, impossible or implausible values, and inconsistencies that may be due to the coding or data entry process. • No matter how carefully the data have been entered, some errors are inevitable.
  • 213. DATA CLEANING… Errors can result from: • Incorrect reading • Incorrect reporting • Incorrect filling • Incorrect sensing • Incorrect coding • Incorrect typing, etc.
  • 214. DATA CLEANING… • Data cleaning takes place on three occasions: 1. During template formation 2. During data entry 3. After data are entered (the more of these cleaning processes we combine, the more valid our data will be)
  • 215. I. Cleaning during template formation • It is the programming of checks into the data entry template as it is built • The program is built: • By limiting the values that can be entered for a variable • By checking for consistency of values • By providing a good skip pattern • By enforcing must-enter fields • By making the computer calculate values and check them for consistency, etc.
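Data entry packages implement these checks in their own template languages; the sketch below only illustrates the logic in Python. The field names, the 0-120 age limit, the sex codes and the pregnancy skip rule are all hypothetical.

```python
def check_record(rec):
    """Return a list of problems found in one entered record."""
    errors = []

    # Must-enter fields.
    for field in ("id", "sex", "age"):
        if rec.get(field) is None:
            errors.append(f"{field}: must be entered")

    # Limits on legal values.
    age = rec.get("age")
    if age is not None and not 0 <= age <= 120:
        errors.append("age: outside 0-120")
    if rec.get("sex") not in (None, 1, 2):
        errors.append("sex: code must be 1 or 2")

    # Skip pattern: the pregnancy question applies only to females (code 2).
    if rec.get("sex") == 1 and rec.get("pregnant") is not None:
        errors.append("pregnant: should be skipped for males")

    return errors


ok = check_record({"id": 1, "sex": 2, "age": 30, "pregnant": 1})
bad = check_record({"id": 2, "sex": 1, "age": 150, "pregnant": 1})
print(ok)
print(bad)
```

Returning every problem at once, rather than stopping at the first, lets the clerk correct a record against the paper form in a single pass.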
  • 216. II. Cleaning during data entry a. Using two computers • Two data clerks enter the same data on two computers • The data entered on the two computers are compared for similarity • When there is a difference, a corrective measure (based on the hard copy) is taken
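The comparison step can be sketched as follows, assuming each clerk's file has been loaded into a dictionary keyed by the questionnaire's serial number. The function name `compare_entries` and the sample records are made up for the illustration.

```python
def compare_entries(first, second):
    """List every disagreement between two entries of the same data.

    Each argument maps a record's serial number to a dict of field values.
    Returns tuples of (serial number, field, first value, second value).
    """
    differences = []
    for rec_id in sorted(first.keys() | second.keys()):
        rec1 = first.get(rec_id, {})
        rec2 = second.get(rec_id, {})
        for field in sorted(rec1.keys() | rec2.keys()):
            if rec1.get(field) != rec2.get(field):
                differences.append((rec_id, field, rec1.get(field), rec2.get(field)))
    return differences


# Made-up entries by two clerks; one has transposed the digits of an age.
entry_a = {1: {"age": 34, "sex": 1}, 2: {"age": 27, "sex": 2}}
entry_b = {1: {"age": 43, "sex": 1}, 2: {"age": 27, "sex": 2}}

mismatches = compare_entries(entry_a, entry_b)
print(mismatches)
```

The comparison only shows where the two entries differ; which value is right still has to be settled from the hard copy.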
  • 218. b. Double entry using a single computer • It is also possible to do double entry using the same computer • In EPI data version 3.1 • When there is change in the second entry, pop sound is heard, and a corrective measure can undertake • Counter checking entered data by principal investigator is also another method Other cleaning is by • Trying to counter check 5 to 10 % of daily entering data is useful
  • 219. III. Cleaning after data entry is completed • It is done by: • Running simple frequencies, • Tabulating variables for consistency, and • Sorting (in SPSS) • Outliers and missing values are usually evaluated (against the hard copy) • Giving the same serial number to the hard and soft copy makes this simple
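Cleaning after entry can be illustrated with a simple frequency table and a range check that lists the serial numbers to look up in the hard copy. The sketch below uses plain Python rather than SPSS; the function names, the sample records and the plausible range are assumptions made for illustration.

```python
from collections import Counter


def frequency(values):
    """Simple frequency table; None is counted as 'missing'."""
    return Counter("missing" if v is None else v for v in values)


def flag_for_review(records, field, low, high):
    """Serial numbers whose value is missing or outside [low, high]."""
    flagged = []
    for serial, rec in sorted(records.items()):
        value = rec.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(serial)
    return flagged


# Hypothetical entered data, keyed by serial number.
records = {
    1: {"sex": "F", "age": 25},
    2: {"sex": "M", "age": 230},   # implausible value, likely a typing error
    3: {"sex": "F", "age": None},  # missing value
}
sex_table = frequency(rec["sex"] for rec in records.values())
to_check = flag_for_review(records, "age", 0, 120)
```

The flagged serial numbers point directly to the questionnaires that need to be re-read.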
  • 220. Analysis of epidemiological study • Quantitative data analysis is making sense of the numbers to permit meaningful interpretation. It involves: 1. organizing the data 2. doing the calculations 3. interpreting the information • lessons learned 4. explaining limitations
  • 221. Analysis of epidemiological study… Prerequisites for analysis 1. Being well acquainted with the objectives of the study 2. Knowledge of the type of variables (dependent/independent) 3. Knowledge of how the variables are measured 4. Knowledge of the type of analysis needed for each objective (and design) 5. Knowledge of the statistics to be done 6. Selection of statistical software for analysis
  • 222. Being aware of the study objectives • Research is done principally to answer study questions • Our: • Results should answer the objectives (study questions) • Discussion should interpret what the results mean in relation to the objectives • Conclusion should be based on the answers to the objectives • Recommendations should also be based on the findings, not on wishes
  • 223. Knowledge of types of analysis and study design • Each study design has a distinct type of analysis • For descriptive designs, analysis may be based on data summary (point estimates) and interval estimation (confidence intervals) • For analytic studies, analysis is based on comparison
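For a descriptive study, a point estimate with its confidence interval can be sketched as follows for a proportion. The normal-approximation (Wald) interval used here is one common choice and is an assumption of this example; the function name and the counts are hypothetical.

```python
import math


def proportion_ci(cases, n, z=1.96):
    """Point estimate and approximate 95% CI for a proportion.

    Uses the normal-approximation (Wald) interval:
        p +/- z * sqrt(p * (1 - p) / n)
    which is reasonable only when n is large and p is not close to 0 or 1.
    """
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


# Hypothetical example: 40 cases among 200 sampled people.
p, lower, upper = proportion_ci(40, 200)
```

With these hypothetical counts the point estimate is 20%, and the interval runs from about 14.5% to about 25.5%.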
  • 224. Components of Data Analysis • Data processing • Data entry • Coding • Cleaning • Descriptive/exploratory • Frequencies • Tables and graphs • Cross tabulations (chi-square, Spearman’s correlation…) • Measures of central tendency and variation • Proportions/percentages • Analytic/inferential • Estimation • Confidence intervals (for OR,…) • Hypothesis testing (P-value,…) • Statistical models Tadesse A., 2013
  • 225. Statistical Inference • A number of statistical models may be appropriate for the data we have in hand, depending on different factors • These factors are: • Objective of the study • Study design • Nature of the variable • Distribution of the variable • The nature of the data • Sample size • The number of groups we want to compare
  • 226. Different t-tests • If we want to test whether the means of two independent groups differ significantly: • Independent sample t-test • If our aim is to compare two dependent groups (measurements before and after treatment, two measurements taken from each individual in a group, …): • Paired sample t-test
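The difference between the two tests can be seen in how the t statistic is computed. The sketch below uses only the Python standard library and returns the t statistic and degrees of freedom; the independent-sample version assumes equal variances in the two groups (Student's t-test), and the function names and data are hypothetical. In practice a statistical package would also report the P-value.

```python
import math
from statistics import mean, stdev, variance


def independent_t(a, b):
    """t statistic and df for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(before, after):
    """t statistic and df for paired measurements on the same individuals."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical data.
t_ind, df_ind = independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
t_pair, df_pair = paired_t([10, 12, 14, 16], [11, 14, 15, 18])
```

The paired test works on the within-person differences, which is why it has n - 1 degrees of freedom rather than n1 + n2 - 2.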
  • 227. Regressions • Linear regression • If the response variable (y) is continuous • Simple linear regression: $y = \beta_0 + \beta_1 x + \varepsilon$ • Multiple linear regression: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i + \varepsilon$
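For simple linear regression, the intercept and slope can be estimated by ordinary least squares. The sketch below is a minimal standard-library illustration; the function name and data are hypothetical, and a statistical package would normally be used to obtain standard errors and P-values as well.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
intercept, slope = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

The slope is interpreted as the average change in y for a one-unit increase in x.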
  • 228. Regression… • When the response variable is categorical • Logistic regression • Binary logistic regression • Bivariate logistic regression: $\operatorname{logit}(P) = \ln\left(\frac{P}{1-P}\right) = \alpha + \beta x$ • Multiple logistic regression: $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$ • Analysis of variance (ANOVA) • Survival analysis
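The logistic model links the log odds of the outcome to a linear combination of the predictors. The sketch below shows the logit, its inverse (the predicted probability) and the odds ratio obtained from a coefficient; it does not fit a model, and the function names and coefficient values are hypothetical.

```python
import math


def logit(p):
    """Log odds: ln(P / (1 - P))."""
    return math.log(p / (1 - p))


def predicted_probability(alpha, betas, xs):
    """Inverse logit of alpha + beta_1 * x_1 + ... + beta_i * x_i."""
    linear = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-linear))


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor: exp(beta)."""
    return math.exp(beta)


# Hypothetical coefficients and predictor values.
p_hat = predicted_probability(-1.0, [0.5, 0.2], [2, 3])
```

A coefficient of 0 corresponds to an odds ratio of 1, that is, no association between the predictor and the outcome.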