Application of Secondary Data in Epidemiological Study, Design Protocol and Statistical Analysis
1. Application of Secondary
Data in Epidemiological Study,
Design Protocol and Statistical
Analysis
MOHAMMAD ASLAM SHAIEKH
MPH, 3RD BATCH
SCHOOL OF HEALTH & ALLIED
SCIENCE (SHAH)
POKHARA UNIVERSITY (P.U.)
2. Contents:
Overview of Secondary data
Application of Secondary Data in Epidemiological Study
Secondary Data Analysis: Design and Protocol
Statistical Analysis
4. Primary & Secondary Data:
Secondary data means data that are already available i.e., they refer to
the data which have already been collected and analyzed by someone
else. Secondary data is generally referred to as outcome data
Data collected by researchers for a specific purpose are primary data.
Data collected by someone else for another purpose are secondary data.
The secondary data are readily available from the other sources and as
such, there are no specific collection methods.
Secondary data are also helpful in designing subsequent primary research
and, as well, can provide a baseline with which to compare your primary
data collection results.
6. Advantage of Secondary Data:
Saves time, cost and efforts
Easy to access (Accessibility)
Clarification of Research question
Speedy collection
Availability
Flexibility (Gives Supplementary Information)
Helpful in Hypothesis formulation and testing
7. Disadvantages of Secondary Data:
Incomplete and outdated information
Data Collected may not be suitable for the researcher’s purposes (Validity)
All necessary data may not be available in existing data
Original data set may not be accurate (Accuracy)
Requires time to search for data set (Multi variables)
Helps in Comparative analysis.
Helps to define the populations.
9. Secondary Data Analysis:
Analysis of data collected by someone else.
Analysis of secondary data, where “secondary data can include
any data that are examined to answer a research question other than
the question(s) for which the data were initially collected”.
This contrasts with primary data analysis, in which the same individual or team
of researchers designs the study, collects the data and analyzes them.
Types of secondary data analysis: Documentary data (written &
non written), Survey based data and multiple source secondary
data analysis.
10. Steps in Secondary Data Analysis
Determine your research question (what are you
looking for): Identifying the Subject domain
Locate the data: Gathering the Information
Evaluate relevance of data: Comparing data
from different Sources
Analysis of data:
12. Evaluating the quality of Secondary Data
information Sources:
Determine the Original Purpose of the Data Collection
Attempt to Ascertain the Credentials of the Source(s) or
Author(s) of the Information
What’s the Date of Publication?
Who is the Intended Audience?
What is the Coverage of the Report or Document?
Importantly, Is the Document or Report Well-Referenced?
13. Application of Secondary Data in
Epidemiological Study:
Hypothesis generation and testing
Study Disease distribution and Cause & effect relationship
Helps to study the natural history of diseases.
Helps to identify and define the problems
Pilot Data for Grant Proposal
Develop an approach to the problems
Formulate an appropriate research design and Publications.
Answer specific research questions and test hypotheses
Demand estimation
Helps to monitor the program and activities
14. Careful Handling of Secondary data
Determine the coding of missing data
Determine whether the same construct is being measured across
time
Interview questions may be modified over time
Respondents may change over time
Different scales may be used over time
Check frequently for errors and updates over time
Always use the most up-to-date files.
16. Preparing of secondary data analysis:
Document everything (Save all syntax and files)
Transfer all data that may be needed for the analysis
Address missing data
Recode variables
Create new variables
Start analysis and interpretations.
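The preparation steps above (address missing data, recode variables, create new variables) could be sketched as follows. This is only an illustration: the records, the missing-data codes 99 and -1, the variable names and the age cut-off are all hypothetical, and the real codes must be taken from the data owner's codebook.

```python
# Hypothetical missing-data codes; always confirm them in the original codebook.
MISSING_CODES = {99, -1}
# Hypothetical recoding of a numeric smoking variable into labels.
SMOKER_LABELS = {1: "yes", 0: "no"}

# Hypothetical extract from a secondary data set.
records = [
    {"id": 1, "age": 34, "smoker": 1},
    {"id": 2, "age": 99, "smoker": 0},
    {"id": 3, "age": 61, "smoker": -1},
]


def clean(record):
    """Return a cleaned copy of one record, leaving the original untouched."""
    # Address missing data: replace the coded values with None (id is left alone).
    out = {
        key: (None if key != "id" and value in MISSING_CODES else value)
        for key, value in record.items()
    }
    # Recode a variable: numeric smoking status to a label (None stays None).
    out["smoker"] = SMOKER_LABELS.get(out["smoker"])
    # Create a new variable: age group, left missing when age is missing.
    age = out["age"]
    out["age_group"] = None if age is None else ("<50" if age < 50 else "50+")
    return out


cleaned = [clean(r) for r in records]
for row in cleaned:
    print(row)
```

Keeping the transferred records unchanged and writing the cleaned values to a new list mirrors the advice to document everything and to keep the baseline data intact.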
17. Advantages:
Save time and Money
Larger samples that are more representative of the target population
(greater external validity!)
Oversampling of low prevalence groups/behaviors allows for increased
statistical precision
Datasets often contain considerable breadth (thousands of variables)
Advantages and Disadvantages of
Secondary Data Analysis
18. Disadvantages:
Data may not facilitate particular research question
Information regarding study design and data collection procedures
may be scarce.
Data may lack in-depth information
Concerns about the reliability and validity of the data
Study design and measurement model may differ from what the researcher
requires.
19. Study design : The type of study should be described and the reasons for selecting
it provided. Reasons should also be given why the data body in question is
considered to be a suitable basis for analyses in terms of the study design.
Study participants / database : Secondary data analysis should relate to one
study population, which is selected on the basis of a critical analysis of the
purpose of the data survey and the quality, reliability and validity of the data used
as well as the generalizability of the results.
Preventing bias, internal validity : Any potential bias in the results, which may arise
from selection and/or confounding, should be countered as early as the planning
stage in the case of studies based on secondary data. In secondary data analysis,
this can be achieved by matching individuals or groups or by taking account of
the information required to control for confounding variables.
The Requirement to Design the
Analysis Protocol:
20. Representativity, generalizability, external validity : Analogously to minimizing
the non-participation rate in primary data analyses, the aim in secondary
data analyses should be to achieve as high as possible generalizability for the
basic population studied.
Variables : A secondary data analysis must take into account the accuracy
and completeness of the features to be studied and of any potential
confounding variables in the primary data. This includes the description and
analysis of all variables (fields) used and the context in which the data were
collected.
Scope of the study: The protocol should state the rationale for the scope of
the study. In particular, quantitative estimates of statistical validity should be
made in analyses of rare events or those involving smaller target populations
to define the population sizes required (feasibility analysis).
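As a rough illustration of the feasibility analysis mentioned under "Scope of the study", the sketch below estimates how many subjects per group would be needed to detect a difference between two proportions. It uses the common normal-approximation formula for comparing two proportions, which is an assumption of this example and not something prescribed by the guideline; the function name, the default significance level (0.05) and power (0.80), and the example proportions are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two proportions.

    Normal-approximation formula (two-sided test):
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical rare event: 10% in the exposed group versus 5% in the unexposed group.
required = n_per_group(0.10, 0.05)
print(required)
```

If the available secondary data set contains fewer eligible subjects per group than this estimate, the planned comparison is unlikely to be feasible with the stated power.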
21. Operations manual : To supplement the protocol, all organizational
stipulations for preparing for and conducting secondary data analysis and
their step-by-step execution should be documented in an operations
manual. This includes data provision by the data owners, data transfer to
secondary users and data preparation by the latter.
Resources : Data owners and secondary users should provide sufficient
resources in terms of time and personnel for the study. This applies equally
to data provision, the preparation, analysis and presentation of the results,
as well as to the necessary communication and discussion within and
between participating sites.
22. Guiding Protocol for Secondary Data
Analysis
Producing a protocol before the start of secondary data analysis is an essential
methodological condition for quality.
The protocol is composed of the most important information required for submitting
applications in relation to the study, for evaluating the study as a research proposal
and for conducting it.
In the context of secondary data analysis, the protocol should consist of the following:
The explicit question to be addressed and working hypotheses,
Type of study
Database
Scope of the study with reason
23. Guiding Protocol for Secondary Data
Analysis
Inclusion and exclusion criteria applied to define the data body
Specifying suitable variables within the data in question
Concept for data provision and transfer as well as for archiving raw and
analyzed data sets
Analysis strategy including statistical methods
Quality assurance procedures
Measures to ensure data protection and ethical principles
Timetable setting out responsibilities.
24. Guidelines in Secondary Data Analysis
Guideline 1: Ethics : Secondary data analyses must be conducted in
accordance with ethical principles and respect human dignity as well as
human rights.
Guideline 2: Research Question : Planning each secondary data analysis
requires posing explicit questions that can actually be answered. These
questions must be worded as specifically and precisely as possible. The
population groups to be studied must be selected for reasons that relate to the
research question.
Guideline 3: Protocol : A detailed and binding protocol which sets out the study
characteristics in writing is essential to secondary data analysis.
25. Guidelines in Secondary Data Analysis
Guideline 4: Sample Databases : In many epidemiological studies, it is essential
or useful to set up a biological sample database. The documented consent of
all subjects is required for this and for the current and anticipated future
utilization of samples
Guideline 5: Quality Assurance : In secondary data analysis, associated quality
assurance of all relevant instruments and procedures should be undertaken.
Pretesting
Adapting the Protocol
26. Guidelines in Secondary Data Analysis
Guideline 6: Data Preparation : A detailed system must be set up in advance for
capture and storage of all the data surveyed during the study and for the preparation,
plausibility testing, coding and provision of the data.
Data Survey and transfer
Baseline Data Sets- The baseline data set transferred by the data owner should be
available in unchanged form over the whole period of secondary data analysis. The
retention period specified in Guideline 7 applies to the reproducibility of the analyses.
Data Description
Data Quality
Plausibility Checks
Practicability
Analysis data sets
27. Guidelines in Secondary Data Analysis
Guideline 7: Data Analysis :
Suitable methods should be used to analyse secondary data and
Analysis should be conducted without unnecessary delay.
The hypotheses to be tested in the context of secondary data analysis
must be formulated before the start of the study, as must the decision
criteria to be applied in these tests.
It must take the accuracy of measurement and completeness of the
data into account.
28. Guidelines in Secondary Data Analysis
Guideline 7: Data Analysis :
The Secondary data analysis requires the analysis strategy to be planned in
accordance with the available data.
Analysis plan : Data should be analyzed in accordance with an analysis plan produced
in advance, on the basis of the current state of epidemiological, statistical or
methodological knowledge.
Personal responsibility
Interim analyses
Checking the results : The analyses of the results of secondary data analyses should be
counterchecked before publication. The analysis strategy, analyses and their results
should be reproducible by third parties.
29. Guidelines in Secondary Data Analysis
Guideline 8: Data Interpretation : Interpretation of the research results of a secondary
data analysis is the task of the author(s) of a publication. All interpretation is based on
critical discussion of the methods, data and results of the author’s own study in the
context of the available evidence.
Guideline 9: Data Protection :
All analyses should be documented in such a way that outsiders, either persons or
institutions, can understand and reproduce the actual analyses and their results. The
data and programmes on which the analyses are based should then be archived in
fully reproducible form.
All persons who deal with personal data in connection with a research project must be
informed of the content, scope and capacity of the relevant legal provisions.
30. Guidelines in Secondary Data Analysis
Guideline 10: Dissemination & Public Health Interventions
Secondary data analyses, which aim to translate results into effective health measures,
should include the population groups affected in an appropriate way and aim to
achieve qualified risk communication with interested parties in public life.
Secondary data analyses may deal with the assessment of health system structures and
services or the implementation and evaluation of measures relevant to health.
If, in the professional opinion of the secondary users, further action is needed
as a result of the secondary data analysis, this can be explicitly stated in the form of
recommendations.
Secondary users can also produce recommendations on a sound professional basis to
the data owners for making information available to the public and can contribute to
technical implementation.
31. Session-iii
Statistical Analysis of Secondary Data: Bias Analysis
a) Propensity Score Matching (covariate adjustment using
the propensity score, stratification on the propensity
score, propensity score matching)
b) Sensitivity Analysis
c) Instrumental Variable Analysis
32. Propensity Score Matching (PSM)
Propensity Score : the probability that a unit with certain characteristics will be
assigned to the treatment group (as opposed to the control group). The score can be
used to reduce selection bias in observational studies by balancing
covariates (the characteristics of participants) between treated and control groups.
Because the score summarizes many covariates in a single number, it becomes much
easier to match participants on multiple characteristics.
Propensity Score Matching (PSM): PSM creates sets of participants for treatment and
control groups. A matched set consists of at least one participant in the treatment
group and one participant in control group with similar propensity scores. The goal is to
approximate a random experiment, eliminating many of the problems that come with
observational data analysis.
Matching is not the only way of controlling confounding; other popular methods
include stratification, regression adjustment and weighting.
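The matching step described above could be sketched as follows. The example assumes that the propensity scores have already been estimated (for instance with a logistic regression, which is not shown), and it uses greedy 1:1 nearest-neighbour matching without replacement, which is only one of several possible matching schemes. The unit identifiers, the scores and the caliper of 0.05 are hypothetical.

```python
def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    treated, controls: dicts mapping a unit id to its propensity score.
    Each control is used at most once; a treated unit stays unmatched when
    its closest remaining control is farther away than the caliper.
    """
    available = dict(controls)
    pairs = []
    # Process treated units in order of increasing propensity score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical propensity scores.
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
controls = {"C1": 0.60, "C2": 0.33, "C3": 0.40, "C4": 0.10}

pairs = match_nearest(treated, controls)
print(pairs)  # [('T2', 'C2'), ('T1', 'C1')]
```

Here T3 remains unmatched because no control has a score within the caliper, which illustrates that matching can reduce the usable sample. After matching, covariate balance between the matched groups should still be checked.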
36. Sensitivity (Positivity in Disease)
Analysis:
Before entering into the analysis of sensitivity, we have to know the
specificity, positive predictive value (PPV), negative predictive value (NPV),
percentage of false positives (FP), percentage of false negatives (FN),
prevalence of the disease, positive and negative likelihood ratios,
validity and reliability, because all these indicators influence the sensitivity
analysis.
Sensitivity is the ability of a test to correctly classify an individual as diseased.
Sensitivity = True Positive / (True Positive + False Negative) = a / (a + c)
Probability of testing positive when the disease is present.
In the 2 x 2 table, a = true positives, b = false positives, c = false negatives and
d = true negatives.
37. Specificity: The ability of a test to correctly classify an individual as disease free is called the test’s
specificity.
Specificity = True Negative / (True Negative + False Positive) = d / (b + d)
Probability of testing negative when the disease is absent.
Positive Predictive Value: % of patients with a positive test who actually have the disease.
PPV = True Positive / (True Positive + False Positive) = a / (a + b)
Probability of a patient having the disease when the test is positive.
Negative Predictive Value: % of patients with a negative test who do not have the disease.
NPV = True Negative / (False Negative + True Negative) = d / (c + d)
Probability of a patient not having the disease when the test is negative.
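The four measures defined above can be computed directly from the cells of a 2 x 2 table. The sketch below follows the cell labels used in the formulas (a = true positives, b = false positives, c = false negatives, d = true negatives); the function name and the counts are hypothetical.

```python
def diagnostic_measures(a, b, c, d):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return {
        "sensitivity": a / (a + c),  # test positive given disease present
        "specificity": d / (b + d),  # test negative given disease absent
        "ppv": a / (a + b),          # disease present given test positive
        "npv": d / (c + d),          # disease absent given test negative
    }


# Hypothetical screening results: 100 diseased and 200 disease-free individuals.
measures = diagnostic_measures(a=90, b=30, c=10, d=170)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of the disease in the population tested, so they should not be carried over unchanged to a population with a different prevalence.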
39. Sensitivity Analysis
The technique used to determine how the values of an independent variable will impact a
particular dependent variable under a given set of assumptions is called sensitivity
analysis.
It is also known as what-if analysis.
It helps in analyzing how sensitive the output is to changes in one input while
keeping the other inputs constant.
Sensitivity analysis works on the simple principle: Change the model and observe the
behavior.
Sensitivity analysis is one of the tools that give decision makers more than just a solution
to a problem. It provides insight into the problems associated with the
model under consideration. Finally, the decision maker gets a good idea of how
sensitive the chosen optimum solution is to changes in the input values of one
or more parameters.
40. Measurement of Sensitivity Analysis
Below are mentioned the steps used to conduct sensitivity analysis:
First, the base case output is defined, say the net present value at a particular base case input
value (V1) for which the sensitivity is to be measured. All the other inputs of the
model are kept constant.
Then the value of the output at a new value of the input (V2) while keeping other
inputs constant is calculated.
Find the percentage change in the output and the percentage change in the input.
The sensitivity is calculated by dividing the percentage change in output by the
percentage change in input.
The conclusion would be that the higher the sensitivity figure, the more sensitive the
output is to any change in that input and vice versa.
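The steps above can be sketched as a small calculation. The net present value function, the cash flows and the two discount rates (V1 = 5%, V2 = 6%) are hypothetical and serve only as a stand-in for whatever model is being examined.

```python
def net_present_value(rate, cash_flows):
    """Hypothetical model: discounted sum of yearly cash flows (year 0 first)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def sensitivity(model, v1, v2):
    """Percentage change in output divided by percentage change in input."""
    out1, out2 = model(v1), model(v2)
    pct_change_output = (out2 - out1) / out1 * 100
    pct_change_input = (v2 - v1) / v1 * 100
    return pct_change_output / pct_change_input


# Hypothetical cash flows; every input except the discount rate is held constant.
cash_flows = [-1000, 400, 400, 400]

s = sensitivity(lambda rate: net_present_value(rate, cash_flows), v1=0.05, v2=0.06)
print(round(s, 2))
```

A negative value means the output falls when the input rises; the larger the absolute value, the more sensitive the output is to that input.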
41. Methods of Sensitivity Analysis
There are different methods to carry out the sensitivity analysis:
Modeling and simulation techniques
Scenario management tools in Microsoft Excel
There are mainly two approaches to analyzing sensitivity:
Local Sensitivity Analysis
Global Sensitivity Analysis
Local sensitivity analysis : Local sensitivity analysis is a one-at-a-time (OAT) technique that
analyzes the impact of one parameter on the cost function at a time, keeping the other
parameters fixed.
Global Sensitivity Analysis : is the second approach to sensitivity analysis, often implemented
using Monte Carlo techniques. This approach uses a global set of samples to explore the design
space.
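The difference between the two approaches can be illustrated with a toy model. In this sketch the model, the base point, the input ranges and the sample size are all hypothetical, and the correlation between each sampled input and the output is used as a simple global measure; other global measures exist and may be more appropriate for a real model.

```python
import random
from math import sqrt


def model(x, y):
    """Hypothetical model: linear in x, quadratic in y."""
    return 3 * x + 0.5 * y ** 2


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def local_oat(base, delta=0.01):
    """Local (OAT): shift one input at a time around the base point."""
    base_out = model(**base)
    effects = {}
    for name in base:
        shifted = dict(base, **{name: base[name] + delta})
        effects[name] = (model(**shifted) - base_out) / delta
    return effects


def global_monte_carlo(ranges, n=5000, seed=0):
    """Global: sample all inputs over their ranges at once (Monte Carlo)."""
    rng = random.Random(seed)
    samples = [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n)
    ]
    outputs = [model(**s) for s in samples]
    return {name: pearson([s[name] for s in samples], outputs) for name in ranges}


local_effects = local_oat({"x": 1.0, "y": 1.0})
global_effects = global_monte_carlo({"x": (0.0, 2.0), "y": (0.0, 2.0)})
print(local_effects)
print(global_effects)
```

The local result describes the model only near the chosen base point, whereas the global result summarizes behaviour over the whole sampled range of inputs.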
42. Types of Sensitivity Analysis:
Differential sensitivity analysis: It is also referred to as the direct method. It involves
solving simple partial derivatives of the model output with respect to its inputs. Although this
method is computationally efficient, solving the equations can be an intensive task.
One-at-a-time (OAT) sensitivity measures: This is the most fundamental method, related to
partial differentiation, in which parameter values are varied one at a
time. It is also called local analysis, as it is an indicator only for the addressed
point estimates and not for the entire distribution.
Factorial Analysis: It involves the selection of given number of samples for a
specific parameter and then running the model for the combinations. The
outcome is then used to carry out parameter sensitivity.
43. Types of Sensitivity Analysis:
Through the sensitivity index one can calculate the output % difference
when one input parameter varies from minimum to maximum value.
Correlation analysis : helps in defining the relation between
independent and dependent variables.
Regression analysis : is a comprehensive method used to get responses
for complex models.
Subjective sensitivity analysis: In this method the individual parameters
are analyzed. This is a subjective method, simple, qualitative and an
easy method to rule out input parameters.
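The sensitivity index mentioned at the top of this slide could be computed as in the sketch below. It assumes one common definition, the difference between the largest and smallest output divided by the largest output, expressed as a percentage; the slide itself does not give a formula, and the model and parameter range are hypothetical.

```python
def sensitivity_index(model, low, high):
    """Percentage output difference when one input goes from low to high.

    Assumed definition: (D_max - D_min) / D_max * 100, where D_max and D_min
    are the larger and smaller of the two outputs. All other inputs are held
    fixed inside `model`.
    """
    out_low, out_high = model(low), model(high)
    d_max, d_min = max(out_low, out_high), min(out_low, out_high)
    return (d_max - d_min) / d_max * 100


def risk(exposure):
    """Hypothetical model: baseline risk of 0.1 plus 0.02 per unit of exposure."""
    return 0.1 + 0.02 * exposure


# Exposure varied from its hypothetical minimum (0) to its maximum (10).
si = sensitivity_index(risk, low=0, high=10)
print(round(si, 1))
```

An index near 0 suggests the output hardly responds to that input over its range, while a value near 100 suggests a strong response.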
44. Use of Sensitivity Analysis:
The key application of sensitivity analysis is to indicate the sensitivity of simulation
to uncertainties in the input values of the model.
It helps in decision making
Sensitivity analysis is a method for predicting the outcome of a decision if a
situation turns out to be different from the key predictions.
It helps in assessing the riskiness of a strategy.
Helps in identifying how dependent the output is on a particular input value.
Analysing this dependency in turn helps in assessing the associated risk.
Helps in taking informed and appropriate decisions
Aids searching for errors in the model