Critical Appraisal of systematic review and meta analysis articles

Critique of Systematic Review and Meta-Analysis Articles

Objectives
By the end of this lecture, you should be able to:
▪ Appraise the validity of systematic review and meta-analysis articles.
▪ Interpret the results of a meta-analysis.
▪ Apply article results to your patients.

What is Critical Appraisal?
It is the process of systematically examining a research article to assess its validity, results and applicability.
The three key questions in critically appraising an article are:
1. Was the study valid?
2. What are the results?
3. Are the results applicable to our patients?

General Concepts
[Figure: diagram with three labelled regions. A: Narrative Review; B: Systematic Review; C: Meta-analysis.]
What do you understand by:
1. Narrative Review?
2. Systematic Review?
3. Meta-analysis?

Narrative Review:
▪ Collects published literature on a specific topic WITHOUT an explicit search strategy or predefined criteria for selecting and appraising the data.
▪ Some consider it just another name for a literature review.

Systematic Review (SR):
It is the result of an independent, systematic search and appraisal of all available studies that answer the same focused clinical question, drawn from the best available study design for that question.
Tip:
A systematic review of therapy should ideally be built from RCTs, since the RCT is the best primary design for testing a therapeutic intervention.

Meta-analysis:
It is the quantitative statistical method that pools the results of the studies in a systematic review into one overall estimate, for the purpose of integrating the findings.
Tip:
You can’t do a meta-analysis without doing a systematic review first!

Critique of Systematic Review and Meta-analysis
1. Validity: Is the study done in a correct way? (Methodology Section)
2. Results: Are the results significant? (Results Section – Forest plot)
3. Apply it: Is this study generalizable?

Critical Appraisal Toolkit (CAT)

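Did the overview address a “focused clinical question”?
RABI checklist: Right question, cRiteria, Article search, Appraised articles, Blind assessment, Inconsistency.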

Right question
▪ PICOTT format for a systematic review of RCTs → key words.
▪ PEO format for a systematic review of non-RCT studies → key words.
PICOTT:
P: Patient population
I: Intervention or Issue
C: Comparison (optional)
O: Outcome of interest
T: Type of study (optional)
T: Timeframe (optional)
PEO:
P: Patient population
E: Exposure
O: Outcome

Practice
You have come across this article in the journal PLOS ONE, and you want to check its validity.

SR & MA CAT:
Did the overview address a “focused clinical question”?
Finding in the article: PICOTT question; searching with key words.

Were the “criteria” used to select articles for inclusion appropriate?

cRiteria
Criteria should:
• Specify patients, exposures and outcomes.
• Follow a methodological search strategy: look at the flow chart (Quality of Reporting of Meta-analyses "QUOROM" flow chart) and see on what basis articles were excluded.

SR & MA CAT:
Were the “criteria” used to select articles for inclusion appropriate?
Finding in the article: a methodological search strategy, shown in a “QUOROM chart”.

Is it unlikely that important relevant studies were “missed”?

Article search
1. Databases: at least 2 well-known databases (Medline, Embase, Scopus, Medicus, Cochrane… etc.), searched using the PICOTT/PEO key words.
2. Manual searching: of non-electronic articles.
3. References: searching the reference list of each included article.
4. Unpublished data: such as conference proceedings, unpublished papers… etc. Use this website to check registered studies: https://clinicaltrials.gov
Things to consider:
▪ The search should cover all studies, from the oldest up to the time the systematic review was carried out.
▪ There should be no language restriction when searching for articles.

SR & MA CAT:
Is it unlikely that important relevant studies were “missed”?
Findings in the article:
▪ Didn’t search other languages.
▪ Didn’t search for unpublished articles.
▪ Didn’t do manual searching.
▪ All old studies were searched up to the time of doing the SR.

Was the validity of the included studies “appraised”?

Appraised articles
It is a quality and methodological assessment of each included article.
The widely used tools to assess the quality of RCT articles are:
▪ Cochrane Review Criteria
▪ Jadad Score
▪ Modified Jadad Score
The widely used tool to assess the quality of non-RCT articles is:
▪ Modified AHRQ (Agency for Healthcare Research and Quality) score.

Minimum acceptable score for each tool:
▪ Cochrane Review Criteria: at least 50/100
▪ Jadad Score: at least 3/5
▪ Modified Jadad Score: at least 4/7
▪ Modified AHRQ score: at least 50/100

SR & MA CAT:
Was the validity of the included studies appraised?
Finding in the article: Modified Jadad Score.

Was the assessment of the studies reproducible and “blinded”?

Blind assessment
This should be done by at least 2 independent, blinded reviewers.
Each reviewer must search and appraise the articles independently, blinded to the other reviewer's search and appraisal results.
Disagreements should be resolved by consensus, vote or a 3rd party.
Inter-observer agreement is measured with Cohen’s Kappa test (Kappa should be > 0.60, i.e. 60%).

Cohen’s Kappa Test
[Figure: Cohen’s Kappa calculated using SPSS]
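The kappa statistic can also be worked out by hand. The following is a minimal Python sketch, assuming two reviewers who each made an include/exclude decision on the same ten articles. The function name `cohens_kappa` and the decisions are hypothetical illustration data, not taken from the appraised article.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters beyond chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters decided the same.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's own category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical decisions of two blinded reviewers on ten articles.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "include", "exclude", "include", "exclude"]
reviewer_2 = ["include", "include", "exclude", "include", "include",
              "exclude", "include", "exclude", "exclude", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Here the reviewers agree on 8 of 10 articles, but half of that agreement is expected by chance, so kappa comes out at about 0.60, which sits on the threshold rather than above it.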

SR & MA CAT:
Was the assessment of the studies reproducible and blinded?
Finding in the article: nothing mentioned about blinded assessment.

Were the results similar from study to study?

Inconsistency
This is done by checking for the presence of heterogeneity.
Ideally, we want all studies to be homogeneous so that their results can be pooled into a meta-analysis.
Heterogeneity can be:
1. Clinical heterogeneity (expected variation between studies, like different drug, dose, assessment tool, patient criteria… etc.).
2. Statistical heterogeneity (true heterogeneity that affects the meta-analysis results).

[Figure: Inconsistency divided into Clinical Heterogeneity and Statistical Heterogeneity]

Clinical heterogeneity
It is the expected variation between studies (different drug, dose, patients, assessment… etc.).
Here, use your clinical sense to decide whether to accept this variation or not.
It doesn’t matter if the apple is green, red or yellow. As long as it is an apple, we can mix it with the others. For example, a meta-analysis concluded that ACEIs are good for IHD, although some of the included studies used Perindopril, some used Enalapril and some used Captopril… etc.

Clinical heterogeneity
Finding in the article: the use of different assessment tools from study to study may be a cause of heterogeneity in the SR.

Statistical heterogeneity
It is the true heterogeneity that affects the meta-analysis results.
Statistical heterogeneity can be checked by:
1. Visual inspection (eye-ball or imaginary line way).
2. Confidence Interval (CI) numbers.
3. Statistical testing for heterogeneity (χ², p-value, I²).

The visual way
It is done by looking at the forest plot and drawing an imaginary vertical line upward from the "pooled estimate – overall estimate" result (diamond shape). If the line crosses the confidence interval (CI) lines of all trials, then the trials are homogeneous.
[Figure: forest plot illustrating the visual way]

The CI numbers way
When the lower end of every trial's CI is less than the upper end of every other trial's CI (i.e. all CIs overlap), the trials are homogeneous.
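The CI numbers rule reduces to one comparison: the highest lower limit must be below the lowest upper limit. The following is a minimal Python sketch of that check. The function name `cis_overlap` is hypothetical, and the intervals are illustrative numbers only (the limits 1.94, 1.63 and 1.37 echo the practice example in this lecture; all other limits are invented).

```python
def cis_overlap(intervals):
    """True when every lower CI limit is below every upper CI limit."""
    highest_lower = max(lower for lower, _ in intervals)
    lowest_upper = min(upper for _, upper in intervals)
    return highest_lower < lowest_upper


# (lower, upper) limits of the 95% CI of each trial; illustrative values only.
overlapping_trials = [(0.60, 1.20), (0.75, 1.40), (0.55, 1.10)]
separated_trials = [(0.80, 1.63), (0.70, 1.37), (1.94, 3.10)]

print(cis_overlap(overlapping_trials))  # all CIs share a common region
print(cis_overlap(separated_trials))    # 1.94 lies above 1.63 and 1.37
```

The first set would be read as homogeneous by this rule and the second as heterogeneous.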

The statistical ways
1. Cochran's Q (chi-square, χ²) test:
▪ If χ² > degrees of freedom → Heterogeneous.
▪ If χ² ≤ degrees of freedom → Could be homogeneous!
[degrees of freedom (df) = number of studies – 1]
2. p-value of the χ² test:
▪ If p < 0.10 → Heterogeneous.
▪ If p ≥ 0.10 → Could be homogeneous!
These tests have limited discriminatory power, so you cannot rely on them alone to distinguish homogeneity from true heterogeneity. You should also check the other statistics.
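To make the χ² versus df comparison concrete, here is a minimal Python sketch of Cochran's Q, assuming each study reports an effect on the log scale with its standard error and that inverse-variance weights are used. The function name `cochran_q` and the three studies are hypothetical. The p-value is left out because it needs a chi-square distribution function that the standard library does not provide.

```python
import math


def cochran_q(effects, standard_errors):
    """Return (Q, df): weighted squared deviations from the pooled effect."""
    weights = [1 / se ** 2 for se in standard_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1  # number of studies minus 1
    return q, df


# Hypothetical log relative risks and standard errors of three studies.
log_rr = [math.log(0.5), math.log(0.8), math.log(1.9)]
se = [0.20, 0.25, 0.30]

q, df = cochran_q(log_rr, se)
verdict = "heterogeneous" if q > df else "could be homogeneous"
print(f"Q = {q:.2f}, df = {df} -> {verdict}")
```

With these invented numbers Q is far larger than df, so the studies would be flagged as heterogeneous.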

The statistical ways
3. I² statistic: It grades heterogeneity in 4 levels:
▪ If I² is 1% - 25% → mild heterogeneity.
▪ If I² is 26% - 50% → mild to moderate heterogeneity.
▪ If I² is 51% - 75% → moderate to high heterogeneity.
▪ If I² is > 75% → high heterogeneity.
Generally speaking:
▪ If I² < 50% → Considered homogeneous.
▪ If I² ≥ 50% → Considered heterogeneous.
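I² is derived from Cochran's Q as I² = (Q − df) / Q × 100%, set to 0 when Q is smaller than df. The following minimal Python sketch applies that formula and the four levels listed above; the function names and the input values Q = 12 and df = 3 are hypothetical.

```python
def i_squared(q, df):
    """I² in percent: the share of Q that exceeds its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def grade(i2):
    """Map I² (in percent) to the four levels used in this lecture."""
    if i2 <= 25:
        return "mild"
    if i2 <= 50:
        return "mild to moderate"
    if i2 <= 75:
        return "moderate to high"
    return "high"


i2 = i_squared(q=12.0, df=3)  # hypothetical Q from four studies
print(f"I2 = {i2:.0f}% ({grade(i2)} heterogeneity)")
```

Since I² here is at least 50%, the studies would be considered heterogeneous under the general rule.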

Practice
What do you think, are these studies homogeneous?
▪ Visual way: the imaginary line drawn from the “overall estimate” does NOT cross all CI lines → heterogeneous.
▪ CI numbers way: one lower-end CI (1.94) is bigger than two upper-end CIs (1.63 and 1.37) → heterogeneous.
▪ χ² > df → heterogeneous.
▪ p-value < 0.10 → heterogeneous.
▪ I² ≥ 50% → heterogeneous.

How to overcome heterogeneity
1. Change the way the outcome is measured: like calculating RR instead of OR and vice versa.
2. Meta-regression: a collection of statistical procedures used to explore which study characteristics explain the heterogeneity.
3. Not pooling the results: by just doing a systematic review without meta-analysis.
4. Ignoring the heterogeneity: when the CIs of all studies lie on the same side of the “no effect line”, then we can do the meta-analysis as if there was no heterogeneity.

5. Sub-group analysis (post-hoc): when we think certain groups will get more benefit or harm from the intervention than others, we may plan a sub-group analysis for these groups. However, it might cause imbalance in prognostic factors between sub-groups.

6. Sensitivity analysis (worst case scenario): by removing the heterogeneous studies and then pooling the remaining results (i.e. repeating the meta-analysis). If the new pooled result lies within the confidence interval of the initial pooled result (i.e. before the sensitivity analysis), then we can accept our initial pooled result despite the heterogeneity.

Types of Biases in Systematic Review and Meta-Analysis Articles

Selection Bias
Happens when there is no RAB; i.e. no Right clinical question, no inclusion or exclusion cRiteria, no proper Article search, no proper Appraisal and no Blinded independent reviewers.
Publication Bias
Happens when journals and authors publish only articles that show the outcome of interest.
It can be detected with a funnel plot.

Results
What are the overall results of the Systematic Review and Meta-analysis?

Calculating the point estimates
Relative Risk (RR)
Relative risk: used to detect whether the exposure increases or decreases the risk of the outcome (< 1 → decreases the outcome, = 1 → no difference, > 1 → increases the outcome).
$$RR = \frac{\text{Experimental event rate}}{\text{Control event rate}}$$
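As a worked example of the relative risk formula, here is a minimal Python sketch. The function name `relative_risk` and the trial counts (15 events in 100 treated patients versus 30 events in 100 controls) are hypothetical.

```python
def relative_risk(events_exp, total_exp, events_ctrl, total_ctrl):
    """RR = experimental event rate / control event rate."""
    experimental_event_rate = events_exp / total_exp
    control_event_rate = events_ctrl / total_ctrl
    return experimental_event_rate / control_event_rate


# Hypothetical trial: 15/100 events with treatment, 30/100 with control.
rr = relative_risk(15, 100, 30, 100)
print(f"RR = {rr:.2f}")
```

An RR of about 0.50 would mean the treated group had half the risk of the outcome seen in the control group.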

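Odds Ratio (OR)
Odds ratio: used to study the association between exposure and outcome (< 1 → the exposure is associated with lower odds of the outcome, = 1 → no association, > 1 → the exposure is associated with higher odds of the outcome).
$$OR = \frac{[\text{Events in exposed group}] \times [\text{No events in control group}]}{[\text{No events in exposed group}] \times [\text{Events in control group}]}$$
The odds ratio lies further from 1 than the relative risk, so it overstates the effect if it is read as a relative risk, especially when the outcome is common.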
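The following minimal Python sketch applies the odds ratio formula to a hypothetical 2×2 table and compares it with the relative risk from the same counts. The function name `odds_ratio` and the numbers are illustrative assumptions.

```python
def odds_ratio(events_exp, no_events_exp, events_ctrl, no_events_ctrl):
    """OR = (events exposed x no events control) / (no events exposed x events control)."""
    return (events_exp * no_events_ctrl) / (no_events_exp * events_ctrl)


# Hypothetical 2x2 table: 15 events and 85 non-events in the exposed group,
# 30 events and 70 non-events in the control group.
odds = odds_ratio(15, 85, 30, 70)
risk = (15 / 100) / (30 / 100)  # relative risk from the same table

print(f"OR = {odds:.2f}, RR = {risk:.2f}")
```

With these counts the OR (about 0.41) is further from 1 than the RR (about 0.50), which illustrates why the two measures should not be read interchangeably.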

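Mean Difference
Weighted Mean Difference (WMD): used for a continuous outcome measured on the same scale in all studies. The mean difference of each study is
$$MD = [\text{Mean of experimental group}] - [\text{Mean of control group}]$$
and the mean differences of the studies are then pooled according to their weights.
Standardized Mean Difference (SMD): used when the studies measure the same outcome on different scales. Dividing by the standard deviation removes the units:
$$SMD = \frac{[\text{Mean of experimental group}] - [\text{Mean of control group}]}{\text{Standard deviation}}$$
WMD and SMD are related, but they are not the same measure.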
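To show how a mean difference and a standardized mean difference differ for one study, here is a minimal Python sketch. It assumes the pooled standard deviation of the two groups is used as the denominator of the SMD (one common choice); the function names and the blood pressure figures are hypothetical.

```python
import math


def mean_difference(mean_exp, mean_ctrl):
    """Difference in means, in the original units of the outcome."""
    return mean_exp - mean_ctrl


def pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl):
    """Pooled standard deviation of the two groups."""
    numerator = (n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    return math.sqrt(numerator / (n_exp + n_ctrl - 2))


def standardized_mean_difference(mean_exp, sd_exp, n_exp, mean_ctrl, sd_ctrl, n_ctrl):
    """Mean difference expressed in standard deviation units."""
    md = mean_difference(mean_exp, mean_ctrl)
    return md / pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl)


# Hypothetical study: systolic blood pressure (mmHg) in two groups of 50.
md = mean_difference(120, 130)
smd = standardized_mean_difference(120, 10, 50, 130, 10, 50)
print(f"MD = {md} mmHg, SMD = {smd:.1f} SD")
```

The mean difference keeps the units of the outcome (mmHg here), while the SMD is unitless, which is what allows studies using different scales to be combined.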

Doing Meta-Analysis
(Pooling All Results into One Overall Estimate Result)

Calculating the overall estimate (pooling results into meta-analysis)
1. Fixed Effect Model (FEM): assumes there is only ONE true effect shared by all studies, so the only source of variance is within studies (chance alone). This model is ideally used if there is no heterogeneity between studies. An example of a FEM is the “Mantel-Haenszel” method.
2. Random Effect Model (REM): assumes there are TWO sources of variance: within studies (chance) and between studies (true heterogeneity). This model can be used with any degree of heterogeneity between studies. Examples of REM are the “DerSimonian and Laird” method and the random effect variant of the “Mantel-Haenszel” method.
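The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration, assuming inverse-variance weighting for the fixed effect model (not the Mantel-Haenszel method named above) and the DerSimonian and Laird estimate of the between-study variance for the random effect model. The function names and the three studies are hypothetical.

```python
import math


def pool_fixed(effects, standard_errors):
    """Inverse-variance fixed effect pooling; returns (pooled effect, its SE)."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


def pool_random(effects, standard_errors):
    """DerSimonian and Laird random effect pooling; returns (pooled, SE, tau2)."""
    weights = [1 / se ** 2 for se in standard_errors]
    fixed, _ = pool_fixed(effects, standard_errors)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, never negative
    # Each study's variance now includes the between-study variance.
    new_weights = [1 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(w * y for w, y in zip(new_weights, effects)) / sum(new_weights)
    return pooled, math.sqrt(1 / sum(new_weights)), tau2


# Hypothetical log relative risks and standard errors of three studies.
log_rr = [math.log(0.5), math.log(0.8), math.log(1.9)]
se = [0.20, 0.25, 0.30]

fixed_effect, fixed_se = pool_fixed(log_rr, se)
random_effect, random_se, tau2 = pool_random(log_rr, se)
print(f"Fixed:  RR = {math.exp(fixed_effect):.2f} (SE of log RR {fixed_se:.2f})")
print(f"Random: RR = {math.exp(random_effect):.2f} (SE of log RR {random_se:.2f})")
```

When the studies disagree, the random effect model adds the between-study variance to every study, so its pooled estimate has a larger standard error and a wider CI than the fixed effect estimate. When there is no heterogeneity the between-study variance is zero and the two models give the same result.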

Practice
Do you think there is heterogeneity? How did they overcome it? How did they measure the overall estimate?
Finding in the article: there was heterogeneity, and they tried to overcome it by sub-group analysis and sensitivity analysis in order to pool the results into a meta-analysis using a random effect model.

How precise were the results?
Results

Precision of results
1. Confidence Interval (CI): Precision is mainly measured by the Confidence Interval (CI). The narrower the CI, the more precise the results are.
2. Homogeneity: Homogeneous studies give a more precise pooled result.
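To illustrate how the CI reflects precision, the following minimal Python sketch computes a 95% CI for a relative risk using the usual large-sample formula on the log scale. The function name `rr_with_ci` and both trials are hypothetical; the second trial has the same event rates as the first but ten times as many patients.

```python
import math


def rr_with_ci(events_exp, total_exp, events_ctrl, total_ctrl, z=1.96):
    """Relative risk with a 95% CI computed on the log scale."""
    rr = (events_exp / total_exp) / (events_ctrl / total_ctrl)
    se_log_rr = math.sqrt(1 / events_exp - 1 / total_exp
                          + 1 / events_ctrl - 1 / total_ctrl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Same event rates, different sample sizes (hypothetical trials).
small_trial = rr_with_ci(15, 100, 30, 100)
large_trial = rr_with_ci(150, 1000, 300, 1000)

for name, (rr, lower, upper) in [("small", small_trial), ("large", large_trial)]:
    print(f"{name} trial: RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Both trials give the same point estimate, but the larger trial has the narrower CI, i.e. the more precise result.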

Important Graphs in Systematic Review and Meta-Analysis

Funnel Plot
It is a visual aid to assess for publication bias, which is suspected if the studies are distributed asymmetrically around the pooled result line (overall effect line).

Funnel Plot
▪ The Y-axis represents the standard error of each study's effect estimate. The smaller the standard error, the larger the sample size; i.e. the uppermost studies in the funnel plot are the largest in sample size.
▪ The X-axis represents the estimated effect size (OR, RR, WMD).

Funnel Plot
▪ The middle vertical line of the funnel plot represents the "pooled estimate" result (overall effect).
▪ Scattered dots represent individual studies’ results.
▪ Dashed lines represent the 95% CI. The further the dashed lines are from the overall effect line, the wider the CI.

Funnel Plot
Detecting publication bias:
1. Asymmetry: Look for asymmetrical distribution of studies “dots” around the overall effect line (subjective).
2. Egger’s and Begg’s tests: They are statistical tests to check for publication bias. A significant result (p < 0.05) suggests there is publication bias (objective).
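The idea behind Egger’s test can be sketched as a simple regression: each study's effect divided by its standard error is regressed on its precision (1 / standard error), and an intercept far from zero points to funnel plot asymmetry. The minimal Python sketch below computes only the intercept and slope; the p-value needs a t distribution and is left to statistical software. The function name `egger_regression` and both data sets are hypothetical.

```python
def egger_regression(effects, standard_errors):
    """Return (intercept, slope) of effect/SE regressed on 1/SE."""
    x = [1 / se for se in standard_errors]                  # precision
    y = [e / se for e, se in zip(effects, standard_errors)] # standardized effect
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


standard_errors = [0.1, 0.2, 0.3, 0.4]

# Symmetric case: every study estimates the same effect, whatever its size.
symmetric_intercept, _ = egger_regression([0.3, 0.3, 0.3, 0.3], standard_errors)

# Asymmetric case: the smaller (less precise) studies report larger effects.
asymmetric_intercept, _ = egger_regression([0.1, 0.3, 0.6, 0.9], standard_errors)

print(f"symmetric intercept:  {symmetric_intercept:.2f}")
print(f"asymmetric intercept: {asymmetric_intercept:.2f}")
```

In the symmetric data the intercept is essentially zero, while in the data where small studies show larger effects it moves clearly away from zero.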

Funnel Plot
Causes of funnel plot asymmetry:
1. Poor search strategy in the systematic review.
2. True publication bias: studies of poor quality, or with results that do not interest journals, funders or authors, were never published.
3. True asymmetry due to heterogeneous studies (need to do sensitivity analysis).
4. Chance!

What do you think, is there
any publication bias?

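Forest Plot
Blue squares ("point estimates") represent the OR, RR or WMD of each study. Their sizes represent the weight of each study in the review. If a square lies on the "favors treatment" side → the treatment is more effective in that study, and vice versa.
As a rough guide, a study with more events carries more weight:
$$\text{Weight of study} \approx \frac{\text{Total events in study}}{\text{Total events in all studies}}$$
In practice, meta-analysis software derives the weights from the variance of each study.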

Forest Plot
▪ The horizontal lines represent the 95% CI of each point estimate. The narrower the CI, the more precise the estimate.
▪ The vertical line represents the "no effect line". It equals 1 if the point estimates are presented as RR or OR, and 0 if they are presented as WMD.

Forest Plot
▪ If the horizontal line (i.e. the 95% CI) of a trial doesn’t cross the "no effect line", the difference between the two groups in that trial is statistically significant (p < 0.05).
▪ If the horizontal line (i.e. the 95% CI) of a trial crosses the "no effect line", there is no statistically significant difference between the two groups.
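The "no effect line" rule can be written as a one-line check. The following minimal Python sketch uses a hypothetical function name and invented confidence intervals; the no effect value is 1 for ratio measures (RR, OR) and 0 for difference measures (WMD).

```python
def is_significant(lower, upper, no_effect_value):
    """True when the CI lies entirely on one side of the no effect line."""
    return not (lower <= no_effect_value <= upper)


# Hypothetical 95% CIs.
print(is_significant(0.33, 0.76, no_effect_value=1))   # RR below 1, CI excludes 1
print(is_significant(0.80, 1.63, no_effect_value=1))   # RR CI crosses 1
print(is_significant(-2.0, 3.5, no_effect_value=0))    # WMD CI crosses 0
```

The same check applies to the diamond of the pooled result, using the left and right edges of the diamond as the limits.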

Forest Plot
The diamond shape represents the "pooled result – overall effect" of the review (meta-analysis result). The left and right edges of the diamond represent the lower and upper limits of the 95% CI of the pooled result, respectively.

Forest Plot
▪ If the diamond lies entirely on the side that favors treatment → treatment is better than control, and vice versa.
▪ If the diamond crosses the "no effect line" → no significant difference in the outcome between the two groups.

Forest Plot
▪ Heterogeneity is presented by χ², p-value and I² at the bottom of the forest plot.
▪ The type of model used to calculate the overall estimate of the meta-analysis is mentioned under the point estimate type. Here it is a random effect model using the Mantel-Haenszel (M-H) method.

Applicability
Similarity
Can I apply the study results to my patients?
Check whether the study population is similar to your patients.

Applicability
Benefits vs. Risks
Do the benefits outweigh the risks of applying the results?
Check for the important side effects of this intervention.
If it was an SR of RCTs, consider the NNT (number needed to treat) and NNH (number needed to harm) derived from the overall estimates to compare benefit with harm.
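NNT and NNH are both reciprocals of an absolute risk difference: NNT = 1 / (control event rate − experimental event rate) for the outcome being prevented, and NNH = 1 / (adverse event rate with treatment − adverse event rate with control). The minimal Python sketch below uses hypothetical function names and event rates; the values are left unrounded, although in practice they are reported as whole numbers of patients.

```python
def number_needed_to_treat(control_event_rate, experimental_event_rate):
    """NNT = 1 / absolute risk reduction."""
    absolute_risk_reduction = control_event_rate - experimental_event_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("no risk reduction: NNT is not defined")
    return 1 / absolute_risk_reduction


def number_needed_to_harm(harm_rate_treated, harm_rate_control):
    """NNH = 1 / absolute risk increase of the adverse event."""
    absolute_risk_increase = harm_rate_treated - harm_rate_control
    if absolute_risk_increase <= 0:
        raise ValueError("no risk increase: NNH is not defined")
    return 1 / absolute_risk_increase


# Hypothetical pooled rates: the outcome falls from 30% to 15% with treatment,
# while a side effect rises from 2% to 5%.
nnt = number_needed_to_treat(0.30, 0.15)
nnh = number_needed_to_harm(0.05, 0.02)
print(f"NNT = {nnt:.1f}, NNH = {nnh:.1f}")
```

With these invented rates, roughly 7 patients need to be treated to prevent one outcome, while one extra side effect is expected for about every 33 patients treated, so the benefit is reached well before the harm.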

Applicability
Cost
Is its cost feasible?
How much will it cost?
Is it affordable for my patients?
