Common statistical pitfalls & errors
in biomedical research (a top-5 list)
Evangelos I. Kritsotakis
Assoc. Prof. of Biostatistics, Med. School, University of Crete
Honorary Senior Lecturer, ScHARR, University of Sheffield
e.kritsotakis@uoc.gr
10.06.2023
Outline and disclaimer
Top-5 list of common statistical pitfalls leading to errors, related to:
• Normality
• Time confounding
• Linearity
• Clustering
• Calibration
• This is a personal view based on my experience as a reader, reviewer, and editor of medical journals; it might be incomplete and biased, but hopefully it will be useful.
• These problems are well known to statisticians and methodologists, but they continue to appear in medical journals.
NORMALITY: Who is afraid of non-normal data?
Data from the HELAS cohort of emergency laparotomies (figure panels: serum albumin, blood urea nitrogen)
• It makes sense to summarize the data with median and IQR (rather than mean ± SD).
• Most researchers would apply a non-parametric test (e.g. Mann-Whitney U-test).
• But the t-test will work fine in this situation!
• In fact, it is more appropriate and informative to use the t-test than a non-parametric test.
NORMALITY: Who is afraid of non-normal data?
The t-test, and thus linear regression, are NOT afraid of non-normal data!
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
http://www.youtube.com/watch?v=tHU0_-Jzg34
• The t-test assumes Normality per group, so that sample means are Normally distributed,
but
• By the central limit theorem, the sample means will approximate to the Normal distribution when the sample size increases, regardless of the distribution of the original observations.
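The central limit theorem claim can be checked with a small simulation. This is a minimal sketch, assuming an exponential distribution as the skewed population and sample sizes of 5 and 200; both are illustrative choices, not taken from the slides. It draws repeated samples and reports the skewness of the resulting sample means.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: mean cubed deviation divided by the SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def sample_means(n, replications, rng):
    """Means of `replications` samples of size n from a skewed Exp(1) population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]

rng = random.Random(2023)  # arbitrary fixed seed, for reproducibility only
skew_small = skewness(sample_means(5, 2000, rng))
skew_large = skewness(sample_means(200, 2000, rng))
print(f"skewness of sample means, n = 5:   {skew_small:.2f}")
print(f"skewness of sample means, n = 200: {skew_large:.2f}")
```

The individual observations stay heavily skewed in both cases; only the distribution of the sample mean moves towards symmetry as n grows, which is what the t-test relies on.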
NORMALITY: Who is afraid of non-normal data?
The t-test, and thus also linear regression, are NOT afraid of non-normal data!
Rules of thumb for the t-test:
• n < 25 per group: the data must be normally distributed to use the t-test.
• n > 25 per group: with no extreme outliers, the t-test can handle moderately skewed distributions.
• n > 200 per group: the t-test is robust to heavily skewed distributions.
When should you use a non-parametric test?
• n < 25 per group (as it is very difficult to confirm normality)
Eur J Endocrinol 2020;183(2):L1-L3.
Please DO NOT perform statistical tests for normality!
(e.g. Kolmogorov–Smirnov or Shapiro–Wilk tests)
NORMALITY: Applying non-parametrics in large samples - PITFALL
Parametric vs. non-parametric tests:
t-test vs. Wilcoxon-Mann-Whitney test
Rejection rates (p < 0.05) of the WMW and t-tests
after 10 000 replications
Data drawn at random from skewed gamma
distributions (Skewness coef. = 3), with equal
means and medians, 𝑆𝐷1 = 1.1 × 𝑆𝐷2
BMC Med Res Methodol 2012;12:78.
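A rough simulation in the same spirit can be sketched as follows. It does not reproduce the cited setup: as a simplifying assumption, the two groups here are an exponential and a gamma distribution with equal means but different shapes, with 100 patients per group and 1,000 replications. Both tests use large-sample Normal approximations written out by hand.

```python
import math
import random
import statistics

def welch_z(a, b):
    """Welch t statistic; with large n it is compared to the Normal distribution."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def wmw_z(a, b):
    """Wilcoxon-Mann-Whitney U statistic, Normal approximation (no ties assumed)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = sum(rank for rank, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    return (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

rng = random.Random(12)  # arbitrary fixed seed
n, replications = 100, 1000
t_rejections = wmw_rejections = 0
for _ in range(replications):
    # Equal means (both 1), different shapes: an assumed setup, not the paper's.
    a = [rng.expovariate(1.0) for _ in range(n)]         # Exp(1): mean 1, SD 1
    b = [rng.gammavariate(4.0, 0.25) for _ in range(n)]  # Gamma(4, 0.25): mean 1, SD 0.5
    t_rejections += abs(welch_z(a, b)) > 1.96
    wmw_rejections += abs(wmw_z(a, b)) > 1.96

t_rate = t_rejections / replications
wmw_rate = wmw_rejections / replications
print(f"rejection rate, t-test: {t_rate:.3f}; WMW: {wmw_rate:.3f}")
```

Because the WMW test responds to any difference in distribution shape, not to a difference in means, it rejects far more often than the t-test here even though the means are equal.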
FOLLOW UP TIME: frequently variable and/or incomplete
• Patients entering a trial may have different lengths of follow-up.
• Not all patients will experience the event
of interest by end of data collection.
• Times to outcome event (endpoint) are
incomplete (right censored).
Figure: prognostic study design and patient follow-up, marking censoring and event occurrence for each patient (S = short serial time, M = medium, L = long). Otolaryngol Head Neck Surg. 2010
FOLLOW UP TIME: ignoring variable follow ups is an error!
Figure: follow-up timelines for patients on Drug A and Drug B (horizontal axis: time in hours; R = relief of pain).
• Pain relief proportions are ¾ (75%) for both drugs, but drug A is preferable.
• Times to event should not be ignored !
• One solution is to use (average) incidence rates:
  $IR_A = \frac{3}{12} = 0.25$ and $IR_B = \frac{3}{18} \approx 0.17$ events per person-hour
• Compare using standard Poisson or negative binomial regression models.
• This assumes constant rates and no censoring.
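The incidence rate arithmetic is simple enough to write as a helper. A minimal sketch follows, in which the function name `incidence_rate` and the rate ratio are illustrative additions, while the event counts and person-hours are those of the example.

```python
def incidence_rate(events, person_time):
    """Average incidence rate: number of events per unit of person-time."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time

ir_a = incidence_rate(3, 12)  # Drug A: 3 reliefs over 12 person-hours
ir_b = incidence_rate(3, 18)  # Drug B: 3 reliefs over 18 person-hours
rate_ratio = ir_a / ir_b
print(f"IR_A = {ir_a:.2f}, IR_B = {ir_b:.2f}, rate ratio A/B = {rate_ratio:.2f}")
# prints: IR_A = 0.25, IR_B = 0.17, rate ratio A/B = 1.50
```

Both drugs relieve pain in the same proportion of patients, yet the rate on drug A is 1.5 times that on drug B because relief arrives sooner.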
FOLLOW UP TIME: ignoring censoring is an error!
Naïve suggestions:
A. Use complete data only, excluding patients with incomplete follow-up (too pessimistic!).
B. Assume that censored patients survived until the end of the study (too optimistic).
Solution:
C. Account for censoring with survival analysis methods: Kaplan-Meier, Cox regression, etc.
1-year survival:
A) 27%
B) 47%
C) 41%
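The three approaches can be compared on a toy dataset. The sketch below uses ten hypothetical patients (not the data behind the percentages above) and a hand-written Kaplan-Meier estimator; the function and variable names are illustrative.

```python
def kaplan_meier(times, events, horizon):
    """Kaplan-Meier estimate of survival at `horizon`.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
    return survival

# Hypothetical follow-up data in months; the study ends at 12 months.
times = [2, 3, 4, 5, 6, 7, 9, 12, 12, 12]
events = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]

n_deaths = sum(events)
alive_at_end = sum(1 for t, e in zip(times, events) if t >= 12 and e == 0)

naive_a = alive_at_end / (n_deaths + alive_at_end)  # A: drop patients censored early
naive_b = 1 - n_deaths / len(times)                 # B: count censored as survivors
km = kaplan_meier(times, events, horizon=12)        # C: account for censoring
print(f"A = {naive_a:.2f}, B = {naive_b:.2f}, C (Kaplan-Meier) = {km:.2f}")
# prints: A = 0.43, B = 0.60, C (Kaplan-Meier) = 0.49
```

In this toy example the Kaplan-Meier estimate falls between the two naive answers, the same ordering as in the 1-year survival figures above.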
TIME DEPENDENT EFFECTS: e.g. non-proportional hazards
Kaplan-Meier survival curves showing the probabilities of remaining infection free.
Piecewise Cox model to estimate vaccine efficacy:
VE = 59% (95%CI 31% to 75%; P = 0.001) during first 9 weeks
VE = -17% (95%CI -76% to 23%; P = 0.460) during last 6 weeks
TIME TRENDS: over time, things may change anyway! - PITFALL
One measure before and after intervention (group level data)
Accounting for time trends may tell a different story!
TIME TRENDS: the interrupted time series model
Res Synth Methods 2021; 12(1):106-117
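Segmented regression: $Y_t = \beta_0 + \beta_1 t + \beta_2 X_t + \beta_3 (t - t_0) X_t$
where $t_0$ is the time of the intervention and $X_t$ indicates the post-intervention period ($X_t = 0$ before $t_0$, $X_t = 1$ from $t_0$ onwards); $\beta_1$ is the pre-intervention slope, $\beta_2$ the level change at $t_0$, and $\beta_1 + \beta_3$ the post-intervention slope.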
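The segmented regression can be illustrated on a noise-free synthetic series. This sketch assumes hypothetical monthly data with the intervention at month 12 and an indicator equal to 1 from the intervention onwards; the coefficient values are made up for illustration. For point estimates, fitting one least-squares line per segment is equivalent to fitting the four-parameter segmented model, so no regression library is needed.

```python
import statistics

def fit_line(ts, ys):
    """Ordinary least squares for y = intercept + slope * t."""
    t_bar, y_bar = statistics.fmean(ts), statistics.fmean(ys)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    return y_bar - slope * t_bar, slope

# Hypothetical series: rising trend, then a level drop and a flatter slope.
t0 = 12
true_b0, true_b1, true_b2, true_b3 = 10.0, 0.5, -3.0, -0.2
ts = list(range(24))
xs = [1 if t >= t0 else 0 for t in ts]
ys = [true_b0 + true_b1 * t + true_b2 * x + true_b3 * (t - t0) * x
      for t, x in zip(ts, xs)]

pre = [(t, y) for t, x, y in zip(ts, xs, ys) if x == 0]
post = [(t, y) for t, x, y in zip(ts, xs, ys) if x == 1]

# Naive before-after comparison of means, ignoring the underlying trend.
naive_diff = statistics.fmean(y for _, y in post) - statistics.fmean(y for _, y in pre)

# One line per segment, then re-expressed as the segmented-regression coefficients.
b0, b1 = fit_line(*zip(*pre))
a_post, slope_post = fit_line(*zip(*post))
b3 = slope_post - b1                              # trend change
b2 = (a_post + slope_post * t0) - (b0 + b1 * t0)  # level change at t0
print(f"naive before-after difference: {naive_diff:+.2f}")
print(f"level change b2 = {b2:+.2f}, trend change b3 = {b3:+.2f}")
```

Here the naive comparison suggests an increase of about 1.9 units, while the trend-adjusted analysis recovers the level drop of 3 units built into the data.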
TIME TRENDS: ITS Example (1)
Carbapenem-focused antimicrobial stewardship intervention, Jan 2020 – Dec 2020,
University Hospital of Heraklion
Treatments per 100 hospital admissions:
• Level change IRR 0.63 (95%CI 0.50–0.80), P < 0.001
• Trend change IRR 1.02 (95%CI 1.00–1.04), P = 0.117
Quarterly data on hospital consumption of carbapenems:
• Level change: −4.9 DDD/100 PD (95%CI −7.3 to −2.6); P = 0.007
J Antimicrob Chemother 2023;78(4):1000-1008.
TIME TRENDS: ITS Example (2)
Impact of SARS-CoV-2 preventive measures against healthcare-associated infections from multidrug-resistant ESKAPEE pathogens (PAGNH + VENIZELEIO):
• Pre-COVID-19 period (3/2019 – 2/2020): 1.06 infections per 1,000 patient-days.
• COVID-19 period (3/2020 – 2/2021): 1.11 infections per 1,000 patient-days.
• IRR = 1.05 (overall), P = 0.58.
Figure panels: IRR = 0.46 (level drop); IRR = 0.44 (level drop)
Antibiotics 2023; 12(7):1088
LINEARITY: non-linear relationships are common - PITFALL
Figure: probability P plotted against the linear predictor ΣbX.
For the odds of binary outcome Y, the logistic regression model is:
loge(odds of Y) = b0 + b1X1 + b2X2 + b3X3 + … (linearity in logit)
or, equivalently:
$\text{Probability of } Y = \dfrac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots)}}$
• Non-linear probability model.
• Log-linear odds model.
• Measure of effect is the Odds Ratio (OR).
• Assumes that a 1-unit increase in a covariate X has the same effect (OR) on the outcome across the entire range of the covariate's values. This is a very strong assumption and should be checked for continuous variables!
• Use cubic splines or fractional polynomials.
LINEARITY: visualizing the effects before modelling
• HELAS cohort of emergency laparotomy patients in Greece
• Outcome: 30-day post-operative death
• Covariate: Age
• Logistic regression model: loge(odds death) = b0 + b1× AGE
OR = 1.75 (95% CI 1.47–2.09) per 10-year increase in age (P < 0.001),
i.e. the odds of death after emergency laparotomy (EL) increase by 75% for each additional 10 years of age, across the entire range of ages (linearity)
World J Surg. 2023 Jan;47(1):130-139.
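What the linearity assumption implies can be made concrete with a few lines of code. In this sketch the slope is derived from the reported OR of 1.75 per 10 years, while the intercept `b0 = -6.0` is a hypothetical value chosen only for illustration, so the printed risks are not estimates from the HELAS cohort.

```python
import math

def probability(age, b0, b1):
    """Logistic model: P(death) = 1 / (1 + exp(-(b0 + b1 * age)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * age)))

def odds(p):
    return p / (1 - p)

b1 = math.log(1.75) / 10  # slope per year implied by OR = 1.75 per 10 years
b0 = -6.0                 # hypothetical intercept, for illustration only

odds_ratios, risk_differences = [], []
for younger, older in [(30, 40), (70, 80)]:
    p_young = probability(younger, b0, b1)
    p_old = probability(older, b0, b1)
    odds_ratios.append(odds(p_old) / odds(p_young))
    risk_differences.append(p_old - p_young)
    print(f"age {younger} -> {older}: risk {p_young:.3f} -> {p_old:.3f}, "
          f"OR = {odds_ratios[-1]:.2f}")
```

Under the model the odds ratio is forced to be 1.75 for both age ranges, even though the absolute change in risk differs; whether the data support that constant OR is exactly what should be checked, for example with splines.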
LINEARITY: visualizing the effects before modelling
• HELAS cohort of emergency laparotomy patients in Greece
• Outcome: 30-day post-operative death
• Covariate: BMI
World J Surg. 2023 Jan;47(1):130-139.
CLUSTERING: within-groups correlation - PITFALL
• Clustering occurs when data within a cluster tend to be ‘more alike’ (‘intra-cluster correlation’)
• By design:
  – longitudinal studies with repeated measurements (clusters = patients),
  – data compiled across multiple experiments (clusters = trials),
  – meta-analysis of different studies (clusters = studies),
  – multicenter studies,
  – cluster-randomized controlled trials,
  – cluster sampling in cross-sectional surveys.
• By nature:
  – subjects clustered within centers (surgeons, clinics, hospitals);
  – clustering by surgeon or therapist delivering the intervention.
CLUSTERING: ignoring within-groups correlation
• Many statistical tests and models require independent data. Applying them to clustered data produces a false sense of precision and a higher chance of Type I error, so incorrect conclusions may be drawn.
• Data within a cluster do not contribute completely independent information, so the “effective” sample size is less than the total number of observations.
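The loss of information can be quantified with the standard design effect for equally sized clusters, 1 + (m − 1) × ICC. This formula is not on the slides and is added here as a well-known approximation; the numbers in the example are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Design effect for equally sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(cluster_size, icc)

# Hypothetical study: 1000 patients in clusters of 20, intra-cluster correlation 0.05.
n_eff = effective_sample_size(n_total=1000, cluster_size=20, icc=0.05)
print(f"effective sample size: {n_eff:.0f}")  # prints: effective sample size: 513
```

Even a small intra-cluster correlation of 0.05 roughly halves the information in this example.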
The color of each data point represents the cluster to which it belongs
J Neurosci 2010;30(32):10601-8
CLUSTERING: Consequences of ignoring clustering
J Neurosci 2010;30(32):10601-8
CLUSTERING: methods to account for intra-cluster correlation
• ‘Fixed effects’ method: add one binary predictor variable for each cluster in a regression / ANOVA model (using one cluster as the reference cluster).
  o Simplest method, but requires a small number of clusters.
  o Results are strictly applicable only to the particular set of clusters.
  o Cannot be used in designs such as cluster RCTs.
• ‘Random effects’ model (aka mixed or multilevel):
  o conditional (cluster-specific) estimate of effect, for an individual changing exposure level within a specified cluster,
  o estimate of the between-cluster variability itself.
• ‘Generalized estimating equations’ (GEEs):
  o population-average (marginal) effect, for an individual moving from one exposure level to another, regardless of cluster.
CLUSTERING: multilevel models
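1. Random intercepts model
$Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + e_{ij}$
$\beta_{0j} = \gamma_{00} + u_{0j}$
2. Random slopes model
$Y_{ij} = \beta_0 + \beta_{1j} X_{ij} + e_{ij}$
$\beta_{1j} = \gamma_{10} + u_{1j}$
3. Random intercepts and slopes
$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$
$\beta_{0j} = \gamma_{00} + u_{0j}$
$\beta_{1j} = \gamma_{10} + u_{1j}$
Patient: i
Cluster: j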
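To see why the cluster structure matters, data can be simulated from the random intercepts model and analysed in two ways. This sketch does not fit a mixed model: it swaps in a simpler cluster-level analysis (one mean per cluster) as a stand-in that respects clustering. It assumes the exposure is assigned per cluster, and all parameter values are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary fixed seed
n_clusters, cluster_size = 100, 20
gamma00, beta1 = 5.0, 0.0    # beta1 = 0: no true exposure effect
sigma_u, sigma_e = 1.0, 2.0  # between- and within-cluster SDs (ICC = 0.2)

data = []  # one (cluster j, exposure X, outcome Y) row per patient
for j in range(n_clusters):
    x = j % 2                    # exposure assigned per cluster (assumption)
    u0j = rng.gauss(0, sigma_u)  # random intercept of cluster j
    for _ in range(cluster_size):
        y = gamma00 + u0j + beta1 * x + rng.gauss(0, sigma_e)
        data.append((j, x, y))

def se_of_difference(a, b):
    """Standard error of a difference in means of two independent groups."""
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))

# Naive analysis: every patient treated as an independent observation.
se_naive = se_of_difference(
    [y for _, x, y in data if x == 1], [y for _, x, y in data if x == 0]
)

# Cluster-level analysis: one summary value (the mean) per cluster.
by_cluster = {}
for j, x, y in data:
    by_cluster.setdefault((j, x), []).append(y)
means = [(x, statistics.fmean(ys)) for (_, x), ys in by_cluster.items()]
se_cluster = se_of_difference(
    [m for x, m in means if x == 1], [m for x, m in means if x == 0]
)
print(f"SE ignoring clustering: {se_naive:.3f}; SE from cluster means: {se_cluster:.3f}")
```

The naive standard error comes out much smaller than the cluster-level one, which is the false sense of precision that inflates the Type I error rate.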
CALIBRATION: Clinical Prediction Models
Predictive models: obtain a system (set of variables + model) that estimates the risk of the outcome.
The aim is use in NEW patients: the model should work ‘tomorrow’, not only now (validation).
https://riskcalculator.facs.org/RiskCalculator/PatientInfo.jsp
CALIBRATION: Assessing clinical prediction models
• Discrimination
– Ability of model to rank subjects according
to the risk of the outcome event.
– Trade-off between sensitivity and specificity
– Assessed graphically with a Receiver Operating Characteristic (ROC) curve and numerically by the area under the curve (AUC = c-index)
• Calibration
– Agreement between risk predictions from
the model and observed risks of outcome.
– Assessed graphically with calibration plots
– Assessed numerically with the calibration slope (ideal slope = 1) and the calibration intercept, i.e. calibration-in-the-large (ideal CITL = 0)
Calibration plot annotation: Slope = 1.05, CITL = 0.00
CALIBRATION: Overfitting – PITFALL
Overfitting = (image: "public enemy number one" badge)
Source: https://retrobadge.co.uk/retrobadge/slogans-sayings-badges/public-enemy-number-one-small-retro-badge/
Overfitting = What you see is not what you get!
“Idiosyncrasies in the data are fitted rather than
generalizable patterns. A model may hence not be
applicable to new patients, even when the setting of
application is very similar to the development setting”
Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.
CALIBRATION: Overfitting – PITFALL
• Typical calibration plot with overfitting:
Source: Maarten van Smeden
• Discrimination (e.g. AUC) may not be affected, but:
  – Low risks are underestimated
  – High risks are overestimated
CALIBRATION: Prognostic prediction after EL in the HELAS cohort
J Trauma Acute Care Surg 2023;94(6):847-856.
Good discrimination (high AUC or C-statistic value) does not necessarily coincide with good calibration.
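The calibration slope can be estimated by a logistic regression of the observed outcomes on the logit of the predicted risks. The sketch below simulates an overfitted model under a stated assumption: predicted log-odds are exactly twice the true log-odds, so predictions are too extreme. It fits the two-parameter model with a hand-written Newton-Raphson loop. The intercept it returns comes from the same joint fit and is not the calibration-in-the-large, which fixes the slope at 1.

```python
import math
import random

def logit(p):
    return math.log(p / (1 - p))

def expit(z):
    return 1 / (1 + math.exp(-z))

def calibration_intercept_slope(predicted_risks, outcomes, iterations=25):
    """Logistic regression of outcomes on logit(predicted risk), by Newton-Raphson."""
    z = [logit(p) for p in predicted_risks]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, outcomes):
            p = expit(a + b * zi)
            w = p * (1 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * zi   # score for the slope
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

rng = random.Random(1)  # arbitrary fixed seed
predicted, outcomes = [], []
for _ in range(5000):
    true_logit = rng.gauss(0.0, 1.0)
    predicted.append(expit(2.0 * true_logit))  # overfitted: predictions too extreme
    outcomes.append(1 if rng.random() < expit(true_logit) else 0)

intercept, slope = calibration_intercept_slope(predicted, outcomes)
print(f"calibration slope = {slope:.2f} (ideal 1), intercept = {intercept:.2f}")
```

Doubling the log-odds is a monotone transformation, so the ranking of patients, and hence the AUC, is unchanged; only the calibration slope (close to 0.5 in this setup) reveals the problem.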
RECOMMENDED READINGS: Short lists by others
• van Smeden M. A Very Short List of Common Pitfalls in Research Design, Data Analysis, and Reporting. PRiMER. 2022;6:26. PMID: 36119906.
• Riley RD, Cole TJ, Deeks J, et al. On the 12th Day of Christmas, a Statistician Sent to Me . . . BMJ. 2022;379:e072883. PMID: 36593578.
• Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019;8:e48175. PMID: 31596231.
• Strasak AM, Zaman Q, Pfeiffer KP, Göbel G, Ulmer H. Statistical errors in medical research - a review of common pitfalls. Swiss Med Wkly. 2007;137(3-4):44-49.
• Borg DN, Lohse KR, Sainani KL. Ten Common Statistical Errors from All Phases of Research, and Their Fixes. PM R. 2020;12(6):610-614. doi:10.1002/pmrj.12395
And an all-time classic:
• Altman DG. The scandal of poor medical research. BMJ. 1994;308(6924):283-284.

Call Girl Number in Panvel Mumbai📲 9833363713 💞 Full Night Enjoy
 
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
VIP Russian Call Girls in Varanasi Samaira 8250192130 Independent Escort Serv...
 
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
 
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
 
(Rocky) Jaipur Call Girl - 9521753030 Escorts Service 50% Off with Cash ON De...
(Rocky) Jaipur Call Girl - 9521753030 Escorts Service 50% Off with Cash ON De...(Rocky) Jaipur Call Girl - 9521753030 Escorts Service 50% Off with Cash ON De...
(Rocky) Jaipur Call Girl - 9521753030 Escorts Service 50% Off with Cash ON De...
 
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
 
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service CoimbatoreCall Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
 
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Servicesauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
 
Low Rate Call Girls Patna Anika 8250192130 Independent Escort Service Patna
Low Rate Call Girls Patna Anika 8250192130 Independent Escort Service PatnaLow Rate Call Girls Patna Anika 8250192130 Independent Escort Service Patna
Low Rate Call Girls Patna Anika 8250192130 Independent Escort Service Patna
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
 
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableVip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
 

Top 5 statistical pitfalls in biomedical research

  • 1. Common statistical pitfalls & errors in biomedical research (a top-5 list) Evangelos I. Kritsotakis Assoc. Prof. of Biostatistics, Med. School, University of Crete Honorary Senior Lecturer, ScHARR, University of Sheffield e.kritsotakis@uoc.gr 10.06.2023
  • 2. Outline and disclaimer
      Top-5 list of common statistical pitfalls leading to errors, related to:
      • Normality
      • Time confounding
      • Linearity
      • Clustering
      • Calibration
      This is a personal view based on my experience as a reader, reviewer, and editor of medical journals; it might be incomplete and biased, but hopefully it will be useful.
      These problems are well known to statisticians and methodologists, but they continue to appear in medical journals.
  • 3. NORMALITY: Who is afraid of non-normal data?
      Data from the HELAS cohort of emergency laparotomies (figure panels: serum albumin, blood urea nitrogen).
      • It makes sense to summarize the data with median and IQR (rather than mean ± SD).
      • Most researchers would apply a non-parametric test (e.g. Mann-Whitney U-test).
      • But the t-test will work fine in this situation!
      • In fact, it is more appropriate and informative to use the t-test than non-parametric tests.
  • 4. NORMALITY: Who is afraid of non-normal data?
      The t-test, and thus linear regression, are NOT afraid of non-normal data!
      • The t-test assumes Normality per group, so that sample means are Normally distributed.
      • But, by the central limit theorem, the sample means approach the Normal distribution as the sample size increases, regardless of the distribution of the original observations.
      http://onlinestatbook.com/stat_sim/sampling_dist/index.html
      http://www.youtube.com/watch?v=tHU0_-Jzg34
  • 5. NORMALITY: Who is afraid of non-normal data?
      The t-test, and thus also linear regression, are NOT afraid of non-normal data!
      Rules of thumb for the t-test:
      • n < 25 per group: the data must be normally distributed to use the t-test.
      • n > 25 per group, no extreme outliers: the t-test can handle moderately skewed distributions.
      • n > 200 per group: the t-test is robust to heavily skewed distributions.
      When should you use a non-parametric test?
      • n < 25 per group (as it is very difficult to confirm normality).
      Please DO NOT perform statistical tests for normality (e.g. Kolmogorov–Smirnov or Shapiro–Wilk tests)!
      Eur J Endocrinol 2020;183(2):L1-L3.
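The central limit theorem argument on slides 4 and 5 can be checked with a small simulation. The sketch below is an illustration added here, not part of the original slides. It assumes an exponential population (skewness 2) as the skewed data source and reuses the group sizes 1, 25 and 200 from the rules of thumb; the function names, the seed and the number of replications are arbitrary choices.

```python
import random
import statistics


def skewness(values):
    """Skewness coefficient: mean cubed z-score of the values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_means(n, replications, rng):
    """Means of `replications` samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]


rng = random.Random(2023)  # fixed seed (arbitrary) so the run is reproducible
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (1, 25, 200)}
for n, skew in skew_by_n.items():
    print(f"n = {n:>3} per group: skewness of the sample means = {skew:.2f}")
```

In theory the skewness of the mean of n exponential observations is 2/√n, so the printed values should fall from roughly 2 (n = 1) towards 0.4 (n = 25) and 0.14 (n = 200): the sampling distribution of the mean becomes close to symmetric even though the raw data stay heavily skewed.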
  • 6. NORMALITY: Applying non-parametrics in large samples - PITFALL
      Parametric vs. non-parametric tests: t-test vs. Wilcoxon-Mann-Whitney (WMW) test.
      Rejection rates (p < 0.05) of the WMW and t-tests after 10 000 replications. Data drawn at random from skewed gamma distributions (skewness coefficient = 3), with equal means and medians, $SD_1 = 1.1 \times SD_2$.
      BMC Med Res Methodol 2012;12:78.
  • 7. FOLLOW UP TIME: frequently variable and/or incomplete
      • Patients entering a trial may have different lengths of follow up.
      • Not all patients will experience the event of interest by the end of data collection.
      • Times to outcome event (endpoint) are incomplete (right censored).
      [Figure: prognostic study design and patient follow up, with markers for censoring and event occurrence; S = short serial time, M = medium, L = long. Otolaryngol Head Neck Surg. 2010.]
  • 8. FOLLOW UP TIME: ignoring variable follow ups is an error!
      [Figure: time (hours) to relief of pain (R) for patients on Drug A and Drug B.]
      • Pain relief proportions are ¾ (75%) for both drugs, but drug A is preferable.
      • Times to event should not be ignored!
      • One solution is to use (average) incidence rates: $IR_A = 3/12 = 0.25$ and $IR_B = 3/18 = 0.17$ events per person-hour.
      • Compare using standard Poisson or negative binomial regression models.
      • This assumes constant rates and no censoring.
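The incidence-rate arithmetic on slide 8 can be reproduced in a few lines. This is a minimal sketch using only the totals quoted on the slide (3 events over 12 and 18 person-hours); the function name `incidence_rate` is a hypothetical helper, and the rate ratio is an added summary that does not appear on the slide.

```python
def incidence_rate(events, person_time):
    """Average incidence rate: events per unit of person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time


# Totals taken from the slide: 3 pain reliefs per drug,
# over 12 person-hours (drug A) and 18 person-hours (drug B).
ir_a = incidence_rate(3, 12)
ir_b = incidence_rate(3, 18)
rate_ratio = ir_a / ir_b  # above 1 means relief occurs at a higher rate on drug A

print(f"IR_A = {ir_a:.2f}, IR_B = {ir_b:.2f} events per person-hour")
print(f"Rate ratio A vs B = {rate_ratio:.2f}")
```

Both drugs relieve pain in the same proportion of patients, but the rates differ because relief on drug A is reached over less total follow-up time.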
  • 9. FOLLOW UP TIME: ignoring censoring is an error!
      Naïve suggestions:
      A. Use complete data only, excluding patients with incomplete follow up (too pessimistic!).
      B. Assume censored patients survived until the end of the study (too optimistic!).
      Solution:
      C. Account for censoring with survival analysis methods: Kaplan-Meier, Cox regression, etc.
      1-year survival in the example: A) 27%, B) 47%, C) 41%.
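The three approaches on slide 9 can be compared on a toy data set. The sketch below uses hypothetical follow-up data for 10 patients (not the data behind the slide's 27%, 47% and 41%) and a hand-written Kaplan-Meier product-limit estimator; the function and variable names are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability).

    times  -- follow-up time of each patient
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up (months) of 10 patients; event = 1 is death,
# event = 0 is censoring. Four patients complete the full 12 months.
times = [3, 5, 6, 8, 9, 10, 12, 12, 12, 12]
events = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
horizon = 12

# A. Complete cases only: drop patients censored before the horizon.
complete = [e for t, e in zip(times, events) if e == 1 or t >= horizon]
survival_a = complete.count(0) / len(complete)

# B. Treat every censored patient as a survivor to the horizon.
survival_b = events.count(0) / len(events)

# C. Kaplan-Meier: last estimate at or before the horizon.
survival_c = [s for t, s in kaplan_meier(times, events) if t <= horizon][-1]

print(f"A) {survival_a:.0%}  B) {survival_b:.0%}  C) {survival_c:.0%}")
```

As on the slide, the complete-case estimate is the lowest, the "censored means survived" estimate is the highest, and the Kaplan-Meier estimate lies in between, because censored patients contribute to the risk set only for as long as they were actually observed.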
  • 10. TIME DEPENDENT EFFECTS: e.g. non-proportional hazards
      Kaplan-Meier survival curves showing the probabilities of remaining infection free.
      Piecewise Cox model to estimate vaccine efficacy (VE):
      • VE = 59% (95%CI 31% to 75%; P = 0.001) during first 9 weeks
      • VE = -17% (95%CI -76% to 23%; P = 0.460) during last 6 weeks
  • 11. TIME TRENDS: over time, things may change anyway! - PITFALL
      One measure before and after intervention (group level data).
      Accounting for time trends may tell a different story!
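  • 12. TIME TRENDS: the interrupted time series model
      Segmented regression:
      $Y_t = \beta_0 + \beta_1 \cdot t + \beta_2 \cdot X_t + \beta_3 (t - t_0) X_t$
      [Figure labels: intervention time $t_0$; pre-intervention slope $\beta_1$; post-intervention slope $\beta_1 + \beta_3$; level change $\beta_2$.]
      Res Synth Methods 2021; 12(1):106-117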
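The segmented regression on slide 12 can be fitted by ordinary least squares. The sketch below is a minimal illustration under stated assumptions: $X_t$ is coded as 1 from the intervention time $t_0$ onwards (the slide does not spell out the coding), the series is synthetic and noise-free so the known coefficients are recovered, and `solve` and `fit_its` are hypothetical helper names. It does not handle autocorrelation or count outcomes, which real interrupted time series analyses usually need to address.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(t, t0):
    """Regressors [1, t, X_t, (t - t0) * X_t], assuming X_t = 1 for t >= t0."""
    x = 1.0 if t >= t0 else 0.0
    return [1.0, float(t), x, (t - t0) * x]


def fit_its(y, t0):
    """Least-squares estimates (b0, b1, b2, b3) for observations at t = 1..len(y)."""
    rows = [design_row(t, t0) for t in range(1, len(y) + 1)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations (X'X) b = X'y


# Synthetic monthly series: 12 points before and 12 after an intervention at t0 = 13.
true_b = [50.0, 0.5, -8.0, -0.3]  # baseline level, baseline trend, level change, trend change
t0 = 13
y = [sum(b * v for b, v in zip(true_b, design_row(t, t0))) for t in range(1, 25)]

b0, b1, b2, b3 = fit_its(y, t0)
print(f"pre-slope = {b1:.2f}, level change = {b2:.2f}, post-slope = {b1 + b3:.2f}")
```

A simple before/after comparison of means on this series would mix the level change with the underlying trend; the segmented model separates the two.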
  • 13. TIME TRENDS: ITS Example (1)
      Carbapenem-focused antimicrobial stewardship intervention, Jan 2020 – Dec 2020, University Hospital of Heraklion.
      Treatments per 100 hospital admissions:
      • Level change: IRR 0.63 (95%CI 0.50–0.80), P < 0.001
      • Trend change: IRR 1.02 (95%CI 1.00–1.04), P = 0.117
      Quarterly data on hospital consumption of carbapenems:
      • Level change: −4.9 DDD/100 PD (95%CI −7.3 to −2.6); P = 0.007
      J Antimicrob Chemother 2023;78(4):1000-1008.
  • 14. TIME TRENDS: ITS Example (2)
      Impact of SARS-CoV-2 preventive measures against healthcare-associated infections from multidrug-resistant ESKAPEE pathogens (PAGNH + VENIZELEIO):
      • Pre-COVID-19 period (3/2019 – 2/2020): 1.06 infections per 1,000 patient-days.
      • COVID-19 period (3/2020 – 2/2021): 1.11 infections per 1,000 patient-days.
      • IRR = 1.05 (overall), P = 0.58.
      [Figure annotations: IRR = 0.46 (level drop); IRR = 0.44 (level drop).]
      Antibiotics 2023; 12(7):1088
  • 15. LINEARITY: non-linear relationships are common - PITFALL
      For the odds of binary outcome Y, the logistic regression model is:
      $\log_e(\text{odds of } Y) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots$ (linearity in the logit)
      or, equivalently:
      $\text{Probability of } Y = \dfrac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots)}}$
      [Figure: probability P plotted against the linear predictor ΣbX.]
      • Non-linear probability model.
      • Log-linear odds model.
      • Measure of effect is the Odds Ratio (OR).
      • Assumes that a 1-unit increase in a covariate X has the same effect (OR) on the outcome across the entire range of the covariate's values. This is a very strong assumption and should be checked for continuous variables!
      • Use cubic splines or fractional polynomials.
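What "linearity in the logit" implies can be seen numerically. The sketch below assumes a single covariate and hypothetical coefficients: the intercept is arbitrary and the slope is chosen so that the odds ratio per 10 units is 1.75, echoing the age example on slide 16. It shows that the model forces the same odds ratio for any 10-unit step, while the change in probability differs along the range.

```python
import math


def probability(b0, b1, x):
    """Logistic model: P(Y) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Hypothetical coefficients (not estimates from any data set).
b0 = -7.0
b1 = math.log(1.75) / 10  # log odds ratio per 1 unit of x

p40, p50 = probability(b0, b1, 40), probability(b0, b1, 50)
p70, p80 = probability(b0, b1, 70), probability(b0, b1, 80)

or_40_50 = odds(p50) / odds(p40)
or_70_80 = odds(p80) / odds(p70)

print(f"OR 40 -> 50: {or_40_50:.2f}, risk difference: {p50 - p40:.4f}")
print(f"OR 70 -> 80: {or_70_80:.2f}, risk difference: {p80 - p70:.4f}")
```

If the true relationship is not linear on the logit scale, this constant odds ratio is an artefact of the model, which is why flexible terms such as splines or fractional polynomials are suggested for continuous covariates.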
  • 16. LINEARITY: visualizing the effects before modelling
      • HELAS cohort of emergency laparotomy (EL) patients in Greece
      • Outcome: 30-day post-operative death
      • Covariate: Age
      • Logistic regression model: $\log_e(\text{odds of death}) = b_0 + b_1 \times \text{AGE}$
      • OR = 1.75 (95% CI 1.47–2.09) per 10-year increase in age (P < 0.001), i.e. the odds of death after EL increase by 75% for each 10 additional years of age across the entire range of ages (linearity).
      World J Surg. 2023 Jan;47(1):130-139.
  • 17. LINEARITY: visualizing the effects before modelling • HELAS cohort of emergency laparotomy patients in Greece • Outcome: 30-day post-operative death • Covariate: BMI World J Surg. 2023 Jan;47(1):130-139.
  • 18. CLUSTERING: within-groups correlation - PITFALL
      Clustering occurs when data within a cluster tend to be 'more alike' ('intra-cluster correlation').
      By design:
      • longitudinal studies with repeated measurements (clusters = patients),
      • data compiled across multiple experiments (clusters = trials),
      • meta-analysis of different studies (clusters = studies),
      • multicenter studies,
      • cluster-randomized controlled trials,
      • cluster sampling in cross-sectional surveys.
      By nature:
      • subjects clustered within centers (surgeons, clinics, hospitals),
      • clustering by surgeon or therapist delivering the intervention.
  • 19. CLUSTERING: ignoring within-groups correlation
      • Many statistical tests and models require independent data. Applying them to clustered data produces a false sense of precision and a higher chance of Type I error, so incorrect conclusions may be drawn.
      • Data within a cluster do not contribute completely independent information, so the "effective" sample size is less than the total number of observations.
      [Figure: the color of each data point represents the cluster to which it belongs. J Neurosci 2010;30(32):10601-8]
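How much smaller the "effective" sample size is can be approximated with the standard design-effect formula, $n_{\text{eff}} = n / (1 + (m - 1)\,\rho)$, for clusters of equal size $m$ and intra-cluster correlation $\rho$. The formula is not on the slide; it is added here as a common approximation, and the numbers below (20 clusters of 30 patients, $\rho$ = 0.05) are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation due to clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical study: 20 clusters (e.g. hospitals) of 30 patients each.
n_total, cluster_size, icc = 600, 30, 0.05

deff = design_effect(cluster_size, icc)
n_eff = effective_sample_size(n_total, cluster_size, icc)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.0f} of {n_total}")
```

Even a modest intra-cluster correlation can more than halve the information in the data, which is why standard errors from methods that assume independence are too small.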
  • 20. CLUSTERING: Consequences of ignoring clustering J Neurosci 2010;30(32):10601-8
  • 21. CLUSTERING: methods to account for intra-cluster correlation
      • 'Fixed effect' method: add one binary predictor variable for each cluster in a regression / ANOVA model (using one cluster as the reference cluster).
          - Simplest method, but requires a small number of clusters.
          - Results are strictly applicable only to the particular set of clusters.
          - Cannot be used in designs such as cluster RCTs.
      • 'Random effects' model (aka mixed or multilevel):
          - conditional (cluster-specific) estimate of effect, for an individual changing exposure level within a specified cluster,
          - estimate of the between-cluster variability itself.
      • 'Generalized estimating equations' (GEEs):
          - marginal (population-average) effect, for an individual moving from one exposure level to another, regardless of cluster.
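  • 22. CLUSTERING: multilevel models (patient: $i$, cluster: $j$)
      1. Random intercepts model: $Y_{ij} = \beta_{0j} + \beta_1 \cdot X_{ij} + e_{ij}$, with $\beta_{0j} = \gamma_{00} + u_{0j}$
      2. Random slopes model: $Y_{ij} = \beta_0 + \beta_{1j} \cdot X_{ij} + e_{ij}$, with $\beta_{1j} = \gamma_{10} + u_{1j}$
      3. Random intercepts and slopes: $Y_{ij} = \beta_{0j} + \beta_{1j} \cdot X_{ij} + e_{ij}$, with $\beta_{0j} = \gamma_{00} + u_{0j}$ and $\beta_{1j} = \gamma_{10} + u_{1j}$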
  • 23. CALIBRATION: Clinical Prediction Models
      Obtain a system (set of variables + model) that estimates the risk of the outcome.
      Predictive models: the aim is use in NEW patients. The model should work 'tomorrow', not now (validation).
      https://riskcalculator.facs.org/RiskCalculator/PatientInfo.jsp
  • 24. CALIBRATION: Assessing clinical prediction models
      • Discrimination
          - Ability of the model to rank subjects according to the risk of the outcome event.
          - Trade-off between sensitivity and specificity.
          - Assessed graphically with a Receiver Operating Characteristic (ROC) curve and numerically by the area under the curve (AUC = c-index).
      • Calibration
          - Agreement between risk predictions from the model and observed risks of outcome.
          - Assessed graphically with calibration plots.
          - Assessed numerically with the calibration slope (ideal slope = 1) and the calibration intercept, or calibration-in-the-large (ideal CITL = 0).
      [Figure annotations: Slope = 1.05, CITL = 0.00]
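The difference between discrimination and calibration can be illustrated with a toy example. The sketch below uses hypothetical predicted risks and outcomes for 8 patients. It computes the c-index by comparing all event/non-event pairs, and uses the gap between mean predicted and observed risk as a crude stand-in for calibration-in-the-large (a proper CITL and calibration slope would be estimated with logistic regression, which is not shown). Squaring the risks is an arbitrary monotone distortion chosen for illustration.

```python
def c_index(risks, outcomes):
    """Share of (event, non-event) pairs where the event has the higher risk.

    Tied predictions count as half a concordant pair.
    """
    event_risks = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevent_risks = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for re in event_risks:
        for rn in nonevent_risks:
            if re > rn:
                concordant += 1.0
            elif re == rn:
                concordant += 0.5
    return concordant / (len(event_risks) * len(nonevent_risks))


def mean(values):
    return sum(values) / len(values)


# Hypothetical predicted risks and observed outcomes (1 = event).
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

# A monotone distortion keeps the ranking but changes the risk levels.
distorted = [r ** 2 for r in risks]

auc_original = c_index(risks, outcomes)
auc_distorted = c_index(distorted, outcomes)
observed_risk = mean(outcomes)

print(f"AUC: original {auc_original:.3f}, distorted {auc_distorted:.3f}")
print(f"observed risk {observed_risk:.3f}, "
      f"mean predicted: original {mean(risks):.3f}, distorted {mean(distorted):.3f}")
```

The distorted predictions rank patients exactly as before, so the AUC is unchanged, yet on average they understate the observed risk. A model can therefore discriminate well and still be poorly calibrated.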
  • 25. CALIBRATION: Overfitting – PITFALL
      Overfitting = [image; source: https://retrobadge.co.uk/retrobadge/slogans-sayings-badges/public-enemy-number-one-small-retro-badge/]
      Overfitting = What you see is not what you get!
      "Idiosyncrasies in the data are fitted rather than generalizable patterns. A model may hence not be applicable to new patients, even when the setting of application is very similar to the development setting."
      Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.
  • 27. CALIBRATION: Overfitting – PITFALL
      Typical calibration plot with overfitting (source: Maarten van Smeden):
      • Discrimination (e.g. AUC) may not be affected, but:
      • Low risks are underestimated.
      • High risks are overestimated.
  • 28. CALIBRATION: Prognostic prediction after EL in the HELAS cohort J Trauma Acute Care Surg 2023;94(6):847-856. Good discrimination (high AUC or C-statistic value) does not necessarily coincide with good calibration.
  • 29. RECOMMENDED READINGS: Short lists by others
      • van Smeden M. A Very Short List of Common Pitfalls in Research Design, Data Analysis, and Reporting. PRiMER. 2022;6:26. PMID: 36119906.
      • Riley RD, Cole TJ, Deeks J, et al. On the 12th Day of Christmas, a Statistician Sent to Me . . . BMJ. 2022;379:e072883. PMID: 36593578.
      • Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019;8:e48175. PMID: 31596231.
      • Strasak AM, Zaman Q, Pfeiffer KP, Göbel G, Ulmer H. Statistical errors in medical research - a review of common pitfalls. Swiss Med Wkly. 2007;137(3-4):44-49.
      • Borg DN, Lohse KR, Sainani KL. Ten Common Statistical Errors from All Phases of Research, and Their Fixes. PM R. 2020;12(6):610-614. doi:10.1002/pmrj.12395
      And an all-time classic:
      • Altman DG. The scandal of poor medical research. BMJ. 1994;308(6924):283-284.