First Steps Towards a Risk of Bias Corpus
of Randomized Controlled Trials
Presenter – Anjani Dhrangadhariya
MIE2023 - Göteborg, Sweden, 23.05.23
Authors: Anjani Dhrangadhariya, Roger Hilfiker, Martin Sattelmayer, Katia
Giacomino, Rahel Caliesch, Simone Elsig, Nona Naderi, Henning Müller
Randomized Controlled Trial
• In theory, an RCT accurately measures intervention effects on patient outcomes; in practice, bias can enter at several stages:
• Design/Planning
• Execution
• Analysis
• Outcomes reporting
• Systematic Reviews
• Utility
• Medical professionals
• Health policies
• Surgeons
Risk of Bias (RoB)
• Risk of bias refers to systematic errors in the design, conduct, or reporting of a study that can cause its results to deviate from the true effect being measured.
• RoB assessment guidelines
Example RoB assessment guidelines (year):
• Physiotherapy Evidence Database (PEDro): 1999
• Risk of Bias Assessment Tool for Nonrandomized Studies (RoBANS): 2004
• Cochrane Risk of Bias assessment guidelines: 2008
• Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I): 2016
• Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2): 2017
• Newcastle-Ottawa Scale (NOS): 2018
• Revised Cochrane Risk of Bias tool for RCTs (RoB 2): 2019
RoB information extraction
• Thorough, manual assessment
• Time-consuming
• Cognitively demanding
• Two experts perform the manual assessment
• A third expert resolves conflicts
• Automation is imperative
Related Work
• RoB-labelled corpora
• Wang et al. 2022 [1]
• Preclinical animal studies
• Human RCTs
• RobotReviewer
• PDF highlights
• Freely available
• Closed-access data
• Cochrane RoB v1
• RoB 2.0?
• RoB automation
• Marshall et al. 2015
• Millard et al. 2016
• Cochrane Database of Systematic Reviews (CDSR)
• Closed access
Motivation
1. No RoB text annotation guidelines exist
2. No RoB-annotated RCTs exist
Revised Cochrane RoB 2.0 tool
• Can the guidelines be used to annotate a text corpus?
• Extensive guidelines
• Step-by-step instructions
• Divides RoB into 5 domains
• Each domain is assessed using several signalling questions
1. Randomization process
2. Deviations from intended interventions
3. Missing outcome data
4. Measurement of the outcome
5. Selection of the reported result
Sterne, J.A., Savović, J., Page, M.J., Elbers, R.G., Blencowe, N.S., Boutron, I., Cates, C.J., Cheng, H.Y., Corbett, M.S., Eldridge, S.M. and Emberson, J.R., 2019. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366.
Revised Cochrane RoB 2.0 tool
• Reviewers manually read the RCT report to find the text that answers each signalling question.
• Based on that text, they select one of five response judgements:
Yes, Probably Yes, Probably No, No, No Information
Revised Cochrane RoB 2.0 tool
• Signalling question 2.1: Were the participants aware of their assigned intervention during the trial?
• Example label: 2.1 No Good
• 5 risk domains, 22 signalling questions
Annotation schema
• Follows the revised Cochrane RoB 2.0 tool
• 110 span labels (22 signalling questions × 5 response judgements)
• 1.1 Yes Good
• 1.1 Probably Yes Good
• 1.1 Probably No Bad
• 1.1 No Bad
• 1.1 No Information
• 1.2 Yes Good
• 1.2 Probably Yes Good
• 1.2 Probably No Bad
• …
Anatomy of the label "1.1 Yes Good":
• Risk domain: 1
• Signalling question: 1.1
• SQ response: Yes
• Direction: Good (Good = low risk, Bad = high risk)
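The label count follows directly from the schema: 22 signalling questions × 5 response judgements = 110 labels. The minimal Python sketch below enumerates them. It assumes the RoB 2 variant for individually randomized parallel-group trials (effect of assignment to intervention), with 3, 7, 4, 5 and 3 signalling questions in domains 1 to 5. Which response counts as Good depends on how each question is worded; only 1.1 (Yes = Good) and 2.1 (No = Good) are taken from these slides, and `YES_IS_GOOD` / `DEFAULT_YES_IS_GOOD` are hypothetical placeholders, not the real RoB 2 mapping.

```python
# Sketch: enumerate the RoB 2 span-label schema "<SQ> <response> <direction>".
# Assumption: RoB 2 for individually randomized parallel-group trials,
# effect of assignment to intervention (22 signalling questions in total).
SQ_PER_DOMAIN = {1: 3, 2: 7, 3: 4, 4: 5, 5: 3}

RESPONSES = ["Yes", "Probably Yes", "Probably No", "No", "No Information"]

# Whether answering "Yes" indicates low risk ("Good") for a given SQ.
# Only 1.1 and 2.1 come from the slides; all other SQs fall back to the
# hypothetical default and must be checked against the RoB 2 guidance.
YES_IS_GOOD = {"1.1": True, "2.1": False}
DEFAULT_YES_IS_GOOD = True  # hypothetical placeholder


def signalling_questions() -> list[str]:
    """Return SQ identifiers '1.1', '1.2', ..., '5.3'."""
    return [f"{d}.{q}" for d, n in SQ_PER_DOMAIN.items() for q in range(1, n + 1)]


def labels_for(sq: str, yes_is_good: bool) -> list[str]:
    """Build the five labels of one SQ; 'No Information' carries no direction."""
    good = {"Yes", "Probably Yes"} if yes_is_good else {"No", "Probably No"}
    labels = []
    for response in RESPONSES:
        if response == "No Information":
            labels.append(f"{sq} {response}")
        else:
            direction = "Good" if response in good else "Bad"
            labels.append(f"{sq} {response} {direction}")
    return labels


def build_schema() -> list[str]:
    """All span labels, SQ by SQ."""
    return [
        label
        for sq in signalling_questions()
        for label in labels_for(sq, YES_IS_GOOD.get(sq, DEFAULT_YES_IS_GOOD))
    ]


if __name__ == "__main__":
    schema = build_schema()
    print(len(schema))  # 110
    print(schema[:5])   # the five 1.1 labels listed above
```

Keeping the direction as a per-question flag, rather than hard-coding "Yes = Good", matters because questions such as 2.1 and 4.1 are worded so that Yes signals high risk.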
Pilot Annotation
• Ten RCT full-text PDFs
• 2000-2019
• Four annotators
• 2 scientists
• 1 doctoral student
• 1 scientific collaborator
• Two NLP experts
• 1 professor
• 1 doctoral student
• tagtog PDF annotation tool (https://www.tagtog.com/)
Evaluation
• Pairwise F1-measure as the inter-annotator agreement (IAA) metric
• Tokens outside the annotated spans (unannotated tokens) are disregarded
1. IAA_SQ
Do the annotator pairs annotate the same text span to answer a signalling question (SQ)?
2. IAA_response
If the annotator pairs annotate the same text to answer a signalling question, do they also select the same response judgment?
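A minimal Python sketch of the two agreement measures, assuming both annotators' spans have been projected onto the same token sequence with one label (or None) per token. Token-level F1 between two annotators needs no gold standard: with m matching tokens and n_a, n_b annotated tokens, F1 = 2m / (n_a + n_b), so tokens that neither annotator touched never enter the score. How IAA_response was computed exactly is not spelled out in these slides; `response_agreement` is one assumed reading, restricted to tokens on which both annotators chose the same signalling question. All function names are hypothetical.

```python
# Sketch: token-level pairwise F1 (IAA_SQ) and a response-level agreement
# (IAA_response). Assumption: each token carries at most one label such as
# "1.1 Yes Good", or None when it lies outside every annotated span.
from typing import Optional, Sequence

Label = Optional[str]


def sq_of(label: str) -> str:
    """Signalling-question part of a label, e.g. '1.1' from '1.1 Yes Good'."""
    return label.split()[0]


def pairwise_f1(a: Sequence[Label], b: Sequence[Label]) -> Optional[float]:
    """IAA_SQ: do both annotators assign the same SQ to the same tokens?

    Tokens unannotated by both are never counted. Returns None when
    neither annotator annotated anything (F1 is undefined).
    """
    if len(a) != len(b):
        raise ValueError("annotations must cover the same tokens")
    n_a = sum(x is not None for x in a)
    n_b = sum(y is not None for y in b)
    if n_a + n_b == 0:
        return None
    matches = sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and sq_of(x) == sq_of(y)
    )
    # Equivalent to 2PR / (P + R) with P = matches / n_b, R = matches / n_a.
    return 2 * matches / (n_a + n_b)


def response_agreement(a: Sequence[Label], b: Sequence[Label]) -> Optional[float]:
    """IAA_response (assumed reading): on tokens where both annotators chose
    the same SQ, the share whose full label (response + direction) matches."""
    if len(a) != len(b):
        raise ValueError("annotations must cover the same tokens")
    shared = [
        (x, y) for x, y in zip(a, b)
        if x is not None and y is not None and sq_of(x) == sq_of(y)
    ]
    if not shared:
        return None
    return sum(x == y for x, y in shared) / len(shared)
```

For example, if one annotator marks a 2-token phrase and the other a 4-token sentence containing it, both with the same SQ, `pairwise_f1` gives 2·2 / (2 + 4) ≈ 0.67: only partial credit, even though both point at the same evidence.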
Results - IAA_SQ
• Zero agreement or no annotation:
• Domain 2: 52%
• Domain 3: 54%
• Domain 4: 50%
• Domain 5: 61% (protocol)
• Less subjective questions yield better IAA
The table details the interpretation of pairwise F1-measure.
Results - IAA_response
• IAA on the SQ response judgment
• Averaged over all annotator pairs
• Zero agreement: 52.63%
• No annotation: 22%
• Combined: ~75%
The table details the interpretation of pairwise F1-measure.
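Averaging over annotator pairs needs some bookkeeping, because a pair can agree partly, not at all, or have nothing to compare. A minimal Python sketch, assuming each pair has already been scored (for example with a token-level F1) and that None marks a pair in which neither annotator annotated anything; `annotator_pairs` and `summarize` are hypothetical names. With four annotators there are six pairs.

```python
# Sketch: summarize per-pair IAA scores over all annotator pairs.
# Assumption: a pair's score is a float in [0, 1], or None when neither
# annotator in the pair annotated anything ("no annotation").
from itertools import combinations
from typing import Optional, Sequence


def annotator_pairs(annotators: Sequence[str]) -> list[tuple[str, str]]:
    """All unordered annotator pairs (4 annotators -> 6 pairs)."""
    return list(combinations(annotators, 2))


def summarize(scores: Sequence[Optional[float]]) -> dict:
    """Share of pairs with zero agreement or no annotation, plus mean score.

    The mean is taken only over pairs that annotated something; counting
    no-annotation pairs as 0 instead would be a different modelling choice.
    """
    if not scores:
        raise ValueError("no scores to summarize")
    n = len(scores)
    defined = [s for s in scores if s is not None]
    zero = sum(1 for s in defined if s == 0.0)
    return {
        "zero_agreement": zero / n,
        "no_annotation": (n - len(defined)) / n,
        "mean_f1": sum(defined) / len(defined) if defined else None,
    }
```

For example, `summarize([None, 0.0, 0.0, 1.0])` reports 50% zero agreement, 25% no annotation and a mean of about 0.33 over the three scored pairs; these inputs are invented for illustration, not the study's data.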
Error Inspection – 1. Text span disagreement
• Annotators were not restricted to a span granularity
• Phrases vs. full sentences
4.1 Was the method of measuring the outcome inappropriate?
Sentence-level span: …The primary outcome measure was a 0–10 NRS pain score, which reflected the average pain experienced by the patient for ten days prior to follow-up…
Phrase-level span: …a 0–10 NRS pain score…
Error Inspection – 2. Different sections
• Annotators used different regions of the full text (Methods section, Results section, tables, …) to arrive at identical labels.
• Same judgment, different text evidence
2.6 Was an appropriate analysis used to estimate
the effect of assignment to intervention?
…This study was guided by the HAPA, which
has been widely used to address the gap
between intention to change and a person’s
actual change in behaviour [25-27]…
…intention-to-treat analysis was done with
missing data substituted by the last-
observation-carried-forward procedure…
2.1 Yes Good
Error Inspection – 3. Polarity disagreement
… 71 allocated routine services, 67 allocated
intervention service, 69 assessed at 8 weeks,
64 assessed at 8 week...
3.1 Were data for the outcome of interest
available for all, or nearly all, participants
randomized?
• Annotators selected response judgments with opposite polarities
• Yes vs. No
• Three of the four annotators answered 3.1 with Yes, but one chose Probably no.
• "All, or nearly all": where is the cut-off?
Error Inspection – 4. Degree disagreement
• Lenient (definitive): Yes, No
• Stringent: Probably yes, Probably no
1.1 Was a random sequence generation
method used to assign participants to
intervention groups?
…Patients were randomly allocated to either
intervention by a computer-generated
schedule stratified by sex and attendance at
a day hospital…
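The two response-level error types can be told apart mechanically: a polarity disagreement puts the two responses on opposite sides (Yes side vs. No side), while a degree disagreement keeps the side but mixes a definitive and a hedged ("Probably") answer. A minimal Python sketch; the function names and the "other" bucket for pairs involving No Information are assumptions, not part of the slides.

```python
# Sketch: classify the disagreement between two RoB 2 response judgments.
# Polarity: opposite sides (Yes/Probably Yes vs. No/Probably No).
# Degree: same side, one definitive and one hedged answer.
from collections import Counter
from itertools import combinations
from typing import Sequence

POSITIVE = {"Yes", "Probably Yes"}
NEGATIVE = {"No", "Probably No"}


def disagreement_type(r1: str, r2: str) -> str:
    if r1 == r2:
        return "agreement"
    # Assumption: "No Information" fits neither error type, so it gets
    # its own bucket instead of being forced into polarity or degree.
    if r1 not in POSITIVE | NEGATIVE or r2 not in POSITIVE | NEGATIVE:
        return "other"
    if (r1 in POSITIVE) != (r2 in POSITIVE):
        return "polarity"
    return "degree"


def pair_disagreements(responses: Sequence[str]) -> Counter:
    """Count disagreement types over all annotator pairs for one SQ."""
    return Counter(disagreement_type(a, b) for a, b in combinations(responses, 2))
```

Applied to the 3.1 example above, `pair_disagreements(["Yes", "Yes", "Yes", "Probably No"])` counts 3 agreeing pairs and 3 polarity disagreements, while Yes vs. Probably Yes on 1.1 would count as a degree disagreement.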
Conclusions
1. RoB 2.0 assessment guidelines cannot be used directly as RoB corpus annotation guidelines.
2. RoB assessment and RoB text annotation are both highly subjective tasks, but the annotation guidelines can be refined iteratively to improve both.
Future Directions
1. Instructional placards as annotation guidelines
2. A larger annotated corpus of RCTs
Annotation team
Dr. Roger Hilfiker
Dr. Martin Sattelmayer
Rahel Caliesch
Katia Giacomino
Dr. Nona Naderi
References
1. Wang, Q., Liao, J., Lapata, M., & Macleod, M. (2022). Risk of bias assessment in preclinical literature using natural language processing. Research Synthesis Methods, 13(3), 368-380.
2. Macleod, M. R., O'Collins, T., Howells, D. W., & Donnan, G. A. (2004). Pooling of animal experimental data reveals influence of study design and publication bias. Stroke, 35(5), 1203-1208.
3. Deleger, L., Li, Q., Lingren, T., Kaiser, M., Molnar, K., Stoutenborough, L., Kouril, M., Marsolo, K., & Solti, I. (2012). Building gold standard corpora for medical natural language processing tasks. In AMIA Annual Symposium Proceedings (Vol. 2012, p. 144). American Medical Informatics Association.
4. Sterne, J. A., Savović, J., Page, M. J., Elbers, R. G., Blencowe, N. S., Boutron, I., Cates, C. J., Cheng, H. Y., Corbett, M. S., Eldridge, S. M., & Emberson, J. R. (2019). RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ, 366.
Thank You
Questions?
Dataset: https://zenodo.org/record/7698941#.ZEGhXexBzzU
Email: anjani.k.dhrangadhariya@gmail.com
LinkedIn: https://www.linkedin.com/in/anjani-dhrangadhariya/

More Related Content

Similar to MIE20232.pptx

Knowledge transfer research examples
Knowledge transfer research examplesKnowledge transfer research examples
Knowledge transfer research examplestaem
 
Top Articles in Medical Education 2017
Top Articles in Medical Education 2017Top Articles in Medical Education 2017
Top Articles in Medical Education 2017dsandro1
 
Resident Presentations - Evidence-Based Medicine for Haematology
Resident Presentations - Evidence-Based Medicine for HaematologyResident Presentations - Evidence-Based Medicine for Haematology
Resident Presentations - Evidence-Based Medicine for HaematologyRobin Featherstone
 
Comparison of registered and published intervention fidelity assessment in cl...
Comparison of registered and published intervention fidelity assessment in cl...Comparison of registered and published intervention fidelity assessment in cl...
Comparison of registered and published intervention fidelity assessment in cl...valéry ridde
 
Techniques in clinical epidemiology
Techniques in clinical epidemiologyTechniques in clinical epidemiology
Techniques in clinical epidemiologyBhoj Raj Singh
 
CAT Systematic reviews of RCT.pptx
CAT Systematic reviews of RCT.pptxCAT Systematic reviews of RCT.pptx
CAT Systematic reviews of RCT.pptxmariaidrees3
 
Dataset Codebook BUS7105, Week 8 Name Source Represe
Dataset Codebook  BUS7105, Week 8  Name Source RepreseDataset Codebook  BUS7105, Week 8  Name Source Represe
Dataset Codebook BUS7105, Week 8 Name Source RepreseOllieShoresna
 
Quick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchQuick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchAlan Fricker
 
Systematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptxSystematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptxDr. Anik Chakraborty
 
How to conduct a systematic review
How to conduct a systematic reviewHow to conduct a systematic review
How to conduct a systematic reviewDrNidhiPruthiShukla
 
Efficacy of Information interventions in reducing transfer anxiety from a cri...
Efficacy of Information interventions in reducing transfer anxiety from a cri...Efficacy of Information interventions in reducing transfer anxiety from a cri...
Efficacy of Information interventions in reducing transfer anxiety from a cri...Ambika Rai
 
Development of health measurement scales - part 1
Development of health measurement scales - part 1Development of health measurement scales - part 1
Development of health measurement scales - part 1Rizwan S A
 
Correlational research
Correlational researchCorrelational research
Correlational researchDhiya Lara
 
Correlational research
Correlational researchCorrelational research
Correlational researchAzura Zaki
 
Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015KISK FF MU
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPAlAcademia Tsr
 
medicine_research_slides_1415_topic6.pdf
medicine_research_slides_1415_topic6.pdfmedicine_research_slides_1415_topic6.pdf
medicine_research_slides_1415_topic6.pdfPerioKLE
 

Similar to MIE20232.pptx (20)

Knowledge transfer research examples
Knowledge transfer research examplesKnowledge transfer research examples
Knowledge transfer research examples
 
Top Articles in Medical Education 2017
Top Articles in Medical Education 2017Top Articles in Medical Education 2017
Top Articles in Medical Education 2017
 
Resident Presentations - Evidence-Based Medicine for Haematology
Resident Presentations - Evidence-Based Medicine for HaematologyResident Presentations - Evidence-Based Medicine for Haematology
Resident Presentations - Evidence-Based Medicine for Haematology
 
Comparison of registered and published intervention fidelity assessment in cl...
Comparison of registered and published intervention fidelity assessment in cl...Comparison of registered and published intervention fidelity assessment in cl...
Comparison of registered and published intervention fidelity assessment in cl...
 
Techniques in clinical epidemiology
Techniques in clinical epidemiologyTechniques in clinical epidemiology
Techniques in clinical epidemiology
 
CAT Systematic reviews of RCT.pptx
CAT Systematic reviews of RCT.pptxCAT Systematic reviews of RCT.pptx
CAT Systematic reviews of RCT.pptx
 
Dataset Codebook BUS7105, Week 8 Name Source Represe
Dataset Codebook  BUS7105, Week 8  Name Source RepreseDataset Codebook  BUS7105, Week 8  Name Source Represe
Dataset Codebook BUS7105, Week 8 Name Source Represe
 
Quick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchQuick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative research
 
Systematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptxSystematic Review & Meta Analysis.pptx
Systematic Review & Meta Analysis.pptx
 
Spotlight Webinar: ROBINS-I
Spotlight Webinar: ROBINS-I Spotlight Webinar: ROBINS-I
Spotlight Webinar: ROBINS-I
 
How to conduct a systematic review
How to conduct a systematic reviewHow to conduct a systematic review
How to conduct a systematic review
 
Efficacy of Information interventions in reducing transfer anxiety from a cri...
Efficacy of Information interventions in reducing transfer anxiety from a cri...Efficacy of Information interventions in reducing transfer anxiety from a cri...
Efficacy of Information interventions in reducing transfer anxiety from a cri...
 
Development of health measurement scales - part 1
Development of health measurement scales - part 1Development of health measurement scales - part 1
Development of health measurement scales - part 1
 
Correlational research
Correlational researchCorrelational research
Correlational research
 
Correlational research
Correlational researchCorrelational research
Correlational research
 
Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015Jan Hrabal: Evaluation of medical information quality #bcs2015
Jan Hrabal: Evaluation of medical information quality #bcs2015
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLP
 
medicine_research_slides_1415_topic6.pdf
medicine_research_slides_1415_topic6.pdfmedicine_research_slides_1415_topic6.pdf
medicine_research_slides_1415_topic6.pdf
 
judith dyson collaborative launch
judith dyson collaborative launchjudith dyson collaborative launch
judith dyson collaborative launch
 
47711.ppt
47711.ppt47711.ppt
47711.ppt
 

More from Institute of Information Systems (HES-SO)

Classification of noisy free-text prostate cancer pathology reports using nat...
Classification of noisy free-text prostate cancer pathology reports using nat...Classification of noisy free-text prostate cancer pathology reports using nat...
Classification of noisy free-text prostate cancer pathology reports using nat...Institute of Information Systems (HES-SO)
 
Machine learning assisted citation screening for Systematic Reviews - Anjani ...
Machine learning assisted citation screening for Systematic Reviews - Anjani ...Machine learning assisted citation screening for Systematic Reviews - Anjani ...
Machine learning assisted citation screening for Systematic Reviews - Anjani ...Institute of Information Systems (HES-SO)
 
Exploiting biomedical literature to mine out a large multimodal dataset of ra...
Exploiting biomedical literature to mine out a large multimodal dataset of ra...Exploiting biomedical literature to mine out a large multimodal dataset of ra...
Exploiting biomedical literature to mine out a large multimodal dataset of ra...Institute of Information Systems (HES-SO)
 
Studying Public Medical Images from Open Access Literature and Social Network...
Studying Public Medical Images from Open Access Literature and Social Network...Studying Public Medical Images from Open Access Literature and Social Network...
Studying Public Medical Images from Open Access Literature and Social Network...Institute of Information Systems (HES-SO)
 
Risques opérationnels et le système de contrôle interne : les limites d’un te...
Risques opérationnels et le système de contrôle interne : les limites d’un te...Risques opérationnels et le système de contrôle interne : les limites d’un te...
Risques opérationnels et le système de contrôle interne : les limites d’un te...Institute of Information Systems (HES-SO)
 
Le contrôle interne dans les administrations publiques tient-il toutes ses pr...
Le contrôle interne dans les administrations publiques tient-il toutes ses pr...Le contrôle interne dans les administrations publiques tient-il toutes ses pr...
Le contrôle interne dans les administrations publiques tient-il toutes ses pr...Institute of Information Systems (HES-SO)
 
Le système de contrôle interne : Présentation générale, enjeux et méthodes
Le système de contrôle interne : Présentation générale, enjeux et méthodesLe système de contrôle interne : Présentation générale, enjeux et méthodes
Le système de contrôle interne : Présentation générale, enjeux et méthodesInstitute of Information Systems (HES-SO)
 
A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...
A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...
A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...Institute of Information Systems (HES-SO)
 
NOSE: une approche Smart-City pour les zones périphériques et extra-urbaines
NOSE: une approche Smart-City pour les zones périphériques et extra-urbainesNOSE: une approche Smart-City pour les zones périphériques et extra-urbaines
NOSE: une approche Smart-City pour les zones périphériques et extra-urbainesInstitute of Information Systems (HES-SO)
 

More from Institute of Information Systems (HES-SO) (20)

Classification of noisy free-text prostate cancer pathology reports using nat...
Classification of noisy free-text prostate cancer pathology reports using nat...Classification of noisy free-text prostate cancer pathology reports using nat...
Classification of noisy free-text prostate cancer pathology reports using nat...
 
Machine learning assisted citation screening for Systematic Reviews - Anjani ...
Machine learning assisted citation screening for Systematic Reviews - Anjani ...Machine learning assisted citation screening for Systematic Reviews - Anjani ...
Machine learning assisted citation screening for Systematic Reviews - Anjani ...
 
Exploiting biomedical literature to mine out a large multimodal dataset of ra...
Exploiting biomedical literature to mine out a large multimodal dataset of ra...Exploiting biomedical literature to mine out a large multimodal dataset of ra...
Exploiting biomedical literature to mine out a large multimodal dataset of ra...
 
L'IoT dans les usines. Quels avantages ?
L'IoT dans les usines. Quels avantages ?L'IoT dans les usines. Quels avantages ?
L'IoT dans les usines. Quels avantages ?
 
Studying Public Medical Images from Open Access Literature and Social Network...
Studying Public Medical Images from Open Access Literature and Social Network...Studying Public Medical Images from Open Access Literature and Social Network...
Studying Public Medical Images from Open Access Literature and Social Network...
 
Risques opérationnels et le système de contrôle interne : les limites d’un te...
Risques opérationnels et le système de contrôle interne : les limites d’un te...Risques opérationnels et le système de contrôle interne : les limites d’un te...
Risques opérationnels et le système de contrôle interne : les limites d’un te...
 
Le contrôle interne dans les administrations publiques tient-il toutes ses pr...
Le contrôle interne dans les administrations publiques tient-il toutes ses pr...Le contrôle interne dans les administrations publiques tient-il toutes ses pr...
Le contrôle interne dans les administrations publiques tient-il toutes ses pr...
 
Le système de contrôle interne : Présentation générale, enjeux et méthodes
Le système de contrôle interne : Présentation générale, enjeux et méthodesLe système de contrôle interne : Présentation générale, enjeux et méthodes
Le système de contrôle interne : Présentation générale, enjeux et méthodes
 
Crowdsourcing-based Mobile Application for Wheelchair Accessibility
Crowdsourcing-based Mobile Application for Wheelchair AccessibilityCrowdsourcing-based Mobile Application for Wheelchair Accessibility
Crowdsourcing-based Mobile Application for Wheelchair Accessibility
 
Quelle(s) valeur(s) pour le leadership stratégique ?
Quelle(s) valeur(s) pour le leadership stratégique ?Quelle(s) valeur(s) pour le leadership stratégique ?
Quelle(s) valeur(s) pour le leadership stratégique ?
 
A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...
A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...
A 3-D Riesz-Covariance Texture Model for the Prediction of Nodule Recurrence ...
 
Challenges in medical imaging and the VISCERAL model
Challenges in medical imaging and the VISCERAL modelChallenges in medical imaging and the VISCERAL model
Challenges in medical imaging and the VISCERAL model
 
NOSE: une approche Smart-City pour les zones périphériques et extra-urbaines
NOSE: une approche Smart-City pour les zones périphériques et extra-urbainesNOSE: une approche Smart-City pour les zones périphériques et extra-urbaines
NOSE: une approche Smart-City pour les zones périphériques et extra-urbaines
 
Medical image analysis and big data evaluation infrastructures
Medical image analysis and big data evaluation infrastructuresMedical image analysis and big data evaluation infrastructures
Medical image analysis and big data evaluation infrastructures
 
Medical image analysis, retrieval and evaluation infrastructures
Medical image analysis, retrieval and evaluation infrastructuresMedical image analysis, retrieval and evaluation infrastructures
Medical image analysis, retrieval and evaluation infrastructures
 
How to detect soft falls on devices
How to detect soft falls on devicesHow to detect soft falls on devices
How to detect soft falls on devices
 
FUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSIS
FUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSISFUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSIS
FUNDAMENTALS OF TEXTURE PROCESSING FOR BIOMEDICAL IMAGE ANALYSIS
 
MOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLS
MOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLSMOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLS
MOBILE COLLECTION AND DISSEMINATION OF SENIORS’ SKILLS
 
Enhanced Students Laboratory The GET project
Enhanced Students Laboratory The GET projectEnhanced Students Laboratory The GET project
Enhanced Students Laboratory The GET project
 
Solar production prediction based on non linear meteo source adaptation
Solar production prediction based on non linear meteo source adaptationSolar production prediction based on non linear meteo source adaptation
Solar production prediction based on non linear meteo source adaptation
 

Recently uploaded

Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch TuesdayIvanti
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfFIDO Alliance
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdfMuhammad Subhan
 

Recently uploaded (20)

Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)

MIE20232.pptx

  • 1. First Steps Towards a Risk of Bias Corpus of Randomized Controlled Trials Presenter – Anjani Dhrangadhariya MIE2023 - Göteborg, Sweden, 23.05.23 Authors: Anjani Dhrangadhariya, Roger Hilfiker, Martin Sattelmayer, Katia Giacomino, Rahel Caliesch, Simone Elsig, Nona Naderi, Henning Müller
  • 2. Randomized Controlled Trial • In theory, an RCT accurately measures intervention effects on patient outcomes, but in practice, biases enter • Design/Planning • Execution • Analysis • Outcomes reporting • Systematic Reviews • Utility • Medical professionals • Health policies • Surgeons
  • 3. Risk of Bias (RoB) • The risk of bias specifically pertains to systematic errors in the design, conduct, or reporting of a study that can potentially lead to a deviation from the true effect being measured. • RoB assessment guidelines • Example RoB assessment guidelines (year): Physiotherapy Evidence Database (PEDro), 1999; Risk of Bias Assessment Tool for Nonrandomized Studies (RoBANS), 2004; Cochrane Risk of Bias assessment guidelines, 2008; Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I), 2016; Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2), 2017; Newcastle-Ottawa Scale (NOS), 2018; Revised Cochrane Risk of Bias for RCTs 2.0 tool (RoB 2), 2019
  • 4. RoB information extraction • Thorough assessment • Manual assessment • Time-consuming • Cognitively demanding • Two experts for manual assessment • Third, for conflict resolution • Automation imperative
  • 5. Related Work • RoB labelled corpus • Wang et al. 2022 • Preclinical animal studies • Human RCTs • RobotReviewer • PDF highlights • Freely available • Closed access data • Cochrane RoB v1 • RoB 2.0? • RoB automation • Marshall et al. 2015 • Millard et al. 2016 • Cochrane Database (CDSR) • Closed access
  • 6. Motivation 1 No RoB text annotation guidelines exist 2 No RoB annotated RCTs exist
  • 7. Revised Cochrane RoB 2.0 tool • Can you use the guidelines to annotate a text corpus? • Extensive guidelines • Step-by-step instructions • Divides RoB into 5 domains: Randomization process; Deviations from intended interventions; Missing outcome data; Outcome measurement; Selection of the reported result • Each domain is assessed using several signalling questions • Sterne, J.A., Savović, J., Page, M.J., Elbers, R.G., Blencowe, N.S., Boutron, I., Cates, C.J., Cheng, H.Y., Corbett, M.S., Eldridge, S.M. and Emberson, J.R., 2019. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366.
  • 8. Revised Cochrane RoB 2.0 tool • Reviewers manually go through the RCT to identify text describing the answer to a signalling question. • Based on the answer to the signalling question, they select one of the five response judgements: Yes, Probably Yes, Probably No, No, No Information
  • 9. Revised Cochrane RoB 2.0 tool • 2.1 - Were the participants aware of their assigned intervention during the trial? • Example label: 2.1 No Good • 5 risk domains, 22 signalling questions
  • 10. Annotation schema • Follow the revised Cochrane RoB 2.0 • 110 span labels • 1.1 Yes Good • 1.1 Probably Yes Good • 1.1 Probably No Bad • 1.1 No Bad • 1.1 No Information • 1.2 Yes Good • 1.2 Probably Yes Good • 1.2 Probably No Bad • … • Label structure, e.g. 1.1 Yes Good: risk domain and signalling question (1.1), SQ response (Yes), direction (Good) • Good = low risk, Bad = high risk
  • 11. Pilot Annotation • Ten RCT full-text PDFs • 2000-2019 • Four annotators • 2 scientists • 1 doctoral student • 1 scientific collaborator • Two NLP experts • 1 professor • 1 doctoral student • tagtog PDF annotation tool https://www.tagtog.com/
  • 12. Evaluation • F1-measure as inter-annotator agreement (IAA) • Disregards out-of-the-span tokens (unannotated tokens) 1. IAA_SQ: Do the annotator pairs annotate the same text span to answer a signalling question (SQ)? 2. IAA_response: If the annotator pairs annotate the same text to answer a signalling question, do they also select the same response judgment?
  • 13. Results - IAA_SQ • Zero or no annotation • Domain 2 - 52% • Domain 3 - 54% • Domain 4 - 50% • Domain 5 - 61% (protocol) • Less subjective questions • Better IAA • The table details the interpretation of pairwise F1-measure.
  • 14. Results - IAA_response • IAA - SQ response judgment • Averaged over all annotator pairs • Zero agreement - 52.63% • No annotation - 22% • Combined: ~75% • The table details the interpretation of pairwise F1-measure.
  • 15. Error Inspection – 1. Text span disagreement • Annotators were not limited to annotating either phrases or full sentences • Example, 4.1 Was the method of measuring the outcome inappropriate? • Sentence: …The primary outcome measure was a 0–10 NRS pain score, which reflected the average pain experienced by the patient for ten days prior to follow-up… • Phrase: …a 0–10 NRS pain score…
  • 16. Error Inspection – 2. Different sections • Annotators use different regions of the full text (Methods section, Results section, Table, …) to arrive at identical labels. • Same judgment, different parts of text evidence • Example, 2.6 Was an appropriate analysis used to estimate the effect of assignment to intervention? • Evidence 1: …This study was guided by the HAPA, which has been widely used to address the gap between intention to change and a person’s actual change in behaviour [25-27]… • Evidence 2: …intention-to-treat analysis was done with missing data substituted by the last-observation-carried-forward procedure… • Label: 2.1 Yes Good
  • 17. Error Inspection – 3. Polarity disagreement • Selecting response judgment options with different polarities • Yes vs. No • Example, 3.1 Were data for the outcome of interest available for all, or nearly all, participants randomized? • Evidence: … 71 allocated routine services, 67 allocated intervention service, 69 assessed at 8 weeks, 64 assessed at 8 week... • Three of the four annotators responded to 3.1 with Yes, but one chose Probably No. • All or nearly all (cut-off?)
  • 18. Error Inspection – 4. Degree disagreement • Lenient (definitive): Yes, No • Stringent: Probably Yes, Probably No • Example, 1.1 Was a random sequence generation method used to assign participants to intervention groups? • Evidence: …Patients were randomly allocated to either intervention by a computer-generated schedule stratified by sex and attendance at a day hospital…
  • 19. Conclusions 1. RoB 2.0 assessment guidelines cannot be directly used as RoB corpus annotation guidelines. 2. RoB assessment and RoB text annotation tasks are both highly subjective, but the annotation guidelines can be refined with an iterative process to improve both.
  • 20. Future Directions 1. Instructional placards as annotation guidelines 2. Larger annotated corpus of RCTs
  • 21. Dr. Roger Hilfiker Dr. Martin Sattelmayer Rahel Caliesch Katia Giacomino Dr. Nona Naderi Annotation team
  • 22. References 1. Wang, Q., Liao, J., Lapata, M., & Macleod, M. (2022). Risk of bias assessment in preclinical literature using natural language processing. Research Synthesis Methods, 13(3), 368-380. 2. Macleod, M. R., O’Collins, T., Howells, D. W., & Donnan, G. A. (2004). Pooling of animal experimental data reveals influence of study design and publication bias. Stroke, 35(5), 1203-1208. 3. Deleger, L., Li, Q., Lingren, T., Kaiser, M., Molnar, K., Stoutenborough, L., Kouril, M., Marsolo, K., & Solti, I. (2012). Building gold standard corpora for medical natural language processing tasks. In AMIA Annual Symposium Proceedings (Vol. 2012, p. 144). American Medical Informatics Association. 4. Sterne, J. A., Savović, J., Page, M. J., Elbers, R. G., Blencowe, N. S., Boutron, I., Cates, C. J., Cheng, H. Y., Corbett, M. S., Eldridge, S. M., & Emberson, J. R. (2019). RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366.
  • 23. Thank You Questions? Dataset: https://zenodo.org/record/7698941#.ZEGhXexBzzU Email: anjani.k.dhrangadhariya@gmail.com LinkedIn: https://www.linkedin.com/in/anjani-dhrangadhariya/
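Slide 12 uses a pairwise F1-measure as inter-annotator agreement while disregarding tokens that neither annotator labelled. The sketch below is one common way to compute such a score at token level, not the authors' code. It assumes each annotator's work is a list holding one label string (e.g. "3.1 Yes Good") or None per token; the helper names (`token_f1`, `response_agreement`, `mean_pairwise`) and the toy labels are hypothetical. IAA_SQ compares only the signalling-question part of the labels, and IAA_response is taken here, as an assumption, to be the share of tokens given the same signalling question that also carry the same response judgment.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence

# One label per token, e.g. "3.1 Yes Good"; None = token left unannotated.
Label = Optional[str]


def sq_of(label: str) -> str:
    """Signalling-question part of a label: '3.1 Yes Good' -> '3.1'."""
    return label.split(" ", 1)[0]


def token_f1(a: Sequence[Label], b: Sequence[Label],
             key: Callable[[str], str] = lambda s: s) -> Optional[float]:
    """Pairwise token-level F1 between annotators A and B (hypothetical helper).

    Tokens unannotated by both annotators never enter the counts. A token is
    an agreement only if both annotated it and key() maps both labels to the
    same value. Returns None if neither annotator annotated anything.
    """
    if len(a) != len(b):
        raise ValueError("both annotations must cover the same tokens")
    n_a = sum(x is not None for x in a)  # tokens annotated by A
    n_b = sum(y is not None for y in b)  # tokens annotated by B
    if n_a + n_b == 0:
        return None
    agree = sum(1 for x, y in zip(a, b)
                if x is not None and y is not None and key(x) == key(y))
    # With A as reference: precision = agree / n_b, recall = agree / n_a,
    # so F1 = 2PR / (P + R) simplifies to 2 * agree / (n_a + n_b).
    return 2 * agree / (n_a + n_b)


def response_agreement(a: Sequence[Label], b: Sequence[Label]) -> Optional[float]:
    """Among tokens both annotators gave the same SQ, share with the same full label."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not None and y is not None and sq_of(x) == sq_of(y)]
    if not shared:
        return None
    return sum(x == y for x, y in shared) / len(shared)


def mean_pairwise(annotations: Dict[str, Sequence[Label]],
                  metric: Callable[[Sequence[Label], Sequence[Label]], Optional[float]]
                  ) -> Optional[float]:
    """Average a pairwise metric over all annotator pairs, skipping undefined pairs."""
    scores = [metric(annotations[p], annotations[q])
              for p, q in combinations(sorted(annotations), 2)]
    defined = [s for s in scores if s is not None]
    return sum(defined) / len(defined) if defined else None


# Toy example (invented labels): a Yes vs. Probably No disagreement on SQ 3.1.
a = [None, "3.1 Yes Good", "3.1 Yes Good", "3.1 Yes Good", None]
b = [None, None, "3.1 Probably No Bad", "3.1 Probably No Bad", "3.1 Probably No Bad"]

print(token_f1(a, b, key=sq_of))  # 0.6666666666666666 (IAA_SQ: 2 shared tokens, 3 + 3 annotated)
print(response_agreement(a, b))   # 0.0 (IAA_response: same SQ, different responses)
```

Returning None when neither annotator marked any token keeps "no annotation" distinct from "zero agreement", mirroring the split reported on slide 14; with four annotators, `mean_pairwise` averages over the six annotator pairs.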

Editor's Notes

  1. Randomized controlled trials (RCTs) aim to accurately measure treatment effects on patient outcomes. In theory, they minimize bias, but in practice, biases tend to creep into any of the trial stages. When RCTs affected by such biases are used in systematic reviews, they reduce the validity and utility of the review.
  2. Biases themselves cannot be measured directly from RCT reports, but the risk of bias can be estimated by identifying systematic flaws in study design, planning, execution, or outcomes reporting. Several risk-of-bias assessment guidelines help reviewers thoroughly assess the different risks of bias in RCT literature. The most recently published are the revised Cochrane RoB 2.0 guidelines.
  3. These guidelines support a thorough assessment of bias from RCT full texts, but manual RoB assessment is extremely time-consuming, resource-intensive, and cognitively demanding. Manual assessment also struggles to keep pace with the rapidly rising number of published RCTs, so automatic RoB information extraction is imperative.
  4. Marshall et al. and Millard et al. have done some work on automating RoB information extraction, but the dataset used to train their machine learning models is closed access. Later, they developed a tool called RobotReviewer, which is freely available but is built on closed-access data that isn't available to the community, and it automates assessment using the older risk of bias guidelines. Recently, Wang et al. released a RoB-labelled corpus, but it is based on preclinical animal studies, not human RCTs.
  5. So currently, there is no open-access corpus annotated with risk of bias judgments, nor are there guidelines for building one. These gaps prompted us to conduct this pilot project.
  6. RoB 2 provides extensive, instructional guidelines that walk you step by step through assessing the overall risk of bias of any RCT. So before building our own annotation guidelines, we asked whether the RoB 2 tool could also be used to annotate a text corpus. To answer that, we need to examine how it structures the bias assessment procedure. It divides bias into five domains, each loosely corresponding to one of the trial stages, and each domain is assessed using several signalling questions.
  7. Reviewers manually go through each signalling question as it appears in the guidelines and try to identify text that answers it in the RCT they are assessing. Once they find an answer text, they use it to judge the small portion of risk that this signalling question covers, and based on that judgment they choose one of the five response options. The meaning of a response depends on the question: for some signalling questions, “Yes” indicates a risk of bias and “No” indicates none, while for others “Yes” means that everything is in order and there is no risk of bias for that question.
  8. Take, for example, signalling question 2.1. It asks whether the participants were aware of their assigned intervention during the trial. The reviewers look for the answer in the text; say they find that the participants were properly blinded and unaware of their assigned intervention. The risk of bias is then low, and all is good for this signalling question. Reviewers repeat this for all 22 signalling questions in the RoB 2 tool, and this manual procedure can be translated directly into an annotation process.
  9. We need an annotation schema before annotating the corpus. We kept our annotation schema very close to the way the assessment is structured in the RoB 2 guidelines: each span label encodes the domain the text is labelled for, the signalling question, and the response judgment. As RoB assessment and annotation are both very complex tasks, we designed the labels to make annotation as easy as possible for the annotators.
  10. We then had four experts with varied RoB assessment expertise annotate 10 full-text RCTs.
  11. Signalling question 3.1 asks whether outcome data were available for all, or nearly all, randomized participants, but it does not specify the exact cut-off at which participant dropout increases the risk of bias. Annotators therefore make subjective response judgments, depending on what percentage of participant dropout they consider acceptable from experience.
  12. The references, and...
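The annotation schema described in these notes (domain, signalling question, response judgment, and a Good/Bad direction) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' tooling: the class and function names are hypothetical, the per-domain question counts assume the RoB 2 variant for the effect of assignment to intervention (3 + 7 + 4 + 5 + 3 = 22 signalling questions), and whether "Yes" is favourable for a given question is passed in as a `yes_is_good` flag rather than looked up, because the polarity differs between questions. It shows how 22 signalling questions times 5 responses yield the 110 span labels on slide 10.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed counts of signalling questions per RoB 2 domain (effect of
# assignment to intervention): 3 + 7 + 4 + 5 + 3 = 22 questions.
SQS_PER_DOMAIN = {1: 3, 2: 7, 3: 4, 4: 5, 5: 3}
RESPONSES = ["Yes", "Probably Yes", "Probably No", "No", "No Information"]


@dataclass(frozen=True)
class SpanLabel:
    sq: str                   # signalling question, e.g. "2.1"
    response: str             # one of RESPONSES
    direction: Optional[str]  # "Good" = low risk, "Bad" = high risk, None for No Information

    @property
    def domain(self) -> int:
        # The number before the dot is the risk domain.
        return int(self.sq.split(".")[0])

    def __str__(self) -> str:
        parts = [self.sq, self.response]
        if self.direction is not None:
            parts.append(self.direction)
        return " ".join(parts)


def make_label(sq: str, response: str, yes_is_good: bool) -> SpanLabel:
    """Build a span label; yes_is_good is the (assumed) polarity of the question."""
    if response not in RESPONSES:
        raise ValueError(f"unknown response: {response!r}")
    if response == "No Information":
        return SpanLabel(sq, response, None)
    favourable = response in ("Yes", "Probably Yes")
    if not yes_is_good:
        favourable = not favourable
    return SpanLabel(sq, response, "Good" if favourable else "Bad")


def label_slots() -> List[Tuple[str, str]]:
    """Every (signalling question, response) pair the schema must cover."""
    return [(f"{d}.{q}", r)
            for d, n in SQS_PER_DOMAIN.items()
            for q in range(1, n + 1)
            for r in RESPONSES]


print(make_label("1.1", "Probably No", yes_is_good=True))  # 1.1 Probably No Bad
print(make_label("2.1", "No", yes_is_good=False))          # 2.1 No Good
print(len(label_slots()))                                  # 110
```

Passing the polarity explicitly keeps the sketch limited to what the slides state (1.1 and 2.1 have opposite polarities); the direction for the remaining questions would have to come from the RoB 2 guidance itself.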