The document describes using data programming and weak supervision to extract PICO information from medical documents without labeled training data. It involves developing labeling functions from sources like UMLS, ontologies, dictionaries and regular expressions to label text as participant, intervention, comparator or outcome. A label matrix is generated and labels are combined using majority voting or a label model. Experiments on the EBM-PICO dataset show this approach can outperform fully supervised methods and precision can be improved over recall-focused techniques. Future work includes enhancing labeling sources and functions.
1. Data Programming for PICO
information extraction in
absence of labelled data
Presenter – Anjani Dhrangadhariya
Group Meeting: 05.04.23
2. PICO Information
…A semi-structured interview was used to obtain qualitative information on the effect of the daily aerobics intervention vs. conventional exercise. The convenience sample included 15 adult Oncology outpatients, 13 female and 2 male, ranging in age from 20 to 87. Quality of life was measured using SF-36 QOLS…
Participant: 15 adult Oncology outpatients, 13 female and 2 male, ranging in age from 20 to 87.
Intervention: daily aerobics intervention
Comparator: conventional exercise
Outcome: Quality of life was measured using SF-36 QOLS…
Coarse-grained
information
Clinical trial (study)
3. Significance
Prescribe treatment
Diagnosis decisions
Government’s health policies
Health economic evaluation
…
P, I, C, O
1,139 expert hours (avg.)
A quarter of a million dollars
4. PICO Information
P
• Sample size: 15
• Age: Adult, age from 20 to 87
• Condition: Oncology outpatients
• Gender: 13 female and 2 male
I
• Intervention name: aerobics intervention
• Intervention frequency: daily
C
• Control: conventional exercise
O
• Outcome: Quality of life
• Outcome method / Measurement scale: SF-36 QOLS
Spans ↔ Entities
Fine-grained
information
5. PICO extraction: Automation
• Challenging – fuzzy spans and entities
1. Nested
2. Overlapping
3. Highly contextual
• Low resource
• EBM-PICO (Nye et al., 2018)
• Ever-extensible subunits
• Annotation task – tough to define
• Annotation personnel – tough to train
P
• Sample size → Overall sample size, Subgroup sample size
• Age
• Condition → Disease, Disorder, Signs and symptoms
• Gender
• Ethnicity
• Social status → Education
• …
7. PICO: Manual annotation
1. Errors in the annotated corpus
2. Extract new classes
3. Re-define existing classes with new labels
Re-annotation?
8. Weak Supervision via Data programming
• Data-programming-based weak supervision uses “programmatic labelling” – it relies on programmatic labelling sources to obtain training data.
• Programmatic labelling is quick and allows efficient modification of the training-data labels when
• the downstream application changes
• errors need correcting
• more entities are added
9. Programmatic labelling
• Uses a set of labelling sources* S = {s₁, s₂, …, sₙ} and a set of weak labelling functions* Λ = {λ₁, λ₂, …, λₙ} that sample from the unlabelled data and label a subset of it.
*Labelling functions (LF): pattern matching (.*), Boolean search, DB lookup, heuristics, legacy systems, third-party models, crowd-sourced labels
*Labelling sources (LS): ontologies, regular expressions, linguistic grammar, dictionaries, manual annotation, terminologies from KBs
10. Labelling functions
def labelling_function_1(tokens, dictionary):
    return [1 if t in dictionary else 0 for t in tokens]
def labelling_function_2(tokens, pattern):
    return [1 if pattern.match(t) else 0 for t in tokens]
def labelling_function_3(tokens, pos_tags, allowed_pos, pattern):
    return [1 if pos in allowed_pos and pattern.match(t) else 0 for t, pos in zip(tokens, pos_tags)]
def labelling_function_n(tokens, labelling_source):
    return …
Each labelling function maps the n text tokens to token labels, drawing on one labelling source (dictionary, pattern, POS tags, …).
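As a concrete sketch, labelling functions like the ones above can be wired together to produce a label matrix. The dictionary, regular expression, and tokens below are illustrative only, not the sources used in the actual system:

```python
import re

# Hypothetical labelling sources (illustrative only)
participant_terms = {"outpatients", "adult", "patients"}
sample_size_re = re.compile(r"^\d+$")

def lf_dictionary(tokens):
    # Emit 1 for tokens found in the dictionary, 0 (abstain) otherwise
    return [1 if t.lower() in participant_terms else 0 for t in tokens]

def lf_regex(tokens):
    # Emit 1 for bare numbers (candidate sample-size mentions)
    return [1 if sample_size_re.match(t) else 0 for t in tokens]

tokens = ["15", "adult", "Oncology", "outpatients"]
# Label matrix: one row per token, one column per labelling function
label_matrix = [[lf(tokens)[i] for lf in (lf_dictionary, lf_regex)]
                for i in range(len(tokens))]
print(label_matrix)  # [[0, 1], [1, 0], [0, 0], [1, 0]]
```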
12. Labelling functions: aggregation
• Aggregate labels from multiple labelling functions 𝜆1, 𝜆2, … , 𝜆𝑛 to
obtain a consensus label for your tokens.
Pipeline: data programming → weakly-supervised transformer → predictions
13. Data Programming: Applications
Biomedical text and image classification
Biomedical entity recognition
✘Clinical information extraction, especially PICO
• Highly compositional
• Difficult to define
• Fuzzy boundaries
• Meng, Yu, et al. "Weakly-supervised neural text classification." Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2018.
• Wang, Yanshan, et al. "A clinical text classification paradigm using weak supervision and deep representation." BMC medical informatics and decision making 19.1 (2019): 1-13.
• Mintz, Mike, et al. "Distant supervision for relation extraction without labeled data." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th
International Joint Conference on Natural Language Processing of the AFNLP. 2009.
Example entity types: Gene, Protein, Drug, Chemical, Disease. “IGF1” can denote both the gene and the protein, which is where nomenclature and standardizations come in.
14. Objective
• Adapting data programming and weak supervision to extract PICO
information.
• Use as little automatically labelled training data as possible.
• Use freely available resources.
15. Method: Dataset
• EBM-PICO
• 5000 abstracts
• Training and test set
• Coarse-grained PICO (spans or sentences)
• Fine-grained PICO (entities or phrases/words)
16. Method: Dataset
• EBM-PICO training set
• Error rectification on 2,960 (~1%) of 1,303,169 training tokens
• EBM-PICO test set
1. Error rectification
2. Re-annotation
17. Method: Dataset preprocessing
• Comes pre-tokenized
• No preprocessing was applied to the EBM-PICO dataset
• Enrichment
• POS tags
• Token Lemma
19. Task: binary labelling
• Input sequence: X = (x₁, x₂, …, xₙ)
• Output sequence: Y = (y₁, y₂, …, yₙ), where yᵢ ∈ {1, 0}
• Develop m LFs to label the n text tokens as
1. P vs. OOS (1 vs. 0)
2. I vs. OOS (1 vs. 0)
3. O vs. OOS (1 vs. 0)
• Without ground truth Y, we estimate Ŷ by aggregating several labelling functions.
OOS = out-of-span tokens
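For illustration, the binary P-vs-OOS encoding over a short (hypothetical) token sequence looks like this:

```python
# Hypothetical token sequence and its binary P vs. OOS labels
X = ["The", "sample", "included", "15", "adult", "Oncology", "outpatients"]
# y_i = 1 for tokens inside a Participant span, 0 for out-of-span (OOS)
Y_P = [0, 0, 0, 1, 1, 1, 1]
assert len(X) == len(Y_P)  # one label per token
```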
20. Method I: Labelling sources
• Labelling sources to label P, I (C) and O
• UMLS
• Non-UMLS ontologies (Table 2)
• Distant supervision dictionaries – clinicaltrials.gov
• Hand-crafted dictionaries
• Regular expressions
• Heuristics
21. Method II: Source preprocessing
• UMLS – 223 vocabularies
• Non-English vocabularies removed
• Zoonotic vocabularies removed
• Vocabularies with fewer than 500 terms removed
• Smart lowercasing to preserve abbreviations
• Removal of
• numbers,
• punctuation, and
• trailing spaces
• Any term shorter than 3 characters removed
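A minimal sketch of the smart lowercasing and term-filtering steps; the all-caps heuristic and length thresholds here are assumptions for illustration, not the exact rules used:

```python
def smart_lower(term):
    # Lowercase ordinary words; keep short all-caps tokens (likely
    # abbreviations such as "QOLS" or "SF-36") untouched.
    return " ".join(w if (w.isupper() and len(w) <= 5) else w.lower()
                    for w in term.split())

def keep_term(term):
    # Drop purely numeric terms and terms shorter than 3 characters.
    t = term.strip()
    return len(t) >= 3 and not t.isdigit()

print(smart_lower("Quality of Life Scale SF-36"))  # quality of life scale SF-36
print(keep_term("QT"))                             # False
```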
22. Method III: Source to target mapping
• Sources S: UMLS ontology, non-UMLS ontologies, distant supervision, ReGeX, heuristics, and dictionaries
• UMLS vocabularies (Vocabulary 1 … Vocabulary n) contribute concepts (Concept 1 … Concept n), each carrying a semantic type, e.g. Disease, Age group, Pharmaceutical Drug, Sign and Symptoms, …, Mental dysfunction
• Task-specific rules map the semantic types to the targets T: P, I, O
https://bioportal.bioontology.org/
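One way to realise the semantic-type-to-target mapping is a simple lookup table. The entries below are only the example types visible on this slide, not the full rule set:

```python
# Illustrative semantic type -> PICO target lookup (incomplete by design)
SEMTYPE_TO_PICO = {
    "Disease": "P",
    "Age group": "P",
    "Sign and Symptoms": "P",
    "Mental dysfunction": "P",
    "Pharmaceutical Drug": "I",
}

def target_for(semantic_type):
    # Returns "P", "I", or "O" for mapped types, None otherwise
    return SEMTYPE_TO_PICO.get(semantic_type)

print(target_for("Pharmaceutical Drug"))  # I
```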
23. Method IV: Labelling functions
The m labelling functions λ₁, …, λₘ label the n text tokens. Each LF type emits labels from a restricted set:
• Terminology LF (dictionary sources): {−1, 0, +1}
• ReGeX LF (ReGeX sources): {0, +1}
• Heuristic LF (heuristics): {0, +1}
• Stop Words LF (negative LF): {0, −1}
−1 = negative class label, +1 = positive class label, 0 = abstain
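A negative labelling function of the Stop Words kind could look like the following sketch (the stop-word list is an illustrative subset):

```python
STOP_WORDS = {"the", "of", "and", "was", "using"}  # illustrative subset

def stop_words_lf(tokens):
    # Negative LF: emit -1 (negative class) on stop words, 0 (abstain) otherwise
    return [-1 if t.lower() in STOP_WORDS else 0 for t in tokens]

print(stop_words_lf(["Quality", "of", "life"]))  # [0, -1, 0]
```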
25. Method VI: Combine noisy labels
1. Majority vote (MV): choose the label chosen by the majority of LFs.
Ŷ_MV = argmax over y ∈ {0, 1} of Σᵢ₌₁..ₘ 1[λᵢ = y]
In case of ties or abstains, choose label 0
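The majority-vote rule, including the tie/abstain fallback to label 0, can be sketched as follows (votes and matrix are illustrative):

```python
def majority_vote(row):
    # row holds one token's votes from the m LFs:
    # +1 = positive, -1 = negative, 0 = abstain
    pos = sum(1 for v in row if v == +1)
    neg = sum(1 for v in row if v == -1)
    if pos > neg:
        return 1
    return 0  # ties, all-abstain rows, and negative majorities -> label 0

label_matrix = [[0, 0, 0, 0, 0, 0],
                [1, 1, 0, 1, 1, 1],
                [1, -1, 1, 1, -1, 1]]
print([majority_vote(r) for r in label_matrix])  # [0, 1, 1]
```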
26. Method VI: Combine noisy labels
Example label matrix Λ for n tokens (rows 1–6) and m = 6 labelling functions LF1–LF6:
0 0 0 0 0 0
0 0 0 0 0 1
0 0 0 0 0 0
1 1 0 1 1 1
1 0 1 1 1 1
1 1 1 1 0 1
• Label model (LM): a generative model over the label matrix Λ
• The LM emits probabilistic labels for the words, which are then used to train a weakly-supervised discriminative model that predicts
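The label model fits a generative model over Λ (e.g. with Snorkel). As a rough intuition only, the toy code below reweights each LF by how often it agrees with the others and then takes a weighted vote; this is a simplification, not the actual likelihood-based training:

```python
def agreement_weights(matrix, m):
    # Estimate each LF's reliability from its agreement with the other
    # LFs on tokens where both voted (non-zero). A crude stand-in for
    # the accuracies the generative label model learns.
    weights = []
    for j in range(m):
        agree = total = 0
        for row in matrix:
            for k in range(m):
                if k != j and row[j] != 0 and row[k] != 0:
                    total += 1
                    agree += row[j] == row[k]
        weights.append(agree / total if total else 0.5)
    return weights

def weighted_vote(row, weights):
    # Accuracy-weighted consensus: positive score -> label 1, else 0
    score = sum(w * v for v, w in zip(row, weights) if v != 0)
    return 1 if score > 0 else 0

matrix = [[1, 1, -1, 1], [1, 1, 1, 1], [-1, 1, -1, -1]]
w = agreement_weights(matrix, 4)
labels = [weighted_vote(r, w) for r in matrix]
print(labels)  # [1, 1, 0]
```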
31. Results 1: Error rectification
• 2,960 tokens (~1% of training tokens) analysed for errors
Class – Total errors
• Participant – 23.39%
• Intervention – 18.30%
• Outcome – 20.21%
32. Results 1: Error rectification
• Error correction on the EBM-PICO gold test set
• Cohen's κ_new > Cohen's κ
Cohen's κ = agreement between different annotator pairs for the EBM-PICO test set
Cohen's κ_new = agreement between the original EBM-PICO test set and the error-rectified EBM-PICO test set
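Cohen's κ compares observed agreement with chance agreement, κ = (p_o − p_e) / (1 − p_e). A small self-contained computation over illustrative labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    # a, b: label sequences from two annotation rounds over the same tokens
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                  # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.5
```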
39. Results 4: Precision vs. Recall
• LM consistently improves performance in comparison to recall-
oriented MV
40. Conclusion
• Weak supervision via data programming could be adapted to PICO
information extraction.
• Utilize freely-available sources
• Training set ~5000 documents
• It can outperform full supervision but requires careful selection and
design of labelling sources and functions.
• The label model (LM) is better than majority voting.
• Pretrained transformers bring performance improvement, but not
always.
41. Future work
• Gathering more labelling sources for the PICO classes
• The existing sources were not representative
• Incorporating class weights into the labelling functions
• LF × n (the label model removes any redundant labelling functions)
42. References
1. Dhrangadhariya, Anjani, and Henning Müller. "Not so weak PICO: leveraging weak supervision for participants,
interventions, and outcomes recognition for systematic review automation." JAMIA open 6.1 (2023): ooac107.
2. Nye B, Jessy Li J, Patel R, et al. A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support
Language Processing for Medical Literature. Proc Conf Assoc Comput Linguist Meet. 2018;2018:197-207.
3. Lee, Grace E., and Aixin Sun. "A study on agreement in PICO span annotations." Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2019.
4. Abaho, Micheal, et al. "Correcting crowdsourced annotations to improve detection of outcome types in evidence-based
medicine." CEUR Workshop Proceedings. Vol. 2429. 2019.
5. Ratner, Alexander, et al. "Snorkel: Rapid training data creation with weak supervision." Proceedings of the VLDB
Endowment. International Conference on Very Large Data Bases. Vol. 11. No. 3. NIH Public Access, 2017.
6. Fries, Jason A., et al. "Ontology-driven weak supervision for clinical entity classification in electronic health
records." Nature communications 12.1 (2021): 1-11.
Majority voting chooses the label agreed by the majority of the labelling functions.
Note that majority voting does not take into account the variable accuracies of the individual labelling sources and functions; it treats them all as equal.
The label model, on the contrary, learns a generative model on top of the label matrix to estimate probabilistic labels for each data point.
It uses the agreements and disagreements between labelling functions on the same data points across the label matrix to estimate how accurate each LF is.
The labelling functions are then reweighted according to their accuracies and aggregated to predict a consensus label for the tokens.
These consensus labels are transformed into probabilistic labels, which can then be used to train a weakly supervised discriminative model.
Concretely, the label model estimates the probability distribution of the true labels given the observed noisy labels by maximising the likelihood of the label matrix; during this process it also learns the accuracies of the different labelling functions from their agreements and disagreements.
The model is optimised with stochastic gradient descent until the accuracies converge, after which the LFs are reweighted.
Cohen's κ_new between the original and the error-corrected annotations >> the average Cohen's κ between the test-set annotators.
- The label model outperforms majority voting in every experiment.
- The weakly supervised transformer model generally outperforms the label model, but degrades performance in specific experiments.
- The weakly supervised model does not bring much improvement for the Outcomes class.