This document describes a project to enable large-scale annotation of drug-drug interactions (DDIs) documented in drug product labels. Roughly 50,000 product labels were analyzed to extract tables containing DDI information. A Python script categorized the extracted table data and passed it to a Named Entity Recognizer (NER) to identify drug mentions. The result is a computable representation of the DDI data that can be processed further, through pre-annotation and crowdsourcing, to build a comprehensive drug interaction knowledge database.
Project Final Presentation
1. Enabling Large-Scale Annotation of Drug-Drug Interactions in Product Labels to Create a Drug Interaction Knowledge Database
Josh Le
2. WHAT’S THE PROBLEM?
◉ Drugs can interact, with positive or negative effects (+/-)
◉ Drug-drug interactions (DDIs) are documented in product labeling, but not in a machine-readable form
◉ Crowdsourcing could enable “annotation” (formalizing DDIs for better search and retrieval)
Goal: Generate a computable format of tables from product labels for entry into the Named Entity Recognizer (NER)
3.
4. HOW ARE WE GOING TO GET THERE?
◉ Python script written to extract and categorize information from tables
◉ Data entered into NER to return drug output
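The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the label tables are available as HTML, treats any `<th>` cell as a "proper" header, and (as an assumption of this sketch) treats `<td>` cells in a table's first row as improper headers. The names `TableHeaderParser` and `extract_headers` are hypothetical.

```python
from html.parser import HTMLParser


class TableHeaderParser(HTMLParser):
    """Collect header cells from every table in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.headers = []      # list of (text, is_proper) tuples
        self._row_index = -1   # index of the current row within its table
        self._cell_tag = None  # "th" or "td" while inside a cell
        self._buffer = []      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._row_index = -1          # new table: restart row counting
        elif tag == "tr":
            self._row_index += 1
        elif tag in ("th", "td"):
            self._cell_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._cell_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell_tag == tag:
            # Collapse runs of whitespace inside the cell text.
            text = " ".join("".join(self._buffer).split())
            if tag == "th":
                self.headers.append((text, True))    # proper header
            elif self._row_index == 0:
                # Assumption: first-row <td> cells act as headers.
                self.headers.append((text, False))
            self._cell_tag = None


def extract_headers(html):
    """Return (header_text, is_proper) pairs for all tables in html."""
    parser = TableHeaderParser()
    parser.feed(html)
    parser.close()
    return parser.headers
```

For example, `extract_headers("<table><tr><th>Drug Name</th><td>Effect</td></tr></table>")` yields one proper and one improper header. Nested tables are not handled in this sketch.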
5. EXTRACTING DDIs FROM LABELS
~50,000 product labels (Nov. 2013)
Tables with possible DDIs & drug mentions
Data extracted from tables
NER to find drug mentions
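To make the last stage concrete, here is a toy stand-in for the NER step. The slides do not say which NER system was used, so this sketch swaps in a simple dictionary lookup; the lexicon contents and the name `find_drug_mentions` are illustrative assumptions only.

```python
import re

# Hypothetical mini-lexicon; a real NER would use a far larger vocabulary
# and would also handle multi-word and brand names.
DRUG_LEXICON = {"warfarin", "ketoconazole", "rifampin"}


def find_drug_mentions(cell_text, lexicon=DRUG_LEXICON):
    """Return lexicon terms found in a table cell, in order of appearance."""
    tokens = re.findall(r"[a-z]+", cell_text.lower())
    return [token for token in tokens if token in lexicon]
```

Calling `find_drug_mentions` on each extracted table cell would give a list of candidate drug mentions per cell, which is the kind of output the later slides compare against actual DDIs.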
6. BASIC TABLE STATISTICS
Property                                  Value
Number of Tables                          1,057
Number of Headers                         2,182
Number of Proper Headers (<th> Tag)       582
Proportion of Proper Headers              26.7%
Number of Distinct Headers                350
Number of Categories                      8
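The proportion in the table follows directly from the two header counts. A short check, using only the figures reported above (the function name `proportion_proper` is made up for this example):

```python
def proportion_proper(proper_headers, total_headers):
    """Fraction of header cells that are marked up with a <th> tag."""
    return proper_headers / total_headers


# Figures taken from the table above: 582 proper headers out of 2,182.
ratio = proportion_proper(582, 2182)
print(f"{ratio:.1%}")  # prints 26.7%
```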
Category Name                             Count
Drug Class or Drug Name                   132
Effect on Drug                            57
Interaction Properties                    38
Interacting Substance                     3
Interacting Substance Properties          25
Miscellaneous                             21
Recommendation or Comment                 73
Sample Size                               1
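One way such a categorization could work is keyword matching on the header text. The sketch below is an assumption, not the project's documented method: the keyword rules in `CATEGORY_KEYWORDS` are invented for illustration and cover only some of the eight categories. The counts in `REPORTED_COUNTS` are copied from the table above.

```python
from collections import Counter

# Hypothetical keyword rules, checked in order; the first match wins.
CATEGORY_KEYWORDS = [
    ("Recommendation or Comment", ("recommendation", "comment")),
    ("Effect on Drug", ("effect",)),
    ("Sample Size", ("sample size",)),
    ("Drug Class or Drug Name", ("drug", "class")),
]

# Counts as reported in the category table.
REPORTED_COUNTS = {
    "Drug Class or Drug Name": 132,
    "Effect on Drug": 57,
    "Interaction Properties": 38,
    "Interacting Substance": 3,
    "Interacting Substance Properties": 25,
    "Miscellaneous": 21,
    "Recommendation or Comment": 73,
    "Sample Size": 1,
}


def categorize(header):
    """Assign a header string to a category, defaulting to Miscellaneous."""
    text = header.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return category
    return "Miscellaneous"


def tally(headers):
    """Count distinct headers per category."""
    return Counter(categorize(header) for header in set(headers))
```

The reported category counts sum to 350, which matches the number of distinct headers in the statistics table, so each distinct header appears to have been assigned to exactly one category.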
8. NOT ALL DRUG MENTIONS ARE DDIs
◉ Manual sampling yielded a general trend of fewer DDIs than drug mentions
Category                      Drug Mentions   DDIs
Interacting Substance         20              7-20
Recommendation or Comment     4               0
9. PUTTING IT ALL TOGETHER
◉ Enables computational form of table data for NER
◉ Returns drug mentions and probable DDIs
◉ Further processing will include pre-annotation and eventually crowdsourcing
10. COULD NOT HAVE BEEN COMPLETED WITHOUT…
This project is supported by a grant from the National Library of Medicine: "Addressing gaps in
clinically useful evidence on drug-drug interactions" (1R01LM011838-01) and by training grant
5T15LM007059-29 from the National Library of Medicine/National Institute of Dental and Craniofacial
Research.
Richard D. Boyce, Ph.D.
Jodi Schneider, Ph.D.
Yifan Ning, M.S.