Application of Information Extraction techniques to pharmacological domain: Extracting drug-drug interactions

1
Ph.D Thesis
Application of Information Extraction
techniques to pharmacological domain:
Extracting drug-drug interactions.
Isabel Segura Bedmar
Advisor: Paloma Martínez Fernández
April 23rd, 2010

2
Outline
Introduction
State of the Art
Proposal
  DrugDDI Corpus
  IE processes for DDI Extraction
Evaluation
Conclusions
Future Work

3
What is a Drug-Drug Interaction (DDI)?
Introduction

4
Beneficial
Introduction
Ritonavir + Lopinavir → Effective antiretroviral
Nifedipine + propranolol → Antianginal drug

5
Dangerous
Introduction
Aspirin + Heparin → Bleeding
Aspirin + Acetazolamide → Death

6
 47.8% of adverse events
are due to drugs, of
which 3.5% result from
DDI1.
 Medication errors kill
7,000 patients per
annum in USA2.
 High incidence in certain
patient groups (3-20%).
 Increase the Healthcare
costs
Things can get complicated...
1. APEAS Estudio sobre la seguridad de los pacientes en Atención primaria de salud.
Madrid: Ministerio de Sanidad y Consumo, 2008
2. Kohn et al., 2000. “To Err is Human”.
Introduction

7
Drug interaction Resources
Most effective source:
Medical Literature.
Introduction

8
Drug interaction Resources
Introduction
Drug Interactions Resources
Medical and Pharmaceutical Research
Publications
? ?

9
How does Information Extraction help?
Triamterene, metformin and amiloride
should be co-administered with care
as they might increase dofetilide levels.
DDI ( TRIAMTERENE, DOFETILIDE)
DDI ( METFORMIN, DOFETILIDE)
DDI ( AMILORIDE, DOFETILIDE)
Introduction

10
Thesis approach goals
Introduction
Create an
annotated corpus
to study the
extraction of DDI
An Information
Extraction (IE)
System for DDI

11
Thesis specific goals
Creation of an annotated corpus of DDI.
Introduction
Study the main approaches for biomedical IE.
Develop a framework that allows the study
and combination of different IE techniques.
Propose a method to resolve the anaphoric
expressions involving drugs.
Integration of biomedical resources and
nomenclature standards.
Propose a method to identify and classify
drugs.

12
Thesis specific goals
Introduction
Combine the resolution of complex syntactic
constructions and a set of lexical patterns
defined by a pharmacist in order to extract
DDIs.
Study the performance of a machine
learning method to detect DDIs.
Compare both previous approaches and
analyze the results.

State of the Art
Approaches
14
State of the Art
Relation Extraction in biomedicine
1. Linguistic-based approaches
2. Pattern-based approaches
3. Machine Learning-based approaches
4. Hybrid approaches

State of the Art
Approaches
15
State of the Art
Relation Extraction in biomedicine
1. Linguistic-based approaches
2. Pattern-based approaches
3. Machine Learning-based approaches: Features-based approaches, Kernels-based approaches

State of the Art
Unsolved Issues in BNER
16
State of the Art
Few approaches dealing with Drug Name
recognition.
New drugs are continually approved.
Synonyms.
Anaphoric expressions.
Ambiguity.
Abbreviations.

State of the Art
Unsolved Issues in Biomedical RE
17
State of the Art
No approaches for DDI extraction.
No annotated corpus for DDI.
Abstracts MedLine.
Sentence Level.

State of the Art
Unsolved Issues in Biomedical RE
18
State of the Art
Modality and negation are usually ignored.
Clauses, adverbial and prepositional
phrases are not usually addressed.
Performance depends heavily on results
from previous processing steps.
Huge gap among life science researchers,
healthcare professionals and computer
scientists.

20
Proposal
Proposal
Create an
annotated corpus
to study the
extraction of DDI
An Information
Extraction System
for DDI

21
Text Analysis by MetaMap program
Corpus (TXT) → UMLS MetaMap (MMTx) text analysis → XML annotated with shallow syntactic and semantic information from UMLS
Resources shown: Unified Medical Language System (UMLS), DrugBank
Proposal: Corpus DrugDDI

22
Example of annotation

23
Example of annotation

24
Annotation of Corpus DrugDDI
                                  Total     Avg. per doc
DDIs                              3,160     5.5
Sentences                         5,806     10.2
Sentences with at least one DDI   2,044     3.5
Drugs                             14,930    25.7
Documents                         579
Class distribution: Non-DDI 27,597 (90%); DDIs 3,160 (10%)

25
Proposal
Proposal
Create an
annotated corpus
to study the
extraction of DDI
An Information
Extraction (IE)
System for DDI

26
IE System for DDI
Corpus (TXT)
→ Text analysis: XML annotated with shallow syntactic and semantic information
→ Drug Name Recognition: + drugs and other biomedical concepts
→ Anaphora Resolution: + anaphoras
→ DDI Extraction: + Drug interactions
Biomedical Resources
Proposal: DrugDDI system

27
Text Analysis
[Pipeline diagram repeated: Corpus (TXT) → Text analysis → Drug Name Recognition → Anaphora Resolution → DDI Extraction]

28
Drug Name Recognition (DrugNer)
[Pipeline diagram repeated: Corpus (TXT) → Text analysis → Drug Name Recognition → Anaphora Resolution → DDI Extraction]
Resources shown for this stage: WHO INN affixes, UMLS

29
WHO INN affixes for identifying and classifying drugs
Affix      Drug family                                  Pattern                  Drugs
-pristin   Antibacterials, pristinamycin derivatives    [A-Za-z0-9]*[pristin]    Efepristin
-gatran    Antithrombotic agents                        [A-Za-z0-9]*[gatran]     Dabigatran
-tinib     Antineoplastic agents                        [A-Za-z0-9]*[tinib]      Dasatinib, Sunitinib, Nilotinib
-mycin     Antibiotics                                  [A-Za-z0-9]*[mycin]      Tanespimycin
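A minimal sketch of how the affix table could drive drug classification. It assumes each affix is matched literally as a suffix of the token, case-insensitively. The names `WHO_INN_AFFIXES` and `classify_by_affix` are hypothetical and not taken from the DrugNer system.

```python
import re

# Affix table copied from the slide. Matching the affix literally at the end
# of the token is an assumption about how the slide patterns are applied.
WHO_INN_AFFIXES = {
    "pristin": "Antibacterials, pristinamycin derivatives",
    "gatran": "Antithrombotic agents",
    "tinib": "Antineoplastic agents",
    "mycin": "Antibiotics",
}


def classify_by_affix(token):
    """Return the drug family suggested by the token's suffix, or None."""
    for affix, family in WHO_INN_AFFIXES.items():
        pattern = r"[A-Za-z0-9]*" + re.escape(affix)
        if re.fullmatch(pattern, token, re.IGNORECASE):
            return family
    return None


if __name__ == "__main__":
    for name in ["Efepristin", "Dabigatran", "Sunitinib", "Tanespimycin", "Aspirin"]:
        print(name, "->", classify_by_affix(name))
```

A token that carries none of the listed affixes returns `None`, so it would have to be caught by another resource such as MMTx.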

30
Evaluation of DrugNer
[Bar chart: Precision, Recall and F-measure (axis 0.9 to 1) for MMTx and MMTx + affixes]
Affix-based classification obtains an accuracy rate of 75%.
Number of drugs in the DrugNer corpus (849 Medline abstracts):
Detected by MMTx                        8,093 (97.6%)
Only detected by affixes                181 (2.2%)
Detected neither by MMTx nor affixes    20 (0.2%)
Total                                   8,294

31
Drug Anaphora Resolution
[Pipeline diagram repeated: Corpus (TXT) → Text analysis → Drug Name Recognition → Anaphora Resolution → DDI Extraction]

32
Levofloxacin is one of the most commonly prescribed
antibiotics in clinical practice.
Several case reports have indicated that this drug may
significantly potentiate the anticoagulation effect of
warfarin.
DDI MAY POTENTIATE( LEVOFLOXACIN , WARFARIN )
How does Anaphora Resolution help?

Corpus DrugNerAr
49 documents, 1,976 sentences, 331 anaphoric expressions.
[Bar chart, Pronominal Anaphora (axis 0 to 140): Personals, Reflexives (it), Relatives (which), Distributives (both), Demonstratives (these), Indefinites (some)]
[Bar chart, Nominal Anaphora (axis 0 to 70): Definites (the), Possessives (its), Distributives (both), Demonstratives (these), Indefinites]

34
Approaches for Drug anaphora resolution
Baseline
Scoring-based method
Linguistic rules-based
method (Centering theory)

35
Drug anaphora resolution results
[Bar chart: F-measure (axis 0 to 1) for Pronominal and Nominal anaphora; series: Baseline, Scoring-based approach, Linguistic Rules-based approach]

36
Drug-Drug Interaction Extraction
[Pipeline diagram repeated: Corpus (TXT) → Text analysis → Drug Name Recognition → Anaphora Resolution → DDI Extraction]

37
Approaches for DDI Detection
DDI Extraction → Drug interactions
1. Syntactic Information + Lexical Patterns
2. Machine Learning

38
1. Syntactic Information + Lexical Patterns
2. Machine Learning

39
Pharmacological patterns
Patterns defined by our pharmacist.
<DRUG> INTERACT(S) WITH <DRUG>.
<DRUG> (INCREASE(S)|DECREASE(S)|...) <DRUG EFFECTS>
<DRUG> INTERFERE(S) WITH <DRUG PROPERTIES>
CONCURRENT USE OF <DRUG> WITH <DRUG> (INCREASE(S)|DECREASE(S)|...) <DRUG PROPERTIES>
<DRUG> INHIBIT(S) <DRUG PROPERTIES>
COADMINISTRATION OF <DRUG> AND <DRUG> RESULT IN <DRUG PROPERTIES>
<DRUG EFFECTS> OF <DRUG> BE (ENHANCED|REDUCED|...) BY <DRUG>
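As an illustration, the first pattern, `<DRUG> INTERACT(S) WITH <DRUG>`, can be approximated with a regular expression. This is only a sketch: it assumes the drug names are already known from the Drug Name Recognition step, and the function name `match_interacts_with` is hypothetical.

```python
import re


def match_interacts_with(sentence, drug_names):
    """Return (drug1, drug2) pairs matched by '<DRUG> INTERACT(S) WITH <DRUG>'."""
    if not drug_names:
        return []
    # Longest names first so that multi-word drugs win over their prefixes.
    alternatives = "|".join(
        re.escape(d) for d in sorted(drug_names, key=len, reverse=True)
    )
    pattern = re.compile(
        rf"\b({alternatives})\s+interacts?\s+with\s+({alternatives})\b",
        re.IGNORECASE,
    )
    return [(m.group(1), m.group(2)) for m in pattern.finditer(sentence)]


if __name__ == "__main__":
    drugs = {"Allopurinol", "anisindione", "azathioprine", "cyclosporine"}
    text = "Allopurinol interacts with anisindione, azathioprine and cyclosporine"
    print(match_interacts_with(text, drugs))
```

On the Allopurinol sentence used later in the slides, this purely lexical pattern finds only the first pair, which is the kind of limitation the coordinate-structure step targets.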

40
1st approximation: Syntactic Information + Lexical Patterns
Input: XML annotated with shallow syntactic and semantic information, drugs and other biomedical concepts, anaphoras
Steps: Detection of coordinate structures; Detection of appositions; Clause Splitting; Rules for sentence simplification; Pattern Matching
Output: Drug-Drug interactions

Allopurinol interacts with anisindione, azathioprine and cyclosporine
How does syntactic information help?
Detecting coordinate structures

Detection of
Coordinate structures
COORD := ([NP|PP|ADJ|UNK],)* [NP|PP|ADJ|UNK] CONJ [NP|PP|ADJ|UNK]
Allopurinol interacts with COORD

Detection of
Allopurinol interacts with COORD
Drug Name Recognition
DRUG.1 interacts with COORD

DDI := <DRUG1|COORD|APPOSITION> INTERACTS WITH <DRUG2|COORD|APPOSITION>.
Detection of
Pattern Matching

DRUG-DRUG INTERACTION:
Drug 1: Allopurinol
Drug 2: anisindione
Drug 1: Allopurinol
Drug 2: azathioprine
Drug 1: Allopurinol
Drug 2: cyclosporine
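The expansion of a coordinate structure into one interaction per conjunct can be sketched as follows. This is a simplified illustration that splits on commas and conjunctions instead of using the chunk-based COORD rule, and the function name `expand_coordination` is hypothetical.

```python
import re


def expand_coordination(sentence, drug_names):
    """Expand '<DRUG> interacts with <d1>, <d2> and <d3>' into one pair per drug."""
    parts = re.split(
        r"\s+interacts?\s+with\s+",
        sentence.rstrip("."),
        maxsplit=1,
        flags=re.IGNORECASE,
    )
    if len(parts) != 2:
        return []
    left, right = parts[0].strip(), parts[1]
    known = {d.lower() for d in drug_names}
    if left.lower() not in known:
        return []
    # Split the coordinate structure on commas and on the conjunctions and/or.
    conjuncts = [c.strip() for c in re.split(r",|\band\b|\bor\b", right)]
    return [(left, c) for c in conjuncts if c.lower() in known]


if __name__ == "__main__":
    drugs = {"Allopurinol", "anisindione", "azathioprine", "cyclosporine"}
    text = "Allopurinol interacts with anisindione, azathioprine and cyclosporine"
    for pair in expand_coordination(text, drugs):
        print(pair)
```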

46
Evaluation 1st approach for DDI extraction
[Bar chart, axis 0 to 1]
Experiments:
Experiment                               Precision   Recall   F-measure
Lexical Patterns (baseline)              0.67        0.14     0.23
Coordinatives & Appositions              0.48        0.26     0.34
Coordinatives & Appositions & Clauses    0.49        0.25     0.33
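As a sanity check on the figures for the three experiments, each triple of values can be read as (precision, recall, F-measure). That reading is an assumption, but it is consistent with the F-measure being the harmonic mean of precision and recall, as the short sketch below shows.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall) pairs as read from the chart labels.
    experiments = {
        "Lexical Patterns (baseline)": (0.67, 0.14),
        "Coordinatives & Appositions": (0.48, 0.26),
        "Coordinatives & Appositions & Clauses": (0.49, 0.25),
    }
    for name, (p, r) in experiments.items():
        print(f"{name}: F = {f_measure(p, r):.2f}")
```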

47
1. Syntactic information + lexical patterns
2. Machine Learning

Classification Problem
Training: Training Texts → Text analysis → Drug Name Recognition → Examples Generator → Training dataset → DDI learning → DDI model
Testing: Testing Texts → Text analysis → Drug Name Recognition → Examples Generator → Testing dataset → DDI classification

Aspirin may decrease the effects of probenecid, sulfinpyrazone and phenylbutazone
Generating examples (relation instances)

Positive examples
Negative examples
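A small sketch of how relation instances could be generated from the example sentence: every unordered pair of drugs becomes one candidate, labelled positive if the pair is annotated as interacting. The gold pairs below reflect a reading of the sentence rather than the corpus annotation, and the function name `generate_instances` is hypothetical.

```python
from itertools import combinations


def generate_instances(drugs, interacting_pairs):
    """Return (drug_a, drug_b, is_ddi) for every unordered pair of drugs."""
    gold = {frozenset(pair) for pair in interacting_pairs}
    return [
        (a, b, frozenset((a, b)) in gold)
        for a, b in combinations(drugs, 2)
    ]


if __name__ == "__main__":
    drugs = ["Aspirin", "probenecid", "sulfinpyrazone", "phenylbutazone"]
    # Assumed gold annotation: Aspirin interacts with each of the other drugs.
    gold = [("Aspirin", d) for d in drugs[1:]]
    for a, b, label in generate_instances(drugs, gold):
        print(f"{a} / {b}: {'positive' if label else 'negative'}")
```

With four drugs the sentence yields six candidate pairs, and under that reading only three are positive. This is how a corpus of drug pairs ends up dominated by negative examples.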

57
Imbalanced dataset
Class distribution: Non-DDI 27,597 (90%); DDIs 3,160 (10%)
Dataset     Documents   Sentences   Drugs
Training    437         4,578       2,650
Testing     142         1,228       753
Total       579         5,806       3,313
Dataset     Examples    Positive    Negative
Training    25,209      2,433       22,776
Testing     5,548       726         4,821
Total       30,757      3,160       27,597

58
Shallow Linguistic Relational kernel
(Giuliano et al., 2006)
Shallow representation of sentences (no syntax)
Global Context Kernel.
Local Context Kernel.

59
Global Context: Fore-Between
Concurrent administration of a TNF antagonist with ORENCIA
has been associated with an increased risk of serious
infections and no significant additional efficacy over use of the
TNF antagonists alone.
$K_{\text{GlobalContext}}(R_1, R_2) = K_{\text{Fore-Between}}(R_1, R_2)$

60
Global Context: Between
$K_{\text{GlobalContext}}(R_1, R_2) = K_{\text{Fore-Between}}(R_1, R_2) + K_{\text{Between}}(R_1, R_2)$

61
Global Context: Between-After
$K_{\text{GlobalContext}}(R_1, R_2) = K_{\text{Fore-Between}}(R_1, R_2) + K_{\text{Between}}(R_1, R_2) + K_{\text{Between-After}}(R_1, R_2)$

Global Context Kernel (n-gram)
$K_{\text{GlobalContext}}(R_1, R_2) = ?$
$R_1$ = “Coadministration of DRUG with DRUG may increase the risk of toxicity”
$R_2$ = “Coadministration of DRUG with DRUG may increase OTHER exposure”
How many n-grams do both examples share?

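Global Context Kernel (n-gram=2)
$R_1$ = “Coadministration of DRUG with DRUG may increase the risk of toxicity”
$R_2$ = “Coadministration of DRUG with DRUG may increase OTHER exposure”
$K_{\text{GlobalContext}}(R_1, R_2) = K_{\text{Fore-Between}}(R_1, R_2) + K_{\text{Between}}(R_1, R_2) + K_{\text{Between-After}}(R_1, R_2) = 2$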


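$R_1$ = “Coadministration of DRUG with DRUG may increase the risk of toxicity”
$R_2$ = “Coadministration of DRUG with DRUG may increase OTHER exposure”
$K_{\text{GlobalContext}}(R_1, R_2) = 7$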
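The counting illustrated on these slides can be sketched in a few lines. This is a simplified version that counts distinct shared n-grams of exactly length n in each of the three contexts. It is not the normalized kernel of Giuliano et al. (2006), and the function names are hypothetical. It also assumes that n-grams do not span the DRUG mentions. Under those assumptions it reproduces the value 2 for n-gram=2. It gives 7 for n=1, which suggests, but does not confirm, that the last build-up slide uses unigrams.

```python
def ngrams(tokens, n):
    """Set of distinct n-grams (as tuples) of exactly length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contexts(tokens, n, entity="DRUG"):
    """Return the Fore-Between, Between and Between-After n-gram sets."""
    first = tokens.index(entity)
    second = tokens.index(entity, first + 1)
    fore = ngrams(tokens[:first], n)
    between = ngrams(tokens[first + 1:second], n)
    after = ngrams(tokens[second + 1:], n)
    return fore | between, between, between | after


def global_context_kernel(s1, s2, n):
    """Sum of shared n-grams over the three contexts (unnormalized sketch)."""
    c1 = contexts(s1.split(), n)
    c2 = contexts(s2.split(), n)
    return sum(len(a & b) for a, b in zip(c1, c2))


if __name__ == "__main__":
    r1 = "Coadministration of DRUG with DRUG may increase the risk of toxicity"
    r2 = "Coadministration of DRUG with DRUG may increase OTHER exposure"
    print(global_context_kernel(r1, r2, n=2))
    print(global_context_kernel(r1, r2, n=1))
```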

Aspirin may decrease the effects of probenecid, sulfinpyrazone
and phenylbutazone
Local Context Kernel (window-size =2 )

LEFT context ($C_{\text{LEFT}}$):
Aspirin&Aspirin&noun&Aspirin&DRUG&A|
may&may&verb&may&O&O|
decrease&decrease&verb&decrease&O&O|
RIGHT context ($C_{\text{RIGHT}}$):
effects&effect&noun&effect&O&O|
probenecid&probenecid&noun&probenecid&DRUG&T|
,&,&comma&,&O&O|
sulfinpyrazone&sulfinpyrazone&noun&sulfinpyrazone&O&O|
$K_{\text{LocalContext}}(R_1, R_2) = K_{\text{Left}}(R_1, R_2) + K_{\text{Right}}(R_1, R_2)$
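A rough sketch of the local context idea follows. It assumes that each candidate drug is described by the tokens in a window around it, that features are (offset, token) pairs, and that the kernel counts features shared by the left windows plus features shared by the right windows. The real kernel also uses lemmas, part-of-speech tags and other features and is normalized, so this only shows the shape of the computation. All names are hypothetical.

```python
def window_features(tokens, index, w):
    """(offset, lowercased token) pairs within w tokens of position index."""
    lo = max(0, index - w)
    hi = min(len(tokens), index + w + 1)
    return {(i - index, tokens[i].lower()) for i in range(lo, hi)}


def local_context_kernel(r1, r2, w=2):
    """r1 and r2 are (tokens, index_of_first_drug, index_of_second_drug)."""
    (t1, a1, b1), (t2, a2, b2) = r1, r2
    k_left = len(window_features(t1, a1, w) & window_features(t2, a2, w))
    k_right = len(window_features(t1, b1, w) & window_features(t2, b2, w))
    return k_left + k_right


if __name__ == "__main__":
    tokens = ("Aspirin may decrease the effects of probenecid , "
              "sulfinpyrazone and phenylbutazone").split()
    aspirin_probenecid = (tokens, 0, 6)
    aspirin_sulfinpyrazone = (tokens, 0, 8)
    print(local_context_kernel(aspirin_probenecid, aspirin_sulfinpyrazone))
```

The two candidate pairs share the whole window around Aspirin but nothing around the second drug, so their similarity comes entirely from the left term.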

72
Claudio Giuliano, Alberto Lavelli, Lorenza Romano. Exploiting
Shallow Linguistic Information for Relation Extraction from
Biomedical Literature, EACL 2006
$K_{\text{SL}}(R_1, R_2) = K_{\text{GlobalContext}}(R_1, R_2) + K_{\text{LocalContext}}(R_1, R_2)$
DDI classification using Shallow Linguistic kernel

73
Evaluation 2nd approach for DDI extraction
[Bar chart: Precision and Recall (axis 0 to 1) for the Shallow Linguistic Kernel with n=3, w=1; n=1, w=2; n=2, w=3; n=3, w=3; n=4, w=3; n=5, w=3]
Labelled values: n=3, w=1: Precision 0.57, Recall 0.72; n=1, w=2: Precision 0.55, Recall 0.76
F-measure = 0.64
Parameters for Shallow Linguistic Kernel: n = n-gram, w = window-size

74
Evaluation 2nd approach for DDI extraction
[Bar chart, axis 0 to 1]
Kernel                                            Precision   Recall   F-measure
Global Context (n-gram=3)                         0.53        0.72     0.61
Local Context (window-size=2)                     0.51        0.68     0.58
Shallow Linguistic (n-gram=3, window-size=1)      0.57        0.72     0.64
Shallow Linguistic (n-gram=1, window-size=2)      0.55        0.76     0.64

75
Experiment results on imbalanced and balanced datasets
[Bar chart 1: F-measure (axis 0 to 1); series: Baseline, Balanced Training & Imbalanced Testing, Imbalanced Training and Testing]
[Bar chart 2: F-measure (axis 0 to 1); series: Baseline, Balanced Training and Testing]

77
Patterns vs Kernels
[Bar chart, axis 0 to 1]
Approach                         Precision   Recall   F-measure
Syntactic + Patterns Approach    0.48        0.26     0.34
Shallow Kernel                   0.55        0.82     0.66
Evaluation

Thesis contributions: Problem
79
Conclusions
Initial problem definition.
First approach for DDI extraction.
Multidisciplinary research group

Thesis contributions: Corpora
80
Conclusions
Creation of several biomedical corpora:
DrugNer, DrugNerAr.
First annotated corpus with DDI.

Thesis contributions: Biomedical
Named Entity Recognition
Conclusions
Drug Name Recognition and Classification.

Thesis contributions: Language
82
Conclusions
Drug Anaphora resolution.
Complex syntactic constructions:
coordinate and appositive structures, clauses.

Thesis contributions: Relational
Kernels
83
Conclusions
Comparative analysis: patterns vs kernels.
Relational Kernels applied to
pharmacological domain.

Future Work: DrugDDI corpus
85
Future Work
Increase the quality of the DrugDDI corpus.

Future Work: Drug Classification
86
Future Work
Improve the drug classification considering the
ATC system.

Future Work: Language
87
Future Work
Handle the mistakes made by MMTx.
Improve the clause splitting process.
Treatment of negation and modality.
Use the drug families to resolve nominal
anaphora.

Future Work: DDI Extraction
88
Future Work
Integrate the drug anaphora resolution in
the DDI extraction.
Use the SPINDEL [De Pablo-Sánchez et al.,
2009] system to acquire new patterns.
Extract relevant information about each
DDI.

Future Work: Relational Kernel
89
Future Work
Semantic Kernel (drug family, semantic
types, WordNet, etc).
Parse tree or dependency graph kernels.
Study other solutions for imbalanced
learning.

Future Work: Application
90
Future Work
Improve Drug interaction resources.
User-oriented evaluation.

Projects
91
Projects
This work has been partially supported by the Spanish research
projects:
MAVIR consortium (S-0505/TIC-0267, www.mavir.net), a network
of excellence funded by the Madrid Regional Government.
ISSE: Semantic Interoperability in Electronic Healthcare (FIT-350300-2007-75).
BRAVO: Advanced Multimodal and Multilingual Question
Answering. (TIN2007-67407-C03-01).
MULTIMEDICA: Multilingual Information Extraction in Health
domain and application to scientific and informative
documents. Propuesta Plan Nacional de I+D 2009. UC3M, UPM,
UAM.

Dissemination
92
Dissemination
Isabel Segura-Bedmar, Mario Crespo, Cesar de Pablo-Sánchez,
Paloma Martínez. (2010). Resolving anaphoras for the
extraction of drug-drug interactions in pharmacological
documents. BMC BioInformatics, 11(Suppl 2):S1.
César de Pablo-Sanchez, Juan Perea, Isabel Segura-Bedmar,
Paloma Martinez. (2009). The UC3M team at the Knowledge
Base Population task. TAC 2009.
Isabel Segura-Bedmar, Mario Crespo, Cesar de Pablo-Sánchez,
Paloma Martínez. (2009) DrugNerAR: Linguistic Rule-Based
Anaphora Resolver for Drug-Drug Interaction Extraction
in Pharmacological Documents. ACM DTMBIO 09.
Isabel Segura-Bedmar, Mario Crespo, Cesar de Pablo-Sánchez.
(2009) Score-based approach for Anaphora Resolution in
Drug-Drug Interactions Documents. NLDB 2009.

Dissemination
93
Dissemination
Isabel Segura-Bedmar, Paloma Martínez, María Segura-
Bedmar. (2008). Drug Name Recognition and classification
in biomedical texts. Drug Discovery Today. 2008 Sep;13(17-
18).
Isabel Segura-Bedmar, Paloma Martínez, Doaa Samy. (2008) A
preliminary approach to recognize generic drug names
by combining UMLS resources and USAN naming
conventions. ACL BIONLP'08.
Isabel Segura-Bedmar, Paloma Martínez, Doaa Samy. (2008)
Detección de fármacos genéricos en textos biomédicos.
Revista SEPLN.

Dissemination
94
Dissemination
Isabel Segura Bedmar, Doaa Samy, José L. Martínez-
Fernández, Paloma Martínez. (2007) Detecting Semantic
Relations between Nominals using Support Vector
Machines and Linguistic-Based Rules. OTM 2007.
Isabel Segura Bedmar, Doaa Samy y José L. Martínez-
Fernández. (2007) UC3M: Classification of Semantic
Relations between Nominals using Sequential Minimal
Optimization. ACL SEMEVAL 2007.
Isabel Segura Bedmar, José L. Martínez-Fernández, Paloma
Martínez. (2006) Including deeper semantic information in
the Lexical Markup Framework: a proposal. Fifth
Slovenian and First International Language Technologies
Conference, IS-LTC 2006.
Isabel Segura Bedmar, José L. Martínez-Fernández, Paloma
Martínez. (2006) Una Propuesta para el Etiquetado
Automático de Roles Semánticos. Revista SEPLN.

Dissemination
95
Dissemination
Isabel Segura-Bedmar, Paloma Martínez, Cesar de Pablo-
Sánchez (2010). Extracting drug-drug interactions from
biomedical texts. Accepted at BioTM 2010 (Workshop on
Advances in Bio Text Mining). BMC BioInformatics.
Roxana Danger, Isabel Segura-Bedmar, Paloma Martínez,
Paolo Rosso. (2009). A comparison of machine learning
techniques for detection of drug target articles.
Submitted to Journal of Biomedical Informatics.

96
Drug Drug Interaction detection is a
promising application for IE and NLP


99
The pressor effects of [catecholamines
such as dopamine or norepinephrine]_APOS
are enhanced by Bretylium Tosylate.
which can be interpreted as:
1) The pressor effects of catecholamines are enhanced by Bretylium
2) The pressor effects of dopamine are enhanced by Bretylium
3) The pressor effects of norepinephrine are enhanced by Bretylium
How does syntactic information help?
<DRUG EFFECT> OF (DRUG|APOS) BE
<INTERACT_VERB> BY (DRUG|APOS)
1) DDI increase ( BRETYLIUM TOSYLATE, CATECHOLAMINES )
2) DDI increase (BRETYLIUM TOSYLATE, DOPAMINE)
3) DDI increase (BRETYLIUM TOSYLATE, NOREPINEPHRINE)
Detecting appositive structures
Proposal: DrugDDI prototype

Catecholamine-depleting drugs, such as reserpine, may have an additive
effect when given with beta-blocking agents.
DDI := <DRUG1|APPOSITION>
(HAVE|INCREASE|...) <EFFECT>
WHEN GIVEN WITH
<DRUG2|APPOSITION>.
Detection of
Appositions
Pattern Matching
Drug 1: Catecholamine-depleting drugs
Drug 2: beta-blocking agents
Property|Effect: additive
Drug 1: Reserpine
Drug 2: beta-blocking agents
Property|Effect: additive
APPOSITION may have an additive effect when given with DRUG.
APPOSITION := <APPOSITIVE> MARKER <APPOSITIVE>
Detecting appositions

Concomitant administration of corticosteroids with Aspirin may increase the risk of
gastrointestinal ulceration and may reduce serum salicylate levels.
Concomitant administration of
corticosteroids with Aspirin may
increase the risk of
gastrointestinal ulceration
Concomitant administration of
corticosteroids with Aspirin may
reduce serum salicylate levels.
PATTERN: ADMINISTRATION
OF <DRUG1> WITH
<DRUG2>
MAY (INCREASE|REDUCE)...
Clause splitting
Pattern Matching
Drug 1: Corticosteroids
Drug 2: Aspirin
Action: increase
Property|Effect: Gastrointestinal
ulceration
Drug 1: Corticosteroids
Drug 2: Aspirin
Action: reduce
Property|Effect: serum salicylate
levels
Detecting clauses
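The clause-splitting step for this example can be illustrated with a very small sketch. It handles only the case shown on the slide: one subject followed by predicates coordinated as "may ... and may ...". The thesis system relies on syntactic analysis, not string matching, so the function below (hypothetical name `split_modal_clauses`) only shows the intended input and output.

```python
def split_modal_clauses(sentence, modal="may"):
    """Split '<subject> may X and may Y.' into one sentence per predicate."""
    body = sentence.rstrip(".")
    subject, sep, predicate = body.partition(f" {modal} ")
    if not sep:
        # No modal found: return the sentence unchanged.
        return [sentence]
    predicates = predicate.split(f" and {modal} ")
    return [f"{subject} {modal} {p}." for p in predicates]


if __name__ == "__main__":
    text = ("Concomitant administration of corticosteroids with Aspirin may "
            "increase the risk of gastrointestinal ulceration and may reduce "
            "serum salicylate levels.")
    for clause in split_modal_clauses(text):
        print(clause)
```

Each resulting clause contains a single action, so a pattern such as `ADMINISTRATION OF <DRUG1> WITH <DRUG2> MAY (INCREASE|REDUCE)...` can be matched once per clause.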

102
Evaluation:
Syntactic structures resolution
[Bar chart: Precision, Recall and F-measure (axis 0 to 1) for Appositions, Coordinatives, Relatives and Clauses]

RIGHT
and phenylbutazone
ΦRIGHT(R)=
Local Context Kernel

104
[The Cmax of norethindrone was 13% higher] when
[it was coadministered with gabapentin]
What is the problem?
Complex sentences: Interactions could
span several clauses
In a pharmacokinetic substudy in patients with
congestive heart failure receiving
furosemide or digoxin in whom therapy with FLOLAN
was initiated, apparent oral clearance values
for furosemide (n = 23) and digoxin (n= 30)
were decreased by 13% and 15%, respectively,
on the second day of therapy and had returned
to baseline values by day 87.

105
Therefore,
[when MIDAMOR and non-steroidal
anti-inflammatory agents
are used concomitantly],
[the patient should be observed closely to determine
if the desired effect of the diuretic is obtained].
Interactions could span several clauses
Most sentences are complex
sentences.

106
In subjects who had received 21 days of
40 mg/day racemic citalopram, combined
administration of 400 mg/day cimetidine
for 8 days resulted in an
increase in citalopram AUC and
Cmax of 43% and 39%, respectively.
Patterns are not enough for detecting
other interactions

$K_{\text{GlobalContext}}(R_1, R_2) = ?$
$K_{\text{GlobalContext}}$(“DRUG may interact with DRUG”, “DRUG may interact with DRUG, OTHER, OTHER”) = 2
$K_{\text{GlobalContext}}$(“DRUG may interact with DRUG”, “DRUG may decrease the effect of DRUG, OTHER, and OTHER”) = 0
$K_{\text{GlobalContext}}$(“Coadministration of DRUG with DRUG may increase the risk of toxicity”, “Coadministration of DRUG with DRUG may increase OTHER exposure”) = 2

108
Example of parsed sentence

112
Experiments
XML annotated with
drugs and other
anaphoras
DrugDrug
interactions
Detection of
Detection of
appositions
Pattern
Matching
Clause Splitting
Rules for sentence
simplification

113
1st
Experiment: Baseline
XML annotated with
drugs and other
anaphoras
DrugDrug
interactions
Pattern
Matching

114
2nd Experiment: Coordinate structures
and appositions
XML annotated with
drugs and other
anaphoras
DrugDrug
interactions
Detection of
Detection of
appositions
Pattern
Matching
Clause Splitting
Rules for sentence
simplification

115
3rd Experiment: Coordinate structures,
appositions and clauses
XML annotated with
drugs and other
anaphoras
DrugDrug
interactions
Detection of
Detection of
appositions
Pattern
Matching
Clause Splitting
Rules for sentence
simplification

116
Evaluation 2nd Approximation: Training Time
[Bar chart: training time in seconds (axis 0 to 16,000) for (n=3, w=1), (n=1, w=2), (n=2, w=3), (n=3, w=3), (n=4, w=3), (n=5, w=3)]
n-gram=3, window-size=1 minimizes the training time and maximizes the precision.
n-gram=1, window-size=2 minimizes the training time and maximizes the recall.

RIGHTLEFT
the&the&det&the&O&O|
,&,&comma&,&O&O|
and&and&conj&and&O&O|
phenylbutazone&phenylbutazone&noun&phenylbutazone&O&O|
and phenylbutazone
CLEFT
CRIGHT
$K_{\text{LocalContext}}(R_1, R_2) = K_{\text{Left}}(R_1, R_2) + K_{\text{Right}}(R_1, R_2)$

118
[Bar chart, axis 0 to 1]
Setting                                                   Precision   Recall   F-measure
Imbalanced datasets                                       0.55        0.82     0.66
Balanced training dataset & Imbalanced testing dataset    0.36        0.91     0.52
Balanced training and testing datasets                    0.82        0.91     0.86

119
Rules for syntactic simplification
Rules based on [Siddharthan, 2006]:
Rules for Appositive Clause Simplification
S = [V][W][X apos of V][Z] => S1 = [V][W][Z], S2 = [V] is/are [X apos of V]
Rules for Coordinative Clause Simplification
S = CONJ [X], [Y] => S1 = X, S2 = Y
S = [IF] [X] [THEN|,] [Y] => S1 = X, S2 = Y
S = [X] [,]? [CONJ] [Y] => S1 = X, S2 = Y
Rules for Relative Clause Simplification
S = [W] [X][Y relative W] [Z]. => S1 = W X Z. S2 = W Y.
....

120
2nd
approximation: Machine Learning
Classification problem.
Every drug pair is a relation instance.
Relation is reciprocal, drug order is not
important.

121
Relation Instances (examples)


