1. Using SVMs with the Command Relation Feature
to Identify Negated Events in Biomedical
Literature
Farzaneh Sarafraz
Goran Nenadic
School of Computer Science
University of Manchester
sarafraf@cs.man.ac.uk
g.nenadic@manchester.ac.uk
3. Motivation & aim
• Biomedical literature
• 2000 papers published every day
• Biomedical information extraction (IE) needed
• Improve IE with negation information
• Negative results are interesting and reported
• “The IKK complex, but not p90 (rsk), is responsible for the in
vivo phosphorylation of I-kappa-B-alpha.”
• Resources
• Shared tasks, data
• Linguistic tools (syntactic parsers)
4. Problem statement
• Given
• PubMed abstracts
• Protein/gene mentions annotated
• Molecular events annotated
• Wanted for every event
• Negated or not
• Classification problem
5. Molecular events
• “We further show that Nmi interacts with all STATs except Stat2.”
• An event consists of a trigger and its participants
• Event type: {binding, transcription, regulation, expression}
• Participation type: {theme, cause}
• Participant type: {gene/protein, event}
6. Molecular events – class I
• One theme (gene/protein)
• “The effect of this synergism was perceptible at
the level of induction of the IL-2 gene.”
• Trigger: induction
• Type: gene expression
• Theme: IL-2
• Types: transcription, gene expression, phosphorylation,
protein catabolism, localization
7. Molecular events – class II
• One or more themes (gene/protein)
• “We further show that Nmi interacts with all
STATs except Stat2.”
• Trigger: interacts
• Type: binding
• Themes: Nmi, Stat2
• Negated
• Types: binding
8. Molecular events – class III
• 1 theme, 0 or 1 cause
• may be gene/protein or other events
• “Overexpression of full-length ALG-4 induced
transcription of FasL and, consequently, apoptosis.”
Event     Trigger            Type              Theme     Cause
Event 1   “transcription”    Transcription     FasL
Event 2   “Overexpression”   Gene expression   ALG-4
Event 3   “Overexpression”   Regulation        Event 2
Event 4   “induced”          Regulation        Event 1   Event 3
• Types: regulation types
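The nesting in the ALG-4 example (events whose theme or cause is itself an event) can be made concrete with a small data structure. This is only a sketch: the `Event` class, its field names, and the `leaf_proteins` helper are hypothetical and are not part of the BioNLP'09 annotation format.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Event:
    """One annotated molecular event (hypothetical structure, for illustration)."""
    trigger: str
    event_type: str
    theme: Union[str, "Event"]                   # gene/protein name or another event
    cause: Optional[Union[str, "Event"]] = None  # only class III events may have one
    negated: bool = False


# The four events from the ALG-4 sentence on this slide.
e1 = Event("transcription", "Transcription", theme="FasL")
e2 = Event("Overexpression", "Gene expression", theme="ALG-4")
e3 = Event("Overexpression", "Regulation", theme=e2)
e4 = Event("induced", "Regulation", theme=e1, cause=e3)


def leaf_proteins(event: Event) -> List[str]:
    """Collect the gene/protein names reachable through themes and causes."""
    names: List[str] = []
    for part in (event.theme, event.cause):
        if isinstance(part, Event):
            names.extend(leaf_proteins(part))
        elif part is not None:
            names.append(part)
    return names


print(leaf_proteins(e4))  # ['FasL', 'ALG-4']
```

Storing nested events as object references rather than copies keeps Event 2 shared between Event 3 and, indirectly, Event 4, which mirrors the table.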
9. Data: BioNLP’09
• Training: 800 abstracts
• Test: 260 abstracts
• Gold annotations
• Event trigger, type, participants, negation
• Negation cue not annotated
Event class   Training data         Development data
              total     negated     total     negated
Class I       2,858     131         559       26
Class II      887       44          249       15
Class III     4,870     440         987       66
Total         8,615     615         1,795     107
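As a quick check on how unbalanced the data is, the per-class rows of the table can be summed and the share of negated events computed. The snippet only re-uses the per-class counts shown on this slide; the variable and function names are made up for the example.

```python
# (total events, negated events) per class, copied from the per-class rows above
train = {"I": (2858, 131), "II": (887, 44), "III": (4870, 440)}
dev = {"I": (559, 26), "II": (249, 15), "III": (987, 66)}


def negated_share(counts):
    """Return (total events, negated events, negated fraction) over all classes."""
    total = sum(t for t, _ in counts.values())
    negated = sum(n for _, n in counts.values())
    return total, negated, negated / total


for name, counts in (("training", train), ("development", dev)):
    total, negated, share = negated_share(counts)
    print(f"{name}: {negated} of {total} events negated ({share:.1%})")
```

Summing the rows gives 8,615 training events and 1,795 development events, of which roughly 7% and 6% respectively are negated.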
12. Baseline results
Approach                    P      R      F1     Spec.
No negation detection       -      0%     -      94%
Any negation cue present    20%    78%    32%    81%
NegEx                       36%    37%    36%    93%
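For reference, the four columns (P, R, F1, Spec.) follow the usual definitions over the confusion counts, with "negated" as the positive class. The sketch below uses a hypothetical function name and invented counts; it is not the evaluation script behind the slides.

```python
def negation_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and specificity, treating 'negated' as positive.

    A metric whose denominator is zero is returned as None.
    """
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp) if tn + fp else None
    return {"precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity}


# Invented counts, for illustration only.
print(negation_metrics(tp=40, fp=10, fn=60, tn=890))

# A system that never predicts 'negated' has no defined precision or F1.
print(negation_metrics(tp=0, fp=0, fn=107, tn=1688))
```

When nothing is predicted as negated, precision and F1 are undefined, which is why such a row shows a dash in those columns and 0% recall.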
13. The command relation
• If a and b are nodes in the constituency parse
tree of a sentence, then a X-commands b iff the
lowest ancestor of a with label X is also an
ancestor of b.
Ronald Langacker, On Pronominalization and the Chain of Command, in D. Reibel and S. Schane (eds.), Modern Studies in English, Prentice-Hall, Englewood Cliffs, NJ, pp. 160-186, 1969.
14. Example of the command relation
• Tree: (S a (S b)), i.e. a is a child of the outer S and b lies inside the inner S
• a S-commands b.
• b does not S-command a.
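The definition of X-command and the small example above can be checked with a few lines of code. This is a minimal sketch: the `Node` class and the function names are hypothetical, and "ancestor" is taken to mean proper ancestor, which is an assumption.

```python
class Node:
    """A constituency parse tree node (minimal, hypothetical representation)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Yield the proper ancestors of node, nearest first."""
    node = node.parent
    while node is not None:
        yield node
        node = node.parent


def x_commands(a, b, x):
    """True iff the lowest ancestor of a labelled x is also an ancestor of b."""
    lowest = next((n for n in ancestors(a) if n.label == x), None)
    if lowest is None:
        return False
    return any(n is lowest for n in ancestors(b))


# The example tree: a sits directly under the outer S, b under the inner S.
a = Node("a")
b = Node("b")
inner = Node("S", [b])
root = Node("S", [a, inner])

print(x_commands(a, b, "S"))  # True: the outer S dominates b
print(x_commands(b, a, "S"))  # False: the inner S does not dominate a
```

The relation is not symmetric, which is what makes it useful for deciding whether a negation cue has an event trigger or participant in its scope.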
15. X-command in action
• “We now show that a mutant motif that exchanges the terminal 3' C for a G fails to bind the p50 homodimer.”
• [Figure: constituency parse tree of this sentence, showing its S and VP nodes]
16. Rule-based method
• An event is negated if a negation cue exists and one of the following holds (four alternative rules, evaluated separately):
• Negation cue S-commands any participant
• Negation cue S-commands trigger
• Negation cue S-commands both
• Negation cue VP-commands both
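The four rule variants can be sketched as a single function that takes the rule name as a parameter. Everything here is illustrative: the tree is stored as two hypothetical dictionaries (`parent` and `label`), the rule names are made up, the toy tree is invented, and "both" is assumed to mean the trigger plus at least one participant.

```python
def ancestors(parent, node):
    """Proper ancestors of node, nearest first; parent maps child id -> parent id."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def x_commands(parent, label, a, b, x):
    """a X-commands b iff a's lowest ancestor labelled x is an ancestor of b."""
    lowest = next((n for n in ancestors(parent, a) if label[n] == x), None)
    return lowest is not None and lowest in ancestors(parent, b)


def is_negated(parent, label, cue, trigger, participants, rule):
    """Apply one rule variant; cue is None when the sentence has no negation cue."""
    if cue is None:
        return False
    x = "VP" if rule == "VP-both" else "S"
    on_trigger = x_commands(parent, label, cue, trigger, x)
    on_participant = any(x_commands(parent, label, cue, p, x) for p in participants)
    if rule == "S-any-participant":
        return on_participant
    if rule == "S-trigger":
        return on_trigger
    if rule in ("S-both", "VP-both"):
        # Assumption: "both" = the trigger and at least one participant.
        return on_trigger and on_participant
    raise ValueError(f"unknown rule: {rule}")


# Invented toy tree: (S (NP X) (VP (RB not) (VP (VB bind) (NP Y))))
parent = {1: 0, 2: 0, 3: 2, 4: 2, 5: 4, 6: 4}
label = {0: "S", 1: "NP", 2: "VP", 3: "RB", 4: "VP", 5: "VB", 6: "NP"}
cue, trigger = 3, 5

print(is_negated(parent, label, cue, trigger, [1, 6], "S-both"))   # True
print(is_negated(parent, label, cue, trigger, [1], "VP-both"))     # False
```

In the toy tree the cue S-commands every node, so the S rules fire easily, while the VP rule fails for an event whose only participant is the subject outside the verb phrase.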
17. Results of rule-based method
Approach                                   P      R      F1     Spec.
Negation cue S-commands any participant    23%    76%    35%    84%
Negation cue S-commands trigger            23%    68%    34%    85%
Negation cue S-commands both               23%    68%    35%    86%
Negation cue VP-commands both              42%
18. SVM features
• Semantic features
• Event type
• Lexical features
• Sentence contains negation cue
• Negation cue
• Syntactic features
• POS of neg cue
• POS of event trigger
• POS of the participants
• Parse tree distance between trigger & cue
• Type of smallest phrase containing trigger & cue
• Cue S-commands any participant
• Cue S-commands trigger
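One way to picture how the ten features listed above could reach an SVM is to collect them in a dictionary per event and one-hot encode the categorical values. This is a sketch under assumptions: the function names, feature keys, "NONE" placeholder, and example values are all invented, and the slides do not say how the features were actually encoded.

```python
def event_features(event_type, trigger_pos, participant_pos, cue=None,
                   cue_pos=None, smallest_phrase=None,
                   cue_commands_participant=False, cue_commands_trigger=False,
                   tree_distance=None):
    """Build the ten features for one event from precomputed inputs."""
    has_cue = cue is not None
    return {
        "event_type": event_type,
        "has_cue": has_cue,
        "cue": cue if has_cue else "NONE",
        "cue_pos": cue_pos if has_cue else "NONE",
        "trigger_pos": trigger_pos,
        "participant_pos": "+".join(sorted(participant_pos)),
        "smallest_phrase": smallest_phrase if has_cue else "NONE",
        "cue_s_commands_participant": has_cue and cue_commands_participant,
        "cue_s_commands_trigger": has_cue and cue_commands_trigger,
        "tree_distance": tree_distance if has_cue else -1,
    }


def vectorize(rows):
    """One-hot encode string features; pass booleans and numbers through."""
    columns = sorted(
        {(k, v if isinstance(v, str) else None)
         for row in rows for k, v in row.items()},
        key=lambda c: (c[0], c[1] or ""),
    )
    vectors = [
        [float(row[k] == v) if v is not None else float(row[k])
         for k, v in columns]
        for row in rows
    ]
    return columns, vectors


# Invented example values, for illustration only.
rows = [
    event_features("Binding", "VBZ", ["NN", "NN"], cue="except", cue_pos="IN",
                   smallest_phrase="PP", cue_commands_participant=True,
                   cue_commands_trigger=False, tree_distance=4),
    event_features("Gene expression", "NN", ["NN"]),
]
columns, vectors = vectorize(rows)
print(len(columns), vectors[0])
```

The resulting numeric vectors, together with the gold negation labels, are the kind of input a standard SVM implementation expects.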
19. Results of single SVM, incremental feature sets
Feature set      P      R      F1     Spec.
Features 1-7     43%    8%     14%    99.2%
Features 1-8     73%    19%    30%    99.3%
Features 1-9     71%    38%    49%    99.2%
Features 1-10    76%    38%    51%    99.2%
23. Results of single SVM, incremental feature sets
1. Event type
2. Sentence contains neg cue
3. Neg cue
4. POS of neg cue
5. POS of event trigger
6. POS of the participants
7. Type of smallest phrase containing trigger & cue
8. Cue S-commands any participant
9. Cue S-commands trigger
10. Parse tree distance between trigger & cue
Feature set      P      R      F1     Spec.
Features 1-7     43%    8%     14%    99.2%
Features 1-8     73%    19%    30%    99.3%
Features 1-9     71%    38%    49%    99.2%
Features 1-10    76%    38%    51%    99.2%
24. Results of separate SVMs for each class
Event class                     P       R      F1     Spec.
Class I (559 events)            94%     65%    77%    99.8%
Class II (249 events)           100%    33%    50%    100%
Class III (987 events)          81%     44%    57%    99.2%
Micro-average (1,795 events)    88%     49%    63%    99.4%
Macro-average (3 classes)       92%     47%    62%    99.7%
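The two averages in the table differ in what they weight equally: the macro-average gives each class the same weight, while the micro-average pools the individual events of all classes before computing a score. The sketch below recomputes the macro-averaged P, R and Spec. from the per-class rows; the micro-average needs raw counts, which the slides do not give, so it is shown only on invented counts. F1 is left out because the slides do not say whether the macro F1 averages the per-class F1 values or is derived from the averaged P and R. Function names are hypothetical.

```python
# Per-class scores in percent, copied from the table above.
per_class = {
    "Class I": {"P": 94, "R": 65, "Spec": 99.8},
    "Class II": {"P": 100, "R": 33, "Spec": 100},
    "Class III": {"P": 81, "R": 44, "Spec": 99.2},
}


def macro_average(scores, metric):
    """Unweighted mean of one metric over the classes."""
    values = [row[metric] for row in scores.values()]
    return sum(values) / len(values)


def micro_average(counts):
    """Pool (tp, fp, fn) over classes, then compute precision and recall once."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return tp / (tp + fp), tp / (tp + fn)


for metric in ("P", "R", "Spec"):
    print(metric, round(macro_average(per_class, metric), 1))

# Invented counts, only to show how pooling works; not taken from the slides.
print(micro_average([(9, 1, 1), (1, 0, 9)]))
```

Rounded to the precision used on the slide, the macro-averaged values come out as 92%, 47% and 99.7%, matching the last row of the table.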
25. Future work
• Use class-specific features
• Study other variants of command
• Combine negation detection with automatic
event detection instead of using ‘gold’ events
• Use negation detection on a larger scale dataset
(MEDLINE) to find contradictions & contrasts in
the biomedical literature
26. Conclusions
• SVM for extracting negated events
• >99% specificity
• 63% F-measure (micro average)
• Different classes of events behave differently
• To detect negated molecular events
• Event trigger & surface distances not enough
• Semantic & command features useful
• Event participants as important as triggers
• Apply on large scale data – MEDLINE
27. Acknowledgements
• Organisers of BioNLP’09
• GN TEAM
• Casey Bergman’s lab – Faculty of Life Sciences,
University of Manchester
• James Eales – University of Manchester
• Jonathan Caruana – University College London
• Web service soon available at
http://gnode1.mib.man.ac.uk/negmole
28. X-command in action
• “We now show that a mutant motif that exchanges the terminal 3' C for a G fails to bind the p50 homodimer that is upregulated in LPS tolerant human Mono Mac 6 cells.”
• [Figure: constituency parse tree of this sentence, showing its S and VP nodes]