Extracting Complex Biological Events
with Rich Graph-Based Feature Sets


 Jari Björne, Juho Heimonen, Filip Ginter, Antti
 Airola, Tapio Pahikkala, Tapio Salakoski
 BioNLP 2009 Workshop

Farzaneh Sarafraz
18 June 2009
                           
BioNLP'09 Task 1
       Events in abstracts
       Given: gene and gene products (proteins)
       Wanted: events
        −   type
        −   trigger
        −   participant(s)
        −   cause (if applicable)

                                     
Example
    "I kappa B/MAD-3 masks the nuclear localization
      signal of NF-kappa B p65 and requires the
      transactivation domain to inhibit NF-kappa B
      p65 DNA binding."


    Event: negative regulation
    Trigger: masks
    Theme1: the first p65
    Cause: MAD-3
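
A minimal sketch of how the event in this example could be held in memory. The `Event` class and its field names are hypothetical, chosen only to mirror the four items the task asks for (type, trigger, participants, cause); the shared task's own annotation format is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical container for one extracted event."""
    event_type: str
    trigger: str
    themes: Tuple[str, ...]
    cause: Optional[str] = None  # only some event types take a cause


# The event from the example slide, written out by hand.
example = Event(
    event_type="Negative regulation",
    trigger="masks",
    themes=("p65 (first mention)",),
    cause="MAD-3",
)
```

Events without a cause simply leave `cause` as `None`.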


                             
Event Types
       Gene expression             Binding
       Transcription               Regulation
       Protein Catabolism          Positive regulation
       Localisation                Negative regulation
       Phosphorylation




                              
Training and Test Data
       Training data: 800 abstracts
       Development data: 150 abstracts
       Test data: 260 abstracts




                               
The System
       Trigger recognition
        −   Methods similar to NER
        −   Classification
       Argument detection
        −   Graph edge selection
        −   Classification
       Semantic post-processing
        −   Rule-based
                                    
Trigger Detection
       Token labelling (one class for each event type and one negative class)
       92% of triggers are single token
        −   Adjacent tokens form a trigger if they appear in the 
            training data
       Triggers that share a token:
        −   Combined class: gene expression/pos regulation
       A graph node for each trigger
        −   Not duplicated just yet
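
The combined-class idea can be sketched as follows. This is an illustration only: the label strings, the `/` separator, and the function names are assumptions, not the authors' code.

```python
NEGATIVE = "neg"  # assumed name for the "no trigger" class


def token_label(event_types):
    """Merge all event types triggered by one token into a single class label."""
    if not event_types:
        return NEGATIVE
    return "/".join(sorted(event_types))


def split_label(label):
    """Recover the individual event types from a (possibly combined) label."""
    return [] if label == NEGATIVE else label.split("/")
```

A token that triggers both a gene expression and a positive regulation event gets one combined label, so the classifier still assigns exactly one class per token.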
                                       
Classification - SVM
       Token features
        −   Binary: capitalisation, presence of punctuation or 
            numeric characters
        −   Stem
        −   Character bigrams and trigrams
        −   Whether the token is a known trigger in the training data
        −   All the above for linear and dependency 
            “neighbours”
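
A rough sketch of the per-token features listed above. The feature names are made up for illustration, the stem feature is left out because it would need a stemmer, and `known_triggers` is assumed to be a set of lower-cased trigger words collected from the training data.

```python
import string


def char_ngrams(text, n):
    """All contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def token_features(token, known_triggers=frozenset()):
    """Binary and character n-gram features for a single token."""
    lowered = token.lower()
    feats = {
        "has_capital": any(c.isupper() for c in token),
        "has_punct": any(c in string.punctuation for c in token),
        "has_digit": any(c.isdigit() for c in token),
        "known_trigger": lowered in known_triggers,
    }
    for n in (2, 3):  # character bigrams and trigrams
        for gram in char_ngrams(lowered, n):
            feats[f"char{n}={gram}"] = True
    return feats
```

The same function could be applied to a token's linear and dependency neighbours, with a prefix on the feature names to keep them apart.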

                                     
Classification - SVM
       Frequency features
        −   # of named entities
                In sentence
                In a linear window around the token
                Bag-of-words count of token texts in the sentence (?)
       Dependency chains
        −   Up to depth of 3 from the token are constructed
        −   At each depth both token and frequency features
        −   Plus dep type and sequence of dep types in chain
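
One way to picture the dependency chains is the sketch below. It assumes the parse is given as `(head, dependent, dep_type)` triples and that chains may follow dependencies in either direction; both are assumptions for the example, as are the dependency labels used in the usage note.

```python
def dependency_chains(edges, start, max_depth=3):
    """Enumerate chains of up to `max_depth` dependencies starting at `start`.

    Returns a list of (tokens, dep_types) tuples, one per chain.
    """
    adj = {}
    for head, dep, dep_type in edges:
        adj.setdefault(head, []).append((dep, dep_type))
        adj.setdefault(dep, []).append((head, dep_type))

    chains = []

    def extend(path, types):
        if types:
            chains.append((tuple(path), tuple(types)))
        if len(types) == max_depth:
            return
        for nxt, dep_type in adj.get(path[-1], []):
            if nxt not in path:  # do not walk back over the chain
                extend(path + [nxt], types + [dep_type])

    extend([start], [])
    return chains


# Toy parse fragment; tokens and labels are illustrative only.
edges = [
    ("masks", "MAD-3", "nsubj"),
    ("masks", "signal", "dobj"),
    ("signal", "p65", "prep_of"),
]
chains = dependency_chains(edges, "masks")
```

Each chain carries the sequence of dependency types, to which the token and frequency features of the tokens along it can be added.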
                                         
Two SVMs
       “Somewhat”  different feature sets
       Combined weighted results



    “This design should be considered an artifact of
      the time-constrained, experiment-driven
      development of the system rather than a
      principled design”
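
The slide does not say how the two classifiers' results were weighted. Purely as an illustration, a linear interpolation of per-class scores could look like this; the weighting scheme, the function names, and the example scores are all assumptions.

```python
def combine_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of two classifiers' per-class scores."""
    classes = set(scores_a) | set(scores_b)
    return {
        c: weight_a * scores_a.get(c, 0.0) + (1.0 - weight_a) * scores_b.get(c, 0.0)
        for c in classes
    }


def predict(scores):
    """Class with the highest combined score."""
    return max(scores, key=scores.get)


# Made-up scores from two hypothetical classifiers for one token.
svm_a = {"neg": 0.2, "Binding": 0.8}
svm_b = {"neg": 0.6, "Binding": 0.1}
```

With equal weights the token is labelled `Binding`; shifting the weight towards the second classifier flips it to `neg`.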

                               
Precision/Recall trade-off
       Undetected trigger --> undetected event
       All triggers have events in the training data -->
        bias towards reporting an event for all detected
        triggers
       Adjust P/R explicitly
        −   multiply the classifier score of the negative class by β
        −   find β experimentally
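
A small sketch of the β adjustment, assuming non-negative per-class scores (with signed SVM margins the scaling would need more care). The function names and the `evaluate` callback are hypothetical.

```python
def predict_with_beta(scores, beta, negative="neg"):
    """Scale the negative-class score by beta, then pick the best class."""
    adjusted = dict(scores)
    adjusted[negative] *= beta
    return max(adjusted, key=adjusted.get)


def choose_beta(candidates, evaluate):
    """Pick the beta that maximises `evaluate`, e.g. F-score on development data."""
    return max(candidates, key=evaluate)
```

Under this assumption, β below 1 makes the negative class less likely to win, so more triggers are reported (higher recall, lower precision); β above 1 does the opposite.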


                                     
Edge Detection
       Multi-class SVM
       All potential directed edges
        −   Event node to named entity
        −   Event node to event node (nested event)
        −   Labelled as theme, cause, or negative
       Each edge is predicted independently
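
The candidate set described above can be enumerated with a short sketch like the following. Node names are illustrative, and excluding self-loops is an assumption.

```python
def candidate_edges(event_nodes, entity_nodes):
    """All directed edges an event node may have: to entities and to other events."""
    edges = []
    for source in event_nodes:
        for target in entity_nodes:
            edges.append((source, target))
        for target in event_nodes:
            if target != source:  # assumed: an event is not its own argument
                edges.append((source, target))
    return edges


edges = candidate_edges(["masks", "inhibit"], ["MAD-3", "p65"])
```

Each candidate is then classified on its own as theme, cause, or negative, which is the independence assumption noted on the slide.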



                                   
Feature Set – Central Concept

    Shortest undirected path of syntactic dependencies
     in the Stanford scheme parse of the sentence.
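
A shortest undirected path over a dependency parse can be found with breadth-first search. The sketch below assumes the parse is a list of `(head, dependent, dep_type)` triples; the toy parse and its labels are illustrative only.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Shortest path between two tokens, ignoring dependency direction."""
    adj = {}
    for head, dep, _dep_type in edges:
        adj.setdefault(head, []).append(dep)
        adj.setdefault(dep, []).append(head)

    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the tokens are not connected


parse = [
    ("masks", "MAD-3", "nsubj"),
    ("masks", "signal", "dobj"),
    ("signal", "p65", "prep_of"),
]
path = shortest_path(parse, "MAD-3", "p65")
```

The tokens and dependencies along this path are what the edge features are built from.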




                             
Feature Set
       Token text, POS, entity/event class,
        dependency type (e.g. subject)
       N-grams: merging the attributes of 2-4
        −   Consecutive tokens
        −   Consecutive dependencies
        −   Each token and two neighbouring dependencies
        −   Each dependency and two neighbouring tokens
        −   One bigram showing direction
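
For the simplest case, consecutive tokens, the n-gram construction can be sketched like this. Joining attributes with `_` is an arbitrary choice for the example, and the path shown is illustrative.

```python
def path_ngrams(items, min_n=2, max_n=4):
    """Merge the attributes of 2 to 4 consecutive items on a path into n-gram features."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(items) - n + 1):
            grams.append("_".join(items[i:i + n]))
    return grams


token_grams = path_ngrams(["MAD-3", "masks", "signal", "p65"])
```

The same function works on the sequence of dependency types along the path, or on a mixed sequence that alternates tokens and dependencies.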
                                  
Other Features
       Individual component features
       Semantic node features
       Frequency features




                              
Semantic Post-Processing
       Duplicate nodes
        −   Same class and same trigger
        −   Combined trigger
       Remove improper arguments
       Remove directed cycles by removing the 
        weakest link
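
The cycle-removal step can be sketched as below. It assumes each predicted edge carries a confidence score and that "weakest link" means the lowest-scoring edge on the cycle; the slide does not spell this out, and the function names are hypothetical.

```python
def find_cycle(edges):
    """Return the edges of one directed cycle, or None if there is none."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node, stack):
        state[node] = "active"
        stack.append(node)
        for nxt in adj.get(node, []):
            if state.get(nxt) == "active":
                nodes = stack[stack.index(nxt):] + [nxt]
                return list(zip(nodes, nodes[1:]))
            if nxt not in state:
                found = visit(nxt, stack)
                if found:
                    return found
        stack.pop()
        state[node] = "done"
        return None

    for start in list(adj):
        if start not in state:
            found = visit(start, [])
            if found:
                return found
    return None


def break_cycles(edge_scores):
    """Drop the lowest-scoring edge of each cycle until the graph is acyclic."""
    kept = dict(edge_scores)
    while True:
        cycle = find_cycle(kept)
        if cycle is None:
            return kept
        del kept[min(cycle, key=kept.get)]
```

For example, with edges A→B (0.9), B→C (0.4), C→A (0.7), the B→C edge is removed and the other two are kept.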



                                  
Duplicating Event Nodes
       Task restrictions
        −   Two causes,
        −   must have theme,
        −   etc.
       Several heuristics
       x-th first dependency
        in shortest path from 
        the event for binding
                                  
Results




           
Compared to Us




                  
What Didn't Work/Wasn't Tried
       CRF
       HMM
       Removing strong independence assumption
       Co-reference resolution (4.8%)




                               
End.




        

BioNLP09 Winners
