BioNLP09 Winners
Transcript

  • 1. Extracting Complex Biological Events with Rich Graph-Based Feature Sets
      Jari Björne, Juho Heimonen, Filip Ginter, Antti Airola, Tapio Pahikkala, Tapio Salakoski
      BioNLP 2009 Workshop
      Farzaneh Sarafraz, 18 June 2009
  • 2. BioNLP'09 Task 1
      - Events in abstracts
      - Given: genes and gene products (proteins)
      - Wanted: events
          - type
          - trigger
          - participant(s)
          - cause (if applicable)
  • 3. Example
      "I kappa B/MAD-3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
      - Event: negative regulation
      - Trigger: masks
      - Theme1: the first p65
      - Cause: MAD-3
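A minimal sketch of the output the task asks for, using the example above: an event has a type, a trigger, one or more themes and an optional cause, and a theme or cause may itself be another event (a nested event). The class names, field names and the use of character offsets are assumptions for illustration, not the shared-task annotation format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Protein:
    text: str   # a given gene/gene product mention
    start: int  # character offset in the sentence (hypothetical convention)


@dataclass
class Event:
    type: str     # one of the nine event types
    trigger: str  # the word(s) expressing the event
    # Themes and the cause may be proteins or nested events.
    themes: List[Union[Protein, Event]] = field(default_factory=list)
    cause: Optional[Union[Protein, Event]] = None  # only where applicable


sentence = ("I kappa B/MAD-3 masks the nuclear localization signal of "
            "NF-kappa B p65 and requires the transactivation domain to "
            "inhibit NF-kappa B p65 DNA binding.")

first_p65 = Protein("p65", sentence.index("p65"))  # index() finds the first match
mad3 = Protein("MAD-3", sentence.index("MAD-3"))
masks = Event("Negative_regulation", "masks", themes=[first_p65], cause=mad3)
```

Using offsets rather than plain strings keeps the two "p65" mentions apart, which is what "the first p65" on the slide relies on.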
  • 4. Event Types
      - Gene expression
      - Transcription
      - Protein catabolism
      - Localisation
      - Phosphorylation
      - Binding
      - Regulation
      - Positive regulation
      - Negative regulation
  • 5. Training and Test Data
      - Training data: 800 abstracts
      - Development data: 150 abstracts
      - Test data: 260 abstracts
  • 6. The System
      - Trigger recognition
          - Methods similar to NER
          - Classification
      - Argument detection
          - Graph edge selection
          - Classification
      - Semantic post-processing
          - Rule-based
  • 7. Trigger Detection
      - Token labelling: one label for each event type, plus a negative label ("-")
      - 92% of triggers are single tokens
          - Adjacent tokens form a multi-token trigger if that sequence appears as a trigger in the training data
      - Triggers that share a token:
          - Get a combined class, e.g. gene expression/positive regulation
      - A graph node for each trigger
          - Not duplicated just yet
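A sketch of how per-token labels could be turned into trigger nodes, following the rule on the slide that adjacent tokens form one trigger only if that phrase occurs as a trigger in the training data. The function name, the "neg" label, the requirement that merged tokens share a label, and the maximum phrase length are assumptions, not details from the slides.

```python
def merge_multi_token_triggers(tokens, labels, known_triggers, max_len=3):
    """Group per-token labels into trigger spans (start, end, label).

    tokens: list of token strings; labels: one class per token, "neg" for
    tokens that are not part of a trigger. Adjacent tokens with the same
    label are merged only if the joined phrase was a training trigger.
    """
    triggers = []
    i = 0
    while i < len(tokens):
        if labels[i] == "neg":
            i += 1
            continue
        end = i + 1  # default: a single-token trigger
        # Try the longest candidate phrase first, down to two tokens.
        for j in range(min(len(tokens), i + max_len), i + 1, -1):
            phrase = " ".join(tokens[i:j]).lower()
            same_label = all(lab == labels[i] for lab in labels[i:j])
            if same_label and phrase in known_triggers:
                end = j
                break
        triggers.append((i, end, labels[i]))  # one graph node per trigger
        i = end
    return triggers
```

A combined label such as "Gene_expression/Positive_regulation" passes through unchanged here, matching the slide's note that nodes are not duplicated at this stage.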
  • 8. Classification ­ SVM  Token features − Binary: capitalisation, presence of punctuation or  numeric characters − Stem − Character bigrams and trigrams − Token is known triggers in training data − All the above for linear and dependency  “neighbours”    
  • 9. Classification ­ SVM  Frequency features − # of named entities  In sentence  In a linear window around the token  Bag­of­words count of token texts in the sentence (?)  Dependency chains − Up to depth of 3 from the token are constructed − At each depth both token and frequency features − Plus dep type and sequence of dep types in chain    
  • 10. Two SVMs
      - "Somewhat" different feature sets
      - Weighted combination of their results
      - "This design should be considered an artifact of the time-constrained, experiment-driven development of the system rather than a principled design"
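The slides do not say how the two SVMs' outputs were weighted. As one hypothetical reading, a weighted sum of per-class decision values could look like this; the function name and the weight are assumptions.

```python
def combine_scores(scores_a, scores_b, weight_a=0.5):
    """scores_a / scores_b: dict mapping class -> decision value from each SVM.

    Returns the winning class and the combined scores (weighted sum).
    """
    classes = scores_a.keys() | scores_b.keys()
    combined = {
        c: weight_a * scores_a.get(c, 0.0) + (1 - weight_a) * scores_b.get(c, 0.0)
        for c in classes
    }
    return max(combined, key=combined.get), combined
```

Shifting the weight changes which classifier dominates, which is one way such a combination could be tuned on the development data.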
  • 11. Precision/Recall trade-off
      - An undetected trigger means an undetected event
      - All triggers have events in the training data -> bias towards reporting an event for every detected trigger
      - Adjust P/R explicitly
          - multiply the negative-class score by β
          - find β experimentally
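A sketch of the β adjustment: scale the negative class's score by β before taking the highest-scoring class, then pick β by grid search on development data. It assumes non-negative scores (with a negative decision value, β < 1 would raise rather than lower it) and uses micro-averaged F1 as the selection criterion; the function names and grid are hypothetical.

```python
def predict_with_beta(scores, beta, negative="neg"):
    adjusted = dict(scores)
    adjusted[negative] = adjusted[negative] * beta  # beta < 1 favours recall
    return max(adjusted, key=adjusted.get)


def f1(gold, pred, negative="neg"):
    tp = sum(1 for g, p in zip(gold, pred) if p != negative and p == g)
    fp = sum(1 for g, p in zip(gold, pred) if p != negative and p != g)
    fn = sum(1 for g, p in zip(gold, pred) if g != negative and p != g)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_beta(dev_scores, dev_gold, betas):
    # "Find beta experimentally": keep the value with the best dev-set F1.
    return max(betas, key=lambda b: f1(dev_gold,
                                       [predict_with_beta(s, b) for s in dev_scores]))
```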
  • 12. Edge Detection
      - Multi-class SVM
      - All potential directed edges
          - Event node to named entity
          - Event node to event node (nested event)
          - Labelled as theme, cause, or negative
      - Each edge is predicted independently
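A sketch of the candidate-edge enumeration and independent edge classification described above. `classify` stands in for the trained multi-class SVM; node identifiers, label strings and function names are assumptions.

```python
def candidate_edges(trigger_nodes, entity_nodes):
    """Every directed edge leaving an event (trigger) node."""
    edges = []
    for src in trigger_nodes:
        for dst in entity_nodes:       # event -> named entity
            edges.append((src, dst))
        for dst in trigger_nodes:      # event -> event (nested event)
            if dst != src:
                edges.append((src, dst))
    return edges


def predict_edges(edges, classify):
    """classify(src, dst) returns 'theme', 'cause' or 'neg'.

    Each edge is classified on its own; negative edges are dropped.
    """
    graph = []
    for src, dst in edges:
        label = classify(src, dst)
        if label != "neg":
            graph.append((src, dst, label))
    return graph
```

Because each edge is predicted independently, nothing here prevents invalid combinations (e.g. cycles), which is why a post-processing step follows.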
  • 13. Feature Set – Central Concept
      - The shortest undirected path of syntactic dependencies in the Stanford-scheme parse of the sentence
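The shortest undirected dependency path can be found with a breadth-first search that ignores edge direction. This sketch assumes the parse is given as (head, dependent, type) index triples; the function name is hypothetical.

```python
from collections import deque


def shortest_dependency_path(deps, source, target):
    """Token indices on the shortest undirected path, or None if unconnected."""
    adj = {}
    for head, dep, _ in deps:
        adj.setdefault(head, []).append(dep)
        adj.setdefault(dep, []).append(head)  # ignore direction
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```

For "MAD-3 masks signal of p65" with dependencies nsubj(masks, MAD-3), dobj(masks, signal) and prep_of(signal, p65), the path from MAD-3 to p65 is MAD-3, masks, signal, p65.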
  • 14. Feature Set  Token text, POS, entity/event class,  dependency (subject)  N­grams: merging the attributes of 2­4 − Consecutive tokens − Consecutive dependencies − Each token and two neighbouring dependencies − Each dependency and two neighbouring tokens − One bigram showing direction    
  • 15. Other Features
      - Individual component features
      - Semantic node features
      - Frequency features
  • 16. Semantic Post-Processing
      - Duplicate nodes
          - Same class and same trigger
          - Combined trigger
      - Remove improper arguments
      - Remove directed cycles by removing the weakest link
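A sketch of "remove directed cycles by removing the weakest link": find a directed cycle with a depth-first search, delete its lowest-scoring edge, and repeat until no cycle remains. It assumes each predicted edge carries a confidence score; the function names and score representation are assumptions.

```python
def find_cycle(edges):
    """edges: dict (src, dst) -> score. Returns one directed cycle as a list
    of edges, or None."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    state = {}  # node -> "visiting" | "done"
    path = []   # edges on the current DFS path

    def dfs(node):
        state[node] = "visiting"
        for nxt in adj.get(node, []):
            path.append((node, nxt))
            if state.get(nxt) == "visiting":
                # Back edge: the cycle starts at the path edge leaving nxt.
                start = next(i for i, (s, _) in enumerate(path) if s == nxt)
                return path[start:]
            if nxt not in state:
                found = dfs(nxt)
                if found:
                    return found
            path.pop()
        state[node] = "done"
        return None

    for node in list(adj):
        if node not in state:
            found = dfs(node)
            if found:
                return found
    return None


def break_cycles(edges):
    edges = dict(edges)
    while (cycle := find_cycle(edges)) is not None:
        weakest = min(cycle, key=edges.get)  # lowest-scoring edge in the cycle
        del edges[weakest]
    return edges
```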
  • 17. Duplicating Event Nodes
      - Task restrictions
          - no event may have two causes
          - every event must have a theme
          - etc.
      - Several heuristics
          - For binding: the x-th first dependency in the shortest path from the event
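A sketch of enforcing the two task restrictions named above: an event without a theme is dropped, and an event predicted with several causes is duplicated into one copy per cause. Treating duplication this way is an assumption for illustration, and the class and function names are hypothetical; the binding-specific heuristic is not modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EventNode:
    trigger: str
    type: str
    themes: tuple = ()
    causes: tuple = ()


def enforce_restrictions(events):
    out = []
    for ev in events:
        if not ev.themes:
            continue  # every event must have a theme
        if len(ev.causes) <= 1:
            out.append(ev)
        else:
            # At most one cause per event: one copy of the node per cause.
            out.extend(replace(ev, causes=(c,)) for c in ev.causes)
    return out
```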
  • 18. Results    
  • 19. Compared to Us    
  • 20. What Didn't Work/Wasn't Tried
      - CRF
      - HMM
      - Removing the strong independence assumption
      - Co-reference resolution (4.8%)
  • 21. End.