BioNLP09 Winners

Transcript

  • 1. Extracting Complex Biological Events with Rich Graph-Based Feature Sets
    Jari Björne, Juho Heimonen, Filip Ginter, Antti Airola, Tapio Pahikkala, Tapio Salakoski
    BioNLP 2009 Workshop
    Farzaneh Sarafraz, 18 June 2009
  • 2. BioNLP'09 Task 1
    - Events in abstracts
    - Given: gene and gene products (proteins)
    - Wanted: events
      - type
      - trigger
      - participant(s)
      - cause (if applicable)
  • 3. Example
    "I kappa B/MAD-3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
    - Event: negative regulation
    - Trigger: masks
    - Theme1: the first p65
    - Cause: MAD-3
  • 4. Event Types
    - Gene expression
    - Transcription
    - Protein catabolism
    - Localisation
    - Phosphorylation
    - Binding
    - Regulation
    - Positive regulation
    - Negative regulation
  • 5. Training and Test Data
    - Training data: 800 abstracts
    - Development data: 150 abstracts
    - Test data: 260 abstracts
  • 6. The System
    - Trigger recognition
      - methods similar to NER
      - classification
    - Argument detection
      - graph edge selection
      - classification
    - Semantic post-processing
      - rule-based
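    The three stages map onto a simple pipeline. Here is a minimal, runnable skeleton of that flow; every function body is a placeholder (the real stages are SVM classifiers and hand-written rules), and all names are illustrative rather than taken from the authors' code.

    ```python
    # Skeleton of the three-stage pipeline; each stage is a stub so the
    # flow runs end to end. The real system uses SVMs for stages 1-2
    # (slides 7-12) and rules for stage 3 (slide 16).

    def detect_triggers(tokens):
        # Stage 1: label each token with an event type or the negative class.
        return [t for t in tokens if t == "masks"]  # placeholder rule

    def detect_arguments(triggers, entities):
        # Stage 2: classify every candidate trigger->argument edge.
        return [(tr, e, "Theme") for tr in triggers for e in entities]

    def postprocess(edges):
        # Stage 3: rule-based cleanup (deduplication, cycle removal) elided.
        return edges

    tokens = ["I", "kappa", "B", "masks", "the", "signal", "of", "p65"]
    print(postprocess(detect_arguments(detect_triggers(tokens), ["p65"])))
    ```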
  • 7. Trigger Detection
    - Token labelling: one class for each event type plus one negative class
    - 92% of triggers are a single token
      - adjacent tokens form a trigger if they appear in the training data
    - Triggers that share a token get a combined class, e.g. gene expression/positive regulation
    - A graph node for each trigger
      - not duplicated just yet
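    A small sketch of this labelling scheme, assuming the nine event types of the task; the "/"-joined label format and the function name are illustrative, not the authors' implementation.

    ```python
    # One class per event type plus a negative class; a token that triggers
    # several events gets a merged class label.

    EVENT_TYPES = [
        "Gene_expression", "Transcription", "Protein_catabolism",
        "Localization", "Phosphorylation", "Binding",
        "Regulation", "Positive_regulation", "Negative_regulation",
    ]

    def trigger_label(gold_types):
        if not gold_types:
            return "neg"  # the negative (no-trigger) class
        return "/".join(sorted(gold_types))

    print(trigger_label({"Gene_expression", "Positive_regulation"}))
    # -> Gene_expression/Positive_regulation
    ```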
  • 8. Classification - SVM
    - Token features
      - binary: capitalisation, presence of punctuation or numeric characters
      - stem
      - character bigrams and trigrams
      - whether the token is a known trigger in the training data
      - all of the above for linear and dependency "neighbours"
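    These token features are straightforward to reconstruct. A hedged sketch follows; the naive suffix-stripping stemmer is a stand-in (the slide does not name a stemmer), and the feature keys are made up.

    ```python
    # Binary capitalisation/punctuation/digit flags, a stem, and character
    # bi-/trigrams. Copying the same features from linear and dependency
    # neighbours is omitted here.

    import re

    def naive_stem(token):
        # Placeholder stemmer; the slide does not say which one was used.
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    def token_features(token):
        feats = {
            "is_capitalised": token[:1].isupper(),
            "has_punct": bool(re.search(r"[^\w\s]", token)),
            "has_digit": any(c.isdigit() for c in token),
            "stem=" + naive_stem(token.lower()): True,
        }
        for n in (2, 3):  # character bigrams and trigrams
            for i in range(len(token) - n + 1):
                feats[f"char{n}gram={token[i:i + n]}"] = True
        return feats

    print(sorted(token_features("NF-kappaB")))
    ```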
  • 9. Classification - SVM
    - Frequency features
      - number of named entities in the sentence and in a linear window around the token
      - bag-of-words counts of token texts in the sentence (?)
    - Dependency chains
      - constructed up to depth 3 from the token
      - at each depth, both token and frequency features
      - plus the dependency type and the sequence of dependency types in the chain
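    The dependency-chain features can be sketched as a bounded walk over the parse graph. The adjacency structure and the sample parse fragment below are hypothetical, and the per-depth token/frequency features are elided.

    ```python
    # Walk the (undirected) dependency graph up to depth 3 from a token,
    # emitting the sequence of dependency types at each depth.

    from collections import deque

    def chain_features(deps, start, max_depth=3):
        # deps: {token: [(neighbour, dep_type), ...]} adjacency lists
        feats, seen = set(), {start}
        queue = deque([(start, (), 0)])
        while queue:
            node, path, depth = queue.popleft()
            if depth == max_depth:
                continue
            for neigh, dtype in deps.get(node, []):
                if neigh in seen:
                    continue
                seen.add(neigh)
                feats.add("chain=" + "-".join(path + (dtype,)))
                queue.append((neigh, path + (dtype,), depth + 1))
        return feats

    deps = {
        "masks": [("MAD-3", "nsubj"), ("signal", "dobj")],
        "signal": [("masks", "dobj"), ("localization", "nn")],
    }
    print(sorted(chain_features(deps, "masks")))
    # -> ['chain=dobj', 'chain=dobj-nn', 'chain=nsubj']
    ```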
  • 10. Two SVMs
    - "Somewhat" different feature sets
    - Combined weighted results
    "This design should be considered an artifact of the time-constrained, experiment-driven development of the system rather than a principled design."
  • 11. Precision/Recall trade-off
    - Undetected trigger --> undetected event
    - All triggers have events in the training data --> bias towards reporting an event for all detected triggers
    - Adjust P/R explicitly
      - multiply the negative class by β
      - find β experimentally
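    The β adjustment amounts to rescaling the negative class before taking the argmax. A sketch with made-up scores; in the system, β is applied to the SVM's negative-class score and tuned experimentally on the development set.

    ```python
    # Multiply the negative class score by beta before picking the label;
    # beta < 1 favours reporting a trigger (higher recall, lower precision).

    def adjust_prediction(class_scores, beta):
        scores = dict(class_scores)
        scores["neg"] *= beta
        return max(scores, key=scores.get)

    scores = {"neg": 1.0, "Binding": 0.8}
    print(adjust_prediction(scores, 1.0))  # -> neg
    print(adjust_prediction(scores, 0.7))  # -> Binding
    ```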
  • 12. Edge Detection
    - Multi-class SVM
    - All potential directed edges
      - event node to named entity
      - event node to event node (nested event)
      - labelled as theme, cause, or negative
    - Each edge is predicted independently
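    Candidate generation for this stage is simple to sketch: every directed edge from an event node to any other node is one classification instance. The node labels below are hypothetical.

    ```python
    # Every directed edge from an event (trigger) node to a named entity
    # or to another event node is classified independently as theme,
    # cause, or negative.

    from itertools import product

    def candidate_edges(event_nodes, entity_nodes):
        targets = event_nodes + entity_nodes
        return [(src, dst) for src, dst in product(event_nodes, targets)
                if src != dst]

    events = ["masks:Negative_regulation"]
    entities = ["MAD-3:Protein", "p65:Protein"]
    for edge in candidate_edges(events, entities):
        print(edge)  # each edge gets its own SVM decision
    ```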
  • 13. Feature Set – Central Concept
    The shortest undirected path of syntactic dependencies in the Stanford-scheme parse of the sentence.
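    A sketch of extracting that path with networkx; the toy Stanford-style parse fragment is hypothetical.

    ```python
    # Shortest undirected path of syntactic dependencies between the two
    # endpoints of a candidate edge.

    import networkx as nx

    g = nx.Graph()
    g.add_edge("masks", "MAD-3", dep="nsubj")
    g.add_edge("masks", "signal", dep="dobj")
    g.add_edge("signal", "p65", dep="prep_of")

    path = nx.shortest_path(g, "MAD-3", "p65")
    deps = [g[a][b]["dep"] for a, b in zip(path, path[1:])]
    print(path)  # ['MAD-3', 'masks', 'signal', 'p65']
    print(deps)  # ['nsubj', 'dobj', 'prep_of']
    ```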
  • 14. Feature Set
    - Token text, POS, entity/event class, dependency (subject)
    - N-grams: merging the attributes of 2-4
      - consecutive tokens
      - consecutive dependencies
      - each token and its two neighbouring dependencies
      - each dependency and its two neighbouring tokens
      - one bigram showing direction
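    The n-gram construction reduces to merging 2-4 consecutive attributes along the path into one feature string. A sketch over token texts; the same sliding window applies to dependency types, and the feature naming is made up.

    ```python
    # Merge the attributes of 2-4 consecutive path elements into single
    # feature strings.

    def path_ngrams(items, n_min=2, n_max=4):
        feats = []
        for n in range(n_min, n_max + 1):
            for i in range(len(items) - n + 1):
                feats.append("ngram=" + "_".join(items[i:i + n]))
        return feats

    print(path_ngrams(["MAD-3", "masks", "signal", "p65"]))
    ```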
  • 15. Other Features
    - Individual component features
    - Semantic node features
    - Frequency features
  • 16. Semantic Post-Processing
    - Duplicate nodes
      - same class and same trigger
      - combined trigger
    - Remove improper arguments
    - Remove directed cycles by removing the weakest link
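    The weakest-link rule is a small loop over detected cycles. A sketch with networkx and made-up confidence scores; the real system uses its own graph structures.

    ```python
    # While the predicted event graph has a directed cycle, drop the edge
    # with the lowest classifier confidence on that cycle.

    import networkx as nx

    g = nx.DiGraph()
    g.add_edge("e1", "e2", score=0.9)
    g.add_edge("e2", "e3", score=0.4)  # weakest link
    g.add_edge("e3", "e1", score=0.7)

    while True:
        try:
            cycle = nx.find_cycle(g)
        except nx.NetworkXNoCycle:
            break
        weakest = min(cycle, key=lambda e: g[e[0]][e[1]]["score"])
        g.remove_edge(*weakest)

    print(sorted(g.edges()))  # [('e1', 'e2'), ('e3', 'e1')]
    ```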
  • 17. Duplicating Event Nodes
    - Task restrictions
      - two causes,
      - must have a theme,
      - etc.
    - Several heuristics
      - e.g. the x-th first dependency in the shortest path from the event, for Binding
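    One illustrative duplication step (a loose stand-in, not the authors' heuristics): a trigger with several predicted themes is split into one event per (theme, cause) combination, so each event satisfies the task's argument restrictions. The Binding-specific dependency-path heuristic is not reproduced.

    ```python
    from itertools import product

    def duplicate_events(trigger, themes, causes):
        causes = causes or [None]  # cause is optional
        return [{"trigger": trigger, "theme": t, "cause": c}
                for t, c in product(themes, causes)]

    for event in duplicate_events("inhibits", ["p65", "p50"], ["MAD-3"]):
        print(event)
    ```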
  • 18. Results
  • 19. Compared to Us
  • 20. What Didn't Work / Wasn't Tried
    - CRF
    - HMM
    - Removing the strong independence assumption
    - Co-reference resolution (4.8%)
  • 21. End.
