Presentation material


  1. 1. A Bio Text Mining Workbench combined with Active Machine Learning Gary Geunbae Lee Postech 11/25 LBM2005
  2. 2. Contents <ul><li>Introduction </li></ul><ul><li>POSBIOTM/W Workbench </li></ul><ul><li>POSBIOTM/NER System </li></ul><ul><li>POSBIOTM/NER with Active Machine Learning </li></ul><ul><li>POSBIOTM/Event System </li></ul><ul><li>Current status (demo) </li></ul>
  3. 3. Introduction <ul><li>Exponentially growing biological publications </li></ul>
  4. 4. Introduction <ul><li>Two key issues in dealing with biological texts: </li></ul><ul><li>Biological named entity recognition. </li></ul><ul><li>Extracting the biological interactions (events) between biological entities. </li></ul><ul><ul><li>Important for biological pathways. </li></ul></ul>
  5. 5. Introduction <ul><li>Development workbench (common in NLP) </li></ul><ul><ul><li>Grammar development workbench </li></ul></ul><ul><ul><li>POS/Tree Tagging workbench </li></ul></ul><ul><li>Use of large corpora </li></ul><ul><ul><li>Machine learning methods are used in the NER task and the event extraction task. </li></ul></ul><ul><ul><li>An annotated corpus, sufficient in both quantity and quality, is essential for good results with machine-learning-based methods. </li></ul></ul><ul><ul><li>Lack of annotated corpora (notorious in the bio/medical fields) </li></ul></ul><ul><li>Need </li></ul><ul><ul><li>Tools that support collecting, managing, creating, annotating and exploiting rich biomedical text resources. </li></ul></ul><ul><ul><li>Tools that interact with the automatic system to grow the high-quality annotated corpus </li></ul></ul><ul><li>Bio-text mining workbench </li></ul>
  6. 6. Contents <ul><li>Introduction </li></ul><ul><li>POSBIOTM/W Workbench </li></ul><ul><li>POSBIOTM/NER System </li></ul><ul><li>POSBIOTM/NER with Active Machine Learning </li></ul><ul><li>POSBIOTM/Event System </li></ul><ul><li>Current status </li></ul>
  7. 7. POSBIOTM/W : A development Workbench <ul><li>Overall Design </li></ul>
  8. 8. POSBIOTM/W Workbench <ul><li>Goal </li></ul><ul><ul><li>Help users search, collect and manage publications. </li></ul></ul><ul><li>Quick Search Bar </li></ul><ul><ul><li>Provides quick access to PubMed. </li></ul></ul><ul><li>PubMed Search Assistant </li></ul><ul><ul><li>Users can select specific abstracts for named-entity tagging and event extraction </li></ul></ul><ul><li>Managing Tool </li></ul>
  9. 9. POSBIOTM/W Workbench <ul><li>Managing Tool </li></ul><ul><li>PubMed Search Assistant </li></ul>
  10. 10. POSBIOTM/W Workbench <ul><li>Named-entity recognition (NER) task </li></ul><ul><ul><li>Identification of the material names concerned. </li></ul></ul><ul><li>Goal: automatically and effectively annotate biomedical-related entities. </li></ul><ul><li>NER Tool is a Client Tool of POSBIOTM/NER System </li></ul><ul><ul><li>Currently, three NER models are provided: </li></ul></ul><ul><ul><li>the GENIA-NER model, the GENE-NER model and the GPCR-NER model </li></ul></ul><ul><li>Named-entity recognition with Active learning </li></ul><ul><ul><li>To minimize the human labeling effort </li></ul></ul><ul><li>NER Tool </li></ul>
  11. 11. POSBIOTM/W Workbench <ul><li>NER Tool </li></ul><ul><li>Named-entity recognition with Active learning </li></ul>
  12. 12. POSBIOTM/W Workbench <ul><li>Goal: To extract the events which consist of “interaction”, “effecter”, and “reactant” </li></ul><ul><li>Named-entity types: protein (P), gene (G), small molecule (SM), and cellular process (CP). </li></ul><ul><li>Interaction: biological interaction (BI) and a chemical interaction (CI). </li></ul><ul><li>Event Extraction Tool is a Client Tool of POSBIOTM/Event System </li></ul><ul><li>Event Extraction Tool </li></ul>
  13. 13. POSBIOTM/W Workbench <ul><li>Extraction Result in XML format </li></ul><ul><li>Event Extraction Tool </li></ul>
      <Result>
        <NER>
          ....
          <Sentence SNum="4"><protein>EDG-1</protein>, encoded by the <gene>endothelial_differentiation_gene-1</gene>, is a <protein>heterotrimeric_guanine_nucleotide_binding_protein-coupled_receptor</protein> (<protein>GPCR</protein>) for <small_molecule>sphingosine-1-phosphate</small_molecule> (<small_molecule>SPP</small_molecule>) that has been shown to stimulate <cellular_process>angiogenesis</cellular_process> and <cellular_process>cell_migration</cellular_process> in cultured endothelial cells.</Sentence>
          .....
        </NER>
        <Event_Extraction>
          <Event SNum="4">
            <Interaction>stimulate</Interaction>
            <Effecter>sphingosine-1-phosphate</Effecter>
            <Reactant>angiogenesis</Reactant>
          </Event>
          .....
        </Event_Extraction>
      </Result>
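A minimal sketch of how a client might read this XML result, using Python's standard `xml.etree.ElementTree`. The element and attribute names are taken from the slide; the abbreviated sample document and the function name `read_result` are assumptions made for illustration, not part of POSBIOTM.

```python
import xml.etree.ElementTree as ET

# Abbreviated, well-formed sample modelled on the slide's example (an assumption).
RESULT_XML = """
<Result>
  <NER>
    <Sentence SNum="4"><protein>EDG-1</protein>, encoded by the
      <gene>endothelial_differentiation_gene-1</gene>, is a receptor for
      <small_molecule>sphingosine-1-phosphate</small_molecule> that stimulates
      <cellular_process>angiogenesis</cellular_process>.</Sentence>
  </NER>
  <Event_Extraction>
    <Event SNum="4">
      <Interaction>stimulate</Interaction>
      <Effecter>sphingosine-1-phosphate</Effecter>
      <Reactant>angiogenesis</Reactant>
    </Event>
  </Event_Extraction>
</Result>
"""


def read_result(xml_text):
    """Return (entities, events) found in one result document."""
    root = ET.fromstring(xml_text.strip())
    # Each child element of <Sentence> is one tagged named entity.
    entities = [(int(sentence.get("SNum")), entity.tag, entity.text)
                for sentence in root.iter("Sentence")
                for entity in sentence]
    # Each <Event> carries exactly one interaction, effecter and reactant.
    events = [(int(event.get("SNum")),
               event.findtext("Interaction"),
               event.findtext("Effecter"),
               event.findtext("Reactant"))
              for event in root.iter("Event")]
    return entities, events


entities, events = read_result(RESULT_XML)
for sentence_number, interaction, effecter, reactant in events:
    print(sentence_number, interaction, effecter, reactant)
```

The shared `SNum` attribute is what lets an event be traced back to the sentence it was extracted from.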
  14. 14. POSBIOTM/W Workbench <ul><li>Extraction Result </li></ul><ul><li>Event Extraction Tool </li></ul>
  15. 15. POSBIOTM/W Workbench <ul><li>Goal </li></ul><ul><ul><li>The GUI-based Annotation Tool is designed for manipulating annotations manually. </li></ul></ul><ul><li>Named-entity editing </li></ul><ul><ul><li>Named entities are displayed in different colors, which can be changed </li></ul></ul><ul><ul><li>Add, remove or correct named-entity tags, change the boundaries of named entities, etc. </li></ul></ul><ul><li>Annotation Tool </li></ul>
  16. 16. POSBIOTM/W Workbench <ul><li>Event editing </li></ul><ul><ul><li>Extracted events are displayed in a table </li></ul></ul><ul><ul><li>Double-clicking an event looks up the original sentence from which it was extracted </li></ul></ul><ul><li>Upload function </li></ul><ul><ul><li>Users can upload the well-annotated data to the POSBIOTM system </li></ul></ul><ul><ul><li>Incremental build-up of a massive named-entity and event annotation corpus. </li></ul></ul><ul><li>Annotation Tool </li></ul>
  17. 17. POSBIOTM/W Workbench <ul><li>Annotation Tool </li></ul>
  18. 18. Contents <ul><li>Introduction </li></ul><ul><li>POSBIOTM/W Workbench </li></ul><ul><li>POSBIOTM/NER System </li></ul><ul><li>POSBIOTM/NER with Active Machine Learning </li></ul><ul><li>POSBIOTM/Event System </li></ul><ul><li>Current status </li></ul>
  19. 19. POSBIOTM/NER System <ul><li>Approach </li></ul><ul><ul><li>The named entity recognition problem is regarded as a classification problem, marking up each input token with named entity category labels. </li></ul></ul><ul><li>CRF </li></ul><ul><ul><li>Conditional random fields (CRFs) ([Lafferty 2001]) are a probabilistic framework for labeling and segmenting sequential data. (s: state(tag); o: input) </li></ul></ul><ul><ul><li>For example: </li></ul></ul><ul><li>Named Entity Recognition (NER) </li></ul>
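To make "marking up each input token with named entity category labels" concrete, here is a small sketch that turns entity spans into per-token labels. The B-/I-/O label scheme, the example sentence and the function name `bio_labels` are assumptions for illustration; the slides do not say which label encoding POSBIOTM/NER uses.

```python
def bio_labels(tokens, spans):
    """Convert entity spans into one label per token.

    spans: list of (start, end_exclusive, category) over token indices.
    Assumes the common B-/I-/O encoding, which the slides do not specify.
    """
    labels = ["O"] * len(tokens)
    for start, end, category in spans:
        labels[start] = f"B-{category}"       # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{category}"       # remaining tokens of the entity
    return labels


tokens = ["EDG-1", "is", "a", "G", "protein-coupled", "receptor", "."]
spans = [(0, 1, "protein"), (3, 6, "protein")]
labels = bio_labels(tokens, spans)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

A sequence model such as a CRF is then trained to predict the label column from features of the token column.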
  20. 20. POSBIOTM/NER System <ul><li>Feature Set </li></ul><ul><li>Named Entity Recognition (NER) </li></ul>
      | Feature | Description |
      |---|---|
      | Lexical word | Only in the case that the previous/current/next words are in the surface word dictionary. |
      | Word feature | Orthographical feature of the previous/current/next words. Upper case letters, numbers, non-alphabet letters. Greek words – alpha cells, beta hemolysis, tau interferon. |
      | Prefix/suffix | Prefixes/suffixes which are contained in the prefix/suffix dictionary. Biological prefix, suffix concept – ase, blast, cyt, phore, plast. |
      | Part-of-speech tag | POS tag of the previous/current/next words. The part of speech is the term used to describe how a particular word is used, e.g. noun, verb, etc. |
      | Base noun phrase tag | Base noun phrase tag of the previous/current/next words. |
  21. 21. POSBIOTM/NER System <ul><li>Three NER models </li></ul><ul><ul><li>GENIA model / GENE-NER model / GPCR-NER model </li></ul></ul><ul><li>GENIA model </li></ul><ul><ul><li>The named entity classes used in the evaluation: </li></ul></ul><ul><ul><li>DNA, RNA, protein, cell_line and cell_type </li></ul></ul><ul><ul><li>The training data consists of 2000 MEDLINE abstracts of the GENIA version 3 corpus. These abstracts were collected using the search terms “human”, “blood cell”, “transcription factor”. </li></ul></ul><ul><ul><li>The testing data will come from a super-domain of the training data (“blood cell”, “transcription factor”). </li></ul></ul><ul><li>NER Models </li></ul>
  22. 22. POSBIOTM/NER System <ul><li>GENE-NER model </li></ul><ul><ul><li>The GENE-NER module uses the BioCreative corpus. </li></ul></ul><ul><ul><li>The aim of the GENE-NER module is to identify which terms in biomedical research articles are gene and/or protein names. </li></ul></ul><ul><ul><li>The training corpus consists of 7.5k sentences, selected from MEDLINE according to their likelihood of containing gene names. </li></ul></ul><ul><li>GPCR-NER module (Postech) </li></ul><ul><ul><li>Aims at recognizing four target named entity categories: </li></ul></ul><ul><ul><li>protein, gene, small molecule and cellular process. </li></ul></ul><ul><ul><li>The training corpus consists of 50 full articles related to the GPCR (G-protein coupled receptor) signal transduction pathway. </li></ul></ul><ul><li>NER Models </li></ul>
  23. 23. POSBIOTM/NER System <ul><li>Evaluation for Three NER models </li></ul><ul><li>NER Models </li></ul>
      | Corpus | Precision | Recall | F-Measure |
      |---|---|---|---|
      | GENIA-NER | 0.6960 | 0.6929 | 0.6945 |
      | GPCR-NER | 0.6736 | 0.8135 | 0.7370 |
      | GENE-NER | 0.7550 | 0.8404 | 0.7982 |
  24. 24. Contents <ul><li>Introduction </li></ul><ul><li>POSBIOTM/W Workbench </li></ul><ul><li>POSBIOTM/NER System </li></ul><ul><li>POSBIOTM/NER with Active Machine Learning </li></ul><ul><li>POSBIOTM/Event System </li></ul><ul><li>Current status </li></ul>
  25. 25. POSBIOTM/NER with Active Learning <ul><li>NER with Machine Learning </li></ul><ul><ul><li>To enhance the NER performance through the idea of re-using the annotated data and re-training the NER module </li></ul></ul><ul><li>NER with Active Machine Learning </li></ul><ul><ul><li>To minimize the human labeling effort without degrading the performance </li></ul></ul><ul><ul><li>To select the most informative samples for training </li></ul></ul><ul><li>Active Learning in NER </li></ul>
  26. 26. POSBIOTM/NER with Active Learning <ul><li>Active Learning in NER Framework </li></ul>
  27. 27. POSBIOTM/NER with Active Learning <ul><li>Uncertainty-based Sample Selection </li></ul><ul><ul><li>Using an entropy-based measure to quantify the uncertainty that the current classifier holds (entropy or normalized entropy of the CRF conditional probability) </li></ul></ul><ul><ul><li>The most uncertain samples are selected for human annotation </li></ul></ul><ul><li>Active Learning Scoring Strategy </li></ul>
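The uncertainty score can be sketched as follows, assuming the classifier exposes probabilities for a handful of candidate labelings of each sentence (for example an N-best list from the CRF). The normalization shown, dividing by the maximum possible entropy log N, is one common choice and an assumption here; the slides do not give the exact formula. All function names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy of candidate-labeling probabilities (renormalized to sum to 1)."""
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def normalized_entropy(probs):
    """Entropy divided by its maximum, log N, so that scores lie in [0, 1]."""
    n = sum(1 for p in probs if p > 0)
    return entropy(probs) / math.log(n) if n > 1 else 0.0


def most_uncertain(candidates, k, score=entropy):
    """Return the k sentence ids whose labelings the model is least sure about.

    candidates: {sentence_id: [probability of each candidate labeling]}
    """
    ranked = sorted(candidates, key=lambda s: score(candidates[s]), reverse=True)
    return ranked[:k]


pool = {
    "s1": [0.90, 0.05, 0.05],   # model is confident
    "s2": [0.40, 0.30, 0.30],   # model is unsure
    "s3": [0.60, 0.30, 0.10],
}
print(most_uncertain(pool, 2))
```

Sentences returned by `most_uncertain` are the ones that would be passed to a human annotator.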
  28. 28. POSBIOTM/NER with Active Learning <ul><li>Diversity-based Sample Selection </li></ul><ul><ul><li>To catch the most representative sentences in each sampling. </li></ul></ul><ul><ul><li>The divergence measures of the two sentences are represented by the minimum similarity among the examples </li></ul></ul><ul><ul><li>The similarity score of two words </li></ul></ul><ul><ul><li>The similarity score of two sentences </li></ul></ul><ul><li>Active Learning Scoring Strategy </li></ul>( for syntactic path)
  29. 29. POSBIOTM/NER with Active Learning <ul><li>MMR(Maximal Marginal Relevance) method </li></ul><ul><ul><li>The two measures for uncertainty and diversity will be combined using the MMR method to give the sampling scores in our active learning strategy </li></ul></ul><ul><li>Active Learning Scoring Strategy </li></ul>
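A hedged sketch of MMR-style selection: each round picks the sentence that maximizes a weighted difference between its uncertainty and its similarity to sentences already chosen. The weight `lam`, the greedy loop and the use of Jaccard word overlap as the similarity measure are assumptions standing in for the slides' own measures, whose formulas are not reproduced in this transcript.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for the slides' sentence similarity."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def mmr_select(uncertainty, k, lam=0.8, similarity=jaccard):
    """Greedily pick k sentences that are uncertain yet dissimilar to earlier picks.

    uncertainty: {sentence: uncertainty score}
    lam: weight on uncertainty; (1 - lam) penalizes redundancy.
    """
    selected = []
    remaining = sorted(uncertainty)           # sorted so ties break deterministically
    while remaining and len(selected) < k:
        def mmr_score(sentence):
            redundancy = max((similarity(sentence, t) for t in selected), default=0.0)
            return lam * uncertainty[sentence] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


scores = {"A binds B": 0.90, "A binds B strongly": 0.85, "C inhibits D": 0.60}
print(mmr_select(scores, 2, lam=0.5))   # diversity-aware choice
print(mmr_select(scores, 2, lam=1.0))   # uncertainty only
```

With `lam=1.0` the method reduces to pure uncertainty sampling, which makes the effect of the diversity term easy to compare.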
  30. 30. POSBIOTM/NER with Active Learning <ul><li>Training Data </li></ul><ul><ul><li>2,000 MEDLINE abstracts from the GENIA corpus </li></ul></ul><ul><ul><li>5 named entity classes </li></ul></ul><ul><ul><ul><li>DNA, RNA, protein, cell line, cell type </li></ul></ul></ul><ul><li>Test Data </li></ul><ul><ul><li>404 abstracts </li></ul></ul><ul><ul><li>Half of them are from the same domain as the training data and the other half are from the super-domain of ‘blood cell’ and ‘transcription factor’ </li></ul></ul><ul><li>Experiment and Discussion </li></ul>
  31. 31. POSBIOTM/NER with Active Learning <ul><li>Pool-based sample selection </li></ul><ul><ul><li>100 abstracts were used to train initial NER module </li></ul></ul><ul><ul><li>Each time, we chose k examples (sentences) from the given pool to train the new NER module </li></ul></ul><ul><ul><li>The number k varied from 1,000 to 17,000 with step size 1,000 </li></ul></ul><ul><li>Active learning methods for test </li></ul><ul><ul><li>Random selection </li></ul></ul><ul><ul><li>Entropy based uncertainty selection </li></ul></ul><ul><ul><li>Entropy combined with Diversity </li></ul></ul><ul><ul><li>Normalized Entropy combined with Diversity </li></ul></ul><ul><li>Experiment and Discussion </li></ul>
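The pool-based setup can be sketched as a loop that repeatedly moves the highest-scoring sentences from the unlabeled pool into the training set and retrains. This is an incremental variant written under assumptions: `train` and `score` are hypothetical callables supplied by the caller, and the slides do not state whether the k examples were chosen in one pass or over several rounds.

```python
def pool_based_rounds(seed, pool, train, score, batch_size, rounds):
    """Generic pool-based active learning loop.

    seed:  initially labeled examples
    pool:  unlabeled candidate examples
    train: callable(labeled_examples) -> model            (hypothetical)
    score: callable(model, example) -> informativeness    (hypothetical)
    """
    labeled = list(seed)
    pool = list(pool)
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Rank the pool with the current model, most informative first.
        pool.sort(key=lambda example: score(model, example), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend(batch)      # in practice a human annotates the batch here
        model = train(labeled)
    return model, labeled


# Toy stand-ins: the "model" is just the training-set size,
# and an example's informativeness is its own value.
model, labeled = pool_based_rounds(
    seed=[0], pool=[3, 1, 2, 5, 4],
    train=len, score=lambda model, example: example,
    batch_size=2, rounds=2,
)
print(model, labeled)
```

Swapping the `score` callable is how random selection, entropy and the combined strategies would be compared on the same pool.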
  32. 32. POSBIOTM/NER with Active Learning <ul><li>Experiment and Discussion </li></ul>
  33. 33. POSBIOTM/NER with Active Learning <ul><li>All three active learning strategies outperform random selection </li></ul><ul><ul><li>The combined strategy needs 24.64% fewer training examples than random selection </li></ul></ul><ul><ul><li>The normalized combined strategy needs 35.43% fewer training examples than random selection </li></ul></ul><ul><li>Diversity improves the classifier’s performance when a large number of samples are selected </li></ul><ul><ul><li>Up to 4,000 sentences, the entropy strategy and the combined strategy perform similarly </li></ul></ul><ul><ul><li>Beyond the 11,000-sentence point, the combined strategy surpasses the entropy strategy </li></ul></ul><ul><li>Experiment and Discussion </li></ul>
  34. 34. Contents <ul><li>Introduction </li></ul><ul><li>POSBIOTM/W Workbench </li></ul><ul><li>POSBIOTM/NER System </li></ul><ul><li>POSBIOTM/NER with Active Machine Learning </li></ul><ul><li>POSBIOTM/Event System </li></ul><ul><li>Current status </li></ul>
  35. 35. POSBIOTM/Event System <ul><li>System Architecture </li></ul>
  36. 36. POSBIOTM/Event System <ul><ul><li>Template Element </li></ul></ul><ul><ul><ul><li>Entities - participants of an event </li></ul></ul></ul><ul><ul><ul><ul><li>protein (P), gene (G), small molecule (SM), cellular process (CP) </li></ul></ul></ul></ul><ul><ul><ul><li>Interaction - relationship between entities </li></ul></ul></ul><ul><ul><ul><ul><li>biological interaction (BI) – Functional interaction </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>About how/whether one component affects the other's status biologically </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><li>chemical interaction (CI) – Molecular interaction </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>About the interaction among entities at the molecular structural level </li></ul></ul></ul></ul></ul><ul><ul><li>Event </li></ul></ul><ul><ul><ul><li>One Interaction (I) </li></ul></ul></ul><ul><ul><ul><ul><li>Connecting the effecter and reactant </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Interaction keywords (BI, CI) </li></ul></ul></ul></ul><ul><ul><ul><li>One Effecter (E) </li></ul></ul></ul><ul><ul><ul><ul><li>Provoking an event </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Template element (P, G, SM, CP) or nested event </li></ul></ul></ul></ul><ul><ul><ul><li>One Reactant (R) </li></ul></ul></ul><ul><ul><ul><ul><li>Responding to an effecter </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Template element (P, G, SM, CP) or nested event </li></ul></ul></ul></ul><ul><li>Target Slot Definition </li></ul>
  37. 37. POSBIOTM/Event System <ul><li>Target Slot Definition </li></ul><ul><li>Example </li></ul><ul><li>Template Element </li></ul><ul><ul><li>Entities : PDGF (P), SPP (SM), Cell movement (CP) </li></ul></ul><ul><ul><li>Interaction keywords : cross-talk (BI), require (BI) </li></ul></ul><ul><li>Event </li></ul><ul><ul><li>cross-talk (I) : PDGF (E) : SPP (R) </li></ul></ul><ul><ul><li>require (I) : cross-talk (E) : cell movement (R) </li></ul></ul>The cross-talk between PDGF and SPP is required for these embryonic cell movements .
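The slot definition, including nested events, maps naturally onto a small recursive data structure. The sketch below encodes the slide's example; the class and field names are illustrative assumptions, not POSBIOTM's internal representation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str                      # "P", "G", "SM" or "CP"


@dataclass(frozen=True)
class Event:
    interaction: str               # interaction keyword
    interaction_type: str          # "BI" or "CI"
    effecter: Union[Entity, "Event"]   # a template element or a nested event
    reactant: Union[Entity, "Event"]   # a template element or a nested event


def nesting_depth(event):
    """1 for a flat event, 2 if one of its arguments is itself an event, and so on."""
    inner = [nesting_depth(arg) for arg in (event.effecter, event.reactant)
             if isinstance(arg, Event)]
    return 1 + max(inner, default=0)


# "The cross-talk between PDGF and SPP is required for these embryonic cell movements."
cross_talk = Event("cross-talk", "BI", Entity("PDGF", "P"), Entity("SPP", "SM"))
require = Event("require", "BI", cross_talk, Entity("cell movement", "CP"))
print(nesting_depth(cross_talk), nesting_depth(require))
```

Here the `require` event takes the whole `cross-talk` event as its effecter, which is the nested case the slot definition allows.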
  38. 38. POSBIOTM/Event System <ul><li>Sentence boundary detection </li></ul><ul><li>Annotating Named Entity (NER) </li></ul><ul><ul><li>Protein </li></ul></ul><ul><ul><li>Small molecule </li></ul></ul><ul><ul><li>Gene </li></ul></ul><ul><ul><li>Cellular process </li></ul></ul><ul><li>Compound/Complex Sentence Splitter </li></ul><ul><ul><li>To simplify the complicated full texts </li></ul></ul><ul><li>Pre-Processor </li></ul>
  39. 39. POSBIOTM/Event System <ul><li>Compound/Complex Sentence Splitter </li></ul><ul><ul><li>Simple splitting rules </li></ul></ul><ul><ul><ul><li>[S] NP1 VP1 NP2 [SBAR] that|which VP2 [/SBAR] [/S] </li></ul></ul></ul><ul><ul><ul><ul><li>==> NP1 VP1 NP2 + NP2 VP2 </li></ul></ul></ul></ul><ul><ul><li>Example </li></ul></ul><ul><ul><ul><li>“The best studied of these is EDG-1, which is implicated in cell migration and angiogenesis.” </li></ul></ul></ul><ul><ul><ul><ul><li>==> 1. “The best studied of these is EDG-1.” </li></ul></ul></ul></ul><ul><ul><ul><ul><li>2. “EDG-1 is implicated in cell migration and angiogenesis.” </li></ul></ul></ul></ul><ul><li>Pre-Processor </li></ul>
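The splitting rule can be sketched over pre-chunked input. The sketch assumes an upstream parser has already produced the NP/VP/NP/SBAR chunks; the `(label, text)` chunk representation and the function name are assumptions for illustration.

```python
def split_relative_clause(chunks):
    """Apply the rule  NP1 VP1 NP2 [SBAR that|which VP2]  ==>  NP1 VP1 NP2 + NP2 VP2.

    chunks: [(label, text), ...] as produced by some upstream chunker (assumed).
    Returns the two simple sentences, or None if the pattern does not match.
    """
    if [label for label, _ in chunks] != ["NP", "VP", "NP", "SBAR"]:
        return None
    np1, vp1, np2, sbar = (text for _, text in chunks)
    words = sbar.split()
    if len(words) < 2 or words[0].lower() not in ("that", "which"):
        return None
    vp2 = " ".join(words[1:])          # drop the relative pronoun
    return [f"{np1} {vp1} {np2}.", f"{np2} {vp2}."]


chunks = [
    ("NP", "The best studied of these"),
    ("VP", "is"),
    ("NP", "EDG-1"),
    ("SBAR", "which is implicated in cell migration and angiogenesis"),
]
for sentence in split_relative_clause(chunks):
    print(sentence)
```

The second output sentence reuses NP2 as its subject, which is what lets later rules see "EDG-1" next to its predicate.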
  40. 40. POSBIOTM/Event System <ul><li>Two-level Event Rule Learner </li></ul><ul><li>Biological Event Extraction </li></ul>
  41. 41. POSBIOTM/Event System <ul><li>Event Rule Learner </li></ul><ul><ul><li>Adapts a supervised machine learning algorithm: WHISK </li></ul></ul><ul><ul><ul><li>Learns rules in the form of context-based regular expressions </li></ul></ul></ul><ul><ul><ul><li>Induces the rules in a top-down manner </li></ul></ul></ul><ul><ul><ul><ul><li>Ex) “{NP} .*? (<CP>)[E] {/NP} {VP} (<BI>)[I] {/VP} {NP} both (<P>)[R] and .*? {/NP}” </li></ul></ul></ul></ul><ul><ul><li>Limitations of WHISK </li></ul></ul><ul><ul><ul><li>The longer the distance between event components, the more difficult it is to extract the correct event </li></ul></ul></ul><ul><ul><ul><ul><li>WHISK considers all lexical words between event components </li></ul></ul></ul></ul><ul><ul><ul><li>Cannot handle nested biological events </li></ul></ul></ul><ul><ul><li>Propose a two-level rule learning method to handle the limitations of the flat rule learning method </li></ul></ul><ul><li>Biological Event Extraction </li></ul>
  42. 42. POSBIOTM/Event System <ul><li>Two-level Event Rule Learner </li></ul><ul><li>Biological Event Extraction </li></ul>
      {NP} <BI>cross-talk</BI> between <P>PDGF</P> and <SM>SPP</SM> {/NP} {VP} is <BI>required</BI> {/VP} for {NP} these embryonic <CP>cell_movements</CP> {/NP}
      <TAGS> B {interaction cross-talk} {effecter PDGF} {reactant SPP}
      <TAGS> B {interaction require} {effecter cross-talk} {reactant cell movement}
      1. Marking long NP boundary
      2. Learn the short-span rule corresponding to the NP: “<BI>cross-talk</BI> between <P>PDGF</P> and <SM>SPP</SM>” ==> “{NP} (<BI>)[I] between (<P>)[E] and (<SM>)[R] {/NP}”
      3. Re-annotate the short-span interaction as one noun with regular expression format
      4. Learn the long-span rule with the re-annotated sentence
      {NP} <E>cross-talk_between_PDGF_and_SPP</E> {/NP} {VP} is <BI>required</BI> {/VP} for {NP} these embryonic <CP>cell_movements</CP> {/NP}
      <TAGS> B {interaction require} {effecter cross-talk} {reactant cell movement}
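The two-level idea can be illustrated by applying two hand-written patterns in sequence: a short-span pattern inside the noun phrase, a re-annotation of that phrase as a single effecter, then a long-span pattern over the simplified sentence. The regular expressions below are hand-written stand-ins for rules that WHISK would learn, and the function name is hypothetical; only the example sentence and the rule shapes come from the slide.

```python
import re

# Short-span rule: {NP} (<BI>)[I] between (<P>)[E] and (<SM>)[R] {/NP}
SHORT_RULE = re.compile(
    r"\{NP\} <BI>(?P<I>[^<]+)</BI> between <P>(?P<E>[^<]+)</P>"
    r" and <SM>(?P<R>[^<]+)</SM> \{/NP\}"
)
# Long-span rule over the re-annotated sentence (hand-written for this example).
LONG_RULE = re.compile(
    r"\{NP\} <E>(?P<E>[^<]+)</E> \{/NP\} \{VP\} .*?<BI>(?P<I>[^<]+)</BI> \{/VP\}"
    r" .*?<CP>(?P<R>[^<]+)</CP>"
)


def extract_two_level(sentence):
    """Return [(interaction, effecter, reactant), ...], inner event first."""
    events = []
    short = SHORT_RULE.search(sentence)
    if short:
        events.append((short["I"], short["E"], short["R"]))
        # Re-annotate the whole noun phrase as one effecter token.
        merged = f"{short['I']}_between_{short['E']}_and_{short['R']}"
        sentence = (sentence[:short.start()]
                    + "{NP} <E>" + merged + "</E> {/NP}"
                    + sentence[short.end():])
    long_match = LONG_RULE.search(sentence)
    if long_match:
        events.append((long_match["I"], long_match["E"], long_match["R"]))
    return events


tagged = ("{NP} <BI>cross-talk</BI> between <P>PDGF</P> and <SM>SPP</SM> {/NP} "
          "{VP} is <BI>required</BI> {/VP} for "
          "{NP} these embryonic <CP>cell_movements</CP> {/NP}")
for event in extract_two_level(tagged):
    print(event)
```

The sketch returns surface strings such as "required"; the slide's tags show normalized forms ("require", "cross-talk", "cell movement"), and that normalization step is not modelled here.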
  43. 43. POSBIOTM/Event System <ul><li>Event Extractor </li></ul><ul><ul><li>To extract the events with the automatically generated rules </li></ul></ul><ul><ul><ul><li>by using regular expression pattern matching </li></ul></ul></ul><ul><ul><li>To handle aliases and noun conjunctions </li></ul></ul><ul><ul><ul><li>Aliases and noun conjunctions have general patterns like ‘sphingosine-1-phosphate(SPP)’ or ‘FP, IP, and TP receptors’ </li></ul></ul></ul><ul><ul><ul><ul><li>Handle them with simple rules like ‘A(B)’ or ‘A, B, C, and D’ </li></ul></ul></ul></ul><ul><ul><li>To remove sentences that include negative words </li></ul></ul><ul><ul><ul><li>‘not’, ‘never’, ‘fail’, etc. </li></ul></ul></ul><ul><li>Biological Event Extraction </li></ul>
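The three simple rules on this slide (aliases of the form ‘A(B)’, noun conjunctions of the form ‘A, B, C, and D’, and negation filtering) could look roughly like this. The exact patterns and the negative-word list are assumptions; the slide only gives the examples.

```python
import re

# 'A(B)': a name immediately followed by its alias in parentheses.
ALIAS = re.compile(r"(?P<full>[\w-]+)\s*\((?P<alias>[\w-]+)\)")
# Negative cue words from the slide ('not', 'never', 'fail', etc.).
NEGATIVE = re.compile(r"\b(?:not|never|fail\w*)\b", re.IGNORECASE)


def find_aliases(text):
    """Return {alias: full name} for every 'A(B)' pattern in the text."""
    return {m["alias"]: m["full"] for m in ALIAS.finditer(text)}


def expand_conjunction(phrase):
    """'FP, IP, and TP receptors' -> one phrase per conjunct, sharing the head noun."""
    body, _, head = phrase.rpartition(" ")
    parts = [p for p in re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", body) if p]
    return [f"{part} {head}" for part in parts]


def has_negation(sentence):
    """True if the sentence contains a negative cue and should be skipped."""
    return NEGATIVE.search(sentence) is not None


print(find_aliases("sphingosine-1-phosphate(SPP) stimulates angiogenesis"))
print(expand_conjunction("FP, IP, and TP receptors"))
print(has_negation("EDG-1 does not stimulate angiogenesis"))
```

Dropping every sentence with a negative cue is a blunt filter: it trades some recall for not extracting negated interactions as if they were asserted.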
  44. 44. POSBIOTM/Event System <ul><li>Event Component Verifier </li></ul>
  45. 45. POSBIOTM/Event System <ul><li>To remove the incorrectly extracted events </li></ul><ul><li>Classify template elements (P, G, SM, CP, BI, CI) into 4 classes </li></ul><ul><ul><li>I (interaction), E (effecter), R (reactant), N (none) </li></ul></ul><ul><ul><ul><li>I, E, R : event’s components </li></ul></ul></ul><ul><ul><ul><li>N : a template element, but not an event component </li></ul></ul></ul><ul><li>Use a Maximum Entropy Classifier </li></ul><ul><ul><li>Features </li></ul></ul><ul><ul><ul><li>POS tag, phrase chunks, the type of template element of neighboring words and semantic information </li></ul></ul></ul><ul><li>Event Component Verifier </li></ul>
  46. 46. POSBIOTM/Event System <ul><li>Event Component Verifier </li></ul>
  47. 47. POSBIOTM/Event System <ul><li>Example </li></ul><ul><li>Event Component Verifier </li></ul>
      Extracted Biological Events
        Ev1: Requires (I) sphingosine_kinase (E) cell_migration (R)
        Ev2: Requires (I) EDG-1 (E) cell_migration (R)
        Ev3: Requires (I) EDG-1 (E) PDGF (R)
      Event Component Verifier Results
        I : Requires
        E : EDG-1, sphingosine_kinase, PDGF
        R : cell_migration
      Verified Biological Extracted Events
        Ev1: Requires (I) sphingosine_kinase (E) cell_migration (R)
        Ev2: Requires (I) EDG-1 (E) cell_migration (R)
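The filtering step in this example can be sketched as a set-membership check: an extracted event survives only if the verifier assigned each of its components the role that the event gives it. The Maximum Entropy classifier itself is not modelled; its output is hard-coded from the slide, and the dictionary layout and function name are assumptions.

```python
def verify_events(events, roles):
    """Keep events whose I, E and R fillers were classified into those same roles.

    events: list of {"I": ..., "E": ..., "R": ...}
    roles:  {"I": set, "E": set, "R": set} as produced by the verifier
    """
    return [event for event in events
            if all(event[slot] in roles[slot] for slot in ("I", "E", "R"))]


extracted = [
    {"I": "Requires", "E": "sphingosine_kinase", "R": "cell_migration"},  # Ev1
    {"I": "Requires", "E": "EDG-1", "R": "cell_migration"},               # Ev2
    {"I": "Requires", "E": "EDG-1", "R": "PDGF"},                         # Ev3
]
verifier_roles = {
    "I": {"Requires"},
    "E": {"EDG-1", "sphingosine_kinase", "PDGF"},
    "R": {"cell_migration"},
}
verified = verify_events(extracted, verifier_roles)
for event in verified:
    print(event)
```

Ev3 is dropped because PDGF was classified as an effecter, not a reactant, which reproduces the verified list on the slide.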
  48. 48. POSBIOTM/Event System <ul><ul><li>500 Medline abstracts including 2,314 biological events & 10-fold cross validation </li></ul></ul><ul><ul><ul><li>Flat rule learner vs. two-level rule learner </li></ul></ul></ul><ul><ul><ul><li>Before verification vs. after verification </li></ul></ul></ul><ul><ul><li>Performance comparison </li></ul></ul><ul><ul><ul><ul><li>Learning Information Extractors for Proteins and their Interactions (2004) - Razvan Bunescu, et al. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>1000 abstracts & 10-fold cross validation </li></ul></ul></ul></ul><ul><li>Experiment and Discussion </li></ul>
      | System | Precision(%) | Recall(%) | F-measure |
      |---|---|---|---|
      | Flat rule learner, before verification | 38.3 | 58.0 | 46.1 |
      | Flat rule learner, after verification | 54.7 | 49.2 | 51.8 |
      | Two-level rule learner, before verification | 38.2 | 68.0 | 48.9 |
      | Two-level rule learner, after verification | 53.1 | 56.1 | 54.6 |
      | Comparison system | 39 | 63 | 48.2 |
  49. 49. POSBIOTM/Event System <ul><ul><li>Trade-off between precision and recall </li></ul></ul><ul><ul><ul><li>Before verification : big gap between precision and recall </li></ul></ul></ul><ul><ul><ul><li>After verification : small gap between precision and recall </li></ul></ul></ul><ul><ul><ul><ul><li>Threshold : cut the rules according to a measure of how many of the events extracted by a rule are correct </li></ul></ul></ul></ul><ul><li>Experiment and Discussion </li></ul>
  50. 50. POSBIOTM/Event System <ul><ul><li>Constant good performance regardless of the threshold of rule learner </li></ul></ul><ul><li>Experiment and Discussion </li></ul>
  51. 51. Other Corpora for Bio-Relation Extraction <ul><li>BC-PPI </li></ul><ul><ul><li>From BioCreative Corpus for NER </li></ul></ul><ul><ul><li>Protein/Gene interactions </li></ul></ul><ul><ul><li>255 interactions in 1000 sentences </li></ul></ul><ul><li>IEPA </li></ul><ul><ul><li>Protein/Protein interactions </li></ul></ul><ul><ul><li>410 interactions in 498 sentences </li></ul></ul><ul><li>LLL05 </li></ul><ul><ul><li>Protein/Gene interactions </li></ul></ul><ul><ul><li>271 interactions in 80 sentences </li></ul></ul><ul><li>BioText </li></ul><ul><ul><li>Disease/Treatment relations </li></ul></ul>
  52. 52. Contents <ul><li>Introduction </li></ul><ul><li>POSBIOTM/W Workbench </li></ul><ul><li>POSBIOTM/NER System </li></ul><ul><li>POSBIOTM/NER with Active Machine Learning </li></ul><ul><li>POSBIOTM/Event System </li></ul><ul><li>Current status </li></ul>
  53. 53. Current Status & Future Work <ul><li>Re-implemented in Java (platform independent) </li></ul><ul><li>To be integrated with J-Designer in the SBW consortium </li></ul><ul><li>Integrated with the active learning method to automatically suggest samples for human annotation </li></ul><ul><li>Used in national large-scale BIT fusion projects: searching for useful peptides (usable as ligands for drugs) </li></ul><ul><li>Getting more feedback from biologists </li></ul><ul><li>System getting smarter with more usage: workbench + active learning </li></ul>Workbench Demo