Text Mining Using JBoss Rules

Presentation on using rule-based text mining to extract protein interactions from the PubMed database.

  1. Text Mining Using JBoss Rules with a BioMedical Example. Mark Maslyn, Consultant, [email_address]. To be presented 2/2/2010 at the Denver Open Source User Group.
  2. Acknowledgements
       • Text Mining Group at the CU Center for Computational Pharmacology
       • Gene name list from Cytoscape (2007)
       • 23,941 genes
       • Action verbs from Blaschke, et al. (1999)
       • Sample rule from Bali (2009)
       • Diagram from Bolouri (2008)
  3. What Are Proteins? Chains of amino acids that fold into unique shapes that determine what other proteins will interact with them. (Diagram from WikiMedia Commons)
  4. Two Proteins Binding Together (Diagram from WikiMedia Commons)
  5. Interacting Proteins Form New Molecules. Diagram: Enzyme + Substrate → Enzyme + Product
  6. Protein Interactions Form Networks. Diagram levels: Start, 1st Level, 2nd Level (from Bolouri (2008), used by permission)
  7. Chemical Feedback Loop to Keep Glucose Concentration Constant. Diagram labels: Glucose (sugar); Too Little; Too Much; Glycogen (stored glucose); Prot > Prot > Prot; Prot < Prot < Prot
  8. Finding Protein / Protein Interactions Is the Holy Grail of Pharmacology. They can lead to new treatments. (Image from WikiMedia Commons)
  9. Where Do I Get the Data?
  10. The Problem Is: There's Too Much Data. 2,000 new references every day.
  11. The Solution:
       • Text mining PubMed to automatically extract information
  12. Two Standard Approaches to Text Mining
       • ABSTRACTIVE: Statistical methods, including co-occurrence modeling
       • EXTRACTIVE: Rule-based, using rule engines such as JBoss Rules
  13. JBoss Rules!
       • The Drools open source Java project became part of JBoss with version 4.x. The current version is 5.x.
       • Rules use Java-like syntax
       • Added capabilities not commonly found in most rule engines
  14. Rule Syntax (from Bali (2009))
        package com.rules;
        import;
        rule "low balance"
          when
            Account( balance < 100 )
          then
            System.out.println("Balance is less than $100");
        end
  15. Example Production Rule (BNF Format) with Expected Order
        S → p1 a p2
      Where:
       • p1 and p2 = different protein names (e.g. p53, BRCA1, etc.)
       • a = action verb (e.g. regulate, interact, modulate, bind, etc.)
  16. Word Mapping and Filtering
       • Case changes: everything goes to lower case
       • Handle variations of action verbs (e.g. activates, activated, activation)
       • Removal of "stop" words (e.g. the, this, is, etc.)
       • Process a single sentence at a time
  17. Text Mining Flow Chart. Steps: Retrieve, Parse, Filter and Transform Keywords, Rules to Evaluate, Output
  18. Cytoscape One Level Network Diagram. Statistics: 200 references, 7 unique links (one-level tree)
  19. Cytoscape Two Level Network Diagram. Statistics: 1600 references, 25 unique links (two-level tree)
  20. Further Information. Mark Maslyn: [email_address]
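
The word mapping steps on slide 16 and the "p1 a p2" ordering on slide 15 can be sketched in a few lines. This is a minimal illustration in plain Java rather than JBoss Rules, so it runs without the rule engine. The class name, the three-protein list, the stop-word list, and the verb table are hypothetical stand-ins, not the lists used in the talk. Matching on adjacent keywords is also an assumption about how the expected order might be enforced, and splitting on non-alphanumeric characters would break hyphenated gene names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class InteractionSketch {
    // Hypothetical stand-ins; the talk used a 23,941-gene list from Cytoscape.
    static final Set<String> PROTEINS = Set.of("p53", "brca1", "mdm2");
    static final Set<String> STOP_WORDS = Set.of("the", "this", "is", "a", "an", "of", "by");

    // Maps each verb variant to one canonical action verb (illustrative subset).
    static final Map<String, String> VERB_FORMS = Map.of(
            "activate", "activate", "activates", "activate",
            "activated", "activate", "activation", "activate",
            "bind", "bind", "binds", "bind",
            "regulate", "regulate", "regulates", "regulate");

    // Lower-cases one sentence, drops stop words, and canonicalizes action verbs.
    static List<String> normalize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String word : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            tokens.add(VERB_FORMS.getOrDefault(word, word));
        }
        return tokens;
    }

    // Keeps only keywords (proteins and action verbs), then reports every
    // adjacent keyword triple of the form: protein, verb, different protein.
    static List<String> findInteractions(List<String> tokens) {
        List<String> keywords = new ArrayList<>();
        for (String token : tokens) {
            if (PROTEINS.contains(token) || VERB_FORMS.containsValue(token)) {
                keywords.add(token);
            }
        }
        List<String> hits = new ArrayList<>();
        for (int i = 0; i + 2 < keywords.size(); i++) {
            String p1 = keywords.get(i);
            String a = keywords.get(i + 1);
            String p2 = keywords.get(i + 2);
            if (PROTEINS.contains(p1) && VERB_FORMS.containsValue(a)
                    && PROTEINS.contains(p2) && !p1.equals(p2)) {
                hits.add(p1 + " " + a + " " + p2);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tokens = normalize("The p53 protein activates MDM2.");
        System.out.println(findInteractions(tokens)); // prints [p53 activate mdm2]
    }
}
```

Because the sketch checks only keyword order, a passive sentence such as "MDM2 is activated by p53" would be reported with the two proteins reversed; handling voice would need additional rules.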