Demo Presentation Wageningen Text Mining Workshop 2007

Invited talk for a workshop for Dutch Genomics and Text Mining, Wageningen, 2007. (Scheduled just before drinks, it shows ;-) )

Published in: Education, Technology
  • We will demonstrate an e-science approach to mining knowledge from biomedical literature through the application of the ‘Adaptive Information Disclosure Application’ toolbox that we develop in the context of the Dutch Virtual Laboratory e-Science project.

    1. with AIDA: Marco Roos, Scott Marshall, Sophia Katrenko, Edgar Meij, Willem van Hage, Pieter Adriaans. AIDA demonstration, Wageningen, 23/11/2007. Adaptation of talk for Taverna/OMII-UK workshop, Hinxton, October 2007
    2. About e-Science: An e-science approach to mining biomedical literature
    3. and beer… And how we relate it to beer…
    4. Virtual Laboratory e-Science project
    5. Wet laboratory analogy: Data; Data handling & Data integration; Metadata; Data analysis; Data storage; Expert knowledge
    6. General framework of AID for VL-e: Middleware for sharing resources; Knowledge-based resource management
    7. General framework of AID for VL-e: Middleware for sharing resources; Model based resource management
    8. Theme: An e-Science approach to mining biomedical literature
    9. and beer… And how we relate it to beer…
    10. 10/06/09 BioAID: Which diseases may be associated with my protein of interest?
    11. Biomedical knowledge repository: PubMed statistics (http://www.ncbi.nlm.nih.gov/entrez): >17 million citations; >400,000 added/year; ~70,000 searches/month … Does not compute. Does not fit.
    12. Bioinformatics: A bioinformatician
    13. Bioinformatics: A typical bioinformatician
    14. Bioinformatics: A biologist behind a computer who (just) learned Perl
    15.
        /*
         * determines ridges in htm expression table
         */
        #include <stdio.h>
        #include <stdlib.h>
        #include "ridge.h"

        /* runs the per-chromosome query, sorted on genstart */
        /* note: htmtable is passed by value, so the caller does not see the result */
        int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult *htmtable)
        {
            char querystring[256];

            /* the original sprintf call omitted the destination buffer */
            snprintf(querystring, sizeof querystring,
                     "SELECT * FROM %s WHERE chrom = %s ORDER BY genstart",
                     htmtablename, chromname);
            htmtable = PQexec(conn, querystring);
            return(validquery(htmtable, querystring));
        }

        int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
        /* determines if mincount genes in a row are (part of) a ridge */
        /* pre: htmtable is valid and sorted on genStart (ascending) */
        /* post: */
        {
            if (mincount <= 0) return TRUE;
            if (row >= PQntuples(htmtable)) return FALSE;
            /* PQgetvalue returns a string: convert it, and read the current row */
            if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold)
            {
                return FALSE;
            }
            return(is_ridge(htmtable, row + 1, exprthreshold, mincount - 1));
        }

        int main()
        {
            PGconn *conn;           /* holds database connection */
            char querystring[256];  /* query string */
            PGresult *result;

            conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
            if (PQstatus(conn) == CONNECTION_BAD)
            {
                fprintf(stderr, "connection to database failed.\n");
                fprintf(stderr, "%s", PQerrorMessage(conn));
                exit(1);
            }
            else
                printf("Connection ok\n");

            sprintf(querystring, "SELECT * FROM chromosomes");
            printf("%s\n", querystring);
            result = PQexec(conn, querystring);
            if (validquery(result, querystring))
            {
                printresults(result);
            }
            else
            {
                PQclear(result);
                PQfinish(conn);
                return FALSE;
            }
            PQclear(result);
            PQfinish(conn);
            return TRUE;
        }

        /* prints the first column of at most 10 rows */
        int printresults(PGresult *tuples)
        {
            int i;

            for (i = 0; i < PQntuples(tuples) && i < 10; i++)
            {
                printf("%d, ", i);
                printf("%s\n", PQgetvalue(tuples, i, 0));
            }
            return TRUE;
        }

        /* reports whether the query returned tuples */
        int validquery(PGresult *result, char *querystring)
        {
            printf("in validquery\n");
            if (PQresultStatus(result) != PGRES_TUPLES_OK)
            {
                printf("Query %s failed.\n", querystring);
                fprintf(stderr, "Query %s failed.\n", querystring);
                return FALSE;
            }
            return TRUE;
        }
    16. Theme: No, that is not an e-Science approach to mining biomedical literature
    17. Not e-science: So 2000 (quoting Lennart Post)
    18. Not e-science: So 1980
    19. Theme: An e-Science approach to mining biomedical literature
    20. An e-science approach
        • e-Science
        • Collaboration
        • Combining expertise
        • Sharing
        • Technology
    21. e-Scientists: Edgar Meij, information retrieval expert
    22. e-Scientists: Sophia Katrenko, machine learning (information extraction) expert
    23. e-Scientists: Willem van Hage, semantic web expert (and bass guitar player)
    24. The AIDA toolbox for knowledge extraction and knowledge management in a virtual laboratory for e-Science
    25. e-bioscience: An e-bioscientist
    26. Components of the AIDA toolbox used for Life Science knowledge extraction
    27. BioAID Disease Discovery workflow (diagram labels: AIDA; OMIM service (Japan); ‘Taverna shim’)
    28. An e-science approach
        • e-Science
        • Collaboration
        • Combining expertise
        • Sharing
        • Technology
    29. Sharing
    30. with AIDA, Live Demonstration: Marco Roos, Scott Marshall, Sophia Katrenko, Edgar Meij, Willem van Hage, Pieter Adriaans. AIDA demonstration, Taverna/OMII-UK, Hinxton, October 2007
    31. Which diseases may be associated with my protein of interest?
    32. 10/06/09 BioAID
    33. Components of the AIDA toolbox used for Life Science knowledge extraction
    34. 10/06/09 BioAID
    35. Sharing
    36. BioAID Disease discovery workflow
    37. BioAID Disease discovery workflow
    38. BioAID Disease discovery workflow: from 100 abstracts, 29 proteins associated with 1280 diseases
    39. Extending BioAID: Extending BioAID workflows
    40. Doesn’t EZH2 have synonyms?
    41. “Collaboration through web services”: Bio-text mining expert Martijn Schuemie
    42. “Collaboration through web services”: Synonym service
    43. EZH2 is only a small part of a very complex system; for my research I need more than lists
    44. components...
    45. Workflow and semantics: When running workflows
        • Store how biological entities are related
        • Combine results from different runs
        • Recover ‘trails to evidence’
    46. Example scenario of semantic approach: Need a unique identifier (myModel, myExtended Model)
    47. “Collaboration through web services” 2: Bio-text mining expert Martijn Schuemie
    48. getUniprotID: Used as unique ID for proteins
    49. 10/06/09 BioAID
    50. 10/06/09 BioAID
    51. Proto-ontology (Protégé Jambalaya plugin)
    52. Enriched ontology (snapshot)
    53. Future: Future plans
    54. Workflows for text mining ‘pipe line’ (BioAID): Named entity recognition
        Diagram labels, in slide order:
        • Training document
        • Manual annotation
        • Annotated text that provides examples: N x …sentence <concept x>entity</concept x> sentence…
        • Learn
        • Learned model
        • Readable patterns or black box of unreadable conditions: unreadable condition=true => <concept x>entity</concept x>
        • Extract named entities and relations
        • List of named entities
        • ‘Generalise’ examples per concept
        • List of concepts
        • <Concept=name>entity</concept>, frequency, doc ID, …
        • Annotated sentences
        • Training
        • Corpus (e.g. MEDLINE)
        • Semantic networks
        Requirements
        • Training documents and annotator(s) (or pre-annotated text)
        • List of concepts to annotate with (possibly from an ontology)
        • A corpus of interest
        Advantages
        • Concepts of choice
        • Quality of output under our control
        Limitations
        • Output is limited by initial list of concepts
        • Substantial amount of manual work (annotation)
    55. Modelling support: Epigenetics (paramutation), courtesy of Maike Stam; Cell division in Escherichia coli, courtesy of Tanneke den Blauwen; HIV (diagram labels: TF, M, RDRP, RdDM, Pol, reinforcement, repression, B')
    56. Reuse and share knowledge: MedLine; Reuse and share biological knowledge; TM
    57. Conclusions: Conclusions and discussion
    58. Conclusions: e-Science approach. Disclose!
        • e-science approach to text mining
            • myExperiment.org
                • ‘mySpace’ for computational scientists
                • Reach out to end-users
            • Workflow and web services
                • From ‘black box perl’ to ‘computational experiments’
                • Share & reuse workflows and knowledge
            • Grid infrastructure for power users
    59. Conclusion: e-Science and sharing
        • e-Science: sharing text mining services
        • The best service is good; a special service is better
        • Share with users and other developers on the text mining network on myExperiment.org
    60. Why adopt e-science?
    61. Why should I adopt e-Science? I don’t believe in e-Science. I believe in Me-Science.
    62. Why adopt e-science? For determined sinners: ‘The seven deadly sins of bioinformatics’ by Carole Goble, http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics/
    63. Acknowledgements
        • AIDA developers team: Sophia Katrenko, Edgar Meij, Willem van Hage, Frans Verster, (Machiel Jansen), Scott Marshall.
        • Guus Schreiber, Maarten de Rijke, Pieter Adriaans
        • Jan Top, Nicole Koenderink, Food informatics, Wageningen University
        • Martijn Schuemie, Erasmus University Rotterdam
        • OMII-UK and myGrid team, especially Katy Wolstencroft, Stian Soiland, Stuart Owen, Andy Gibson, Alan Rector, Robert Stevens, Carole Goble
        • W3C Semantic Web Health Care and Life Sciences Interest Group
        • Hideaki Sugawara, Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics (http://xml.nig.ac.jp)
        • This work was supported by the Dutch Ministry of Economic Affairs via VL-e and BioRange (BSIK grants)
    64. How does beer relate to BioAID? And how do we relate it to beer?
    65. 10/06/09 BioAID
    66. Thank you for your attention… Join the text mining network on myExperiment.org!!! AID information and resources: http://adaptivedisclosure.org. W3C Semantic Web Health Care and Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/. BioAID workflows available from http:// .org
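
The "Workflow and semantics" slides (45 to 48) ask for three things: store how biological entities are related, combine results from different runs, and recover 'trails to evidence', using one unique ID per protein. The sketch below is one possible reading of that idea in plain Python. It is not the AIDA or Taverna implementation. The synonym table is a hypothetical stand-in for a synonym/ID web service such as getUniprotID, the disease list and abstracts are made up for illustration, and simple substring matching replaces the learned named entity recognition model.

```python
from collections import defaultdict

# Hypothetical lookup standing in for a synonym/ID service (assumption:
# the real service is a remote web service, not a local dictionary).
# Values are illustrative only.
PROTEIN_IDS = {"EZH2": "Q15910", "ENX-1": "Q15910", "KMT6": "Q15910"}
DISEASE_TERMS = {"lymphoma", "prostate cancer"}


def extract_relations(doc_id, text):
    """Return (protein_id, disease, doc_id) for terms co-occurring in one abstract."""
    lowered = text.lower()
    # Synonyms collapse onto one unique ID, so runs can be merged later.
    proteins = {pid for name, pid in PROTEIN_IDS.items() if name.lower() in lowered}
    diseases = {term for term in DISEASE_TERMS if term in lowered}
    return [(p, d, doc_id) for p in sorted(proteins) for d in sorted(diseases)]


def run_workflow(abstracts):
    """One workflow run: map each (protein, disease) pair to its supporting doc IDs."""
    model = defaultdict(set)
    for doc_id, text in abstracts.items():
        for protein, disease, evidence in extract_relations(doc_id, text):
            model[(protein, disease)].add(evidence)
    return model


def combine(*models):
    """Merge the results of several runs, keeping every piece of evidence."""
    merged = defaultdict(set)
    for model in models:
        for pair, evidence in model.items():
            merged[pair] |= evidence
    return merged


# Two separate runs over made-up abstracts that use different synonyms.
run1 = run_workflow({"doc1": "EZH2 is overexpressed in prostate cancer."})
run2 = run_workflow({
    "doc2": "ENX-1 mutations were reported in lymphoma.",
    "doc3": "KMT6 expression correlates with prostate cancer.",
})
knowledge = combine(run1, run2)

# The 'trail to evidence': every relation lists the documents that support it.
for (protein, disease), evidence in sorted(knowledge.items()):
    print(protein, disease, sorted(evidence))
```

Because every synonym maps to the same ID, the relation found under "EZH2" in the first run and under "KMT6" in the second ends up as one entry with two supporting documents, which is the point of slide 46's "need a unique identifier".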
