A PAL’s life: about a biologist in e-science. Presentation for the OMII-UK Board, Southampton, May 16, 2008

'A PAL's Life' for OMII-UK Board, May 2008


Presentation for the OMII-UK Board, May 16, 2008. Results and views of a biologist working in e-science and serving as a Project or Area Liaison (PAL) for OMII-UK (Open Middleware Infrastructure Institute UK: http://www.omii.ac.uk/)


    1. A PAL’s life: about a biologist in e-science. Presentation for the OMII-UK Board, Southampton, May 16, 2008
    2. My experience in e-bioscience
       • Marco Roos
       • Biologist and bioinformatician
       • Post-doc e-(bio)science, University of Amsterdam
       • PAL OMII-UK
       • Member BioAssist steering group
    3. to here
    4. Biological motivation: function and architecture of DNA in the cell
       • Escherichia coli
       • Mouse fibroblast (skin) cells
    5. Many components... 10/06/09 BioAID
    6. Example: bioinformatics before e-science
       • Human Transcriptome Map (HTM) (Versteeg et al., Genome Research, 2003)
       • SAGE tag count (TU, SAGE library)
       • TU identifier
       • position
       • Transcriptional Unit (TU)
    7. Before e-science: HTM construction and RIDGE detection

        /*
         * determines ridges in htm expression table
         */
        #include <stdio.h>
        #include <stdlib.h>
        #include <libpq-fe.h>
        #include "ridge.h"

        /* htmtable is passed by reference so the caller receives the query result */
        int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
        {
            char querystring[256];

            /* build the query in querystring (bounded to the buffer size) */
            snprintf(querystring, sizeof querystring,
                     "SELECT * FROM %s WHERE chrom = %s ORDER BY genstart",
                     htmtablename, chromname);
            *htmtable = PQexec(conn, querystring);
            return(validquery(*htmtable, querystring));
        }

        int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
        /* determines if mincount genes in a row are (part of) a ridge
         * pre: htmtable is valid and sorted on genStart (ascending)
         * post:
         */
        {
            if (mincount <= 0) return TRUE;
            if (row >= PQntuples(htmtable)) return FALSE;
            /* PQgetvalue returns text: convert it before comparing, and read the current row */
            if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold)
            {
                return FALSE;
            }
            return(is_ridge(htmtable, row + 1, exprthreshold, mincount - 1));
        }

        int main()
        {
            PGconn *conn;          /* holds database connection */
            char querystring[256]; /* query string */
            PGresult *result;
            int i;

            conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
            if (PQstatus(conn) == CONNECTION_BAD)
            {
                fprintf(stderr, "connection to database failed.\n");
                fprintf(stderr, "%s", PQerrorMessage(conn));
                exit(1);
            }
            else printf("Connection ok\n");

            sprintf(querystring, "SELECT * FROM chromosomes");
            printf("%s\n", querystring);
            result = PQexec(conn, querystring);
            if (validquery(result, querystring))
            {
                printresults(result);
            }
            else
            {
                PQclear(result);
                PQfinish(conn);
                return FALSE;
            }
            PQclear(result);
            PQfinish(conn);
            return TRUE;
        }

        /* prints the first column of at most ten rows */
        int printresults(PGresult *tuples)
        {
            int i;

            for (i = 0; i < PQntuples(tuples) && i < 10; i++)
            {
                printf("%d, ", i);
                printf("%s\n", PQgetvalue(tuples, i, 0));
            }
            return TRUE;
        }

        /* reports whether the query returned tuples */
        int validquery(PGresult *result, char *querystring)
        {
            printf("\nin validquery\n");
            if (PQresultStatus(result) != PGRES_TUPLES_OK)
            {
                printf("Query %s failed.\n", querystring);
                fprintf(stderr, "Query %s failed.\n", querystring);
                return FALSE;
            }
            return TRUE;
        }

       IT used:
       • Perl
       • PostgreSQL
       • C
       • MS Excel + VBA
       • SPSS
       • No predefined development strategy
       • No design phase
    8. Bioinformatics: a typical bioinformatician
    9. Bioinformatics: a biologist behind a computer who (just) learned Perl
    10. The ‘spaghetti’ approach
    11. Before e-science
       • Conclusion: the state of the art in computing in life science is that of the 1980s (a gross simplification)
    12. e-science
       • e-science motivation: enhance the state of the art of computing in life science and bioinformatics
    13. Example: an e-science approach to text mining
    14. Biological knowledge extraction
       • Biological question/model
       • Computational experiment
       • Extracted knowledge
       • “I want to do it my way” (Carole Goble’s me-scientist)
       • >17 million citations, +400,000/yr
    15. Which diseases may be associated with my protein of interest, EZH2?
    16. Combining expertise: Edgar Meij, information retrieval expert
    17. Combining expertise: Sophia Katrenko, machine learning expert
    18. Combining expertise: Willem van Hage, semantic web expert (and bass guitar player)
    19. Combining expertise, towards a knowledge framework: Scott Marshall, computer scientist and bioinformatician
    20. The AIDA toolbox for knowledge extraction and knowledge management in a virtual laboratory for e-Science
    21. Combining web services
    22. “Collaboration through web services”: Martijn Schuemie, bio-text mining expert
    23. “Collaboration through web services”: Hideaki Sugawara, biological database expert
    24. “Collaboration through web services”: e-bioscientist
    25. A nice tool
    26. A not so nice tool
    27. 10/06/09 BioAID
    28. Sharing
    29. BioAID Disease Discovery workflow
       • AIDA services
       • OMIM service (Japan)
       • Taverna ‘shims’
    30. BioAID Disease Discovery workflow
    31. BioAID Disease Discovery workflow, from 100 abstracts: 29 proteins associated with 1280 diseases
    32. Summary so far
       • Application of myExperiment
       • Application of Taverna
       • Application of web services
       • Reuse of components from a text mining tool
       • Reuse of AIDA services in resource management tools (not shown)
       • Application of semantic web (not shown)
    33. Summary so far
       • Workflows enhance insight and reproducibility
       • Workflow as ‘computational experiment’
       • Feedback and development
       • Workflows enhance the application of expertise
         • Components built by diverse experts
         • Collaboration through web services
         • Text mining experiment by a non-text-mining expert
    34. e-Science is about people: want this…
    35. e-Science is about people: … need this
    36. Outreach
       • Successful as ‘schoolbook’ example of the e-science approach
         • VL-e mid-term review (‘e-science that works’)
         • 3x NBIC
           • Bioinformatics symposium
           • Text mining workshop
           • Web services/workflow tutorial
         • ICT delta and eChallenges
         • ISMB/ECCB 2007, Vienna
         • 2x OMII-UK workshops
       • Attracts bioinformaticians to e-science
         • Example: NBIC/BioAssist
    37. BioAssist
       • National bioinformatics support programme
         • Now based on e-science
         • Taverna as target platform
       • 5 (power)user communities (‘PAL’s pals’):
         • Integrated analysis of functional genomics data
         • Proteomics data management and analysis
         • Metabolomics data management and analysis
         • Biobanking
         • High throughput sequencing
         • System bioinformatics
       • Grid and supercomputing support from SARA
         • Collaboration with OMII-UK/myGrid for linking computing resources with Taverna for transparent processing of large datasets
    38. BioAssist as test-bed community
       • Life science/bioinformatics requirements
         • Taverna
         • Large data processing in Taverna
         • Running workflows without Taverna, to help bioinformaticians help biologists
         • Web service repository (e.g. BioCatalogue)
       • Potential ‘companion’ tools for the OMII-UK toolset
         • MolGenis for local data
         • vBrowser for browsing resources, e.g. workflow results and data on grids
       • Collaborative effort to address requirements
         • Sharing code
    39. PAL’s future
       • Is this PAL satisfied? Not yet!
       • Uptake by bioinformatics: going well
       • Uptake by systems biology: progress
       • Uptake by life science: early days
    40. Full circle: a biological question…
       • Could be running on a Grid or cluster
    41. “Thank you for sending me this e-Experiment from myExperiment.org!”
    42. Experiences and conclusions
       • VL-e, AID and OMII-UK have helped me reach out to the bioinformatics and life science communities
       • (Hopefully) I was helpful in getting the e-science of VL-e, AID and OMII-UK across to the bioinformatics and life science communities
       • Collaborative spirit and win-win are unfamiliar to me-scientists
         • Dissemination requires a lot of time and energy
    43. Experiences and conclusions
       • OMII-UK and its members prove the concept of e-science
         • They accomplish the hugely complicated task of being successful from core computer science to application science and back (imho the essence of e-science research)
         • Why? (my view)
           • Strong positive leadership
           • Successful approach, acknowledging social aspects
           • Large enough community
       • OMII-UK is a role model for e-science and for organisations adopting e-science
         • The e-science (user) community needs a role model for some time to come
    44. Acknowledgements
       • AID team: Sophia Katrenko, Edgar Meij, Willem van Hage, …, Frans Verster, Machiel Jansen, Scott Marshall and Guus Schreiber, Maarten de Rijke, Pieter Adriaans
       • Jan Top, Nicole Koenderink, Food informatics, Wageningen University
       • Martijn Schuemie, Erasmus University Rotterdam
       • Hideaki Sugawara, Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics (http://xml.nig.ac.jp)
       • OMII-UK and the myGrid family, Katy Wolstencroft
       • E-science support team for NBIC
       • VL-e colleagues
       • The Netherlands BioInformatics Centre (NBIC)
       • W3C Semantic Web Health Care and Life Sciences Interest Group
       • iCapture team in Canada
       • My friends on myExperiment
       • This work was supported by the Dutch Ministry of Economic Affairs via VL-e and BioRange (BSIK grants), and OMII-UK
    45. A hopefully mutually felt warm and fuzzy feeling!
       • Thank you for your attention
    46. Why should I adopt e-Science?
       • “I do not believe in e-Science. I only believe in Me-Science.”
    47. Why adopt e-science?
       • For determined sinners: ‘The seven deadly sins of bioinformatics’ by Carole Goble, http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics/
