A biologist in e-Science


Presentation for the BioAssist programmers face-to-face meeting, November 17, 2008, Utrecht, The Netherlands. BioAssist is a nationwide bioinformatics support programme.

Published in: Education, Technology

  • A biologist in e-Science

    1. A biologist in e-Science? by Marco Roos. Acknowledgements: Scott Marshall, Edgar Meij, Sophia Katrenko, Willem van Hage, Pieter Adriaans, Martijn Schuemie, Carole Goble, Dave de Roure, Katy Wolstencroft, Andy Gibson, the myGrid and myExperiment teams, many others who share their ideas, and… You! * Project or Area Liaison for OMII-UK (domain: Biology and Bioinformatics). BioAssist programmers meeting, November 17, 2008, Utrecht, The Netherlands
    2. A priori: What does e-Science mean to you?
        • BioAssistants say…
        • Collaboration
        • High throughput computing
        • Grid
        • Standardisation
        • Scientific integration (tools, databases, scientific objects)
        • Knowledge
        • Information integration
        • Biologists
    3. Introducing myself: A biologist
    4. My prime interest: Structure and function of DNA in the nucleus. Escherichia coli; Mouse fibroblast (skin) cells
    5. My C.V. before e-Science (e-Science since 2003)
        • Molecular & Cellular biology (MSc)
            • microscopy and image analysis of chromosome structure
            • ‘minor’ computer science
        • Image analysis methods to measure DNA content in bull sperm cells (civil service)
        • Chromatin structure & function (PhD molecular cytology)
            • F.I.S.H., microscopy, image analysis, statistics
            • 3-D chromosome structure during cell cycle (no luck)
            • DNA movement in Escherichia coli (success)
        • Human Transcriptome Map (post-doc)
            • ‘Traditional’ BioInformatics; data integration: gene expression to human genome sequence
            • Analysis of regions of increased gene expression
    6. How did I end up here?
            • Marco Roos
            • Biologist and bioinformatician
            • Post-doc e-(bio)science, University of Amsterdam (BioRange/VL-e)
            • Project or Area Liaison (PAL) OMII-UK
            • Member BioAssist programme committee NBIC
    7. Why should a biologist be interested in e-science?
        • BioAssistants guess…
        • Involves Computation
        • Interpretation of results
        • Biology isn’t that interesting
        • Reinvention of the wheel
        • Lack of standards
        • Sharing results
        • Reshaping biology
        • Synergy effect between these sciences
        • Emerging Data driven science
    8. My prime interest: Structure and function of DNA in the nucleus. Escherichia coli; Mouse fibroblast (skin) cells
    9. Components controlling structure & function of DNA
    10. Connecting the dots (example: protein interaction network in yeast)
    11. Biomedical knowledge repository. PubMed statistics (http://www.ncbi.nlm.nih.gov/entrez): >17 million citations; >400,000 added/year; ~70,000 searches/month; … Does not compute. Does not fit.
    12. 1070 databases, Nucleic Acids Research Jan 2008 (96 in Jan 2001)
        • Proteomics
        • Genomics
        • Transcriptomics
        • Protein sequence prediction
        • Phenotypic studies
        • Phylogeny
        • Sequence analysis
        • Protein Structure prediction
        • Protein-protein interaction
        • Metabolomics
        • Model organism collections
        • Systems Biology
        • Epidemiology …
    13. What do I do? A needy biologist
    14. ‘Old school’ Bioinformatics: A typical bioinformatician
    15. ‘Old school’ Bioinformatics: A biologist behind a computer who (just) learned perl
    16. /* determines ridges in htm expression table */
        #include <stdio.h>
        #include <stdlib.h>
        #include <libpq-fe.h>
        #include "ridge.h" /* assumed to define TRUE/FALSE and to declare the functions below */

        /* runs the per-chromosome query; the result is handed back through *htmtable */
        int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
        {
            char querystring[256];
            /* the format string needs a destination buffer; snprintf also bounds the write */
            snprintf(querystring, sizeof querystring,
                     "SELECT * FROM %s WHERE chrom = %s ORDER BY genstart",
                     htmtablename, chromname);
            *htmtable = PQexec(conn, querystring);
            return validquery(*htmtable, querystring);
        }

        /* determines if mincount genes in a row are (part of) a ridge */
        /* pre: htmtable is valid and sorted on genStart (ascending) */
        /* post: */
        int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
        {
            if (mincount <= 0) return TRUE;
            if (row >= PQntuples(htmtable)) return FALSE;
            /* PQgetvalue returns text, so convert it before comparing; read the current row, not row 0 */
            if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold) {
                return FALSE;
            }
            return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
        }

        int main(void)
        {
            PGconn *conn;          /* holds database connection */
            char querystring[256]; /* query string */
            PGresult *result;

            conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
            if (PQstatus(conn) == CONNECTION_BAD) {
                fprintf(stderr, "connection to database failed.\n");
                fprintf(stderr, "%s", PQerrorMessage(conn));
                exit(1);
            } else
                printf("Connection ok\n");
            sprintf(querystring, "SELECT * FROM chromosomes");
            printf("%s\n", querystring);
            result = PQexec(conn, querystring);
            if (validquery(result, querystring)) {
                printresults(result);
            } else {
                PQclear(result);
                PQfinish(conn);
                return FALSE;
            }
            PQclear(result);
            PQfinish(conn);
            return TRUE;
        }

        /* prints the first column of at most ten rows */
        int printresults(PGresult *tuples)
        {
            int i;
            for (i = 0; i < PQntuples(tuples) && i < 10; i++) {
                printf("%d, ", i);
                printf("%s\n", PQgetvalue(tuples, i, 0));
            }
            return TRUE;
        }

        /* reports whether the query returned tuples */
        int validquery(PGresult *result, char *querystring)
        {
            printf("in validquery\n");
            if (PQresultStatus(result) != PGRES_TUPLES_OK) {
                printf("Query %s failed.\n", querystring);
                fprintf(stderr, "Query %s failed.\n", querystring);
                return FALSE;
            }
            return TRUE;
        }
    17. Theme: Not an e-Science approach
    18. The ‘spaghetti’ approach
    19. Computational tools graveyard (rephrasing David Shotton)
    20. Database survival: <20% ‘no problems’
    21. Data graveyard (quoting David Shotton)
    22. Why should a biologist be interested in e-science?
        • Lots of data and knowledge to deal with
        • Bioinformaticians make spaghetti and graveyards
    23. Bridging biology and computer science
            • Marco Roos
            • Biologist and bioinformatician
            • Post-doc e-(bio)science, University of Amsterdam (BioRange/VL-e)
            • Project or Area Liaison (PAL) OMII-UK
            • Member BioAssist programme committee NBIC
    24. Empowering biologists and bioinformaticians
    25. How could we be empowered?
        • BioAssistants guess…
        • Wear a champion shirt
        • Data integration solutions (warehouse or something)
        • Communication between tools
        • Sharing methods
        • Talk the same language
        • More metadata about wet/dry experiments
        • Data directories (find data)
        • Google-type search
        • Sharing knowledge
        • Gain knowledge by combining knowledge
    26. Experiment 1: Model based data integration. Example: UCSC genome browser (diagram labels: partOf * * Transcription Factor Binding Site)
    27. Experiment 2
        • An e-science approach for automated knowledge extraction from literature
        Roos, Marshall, et al., ISMB/ECCB, Vienna, 2007
    28. An e-science approach
        • Combining expertise
        • Collaborating and sharing
        • Technology
    29. Which diseases are associated with my protein of interest ‘EZH2’
    30. Biological knowledge extraction: Biological question/model; Computational experiment; Extracted knowledge; >17 million citations, +400,000/yr
    31. Combining expertise: Edgar Meij, information retrieval expert
    32. Combining expertise: Sophia Katrenko, machine learning expert
    33. Combining expertise: Willem van Hage, Semantic Web expert (and bass guitar player)
    34. Combining expertise: Towards a knowledge framework. Scott Marshall, computer scientist and bioinformatician
    35. The AIDA toolbox, Web Services for knowledge extraction and knowledge management
    36. e-Science collaboration: AIDA toolbox
    37. “Collaboration through Web Services”: Martijn Schuemie, bio-text mining expert, BioSemantics group, Erasmus University Rotterdam
    38. “Collaboration through Web Services”: Hideaki Sugawara, biological database expert
    39. “Collaboration through Web Services”: e-bioscientist
    40. A nice experiment design
    41. A not so nice experiment design
    42. A workflow: Protocol for a computational experiment
    43. 05/06/09 BioAID
    44. 05/06/09 BioAID
    45. Sharing and publishing my designs
    46. BioAID Disease Discovery workflow (diagram labels: AIDA; AIDA; OMIM service (Japan); AIDA; ‘Taverna shim’; Taverna ‘shim’)
    47. BioAID Disease Discovery workflow
    48. BioAID Disease Discovery workflow
    49. An insightful computational experiment
    50. e-Science leveraging the use of more brains: Want this…
    51. e-Science leveraging the use of more brains: … need this
    52. Publish and share: Publish & share research objects (myExperiment)
        • >400 workflows
        • >1000 registered users (< 1yr)
        • Run workflows without Taverna (expert feature)
        • Open to objects other than workflows
        • Link out to other resources
    53. Do I feel all powerful now? An e-biologist?
    54. Tabular output
        • Output good for viewing
        • Useful, but sufficient?
        • Query discoveries?
        • Fits biological modelling?
        • Basis for new experiments?
        • Flexible enough?
    55. Underestimated: The brain bottleneck
    56. Empower me with a ‘virtual brain’* (diagram labels: My ws; Your ws; My ws; Your ws; My ws). * From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pg 23-34
    57. Workflow and Semantic Web
        • Query
        • Retrieve documents from Medline
        • Extract proteins (Homo sapiens)
        • Calculate ranking scores
        • Create biological cross references
        • Convert to table (html)
        • Add documents (IDs) to semantic model
        • Add proteins to semantic model
        • Add scores to semantic model
        • Add cross references to semantic model
        • Add query to semantic model
    58. Do I feel all powerful now? An e-biologist?
    59. http://staff.science.uva.nl/~roos/ChromatinWorkgroup/
    60. e-Laboratory factories
    61. Conclusions: How do we know when e-Science has succeeded? Not just accelerated but new
        • A. When everyone is using Grid computing?
        • B. When scientists make scientific advances that would not have happened otherwise?
        Slide from ‘The New e-Science’ by Dave de Roure
    62. Conclusions
        • e-Science is for people
            • Empower them: Grid, Semantic Web, Workflow, e-Laboratories for people!
            • Let scientists be scientists
            • Scientists require better, not perfect
        • Workflow empowers scientists
        • Empowering by Semantic Web and e-Laboratories in development
    63. How would you like to be empowered? (You)
        • BioAssistants say
        • Biologists asking understandable (solvable) questions
        • Computer scientists giving understandable answers
        • Education
            • New technologies
        • Good machinery
            • Errors on the grid
        • Task sharing; build in collaboration
    64. Project and Area Liaison
            • Marco Roos
            • Biologist and bioinformatician
            • Post-doc e-(bio)science, University of Amsterdam (BioRange/VL-e)
            • Project or Area Liaison (PAL) OMII-UK
            • Member BioAssist programme committee NBIC
    65. How many brains do you want to use? – One?
    66. Some?
    67. Many?
    68. Use your community: myGrid/myExperiment, OMII-UK, You
    69. End of presentation...
        • Thank you
        • http://adaptivedisclosure.org
        • Some related presentations:
            • http://www.slideshare.net/dder/the-new-science-bangalore-edition
            • http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics
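
The "Workflow and Semantic Web" slide lists workflow steps (query, retrieve documents, extract proteins, score, cross-reference, convert to a table), each paired with adding its result to a semantic model. The sketch below is one possible reading of that idea, not the BioAID workflow itself. Everything in it is an assumption made for illustration: the toy corpus, the protein lexicon (PROT_A and PROT_B are placeholders), the predicate names, and the counting-based score. A plain list of triples stands in for the semantic model.

```python
from collections import Counter

# Toy stand-in for Medline: made-up IDs and placeholder text, not real abstracts.
TOY_CORPUS = {
    "doc1": "EZH2 and PROT_A appear together in this placeholder text",
    "doc2": "PROT_A is mentioned alone here",
    "doc3": "EZH2 with PROT_B and PROT_A in another placeholder",
}
KNOWN_PROTEINS = {"EZH2", "PROT_A", "PROT_B"}  # hypothetical lexicon


def retrieve_documents(query):
    """Return IDs of toy documents that contain the query as a whole word."""
    return sorted(d for d, text in TOY_CORPUS.items() if query in text.split())


def extract_proteins(doc_id):
    """Naive lexicon lookup standing in for a real entity recogniser."""
    return sorted(set(TOY_CORPUS[doc_id].split()) & KNOWN_PROTEINS)


def cross_reference(protein):
    """Placeholder cross reference; a real workflow would call a database service."""
    return f"xref:{protein}"


def run_workflow(query):
    """Run the steps and record every intermediate result as a triple."""
    triples = [("run", "hasQuery", query)]
    counts = Counter()
    for doc_id in retrieve_documents(query):
        triples.append(("run", "retrieved", doc_id))
        for protein in extract_proteins(doc_id):
            if protein != query:
                counts[protein] += 1
                triples.append((protein, "mentionedIn", doc_id))
    # Score (an assumption): number of retrieved documents mentioning the protein.
    for protein, score in counts.items():
        triples.append((protein, "hasScore", score))
        triples.append((protein, "hasCrossReference", cross_reference(protein)))
    return counts, triples


def to_html_table(counts):
    """Tabular view of the ranking, highest score first."""
    rows = "".join(
        f"<tr><td>{p}</td><td>{s}</td></tr>" for p, s in counts.most_common()
    )
    return f"<table>{rows}</table>"


def query_model(triples, predicate):
    """Return (subject, object) pairs for one predicate in the triple list."""
    return [(s, o) for s, p, o in triples if p == predicate]


if __name__ == "__main__":
    counts, triples = run_workflow("EZH2")
    print(to_html_table(counts))
    print(query_model(triples, "mentionedIn"))
```

The table answers only the question that was asked, whereas the triple list keeps the provenance (which document supported which protein), so it can be queried again later. That is the contrast the "Tabular output" slide raises with "Query discoveries?" and "Basis for new experiments?".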