From Laboratory to e-Laboratory - Presentation Transcript
From Laboratory to e-Laboratory?
Introduction for ‘Lab-J’ of the LUMC Human Genetics Department
Marco Roos
Acknowledging the colleagues from BioSemantics, myGrid, OMII-UK, AID, and the LUMC BioInformatics Expertise Centre
Introducing Me
Liaison biology/bioinformatics – informatics
Biologist and bioinformatician, e-(bio)science researcher
Coordinator, BioSemantics group Leiden
Human Genetics Department, Leiden University Medical Centre, and Informatics Institute, University of Amsterdam
Project or Area Liaison (PAL), OMII-UK
Member, BioAssist programme committee, NBIC
also about You
First about Me
My C.V. before e-Science (before 2003)
Molecular & Cellular biology (MSc): microscopy and image analysis of chromosome structure; ‘minor’ computer science
Image analysis methods to measure DNA content in bull sperm cells (civil service)
Chromatin structure & function (PhD, molecular cytology): F.I.S.H., microscopy, image analysis, statistics; 3-D chromosome structure during cell cycle (no luck); DNA movement in Escherichia coli (success)
Human Transcriptome Map (post-doc): gene expression to human genome sequence; analysis of regions of increased gene expression
Motivation: structure and function of DNA in the nucleus
Escherichia coli
Muntiacus muntjak
Why bioinformatics? Lab-J suggests…
23/09/2009 BioAID
Bioinformatics
A typical bioinformatician
Bioinformatics
A biologist behind a computer who (just) learned Perl
/*
 * determines ridges in htm expression table
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h"  /* assumed to define TRUE/FALSE and to declare the functions below */

/* runs the query for one chromosome; the result is handed back through *htmtable */
/* (the prototype in ridge.h is assumed to match this signature) */
int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    /* build the query in the buffer; the chromosome name is a quoted SQL literal */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);
    return validquery(*htmtable, querystring);
}

int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
/* determines if mincount genes in a row are (part of) a ridge */
/* pre: htmtable is valid and sorted on genStart (ascending) */
/* post: */
{
    if (mincount <= 0) return TRUE;
    if (row >= PQntuples(htmtable)) return FALSE;
    /* PQgetvalue returns text, so convert it to a number before comparing */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold) {
        return FALSE;
    }
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main()
{
    PGconn *conn;           /* holds database connection */
    char querystring[256];  /* query string */
    PGresult *result;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        exit(1);
    }
    else printf("Connection ok\n");

    sprintf(querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);
    result = PQexec(conn, querystring);
    if (validquery(result, querystring)) {
        printresults(result);
    }
    else {
        PQclear(result);
        PQfinish(conn);
        return 1;  /* non-zero exit status signals failure */
    }
    PQclear(result);
    PQfinish(conn);
    return 0;      /* zero exit status signals success */
}

/* prints the first column of at most the first 10 rows */
int printresults(PGresult *tuples)
{
    int i;

    for (i = 0; i < PQntuples(tuples) && i < 10; i++) {
        printf("%d, ", i);
        printf("%s\n", PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

/* returns TRUE if the query returned rows without error */
int validquery(PGresult *result, char *querystring)
{
    printf(" in validquery\n");
    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
        printf("Query %s failed.\n", querystring);
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
State-of-the-art applied computer science, to a biologist
Why e-science? What is wrong with bioinformatics?
Human geneticists think…
Why should a biologist be interested in e-science?
BioAssistants guessed…
Involves computation
Interpretation of results
Biology isn’t that interesting
Reduce reinvention of the wheel
Current lack of standards
Sharing results
Reshaping biology
Synergy between different sciences
Emerging data-driven science
Why e-Science?
Lots of data to deal with
Single tiny brain
Lots of knowledge to deal with
No computational superpowers
Lots of methods and algorithms to try and combine
A needy biologist
1070 databases, Nucleic Acids Research, Jan 2008 (96 in Jan 2001)
Proteomics, Genomics, Transcriptomics, Protein sequence prediction, Phenotypic studies, Phylogeny, Sequence analysis, Protein Structure prediction, Protein-protein interaction, Metabolomics, Model organism collections, Systems Biology, Epidemiology, etcetera …
All with a splendid interface … all different, of course
Traditional data integration in bioinformatics
Local Database
Local Database
The ‘spaghetti’ approach
Some of my observations
Reinvention
How many reannotation pipelines do you need?
Little reuse of components
Reproducibility
Black boxes
Emphasis not on clarity
Can we understand bioinformatics as wet lab protocols?
Focus on technicalities, not biological analysis
Should bioinformaticians write ‘job submission’ scripts?
Data graveyards
Do we need >1000 databases?
Can we understand our own data?
How did I end up here?
Marco Roos
Biologist and bioinformatician, post-doc e-(bio)science
Human Genetics Department, Leiden University Medical Centre, and Informatics Institute, University of Amsterdam
Project or Area Liaison (PAL), OMII-UK
Member, BioAssist programme committee, NBIC
Some examples from the field of e-Science
Enhancement 1: Workflows (Taverna workflow)
Enhancement 2: exploiting brains
Exploiting brains by Web Services
Source: http://biocatalogue.org (launched at ISMB 2009)
>1000 annotated services, >3000 known to Taverna
Includes BioMart, R, text mining, KEGG, NCBI PubMed, Ensembl, etc.
Web Services run remotely
Exploiting more brains by sharing workflows
Source: http://myExperiment.org
Social community web site for scientists
2300 registered users in two years
750 workflows
Bioinformatics and e-science
Customized experiments with reusable components
Single purpose, single person, black box application
Diagram labels: My component, Your component, My component, My component, Your component
What do we know of our data?
Sufficient?
Query discoveries?
Query across experiment?
Fit biological modelling?
Good basis for new experiments?
Flexible enough?
Model-based data integration
Computer readable model
Biologist readable model
Biological concepts (‘myModel’)
Data
Marshall et al., International Workshop on Knowledge Systems in Bioinformatics 2006
Post et al., Bioinformatics 2007
Model-based data integration
Example: UCSC genome browser
Diagram label: partOf
Semantic Web (Linked Open Data)
Empower me with a ‘virtual brain’ *
Diagram labels: My ws, Your ws, My ws, My ws, Your ws
* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pp. 23-34
Workflow and Semantic Web
Query
Add query to semantic model
Retrieve documents from Medline
Add documents (IDs) to semantic model
Extract proteins (Homo sapiens)
Add proteins to semantic model
Calculate ranking scores
Add scores to semantic model
Create biological cross references
Add cross references to semantic model
Convert to table (HTML)
Concept Web from a user’s point of view
e-Laboratories and e-Laboratory factories
e-Galaxy for NBIC
Galaxy as front end
Workflows & Web Services
Grid enabled Taverna
MOLGENIS
Semantic/Concept Web
myExperiment/BioCatalogue
Scientific Research Objects
Vacancy! (software engineer)
SRO = a pack of models
- Tool models
- Data/UI models
- Flow models
+ Attached data
SRO enactment = a running e-laboratory
Diagram labels: Tools; Uses tools services; Model SROs; my protocols; my data; Portal to workflows 2.0; mashup data; Flows; mashup tools; e-biologist; e-bioinformatician; Uses data services; Portal to workflows; Data; programmatic interaction; user interfacing
e-Galaxy mock-up
Suggestions by semantic components
Your Scientific Research Object
Underlying workflow
Related research and documents
MOLGENIS
Convert Import/Export Research Objects Store Configure Run
Research and development aims
Automated support for hypothesis formation
E.g. on epigenetic mechanisms
Apply Workflow, Semantic Web, Concept Web
Concept-based meta-analysis
Automated triple creation from computational analysis
Research and development ambitions
Co-develop e-Laboratories
e-Galaxy
epiGenius
BioBanking
Help BEC with support environment
Concept Web services
Web services
e-Laboratory components
Transparent creation of triples
Personal semantic repositories
Liaison
Bioinformatics Expertise Centre LUMC: statistical and computer science expertise; generic support
NBIC BioAssist: core software development; Grid tools, Concept Web, e-Labs
BioSemantics Rotterdam: text mining; concept profile meta-analysis
AID, University of Amsterdam: e-Science experts; Grid tools
You?
OMII-UK (Manchester, Southampton, Edinburgh; ca. 30 engineers): Taverna, myExperiment, e-Labs
Concept Web: content, tools and infrastructure
W3C Health Care & Life Sciences Interest Group: Semantic Web experts; Linked Open Data
‘e’ for enhance, not enforce
Please help me to help you
Register for: http://snipurl.com/biosemanticsusers (http://www.myexperiment.org/groups/211)
Allows me to:
Give you preferential treatment
Not spam everybody
Keep you informed
Ask your opinion (user-driven development!)
Visit the BioSemantics web site: http://www.biosemantics.org/
Word of warning
Computer scientists are scientists too!
Need to publish
Scored by papers, not by software
Addressed by OMII-UK and BioAssist
Compare “How can I use it in the clinic?” with “How can I use it in the lab?”
Dissemination
Come by for help or information
Internal ‘mini-courses’? Send me suggestions!
Course: Managing Life Science Information, for PhD students, 2010
Thank you for your attention
Lots of accessible data
Community brain power
Knowledge bases to query
Other people’s computational superpowers
Web Services, Workflows, and their creators available
An enhanced biologist
Homo biologicus enhancis