We will demonstrate an e-science approach to mining knowledge from biomedical literature by applying the ‘Adaptive Information Disclosure Application’ (AIDA) toolbox, which we are developing in the context of the Dutch Virtual Laboratory e-Science project.
Demo Presentation Wageningen Text Mining Workshop 2007 - Presentation Transcript
with AIDA. Marco Roos, Scott Marshall, Sophia Katrenko, Edgar Meij, Willem van Hage, Pieter Adriaans. AIDA demonstration, Wageningen, 23/11/2007. Adaptation of a talk for the Taverna/OMII-UK workshop, Hinxton, October 2007
About e-Science An e-science approach to mining biomedical literature
And how we relate it to beer…
Virtual Laboratory e-Science project
Wet laboratory analogy: Data; Data handling & data integration; Metadata; Data analysis; Data storage; Expert knowledge
General framework of AID for VL-e: middleware for sharing resources; knowledge-based resource management
General framework of AID for VL-e: middleware for sharing resources; model-based resource management
Theme An e-Science approach to mining biomedical literature
And how we relate it to beer…
10/06/09 BioAID Which diseases may be associated with my protein of interest?
Biomedical knowledge repository: PubMed statistics (http://www.ncbi.nlm.nih.gov/entrez): >17 million citations, >400,000 added/year, ~70,000 searches/month, … Does not compute. Does not fit.
Bioinformatics A bioinformatician
Bioinformatics A typical bioinformatician
Bioinformatics A biologist behind a computer who (just) learned perl
/*
 * determines ridges in htm expression table
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h" /* not shown; assumed to define TRUE/FALSE and declare the functions below */

/* runs the htm query for one chromosome; the result is handed back through *htmtable */
int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    /* names are interpolated unescaped, so this is only safe for trusted input */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);
    return validquery(*htmtable, querystring);
}

/* determines if mincount genes in a row are (part of) a ridge */
/* pre: htmtable is valid and sorted on genStart (ascending) */
int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
{
    if (mincount <= 0) return TRUE;
    if (row >= PQntuples(htmtable)) return FALSE;
    /* PQgetvalue returns text, so convert it before comparing with the threshold */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold) {
        return FALSE;
    }
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main()
{
    PGconn *conn;          /* holds database connection */
    char querystring[256]; /* query string */
    PGresult *result;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    } else
        printf("Connection ok\n");

    snprintf(querystring, sizeof querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);
    result = PQexec(conn, querystring);
    if (validquery(result, querystring)) {
        printresults(result);
    } else {
        PQclear(result);
        PQfinish(conn);
        return EXIT_FAILURE;
    }
    PQclear(result);
    PQfinish(conn);
    return EXIT_SUCCESS;
}

/* prints the first column of at most ten rows */
int printresults(PGresult *tuples)
{
    int i;

    for (i = 0; i < PQntuples(tuples) && i < 10; i++) {
        printf("%d, ", i);
        printf("%s\n", PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

/* reports whether the query returned tuples */
int validquery(PGresult *result, char *querystring)
{
    printf(" in validquery\n");
    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
        printf("Query %s failed.\n", querystring);
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
Theme No, that is not an e-Science approach to mining biomedical literature
Not e-science So 2000 (quoting Lennart Post)
Not e-science So 1980
Theme An e-Science approach to mining biomedical literature
An e-science approach
e-Science
Collaboration
Combining expertise
Sharing
Technology
e-Scientists: Edgar Meij, information retrieval expert
e-Scientists: Sophia Katrenko, machine learning (information extraction) expert
e-Scientists: Willem van Hage, semantic web expert (and bass guitar player)
The AIDA toolbox for knowledge extraction and knowledge management in a virtual laboratory for e-Science
e-bioscience: An e-bioscientist
Components of the AIDA toolbox used for Life Science knowledge extraction
BioAID Disease Discovery workflow: AIDA services, OMIM service (Japan), Taverna ‘shims’
An e-science approach
e-Science
Collaboration
Combining expertise
Sharing
Technology
Sharing
with AIDA: Live Demonstration. Marco Roos, Scott Marshall, Sophia Katrenko, Edgar Meij, Willem van Hage, Pieter Adriaans. AIDA demonstration, Taverna/OMII-UK, Hinxton, October 2007
Which diseases may be associated with my protein of interest?
Components of the AIDA toolbox used for Life Science knowledge extraction
Sharing
BioAID Disease discovery workflow
BioAID Disease discovery workflow, from 100 abstracts: 29 proteins associated with 1280 diseases
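The disease discovery workflow can be read as a chain of three steps: retrieve abstracts for a query, recognise protein names in them, and look up diseases for each protein. The sketch below is only an illustration of that chaining, not the BioAID workflow itself. All function names, the toy abstracts, the lexicon and the placeholder disease names are assumptions made up for this example; the real workflow calls AIDA and OMIM web services from Taverna.

```python
import re

# Toy data, invented for illustration: a tiny "literature index" and a lookup
# table standing in for a remote disease service. Disease names are placeholders.
TOY_ABSTRACTS = {
    "doc1": "EZH2 interacts with EED in the polycomb complex.",
    "doc2": "EZH2 expression was measured in tumour samples.",
    "doc3": "This abstract mentions no protein of interest.",
}
TOY_PROTEIN_LEXICON = {"EZH2", "EED"}
TOY_DISEASE_TABLE = {"EZH2": ["disease A", "disease B"], "EED": ["disease C"]}


def retrieve_abstracts(query, abstracts):
    """Step 1: keep the abstracts that mention the query term."""
    needle = query.lower()
    return {doc_id: text for doc_id, text in abstracts.items()
            if needle in text.lower()}


def extract_proteins(text, lexicon):
    """Step 2: naive dictionary lookup of protein names in one text."""
    return {name for name in lexicon
            if re.search(r"\b" + re.escape(name) + r"\b", text)}


def lookup_diseases(protein, table):
    """Step 3: ask the (stubbed) disease service about one protein."""
    return list(table.get(protein, []))


def disease_discovery(query, abstracts=TOY_ABSTRACTS):
    """Chain the three steps and keep the supporting document IDs."""
    results = {}
    hits = retrieve_abstracts(query, abstracts)
    for doc_id in sorted(hits):
        for protein in sorted(extract_proteins(hits[doc_id], TOY_PROTEIN_LEXICON)):
            entry = results.setdefault(
                protein,
                {"diseases": lookup_diseases(protein, TOY_DISEASE_TABLE),
                 "evidence": []},
            )
            entry["evidence"].append(doc_id)
    return results


if __name__ == "__main__":
    for protein, entry in disease_discovery("EZH2").items():
        print(protein, entry["diseases"], entry["evidence"])
```

Keeping each step behind its own function mirrors the idea of separate services: any one of them could be swapped for a call to a remote service without touching the others.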
Extending BioAID
Extending BioAID workflows
Doesn’t EZH2 have synonyms?
“Collaboration through web services”: Bio-text mining expert Martijn Schuemie
“Collaboration through web services”: Synonym service
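One way a synonym service could extend the workflow is query expansion: the protein name is replaced by an OR-query over the name and its synonyms before the literature search. The following is a minimal sketch under that assumption. The synonym table is a hypothetical local stand-in for the remote service, and the boolean OR string is a generic form; the slides do not state the query syntax of the actual search service.

```python
# Hypothetical stand-in for a remote synonym web service (illustrative entries).
TOY_SYNONYMS = {
    "EZH2": ["enhancer of zeste homolog 2", "ENX-1"],
}


def get_synonyms(term, table=TOY_SYNONYMS):
    """Return the known synonyms of a term, or an empty list."""
    return list(table.get(term, []))


def expand_query(term, table=TOY_SYNONYMS):
    """Build an OR-query from a term and its synonyms; phrases are quoted."""
    variants = [term] + [s for s in get_synonyms(term, table) if s != term]
    quoted = ['"%s"' % v if " " in v else v for v in variants]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(expand_query("EZH2"))
```

A term without known synonyms passes through unchanged, so the expansion step can be inserted into an existing workflow without altering its behaviour for other queries.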
EZH2 is only a small part of a very complex system; for my research I need more than lists
components...
Workflow and semantics. When running workflows: store how biological entities are related; combine results from different runs; recover ‘trails to evidence’
Example scenario of semantic approach: need a unique identifier. myModel, myExtendedModel
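The three goals on the semantics slide (store relations, combine runs, recover trails to evidence) can be illustrated with a small in-memory store of statements that carry their provenance. This is a sketch only: the class, the predicate name and the example.org identifiers are hypothetical, and a real setup would more likely use an RDF store than a Python list.

```python
from collections import namedtuple

# One statement: subject and object are unique identifiers (here made-up URIs),
# run_id and source record which workflow run and which document produced it.
Statement = namedtuple("Statement", "subject predicate obj run_id source")


class EvidenceStore:
    """Keeps relations between entities together with their provenance."""

    def __init__(self):
        self._statements = []

    def add(self, subject, predicate, obj, run_id, source):
        statement = Statement(subject, predicate, obj, run_id, source)
        if statement not in self._statements:
            self._statements.append(statement)

    def merge(self, other):
        """Combine results from another run into this store."""
        for statement in other._statements:
            self.add(*statement)

    def related(self, subject):
        """All distinct (predicate, object) pairs known for a subject."""
        return sorted({(s.predicate, s.obj) for s in self._statements
                       if s.subject == subject})

    def trail(self, subject, predicate, obj):
        """The 'trail to evidence': every run and source behind a relation."""
        return [(s.run_id, s.source) for s in self._statements
                if (s.subject, s.predicate, s.obj) == (subject, predicate, obj)]


if __name__ == "__main__":
    EZH2 = "http://example.org/protein/EZH2"       # hypothetical identifier
    DISEASE = "http://example.org/disease/A"       # hypothetical identifier
    run1, run2, combined = EvidenceStore(), EvidenceStore(), EvidenceStore()
    run1.add(EZH2, "associated_with", DISEASE, "run-1", "doc1")
    run2.add(EZH2, "associated_with", DISEASE, "run-2", "doc7")
    combined.merge(run1)
    combined.merge(run2)
    print(combined.related(EZH2))
    print(combined.trail(EZH2, "associated_with", DISEASE))
```

The unique identifier is what makes merging possible: two runs that name the same protein differently would produce two unrelated subjects.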
“Collaboration through web services” 2: Bio-text mining expert Martijn Schuemie
Workflows for text mining ‘pipeline’ (BioAID): named entity recognition
Training document → Manual annotation → Annotated text that provides examples (N × …sentence <concept x>entity</concept x> sentence…) → Learn → Learned model (readable patterns, or a black box of unreadable conditions: unreadable condition = true => <concept x>entity</concept x>) → Extract named entities and relations → List of named entities
Requirements
Training documents and annotator(s) (or pre-annotated text)
List of concepts to annotate with (possibly from an ontology)
A corpus of interest
‘Generalise’ examples per concept. Inputs: list of concepts; annotated sentences (training); corpus (e.g. MEDLINE). Output: <concept=name>entity</concept>, frequency, doc ID, …
Advantages
Concepts of choice
Quality of output under our control
Limitations
Output is limited by initial list of concepts
Substantial amount of manual work (annotation)
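The train-then-extract pipeline described above can be sketched in a few lines. Note the substitution: a plain dictionary of annotated entity strings stands in for the learned model, so this shows only the data flow (annotated text in, list of entities with frequency and document IDs out), not a machine learning method. The tag format, function names and toy sentences are assumptions for this example.

```python
import re
from collections import defaultdict

# Matches annotations of the form <concept>entity</concept>.
TAG = re.compile(r"<(?P<concept>\w+)>(?P<entity>.*?)</(?P=concept)>")


def learn(annotated_sentences):
    """'Learn' a model: here simply the set of annotated entities per concept."""
    model = defaultdict(set)
    for sentence in annotated_sentences:
        for match in TAG.finditer(sentence):
            model[match.group("concept")].add(match.group("entity").strip())
    return dict(model)


def extract(model, corpus):
    """Return rows (concept, entity, frequency, doc_ids) found in the corpus."""
    rows = []
    for concept in sorted(model):
        for entity in sorted(model[concept]):
            pattern = re.compile(r"\b" + re.escape(entity) + r"\b")
            counts = {doc_id: len(pattern.findall(text))
                      for doc_id, text in corpus.items()}
            doc_ids = sorted(d for d, n in counts.items() if n)
            if doc_ids:
                rows.append((concept, entity, sum(counts.values()), doc_ids))
    return rows


if __name__ == "__main__":
    training = [
        "The <protein>EZH2</protein> gene is studied.",
        "<protein>EED</protein> binds <protein>EZH2</protein>.",
    ]
    corpus = {
        "d1": "EZH2 and EED form a complex; EZH2 is the catalytic subunit.",
        "d2": "Nothing relevant here.",
    }
    for row in extract(learn(training), corpus):
        print(row)
```

The sketch also makes the first limitation visible: nothing outside the annotated concepts and entities can ever appear in the output.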
Semantic networks
Modelling support. Examples: Epigenetics (paramutation), courtesy of Maike Stam; Cell division (Escherichia coli), courtesy of Tanneke den Blauwen; HIV. [Figure: paramutation model; labels include TF, M, RDRP, RdDM, Pol, B', reinforcement, repression]
Reuse and share knowledge: reuse and share biological knowledge (MedLine)
Conclusions
Conclusions and discussion
e-science approach to text mining
myExperiment.org
‘mySpace’ for computational scientists
Reach out to end-users
Workflow and web services
From ‘black box perl’ to ‘computational experiments’
The best service is good; a special service is better
Share with users and other developers on text mining network on myExperiment.org
Conclusion: e-Science and sharing
Why adopt e-science?
Why should I adopt e-Science? “I don’t believe in e-Science, I believe in Me-Science”
Why adopt e-science? For determined sinners: ‘The seven deadly sins of bioinformatics’ by Carole Goble, http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics/
Acknowledgements
AIDA developers team: Sophia Katrenko, Edgar Meij, Willem van Hage, Frans Verster, (Machiel Jansen), Scott Marshall.
Guus Schreiber, Maarten de Rijke, Pieter Adriaans
Jan Top, Nicole Koenderink, Food informatics, Wageningen University
Martijn Schuemie, Erasmus University Rotterdam
OMII-UK and myGrid team, especially Katy Wolstencroft, Stian Soiland, Stuart Owen, Andy Gibson, Alan Rector, Robert Stevens, Carole Goble
W3C Semantic Web Health Care and Life Sciences Interest Group
Hideaki Sugawara, Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics (http://xml.nig.ac.jp)
This work was supported by the Dutch Ministry of Economic Affairs via VL-e and BioRange (BSIK grants)
How does beer relate to BioAID? And how do we relate it to beer?
Thank you for your attention… Join the text mining network on myExperiment.org!!!
AID information and resources: http://adaptivedisclosure.org
W3C Semantic Web Health Care and Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/
BioAID workflows available from http:// .org