The ONDEX data integration framework
  BOSC2007, Vienna, 20.07.2007
  Jan Taubert ([email_address]), Rothamsted Research, UK
Summary
  ONDEX: framework for large-scale data integration, text mining and graph analysis
  Java API and standalone application
  License: GNU General Public License
  Project status: early alpha
  Statistics: 814 files, 159572 lines
  http://ondex.sourceforge.net
Members
  University of Bielefeld, Bielefeld, Germany
  University of Koblenz, Koblenz, Germany
  University of Nottingham, Nottingham, Nottinghamshire, UK
  University of Tromsø, Tromsø, Norway
  University of Wageningen, Wageningen, Netherlands
  Rothamsted Research, Harpenden, Hertfordshire, UK
  Current members: Jan Baumbach, Sonja Ernst, Keywan Hassani-Pak, Matthew Hindle, Berend Hoekman, Jacob Köhler, Artem Lysenko, Stephan Philippi, Chris Rawlings, Jan Taubert, Paul Verrier, Jochen Weile, Rainer Winnenburg, Tully Yates
  Former members: Jessica Butz, Sebastian Elsner, Ina Kupp, Alexander Rüegg, Klaus Peter Sieren, Andre Skusa, Michael Specht
Based on
  Java J2SE 5.0
  Berkeley DB Java Edition
  XFire SOAP framework
  Jetty WebServer
  Lucene text search engine
  Taverna for workflows
Motivation: ONDEX combines large experimental data with 100s of bio-databases to gain new insights
Everything is a network
  … in which nodes and edges can have different properties.
  Enzyme kinetics, protein interactions, metabolic pathways, protein structure, relation properties, ontologies
Ontology based graph
  ONDEX: graph of concepts and relations
  Biology: protein interaction network
  Ontology of concept classes, relation types and additional properties
  Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum, …
  Protein-ligand interaction network: Protein, Protein, Ligand; interact, interact
  Concept, Concept, Concept; Relation, Relation
  Concept Class: Protein, Protein, Ligand
  Relation Type: interact, interact
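The ontology based graph on this slide can be pictured with a minimal Java sketch. It is not the ONDEX API: every class, method and example name below (`OntologyGraphSketch`, `Concept`, `Relation`, "Protein A", "Ligand X" and so on) is hypothetical and chosen only for illustration. It shows concepts typed by a concept class, relations typed by a relation type, and free-form properties on both, using the protein-ligand network from the slide (two proteins, one ligand, two "interact" relations).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and NOT the ONDEX API. */
class OntologyGraphSketch {

    /** A node, typed by a concept class and carrying arbitrary properties. */
    static final class Concept {
        final String name;
        final String conceptClass;
        final Map<String, String> properties = new LinkedHashMap<>();

        Concept(String name, String conceptClass) {
            this.name = name;
            this.conceptClass = conceptClass;
        }
    }

    /** An edge between two concepts, typed by a relation type. */
    static final class Relation {
        final Concept from;
        final Concept to;
        final String relationType;
        final Map<String, String> properties = new LinkedHashMap<>();

        Relation(Concept from, Concept to, String relationType) {
            this.from = from;
            this.to = to;
            this.relationType = relationType;
        }
    }

    private final List<Concept> concepts = new ArrayList<>();
    private final List<Relation> relations = new ArrayList<>();

    Concept addConcept(String name, String conceptClass) {
        Concept c = new Concept(name, conceptClass);
        concepts.add(c);
        return c;
    }

    Relation addRelation(Concept from, Concept to, String relationType) {
        Relation r = new Relation(from, to, relationType);
        relations.add(r);
        return r;
    }

    /** All concepts whose concept class equals the given name. */
    List<Concept> conceptsOfClass(String conceptClass) {
        List<Concept> result = new ArrayList<>();
        for (Concept c : concepts) {
            if (c.conceptClass.equals(conceptClass)) {
                result.add(c);
            }
        }
        return result;
    }

    /** All relations whose relation type equals the given name. */
    List<Relation> relationsOfType(String relationType) {
        List<Relation> result = new ArrayList<>();
        for (Relation r : relations) {
            if (r.relationType.equals(relationType)) {
                result.add(r);
            }
        }
        return result;
    }

    /** Builds the protein-ligand network from the slide with made-up names. */
    static OntologyGraphSketch example() {
        OntologyGraphSketch g = new OntologyGraphSketch();
        Concept proteinA = g.addConcept("Protein A", "Protein");
        Concept proteinB = g.addConcept("Protein B", "Protein");
        Concept ligandX = g.addConcept("Ligand X", "Ligand");
        ligandX.properties.put("compound name", "hypothetical compound");
        g.addRelation(proteinA, proteinB, "interact");
        g.addRelation(proteinB, ligandX, "interact");
        return g;
    }

    public static void main(String[] args) {
        OntologyGraphSketch g = example();
        System.out.println(g.conceptsOfClass("Protein").size() + " proteins, "
                + g.relationsOfType("interact").size() + " interactions");
    }
}
```

Keeping the concept class and relation type as data rather than as Java subclasses is one way to let new kinds of nodes and edges be added without changing the graph code; whether ONDEX itself does this is not stated on the slides.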
 
One example application [figure labels: microarray experiment result, analysis, map, 100s of Bio-Databases]
[Figure labels: Comp, Protein, Gene, Enzyme, EC, Treatment, Reaction, Pathway]
Treatments from DRASTIC; pathways from KEGG
How to contribute
  Three steps:
    1. Try it out and submit the bugs you find
    2. Suggest or implement your improvements
    3. Become a contributor and submit improvements
  Contributors will be acknowledged on the project website and in publications involving their work.
  Contributors are welcome to publish their work on ONDEX under their own names.
  http://ondex.sourceforge.net
What to contribute
  Good: write a flat-file parser for ONDEX
  Better: provide your database in OXL (see Taubert et al. (2007) “Exchange of integrated datasets – the OXL format”, in press, IB2007)
  Also welcome: provide your database in another standard (BioPAX, SBML, XGMML; but this may result in loss of information)
  http://ondex.sourceforge.net
What to contribute
  Algorithms: needed for the alignment of integrated data
  Core: improve the persistence layer of the ontology based graph
  Exporter: provide your own exchange standard
  Web services: increase compatibility
  http://ondex.sourceforge.net
What to contribute
  Support: OXL in your application
  Connect: import from web service or directly from Core API
  Algorithms: graph analysis using the ONDEX Visualisation and Analysis Tool Kit (OVTK)
  Feedback and feature requests: mailing lists and Sourceforge.net
  http://ondex.sourceforge.net
Sourceforge.net
  Jun 06 to Jun 07: 1828 downloads, 15289 page views
  Current SF.net rank: 1074
  Subversion activity, Jan 07 to Jun 07: 8904 reads, 2072 writes, 4821 file uploads
  Developer mailing list: [email_address]
  User mailing list: [email_address]
  Current release: 0.9alpha1
Publications
  J Taubert, R Winnenburg, M Hindle, J Weile, J Baumbach, S Philippi, C Rawlings and J Köhler (2007) “Data integration, information filtering and knowledge extraction with ONDEX”, Paper in preparation
  J Taubert, K P Sieren, M Hindle, B Hoekman, R Winnenburg, S Philippi, C Rawlings and J Köhler (2007) “Exchange of integrated datasets – the OXL format”, Submitted Paper, 4th integrative bioinformatics workshop (IB2007)
  Jacob Köhler, Stephan Philippi, Michael Specht and Alexander Rüegg (2006) "Ontology based text indexing and querying for the semantic web", Knowledge-Based Systems, Volume 19, Issue 8
  Jacob Köhler, Jan Baumbach, Jan Taubert, Michael Specht, Andre Skusa, Alexander Rüegg, Chris Rawlings, Paul Verrier and Stephan Philippi (2006) "Graph-based analysis and visualization of experimental results with ONDEX", Bioinformatics 22(11)
  Skusa, A., Rüegg, A., Köhler, J. (2005) "Extraction of biological networks from scientific literature", Briefings in Bioinformatics 6(3)
  Köhler, J., Rawlings, C., Verrier, P., Mitchell, R., Skusa, A., Rüegg, A. and Philippi, S. (2004), "Linking experimental results, biological networks and sequence analysis methods using Ontologies and Generalized Data Structures", In Silico Biol, Volume 5, Special Issue: Ontology and Genome, Manuscript number in online journal: 0005.
Acknowledgements
  Centre for Mathematical and Computational Biology
  Department of Biomathematics and Bioinformatics, Rothamsted Research
  Dr Jacob Köhler, Principal Investigator
  Prof Chris Rawlings, Head of Department
  Rothamsted Research is supported by the BBSRC
  Travel grants and scholarships by
4th Integrative Bioinformatics workshop
  10th to 12th September 2007, University of Ghent, Belgium
  http://www.rothamsted.bbsrc.ac.uk/bab/conf/ib07/
  Invited speakers:
    Prof Carole Goble, School of Computer Science, University of Manchester, UK
    Prof Søren Brunak, BioCentrum-DTU, Technical University of Denmark, Denmark
    Dr David Searls, Senior Vice President, Informatics, GlaxoSmithKline Pharmaceuticals, USA
    Dr Luis Serrano, EMBL, Heidelberg, Germany
  Organising committee:
    Prof Ralf Hofestädt, University of Bielefeld, Germany (Co-chair)
    Dr Jacob Koehler, Rothamsted Research, UK (Co-chair)
    Prof Martin Kuiper, University of Ghent, Belgium (Local organisation)
    Paul Verrier, Rothamsted Research, UK (Local organisation)
  Poster submission deadline: 27th August 2007
  Registration deadline: 13th August 2007
