Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source, and Open Standards

My presentation at the "Open Drug Discovery and Open Notebook Science" session at the GDCh-Wissenschaftsforum Chemie 2009 in Frankfurt.

Presentation Transcript

  • Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards
    Egon Willighagen <http://chem-bla-ics.blogspot.com/>
    Bioclipse & Proteochemometric Group (Prof. Wikberg), Department of Pharmaceutical Biosciences, Uppsala University
    2009-08-31
  • The Setting...
    1998: Organic chemistry... beautiful science! But ... why, how, what, ...
    PJJA Buijnsters et al., Eur.J.Org.Chem, 2002, 1397–1406
    Outline: Problem, Solution, Results, Discussions, Conclusion
    2009-08-31, Bioclipse & Proteochemometric Group, Egon Willighagen | chem-bla-ics.blogspot.com
  • Reliable Knowledge: Trust
    How to build Trust: track record
  • Knowledge: Trust
    How to build Trust: track record; transparency: citation
  • Knowledge: Trust
    How to build Trust: track record; transparency: citation; reproducibility: details
  • Knowledge: Trust
    How to build Trust: track record; transparency: citation; reproducibility: details
    Open {Data|Standards|Source|...}
  • Knowledge Representation...
    What are the organic normal conditions?
  • The Problem: Reproducibility...
    Where reproducibility is severely hampered: recalculate basic atom and bond properties; access to QSAR/QSPR data; well-defined algorithms; publications destroy information
  • Solutions...
    Openness: license that allows modification and redistribution; hiding behind public domain is not helpful
    Semantic Web: be explicit in what you mean, both in facts and in algorithms
  • Reproducibility needs ODOSOS
    Open Data: No Intellectual Monopoly
    Open Source: algorithms are complex; implementations even more; strong interaction with representation
    Open Standards: Semantic Web; formats; unique identifiers
    http://en.wikipedia.org/wiki/Glyn_Moody
  • Jmol
    Started in 1997 by Dan Gezelter (Notre Dame)
    Leaders: Bradley Smith, me, Miguel Howard, Bob Hanson
    E.L. Willighagen, M. Howard, Nature Precedings, 2005
    http://www.jmol.org/
  • The Chemistry Development Kit
    A Family of Projects: CDK-Taverna (chemoinformatics workflows); JChemPaint (semantic 2D editor); ChemoJava (GPL-ed extension)
    Goals: library of cheminformatics algorithms; educational
    Usage: CDK 2003: 75+ times cited in literature; Bioclipse, KNIME, Jumbo (CML), AMBIT, ...
    C. Steinbeck et al., J.Chem.Inf.Comput.Sci, 2003
    C. Steinbeck et al., Curr.Pharm.Design, 2006
  • CDK: an Open Project
    Features: open mailing list and bug tracker; open source repository; release soon, release often
    Offer Review: senior developers review patches
  • Bioclipse
    O. Spjuth et al., BMC Bioinformatics 2007, 8:59
  • Integration
    Services: databases: PubChem; web services; Google Spreadsheets; MyExperiment.org: Bioclipse Scripting Language; Twitter, ...; journals, ...
    Techniques: SOAP, REST, XMPP, ...; Resource Description Framework; dedicated APIs
  • MyExperiment: Bioclipse Scripting Language
  • XMPP
    XMPP: Jabber protocol; alternative to HTTP
    Features: asynchronous; XML-based: improved semantics
    J. Wagener et al., BMC Bioinformatics, 2009, in production
  • Resource Description Framework
    Facts as Triples: subject; predicate (relation); object
    Examples:
      wp:Benzene chem:hasSMILES "c1ccccc1"
      wp:Benzene owl:sameAs chemspider:123
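
    The two example facts on this slide can be written out as subject/predicate/object tuples and serialized to N-Triples. The following is a minimal Python sketch, not the author's code. The slide does not define its prefixes, so the namespace URIs for wp:, chem: and chemspider: below are assumptions chosen for illustration; only the owl: namespace is the standard one.

```python
# Prefix map. Only the owl: URI is a real, standardized namespace;
# the other three are placeholders assumed for this illustration.
PREFIXES = {
    "wp": "http://en.wikipedia.org/wiki/",
    "chem": "http://example.org/chem#",
    "chemspider": "http://example.org/chemspider/",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand(term):
    """Expand a prefixed name such as 'wp:Benzene' into a full URI."""
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(subject, predicate, obj, literal=False):
    """Serialize one triple as a single N-Triples line."""
    if literal:
        # Plain string literal: escape backslashes and double quotes.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = '"' + escaped + '"'
    else:
        obj_text = "<" + expand(obj) + ">"
    return "<%s> <%s> %s ." % (expand(subject), expand(predicate), obj_text)


# The two facts from the slide: (subject, predicate, object, object-is-literal).
TRIPLES = [
    ("wp:Benzene", "chem:hasSMILES", "c1ccccc1", True),
    ("wp:Benzene", "owl:sameAs", "chemspider:123", False),
]


def serialize(triples):
    """Return one N-Triples line per triple."""
    return [to_ntriples(s, p, o, lit) for s, p, o, lit in triples]


if __name__ == "__main__":
    for line in serialize(TRIPLES):
        print(line)
```

    The first fact has a literal as its object (the SMILES string), while the second links two resources, which is why the sketch tracks whether the object is a literal.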
  • OpenMolecules RDF
    http://rdf.openmolecules.net/
  • Blue Obelisk
    R Guha et al., J.Chem.Inf.Model., 2006
  • Which License?
    Choice: GPL v2 or v3, LGPL v2 or v3, Apache, BSD, MIT, ...; FDL, CC0, PDDL
    Important: redistribution, modification
    Bad Practice: not explicitly stating your intentions; Public Domain
  • Mixing Data?
    License Incompatibility: ask about the copyright holder's intention!
    Use Open Standard Interfaces: Resource Description Framework
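
    On the technical side, mixing data that is expressed as RDF triples amounts to a set union, with owl:sameAs links connecting identifiers from different sources. The Python sketch below illustrates this under stated assumptions: the two sources and the chem:hasFormula predicate are hypothetical, and the identifiers reuse the placeholder examples from the RDF slide. It says nothing about license compatibility, which still has to be checked with the copyright holders.

```python
SAME_AS = "owl:sameAs"

# Two hypothetical sources describing the same compound under different
# identifiers. chem:hasFormula is an assumed predicate, not from the slides.
SOURCE_A = {
    ("wp:Benzene", "chem:hasSMILES", "c1ccccc1"),
    ("wp:Benzene", SAME_AS, "chemspider:123"),
}
SOURCE_B = {
    ("chemspider:123", "chem:hasFormula", "C6H6"),
}


def merge(*graphs):
    """Merge graphs by taking the set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def aliases(graph, resource):
    """Return the resource plus everything reachable from it through
    owl:sameAs links, followed in both directions."""
    seen = {resource}
    todo = [resource]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p != SAME_AS:
                continue
            if s == current and o not in seen:
                seen.add(o)
                todo.append(o)
            elif o == current and s not in seen:
                seen.add(s)
                todo.append(s)
    return seen


def describe(graph, resource):
    """Collect (predicate, object) pairs stated about the resource or any
    of its owl:sameAs aliases, leaving out the sameAs links themselves."""
    names = aliases(graph, resource)
    return {(p, o) for s, p, o in graph if s in names and p != SAME_AS}


if __name__ == "__main__":
    merged = merge(SOURCE_A, SOURCE_B)
    for fact in sorted(describe(merged, "wp:Benzene")):
        print(fact)
```

    Before merging, a query on wp:Benzene only sees the SMILES string; after merging, the formula stated about chemspider:123 becomes visible through the sameAs link.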
  • Conclusions
    No Intellectual Monopoly Achieved: Jmol, CDK, JChemPaint, Bioclipse; a huge success!
    Open Data in chemistry is still way behind: Open Access trap; Public Domain trap
    Semantics is showing up: in RDF; in Publishing
  • The Details
    http://www.citeulike.org/user/egonw/tag/papers
    http://chem-bla-ics.blogspot.com
    mailto:egon.willighagen@farmbio.uu.se