Linking Open Drug Data to Cheminformatics and Proteochemometrics


My talk at SWAT4LS 2009 in Amsterdam.

Published in: Health & Medicine, Education
Transcript of "Linking Open Drug Data to Cheminformatics and Proteochemometrics"

  1. Linking Open Drug Data to Cheminformatics and Proteochemometrics
     Egon Willighagen <http://chem-bla-ics.blogspot.com/>
     Bioclipse & Proteochemometric Group (Prof. Wikberg)
     Department of Pharmaceutical Biosciences, Uppsala University
     2009-11-20
  2. Knowledge...
     We model our world, but ...
     Life is not uni- or bivariate
     Knowledge is not either
     But we think of it as such
     Information Loss!
     Solanum lycopersicum...
  3. Names...
     benzene
     3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid
     InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)
  4. ... Molecular reality...
     1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
  5. ... and Numbers
  6. Knowledge Representation: Information Loss
  7. Data Analysis
  8. Proteochemometrics
  9. Main Theme
     How do we navigate dimensionality space?
     How do we include prior knowledge?
     While minimizing information loss?
     With optimal knowledge extraction?
     And maximizing interpretability?
     Without ending up in random correlation?
  10. OpenMolecules RDF: dereferenceable URI
      http://rdf.openmolecules.net/
  11. OpenMolecules RDF: linked data
      http://rdf.openmolecules.net/
  12. The Chemistry Development Kit
      A Family of Projects
        CDK-Taverna (chemoinformatics workflows)
        JChemPaint (semantic 2D editor)
        ChemoJava (GPL-ed extension)
      Goals
        library of cheminformatics algorithms
        educational
      Usage
        CDK: 100+ times cited in scientific literature
        Bioclipse, KNIME, Jumbo (CML), AMBIT, ...
      C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
      C. Steinbeck et al., Curr. Pharm. Design, 2006
  13. Bioclipse
      O. Spjuth et al., BMC Bioinformatics 2007, 8:59
  14. Integration
      Services
        databases: PubChem
        web services
        Google Spreadsheets
        MyExperiment.org: Bioclipse Scripting Language
        Twitter, ...
        journals, ...
      Techniques
        SOAP, REST, XMPP, ...
        Resource Description Framework
        dedicated APIs
  15. Bioclipse-RDF
      local RDF storage
      read/write RDF/XML, N3
      run SPARQL queries (local and remote)
      extract RDF from XHTML/RDFa
      Thanks to Jena and Pellet.
  16. Quote of the Day
      "There are too many people doing data integration, this is a waste of a lot of smart people’s time"
      @alanruttenberg at #swat4ls2009, via dullhunk on Twitter
  17. SPARQL endpoints
      GNU FDL
        NMRShiftDB data (also available via Bio2RDF)
      CC0
        ChemPedia
        Open Notebook Science Solubility
  18. Names 2 Graphs 2 Numbers...
  19. Disease 2 PDB
  20. CDK as RDF
        model1:atom1
            a cdk:Atom ;
            cdk:hasFormalCharge "1" ;
            cdk:symbol "O" .
        model1:atom2
            a cdk:Atom ;
            cdk:symbol "C" .
        model1:mol1
            a cdk:Molecule ;
            dc:title "Methanol" ;
            owl:sameAs <http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2
            cdk:hasAtom model1:atom2 , model1:atom1 ;
            cdk:hasBond model1:bond1 .
  21. Proteochemometrics
  22. OWL for Descriptors
      Used for model and data.
  23. MyExperiment: Bioclipse Scripting Language
  24. What does this bring us?
      Platform to integrate the RDF with the computation world
      Bioclipse as single point of access
      Scripting, sharing of scripts with MyExperiment.org
      Bridge the nominal with the numerical world
  25. Where next?
      Framework
        Triple generation on demand (XMPP, SADI, ...)
        Ontology alignments
        Semantic Mediawiki integration
      Proteochemometrics
        Knowledge discovery
        Data set aggregation
        Automated model validation
  26. The Details
      http://www.citeulike.org/user/egonw/tag/papers
      http://chem-bla-ics.blogspot.com
      http://egonw.github.com
      waveto: egon.willighagen@googlewave.com
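
Slides 3 to 5 and 18 describe a path from names to graphs to numbers. As a rough illustration of that idea only, the sketch below splits an InChI string into its layers and turns the formula layer into element counts, which is about the simplest numerical description one can derive from a name. It is not how the CDK or Bioclipse compute descriptors; the function names `split_inchi` and `count_atoms` are hypothetical, and the formula parser assumes a single-component formula without multipliers or dots.

```python
import re


def split_inchi(inchi):
    """Split an InChI string into (version, formula, layers).

    `layers` maps each one-letter layer tag (for example 'c' for
    connections, 'h' for hydrogens) to the text that follows it.
    """
    prefix, sep, body = inchi.partition("=")
    if prefix != "InChI" or not sep:
        raise ValueError("not an InChI string: %r" % inchi)
    parts = body.split("/")
    if len(parts) < 2:
        raise ValueError("InChI has no formula layer: %r" % inchi)
    version, formula = parts[0], parts[1]
    layers = {part[0]: part[1:] for part in parts[2:] if part}
    return version, formula, layers


def count_atoms(formula):
    """Count atoms per element in a simple formula such as 'CH4O'.

    Assumption: one component, no leading multiplier, no '.' separators.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


if __name__ == "__main__":
    # The methanol identifier as it appears (cut short) on slide 20.
    version, formula, layers = split_inchi("InChI=1/CH4O/c1-2")
    print(version, formula, layers, count_atoms(formula))
```

Element counts discard the connectivity held in the `c` layer, which is a small instance of the information loss that slide 6 warns about.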
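
Slide 15 mentions running SPARQL queries and slide 20 shows a molecule as RDF. To make the triple model behind both concrete, here is a minimal sketch that stores the slide 20 statements as plain Python tuples and answers a two-step pattern query ("which atoms does this molecule have, and what are their symbols?"). This is a stand-in for illustration, not the Bioclipse-RDF or Jena API: `match` and `atom_symbols` are hypothetical helpers, Turtle's `a` is written out as `rdf:type`, and the `owl:sameAs` statement is left out because its URI is cut off in the transcript.

```python
# Statements from slide 20, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("model1:atom1", "rdf:type", "cdk:Atom"),
    ("model1:atom1", "cdk:hasFormalCharge", "1"),
    ("model1:atom1", "cdk:symbol", "O"),
    ("model1:atom2", "rdf:type", "cdk:Atom"),
    ("model1:atom2", "cdk:symbol", "C"),
    ("model1:mol1", "rdf:type", "cdk:Molecule"),
    ("model1:mol1", "dc:title", "Methanol"),
    ("model1:mol1", "cdk:hasAtom", "model1:atom2"),
    ("model1:mol1", "cdk:hasAtom", "model1:atom1"),
    ("model1:mol1", "cdk:hasBond", "model1:bond1"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    pattern = (s, p, o)
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def atom_symbols(triples, molecule):
    """Join two patterns: molecule -> atoms, then atom -> element symbol."""
    symbols = {}
    for _, _, atom in match(triples, s=molecule, p="cdk:hasAtom"):
        for _, _, symbol in match(triples, s=atom, p="cdk:symbol"):
            symbols[atom] = symbol
    return symbols


if __name__ == "__main__":
    print(atom_symbols(TRIPLES, "model1:mol1"))
```

A SPARQL engine does the same kind of pattern matching and joining, with variables in place of the `None` wildcards, over data that may live on a remote endpoint.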