
I Don't Care Where My Data and Methods Are: A Web-Service Approach for Distributed Access to Methods, Data and Models

Published in: Education, Technology

  1. A Web-Service Approach for Distributed Access to Methods, Data and Models (I Don’t Care Where My Data and Methods Are)
     Rajarshi Guha, Geoffrey Fox, Kevin E. Gilbert, Marlon Pierce, David Wild
     School of Informatics, Indiana University
     235th ACS National Meeting, 6th March, 2008
  2. Outline
     • Overview
     • Pub3D
     • Model Exchange
  3. Local versus Distributed Resources
     • Access arbitrary resources (methods, data, applications)
     • All resources look like function calls
  4. Combining Data & Functionality
     • But what do we mean by “combining” all these resources?
     • Different levels of complexity
         • Keep track of new additions to a database via an RSS feed
         • Use Yahoo Pipes to combine and manipulate output easily
         • Write full-fledged programs in your language of choice
     • Web services allow us to support all these activities in a uniform manner
  5. Web Services - What’s Available
     • Cheminformatics functionality
         • Molecular descriptors
         • Similarity (2D, 3D)
         • Format conversion
         • Depictions
         • 3D coordinate generation
     • Summarized on ChemBioGrid
     Dong, X. et al., J. Chem. Inf. Model., 2007, 47, 1303–1307
     http://www.chembiogrid.org/projects/proj_core.html
  6. Web Services for Modeling
     • Computational web services can be viewed as wrappers around the actual program
     • Since predictive models are a common feature in cheminformatics, we’d like to support them as well
     • This leads to a number of requirements
         • Ability to develop models
         • Store (deploy) models
         • Use the models via the web service infrastructure
     • We provide a computational infrastructure based on R which supports
         • Feature selection
         • Model development
         • Model deployment
         • Arbitrary R code
     Guha, R., J. Chem. Inf. Model., 2008, 48, 456–464
  7. What’s Available?
     • Regression (OLS, CNN, RF)
     • Classification (LDA)
     • Clustering (k-means)
     • Feature selection (stepwise and exhaustive)
     • Automated model generation
         • Load X, Y data, build linear and non-linear models with optional LOO CV
     • Deployed models
         • Random forests for 60 NCI DTP cell lines
         • Cytotoxicity
         • Ames mutagenicity
     Wang, H. et al., J. Chem. Inf. Model., 2007, 47, 2063–2076
     Guha, R. and Schürer, S., J. Comp. Aid. Molec. Des., 2008, ASAP
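Slide 7 mentions building models with optional leave-one-out cross-validation (LOO CV). The infrastructure described in the talk is R-based and its interface is not shown on the slides, so the following is only a minimal Python sketch of the idea for a single-descriptor OLS model. The function names `fit_ols` and `loo_q2` and the sample data are hypothetical.

```python
def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without observation i, then predict observation i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_ols(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Made-up illustration data (one descriptor, one observed property).
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.2, 4.8, 7.1, 9.0]
print(f"LOO q^2 = {loo_q2(descriptor, observed):.3f}")
```

A q² close to 1 suggests the model predicts held-out points well; for multi-descriptor or non-linear models the same loop applies with a different fitting routine.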
  8. Web Services - What’s Available
     • Data sources
     • We maintain a number of databases
         • PubChem mirror (mainly for local research)
         • Pub3D
     • Directly accessible via queries in SQL
     • We also encapsulate specific queries as web service calls
     http://www.chembiogrid.org/projects/proj_db.html
  9. REST Frontends to SOAP Services
     • REST is a network architecture that avoids complex message formats
         • No extra libraries required
         • Access services using URLs
         • Much simpler interface compared to SOAP
     • We’ve been putting REST interfaces onto a number of our SOAP services
     • Current REST services include
         • Database (3D structure, PubChem mirror)
         • Molecular descriptors
         • Depictions
  10. REST 3D Structure Service
     • To get a 3D structure for a PubChem CID
     http://www.chembiogrid.org/cheminfo/rest/db/pub3d/2244
  11. REST Depiction Service
     • To get a 2D depiction for arbitrary SMILES
     http://www.chembiogrid.org/cheminfo/rest/depict/C(=O)OCC
  12. REST Descriptor Service
     • To get descriptors for an arbitrary SMILES string
     http://www.chembiogrid.org/cheminfo/rest/desc/descriptors/CC(=O)OCCN
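The three REST services on slides 10 to 12 share one URL pattern, which can be sketched in Python as below. The helper names are hypothetical. Percent-encoding the SMILES string is an assumption (the slides show raw SMILES in the URL), made because SMILES can contain characters such as `/` and `#` that are not safe in a URL path. The sketch only builds URLs and makes no network call, since the services may no longer be reachable.

```python
from urllib.parse import quote

# Base path taken from the URLs shown on the slides.
BASE = "http://www.chembiogrid.org/cheminfo/rest"


def structure_url(cid):
    """URL for the 3D structure of a PubChem compound ID (CID)."""
    return f"{BASE}/db/pub3d/{int(cid)}"


def depiction_url(smiles):
    """URL for a 2D depiction of a SMILES string (percent-encoding assumed)."""
    return f"{BASE}/depict/{quote(smiles, safe='')}"


def descriptor_url(smiles):
    """URL for the descriptors of a SMILES string (percent-encoding assumed)."""
    return f"{BASE}/desc/descriptors/{quote(smiles, safe='')}"


print(structure_url(2244))
print(depiction_url("C(=O)OCC"))
print(descriptor_url("CC(=O)OCCN"))
```

Because each resource is just a URL, the result could be fetched with any HTTP client (for example `urllib.request.urlopen`), which is the "no extra libraries required" point made on slide 9.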
  13. Pub3D
     • A 3D structure database derived from PubChem
     • Current version contains a single 3D structure for 17M compounds
         • Structures obtained using MMFF94
         • Not the lowest energy conformer
     • Structures can be retrieved in SD format by CID using a web page or web service interfaces
     • We also store distance moment shape descriptors, allowing us to perform shape similarity searches
     http://www.chembiogrid.org/cheminfo/p3d/
     Ballester, P.J. and Graham Richards, W., J. Comp. Chem., 2007, 28, 1711–1723
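The distance moment shape descriptors cited on slide 13 (Ballester and Richards) summarize a 3D structure by the moments of its atom distances to a few reference points. The sketch below follows that general scheme in plain Python. The choice of four reference points, the use of standard deviation and a signed cube root for the second and third moments, and the function names are assumptions for illustration; the slides do not state the exact variant stored in Pub3D.

```python
import math  # math.dist requires Python 3.8+


def _moments(distances):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(distances)
    mean = sum(distances) / n
    spread = math.sqrt(sum((d - mean) ** 2 for d in distances) / n)
    third = sum((d - mean) ** 3 for d in distances) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, spread, skew]


def shape_descriptor(coords):
    """12 numbers: 3 moments of atom distances to each of 4 reference points."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    closest = min(coords, key=lambda c: math.dist(c, centroid))
    farthest = max(coords, key=lambda c: math.dist(c, centroid))
    far_from_far = max(coords, key=lambda c: math.dist(c, farthest))
    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def shape_similarity(a, b):
    """Inverse of (1 + mean absolute difference); 1.0 means identical descriptors."""
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + manhattan / len(a))


# Made-up coordinates for two small "molecules" (atom positions only).
mol_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.2, 0.0), (3.4, 1.1, 0.8), (0.2, -1.3, 0.5)]
mol_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(shape_similarity(shape_descriptor(mol_a), shape_descriptor(mol_b)))
```

Because the descriptor is a short fixed-length vector built only from distances, it does not change when the molecule is translated or rotated, and a shape search reduces to a nearest-neighbor query over those vectors.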
  14. Pub3D Performance
     • Shape searches can be as fast as 5 s, for reasonably large result sets
     • Fast enough for us to explore the “density of space” around a given query compound
     [Figure: Similarity Query Times for 10000 Random CIDs (R = 0.4); x axis: Time (sec), y axis: Number of Queries]
     [Figure: Nearest Neighbor Count versus Radius for Levitra (110634), Diazepam (3016), Taxol (36314), Didanosine (50599) and Calcascorbin (6247); top axis ticks: 1, 0.91, 0.83, 0.77, 0.71, 0.67, 0.62]
     Guha, R. et al., J. Chem. Inf. Model., 2006, 46, 1713–1722
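The tick values 1, 0.91, 0.83, 0.77, 0.71, 0.67, 0.62 that accompany the radius axis on slide 14 are consistent with converting a search radius R into a similarity via 1 / (1 + R). That reading is inferred from the tick values and is not stated on the slide; the function name below is hypothetical.

```python
def radius_to_similarity(radius):
    """Map a descriptor-space search radius to a similarity in (0, 1]."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return 1.0 / (1.0 + radius)


# Print the conversion for a few radii from the plot's range.
for r in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"R = {r:.1f} -> similarity = {radius_to_similarity(r):.2f}")
```

Under this reading, the R = 0.4 used for the query-time benchmark corresponds to a similarity cutoff of roughly 0.71.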
  15. Pub3D & Clustering
     • Indexing gives good performance
     • As index size increases, performance degrades
         • Could add more RAM
     • Clustering the database allows us to scale to significantly larger collections
  17. Pub3D and Conformers
     • Single, arbitrary conformers aren’t too useful
     • We have currently generated conformers for a subset of PubChem (4 to 10 heavy atoms)
         • 3 Kcal energy window
         • 243,892 compounds ≈ 2M conformers
     • Clustering will be vital when conformers are considered
         • Allows us to handle arbitrary numbers of conformers
         • May need to consider some sort of compression
         • Could use cluster information to optimize queries
  18. Model Exchange
     • The literature contains many published models
         • No way to utilize them unless we manually rebuild them
         • In many cases we will not have access to descriptors
         • Difficult to search for models specifically
     • We should be able to do the following
         • Search for models
         • Exchange them
         • Execute them
     • Is this a “format” issue? To some extent, yes
  20. PMML for Model Exchange
     • Predictive Model Markup Language
     • A standard supported by the Data Mining Group
     • Allows you to serialize predictive models to XML
         • Linear, logistic regression
         • Tree models
         • Neural networks, Naïve Bayes models
         • Association models
         • Ensemble models (random forests, arbitrary ensembles)
     • Supported by a number of platforms
         • IBM, Salford Systems
         • SAS, SPSS
         • R
  21. PMML for Model Exchange
     • Since it’s XML, it’s extensible
     • PMML allows us to specify the model itself
     • But we need to add extra information to make a model truly searchable
         • Provenance (who made it, when was it made, ...)
         • Property (what is being modeled)
         • Requirements (what are the inputs, how to get them)
  22. PMML for Model Exchange
     • Provenance
         • Easily solved using Dublin Core
     • Property
         • Could be addressed using keywords
         • Could reuse pre-existing ontologies
     • Requirements
         • This is tricky
         • Need to identify what descriptors were used
         • What software, version
         • How to evaluate the descriptors (if possible)
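One way to picture the provenance, property and requirements metadata from slides 21 and 22 is as an `Extension` block inside the PMML `Header`, with Dublin Core elements for provenance. The Python sketch below is an illustration only, not the authors' format: the `Requirements` and `Descriptor` elements, the extension name, the function name and all sample values are hypothetical, the PMML namespace declaration is omitted, and the output has not been validated against the PMML schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def annotate_model(pmml_root, creator, date, modeled_property, descriptors):
    """Attach provenance, property and requirements metadata to a PMML tree."""
    header = pmml_root.find("Header")
    if header is None:
        header = ET.Element("Header")
        pmml_root.insert(0, header)
    ext = ET.Element("Extension", {"name": "model-metadata"})  # hypothetical name
    header.insert(0, ext)  # keep the extension ahead of other Header children
    # Provenance and property expressed with Dublin Core elements.
    ET.SubElement(ext, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(ext, f"{{{DC_NS}}}date").text = date
    ET.SubElement(ext, f"{{{DC_NS}}}subject").text = modeled_property
    # Requirements: hypothetical elements naming each descriptor and its software.
    requirements = ET.SubElement(ext, "Requirements")
    for name, software, version in descriptors:
        ET.SubElement(requirements, "Descriptor",
                      {"name": name, "software": software, "version": version})
    return pmml_root


# Placeholder values throughout; none of them come from the slides.
root = ET.Element("PMML", {"version": "3.2"})
annotate_model(root, "Example Author", "2008-03-06", "aqueous solubility",
               [("XLogP", "ExampleToolkit", "1.0")])
print(ET.tostring(root, encoding="unicode"))
```

Keeping the metadata inside the model file means a repository could index the Dublin Core fields for search, while the descriptor list tells a consumer which inputs must be computed before the model can be executed.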
  25. Summary
     • Pub3D is a shape-searchable version of PubChem
         • Conformers will have to be considered for it to be useful
         • Are searches meaningful? Benchmarks required
     • Web services provide one approach to model deployment
     • We should be able to search for models explicitly
     • PMML is one extensible solution that addresses model exchange
     • Automated model execution is more challenging
  27. Acknowledgments
     • Kangseok Kim
     • NIH
