Presentation of ChemSpider at PubChem Public Meeting


An overview of ChemSpider given at the PubChem Public Advisory Board Meeting in 2007


  1. ChemSpider: Creating a Structure-Centric Community for Chemists. Antony Williams [email_address]
  2. The ChemSpider Mission
     • Build a structure-centric community for chemists by:
       - Providing an environment for structure drawing, manipulation, visualization, modeling, databasing and searching
       - Providing methods by which to deposit, curate and enhance data associated with chemical structures
       - Providing structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data
  3. Execution of the Mission (September 2007)
     • An online database of nearly 20 million structures (expected to exceed 21 million after the latest depositions)
     • Systems in place for:
       - Single-structure and data-collection depositions (in beta testing)
       - Association of analytical data with structures
       - Curation of the data in each individual record
     • Indexing of, and integration with:
       - Over 80 individual databases
       - Patents from the US and European patent offices (SureChem)
  4. Execution of the Mission (September 2007, continued)
     • Text-based searching of over 50,000 Open Access articles (110,000 have been indexed but are not yet online; structure searching is coming)
     • Over 100,000 identifiers curated
     • An average of 1,200 unique users per day
     • A series of web services giving access to a number of our capabilities
     • Multiple collaborations now in place
  5. Flexible Boolean Searching
  6. Flexible Boolean Searching
  7. Flexible Boolean Searching
  8. Search result: 49 hits in 0.8 seconds
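The Boolean searching slides are screenshots only. As a minimal sketch, assuming a tiny hypothetical in-memory record set with rounded masses, this is one way criteria such as formula, mass range and name text can be combined with AND, OR and NOT. The record fields and the helper names (`all_of`, `any_of`, `negate`) are illustrative assumptions, not ChemSpider's own interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    # Hypothetical minimal record; a real database record holds far more fields
    name: str
    formula: str
    mono_mass: float  # monoisotopic mass in u, rounded


Predicate = Callable[[Record], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND: every predicate must hold."""
    return lambda r: all(p(r) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR: at least one predicate must hold."""
    return lambda r: any(p(r) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Boolean NOT."""
    return lambda r: not pred(r)


def name_contains(text: str) -> Predicate:
    return lambda r: text.lower() in r.name.lower()


def formula_is(formula: str) -> Predicate:
    return lambda r: r.formula == formula


def mass_between(lo: float, hi: float) -> Predicate:
    return lambda r: lo <= r.mono_mass <= hi


def search(records: Iterable[Record], query: Predicate) -> List[Record]:
    return [r for r in records if query(r)]


records = [
    Record("1-butanol", "C4H10O", 74.0732),
    Record("2-butanol", "C4H10O", 74.0732),
    Record("diethyl ether", "C4H10O", 74.0732),
    Record("1-propanol", "C3H8O", 60.0575),
]

# (formula is C4H10O OR mass in 60.0-60.1) AND name contains "ol" AND NOT name contains "2-"
query = all_of(
    any_of(formula_is("C4H10O"), mass_between(60.0, 60.1)),
    name_contains("ol"),
    negate(name_contains("2-")),
)
print([r.name for r in search(records, query)])
```

Building queries from small composable predicates keeps each criterion testable on its own; a production system would instead translate the same tree into indexed database queries.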
  9. Integrated Visualization Tools
  10. Integrated Analytical Data Management for Public Domain Data
  11. Integrated Access to Open Access Literature: text-based searching of over 50,000 Open Access chemistry articles
  12. External Integrations: Google (searching across Google using InChI strings)
  13. External Integrations: Patents (SureChem portal)
  14. How do people generally use ChemSpider?
     • Searching for chemical structures via (in rank order):
       - Trade names, synonyms and registry numbers
       - Structure identifiers such as SMILES or InChI
       - Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
       - Systematic names: IUPAC or CAS Index names
     • Structure-based searching of patents
     • Text-based searching of Open Access articles
     • Generation of physicochemical properties
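Mass-based searching, noted above as common among mass spectrometrists, comes down to matching a measured mass against computed monoisotopic masses within a tolerance. A minimal sketch, assuming simple formulas without brackets or charges, a small element table, a hypothetical four-compound library and a parts-per-million (ppm) window; the helper names are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Monoisotopic masses (u) of the most abundant isotope of each element
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "Cl": 34.96885268,
}


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a simple formula such as 'C4H10O' (no brackets, hydrates or charges)."""
    counts: Dict[str, int] = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(count) if count else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    # Raises KeyError for any element missing from the small table above
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mass_search(target: float, ppm: float, library: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (name, error in ppm) for every formula within +/- ppm of the target mass."""
    hits = []
    for name, formula in library.items():
        error_ppm = (monoisotopic_mass(formula) - target) / target * 1e6
        if abs(error_ppm) <= ppm:
            hits.append((name, round(error_ppm, 2)))
    # Closest matches first; ties keep library order
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical mini-library; a real search runs against the full database
library = {
    "1-butanol": "C4H10O",
    "diethyl ether": "C4H10O",
    "butanal": "C4H8O",
    "1-propanol": "C3H8O",
}
print(mass_search(74.0732, ppm=10, library=library))
```

Isomers such as 1-butanol and diethyl ether share the formula C4H10O and therefore the same mass, which is why a mass search returns a list of candidates rather than a single answer.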
  15. Curators: An Active Community
     • Active curation is happening every day now
     • Robotic curation is underway: scripting to strip obvious errors
     • Visit the blog posts for details (
  16. Quality is a Major Issue
     • PubChem structure-identifier pairs are proliferating
     • Care is needed, or at least cleansing of the data
  17. Quality is a Major Issue
     • Other databases …
     • 1-Butyl alcohol, 1-Hydroxybutane, 1-butanol, Alcool butylique, Butan-1-ol, Butanol-1, Butanolen, Butanolo, Butyl alcohol, Butyl hydroxide, Butyl orthotitanate, Butyl titanate, Butyl titanate (IV), Butyl zirconate, Butylowy alkohol, Butyric alcohol, Butyric or normal primary butyl alcohol, Hemostyp, Methylolpropane, Propylcarbinol, Propylmethanol, Tetrabutoxytitanium, Tetrabutoxyzirconium, Tetrabutyl orthotitanate, Tetrabutyl titanate, Tetrabutyl zirconate, Titanium butoxide (Ti), Titanium tetrabutoxide, Titanium tetrabutylate, Zirconic acid butyl ester, Zirconium tetrabutoxide, n-Butan-1-ol, n-Butanol, n-Butanolbutanolen, n-Butyl alcohol, n-Butylalkohol, propyl carbinol
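The synonym list above mixes names for 1-butanol (C4H10O) with names for titanium and zirconium butoxides, which contain metals that 1-butanol does not. The curators slide mentions scripting to strip obvious errors; the rule below is only a hypothetical example of such a check, assuming a hand-written table of name fragments (`ELEMENT_HINTS`) that imply an element. It is not ChemSpider's actual curation script.

```python
import re
from typing import List, Set

# Hypothetical rule table: name fragments that imply a given element is present
ELEMENT_HINTS = {
    "Ti": ("titan",),
    "Zr": ("zircon",),
}


def elements_in(formula: str) -> Set[str]:
    """Element symbols present in a simple formula such as 'C4H10O'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def flag_suspect_synonyms(formula: str, synonyms: List[str]) -> List[str]:
    """Flag synonyms whose wording implies an element absent from the record's formula."""
    present = elements_in(formula)
    flagged = []
    for name in synonyms:
        lowered = name.lower()
        for element, hints in ELEMENT_HINTS.items():
            if element not in present and any(h in lowered for h in hints):
                flagged.append(name)
                break  # one reason is enough to flag a name
    return flagged


synonyms = [
    "n-Butanol",
    "Butyl titanate",
    "Propylcarbinol",
    "Tetrabutoxyzirconium",
    "Titanium butoxide (Ti)",
]
print(flag_suspect_synonyms("C4H10O", synonyms))
```

A rule like this only catches one class of error; names such as "n-Butanolbutanolen" or trade names need other checks and human review.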
  18. Quality is a Major Issue
  19. Curating on ChemSpider
  20. Curating PubChem Data
     • The PubChem team is not resourced to curate the data
     • The data should be curated
     • ChemSpider has created an environment to validate and curate the data
     • Curation is underway
     • We will feed curated data back to PubChem on an ongoing basis
  21. ChemSpider and PubChem
     • ChemSpider will deposit our entire database of structures to PubChem following our latest deposition and deduplication cycle (within a month, we hope)
     • ChemSpider is curating data and will submit it back to PubChem
     • At 9:13 am today:
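A deduplication cycle merges deposited records that describe the same structure. A minimal sketch, assuming each deposit already carries a structure identifier (for example a standard InChI) and using placeholder identifiers and hypothetical source names; the field names are assumptions for illustration only.

```python
from typing import Dict, List


def deduplicate(records: List[dict]) -> List[dict]:
    """Merge records that share a structure identifier, keeping first-seen order."""
    merged: Dict[str, dict] = {}
    for rec in records:
        key = rec["structure_id"]  # assumed to be e.g. a standard InChI; placeholders below
        entry = merged.setdefault(key, {"structure_id": key, "synonyms": [], "sources": []})
        # Union of synonyms and sources, without duplicates
        for synonym in rec["synonyms"]:
            if synonym not in entry["synonyms"]:
                entry["synonyms"].append(synonym)
        if rec["source"] not in entry["sources"]:
            entry["sources"].append(rec["source"])
    return list(merged.values())


# Placeholder identifiers and hypothetical sources, for illustration only
deposits = [
    {"structure_id": "placeholder-A", "synonyms": ["1-butanol", "n-butanol"], "source": "vendor-1"},
    {"structure_id": "placeholder-B", "synonyms": ["diethyl ether"], "source": "vendor-1"},
    {"structure_id": "placeholder-A", "synonyms": ["n-butanol", "butan-1-ol"], "source": "vendor-2"},
]
for entry in deduplicate(deposits):
    print(entry["structure_id"], entry["synonyms"], entry["sources"])
```

The result depends entirely on the identifier: two records are merged only if their identifiers match exactly, so structure normalization has to happen before this step.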
  22. Online Deposition System in Beta
  23. Provide Tools for Developers
  24. Provide Tools for Developers
  25. Targets for 2007
     • End-of-year intentions for ChemSpider include:
       - Adding more databases to the index
       - Enhancing integrations with other structure drawing packages
       - Adding property prediction algorithms from partners. More predicted properties will go online shortly; calculations for >20 million structures are time-consuming!
       - Expanding analytical data handling: we are presently working with a publisher on hosting the data for their publications
       - Enhancing the patent integration
       - Expanding the Open Access article index to >250,000 articles
       - Making Medline structure-searchable by text mining
  26. Targets for End of 2007
     • Source funding to continue the ChemSpider project
     • Deliver on projects with collaborators:
       - ChemModLab with NCSU and NISS for QSAR-based virtual screening. ZINC contains 4.6 million commercially available compounds; ChemSpider has about 10 million, and 3D-optimized structures will be generated shortly
       - Simbiosys has developed groundbreaking technologies for fast virtual screening by docking against targets. ChemSpider ligands will be used in virtual screens
       - Connectivity between ChemSpider and Chembench (Alex Tropsha at UNC Chapel Hill) will be enabled
  27. Making the Web Structure-Searchable
     • The InChI string and the InChIKey will help make the web structure-searchable
     • InChI strings are not indexed correctly by search engines, and the shift is to the InChIKey
     • "Someone" must host the InChIKey lookup table relating keys to InChI strings
     • "Someone" must provide scalable online tools for the capture, databasing and searching of InChIs
     • InChIs do NOT make the web searchable by substructure or structural similarity. An index will.
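An InChIKey is a fixed-length hashed form of an InChI string, so it cannot be turned back into the InChI; that is why the slide calls for a hosted lookup table. A minimal sketch of such a table, assuming an in-memory store and checking only the key's layout (14 letters, hyphen, 10 letters, hyphen, 1 letter). The class name `InChIKeyIndex` and the placeholder values are hypothetical; the code does not compute InChIKeys.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set

# InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters)
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


class InChIKeyIndex:
    """Hypothetical key -> InChI lookup table; a set per key tolerates hash collisions."""

    def __init__(self) -> None:
        self._table: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, inchi: str, inchikey: str) -> None:
        if not inchi.startswith("InChI="):
            raise ValueError(f"not an InChI string: {inchi!r}")
        if not INCHIKEY_PATTERN.fullmatch(inchikey):
            raise ValueError(f"malformed InChIKey: {inchikey!r}")
        self._table[inchikey].add(inchi)

    def resolve(self, inchikey: str) -> List[str]:
        # .get does not trigger the default factory, so unknown keys add no entries
        return sorted(self._table.get(inchikey, set()))


index = InChIKeyIndex()
# Placeholder values for illustration only; these are not real identifiers
key = "A" * 14 + "-" + "B" * 8 + "SA-N"
index.register("InChI=1S/placeholder", key)
print(index.resolve(key))
```

Storing a set per key rather than a single string leaves room for the rare case of two different InChIs hashing to the same key.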
  28. Conclusion
     • ChemSpider is successfully building a structure-centric community for chemists
     • Over 1,200 chemists per day use ChemSpider to help answer questions and solve their problems
     • A well-defined path forward to enhance the service is in place
  29. Acknowledgments
     • Thousands of users for their feedback and ongoing encouragement
     • The "naysayers": criticism, when taken constructively, can drive creative action
     • Our advisory group of scientists, specialists and friends
     • The bloggers coming to the ChemSpider Blog and ChemSpider News