ChemSpider – disseminating data and enabling an abundance of chemistry platforms


ChemSpider is one of the chemistry community’s primary public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider now serves data to many tens of websites and software applications. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of the solutions that it helps to enable. We will also discuss some of the envisaged future directions for the project and how we intend to continue expanding the impact of the platform.

Published in: Technology, Education

  1. 1. ChemSpider – disseminating data and enabling an abundance of chemistry platforms
     Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Dmitry Ivanov, Colin Batchelor, Jon Steele and David Sharpe
     ACS New Orleans, April 2013
  2. 2. ChemSpider
     • >28.5 million unique chemicals from >400 data sources
     • Focus on improving data quality, enhancing functionality, integrating and enabling
  3. 3. Some usage statistics
     • ca. 200 visitors at any one time, ~30,000 visits per day
     • Mar 4-Apr 3, 2013
       – Visits = 731,656
       – Unique Visitors = 527,008
     • Independent servers to support other projects
  4. 4. Access ChemSpider
     • APIs – Programmatic access used by Mobile Apps, Funded Consortia projects, many Academic groups
     • Widgets – UI components for embedding in other websites
     • Data – Data access, downloads, reuse, licensing
  5. 5. Supporting the Semantic Web
  6. 6. ChemSpider Resources for Chemistry
  7. 7. ChemSpider Audiences
     • Simplified interface: from this … to this
  8. 8. Substance Pages
  9. 9. It is so difficult to navigate…
     • IP?
     • What’s the structure?
     • Are they in our file?
     • What’s similar?
     • What’s the target?
     • Pharmacology data?
     • Known Pathways?
     • Competitors?
     • Working On Now?
     • Connections to disease?
     • Expressed in right cell type?
  10. 10.
      • 3-year knowledge management IMI project
      • Integrating chemistry and biology data and delivering using semantic web technologies
      • Open source code, open data and open standards
      • Academics, Pharma companies, Publishers….
  11. 11. ChemSpider Contributions
      • The host of the chemistry services
        – Supplier of “standardized” chemical data files
        – Chemistry searching (structure, substructure etc)
        – Provider of data in RDF format
        – Curator and data quality checking
      • Now building the Open PHACTS chemical registration system
  12. 12. ChemSpider Contributions
      • Supplier of chemistry UI components
      • “Quality Police” for data checking
      • Chemical Validation and Standardization Platform
      • Nanopublications from RSC publications
  13. 13. • FP7 Initiative. PharmaSea: increasing value and flow in the marine biodiscovery pipeline
  14. 14. PharmaSea
      • Dereplication via ChemSpider
      • Segregation of natural products datasets
      • Analytical data algorithms & integration
        – Mass spec searching
        – predicted fragmentation
        – NMR feature searching
        – NMR prediction
        – Computer-assisted structure elucidation
  15. 15. Integrate to instruments and software
      • Integration to analytical instrumentation vendors already in place
        – Agilent, Bruker, Thermo, Waters
      • Also, Cheminformatics vendors link to ChemSpider
        – Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
  16. 16. Natural Products Updates
      • Names hard, Structures “Obvious”
      • New content based on monthly updates of the database
      • Click through to the Natural Products Updates entry
  17. 17. National Chemical Database Service
  18. 18. Chemical Database Service
      • National Chemical Database Service for UK Academics
      • Integrating Commercial Databases and Services
      • Chemicals, analytical data, prediction algorithms
      • Development of data repository
  19. 19. Retrosynthetic Analysis
  20. 20. Publications - a summary of work
      • Scientific publications are a summary of work
        – Is all work reported?
        – How much science is lost to pruning?
        – What of value sits in notebooks and is lost?
      • How much data is lost?
        – How many compounds never reported?
        – How many syntheses fail or succeed?
        – How many characterization measurements?
  21. 21. Community Repository for Data
      • Funding agencies encourage sharing of data
      • Increasing availability of “Open Data”
      • Institutional repositories: no specific domain support
      • Develop a community repository for chemistry data
        – private, public, embargoed
      • Provides data to develop models/algorithms
  22. 22. Community Repository for Data
      • Automated depositions of data
      • DOI’ed data objects for citation purposes
      • A database of reference data, but validated by the community
      • National services feeding the repository
        – crystallography, mass spectrometry
      • Integrate to blogging tools for chemistry
      • Integrate to Electronic Lab Notebooks as feeds
  23. 23. Model Building with Community Data
      • Community data as a basis of model building
        – Consume data from available databases, community data, new publications and build predictive algorithms for the community
        – How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
  24. 24. Recognition on Data
      • IC50 Measurements for 62 substituted benzoxazoles
      • ChemSpider Data Repository: DOI: 10.1356/CSID784.4
  25. 25. Integrate to electronic lab notebooks
  26. 26. E-Lab Notebooks
      • Previous work with IDBS and University of Cambridge
      • Working on LabTrove integration with U. Southampton
      • Integration between ELNs and:
        • ChemSpider
        • ChemSpider Reactions
        • CDS Repository
      • Publish data from ELNs and issue DOIs
      • Data aggregated into fully indexed ESI format for publication
  27. 27. Support for Chemical Reactions
      • Integrating mined reaction data from patents (Daniel Lowe)
      • Will also incorporate and integrate: Methods of Organic Synthesis, Catalysts and Catalyzed Reactions and…
  28. 28. Micro-publishing Chemical Reactions
  29. 29. ChemSpider SyntheticPages
  30. 30. Retrosynthetic Analysis
  31. 31. Inside our Publication Archive
      • How much data is in the archive, in the publications and in the supplementary info?
        – How many compounds for ChemSpider?
        – How many syntheses for ChemSpider reactions?
        – How many characterization measurements?
          • Property Data
          • Spectral Data
          • Graphs and charts to be used for modeling?
  32. 32. What if we could capture it all?
      Digitally Enhancing the RSC Archive
  33. 33. Start with data in publications
  34. 34. Recent Work
  35. 35. Comparison of Spectra
  36. 36. Data Validation and Curation Required
  37. 37. CVSP: Validation and Standardization
  38. 38. Data Validation and Curation Required
      Encouraging Participation with Rewards and RECOGNITION
  39. 39. Manual Curation
      • Integrated commenting, curating and validation platform across ALL eScience and publishing platforms
      • All integrated to a central RSC profile and feeding the AltMetrics tools
  40. 40. Structure Review
  41. 41. Where we are now…
  42. 42. Rewards and Recognition
      The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
      Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
  43. 43. Future Recognition in AltMetrics? ChemSpider
  44. 44. Why is ChemSpider “different”?
      • Interfaces for integration
      • Sharing of data – and increasingly open
      • Open for community participation
        – Deposition
        – Annotation
        – Curation
      • We are clear…the world is changing
  45. 45. The Future
      • Small organic molecules, Undefined materials, Organometallics, Nanomaterials, Polymers, Minerals, Particle bound, Links to Biologicals
      • Internet Data, Commercial Software, Pre-competitive Data, Open Science, Open Data, Publishers, Educators, Open Databases, Chemical Vendors
  46. 46. Acknowledgments
      • The RSC eScience and infrastructure teams
      • Our data providers, depositors, collaborators and curators
      • Daniel Lowe for Reaction Data
      • William Brouwer, Penn State
      • Software providers
        – OpenEye, ChemDoodle, ACD/Labs, GGA Software, Open Source (Jmol, JSpecView, OpenBabel)
  47. 47. Thank you
      Email: williamsa@rsc.org
      Twitter: ChemConnector
      Personal Blog: www.chemconnector.com
      SLIDES: