Marrying ACDLabs technologies to eScience Projects at the Royal Society of Chemistry



The Royal Society of Chemistry is one of the world's foremost scientific societies, a primary publisher for the chemical sciences and an innovator in the domain of eScience. To deliver several of our eScience projects we use components of Advanced Chemistry Development (ACD/Labs) software, including nomenclature, physchem prediction, spectroscopy tools and the ACD/Ilab web-based system. This presentation provides an overview of RSC projects in which ACD/Labs software has played an important role, including ChemSpider and the National Chemical Database Service for the United Kingdom. We will also outline our vision for a repository for various types of experimental chemistry data, and how we foresee using prediction and validation software to characterize the data and, potentially, to generate predictive models from it. This couples directly with our intention to data-enable our publication archive of over 300,000 articles by extracting chemicals, reactions and analytical data from the historical record.

Published in: Technology


  1. 1. Marrying ACD/Labs technologies to eScience Projects at the Royal Society of Chemistry
      Antony Williams
      ACD/Labs User Meeting
      June 2013
  2. 2. RSC eScience
      • Royal Society of Chemistry is a member society (>47,000), Publisher and Innovator in eScience
      • Host of many online databases and services
        – ChemSpider, SyntheticPages, SpectraSchool,…
      • Participant in multiple grant-based projects
        – National Chemical Database Service
        – Open PHACTS
        – PharmaSea
  3. 3. Multiple ACD/Labs Tools in use…
      • Structure “checking” routines for data
      • Nomenclature generation and conversion
      • Physicochemical prediction algorithms
      • Web-based spectral display widget
      • “Interactive Lab” web-based prediction tools
      • But first an intro to ChemSpider…
  4. 4. ChemSpider
      • 28 million chemicals with associated data…
  5. 5. I want to know about “Vincristine”
  6. 6. I want to know about “Vincristine”
  7. 7. Vincristine: Identifiers and Properties
  8. 8. Predicted Properties
  9. 9. Vincristine: Vendors and Sources Linked by Structure
  10. 10. Vincristine: Patents
  11. 11. Google Patents
  12. 12. Vincristine: Articles Linked by Name
  13. 13. RSC Databases
  14. 14. RSC Database Linkthrough
  15. 15. Spectra
  16. 16. Spectra
  17. 17. Where do data come from?
      • ChemSpider users deposit data
      • Some contributions from NIST
      • Chemical vendors are starting to provide data. Synthonix are one of our major contributors (
  18. 18. Crowdsourced “Annotations”
      • Users can add
        – Compounds
        – Descriptions/Syntheses/Commentaries
        – Links to articles via DOIs
        – Add spectral data
        – Add Crystallographic Information Files
        – Add photos
        – Add MP3 files
        – Add Videos
  19. 19. Crowdsourced Curation
      • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  20. 20. Spectral Uploading
      • Locate the structure of interest and deposit spectrum
  21. 21. Spectral Uploading
      • Various types of NMR spectra supported
  22. 22. Regular Updates
  23. 23. Multiple Spectra for One Structure
  24. 24. ChemSpider ID 24528095 H1 NMR
  25. 25. ChemSpider ID 24528095 C13 NMR
  26. 26. ChemSpider ID 24528095 HHCOSY
  27. 27. ChemSpider ID 24528095 HSQC
  28. 28. ChemSpider ID 24528095 HMBC
  29. 29. Available Spectra
  30. 30. Number of Spectra
      • IR 5389
      • HNMR 1679
      • CNMR 1207
      • UV-Vis 183
      • EI 90
      • 2D1H13CD 68
      • Raman 51
      • NIR 32
      • 2D1H1HCOSY 21
      • 2D1H13CLR 10
      • CI+ve 8
      • PNMR 7
      • 9746 spectra against 6890 compounds
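The counts on slide 30 can be tallied with a few lines of Python. This is only a sketch: the dictionary transcribes the slide as shown, and the variable names are illustrative. The twelve listed categories add up to 8,745 rather than the 9,746 quoted, so the list on the slide is presumably not exhaustive.

```python
# Spectrum counts transcribed from slide 30 (labels exactly as on the slide).
spectra_counts = {
    "IR": 5389,
    "HNMR": 1679,
    "CNMR": 1207,
    "UV-Vis": 183,
    "EI": 90,
    "2D1H13CD": 68,
    "Raman": 51,
    "NIR": 32,
    "2D1H1HCOSY": 21,
    "2D1H13CLR": 10,
    "CI+ve": 8,
    "PNMR": 7,
}

# Totals quoted on the slide.
STATED_TOTAL_SPECTRA = 9746
STATED_COMPOUNDS = 6890

# Sum of the categories actually listed, and the gap to the quoted total.
listed_total = sum(spectra_counts.values())
unlisted = STATED_TOTAL_SPECTRA - listed_total

# Average number of spectra per compound, using the quoted totals.
spectra_per_compound = STATED_TOTAL_SPECTRA / STATED_COMPOUNDS


def share(kind: str) -> float:
    """Fraction of the quoted total contributed by one spectrum type."""
    return spectra_counts[kind] / STATED_TOTAL_SPECTRA


if __name__ == "__main__":
    print(f"listed: {listed_total}, not itemised: {unlisted}")
    print(f"spectra per compound: {spectra_per_compound:.2f}")
    print(f"IR share of quoted total: {share('IR'):.1%}")
```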
  31. 31. Some usage statistics
      • ca. 200 visitors at any one time, ~30,000 visits per day
      • Mar 4-Apr 3, 2013
        – Visits = 731,656
        – Unique Visitors = 527,008
      • Independent servers to support other projects
      • Does not include web service calls
  32. 32. ChemSpider as a Foundation
      • ChemSpider is a foundation for projects:
        – >400 data sources aggregated and mapped
        – Continually curated and updated with new data
        – Normalized data around a structure-centric data model
        – Providing an API allows integration to support other internal projects
        – Providing API access outside RSC extends the reach
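A structure-centric data model, as mentioned on slide 32, can be sketched as records from many sources collapsing onto one entry per chemical structure. The sketch below is an illustration under stated assumptions, not ChemSpider's actual schema: `SourceRecord`, `Compound`, `aggregate` and the `structure_key` field are hypothetical names, and the key stands in for whatever normalized structure identifier the real system uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """One record as delivered by a data source (hypothetical shape)."""
    source: str
    structure_key: str  # assumed: a normalized structure identifier
    name: str


@dataclass
class Compound:
    """One entry per structure, collecting everything mapped to it."""
    structure_key: str
    names: set = field(default_factory=set)
    sources: set = field(default_factory=set)


def aggregate(records):
    """Group source records by structure key, merging names and sources."""
    compounds = {}
    for rec in records:
        compound = compounds.setdefault(
            rec.structure_key, Compound(rec.structure_key)
        )
        compound.names.add(rec.name)
        compound.sources.add(rec.source)
    return compounds


if __name__ == "__main__":
    # Placeholder keys and source names, for illustration only.
    records = [
        SourceRecord("vendor_a", "KEY-1", "Vincristine"),
        SourceRecord("database_b", "KEY-1", "Leurocristine"),
        SourceRecord("vendor_a", "KEY-2", "Caffeine"),
    ]
    for key, compound in aggregate(records).items():
        print(key, sorted(compound.names), sorted(compound.sources))
```

Keying on the structure rather than on a name means that synonyms from different sources end up on the same entry, which is what allows vendors, patents and spectra to be linked by structure.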
  33. 33. Micropublishing Syntheses
  34. 34. ChemSpider SyntheticPages
  35. 35. Olympicene
  36. 36. Web Services Example: Spectral Data
  37. 37. www.SpectralGame.com
  38. 38. Spectral Game
  39. 39. Increasing Complexity
  40. 40. SpectralGame in the hand
  41. 41. SpectraSchool
  42. 42. SpectraSchool
  43. 43. Recently Added – THANKS ACD/Labs!
      • Storage and display of ASSIGNED spectra
  44. 44. Access ChemSpider
      • APIs
        – Programmatic access used by Mobile Apps, Funded Consortia projects, many Academic groups
      • Widgets
        – UI components for embedding in other websites
      • Data
        – Data access, downloads, reuse, licensing
  45. 45. Flexible ChemSpider API
  46. 46. Flexible ChemSpider API
  47. 47. Linking Names to Structures
  48. 48. It is so difficult to navigate…
      • What’s the structure?
      • Are they in our file?
      • What’s similar?
      • What’s the target?
      • Pharmacology data?
      • Known Pathways?
      • Working On Now?
      • Connections to disease?
      • Expressed in right cell type?
      • Competitors?
      • IP?
  49. 49.
      • 3-year Innovative Medicines Initiative project
      • Integrating chemistry and biology data using semantic web technologies
      • Open source code, open data and open standards
      • Academics, Pharma companies, Publishers
  50. 50. ChemSpider Contributions
      • The host of the chemistry services
        – Supplier of “standardized” chemical data files
        – Chemistry searching (structure, substructure etc)
        – Curator and data quality checking
      • Presently rolling out the Open PHACTS chemical registration system
  51. 51.
      • FP7 Initiative. PharmaSea: increasing value and flow in the marine biodiscovery pipeline (2012-2017)
      • Improve the quality, volume and value of active agents discovered in the marine environment and increase the speed at which they can be delivered
  52. 52. PharmaSea
      • Dereplication via ChemSpider
      • Hosting of natural products datasets
      • Integrated storage of analytical data (ACD/Labs)
      • Analytical data algorithms & integration
        – Mass spec searching – predicted fragmentation
        – NMR feature searching – NMR prediction
        – Computer-assisted structure elucidation
      • Integration to ACD/Structure Elucidator
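Dereplication by mass, one of the ideas on slide 52, can be illustrated with a small accurate-mass lookup. This is a minimal sketch and not the PharmaSea or ChemSpider implementation: the function names, the 5 ppm default tolerance and the three-compound reference set are assumptions chosen for illustration, and the observed value is taken to be a neutral monoisotopic mass (adducts and charge are ignored).

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}

# Illustrative reference set (NOT PharmaSea data): three well-known xanthines,
# given as element counts of their molecular formulae.
CANDIDATES = {
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
    "theobromine": {"C": 7, "H": 8, "N": 4, "O": 2},
    "theophylline": {"C": 7, "H": 8, "N": 4, "O": 2},
}


def monoisotopic_mass(formula: dict) -> float:
    """Monoisotopic mass of a neutral molecule from its element counts."""
    return sum(MONOISOTOPIC[element] * n for element, n in formula.items())


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def dereplicate(observed_mass: float, candidates=CANDIDATES, tol_ppm: float = 5.0):
    """Return the names of known compounds whose mass matches within tol_ppm."""
    hits = [
        name
        for name, formula in candidates.items()
        if abs(ppm_error(observed_mass, monoisotopic_mass(formula))) <= tol_ppm
    ]
    return sorted(hits)


if __name__ == "__main__":
    print(dereplicate(194.0804))  # matches a single candidate
    print(dereplicate(180.0647))  # matches two isomers
```

Theobromine and theophylline share a molecular formula, so an accurate mass alone cannot tell them apart. That is the gap the slide's other items (predicted fragmentation, NMR feature searching, computer-assisted structure elucidation) are meant to address.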
  53. 53. UK Chemical Database Service
  54. 54. Ilab Integration – NMR DB Searching
  55. 55. Ilab Integration – NMR Prediction
  56. 56. National Chemistry Data Repository
      • Imagine all chemistry related data from all academic projects in the UK in ONE system
      • Security model for the data to be embargoed, private or public (available to the entire world!)
      • Provide tools for easy data upload, review, automated validation – chemicals, reactions, spectral data, alphanumeric data
      • Use the data for algorithm training…
  57. 57. In Discussions At Present
      • Develop the world's largest online spectroscopy database of integrated data
      • Does ACD/Labs have tools to help?
        – Automated depositions – Silent Automation
        – Processing and validation – Spectrus
        – Databasing – Spectrus DB
        – Web-based integration into ChemSpider
  58. 58. Where else can we get RICH data?
  59. 59. DERA : Data Enable the RSC Archive
      • How much data is in the archive, in the publications and in the supplementary info?
        – How many compounds for ChemSpider?
        – How many syntheses for ChemSpider reactions?
        – How many characterization measurements?
          • Property Data
          • Spectral Data
          • Graphs and charts to be used for modeling?
  60. 60. What if we could capture it all?
  61. 61. The Future of Data
      • In Publications
        – Interactive plots, spectra, buy that compound, predict that property
        – Validation of data going INTO publications – NMR prediction, CASE validation, PhysProp comparisons
      • From the lab
        – How much data NEVER gets published and is still useful? Failed Reactions? More Open Data…
  62. 62. Acknowledgements
      • RSC eScience Team
      • ACD/Labs – Pranas Japertas and Karim Kassam
      • GGA – Indigo Toolkit and Bingo Cartridge
      • The community of depositors
      • The Open Source Community
  63. 63. Thank you
      Email: williamsa@rsc.org
      Twitter: @ChemConnector
      Personal Blog: www.chemconnector.com
      SLIDES: