Dealing with the complex challenge of managing diverse analytical chemistry data online

The Royal Society of Chemistry provides open access to data associated with tens of millions of chemical compounds. The richness and complexity of the data have continued to expand dramatically, and the original vision of an integrated hub for structure-centric data has been delivered to hundreds of thousands of users across the world. With the intention of expanding our reach to cover more diverse aspects of chemistry-related data, including compounds, reactions and analytical data, to name just a few data types, we are in the process of delivering a Chemistry Data Repository. The data repository will manage the challenges of associated metadata and the various levels of required security (private, shared and public), and will expose the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications, and specifically within supplementary information. This presentation will report on the challenges of managing “Big Data” for chemists around the world and of providing access to tools for structure dereplication, spectral database searching and the crowdsourcing of the world’s largest spectral database.

Published in: Science

  1. Dealing with the complex challenge of managing diverse analytical chemistry data online. Antony Williams, Valery Tkachenko and Alexey Pshenichnov, Pittcon, March 2015
  2. Free and Easy
     • Everything I will show in terms of ChemSpider is available for free online today
     • To make it easy to “take notes” these slides will be available at: www.slideshare.net/AntonyWilliams/
  3. Two posters from this week
     • Using an online database of chemical compounds for the purpose of structure identification
     • ChemSpider - building an online database of open spectra
  4. www.ChemSpider.com
  5. ChemSpider
  6. What will ChemSpider give us?
  7. What will ChemSpider give us?
  8. For Mass Spectrometrists
     • Valuable searches for Mass Spec would be:
     • Search the database by mass or formula for structure identification
     • Search subsets of data – e.g. “metabolism”, pesticides, etc.
     • Link structure-based data across the internet
     • Provide “programming interfaces” to integrate
     • Does ChemSpider provide value to Mass Spectrometrists?
  9. Pre-calculated data
  10. Data Source Selection
      • >34 million chemicals, from sources including:
      • Vendor collections
      • Government databases
      • Individual/Lab data
      • Publication data
      • All segregated, allowing for data source selection
  11. Data Source Selection - Type
  12. Data Source Selection - Individual
  13. Mass Spec Analysis (Jim Little, Eastman Chemical)
  14. ChemSpider Interface
  15. 1287 Hits Ranked by Defect
  16. 1287 Hits Ranked by # of References
  17. Top Ranked Hit
  18. Tinuvin 328
  19. What can I find on ChemSpider?
  20. What can I find?
  21. What can I find?
  22. Source and Purchase…
  23. What can I find on ChemSpider?
  24. External Calculation Engines
  25. What can I find on ChemSpider?
  26. …and in the RSC Databases…
  27. Linked to the Publisher
  28. What can I find?
  29. And out to Google Patents
  30. What About the Entire Web?
  31. The InChI Identifier
  32. InChI Strings Hash to InChIKeys
  33. Searching Internet by Structure
  34. Extended Study Sorting by references
  35. Position sorted by references
  36. Position 1 only
  37. Web Services For Collaboration
      • Many instrument vendors are using or investigating our web-based services for compound lookup
      • Many academic sites are integrating directly – metabonomics, name lookup, mass-based searching
  38. Results of the ChemSpider Search in the MarkerLynx Worksheet
  39. Hit Details in ChemSpider
  40. “REAL Spectral Data”
      • Masses on ChemSpider are clearly valuable!
      • We’d like to host “spectral curves”
      • But we’re a publisher so what can we do?
  41. PDF files with images…
  42. ESI – Text Spectra…
  43. We want this…we need YOU!
  44. Spectra: Cholesterol
  45. ChemSpider ID 24528095 1H NMR
  46. ChemSpider ID 24528095 13C NMR
  47. ChemSpider ID 24528095 HHCOSY
  48. Managing Assignments?
  49. Visualization of Spectra
      • We would like to view “interactive spectra”
  50. The challenges of analytical data
      • Vendors produce complex proprietary data formats – tough to manage but some do it: ACD/Labs, Mestre, others
      • Standard formats are required (JCAMP, NetCDF, AnIML, “Allotrope” in future?)
  51. ACD/Labs
  52. Jmol
  53. ChemDoodle Components
  54. Publications & “Real Spectra”
      • We are turning text into spectra
      • We are turning figures into spectra
  55. ESI – Text Spectra
  56. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  57. 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
  58. “Where is the real data please?” FIGURE DATA
  59. Future Developments
      • We have extracted 100s of 1000s of text strings from patents – next we go into our archive
      • We estimate many 1000s of figures with spectral data in our ESI and articles
      • We are aiming for a million spectra online…
      • But YOU can submit your data today and share it
  60. New Repository Architecture doi: 10.1007/s10822-014-9784-5
  61. Acknowledgments
      • Jim Little, Eastman Chemical Company
      • Kevin Thiesen & Rudy Potenzone – ChemDoodle
      • Daniel Lowe – NextMove Software
      • Bill Brouwer – Plot2Txt Development
      • Carlos Cobas and Santi Dominguez – MestreLabs
      • Bob Hanson - Jmol/JSpecView Javascript version
      • David Hardy, Patrick Wheeler, Arvin Moser - ACD/Labs
  62. Thank you
      • Email: williamsa@rsc.org
      • ORCID: 0000-0002-2668-4821
      • Twitter: @ChemConnector
      • Personal Blog: www.chemconnector.com
      • SLIDES: www.slideshare.net/AntonyWilliams
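
Slide 8 lists searching "by mass or formula" as the core use case for mass spectrometrists. The sketch below shows the arithmetic that such a lookup rests on: a monoisotopic mass computed from a molecular formula, and a parts-per-million check of an observed mass against it. It is an illustration only, not ChemSpider's implementation. The function names, the 5 ppm default tolerance and the small element table are assumptions made for this example, and the formula C22H29N3O for Tinuvin 328 (slide 18) comes from general chemical knowledge, not from the slides.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
# Only a handful of elements are tabulated; extend as needed.
MONOISOTOPIC_MASS = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "P": 30.973762,
    "S": 31.972071,
    "Cl": 34.968853,
}

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn a flat formula such as 'C22H29N3O' into {'C': 22, 'H': 29, ...}.

    Parentheses, charges and isotope labels are not handled.
    """
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    counts = {}
    for symbol, number in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    total = 0.0
    for symbol, count in parse_formula(formula).items():
        if symbol not in MONOISOTOPIC_MASS:
            raise ValueError(f"no mass tabulated for element {symbol!r}")
        total += MONOISOTOPIC_MASS[symbol] * count
    return total


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, formula, tol_ppm=5.0):
    """True if the observed neutral mass matches the formula within tol_ppm."""
    return abs(ppm_error(observed, monoisotopic_mass(formula))) <= tol_ppm


# Tinuvin 328; the formula is an assumption not stated in the slides.
print(round(monoisotopic_mass("C22H29N3O"), 4))  # 351.2311
```

A mass search typically returns many candidates within tolerance, which is why slides 15 and 16 go on to rank the 1287 hits by a secondary criterion such as the number of references.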
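
Slide 54 says "We are turning text into spectra", and slide 56 shows the kind of 1H NMR text string involved. As a rough sketch of what that conversion might involve, the code below parses the slide 56 string into a peak list and then draws a crude synthetic trace from it. This is not the extraction pipeline referred to in the talk. `Peak`, `parse_1h_peaks` and `lorentzian_trace` are hypothetical names, and the rendering (one Lorentzian line per reported shift, scaled by proton count, with multiplet splitting ignored and a range collapsed to its midpoint) is a simplifying assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

SLIDE_56 = (
    "1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), "
    "4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), "
    "4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), "
    "6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)"
)


@dataclass
class Peak:
    shift_ppm: float                   # reported shift, or midpoint of a range
    shift_range: Tuple[float, float]   # (low, high); equal for a single shift
    multiplicity: str                  # e.g. "d", "dd", "m"
    n_protons: int                     # integration, 0 if not reported
    j_hz: Optional[float]              # first coupling constant, if any


# A shift (optionally a range with a hyphen or en dash) followed by "(".
PEAK_START = re.compile(r"(\d+\.\d+)(?:\s*[–-]\s*(\d+\.\d+))?\s*\(")
J_VALUE = re.compile(r"J\w*\s*=\s*(\d+(?:\.\d+)?)\s*Hz")


def _closing_paren(text: str, open_idx: int) -> int:
    """Index of the ')' matching the '(' at open_idx (labels like C(5a)H nest)."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses in peak description")


def parse_1h_peaks(text: str) -> List[Peak]:
    """Extract peaks from a journal-style 1H NMR listing."""
    body = text.split("δ", 1)[-1]      # skip the solvent/frequency header
    peaks, pos = [], 0
    while True:
        match = PEAK_START.search(body, pos)
        if match is None:
            return peaks
        low = float(match.group(1))
        high = float(match.group(2)) if match.group(2) else low
        open_idx = match.end() - 1
        close_idx = _closing_paren(body, open_idx)
        details = body[open_idx + 1:close_idx]
        fields = [f.strip() for f in details.split(",")]
        protons = [int(f[:-1]) for f in fields if re.fullmatch(r"\d+H", f)]
        j_match = J_VALUE.search(details)
        peaks.append(Peak(
            shift_ppm=(low + high) / 2,
            shift_range=(low, high),
            multiplicity=fields[0],
            n_protons=protons[0] if protons else 0,
            j_hz=float(j_match.group(1)) if j_match else None,
        ))
        pos = close_idx + 1


def lorentzian_trace(peaks, start=0.0, stop=10.0, step=0.01, width=0.02):
    """Return (ppm_axis, intensities): one Lorentzian per peak, height = n_protons."""
    n_points = int(round((stop - start) / step)) + 1
    axis = [start + i * step for i in range(n_points)]
    trace = [
        sum(p.n_protons * width ** 2 / ((x - p.shift_ppm) ** 2 + width ** 2)
            for p in peaks)
        for x in axis
    ]
    return axis, trace


peaks = parse_1h_peaks(SLIDE_56)
axis, trace = lorentzian_trace(peaks)
```

The resulting `axis` and `trace` lists could then be written out in one of the standard formats named on slide 50, for example JCAMP, so that the viewers shown on slides 51 to 53 can display them interactively.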