ChemSpider - building an online database of open spectra


ChemSpider is an online database of over 30 million chemical compounds drawn from over 500 different sources, including government laboratories, chemical vendors, public resources and publications. Developed with the intention of building a community for chemists, ChemSpider allows its users to deposit data including structures, properties, links to external resources and various forms of spectral data. Over the past few years ChemSpider has aggregated almost 20,000 high-quality NMR and IR spectra, and it continues to expand as the community deposits additional types of data. The majority of the spectral data is licensed as Open Data, allowing it to be downloaded and reused in presentations, lesson plans and for teaching purposes. This poster will present our existing technology and our plans to host a million spectra in our developing online data repository.

Published in: Science

ChemSpider – Building an Online Database of Open Spectra
Antony J. Williams (1), Valery Tkachenko (1), Alexey Pshenichnov (1), Daniel Lowe (2), Carlos Coba (3), Kevin Theisen (4) and Rudy Potenzone (4)
1. Royal Society of Chemistry 2. NextMove Software 3. Mestrelab Research 4. iChemLabs LLC

Introduction: ChemSpider is an online database of over 30 million chemical compounds from more than 500 different sources, including chemical vendors, online public resources and publications. ChemSpider allows deposition of data including structures, properties, and various forms of spectral data. One activity of the project is to host a searchable database of 1D/2D NMR, IR, Raman and mass spectral data. ChemSpider has over 20,000 spectra and expands as the community deposits additional data.

Sources of Spectral Data: The majority of the data are deposited by users of ChemSpider. Spectra can be submitted as JCAMP-DX files or as images/PDF (for all spectra, but especially for 2D NMR). Community-based curators will validate and annotate the data to ensure that only the highest quality data are available in the database. To create a large NMR database we are using text mining to extract spectral data together with their associated chemical compounds, and then simulating visual forms of the spectra. We have text-mined a large patent corpus to extract many hundreds of thousands of NMR spectra and produce visual depictions as shown in Figure 1. Text-mined spectra are of the form:

1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)

Figure 1: A spectral depiction produced by converting the text-mined spectrum above. This can be stored in JCAMP to build a spectral database.

Spectral Visualization: Spectra are viewed inside the JSpecView spectral display widget (ref. 1). Zooming, scrolling and integration are possible. 2D NMR spectra are viewed only as images.

Figure 2: The JSpecView spectral viewing applet supports viewing JCAMP spectra of 1D NMR, IR, UV-Vis and Mass Spectrometry data.

Spectroscopic techniques produce NMR and IR vibrational assignments, and mass fragment peaks. We are now working with iChemLabs HTML5 widgets (ref. 2) for the display of assignments.

Figure 3: Assignments of spectral-structure associations. Selecting the peak at 7.5 ppm highlights the protons on the molecule. The assignments are contained in the JCAMP spectral format.

Future Directions: We intend to continue to grow the spectral database by encouraging further depositions from the community, as well as investigating the possibility of converting spectral figures to spectral data to host in the database.

References
1) JSpecView Project: an Open Source Java viewer and converter for JCAMP-DX, and XML spectral data files, http://www.journal.chemistrycentr
2) iChemLabs Web Components Spectrum Structure Correlations:
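
Illustrative sketch: The poster does not describe how the text-mined peak lists are turned into visual spectra, so the following Python is only a minimal sketch of the idea under stated assumptions, not the authors' pipeline. It parses the example 1H NMR string from the Sources of Spectral Data section into shift, multiplicity and proton count, then draws one Lorentzian line per reported peak. The names Peak, parse_peaks and simulate, the regular expression, the Lorentzian line shape and the 0.02 ppm line width are all hypothetical choices made for illustration. Multiplets are not split by their coupling constants, and a reported range such as 7.18–7.94 is collapsed to its midpoint.

```python
import re
from dataclasses import dataclass

# Example string copied from the poster text.
SAMPLE = (
    "1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), "
    "4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), "
    "4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), "
    "6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)"
)

# Hypothetical pattern: a shift (or shift range) followed by "(mult, nH".
PEAK_RE = re.compile(
    r"(\d+\.\d+)(?:\s*[–-]\s*(\d+\.\d+))?"  # shift, optionally a range, in ppm
    r"\s*\(\s*([a-z]+)\s*,\s*(\d+)H"         # multiplicity and proton count
)


@dataclass
class Peak:
    shift_ppm: float
    multiplicity: str
    protons: int


def parse_peaks(text):
    """Extract peaks; a range is reduced to its midpoint (an assumption)."""
    peaks = []
    for low, high, mult, n_h in PEAK_RE.findall(text):
        centre = (float(low) + float(high)) / 2 if high else float(low)
        peaks.append(Peak(centre, mult, int(n_h)))
    return peaks


def simulate(peaks, start=0.0, stop=10.0, step=0.01, width=0.02):
    """Sum one Lorentzian per peak, with height scaled by proton count."""
    n_points = int(round((stop - start) / step)) + 1
    xs = [start + i * step for i in range(n_points)]
    half_width_sq = (width / 2) ** 2
    ys = [
        sum(p.protons * half_width_sq / ((x - p.shift_ppm) ** 2 + half_width_sq)
            for p in peaks)
        for x in xs
    ]
    return xs, ys


peaks = parse_peaks(SAMPLE)
xs, ys = simulate(peaks)
```

The resulting xs and ys lists are the kind of x,y data that could then be plotted or written to a spectral file format; producing a valid JCAMP-DX file is not attempted here.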