Using an online database of chemical compounds for the purpose of structure identification


The Royal Society of Chemistry provides open access to data associated with tens of millions of chemical compounds. The richness and complexity of the data have continued to expand dramatically, and the original vision of providing an integrated hub for structure-centric data has been delivered to hundreds of thousands of users across the world. With the intention of expanding the reach to cover more diverse aspects of chemistry-related data, including compounds, reactions and analytical data, to name just a few data types, we are in the process of delivering a Chemistry Data Repository. The data repository will manage the challenges of associated metadata and the various levels of required security (private, shared and public), and will expose the data as appropriate using semantic web technologies. Ultimately this platform will become the host for all chemicals, reactions and analytical data contained within RSC publications, and specifically within supplementary information. This presentation will report on the challenges of managing “Big Data” for chemists around the world and providing access to tools for structure dereplication, spectral database searching and the crowdsourcing of the world’s largest spectral database.

Published in: Science

Using an Online Database of Chemical Compounds for the Purpose of Structure Identification
Antony J. Williams, Valery Tkachenko and Alexey Pshenichnov
Royal Society of Chemistry

Introduction
Spectroscopy has a primary role in identifying chemicals. Verification means confirming that a suggested chemical structure is consistent with the analytical data. This work reports on how a public compound database can be used for identification.

The ChemSpider Database
ChemSpider is an online database of >30 million compounds from >500 different sources including vendors, public resources and publications. For each compound in the database, masses and molecular formulae are generated. Searching the database by mass, and specifically by monoisotopic mass, returns a selected set of hits. Searches can also account for adducts and for included or excluded elements. The search interface is shown below.

Searching for “Known Unknowns”
ChemSpider has been used (1) to identify “known unknowns”: compounds that may be unknown to the researcher but are known in the scientific literature or in patents. Running a monoisotopic mass search of ChemSpider and sorting the results by the number of associated references was shown to bring the most likely candidates to the top of the list. Following a mass search, hits can be ordered by selecting the desired field. In the example below, initial results were sorted in descending order by the number of references. Little et al. showed this approach applied to metabolites, pesticides, drugs etc., as represented in Table 1.

ChemSpider Programming Interfaces
A number of MS instrument vendors interrogate the ChemSpider database for compounds. Companies including Agilent, Bruker, Thermo, Waters and others have tested the integration. ChemSpider data sources are segregated into types such as metabolites, safety data, patents etc. The programming interface allows a list of data sources to be passed in the query.
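The workflow described above (a monoisotopic mass search with an adduct correction, an optional restriction to data-source types, and ranking of the hits by reference count) can be sketched in a few lines of Python. This is only an illustration run against an in-memory list: it does not call the ChemSpider programming interface, and the record layout, the `search` function, the candidate names, the source-type labels and the reference counts are all invented for the example. Only the isotope masses and the proton and electron corrections are standard values.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONOISOTOPIC = {"H": 1.00782503, "C": 12.0, "N": 14.00307401,
                "O": 15.99491462, "S": 31.97207100}

# Shift from neutral mass M to the m/z of a singly charged ion.
ADDUCT_SHIFT = {"[M+H]+": 1.00727646,    # add a proton
                "[M+Na]+": 22.98922070,  # add Na, remove an electron
                "[M-H]-": -1.00727646}   # remove a proton

# Hypothetical records: names, source types and reference counts are made up.
RECORDS = [
    {"name": "candidate-A", "formula": "C8H10N4O2", "references": 120,
     "sources": ["metabolites", "vendors"]},
    {"name": "candidate-B", "formula": "C8H10N4O2", "references": 3,
     "sources": ["patents"]},
    {"name": "candidate-C", "formula": "C8H10N4O2", "references": 45,
     "sources": ["metabolites"]},
    {"name": "candidate-D", "formula": "C7H8N4O2", "references": 80,
     "sources": ["metabolites"]},
]


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def search(records, observed_mz, adduct="[M+H]+", ppm=5.0, source_types=None):
    """Return records within the ppm window, most-referenced first."""
    neutral = observed_mz - ADDUCT_SHIFT[adduct]  # undo the adduct shift
    tolerance = neutral * ppm / 1e6               # ppm window in mass units
    hits = []
    for rec in records:
        # Optional filter: keep records with at least one requested source type.
        if source_types is not None and not set(rec["sources"]) & set(source_types):
            continue
        if abs(monoisotopic_mass(rec["formula"]) - neutral) <= tolerance:
            hits.append(rec)
    # Rank by number of associated references, in descending order.
    return sorted(hits, key=lambda r: r["references"], reverse=True)


all_hits = search(RECORDS, 195.0877)
metabolite_hits = search(RECORDS, 195.0877, source_types=["metabolites"])
print([h["name"] for h in all_hits])         # candidate-A, candidate-C, candidate-B
print([h["name"] for h in metabolite_hits])  # candidate-A, candidate-C
```

Ranking by reference count is a heuristic rather than a proof of identity: it promotes well-documented compounds, so the top hit still has to be verified against the analytical data.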
For example, if a biofluid sample is being examined, only metabolite data sources should be searched. A research study that used ChemSpider to identify pesticides in water (2) has been reported.

References
1) Identifying “Known Unknowns” Using ChemSpider, J. Little, A.J. Williams and V. Tkachenko, ASMS Conference, 2011
2) The Detection, Identification, and Structural Elucidation of Unknown Contaminants During ToF Screening for Pesticides in River Water, E. Riches, K. Worrall and H. Major,