ChemSpider as a hub for online chemical information resources



The World Wide Web continues to have an expanding and profound effect on providing access to chemical information. A chemist may wish to know a variety of information about a given chemical compound including physical and chemical properties, molecular structure, spectral data, synthetic methods, known reactions, safety information, and systematic nomenclature and chemical names. In the past, having access to this variety of information required a small library of different reference works, since no one resource contained all this data. This was problematic both in terms of cost and physical space for storage. Now there is a single web site that not only provides all this information for millions of compounds but also is free. This website is the Royal Society of Chemistry’s ChemSpider.


ChemSpider: A Hub for Online Chemical Information Resources

Antony Williams*
*ChemSpider, Royal Society of Chemistry, U.S. Office: Wake Forest, NC-27587
E-mail: williamsa@rsc.org

1. Internet-based chemistry

The World Wide Web continues to have an expanding and profound effect on providing access to chemical information. A chemist may wish to know a variety of information about a given chemical compound including physical and chemical properties, molecular structure, spectral data, synthetic methods, known reactions, safety information, and systematic nomenclature and chemical names. In the past, having access to this variety of information required a small library of different reference works, since no one resource contained all this data. This was problematic both in terms of cost and physical space for storage. Now there is a single web site that not only provides all this information for millions of compounds but also is free. This website is the Royal Society of Chemistry’s ChemSpider [1, 2].

2. ChemSpider

As a cheminformatician interested in integrating together large amounts of data, specifically structure-based data, spectral data and large quantities of physicochemical data, the author, together with a number of software developers, decided to pursue the challenge of integrating together web-based chemistry data. Using a nominal infrastructure of just three computer servers and developing bespoke software using Microsoft technologies (specifically a .NET architecture using a SQL server database), ChemSpider was released to the community as a platform containing >10.5 million unique chemical structures sourced from the PubChem database [3] integrated to a small number of online resources. The original system included both structure and rudimentary substructure searching. Within a few months of release the ability for users to register and upload chemical compounds and annotate and curate data was introduced.
The amount of data online continued to grow with depositions from chemical vendors and other online chemical databases and reached around 20 million chemicals. Within a period of three years the ChemSpider platform had developed a significant level of popularity with the community and was acquired by the Royal Society of Chemistry [4].

Today ChemSpider is a free, online chemical database offering access to physical and chemical properties, molecular structures, spectral data, synthetic methods, safety information, and nomenclature for over twenty-six million unique chemical compounds, sourced and linked out to almost four hundred separate data sources on the web. ChemSpider is fast becoming the primary chemistry internet portal and it can be very useful for both chemical teaching and research.

ChemSpider is not just a search engine layered on terabytes of chemistry data but is also a crowdsourcing community for chemists. Registered users can enter information and annotate and curate the records. The requirement to register and log in is to prevent anonymous acts of
vandalism. The chemical community has been forthcoming in adding information including new chemical structures, associations between structures and publications, addition of analytical data such as spectra and the curation of chemical identifiers and property data.

ChemSpider has been described as the Google for Chemistry and a Wikipedia for chemists. By aggregating data and linking it together using a chemical structure as the primary record in the database, ChemSpider has been able to link together Wikipedia [5], PubChem [6], ChEBI (Chemical Entities of Biological Interest) [7] and KEGG (The Kyoto Encyclopedia of Genes and Genomes) [8], chemical vendors, a patent database, and both open and closed access chemistry journals. Where possible, each chemical record retains the links out to the original source of the material, thereby associating a microattribution. These links allow a ChemSpider user to source information of particular interest, including where to purchase a chemical, as well as toxicity and metabolism data and so on. Aggregating that level of connected information via a classical search engine, like Google, would be very time consuming.

ChemSpider has a number of advantages over a simple Google search. The variety of information about a compound provided at ChemSpider is hard to match on any other free web site. The data continue to be validated, updated and expanded by practicing chemists. ChemSpider provides links to many other online sources for further information. This plethora of links now includes Google Books, Scholar and Patents, Microsoft Academic Search, RSC Databases, Books and Publishing website and an ever-increasing number of government, commercial and academic databases.

Figure 1: The header of the chemical record for Domoic Acid in ChemSpider. The entire record spans multiple pages including links to patents and publications, pre-calculated and experimental properties and links to many external data sources and informational websites.
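
The idea of using a chemical structure as the primary record, with each entry keeping its links back to the original sources, can be illustrated with a small sketch. This is not ChemSpider's actual data model, which the text does not describe. The field names, the `link_by_structure` helper, the structure keys and the example.org URLs are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def link_by_structure(records):
    """Group records from different sources under one structure key."""
    hub = defaultdict(lambda: {"names": set(), "links": {}})
    for record in records:
        entry = hub[record["structure_key"]]
        entry["names"].add(record["name"])
        # Keep the link back to the original source (the "microattribution").
        entry["links"][record["source"]] = record["url"]
    return dict(hub)


# Hypothetical sample data: keys, names and URLs are placeholders only.
RECORDS = [
    {"structure_key": "STRUCT-0001", "name": "domoic acid",
     "source": "Wikipedia", "url": "https://example.org/wikipedia/1"},
    {"structure_key": "STRUCT-0001", "name": "Domoic Acid",
     "source": "PubChem", "url": "https://example.org/pubchem/1"},
    {"structure_key": "STRUCT-0002", "name": "caffeine",
     "source": "ChEBI", "url": "https://example.org/chebi/2"},
]

hub = link_by_structure(RECORDS)
for key, entry in sorted(hub.items()):
    print(key, sorted(entry["names"]), sorted(entry["links"]))
```

In this sketch, two records that share a structure key collapse into one entry that lists both source links, which is the property that makes a structure-centred hub quicker to navigate than a text search across separate sites.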
ChemSpider aggregated over 25 million unique chemical entities in just over 3 years. New additions to the database are made daily, especially since it is now integrated to the RSC publishing process whereby new compounds identified in prospected RSC articles are deposited and released to the community as the article is published. Many of the compounds in the current database have already been curated, and the process is ongoing. In comparison the Chemical Abstracts Service (CAS), which has been in the business of aggregating chemistry-related data for over a century in order to create the CAS registry, recorded its 50 millionth chemical structure in 2009 [9].

Searching the web using classical search engines is less useful than ChemSpider since these services do not provide structure-based searching of the internet nor do they systematically organize data curation. The closest comparison in terms of validated and crowdsourced contributions to the domain of chemistry are the chemical pages in Wikipedia; however, Wikipedia has information on far fewer compounds and supports only text searching, not structure searching.

The ChemSpider “web services” provide programmatic access to ChemSpider and allow instrument vendors to utilize the data for the purpose of structure identification. This opportunity in particular is being used for the purpose of compound identification by mass spectrometry [10]. The data are also available to the Open PHACTS project [11], a project funded by the Innovative Medicines Initiative [12], and ChemSpider is one of the key participants in the project. As ChemSpider continues to expand in scope, capabilities and data the site is likely to become the dominant free online resource for chemists, especially as it supports a number of additional projects as discussed below.

3. Synthetic Reactions on ChemSpider

The recently added ChemSpider SyntheticPages [13] provides a source of online data regarding chemical synthesis procedures.
This database is created by the community, for the community. Chemists populate the online database with one or more of their chemical reactions outlining how to perform a reaction. ChemSpider SyntheticPages grows as the community continues to contribute content. What type of reactions suit? The reactions could be for a new compound or a known compound from the literature or from an author's own publications. Also, it does not matter if a similar prep is already in the database. There is a benefit to submitting, as early stage researchers should realise that potential employers have free and direct access to examples of their work, including the time-consuming "starting material" preps that perhaps did not make it into the papers or thesis. It is fast to submit an article: certainly less than an hour from start to finish, and probably a lot less if the author already has the text in electronic format for a report. The kudos of being a part of a database hosted by the RSC should not be underestimated and the issuance of a permanent digital object identifier (DOI) link provides curriculum vitae value. The value of the database will grow exponentially with an increasing number of pages covering an increasingly broad array of chemical syntheses.
Figure 2: A ChemSpider SyntheticPages article regarding a hydrogenation process

4. Making Chemistry Mobile

As there has been an unprecedented growth in new ways to access online information using mobile devices [14, 15] (for example, iPhones and iPads using the iOS operating system and Android devices) it made sense to deliver access to ChemSpider and its related projects on such platforms. Initially the ChemMobi [16] application from Symyx (now part of Accelrys) was developed using the ChemSpider web services. This was soon followed by mobile website versions of both ChemSpider and ChemSpider SyntheticPages. Numerous other iOS apps then made use of the web services. The Royal Society of Chemistry contracted the development of a ChemSpider Mobile app [17] and it has since been downloaded many thousands of times and runs on both iPhone and iPad.
Figure 3: The ChemSpider website optimized for mobile devices. These screen captures were obtained from an iPhone.

5. Additional projects integrating ChemSpider

An increasing array of projects are now being supported by ChemSpider as they serve up content via the programming interface. ChemSpider is already becoming an important resource for teaching, learning, and research. Specifically, the spectroscopic data, over 3000 spectra in total, are the basis for the Spectral Game, which has already been used by over 10000 students [18]. This game allows students to learn how to interpret NMR spectra by validating either H1 or C13 spectra against two or more structures. The game increases in complexity as it progresses, moving from 2 to 5 structures to choose from to match with the spectrum, and it has been played by thousands of students from almost 100 different countries.

Other RSC resources have recently been unveiled utilizing integration to ChemSpider data. These include the Learn Chemistry Wiki [19] and SpectraSchool [20] to help in the education of secondary school children. Since ChemSpider offers unrivalled online access to chemistry data via application programming interfaces such projects will continue to expand in scope and capabilities.
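
The mechanics of the Spectral Game described in this section, matching one spectrum against a choice of structures with the number of choices rising from 2 to 5, can be sketched in a few lines. This is only an illustration under stated assumptions, not the game's real implementation: the level-to-choices rule, the function names and the placeholder spectrum and structure labels are all hypothetical.

```python
import random


def choices_for_level(level, minimum=2, maximum=5):
    """Number of candidate structures shown at a given level (0-based).

    Assumed rule: one extra structure per level, capped at the maximum.
    """
    return max(minimum, min(maximum, minimum + level))


def build_round(spectra, level, rng):
    """Pick one spectrum and mix its structure with distractors.

    `spectra` maps a spectrum label to the structure it belongs to.
    """
    target = rng.choice(sorted(spectra))
    answer = spectra[target]
    others = sorted(set(spectra.values()) - {answer})
    # Never ask for more distractors than are available.
    n_options = min(choices_for_level(level), len(others) + 1)
    options = rng.sample(others, n_options - 1) + [answer]
    rng.shuffle(options)
    return {"spectrum": target, "options": options, "answer": answer}


def is_correct(game_round, guess):
    """Check a player's guess against the structure behind the spectrum."""
    return guess == game_round["answer"]


# Hypothetical placeholder data standing in for real spectra and structures.
SPECTRA = {f"spectrum-{i}": f"structure-{i}" for i in range(1, 7)}

rng = random.Random(0)
for level in range(4):
    game_round = build_round(SPECTRA, level, rng)
    print(level, game_round["spectrum"], len(game_round["options"]))
```

Passing an explicit `random.Random` instance keeps rounds reproducible, which is convenient when checking that the number of options grows with the level.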
Figure 4: The Learn Chemistry wiki: a wiki environment utilizing ChemSpider data on its compound pages.

Conclusion

ChemSpider is presently one of the richest sources of chemistry data available online. It has been recognized with a number of awards in 2010 including the Bio-IT Best Practices Award for community service [21] and the ALPSP [22] and i-Expo [23] awards for innovation. The ChemSpider database is the foundation platform for a series of related websites and applications and presently serves many hundreds of thousands of requests every day. ChemSpider is likely to increase in prominence and impact in the coming years as the quantity of data grows and the diversity of integrated data sources increases.

References

1. Pence, H.E. and A.J. Williams, ChemSpider: An Online Chemical Information Resource. J. Chem. Educ., 2010. 87(11): p. 1123-1124.
2. ChemSpider. Available from:
3. Wang, Y., et al., PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res, 2009. 37(Web Server issue): p. W623-33.
4. Royal Society of Chemistry acquires ChemSpider. [cited September 22nd 2011]; Available from:
5. Wikipedia Home Page. 2010 [cited 2010 May 12]; Available from:
6. PubChem Home Page. 2010 [cited 2010 May 12]; Available from:
7. ChEBI Home Page. 2010 [cited 2010 May 12]; Available from:
8. KEGG Home Page. 2010 [cited 2010 May 12]; Available from:
9. Available from:
10. Little, J.L., et al., Identification of "Known Unknowns" Utilizing Accurate Mass Data and ChemSpider. J Am Soc Mass Spectrom, 2011.
11. OpenPHACTS Project. 2011 [cited 2011 October 31st]; Available from:
12. Kamel, N., et al., The Innovative Medicines Initiative (IMI): a new opportunity for scientific collaboration between academia and industry at the European level. Eur Respir J, 2008. 31(5): p. 924-6.
13. ChemSpider Synthetic Pages. Available from:
14. Williams, A.J., et al., Mobile Apps for chemistry in the world of drug discovery. Drug Disc Today, 2011. 16(21-22): p. 928-939.
15. Williams, A.J. and H.E. Pence, Smart Phones, a Powerful Tool in the Chemistry Classroom. J Chem Educ, 2011. 88: p. 683-686.
16. ChemMobi. Available from:
17. ChemSpider Mobile, [cited 2011 January 4th]; Available from:
18. Bradley, J.C., et al., The Spectral Game: leveraging Open Data and crowdsourcing for education. J Cheminform, 2009. 1(1): p. 9.
19. Learn Chemistry Wiki. [cited 2011 January 4th]; Available from:
20. SpectraSchool. [cited 2011 January 4th]; Available from:
21. Williams, A.J. ChemSpider wins Bio-IT Best Practices Award for Community Service. 2010; Available from: practices-award-for-community-service.html.
22. ChemSpider wins the APSP Publishing Innovation Prize. 2010; Available from: prize.html.
23. ChemSpider wins "Most Innovative Software" Award. 2010; Available from: award.html.