Be the first to like this
The Royal Society of Chemistry hosts large scale data collections and provides access to the data to the chemistry community. The largest RSC data set of wide scale interest to the community offers access to tens of millions of compounds. The host platform, ChemSpider, is limited as it is a structure centric hub only. A new architecture, the RSC data repository, has been developed that extends support to reactions, spectral data, crystallography data and related property data. It is also the architecture underlying a series of exemplar projects for managing data for a number of diverse laboratories. The adoption of data standards for the integration and distribution of data has been essential. Specific standards include molecular structure formats such as molfiles and InChIs, and spectral data formats such as JCAMP. This presentation will report on our development of the data repository, the importance of utilizing standards for data integration, the flexible nature of the architecture to deliver solutions for various laboratories and our efforts to develop new large data collections. This includes text-mining efforts to extract large spectrum-structure collections from large corpuses.