RSC as an Intermediary
• We contribute to the science community as a reporter and distributor of scientific content, as a networker of scientists and the community, and as an innovator
• We facilitate access to:
  – Data
  – Technology
  – Communities
RSC Innovation
An example from eScience…
• Development of a compound-centric data hub:
  – Aggregation of publicly accessible data sources
  – Exposing chemistry data directly from RSC articles
  – Crowd-sourced deposition
  – Access to data and services for the community via software programming interfaces
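As a rough illustration of what "compound-centric" aggregation could look like, the sketch below merges records from several sources into one entry per compound. The source names, field names and placeholder structure keys are assumptions made for illustration only; they are not taken from any RSC system.

```python
from collections import defaultdict


def aggregate(records):
    """Group source records by structure key, one entry per compound."""
    hub = defaultdict(lambda: {"names": set(), "sources": set()})
    for rec in records:
        entry = hub[rec["structure_key"]]
        entry["names"].add(rec["name"])      # collect synonyms
        entry["sources"].add(rec["source"])  # remember where the data came from
    return dict(hub)


# Hypothetical records: "KEY-A" and "KEY-B" stand in for real structure identifiers.
records = [
    {"structure_key": "KEY-A", "name": "caffeine", "source": "public_db_1"},
    {"structure_key": "KEY-A", "name": "1,3,7-trimethylxanthine", "source": "article_extraction"},
    {"structure_key": "KEY-B", "name": "aspirin", "source": "user_deposition"},
]
hub = aggregate(records)
```

Keying on structure rather than on name is what lets two sources that use different synonyms end up in the same compound entry.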
Publications
Publications summarise data acquisition, analysis and conclusions.
– Much detail in the data
– Improved navigation should include data access
– Reanalysis and reuse of data is limited in PDFs
ChemSpider
• >28.5 million unique chemicals, updated daily
• Aggregated from >450 data sources
• Focus on improving data quality, enhancing functionality, integrating and enabling
• Facilitating access to “data on the web” rather than abstracting
Building a Chemical Data Repository
• Scientific publications are a summary of work
  – Is all work reported? Make Supp Info more valuable
  – What of value sits in notebooks and is lost?
• How much data is lost?
  – How many compounds never reported?
  – How many syntheses fail or succeed?
  – How many characterization measurements?
• The need to support “micropublications”
A Data Repository as a Foundation
• Provide access to data and services:
  – Funded consortia projects access the system
  – Billion-dollar organizations integrate: Agilent, Bruker, Thermo, Waters
  – Software systems proliferate the data
  – Mobile apps delivered
  – Data access, downloads, reuse, licensing
• FP7 Initiative. PharmaSea: increasing value and flow in the marine biodiscovery pipeline (2012-2017)
PharmaSea
• Dereplication via ChemSpider
• Segregation of natural products datasets
• Analytical data algorithms & integration
  – Computer-assisted structure elucidation
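Dereplication means checking whether an isolated compound is already known before investing in full characterisation. A minimal sketch of the idea, assuming a small in-memory reference list and matching on monoisotopic mass within a ppm tolerance. The function name, the tolerance and the reference list are illustrative assumptions; this is not how ChemSpider itself is queried.

```python
# Hypothetical reference list: (name, monoisotopic mass in Da).
KNOWN = [
    ("caffeine", 194.0804),
    ("aspirin", 180.0423),
    ("paracetamol", 151.0633),
]


def dereplicate(observed_mass, known=KNOWN, tol_ppm=10.0):
    """Return names of known compounds whose mass lies within tol_ppm of the observation."""
    hits = []
    for name, mass in known:
        ppm_error = abs(observed_mass - mass) / mass * 1e6
        if ppm_error <= tol_ppm:
            hits.append(name)
    return hits


candidates = dereplicate(194.0810)  # observed mass close to caffeine
novel = dereplicate(300.0)          # no match in the reference list
```

An empty result only means the compound is absent from the reference list used; a real workflow would combine mass with further evidence such as spectra or taxonomy.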
It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers…
Chemical Database Service
• National Chemical Database Service for UK Academics
• Integrating Commercial Databases and Services
• Chemicals, analytical data, prediction algorithms
• Development of data repository
Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Develop a community repository for chemistry data: private, public, embargoed
• National services feeding the repository: crystallography, mass spectrometry
• Integrate with social networking tools for chemistry
• Integrate with Electronic Lab Notebooks
Model Building with Community Data
• Community data as a basis of model building
  – Consume data from available databases, community data, new publications and build predictive algorithms for the community
  – How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
Encourage Participation with Rewards and Recognition
RSC Emerging Technologies Competition
• “…support research intensive small companies and academics on the path to commercialising technology in the chemical sciences”
• The Prize
  – Bespoke programme of mentoring and support
  – Introduction to investors
  – Marketing and PR
2013 WINNERS: University of Oxford, AQDOT and Biogelx