
The UK National Chemical Database Service – an integration of commercial and public chemistry services to support chemists in the United Kingdom


At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge is significant: such a repository must combine a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., deal with the confusion of licensing and embargoing, and provide for data exchange and integration with services and platforms external to the repository. All of this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a shift towards open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in providing data to develop prediction models that further enable scientific discovery will be discussed, and the potential impact on the future of scientific publishing will also be examined.


  1. UK National Chemical Database Service: An integration of commercial and public chemistry services to support chemists in the United Kingdom
     Antony Williams, Valery Tkachenko and Richard Kidd
     ACS Dallas, March 2014
  2. UK Chemical Database Service
     • The National Chemical Database Service is for UK academics – see later for Rest of World
  3. Vision for the Service PART 1
     • Provide access to databases and services of interest to the academic community to serve their needs. Access to services to include:
       • Crystallography data – organic and inorganic materials
       • Thermophysical data
       • Reactions data including retrosynthetic analysis
       • Prediction technologies – name generation, physicochemical parameters, NMR prediction
  4. Service Rollout
     • Many services are hosted in the cloud
     • Access through login/password, IP authentication or Shibboleth authentication
     • Lots of hard work in a very short time – so much thanks to all of the service providers
     • More providers stepped up to help – ChemAxon
     • Crystallography concern (understatement!)
  5. Feedback from Community
     • Converted initial public negativity spike on Twitter pre-release to very positive feedback post-release
     • Training required – onsite training sessions organized
     • Available Chemicals Directory is a big plus!
     • Concerns with Retrosynthetic Analysis tool
  6. Usage
     • Majority of usage is for crystallography data – previous provider had the same bias
     • Usage is increasing month by month
     • Still heavily underused, and in many cases awareness is low
  7. Vision for the Service PART 2
     • Response to the call for proposals included our vision for a 21st Century data repository
     • At a time of Open Access, Open Data and funding agency requirements to make data public – build a data repository
     • Funding is split between licensing content and services (VAST MAJORITY) and some funding for research and development
  8. An Initial “Vague” Vision Set
     • Manage “all” of the chemistry data associated with chemical substances
     • Data to be downloadable, reusable, interactive
     • Build a platform that enables the scientist
     • Data storage, validation, standardization and curation
     • Collaborative data sharing
     • Provide a data platform that can enable and enhance publishing of scientific papers
  9. Data Repository
     • Registration of chemical compounds
     • Deposition of chemical syntheses
     • Addition of analytical data
     • Integration with electronic notebooks
     • Rewards and recognition for data sharing
     • Document processing
     • Hosting of data as private, embargoed or public
  10. What we will deliver for all data
      • Simple interfaces for uploading of data
      • Embeddable widgets and programming interfaces for use in in-house systems and ELNs
      • Automated harvesting approaches – sweeping directories for data
      • Data validation where possible
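The "sweeping directories for data" idea on slide 10 can be illustrated with a minimal sketch. The extension-to-type mapping and the function name `sweep_directory` are assumptions made for illustration; the slides do not say which file types the harvester recognises or how it is implemented.

```python
from pathlib import Path

# Hypothetical mapping from file extension to data type; the extensions the
# real service accepts are not stated in the slides.
EXTENSION_TO_TYPE = {
    ".mol": "compound",
    ".sdf": "compound",
    ".rxn": "reaction",
    ".jdx": "spectrum",
    ".dx": "spectrum",
}


def sweep_directory(root):
    """Return (path, data_type) pairs for recognised files under root."""
    found = []
    # rglob("*") walks the whole tree; sorting keeps the result deterministic.
    for path in sorted(Path(root).rglob("*")):
        data_type = EXTENSION_TO_TYPE.get(path.suffix.lower())
        if path.is_file() and data_type is not None:
            found.append((path, data_type))
    return found
```

A real harvester would also need to track which files have already been deposited; that bookkeeping is left out here.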
  11. Input data pipeline
      [Diagram; labels recovered from the slide]
      • Input sources: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc.; LabTrove and other templated data; Documents; API, FTP, etc.
      • Deposition Gateway modules: Compounds Module, Spectra Module, Reactions Module, Materials Module, Textmining Module
      • Staging databases
      • Data states: Raw data, Validated data
      • Databases: Compounds, Reactions, Spectra, Materials, Articles / CSSP
      • All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
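The security model named on slide 11, where each data slice is private, public or embargoed, might be sketched as follows. The class and field names are hypothetical, and the rule that an embargoed slice becomes readable once its embargo date has passed is an assumption; the slides only name the three states.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    EMBARGOED = "embargoed"
    PUBLIC = "public"


@dataclass
class DataSlice:
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # hypothetical field, not from the slides


def can_read(data_slice, user, today):
    """Decide whether user may read data_slice on the given date."""
    if user == data_slice.owner:
        return True  # owners always see their own data
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        # Assumption: an embargo lifts on its end date; no date means never.
        return data_slice.embargo_until is not None and today >= data_slice.embargo_until
    return False  # private data is readable by the owner only
```

Collaborative sharing, which the slides also mention, would need an explicit list of permitted users per slice and is not modelled here.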
  12. Compounds upload
      • Draw chemicals in the interface (Javascript editors – PC, Mac, Tablets, Phones)
      • Drag and drop of compounds
      • Automated generation of properties – formulae, Mw, Mi, physchem properties
      • Metadata input forms
      • Bulk upload
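As an illustration of the automated property generation on slide 12, the sketch below derives Mw (average molecular weight) and Mi from a molecular formula. Reading "Mi" as monoisotopic mass is an assumption, the function names are hypothetical, and only four elements are tabulated to keep the example short. The formula parser handles flat formulae such as C2H6O, not brackets or charges.

```python
import re

# (average atomic mass, monoisotopic mass) for a few common elements.
ATOMIC_MASSES = {
    "H": (1.008, 1.00782503),
    "C": (12.011, 12.0),
    "N": (14.007, 14.00307401),
    "O": (15.999, 15.99491462),
}


def parse_formula(formula):
    """Turn a flat formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def masses(formula):
    """Return (Mw, Mi) for a formula; raises KeyError for untabulated elements."""
    counts = parse_formula(formula)
    mw = sum(ATOMIC_MASSES[symbol][0] * n for symbol, n in counts.items())
    mi = sum(ATOMIC_MASSES[symbol][1] * n for symbol, n in counts.items())
    return mw, mi
```

For example, `masses("C2H6O")` gives the two values for ethanol. A production system would more likely compute these from the drawn structure with a cheminformatics toolkit than from a formula string.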
  13. Depositions Gateway User Interface
  14. Depositions Gateway User Interface
  15. Chemical Validation and Standardization
  16. Reactions
      • Hosting of reaction data – standard “document formats” – full flexibility but limiting – extraction of data from embedded objects
      • Encourage template formats – using ELNs for example, community agreed templates
  17. Electronic Notebook Data
      • Development work integrating chemistry into the Southampton Labtrove notebook
      • Stoichiometry table development
      • Analytical data integration
      • “ChemTrove” rolled out to a small test group in January
  18. Micropublishing Syntheses
  19. ChemSpider SyntheticPages
  20. Requirements
      • Community agreement on acceptable templates for CSSP/Reactions deposition
      • Data Model deposition based on mappings between template and CSSP model
      • Adoption of Labtrove interface for deposition
  21. What we will deliver
      • Micropublishing platform for submission of
        • Protocols and Procedures
        • Reactions
        • Safety and Hazard data (LATER)
      • Template-based submissions of procedures
      • Matched to ELN submissions
      • Full details for user submission versus mapped submission into database
  22. Reaction Deposition/Validation
  23. Reaction Deposition/Validation
  24. Spectral Data
      • Support for “structure identification” is a must – “greatest value” for reference and lookup
      • Support for data standards primarily – JCAMP, mzML, SPC
      • Want to support ASSIGNED data formats
      • Hold binary files but prefer standards – WHY?
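One reason text-based standards such as JCAMP are attractive is that their metadata can be read without vendor software. The sketch below collects the `##LABEL=value` header records of a JCAMP-DX file into a dictionary. It is a minimal illustration under stated limits, not a full reader: it ignores the data table itself and multi-line records, and the function name is hypothetical.

```python
def read_jcamp_headers(text):
    """Collect ##LABEL=value records from JCAMP-DX text into a dict."""
    headers = {}
    for line in text.splitlines():
        if not line.startswith("##") or "=" not in line:
            continue  # data rows and continuation lines are skipped here
        label, _, value = line[2:].partition("=")
        # JCAMP-DX introduces inline comments with "$$"; drop them.
        value = value.split("$$", 1)[0].strip()
        headers[label.strip().upper()] = value
    return headers
```

Applied to a file beginning `##TITLE=ethanol`, the result maps `"TITLE"` to `"ethanol"`, which is enough to index a deposited spectrum by title, data type and units.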
  25. Raw Spectral Data
  26. 10 years from now…
      • Binary file formats generally need the original data processing software to deal with them – from Bruker, Agilent, Jeol, Thermo, Waters, blah, blah, blah, blah,…
      • While we can store the original raw data files for posterity, should we? This has been one focus for data repositories
  27. This is way more useful
  28. Processed data…
      • Spectral searching is made possible
      • Spectral matching is possible
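To show why processed peak data makes spectral matching possible, here is a minimal sketch of one common approach: bin the peaks of two spectra and compare them with cosine similarity. The slides do not say which matching method the service uses, so the technique, the function names and the default bin width are all assumptions.

```python
import math


def bin_peaks(peaks, bin_width=1.0):
    """Sum peak intensities into bins of the given width along the x axis."""
    binned = {}
    for position, intensity in peaks:
        key = round(position / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Score two peak lists from 0.0 (no shared bins) up to 1.0 (same shape)."""
    a = bin_peaks(peaks_a, bin_width)
    b = bin_peaks(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

Each spectrum is a list of `(position, intensity)` pairs, for example m/z and abundance for a mass spectrum. Ranking library spectra by this score against a query gives a simple spectral search.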
  29. This is what we really want…
  30. Addition of Analytical Data
      • Spectral Container is in development using componentized widgets for display
      • NIST spectra converted into standardized JCAMP format for deposition – 296,103 spectra deposited
      • 10% of remaining NIST spectra need to be curated as there are obvious structure issues
  31. Javascript viewer: NMR, MS, IR
  32. Depositions Gateway User Interface
  33. Document processing
  34. Depositions Gateway User Interface
  35. User Interface Approach
      [Diagram; tiers recovered from the slide]
      • Data tier: Compounds, Reactions, Spectra, Materials, Documents
      • Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
      • User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets
      • User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc.)
  36. User Interface Approach
      [Diagram; tiers recovered from the slide]
      • Data tier: Compounds, Reactions, Spectra, Materials, Documents
      • Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
      • User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets
      • User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc.)
  37. [Use-case diagram]
      • Analytical Chemist: Characterize, Measure, Search, Store (linked by <<include>> relationships)
      • Synthetic Chemist: Search (synthetic procedure), Document (publish synthetic procedure), Retrosynthetic analysis
  38. [Use-case diagram]
      • Medicinal Chemist: Search (against database of properties), Source (find vendor), Analyse (cluster, dock, screen)
      • Computational Chemist: Search or Develop algorithm, Store results, Run calculations
      • Further use cases shown: Synthesize, Measure activity
  39. Present activities for ACS Fall
      • Deposition process development of compounds, reactions and spectral data by end of Spring
      • FTP, DropBox, Web-upload, ELN integration
      • Compounds, Reactions, Spectral data search, display, download
      • Data sharing – private, public, collaborative
      • Metadata, metadata, metadata standards!
      • Open Sourcing Chemical Registry System including CVSP
  40. UK Chemical Database Service
      • The National Chemical Database Service is for UK academics
      • What would be necessary to make this available for “Rest of World”, a single institution, an organization?
      • It’s not really technology…that’s scale out and can be handled
      • It’s negotiation with database providers, pricing, login/authentication, localization?
  41. Acknowledgments
      • Jeremy Frey and Simon Coles, University of Southampton
      • Will Dichtel and Leah McEwan, Cornell University
      • Stuart Chalk, University of North Florida
      • Bob Hanson and Bob Lancashire, Jmol and JSpecView
  42. Thank you
      Email: williamsa@rsc.org
      ORCID: 0000-0002-2668-4821
      Twitter: @ChemConnector
      Personal Blog: www.chemconnector.com
      SLIDES: www.slideshare.net/AntonyWilliams
