The importance of standards for data exchange and interchange on the Royal Society of Chemistry e science platforms

The Royal Society of Chemistry provides access to a number of databases hosting chemical data, reactions, spectroscopy data and prediction services. These databases and services can be accessed via web services that accept queries in standard data formats such as InChI and molfiles. Data can then be downloaded in standard structure and spectral formats, allowing for reuse and repurposing. The ChemSpider database is integrated with a number of projects external to RSC, including Open PHACTS, which integrates chemical and biological data. This project uses semantic web data standards, including RDF. This presentation will provide an overview of how structure and spectral data standards have been critical in allowing us to integrate many open source tools, to connect easily to a myriad of services, and to underpin many of our future developments.


Usage Rights: CC Attribution-NonCommercial License

    Presentation Transcript

    • The importance of standards for data exchange and interchange on the Royal Society of Chemistry eScience platforms. Valery Tkachenko, Colin Batchelor, Jon Steele and Antony Williams*. ACS Indianapolis, September 12th 2013
    • RSC Projects in Action • Many RSC projects are underway, underpinned by ChemSpider and heavily dependent on standards • ChemSpider • ChemSpider Reactions • Open PHACTS • PharmaSea • Chemical Database Service • Open Source Drug Discovery
    • Open PHACTS • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open source code, open data and open standards • Academics, pharmaceutical companies, publishers… • To put medicines in the pipeline…
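Open PHACTS links chemistry and biology through semantic web standards such as RDF. As a rough illustration of what that means in practice, the sketch below writes compound facts as RDF N-Triples using only the Python standard library. The `http://example.org/` namespace, the predicate names and the target identifier are hypothetical placeholders, not the vocabularies Open PHACTS actually uses.

```python
# Sketch: express compound facts as RDF N-Triples (one triple per line).
# The namespace and predicate names are HYPOTHETICAL placeholders.
EX = "http://example.org/"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join already-formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def compound_triples(compound_id: str, inchikey: str, target_id: str) -> list:
    """Describe one compound by its InChIKey and one (hypothetical) biological target."""
    subject = iri(f"{EX}compound/{compound_id}")
    return [
        triple(subject, iri(f"{EX}vocab/inchikey"), literal(inchikey)),
        triple(subject, iri(f"{EX}vocab/activeAgainst"), iri(f"{EX}target/{target_id}")),
    ]


if __name__ == "__main__":
    for line in compound_triples("42", "OKKJLVBELUTLKV-UHFFFAOYSA-N", "T1"):
        print(line)
```

Because each N-Triples line is a complete statement, output like this can be concatenated or loaded into a triple store without further wrapping.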
    • Open Source Drug Discovery
    • Compound Data • The primary standards for handling chemical structures are molfile, SDfile, SMILES and InChI • We depend primarily on molfiles and SDF files for data deposition and interchange • We use InChI a lot, especially for integrated searching across the web • There ARE data interchange problems associated with structures…
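Because molfiles and SDF files carry most deposited structures, a reader needs to split an SDF into records and pull out its data fields (for example an InChI or InChIKey used for cross-web searching). The following is a minimal standard-library sketch, assuming V2000-style records terminated by `$$$$`, each molblock ending in `M  END`, and data items written as `> <FIELD>` followed by value lines and a blank line. A production loader would use a cheminformatics toolkit instead.

```python
import re

# Data-item header such as "> <InChIKey>" or ">  <ID> (1)"
FIELD_RE = re.compile(r"^>.*?<([^>]+)>")


def _parse_record(lines):
    """Split one SDF record into its molblock and a dict of data fields."""
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
    if end is None:
        raise ValueError("SDF record has no 'M  END' line")
    fields, name = {}, None
    for line in lines[end + 1:]:
        match = FIELD_RE.match(line)
        if match:
            name = match.group(1)
            fields[name] = []
        elif name is not None:
            if line.strip():
                fields[name].append(line)
            else:
                name = None  # a blank line ends the current data item
    return {
        "molblock": "\n".join(lines[: end + 1]),
        "fields": {key: "\n".join(values) for key, values in fields.items()},
    }


def parse_sdf(text):
    """Return one dict per record; records are separated by '$$$$' lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final '$$$$'
        records.append(_parse_record(current))
    return records


SAMPLE = """methanol
  sketch

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
> <InChI>
InChI=1S/CH4O/c1-2/h2H,1H3

> <InChIKey>
OKKJLVBELUTLKV-UHFFFAOYSA-N

$$$$
"""

if __name__ == "__main__":
    for record in parse_sdf(SAMPLE):
        print(record["fields"]["InChIKey"])
```

Splitting line by line (rather than stripping text around `$$$$`) keeps a blank title line intact, since a molfile's first line may legitimately be empty.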
    • ChemSpider
    • ChemSpider
    • Exact Search
    • Skeleton Search
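One way to see the difference between exact and skeleton searching is through the layout of a standard InChIKey: the first 14-character block hashes the connectivity (skeleton) layer, while the second block hashes the remaining layers, such as stereochemistry and isotopes, and adds standard and version flags. The sketch below is an illustration, not ChemSpider's search implementation: it treats an exact match as an identical key and a skeleton match as an identical first block, so stereoisomers such as L- and D-alanine fall into the same skeleton group.

```python
def split_inchikey(key):
    """Return the three blocks of a standard InChIKey (14-10-1 characters)."""
    parts = key.strip().split("-")
    if [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a standard InChIKey: {key!r}")
    return tuple(parts)


def exact_matches(query, keys):
    """Exact match: the full key, including the stereo/isotope block, must be identical."""
    return [k for k in keys if k == query]


def skeleton_matches(query, keys):
    """Skeleton match: only the first (connectivity) block has to agree."""
    skeleton = split_inchikey(query)[0]
    return [k for k in keys if split_inchikey(k)[0] == skeleton]


KEYS = [
    "QNAYBMKLOCPYGJ-REOHCLBHSA-N",  # L-alanine
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # ethanol
]

if __name__ == "__main__":
    query = "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"  # alanine with no stereo defined
    print(exact_matches(query, KEYS))     # no identical key in KEYS
    print(skeleton_matches(query, KEYS))  # both alanine enantiomers
```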
    • CVSP: chemical validation • A free chemistry validation platform that performs: • Structure validation • Atoms • Bonds • Valence • Stereo • Aromaticity: check that aromatic systems can be uniquely dearomatized • Ionization: check that the strongest acid is ionized first in partially ionized systems • Cross-matching of SDF fields • Synonyms • InChIs • SMILES
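Of the validation checks listed, the valence check is the simplest to illustrate. The sketch below sums bond orders per atom and flags any atom that exceeds a maximum valence. It assumes neutral atoms, allows hydrogens to be implicit (so under-valent atoms are accepted), and uses a small hypothetical valence table; CVSP's real rules also handle charges, radicals and many more elements.

```python
# HYPOTHETICAL, simplified maximum valences for neutral atoms only.
MAX_VALENCE = {"H": 1, "B": 3, "C": 4, "N": 3, "O": 2, "F": 1, "S": 6, "Cl": 1, "Br": 1, "I": 1}


def check_valences(atoms, bonds):
    """Flag atoms whose explicit bond-order sum exceeds MAX_VALENCE.

    atoms: element symbols, e.g. ["C", "O"]
    bonds: (i, j, order) tuples with 0-based atom indices
    Returns (index, element, valence) for every offending atom.
    Atoms below their maximum pass, since implicit hydrogens can fill them.
    """
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    problems = []
    for index, (element, total) in enumerate(zip(atoms, totals)):
        limit = MAX_VALENCE.get(element)
        if limit is None:
            continue  # element not covered by this sketch
        if total > limit:
            problems.append((index, element, total))
    return problems


if __name__ == "__main__":
    # A carbon with two double bonds and one single bond to oxygen: valence 5.
    print(check_valences(["C", "O", "O", "O"], [(0, 1, 2), (0, 2, 2), (0, 3, 1)]))
```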
    • Input formats supported: CDX, MOL and SDF files, ZIP and GZ archives, and tab-delimited text files
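Accepting compressed uploads mostly means unwrapping them before parsing. This sketch, with a hypothetical `read_structure_texts` helper, yields the text of a plain file, a `.gz` file, or each `.sdf`/`.mol` member of a `.zip` archive. It assumes UTF-8 content; CDX (a binary ChemDraw format) would need its own reader and is not handled here.

```python
import gzip
import zipfile
from pathlib import Path

STRUCTURE_SUFFIXES = (".sdf", ".mol")


def read_structure_texts(path):
    """Yield (name, text) for a plain, gzipped or zipped structure file.

    Assumes UTF-8 text; binary formats such as CDX are not handled.
    """
    path = Path(path)
    lower = path.name.lower()
    if lower.endswith(".gz"):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield path.name[:-3], handle.read()  # "x.sdf.gz" -> "x.sdf"
    elif lower.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if not info.is_dir() and info.filename.lower().endswith(STRUCTURE_SUFFIXES):
                    yield info.filename, archive.read(info).decode("utf-8")
    else:
        yield path.name, path.read_text(encoding="utf-8")
```

Each yielded text can then be handed to whatever SDF or molfile reader follows in the pipeline.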
    • CVSP: standardization modules • Custom processing lets users assemble a workflow from a predefined list of standardization modules
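A user-assembled workflow can be modelled as an ordered list of named functions applied in sequence. In the sketch below, the module names and the SMILES-string representation are assumptions for illustration; in particular, `keep_largest_fragment` uses string length as a crude stand-in for fragment size, where a real standardizer would count atoms with a cheminformatics toolkit.

```python
from typing import Callable, Dict, List


# HYPOTHETICAL module names; each module maps a SMILES string to a new SMILES string.
def strip_whitespace(smiles: str) -> str:
    return smiles.strip()


def keep_largest_fragment(smiles: str) -> str:
    """Keep the longest '.'-separated fragment (a crude proxy for the parent molecule)."""
    return max(smiles.split("."), key=len)


MODULES: Dict[str, Callable[[str], str]] = {
    "strip_whitespace": strip_whitespace,
    "keep_largest_fragment": keep_largest_fragment,
}


def build_workflow(names: List[str]) -> Callable[[str], str]:
    """Compose the named modules, in the user's order, into one function."""
    unknown = [name for name in names if name not in MODULES]
    if unknown:
        raise ValueError(f"unknown modules: {unknown}")
    steps = [MODULES[name] for name in names]

    def run(smiles: str) -> str:
        for step in steps:
            smiles = step(smiles)
        return smiles

    return run


if __name__ == "__main__":
    workflow = build_workflow(["strip_whitespace", "keep_largest_fragment"])
    print(workflow("  CC(=O)[O-].[Na+] "))  # sodium acetate -> acetate anion
```

Validating module names when the workflow is built, rather than when it runs, means a mistyped module is reported before any structures are processed.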
    • Reaction Data • ChemSpider is built for compounds, but how are they made? • ChemSpider Reactions is our attempt to answer that question • Integrating both commercial and open data • RSC databases, data extracted from our publications in the DERA project, and Open Data sources of reactions • Molfiles, CDX files, RXN files
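RXN files bundle several molfiles into one reaction record. A minimal reader for the V2000 RXN layout (a `$RXN` header line, reaction name, program line, comment line, a counts line giving the number of reactants and products in 3-character fields, then one `$MOL` block per molecule) might look like this. Agents, V3000 files and error recovery are out of scope for this sketch.

```python
def parse_rxn(text):
    """Split a V2000 RXN file into reactant and product molblocks."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("$RXN"):
        raise ValueError("missing $RXN header")
    if len(lines) < 5:
        raise ValueError("truncated RXN header")
    counts = lines[4]
    n_reactants, n_products = int(counts[0:3]), int(counts[3:6])

    blocks, current = [], None
    for line in lines[5:]:
        if line.startswith("$MOL"):
            if current is not None:
                blocks.append("\n".join(current))
            current = []  # start collecting a new molfile
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append("\n".join(current))

    if len(blocks) != n_reactants + n_products:
        raise ValueError(f"expected {n_reactants + n_products} $MOL blocks, found {len(blocks)}")
    return {
        "name": lines[1].strip(),
        "reactants": blocks[:n_reactants],
        "products": blocks[n_reactants:],
    }


# Tiny placeholder molblock (no atoms) used only to exercise the parser.
EMPTY_MOL = "placeholder\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END"

if __name__ == "__main__":
    rxn = "$RXN\nesterification\n  sketch\n\n  2  1\n" + "".join(f"$MOL\n{EMPTY_MOL}\n" for _ in range(3))
    parsed = parse_rxn(rxn)
    print(parsed["name"], len(parsed["reactants"]), len(parsed["products"]))
```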
    • RSC and Chemical Reactions
    • RSC and Chemical Reactions
    • RSC Journal Content • Tens to hundreds of thousands of reactions are contained in our journals • Electronic Supplementary Information (ESI) contains many more
    • ChemSpider Reactions
    • ChemSpider Reactions
    • ChemSpider SyntheticPages
    • Spectral Data • ChemSpider requires spectral data to be deposited in standard formats: JCAMP or images • All spectra are available at: http://www.chemspider.com/spectra.aspx • Data are deposited on a regular basis by • Students • Chemical vendors • The collection is growing
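JCAMP-DX files are plain text built from labelled data records of the form `##LABEL=value`. The sketch below collects the header records from a file, dropping `$$` comments and normalizing labels by upper-casing them and removing spaces, dashes, slashes and underscores, which is how JCAMP-DX labels are meant to be compared. It keeps the first occurrence of each label and skips data-table lines, so it is a header reader only.

```python
import re

LDR_RE = re.compile(r"^##(.*?)=(.*)$")


def normalize_label(label):
    """Upper-case a label and drop spaces, dashes, slashes and underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_jcamp_headers(text):
    """Return {normalized label: value} for the ##LABEL=value records in text."""
    headers = {}
    for line in text.splitlines():
        match = LDR_RE.match(line)
        if not match:
            continue  # data-table or continuation line: skipped in this sketch
        value = match.group(2).split("$$", 1)[0].strip()  # drop $$ comments
        headers.setdefault(normalize_label(match.group(1)), value)
    return headers


SAMPLE = """##TITLE= ethanol 1H $$ from a student
##JCAMP-DX=4.24
##DATA TYPE= NMR SPECTRUM
##.OBSERVE FREQUENCY= 400.13
##XYDATA=(X++(Y..Y))
0 1 2 3
##END=
"""

if __name__ == "__main__":
    print(parse_jcamp_headers(SAMPLE))
```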
    • Student Submissions
    • JCAMP NMR Spectra
    • Data on ChemSpider
    • Data Interchange
    • JCAMP file downloads • When NMR spectra are stored as JCAMP, they can be downloaded into offline packages such as MestreLabs, ACD/Labs, etc. • Open Data: download versus view • Store spectra locally and reuse them • Java is increasingly a pain! • We need to move to HTML5 viewing on ChemSpider, especially for mobile viewing
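Going the other way, a spectrum held as (x, y) pairs can be exported as a small JCAMP-DX text file for use in offline packages. This is a minimal sketch using the `(XY..XY)` point-table form; the function name and default units are hypothetical, and the output has not been validated against the full JCAMP-DX 4.24 specification, so some readers may expect additional labels.

```python
def to_jcamp_xypoints(title, xs, ys, xunits="PPM", yunits="ARBITRARY UNITS",
                      data_type="NMR SPECTRUM"):
    """Return a minimal JCAMP-DX text block holding (x, y) pairs as XYPOINTS."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##XUNITS={xunits}",
        f"##YUNITS={yunits}",
        "##XFACTOR=1",
        "##YFACTOR=1",
        f"##FIRSTX={xs[0]}",
        f"##LASTX={xs[-1]}",
        f"##NPOINTS={len(xs)}",
        "##XYPOINTS=(XY..XY)",
    ]
    lines.extend(f"{x}, {y}" for x, y in zip(xs, ys))  # one pair per line
    lines.append("##END=")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_jcamp_xypoints("demo spectrum", [1.0, 1.5, 2.0], [0.0, 5.2, 0.1]))
```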
    • Spectral Display in the hand
    • Challenges with Spectra • JCAMP is good for a lot of spectral data: IR, Raman, 1D NMR • MS data are rarely made available in JCAMP • We would love a ratified JCAMP 6.0 for 2D data exchange, allowing third parties to build support for download • ASSIGNED JCAMP spectra can be supported, but there are no real standards here
    • …and images
    • DERA to digitize documents? • We want to get data out of our historical archive • What could we do? • Find chemical names and generate structures • Find chemical images and generate structures • Find reactions – and make a database! • Find data (MP, BP, LogP) and deposit • Find figures and database them • Find spectra (and link to structures)
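Mining property data such as melting points from historical text usually starts with pattern matching. The regular expression below is an illustrative assumption covering common forms like `m.p. 123–125 °C`, `mp 98 °C` and `M.p.: 150-152 °C`. Real archive text has many more variants (Kelvin, decomposition notes, OCR errors), so it will miss cases.

```python
import re

# Illustrative pattern: "mp", "m.p." or "M.p.:" + a temperature or range + "°C".
MP_RE = re.compile(
    r"\bm\.?\s?p\.?\s*:?\s*"           # the mp / m.p. / M.p.: marker
    r"(\d+(?:\.\d+)?)"                  # first (or only) temperature
    r"(?:\s*[-–]\s*(\d+(?:\.\d+)?))?"   # optional upper end of a range
    r"\s*°\s*C",
    re.IGNORECASE,
)


def find_melting_points(text):
    """Return (low, high) tuples in °C; single values give low == high."""
    return [
        (float(low), float(high) if high else float(low))
        for low, high in MP_RE.findall(text)
    ]


if __name__ == "__main__":
    print(find_melting_points("white solid, m.p. 123–125 °C (lit. mp 124 °C)"))
```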
    • Text-Mining
    • ESI – Text Spectra
    • Do we want to search text spectra? What do we get when we search: 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
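To make a text-only 13C listing like the one above searchable, the shifts first have to become numbers. This sketch takes everything after the δ symbol, removes parenthesized annotations such as `(CH3)`, collects decimal values, and then answers a simple query: does every requested shift appear within a tolerance? The 0.5 ppm default tolerance is an arbitrary choice for illustration.

```python
import re


def extract_shifts(text):
    """Return the decimal chemical shifts listed after the δ symbol."""
    _, found, data = text.partition("δ")
    if not found:
        return []
    data = re.sub(r"\([^)]*\)", " ", data)  # drop annotations like "(CH3)" or "(ArC)"
    return [float(value) for value in re.findall(r"\d+\.\d+", data)]


def matches_query(observed, query, tolerance=0.5):
    """True if every query shift lies within `tolerance` ppm of some observed shift."""
    return all(any(abs(q - o) <= tolerance for o in observed) for q in query)


if __name__ == "__main__":
    text = ("13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), "
            "130.53 (ArCH), 99.42, 154.88 (ArC)")
    shifts = extract_shifts(text)
    print(shifts)
    print(matches_query(shifts, [14.0, 154.9]))
```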
    • MestreLabs Mnova NMR Beta
    • 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
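1H listings carry more structure per peak (shift or range, multiplicity, proton count, coupling constants), so a parser has to segment the text peak by peak. The sketch below assumes entries shaped like `4.24 (d, 1H, J = 4.8 Hz, …)` with lowercase multiplicity codes; forms such as `br s` or missing integrations would need extra patterns.

```python
import re

# shift or shift range, then "(multiplicity, nH"
PEAK_RE = re.compile(r"(\d+\.\d+)(?:\s*[–-]\s*(\d+\.\d+))?\s*\(\s*([a-z]+)\s*,\s*(\d+)H")
# coupling constants such as "J = 4.8 Hz" or "Jb = 10.8 Hz"
J_RE = re.compile(r"J\w*\s*=\s*(\d+(?:\.\d+)?)\s*Hz")


def parse_1h(text):
    """Parse a text 1H NMR listing into one dict per peak."""
    _, found, data = text.partition("δ")
    if not found:
        return []
    hits = list(PEAK_RE.finditer(data))
    peaks = []
    for k, hit in enumerate(hits):
        # the peak's annotation runs until the next peak starts
        end = hits[k + 1].start() if k + 1 < len(hits) else len(data)
        low = float(hit.group(1))
        high = float(hit.group(2)) if hit.group(2) else low
        peaks.append({
            "shift": (low, high),
            "multiplicity": hit.group(3),
            "protons": int(hit.group(4)),
            "j_hz": [float(j) for j in J_RE.findall(data[hit.end():end])],
        })
    return peaks
```

Summing `protons` over the parsed peaks gives a quick consistency check against the molecular formula of the deposited structure.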
    • ESI – Text and Image Spectra
    • Extracted JCAMP Spectrum
    • Prepare CONSISTENT JCAMP
    • Data onto ChemSpider
    • It’s exactly the WRONG WAY! • We should NOT be mining data out of future publications • Structures should be submitted “correctly” • Spectra should be submitted in digital spectral formats, not as images • ESI should be RICH and interactive
    • APIs and Standards • Our APIs follow the standard expectations for how people want to access them: RESTful services, JSON handling, etc. • Queries can be passed in as molfiles, SMILES, InChIs/InChIKeys, etc. • Future versions will include JCAMP searching • The APIs are in use by MANY organizations and are of value to Open PHACTS, PharmaSea, the Chemical Database Service, etc., as well as to mobile applications
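From a client's point of view, passing a structure identifier to a RESTful service mostly comes down to URL-encoding the query and decoding a JSON reply. The base URL, parameter names and response shape below are hypothetical placeholders, not the actual ChemSpider API, and the HTTP request itself is omitted so the example runs offline.

```python
import json
from urllib.parse import urlencode

# HYPOTHETICAL endpoint and parameter names, used only for illustration.
BASE_URL = "https://api.example.org/v1/compounds"


def build_query_url(*, smiles=None, inchi=None, inchikey=None):
    """Build a GET URL for exactly one structure identifier, URL-encoded."""
    candidates = {"smiles": smiles, "inchi": inchi, "inchikey": inchikey}
    params = {name: value for name, value in candidates.items() if value}
    if len(params) != 1:
        raise ValueError("pass exactly one structure identifier")
    return f"{BASE_URL}?{urlencode(params)}"


def parse_response(body):
    """Read (id, name) pairs from a JSON reply shaped like {"results": [...]}."""
    payload = json.loads(body)
    return [(item["id"], item["name"]) for item in payload.get("results", [])]


if __name__ == "__main__":
    print(build_query_url(inchi="InChI=1S/CH4O/c1-2/h2H,1H3"))
    print(parse_response('{"results": [{"id": 1, "name": "methanol"}]}'))
```

Encoding matters because InChI strings contain `=`, `/` and `,`, and SMILES can contain `#`, all of which would otherwise corrupt a query string. Full molfiles are usually better sent in a POST body than in a URL.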
    • Conclusions • Data interchange standards are all over our projects! • We are grateful to the companies, organizations and contributors who have helped define: • Structure: Mol, SDF, InChI, etc. • Spectra: JCAMP, SPC, NetCDF, etc. • W3C standards
    • For the Next ACS, hopefully… • Build out our ChemSpider Reactions collection • Grab spectral data out of our ESI! • Get more submissions in STANDARD formats • Integrate with spectroscopy handling systems for deposition in JCAMP • Push molfiles directly into ChemSpider with an improved deposition platform • Build out the chemical data repository…
    • Thank You Email: williamsa@rsc.org Twitter: ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams