The RSC & e-Science: Reflecting the Change in the World We Live In

Presentation Transcript

  • The RSC & e-Science: Reflecting the Change in the World We Live In. Valery Tkachenko. RSC-OSDD Consultative Workshop on Cheminformatics, Delhi, September 28th, 2013
  • Royal Society of Chemistry and Global Chemistry Network
  • The World We Live In. Internet World: 20+ years into the Internet revolution; Web 2.0 -> Web 3.0. Connected World: social networks; real-time communications. Big Data World: semantic content; new interfaces.
  • Pillars of the World. Data: data (knowledge) is king; dataflow. Navigation: domain-specific search and navigation; navigate inside and link out (federation). Interfaces: HCI (human-computer interface); M2M (machine to machine).
  • Science map
  • Chemical sciences map
  • Chemistry on the Internet
  • What’s wrong?!?! Complexity
  • Royal Society of Chemistry and Global Chemistry Network
  • Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  • 50,000 ft view of an STM publisher: Knowledge; Delivery Magic; Our User Interfaces (Desktop, Web, Mobile, etc.); 3rd party integrations (our web services); Customers
  • ChemSpider Suite. Data Layer: ChemSpider Assays, ChemSpider Compounds, ChemSpider Reactions, ChemSpider Spectra, ChemSpider Materials, ChemSpider Algorithms. Business Objects Layer: CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO. APIs Layer: DS API, Export API, Search API, Processing API, CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API. Components Layer: JS Components, Google Apps Components, Python widgets, SharePoint Components, PHP snippets, ASP.NET Components, Java Beans. UIs: ChemSpider website, ChemSpider Reactions mobile web app, ChemSpider desktop app, Depositions client.
  • 29 million chemicals and growing; data sourced from >500 different sources; crowdsourced curation and annotation; ongoing deposition of data from our journals and our collaborators; a structure-centric hub for web searching
  • ChemSpider and Atovaquone
  • Micropublishing
  • ChemSpider Reactions
  • Knowledge in our own archives
  • DERA and Text Mining: The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser. The reaction mixture was heated at reflux with stirring, for a period of about one-half hour. After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue.
  • Text Mining: The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser. The reaction mixture was heated at reflux with stirring, for a period of about one-half hour. After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue.
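The kind of entity extraction this slide illustrates can be hinted at with a very small sketch. It is not how DERA works (the slides do not describe its internals); it only assumes that reagent quantities appear as "name ( amount unit )", as in the excerpt above. The pattern, the unit list and the function name `extract_quantities` are all hypothetical.

```python
import re

# Shortened excerpt of the experimental paragraph quoted on the slide.
TEXT = (
    "thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into "
    "a glass reaction vessel equipped with a mechanical stirrer"
)

# Assumed pattern: a one- or two-word lowercase name (not starting with
# "and"), followed by "( <number> <unit> )". The unit list is illustrative.
QUANTITY = re.compile(
    r"\b(?!and\b)(?P<name>[a-z]+(?: [a-z]+)?)\s*"
    r"\(\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>ml|g|mg|mol|mmol)\s*\)"
)

def extract_quantities(text):
    """Return (reagent, amount, unit) tuples found in the text."""
    return [
        (m.group("name"), float(m.group("amount")), m.group("unit"))
        for m in QUANTITY.finditer(text)
    ]

reagents = extract_quantities(TEXT)
print(reagents)
```

A production text-mining system would also need chemical name recognition, which a regular expression of this kind cannot provide; systematic names such as the urea derivatives in the paragraph are deliberately out of scope here.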
  • It is so difficult to navigate… What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?
  • Digitally Enabling RSC Archive: text, PDF, XML; DERA (text mining); Chemistry Validation and Standardization Platform (CVSP); structures, reactions, spectra, materials, biological activities
  • Data quality issues and CVSP: robochemistry; proliferation of errors in public and private databases; automated quality control system
  • ChemSpider issues
  • DrugBank dataset (6516 records): ~60 records that can’t be dearomatized unambiguously (examples: DB04283, DB04462)
  • ~30 records with bonds that do not make sense (examples: DB04283, DB04009)
  • 2 records where SMILES, InChI, and name did not match the structure (DB00611, DB01547)
  • ~40 records where InChIs did not match the structure. Examples: DB00755 (InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14+) and DB00614
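One inexpensive consistency check of this kind can be sketched as follows: read the molecular-formula layer out of the InChI string and compare it with atom counts derived from the stored structure. This is an illustration only, not the check CVSP performs; the function names are hypothetical, `structure_counts` stands in for whatever a cheminformatics toolkit would compute from the structure, and the parser handles only simple single-component formulas.

```python
import re

def inchi_formula(inchi):
    """Return the molecular-formula layer (the first layer after the version)."""
    layers = inchi.split("/")
    if not layers[0].startswith("InChI="):
        raise ValueError("not an InChI string")
    return layers[1]

def formula_counts(formula):
    """Parse a simple formula such as 'C20H28O2' into element counts."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts

def matches_structure(inchi, structure_counts):
    """True if the InChI formula layer agrees with the structure's atom counts."""
    return formula_counts(inchi_formula(inchi)) == structure_counts

# InChI quoted on the slide for DrugBank record DB00755.
INCHI = (
    "InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-"
    "20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)"
    "/b9-6+,12-11+,15-8+,16-14+"
)

print(formula_counts(inchi_formula(INCHI)))
```

A formula match does not prove that the InChI and the structure agree (connectivity and stereo layers can still differ), so a check like this can only catch the coarsest mismatches.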
  • 7 records with 2 stereo bonds at chiral atoms (examples: DB08128, DB06287). See J. Brecher, IUPAC Graphical Representation of Stereochemical Configuration, Section ST-1.1.10
  • CVSP validation of ChEMBL 16 (~1.3 million records): overall, 0.7% of records had validation issues; stereo problems (~82%), made up of bond directions that do not make sense (~63%) and ambiguous stereo, i.e. 2 stereo bonds at a chiral center (~19%)
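To put these percentages in absolute terms, here is a rough back-of-the-envelope calculation. It assumes that the category percentages are shares of the records with issues (the slide does not say so explicitly) and uses the rounded figure of 1.3 million records, so the results are estimates only.

```python
TOTAL_RECORDS = 1_300_000  # "~1.3 million" from the slide, rounded

# 0.7% of all records had validation issues (integer arithmetic: 7 per 1000).
records_with_issues = TOTAL_RECORDS * 7 // 1000

# Assumption: each percentage is a share of the records with issues.
SHARES = {
    "stereo problems": 82,
    "bond direction makes no sense": 63,
    "two stereo bonds at chiral center": 19,
}

estimated = {
    label: records_with_issues * pct // 100 for label, pct in SHARES.items()
}

print(records_with_issues)
for label, count in estimated.items():
    print(f"{label}: ~{count}")
```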
  • “Direction of bond makes no sense” – 63%
  • “Stereo types of the opposite bonds mismatch” – 15% (see http://www.iupac.org/publications/pac/2006/pdf/7810x1897.pdf)
  • “Stereo types of non-opposite bonds match” – 2%
  • “Atom not recognized” – 3%: isotopes. Should be an atom from the periodic table. In molfile: no mass difference in atom line; no “M ISO” in connection table
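A minimal sketch of how such a case might be detected in a V2000 molfile: look for isotope pseudo-symbols such as D or T in the atom block, and report whether the record carries a mass difference or an "M  ISO" line. The fixed-column positions follow the V2000 atom-line layout; the helper names and the sample record are hypothetical, and this is not CVSP's implementation.

```python
# Pseudo-symbols that stand for hydrogen isotopes rather than elements.
ISOTOPE_ALIASES = {"D": ("H", 2), "T": ("H", 3)}

def atom_line(symbol):
    """Build a V2000 atom line at the origin (sample-data helper only)."""
    return f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {symbol:<3} 0  0  0  0  0"

def find_isotope_aliases(molfile):
    """Return (has_iso_line, findings) for a V2000 molfile string.

    Each finding is (atom index, symbol, element, mass number, mass difference).
    """
    lines = molfile.splitlines()
    n_atoms = int(lines[3][0:3])            # counts line: atom count in cols 1-3
    has_iso_line = any(line.startswith("M  ISO") for line in lines)
    findings = []
    for index, line in enumerate(lines[4:4 + n_atoms], start=1):
        symbol = line[31:34].strip()        # atom symbol, cols 32-34
        mass_diff = int(line[34:36].strip() or 0)  # mass difference, cols 35-36
        if symbol in ISOTOPE_ALIASES:
            element, mass = ISOTOPE_ALIASES[symbol]
            findings.append((index, symbol, element, mass, mass_diff))
    return has_iso_line, findings

# Hypothetical two-atom record using "D" with no isotope information.
SAMPLE = "\n".join([
    "example",
    "  hypothetical",
    "no real data",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    atom_line("C"),
    atom_line("D"),
    "  1  2  1  0",
    "M  END",
])

has_iso, found = find_isotope_aliases(SAMPLE)
print(has_iso, found)
```

A record flagged this way could then be rewritten with H as the atom symbol and the isotope stated explicitly, which is what "should be an atom from the periodic table" asks for.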
  • ChemSpider Suite. Data Layer: ChemSpider Assays, ChemSpider Compounds, ChemSpider Reactions, ChemSpider Spectra, ChemSpider Materials, ChemSpider Algorithms. Business Objects Layer: CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO. APIs Layer: DS API, Export API, Search API, Processing API, CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API. Components Layer: JS Components, Google Apps Components, Python widgets, SharePoint Components, PHP snippets, ASP.NET Components, Java Beans. UIs: ChemSpider website, ChemSpider Reactions mobile web app, ChemSpider desktop app, Depositions client.
  • Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  • Started with 2 servers in a basement. Presently: two farms of ~40 servers each. Future: in the clouds.
  • Compute-intensive calculations; delivery systems
  • Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  • AltMetrics
  • Curation in ChemSpider
  • Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  • Visualization
  • Navigation
  • ChemSpider APIs
  • We are a part of a larger world
  • National Chemistry Database
  • National Data Repository. University 1, University 2 and Company 3, each with a data hub and workstations. Data repository: indexed storage; provided data storage; chemically intelligent services; indexes; data. External clients: publishers, scientists, funding bodies.
  • http://www.openphacts.org Open PHACTS is an Innovative Medicines Initiative (IMI) project that aims to reduce the barriers to drug discovery in industry, academia and small businesses. The Semantic Web is one of its cornerstones.
  • What does e-Science do in Open PHACTS? ChemSpider provides many of the physicochemical properties within the Open PHACTS Discovery Platform; e-Science develops tools to check and standardise chemical structures; e-Science is creating the Open PHACTS chemical registration system.
  • RDF export. Data: ChEMBL, HMDB, DrugBank. Chemistry Validation and Standardization Platform (CVSP) at cvsp.chemspider.com: validation; standardization; parent generation; run on a Hadoop-based farm.
  • We know about Natural Products
  • Marinlit
  • OSDD
  • The Global Chemistry Network