Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics

Presentation Transcript

  • Big Data Supporting Drug Discovery: Cautionary Tales from the World of Chemistry for Translational Informatics. Valery Tkachenko, RSC-CSIR/OSDD meeting, Pune, India, February 3rd 2014
  • Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  • Science map
  • Chemical space - 10^60
  • Navigation in chemical space
  • Navigation in chemical space
  • Structure-based Drug Design
  • Structure-based Drug Design
  • Ligand-based Drug Design
  • Ligand-based Drug Design
  • Machine learning
  • Applied machine learning
  • ~30 million chemicals and growing; data sourced from >500 different sources; crowdsourced curation and annotation; ongoing deposition of data from our journals and our collaborators; a structure-centric hub for web searching
  • ChemSpider
  • ChemSpider
  • Properties - experimental
  • Properties - ACDLabs
  • Properties – EPI Suite
  • Properties - ChemAxon
  • Literature references
  • Patent references
  • Books
  • Classification
  • Chemical vendors and datasources
  • Multimedia
  • ChemSpider Reactions
  • ChemSpider Reactions
  • ChemSpider Reactions
  • ChemSpider Reactions
  • ChemSpider Spectra
  • ChemSpider Spectra
  • ChemSpider Databases: ChemSpider Compounds, ChemSpider Reactions, ChemSpider Spectra, ChemSpider Crystals, ChemSpider Materials, ChemSpider Assays, ChemSpider Algorithms
  • Research data inflow. All databases are sliced by data source/data collection and have a simple security model in which each data slice/source is private, public, or embargoed. [Diagram labels: Web UI for unified depositions; API, FTP, etc.; DropBox, Google Drive, SkyDrive, etc.; LabTrove and other templated data; Deposition Gateway; raw data; Compounds, Reactions, Spectra, Materials, Textmining, … Modules; staging databases; validated data; Compounds, Reactions, Spectra, Materials, Documents, Articles / CSSP]
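The security model on this slide (each data slice is private, public, or embargoed) could be sketched roughly as follows. This is an illustration only, not the RSC implementation: the `DataSlice` class, the `can_read` function, the `owner` field, and the rule that an embargo lifts on a fixed date are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass(frozen=True)
class DataSlice:
    """One data source/collection within a database (hypothetical model)."""
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # assumed: embargo lifts on this date


def can_read(data_slice: DataSlice, user: str, today: date) -> bool:
    """Return True if `user` may read records in `data_slice` on `today`."""
    if user == data_slice.owner:
        return True  # owners always see their own slice
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        # Assumption: an embargoed slice becomes readable once the date passes.
        return (data_slice.embargo_until is not None
                and today >= data_slice.embargo_until)
    return False  # private slice, non-owner


if __name__ == "__main__":
    spectra = DataSlice("lab-spectra", "alice", Visibility.EMBARGOED, date(2014, 6, 1))
    print(can_read(spectra, "bob", date(2014, 2, 3)))
```

Attaching visibility to the slice rather than to individual records keeps the access check to a single lookup per data source, which matches the slide's description of databases being "sliced" by source.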
  • Research data outflow. [Diagram, four tiers: user interface tier (examples: paid 3rd party integrations on various platforms such as SharePoint, Google, etc.; Electronic Laboratory Notebook; Analytical Laboratory application; Chemical Inventory application); user interface components tier (Compounds, Reactions, Spectra, Materials, and Documents Widgets); data access tier (Compounds, Reactions, Spectra, Materials, and Documents APIs); data tier (Compounds, Reactions, Spectra, Materials, Documents)]
  • RSC Archive – since 1841
  • DERA: Digitally Enabling RSC Archive
  • Semantic mark-up of articles
  • It is so difficult to navigate… IP? What’s the structure? Are they in our file? What’s similar? Pharmacology data? What’s the target? Known pathways? Competitors? Connections to disease? Working on now? Expressed in right cell type?
  • Data quality issues and CVSP: robochemistry; proliferation of errors in public and private databases; automated quality control system
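One kind of automated check that fits the "proliferation of errors" point is flagging compound names that different sources map to different structures. The sketch below is a generic illustration under stated assumptions, not CVSP's actual rule set: the `Record` tuple layout, the `find_conflicts` function, and the placeholder structure keys are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (source database, compound name, structure key); the layout is an assumption.
Record = Tuple[str, str, str]


def find_conflicts(records: Iterable[Record]) -> Dict[str, Set[str]]:
    """Return names that map to more than one structure key across sources."""
    keys_by_name: Dict[str, Set[str]] = defaultdict(set)
    for _source, name, key in records:
        # Normalize case and whitespace so trivially different spellings match.
        keys_by_name[name.strip().lower()].add(key)
    return {name: keys for name, keys in keys_by_name.items() if len(keys) > 1}


if __name__ == "__main__":
    # Placeholder data: names and keys are made up for illustration.
    records = [
        ("db1", "Compound X", "KEY-A"),
        ("db2", "compound x", "KEY-B"),  # disagrees with db1
        ("db1", "Compound Y", "KEY-C"),
        ("db2", "Compound Y", "KEY-C"),  # agrees with db1
    ]
    for name, keys in find_conflicts(records).items():
        print(name, sorted(keys))
```

A flagged name does not say which source is wrong; it only marks the record for curation, which is where crowdsourced review would come in.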
  • DrugBank dataset (6516 records). J. Brecher, IUPAC Graphical Representation of Stereochemical Configuration, Section ST-1.1.10. DB06287
  • Research data management. [Diagram labels: scientists, funding bodies, external clients, publishers, indexes; data repository with indexed storage and chemically intelligent services; data repository provided data storage; data hubs and workstations at University 1, University 2, and Company 3]
  • Crowdsourcing
  • AltMetrics
  • RSC/Rewards and Recognition: The First Step badge is awarded when a user submits (and has published) their 1st CSSP article. "Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP."
  • Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Visualization and navigation; Building Global Chemistry Network
  • Visualization
  • Visualization and navigation
  • Visualization and navigation
  • We are a part of a larger world
  • ChemSpider APIs
  • National Chemistry Database
  • Open PHACTS (http://www.openphacts.org) is an Innovative Medicines Initiative (IMI) project aiming to reduce the barriers to drug discovery in industry, academia, and small businesses. The semantic web is one of the cornerstones.
  • OSDD
  • Thank you. Email: tkachenkov@rsc.org. Slides: http://www.slideshare.net/valerytkachenko16