Big Data Supporting Drug Discovery
Cautionary Tales from the World of Chemistry
for Translational Informatics
Published in: Technology, Education

Transcript of "Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics"

  1. 1. Big Data Supporting Drug Discovery: Cautionary Tales from the World of Chemistry for Translational Informatics. Valery Tkachenko, RSC-CSIR/OSDD meeting, Pune, India, February 3rd 2014
  2. 2. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  3. 3. Science map
  4. 4. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  5. 5. Chemical space - 10^60
  6. 6. Navigation in chemical space
  7. 7. Navigation in chemical space
  8. 8. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  9. 9. Structure-based Drug Design
  10. 10. Structure-based Drug Design
  11. 11. Ligand-based Drug Design
  12. 12. Ligand-based Drug Design
  13. 13. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  14. 14. Machine learning
  15. 15. Applied machine learning
  16. 16. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  17. 17.
      • ~30 million chemicals and growing
      • Data sourced from >500 different sources
      • Crowdsourced curation and annotation
      • Ongoing deposition of data from our journals and our collaborators
      • A structure-centric hub for web searching
  18. 18. ChemSpider
  19. 19. ChemSpider
  20. 20. Properties - experimental
  21. 21. Properties - ACDLabs
  22. 22. Properties – EPI Suite
  23. 23. Properties - ChemAxon
  24. 24. Literature references
  25. 25. Patents references
  26. 26. Books
  27. 27. Classification
  28. 28. Chemical vendors and datasources
  29. 29. Multimedia
  30. 30. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  31. 31. ChemSpider Reactions
  32. 32. ChemSpider Reactions
  33. 33. ChemSpider Reactions
  34. 34. ChemSpider Reactions
  35. 35. ChemSpider Spectra
  36. 36. ChemSpider Spectra
  37. 37. ChemSpider Databases: ChemSpider Compounds; ChemSpider Reactions; ChemSpider Spectra; ChemSpider Crystals; ChemSpider Materials; ChemSpider Assays; ChemSpider Algorithms
  38. 38. Research data inflow. All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed. Diagram labels: Web UI for unified depositions; Compounds; Deposition Gateway; Reactions; API, FTP, etc; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; Compounds Module; Raw data; Reactions Module; Spectra Module; Materials Module; Textmining Module; … Module; Staging databases; Staging databases; Validated data; Spectra; Materials; Documents; Articles / CSSP
  39. 39. Research data outflow
      • User interface tier (examples): Paid 3rd party integrations (various platforms – SharePoint, Google, etc); Electronic Laboratory Notebook; Analytical Laboratory application; Chemical Inventory application
      • User interface components tier: Compounds Widgets; Reactions Widgets; Spectra Widgets; Materials Widgets; Documents Widgets
      • Data access tier: Compounds API; Reactions API; Spectra API; Materials API; Documents API
      • Data tier: Compounds; Reactions; Spectra; Materials; Documents
  40. 40. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  41. 41. RSC Archive – since 1841
  42. 42. DERA Digitally Enabling RSC Archive
  43. 43. Semantic mark-up of articles
  44. 44. It is so difficult to navigate… IP? What’s the structure? Are they in our file? What’s similar? Pharmacology data? What’s the target? Known Pathways? Competitors? Connections to disease? Working On Now? Expressed in right cell type?
  45. 45. Data quality issue and CVSP
      – Robochemistry
      – Proliferation of errors in public and private databases
      – Automated quality control system
  46. 46. DrugBank dataset (6516 records). J. Brecher, IUPAC Graphical Representation of stereochem. configurations, Section: ST-1.1.10. DB06287
  47. 47. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  48. 48. Research data management. Diagram labels: Scientists; Funding bodies; External clients; Publishers; Indexes; Data Repository indexed storage; Chemically intelligent services; Data; Data Repository provided data storage; University 1; University 2; Data Hub; Workstations; Company 3; Data Hub; Workstations; Data Hub; Workstations
  49. 49. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  50. 50. Crowdsourcing
  51. 51. AltMetrics
  52. 52. RSC/Rewards and Recognition The First Step badge is awarded when a user submits (& has published) their 1st CSSP article. Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
  53. 53. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Visualization and navigation; Building Global Chemistry Network
  54. 54. Visualization
  55. 55. Visualization and navigation
  56. 56. Visualization and navigation
  57. 57. Big Data; Chemical Space; Drug Discovery pipeline; Machine learning; Training sets; RSC/ChemSpider platforms; RSC/Archive; Research data management; Data quality, crowdsourcing and AltMetrics; Building Global Chemistry Network
  58. 58. We are a part of a larger world
  59. 59. ChemSpider APIs
  60. 60. National Chemistry Database
  61. 61. http://www.openphacts.org: Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses. Semantic web is one of the cornerstones.
  62. 62. OSDD
  63. 63. Thank you. Email: tkachenkov@rsc.org. Slides: http://www.slideshare.net/valerytkachenko16
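
Slide 38 states that every data slice/source is private, public or embargoed. The slides do not say how this is enforced, so the following is only a minimal sketch of one way such a rule could be expressed. All names here (`Visibility`, `DataSlice`, `can_read`, the `embargo_until` field) are hypothetical, and the embargo rule (readable by others once the embargo date has passed) is an assumption, not the ChemSpider implementation.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    """The three states named on slide 38."""
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass(frozen=True)
class DataSlice:
    """Hypothetical record for one data slice/source."""
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # assumed field, only used when embargoed


def can_read(data_slice: DataSlice, user: str, today: date) -> bool:
    """Return True if `user` may read `data_slice` on `today`.

    Assumed rules: the owner can always read; public slices are open to
    everyone; embargoed slices open to everyone once the embargo date is
    reached; private slices stay owner-only.
    """
    if user == data_slice.owner:
        return True
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        return (
            data_slice.embargo_until is not None
            and today >= data_slice.embargo_until
        )
    return False
```

For example, a slice created as `DataSlice("spectra", "alice", Visibility.EMBARGOED, date(2014, 6, 1))` would be readable by its owner at any time under these assumed rules, and by other users only from 1 June 2014 onwards.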