Big Data and the Health domain (vis-a-vis the respective H2020 Societal Challenge) - Opportunities, Challenges and Requirements. As presented and discussed in the public launch of the BigDataEurope project.
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ... - Tom Plasterer
Edge Informatics is an approach to accelerate collaboration in the BioPharma pipeline. By combining technical and social solutions, knowledge can be shared and leveraged across the multiple internal and external silos participating in the drug development process. This is accomplished by making data assets findable, accessible, interoperable and reusable (FAIR). Public consortia and internal efforts embracing FAIR data and Edge Informatics are highlighted, in both preclinical and clinical domains.
This talk was presented at the Molecular Medicine Tri-Conference in San Francisco, CA on February 20, 2017
Presented by Richard Kidd at "The Future Information Needs of Pharmaceutical & Medicinal Chemistry", Monday 28 November 2011 at The Linnean Society, Burlington House, London, run by the RSC CICAG group.
In this presentation, you will learn how to transform a Big Data initiative into a realized, measurable ROI:
• Understand the complex mix of business expectation, hype, reality, and new information source opportunities in the Big Data space
• Use the Business Case process to help you identify what you can achieve and what is not yet ready
• Build communities of interest around prototypes and plan for success for your company’s advantage
• Learn how to industrialize your Big Data innovations to achieve measurable, sustainable benefits
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ... - Tom Plasterer
As scientists in the life sciences we are trained to pursue singular goals around a publication, a validated target or a drug submission. Our failure rates are exceedingly high, especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as the time of patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. Using technical and social solutions together, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions that enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these efforts foster more accurate, timely and inclusive decision-making.
BioPharma and FAIR Data, a Collaborative Advantage - Tom Plasterer
The concept of FAIR (Findable, Accessible, Interoperable and Reusable) data is becoming a reality as stakeholders from industry, academia, funding agencies and publishers embrace this approach. For BioPharma, being able to effectively share and reuse data is a tremendous competitive advantage: within a company, with peer organizations, with key opinion leaders and with regulatory agencies. A few key drivers, success stories and preliminary results of an industry data stewardship survey are presented.
Slides to be presented at a webinar arranged by Metasolution as part of a Vinnova project http://metasolutions.se/2014/03/webbinarium-med-kerstin-forsberg-om-lankade-data-i-lakemedelsforskningen/
FAIR Data Knowledge Graphs – from Theory to Practice - Tom Plasterer
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions across your own and the world's data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen. Our processes enable simple creation of dataset records and linking to source data, providing a seamless federated knowledge graph for novice and advanced users alike.
Presented May 7th, 2019 at the Knowledge Graph Conference, Columbia University.
Making Data FAIR (Findable, Accessible, Interoperable, Reusable) - Tom Plasterer
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session will outline specific case studies – real problems with real data – and address opportunities and real concerns.
· Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to richly address novel questions across your own and the world's data. We started with data catalogues (findability) which exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability) to make this happen.
This talk was presented at The Molecular Medicine Tri-Conference/Bio-IT West on March 11, 2019.
Dataset Catalogs as a Foundation for FAIR* Data - Tom Plasterer
BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we’ve described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
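As a concrete illustration of the cataloguing approach described above, a DCAT dataset record with one distribution can be sketched as JSON-LD. This is a minimal, hypothetical sketch in plain Python; the dataset URIs, title and field selection are illustrative assumptions, not the actual catalogue schema from the talk.

```python
import json

def make_dataset_record(uri: str, title: str, distribution_url: str) -> dict:
    """Build a minimal DCAT dataset record as JSON-LD (illustrative fields only)."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # One distribution per record here; real catalogues describe each
        # version and distribution separately, as the abstract notes.
        "dcat:distribution": {
            "@id": distribution_url,
            "@type": "dcat:Distribution",
            "dcat:downloadURL": distribution_url,
        },
    }

# Hypothetical dataset identifiers, for illustration only.
record = make_dataset_record(
    "https://example.org/datasets/study-42",
    "Cross-omics study 42 (summary level)",
    "https://example.org/datasets/study-42/v1.csv",
)
print(json.dumps(record, indent=2))
```

In practice such records would also carry versioning, licensing and provenance terms (for example dct:issued and dct:license) and, for RDF datasets, the corresponding VoID descriptors.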
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adoption of the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
The internet now offers access to a myriad of online resources that can be of value to chemists working in the Life Sciences. While finding information online is, in many cases, a simple search away, the accuracy and validity of the associated data and information should be questioned. As more databases and resources are introduced online, commonly without integration with other resources, a scientist must perform multiple searches and then undertake the task of meshing and merging data. ChemSpider is a freely accessible online database that has taken on the challenge of meshing together distributed resources across the internet to provide a structure-based hub. It is a crowdsourcing environment hosting over 26 million unique compounds linked out to over 400 data sources. With well-defined programming interfaces, ChemSpider has been integrated into many commercial and open software packages and presently serves as the chemistry foundation for the IMI Open PHACTS project.
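The "meshing and merging" across sources that the abstract describes is typically keyed on a canonical structure identifier such as the InChIKey, so that the same compound from different vendors collapses to one record. The sketch below is a hypothetical illustration of that idea, not ChemSpider's actual pipeline; the record layout and source names are assumptions.

```python
from collections import defaultdict

def merge_by_inchikey(records):
    """Group per-source compound records into unique compounds keyed on InChIKey.

    Each record is a dict with 'inchikey', 'source' and 'name' fields.
    Illustrative sketch only; a real aggregator also standardizes and
    curates structures before merging.
    """
    compounds = defaultdict(lambda: {"names": set(), "sources": set()})
    for rec in records:
        entry = compounds[rec["inchikey"]]
        entry["names"].add(rec["name"])
        entry["sources"].add(rec["source"])
    return dict(compounds)

# Hypothetical records: two sources describe the same compound (aspirin).
records = [
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "source": "VendorA", "name": "aspirin"},
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "source": "WikiX", "name": "acetylsalicylic acid"},
    {"inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N", "source": "VendorA", "name": "caffeine"},
]
merged = merge_by_inchikey(records)
print(len(merged))  # two unique compounds from three source records
```

Merging on a hashed identifier rather than on names is what lets synonyms like "aspirin" and "acetylsalicylic acid" land on the same hub record.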
How to Create a Big Data Culture in Pharma - Chris Waller
A talk presented at the Big Data and Analytics conference in Boston on January 28, 2014. Emphasis on data and information sharing cultures in companies.
The original abstract for the talk is below, but the talk changed based on strong interest in InChI and the possibilities of using it in a Semantic Web for Chemistry.
The increasing availability of free and open access resources for scientists on the internet presents us with a revolution in data availability. However, freedom costs, and in many cases the cost is quality. ChemSpider is a free-access website for chemists built with the intention of providing a structure-centric community for chemists. As an aggregator of chemistry-related information from many sources (at present over 21.5 million unique chemical entities from over 150 separate data sources), ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. This presentation will provide an overview of how a curated platform can become the centralized hub for resourcing information about chemical entities. We will also present ChemMantis, an entity extraction platform for extracting chemical names and scientific terms from documents and providing a platform for structure-based searching of Open Access chemistry literature.
My talk at the final Open PHACTS project meeting in Vienna in 2016, where I was asked to talk about the challenges we addressed in Open PHACTS with Semantic Web technology and what still needed to be done.
Medical innovation calls for new collaboration models that bring together government, academia and industry.
Barriers to research and ultimate commercialization will be lowered by bringing best practices from industry and academic settings.
The Hippocrates platform facilitates early drug development, extending from basic research to drug invention and commercialization, significantly saving time and money.
The platform is designed to facilitate collaboration among stakeholders and to take advantage of the vast resources currently available on the web to generate and aggregate content based on the end-user's research needs.
The Pistoia Alliance Biology Domain Strategy April 2011 - Pistoia Alliance
Michael Braxenthaler (Roche and external liaison officer for Pistoia) describes the Pistoia Alliance biology domain strategy at the first Pistoia Alliance Conference in April 2011.
Accelerate Pharmaceutical R&D with Big Data and MongoDB - MongoDB
Introduction of disruptive technologies, including use of unstructured data, is critical to Pharmaceutical R&D. We will explore how MongoDB can be used to accelerate this. We will also have an open discussion with panel members who are using MongoDB in this space. This session will be 30 minutes and will be followed by a 20 minute panel discussion led by Jason Tetrault and Deniz Kural.
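The appeal of MongoDB for the unstructured R&D data mentioned above is its schema-flexible document model: records with different shapes live in one collection and are retrieved with structured queries. The sketch below imitates that behaviour with plain Python dicts so it runs without a server; the field names and the tiny query matcher are illustrative assumptions. In real pymongo the equivalent call would be collection.find({"target": "EGFR", "ic50_nm": {"$gt": 100}}).

```python
def matches(doc, query):
    """Tiny MongoDB-style matcher: supports equality and {'$gt': x} only."""
    for field, cond in query.items():
        if field not in doc:
            return False
        if isinstance(cond, dict) and "$gt" in cond:
            if not doc[field] > cond["$gt"]:
                return False
        elif doc[field] != cond:
            return False
    return True

# Heterogeneous R&D records in one "collection" -- no shared schema required.
collection = [
    {"type": "assay", "target": "EGFR", "ic50_nm": 12.5},
    {"type": "assay", "target": "EGFR", "ic50_nm": 890.0, "notes": "weak binder"},
    {"type": "document", "target": "EGFR", "text": "free-text study report"},
]
hits = [d for d in collection if matches(d, {"target": "EGFR", "ic50_nm": {"$gt": 100}})]
print(len(hits))  # 1: only the weak binder matches; the document lacks ic50_nm
```

The point of the document model is visible in the data itself: the free-text report coexists with structured assay records, yet is simply skipped by queries on fields it does not have.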
This lecture outlines the different strategies for finding a fragment hit and the subsequent elaboration strategies used in order to increase potency to develop a lead compound in drug discovery.
As the author of “Big Data in Healthcare Hype and Hope,” Dr. Feldman has interviewed over 180 emerging tech and healthcare companies, always asking, “How can your new approach help patients?” Her research shows that data, as an enabling tool, has the power to give us critical new insights into not only what causes disease, but what comprises normal. Despite this promise, few patients have reaped the benefits of personalized medicine. A panel of leading big data innovators will discuss the evolving health data ecosystem and how big data is being leveraged for research, discovery, clinical trials, genomics, and cancer care. Case studies and real-life examples of what’s working, what’s not working, and how we can help speed up progress to get patients the right care at the right time will be explored and debated.
• Bonnie Feldman, DDS, MBA - Chief Growth Officer, @DrBonnie360
• Colin Hill - CEO, GNS Healthcare
• Jonathan Hirsch - Founder & President, Syapse
• Andrew Kasarskis, PhD - Co-Director, Icahn Institute for Genomics & Multiscale Biology; Associate Professor, Genetics & Genomic Studies, Icahn School of Medicine at Mt. Sinai
• William King - CEO, Zephyr Health
New York eHealth Collaborative Digital Health Conference
November 18, 2014
Insight into AstraZeneca's Technology Services - Nick Brown
Presentation given at the Big Data in Pharma Europe conference, London February 19th 2014 (http://bigdatapharma-europe.com/). Updated for Enterprise Search Europe Summit April 29th (http://www.enterprisesearcheurope.com/2014/Tuesday.aspx).
Overview of the innovation approach taken within Technology Services at AstraZeneca, showcasing the approach, six examples of pilots and proofs-of-concept, and a case study of how to implement a revolution in search analytics, using R&D as a springboard for the enterprise.
The internet continues to offer increased access to chemistry data that may be of value to scientists interested in populating systems containing reference toxicology data, as well as to provide data for the development of predictive models. This presentation will give an overview of some of the various sources of data available via the internet, describe some of the challenges associated with gathering high-quality data and discuss methods by which to mesh together disparate data sources.
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. There are many tens of chemical structure databases such as literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data etc., and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness is lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of well over 20 million chemical substances integrated with over 300 disparate data sources, many of these directly supporting the Life Sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for the semantic web for chemistry and to provide access to a set of online tools and services to support access to these data. I will also discuss how ChemSpider is being used to enhance Semantic Publishing in Chemistry at RSC.
RSC|ChemSpider is one of the world’s largest online resources for chemistry related data and services. Developed with the intention of delivering access to structure-based chemistry data via the internet, the ChemSpider platform hosts over 26 million unique chemical compounds aggregated from over 400 data sources and provides an environment for the community to both annotate and curate these existing data as well as deposit new data to the system. The search system delivers flexible querying capabilities together with links to external sites for publication and patent data. ChemSpider has spawned a number of projects, including ChemSpider SyntheticPages for hosting openly peer-reviewed chemical synthesis articles. This presentation will review the present capabilities of the ChemSpider system, providing direct examples of how to use the system to source high-quality data of value to pharmaceutical companies. We will discuss some of the challenges associated with validating data quality, examine how ChemSpider is a part of the semantic web for chemistry and investigate approaches to using ChemSpider integrated with analytical instrumentation.
The ChemSpider database is an online resource containing >26 million chemicals sourced from over 400 data sources. As a result, the database is a rich resource supporting the verification and elucidation of chemical structures and is utilized by mass spectrometrists around the world using the online user interface as well as the application programming interface. This presentation will provide an overview of how ChemSpider can be used for the purpose of structure identification and will include (1) direct interaction with the online interface; (2) integration with mass spectrometry vendor software; (3) applications to the identification of “known unknowns” and a comparison with the capabilities of CAS SciFinder; and (4) the hosting of online mass spectral data.
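Identifying a "known unknown" typically begins with an accurate-mass lookup: the observed monoisotopic mass is matched against database entries within a parts-per-million tolerance. The sketch below shows that single step in isolation against a hypothetical three-compound mini-database; the real workflow queries ChemSpider's far larger collection through its web interface or API and then ranks the candidate structures.

```python
def ppm_window(mass, ppm=5.0):
    """Return the (low, high) mass window for a given ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta

def candidates_by_mass(observed_mass, database, ppm=5.0):
    """Return names of database entries whose monoisotopic mass falls in the window."""
    low, high = ppm_window(observed_mass, ppm)
    return [name for name, mass in database.items() if low <= mass <= high]

# Hypothetical mini-database of monoisotopic masses (Da).
db = {
    "caffeine": 194.08038,
    "theophylline": 180.06473,
    "aspirin": 180.04226,
}
print(candidates_by_mass(194.0804, db, ppm=5.0))  # ['caffeine']
```

Note how the two 180 Da compounds are still distinguishable at 5 ppm: their exact masses differ by far more than the tolerance window, which is why accurate mass alone can already prune candidate lists sharply.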
This is a presentation given at the Opal Events meeting "Drug Discovery Partnerships: Filling the Pipeline". I was speaking in a session with Jean-Claude Bradley regarding "Pre-competitive Collaboration: Sharing Data to Increase Predictability". This presentation discussed some of the work we are doing on Open PHACTS. My thanks especially to Carole Goble, Lee Harland and Sean Ekins for their comments.
The increasing availability of free and open access resources for scientists on the internet presents us with a revolution in data availability. The Royal Society of Chemistry hosts ChemSpider, a free-access website built with the intention of building a community for chemists (http://www.chemspider.com/).
ChemSpider is an aggregator of chemistry-related information, at present hosting over 20 million unique chemical entities linked out to over 300 separate data sources, and has taken on the task of both robotically and manually curating publicly available data sources. It is also a public deposition platform where chemists can deposit their own data, including novel structures, analytical data and synthesis procedures, and host data associated with the growing activities around Open Notebook Science.
This presentation will examine chemistry on the internet, the dubious quality of what is available and how the ChemSpider crowdsourced curation platform is fast becoming one of the centralized hubs for resourcing information about chemical entities.
We will also review our efforts to provide free resources for synthesis procedures, spectral data and structure-based searching of the chemistry literature and how chemists can contribute directly to each of these projects.
Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand, while the pace of change does not appear to be slowing. ChemSpider is one of the chemistry community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day, and it serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of the solutions that it helps to enable. We will also discuss the possibilities it offers in the domain of crowdsourcing and open data sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community, and will facilitate collaboration and ultimately accelerate scientific progress.
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ... - dkNET
Presenter: Pieter Dorrestein, PhD, Professor, Skaggs School of Pharmacy and Pharmaceutical Sciences, Department of Pharmacology and Pediatrics, University of California San Diego
Abstract
In the analysis of organs, the volatilome, or biofluids, the microbiome influences 15-70% of detectable mass spectrometry molecules. Typically, only 10% of human untargeted metabolomics data can be assigned a molecular structure, with merely 1-2% traceable to microbial origins. Human microbiomes contribute metabolites through the microbial metabolism of host-derived substances, digestion of food and beverage molecules, and de novo assembly using proteins encoded by genetic elements. Despite the significance of microbiome-derived metabolites to human health, there is no centralized knowledge base for community access. To address this, the "Collaborative Microbial Metabolite Center" (CMMC) leverages expertise in mass spectrometry, microbiome innovation, and the GNPS ecosystem to build a knowledge base. It aims to create a user-accessible microbiome resource, enrich bioactivity knowledge, and facilitate data deposition. The CMMC includes the construction of a knowledge base, the MicrobeMASST tool, and health phenotype enrichment workflows; their construction and use will be discussed in this presentation. The use of this ecosystem will be exemplified by the discovery of 20,000 bile acids, many of which were shown to be of microbial origin and linked to diet and IBD.
The top 3 key questions that this resource can answer:
1. How can we leverage the thousands of public metabolomics studies to discover microbial metabolites and their organ distributions, as well as their phenotypic, including health, associations?
2. If one has an unknown molecule, how can one assess which microbes make a molecule without a known structure?
3. How can one contribute to the expansion of the knowledgebase on microbial metabolites?
Upcoming webinars schedule: https://dknet.org/about/webinar
RSC|ChemSpider is one of the world’s largest online resources for chemistry related data and services. Developed with the intention of delivering access to structure-based chemistry data via the internet, the ChemSpider platform hosts over 26 million unique chemical compounds aggregated from over 400 data sources and provides an environment for the community to both annotate and curate these existing data as well as deposit new data to the system. The search system delivers flexible querying capabilities together with links to external sites for publication and patent data. This presentation will review the present capabilities of the ChemSpider system, providing direct examples of how to use the system to source high-quality data of value to chemists. We will discuss some of the challenges associated with validating data quality and examine how ChemSpider is a part of the new “semantic web for chemistry”. ChemSpider has also spawned a number of additional projects, including ChemSpider SyntheticPages for hosting openly peer-reviewed chemical synthesis articles, the Learn Chemistry Wiki for students learning chemistry, and SpectraSchool for learning spectroscopy.
The internet has provided access to unprecedented quantities of data. In the domain of chemistry specifically, over the past decade the web has become populated with tens of millions of chemical structures and related property and assay data, together with tens of thousands of spectra and syntheses. The data have, to a large extent, remained disparate and disconnected. In recent years, with the wave of Web 2.0 participation, any chemist can contribute to both the sharing and validation of chemistry-related data, whether via Wikipedia, the online encyclopedia, or one of the multiple public compound databases. The presentation will offer a perspective on what is available today, our experiences of building a public compound database to link chemistry across the internet, and a suggested path forward for enabling even greater integration and connectivity of chemistry data for the masses to both use and participate in developing.
ChemSpider is a structure-centric database hosted by the Royal Society of Chemistry, linking over 25 million chemical compounds to over 400 internet-based resources including many public domain databases, Wikipedia, chemical vendors, patents, publications and other web-based services. The intention is for ChemSpider to become one of the primary online hubs for chemists to source chemistry-related data. During the development of the ChemSpider database we have utilized numerous approaches to standardizing, curating and validating the data supplied to us for hosting and integration. This presentation will provide an overview of our initial development of the ChemSpider database and of our present processes and procedures for handling incoming data depositions. We will also discuss how crowdsourcing can help to expand, curate and validate the data in the ChemSpider database.
Building a semantic chemistry platform with the Royal Society of ChemistryValery Tkachenko
We live in an exponentially expanding world of “big data”. Social networks, global portals and other distributed systems have been attempting to deal with the problem for a few years now. Scientific applications are commonly lagging behind the mainstream trends due to the complexity of the scientific domain. The Royal Society of Chemistry is building the Global Chemistry Network connecting a variety of resources both in-house and external, bridging gaps and advancing the chemical sciences. One of the main issues connected to the world of big data is the ease of navigation and comprehensiveness of the search capabilities. This is where the approach of the semantic web meets the world of big data. We will present our approaches in building a global federated chemistry platform connecting multiple domains of chemistry using semantic web technologies.
With the intention of providing a high quality, free internet resource of chemistry-related data for the community, ChemSpider has aggregated almost 25 million compounds linked out to over 400 data sources and provided a platform for the community to both deposit and curate data. This experiment in crowdsourcing for chemistry has now been running for over three years. This presentation will review a number of aspects of the project including (a) the level of community participation in depositing and curating data; (b) the nature of data and content supplied by the community; (c) how ChemSpider is used by the community; (d) using game-based systems to assist in data curation; (e) algorithmic approaches to data validation and filtering; and (f) sharing data curation efforts with other online databases.
FAIRness Assessment of the Library of Integrated Network-based Cellular Signa...Kathleen Jagodnik
The FAIR Guiding Principles facilitate the Findability, Accessibility, Interoperability, and Reusability of digital resources. The Library of Integrated Network-based Cellular Signatures (LINCS) Project has sought to implement the FAIR principles in the provision of its resources in order to optimize usability. We have surveyed the FAIR principles and are implementing specific facets within the LINCS resources. Subsequently, with reference to the literature and other efforts to measure FAIRness, we are developing quantitative metrics to assess the FAIRness of each dataset and resource in order to provide users with objective measures of the characteristics of the LINCS project. Assessing and improving the FAIRness of LINCS is an ongoing effort by our team that will benefit from community input to ensure that all LINCS users are optimally engaged with this resource.
Current advances to bridge the usability-expressivity gap in biomedical seman...Maulik Kamdar
I presented a talk at the Protege research meeting on the 'Current advances to bridge the usability-expressivity gap in biomedical semantic search (and visualizing linked data)' https://sites.google.com/site/protegeresearchmeeting/meeting-materials/current-advances-to-bridge-the-usability-expressivity-gap-in-semantic-search
Today ChemSpider (www.chemspider.com) is one of the community's primary online resources for chemists. Now hosting over 28 million unique chemical compounds linked to over 400 data sources, ChemSpider offers its users a structure-centric platform facilitating access to publications and patents, experimental and predicted property data, spectral data and many other forms of data and information that can benefit a chemist. ChemSpider is a crowdsourcing platform allowing the community to contribute data directly to the database through the deposition and sharing of structure data, properties, spectra and reaction syntheses. The crowdsourcing also allows for the annotation and curation of existing data, thereby letting the community assist in the much-needed curation and validation of chemistry data on the internet. This work is imperative in order to provide the chemistry underpinnings to semantic web projects such as Open PHACTS (www.openphacts.org), from which Merck is sure to benefit when it is released to the community. This presentation will provide an overview of the ChemSpider platform and will also examine the challenges of dealing with heterogeneous data quality when attempting to provide a rich resource of data for the community. If you use the internet to research chemistry-based data, this presentation will be an essential guide to sourcing high quality data.
This is a presentation given at the European Bioinformatics Institute (EMBL-EBI) in Cambridge on December 1st 2010, at an EMBL-EBI Industry Programme Workshop on "Chemical Structure Resources". This is where I unveiled details of the intra/inter-validation studies validating drug structures across multiple public domain chemistry databases. I also unveiled early results from the SurveyMonkey study of the 'trust' that the community has in public domain chemistry resources.
Scientists commonly find themselves overwhelmed by the amount of information accessible to them. The distribution of resources now includes the entire space of the worldwide web, access to primary databases such as CAS and, commonly, a plethora of internally developed systems. While the web has provided improved access to chemistry-related information, there has not been an online central resource allowing integrated chemical structure searching of chemistry databases, chemistry articles, patents and web pages such as blogs and wikis. ChemSpider has built a structure-centric community for chemists by providing free access to an online database and collaboration tool. The online database offers an environment for curating the data on ChemSpider as well as for depositing chemical structures, analytical data and associated information, and provides a significant knowledge base and resource for chemists working in different domains. An overview of present and future capabilities is given.
This is a presentation I gave at the FDA on December 1st 2009 in Washington, DC as part of a symposium involving PubChem, ChemIDplus, Pillbox, DailyMed and other related systems. The focus was, as usual, on the quality of data online and how to clean up the information, with a specific focus on the quality of data on the FDA's DailyMed and our efforts to apply semantic markup to the DailyMed articles.
Evolution of public chemistry databases: past and the futureValery Tkachenko
Over the last few years we have seen tremendous growth in chemical databases. As a result we now have a variety of scientific resources, combined into a broad network and indexed through directories like BioSharing and re3data. Such a network, while growing quickly, is still in the early days of adopting semantic web standards and does not yet support deep data indexing and discoverability; mechanisms for intellectual property protection are, at best, as simple as making data public or private. The lack of standards and well-defined models to describe the structure of scientific information further inhibits the free information flow that is essential for scientific discovery.
In this talk we will share our experience, spanning decades, of building chemical databases such as PubChem, ChemSpider, Open PHACTS and the National Database Services, and will outline fundamental problems associated with chemical databases as such, as well as data quality issues and approaches to the modern architecture of large-scale chemical databases.
Materials design is a grand challenge of materials science, and the main approach to it is still intuition-based, demanding substantial time and financial resources, with months to years spent on experiments and characterization. Therefore, any model that can be applied at the very first stage of materials design to narrow the selection area is a helpful tool for the synthetic chemist. An automated search for materials with human-defined target properties across the entire chemical space, i.e. inverse materials design, is likewise a highly desired tool for exploring the materials design space.
De novo design is, moreover, not an entirely new task in the development of organic molecules with target properties: many generative approaches are already used alongside screening libraries of existing molecules, searching for drugs against a particular target, or generating new molecules from a very simple initial structure.
Here we would like to present a new approach for generating new materials with desired properties. We used an autoencoder neural network architecture to encode materials composition and crystal structure as a vector in a latent space. In this setting, any Quantitative Structure-Property Relationship (QSPR) model based on that vector can be interpreted as a function in the latent space and used to predict properties of existing materials as well as prophetic ones. The approach has accuracy comparable to classic computational methods such as DFT for predicting energies or charges, but significantly outperforms them in computational time.
The proposed method was tested on generating superhard materials, but can easily be extended to any target property, provided a database of materials properties is available for training.
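As a sketch of the latent-space idea described above: the toy example below uses PCA as a linear stand-in for the autoencoder's encoder/decoder pair and Ridge regression as the QSPR function in the latent space. The descriptors and target property are synthetic, not real materials data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-in for featurized composition/structure descriptors.
X = rng.normal(size=(200, 30))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=200)  # toy target property

# PCA plays the role of the autoencoder here (linear encode/decode).
encoder = PCA(n_components=5).fit(X)
Z = encoder.transform(X)

# The QSPR model is now a function in the latent space.
qspr = Ridge().fit(Z, y)

# Inverse-design sketch: nudge a latent vector along the property gradient
# (for a linear model the gradient is just the coefficient vector), then
# decode it back to descriptor space as a candidate "prophetic" material.
z0 = Z[0]
z_new = z0 + 0.5 * qspr.coef_ / np.linalg.norm(qspr.coef_)
candidate = encoder.inverse_transform(z_new.reshape(1, -1))

delta = qspr.predict(z_new.reshape(1, -1))[0] - qspr.predict(z0.reshape(1, -1))[0]
print(candidate.shape)  # (1, 30)
print(delta > 0)        # moving along the gradient raises the predicted property
```

With a trained nonlinear autoencoder the decode step is the same in spirit; only the encoder/decoder and the latent-space optimizer change.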
Metal-organic frameworks: from database to supramolecular effects in complexa...Valery Tkachenko
Metal-organic frameworks (MOFs) attract a lot of interest due to their unique structure-dependent properties. Their internal pores, comparable in size to small molecules, are naturally suited to various adsorption effects. These properties underpin multiple applications, such as catalysis, gas storage/separation and, especially, clean-energy technologies.
Theoretical calculations are a usual way of decreasing experimental costs while investigating the properties of new materials, especially at the design stage. Electronic structure calculations such as density functional theory (DFT) in most cases provide appropriate accuracy in matching experimentally measured data such as adsorbate interaction energies. However, as with experimental studies, large-scale materials screening with DFT calculations is rather time-consuming and can be carried out only for structures with relatively small unit cells.
Here we present theoretical and experimental results describing the calculation of electron density in metal-organic frameworks. We built a model trained to predict partial charges on MOF atoms based on DFT calculations. The relative error of the model allows us to conclude that the models do not decrease the level of accuracy and do not introduce additional error compared to DFT, while their computational cost is several orders of magnitude lower. The models also demonstrated transferability, allowing predictions to be made, for example, for MOFs containing metals not present in the training set.
We have also built a force field (FF) of two-centered and three-centered interatomic potentials constructed using the predicted charges. The FF proved able to reproduce MOF crystal structures. As a final test, we applied the developed model and FF to newly synthesized lanthanide-containing MOFs to estimate the influence of supramolecular effects on metal complexation selectivity.
As a result, we have built a model that predicts one of the basic MOF properties at relatively small computational cost, and tested it on experimental data both taken from literature sources and measured ourselves.
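A minimal sketch of the charge-prediction step: a regressor is trained to reproduce reference charges from per-atom descriptors and then evaluated on held-out atoms. The descriptors, the toy "DFT" charges, and the choice of Random Forest are all illustrative assumptions; the actual model and features are not specified in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic per-atom descriptors (imagine electronegativity, coordination
# number, mean neighbour distance, ...) standing in for real MOF features.
X = rng.uniform(size=(1000, 6))
# Toy "DFT" partial charge: a smooth function of the descriptors plus noise.
q = 0.8 * X[:, 0] - 0.5 * X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=1000)

X_train, X_test, q_train, q_test = train_test_split(X, q, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, q_train)

# Mean absolute error on held-out atoms; inference is orders of magnitude
# cheaper than rerunning DFT per structure.
mae = np.abs(model.predict(X_test) - q_test).mean()
print(f"MAE: {mae:.3f} e")
```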
Public repositories containing diverse chemical and biological data are one of the main sources of knowledge for further biomedical research. Unfortunately, extracting and transforming these data into a well-interpretable form is a complex exercise. Ongoing community efforts are mainly focused on the analysis of term co-occurrences, text annotation based on term similarity, and related tasks [1].
Here we present an approach based on natural-language processing techniques, intended to shift the search for similar texts on chemical topics from the word to the document level. PubMed records were used to train word2vec and doc2vec models. The generated text representations can be used to search for similar abstracts; similarity depends more on this representation than on the co-presence of certain terms (neighboring compounds, similar publication dates, etc.).
Document-level clustering was also implemented to provide insight into the PubMed text corpus structure. This approach can serve as an alternative to standard topic modeling techniques for the discovery of hidden semantic features in an unsupervised manner.
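The document-level search and clustering described above can be sketched as follows. TF-IDF vectors stand in here for the word2vec/doc2vec embeddings (a simpler representation, chosen so the sketch stays self-contained), and the corpus is a toy one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

# Tiny corpus standing in for PubMed abstracts.
abstracts = [
    "microbial metabolites detected by mass spectrometry in the gut",
    "gut microbial bile acid metabolites and mass spectrometry",
    "deep learning models predict chemical toxicity endpoints",
    "neural networks estimate toxicity of industrial chemicals",
]

# Document vectors; doc2vec embeddings would be dropped in at this point.
vectors = TfidfVectorizer().fit_transform(abstracts)

# Document-level similarity search: nearest abstract to the first one.
sims = cosine_similarity(vectors[0], vectors).ravel()
nearest = sims[1:].argmax() + 1  # skip the query document itself

# Document-level clustering to expose corpus structure.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

print(nearest)  # 1: the other gut-metabolite abstract
print(labels)
```

Swapping TF-IDF for doc2vec changes only the vectorization step; the similarity search and clustering code stays the same.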
Machine learning methods for chemical properties and toxicity based endpointsValery Tkachenko
In the last decade there has been increasing interest in using in silico tools for the potential risk assessment of newly released chemicals, due to the large number of chemicals entering the market yearly and the great uncertainty about their possible hazardous effects. Different tools and methods based on machine learning already exist and have been used in a wide range of applications, starting from quantitative structure-property relationships and expanding into predictive toxicology. A lot of historical data has accumulated across multiple publicly available databases and can be used with novel machine learning methods. Unfortunately, owing to differing datasets, metrics and validation strategies, significant gaps remain in both the quantity and quality of available data, coupled with a lack of optimal predictive methods. This work is an attempt to develop a multitask system that serves as a searchable, curated collection of multiple chemical datasets together with ready-to-use machine learning methods, built solely on open source frameworks and libraries. We have implemented a set of traditional ("shallow") machine learning methods, self-tuned using grid search and k-fold validation, such as Naïve Bayes, k-Nearest Neighbors, Random Forest, Boosted Decision Trees, Regularized Logistic Regression, and Support Vector Machines, based on the open source scikit-learn (http://scikit-learn.org/stable/). Deep Neural Network models of different complexity have also been implemented using Keras (https://keras.io/), an open deep learning library, with TensorFlow (www.tensorflow.org) as a backend. The machine learning models were trained and evaluated to predict measures of toxicity from the physical characteristics of chemical structures, using the same datasets as the Toxicity Estimation Software Tool (https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test).
The Deep Learning models showed very good performance characteristics and were found to be useful in predicting toxicological and physicochemical endpoints. The results of this work support an optimistic view that some current obstacles in cheminformatics can be overcome by using Deep Learning methods.
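The self-tuning of the shallow models via grid search with k-fold validation can be sketched with scikit-learn. The dataset below is synthetic (not the TEST datasets), and the model and parameter grid are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic binary "toxic / non-toxic" endpoint standing in for real data.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Grid search over hyperparameters with stratified k-fold cross-validation;
# each shallow method in the text gets its own grid in the same pattern.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=StratifiedKFold(n_splits=5),
    scoring="roc_auc",
)
grid.fit(X, y)

print(grid.best_params_)
print(round(grid.best_score_, 3))
```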
Chemical workflows supporting automated research data collectionValery Tkachenko
Acquisition of data from public sources is inefficient, time consuming and limited in scope. The NIH has recently posted its intention to financially support data deposition by investigators through the 'data sharing plan' for each funded proposal. However, this plan also points to a current weakness of centralized data sharing and acquisition: all laboratories use different data collection and formatting approaches. These inconsistencies in data formatting by individual labs lead to the need to invest significant resources in data curation and interpretation by the technical staff maintaining the centralized data collection resource, such as caNanoLab or the Nanomaterial Registry. It would be far more efficient and useful if there were a standardized data collection and deposition template with standard key terms (such as Minimal Information About Nanomaterials, MIAN) that could be modified to add new or important additional data or parameters for each investigator. These new features could ultimately be adopted in the classification scheme and guide the scope of the expanding database. This approach would be a win-win, as it would bring structure to the investigator's laboratory, consistency in data reporting, and a means of transmitting data to the database in parallel with publication, eliminating the acquisition step from the process. In this talk we will outline our experience building the Open Science Data Repository, a federated database system for direct acquisition, curation and management of research data, including nanomaterial data capture, transformation, and streamlined submission to nanomaterial knowledgebases. The key part of the system is a microservices-based architecture which exposes a RESTful API suitable for direct integration into Workflow Management Systems, as well as built-in modules facilitating and enforcing various lab-specific standard operating procedures.
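The standardized-template idea can be sketched as a simple validator: required key terms with expected types, plus freedom for investigators to extend the record. The field names below are purely illustrative, not an actual MIAN schema.

```python
# Hypothetical minimal deposition template; field names are illustrative.
REQUIRED_FIELDS = {
    "material_name": str,
    "core_composition": str,
    "mean_diameter_nm": float,
    "zeta_potential_mv": float,
}

def validate_deposition(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems

record = {
    "material_name": "AuNP-15",
    "core_composition": "Au",
    "mean_diameter_nm": 15.2,
    "zeta_potential_mv": -32.0,
    "extra_parameter": "allowed",  # investigators may extend the template
}
print(validate_deposition(record))  # []
```

In a microservices setting, such a check would run behind the deposition endpoint of the RESTful API before a record is accepted.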
Deep learning methods applied to physicochemical and toxicological endpointsValery Tkachenko
Chemical and pharmaceutical companies, and the government agencies regulating both chemical and biological compounds, all strive to develop new methods to provide efficient prioritization, evaluation and safety assessment for the hundreds of new chemicals that enter the market annually. While a lot of historical data is available within the various agencies, organizations and companies, significant gaps remain in both the quantity and quality of the data, coupled with a lack of optimal predictive methods. Traditional QSAR methods are based on sets of features (fingerprints) representing the functional characteristics of chemicals. Unfortunately, due to both data gaps and limitations in the development of QSAR models, read-across approaches have become a popular area of research. Successes in the application of Artificial Neural Networks, and specifically Deep Learning Neural Networks, have delivered new optimism that the lack of data and limited feature sets can be overcome by using Deep Learning methods. In this poster we will present a comparison of various machine learning methods applied to several toxicological and physicochemical endpoints. This abstract does not reflect U.S. EPA policy.
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsValery Tkachenko
While we have seen tremendous growth in machine learning methods over the last two decades, there is still no one-size-fits-all solution. The next era of cheminformatics, and of pharmaceutical research in general, is focused on mining heterogeneous big data, which is accumulating at an ever-growing pace, and this will likely use more sophisticated algorithms such as Deep Learning (DL). There has been increasing use of DL recently, which has shown powerful advantages in learning from images and language as well as many other areas. However, the accessibility of this technique for cheminformatics is hindered, as it is not readily available to non-experts. It was therefore our goal to develop a DL framework embedded into a general research data management platform (the Open Science Data Repository) which can be used as an API, as a standalone tool, or integrated into new software as an autonomous module. In this poster we will present results comparing the performance of classic machine learning methods (Naïve Bayes, logistic regression, Support Vector Machines, etc.) with Deep Learning, and will discuss challenges associated with Deep Learning Neural Networks (DNN). DNN learning models of different complexity (up to 6 hidden layers) were built and tuned (varying the number of hidden units per layer, activation functions, optimizers, dropout fraction, regularization parameters, and learning rate) using Keras (https://keras.io/) and TensorFlow (www.tensorflow.org), and applied to various use cases connected to the prediction of physicochemical properties, ADME, toxicity, and materials properties. It was also shown that using NVIDIA GPUs significantly accelerates calculations, although memory consumption puts some limits on the performance and applicability of standard toolkits 'as is'.
Using publicly available resources to build a comprehensive knowledgebase of ...Valery Tkachenko
There is a variety of public resources on the Internet containing information about various aspects of the chemical, biological and pharmaceutical domains. The quality, maturity, hosting organizations and team sizes behind these data resources vary widely; as a consequence, content cannot always be trusted, and the effort of extracting information and preparing it for reuse is repeated again and again at various levels. This problem is especially serious in applications for QSAR, QSPR and QNAR modeling. The authors of this poster believe, based on their own extensive experience of building various types of chemical, analytical and biological databases over decades, that the process of building such a knowledgebase can be systematically described and automated. This poster will outline the work performed on text- and data-mining various public resources on the Web, the data curation process, and making this information publicly available through a portal and a RESTful API. We will also demonstrate how such a knowledgebase can be used for real-time QSAR and QSPR predictions.
Need and benefits for structure standardization to facilitate integration and...Valery Tkachenko
There are a large number of US government databases housing diverse collections of chemical data, including bioassay data (PubChem), toxicity data (CompTox Chemistry Dashboard) and environmental data (a large collection of EPA databases), to name just a few. In many cases integration between the databases, at the chemical structure level, is via alphanumeric text identifiers such as CAS Registry Numbers, or via InChIs (International Chemical Identifiers). Structure-based integration is critically dependent on the initial inputs providing the chemical structures to the InChI generation algorithm. To ensure optimal integration between databases, community standards and agreement regarding the standardization of chemical structures would be beneficial, not only for the integration of US government databases and resources but also for the international scientific community and hosts of online databases. This presentation will discuss our progress toward delivering a fully open source chemical standardization platform as an exemplar for the community to build on and enhance. The system utilizes the CDK (Chemistry Development Kit), RDKit and other open source components. The resource expands on our previous work on the Chemical Validation and Standardization Platform and has been tested using the open data collection provided by the EPA CompTox Chemistry Dashboard.
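A standardize-then-identify step of the kind such a platform performs might look like this with RDKit, one of the open source toolkits named above. This is a minimal sketch assuming RDKit is installed; a real pipeline applies far richer rules than `Cleanup` alone.

```python
# Minimal standardize-then-identify pipeline using RDKit (assumed installed).
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

def standard_inchikey(smiles: str) -> str:
    """Parse, standardize, and return the InChIKey used for integration."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"unparsable structure: {smiles}")
    mol = rdMolStandardize.Cleanup(mol)  # normalize and reionize the structure
    return Chem.MolToInchiKey(mol)

# Two input conventions for ethanol converge on one identifier.
print(standard_inchikey("CCO"))
print(standard_inchikey("OCC"))
```

Running both inputs through the same standardization before InChI generation is exactly what makes the resulting keys usable as cross-database join points.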
Development and comparison of deep learning toolkit with other machine learni...Valery Tkachenko
The next era of cheminformatics, and of pharmaceutical research in general, is focused on mining heterogeneous big data, which is accumulating at an ever-growing pace, and this will likely use more sophisticated algorithms such as deep learning. There has been increasing use of deep learning, which has shown powerful advantages in learning from images and language as well as many other areas. However, the accessibility of this technique for cheminformatics is hindered, as it is not readily available to non-experts and is currently not in any of the major cheminformatics tools. It is therefore our goal to develop a deep learning algorithm and toolkit which can be used standalone or integrated into new software being developed by us, such as the Open Science Data Repository (OSDR). We will show how classic machine learning (CML) methods (Naïve Bayes, logistic regression, Support Vector Machines, etc.) compare to cutting-edge deep learning, and discuss challenges associated with deep neural network (DNN) learning models. The open source scikit-learn (http://scikit-learn.org/stable/) Python library was used for building, tuning, and validating all CML models. The DNN learning models of different complexity (up to 6 hidden layers) were built and tuned (varying the number of hidden units per layer, activation functions, optimizers, dropout fraction, regularization parameters, and learning rate) using Keras (https://keras.io/), a deep learning library, with TensorFlow (www.tensorflow.org) as a backend. All the developed pipelines begin with stratified splitting of the input dataset into train (80%) and test (20%) sets. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were computed for each model for ADME/Tox and other physicochemical properties. DNN learning models were found to be very good at predicting activities and can outperform most of the CML models.
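The evaluation pipeline described above (stratified 80/20 split, then ROC AUC on the test set) can be sketched with scikit-learn. A small multilayer perceptron stands in here for the Keras/TensorFlow DNNs so the sketch stays self-contained, and the endpoint data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

# Synthetic ADME/Tox-style binary endpoint.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# Stratified 80/20 train/test split, as in the pipelines described above.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Small multilayer network standing in for the Keras DNNs; alpha is the
# L2 regularization strength (one of the tuned parameters listed above).
dnn = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), alpha=1e-3,
                  max_iter=500, random_state=0),
)
dnn.fit(X_tr, y_tr)

# ROC AUC on the held-out 20%.
auc = roc_auc_score(y_te, dnn.predict_proba(X_te)[:, 1])
print(f"test ROC AUC: {auc:.3f}")
```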
Living in a world of federated knowledge challenges, principles, tools and ...Valery Tkachenko
Over the years a multitude of chemical formats and approaches have been created to address various aspects of handling chemical information and building databases of chemical knowledge. As a result, the current state of this landscape is severely affected by the lack of well-accepted, community-recognized formats, protocols, metadata standards, validation routines and standards for handling, storing and representing chemical information; the lack of open toolkits that conform to the same standards; and the lack of platforms that allow interactive, collaborative work to solve all the above problems. While organizations such as the RDA and IUPAC, as well as some government agencies and institutes, are concerned and trying to address the problem, it remains a severe pain point. In this presentation we will talk about our experience building a federated knowledgebase called the Open Science Data Repository, which supports deposition of raw and structured chemical and analytical data in various formats, runs validation and standardization protocols, is built in a highly modular way that allows both its API and its components to be used in the cloud or deployed on premises behind firewalls, supports a variety of use cases including collaborative data curation, rich analytics and visualization, real-time machine learning, format conversion, and the preparation of depositions into PubChem and ChemSpider from a variety of sources, and fully supports the FAIR principles for research data.
Open chemistry registry and mapping platform based on open source cheminforma...Valery Tkachenko
The Open PHACTS project (openphacts.org) is a European initiative, constituting a public–private partnership to enable easier, cheaper and faster drug discovery. The project is supported by the Open PHACTS Foundation (www.openphactsfoundation.org) and funded by contributions from several pharmaceutical companies. As part of Open PHACTS, a 'Chemical Registration Service' was created to register chemicals of interest to the project, allowing compound linkage between data sets. A key concept is the support for 'scientific lenses', which allow hierarchical mapping of chemical entities, including supporting characteristics such as charge state, tautomerism and stereochemistry. Open PHACTS aggregated various databases, including ChEMBL, ChEBI, HMDB, DrugBank, PDB, MeSH, and WikiPathways. A new project builds on the Chemical Registration Service to establish an open chemistry registry and mapping service for general data set linkage. This expansion requires support for multiple cheminformatics formats, the conversion and mapping of various identifiers, harmonized but configurable standardization, validation of the chemical structures, and the creation of new identifiers, to produce scientific lenses, or 'link sets'. Furthermore, these identifiers will be related to the compounds' chemical names (IUPAC and trivial) and related chemical structures. This presentation will describe our ongoing work to create a fully open source, easy-to-install platform which supports the ideas introduced by the Open PHACTS project and expands them with community data including, for example, the data now available from the EPA CompTox Chemistry Dashboard (comptox.epa.gov). This new platform supports multiple chemical formats and provides identifier conversion and cross-validation between datasets. The project is completely based on open source cheminformatics toolkits and is available as a set of libraries, Docker images and a web frontend based on FAIR and Open Data principles.
The openness of this platform will allow for scientists to process their own datasets, and make them interoperable with other online chemical databases.
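One very simple stereochemistry-insensitive lens can be sketched by grouping records on the first (skeleton) block of the InChIKey, which ignores the stereochemistry layer. The dataset identifiers below are hypothetical; the InChIKeys are intended to be those of L-alanine, D-alanine and ethanol.

```python
from collections import defaultdict

# Hypothetical records from two datasets: (identifier, InChIKey).
records = [
    ("dataset_A:cmpd-1", "QNAYBMKLOCPYGJ-REOHCLBHSA-N"),  # L-alanine
    ("dataset_B:cmpd-9", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),  # D-alanine
    ("dataset_A:cmpd-2", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),  # ethanol
]

# Under a stereochemistry-insensitive lens, entities sharing the first
# (skeleton) block of the InChIKey fall into the same link set.
linksets = defaultdict(list)
for identifier, inchikey in records:
    skeleton = inchikey.split("-")[0]
    linksets[skeleton].append(identifier)

for skeleton, members in sorted(linksets.items()):
    print(skeleton, members)
```

A stricter lens would simply group on the full InChIKey instead, keeping the two alanine enantiomers apart; the lens choice is a one-line change to the grouping key.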
Using the structured product labeling format to index versatile chemical dataValery Tkachenko
Structured Product Labeling (SPL) is a document markup standard approved by the Health Level Seven (HL7) standards organization and adopted by the FDA as a mechanism for exchanging product and facility information. Product information provided by companies in SPL format may be accessed from the FDA Online Label Repository (labels.fda.gov) and the National Library of Medicine DailyMed web site (dailymed.nlm.nih.gov). FDA also maintains and publishes SPL Indexing Files for Pharmacologic Class, Substance, Product Concept, Biological Drug Substance, and Billing Units. Data from the Indexing Files can be linked to data in both SPL resources and external resources via chemical and non-chemical identifiers. In this talk we will present the latest addition to SPL, which allows indexing data on proteins, polymers and structurally diverse substances. We will also discuss the potential value of SPL to the integration between public chemistry databases, especially those hosted by the United States Government.
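The identifier-based linkage described above can be illustrated on a deliberately simplified XML fragment. The element names below are schematic, not the real HL7 SPL schema; the UNII codes are, to the best of our knowledge, those of aspirin and acetaminophen.

```python
import xml.etree.ElementTree as ET

# Schematic fragment in the spirit of an SPL substance indexing file;
# element names are illustrative, not the actual HL7 SPL schema.
spl_fragment = """
<document>
  <substance>
    <code codeSystem="FDA-UNII" code="R16CO5Y76E"/>
    <name>aspirin</name>
  </substance>
  <substance>
    <code codeSystem="FDA-UNII" code="362O9ITL9D"/>
    <name>acetaminophen</name>
  </substance>
</document>
"""

# Build a mapping from the non-chemical identifier (UNII) to the name;
# such identifiers are the join points to external chemistry databases.
root = ET.fromstring(spl_fragment)
index = {
    s.find("code").attrib["code"]: s.findtext("name")
    for s in root.iter("substance")
}
print(index)
```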
Tools and approaches for data deposition into nanomaterial databases - Valery Tkachenko
Sustainable research progress in many scientific disciplines critically depends on the existence of robust specialized databases that integrate and structure all available experimental information in the respective fields. The need for such a reference database is especially critical for nanoscience and nanomaterial research, given the significant diversity of shapes, sizes, and properties of engineered nanomaterials and the difficulty of synthesizing engineered nanoparticles with controlled properties. The acquisition of data from public sources is inefficient, time consuming and limited in scope. Moreover, it is not clear where the resources would come from to support this activity on a perpetual basis. The NIH has recently announced its intention to provide special funds toward data deposition by experimental investigators through the 'data sharing plan' for each proposal. However, this points to a current weakness: all laboratories use different data collection approaches, each of which requires interpretation by the staff hosting the database. It would be far more efficient and useful if each investigator could work from a template with key terms that could be modified to add new or important additional data or parameters. We will discuss tools and approaches to facilitate collection and direct deposition of experimental data into the Nanomaterial Registry (https://www.nanomaterialregistry.org/), a versatile, semantically enriched, template-based platform for registering diverse data pertaining to nanomaterials research.
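The template idea argued for above can be made concrete with a small validator: a shared base template of required fields that each investigator extends with lab-specific parameters, so deposited records arrive machine-checkable rather than needing manual interpretation. Field names here are illustrative, not the actual Nanomaterial Registry schema.

```python
# Sketch: a minimal deposition-template check. A shared base template
# defines required fields and types; each lab may declare extra fields
# instead of inventing an ad-hoc spreadsheet layout.
# Field names are illustrative only.

BASE_TEMPLATE = {"material_name": str, "core_composition": str,
                 "mean_diameter_nm": float, "synthesis_method": str}

def validate(record, template=BASE_TEMPLATE, extra_fields=()):
    """Report missing, wrongly typed, and undeclared fields."""
    missing = [k for k in template if k not in record]
    bad_type = [k for k, t in template.items()
                if k in record and not isinstance(record[k], t)]
    unknown = [k for k in record
               if k not in template and k not in extra_fields]
    return {"missing": missing, "bad_type": bad_type, "unknown": unknown}

rec = {"material_name": "AuNP-15", "core_composition": "Au",
       "mean_diameter_nm": 15.2, "synthesis_method": "citrate reduction",
       "zeta_potential_mV": -32.0}  # lab-specific extension
print(validate(rec, extra_fields=("zeta_potential_mV",)))
# {'missing': [], 'bad_type': [], 'unknown': []}
```

A record that fails the check can be bounced back to the depositor automatically, rather than consuming curation staff time.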
Chemistry Validation and Standardization Platform v2.0 - Valery Tkachenko
In recent years there has been explosive growth in the number of public chemical databases available online, a number of these containing tens of millions of chemical structures. Examples include PubChem, ChemSpider and ChEMBL, and users of these databases have become increasingly aware of the issue of data quality associated with these public resources. Seamless integration and mapping between databases, even for some common chemicals, is challenged by differing approaches to chemical standardization prior to registration into a database. The lack of standards in representing and handling chemical information certainly contributes to aspects of this problem. The Chemistry Validation and Standardization Platform (CVSP), originally developed to support the European Innovative Medicines Initiative project known as OpenPHACTS, was developed with the intention of providing an open platform for processing and standardizing chemical compounds. The system has been used to process millions of chemical compounds for dissemination through public websites and, unlike other validation and standardization systems, provides support for both standard and custom rulesets. We will provide an overview of CVSP 2.0, the next generation of the platform, extending support to new cheminformatics toolkits and adding capabilities such as collaborative rules authoring.
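The standard-plus-custom ruleset design can be sketched as follows. Real CVSP rules operate on full chemical structures via cheminformatics toolkits; the string-level checks below are toy stand-ins meant only to show how a depositor's custom rules compose with a shared standard set.

```python
# Sketch: a standard ruleset plus depositor-supplied custom rules,
# each rule returning an issue message or None. The checks themselves
# are deliberately trivial string tests, not real structure validation.

def no_empty_structure(smiles):
    return "empty structure" if not smiles.strip() else None

def no_unbalanced_brackets(smiles):
    return ("unbalanced brackets"
            if smiles.count("[") != smiles.count("]") else None)

STANDARD_RULES = [no_empty_structure, no_unbalanced_brackets]

def validate(smiles, custom_rules=()):
    """Run standard rules first, then any custom rules; collect issues."""
    rules = list(STANDARD_RULES) + list(custom_rules)
    return [msg for rule in rules if (msg := rule(smiles)) is not None]

# A custom rule a depositor might add for their own collection:
def no_isotopes(smiles):
    return "isotope label present" if "[13C" in smiles else None

print(validate("CC(=O)Oc1ccccc1C(=O)O"))               # []
print(validate("[13CH4", custom_rules=(no_isotopes,)))
# ['unbalanced brackets', 'isotope label present']
```

Keeping rules as independent, composable functions is what makes collaborative rules authoring tractable: a new rule can be reviewed and added without touching the pipeline.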
Open Science Data Repository - the platform for materials research - Valery Tkachenko
Over the last few years we have seen tremendous growth in data repositories pushed and supported by funding bodies and various data preservation initiatives. As a result we now have a variety of scientific resources, combined into a broad network and indexed through directories such as BioSharing and re3data. This network, while growing quickly, is still in the early days of adopting semantic web standards and does not yet support deep data indexing and discoverability, let alone mechanisms for intellectual property protection, which at best amount to making data public or private. The lack of standards and well-defined models for describing the structure of scientific information further inhibits the free flow of information that is essential for scientific discovery. Not surprisingly, one of the most affected areas is materials science, where the inherent complexity of the field makes the situation even more severe. In this talk we present a chemistry information platform designed to support a variety of data formats along with metadata, sophisticated ways of collaborating, and secure data exchange. We will discuss the challenges we faced in developing such a platform as well as the solutions we came up with.
Opportunities in chemical structure standardization - Valery Tkachenko
This talk was given at EBI's Wellcome Trust Genome Campus and is dedicated to outlining problems with chemical information standardization and various efforts to tackle this problem.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
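The skipping idea can be illustrated without any cryptography: for a pattern like ".*secret", a conventional automaton touches every character of the document, while a skipping matcher jumps straight to candidate match positions. The toy matcher below counts steps to make the difference visible; it is an illustration of the SAFA intuition, not Reef's construction.

```python
# Sketch: why skipping irrelevant parts of a document helps.
# Both matchers look for a literal substring (the core of ".*secret.*"),
# but count "steps" differently: the plain matcher advances one position
# at a time, the skipping matcher jumps between candidate positions.

def dfa_match(doc, literal):
    steps = 0
    for i in range(len(doc) - len(literal) + 1):
        steps += 1
        if doc[i:i + len(literal)] == literal:
            return True, steps
    return False, steps

def skipping_match(doc, literal):
    # One "step" per jump to the next occurrence of the first character.
    steps, i = 0, doc.find(literal[0])
    while i != -1:
        steps += 1
        if doc[i:i + len(literal)] == literal:
            return True, steps
        i = doc.find(literal[0], i + 1)
    return False, steps

doc = "x" * 10_000 + "secret"
print(dfa_match(doc, "secret")[1])       # 10001 steps
print(skipping_match(doc, "secret")[1])  # 1 step
```

In Reef the analogous saving shows up in proof generation cost: the prover only pays for the automaton transitions it actually takes, and the lookup argument keeps the skips sound.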
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect their personal devices and information.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms, and is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
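The heuristic at the heart of the technique can be sketched with a toy target: a byte is uninteresting if mutating it never changes the program's observed behaviour. The "program" and trial values below are illustrative; DIAR itself works against real coverage feedback from an instrumented binary.

```python
# Sketch of the DIAR intuition: probe each seed byte with a few trial
# values and mark it "dead" if the program's behaviour signature never
# changes. The toy program only inspects a 4-byte header, so every byte
# after the header is dead weight that a fuzzer would mutate in vain.

def toy_program(data):
    # Behaviour signature: which (toy) branches fire.
    return (data[:4] == b"XML!", len(data) > 4)

def uninteresting_bytes(seed, program, trials=(0x00, 0xFF)):
    """Indices whose mutation never changes the behaviour signature."""
    base = program(seed)
    dead = []
    for i in range(len(seed)):
        if all(program(seed[:i] + bytes([v]) + seed[i + 1:]) == base
               for v in trials):
            dead.append(i)
    return dead

seed = b"XML!" + b"padding-padding"
dead = uninteresting_bytes(seed, toy_program)
print(dead)  # every index past the 4-byte header
```

Dropping the dead bytes yields a leaner seed, so subsequent mutations concentrate on the bytes the program actually inspects.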
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and former Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
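An automated policy check of the kind listed above typically reduces to gating on a scanner's report. The sketch below assumes a simplified report shape (real SBOM and scanner output, e.g. from grype- or trivy-style tools, is much richer) and an illustrative severity policy.

```python
# Sketch: a policy gate over a container-image vulnerability report.
# The build fails when any finding meets or exceeds the configured
# severity threshold. Report shape and thresholds are illustrative.

SEVERITY_RANK = {"negligible": 0, "low": 1, "medium": 2,
                 "high": 3, "critical": 4}

def policy_check(report, fail_on="high"):
    """Return pass/fail plus the findings that violate the policy."""
    threshold = SEVERITY_RANK[fail_on]
    violations = [v for v in report["matches"]
                  if SEVERITY_RANK[v["severity"]] >= threshold]
    return {"passed": not violations, "violations": violations}

report = {"matches": [
    {"id": "CVE-2024-0001", "severity": "medium"},
    {"id": "CVE-2024-0002", "severity": "critical"},
]}
result = policy_check(report, fail_on="high")
print(result["passed"])                         # False
print([v["id"] for v in result["violations"]])  # ['CVE-2024-0002']
```

Running such a check in the pipeline on every image, and archiving the report alongside the SBOM, is what turns scan output into the kind of reusable security artifact an ATO review can consume.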
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover Test Automation with generative AI and OpenAI.
This webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into integrating generative AI into a test automation solution using OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI?
Test Automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Big data supporting drug discovery - cautionary tales from the world of chemistry for translational informatics
1. Big Data Supporting Drug Discovery: Cautionary Tales from the World of Chemistry for Translational Informatics
Valery Tkachenko
RSC-CSIR/OSDD meeting, Pune, India, February 3rd, 2014
2. Big Data
Chemical Space
Drug Discovery pipeline
Machine learning
Training sets
RSC/ChemSpider platforms
RSC/Archive
Research data management
Data quality, crowdsourcing and AltMetrics
Building Global Chemistry Network
18.
• ~30 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web-searching
39. Research data inflow (architecture diagram). A unified Deposition Gateway, fed through a web UI for depositions, APIs, FTP, DropBox/Google Drive/SkyDrive, and LabTrove and other templated data, routes compounds, reactions, spectra, materials, raw data and text-mining content through per-domain modules and staging databases; validated data lands in the Compounds, Reactions, Spectra, Materials, Documents and Articles/CSSP stores. All databases are sliced by data source/collection and have a simple security model in which each slice is private, public or embargoed.
40. Research data outflow (architecture diagram). A data tier of Compounds, Reactions, Spectra, Materials and Documents stores is exposed through a data access tier of per-domain APIs and a tier of reusable UI widget components. Example user-interface applications built on these include an Electronic Laboratory Notebook, an Analytical Laboratory application, a Chemical Inventory application, and paid third-party integrations on various platforms (SharePoint, Google, etc.).
45. It is so difficult to navigate…
IP? What’s the structure? Are they in our file? What’s similar? Pharmacology data? What’s the target? Known pathways? Competitors? Connections to disease? Working on now? Expressed in the right cell type?
46. Data quality issue and CVSP
– Robochemistry
– Proliferation of errors in public and
private databases
– Automated quality control system
49. Research data management (architecture diagram). Data hubs and workstations at universities and companies feed a central Data Repository offering indexed storage and chemically intelligent services; the repository in turn serves scientists, funding bodies, external clients, publishers and indexes.
53. RSC/Rewards and Recognition. The First Step badge is awarded when a user submits (and has published) their first CSSP article: "Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said 'A journey of a thousand miles begins with a single step'. In the same way we hope that this will be the first of many submissions that you make to CSSP."
54. Visualization and navigation
62. http://www.openphacts.org
Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses. The semantic web is one of its cornerstones.