Exploring Chemical and Biological Knowledge Spaces with PubChem



My presentation for the Drug Repurposing workshop at the upcoming Bio-IT World Expo.


Presentation abstract:

PubChem has a wealth of chemical structure and biological activity information. In conjunction with NCBI’s other resources such as PubMed and GenBank, PubChem is a vast source of information relevant to repurposing not only established drugs but also any compounds with in vivo pharmacology and/or clinical results. The challenge is how to take advantage of this knowledge. The ability to explore not only chemical similarity but also relationships between diseases and disease targets has crucial value in repurposing. While focused investigations are already possible within the existing Entrez system, navigating across these linked information spaces on a large scale is difficult with current tools. We are actively developing new infrastructure to support such analyses, and pursuing new methods of exploring inter- and intra-database relationships between chemicals, targets, diseases, and patents. Progress and some future directions in these areas will be presented.

Published in: Health & Medicine

  1. Dr. Paul A. Thiessen, NCBI (2013/03/21 draft)
  2. What is a “Knowledge Space”?
     • May be a database
     • But may be a concept not encapsulated in a database
     • Diagram labels: Genes, Diseases, Literature (PubMed), Chemicals (PubChem), Assays (PubChem), Targets (sequences), Patents, Drugs
  3. Connecting the Spaces
     • Database cross-links
     • Diagram nodes: Assays (PubChem), Literature (PubMed), Targets (sequences), Chemicals (PubChem)
     • Diagram link labels: Active, Inactive, MeSH, Depositor
  4. Moving Within a Space
     • Neighbors… some examples
     • Chemicals (PubChem): same parent; similar by 2D or 3D; same connectivity
     • Assays (PubChem): similar sets of screened chemicals; similar target (BLAST)
  5. Drug Repurposing as a Spatial Transformation
     • One possible route…
     • Diagram labels: Search, Diseases, Drugs, (known), Similarity, Diseases, Targets, (hypothesized)
  6. What is in PubChem
     • 117M Substances (SIDs)
       ○ Information from depositors, including links to PubMed, sequences, structures, patents, etc.
     • 47M Compounds (CIDs)
       ○ Derived from Substances (including links)
       ○ Computed properties
     • 650k Assays (AIDs)
       ○ ~200M test results on SIDs
       ○ Links to target sequences
  7. Some PubChem Statistics (as of 2013/03/20)
     • All CIDs: 46,814,409
     • Unique parents by connectivity: 36,806,372
     • Rule of 5: 34,343,056
     • Rule of 5 but MW 250-800: 31,483,865
     • Active in any BioAssay: 824,028
     • Tested in any BioAssay: 1,872,313
     • Experimental 3D (mainly PDB): 41,406
     • Computed 3D (multiple confs + neighbors): 42,252,570
     • Pharmacological Actions: 11,531
     • Biosystems: 9,703
     • Chemical vendors: 28,852,943
     • NIH Molecular Libraries: 402,076
     • Patent sources: 14,512,499
     • Patent links: 5,978,538
  8. What is in NCBI Entrez
     • Many other databases…
       ○ PubMed
       ○ Protein/Nucleotide sequences
       ○ Genes
       ○ Biosystems (metabolic pathways)
       ○ PDB structures (with VAST neighbors)
     • Text and numeric search fields
     • Cross-links
       ○ Between databases
       ○ Within databases (neighbors)
  9. How Entrez Works
     • Search results = list of identifiers
     • Boolean operations on lists (query refinement)
     • Links from one database to another
     • Diagram labels: PubChem Search, CID List; Link to PubMed, PMID List; PubChem Search, CID List
  10. Limitations of Entrez
      • Only text or numeric search
        ○ Search fields hard to discover
        ○ Search fields and defaults vary by database
        ○ Chemical structure search, and other specialized algorithms, must be done outside Entrez
      • The kicker: links are incomplete
        ○ Only 500-10,000 ids!
        ○ Limit also varies by database
  11. Working Around the Limitations
      • Scripting
        ○ E-Utils, PUG SOAP/REST, etc.
        ○ Break queries into smaller chunks
      • Specialized services
        ○ PubChem’s ID Exchange
        ○ Classification trees (with associated IDs)
  12. What is not in Entrez
      • … as a database per se, but which may be imported and linked to PubChem
      • Drugs (sort of but not really)
      • Targets (again sort of)
      • Diseases
      • Patents
  13. Some Public Sources of Information Relevant to Drugs and Repurposing
      • United States (FDA, NLM, NCBI, …)
        ○ ClinicalTrials.gov
        ○ NDF(-RT)
        ○ RxNorm
        ○ HSDB
        ○ MeSH
        ○ DailyMed
        ○ PubMed, PubMed Health
        ○ USPTO
      • Europe
        ○ ChEBI / ChEMBL
        ○ EPO / WIPO
      • Canada
        ○ DrugBank
      • Japan
        ○ KEGG
      • … not an exhaustive list
      • … some are linked to PubChem
      • … some are works in progress
  14. MeSH and ChEBI
      • Chemical structure classification
      • Biological role
      • Pharmacological action
  15. KEGG and DrugBank
      • Drug classification
      • Targets
  16. Patents
      • PubChem depositors
        ○ Per SID: Patent IDs, PubMed IDs
      • Classifications
        ○ ECLA
        ○ IPC
        ○ USPC
        ○ CPC
  17. Aside: Patent Summaries
  18. NDF-RT
      • Molecular interactions
      • Drug ingredients
      • Diseases (with drugs)
      • Physiological effects
      • Has links to MeSH
        ○ … which leads to CIDs
  19. NDF-RT linked to SID, CID
  20. Classifications as Navigation Tools
      • Where are the CIDs in the tree?
      • Example: chemicals affecting serotonin transporters according to KEGG
  21. Classifications for Query Refinement
      • Where are MY CIDs in the tree?
      • Example: what diseases are linked by NDF to KEGG’s serotonin transport drugs?
  22. Big Classifications… Some Engineering Required
      • WIPO IPC
        ○ 72,000 tree nodes
        ○ 6,000,000 CIDs
        ○ 124,000,000 node-CID links
      • Filtering on the fly:
        ○ 22,000 CIDs from PDB … interactive!
  23. More Space to Explore
      • Diagram labels: Genes, Literature (PubMed), Assays (PubChem), Chemicals (PubChem), Targets (sequences), Patents, Drugs, Diseases
      • … and beyond
  24. Conclusions
      • PubChem is…
        ○ A very generalized system
        ○ Based on open data
        ○ Part of the larger Entrez collection
      • We strive to…
        ○ Make analysis across multiple knowledge spaces accessible and powerful
        ○ Enable hypothesis generation for drug repurposing (as one scenario among many)
      • Feedback is always welcome!
        ○ info@ncbi.nlm.nih.gov
  25. Acknowledgements
      • Evan Bolton
      • Steve Bryant
      • Asta Gindulyte (classification front end)
      • Chris Southan
      • … Thank You!
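
Slides 9 to 11 describe Entrez results as identifier lists, note that links are capped at roughly 500-10,000 ids, and suggest scripting with E-Utils while breaking queries into smaller chunks. The sketch below is one possible way to do that chunking; it is not taken from the talk. It only builds E-utilities ELink request URLs (PubChem Compound to PubMed) and merges per-chunk results, without making any network calls. The helper names (`chunked`, `elink_urls`, `merge_link_results`) and the default chunk size of 500 are assumptions chosen for illustration; the chunk size simply mirrors the lower bound quoted on slide 10.

```python
from urllib.parse import urlencode

# Public E-utilities ELink endpoint.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def chunked(ids, size):
    """Yield successive lists of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def elink_urls(cids, chunk_size=500, dbfrom="pccompound", db="pubmed"):
    """Build one ELink URL per chunk of CIDs (hypothetical helper).

    chunk_size=500 is an assumption based on the lower limit
    mentioned on the slide; the real limit varies by database.
    """
    urls = []
    for chunk in chunked(list(cids), chunk_size):
        query = urlencode({
            "dbfrom": dbfrom,
            "db": db,
            "id": ",".join(str(cid) for cid in chunk),
        })
        urls.append(f"{EUTILS_ELINK}?{query}")
    return urls


def merge_link_results(per_chunk_pmids):
    """Union the PMID lists returned for each chunk into one sorted list."""
    merged = set()
    for pmids in per_chunk_pmids:
        merged.update(pmids)
    return sorted(merged)
```

Each URL would be fetched separately (with suitable rate limiting) and the linked PMIDs parsed from each response; `merge_link_results` then plays the role of the Boolean OR on identifier lists described on slide 9.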