Exploring Chemical and Biological Knowledge Spaces with PubChem



My presentation for the Drug Repurposing workshop at the upcoming Bio-IT World Expo.


Presentation abstract:

PubChem has a wealth of chemical structure and biological activity information. In conjunction with NCBI’s other resources such as PubMed and GenBank, PubChem is a vast source of information relevant to repurposing not only of established drugs but any compounds with in vivo pharmacology and/or clinical results. The challenge is how to take advantage of this knowledge. The ability to explore not only chemical similarity but relationships between diseases and disease targets has crucial value in repurposing. While focused investigations are already possible within the existing Entrez system, navigation across these linked information spaces can be difficult to do on a large scale with current tools. We are actively developing new infrastructure to support such analyses, and pursuing new methods of exploring inter- and intra-database relationships between chemicals, targets, diseases, and patents. Progress and some future direction in these areas will be presented.






Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

CC Attribution License


Exploring Chemical and Biological Knowledge Spaces with PubChem Presentation Transcript

  • 1. Dr. Paul A. Thiessen, NCBI 2013/03/21 draft
  • 2. What is a “Knowledge Space”?
      • May be a database
      • But may be a concept not encapsulated in a database
      • [Diagram labels: Genes; Diseases; Literature (PubMed); Chemicals (PubChem); Assays (PubChem); Targets (sequences); Patents; Drugs]
  • 3. Connecting the Spaces
      • Database cross-links
      • [Diagram nodes: Assays (PubChem); Literature (PubMed); Targets (sequences); Chemicals (PubChem)]
      • [Diagram link labels: Active; Inactive; MeSH; Depositor]
  • 4. Moving Within a Space
      • Neighbors… some examples
      • Chemicals (PubChem)
          ○ Similar by 2D or 3D
          ○ Same parent
          ○ Same connectivity
      • Assays (PubChem)
          ○ Similar sets of screened chemicals
          ○ Similar target (BLAST)
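To make the "similar by 2D" neighbor relation on slide 4 concrete, here is a minimal sketch of a Tanimoto comparison over binary fingerprints, the kind of score commonly used for 2D chemical similarity. Everything in it is an assumption for illustration: fingerprints are modelled as Python sets of on-bit positions, the `tanimoto` and `neighbors` helpers and the 0.9 cutoff are hypothetical, and the toy fingerprints are not real PubChem data.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def neighbors(query: str, fps: Dict[str, Set[int]], threshold: float = 0.9) -> List[str]:
    """Return the IDs whose fingerprint scores at least `threshold` against the query."""
    q = fps[query]
    return sorted(
        key for key, fp in fps.items()
        if key != query and tanimoto(q, fp) >= threshold
    )


# Hypothetical toy fingerprints (illustrative only, not PubChem fingerprints).
toy = {
    "cid_a": set(range(0, 20)),
    "cid_b": set(range(0, 19)),   # shares 19 of cid_a's 20 bits
    "cid_c": set(range(10, 30)),  # shares 10 bits; the union has 30 bits
}

print(neighbors("cid_a", toy))
```

Moving "within a space" then amounts to replacing a set of identifiers with the set of their neighbors, which stays in the same database.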
  • 5. Drug Repurposing as a Spatial Transformation
      • One possible route…
      • [Diagram labels, in extraction order: Search; Diseases; Drugs; (known); Similarity; Diseases; Targets; (hypothesized)]
  • 6. What is in PubChem
      • 117M Substances (SIDs)
          ○ Information from depositors, including links to PubMed, sequences, structures, patents, etc.
      • 47M Compounds (CIDs)
          ○ Derived from Substances (including links)
          ○ Computed properties
      • 650k Assays (AIDs)
          ○ ~200M test results on SIDs
          ○ Links to target sequences
  • 7. Some PubChem Statistics (… as of 2013/03/20)
      • All CIDs: 46,814,409
      • Unique parents by connectivity: 36,806,372
      • Rule of 5: 34,343,056
      • Rule of 5 but MW 250-800: 31,483,865
      • Active in any BioAssay: 824,028
      • Tested in any BioAssay: 1,872,313
      • Experimental 3D (mainly PDB): 41,406
      • Computed 3D (multiple confs + neighbors): 42,252,570
      • Pharmacological Actions: 11,531
      • Biosystems: 9,703
      • Chemical vendors: 28,852,943
      • NIH Molecular Libraries: 402,076
      • Patent sources: 14,512,499
      • Patent links: 5,978,538
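The counts on slide 7 are easier to compare as proportions. The short sketch below only does arithmetic on numbers copied from the slide; the `percent` helper is hypothetical, and treating the "active" compounds as a subset of the "tested" ones is an assumption the slide does not state.

```python
# Counts copied from the statistics slide (as of 2013/03/20).
stats = {
    "all_cids": 46_814_409,
    "rule_of_5": 34_343_056,
    "tested": 1_872_313,
    "active": 824_028,
}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage with one decimal."""
    return round(100.0 * part / whole, 1)


rule_of_5_pct = percent(stats["rule_of_5"], stats["all_cids"])
tested_pct = percent(stats["tested"], stats["all_cids"])
# Assumes every compound counted as active was also counted as tested.
active_of_tested_pct = percent(stats["active"], stats["tested"])

print(rule_of_5_pct, tested_pct, active_of_tested_pct)
```

Read this way, only a small share of all CIDs had any BioAssay result at the time of the talk, which is part of why links to other knowledge spaces matter.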
  • 8. What is in NCBI Entrez
      • Many other databases…
          ○ PubMed
          ○ Protein/Nucleotide sequences
          ○ Genes
          ○ Biosystems (metabolic pathways)
          ○ PDB structures (with VAST neighbors)
      • Text and numeric search fields
      • Cross-links
          ○ Between databases
          ○ Within databases (neighbors)
  • 9. How Entrez Works
      • Search results = list of identifiers
      • Boolean operations on lists (query refinement)
      • Links from one database to another
      • [Diagram labels: Search; PubChem CID List (shown twice); Link to PubMed; PMID List]
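The three ideas on slide 9 (searches return identifier lists, lists combine with Boolean operations, links map a list into another database) can be sketched against the public E-utilities endpoints `esearch.fcgi` and `elink.fcgi`. This is a rough sketch, not the presenter's code: the helper names are hypothetical, the identifier sets are made-up placeholders, and the snippet only builds request URLs rather than calling the service.

```python
from typing import Iterable
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 100) -> str:
    """Build an ESearch request; the response would be a list of IDs in `db`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def elink_url(dbfrom: str, db: str, ids: Iterable[int]) -> str:
    """Build an ELink request mapping IDs in `dbfrom` to linked IDs in `db`."""
    id_list = ",".join(str(i) for i in ids)
    params = {"dbfrom": dbfrom, "db": db, "id": id_list, "retmode": "json"}
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


# Placeholder ID lists standing in for two search results (not real hits).
list_a = {1, 2, 3}
list_b = {2, 3, 4}

both = list_a & list_b      # AND: refine one result by another
either = list_a | list_b    # OR: merge two results
only_a = list_a - list_b    # NOT: exclude one result from another

print(esearch_url("pccompound", "aspirin", retmax=5))
print(elink_url("pccompound", "pubmed", sorted(both)))
```

Entrez performs the Boolean step server-side on stored result lists; doing it locally with sets, as here, is simply the same operation made visible.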
  • 10. Limitations of Entrez
      • Only text or numeric search
          ○ Search fields hard to discover
          ○ Search fields and defaults vary by database
          ○ Chemical structure search, and other specialized algorithms, must be done outside Entrez
      • The kicker: links are incomplete
          ○ Only 500-10,000 ids!
          ○ Limit also varies by database
  • 11. Working Around the Limitations
      • Scripting
          ○ E-Utils, PUG SOAP/REST, etc.
          ○ Break queries into smaller chunks
      • Specialized services
          ○ PubChem’s ID Exchange
          ○ Classification trees (with associated IDs)
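"Break queries into smaller chunks" from slide 11 might look like the sketch below: a long CID list is split into batches, and one PUG REST property request is built per batch. The batch size, the `chunks` and `property_url` helper names, and the choice of the `MolecularWeight` property are illustrative assumptions; the snippet builds URLs only and sends nothing.

```python
from typing import Iterator, List, Sequence

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def chunks(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def property_url(cids: Sequence[int], prop: str = "MolecularWeight") -> str:
    """Build a PUG REST URL requesting one computed property for a batch of CIDs."""
    cid_list = ",".join(str(c) for c in cids)
    return f"{PUG_REST}/compound/cid/{cid_list}/property/{prop}/JSON"


# Ten placeholder CIDs split into batches of four (a deliberately tiny batch size).
cids = list(range(1, 11))
urls = [property_url(batch) for batch in chunks(cids, 4)]

for url in urls:
    print(url)
```

In practice the batch size would be chosen to stay under whatever per-request limit the target service imposes, and the per-batch results merged afterwards.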
  • 12. What is not in Entrez
      … as a database per se, but which may be imported and linked to PubChem
      • Drugs
          ○ (sort of but not really)
      • Targets
          ○ (again sort of)
      • Diseases
      • Patents
  • 13. Some Public Sources of Information Relevant to Drugs and Repurposing
      • United States (FDA, NLM, NCBI, …)
          ○ ClinicalTrials.gov
          ○ NDF(-RT)
          ○ RxNorm
          ○ HSDB
          ○ MeSH
          ○ DailyMed
          ○ PubMed, PubMed Health
          ○ USPTO
      • Europe
          ○ ChEBI / ChEMBL
          ○ EPO / WIPO
      • Canada
          ○ DrugBank
      • Japan
          ○ KEGG
      • … not an exhaustive list
      • … some are linked to PubChem
      • … some are works in progress
  • 14. MeSH and ChEBI
      • Chemical structure classification
      • Biological role
      • Pharmacological action
  • 15. KEGG and DrugBank
      • Drug classification
      • Targets
  • 16. Patents
      • PubChem depositors
          ○ Per SID: Patent IDs, PubMed IDs
      • Classifications
          ○ ECLA
          ○ IPC
          ○ USPC
          ○ CPC
  • 17. Aside: Patent Summaries
  • 18. NDF-RT
      • Molecular interactions
      • Drug ingredients
      • Diseases (with drugs)
      • Physiological effects
      • Has links to MeSH
          ○ … which leads to CIDs
  • 19. NDF-RT linked to SID, CID
  • 20. Classifications as Navigation Tools
      • Where are the CIDs in the tree?
      • Example: chemicals affecting serotonin transporters according to KEGG
  • 21. Classifications for Query Refinement
      • Where are MY CIDs in the tree?
      • Example: what diseases are linked by NDF to KEGG’s serotonin transport drugs?
  • 22. Big Classifications… Some Engineering Required
      • WIPO IPC
          ○ 72,000 tree nodes
          ○ 6,000,000 CIDs
          ○ 124,000,000 node-CID links
      • Filtering on the fly:
          ○ 22,000 CIDs from PDB … interactive!
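Slides 20 to 22 describe two questions asked of a classification tree: where are the CIDs, and where are *my* CIDs. A minimal sketch of the idea follows, assuming the tree is held as a child map plus a node-to-CID map. The tree, the CIDs, and the helper names are all hypothetical, and this in-memory recursion is only a toy; it says nothing about how PubChem engineers the operation at the scale quoted on slide 22.

```python
from typing import Dict, List, Set

# Hypothetical toy classification: node -> children, node -> CIDs attached directly.
children: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": [],
    "A1": [],
    "A2": [],
}
node_cids: Dict[str, Set[int]] = {
    "root": set(),
    "A": {1},
    "A1": {2, 3},
    "A2": {3, 4},
    "B": {5, 6},
}


def subtree_cids(node: str) -> Set[int]:
    """All CIDs attached to `node` or any descendant (each CID counted once)."""
    found = set(node_cids.get(node, set()))
    for child in children.get(node, []):
        found |= subtree_cids(child)
    return found


def filtered_counts(my_cids: Set[int]) -> Dict[str, int]:
    """Per-node count of the caller's CIDs, i.e. 'where are MY CIDs in the tree?'"""
    return {node: len(subtree_cids(node) & my_cids) for node in children}


print(len(subtree_cids("A")))   # unfiltered count under node A
print(filtered_counts({3, 5}))  # counts after restricting to a user-supplied list
```

Using set union rather than summing child counts matters because one CID can sit under several nodes, which is also why the slide counts node-CID links separately from CIDs.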
  • 23. More Space to Explore
      • [Diagram labels: Genes; Literature (PubMed); Assays (PubChem); Chemicals (PubChem); Targets (sequences); Patents; Drugs; Diseases]
      • … and beyond
  • 24. Conclusions
      • PubChem is…
          ○ A very generalized system
          ○ Based on open data
          ○ Part of the larger Entrez collection
      • We strive to…
          ○ Make analysis across multiple knowledge spaces accessible and powerful
          ○ Enable hypothesis generation for drug repurposing (as one scenario among many)
      • Feedback is always welcome!
          ○ info@ncbi.nlm.nih.gov
  • 25. Acknowledgements
      • Evan Bolton
      • Steve Bryant
      • Asta Gindulyte (classification front end)
      • Chris Southan
      • … Thank You!