
Revolution in the Connectivity Between Medicinal Chemistry and Biology

Course in Advanced Medicinal Chemistry (KEN760) Lecture
Chalmers Technical University, Oct 2007

Published in: Technology, Education
  • Transcript of "Revolution in the Connectivity Between Medicinal Chemistry and Biology"

    1. 1. PubChem and Related Open Cheminformatic Resources: A Revolution in the Connectivity Between Medicinal Chemistry and Biology <ul><li>Course in Advanced Medicinal Chemistry (KEN760) Lecture </li></ul><ul><li>Chalmers Technical University, Oct 2007 </li></ul><ul><li>Chris Southan, ChrisDS Consulting, Göteborg </li></ul><ul><li>http://www.cdsouthan.info/CDS_proff.htm </li></ul>
    2. 2. Outline <ul><li>The way it was </li></ul><ul><li>The revolution </li></ul><ul><li>PubChem aims and content </li></ul><ul><li>Making the chemistry and biology joins </li></ul><ul><li>Chemical systems biology </li></ul><ul><li>DrugBank and ChemSpider </li></ul><ul><li>Wider implications </li></ul>
    3. 3. The Context <ul><li>Medicinal chemistry provides a bridge between biology and chemistry by identifying compounds that produce biological effects </li></ul><ul><li>Historically, the goal has been to optimize therapeutic efficacy and avoid undesired biological effects i.e. develop new medicines </li></ul><ul><li>However, it is increasingly recognised that bioactive cpds can be part of the perturbation toolbox for the investigation of all types of biological processes </li></ul><ul><li>By advancing biological knowledge at both the molecular and systems level such investigations can lead to improved understanding of disease and new opportunities for classical medicinal chemistry </li></ul>
    4. 4. Acceleration of Global Medicinal Chemistry Output <ul><li>This includes not only ~ 30K development cpds produced over the last 20 years but also post-genomic, post-HTS, post-libraries output acceleration </li></ul><ul><li>Much of this is being published in med chem journals and estimates suggest global pharma/biotech patent output of at least 300K compound claims per year </li></ul><ul><li>Because the targetome is small we are approaching the point where nearly all tractable targets, directly or by homology, will have available chemical modulation starting points </li></ul>
    5. 5. The Conceptual Union Between Chemistry and Biology goes back a long way ….
    6. 6. So does Bioactive Compound Structure Representation…..
    7. 7. But Times Have Changed …..
    8. 8. November 2004: The Seeds of Revolution
    9. 9. Strophanthidin: from 1952 to 2007: Now just a click to Hinxton… http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:38178
    10. 10. Or Bethesda…. http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6185
    11. 11. PubChem and ChEBI: Revolutionary Consequences <ul><li>Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships </li></ul>
    12. 12. PubChem and ChEBI: Revolutionary Consequences <ul><li>Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships </li></ul><ul><li>Ability to search across links between biochemical data, biological effects and chemical structure information </li></ul>
    13. 13. PubChem and ChEBI: Revolutionary Consequences <ul><li>Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships </li></ul><ul><li>Ability to search across links between biochemical data, biological effects and chemical structure information </li></ul><ul><li>Deposition not just of HTS results but a wide range of other types of screening data directly linked to chemical structure information in public repositories </li></ul>
    14. 14. PubChem and ChEBI: Revolutionary Consequences <ul><li>Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships </li></ul><ul><li>Ability to search across links between biochemical data, biological effects and chemical structure information </li></ul><ul><li>Deposition not just of HTS results but a wide range of other types of screening data directly linked to chemical structure information in public repositories </li></ul><ul><li>Proliferation of cheminformatics tools, databases, nomenclatures, and ontologies in the public domain </li></ul>
    15. 15. PubChem and ChEBI: Revolutionary Consequences <ul><li>Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships </li></ul><ul><li>Ability to search across links between biochemical data, biological effects and chemical structure information </li></ul><ul><li>Deposition not just of HTS results but a wide range of other types of screening data directly linked to chemical structure information in public repositories </li></ul><ul><li>Proliferation of cheminformatics tools, databases, nomenclatures, and ontologies in the public domain </li></ul><ul><li>A quantum jump in the global enablement of chemical biology and medicinal chemistry </li></ul>
    16. 16. The NIH Roadmap [diagram labels] Chemical Diversity; Technology Development; Screening Instrumentation; Assay Development; Predictive ADMET; Compound Repository (MLSMR); Informatics; Cheminformatics Research Centers; Molecular Libraries Screening Centers Network (MLSCN)
    17. 17. PubChem <ul><li>PubChem is the NCBI informatics backbone for the NIH Molecular Libraries Initiative </li></ul><ul><li>A suite of three databases: PubChem Compound (unique structures with computed properties), PubChem BioAssay (results supplied by depositors) and PubChem Substance (deposited compound structures) </li></ul><ul><li>The ten MLI-funded screening centers run cellular and target-based HTS campaigns using a compound collection of <140K and submit the results to PubChem </li></ul>
    18. 18. Growth In PubChem Substances & Compounds. Today's compound count: All: 10,954,831; BioAssay: 561,883; Protein3D: 11,717; Rule of 5: 7,133,978
    19. 19. GenBank PubChem
    20. 20. Top PubChem Depositors. 56 current depositors: DiscoveryGate 4608994, ZINC 3813892, ChemDB 3564938, Thomson Pharma 2303628, ChemBridge 433971, ChemBank 413586, ChemIDplus 383789, Asinex 362469, DTP/NCI 268696, Specs 204658. 22 current depositors: DTP/NCI 173, NIH Chemical Genomics Center 60, Structural Genomics Consortium - Oxford 43, Scripps Research Institute 37, University of Pittsburgh MLSC 33, Southern Research MLSC 29, San Diego Center for Chemical Genomics 22, BindingDB 20, Penn Center for Molecular Discovery 19, Emory MLSC; Vanderbilt MLSC 15
    21. 21. Growth In PubChem BioAssays. Today's count: All: 637; Confirmatory: 250; MLSCN: 365; Protein Target: 257; Screening: 204; Summary: 1
    22. 22. <ul><li>PubChem Chemical Searching </li></ul>
    23. 23. Assay Example
    24. 24. Advanced Functionality
    25. 25. Bio-Chem Data Joins
    26. 26. The Biology Union
    27. 27. Entrez Connectivity Map
    28. 28. Expanding Relationships in Entrez [diagram labels] Protein Sequences; Literature (PubMed); VAST Structure Similarity; Bioactivity Assay Results; Small Molecule Structures; Protein 3D Structures; Biological Terms (MeSH indexed); 2D Chemical Structure Similarity (3D soon); Activity Profile Similarity; Protein Sequence BLAST Sequence Similarity
    29. 29. A Pharmaceutical Portfolio from PubChem: arixtra = CID 636380, factor Xa inhibitor; odiparcil = CID 216385, thrombin inhibitor; carvedilol = CID 2585, alpha/beta blocker; levitra = CID 110634, PDE-5 inhibitor; avodart = CID 152945, 5-alpha-reductase inhibitor (5-ARI); cutivate = CID 444036, anti-inflammatory; hycamtin = CID 60699, topoisomerase inhibitor; zofran = CID 4595, serotonin type 3 receptor antagonist; avandia = CID 77998, PPAR gamma agonist; seroxat = CID 5284605, SSRI; wellbutrin = CID 62884, dopamine uptake blocker; imigran = CID 5358, serotonin 5HT1 agonist; lamictal = CID 3878, calcium channel blocker; flixonase = CID 444036, anti-inflammatory; serevent = CID 5152, adrenergic beta-agonist; ariflo = CID 151170, PDE IV inhibitor; SB 204070A = CID 121880, 5-HT4 receptor antagonist; SB-220453 = tonabersat = CID 3055165, anticonvulsant
    30. 30. Joining to Protein Structure: Cn3D view of PDB 1I7G on the left; PubChem tesaglitazar (CID 208901) on the right. Superimposed non-identical structural neighbours of 1I7G on the left, with 90% similar neighbours of tesaglitazar (CID 208901) on the right.
    31. 31. Systems Chemical Biology <ul><li>Oprea et al. Nat Chem Biol. 2007; 3(8): 447-50. PMID: 17637771 </li></ul><ul><li>“The increasing availability of data related to genes, proteins and their modulation by small molecules has provided a vast amount of biological information leading to the emergence of systems biology and the broad use of simulation tools for data analysis. However, there is a critical need to develop cheminformatics tools that can integrate chemical knowledge with these biological databases and simulation approaches, with the goal of creating systems chemical biology.” </li></ul>
    32. 32. Zebrafish Example
    33. 33. Public and Commercial Pathway Info
    34. 34. PubChem is now a Global Hub, including bioinformatic dbs with in-links [hub diagram, PubChem at the centre]: ChemBank, chemical genomics 0.4 mill; ChEBI, enzyme ligands 8K; KEGG, drugs and metabolites 14K; ZINC, ready-to-dock 3.8 mill; MMDB, PDB ligands 55K; ChemIDplus, NIH tox data 383K; GPCR-Ligand Database; Human Metabolite db 2K; MEROPS protease inhibitors; DrugBank, drugs and targets 4K; ChemSpider 20 million; Nature Chemical Biology 0.8K; Drugs of the Future 3.4K; LIPID MAPS, metabolism 8.8K
    35. 35. Relationships in Bioactive Chemical Space [diagram labels] Protein Sequences; metabolomes & natural products; assay data; drug-like cpds from literature & patents; drugs; chem genomics & sys biol probes
    36. 36. The Crucial Value of Explicit Compound-to-sequence Links <ul><li>Increasing availability of extracted/curated/annotated relationships: </li></ul><ul><li>… document (or database entry) “V” includes assay data “W” that defines compound “X” as an activity modulator of protein “Y” with sequence “Z” … </li></ul><ul><li>~ 130,000 cpds, ~ 1,300 sequences, ~ 7,000 papers </li></ul><ul><li>~ 2 million cpds, ~ 5,000 sequences, ~ 80,000 patents and papers </li></ul><ul><li>~ 4,000 cpds, ~ 6,000 sequences </li></ul>
    37. 37. Complexities of Establishing Compound-to-sequence Links <ul><li>Definitions: thermodynamic, enzymological, receptor pharmacology, what is a “target”, etc. </li></ul><ul><li>The necessity for quantitative data from defined biochemical assays </li></ul><ul><li>Constitutive problems of extracting unstructured document data into relational database schema </li></ul><ul><li>Inherent biases of cross-screening data </li></ul><ul><li>Lack of assay standards or ontologies </li></ul><ul><li>Mapping between assay descriptions, gene names, symbols, sequence identifiers and primary structure isoforms </li></ul><ul><li>Establishing triage rules to encompass different degrees of ambiguity from source data </li></ul><ul><li>Blending automated extraction with expert annotation </li></ul><ul><li>Filtering and combining complex Entrez queries with PubChem intersects </li></ul><ul><li>Fundamental challenge of relating in-vitro activities of cpds to in-vivo pharmacology and clinical effects </li></ul>
    38. 38. Mapping the “Targetome” <ul><li>Basal (unspliced) human proteome ~ 20,000 </li></ul><ul><li>Extrapolated maximal “druggable” targetome (H&G) ~ 3,000 </li></ul><ul><li>Estimate of maximal tractable therapeutic targets (H&G) ~ 600 – 1,200 </li></ul><ul><li>Targets with chemical starting points (GVKBIO all) = 5,595 </li></ul><ul><li>Targets & pathways (Prous Integrity) = 1,866 </li></ul><ul><li>Targets (WOMBAT 2006.1) = 1,320 </li></ul><ul><li>PubChem Compound-to-RefSeq = 1,081 </li></ul><ul><li>Targets with chemical starting points (GVKBIO human) = 1,248 </li></ul><ul><li>Therapeutic Targets Database = 997 </li></ul><ul><li>Targets with <10 uM cpds (Paolini et al. human) = 836 </li></ul><ul><li>Targets with <100 nM Lipinski-compliant cpds (Paolini et al. human) = 529 </li></ul><ul><li>Swiss-Prot -> DrugBank links (all) = 503 </li></ul><ul><li>Big pharma project portfolio (Pfizer) = 479 </li></ul><ul><li>Swiss-Prot -> DrugBank links (human) = 411 </li></ul><ul><li>Targets of marketed drugs (all) = 248 proteins </li></ul><ul><li>Targets of marketed drugs (human) = 207 </li></ul><ul><li>Targets for marketed oral drugs (human) = 185 </li></ul>
    39. 40. The DrugBank Data Extractor: Extensive Target-Cpd Joins
    40. 41. Linkage between Swiss-Prot - DrugBank - PubChem - MMDB see these marketed target links (411) (15728) = 181 (2501)
    41. 42. Commercial Products Complement Public Sources
    42. 43. Post-filtration Compound Counts (Oct 2006) <ul><li>GVKBio 1,488,288 </li></ul><ul><li>GVKBio Journals 542,858 </li></ul><ul><li>GVKBio Patents 1,034,548 </li></ul><ul><li>GVKBio Drug 1,933 </li></ul><ul><li>WOMBAT 128,120 </li></ul><ul><li>PubChem 7,268,193 </li></ul><ul><li>PubChem Prous 3,318 </li></ul><ul><li>PubChem PDB 5,626 </li></ul><ul><li>PubChem actives 35,671 </li></ul><ul><li>PubChem pharmacol 6,070 </li></ul><ul><li>Bioprint 2,437 </li></ul><ul><li>ZINC FDA 1,200 </li></ul><ul><li>DrugBank 3,723 </li></ul><ul><li>DrugBank small mol 1,018 </li></ul><ul><li>DrugBank exp drugs 2,737 </li></ul><ul><li>Dict. Nat.Prod. 132,831 </li></ul><ul><li>MDDR 159,867 </li></ul><ul><li>MDDR launched 1,118 </li></ul><ul><li>CMC 8,189 </li></ul>
    43. 44. PubChem and Commercial Target db Overlaps [Venn diagram] Totals: PubChem 7.27 mill; WOMBAT 128K; GVKBIO 1.49 mill. Region counts as shown: 4,150; 86,143; 34,674; 353,623; 3,162; 1,013,848; 6,825,265
    44. 45. Bio-Chem Connectivity not yet Completely Effortless <ul><li>Relatively few specific target assays </li></ul><ul><li>The MLSC screening collection is modest </li></ul><ul><li>As a consequence of being “submitter-friendly” the NCBI cannot enforce stringent data standards </li></ul><ul><li>The “vendor-push” effect dilutes out actives </li></ul><ul><li>Valuable bioactivity dbs not yet reciprocally linked or Entrez-selectable </li></ul><ul><li>Explicit protein<->compound links not extensive </li></ul><ul><li>Caveats for raw HTS data </li></ul><ul><li>MeSH linking from the medicinal chemistry literature is thin </li></ul><ul><li>Limited public back-fill of compound-biology-sequence links from literature </li></ul><ul><li>Lack of an assay ontology </li></ul><ul><li>Entrez is massive and not for the faint hearted </li></ul><ul><li>Conceptual challenges of traversing chemistry<>bioinformatics<>protein structure<>biology </li></ul>
    45. 46. Implications of Open Chemistry (I) <ul><li>CAS tried to strangle PubChem (see Wikipedia PubChem entry) </li></ul><ul><li>At least in part, the NIH MLI was a response to the post-genomic productivity decline of pharma </li></ul><ul><li>It will enable drug research for orphan and tropical diseases that pharma could not take on alone </li></ul><ul><li>But progressing to clinical candidates will be challenging </li></ul><ul><li>Impact of commercial medicinal chemistry patent claims on public research is unclear </li></ul><ul><li>Impact of academic medicinal chemistry patent claims on commercial research is unclear </li></ul><ul><li>Pharma encouragement for more pre-competitive collaboration </li></ul><ul><li>Collective responsibility for correct and standardised chemical structure representation </li></ul><ul><li>Controversy over open access Chem-Bio journals and data links vs traditional publishing </li></ul><ul><li>Credibility issues e.g. PubChem-FDA-DrugBank-Wikipedia </li></ul>
    46. 47. Implications of Open Chemistry (II) <ul><li>Commercial monopolies on chemical information brokerage have cracked </li></ul><ul><li>But their opportunities have expanded in target annotation and pathway mapping </li></ul><ul><li>Benchmarking and standardisation opportunities from public HTS and other screening data </li></ul><ul><li>Academic opportunities for searching patent cpds and data </li></ul><ul><li>More open-source cheminformatic tools will become available e.g. pharmacophore searching, virtual screening and docking </li></ul><ul><li>Academic experimental work may be limited by cpd availability </li></ul><ul><li>Will the revolution reach the patent offices and result in searchable e-submissions? </li></ul><ul><li>While progress is being made in large-scale extraction and indexing of biological knowledge from the literature (conversion of unstructured to structured data), it is not clear how the chemical literature will be back-filled </li></ul><ul><li>Increased collaboration and/or licensing opportunities for pharma and biotech </li></ul>
    47. 48. The New Literature Front-fill: Chemistry Central <ul><li>Chemistry Central is a new open access website for chemists publishing peer-reviewed research in chemistry from a range of open access journals. The Chemistry Central Journal will cover all of chemistry and will be broken down into discipline-specific sections, including Medicinal Chemistry, which will be a key discipline in this new journal. </li></ul><ul><li>Online publishing offers technologies not available in print-only format such as hyperlinking, video files, interactive graphics and molecular superimpositions. Figures can be submitted in ChemDraw (.CDX) or ISIS/Draw (.TGF) file formats. All articles will be deposited in PubMed Central, and will therefore be automatically linked into PubChem, based on the chemical substances that they mention. Additional information such as experimental or spectroscopic data can be linked electronically to structures within the document. The storage of supporting data should make this an invaluable SAR resource. </li></ul>
    48. 49. 7-CHLORO-1-METHYL-5-PHENYL-2H-1,4-BENZODIAZEPIN-2-ONE SMILES strings: c1ccccc1 InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H Text Analysis Operations for Chemistry Documents: Backfilling from Patents and Papers Name  Structure Program language-free entities 6 6 0 0 0 0 0 0 0 0999 V2000 6.7092 5.6087 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 6.7076 4.5056 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6607 3.9551 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.6160 4.5062 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.6121 5.6136 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6583 6.1591 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 0 0 0 0 2 3 1 0 0 0 0 3 4 2 0 0 0 0 4 5 1 0 0 0 0 5 6 2 0 0 0 0 6 1 1 0 0 0 0 M END Connection tables
    49. 50. The ChemSpider Mission <ul><li>Build a structure-centric community for chemists by: </li></ul><ul><ul><li>Providing an environment for structure drawing, manipulation, visualization, modeling, databasing and searching </li></ul></ul><ul><ul><li>Providing methods by which to deposit, curate and enhance data associated with chemical structures </li></ul></ul><ul><ul><li>Providing structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data </li></ul></ul>
    50. 51. Flexible Boolean Searching
    51. 52. Search result: 49 hits in 0.8 seconds
    52. 53. Integrated Visualization Tools
    53. 54. Integrated Analytical Data Management for Public Domain Data
    54. 55. Integrated Access to Open-Access Literature Text-based searching of over 50,000 Open Access Chemistry Articles
    55. 56. External Integrations - Google Search Across Google Using InChI string
    56. 57. External Integrations – Patents Reel Two Surechem Portal
    57. 58. Quality is a Major Issue <ul><li>PubChem structure-identifier pairs are proliferating </li></ul><ul><li>Care is needed, or at least cleansing of the data </li></ul>
    58. 59. Acknowledgments <ul><li>Colleagues from AstraZeneca Global Compound Sciences </li></ul><ul><li>Steve Bryant and the NCBI PubChem team </li></ul><ul><li>Chris Austin and the NIH Chemical Genomics Centre team </li></ul><ul><li>Antony Williams for ChemSpider slides </li></ul>
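
The region counts on the "PubChem and Commercial Target db Overlaps" slide can be checked against the database totals on the "Post-filtration Compound Counts" slide. The sketch below is only an illustration: the assignment of each number to a Venn region is an assumption inferred from which sums reproduce the reported totals, since the transcript does not say which region each count belongs to. The names `REGIONS`, `REPORTED_TOTALS`, `database_total` and `discrepancies` are hypothetical.

```python
# Region sizes from the overlap slide, keyed by the set of databases that
# contain the compounds in that region.
# ASSUMPTION: this region assignment is inferred from the totals; the
# transcript lists the numbers without saying where they sit in the diagram.
REGIONS = {
    frozenset({"PubChem"}): 6_825_265,
    frozenset({"GVKBIO"}): 1_013_848,
    frozenset({"WOMBAT"}): 4_150,
    frozenset({"PubChem", "GVKBIO"}): 353_623,
    frozenset({"PubChem", "WOMBAT"}): 3_162,
    frozenset({"GVKBIO", "WOMBAT"}): 34_674,
    frozenset({"PubChem", "GVKBIO", "WOMBAT"}): 86_143,
}

# Post-filtration totals as listed on the compound counts slide.
REPORTED_TOTALS = {
    "PubChem": 7_268_193,
    "GVKBIO": 1_488_288,
    "WOMBAT": 128_120,
}


def database_total(name):
    """Sum every Venn region that includes the named database."""
    return sum(size for members, size in REGIONS.items() if name in members)


def discrepancies():
    """Return (summed regions - reported total) for each database."""
    return {
        name: database_total(name) - reported
        for name, reported in REPORTED_TOTALS.items()
    }


if __name__ == "__main__":
    for name, delta in discrepancies().items():
        print(f"{name}: regions sum to {database_total(name):,} "
              f"(difference from reported total: {delta:+,})")
```

Under this assignment the PubChem and GVKBIO regions sum exactly to their reported totals, while the WOMBAT regions sum to 128,129, nine more than the listed 128,120. This suggests the assignment is plausible and that either one WOMBAT region or the WOMBAT total differs slightly between the two slides.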

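
The "Text Analysis Operations for Chemistry Documents" slide shows benzene as a SMILES string, an InChI and a connection table. As a rough illustration of what "language-free" means for the connection table, the sketch below reads a V2000-style block with the standard library only. It is a simplified reader under stated assumptions, not a full molfile parser: it splits on whitespace rather than honouring the fixed-width columns of the format, and the three header lines are added here because the flattened slide text does not include them. `parse_v2000` and `heavy_atom_formula` are hypothetical helper names.

```python
from collections import Counter

# Atom and bond records copied from the slide. The header lines (name plus
# two blank lines) are an assumption added to make a complete block.
BENZENE_MOLBLOCK = """\
benzene


  6  6  0  0  0  0  0  0  0  0999 V2000
    6.7092    5.6087    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.7076    4.5056    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.6607    3.9551    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6160    4.5062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6121    5.6136    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.6583    6.1591    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  1  1  0  0  0  0
M  END
"""


def parse_v2000(molblock):
    """Return (atom symbols, bonds) where each bond is (begin, end, order).

    Simplification: fields are split on whitespace, which works for small
    molecules like this one but not for every valid fixed-width record.
    """
    lines = molblock.splitlines()
    # The counts line is the one that ends with the version tag.
    start = next(i for i, line in enumerate(lines)
                 if line.rstrip().endswith("V2000"))
    counts = lines[start].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atom_lines = lines[start + 1:start + 1 + n_atoms]
    bond_lines = lines[start + 1 + n_atoms:start + 1 + n_atoms + n_bonds]
    atoms = [line.split()[3] for line in atom_lines]  # x, y, z, symbol
    bonds = [tuple(int(tok) for tok in line.split()[:3])
             for line in bond_lines]
    return atoms, bonds


def heavy_atom_formula(atoms):
    """Alphabetical element counts for the listed atoms (hydrogens are implicit)."""
    counts = Counter(atoms)
    return "".join(f"{el}{n if n > 1 else ''}"
                   for el, n in sorted(counts.items()))


if __name__ == "__main__":
    atoms, bonds = parse_v2000(BENZENE_MOLBLOCK)
    print(heavy_atom_formula(atoms), len(bonds), "bonds")
```

The six carbon atoms and the alternating double and single bonds closing the ring between atoms 6 and 1 describe the same molecule as the SMILES `c1ccccc1` on the slide, with no dependence on a chemical name.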