
Great promise of navigating the internet using InChIs

The InChI, the International Chemical Identifier, has been the basis of both indexing and deduplication of the ChemSpider database since the inception of the platform. When the InChI was adopted we envisaged a future whereby the identifier would proliferate across journals, databases and the internet in general, providing us a basis for “structure searching the internet”. This presentation will provide an overview of how the InChI has facilitated the integration of ChemSpider with chemistry on the internet, describe some of the surprising findings that have resulted from this work, and extrapolate the influence of InChIs into the future for a chemically enabled web.

Published in: Technology

Transcript of "Great promise of navigating the internet using InChIs"

  1. Great promise of navigating the internet using InChIs. Antony J Williams, ACS San Diego, March 2012
  2. Openness and Quality Issues. Williams and Ekins, DDT, 16: 747-750 (2011); Science Translational Medicine 2011
  3. Warning… This talk is not about Quality… it’s about quantity
  4. Warning… This talk is not about Quality… it’s about quantity. Drugbank was here
  5. Data quality is a known issue
  6. We ALL have issues!!!
  7. It’s about what’s out there…
  8. How to Link it…
  9. And getting out of overwhelm…
  10. So what is Yohimbine?
  11. Of course it is out there… Drugbox: 3001/5080 with InChIs; Chembox: 5436/7690 with InChIs
  12. Tell me more… Where can I find the molfile for Yohimbine? Papers/Patents about Yohimbine? What are the side effects of Yohimbine? Where can I order Yohimbine? What are the physicochemical properties? Metabolic pathways? Different synonyms of Yohimbine? Synthesis of Yohimbine? Side effects of Yohimbine? Etc.
  13. Quantity!
  14. Yohimbine on ChemSpider… Quality?
  15. How do we build it? We deal in Molfiles or SDF files, with coordinates. Deposit anything that has an InChI: we support what InChI can handle, good and bad. Standardization based on “InChI standardization”. InChIs aggregate (certain) tautomers. We link out to external sites using their IDs.
  16. Downsides of InChI: InChI was a moving target (multiple versions) but overall worked as planned. Good for small molecules, but no polymers, and issues with inorganics, organometallics, imperfect stereochemistry. ChemSpider is “small molecules”. InChI used as the “deduplicator”: the FIRST version of a compound into the database becomes THE structure to deduplicate against…
  17. Side Effects of InChI Usage
  18. SMILES by comparison…
  19. Side Effects of InChI Usage
  20. Standardization Issues. Depiction based on molfile
  21. Downsides of Overall Approach: Meshing data together based on InChIs worked for simple molecules. 2D layout errors inherited or limited by algorithm. Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same, meant to be the same, are not the same.
  22. Yohimbine on ChemSpider… Quality?
  23. So where can we travel???
  24. So where can we travel???
  25. InChI String Search via Google. Give me InChIKeys…
  26. And where can we travel???
  27. ChemSpider, BRENDA, Wikipedia, ChEMBL, ChEBI, DrugBank
  28. Aggregator, Enzymes, Encyclopedia, Pharmacology, Curated Chemicals, Drug-Drug Target
  29. Recognizing Compound Dilution: So much chemistry on the web… and so much dilution: “structural uniqueness” versus “accidental ambiguity”. InChI as an easy skeleton search.
  30. Vancomycin: Search the Internet
  31. Vancomycin: Search Molecular Skeleton; Search Full Molecule
  32. Full Skeleton Search
  33. All aggregators suffer dilution!
  34. Many Problems Can be Solved… Clean up databases: structure validation, structure standardization. Warn about: valency, charge balance, depiction issues, bond types, absent stereo, and another 100 rules (or so…). Standardize: agree community rules to “Standardize”.
  35. Structure Validation
  36. Structure Validation - Fixed
  37. What needs to happen? If we could validate: catch errors in databases (and clean); proactively catch errors in publications/patents; reduce junk in the ether, improve QUALITY! If we standardized: interlinking should improve.
  38. NPC Browser Set
  39. Download, Deposit, Reprocess
  40. Substructure | # of Hits | # of Correct Hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry
      Gonane | 34 | 5 | 8 | 21 | 0
      Gon-4-ene | 55 | 12 | 3 | 33 | 7
      Gon-1,4-diene | 60 | 17 | 10 | 23 | 10
  41. Structure-Name Validation (structure depictions labeled Choladine, Taxol, Cholane, Chlotrimazole)
  42. Standardize: Use the SRS as a guidance document for standardization. Adjust as necessary to our needs.
  43. Nitro groups
  44. Salt and Ionic Bonds
  45. Ammonium salts
  46. Millions of structures? Lots of Issues
  47. ChemSpider Standardization: Entire ChemSpider database will be standardized using modified FDA rule set. Original Molfiles will be standardized and all properties (predicted properties, SMILES, InChIs, Names) will be regenerated. Standardization procedures automatically applied to all future depositions.
  48. Identifier Dictionaries: Reciprocal curation processes… share curation with each other. If a database has a compound already, then use InChIKeys to match “suggested” validation against the compound. A series of “added” and “removed” synonyms against InChIKeys for matching.
  49. Proof of Concept Data Curation Sharing. Who wants to work with us?
  50. Structure Validation using feed: Look for approved synonyms. Compare feed InChIKey with database InChIKey. If different, flag for inspection.
  51. It is so difficult to navigate… IP? What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known Pathways? Competitors? Working On Now? Connections to disease? Expressed in right cell type?
  52. Open PHACTS Project: Develop a set of robust standards… Implement the standards in a semantic integration hub. Deliver services to support drug discovery programs in pharma and public domain. 22 partners, 8 pharmaceutical companies, 3 biotechs. 36 months project. Guiding principle is open access, open usage, open source (key to standards adoption).
  53. Chemistry in Open PHACTS: Selected data slices of ChemSpider carrying pharmacological links into the “linked data cache”. ChemSpiderIDs and InChIs/InChIKeys will be in Open PHACTS and available for linking. A structure ID standard to enable further linking across the semantic web of science.
  54. ChemSpider and InChI: Internet Data, Commercial Software, Pre-competitive Data, Open Science, Open Data, Publishers, Educators, Open Databases, Chemical Vendors; Small organic molecules, Undefined materials, Organometallics, Nanomaterials, Polymers, Minerals, Particle bound, Links to Biologicals
  55. The great promise should be obvious: InChIs are here to stay. They will evolve, they will encompass, we will adopt and adapt. Public and private databases will federate & build a linked environment of validated data! Data validation and standardization is needed. Open Data will continue to proliferate. InChIs are in the “Semantic Web” already.
  56. If InChI never existed or went away… ChemSpider would never have been built. Database linking would suffer dramatically. The web would not be “structure searchable”. Cheminformatics tools would likely not be linking to public domain databases in the same way. And we would not have the pleasure of today…
  57. Acknowledgments: The inspiration of the InChI Masters: Steve H., Steve S., Alan, Dmitrii, Igor. IUPAC, NIST, all adopters, supporters, challengers and users. The InChI Trust and its supporters for funding continued development. Al Gore, for enabling us to search InChIs on the web.
  58. Steve Heller
  59. Steve Heller
  60. Thank you. Email: williamsa@rsc.org; Twitter: ChemConnector; Personal Blog: www.chemconnector.com; Slides: www.slideshare.net/AntonyWilliams
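
Two ideas from the slides lend themselves to a short sketch: the skeleton search of slide 29 and the feed check of slide 50 (compare feed InChIKey with database InChIKey; if different, flag for inspection). The slides do not say how either is implemented, so the Python below is only an illustration under stated assumptions. It relies on the fact that the first block of a standard InChIKey (the 14 characters before the first hyphen) hashes the connectivity layers, so records sharing that block share a skeleton even when their stereochemistry differs. The function names, the dictionary-based feed and database, and the placeholder keys are all hypothetical.

```python
from collections import defaultdict


def skeleton_block(inchikey):
    """Return the first block of an InChIKey (the text before the first hyphen)."""
    return inchikey.split("-")[0]


def group_by_skeleton(records):
    """Group (source, inchikey) pairs by the first InChIKey block.

    Records in the same group share a connectivity hash but may differ
    in stereochemistry or other layers.
    """
    groups = defaultdict(list)
    for source, key in records:
        groups[skeleton_block(key)].append(source)
    return dict(groups)


def flag_for_inspection(feed, database):
    """Return names present in both mappings whose InChIKeys differ.

    Both arguments are assumed to map an approved synonym to an InChIKey.
    """
    return sorted(
        name
        for name, key in feed.items()
        if name in database and database[name] != key
    )


if __name__ == "__main__":
    # Placeholder keys for illustration only; these are not real compounds.
    records = [
        ("source_a", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
        ("source_b", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
        ("source_c", "DDDDDDDDDDDDDD-EEEEEEEESA-N"),
    ]
    print(group_by_skeleton(records))

    feed = {"compound_x": "AAAAAAAAAAAAAA-BBBBBBBBSA-N"}
    database = {"compound_x": "AAAAAAAAAAAAAA-CCCCCCCCSA-N"}
    print(flag_for_inspection(feed, database))
```

Grouping on the first block is deliberately loose: it surfaces the “dilution” the talk describes, where several records with the same skeleton but different stereo layers sit behind one name, while the full-key comparison is the strict check that decides which records a curator should inspect.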