Great promise of navigating the internet using InChIs

The InChI, the International Chemical Identifier, has been the basis of both indexing and deduplication of the ChemSpider database since the inception of the platform. When the InChI was adopted we envisaged a future whereby the identifier would proliferate across journals, databases and the internet in general providing us a basis for “structure searching the internet”. This presentation will provide an overview of how the InChI has facilitated the integration of ChemSpider to chemistry on the internet, some of the surprising findings that have resulted from this work and extrapolate the influence of InChIs into the future for a chemically enabled web.

Upload details: uploaded as Microsoft PowerPoint

Usage rights: CC Attribution-NonCommercial License

Presentation Transcript

  • Great promise of navigating the internet using InChIs. Antony J. Williams, ACS San Diego, March 2012
  • Openness and Quality Issues: Williams and Ekins, DDT, 16: 747-750 (2011); Science Translational Medicine 2011
  • Warning… This talk is not about Quality…it’s about quantity
  • Warning… This talk is not about Quality…it’s about quantity Drugbank was here
  • Data quality is a known issue
  • We ALL have issues!!!
  • It’s about what’s out there…
  • How to Link it…
  • And getting out of overwhelm…
  • So what is Yohimbine?
  • Of course it is out there… Drugbox: 3001/5080 with InChIs; Chembox: 5436/7690 with InChIs
  • Tell me more… Where can I find the molfile for Yohimbine? Papers/patents about Yohimbine? What are the side effects of Yohimbine? Where can I order Yohimbine? What are the physicochemical properties? Metabolic pathways? Different synonyms of Yohimbine? Synthesis of Yohimbine? Etc.
  • Quantity!
  • Yohimbine on ChemSpider… Quality?
  • How do we build it? We deal in molfiles or SDF files, with coordinates. Deposit anything that has an InChI: we support what InChI can handle, good and bad. Standardization is based on “InChI standardization”. InChIs aggregate (certain) tautomers. We link out to external sites using their IDs.
  • Downsides of InChI: InChI was a moving target (multiple versions) but overall worked as planned. It is good for small molecules, but there are no polymers, there are issues with inorganics and organometallics, and stereochemistry is imperfect. ChemSpider is “small molecules”. InChI is used as the “deduplicator”: the FIRST version of a compound into the database becomes THE structure to deduplicate against…
  • Side Effects of InChI Usage
  • SMILES by comparison…
  • Side Effects of InChI Usage
  • Standardization Issues: Depiction based on molfile
  • Downsides of Overall Approach: Meshing data together based on InChIs worked for simple molecules. 2D layout errors were inherited or limited by the algorithm. Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same and meant to be the same, are not the same.
  • Yohimbine on ChemSpider… Quality?
  • So where can we travel???
  • So where can we travel???
  • InChI String Search via Google: Give me InChIKeys…
  • And where can we travel???
  • ChemSpider, BRENDA, Wikipedia, ChEMBL, ChEBI, DrugBank
  • Aggregator, Enzymes, Encyclopedia, Pharmacology, Curated Chemicals, Drug-Drug Target
  • Recognizing Compound Dilution: So much chemistry on the web… and so much dilution, “structural uniqueness” versus “accidental ambiguity”. InChI as an easy skeleton search.
  • Vancomycin – Search the Internet
  • Vancomycin Search: Molecular Search, Full Molecule, SKELETON
  • Full Skeleton Search
  • All aggregators suffer dilution!
  • Many Problems Can Be Solved… Clean up databases: structure validation, structure standardization. Warn about: valency, charge balance, depiction issues, bond types, absent stereo, and another 100 rules (or so…). Standardize: agree on community rules to “standardize”.
  • Structure Validation
  • Structure Validation - Fixed
  • What needs to happen? If we could validate, we could catch errors in databases (and clean them), proactively catch errors in publications and patents, and reduce junk in the ether to improve QUALITY! If we standardized, interlinking should improve.
  • NPC Browser Set
  • Download, Deposit, Reprocess
  • Substructure hits by stereochemistry status:
      Substructure    | # of hits | # of correct hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry
      Gonane          |        34 |                 5 |                  8 |                         21 |                                      0
      Gon-4-ene       |        55 |                12 |                  3 |                         33 |                                      7
      Gon-1,4-diene   |        60 |                17 |                 10 |                         23 |                                     10
  • Structure-Name Validation (figure: structures labeled Choladine, Taxol, Cholane and Chlotrimazole)
  • Standardize: Use the SRS as a guidance document for standardization, and adjust as necessary to our needs.
  • Nitro groups
  • Salt and Ionic Bonds
  • Ammonium salts
  • Millions of structures? Lots of Issues
  • ChemSpider Standardization Entire ChemSpider database will be standardized using modified FDA rule set Original Molfiles will be standardized and all properties (predicted properties, SMILES, InChIs, Names) will all be regenerated Standardization procedures automatically applied to all future depositions
  • Identifier Dictionaries Reciprocal curation processes…share curation with each other. If a database has a compound already then use InChiKeys to match “suggested” validation against the compound. A series of “added” and “removed” synonyms against InChIKeys for matching.
  • Proof of Concept Data Curation SharingWho wants to work with us?
  • Structure Validation using feed Look for approved synonyms Compare feed InChIKey with database InChIKey If different, flag for inspection
  • It is so difficult to navigate… IP? What’s the structure? Are they in our file? What’s similar? What’s the Pharmacology target? data? Known Pathways? Competitors? Working On Connections Now? to disease? Expressed in right cell type?
  • Open PHACTS Project Develop a set of robust standards… Implement the standards in a semantic integration hub Deliver services to support drug discovery programs in pharma and public domain 22 partners, 8 pharmaceutical companies, 3 biotechs 36 months project Guiding principle is open access, open usage, open source - Key to standards adoption -
  • Chemistry in Open PHACTS Selected data slices of ChemSpider carrying pharmacological links into the “linked data cache” ChemSpiderIDs and InChIs/InChIKeys will be in Open PHACTS and available for linking A structure ID standard to enable further linking across the semantic web of science
  • ChemSpider and InChI: Internet Data, Commercial Software, Pre-competitive Data, Open Science, Open Data, Publishers, Educators, Open Databases, Chemical Vendors. Small organic molecules, Undefined materials, Organometallics, Nanomaterials, Polymers, Minerals, Particle bound, Links to Biologicals.
  • The great promise should be obvious: InChIs are here to stay. They will evolve, they will encompass, and we will adopt and adapt. Public and private databases will federate and build a linked environment of validated data! Data validation and standardization are needed. Open Data will continue to proliferate. InChIs are in the “Semantic Web” already.
  • If InChI never existed or went away… ChemSpider would never have been built. Database linking would suffer dramatically. The web would not be “structure searchable”. Cheminformatics tools would likely not be linking to public domain databases in the same way. And we would not have the pleasure of today…
  • Acknowledgments: The inspiration of the InChI Masters (Steve H., Steve S., Alan, Dmitrii, Igor). IUPAC, NIST, and all adopters, supporters, challengers and users. The InChI Trust and its supporters for funding continued development. Al Gore, for enabling us to search InChIs on the web.
  • Steve Heller
  • Steve Heller
  • Thank you. Email: williamsa@rsc.org; Twitter: ChemConnector; Personal blog: www.chemconnector.com; Slides: www.slideshare.net/AntonyWilliams
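The "How do we build it?" and "Downsides of InChI" slides describe InChI-based deduplication. Under that scheme, the first deposited version of a compound becomes the structure that later depositions are matched against. Below is a minimal Python sketch of that first-wins rule. It illustrates the idea and is not ChemSpider's implementation.

The sketch makes these assumptions:
- Each deposition already carries a Standard InChI computed upstream by a toolkit. No structure parsing happens here.
- The class names, source names and IDs are hypothetical.
- The two InChI strings in the demo are the Standard InChIs of ethanol and methanol.

```python
from dataclasses import dataclass, field


@dataclass
class Deposition:
    source: str        # hypothetical depositor name
    source_id: str     # the depositor's own ID, kept so we can link out
    inchi: str         # Standard InChI, assumed to be computed upstream
    molfile: str = ""  # structure exactly as deposited (unused here)


@dataclass
class Record:
    record_id: int
    canonical: Deposition                      # first deposition seen
    links: list = field(default_factory=list)  # (source, source_id) pairs


def deduplicate(depositions):
    """Merge depositions whose InChI strings are identical.

    The first deposition seen for an InChI becomes the canonical record;
    later matches only contribute an external link and never replace it.
    """
    by_inchi = {}
    records = []
    for dep in depositions:
        record = by_inchi.get(dep.inchi)
        if record is None:
            record = Record(record_id=len(records) + 1, canonical=dep)
            by_inchi[dep.inchi] = record
            records.append(record)
        record.links.append((dep.source, dep.source_id))
    return records


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    methanol = "InChI=1S/CH4O/c1-2/h2H,1H3"
    deposits = [
        Deposition("SourceA", "A-001", ethanol),
        Deposition("SourceB", "B-917", methanol),
        Deposition("SourceC", "C-042", ethanol),
    ]
    for rec in deduplicate(deposits):
        print(rec.record_id, rec.canonical.source, rec.links)
```

Run as a script, this prints `1 SourceA [('SourceA', 'A-001'), ('SourceC', 'C-042')]` and then `2 SourceB [('SourceB', 'B-917')]`. The match is an exact string comparison, so two behaviors follow. Anything the InChI layers do not distinguish is merged. A later, better-drawn structure never replaces the canonical one, which is the behavior the "Downsides" slides warn about.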
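The Vancomycin slides contrast a full-molecule search with a skeleton search, and the "InChI String Search via Google" slide asks for InChIKeys. A Standard InChIKey lends itself to this split. Its first 14-letter block is hashed from the main layer (formula, connectivity and hydrogens). Keys that agree on that block but differ afterwards therefore share a skeleton while differing in, for example, stereochemistry, which is where the "dilution" shows up.

The Python sketch below assumes that reading of "skeleton search". It does not describe how ChemSpider or Google actually index keys. The helper `placeholder_key` is hypothetical: it builds syntactically valid keys that do not correspond to real compounds.

```python
import re

# Standard InChIKey layout: 14 letters hashed from the main layer, 8 letters
# hashed from the remaining layers (stereo, isotopes, ...), the flags 'S'
# (standard) and 'A' (version 1), then one protonation letter.
STD_INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})SA-([A-Z])$")


def skeleton_block(key: str) -> str:
    """Return the first (main-layer) block of a Standard InChIKey."""
    m = STD_INCHIKEY.match(key)
    if m is None:
        raise ValueError(f"not a Standard InChIKey: {key!r}")
    return m.group(1)


def search(query_key: str, entries: list) -> tuple:
    """Split (name, key) entries into full-key and skeleton-only matches."""
    query_skeleton = skeleton_block(query_key)
    full, skeleton_only = [], []
    for name, key in entries:
        if key == query_key:
            full.append(name)
        elif skeleton_block(key) == query_skeleton:
            skeleton_only.append(name)
    return full, skeleton_only


def placeholder_key(skeleton_char: str, rest_char: str) -> str:
    """Build a syntactically valid but HYPOTHETICAL key (not a real compound)."""
    return f"{skeleton_char * 14}-{rest_char * 8}SA-N"


if __name__ == "__main__":
    query = placeholder_key("A", "B")
    entries = [
        ("site 1", placeholder_key("A", "B")),  # identical key
        ("site 2", placeholder_key("A", "C")),  # same skeleton, other layers differ
        ("site 3", placeholder_key("D", "B")),  # different skeleton
    ]
    print(search(query, entries))  # (['site 1'], ['site 2'])
```

Searching on the first block alone trades precision for recall. It surfaces every stereo variant of a skeleton, which is useful for spotting the records that "should" have been the same compound.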
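The "Identifier Dictionaries" and "Structure Validation using a feed" slides outline shared curation keyed on InChIKeys. The process has three steps:
1. Match a partner's approved synonyms to local records.
2. Flag cases where the partner's InChIKey disagrees with the local one.
3. Exchange lists of added and removed synonyms.

Below is a minimal Python sketch under those assumptions. The data classes, the function names and the case-insensitive synonym matching are illustrative choices. They are not part of any real ChemSpider or Open PHACTS interface.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    inchikey: str
    synonyms: set = field(default_factory=set)


@dataclass
class FeedEntry:
    synonym: str   # synonym approved by the partner database (hypothetical feed)
    inchikey: str  # InChIKey the partner assigns to that synonym


def flag_mismatches(feed, compounds):
    """Compare feed InChIKeys with database InChIKeys, matched by synonym.

    Returns (synonym, database_key, feed_key) tuples needing inspection.
    Synonyms the database does not know are ignored.
    """
    by_synonym = {}
    for compound in compounds:
        for syn in compound.synonyms:
            by_synonym[syn.lower()] = compound  # case-insensitive match (assumption)
    flagged = []
    for entry in feed:
        compound = by_synonym.get(entry.synonym.lower())
        if compound is not None and compound.inchikey != entry.inchikey:
            flagged.append((entry.synonym, compound.inchikey, entry.inchikey))
    return flagged


def apply_synonym_changes(compounds, added, removed):
    """Apply shared curation given as (inchikey, synonym) pairs."""
    by_key = {c.inchikey: c for c in compounds}
    for key, syn in added:
        if key in by_key:
            by_key[key].synonyms.add(syn)
    for key, syn in removed:
        if key in by_key:
            by_key[key].synonyms.discard(syn)
```

Mismatches are flagged rather than overwritten, matching the slide's "flag for inspection". A disagreement may mean either database holds the wrong structure, so a curator makes the call.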