Great promise of navigating the
          internet using InChIs

                     Antony J Williams
                 ACS San Diego March 2012
Openness and Quality Issues
Williams and Ekins, DDT, 16: 747-750 (2011)

              Science Translational Medicine 2011
Warning…
 • This talk is not about Quality… it’s about quantity
                  Drugbank was here
Data quality is a known issue
We ALL have issues!!!
It’s about what’s out there…
How to Link it…
And getting out of overwhelm…
So what is Yohimbine?
Of course it is out there…




      Drugbox: 3001/5080 with InChIs
      Chembox: 5436/7690 with InChIs
Tell me more…
 • Where can I find the molfile for Yohimbine?
 • Papers/Patents about Yohimbine?
 • What are the side effects of Yohimbine?
 • Where can I order Yohimbine?
 • What are the physicochemical properties?
 • Metabolic pathways?
 • Different synonyms of Yohimbine?
 • Synthesis of Yohimbine?
 • Etc.
Quantity!
Yohimbine on ChemSpider..Quality?
How do we build it?
 • We deal in Molfiles or SDF files – with coordinates
 • Deposit anything that has an InChI – we support what InChI can handle, good and bad
 • Standardization based on “InChI standardization”
 • InChIs aggregate (certain) tautomers
 • We link out to external sites using their IDs
Downsides of InChI
 • InChI was a moving target (multiple versions) but overall worked as planned.
 • Good for small molecules – but no polymers, and issues with inorganics, organometallics and imperfect stereochemistry. ChemSpider is “small molecules”
 • InChI used as the “deduplicator” – the FIRST version of a compound into the database becomes THE structure to deduplicate against…
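The "first one in wins" behaviour described above can be illustrated with a minimal sketch. This is not ChemSpider's actual code: the record layout, the function name `deduplicate` and the InChIKeys (deliberately made-up placeholders) are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (source name, InChIKey, molfile text).
Record = Tuple[str, str, str]

# Placeholder keys with the right shape (14-10-1 characters); not real compounds.
KEY_1 = "A" * 14 + "-" + "B" * 8 + "SA-N"
KEY_2 = "C" * 14 + "-" + "D" * 8 + "SA-N"


def deduplicate(records: Iterable[Record]):
    """Keep the first molfile seen for each InChIKey; later deposits only add a source."""
    canonical: Dict[str, str] = {}      # InChIKey -> molfile kept as THE structure
    sources: Dict[str, List[str]] = {}  # InChIKey -> every source that deposited it
    for source, inchikey, molfile in records:
        if inchikey not in canonical:
            canonical[inchikey] = molfile  # first deposit wins, good or bad
        sources.setdefault(inchikey, []).append(source)
    return canonical, sources


deposits = [
    ("source_a", KEY_1, "molfile with a poor 2D layout"),
    ("source_b", KEY_1, "molfile with a clean 2D layout"),
    ("source_c", KEY_2, "a different compound"),
]
canonical, sources = deduplicate(deposits)
```

In this sketch the poorer layout from `source_a` is the one that is kept, which is the side effect the slide warns about.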
Side Effects of InChI Usage
SMILES by comparison…
Side Effects of InChI Usage
Standardization Issues
Depiction based on molfile
Downsides of Overall Approach
 • Meshing data together based on InChIs worked for simple molecules
 • 2D layout errors inherited or limited by algorithm
 • Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same, meant to be the same, are not the same
Yohimbine on ChemSpider..Quality?
So where can we travel???
So where can we travel???
InChI String Search via Google
Give me InChIKeys…
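As a rough illustration of why InChIKeys suit web search: a key is a fixed-length string of letters and hyphens, so it can be dropped into a query as one exact phrase. The sketch below only builds the query URL; the helper name `google_search_url` is hypothetical and the key is a made-up placeholder, not a real compound.

```python
from urllib.parse import urlencode

# Placeholder key with the InChIKey shape (14-10-1 characters); not a real compound.
PLACEHOLDER_KEY = "A" * 14 + "-" + "B" * 8 + "SA-N"


def google_search_url(identifier: str) -> str:
    """Build a search URL that asks for the identifier as an exact phrase."""
    # Wrapping the identifier in double quotes requests an exact-phrase match.
    return "https://www.google.com/search?" + urlencode({"q": f'"{identifier}"'})


url = google_search_url(PLACEHOLDER_KEY)
print(url)
```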
And where can we travel???
 • ChemSpider – Aggregator
 • BRENDA – Enzymes
 • Wikipedia – Encyclopedia
 • ChEMBL – Pharmacology
 • ChEBI – Curated Chemicals
 • DrugBank – Drug-Drug Target
Recognizing Compound Dilution
 • So much chemistry on the web…
 • And so much dilution – “structural uniqueness” versus “accidental ambiguity”
 • InChI as an easy skeleton search
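One way to picture a skeleton search: the first 14-character block of a standard InChIKey is a hash of the connectivity layers, while stereochemistry and isotopes are hashed into the second block. Records that share the first block therefore share a skeleton. The sketch below groups keys on that block; the function names are hypothetical and the keys are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def skeleton_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    first, _, _ = inchikey.partition("-")
    if len(first) != 14:
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return first


def group_by_skeleton(inchikeys: Iterable[str]) -> Dict[str, List[str]]:
    """Group full InChIKeys by their shared connectivity block."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in inchikeys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


# Placeholder keys: the first two differ only in the second block,
# as two stereoisomers of one skeleton would.
keys = [
    "A" * 14 + "-" + "B" * 8 + "SA-N",
    "A" * 14 + "-" + "E" * 8 + "SA-N",
    "C" * 14 + "-" + "D" * 8 + "SA-N",
]
groups = group_by_skeleton(keys)
```

Searching on the first block alone finds every stereo variant of a compound, which is also how the dilution becomes visible.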
Vancomycin – Search the Internet
[Figure: Vancomycin, with two search options: “Search Molecular Skeleton” and “Search Full Molecule”]
Full Skeleton Search
All aggregators suffer dilution!
Many Problems Can be Solved…
 • Clean up databases – structure validation, structure standardization
 • Warn about
    ◦ Valency, charge balance, depiction issues, bond types, absent stereo, and another 100 rules (or so…)
 • Standardize
    ◦ Agree on community rules to “Standardize”
Structure Validation
Structure Validation - Fixed
What needs to happen?
 • If we could validate
    ◦ Catch errors in databases (and clean)
    ◦ Proactively catch errors in publications/patents
    ◦ Reduce junk in the ether – improve QUALITY!
 • If we standardized
    ◦ Interlinking should improve
NPC Browser Set
Download, Deposit, Reprocess
Substructure     # of Hits   # of Correct Hits   No stereochemistry   Incomplete stereochemistry   Complete but incorrect stereochemistry
Gonane           34          5                   8                    21                           0
Gon-4-ene        55          12                  3                    33                           7
Gon-1,4-diene    60          17                  10                   23                           10
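The share of correct hits in each row can be worked out directly from the table. The short sketch below uses only the numbers shown above; the variable and function names are illustrative.

```python
# Rows from the table: (hits, correct, no stereo, incomplete stereo, incorrect stereo)
rows = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def fraction_correct(hits: int, correct: int) -> float:
    """Fraction of substructure hits whose stereochemistry is correct."""
    return correct / hits


for name, (hits, correct, no_stereo, incomplete, incorrect) in rows.items():
    # The four categories account for every hit in the row.
    assert correct + no_stereo + incomplete + incorrect == hits
    print(f"{name}: {correct}/{hits} correct ({fraction_correct(hits, correct):.1%})")
```

This works out to roughly 15%, 22% and 28% correct for the three substructures.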
Structure-Name Validation
[Figure: chemical structure drawings labeled Taxol, Choladine, Cholane and Chlotrimazole]
Standardize




 • Use the SRS (FDA Substance Registration System) as a guidance document for standardization
 • Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
Ammonium salts
Millions of structures? Lots of Issues
ChemSpider Standardization
 • Entire ChemSpider database will be standardized using a modified FDA rule set
 • Original Molfiles will be standardized and all properties (predicted properties, SMILES, InChIs, Names) will be regenerated
 • Standardization procedures automatically applied to all future depositions
Identifier Dictionaries
 • Reciprocal curation processes… share curation with each other.
 • If a database has a compound already then use InChIKeys to match “suggested” validation against the compound.
 • A series of “added” and “removed” synonyms against InChIKeys for matching.
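A shared curation feed of this kind could be applied roughly as sketched below. The feed format (InChIKey, action, synonym), the function name `apply_curation` and the keys (made-up placeholders) are assumptions for illustration, not a published exchange format.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical feed entry: (InChIKey, "added" or "removed", synonym).
FeedEntry = Tuple[str, str, str]

KEY_1 = "A" * 14 + "-" + "B" * 8 + "SA-N"  # placeholder, held locally
KEY_2 = "C" * 14 + "-" + "D" * 8 + "SA-N"  # placeholder, not held locally


def apply_curation(synonyms: Dict[str, Set[str]],
                   feed: Iterable[FeedEntry]) -> Dict[str, Set[str]]:
    """Return a copy of the synonym dictionary with the feed's suggestions applied."""
    updated = {key: set(names) for key, names in synonyms.items()}
    for inchikey, action, name in feed:
        if inchikey not in updated:
            continue  # only curate compounds this database already holds
        if action == "added":
            updated[inchikey].add(name)
        elif action == "removed":
            updated[inchikey].discard(name)
    return updated


local = {KEY_1: {"good name", "wrong name"}}
feed = [
    (KEY_1, "removed", "wrong name"),
    (KEY_1, "added", "another good name"),
    (KEY_2, "added", "name for a compound we do not hold"),
]
curated = apply_curation(local, feed)
```

Returning a copy keeps the suggestions separate from the live data until a curator accepts them.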
Proof of Concept Data Curation Sharing
Who wants to work with us?
Structure Validation using feed
 • Look for approved synonyms
 • Compare feed InChIKey with database InChIKey
 • If different, flag for inspection
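The three steps above can be sketched as a small check. The data shapes, the function name `flag_mismatches` and the placeholder keys are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple

KEY_1 = "A" * 14 + "-" + "B" * 8 + "SA-N"  # placeholder keys, not real compounds
KEY_2 = "A" * 14 + "-" + "E" * 8 + "SA-N"


def flag_mismatches(feed: Iterable[Tuple[str, str]],
                    database: Dict[str, str],
                    approved: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (name, feed key, database key) wherever the two InChIKeys disagree."""
    flagged = []
    for name, feed_key in feed:
        if name not in approved:
            continue  # step 1: only approved synonyms are trusted for matching
        db_key = database.get(name)
        # step 2: compare the feed InChIKey with the database InChIKey
        if db_key is not None and db_key != feed_key:
            flagged.append((name, feed_key, db_key))  # step 3: flag for inspection
    return flagged


database = {"compound x": KEY_1, "compound y": KEY_2}
feed = [("compound x", KEY_2), ("compound y", KEY_2), ("unapproved name", KEY_1)]
flagged = flag_mismatches(feed, database, approved={"compound x", "compound y"})
```

A flagged entry is only a prompt for a human curator; the sketch does not decide which of the two structures is right.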
It is so difficult to navigate…
 • IP?
 • What’s the structure?
 • Are they in our file?
 • What’s similar?
 • What’s the target?
 • Pharmacology data?
 • Known pathways?
 • Competitors?
 • Working on now?
 • Connections to disease?
 • Expressed in right cell type?
Open PHACTS Project
 • Develop a set of robust standards…
 • Implement the standards in a semantic integration hub
 • Deliver services to support drug discovery programs in pharma and public domain
 • 22 partners, 8 pharmaceutical companies, 3 biotechs
 • 36-month project

  Guiding principle is open access, open usage, open source
                - Key to standards adoption -
Chemistry in Open PHACTS
 • Selected data slices of ChemSpider carrying pharmacological links into the “linked data cache”
 • ChemSpiderIDs and InChIs/InChIKeys will be in Open PHACTS and available for linking
 • A structure ID standard to enable further linking across the semantic web of science
ChemSpider and InChI
                      Internet Data

 • Small organic molecules              Commercial Software
 • Undefined materials                  Pre-competitive Data
 • Organometallics                      Open Science
 • Nanomaterials                        Open Data
 • Polymers                             Publishers
 • Minerals                             Educators
 • Particle bound                       Open Databases
 • Links to Biologicals                 Chemical Vendors
The great promise should be obvious
 • InChIs are here to stay
 • They will evolve, they will encompass, we will adopt and adapt
 • Public and private databases will federate & build a linked environment of validated data!
 • Data validation and standardization are needed
 • Open Data will continue to proliferate
 • InChIs are in the “Semantic Web” already
If InChI had never existed or went away…
 • ChemSpider would never have been built
 • Database linking would suffer dramatically
 • The web would not be “structure searchable”
 • Cheminformatics tools would likely not be linking to public domain databases in the same way
 • And we would not have the pleasure of today…
Acknowledgments
 • The inspiration of the InChI Masters – Steve H., Steve S., Alan, Dmitrii, Igor
 • IUPAC, NIST, all adopters, supporters, challengers and users
 • The InChI Trust and its supporters for funding continued development
 • Al Gore – enabling us to search InChIs on the web
Steve Heller
Steve Heller
Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

More Related Content

Similar to Great promise of navigating the internet using in chis

How can the international chemical identifier (InChI) be extended to non …
How can the international chemical identifier (InChI) be extended to non …How can the international chemical identifier (InChI) be extended to non …
How can the international chemical identifier (InChI) be extended to non …Valery Tkachenko
 
Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Agepetermurrayrust
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeChris Mungall
 
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...Michel Dumontier
 
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...guest01a117
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Valery Tkachenko
 
Communities building ontologies: Tensions and Reality
Communities building ontologies: Tensions and RealityCommunities building ontologies: Tensions and Reality
Communities building ontologies: Tensions and Realityrobertstevens65
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...Ken Karapetyan
 

Similar to Great promise of navigating the internet using in chis (20)

How can the international chemical identifier (InChI) be extended to non triv...
How can the international chemical identifier (InChI) be extended to non triv...How can the international chemical identifier (InChI) be extended to non triv...
How can the international chemical identifier (InChI) be extended to non triv...
 
How can the international chemical identifier (InChI) be extended to non …
How can the international chemical identifier (InChI) be extended to non …How can the international chemical identifier (InChI) be extended to non …
How can the international chemical identifier (InChI) be extended to non …
 
The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...
 
The Great Promise of Online Data for Chemistry and the Life Sciences
The Great Promise of Online Data for Chemistry and the Life SciencesThe Great Promise of Online Data for Chemistry and the Life Sciences
The Great Promise of Online Data for Chemistry and the Life Sciences
 
Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
 
Chemical Database Projects Delivered by RSC eScience
Chemical Database Projects Delivered by RSC eScienceChemical Database Projects Delivered by RSC eScience
Chemical Database Projects Delivered by RSC eScience
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of life
 
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
 
Ontology work at the Royal Society of Chemistry
Ontology work at the Royal Society of ChemistryOntology work at the Royal Society of Chemistry
Ontology work at the Royal Society of Chemistry
 
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
Delivering Curated Chemistry to the World via Crowdsourced Deposition and Ann...
 
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
 
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...
We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, a...
 
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...
 
Communities building ontologies: Tensions and Reality
Communities building ontologies: Tensions and RealityCommunities building ontologies: Tensions and Reality
Communities building ontologies: Tensions and Reality
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
Bio4j
Bio4jBio4j
Bio4j
 
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
ChemSpider as a Foundation for Crowdsourcing and Collaborations in Open Chemi...
 
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
 

More from Royal Society of Chemistry

The Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovationThe Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovationRoyal Society of Chemistry
 
Engaging students in publishing on the internet early in their careers
Engaging students in publishing on the internet early in their careersEngaging students in publishing on the internet early in their careers
Engaging students in publishing on the internet early in their careersRoyal Society of Chemistry
 
Navigating scientific resources using wiki based resources
Navigating scientific resources using wiki based resourcesNavigating scientific resources using wiki based resources
Navigating scientific resources using wiki based resourcesRoyal Society of Chemistry
 
Utilizing open source software to facilitate communication of chemistry at rsc
Utilizing open source software to facilitate communication of chemistry at rscUtilizing open source software to facilitate communication of chemistry at rsc
Utilizing open source software to facilitate communication of chemistry at rscRoyal Society of Chemistry
 
Newcastle chemistry admissions talk for MTU Online
Newcastle chemistry admissions talk for MTU OnlineNewcastle chemistry admissions talk for MTU Online
Newcastle chemistry admissions talk for MTU OnlineRoyal Society of Chemistry
 
Linking chemistry: wider lessons for how we publish research
Linking chemistry: wider lessons for how we publish researchLinking chemistry: wider lessons for how we publish research
Linking chemistry: wider lessons for how we publish researchRoyal Society of Chemistry
 

More from Royal Society of Chemistry (16)

The Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovationThe Global Chemistry Network - driving innovation
The Global Chemistry Network - driving innovation
 
20130724 cisrg sugars_batchelor
20130724 cisrg sugars_batchelor20130724 cisrg sugars_batchelor
20130724 cisrg sugars_batchelor
 
20130410 carbohydrates
20130410 carbohydrates20130410 carbohydrates
20130410 carbohydrates
 
Engaging students in publishing on the internet early in their careers
Engaging students in publishing on the internet early in their careersEngaging students in publishing on the internet early in their careers
Engaging students in publishing on the internet early in their careers
 
Navigating scientific resources using wiki based resources
Navigating scientific resources using wiki based resourcesNavigating scientific resources using wiki based resources
Navigating scientific resources using wiki based resources
 
Utilizing open source software to facilitate communication of chemistry at rsc
Utilizing open source software to facilitate communication of chemistry at rscUtilizing open source software to facilitate communication of chemistry at rsc
Utilizing open source software to facilitate communication of chemistry at rsc
 
ChemCareers India Specialist presentation
ChemCareers India Specialist presentation ChemCareers India Specialist presentation
ChemCareers India Specialist presentation
 
Newcastle chemistry admissions talk for MTU Online
Newcastle chemistry admissions talk for MTU OnlineNewcastle chemistry admissions talk for MTU Online
Newcastle chemistry admissions talk for MTU Online
 
ChemNet Careers 2011-12
ChemNet Careers 2011-12ChemNet Careers 2011-12
ChemNet Careers 2011-12
 
Town hall speech
Town hall speechTown hall speech
Town hall speech
 
Chemistry Landscape - Town Hall Speech
Chemistry Landscape - Town Hall SpeechChemistry Landscape - Town Hall Speech
Chemistry Landscape - Town Hall Speech
 
All aboard the Semantic Bandwagon
All aboard the Semantic BandwagonAll aboard the Semantic Bandwagon
All aboard the Semantic Bandwagon
 
Linking chemistry: wider lessons for how we publish research
Linking chemistry: wider lessons for how we publish researchLinking chemistry: wider lessons for how we publish research
Linking chemistry: wider lessons for how we publish research
 
AZ of Chemspider February 2011
AZ of Chemspider February 2011AZ of Chemspider February 2011
AZ of Chemspider February 2011
 
Metabolomics seminarslides 013111final 110201
Metabolomics seminarslides 013111final 110201Metabolomics seminarslides 013111final 110201
Metabolomics seminarslides 013111final 110201
 
Chem spider introduction spring 2011
Chem spider introduction spring 2011Chem spider introduction spring 2011
Chem spider introduction spring 2011
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Great promise of navigating the internet using in chis

  • 1. Great promise of navigating the internet using InChIs Antony J Williams ACS San Diego March 2012
  • 2. Openness and Quality Issues Williams and Ekins, DDT, 16: 747-750 (2011) Science Translational Medicine 2011
  • 3. Warning…  This talk is not about Quality…it’s about quantity
  • 4. Warning…  This talk is not about Quality…it’s about quantity Drugbank was here
  • 5. Data quality is a known issue
  • 6. We ALL have issues!!!
  • 7. It’s about what’s out there…
  • 8. How to Link it…
  • 9. And getting out of overwhelm…
  • 10. So what is Yohimbine?
  • 11. Of course it is out there… Drugbox: 3001/5080 with InChIs Chembox:5436/7690 with InChIs
  • 12. Tell me more…  Where can I find the molfile for Yohimbine?  Papers/Patents about Yohimbine?  What are the side effects of Yohimbine?  Where can I order Yohimbine?  What are the physicochemical properties?  Metabolic pathways?  Different synonyms of Yohimbine?  Synthesis of Yohimbine?  Side effects of Yohimbine?  Etc….
  • 15. How do we build it?  We deal in Molfiles or SDF files – with coordinates  Deposit anything that has an InChI – we support what InChI can handle, good and bad  Standardization based on “InChI standardization”  InChIs aggregate (certain) tautomers  We link out to external sites using their IDs
  • 16. Downsides of InChI  InChI was a moving target (multi versions) but overall worked as planned.  Good for small molecules – but no polymers, issues with inorganics, organometallics, imperfect stereochemistry. ChemSpider is “small molecules”  InChI used as the “deduplicator” – FIRST version of a compound into the database becomes THE structure to deduplicate against…
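The "first version in becomes THE structure" behaviour on slide 16 can be illustrated with a minimal sketch. This is not ChemSpider's actual code: the source names and molfile placeholders are hypothetical, and the only assumption is that each deposit arrives with an InChI string already computed.

```python
# Sketch of first-in-wins deduplication keyed on the InChI string.
# Source names and molfile placeholders below are hypothetical.

ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,1-2H3"
WATER = "InChI=1S/H2O/h1H2"


def deduplicate(deposits):
    """Group deposits by InChI; the first deposit supplies the kept structure."""
    registry = {}
    for source, inchi, molfile in deposits:
        # setdefault stores the record only the first time an InChI is seen,
        # so later deposits never replace the original structure.
        record = registry.setdefault(inchi, {"structure": molfile, "sources": []})
        record["sources"].append(source)
    return registry


deposits = [
    ("vendor_a", ETHANOL, "<molfile from vendor_a>"),
    ("vendor_b", ETHANOL, "<molfile from vendor_b>"),
    ("vendor_b", WATER, "<molfile from vendor_b>"),
]
registry = deduplicate(deposits)
print(registry[ETHANOL]["structure"])
print(registry[ETHANOL]["sources"])
```

The consequence the slide warns about is visible here: if the first deposit carries a poor depiction, every later deposit with the same InChI is merged onto it.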
  • 17. Side Effects of InChI Usage
  • 19. Side Effects of InChI Usage
  • 21. Downsides of Overall Approach  Meshing data together based on InChIs worked for simple molecules  2D layout errors inherited or limited by algorithm  Complex molecules that are meant to be the same thing were NOT deduplicated. Compounds differing by one stereocenter, named the same, meant to be the same, are not the same
  • 23. So where can we travel???
  • 24. So where can we travel???
  • 26. InChI String Search via Google  Give me InChIKeys…
  • 27. And where can we travel???
  • 28.  ChemSpider  BRENDA  Wikipedia  ChEMBL  ChEBI  DrugBank
  • 29.  Aggregator  Enzymes  Encyclopedia  Pharmacology  Curated Chemicals  Drug-Drug Target
  • 30. Recognizing Compound Dilution  So much chemistry on the web….  And so much dilution – “structural uniqueness” versus “accidental ambiguity”  InChI as an easy skeleton search
  • 31. Vancomycin – Search the Internet
  • 32. Vancomycin Search Molecular Search Full Molecule SKELETON
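The transcript does not show how the skeleton search is implemented. One plausible reading, sketched below as an assumption, is to compare only the formula and connectivity (/c) layers of an InChI and ignore the stereo layers. The two alanine enantiomers are used as a small stand-in for vancomycin; the function name `skeleton` is hypothetical.

```python
# Sketch: treat the formula plus the /c connectivity layer as the "skeleton".
# This is an interpretation of the slide, not the method used in the talk.

L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"


def skeleton(inchi):
    """Return (formula, connectivity layer), dropping H, charge and stereo layers."""
    body = inchi.split("=", 1)[1]      # strip the "InChI=" prefix
    parts = body.split("/")            # parts[0] is the version, e.g. "1S"
    formula = parts[1] if len(parts) > 1 else ""
    # Only the connectivity layer starts with "c"; it may be absent.
    connections = next((p for p in parts[2:] if p.startswith("c")), "")
    return formula, connections


print(skeleton(L_ALANINE))
print(L_ALANINE == D_ALANINE)                       # full-molecule match
print(skeleton(L_ALANINE) == skeleton(D_ALANINE))   # skeleton match
```

A full-molecule search separates the two enantiomers, while the skeleton comparison groups them, which is the distinction slide 32 draws between "Full Molecule" and "SKELETON".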
  • 35. Many Problems Can be Solved…  Clean up databases – structure validation, structure standardization  Warn about: valency, charge balance, depiction issues, bond types, absent stereo, and another 100 rules (or so…)  Standardize: agree on community rules to “Standardize”
  • 38. What needs to happen?  If we could validate:  Catch errors in databases (and clean)  Proactively catch errors in publications/patents  Reduce junk in the ether – improve QUALITY!  If we standardized:  Interlinking should improve
  • 42.
      Substructure  | # of Hits | # of Correct Hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry
      Gonane        | 34        | 5                 | 8                  | 21                         | 0
      Gon-4-ene     | 55        | 12                | 3                  | 33                         | 7
      Gon-1,4-diene | 60        | 17                | 10                 | 23                         | 10
  • 43. Structure-Name Validation  [Structure drawings labelled: Choladine, Taxol, Cholane, Chlotrimazole]
  • 44. Standardize  Use the SRS as a guidance document for standardization  Adjust as necessary to our needs
  • 46. Salt and Ionic Bonds
  • 48. Millions of structures? Lots of Issues
  • 49. ChemSpider Standardization  Entire ChemSpider database will be standardized using modified FDA rule set  Original Molfiles will be standardized and all properties (predicted properties, SMILES, InChIs, Names) will all be regenerated  Standardization procedures automatically applied to all future depositions
  • 50. Identifier Dictionaries  Reciprocal curation processes…share curation with each other.  If a database already has a compound, use InChIKeys to match “suggested” validation against the compound.  A series of “added” and “removed” synonyms against InChIKeys for matching.
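The series of "added" and "removed" synonyms on slide 50 could be exchanged as simple events keyed by InChIKey. The sketch below assumes a hypothetical event shape of (InChIKey, action, synonym); the slides do not define a format, and the function name `apply_curation` is illustrative only.

```python
# Sketch: apply shared curation events to a synonym dictionary keyed by InChIKey.
# The (inchikey, action, synonym) event shape is an assumption for illustration.
from collections import defaultdict

ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"


def apply_curation(dictionary, events):
    """Add or remove synonyms per InChIKey; reject unknown actions."""
    for inchikey, action, synonym in events:
        if action == "added":
            dictionary[inchikey].add(synonym)
        elif action == "removed":
            dictionary[inchikey].discard(synonym)  # no error if already absent
        else:
            raise ValueError(f"unknown curation action: {action!r}")
    return dictionary


# A local dictionary that wrongly lists "methanol" as a synonym of ethanol.
synonyms = defaultdict(set)
synonyms[ETHANOL_KEY].update({"ethanol", "methanol"})

# Events received from a partner database's curators.
events = [
    (ETHANOL_KEY, "removed", "methanol"),
    (ETHANOL_KEY, "added", "ethyl alcohol"),
]
apply_curation(synonyms, events)
print(sorted(synonyms[ETHANOL_KEY]))
```

Keying events on the InChIKey rather than on a database-specific ID is what lets two databases share curation without first agreeing on identifiers.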
  • 51. Proof of Concept Data Curation Sharing Who wants to work with us?
  • 52. Structure Validation using feed  Look for approved synonyms  Compare feed InChIKey with database InChIKey  If different, flag for inspection
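The three steps on slide 52 map onto a short loop. The sketch below is an assumption about data shapes, not the production pipeline: the approved-synonym table maps a lower-cased name to the database InChIKey, and the feed is a list of (name, InChIKey) pairs. The feed rows are hypothetical.

```python
# Sketch of feed validation: look up approved synonyms, compare InChIKeys,
# flag mismatches for inspection. Data shapes and feed rows are assumptions.

approved_synonyms = {
    "ethanol": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    "l-alanine": "QNAYBMKLOCPYGJ-REOHZMGPSA-N",
}


def validate_feed(feed, approved):
    """Return (name, feed_key, database_key) for every mismatching record."""
    flagged = []
    for name, feed_key in feed:
        db_key = approved.get(name.strip().lower())
        if db_key is None:
            continue  # name is not an approved synonym; nothing to compare
        if db_key != feed_key:
            flagged.append((name, feed_key, db_key))
    return flagged


feed = [
    ("Ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    # Named L-alanine, but the key is that of the D enantiomer.
    ("L-Alanine", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("unlisted compound", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
]
for name, feed_key, db_key in validate_feed(feed, approved_synonyms):
    print(f"inspect {name}: feed {feed_key} != database {db_key}")
```

The flagged record is the case slide 21 describes: two entries with the same name that differ by one stereocenter and therefore carry different InChIKeys.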
  • 53. It is so difficult to navigate…  What’s the structure?  Are they in our file?  What’s similar?  What’s the target?  Pharmacology data?  Known Pathways?  Working On Now?  Connections to disease?  Expressed in right cell type?  Competitors?  IP?
  • 54. Open PHACTS Project  Develop a set of robust standards…  Implement the standards in a semantic integration hub  Deliver services to support drug discovery programs in pharma and public domain  22 partners, 8 pharmaceutical companies, 3 biotechs  36-month project  Guiding principle is open access, open usage, open source – key to standards adoption
  • 56. Chemistry in Open PHACTS  Selected data slices of ChemSpider carrying pharmacological links into the “linked data cache”  ChemSpiderIDs and InChIs/InChIKeys will be in Open PHACTS and available for linking  A structure ID standard to enable further linking across the semantic web of science
  • 57. ChemSpider and InChI  Internet Data, Commercial Software, Pre-competitive Data, Open Science, Open Data, Publishers, Educators, Open Databases, Chemical Vendors  Small organic molecules, Undefined materials, Organometallics, Nanomaterials, Polymers, Minerals, Particle bound, Links to Biologicals
  • 58. The great promise should be obvious  InChIs are here to stay  They will evolve, they will encompass, we will adopt and adapt  Public and private databases will federate & build a linked environment of validated data!  Data validation and standardization is needed  Open Data will continue to proliferate  InChIs are in the “Semantic Web” already
  • 59. If InChI never existed or went away..  ChemSpider would never have been built  Database linking would suffer dramatically  The web would not be “structure searchable”  Cheminformatics tools would likely not be linking to public domain databases in the same way  And we would not have the pleasure of today…
  • 60. Acknowledgments  The inspiration of the InChI Masters – Steve H., Steve S., Alan, Dmitrii, Igor  IUPAC, NIST, all adopters, supporters, challengers and users  The InChI Trust and its supporters for funding continued development  Al Gore – enabling us to search InChIs on the web
  • 63. Thank you Email: williamsa@rsc.org Twitter: ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams