ChemSpider as a Chemical Term Resolver

Antony Williams and Valery Tkachenko

ACS San Diego, March 2012
The Web of Chemistry – VERY BIG!
Online Databases are “Linking”
It is so difficult to navigate…
• IP?
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Competitors?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
Open PHACTS Project
• Develop a set of robust standards…
• Implement the standards in a semantic integration hub
• Deliver services to support drug discovery programs in pharma and the public domain
• 22 partners, 8 pharmaceutical companies, 3 biotechs
• 36-month project

Guiding principle is open access, open usage, open source
- Key to standards adoption -
What is the Structure of Vitamin K?
MeSH
• A lipid cofactor that is required for normal blood clotting.
• Several forms of vitamin K have been identified:
   • VITAMIN K 1 (phytomenadione), derived from plants
   • VITAMIN K 2 (menaquinone), from bacteria
   • synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione)
What is the Structure of Vitamin K1?
Create an Online “Resolver” as a path to chemistry
• Search all forms of structure IDs
   • Systematic name(s)
   • Trivial name(s)
   • SMILES
   • InChI strings
   • InChIKeys
   • Database IDs
   • Registry numbers
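A resolver that accepts "all forms of structure IDs" first has to guess which kind of identifier it was handed. The following is a minimal sketch of that routing step, not ChemSpider's implementation: the function names and category labels are assumptions made for illustration. It recognises InChI strings by their prefix, InChIKeys by their fixed 14-10-1 letter layout, and CAS Registry Numbers by their check digit; anything else is left as "name or SMILES" for a downstream parser to decide.

```python
import re

# InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(text: str) -> bool:
    """Return True if text looks like a CAS RN and its check digit is valid."""
    match = CAS_RE.match(text)
    if not match:
        return False
    # Weight the digits 1, 2, 3, ... from right to left (check digit excluded).
    digits = (match.group(1) + match.group(2))[::-1]
    total = sum(int(d) * weight for weight, d in enumerate(digits, start=1))
    return total % 10 == int(match.group(3))


def classify_identifier(query: str) -> str:
    """Guess the identifier type (labels are illustrative, not a standard)."""
    q = query.strip()
    if q.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(q):
        return "inchikey"
    if cas_checksum_ok(q):
        return "registry_number"
    # Names and SMILES cannot be told apart reliably by pattern alone.
    return "name_or_smiles"


if __name__ == "__main__":
    for example in ["84-80-0", "InChI=1S/CH4/h1H4", "phytomenadione"]:
        print(example, "->", classify_identifier(example))
```

Database IDs are deliberately left out of the sketch because their formats are specific to each source database.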
ChemSpider
Available Information…
• Linked to vendors, safety data, toxicity, metabolism
Available Information…
Vitamin K1 Names
Vitamin K1 on ChemSpider CORRECT
Resolving Names for QUALITY
• Searching chemical identifiers should resolve to the correct chemical as often as possible
Validated Name-Structure Dictionaries

• Chemical name dictionaries are used for:
   • Text-mining (publications, patents)
      • Used to index PubMed and link to Google Patents
   • Linking to other databases – think Biology!
      • When structures are not available, drug names provide the link
   • Searching the web
      • Names link to structures, which link to InChIs
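The dictionary uses listed above can be sketched as a normalised name lookup plus a simple text tagger. This is an illustration only: the record IDs (`record-K1`, `record-K3`) and function names are hypothetical, and a production dictionary would hold validated structures rather than placeholder strings. The synonyms are the vitamin K names from the MeSH slide plus phylloquinone, a common synonym for vitamin K1.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical dictionary: every synonym of a compound points at one record ID.
NAME_TO_RECORD = {
    "vitamin k1": "record-K1",
    "phytomenadione": "record-K1",
    "phylloquinone": "record-K1",
    "vitamin k3": "record-K3",
    "menadione": "record-K3",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so lookups ignore trivial variation."""
    return " ".join(text.lower().split())


def lookup(name: str) -> Optional[str]:
    """Resolve a single name to its record ID, or None if it is unknown."""
    return NAME_TO_RECORD.get(normalize(name))


def tag_text(text: str) -> List[Tuple[str, str]]:
    """Find dictionary names in free text, in order of appearance."""
    lowered = normalize(text)
    found = []
    for name, record in NAME_TO_RECORD.items():
        # Require non-word characters on both sides so that "menadione"
        # is not matched inside "phytomenadione".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            found.append((match.start(), name, record))
    return [(name, record) for _, name, record in sorted(found)]


if __name__ == "__main__":
    print(lookup("Vitamin  K1"))
    print(tag_text("Phytomenadione and menadione are both forms of vitamin K."))
```

The word-boundary check matters for chemistry text because many names are substrings of other names.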
I want to know about “Vincristine”
Vincristine: Identifiers
Vincristine: Patents
Linked by Name
Many Names, One Structure
Top 200 Drugs on Wikipedia
http://en.wikipedia.org/wiki/List_of_bestselling_drugs
The Project Challenge PART ONE
• Agree on the set of chemical names to work with
• Independently create an SDF file in each “lab”
• Compare differences and agree on final structures
• Issue “Gold Standard” SDF file to team
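The "compare differences" step can be sketched as follows, assuming each lab's SDF file has already been reduced to a mapping from drug name to a structure key such as an InChIKey (that reduction needs a cheminformatics toolkit and is not shown). The lab names, drug names, and keys below are placeholders, and the function name is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict


def find_disagreements(lab_results: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return the names on which the labs do not all report the same key.

    lab_results maps lab name -> {drug name -> structure key}.
    A name is flagged if keys differ or if any lab is missing it.
    """
    by_name: Dict[str, Dict[str, str]] = defaultdict(dict)
    for lab, records in lab_results.items():
        for name, key in records.items():
            by_name[name][lab] = key

    disagreements = {}
    for name, per_lab in by_name.items():
        missing_labs = set(lab_results) - set(per_lab)
        if len(set(per_lab.values())) > 1 or missing_labs:
            disagreements[name] = dict(per_lab)
    return disagreements


if __name__ == "__main__":
    # Placeholder data: three labs, two drug names, made-up keys.
    labs = {
        "lab1": {"drug-a": "KEY-1", "drug-b": "KEY-2"},
        "lab2": {"drug-a": "KEY-1", "drug-b": "KEY-9"},
        "lab3": {"drug-a": "KEY-1"},
    }
    print(find_disagreements(labs))
```

Only the flagged names then need manual review before the final structures are agreed.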
RSC Process
Relative accuracy of groups against final master list
The Project Challenge PART TWO
• Use Gold Standard SDF File to investigate data quality on these compounds in Internet Databases
• Two checks
   • Search chemical name – does it return the correct compound? If not correct, how is it different?
   • Search “structure” – SMILES, Molfile, InChIString or InChIKey
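The name check above can be sketched by comparing the InChIKey a database returns with the gold-standard InChIKey. The sketch relies on the InChIKey layout: the first 14-letter block hashes the connectivity, while the second block covers the remaining layers (stereochemistry, isotopes, and similar) and the last letter is a protonation flag. The function names, category labels, and keys below are assumptions and placeholders, not results from the study.

```python
from collections import Counter
from typing import Callable, Dict, Optional


def classify_result(found: Optional[str], gold: str) -> str:
    """Describe how a returned InChIKey differs from the gold-standard key."""
    if found is None:
        return "not found"
    if found == gold:
        return "correct"
    found_blocks, gold_blocks = found.split("-"), gold.split("-")
    if found_blocks[0] != gold_blocks[0]:
        # First block differs: a different molecular skeleton altogether.
        return "different connectivity"
    if found_blocks[1] != gold_blocks[1]:
        # Same skeleton; stereochemistry or another layer differs.
        return "same skeleton, other layers differ"
    return "protonation flag differs"


def score_names(
    resolve: Callable[[str], Optional[str]],
    gold_standard: Dict[str, str],
) -> Counter:
    """Run every gold-standard name through a lookup and tally the outcomes."""
    return Counter(
        classify_result(resolve(name), key) for name, key in gold_standard.items()
    )


if __name__ == "__main__":
    # Placeholder keys that only mimic the InChIKey layout.
    gold = {
        "drug-a": "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
        "drug-b": "CCCCCCCCCCCCCC-DDDDDDDDSA-N",
        "drug-c": "EEEEEEEEEEEEEE-FFFFFFFFSA-N",
    }
    database = {
        "drug-a": "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
        "drug-b": "CCCCCCCCCCCCCC-XXXXXXXXSA-N",
    }
    print(score_names(database.get, gold))
```

Separating "different connectivity" from "other layers differ" is useful because missing or wrong stereochemistry is a different kind of data-quality problem from returning the wrong compound.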
“The First 10”
Performance on 150 Drug Names
NPC Browser Set
Standardize
• Use the SRS as a guidance document for standardization
• Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
One dictionary lookup is never enough…
• ChemSpider does not contain all chemistry
• We are not the only ones curating data
• New chemistry expands daily and goes online
One dictionary lookup is never enough…
• Federation is key…
   • Check ChemSpider first, if not found then
   • Check PubChem
   • Check NCI resolver
   • Check ChEBI
   • Check … the “network” of open interfaces
• Each resolver will have its own “quantitative confidence”.
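The fallback order above can be sketched as a chain of lookups tried in priority order, each carrying its own confidence score. This is a sketch under stated assumptions: the `Resolver` and `Hit` types, the in-memory stand-ins for the real services, and the confidence values are all hypothetical. Real resolvers would call the respective web services.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Resolver(NamedTuple):
    name: str
    lookup: Callable[[str], Optional[str]]  # returns a structure, or None
    confidence: float                       # placeholder "quantitative confidence"


class Hit(NamedTuple):
    structure: str
    source: str
    confidence: float


def resolve_federated(query: str, resolvers: Sequence[Resolver]) -> Optional[Hit]:
    """Try each resolver in order and return the first hit with its source."""
    for resolver in resolvers:
        try:
            result = resolver.lookup(query)
        except Exception:
            # A failing service should not break the chain; move on to the next.
            continue
        if result is not None:
            return Hit(result, resolver.name, resolver.confidence)
    return None


if __name__ == "__main__":
    # In-memory stand-ins for the remote services; values are made up.
    chemspider = {"compound-a": "STRUCTURE-A"}
    pubchem = {"compound-a": "STRUCTURE-A", "compound-b": "STRUCTURE-B"}
    chain = [
        Resolver("ChemSpider", chemspider.get, 0.9),
        Resolver("PubChem", pubchem.get, 0.7),
    ]
    print(resolve_federated("compound-b", chain))
```

Returning the source alongside the structure lets a caller weigh the answer by the confidence attached to that resolver.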
Chemical Identifier Resolver (CIR)
• Converts a given structure identifier into another representation or structure identifier.
• Resolves names, identifiers, etc.

http://cactus.nci.nih.gov/chemical/structure
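The CIR service is addressed by URL: the identifier and the requested representation are appended to the base address above as `/identifier/representation`. The following is a minimal sketch of a client; the function names are assumptions, `smiles` and `stdinchikey` are representation names the service has documented, and availability or response format may have changed since this talk.

```python
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a CIR request URL, percent-encoding the identifier."""
    # safe="" also encodes "/" and "#", which occur in SMILES strings.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


def cir_resolve(identifier: str, representation: str = "smiles",
                timeout: float = 10.0) -> str:
    """Fetch one representation from CIR (network call; may raise on failure)."""
    with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Printing the URL needs no network access.
    print(cir_url("vitamin K1", "stdinchikey"))
```

Keeping URL construction separate from the network call makes the encoding logic easy to check offline.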
What can become a resolver?
We are building…
• A central federated resolver utilizing available services
• Dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN)
• “Consensus” decisions and guidance BUT
• Chemicals have timelines!
ORIGINAL   FINAL
Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
