How an Online Chemistry Resource Could Change Our World. Antony Williams, Triangle Chromatography Discussion Group, Raleigh, NC, May 2009
Imagine a time when… The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar). There is an online database of NMR, IR and MS spectra and chromatography methods, built by and available to the community. Chemistry articles are indexed and searchable by “chemistry”. The web is linked together through the “language of chemistry”. Publicly funded research data can be shared and discussed in the open, maybe as Open Notebook Science. Cheminformatics has as much of a public face and success as bioinformatics (Protein Data Bank, GenBank, etc.).
The Language of Chemistry My language….
And its dialects….
As a chemist… I look for information about chemicals/chemistry: What is a particular structure? What alternative names/identifiers? Reaction synthesis? Physical properties? Analytical data? Purchase? Tell me more? Similar stuff – what other compounds are “like” mine?
Linked Data Cloud
Chemistry on the Internet. Much of the information online is “User Beware!” The quality of information is “diverse”. Technologies can “link and connect” information, but validation and curation are key to providing quality. The Linked Data web is of less value when the data linked are “wrong”.
“Good Stuff” TotallySynthetic.com
PubChem
Questions a chemist might ask… What is the melting point of n-butanol? What is the chemical structure of Xanax? Chemically, what is phenolphthalein? What are the stereocenters of cholesterol? Where can I find publications about xylene? What are the different trade names for Ketoconazole? What is the NMR spectrum of Aspirin? What are the safety handling issues for Thymol Blue?
Search Cholesterol
Search Cholesterol
Search Cholesterol
Search Cholesterol
Search Cholesterol
Link outs
Complex Data and Information
Online Analytical Data
Various Searches: Structure searching. Substructure searching. Subset searching – choose from 200 data sources. Property searching. Value for Mass Spectrometrists and Chromatographers?
ChemSpider for Mass Spectrometrists. What would a mass spectrometrist want to do? Search the database based on mass (various forms). Search selected subsets of the database based on mass. Search based on mass and substructure(s). Search for a structure based on name(s) or database IDs. Search for structures based on elements/not elements. Download the structure/structures in standard format. Search literature for information. Identify related data sources – chemical vendors, pathway databases, etc.
Search Database Based on Mass
Mass Based Searches? What compounds have a mass of 300 ± 0.001?
59 hits/1.3 seconds from 21.5 MILLION
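The mass-window query on the previous slide can be illustrated with a small local sketch. This is not ChemSpider's implementation, which the slides do not describe; it only assumes that masses are kept in a sorted index so that a "mass ± tolerance" query becomes two binary searches. The three-compound table and the function names `monoisotopic_mass` and `search_by_mass` are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Masses (in u) of the most abundant isotope of each element used below.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(composition):
    """Sum isotope masses for a composition such as {"C": 9, "H": 8, "O": 4}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# Hypothetical three-record "database"; a real one would hold millions of rows.
RECORDS = [
    ("caffeine", {"C": 8, "H": 10, "N": 4, "O": 2}),
    ("aspirin", {"C": 9, "H": 8, "O": 4}),
    ("cholesterol", {"C": 27, "H": 46, "O": 1}),
]

# Build the index once: (mass, name) pairs sorted by mass.
INDEX = sorted((monoisotopic_mass(comp), name) for name, comp in RECORDS)
MASSES = [mass for mass, _ in INDEX]


def search_by_mass(target, tolerance):
    """Return names whose mass lies in [target - tolerance, target + tolerance]."""
    low = bisect_left(MASSES, target - tolerance)
    high = bisect_right(MASSES, target + tolerance)
    return [name for _, name in INDEX[low:high]]
```

With a sorted index, each query costs two binary searches plus the size of the hit list, which is one plausible reason a narrow window over millions of records can return quickly.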
Substructure and Property
 
Elemental Constraints
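An "elements / not elements" filter can be sketched as a check on the set of element symbols in a molecular formula. This is an illustrative sketch only, not the search the slide demonstrates; `parse_formula` and `matches_elements` are hypothetical helper names, and the parser assumes flat formulas with no brackets, charges or isotope labels.

```python
import re

# One element symbol (capital letter plus optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn a flat formula such as 'C9H8O4' into {'C': 9, 'H': 8, 'O': 4}."""
    counts = {}
    for symbol, digits in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def matches_elements(formula, required=(), excluded=()):
    """True if every required element is present and no excluded element is."""
    present = set(parse_formula(formula))
    return set(required) <= present and present.isdisjoint(excluded)
```

For example, `matches_elements("C17H13ClN4", required=["Cl", "N"], excluded=["S"])` accepts the formula of alprazolam (Xanax), while any formula containing sulfur would be rejected.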
Search based on Data Sources
Outlinks – to vendors and other databases. Example databases of interest to mass spectrometrists: HMDB – Human Metabolome Database; KEGG – Kyoto Encyclopedia of Genes and Genomes; BioCyc – collection of Pathway/Genome Databases; University of Minnesota Biodegradation DB – information on microbial biocatalytic reactions and biodegradation pathways for primarily xenobiotic chemical compounds; WikiPathways – new initiative to build crowdsourced pathway data management
Links out to KEGG Kyoto Encyclopedia of Genes and Genomes
WikiPathways Link
Download Structure(s) Download individual record – molfile Download SDF file (group of structures)
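For readers handling a downloaded SDF file, the following minimal sketch shows how a group of structures can be separated back into individual records. It relies only on the general layout of the format (each record is a molfile, optionally followed by `> <TAG>` data items, and ends with a `$$$$` line). The sample text is a placeholder with empty atom blocks, not real ChemSpider output, and the function names are illustrative.

```python
def split_sdf(text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def record_title(record):
    """The first line of a molfile header block holds the molecule name."""
    return record.split("\n", 1)[0].strip()


def data_fields(record):
    """Collect '> <TAG>' data items; each value runs until the next blank line."""
    fields, tag = {}, None
    for line in record.split("\n"):
        if line.startswith(">") and "<" in line:
            start = line.index("<")
            tag = line[start + 1 : line.index(">", start)]
            fields[tag] = []
        elif tag is not None:
            if line.strip():
                fields[tag].append(line)
            else:
                tag = None
    return {name: "\n".join(values) for name, values in fields.items()}


# Placeholder SDF text: two records with empty atom blocks, for illustration only.
SAMPLE_SDF = (
    "water\n"
    "  placeholder program line\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <SOURCE>\n"
    "demo\n"
    "\n"
    "$$$$\n"
    "methane\n"
    "  placeholder program line\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "$$$$\n"
)
```

Calling `split_sdf(SAMPLE_SDF)` yields two records whose titles are `water` and `methane`. A cheminformatics toolkit would normally be used to interpret the atom and bond blocks themselves.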
Web Service Integration. ChemSpider is presently integrated with Bruker, Waters and Thermo – more vendors coming… Direct integration to vendor data processing tools
MassSpec API Web Services http://www.chemspider.com/MassSpecAPI.asmx
Web Services
Test Web Services for MassSpec http://www.chemspider.com/WebServices/WSMassSpecAPIDemo.aspx
Test results
Waters Integration
Waters Integration
Outlinks from Table
For Chromatographers? “Structure-based methods” being linked. Structure-centric searching of methods. We can host chromatograms for display. LogPs and LogDs (pH 5.5 and 7.4) calculated for >21 million compounds using ACD/Labs software. We’d love to host collections from the column vendors! [email_address]
From 21.5 MILLION molecules… Data are gathered/deposited from >200 data sources: government databases, chemical vendors, Wikipedia. There are “imperfections” in all online data sources. How bad can it get?
What is “wrong”?
Quality is a Major Issue – Search Butanol (OLD EXAMPLE, now fixed)
Vancomycin. Who will curate? PubChem is not resourced to clean these errors. How would you clean such a large dataset?
Wikipedia, C&E News, PubChem C&E News (from ACS)
 
Does one stereocenter matter? Thalidomide
Question Everything www.dhmo.org
DailyMed: “DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”
The FDA’s DailyMed
Structures on DailyMed: Poor Representations
Incorrect Structures: Scanning (?) Issues
Incorrect Structures
Wikis for Science. Who in the room hasn’t used Wikipedia? Is it trustworthy? What are the advantages and disadvantages of the Wiki environment? How suitable is it for chemistry?
Collaborative Knowledge Management for Chemists
Wikipedia Curation. Looking for self-consistency across a Wikipedia page. The primary key is the article TITLE. The chemical shown needs to match the title. Cyclic self-consistency – and decisions must be made
Taxol on PubChem
When are things “wrong”? Structures have a timeline…..
 
 
 
Creating a trusted source… Small databases can be curated by the hosts – EPA’s DSSTox, Wikipedia, etc. Who will curate an enormous database?
Crowdsourcing
Curating ChemSpider. Anyone can “Post Comments” associated with a structure. To curate data we require a login so that changes can be tracked
Multi-level Curation and Approval
ChemMantis: Chemical Markup And Nomenclature Transformation Integrated System
On the fly conversion
Nature Publications
Integrations Out to Other Sources
Reactions
ChemSpider Everywhere RSC Compounds
ChemSpider Everywhere: Nature Chemistry. Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Those compounds are linked out to other information resources including PubChem and ChemSpider.
ChemSpider Everywhere ChemMobi
 
It Happened in a Basement!! Homebuilt servers. Cable internet. Software donations. Lots of hard work. >8000 users per day. >80,000 transactions per day
And now… The Royal Society of Chemistry announced on May 11th that it has acquired ChemSpider, heralding a breakthrough investment for the organisation and for the Chemistry Community. This acquisition reflects RSC's commitment to providing access to rich resources of chemistry data and information.

Editor's Notes

  • #21 This is a list of some of the things an MS scientist might want to do and some of the queries we have already experienced