ChemSpider – Hosting, Linking and Curating Chemistry Data for the Community
Valery Tkachenko, SLA Meeting, June 2011
Chemistry on the Internet
• Hundreds of websites host chemistry-related data
• Chemistry information is generally “compound-based”:
  • Chemical “structures”
  • Identifiers, names and synonyms
  • Properties
  • Analytical data
  • How to synthesize
  • Articles, patents, safety information
• Chemistry “language and dialects”
Dialects describing chemicals
A Pragmatic Vision: “Build a Structure-Centric Community”
• Integrate chemistry across the internet based on “chemical structure”
• A “structure-based hub” to information and data
• Let chemists contribute their own data
• Allow the community to curate and annotate data
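A minimal sketch of the “structure-based hub” idea, assuming every incoming record already carries a canonical structure key (the `KEY-` strings below are hypothetical stand-ins for a real identifier such as an InChIKey). Records from different sources are grouped under one key, so a single structure links to all of its data. The `Record` type, the source names and `build_hub` are illustrative only, not ChemSpider's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One data item from an external source (hypothetical schema)."""
    source: str         # e.g. a vendor or property database
    structure_key: str  # canonical structure identifier, assumed precomputed
    payload: str        # the linked information itself


def build_hub(records):
    """Group records from many sources under their shared structure key."""
    hub = defaultdict(list)
    for rec in records:
        hub[rec.structure_key].append(rec)
    return dict(hub)


records = [
    Record("vendor-A", "KEY-ASPIRIN", "catalog entry"),
    Record("safety-db", "KEY-ASPIRIN", "safety data sheet"),
    Record("property-db", "KEY-TOLUENE", "boiling point"),
]
hub = build_hub(records)
print(sorted(r.source for r in hub["KEY-ASPIRIN"]))  # ['safety-db', 'vendor-A']
```

Grouping by a precomputed key keeps the merge a plain dictionary operation; the hard part in practice, deriving a correct structure key for each record, is assumed away here.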
www.chemspider.com
Answering Questions for Chemists
Questions a chemist might ask…
• What is the melting point of n-heptanol?
• What is the chemical structure of Xanax?
• Chemically, what is phenolphthalein?
• What are the stereocenters of cholesterol?
• Where can I find publications about xylene?
• What are the different trade names for aspirin?
• What is the NMR spectrum of benzoic acid?
• What are the safety handling issues for toluene?
Search for a Chemical…by name
Available Information…
• Linked to chemical vendors, safety data, toxicity, metabolism…
Available Information…
ChemSpider Today
• Over 26 million unique chemicals
• Over 420 data sources
• Grows daily through community and RSC depositions
• Community annotation and curation
• We curate, edit, change and enhance data daily
Three Years of Experience
• Internet-based chemistry is a mess!
• Public compound databases are contaminated
• The annotation/curation of data online is difficult
• Most database hosts are non-responsive to feedback: “We are a host/repository of data”
• Who cares? We all should!!!
Linked Data on the Web
Where is chemistry online?
• Encyclopedic articles (Wikipedia)
• Chemical vendor databases
• Metabolic pathway databases
• Property databases
• Patents with chemical structures
• Drug discovery data
• Scientific publications
• Compound aggregators
• Blogs/wikis and Open Notebook Science
What is the Structure of Vitamin K1?
What is the Structure of Vitamin K1?
Chemical Abstracts “Common Chemistry” Database
Wikipedia
 
 
Internet-Based Chemistry is a Mess
• Algorithms can only get you so far
• Human curation is necessary
• Only the crowds can help with big data…
• ChemSpider contains over 26 million compounds
• Imagine if we worked together to create a centralized, validated structure-name dictionary!
• It would enhance text-mining, searching, linking…
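One way to picture a validated structure-name dictionary is a map from normalized names to structure keys, where only validated name-structure pairs are used for lookup and text-mining. This is a hypothetical sketch: `NameDictionary`, the normalization rule (case-folding plus whitespace collapsing) and the whole-word regex matching are assumptions chosen for brevity, and the structure keys are placeholders.

```python
import re


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


class NameDictionary:
    """Maps chemical names/synonyms to a structure key (hypothetical design)."""

    def __init__(self):
        self._names = {}  # normalized name -> (structure_key, validated flag)

    def add(self, name, structure_key, validated=False):
        self._names[normalize(name)] = (structure_key, validated)

    def lookup(self, name):
        """Return the structure key only for a validated name-structure pair."""
        entry = self._names.get(normalize(name))
        if entry and entry[1]:
            return entry[0]
        return None

    def find_in_text(self, text):
        """Naive text-mining: validated names that appear as whole words."""
        lowered = normalize(text)
        hits = []
        for name, (key, validated) in self._names.items():
            if validated and re.search(r"\b" + re.escape(name) + r"\b", lowered):
                hits.append((name, key))
        return sorted(hits)


names = NameDictionary()
names.add("Aspirin", "KEY-ASPIRIN", validated=True)
names.add("acetylsalicylic acid", "KEY-ASPIRIN", validated=True)
names.add("asprin", "KEY-ASPIRIN")  # misspelling, left unvalidated

print(names.lookup("ASPIRIN"))  # KEY-ASPIRIN
print(names.lookup("asprin"))   # None
print(names.find_in_text("Acetylsalicylic  acid is sold as aspirin."))
```

Keeping unvalidated names in the dictionary but excluding them from lookups means crowd or robotic validation can promote an entry later without re-entering it.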
Search “Vitamin H”
Search “Vitamin H”
“Curate” Identifiers
“Curate” Identifiers
“Curate” Identifiers
Crowd-sourcing Chemistry Curation
• Crowd-sourced curation: identify/tag errors, edit names and synonyms, identify records to deprecate
“Curate” Identifiers
• General curation activities:
  • Remove incorrect names
  • Correct spellings
  • Add multilingual names
  • Add alternative names
• In 3 years, over 1 million structure-identifier relationships have been validated, both robotically and manually
• 130 people have participated in validation or annotation
• “Crowds” can be quite small!
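The curation activities above can be modelled, under stated assumptions, as status changes on individual structure-identifier relationships, each carrying an audit trail of who acted (a `"robot"` entry stands in for automated validation). `NameLink`, `Status` and `curation_summary` are hypothetical names for illustration; counting validated links and distinct human curators mirrors the two kinds of figures quoted on the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNVALIDATED = "unvalidated"
    VALIDATED = "validated"
    REJECTED = "rejected"  # an incorrect name, kept for the audit trail


@dataclass
class NameLink:
    """One structure-identifier relationship (hypothetical schema)."""
    structure_key: str
    name: str
    language: str = "en"
    status: Status = Status.UNVALIDATED
    history: list = field(default_factory=list)  # (curator, action) pairs

    def validate(self, curator):
        self.status = Status.VALIDATED
        self.history.append((curator, "validate"))

    def reject(self, curator):
        self.status = Status.REJECTED
        self.history.append((curator, "reject"))


def curation_summary(links):
    """Count validated links and distinct human curators ("robot" excluded)."""
    validated = sum(1 for link in links if link.status is Status.VALIDATED)
    people = {c for link in links for c, _ in link.history if c != "robot"}
    return validated, len(people)


links = [
    NameLink("KEY-ASPIRIN", "aspirin"),
    NameLink("KEY-ASPIRIN", "asprin"),  # misspelling to be removed
    NameLink("KEY-ASPIRIN", "acide acétylsalicylique", language="fr"),
]
links[0].validate("robot")  # automated check
links[1].reject("alice")    # hypothetical human curator
links[2].validate("bob")    # hypothetical human curator
print(curation_summary(links))  # (2, 2)
```

Rejecting rather than deleting an incorrect name keeps the decision visible, so the same error is not re-imported from another source later.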
Vancomycin – Curate This!!!
Vancomycin on ChemSpider
• 1 compound – 3 days
Crowdsourced “Annotations”
• Users can add:
  • Descriptions/syntheses/commentaries
  • Links to articles
  • Spectral data
  • Photos
  • MP3 files
  • Videos
Multimedia Content Holder
Gaming for Validation of Spectra
Crowdsourced Validation of Spectra
“Game-based” Validation of Data
ChemSpider SyntheticPages
Sharing Our Activities
• Presently defining approaches with other public compound databases to share the results of curation activities
• Member of a large European project to link data from the Life Sciences
• Sharing the results of curation is essential
• Making curation and contribution interfaces mobile
Thank you
• Email: williamsa@rsc.org
• Twitter: ChemConnector
• Blog: www.chemspider.com/blog
• Personal blog: www.chemconnector.com
• Slides: www.slideshare.net/AntonyWilliams
