ChemSpider – Hosting, Linking and Curating Chemistry Data for the Community

The Internet is the world’s publicly accessible container for a myriad of resources containing chemistry-related data. Whether it be collections of millions of chemical compounds with their associated properties, interactive displays for analytical data, access to publications and patents, or tapping into the increasing availability of online computational engines, the web has become the primary enabling technology for sourcing information and data. Scientists collectively applaud and use such resources, and an increasing proportion of the community is willing to support them by contributing both their data and their skills to help curate and validate information on the web. This “crowdsourcing” has started to contribute large amounts of data to the commons and serves as a valuable platform for reference and, potentially, discovery.
ChemSpider is one of the chemistry community’s primary online resources. It allows scientists to search across 25 million unique chemical compounds linked out to over 400 original data sources, and it has become a central hub for searching for chemistry-related data. However, the platform offers much more to the community: it has become a central repository for analytical data (specifically spectra), hosts community-authored chemical syntheses, and facilitates data curation and annotation by any of its users. This presentation will provide an overview of the ChemSpider platform in terms of available data and its efforts to act as a public repository and clearing ground for data curation. We will discuss how such a platform, when coupled with game-based approaches, facilitates both teaching and data validation, and whether public-domain resources such as ChemSpider will ultimately become authorities for chemistry.

Transcript

  • 1. ChemSpider – Hosting, Linking and Curating Chemistry Data for the Community. Valery Tkachenko, SLA Meeting, June 2011
  • 2. Chemistry on the Internet
    • 100s of websites hosting chemistry-related data
    • Chemistry information is generally “compound-based”
      • Chemical “structures”
      • Identifiers, names and synonyms
      • Properties
      • Analytical data
      • How to synthesize
      • Articles, patents, safety information
    • Chemistry “language and dialects”
  • 3. Dialects describing chemicals
  • 4. A Pragmatic Vision
      • “Build a Structure-Centric Community”
    • Integrate chemistry across the internet based on “chemical structure”
      • A “structure-based hub” to information and data
      • Let chemists contribute their own data
      • Allow the community to curate & annotate data
  • 5. www.chemspider.com
  • 6. Answering Questions for Chemists
    • Questions a chemist might ask…
      • What is the melting point of n-heptanol?
      • What is the chemical structure of Xanax?
      • Chemically, what is phenolphthalein?
      • What are the stereocenters of cholesterol?
      • Where can I find publications about xylene?
      • What are the different trade names for Aspirin?
      • What is the NMR spectrum of Benzoic Acid?
      • What are the safety handling issues for toluene?
  • 7. Search for a Chemical by Name
  • 8. Available Information…
    • Linked to chemical vendors, safety data, toxicity, metabolism…
  • 9. Available Information…
  • 10. ChemSpider Today
    • Over 26 million unique chemicals
    • Over 420 data sources
    • Grows daily – community and RSC depositions
    • Community annotation and curation
    • We curate, edit, change, enhance data daily
  • 11. Three Years of Experience
    • Internet-based chemistry is a mess!
    • Public compound databases are contaminated
    • The annotation/curation of data online is difficult
    • Most database hosts are non-responsive to feedback – “We are a host/repository of data”
    • Who cares? We all should!
  • 12. Linked Data on the Web
  • 13. Where is chemistry online?
    • Encyclopedic articles (Wikipedia)
    • Chemical vendor databases
    • Metabolic pathway databases
    • Property databases
    • Patents with chemical structures
    • Drug Discovery data
    • Scientific publications
    • Compound aggregators
    • Blogs/Wikis and Open Notebook Science
  • 14. What is the Structure of Vitamin K1?
  • 15. What is the Structure of Vitamin K1?
  • 16. Chemical Abstracts “Common Chemistry” Database
  • 17. Wikipedia
  • 20. Internet-Based Chemistry is a Mess
    • Algorithms can get you so far
    • Human curation is necessary
    • Only crowds can help with big data: ChemSpider holds over 26 million compounds
    • Imagine if we worked together to create a centralized, validated structure-name dictionary! It would enhance text-mining, searching, linking…
  • 21. Search “Vitamin H”
  • 22. Search “Vitamin H”
  • 23. “Curate” Identifiers
  • 24. “Curate” Identifiers
  • 25. “Curate” Identifiers
  • 26. Crowd-sourcing Chemistry Curation
    • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 27. “Curate” Identifiers
    • General curation activities
      • Remove incorrect names
      • Correct spellings
      • Add multilingual names
      • Add alternative names
    • In 3 years, over 1 million structure-identifier relationships have been validated, both robotically and manually
    • 130 people have participated in validation or annotation. “Crowds” can be quite small!
  • 28. Vancomycin – Curate This!!!
  • 29. Vancomycin on ChemSpider: 1 compound, 3 days
  • 30. Crowdsourced “Annotations”
    • Users can add
      • Descriptions/Syntheses/Commentaries
      • Links to articles
      • Spectral data
      • Photos
      • MP3 files
      • Videos
  • 31. Multimedia Content Holder
  • 32. Gaming for Validation of Spectra
  • 33. Crowdsourced Validation of Spectra
  • 34. “Game-based” Validation of Data
  • 35. ChemSpider SyntheticPages
  • 36. Sharing Our Activities
    • Presently defining approaches with other public compound databases to share results of curation activities
    • Member of a large European project to link data from the life sciences; sharing the results of curation is essential
    • Making curation and contribution interfaces mobile
  • 37. Thank you
    • Email: williamsa@rsc.org
    • Twitter: ChemConnector
    • Blog: www.chemspider.com/blog
    • Personal Blog: www.chemconnector.com
    • Slides: www.slideshare.net/AntonyWilliams
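Slides 20 and 27 call for a centralized, validated structure-name dictionary that the crowd helps curate (removing incorrect names, adding alternative names, deprecating bad records). The Python below is a minimal sketch of that idea under stated assumptions; it is not ChemSpider's implementation or API. The class and method names (`NameDictionary`, `add_name`, `validate`, `remove_name`, `deprecate`, `lookup`) are hypothetical, and the keys such as `"struct-001"` are placeholders standing in for a real structure identifier such as an InChIKey.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """Hypothetical structure-centric record: one structure, many names."""
    structure_key: str  # placeholder for a real structure identifier
    validated: set[str] = field(default_factory=set)    # names confirmed by a curator
    unvalidated: set[str] = field(default_factory=set)  # deposited names awaiting review
    deprecated: bool = False  # set when curators flag the record for deprecation


class NameDictionary:
    """Hypothetical structure-name dictionary supporting simple curation actions."""

    def __init__(self) -> None:
        self.records: dict[str, CompoundRecord] = {}

    def add_name(self, key: str, name: str) -> None:
        """Deposit a name for a structure; it stays unvalidated until reviewed."""
        record = self.records.setdefault(key, CompoundRecord(key))
        if name not in record.validated:
            record.unvalidated.add(name)

    def validate(self, key: str, name: str) -> None:
        """A curator confirms that the name belongs to this structure."""
        record = self.records[key]
        record.unvalidated.discard(name)
        record.validated.add(name)

    def remove_name(self, key: str, name: str) -> None:
        """A curator removes an incorrect name from this structure."""
        record = self.records[key]
        record.validated.discard(name)
        record.unvalidated.discard(name)

    def deprecate(self, key: str) -> None:
        """Flag a record so it no longer answers name lookups."""
        self.records[key].deprecated = True

    def lookup(self, name: str) -> list[str]:
        """Return keys of non-deprecated records carrying this validated name (case-insensitive)."""
        wanted = name.casefold()
        return sorted(
            key
            for key, record in self.records.items()
            if not record.deprecated
            and any(n.casefold() == wanted for n in record.validated)
        )


if __name__ == "__main__":
    names = NameDictionary()
    names.add_name("struct-001", "biotin")
    names.add_name("struct-001", "Vitamin H")
    names.add_name("struct-002", "vitamin H")  # hypothetical erroneous association
    names.validate("struct-001", "Vitamin H")
    print(names.lookup("vitamin h"))  # → ['struct-001']
    names.remove_name("struct-002", "vitamin H")  # curator removes the bad name
```

Keeping deposited names unvalidated until a person confirms them mirrors the slides' point that algorithms only get you so far and human curation is necessary; a fuller design would likely also record who made each edit and when.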