Hosting a compound centric community resource for chemistry data

Laboratories around the world continue to generate immense amounts of data that are non-proprietary and of value to the community. If made available, these data could dramatically reduce costs by minimizing rework and ultimately enable faster research. High-quality reference collections of chemical compound dictionaries, properties and spectra have been generated over many decades. With the advent of social networking tools and platforms such as Wikipedia, the community now has an opportunity to contribute. The ChemSpider platform, hosted by the Royal Society of Chemistry, is a compound-centric database with associated data. It is already populated with almost 25 million unique compounds, and the community can deposit and host their own data as well as curate and annotate existing data, including data generated in Open Notebook Science efforts. This presentation will provide an overview of progress to date and outline the vision of this community platform for chemistry and for ensuring the longevity of chemistry reference data.

Published in: Technology, Education

  1. Hosting a Compound Centric Community Resource for Chemistry Data
      Antony Williams, ACS Anaheim, March 28th 2011
  2. Data Archiving, e-Science and Primary Data
      • How much data generated in a lab, that COULD go public, is lost forever?
      • Public domain reference databases of value?
          • Syntheses
          • Properties
          • Spectra
          • CIFs
          • Images
      • Much of chemistry is chemical structure-based – where and how could we host these data?
  3. The Social Network
      • Career-wise, within the next few years NOT having a personal presence online will be a detriment:
          • Self-marketing
          • Establishing a profile
          • Getting on the record
          • Collaborative science
          • Demonstrating a skill set
          • Measured using alternative metrics
          • Contributing to the public peer review process
  4. Social Networking Tools
      • A growing number of social networking tools:
          • Facebook
          • Twitter
          • LinkedIn
          • Flickr
          • YouTube
          • Blogs
          • Communities
          • Collaborative environments
  5. Collaborative Knowledge Management
  7. Contributing Chemistry Online
      • Property databases
      • Compound aggregators
      • Screening assay results
      • Scientific publications
      • Encyclopedic articles (Wikipedia)
      • Metabolic pathway databases
      • ADME/Tox data – eTOX, for example
      • Blogs/wikis and Open Notebook Science
      • Contributing open source code to projects
  8. Chemistry Social Networking
      • Methods of sharing MY chemistry online include:
          • Wikis or blogs
          • SlideShare for presentations
          • YouTube for videos
          • Flickr, Wikimedia etc. for images (and FigShare)
          • PubChem for assay data
          • NMRShiftDB for NMR assignments
          • Google Docs for data (and FigShare)
  9. FigShare
  10. FigShare
  11. Chemistry Social Networking
      • Methods of sharing MY chemistry online include:
          • Wikis or blogs
          • SlideShare for presentations
          • YouTube for videos
          • Flickr, Wikimedia etc. for images (and FigShare)
          • PubChem for assay data
          • NMRShiftDB for NMR assignments
          • Google Docs for data (and FigShare)
          • In what other online environments can you immediately share chemistry data?
  12. ChemSpider
      • ChemSpider is a chemistry database
          • >25 million compounds, >400 data sources
          • A deposition platform
              • Structure(s)
              • Identifiers
              • Links to internet resources, articles and DOIs
              • Experimental data (spectra, images, CIFs)
              • Multimedia (videos, MP3s)
          • A curation and annotation platform
              • Remove “bad data”
              • Annotate existing data
          • A publishing platform for the community
  13. Search for a Chemical by Name
  14. Available Information…
      • Linked to vendors, safety data, toxicity, metabolism
  15. Available Information…
  16. Crowdsourced “Annotations”
      • Users can add:
          • Descriptions/syntheses/commentaries
          • Links to PubMed articles
          • Links to articles via DOIs
          • Spectral data
          • Crystallographic Information Files
          • Photos
          • MP3 files
          • Videos
  18. Spectra
  19. Spectra
  20. Inherited Errors
      • Inherited errors from every database… all public compound databases, including ours, have errors
      • “Incorrect” structures – assertions, timelines etc.
      • “Incorrect” names associated with structures
      • ENORMOUS CHALLENGE
  21. Crowdsourced Curation
      • Crowdsourced curation: identify/tag errors, edit names and synonyms, identify records to deprecate
  22. Search “Vitamin H”
  23. “Curate” Identifiers
  24. “Curate” Identifiers
  25. “Curate” Identifiers
  26. Crowdsourcing Works
      • >130 people have deposited data and participated in data curation
      • Curators at different levels check each other’s work
      • More curators and depositors are encouraged!
  27. Molbank (Open Access Journal)
  28. ChemSpider SyntheticPages
      • Many syntheses are not published but are of value
      • CSSP: a database of synthesis procedures built for the community, by the community
      • Peer-reviewed by the community
      • Each contribution has a DOI – of value to the submitter?
  30. Vandalism
      • Vandalism of ChemSpider is VERY rare…
      • Three acts of vandalism ever:
          • Someone tried to “sell a house!”
          • A vendor posted their logo against a chemical
          • A student, Katie Crow, posted a “personal photo”
      • But poor data quality can look like vandalism!
  31. Drivers in the Social Network
      • Anonymity is a choice in social networks
          • Many people on Wikipedia are anonymous
          • Many blogs are anonymous
          • Comments on blogs can be anonymous
      • Anonymity in peer review will likely become less important and may be generational
      • I may want acknowledgment if…
          • I share my data
          • I review a paper
          • I share my expertise
  32. The Alt-Metrics Manifesto
  33. Enabled by ORCID…
  34. Who declares data as Open?
      • Data licensing is very interesting and can spark “interesting” conversations. Opinions differ:
          • Are images data? Are assertions data?
          • What on a ChemSpider record is data?
          • Is PubChem or PubMed Open Data?
      • We allow people to declare their data as Open, and we add an Open Data button at upload
      • A lot of data on ChemSpider are free but not Open
      • Pragmatism: our focus is a community resource
  35. Licensing “My Work” Online
      • Is it “my” chemistry once it’s online?
      • The complex nature of licensing “my” chemistry:
          • Blogs – copyrighted and Creative Commons
          • Wikis – mixed licensing, depends on the host(s)
          • Data – much value in sharing data as “Open Data”
      • Often, people can make money from your work!
      • Police your own “licensing” – how many people have read the Facebook and Twitter agreements?!
  36. ChemSpider: A Structure Centric Host
      • An established community resource
          • >25 million compounds from >400 data sources
          • Thousands of users per day
          • Approaching a million transactions per day
          • A crowdsourced deposition and curation platform
          • Grows daily – more depositions, more data
          • A publishing platform for the community
          • Contributions welcome! Learn how…
  37. ChemSpider Training Session
      • ChemSpider: A Community Resource for Chemical Data
      • Wednesday, March 30th
      • 8:30-11:00 AM
      • Anaheim Convention Center, Room 211 A
  38. Acknowledgments
      • RSC|ChemSpider team
      • The “Crowd” of curators
      • All data source providers
      • GGA Software Services
      • ACD/Labs
      • OpenEye
      • Accelrys
  39. Thank you
      Email:
      Twitter: ChemConnector
      Personal Blog:
      SLIDES:
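The crowdsourced curation described in the slides above (users tag errors and edit names and synonyms, while curators at different levels check each other) can be illustrated with a small review queue. This is a minimal sketch under stated assumptions, not ChemSpider’s actual implementation or API: the class names, the curator level names and the approval rule (a different user, at curator level or above and at least as senior as the proposer, must approve each edit) are all hypothetical. The example record is biotin, the compound behind the “Vitamin H” search, carrying one deliberately wrong synonym to mimic an inherited error.

```python
from dataclasses import dataclass, field

# Hypothetical curator levels. The slides say only that curators at different
# levels check each other; these names and their ordering are assumptions.
LEVELS = {"depositor": 0, "curator": 1, "master_curator": 2}


@dataclass
class CompoundRecord:
    """One compound-centric record: a structure stand-in plus curated names."""
    record_id: int
    formula: str                                  # stand-in for a full structure
    synonyms: set = field(default_factory=set)    # approved, searchable names
    deprecated: set = field(default_factory=set)  # names curated out


@dataclass
class Proposal:
    """A crowdsourced edit to one synonym on one record."""
    record_id: int
    synonym: str
    action: str                                   # "add" or "deprecate"
    proposer: str


class CurationQueue:
    def __init__(self, users):
        self.users = users                        # username -> key in LEVELS
        self.records = {}
        self.pending = []

    def deposit(self, record):
        self.records[record.record_id] = record

    def propose(self, proposal):
        if proposal.action not in ("add", "deprecate"):
            raise ValueError(f"unknown action: {proposal.action}")
        self.pending.append(proposal)

    def can_approve(self, reviewer, proposer):
        # Assumed rule: someone else, at curator level or above and at least
        # as senior as the proposer, must check every edit.
        level = LEVELS[self.users[reviewer]]
        return (reviewer != proposer
                and level >= LEVELS["curator"]
                and level >= LEVELS[self.users[proposer]])

    def approve(self, proposal, reviewer):
        if proposal not in self.pending:
            raise ValueError("proposal is not pending")
        if not self.can_approve(reviewer, proposal.proposer):
            raise PermissionError(f"{reviewer} may not approve this edit")
        record = self.records[proposal.record_id]
        if proposal.action == "add":
            record.synonyms.add(proposal.synonym)
            record.deprecated.discard(proposal.synonym)
        else:
            # Keep deprecated names on the record instead of deleting them.
            record.synonyms.discard(proposal.synonym)
            record.deprecated.add(proposal.synonym)
        self.pending.remove(proposal)

    def search_by_name(self, name):
        """Return IDs of records with a case-insensitive approved synonym match."""
        key = name.casefold()
        return [r.record_id for r in self.records.values()
                if any(s.casefold() == key for s in r.synonyms)]


if __name__ == "__main__":
    queue = CurationQueue({"alice": "depositor", "bob": "curator"})
    queue.deposit(CompoundRecord(1, "C10H16N2O3S",
                                 {"Biotin", "Vitamin H", "Vitamin B12"}))

    fix = Proposal(1, "Vitamin B12", "deprecate", proposer="alice")
    queue.propose(fix)
    queue.approve(fix, reviewer="bob")

    print(queue.search_by_name("vitamin h"))    # [1]
    print(queue.search_by_name("Vitamin B12"))  # []
```

Deprecated synonyms stay on the record rather than being deleted, so a later reviewer can see what was removed; the slides do not say whether ChemSpider itself keeps them this way.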