Delivering Curated Chemistry to the World via Crowdsourced Deposition and Annotation on ChemSpider

RSC|ChemSpider is one of the world’s largest online resources for chemistry-related data and services. Developed with the intention of delivering access to structure-based chemistry data via the internet, the ChemSpider platform hosts over 26 million unique chemical compounds aggregated from over 400 data sources. It provides an environment for the community to annotate and curate these existing data and to deposit new data to the system. The search system delivers flexible querying capabilities together with links to external sites for publication and patent data. This presentation will review the present capabilities of the ChemSpider system, providing direct examples of how to use the system to source high-quality data of value to chemists. We will discuss some of the challenges associated with validating data quality and examine how ChemSpider is a part of the new “semantic web for chemistry”. ChemSpider has also spawned a number of additional projects, including ChemSpider SyntheticPages for hosting openly peer-reviewed chemical synthesis articles, Learn Chemistry Wiki for students learning chemistry, and SpectraSchool for learning spectroscopy.

Published in: Technology, Education

Transcript

  1. Delivering Curated Chemistry to the World via Crowdsourced Deposition and Annotation on ChemSpider. Antony Williams, University of Chicago, January 27th, 2012
  2. The World of Online Chemistry
       • Property databases
       • Compound aggregators
       • Screening assay results
       • Scientific publications
       • Encyclopedic articles (Wikipedia)
       • Metabolic pathway databases
       • ADME/Tox data – eTOX for example
       • Blogs/Wikis and Open Notebook Science
       • Contributing Open Source code to projects
  3. We Have …Too Much Data!!!
  4. e-Science and Primary Data
       • How much data generated in a lab, that COULD go public, is lost forever?
  5. TotallySynthetic.com
  6. e-Science and Primary Data
       • How much data generated in a lab, that COULD go public, is lost forever?
       • Public Domain reference databases of value?
           • Syntheses
           • Properties
           • Spectra
           • CIFs
           • Images
  7. PubChem
  8. ChEMBL
  9. Collaborative Knowledge Management
  10. e-Science and Primary Data
       • How much data generated in a lab, that COULD go public, is lost forever?
       • Public Domain reference databases of value?
           • Syntheses
           • Properties
           • Spectra
           • CIFs
           • Images
       • Much of chemistry is chemical structure-based – where and how could we host these data?
  11. RSC’s ChemSpider
  12. Available Information…
       • Linked to vendors, safety data, toxicity, metabolism
  13. Available Information….
  14. Crowdsourced “Annotations”
       • Users can add
           • Descriptions/Syntheses/Commentaries
           • Links to PubMed articles
           • Links to articles via DOIs
           • Add spectral data
           • Add Crystallographic Information Files
           • Add photos
           • Add MP3 files
           • Add Videos
  15.  
  16. Spectra
  17. Spectra
  18. Data on the Web
  19. Chemistry Data online is messy
       • We have inherited errors
       • All public compound databases, including ours, have errors
       • “Incorrect” structures – assertions, timelines etc
       • “Incorrect” names associated with structures
       • Properties
       • Links
       • Publications
       • ENORMOUS CHALLENGE
  20. The Structure of Vitamin K?
  21. MeSH
       • A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione). Vitamin K3 provitamins, after being alkylated in vivo, exhibit the antifibrinolytic activity of vitamin K. Green leafy vegetables, liver, cheese, butter, and egg yolk are good sources of vitamin K.
  22. The Structure of Vitamin K1?
  23. What is the Structure of Vitamin K1?
  24. CAS’s Common Chemistry
  25. Wikipedia
  26.  
  27.  
  28. ChEBI – Manual Curation
  29.  
  30.  
  31.  
  32.
       • “2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione”
       • Variants of systematic names on PubChem
       • 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl
       • 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl
       • 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl
       • 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl
       • 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl
       • 2-methyl-3-[(E)-3,7,11,15-tetramethyl
       • 2-methyl-3-(3,7,11,15-tetramethyl
       • 2-methyl-3-[(E)-3,7,11,15-tetramethyl
  33. Question Everything online: www.dhmo.org
  34. It’s all on Wikipedia…
  35. Chemistry on The Internet Is Messy
  36. It’s Methane…
  37. What’s Methane?
  38. What’s Methane?
  39. What ELSE is Methane???
  40.  
  41. NLM’s DailyMed
  42. NLM’s DailyMed
  43. NLM’s DailyMed
  44. PHYSPROP Database
       • The freely downloadable database under the EPI Suite prediction software
       • Very basic filters suggest data quality issues
  45. The Stereochemistry challenge: 12,500 chemicals with “missed” stereo
  46. With Great Fanfare…
  47. NPC Browser http://tripod.nih.gov/npc/
  48. NPC Browser http://tripod.nih.gov/npc/
  49.  
  50. Openness and Quality Issues. Williams and Ekins, DDT, 16: 747-750 (2011); Science Translational Medicine 2011
  51. Public Domain Databases
       • Our databases are a mess…
       • Non-curated databases are proliferating errors
       • We source and deposit data between databases
       • Original sources of errors hard to determine
       • Curation is time-consuming and challenging
  52. Stop Whining – Fix it
  53. Crowdsourced Curation
       • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  54. Search “Vitamin H”
  55. “Curate” Identifiers
  56. “Curate” Identifiers
  57. “Curate” Identifiers
  58. Standards: Structure Standardization
  59. Standards: Structure Standardization
  60. Standards: Structure Standardization
  61. What needs to happen?
       • Standards
           • Standardization of structures
               • ChEBI/PubChem sharing
               • InChI adoption
  62. The InChI Identifier
  63. Multiple Layers
  64. InChI Strings Hash to InChIKeys
  65. Vancomycin – Search the Internet
  66. Vancomycin: Search Molecular SKELETON; Search Full Molecule
  67. Full Skeleton Search: 104 Hits
  68. Full Molecule Search: 4 Hits
  69. Crowdsourcing Works
       • >130 people have deposited data and participated in data curation
       • Different level curators check each other
       • More curators and depositors are encouraged!
  70. What needs to happen?
       • Standards
           • Standardization of structures
               • ChEBI/PubChem sharing
               • InChI adoption
       • Collaboration
           • Stop reinventing the wheel
           • Share data, share efforts and speed the process
  71. Antony Williams vs Identifiers
       • Passport ID
       • Dad, Tony, others
       • SSN
       • Green Card
       • License
       • 5 email addresses
       • ChemSpiderman (blog, Twitter account, Facebook, Friendfeed)
       • OpenID
       • …
  72. Aspirin names and synonyms
       • Text searches depend on correct association
       • 335 suggested identifiers for Aspirin just on PubChem!
       • Disambiguation dictionaries are necessary, not just for authors!
  73.  
  74.  
  75. The Final Search Strategy
  76. All Those Names, One Structure
  77. Ambiguity in Identifiers
  78. Curated Dictionaries Matter
  79. Success Depends on Dictionaries
  80. Validated Name-Structure Dictionaries
       • Chemical name dictionaries are used for:
           • Text-mining (publications, patents)
               • Used to index PubMed and link to Google Patents
           • Linking to other databases – think Biology!
               • When structures are not available drug names link
           • Searching the web
               • Names link to structures link to InChIs
  81. I want to know about “Vincristine”
       If all algorithms work, then everything on the page is correct by default except the name-structure relationship!
  82. Vincristine: Identifiers and Properties
  83. Vincristine: Vendors and Sources Linked by Structure
  84. Vincristine: Patents Linked by Name
  85. Vincristine: Articles Linked by Name
  86. Challenges of Complex Molecules: Yohimbine
  87. Originally 15 compounds “called” Yohimbine
       54 Skeletons for Yohimbine
  88. Pharma Information Tombs
       • Internal and external content
       • Built to meet primary use-case
       • Tailored indexes and GUIs
       • Internal unique language & metadata
       • Poor interoperability/integration
       • Powerpoint, Documents, Excel
       • Many suppliers of systems and content in a single workflow
       Literature, Patents, News, Pipeline, SAR, CSRs, Safety, In vivo, Etc.
  89. What could create change?
       • Harvard Business Review (2010)
       • “One change would make a substantial difference [to drug R&D]: the creation of agreed-upon standards for digitally representing drug assets.”
  90. It is so difficult to navigate…
       • What’s the structure?
       • Are they in our file?
       • What’s similar?
       • What’s the target?
       • Pharmacology data?
       • Known Pathways?
       • Working On Now?
       • Connections to disease?
       • Expressed in right cell type?
       • Competitors?
       • IP?
  91.
       • Open PHACTS Project
       • Develop a set of robust standards…
       • Implement the standards in a semantic integration hub
       • Deliver services to support drug discovery programs in pharma and public domain
       • 22 partners, 8 pharmaceutical companies, 3 biotechs
       • 36-month project
       Guiding principle is open access, open usage, open source (key to standards adoption)
  92.  
  93. ChemSpider Resources for Chemistry
  94. The Future
       Commercial Software, Pre-competitive Data, Open Science, Open Data, Publishers, Educators, Open Databases, Chemical Vendors, Small organic molecules, Undefined materials, Organometallics, Nanomaterials, Polymers, Minerals, Particle bound, Links to Biologicals, Internet Data
  95. The Future of Chemistry on the Web?
       • Public compound databases federate & build a linked environment of validated data!
       • Data validation needs are not ignored
       • Publishers layer on information to make publications discoverable
       • Public-Private databases can be linked
       • Open Data proliferate
       • The “Semantic Web” in action
  96. Acknowledgments
       • The ChemSpider team
       • Our data providers, depositors, collaborators and curators
       • Software providers – OpenEye, ChemDoodle, ACD/Labs, GGA Software, Open Source (Jmol, JSpecView, OpenBabel)
       • Sean Ekins @collabchem
  97. Thank you
       Email: williamsa@rsc.org
       Twitter: ChemConnector
       Blog: www.chemspider.com/blog
       Personal Blog: www.chemconnector.com
       SLIDES: www.slideshare.net/AntonyWilliams
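
The InChI layering, InChIKey hashing, and the skeleton versus full-molecule searches named in slides 62 to 68 can be illustrated with a minimal Python sketch. It is not ChemSpider code, and the slides do not say how their searches are implemented. The sketch assumes the published standard InChIKey layout (a 14-letter connectivity block, a 10-character block covering stereochemistry and other layers, and a protonation character) and treats "same first block" as a skeleton match. The function names are hypothetical, the layer parser is deliberately simplified, and the alanine keys are quoted as illustrative values that should be checked against a live database before reuse.

```python
from collections import defaultdict


def split_inchi_layers(inchi):
    """Split an InChI string into layers (simplified sketch).

    Assumes a formula layer is present and keeps only the last layer
    seen for each one-letter prefix, which is enough for simple cases.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # Later layers carry a one-letter prefix such as c, h, b, t, m, s.
            layers[part[0]] = part[1:]
    return layers


def split_inchikey(key):
    """Return (connectivity block, second block, protonation character)."""
    blocks = key.split("-")
    if len(blocks) != 3 or [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError("unexpected InChIKey layout: %r" % key)
    return tuple(blocks)


def group_by_skeleton(keys):
    """Group InChIKeys that share the first (connectivity) block."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)


# Illustrative keys (assumed values, verify before relying on them).
EXAMPLE_KEYS = [
    "QNAYBMKLOCPYGJ-REOHSZPUSA-N",  # L-alanine
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # alanine, stereo unspecified
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",  # aspirin
]

if __name__ == "__main__":
    print(split_inchi_layers("InChI=1S/CH4/h1H4"))  # methane
    for skeleton, members in group_by_skeleton(EXAMPLE_KEYS).items():
        print(skeleton, len(members))
```

A skeleton search in this sketch matches every key in a group, while a full-molecule search matches one complete key. That mirrors the gap between the 104 and 4 hits reported for vancomycin on slides 67 and 68.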
