Whitney Symposium Lecture, June 2008
Transcript

  • 1. Crowd-Sourcing to Build a Structure Centric Community for Chemists
      ◦ Antony Williams, Whitney Symposium 2008 - Networks
  • 2. Social Networking for Chemists Building a Structure Centric Community for Chemists
  • 3. Network Drug Discovery Tools (www.curehunter.com)
  • 4. Beware the Networks!
  • 5. Collaborative Authoring in Academia
      ◦ Group-level collaboration via wikis
  • 6. Collaborative Authoring for Drug Discovery
      ◦ Pfizerpedia
  • 7. Collaborative Knowledge Management for Chemists – Wikipedia, Built by a Network
  • 8. …and biologists: WikiProteins
  • 9. WikiProteins: What Is Tegafur?
  • 10. Commonly Lacking…
      ◦ Approaches generally lack “structural intelligence”
      ◦ Structures have properties (Mw, MF, experimental and predicted properties)
      ◦ Collections of structures need to be searchable by structure
      ◦ Most data collections are “self-contained” and rarely connect to other resources via “structure”
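Slide 10 notes that every structure carries intrinsic properties such as molecular formula (MF) and molecular weight (Mw). As a minimal sketch of how a structure-aware system can derive such a property instead of storing it as free text, the Python below computes an average molecular weight from a simple formula string. It assumes flat formulas with no parentheses, charges or isotopes, and its small atomic-weight table (rounded standard atomic weights) covers only a few elements; the function name `molecular_weight` is hypothetical.

```python
import re

# Rounded standard atomic weights; a deliberately small table for illustration.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "Na": 22.990, "S": 32.06, "Cl": 35.45,
}

# An element symbol followed by an optional count, e.g. "C9", "H8", "Cl".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula: str) -> float:
    """Average molecular weight of a flat formula such as 'C9H8O4'."""
    total = 0.0
    position = 0
    for match in TOKEN.finditer(formula):
        # Reject anything this simple grammar does not cover (parentheses, charges, ...).
        if match.start() != position:
            raise ValueError(f"unsupported syntax in {formula!r}")
        symbol, count = match.groups()
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight listed for {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
        position = match.end()
    if position != len(formula):
        raise ValueError(f"unsupported syntax in {formula!r}")
    return total


print(round(molecular_weight("C9H8O4"), 2))  # aspirin; prints 180.16
```

A production system would compute such values from the stored structure with a cheminformatics toolkit rather than from a formula string; the point here is only that the property follows from the structure.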
  • 11. A Search Engine for Chemists
      ◦ Questions a chemist might ask…
          ▪ What is the melting point of n-butanol?
          ▪ What is the chemical structure of Xanax?
          ▪ Chemically, what is Viagra?
          ▪ What are the stereocenters of cholesterol?
          ▪ Where can I find publications about Taxol?
          ▪ What are the different trade names for Ketoconazole?
          ▪ What is the NMR spectrum of Aspirin?
          ▪ What are the safety handling issues for Thymol Blue?
      ◦ ChemSpider can answer all of these questions
  • 12. ChemSpider Data Content
      ◦ Over 20 million unique chemical structures from:
          ▪ Online databases – PubChem, DrugBank, HMDB, Wikipedia
          ▪ Chemical vendors – over 40 different vendors and growing
          ▪ Personal depositions – individual contributions
          ▪ Journal publishers
          ▪ Content database vendors
          ▪ Analytical data collections
          ▪ Patents (9 MILLION structures to search patents)
          ▪ Web scraping
      ◦ Content is linked back to the original data sources
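Slide 12 describes merging structures from many sources into one collection of unique structures while linking each record back to where it came from. One common way to do that, assumed here rather than taken from the talk, is to key every deposited record on a canonical structure identifier (for example a standard InChIKey computed upstream by a cheminformatics toolkit) and merge records that share a key. The `Deposition` and `Compound` types, their fields and the `aggregate` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Deposition:
    structure_key: str  # canonical structure identifier, assumed computed upstream
    name: str           # name or synonym supplied by the depositor
    source: str         # hypothetical source label, e.g. a database or vendor
    source_url: str     # link back to the original record


@dataclass
class Compound:
    structure_key: str
    synonyms: Set[str] = field(default_factory=set)
    links: Dict[str, str] = field(default_factory=dict)  # source -> URL


def aggregate(depositions: Iterable[Deposition]) -> Dict[str, Compound]:
    """Collapse depositions onto one record per unique structure key."""
    compounds: Dict[str, Compound] = {}
    for dep in depositions:
        compound = compounds.setdefault(dep.structure_key, Compound(dep.structure_key))
        compound.synonyms.add(dep.name)
        # One link per source; a later deposition from the same source replaces the URL.
        compound.links[dep.source] = dep.source_url
    return compounds
```

Keying on the structure rather than on a name is what lets two sources that call the same compound by different names collapse onto a single record, while the `links` map preserves the route back to every original source.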
  • 13. A Structure Centric Community for Chemists
      ◦ A FREE ACCESS platform for deposition, management, curation, annotation and extension of information associated with chemical structures
      ◦ Semantically connect to other sites providing access to knowledge, data and information of determined quality
      ◦ Search by alphanumeric text, chemical structure and substructure, or combinations of these
      ◦ Predict properties for submitted structures
  • 14. Tell me about Aspirin
  • 15. Tell me about Aspirin
  • 16. Links out to KEGG (Kyoto Encyclopedia of Genes and Genomes)
  • 17. Tell me about Aspirin
  • 18. Tell me about Aspirin
  • 19. Tell me about Aspirin
  • 20. Tell me about Aspirin
  • 21. Abstract Compounds?
      ◦ Is there any information about “Quesnoin”?
      ◦ Type in the name (and there may be many) or other identifier
      ◦ Paste a chemical structure
      ◦ Draw the structure
  • 22. Example Search
  • 23. Example Search
  • 24. Example Search 2
      ◦ What compounds have a mass of 300 ± 0.001?
      ◦ Or search on a combination of intrinsic and predicted properties
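The mass query on slide 24 is a range search: return every compound whose mass lies within 300 ± 0.001. A minimal sketch, assuming a mass has already been computed for every record, keeps the masses in a sorted list and uses binary search, so each query costs O(log n) plus the number of hits instead of a full scan of millions of records. `MassIndex` is a hypothetical name, not ChemSpider's implementation.

```python
import bisect
from typing import List, Tuple


class MassIndex:
    """Sorted (mass, record_id) pairs supporting mass-window queries."""

    def __init__(self, entries: List[Tuple[float, str]]):
        self._entries = sorted(entries)
        self._masses = [mass for mass, _ in self._entries]

    def window(self, target: float, tolerance: float) -> List[str]:
        """Return ids whose mass lies in [target - tolerance, target + tolerance]."""
        lo = bisect.bisect_left(self._masses, target - tolerance)
        hi = bisect.bisect_right(self._masses, target + tolerance)
        return [record_id for _, record_id in self._entries[lo:hi]]
```

For the slide's example the call would be `index.window(300.0, 0.001)`. A combined query over intrinsic and predicted properties could then apply further filters to the (usually short) list of hits.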
  • 25. Example Search 2
  • 26. Complex Search
  • 27. Search Open Access Journals – ChemSpider
  • 28. Search PubMed – ChemSpider
  • 29. The Quality of Data Online…
      ◦ Aggregating data opens up quality issues
      ◦ Structure-identifier associations are “dirty”
      ◦ Structures are COMMONLY incorrect – stereochemistry issues
      ◦ Manual curation of even small databases is already a lot of work – what about millions of structures?
      ◦ Structures are far from perfect. What is a “correct structure”?
          ▪ Full stereochemistry?
          ▪ Historical timeline of structure?
          ▪ Who is the authority?
  • 30. Who holds THE Quality Authority?
      ◦ Chemical Abstracts Service is the structural authority today: 1400 (?) employees, the world standard in chemistry information
      ◦ 101 years of knowledge, process and expertise. MANUAL curation is key; robotic curation is enabling
      ◦ How can an online, free access system peacefully coexist with the authority?
  • 31. Quality Is a Major Issue: Search Butanol
  • 32. Crowd-sourcing Database Compilation
  • 33. Wikipedia – Crowdsourcing Chemistry
  • 34. Wikipedia Chemistry Curation project
      ◦ Only ca. 5000 organic structures, 7000 structures in total
      ◦ MONTHS of work so far for a team of 6 people
      ◦ Many errors removed in the process; curation is a daily event for users/depositors
      ◦ A slow and torturous process for molecules with stereochemistry
  • 35. Thymol Blue on ChemSpider
      ◦ Data online includes:
          ▪ UV-vis spectrum
          ▪ Measured experimental properties
          ▪ Link to Wikipedia article
          ▪ Links to chromatography details
          ▪ Multiple identifiers/trade names etc.
          ▪ Links to vendors/suppliers/other databases
          ▪ Safety information
  • 36. Differences between ChemSpider/Wikipedia
      ChemSpider | Wikipedia
      >20 million unique structures | ~5000 organics, 2000 others
      Complex queries – properties, text, structure/substructure, OA publishers, data sources, … | Text
      Prediction of properties | No
      Analytical data | No
      Active depositors/curators – 30 | Active editors – about 50 (?)
      5000 people/day; 1100 registered | ????
      Compound monographs linked | Detailed compound monographs
  • 37. Differences between Wikipedia/ChemSpider
      Wikipedia | ChemSpider
      Supported by the tried and tested MediaWiki platform | Primarily Microsoft .NET technologies with OS components
      Established infrastructure and Wikimedia Foundation team | “Out of a basement” on three servers and 5 volunteers
      Chemistry is a subset of the ‘Pedia | Chemistry is the focus of the ‘Spider
      GFDL licensing for everything | Mixed “licensing”
      Strong team of WP:Chem advocates, curators and admins | Growing team of WP:Chem advocates, curators and admins
      Worldwide reputation as a quality source | Growing reputation as focused on quality
  • 38. Crowd-sourcing Curation
      ◦ How to curate data for millions of structures?
      ◦ Robot processes can clean up depositions
          ▪ Search for “chloride” in names and check the molecular formula for Cl
          ▪ Check for stereochemistry and remove names with stereo descriptors
      ◦ Provide a simple-to-use platform to curate, annotate and tag data
      ◦ Provide curator administration to prevent vandalism (Veropedia)
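The two robot checks listed on slide 38 can be expressed as simple rules over a record's name, formula and stereo status. The sketch below rests on stated assumptions: it flags records rather than editing them, so a human curator makes the final call; it assumes each record already knows whether its structure carries stereochemistry; and its regular expression recognizes only a few common stereo descriptors ((R), (2S), (2R,3S), (E)/(Z), cis-/trans- and a leading D- or L-). The function and flag names are hypothetical.

```python
import re
from typing import List

# Rough detector for stereo descriptors in a name (an assumption, far from complete):
# CIP and E/Z labels such as (R), (2S) or (2R,3S), cis-/trans-, and a leading D- or L-.
STEREO_IN_NAME = re.compile(
    r"\(\d*[RSEZ](?:,\s*\d*[RSEZ])*\)"
    r"|\b(?:cis|trans)-"
    r"|^[DL]-"
)

# Element symbols in a formula, e.g. "C7H7Cl" -> C, H, Cl.
ELEMENT = re.compile(r"[A-Z][a-z]?")


def curation_flags(name: str, formula: str, has_stereo: bool) -> List[str]:
    """Return hypothetical flag labels for a human curator to review."""
    flags = []
    elements = set(ELEMENT.findall(formula))
    # Rule 1: a name mentioning chloride should come with Cl in the formula.
    if "chloride" in name.lower() and "Cl" not in elements:
        flags.append("chloride-name-without-Cl")
    # Rule 2: a stereo-specific name should not sit on a structure without stereo.
    if not has_stereo and STEREO_IN_NAME.search(name):
        flags.append("stereo-name-on-flat-structure")
    return flags
```

Flagging instead of deleting keeps the robot pass cheap to run over millions of records while leaving judgment calls (for example a trade name that merely contains "chloride") to curators.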
  • 39. Multi-level Curation and Approval
  • 40. Post Comments
      ◦ Anyone can “Post Comments” associated with a structure. To curate data, users must log in so that their changes can be tracked
  • 41. Crowd-sourcing Chemistry
      ◦ Crowd-sourced curation: identify and tag errors, edit names and synonyms, identify records for deprecation
      ◦ ALSO crowd-sourced deposition: anyone can deposit data (structures, text, images, analytical data)
  • 42. But, when registered and logged in…
      ◦ Ability to curate and add to the database
          ▪ Add structures
          ▪ “Clean” structures
          ▪ Add data (spectra, CIFs, images)
          ▪ Add links to other pages (URLs)
          ▪ Add publication details
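Slides 39 to 42 describe tiered participation: anyone may post comments, while curating or adding data requires a login so that changes can be tracked, and curator administration guards against vandalism. A minimal sketch of such a policy follows. The role names, the action list and the rule that a curator must approve edits from ordinary registered users are assumptions made for illustration, not ChemSpider's actual workflow.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Role(IntEnum):
    ANONYMOUS = 0
    REGISTERED = 1
    CURATOR = 2


# Minimum role per action: a hypothetical policy for illustration only.
REQUIRED_ROLE = {
    "comment": Role.ANONYMOUS,
    "add_structure": Role.REGISTERED,
    "edit_synonym": Role.REGISTERED,
    "add_spectrum": Role.REGISTERED,
}


@dataclass
class Change:
    action: str
    user: Optional[str]  # None for anonymous comments
    approved: bool = False


@dataclass
class CurationLog:
    changes: List[Change] = field(default_factory=list)

    def submit(self, action: str, role: Role, user: Optional[str] = None) -> Change:
        required = REQUIRED_ROLE[action]
        if role < required:
            raise PermissionError(f"{action!r} requires {required.name}")
        if role >= Role.REGISTERED and user is None:
            raise ValueError("logged-in actions must record the user")
        # Comments and curator edits go live; other edits wait for a curator.
        change = Change(action, user, approved=(action == "comment" or role == Role.CURATOR))
        self.changes.append(change)  # every change is kept so it can be tracked
        return change

    def approve(self, change: Change, role: Role) -> None:
        if role < Role.CURATOR:
            raise PermissionError("only curators can approve changes")
        change.approved = True

    def pending(self) -> List[Change]:
        return [c for c in self.changes if not c.approved]
```

Keeping every change in the log, approved or not, is what makes the login requirement useful: each edit stays attributable to a user and can be reviewed or reverted.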
  • 43. Adding to the Database - Structure
  • 44. Adding New Text Data: Add Publication, Add URL, Add Identifier
  • 45. Adding Supplementary Info to a Structure
  • 46. Can ChemSpider Enable Discovery?
      ◦ Yes: chemists can search by text, structure, substructure or properties to look at relationships and probe drug discovery
  • 47. ChemSpider – Research in Progress
      ◦ Supporting Open Notebook Science as a repository – JC Bradley at Drexel University
      ◦ For the purpose of online virtual screening:
          ▪ Applying descriptors of various types to filter a database of 20 million compounds
          ▪ In progress: utilizing SimBioSys’ LASSO Descriptor; collaboration based on NISS’ ChemModLab
  • 48. LASSO: Ligand Activity by Surface Similarity Order
  • 49. LASSO Descriptors on ChemSpider: SEMANTIC WEB in action
  • 50. LASSO Searching Method 1
      ◦ Ask the question “What are the top 1000 molecules whose LASSO descriptors are most similar to those of the actives for the Estrogen Receptor?”
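The LASSO query on slide 50 is a ranked similarity search: score every database compound by how closely its descriptor vector matches the known actives, then keep the best 1000. LASSO descriptors are SimBioSys's own, so the sketch below swaps in plain numeric vectors and cosine similarity, scoring each compound by its best match to any active (max fusion). Both choices, and the function names, are assumptions for illustration rather than the method used in the talk.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(
    database: Dict[str, Vector], actives: List[Vector], k: int = 1000
) -> List[Tuple[str, float]]:
    """Rank compounds by their best similarity to any known active (max fusion)."""
    if not actives:
        raise ValueError("need at least one active")
    scored = (
        (compound_id, max(cosine(vec, active) for active in actives))
        for compound_id, vec in database.items()
    )
    # nlargest keeps only k items in memory while scanning the database once.
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

Max fusion rewards a compound that closely resembles any one active, which suits a set of actives with several distinct chemotypes; averaging over the actives is the usual alternative.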
  • 51. It WORKS - Enrichment Plot
      ◦ 60% of the actives were recovered in the top 1% of the database
      ◦ “Environmental binders” are weak binders
      ◦ The top-ranked compounds may well be active ER binders
      ◦ Likely candidates for experimental investigation
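The headline figure on slide 51, 60% of actives recovered in the top 1% of the database, is the recovery (true-positive rate) at a fixed fraction of a ranked list, which is the quantity an enrichment plot traces as the fraction grows. A minimal sketch of that calculation and of the related enrichment factor follows, assuming the full ranking and the set of known actives are available; the function names are hypothetical. By this definition, 60% recovery in the top 1% corresponds to an enrichment factor of about 60 (0.60 / 0.01) over random selection.

```python
from typing import Sequence, Set


def _cutoff(n: int, fraction: float) -> int:
    # Size of the top slice, rounded to a whole number of compounds (at least one).
    return max(1, round(fraction * n))


def recovery_at(ranked_ids: Sequence[str], actives: Set[str], fraction: float) -> float:
    """Fraction of all known actives that appear in the top `fraction` of the ranking."""
    if not ranked_ids or not actives:
        raise ValueError("need a non-empty ranking and at least one active")
    top = ranked_ids[: _cutoff(len(ranked_ids), fraction)]
    return sum(1 for cid in top if cid in actives) / len(actives)


def enrichment_factor(ranked_ids: Sequence[str], actives: Set[str], fraction: float) -> float:
    """Hit rate in the top slice divided by the hit rate of random selection."""
    n = len(ranked_ids)
    return recovery_at(ranked_ids, actives, fraction) * n / _cutoff(n, fraction)
```

Evaluating `recovery_at` over a range of fractions (0.1%, 1%, 5%, ...) gives the points of an enrichment curve like the one on the slide.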
  • 52. Tipping Point
      ◦ Tipping point: the point at which a slow, gradual change becomes irreversible and then proceeds with gathering pace
  • 53. ChemSpider Forums/Blogs
      ◦ forum.chemspider.com
      ◦ www.chemspider.com/blog
  • 54. ChemSpider TouchGraph
  • 55. What would we most like to do?
      ◦ Enable “Collaborative Science”. What would that look like?
          ▪ Access to chemical supplies when people need them
          ▪ Awareness of available literature, patents and databases of curated content, whether Open Access or not; transaction fees (or not) are between user and provider
          ▪ Host Open Notebook Science exchanges
  • 56. “ChemSpider Inside”
      ◦ Instrument vendors integrated ChemSpider into their metabolism ID project – ChemSpider linked to all mass spec instruments doing metabolite ID?
      ◦ Wikipedia roundtrip linking to ChemSpider
      ◦ Google indexing ChemSpider at a “fixed rate”
      ◦ Integration into desktop drawing packages
      ◦ Members of the Microsoft BioIT Alliance
      ◦ Discussions on Taverna’s Workflow SourceForge group
      ◦ Hosting Open Access articles shortly…
  • 57. Where to from here? Short term
      ◦ Integrated text and structure/substructure searching of the Open Access literature is in development
      ◦ Web-based scraping of structure-based information – examples in place
      ◦ Enhanced web services layer to integrate searches
      ◦ Deposit updated Patent Database (9 million structures)
      ◦ Reaction handling and deposition
  • 58. Where to from here? Mid-term
      ◦ Spidering for Chemistry – extract data from articles, webpages and data sources AND stay within copyright
      ◦ WiChempedia project – wiki layers on top of ChemSpider, alongside the Wikipedia curation project
      ◦ Deeper integration with text-based searching and conversion of chemical names to structures for online structure searching:
          ▪ Improved integration with the NCBI Entrez system
          ▪ Deliver “dedicated websites” for specific publishers
  • 59. Where to from here? Mid-term
      ◦ An extensible data model built “on the fly” lets us easily expand to integrate abstract data with structures
      ◦ Data-mine and curate “parameters” – physicochemical and physiological parameters to enable QSAR analysis, data modeling and provision of models online (UNC-Chapel Hill, NISS)
  • 60. Our Challenges
      ◦ There are “no employees”
      ◦ ChemSpider is unfunded
      ◦ The system is hyper-dependent on the ISP, power and limited compute power
      ◦ We are upsetting a lot of people – evangelists, cheminformatics system vendors, publishers, data content providers
  • 61. Acknowledgments
      ◦ The ChemSpider team of volunteer developers
      ◦ ChemSpider Advisory Group
      ◦ Our curators, depositors and users
      ◦ Suppliers of commercial software – Microsoft, ACD/Labs, OpenEye, ChemAxon, SimBioSys
      ◦ SureChem – Structure Based Online Patent Searching
  • 62. Further reading
      ◦ www.chemspider.com/blog
      ◦ Internet-based tools for communication and collaboration in chemistry, Drug Discovery Today, Volume 13, Numbers 11/12, June 2008, 502-506, doi:10.1016/j.drudis.2008.03.015
      ◦ A perspective of publicly accessible/open-access chemistry databases, Drug Discovery Today, Volume 13, Numbers 11/12, June 2008, 495-501, doi:10.1016/j.drudis.2008.03.017