ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of Chemistry On The Internet

This is a presentation I gave at the FDA on December 1st 2009 in Washington DC as part of a symposium involving PubChem, ChemIDPlus, PillBox, DailyMed and other related systems. The focus was, as usual, on the quality of data online and how to clean up that information, with a specific focus on the quality of data on the FDA's DailyMed and our efforts to apply semantic markup to the DailyMed articles.

  • Comment: A great presentation. You made an excellent point about the difference between a ‘chemical’ and a ‘chemical structure,’ as well as the importance of ‘identifiers.’
    Presentation Transcript

    • ChemSpider: How the Wisdom of the Crowd Can Improve the Quality of Chemistry on the Internet
    • Overview
      • Chemistry on the Internet
      • An introduction to ChemSpider
      • How crowds can enhance quality of public databases
      • The DailyMed Project on ChemSpider
        • Observed Quality Issues
        • The Curation Platform
        • Semantic Markup and Integration
    • Linked Data on the Web Taken from: Rafael Sidis’ Blog
    • Where Would You look? What Do You Trust?
    • Lots of “Public Compound” Databases
      • PubChem
      • Drugbank
      • ChEBI/ChEMBL
      • KEGG
      • LipidMAPs
      • ChemIDPlus
      • eMolecules
      • ZINC
      • Lots of chemical vendors
      • ChemSpider
    • What is a compound?
    • Connecting Chemistry on the Web
      • The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
      • Chemistry articles are indexed and searchable by a free online service
      • The web is linked together through the “language of chemistry”
    • Antony Williams vs Identifiers
      • Passport ID
      • Dad, Tony, others
      • SSN
      • Green Card
      • License
      • 5 email addresses
      • ChemSpiderman (blog, Twitter account, Facebook, Friendfeed)
      • OpenID
      • …
    • Aspirin vs Chemical Identifiers
    • Aspirin names and synonyms
      • Text searches depend on correct association
      • 335 suggested identifiers for Aspirin just on PubChem!
      • Disambiguation dictionaries are necessary
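A disambiguation dictionary of the kind mentioned above can be sketched as a lookup from normalized names to one structure identifier. This is only an illustration: the three synonyms and the `resolve` and `normalize` helpers are assumptions made for this example, not ChemSpider's actual dictionary or API.

```python
# Minimal sketch of a name-disambiguation dictionary (hypothetical data).
# Many names map to one structure identifier, here the Standard InChIKey
# for aspirin.
ASPIRIN_INCHIKEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

SYNONYMS = {
    "aspirin": ASPIRIN_INCHIKEY,
    "acetylsalicylic acid": ASPIRIN_INCHIKEY,
    "2-acetoxybenzoic acid": ASPIRIN_INCHIKEY,
}


def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def resolve(name: str):
    """Return the identifier associated with a name, or None if unknown."""
    return SYNONYMS.get(normalize(name))
```

A text search is then only as good as these associations: one wrong entry sends every lookup of that name to the wrong structure.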
    • The Final Search Strategy
    • All Those Names, One Structure
    • Connections Can Lead Anywhere
    • The InChI Identifier
    • Multiple Layers
    • InChI Strings Hash to InChIKeys
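The idea of hashing a variable-length InChI string into a fixed-length key can be illustrated with a toy function. It does not reproduce the real InChIKey algorithm, which hashes specific InChI layers with truncated SHA-256 and yields a 27-character key. The `toy_key` name and its letter encoding are assumptions for illustration only.

```python
import hashlib
import string


def toy_key(inchi: str, length: int = 14) -> str:
    """Map an InChI string to a fixed-length block of uppercase letters.

    Illustration only: real InChIKeys come from the official InChI
    software and will NOT match this output.
    """
    digest = hashlib.sha256(inchi.encode("utf-8")).digest()
    letters = string.ascii_uppercase
    # Reduce each of the first `length` digest bytes to one letter A-Z.
    return "".join(letters[b % 26] for b in digest[:length])


# Example input: an InChI string for aspirin.
ASPIRIN_INCHI = (
    "InChI=1S/C9H8O4/c1-6(10)13-9-5-3-2-4-7(9)8(11)12/"
    "h2-5H,1H3,(H,11,12)"
)
```

Because the key has a fixed length and contains only letters, it is much easier for a text search engine to index than the InChI string itself. The hash cannot be reversed, so a lookup service is needed to get from a key back to a structure.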
    • Oleoylethanolamine
    • Search Engine Dependencies
    • InChIs have traction…
    • Vancomycin
    • Vancomycin
      • Who will curate?
      • How would you clean such a large dataset?
    • What is ChemSpider?
      • ChemSpider is:
        • Building a Structure Centric Community for Chemists
        • Ca. 23 million compounds, ca. 300 data sources
        • A deposition and curation platform
        • A publishing platform for the community
        • Grows daily – more depositions, more links, more data sources
    • Search Cholesterol
    • Linked across the internet
    • Kyoto Encyclopedia of Genes and Genomes
    • Link off a structure in ChemSpider
        • Chemical suppliers
        • Other publications
        • Analytical Data
        • Related Reactions
        • Wikipedia
        • Patents
        • “Everything”
    • Links to Patents based on structure
    • Pubmed Articles Linked
    • Answering Questions for Chemists
      • Questions a chemist might ask…
        • What is the melting point of n-butanol?
        • What is the chemical structure of Xanax?
        • Chemically, what is phenolphthalein?
        • What are the stereocenters of cholesterol?
        • Where can I find publications about xylene?
        • What are the different trade names for Ketoconazole?
        • What is the NMR spectrum of Aspirin?
        • What are the safety handling issues for Thymol Blue?
    • Complex Data and Information
    • ChemSpider Searches
    • ChemSpider Complex Searches
    • Chemistry on the Internet
      • Much of the information is based on assertions: user beware!
      • The quality of the available information varies widely; how does the user know what is and is not “correct”?
    • Question Everything online: www.dhmo.org
    • Vancomycin
    • Vancomycin on ChemSpider
    • Vancomycin
    • Vancomycin: Search Molecular SKELETON vs. Search Full Molecule
    • Full Skeleton Search: 104 Hits
    • Full Molecule Search: 4 Hits
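One way to picture the difference between the two searches is with InChIKeys. The first 14-character block of an InChIKey encodes the connectivity (the skeleton), and the second block covers the remaining layers such as stereochemistry. The sketch below assumes that a skeleton search matches on the first block only. The slides do not say how ChemSpider implements these searches, and the keys are made-up placeholders rather than real compounds.

```python
# Hypothetical InChIKeys: the first two share a skeleton block but differ
# in the second block (e.g. stereochemistry); the third is unrelated.
HYPOTHETICAL_KEYS = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "ZZZZZZZZZZZZZZ-BBBBBBBBSA-N",
]


def skeleton_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]


def full_molecule_search(query: str, keys: list) -> list:
    """Exact match on the whole key."""
    return [k for k in keys if k == query]


def skeleton_search(query: str, keys: list) -> list:
    """Match on the connectivity block only, ignoring other layers."""
    block = skeleton_block(query)
    return [k for k in keys if skeleton_block(k) == block]
```

A skeleton search therefore returns every record that shares the connectivity, including variants with missing or conflicting stereochemistry. This is why it yields many more hits than the full-molecule search.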
    • The InChI “Resolver”
    • The EXPERTS must get it right?!
    • Wikipedia, C&E News, PubChem
      • C&E News (from ACS)
    • Feedback from Steve Ritter
      • “Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
      • “It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”
    • What About Digitonin?
    • CAS as an authority
    • The Blogging Community Participates
    • Collaborative Knowledge Management
    • Assertion and Chemical Entities
      • Who says what Taxol is?
      • What is the “timeline” for a molecule?
      • How do we clean up the Public data?
    • Crowd-sourcing Chemistry Curation
      • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
    • Multi-level Curation and Approval
    • DailyMed
    • The Intention
      • Make DailyMed structure searchable via ChemSpider
      • In the process curate data on ChemSpider and validate data on DailyMed
      • Improve the curation platform on ChemSpider
      • Perform markup of DailyMed articles to enhance the reading experience
    • Poor Representations
    • Lack of Stereochemistry
    • Simply Wrong
    • Missing Fragments?
    • Hmmmm….
    • Does it Matter?
      • Does it matter to the consumer that the structures are wrong? No… what matters is that what is in the bottle is the right medication!
      • To make DailyMed structure searchable it DOES matter
      • To data mine DailyMed it matters
      • To mark up DailyMed it matters
    • The Process
      • Import all XML files from DailyMed
      • Use “Home built” entity extraction based on our dictionary of chemical names
      • Articles online here:
        • http://www.chemspider.com/DailyMed.aspx
        • Example Article: http://www.chemspider.com/DailyMedArticle.aspx?id=2
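The import-and-extract step above can be sketched as parsing a label's XML, flattening it to text, and matching it against a dictionary of chemical names. This is a minimal illustration under stated assumptions. The tag names in `SAMPLE_XML`, the three-entry `CHEMICAL_NAMES` list and the `extract_names` helper are invented for this example and do not reflect DailyMed's real XML schema or the "home built" extractor.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary of chemical names (the real one is far larger).
CHEMICAL_NAMES = ["tolazamide", "aspirin", "sodium chloride"]

# Invented sample document; not the real DailyMed label format.
SAMPLE_XML = (
    "<label><title>Tolinase</title>"
    "<paragraph>Tolazamide tablets. Patients taking aspirin "
    "should consult a physician.</paragraph></label>"
)


def build_pattern(names):
    """Compile one case-insensitive regex; longest names are tried first."""
    ordered = sorted(names, key=len, reverse=True)
    body = "|".join(re.escape(n) for n in ordered)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)


def extract_names(xml_text, names=CHEMICAL_NAMES):
    """Return dictionary names found in the XML text, in order of appearance."""
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())
    found = []
    for match in build_pattern(names).finditer(text):
        name = match.group(1).lower()
        if name not in found:
            found.append(name)
    return found
```

Calling `extract_names(SAMPLE_XML)` finds the two dictionary names but not the brand name "Tolinase", because it is absent from the dictionary. The quality of this kind of extraction depends entirely on the curated name list.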
    • State of the Data
    • Tolinase: DailyMed on ChemSpider
    • OTHER Mentioned Chemicals
    • One Name – Multiple Structures
      • NO Stereo
      • Full Stereo
      • Partial Stereo
      • Partial Stereo
    • Editing a Record
      • Do NOT deprecate record…remove association between name and chemical structure
    • Partial Stereochemistry
    • Loop of Assertions
      • Reduce to ONE structure – with full explicit stereo
    • How bad can it get??? And who is right????
    • Name-Structure Pairs
      • Cleaning up the associations of names and structures is torturous and time-consuming
      • Decisions get made and can be challenged
      • Names are not “removed” …they are still on the database
      • Such a curated “dictionary” is very valuable
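The curation model described above has three parts: associations are judged rather than deleted, decisions can be challenged, and the approved pairs form a dictionary. The sketch below shows one possible data structure for it. The class, field names and status values are assumptions for illustration and are not ChemSpider's actual schema.

```python
from dataclasses import dataclass, field

VALID_STATUSES = {"unreviewed", "approved", "rejected"}


@dataclass
class NameStructurePair:
    """One asserted link between a name and a structure record."""
    name: str
    structure_id: str  # hypothetical record identifier
    status: str = "unreviewed"
    history: list = field(default_factory=list)

    def decide(self, curator: str, status: str, reason: str = "") -> None:
        """Record a curation decision; earlier decisions stay in history."""
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.history.append((curator, self.status, status, reason))
        self.status = status


def approved_dictionary(pairs):
    """Build the curated name -> structure ids dictionary from approved pairs."""
    result = {}
    for pair in pairs:
        if pair.status == "approved":
            result.setdefault(pair.name.lower(), []).append(pair.structure_id)
    return result
```

Rejecting a pair removes it from the curated dictionary but leaves both the pair and the structure record in place. A later curator can therefore challenge the decision, and the full trail of who decided what is kept.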
    • ChemMantis
      • Chemical Markup And Nomenclature Transformation Integrated System – ChemMantis
      • A platform for entity extraction for chemistry documents, markup and integration to online information sources – Wikipedia, ChemSpider, Entrez…
      • Web-based submission, markup and publishing platform
    • Name-Structure Pairs
    • Species – linked to Wikipedia
    • Semantic Linking of Structures
      • What would you want to link off a structure?
        • Chemical suppliers
        • Other publications
        • Analytical Data
        • Related Reactions
        • Wikipedia
        • Patents
        • “Everything”
    • Outlinks…
    • Back to DailyMed
    • The Difference…
    • ChemMantis Markup
    • Species Markup
    • Dictionaries are Easily Enhanced
      • Copy-Paste into appropriate Entity Dictionary
      • Impacts all future markups
      • Expanding knowledgebases of information
      • Linked out to rich sources of information
    • Expand Dictionaries
    • Where To From Here?
      • The platform is built…it’s all eyeballs for curation now
      • As structure-identifier pairs are curated DailyMed will improve
      • The project is now on hold – no resources to continue
    • If We Had Our Way…
      • Convert every DailyMed Label to a ChemMantis marked up document
      • Use the XML segregation of the Tablet Labels to tag where chemicals are in the label
      • Allow data mining based on “where” in a label the chemicals are, e.g. drug-drug interactions
      • Markup and mine property data out of the labels using new dictionaries related to properties such as IC50 and toxicity
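Mining by "where" in a label a chemical appears could look roughly like the sketch below, which groups dictionary hits by the section that contains them. The `section` element, its `title` attribute and the sample text are invented for this illustration. The real label XML is structured differently, and nothing here reflects an implemented ChemMantis feature.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample label with hypothetical section markup and filler text.
SAMPLE_LABEL = """<label>
  <section title="Description">Tolazamide is the active ingredient.</section>
  <section title="Drug Interactions">Aspirin is mentioned here together with tolazamide.</section>
  <section title="Storage">Store at room temperature.</section>
</label>"""


def chemicals_by_section(xml_text, names):
    """Map each section title to the sorted dictionary names found in it."""
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    result = {}
    for section in ET.fromstring(xml_text).iter("section"):
        text = " ".join(section.itertext())
        hits = sorted({m.group(1).lower() for m in pattern.finditer(text)})
        if hits:  # sections without any dictionary hit are omitted
            result[section.get("title", "untitled")] = hits
    return result
```

With an index like this, a query such as "which labels mention aspirin in their interactions section" becomes a simple lookup rather than a full-text search.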
    • Citizen Scientists
    • Become a Data Source
    • Synthesis Procedures
    • Links to Data or Deposit Data
    • ChemSpider Everywhere: Embed
    • ChemSpider Everywhere: Spectral Game
    • ChemSpider Everywhere: Crowdsourced Curation of Spectra
    • ChemSpider Everywhere: ChemMobi
    • ChemSpider Web Services
    • Linked Data on the Web Taken from: Rafael Sidis’ Blog
    • Linking to Resources
      • Linking to resources by structures or name
      • Example integration: HSDB or ChemIDPlus
    • It’s a long road ahead…
    • Conclusions
      • The internet enables chemistry, at a reduced cost
      • Web 2.0 is here and improving quality
      • Question Quality! All data sources are imperfect – some more imperfect than others
      • Crowdsourcing to expand, curate and integrate
      • Structures submitted to DailyMed need checking
    • Thank you
      • [email_address]
      • Twitter: ChemSpiderman
      • www.chemspider.com/blog
      • SLIDES: www.slideshare.net/AntonyWilliams