
Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry

There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with the growing availability of Open Source software tools, we are in the middle of a revolution in data availability and in the tools to manipulate these data. However, freedom has a cost, and in many cases that cost is quality. ChemSpider is a free access website built with the intention of providing a structure centric community for chemists. As an aggregator of chemistry-related information from many sources (at present over 21.5 million unique chemical entities from over 150 separate data sources), ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. This presentation will provide an overview of the issue of quality in many chemistry-related databases, approaches to cleaning up the data, and how a curated platform can become the centralized hub for resourcing information about chemical entities, including experimental and predicted properties, analytical data, publications, suppliers and integrated databases. I will detail three efforts: 1) the curation of chemistry on Wikipedia; 2) an examination of structure integrity on the FDA's DailyMed website, which hosts medication content and labeling as found in medication package inserts; and 3) recognizing chemical names in documents and providing a platform for structure-based searching of Open Access chemistry literature.

  1. Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry (Antony Williams, Bio-IT World 2009)
  2. Building a Structure Centric Community for Chemists: Linked Data Cloud
  3. Chemistry on the Internet
     • Much of the information online is “User Beware!”
     • The quality of information is “diverse”
     • Technologies can “link and connect” information, but validation and curation are key to providing quality
     • The Linked Data web is of less value when the data linked are “wrong”
  4. Quality Costs
     • Chemical Abstracts Service (CAS), a division of the American Chemical Society (ACS), is the “Gold Standard” for chemistry-related information
     • 101 years of content, $260 million revenue (2006), >40 million substances and 60 million sequences
     • But online…
  5. What is “wrong”?
  6. A platform for:
     • Data deposition, curation and annotation
     • Supporting Open Notebook Science efforts
     • Chemistry document mark-up with ChemMantis
     • The Open Access ChemSpider Journal of Chemistry
  7. Search Cholesterol
  8. Search Cholesterol
  9. Search Cholesterol
  10. Search Cholesterol
  11. Search Cholesterol
  12. Search Cholesterol
  13. Complex Data and Information
  14. Online Data
     • Many websites host structure-based information
     • Question quality!
  15.
  16. Wikipedia, C&E News, PubChem
     • C&E News (from ACS)
  17. Does one stereocenter matter?
  18. Vancomycin
     • Who will curate?
     • PubChem is not resourced to clean these errors
     • How would you clean such a large dataset?
  19. Vancomycin
     • ChemSpider: 1 compound – 3 days
  20. Question Everything: www.dhmo.org
  21. DailyMed
     • “DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”
  22. The FDA’s DailyMed
  23. Structures on DailyMed: Poor Representations
  24. Structures on DailyMed: Lack of Stereochemistry
  25. Incorrect Structures: Scanning (?) Issues
  26. Incorrect Structures
  27. Does it Matter?
     • Does it matter to the consumer that the structures are wrong? No… what matters is that the medication in the bottle is the right one!
     • To make DailyMed structure-searchable it DOES matter
     • To data-mine DailyMed it matters
     • To mark up DailyMed it matters
  28. Collaborative Knowledge Management for Chemists
  29. Wikipedia Links to DrugBank
  30. Taxol on PubChem
  31. Taxol on DailyMed
  32. The InChI Identifier
  33. Multiple Layers (Source: Unofficial InChI FAQ page)
  34. InChI Strings Hash to InChIKeys
  35. InChIs for Taxol
  36. Back to Taxol
     • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
     • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
     • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
     • Which one is correct?
  37. InChIKeys for Taxol
     • DrugBank: RCINICONZNJXQF-CLDWUXIMDD
     • ChEBI: RCINICONZNJXQF-GXKQXQCDDN
     • Wikipedia: RCINICONZNJXQF-MZXODVADBJ
     • ChEBI and Wikipedia are the SAME structure
     • DrugBank is a DIFFERENT structure: ONE stereocenter differs
  38. The InChI Resolver
  39.
  40. Coming Soon… Linked Articles
  41. How bad can it get? And who is right?
  42. ChemMantis
     • Chemical Markup And Nomenclature Transformation Integrated System (ChemMantis)
     • A platform for entity extraction from chemistry documents, markup and integration with online information sources: Wikipedia, ChemSpider, Entrez…
     • Web-based submission, markup and publishing platform, now hosting the ChemSpider Journal of Chemistry
  43. ChemMantis Markup
  44. Enable Electronic Articles…
     • Structures are the language of chemistry
     • Show structures to chemists and search/link from there…
  45. Species Markup
  46. Dictionaries are Easily Enhanced
     • Copy-paste into the appropriate entity dictionary
     • Impacts all future markups
     • Expanding knowledgebases of information
     • Linked out to rich sources of information
  47. Build Dictionaries; Ontologies Next
  48. Outlinks…
  49. Publishers and Document Mark-Up
  50. ChemSpider Everywhere
     • Linked from Wikipedia
     • Linked from Open Notebook Science sites using EMBED
     • Linked from blogs using Structure/Spectra EMBED
     • Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw and Open Source applets
     • Integrated with software offerings from Thermo, Waters, Agilent and Bruker
  51. ChemSpider Everywhere: Embed Functionality (like YouTube)
  52. ChemSpider Everywhere: www.spectralgame.com
  53. ChemSpider Everywhere: Crowdsourced Curation of Spectra
  54. ChemSpider Everywhere: RSC Compounds
  55. ChemSpider Everywhere: Nature Chemistry
     • Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Those compounds are linked out to other information resources, including PubChem and ChemSpider.
  56. ChemSpider Everywhere: ChemMobi
  57. Structure RSS Feeds with InChIs
  58.
  59. Acknowledgments
     • Richard Kidd, Royal Society of Chemistry
     • Jason Wilde, Nature Publishing Group
     • Martin Walker and the Wikipedia Chemistry team
     • Microsoft – Rudy Potenzone
     • Symyx – Keith Taylor and James Jack
     • SureChem – Nicko Goncharoff
     • Spectral game – Andrew Lang and Jean-Claude Bradley
     • “The InChI team and Advisory Group”
  60. Conclusions
     • www.chemspider.com
     • www.chemspider.com/journal
     • InChIs and Internet Chemistry: http://inchis.chemspider.com
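
Slide 33 notes that an InChI is organized in multiple layers. As a minimal sketch, assuming only the layer prefixes listed in the code (a subset of those the InChI specification defines), the layers can be separated by splitting on "/" and reading each segment's one-letter prefix. The function `split_inchi_layers` and the `LAYER_NAMES` table are hypothetical, written for illustration only; real work is better served by a proper InChI toolkit.

```python
# Hypothetical lookup table: a subset of InChI layer prefixes mapped to readable names.
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
}


def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into named layers (illustrative sketch only)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    version, *segments = inchi[len(prefix):].split("/")
    layers = {"version": version}
    if segments:
        # The first segment after the version has no prefix letter: it is the formula.
        layers["formula"] = segments[0]
    for segment in segments[1:]:
        if not segment:
            continue  # tolerate empty segments such as a trailing "/"
        name = LAYER_NAMES.get(segment[0], segment[0])  # unknown prefixes keep their letter
        body = segment[1:]
        if name in layers:
            # Simplification: a repeated prefix (e.g. in the isotopic sublayer) is appended.
            layers[name] += "/" + body
        else:
            layers[name] = body
    return layers
```

For ethanol, `split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")` returns the version `1S`, the formula `C2H6O`, the connectivity layer `1-2-3` and the hydrogen layer `3H,2H2,1H3`. Because "/" is the layer delimiter, a plain split is enough to separate the layers; the sketch does not attempt to validate the layer contents.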
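
Slides 36 and 37 compare the Taxol InChIKeys taken from three sources. In the InChIKey scheme, the first 14-character block is a hash of the skeleton (connectivity) layers and the second block a hash of the remaining layers, including stereochemistry. A shared first block with a differing second block therefore points to the same skeleton with differences such as stereochemistry. A minimal Python sketch of that check follows; `compare_inchikeys` is a hypothetical helper, and because the blocks are hashes, matching blocks strongly suggest, but do not strictly prove, identical layers.

```python
def compare_inchikeys(key_a: str, key_b: str) -> str:
    """Classify two InChIKeys by their first (skeleton) block (illustrative sketch)."""
    a = key_a.strip().upper()
    b = key_b.strip().upper()
    if a == b:
        return "identical"
    # Everything before the first hyphen is the 14-character skeleton hash.
    skeleton_a = a.split("-", 1)[0]
    skeleton_b = b.split("-", 1)[0]
    if skeleton_a != skeleton_b:
        return "different skeleton"
    # Same skeleton hash: the difference lies in the later blocks (e.g. stereochemistry).
    return "same skeleton, other layers differ"


# DrugBank and Wikipedia keys for Taxol, as listed on slides 36 and 37.
print(compare_inchikeys("RCINICONZNJXQF-CLDWUXIMDD", "RCINICONZNJXQF-MZXODVADBJ"))
# prints: same skeleton, other layers differ
```

Such a check can flag records that disagree only in stereochemistry, but it cannot say which key is correct; that still takes a curator looking at the structures.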
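
Slides 42 to 46 describe ChemMantis marking up chemical names in documents against entity dictionaries, where adding a term to a dictionary affects all future markups. The sketch below is not ChemMantis's implementation, which the slides do not describe. It is a minimal dictionary lookup in Python that wraps known terms in HTML links and prefers the longest match. The function `mark_up` and the example.org URLs are hypothetical.

```python
import html
import re


def mark_up(text: str, dictionary: dict) -> str:
    """Wrap dictionary terms found in text in HTML links (illustrative sketch).

    dictionary maps a term (matched case-insensitively) to the URL it links to.
    Word-boundary matching assumes each term starts and ends with a letter or digit.
    """
    lookup = {term.lower(): url for term, url in dictionary.items() if term}
    if not lookup:
        return html.escape(text)
    # Longest terms first, so "cholesterol oxidase" wins over "cholesterol".
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    pieces, last = [], 0
    for match in pattern.finditer(text):
        pieces.append(html.escape(text[last:match.start()]))  # plain text before the hit
        url = lookup[match.group(0).lower()]
        pieces.append(f'<a href="{html.escape(url)}">{html.escape(match.group(0))}</a>')
        last = match.end()
    pieces.append(html.escape(text[last:]))  # plain text after the last hit
    return "".join(pieces)


compounds = {"cholesterol": "https://example.org/compounds/cholesterol"}
print(mark_up("Taxol and cholesterol", compounds))  # only "cholesterol" is linked
compounds["Taxol"] = "https://example.org/compounds/taxol"  # enhance the dictionary
print(mark_up("Taxol and cholesterol", compounds))  # now both names are linked
```

The second call shows the point of slide 46: the document is unchanged, but the enhanced dictionary changes the markup. Names containing brackets, commas or other punctuation, as many systematic names do, would need tokenization rules that this word-boundary approach does not provide.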