Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Antony Williams

A Talk delivered at both UNC Chapel Hill and Drexel University


There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with the increasing availability of Open Source software tools, this puts us in the middle of a revolution in data availability and in the tools to manipulate these data. However, freedom has a cost, and in many cases the cost is quality. ChemSpider is a free-access website for chemists built with the intention of providing a structure-centric community for chemists. As an aggregator of chemistry-related information from many sources, at present over 21.5 million unique chemical entities from over 150 separate data sources, ChemSpider has taken on the task of both robotically and manually curating publicly available data sources. This presentation will provide an overview of the issue of quality in many chemistry-related databases, approaches to cleaning up the data, and how a curated platform can become the centralized hub for sourcing information about chemical entities. This includes experimental and predicted properties, analytical data, publications, suppliers and integrated databases. I will detail three efforts: 1) the curation of chemistry on Wikipedia; 2) an examination of structure integrity on the FDA DailyMed website, a website of medication content and labeling as found in medication package inserts; and 3) recognizing chemical names in documents and providing a platform for structure-based searching of Open Access chemistry literature.

  1. Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry, Antony Williams
  2. Imagine a time when…
       • The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
       • Chemistry articles are indexed and searchable by a free online service
       • Publicly funded research data can be shared and discussed in the open, maybe as ONS (Open Notebook Science)?
       • Cheminformatics has as much of a public face as bioinformatics, and protectionism of data related to “structures” has waned
  3. ChemSpider – A Search Engine for Chemists
       • Questions a chemist might ask…
           • What is the melting point of n-butanol?
           • What is the chemical structure of Xanax?
           • Chemically, what is phenolphthalein?
           • What are the stereocenters of cholesterol?
           • Where can I find publications about xylene?
           • What are the different trade names for Ketoconazole?
           • What is the NMR spectrum of Aspirin?
           • What are the safety handling issues for Thymol Blue?
           • ChemSpider can answer all of these questions
  4. ChemSpider Data Content
       • Over 21.5 million unique chemical structures from ca. 150 data sources
           • Online databases – PubChem, DrugBank, KEGG, Wikipedia
           • Literature – PubMed, J Het Chem, Nature, RSC, Open Access
           • Chemical vendors – over 40 different vendors and growing
           • Personal depositions – individual contributions
           • Content database vendors
           • Analytical data collections
           • Patents
           • Web scraping
           • Content is linked back to the original data sources
  5. Tell me about Aspirin
  6. Tell me about Aspirin
  7. Link outs
  8. Links out to KEGG (Kyoto Encyclopedia of Genes and Genomes)
  9. Tell me about Aspirin
  10. Tell me about Aspirin
  11. Tell me about Aspirin
  12. Tell me about Aspirin
  13. Tell me about Aspirin
  14. Text-Indexing and ChemSpider?
       • ChemSpider text-indexes almost 500,000 Open Access and Free Access articles
       • The collection is growing and more publishers have already agreed. Theses will be included in the future.
  15. Open Access Literature Search
  16. Search PubMed – ChemSpider
  17. Other Searches
       • What compounds have a mass of 300 ± 0.001?
       • Or search a combination of intrinsic/predicted properties
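The mass question on slide 17 amounts to a tolerance-window filter. The sketch below is only an illustration of that idea, not ChemSpider's implementation: the `Compound` record, the `search_by_mass` helper and the three sample masses are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record; a real database row would carry far more fields."""
    name: str
    monoisotopic_mass: float  # daltons


def search_by_mass(compounds, target, tolerance):
    """Return the compounds whose mass lies within target +/- tolerance."""
    return [c for c in compounds if abs(c.monoisotopic_mass - target) <= tolerance]


# Placeholder records: the masses are made up to exercise the window,
# they are not curated values for real compounds.
records = [
    Compound("compound A", 300.0004),
    Compound("compound B", 300.0020),
    Compound("compound C", 299.9993),
]

hits = search_by_mass(records, target=300.0, tolerance=0.001)
print([c.name for c in hits])
```

Combining this with other intrinsic or predicted properties would simply add further conditions to the same filter.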
  18. Other Searches
  19. Complex Search
  20. The Quality of Data Online…
       • Aggregating data opens up quality issues
       • Structure-identifier associations are “dirty”
       • Structures are COMMONLY incorrect
       • Manual curation of small databases is enough work – what about millions of structures?
       • Structures are far from perfect. What is a “correct structure”?
           • Full stereochemistry?
           • Historical timeline of structure?
           • Who is the authority?
  21. Who holds THE Quality Authority?
       • Chemical Abstracts Service is the structural authority today: 1400 (?) employees, the world standard in chemistry information
       • 101 years of knowledge, process and expertise
       • MANUAL curation is key. Robotic curation is enabling – e.g. a “chloride” with no Cl
       • How can an online, free-access system peacefully co-exist with the authority?
  22. Quality is a Major Issue – Search Butanol (old example, now fixed)
  23. Wikipedia Chemistry Curation project
       • Only ca. 5000 organic structures, 7000 total structures
       • Almost a year of work so far for a team of 6 people
       • Many errors removed in the process. Curation is a daily event for users/depositors
       • Slow and torturous process
       • http://en.wikipedia.org/wiki/Talk:Tacrolimus#IUPAC_Name_and_structure
  24. Wikipedia Curation
       • Looking for self-consistency across a Wikipedia page
       • Primary key is the article TITLE
       • The chemical shown needs to match the title; then registry numbers, names, identifiers and outlinks need to match the chemical shown
       • Cyclic self-consistency – and decisions must be made
  25. Viagra
       • Viagra is sildenafil…
  26. Viagra is Sildenafil Citrate
       • Sildenafil may be shown in the Wikipedia record, but there might be a redirect from a search on Viagra
       • CAS Registry Numbers: 139755-83-2, 171599-83-0 (as citrate). If the structure box shows the neutral compound then the CAS Number must match the neutral compound OR be annotated
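One cheap robotic check that fits the registry-number consistency theme of slides 24 to 26 is validating the CAS check digit. This is not described in the talk; it is offered as a minimal sketch, and the helper name `is_valid_cas` is an assumption. The rule itself is the standard one: weight the digits 1, 2, 3, … from right to left (excluding the check digit), sum, and take the result modulo 10. A valid check digit only shows that a number is well formed; it cannot tell whether the number belongs to the neutral compound or to the salt.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string is a well-formed CAS number with a correct check digit."""
    match = CAS_PATTERN.match(cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Rightmost digit gets weight 1, the next weight 2, and so on.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


for number in ("139755-83-2", "171599-83-0", "139755-83-3"):
    print(number, is_valid_cas(number))
```

Both numbers quoted on the slide pass the check; the third, with an altered final digit, does not.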
  27. Other issues…
       • If the structure shows no stereo then don’t put stereo in the name
       • Outlinks to external databases – links should be structure to structure, not name to name
       • Tautomers
  28. Charges
  29. Sugars – Machine Readable vs Aesthetics: Fischer, Stereo, Haworth
  30. Wikipedia – Crowdsourcing Chemistry
  31. Thymol Blue on ChemSpider
       • Data online includes:
           • UV-vis spectrum
           • Measured experimental properties
           • Link to Wikipedia article
           • Links to chromatography details
           • Multiple identifiers/trade names etc.
           • Links to vendors/suppliers/other databases
           • Safety information
           • http://www.chemspider.com/q/thymol%20blue
  32. Differences between ChemSpider/Wikipedia
       • Wikipedia: ~5000 organics, 2000 others. ChemSpider: >21 million unique structures
       • Wikipedia: Text. ChemSpider: Complex queries – properties, text, structure/substructure, OA publishers, data sources, …
       • Wikipedia: Detailed compound monographs. ChemSpider: Compound monographs linked
       • Wikipedia: ????. ChemSpider: 6000 people/day; 1900 registered
       • Wikipedia: No. ChemSpider: Prediction of properties
       • Wikipedia: Active editors > 50 (?). ChemSpider: Active depositors/curators – 30
       • Wikipedia: No, but links. ChemSpider: Analytical data
  33. Differences between Wikipedia/ChemSpider
       • ChemSpider: Primarily Microsoft .NET technologies with OS components. Wikipedia: Supported by the tried and tested MediaWiki platform
       • ChemSpider: “Out of a basement” on three servers and 5 volunteers. Wikipedia: Established infrastructure and Wikimedia Foundation team
       • ChemSpider: Growing team of advocates, curators and users. Wikipedia: Strong team of WP:Chem advocates, curators and admins
       • ChemSpider: Mixed “licensing”. Wikipedia: GFDL licensing for everything
       • ChemSpider: Chemistry is the focus of the ‘Spider. Wikipedia: Chemistry is a subset of the ‘Pedia
       • ChemSpider: Growing reputation as focused on quality. Wikipedia: Worldwide reputation as quality source – good and bad
  34. Usage Growth – Representative Plot
  35. Crowd-sourcing Curation
       • How to curate data for millions of structures?
       • Robot processes can clean up depositions
           • Search for “chloride” and check the molecular formula for Cl
           • Check for stereochemistry and remove names with stereo
       • Provide a simple-to-use platform to curate, annotate and tag data
       • Provide curator administration to prevent vandalism (Veropedia)
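The “chloride with no Cl” robot check on slide 35 can be sketched in a few lines. This is an illustrative guess at how such a rule might look, not the production rule: `element_counts` and `flag_chloride_mismatch` are hypothetical helpers, the formula parser handles only simple formulae without brackets or charges, and the third record carries a deliberately wrong formula.

```python
import re


def element_counts(formula: str) -> dict:
    """Count elements in a simple formula such as 'C7H7Cl' (no brackets or charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def flag_chloride_mismatch(records):
    """Return the names that mention 'chloride' although the formula contains no Cl."""
    flagged = []
    for name, formula in records:
        if "chloride" in name.lower() and element_counts(formula).get("Cl", 0) == 0:
            flagged.append(name)
    return flagged


records = [
    ("sodium chloride", "ClNa"),
    ("benzyl chloride", "C7H7Cl"),
    ("methylene chloride", "CH2"),  # deliberately wrong formula, for illustration
]
print(flag_chloride_mismatch(records))
```

A rule like this only tags candidates for a curator to review; it does not decide whether the name or the structure is the part that is wrong.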
  36. Multi-level Curation and Approval
  37. Post Comments
       • Anyone can “Post Comments” associated with a structure. To curate data we require a login so that changes can be tracked
  38. Crowd-sourcing Chemistry
       • Crowd-sourced curation: identify and tag errors, edit names and synonyms, identify records for deprecation
       • ALSO
       • Crowd-sourced deposition: anyone can deposit data (structures, text, images, analytical data)
  39. Structure-Centric
       • We want to search Open Access articles by structure, substructure and similarity of structure
       • Standard approaches would be:
           • Identify chemical names (“entity extraction”)
           • Convert chemical names to structures and index
       • ChemSpider has a validated dictionary of structure-name pairs
       • Use name extraction, name conversion and dictionary look-up. THEN curate.
  40. “Entity Extraction”
       • Rule-based recognition of systematic names:
           • Use a lexeme of name fragments
           • Rules for identifying bounds of a name
       • Look-up dictionary:
           • Drug names
           • Trivial names
           • Numbers: Registry IDs, EINECS/ELINCS/Beilstein IDs
           • Massive look-up dictionary of validated identifiers on ChemSpider
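The two routes on slide 40, dictionary look-up and rule-based recognition from name fragments, can be combined in a toy extractor. Everything below is an assumption made for illustration: the three-entry `LOOKUP` table, the short `FRAGMENTS` list and the bounds rule (adjacent fragment-built tokens are merged into one name) are stand-ins, and a real lexeme set and rule base would be far larger.

```python
import re

# Hypothetical miniature dictionary: identifier -> record key.
LOOKUP = {"aspirin": "rec-1", "50-78-2": "rec-1", "dichloromethane": "rec-2"}

# Hypothetical fragment list standing in for the lexemes of systematic names.
FRAGMENTS = ["di", "tri", "meth", "eth", "prop", "but", "an", "ane", "ol", "one",
             "methyl", "ethyl", "phenyl", "chloro", "amino"]


def is_systematic(token: str) -> bool:
    """True if the token, ignoring locants and brackets, splits into >= 2 fragments."""
    word = re.sub(r"[\d,\-()]", "", token.lower())
    best = [-1] * (len(word) + 1)  # best[i]: most fragments covering word[:i]
    best[0] = 0
    for i in range(len(word)):
        if best[i] < 0:
            continue
        for frag in FRAGMENTS:
            if word.startswith(frag, i):
                j = i + len(frag)
                best[j] = max(best[j], best[i] + 1)
    return best[-1] >= 2


def extract_entities(text: str):
    """Return (kind, name) pairs, merging adjacent systematic tokens into one name."""
    entities, run = [], []

    def flush():
        if run:
            entities.append(("systematic", " ".join(run)))
            run.clear()

    for token in re.findall(r"[A-Za-z0-9,()\-]+", text):
        cleaned = token.strip(",()-").lower()
        if cleaned in LOOKUP:
            flush()
            entities.append(("dictionary", cleaned))
        elif is_systematic(token):
            run.append(token.strip(","))
        else:
            flush()
    flush()
    return entities


sample = ("The mixture was washed with dichloromethane, then "
          "(3,4-diaminophenyl)phenyl methanone and aspirin were added.")
print(extract_entities(sample))
```

The dictionary is consulted first so that validated identifiers win over the fragment rule. With so small a fragment list the rule both misses real names and would accept some ordinary words, which is why the slides follow extraction with curation.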
  41. Name-to-Structure and Lexemes
  42. Name Recognition
       • Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and a excess of anhydrous MgSO4 (2.00 g, 16.67 mmol).
       • The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid which was recrystalized from ethanol 95% (1.28 g, 91 %)
  43. Name Recognition
       • Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and a excess of anhydrous MgSO4 (2.00 g, 16.67 mmol).
       • The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid which was recrystalized from ethanol 95% (1.28 g, 91 %)
  44. How Many Chemical Names?
       • “She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip and causing him to recoil. He went home and took an aspirin after the beating.”
  45. How Many Chemical Names?
       • “She had the drive to derive success in any venture and was well versed in Karate. When the man in the tartan shirt approached her with a dagger in his hand she spat in his face, took the stance of a commando and took advantage of his shock to release the dagger from his grip and causing him to recoil. He went home and took an aspirin after the beating.”
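The passage on slides 44 and 45 shows why a large dictionary of trade names and synonyms produces false positives in ordinary prose. The sketch below assumes, from the spacing left in the transcript, that the second version of the slide highlighted words such as drive, success, versed, Karate and recoil. The `DICTIONARY` set is hypothetical and makes no claim about which products those words actually name.

```python
import re

# Hypothetical dictionary entries: everyday words that might also be
# registered names in a large look-up dictionary.
DICTIONARY = {"drive", "success", "versed", "karate", "recoil", "aspirin"}


def dictionary_hits(text: str):
    """Return every whole word in the text that matches a dictionary entry."""
    return [w for w in re.findall(r"[A-Za-z]+", text) if w.lower() in DICTIONARY]


passage = (
    "She had the drive to derive success in any venture and was well versed in "
    "Karate. When the man in the tartan shirt approached her with a dagger in his "
    "hand she spat in his face, took the stance of a commando and took advantage "
    "of his shock to release the dagger from his grip and causing him to recoil. "
    "He went home and took an aspirin after the beating."
)
print(dictionary_hits(passage))
```

Whole-word matching already avoids hits inside other words (“derive” is not flagged for “drive”), yet only one of the matches is meant as a chemical in context, so the output still needs a human check.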
  46. ChemMantis
       • Chemical Markup And Nomenclature Transformation Integrated System
  47. Making Open Access Articles Searchable: Proof of Concept
       • Can we HOST chemistry Open Access articles on ChemSpider and add value?
       • Can we identify chemical names in Open Access articles in a user-friendly manner?
       • Can we convert names to structures in Open Access articles, expand ChemSpider and provide structure searching of Open Access chemistry articles?
       • Can we provide an environment for chemists to mark up their own articles and crowd-source markup of an archive?
  48. Document markup
       • ChemSpider is now hosting Open Access articles from MDPI (Molecular Diversity Preservation International)
       • Hosting the Molbank collection at present
  49. A Standard for Document Markup?
       • NLM-DTD: National Library of Medicine Document Type Definition
       • Approved markup definitions to apply to journal articles, extended as necessary for our purposes
  50. NLM/DTD markup
  51. Chemistry and Biology
       • Menus can be extended as necessary
  52. Document markup
  53. Markup – 3 seconds!
  54. On the fly conversion
  55. Shorthand Formulae Supported
  56. Curators’ Tools During Markup
  57. One Click to more Info…
  58. Coming Soon…
  59. Structure Image Conversion
  60. Two Seconds Later
  61. Not Always Perfect…
  62. A Platform for Markup
       • Can we provide a platform for document markup for chemists?
       • Workflow:
           • Upload Word docs or RTF files, or point to HTML and load
           • Apply entity extraction, convert names to structures, mark up automatically and ask for user participation
           • Publish the final version with NLM-DTD markup
           • Deposit all structures on ChemSpider under embargo and wait for the article DOI to release
  63. Challenges
       • Computer software can generate chemical names better than the majority of chemists
       • The majority of chemical names are generated by humans, and many are incorrect: they convert to the wrong structure or are ambiguous
       • One name, multiple structures
  64. Names and Structures
       • Dichloroacetone
       • Trichloromethylsilane
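The “one name, multiple structures” problem on slides 63 and 64 can be surfaced by grouping a name-structure dictionary by name. The sketch below is an assumption-laden illustration: the SMILES strings are written by hand for the two readings of each name (1,1- versus 1,3-dichloroacetone; trichloro(methyl)silane versus (trichloromethyl)silane), and `ambiguous_names` is a hypothetical helper that compares SMILES as plain text, which is only sound if the strings have already been canonicalised.

```python
from collections import defaultdict

# (name, SMILES) pairs; hand-written SMILES, assumed to be in one canonical form.
PAIRS = [
    ("dichloroacetone", "CC(=O)C(Cl)Cl"),            # 1,1-dichloroacetone
    ("dichloroacetone", "ClCC(=O)CCl"),              # 1,3-dichloroacetone
    ("trichloromethylsilane", "C[Si](Cl)(Cl)Cl"),    # trichloro(methyl)silane
    ("trichloromethylsilane", "ClC(Cl)(Cl)[SiH3]"),  # (trichloromethyl)silane
    ("aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
]


def ambiguous_names(pairs):
    """Map each name linked to more than one structure to its sorted structures."""
    structures = defaultdict(set)
    for name, smiles in pairs:
        structures[name.lower()].add(smiles)
    return {name: sorted(found) for name, found in structures.items() if len(found) > 1}


for name, found in ambiguous_names(PAIRS).items():
    print(name, found)
```

Names flagged this way are candidates for curation rather than automatic conversion, since software cannot tell which structure an author intended.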
  65. Ambiguity
  66. Ambiguity in Abbreviations – DPA
  67. Ambiguity in Abbreviations – THF
  68. Import is Easy
       • Make articles public/private (embargo date soon)
       • Auto-markup and check by user
  69. IUPAC PAC Articles
  70. Supports Word .DOC, HTML, RTF
  71. Drexel University Documents
  72. Drexel University Documents
  73. Drexel University Documents
  74. Patents
  75. Organism Markup
  76. Extensible Markup Process
       • The markup process is easily extendable
       • Configurable from one XML file
       • NLM/DTD is incorporated but is easy to extend
  77. Markup Movie
  78. DailyMed
  79. DailyMed on ChemSpider
  80. Quality of Structures!!!
  81. Quality of Structures
  82. DailyMed
  83. Remaining Issues in Markup
       • Ongoing curation of look-up dictionaries – one name linked to multiple structures… which is right? Remember tautomers!
       • Import and markup of large documents – tested up to 2 MB. Needs extending for larger documents
       • Add comments/tags to documents where markup finds issues
  84. Oops…
       • Online document markup and indexing is a very disruptive offering and a natural extension for ChemSpider
  85. What’s Coming?
       • Agreement with the Royal Society of Chemistry that we can add their structure-based RSS feeds to ChemSpider
       • Agreement with Nature Publishing Group to add their Nature Chemical Biology structure collections to ChemSpider as they issue
       • Presently indexing the Acta Chemica Scandinavica 1947-1999 PDF backfile – our first foray into OCR
       • Presently indexing PLoS journals directly
       • More publishers have agreed…
  86. What’s Coming?
       • Now working on “organism” extraction in document markup to identify organisms and link out to external resources – bacteria, fungi, viruses
       • Open platform for users to deposit and mark up their documents
       • Export the markup in XML format and then map to the NLM-DTD
       • Extract machine-readable structures directly
  87. Conclusions
       • The quality of structure-based data online should always be questioned – that includes ChemSpider
       • Robots and software algorithms can help but eyeballs are necessary
       • Data on ChemSpider are being added and curated on a daily basis, but we always need more eyeballs helping
       • ChemSpider now has a large validated structure-name dictionary
       • Chemical name extraction and document markup are very enabling
  88. Further reading
       • www.chemspider.com/blog
       • Internet-based tools for communication and collaboration in chemistry, Drug Discovery Today, Volume 13, Numbers 11/12, June 2008, 502-506, doi:10.1016/j.drudis.2008.03.015
       • A perspective of publicly accessible/open-access chemistry databases, Drug Discovery Today, Volume 13, Numbers 11/12, June 2008, 495-501, doi:10.1016/j.drudis.2008.03.017
  89. ChemSpider Forums/Blogs
       • Forum.chemspider.com
       • ChemSpider Blog
       • Open Chemistry Web
       • ChemConnector Blog
       • www.chemspider.com/blog