Text Mining for Chemistry and Building a Public Platform for Document Markup

The identification of chemical names in documents has provided platforms that enable structure-based searching of patents and mark-up of chemistry publications. A natural extension is the ability to make chemistry articles, blog pages, wiki pages and other documents searchable by the extracted chemical structures. The ChemSpider database contains over 21 million unique chemical entities from close to 200 data sources and provides a rich resource of information for chemists. We will report on our efforts to integrate chemical name extraction with the ChemSpider platform to enable structure searching of Open Access chemistry articles and online chemistry materials. We will unveil our online document markup platform, which lets chemists make both their open- and closed-access publications searchable by the language of chemistry: the structure.

Published in: Technology, Education
Transcript of "Text Mining for Chemistry and Building a Public Platform for Document Markup"

  1. Text mining for chemistry and building a public platform for document markup (Antony Williams)
  2. Searching and Reading Articles…
       • Online search tools for chemistry articles are generally text-based
       • Searching articles by chemical structure and substructure is very expensive, but this is changing
       • Text mining is a “hot area” of research, but what is public? What depends on public curation?
  3. Text-Based Search Tools
       • Google
       • PubMed
       • Google Scholar
       • Publishers' websites
       • And tens of other resources…
  4. Vancomycin Through PubChem
  5. Vancomycin Text Searches
       • PubMed
       • Google Scholar
  6. Online Structure Searching of Articles
       • Some capabilities from publishers are starting to show up
  7. Publishers should adopt/add InChIs: RSC and Nature Publishing Group have!
  9. ChemMantis - Single Click Mark-up
  10. Name-Structure Pairs
  11. Converting Detected Names…
       • Names are searched against a validated dictionary (this expands as ChemSpider is curated)
       • If not found, they are passed through a Name to Structure (NTS) algorithm
       • If they cannot be converted, ChemSpider is searched for non-validated names
  12. RED Underline: Non-validated, Cannot Convert through NTS
       • “Names” can be added to a Suppress List
  13. BLUE Underline: Name to Structure Converted
  14. Deposit Structures
  15. (untitled slide)
       • Entity extraction built around modified algorithms from SureChem
       • Optimized for “publications”
       • Dictionaries for chemical entities, groups, reactions, elements, families, species…
       • Dictionaries can be expanded – presently adding PDB
  16. Species…
  17. What do you do with a markup system?
       • Test it, show it off and make it available…
       • Tested on chemistry articles, so why not HOST articles?
       • …and create an online journal…
  18. The ChemSpider Journal
  19. Open Access Community Journal
  20. Deposit Article
       • Import URL or Document
       • Copy-Paste
       • Markup
  21. Copy-Paste Version: Martin Walker Monthly Article
  22. Chemical names
  23. Names, Elements, Groups, Families
  24. Outlinks
  25. Mark Up Open Access Article
  26. Online Journals and Live Data
  27. A Community Resource of Spectra
       • Spectra deposited on ChemSpider as “Open Data” are available to anybody to “Embed” in their articles, blogs, wikis, etc.
  28. Present Dictionaries
       • Chemical names – ChemSpider Validated Names
       • Reactions – Wikipedia Named Reactions and RSC Reaction Ontology reactions
       • Species – Wikipedia “species”
       • To add – New Dictionaries
           • PDB codes
           • IUPAC Gold Book
  29. Conclusions
       • The internet enables chemistry – and at a reduced cost
       • Web 2.0 is here and improving quality – to benefit 3.0
       • Question Quality!
       • Crowdsourcing for expansion, curation and integration
       • Classical models may die quite quickly – business models must change soon or fail
       • Publishers – heed the proliferation of InChIs for Chemistry
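
The lookup order described on the "Converting Detected Names" slide (validated dictionary first, then a name-to-structure algorithm, then non-validated names) can be sketched as a simple cascade. This is only an illustration under stated assumptions, not ChemMantis or ChemSpider code: the dictionaries, the toy name-to-structure function, the status labels, the `resolve` helper, and the choice to check the suppress list first are all hypothetical. Structures are written as SMILES strings purely for convenience.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the real resources named on the slides.
VALIDATED_NAMES = {"ethanol": "CCO"}          # curated name -> structure
NON_VALIDATED_NAMES = {"grain alcohol": "CCO"}  # uncurated name -> structure
SUPPRESS_LIST = {"lead compound"}             # "names" to ignore (assumed usage)

# Toy name-to-structure (NTS) converter: real NTS algorithms parse
# systematic nomenclature; this one only knows three alkanes.
_TOY_NTS = {"methane": "C", "ethane": "CC", "propane": "CCC"}


def toy_name_to_structure(name: str) -> Optional[str]:
    """Return a SMILES string for a few systematic names, else None."""
    return _TOY_NTS.get(name)


@dataclass
class Resolution:
    name: str
    structure: Optional[str]
    status: str  # "suppressed", "validated", "converted", "non-validated", "unresolved"


def resolve(name: str) -> Resolution:
    """Resolve a detected name using the cascade from the slides."""
    key = name.strip().lower()
    # Assumption: suppressed names are skipped before any lookup.
    if key in SUPPRESS_LIST:
        return Resolution(name, None, "suppressed")
    # Step 1: validated dictionary.
    if key in VALIDATED_NAMES:
        return Resolution(name, VALIDATED_NAMES[key], "validated")
    # Step 2: name-to-structure conversion (BLUE underline on the slides).
    structure = toy_name_to_structure(key)
    if structure is not None:
        return Resolution(name, structure, "converted")
    # Step 3: non-validated names (RED underline on the slides).
    if key in NON_VALIDATED_NAMES:
        return Resolution(name, NON_VALIDATED_NAMES[key], "non-validated")
    return Resolution(name, None, "unresolved")


if __name__ == "__main__":
    for detected in ["Ethanol", "propane", "grain alcohol", "lead compound", "unobtainium"]:
        print(resolve(detected))
```

Ordering the steps this way means curated entries always win over algorithmic conversion, and uncurated names are used only as a last resort, which matches the order given on the slide.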