Your SlideShare is downloading. ×
  • Like
Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Now you can save presentations on your phone or tablet

Available for both IPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources

  • 1,639 views
Published

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based …

The extraction of chemical entities from documents such as patents and publications has been pursued for a number of years. We wish to report on ChemMantis, an integrated system for chemistry-based entity extraction and document mark-up enabling access to the rich resource of online chemistry know as ChemSpider. We will discuss the development of the platform from its inception as a series of dictionaries to the integration of an entity extraction algorithm and its expansion to a public deposition and publishing platform for chemistry. Chemistry articles can now be deposited, marked-up and exposed to the public within a few minutes in many cases making it an ideal platform for communicating research and providing integrated access to data sources including PubChem, ChEBI, Wikipedia and Entrez.

Published in Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,639
On SlideShare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
22
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Building an integrated system for chemistry markup and online publishing integrated to online chemistry resources
  • 2. Electronic Publishing
    • Publishers know that embracing electronic publication is a must.
    • “ Cell Press and Elsevier have launched a project called Article of the Future … to redefine how the scientific article is presented online. ” Prototype: http://beta.cell.com/
  • 3. Publishers are experimenting
    • … invited researchers to prototype tools dealing with the ever-increasing amount of online life sciences information
    • The winners built:
    • Reflect: Automated Annotation of Scientific Terms : http:// reflect.ws
  • 4. Reflect
  • 5. Entity-Extraction, Mark-up, Annotate
  • 6. Entity-Extraction, Mark-up, Annotate
  • 7. And linked to STITCH…
  • 8. Success Depends on Dictionaries
  • 9. Concept Web Alliance, Knewco, Others
  • 10. NextBio and ScienceDirect
  • 11. Semantic Mark-up for Chemistry
    • Semantic mark-up for chemistry is here
      • RSC project prospect (structure linking, IUPAC Gold Book ontology and other ontologies
      • Nature publishing group compound linking
      • ChemSpider Journal of Chemistry
  • 12. Nature Chemistry Compound Pages
  • 13. Project Prospect
  • 14. ChemSpider and Publishing
    • The curation efforts on ChemSpider led to a set of validated dictionaries
    • Integrate best-in-class entity extraction (SureChem) with validated name dictionaries
    • Additional dictionaries gave reactions, groups, families, hardware and software vendors etc
  • 15. ChemMantis and CJOC
  • 16. Name-Structure Pairs
  • 17. Converting Detected Names…
    • Names are searched against a validated dictionary (this expands as ChemSpider is curated)
    • If not found then they are passed through a Name to Structure algorithm
    • If they cannot convert then ChemSpider is searched for non-validated names
  • 18. Manual Curation is Necessary
  • 19. Deposit Structures
  • 20. Custom Dictionaries
    • Entity Extraction built around modified algorithms from SureChem
    • Optimized for “publications”
    • Dictionaries for chemical entities, groups, reactions, elements, families, species…
    • Dictionaries can be expanded
  • 21. Dictionaries are Easily Enhanced
    • Copy-Paste into appropriate Entity Dictionary
    • Impacts all future markups
    • Expanding knowledge bases of information
  • 22. Build Dictionaries
  • 23. Species – linked to Wikipedia
  • 24. Semantic Linking of Structures
    • What would you want to link off a structure?
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “ Everything”
  • 25. ChemSpider and its content
    • ChemSpider is:
      • A link farm for > 21 million compounds and 200 data sources
      • A curation platform to improve the quality of data online
      • A deposition platform for chemicals and content
  • 26. Link off a structure in ChemSpider
      • Chemical suppliers
      • Other publications
      • Analytical Data
      • Related Reactions
      • Wikipedia
      • Patents
      • “ Everything”
  • 27. SureChem Services
    • The SureChem Portal is a gateway for patent searching – can be searched by structure/substructure
    • ChemSpider previously integrated by depositing structures and linking out
  • 28. SureChem Services
    • Previous integration lacked any sense of numbers of patents, titles etc
    • New integration:
  • 29. SureChem Services
  • 30. Pubmed Articles Linked
  • 31. From Compounds to Syntheses
    • ChemSpider will support synthesis procedures moving forward
    • The ChemSpider Journal of Chemistry is an ideal platform for text-based procedures
  • 32. Org Prep Daily (Blog)
  • 33. Molbank (Open Access Journal)
  • 34. Synthetic Pages (Website)
  • 35. RSC Supplementary Info
  • 36. RSC Supplementary Info
  • 37. ChemSpider Synthesis
    • ChemSpider Synthesis will be a home for all things “synthetic”
    • An online resource for synthetic procedures from blogs, other online resources, RSC supplementary info, other publishers etc.
    • We will mine the RSC supplementary info backfile for reactions
    • Public peer-review and feedback for synthetic procedures
  • 38. Online Journals and Live Data
  • 39. Moving forward
    • Integrate ChemMantis and Project Prospect as appropriate – best of both worlds
    • Expand ChemSpider validated dictionaries with RSC content
    • Expand information in “Compound Boxes” after markup – take advantage of all ChemSpider resources
    • Invite the community to help build ChemSpider Synthesis
  • 40. Acknowledgments
    • SureChem (Nicko Goncharoff and Richard Koks)
    • RSC – Richard Kidd and Colin Batchelor
    • CJOC – multiple authors and reviewers
    • ChemSpider curation – a cast of many
  • 41. Thank you [email_address] Twitter: ChemSpiderman www.chemspider.com/blog