Text Mining for Chemistry and Building a Public Platform for Document Markup
 


The identification of chemical names in documents has provided platforms that enable structure-based searching of patents and the mark-up of chemistry publications. A natural extension is the ability to make chemistry articles, blog pages, wiki pages and other documents searchable by the extracted chemical structures. The ChemSpider database contains over 21 million unique chemical entities from close to 200 data sources and provides a rich resource of information for chemists. We will report on our efforts to integrate chemical name extraction with the ChemSpider platform to enable structure searching of Open Access chemistry articles and online chemistry materials. We will unveil our online document markup platform, which lets chemists make both their open- and closed-access publications searchable by the language of chemistry: the structure.

    Presentation Transcript

    • Text mining for chemistry and building a public platform for document markup (Antony Williams)
    • Searching and Reading Articles…
      • Online search tools for chemistry articles are generally text-based
      • Searching articles based on chemical structure and substructure is very expensive, but this is changing
      • Text-mining is a “hot area” of research… but what is public? What depends on public curation?
    • Text-Based Search Tools
      • Google
      • PubMed
      • Google Scholar
      • Publishers' websites
      • And tens of other resources…
    • Vancomycin Through PubChem
    • Vancomycin Text Searches
      • PubMed
      • Google Scholar
    • Online Structure Searching of Articles
      • Some capabilities from publishers are starting to show up
    • Publishers should adopt/add InChIs: RSC and Nature Publishing Group have!
    • ChemMantis - Single Click Mark-up
    • Name-Structure Pairs
    • Converting Detected Names…
      • Names are searched against a validated dictionary (this expands as ChemSpider is curated)
      • If not found, they are passed through a Name to Structure (NTS) algorithm
      • If they cannot be converted, ChemSpider is searched for non-validated names
    • RED Underline: Non-validated, Cannot Convert through NTS
      • “Names” can be added to Suppress List
    • BLUE Underline: Name to Structure Converted
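The lookup order on the "Converting Detected Names" slide can be sketched as a simple fallback chain. This is a minimal illustration, not ChemSpider's implementation: the dictionaries, the `toy_nts` stand-in for a Name to Structure algorithm, the `Resolution` type and the status labels are all hypothetical, and the placement of the suppress-list check at the start is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy data; a real deployment would query curated databases.
VALIDATED = {"ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"}
NON_VALIDATED = {"grain alcohol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"}
SUPPRESS_LIST = {"lead"}  # "names" a curator has asked not to be marked up


@dataclass
class Resolution:
    name: str
    structure: Optional[str]
    status: str  # validated | nts | non_validated | suppressed | unresolved


def toy_nts(name: str) -> Optional[str]:
    """Stand-in for a real name-to-structure converter (assumption)."""
    return {"methane": "InChI=1S/CH4/h1H4"}.get(name)


def resolve(name: str,
            nts: Callable[[str], Optional[str]] = toy_nts) -> Resolution:
    key = name.strip().lower()
    if key in SUPPRESS_LIST:
        return Resolution(name, None, "suppressed")
    # Step 1: validated dictionary.
    if key in VALIDATED:
        return Resolution(name, VALIDATED[key], "validated")
    # Step 2: name-to-structure conversion.
    structure = nts(key)
    if structure is not None:
        return Resolution(name, structure, "nts")
    # Step 3: fall back to non-validated names.
    if key in NON_VALIDATED:
        return Resolution(name, NON_VALIDATED[key], "non_validated")
    return Resolution(name, None, "unresolved")


if __name__ == "__main__":
    for candidate in ["Ethanol", "methane", "grain alcohol", "lead", "unobtainium"]:
        print(resolve(candidate))
```

A renderer could use the `status` field to choose the underline style for each detected name.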
    • Deposit Structures
      • Entity Extraction built around modified algorithms from SureChem
      • Optimized for “publications”
      • Dictionaries for chemical entities, groups, reactions, elements, families, species…
      • Dictionaries can be expanded – presently adding PDB
    • Species…
    • What do you do with a markup system?
      • Test it, Show it off and make it available…
      • Tested on chemistry articles so why not HOST articles?
      • …and create an online journal…
    • The ChemSpider Journal
    • Open Access Community Journal
    • Deposit Article
      • Import URL or Document
      • Copy-Paste
      • Markup
    • Copy-Paste Version Martin Walker Monthly Article
    • Chemical names
    • Names, Elements, Groups, Families
    • Outlinks
    • Mark Up Open Access Article
    • Online Journals and Live Data
    • A Community Resource of Spectra
      • Spectra deposited on ChemSpider as “Open Data” are available to anybody to “Embed” in their articles, blogs, wikis etc
    • Present Dictionaries
      • Chemical names - ChemSpider Validated Names
      • Reactions - Wikipedia Named Reactions and RSC Reaction Ontology reactions
      • Species – Wikipedia “species”
      • To add – New Dictionaries
        • PDB codes
        • IUPAC Gold Book
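Dictionary-driven markup of the kind listed above can be sketched as a longest-match lookup over several category dictionaries. This is only an illustration under stated assumptions: the entries in `DICTIONARIES`, the category labels and the `tag` function are hypothetical, and the actual entity extraction (built on modified SureChem algorithms, per the slides) is certainly more sophisticated.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical sample entries; the real dictionaries are far larger.
DICTIONARIES: Dict[str, List[str]] = {
    "chemical": ["vancomycin", "ethanol"],
    "reaction": ["Diels-Alder reaction"],
    "species": ["Escherichia coli"],
}


def build_pattern(dictionaries: Dict[str, List[str]]):
    """Compile one case-insensitive pattern covering every dictionary term."""
    term_to_category = {}
    for category, entries in dictionaries.items():
        for entry in entries:
            term_to_category[entry.lower()] = category
    # Longest terms first, so multi-word names win over shorter overlaps.
    ordered = sorted(term_to_category, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)", re.IGNORECASE)
    return pattern, term_to_category


def tag(text: str,
        dictionaries: Dict[str, List[str]] = DICTIONARIES
        ) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, category) for each dictionary hit."""
    pattern, term_to_category = build_pattern(dictionaries)
    return [
        (m.start(), m.end(), m.group(0), term_to_category[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Vancomycin was tested against Escherichia coli."
    for hit in tag(sample):
        print(hit)
```

Adding a new dictionary, such as PDB codes, would then amount to adding one more key to `DICTIONARIES`.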
    • Conclusions
      • The internet enables chemistry – and at a reduced cost
      • Web 2.0 is here and improving quality – to benefit 3.0
      • Question Quality!
      • Crowdsourcing for expansion, curation and integration
      • Classical models may die quite quickly – business models must change soon or fail
      • Publishers: heed the proliferation of InChIs for Chemistry