
ICIC 2017: Open data in chemistry: the fast track to scientific content

Josef Eiblmaier (InfoChem, Germany)
Dorothee Geppert (InfoChem, Germany)
Heinz Saller (InfoChem, Germany)
Open chemistry platforms such as PubChem offer depositor-provided cross-references to scientific abstracts in PubMed for many compounds. However, only a few cases have direct links to associated primary literature on the publisher’s site.

In our talk we will show how chemical data gathered through Chemical Named Entity Recognition will enhance discoverability and accessibility of scientific information in an easy and intuitive way. By depositing these chemicals in public databases, literature information can be searched by chemical structure beyond typical text search. Additionally, the use of a smart algorithm ensures that only relevant compounds found in an article will be displayed.

We will present publisher-internal projects dealing with the comprehensive chemical annotation of relevant content and the deposition of the results in PubChem. Additionally, we will discuss further opportunities with regard to other platforms such as ChemSpider, ZINC or OpenPHACTS.

Published in: Internet

  1. InfoChem Copyright © 2017. Dr. Josef Eiblmaier, ICIC 2017, Heidelberg, Germany, October 24th. Open data in chemistry: the fast track to scientific content. Dr. Josef Eiblmaier, Dr. Heinz Saller, Dr. Dorothee Geppert. Heidelberg, October 24th 2017
  2. Outline
     » The project NERO 2
     » PubChem
     » A show case
     » Relevance ranking
     » Outlook: further opportunities
     Image: © cora / PIXELIO, www.pixelio.de
  3. 2015: The Networked: Springer Chemistry Demonstrator
     » Large-scale automatic extraction of chemical entities from SpringerLink documents
     » Joint definition of output formats (inline and standoff XML)
     » Semantic enrichment of chemically relevant SpringerLink documents (> 2,700 titles)
     » Creation of a chemical registry including all chemistry sources of Springer / InfoChem having structural information
     » Implementation of an online demonstrator; interlinking of different data repositories via the chemical structure
     Image: © Stephanie Hofschlaeger / pixelio.de
     Section: The project NERO 2
  4. Project NERO 2 (Named Entity Recognition of Organic Compounds)
     Discoverability and accessibility; showcase data for SciGraph
     » Position Springer Nature as a thought leader in Open Data
     » Raise attractiveness of Springer Nature for authors
     » Set the basis to enhance content and functionality of Springer Nature
     » Strengthen our position in chemistry via Open Data
  5. The Content
     Journal articles, Major Reference Works, eBooks (1846 - 2017) from:
     » Biomedical and Life Sciences
     » Chemistry and Material Science
     » Earth and Environmental Science
     » Engineering
     » Medicine
     » Physics and Astronomy
     Chemistry-related journals:
     » Nature Communications
     » Scientific Reports
     » Nature Reviews Materials
     » Nature Reviews Chemistry
     » Nature Chemistry
     » Nature Chemical Biology
  6. Concept: InChIKeys + 7-Chlor-1-methyl-5-phenyl-3-hydro-1,4-benzodiazepin-2-on
  7. Results (count, comment)
     » Input documents: 7,133,996 (selected SpringerLink subject collections, 6 Nature journals)
     » Documents containing significant chemistry: 3,981,327 (containing at least one compound with relevance value > 0.1)
     » Documents having a PMID: 858,934 (also available via PubMed)
     » Documents Open/free access: 294,380
     » Unique chemical compounds: 604,506
     » Compound -> DOI links: 26,839,100
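The counts on the Results slide allow a couple of derived figures. The following is a small sketch of that arithmetic; the variable names are illustrative, and the averages assume that the link count and the compound count refer to the same document set, which the slide does not state explicitly.

```python
# Counts copied from the Results slide.
input_documents = 7_133_996
chemistry_documents = 3_981_327   # at least one compound with relevance value > 0.1
unique_compounds = 604_506
compound_doi_links = 26_839_100

# Derived ratios (assumption: all counts describe the same corpus).
chemistry_share = chemistry_documents / input_documents
links_per_compound = compound_doi_links / unique_compounds

print(f"Share of input documents with significant chemistry: {chemistry_share:.1%}")
print(f"Average compound -> DOI links per unique compound: {links_per_compound:.1f}")
```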
  8. Discoverability?
     » 2.1 million SpringerLink documents on PubMed
     » 1.188 million visits on SpringerLink (July 2016) = 6%!
     » Very little Springer Nature content (compounds) on PubChem
     » No significant traffic generated from PubChem!
  9. PubChem
     » 93,890,677 compounds
     » 1,252,816 bioassays
     » 10,341 protein targets
     » 22,104 gene targets
     » On the order of 100k searches per day, via API: 2 – 12M *
     » 1.7 million monthly visits in last 3 months **
     "PubChem is the first major public database to connect cheminformatics to bioinformatics and thereby provide a unique information resource for pharmaceutical research." (Steve Bryant, NIH)
     "PubChem is the largest known public repository of biological and chemical data." (Tudor I. Oprea, University of New Mexico)
     "PubChem is the world's largest free chemical database." ***
     * Evan Bolton, Lead Scientist at NCBI, October 15th, 2017
     ** Natalia Manrique, Analyst in Market Intelligence, Springer Nature
     *** Van Noorden, Richard (27 March 2012). "Chemistry's web of data expands". Nature.
  12. Sunghwan Kim, Paul A. Thiessen, Tiejun Cheng, Bo Yu, Benjamin A. Shoemaker, Jiyao Wang, Evan E. Bolton, Yanli Wang, and Stephen H. Bryant. Literature information in PubChem: associations between PubChem records and scientific articles. J Cheminform. 2016; 8: 32
  14. Concept
     604,506 NERO2 compounds extracted from SpringerLink and nature.com documents
     » InChIKeys
     » DOIs (SpringerLink / Nature documents)
     » relevance values
     » journal titles
     » article/chapter titles
     » publication year
     Upload to PubChem
     » Link to SpringerLink
     » Link to nature.com
     » Link to BioMed Central
  17. Distribution of Documents and Compounds: 3,981,327 documents / 604,506 compounds
  18. Documents in PubMed? Open / Free Access? (3,981,327 documents)
  19. Novelty? Druglikeness? (604,506 compounds)
     * Druglikeness was evaluated as compounds satisfying Lipinski's Rule of Five (https://en.wikipedia.org/wiki/Lipinski%27s_rule_of_five)
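As a rough illustration of the druglikeness criterion, Lipinski's Rule of Five can be sketched as a count of violated thresholds. The function names and the tolerance of one violation are assumptions made for this example; the slides do not say how the rule was applied, and the descriptor values in the usage lines are hypothetical.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of the four Rule-of-Five thresholds are exceeded."""
    checks = [
        mol_weight > 500,    # molecular weight above 500 Da
        log_p > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,        # more than 5 hydrogen-bond donors
        h_acceptors > 10,    # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def is_druglike(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """Assumption: a compound passes if it has at most `max_violations` violations."""
    return lipinski_violations(mol_weight, log_p, h_donors, h_acceptors) <= max_violations


# Hypothetical descriptor values, for illustration only.
small_polar = is_druglike(mol_weight=194.2, log_p=-1.0, h_donors=1, h_acceptors=6)
large_greasy = is_druglike(mol_weight=650.0, log_p=6.2, h_donors=2, h_acceptors=8)
print(small_polar, large_greasy)
```

In practice the descriptors would come from a cheminformatics toolkit; they are passed in as plain numbers here to keep the sketch self-contained.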
  20. Show Case Temozolomide
     » Oral chemotherapy drug (brain tumor)
     » ‘Temodar’ (Merck US)
     » ‘Temcad’ (Cadila Pharmaceuticals)
     Section: A show case
  21. Show Case Temozolomide
     » ‘Temodar’ (Merck US): 233 hits
     » ‘Temcad’ (Cadila Pharmaceuticals): 2 hits
     » ‘Temozolomid’ (German INN): 608 hits
     » ‘Temozolomide’ (English INN): 7,583 hits
     » ‘85622-93-1’: 7 hits
     » ‘BPEGJWRSRHCHSN-UHFFFAOYSA-N’: 0 hits
     » ‘Methazolastone’ (FDA SRS): 27 hits
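The hit counts above show that a text search for an InChIKey finds nothing, while the same key is a precise handle for a structure-based lookup. The sketch below normalizes a copied InChIKey and builds a PubChem PUG REST address that resolves it to compound identifiers (CIDs). The helper names are illustrative, no request is actually sent, and the talk does not state which interface was used.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def normalize_inchikey(raw):
    """Strip surrounding whitespace and quote marks, then validate the layout."""
    key = raw.strip().strip("‚‘’'\"").strip().upper()
    if not INCHIKEY_PATTERN.fullmatch(key):
        raise ValueError(f"not a well-formed InChIKey: {raw!r}")
    return key


def pubchem_cid_lookup_url(raw_key):
    """Build the PUG REST URL that maps an InChIKey to PubChem CIDs (JSON)."""
    return f"{PUG_REST_BASE}/compound/inchikey/{normalize_inchikey(raw_key)}/cids/JSON"


# The temozolomide key from the slide, with a stray trailing space.
url = pubchem_cid_lookup_url("BPEGJWRSRHCHSN-UHFFFAOYSA-N ")
print(url)
```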
  35. Relevance Ranking? Why?
     Main questions
     » Which are the most relevant compounds in one article/chapter?
     » Which are the most relevant articles/chapters for a given compound?
     Task
     » Add a relevance value (RV) to every unique structure per document and create InChIKey <-> DOI <-> RV triples
     Section: Relevance ranking
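The two main questions map directly onto two queries over the triples. The following is a minimal in-memory sketch; the `Triple` type, the function names, and the placeholder keys, DOIs and relevance values are all hypothetical and only illustrate the shape of the data.

```python
from collections import namedtuple

# One record per (structure, document) pair.
Triple = namedtuple("Triple", ["inchikey", "doi", "rv"])

# Placeholder identifiers and values, not real data.
triples = [
    Triple("KEY-A", "10.0000/doc-1", 3.44),
    Triple("KEY-B", "10.0000/doc-1", 0.96),
    Triple("KEY-A", "10.0000/doc-2", 0.31),
    Triple("KEY-C", "10.0000/doc-2", 2.68),
]


def top_compounds(triples, doi, limit=10):
    """Most relevant compounds in one article/chapter."""
    hits = [t for t in triples if t.doi == doi]
    return sorted(hits, key=lambda t: t.rv, reverse=True)[:limit]


def top_documents(triples, inchikey, limit=10):
    """Most relevant articles/chapters for a given compound."""
    hits = [t for t in triples if t.inchikey == inchikey]
    return sorted(hits, key=lambda t: t.rv, reverse=True)[:limit]


print([t.inchikey for t in top_compounds(triples, "10.0000/doc-2")])
print([t.doi for t in top_documents(triples, "KEY-A")])
```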
  36. The Algorithm
     \[ \mathrm{RV}(s, d) = \mathrm{SF} \cdot \mathrm{IDF} + \mathrm{DS} \cdot \mathrm{CS} \]
     » SF·IDF: structure frequency - inverse document frequency
       \[ \mathrm{SF} \cdot \mathrm{IDF} = \frac{f(s, d)}{f_{\mathrm{words}}(d)} \cdot \log \frac{N}{n_s} \]
       f(s, d): number of occurrences of structure s in document d; f_words(d): number of words in d; N: number of documents; n_s: number of documents containing structure s
     » DS: document structure
     » CS: chemistry related contributions
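The SF·IDF term can be sketched in a few lines. This is an illustration under stated assumptions, not the InfoChem implementation: the slide does not give the base of the logarithm (the natural logarithm is assumed here) or say how DS and CS are computed, so they are passed in as plain numbers, and all function and parameter names are made up for the example.

```python
import math


def sf_idf(structure_count, word_count, total_docs, docs_with_structure):
    """Structure frequency times inverse document frequency for one (s, d) pair."""
    if word_count <= 0 or docs_with_structure <= 0:
        raise ValueError("word_count and docs_with_structure must be positive")
    structure_frequency = structure_count / word_count                      # f(s, d) / f_words(d)
    inverse_document_frequency = math.log(total_docs / docs_with_structure)  # log(N / n_s), natural log assumed
    return structure_frequency * inverse_document_frequency


def relevance_value(sf_idf_score, ds, cs):
    """RV(s, d) = SF*IDF + DS*CS, with DS and CS supplied by the caller."""
    return sf_idf_score + ds * cs


# Hypothetical numbers: 3 mentions in a 300-word document, structure found in 10 of 1,000 documents.
score = sf_idf(structure_count=3, word_count=300, total_docs=1000, docs_with_structure=10)
print(relevance_value(score, ds=2.0, cs=0.25))
```

A structure that occurs in every document gets an SF·IDF of zero, so in this form only the DS·CS term can lift ubiquitous compounds such as common solvents.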
  37. Challenges: Acronyms and Numbered Compounds
     » Acronyms: “… specific bioactivity of (2S,3R)-3-amino-2-hydroxy-4-phenylbutanic acid (AHPA) was measured …”; in the following text only referred to as ‘AHPA’.
     » Numbered compounds: “… preparation of 3-(tritylthio)propionic acid (6) is done in a solution of …”; in the following text only referred to as ‘compound 6’ or ‘6’.
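One way to approach both challenges is to detect where a label is introduced in parentheses and then count later mentions of that label. The sketch below is a simple regular-expression heuristic written for this example; it is not the method used in NERO 2, and the sample sentences are adapted from the slide with invented continuations.

```python
import re

# A label is either a compound number such as "6" or "6a", or an all-caps acronym.
LABEL_DEFINITION = re.compile(r"\((\d+[a-z]?|[A-Z]{2,})\)")


def label_definitions(text):
    """Return labels introduced in parentheses, in order of appearance."""
    return LABEL_DEFINITION.findall(text)


def follow_up_mentions(text, label):
    """Count mentions of a label outside its parenthesized definition."""
    if label[0].isdigit():
        # A bare number is too ambiguous ("6 mL"), so only "compound 6" is counted.
        return len(re.findall(rf"\bcompound {re.escape(label)}\b", text, re.IGNORECASE))
    # Whole-word matches of the acronym, minus the definition itself.
    return len(re.findall(rf"\b{re.escape(label)}\b", text)) - 1


sample = (
    "The bioactivity of (2S,3R)-3-amino-2-hydroxy-4-phenylbutanic acid (AHPA) was measured. "
    "AHPA was then dissolved. "
    "Preparation of 3-(tritylthio)propionic acid (6) is done in solution. "
    "Compound 6 was purified; 6 mL of solvent was added."
)
labels = label_definitions(sample)
print(labels, [follow_up_mentions(sample, label) for label in labels])
```

Stereodescriptors such as "(2S,3R)" and substituents such as "(tritylthio)" are skipped because they match neither label form. Bare numbers are deliberately left uncounted, which is exactly where a real system needs more context.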
  38. Challenges: Ambiguities
     » Diethylazodicarboxylate ‘DEAD’
     » Moexipril, Perdix ®
     » Hexen (German)
  39. The Other Way Round: Which Chemicals are Important in 1 Article?
     Article: Effects of Sublethal Doses of Acetamiprid and Thiamethoxam on the Behavior of the Honeybee (Apis mellifera)
     Abstract: Acetamiprid and thiamethoxam are insecticides introduced for pest control, but they can also affect non-target insects such as honeybees. In insects, these neonicotinoid insecticides are known to act on acetylcholine nicotinic receptors but the behavioral effects of low doses are not yet fully understood. The effects of acetamiprid and thiamethoxam were studied after acute sublethal treatment on the behavior of the honeybee (Apis mellifera) under controlled laboratory conditions. The drugs were either administered orally or applied topically on the thorax. After oral consumption acetamiprid increased sensitivity to antennal stimulation by sucrose solutions at doses of 1 μg/bee and impaired long-term retention of olfactory learning at the dose of 0.1 μg/bee. Acetamiprid thoracic application induced no effect in these behavioral assays but increased locomotor activity (0.1 and 0.5 μg/bee) and water-induced proboscis extension reflex (0.1, 0.5, and 1 μg/bee). Unlike acetamiprid, thiamethoxam had no effect on bees’ behavior under the conditions used. Our results suggest a particular vulnerability of honeybee behavior to sublethal doses of acetamiprid.
     Rank  Name                                RV      Note
     1     acetamiprid                         3.4422
     2     thiamethoxam                        2.6834
     3     sucrose                             0.9561
     4     imidacloprid                        0.4630  14 mentions in text, comparison
     5     acetylcholine                       0.3052
     6     clothianidin                        0.1245  3 mentions in text, metabolite of thiamethoxam
     7     6-chloropyridine-3-carboxylic acid  0.0919
     8     acetone                             0.0537
     9     phenylpyrazoles                     0.0404
     10    thiacloprid                         0.0404
     11    methyllycaconitine                  0.0388
     12    Fipronil                            0.037
     13    acetonitrile                        0.033
     14    nicotine                            0.0199
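Applying the relevance cut-off of 0.1 mentioned on the Results slide to this ranking shows which compounds would count as significant for the article. The values are copied from the table above; the function name and the reuse of that cut-off as a display filter are assumptions made for this sketch.

```python
# (name, relevance value) pairs from the ranking on this slide.
ranking = [
    ("acetamiprid", 3.4422),
    ("thiamethoxam", 2.6834),
    ("sucrose", 0.9561),
    ("imidacloprid", 0.4630),
    ("acetylcholine", 0.3052),
    ("clothianidin", 0.1245),
    ("6-chloropyridine-3-carboxylic acid", 0.0919),
    ("acetone", 0.0537),
    ("phenylpyrazoles", 0.0404),
    ("thiacloprid", 0.0404),
    ("methyllycaconitine", 0.0388),
    ("Fipronil", 0.037),
    ("acetonitrile", 0.033),
    ("nicotine", 0.0199),
]

RELEVANCE_THRESHOLD = 0.1  # cut-off quoted on the Results slide


def relevant_compounds(ranking, threshold=RELEVANCE_THRESHOLD):
    """Keep only compounds whose relevance value exceeds the threshold."""
    return [name for name, rv in ranking if rv > threshold]


print(relevant_compounds(ranking))
```

With this cut-off the first six entries remain, while solvents such as acetone and acetonitrile drop out.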
  40. Relevance Ranking Verification
     » 467 Springer articles from NERO1 were analyzed by chemistry subject matter experts
     » Quality of the relevance ranking was good or okay in 94% of cases (89% good, 5% ok, 6% wrong)
  41. Outlook: What Comes Next?
     Image: © bbroianigo / PIXELIO, www.pixelio.de
     Section: Outlook: further opportunities
  42. Springer Nature SciGraph: www.springernature.com/scigraph
  43. Springer Nature SciGraph Data Landscape: Compounds 600K (www.springernature.com/scigraph)
  44. Acknowledgements
     » NIH: Evan Bolton, Jan Zhang, Ben Shoemaker, the PubChem Team
     » Springer Nature: Henning Schönenberger, Markus Kaindl, the SciGraph Team
     » The InfoChem Team
     Image: © P. Storz / PIXELIO, www.pixelio.de
  45. InfoChem GmbH: www.infochem.de, www.spresi.com, info@infochem.de
     Image: 4.bp.blogspot.com/.../s1600/thank-you.jpg
