II-SDV 2015, 20–21 April, in Nice
- 1. InfoChem GmbH © 2015 Dr. Valentina Eigner-Pitto, II-SDV, Nice, April 21, 2015
1 / 29
Interlinking scientific information by means of
chemical semantic enrichment
Valentina Eigner-Pitto, Josef Eiblmaier, Hans Kraut, Heinz Saller, Peter Loew
InfoChem GmbH, Landsberger Strasse 408, Munich, 81241, Germany
- 2.
Outline
• Introduction
o Role of chemical structure searching
o Where do I perform structure searches?
o Cost implications
• Setting the scene: chemical structures as common denominator?
o Publishers' efforts
Creation of chemical content
Semantic enrichment of journal articles
• Case Studies:
o Wiley: The Smart Article
o Springer Chemistry Demonstrator
- 3.
Why structure searching?
• CICAG (RSC) survey by Neil Stutchbury, May 20, 2009
Chemical Information Mining: Possibilities and Pitfalls
(http://www.rsc.org/images/ChemInfoMining_tcm18-153536.pdf)
65 responses from pharma, academia, vendors, and publishers
“Search documents by chemical structure or substructure”?
- 4.
Diazepam OR Valium OR Ansiolisina OR Diazemuls OR Relanium OR Stesolid OR
Apaurin OR Faustan OR Seduxen OR Sibazon OR Methyldiazepinone OR Calmocitene
OR Neurolytril OR Bialzepam OR Ceregulart OR Condition OR Diazetard OR Liberetas
OR Relaminal OR Serenamin OR Tranquirit OR Ansiolin OR Apozepam OR Atensine
OR Bensedin OR Calmpose OR Diacepan OR Diazepan OR Dipezona OR Domalium
OR Kiatrium OR Paranten OR Quetinil OR Quiatril OR Quievita OR Renborin OR
Ruhsitus OR Seduksen OR Serenack OR Serenzin OR Stesolin OR Tensopam OR
Horizon OR Lembrol OR Morosan OR Saromet OR Sedipam OR Setonil OR Anxionil OR
Benzopin OR Calmaven OR Chuansuan OR Desconet OR Desloneg OR Diaceplex OR
Diazepin OR Gewacalm OR Jinpanfan OR Mentalium OR Metamidol OR Nixtensyn OR
Novodipam OR Pacitran OR Paralium OR Prozepam OR Psychopax OR Radizepam OR
Simasedan OR Trankinon OR Trazepam OR Valaxona OR Valiquid OR Valuzepam OR
Vanconin OR Antenex OR Arzepam OR Betapam OR Diapine OR Diaquel OR
7-Chloro-1,3-dihydro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR NCGC00178168-01 OR
WLN: T67 GNV JN IHJ CG G1 KR OR
2H-1,4-Benzodiazepin-2-one, 7-chloro-1,3-dihydro-1-methyl-5-phenyl- OR
CPD000058398 OR SAM001246536 OR SMR000058398 OR 439-14-5 OR
7-Chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2(1H)-one OR
7-Chloro-1-methyl-2-oxo-5-phenyl-3H-1,4-benzodiazepine OR
7-Chloro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR C06948 OR D00293 OR
5-24-04-00300 OR D003975 OR A3662/0155188 OR I06-0194 OR
1-Methyl-5-phenyl-7-chloro-1,3-dihydro-2H-1,4-benzodiazepin-2-one OR
7-Chloro-1-methyl-5-3H-1,4-benzodiazepin-2(1H)-one OR
7-chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2-one OR DZP OR Dap OR Pax OR
11100-37-1 OR 53320-84-6 OR
InChI=1/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H
... (343 Synonyms!)
“Full text searching is sufficient!”
WLN
SMILES
SMARTS
ROSDAL
Connection Table
Molfile
SDfile
CML
InChI
InChI Key
- 5.
Where am I able to perform structure searches?
- 6.
Manuscript submission
Publishing
Cost implications
Manual indexing
Database production
- 7.
• Automatic production of chemical content:
o chemical named entity recognition
o automatic CDX work-up
• Semantic enrichment of journal articles
o RSC: RSC Semantic Publishing (Project Prospect)
o NPG: semantically enriched PDF
o Elsevier: Article of the Future
o Wiley: The Smart Article
• Structure search on web pages (partly)
Publishers' efforts
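Chemical named entity recognition, the first item on this slide, can be illustrated with the simplest possible approach: dictionary lookup over the text, followed by highlighting of the hits. The Python sketch below is an assumption for illustration only. The toy dictionary and the longest-match-first regular expression are invented here; production recognizers typically combine large curated dictionaries with chemical-name grammars and statistical models.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    start: int
    end: int
    text: str
    category: str


# Hypothetical toy dictionary (lower-case name -> entity category).
DICTIONARY = {
    "sodium chloride": "compound",
    "sodium": "compound",
    "diazepam": "compound",
    "tetrahydrofuran": "compound",
    "hydrogenation": "term",
}

# Longest names first, so "sodium chloride" is preferred over "sodium".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(DICTIONARY, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def recognize(text):
    """Return non-overlapping dictionary matches with character offsets."""
    return [
        Entity(m.start(), m.end(), m.group(0), DICTIONARY[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


def highlight(text, entities):
    """Wrap each entity in a tag; work right to left so earlier offsets stay valid."""
    for e in sorted(entities, key=lambda e: e.start, reverse=True):
        text = f"{text[:e.start]}<{e.category}>{e.text}</{e.category}>{text[e.end:]}"
    return text


if __name__ == "__main__":
    sentence = "Diazepam was dissolved in tetrahydrofuran; sodium chloride was added."
    print(highlight(sentence, recognize(sentence)))
```

Run on the sample sentence, this tags "Diazepam", "tetrahydrofuran", and "sodium chloride" as compounds. The ordering of the alternation matters: without it, "sodium" would match first and "chloride" would be lost.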
- 8.
Automatic production of chemical content
Manual
Indexing
Publishing
- 9.
Publishers' efforts
- 10.
• Pioneer work: Project Prospect (2007)
• Online since 2011
• Extraction of chemical names from over
30,000 journal articles
• Integration of compounds into ChemSpider
• Approach integrated within routine
publication processes
• Features:
o Highlighting of:
Compounds
Chemical terms
Biomedical terms
o Link to compounds in ChemSpider
o Structure search only in ChemSpider
RSC: Semantic Publishing
- 11.
• XMP-embedded PDFs available online since 2008
• Entity specific annotation service:
o SureChem for chemical compounds
o LuXiD for genes/proteins
o …
• Mix of automated services and editorial QA
• Features:
o Figures and compound browser
o Links to:
Web of Science
PubMed
CAS Reference Linking
o No structure search
Nature Publishing Group: Semantically enriched PDF
- 12.
• Launched in 2012
• Guiding principles:
o readability
o discoverability
o extensibility
• Supplementary content, features, and information from external
databases presented in the right sidebar
• Features:
o 3-pane presentation layout:
navigation bar
main content area
right sidebar
o Links to:
NCBI
Reaxys
… (depending on subject)
o No structure search
Elsevier: Article of the Future
- 13.
Wiley: The Smart Article
• Launched in 2012
• Goal: providing quick information on chemical compounds
featured in an article, chemical terms in the text, and other
key parts of the chemistry within the article
• Live for the following journals and major reference works:
o Chemistry: An Asian Journal
o Chirality
o Applied Organometallic Chemistry
o Journal of Physical Organic Chemistry
o Journal of Heterocyclic Chemistry
o e-EROS
o Organic Syntheses
o Organic Reactions
• Features:
o Compound browser
o Chemistry term highlighter
o Compound index
o Enhanced abstract page
o Compound record
o Chemistry structure search
- 14.
Structure as common denominator: 2 use cases
Chemistry Demonstrator
- 15.
The challenge*
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
- 16.
Text annotation: Chemistry enrichment workflow*
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
- 17.
Text annotation: ICANNOTATOR
- 18.
CDX scheme enumeration: Chemistry enrichment workflow*
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
- 19.
Author's CDX file
CDX template
Templating
CDX-Templating Guidelines (Structures)
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
CDX scheme enumeration: Templating*
- 20.
CDX scheme enumeration: Template*
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
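CDX scheme enumeration turns a generic structure in an author's scheme, together with its substituent table, into one explicit structure per compound label. A minimal Python sketch of that expansion step follows. It assumes SMILES with `{R1}`-style placeholders in place of the CDX format, and the scheme, labels, and substituent table are all hypothetical.

```python
import re

# Hypothetical generic structure from a scheme; {R1}/{R2} mark the R-group
# attachment points. SMILES stands in for the CDX format here.
TEMPLATE = "O=C(O{R2})c1ccc({R1})cc1"

# Hypothetical substituent table printed under the scheme: label -> R groups.
SUBSTITUENTS = {
    "3a": {"R1": "Cl", "R2": "C"},
    "3b": {"R1": "Br", "R2": "C"},
    "3c": {"R1": "OC", "R2": "CC"},
}

PLACEHOLDER = re.compile(r"\{(R\d+)\}")


def enumerate_scheme(template, table):
    """Expand the template into one (label, SMILES) pair per table row."""
    needed = set(PLACEHOLDER.findall(template))
    compounds = []
    for label, groups in table.items():
        missing = needed - set(groups)
        if missing:
            # An incomplete row is reported rather than silently skipped.
            raise ValueError(f"compound {label}: no value for {sorted(missing)}")
        compounds.append((label, template.format(**groups)))
    return compounds


if __name__ == "__main__":
    for label, smiles in enumerate_scheme(TEMPLATE, SUBSTITUENTS):
        print(label, smiles)
    # 3a O=C(OC)c1ccc(Cl)cc1
    # 3b O=C(OC)c1ccc(Br)cc1
    # 3c O=C(OCC)c1ccc(OC)cc1
```

The sketch expands only the rows that are listed, as a substituent table under a scheme does; a full combinatorial library would instead take the Cartesian product of the allowed R groups.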
- 21.
CDX scheme enumeration: ICSchemeProcessor
Figure: ICSchemeProcessor output as chemical structures (SD files), reactions (RD files), and reaction conditions (RD files); conditions are classified as reagent, solvent, or catalyst (agents shown: SOCl2; LiOH H2O, THF; Pd(OAc)2; Cl-CO2Et, Et3N; Acetone, H2O).
Source: Thieme Pharmaceutical Substances, Ticagrelor (in production)
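The conditions output above assigns each agent a role (reagent, solvent, or catalyst) before it is written to an RD file. A minimal Python sketch of that step, under several assumptions: the role dictionary is hypothetical, and so are the `$DTYPE` field names (RDfiles do store data items as `$DTYPE`/`$DATUM` line pairs, but the naming scheme here is invented). Grouping the agents into a single step in the usage example is also illustrative, not read from the Thieme scheme.

```python
# Hypothetical role dictionary; a real processor would need a curated
# vocabulary plus context, since many agents can play more than one role.
ROLE_OF = {
    "SOCl2": "REAGENT",
    "LiOH": "REAGENT",
    "Cl-CO2Et": "REAGENT",
    "Et3N": "REAGENT",
    "THF": "SOLVENT",
    "Acetone": "SOLVENT",
    "H2O": "SOLVENT",
    "Pd(OAc)2": "CATALYST",
}
ROLE_ORDER = ("REAGENT", "SOLVENT", "CATALYST", "UNCLASSIFIED")


def classify(agents):
    """Group agent names by role, keeping their original order within each role."""
    by_role = {role: [] for role in ROLE_ORDER}
    for agent in agents:
        by_role[ROLE_OF.get(agent, "UNCLASSIFIED")].append(agent)
    return by_role


def rd_data_fields(agents, prefix="RXN:STEP(1)"):
    """Emit $DTYPE/$DATUM line pairs; the field names are invented for this sketch."""
    lines = []
    for role, members in classify(agents).items():
        for number, agent in enumerate(members, start=1):
            lines.append(f"$DTYPE {prefix}:{role}({number})")
            lines.append(f"$DATUM {agent}")
    return lines


if __name__ == "__main__":
    for line in rd_data_fields(["Cl-CO2Et", "Et3N", "Acetone", "H2O"]):
        print(line)
```

Unknown agents are kept under an `UNCLASSIFIED` role instead of being guessed, so that an editor can review them.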
- 22.
Examples
- 23.
Chemistry Demonstrator
The challenge
• Chemical annotation of more than 6 million SpringerLink documents
• Interlink different data repositories via chemical structure
• Create one search interface
• Data aggregation / results consolidation
- 24.
The demonstrator
Figure: Springer Chemistry Demonstrator architecture, with structure search and document display servers accessed by client computers via HTTP(S) over the Internet/intranet.
Contains:
• master index of all structures
• basic molecule attributes
• links to the source page/document
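The master index described above can be sketched as a mapping from a structure key to basic molecule attributes and the set of links to source documents. This is also the natural place to consolidate hits from different repositories. The Python sketch below is an assumption, not the demonstrator's implementation: it supports only exact key lookup (a real structure search engine typically also handles substructure queries), and the repository names and links are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    repository: str  # hypothetical repository name
    link: str        # link to the source page/document


class MasterIndex:
    """Structure key -> basic molecule attributes and every known occurrence."""

    def __init__(self):
        self._attributes = {}
        self._occurrences = defaultdict(set)

    def add(self, key, attributes, occurrence):
        # Keep the first attribute record seen; the set drops duplicate occurrences.
        self._attributes.setdefault(key, attributes)
        self._occurrences[key].add(occurrence)

    def search(self, key):
        """Exact-key lookup with results consolidated per repository."""
        grouped = defaultdict(list)
        for occ in self._occurrences.get(key, set()):
            grouped[occ.repository].append(occ.link)
        return {
            "attributes": self._attributes.get(key),
            "sources": {repo: sorted(links) for repo, links in sorted(grouped.items())},
        }


if __name__ == "__main__":
    key = ("InChI=1/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)"
           "11-5-3-2-4-6-11/h2-9H,10H2,1H")
    attrs = {"formula": "C16H13ClN2O"}
    index = MasterIndex()
    index.add(key, attrs, Occurrence("journals", "doc/123"))
    index.add(key, attrs, Occurrence("books", "chapter/7"))
    index.add(key, attrs, Occurrence("journals", "doc/045"))
    index.add(key, attrs, Occurrence("journals", "doc/123"))  # duplicate, ignored
    print(index.search(key)["sources"])
    # {'books': ['chapter/7'], 'journals': ['doc/045', 'doc/123']}
```

Because the structure key is shared across repositories, one query returns book and journal hits together, with duplicates removed.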
- 25.
- 26.
Example
- 27.
Summary
• Importance of structure searching and cost implications
• Publishers' efforts
o Automatic generation of chemical content
o Semantic enrichment
• Case studies where the structure is the common denominator
o Generation of chemical content for Wiley
o Springer study: “Chemistry Demonstrator”
Proof of concept
Demonstrator
- 28.
Conclusions
• Since the mid-2000s, chemical structures have gained significance for publishers
• Publishers recognize the importance of structure searching
• Chemical content is increasingly generated by automatic processes
The chemical structure is an
extremely efficient entity to be
used for effective retrieval as well
as linking of different sources
- 29.
Acknowledgments
• Reinhard Neudert (Wiley)
• Wendy Warr
• InfoChem Team