PaperMaker, BeyondThePdf, RebholzSchuhmann, 19Jan2011

Presentation on Whatizit, LexEBI, IeXML, CALBC, SESL, PaperMaker

Transcript of "PaperMaker, BeyondThePdf, RebholzSchuhmann, 19Jan2011"

  1. PaperMaker: Validation of biomedical scientific publications
     January 19th, 2011
     Workshop: "BeyondThePdf"
     Dietrich Rebholz-Schuhmann, MD, PhD
     Group Leader, Rebholz Group
     European Bioinformatics Institute
  2. Publishing is about …
     • ... Agreeing / disagreeing about current science
        • Only peer review can judge current science
     • ... Bringing new results
        • Conceptual results are more difficult than new data
     • ... Gaining new knowledge
        • New data and new results can imply new knowledge of which even the author is still unaware
     • ... Rewarding the scientist
        • Count whatever you can count that could have an impact.
        • Validating the scientist's claim is the key reward.
        • Any scientist can fool any system, but (hopefully) only short-term
     Slide footer: 20.01.2011, Literature and Text Mining, BioCreative III, Rebholz
  3. Future of biomedical text mining
     Working towards ...
     • ... Literature integration
        • to have it fully fledged as part of bioinformatics data resources
     • ... Cross-domain support
        • to deliver the content to different scientific communities
     • ... Provenance
        • to carry credit of findings into analytical biomedical research
     • ... Inference & Reasoning
        • to make use of the full semantic support in the scientific literature
  4. Literature content in the Semantic Web
  5. Terminologies vs. Ontologies
     Ontological resources:
     • Explicit semantics
     • Manual generation
     • Consistency, inference, reasoning
     • Interoperability with all semantic resources
     • Working towards a reasoning infrastructure
     Database type resource building:
     • Terminologies, collection of terms
     • Automatic generation
     • Exploitation of terminological features
     • Standardisation of TM solutions
     • Interoperability with database resources
  6. Efforts in the Rebholz group towards interoperability of literature with bioinformatics
     • Whatizit infrastructure
        • Biomedical NER as a public, large-scale service
     • LexEBI / BioLexicon (collab. w. NaCTeM, Pisa-U)
        • Biomedical terminological resource, standardisation of semantics
     • IeXML (BioLink SIG 2006, Brazil)
        • Put the annotations into the document (inline annotations)
     • CALBC project
        • Collaborative annotation of a large-scale biomedical corpus
     • UKPMC: U.K. PubMed Central (collab. w. NaCTeM, BL)
        • Use of Whatizit, BioLexicon, IeXML, CALBC alignments for the delivery of quality annotation services to the public
     • SESL project
        • Joint project with pharma & publishers, literature content in a triple store
     • PaperMaker
        • Validation of the scientific literature against the above
  7. Section 1: Whatizit
  8. Integrating biomedical literature and data
     Rebholz-Schuhmann, D., et al. Text Processing through Web Services: Calling Whatizit. Bioinformatics 24, no. 2 (2008): 296-98.
  9. Section 2: BioLexicon / LexEBI
  10. LexEBI: content
      Group       Resource          # Labels  # Variants      Total  Total / Labels  # Unique terms  Uniq. T. / Labels
      Prot./Gene  GP 7.0             516,113   4,005,040  4,521,153            8.76       1,726,853               3.35
      Prot./Gene  GP 6.0             488,577   3,389,316  3,877,893            7.94       1,564,436               3.20
      Chemicals   Jochem             278,578   1,691,980  1,970,558            7.07       1,527,752               5.48
      Chemicals   ChEBI               19,645      94,748    114,393            5.82         101,307               5.16
      Chemicals   ChEBI (all)        549,838   1,187,322  1,737,160            3.16
      Other       Enzymes              4,905       8,082     12,987            2.65          12,377               2.52
      Other       Species            643,280     199,130    842,410            1.31         838,135               1.30
      Other       Interpro            20,671           0     20,671            1.00          20,671               1.00
      UMLS        Antineuro., Neo      4,718       6,488     11,206            2.38
      UMLS        Bio. Act.           54,148      87,209    141,357            2.61
      UMLS        Enzymes             26,065      56,332     82,397            3.16
      UMLS        Lipid, Carb.        11,518       9,770     21,288            1.85
      UMLS        Pharm. Act.        104,201     123,840    228,041            2.19
      UMLS        Vit., Horm.          6,877      10,258     17,135            2.49
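The derived columns of the LexEBI table appear to follow simple arithmetic: Total is the sum of labels and variants, and the two ratio columns divide by the number of labels. The Python sketch below recomputes them for three rows taken from the slide. The names `ROWS` and `derived` are illustrative only, and the rounding to two decimals is an assumption based on how the slide prints the values.

```python
# Rows copied from the slide: (# labels, # variants, # unique terms).
ROWS = {
    "GP 7.0": (516_113, 4_005_040, 1_726_853),
    "ChEBI": (19_645, 94_748, 101_307),
    "Enzymes": (4_905, 8_082, 12_377),
}


def derived(labels, variants, unique):
    """Recompute Total, Total / Labels and Unique terms / Labels for one row."""
    total = labels + variants
    return total, round(total / labels, 2), round(unique / labels, 2)


for name, row in ROWS.items():
    total, per_label, unique_per_label = derived(*row)
    print(f"{name}: total={total:,} total/labels={per_label} unique/labels={unique_per_label}")
```

Read this way, GP 7.0 carries almost nine surface forms per label in total, of which roughly three per label are unique terms.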
  11. Section 3: IeXML
  12. IeXML: Annotating entities in text
      • Inline annotations can be attached to any part of the document
      • No hassle with character or byte counts or layout modifications to the document
      • "Alignment" of annotated documents to
         • Compare annotations
         • Validate annotations
         • Harmonise annotations (SESL project)
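A rough illustration of the inline idea and of aligning two annotated copies of the same text, in Python. The `<e id="...">` element, the identifier strings and the function names `annotate` and `extract` are assumptions made for this sketch, not the IeXML schema itself. Nested annotations and XML escaping are deliberately left out.

```python
import re

# Hypothetical inline tag used only for this sketch.
TAG = re.compile(r'<e id="([^"]*)">(.*?)</e>')


def annotate(text, spans):
    """Wrap each non-overlapping (start, end, identifier) span in an inline element."""
    out, pos = [], 0
    for start, end, ident in sorted(spans):
        out.append(text[pos:start])
        out.append(f'<e id="{ident}">{text[start:end]}</e>')
        pos = end
    out.append(text[pos:])
    return "".join(out)


def extract(annotated):
    """Recover the plain text and the set of (start, end, identifier) spans."""
    plain, spans, pos, offset = [], set(), 0, 0
    for m in TAG.finditer(annotated):
        before = annotated[pos:m.start()]
        plain.append(before)
        offset += len(before)
        spans.add((offset, offset + len(m.group(2)), m.group(1)))
        plain.append(m.group(2))
        offset += len(m.group(2))
        pos = m.end()
    plain.append(annotated[pos:])
    return "".join(plain), spans


text = "BRCA1 interacts with p53."
doc_a = annotate(text, [(0, 5, "gene:BRCA1"), (21, 24, "gene:TP53")])
doc_b = annotate(text, [(0, 5, "gene:BRCA1")])

text_a, spans_a = extract(doc_a)
text_b, spans_b = extract(doc_b)
agreed = spans_a & spans_b      # annotations both copies share
only_in_a = spans_a - spans_b   # candidates for validation
```

Because the annotations travel inside the text, the two copies can be compared after stripping the tags, without maintaining separate offset files.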
  13. Section 4: CALBC
  14. The challenge
      • 150,000 documents or more ...
      • Test set for all systems
      • Assessment, benchmarking
  15. CALBC Challenge II
      (1) 75,000 documents training data
      (2) 175,000 testing data
      (3) Additional 700,000 testing data
      • September 13th, 2010: Second harmonized corpus available for CALBC Challenge II
      • December 15th, 2010: Challenge II closes
      • March 2011: CALBC Workshop II
      • June 30th, 2011: Final harmonized corpus available
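The slides do not say how the harmonized corpus is built. One simple possibility, shown here purely as an assumption, is to keep every annotation that at least a minimum number of participating systems agree on. The function name `harmonise`, the `min_votes` threshold and the sample spans are all hypothetical.

```python
from collections import Counter


def harmonise(system_annotations, min_votes=2):
    """Keep the annotations proposed by at least `min_votes` systems.

    `system_annotations` is a list with one collection of
    (start, end, identifier) tuples per annotation system.
    """
    votes = Counter()
    for spans in system_annotations:
        votes.update(set(spans))  # each system votes once per annotation
    return {span for span, count in votes.items() if count >= min_votes}


system_1 = [(0, 5, "gene:BRCA1"), (21, 24, "gene:TP53")]
system_2 = [(0, 5, "gene:BRCA1")]
system_3 = [(0, 5, "gene:BRCA1"), (21, 24, "gene:TP53"), (6, 15, "other:verb")]

consensus = harmonise([system_1, system_2, system_3], min_votes=2)
```

Raising `min_votes` trades coverage for precision; exact-match voting like this also ignores partially overlapping boundaries, which a real harmonisation would need to handle.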
  16. Section 5: UKPMC/Elixir
  18. UKPMC: ~10% of the size of PubMed
  19. Section 6: SESL
  20. SESL Project: from publisher to pharma
      Diagram labels:
      • Multiple Consumers
      • Disease Knowledge Dossier
      • Applications
      • Service Layer (RDF, Web 2.0)
      • Common Service Broker
         • Assertions, SPARQL, Triple Store
         • Integration, Inference, Reasoning
         • Sharing of data
      • Open Standards
      • Std Public Vocabularies
      • Business Rules
      • Content Suppliers
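To make the "assertions in a triple store, queried with SPARQL" layer more concrete, here is a toy in-memory version in plain Python. It is not the SESL implementation: the `ex:` identifiers, the sample triples and the functions `match` and `query` are invented for illustration, and a real deployment would use an RDF store with a SPARQL endpoint.

```python
def match(store, pattern):
    """Yield variable bindings for one (subject, predicate, object) pattern.

    Terms that start with '?' are variables; everything else must match exactly.
    """
    for triple in store:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break
        else:
            yield binding


def query(store, patterns):
    """Join several patterns, similar in spirit to a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that earlier patterns have already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(store, bound):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


# Hypothetical assertions extracted from literature.
store = {
    ("ex:PMID1", "ex:mentions", "ex:GeneA"),
    ("ex:PMID2", "ex:mentions", "ex:GeneA"),
    ("ex:PMID3", "ex:mentions", "ex:GeneB"),
    ("ex:GeneA", "ex:associatedWith", "ex:DiseaseX"),
}

# Which documents mention a gene associated with DiseaseX?
results = query(store, [
    ("?doc", "ex:mentions", "?gene"),
    ("?gene", "ex:associatedWith", "ex:DiseaseX"),
])
documents = sorted(solution["?doc"] for solution in results)
```

The join across two patterns is the step that links a publisher's document to a disease-level question from the consumer side.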
  21. Literature content in the Semantic Web
  22. Section 7: PaperMaker
  23. PaperMaker - Overview
      • PaperMaker - a tool to support authors writing biomedical papers:
         • Interactive feedback on the contents of papers (related work and concept annotations)
         • Formal consistency criteria checking (spelling, terminology, acronyms, references)
      Slide footer: 30.03.2009, Literature and Text Mining, BioCreative III, Rebholz
  24. Consistency parameters: Domain-independent
      • General spelling and grammar
      • General readability
      • Appropriate use of references
      • Finding and acknowledging related work
  25. Consistency parameters: Domain-specific
      • The use of terminology:
         • Should be consistent with domain-specific naming guidelines
         • Should not be ambiguous
         • Should conform to the conventional usage (possible clashes between naming guidelines and common-sense convention)
         • Useful to resolve terminology to reference databases (e.g. UniProt for protein names, ChEBI for chemical entities, etc.)
      • The special case of acronyms
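As one possible reading of the acronym check, the Python sketch below flags all-caps acronyms that are used in a text but never introduced in parentheses after their long form. The regular expressions and the function name `undefined_acronyms` are assumptions for illustration and do not describe how PaperMaker itself works.

```python
import re

# "(GO)" style definitions: a parenthesised token starting with a capital letter.
DEFINITION = re.compile(r"\(([A-Z][A-Za-z0-9]{1,9})\)")
# Usages: standalone tokens made of capitals and digits, at least two characters.
USAGE = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")


def undefined_acronyms(text):
    """Return the acronyms that appear in `text` without a parenthesised definition."""
    defined = {m.group(1) for m in DEFINITION.finditer(text)}
    used = set(USAGE.findall(text))
    return sorted(used - defined)


draft = (
    "Gene Ontology (GO) terms are increasingly used. "
    "We map GO terms and UMLS concepts to each sentence."
)
flagged = undefined_acronyms(draft)
```

A fuller check would also verify that the definition precedes the first use and that one acronym is not expanded in two different ways.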
  26. Content feedback
      • Resolving the contents to literature repositories
         • Finding related work (document retrieval)
         • Finding related ideas (passage retrieval)
      • Resolving the contents to ontological reference databases
         • MeSH descriptors have been demonstrated to improve biomedical information retrieval. Can we suggest MeSH terms directly to the authors?
         • Gene Ontology (GO) terms are increasingly used in information extraction systems.
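Finding related work can be pictured as ranking repository documents by their similarity to the draft. The Python sketch below uses plain term-frequency cosine similarity, chosen here only because it is easy to follow; the slides do not state which retrieval model PaperMaker uses, and the corpus, identifiers and function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_related(draft, corpus):
    """Return document ids ordered by similarity to the draft, dropping zero scores."""
    query = bag_of_words(draft)
    scored = [(cosine(query, bag_of_words(text)), doc_id) for doc_id, text in corpus.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


corpus = {
    "doc-a": "protein kinase signalling in yeast",
    "doc-b": "weather forecast models",
    "doc-c": "kinase inhibitors in cancer therapy",
}
related = rank_related("kinase signalling pathways", corpus)
```

The same scoring applied to paragraphs instead of whole documents would give a crude form of the passage retrieval mentioned on the slide.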
  27. PaperMaker workflow
  32. Conclusions
      • PaperMaker can help the author conform to the formal requirements of paper writing with special emphasis on the domain
      • It also provides feedback on the contents by relating it to reference resources and literature repositories
      • It may improve the indexing of a paper in literature repositories (less ambiguous terminology)
      • http://www.ebi.ac.uk/Rebholz-srv/PaperMaker
      Work in progress
  33. Section 8: Summary
  34. Efforts in the Rebholz group towards interoperability of literature with bioinformatics
      • Whatizit infrastructure
         • Biomedical NER as a public, large-scale service
      • LexEBI / BioLexicon (collab. w. NaCTeM, Pisa-U)
         • Biomedical terminological resource, standardisation of semantics
      • IeXML (BioLink SIG 2006, Brazil)
         • Put the annotations into the document (inline annotations)
      • CALBC project
         • Collaborative annotation of a large-scale biomedical corpus
      • UKPMC: U.K. PubMed Central (collab. w. NaCTeM, BL)
         • Use of Whatizit, BioLexicon, IeXML, CALBC alignments for the delivery of quality annotation services to the public
      • SESL project
         • Joint project with pharma & publishers, literature content in a triple store
      • PaperMaker
         • Validation of the scientific literature against the above