Literature Services Resource Description Framework
Jee-Hyub Kim
Literature Services, EMBL-EBI
21 May 2015

Presented as part of EBI industrial workshop

  1. Literature Services Resource Description Framework
     Jee-Hyub Kim, Literature Services, EMBL-EBI, 21 May 2015
  2. Contents
     1. Europe PMC and Linking Literature
     2. Publishing Text-Mined Data on RDF
     3. Text-Mining RDF Service
     4. Discussion
  3. Europe PMC
     • Europe PMC is a literature database [1].
       • Abstracts: 30 million PubMed, Agricola and patent records, updated daily
       • Full text articles: over 3 million full text articles, of which over 900,000 are free to read and reuse, updated daily
     • Powerful and easy search
       • Search all article content through one simple search interface, supported by deep search options for advanced users.
  4. Linking Literature
     • Europe PMC provides several types of literature links.
       • External Links: to any resource (e.g., databases, Wikipedia, press releases)
       • Citations: to literature
       • BioEntities (produced by the Europe PMC text-mining pipeline)
         • Biological entities: to concepts
         • Accession numbers: to data
     • Example: http://europepmc.org/abstract/MED/21926972
  5. Europe PMC Text-Mining Pipeline
     • A pipeline of dictionary-based and machine learning-based named entity taggers [3].
     • 6 semantic types
       • Genes/proteins
       • Chemicals
       • Organisms
       • GO terms
       • Disease terms
       • EFO terms
     • 20 accession number types [2]:
       • ENA, RefSNP, PDB, UniProt, OMIM, PFam, ArrayExpress, RefSeq, Data DOI, Ensembl, InterPro
       • NCT, Bioproject, Biosample, Eudract, EMDB, PXD, GO, EGA, TreeFam
     • Programmatic access available.
  6. Publishing Text-Mined Data
     • Beyond the BioEntities tab
     • Goals
       • More connectivity
       • More context for each link
       • Links to share
     • Challenge: dealing with nearly a billion automatically generated annotations
     • Using the Web Annotation Data Model.
  7. Web Annotation Data Model
     • Built on top of RDF
     • Annotations as resources
     • To provide a standard description mechanism for sharing annotations between systems
     • For more general purpose use
       • Not only for text mining
       • For example, YouTube video comments (by people), image annotation, etc.
     • W3C Working Draft: http://www.w3.org/TR/2014/WD-annotation-model-20141211/
  8. Core Annotation Framework
     • Typically an Annotation has a single Body, which is the comment or other descriptive resource, and a single Target that the Body is somehow "about".
     • The Body provides the information which is annotating the Target.
     • This "aboutness" may be further clarified or extended to notions such as classifying or identifying.
  9. One Scenario: Text Comment On Web Page
     • A textual comment on a selection of text within a web page
     • How to select a text fragment?
       • Text Position Selector: oa:start, oa:end
       • Text Quote Selector: oa:exact, oa:prefix, oa:suffix
  10. Text Quote Selector
  11. A Model for Annotation
  12. Service Description
      • Running on the EBI RDF Platform
      • Stores 1,563,241,810 triples text-mined from 400,746 Open Access articles in Europe PubMed Central.
      • Provides
        • for each article, all the annotations linking to ontologies/databases
        • with contexts:
          • sentences
          • section information
  13. Use Case for Database Curation
      • Given a database identifier, provides sentence-level information for database curation.
        1. Show all the articles where the PDB accession number 3NSS is mentioned.
        2. Show all the annotations, each with its label, in PMC3382907.
        3. Show all the articles where inflammatory bowel disease (C0021390) is mentioned.
      • http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql
  14. Discussion
      • Can we deal with a large number of triples from 3 million full text articles?
      • A better URI scheme: e.g., http://europepmc.org/articles/PMC4298172/methods/genes/TEM-1/23
      • Interoperability with other formats used in the text-mining community
        • e.g., BioC, UIMA
      • Questions?
  15. References
      [1] The Europe PMC Consortium. Europe PMC: a full-text literature database for the life sciences and platform for innovation. Nucleic Acids Research, 2014.
      [2] Senay Kafkas, Jee-Hyub Kim, and Johanna R. McEntyre. Database citation in full text biomedical articles. PLoS ONE, 8(5):e63184, May 2013.
      [3] Dietrich Rebholz-Schuhmann, Miguel Arregui, Sylvain Gaudan, Harald Kirsch, and Antonio J. Yepes. Text processing through web services: calling Whatizit. Bioinformatics, btm557, November 2007.
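
The Core Annotation Framework and Text Quote Selector slides describe an annotation as a Body attached to a Target, where the Target is a span of text identified by its exact wording plus surrounding context. The following is a minimal Python sketch of that idea, not the Europe PMC implementation. The property names (`oa:hasBody`, `oa:hasTarget`, `oa:hasSource`, `oa:hasSelector`, `oa:exact`, `oa:prefix`, `oa:suffix`) come from the Open Annotation vocabulary; the helper names `find_quote` and `make_annotation`, the dictionary layout, and any body URI passed in are assumptions made for illustration.

```python
from typing import Optional, Tuple


def find_quote(text: str, exact: str, prefix: str = "",
               suffix: str = "") -> Optional[Tuple[int, int]]:
    """Resolve a text quote selector to (start, end) character offsets.

    Returns the first occurrence of `exact` that is immediately preceded
    by `prefix` and immediately followed by `suffix`, or None if no
    occurrence matches. The offsets are what a Text Position Selector
    would carry as oa:start and oa:end.
    """
    pos = text.find(exact)
    while pos != -1:
        end = pos + len(exact)
        if text[:pos].endswith(prefix) and text[end:].startswith(suffix):
            return pos, end
        pos = text.find(exact, pos + 1)
    return None


def make_annotation(source: str, body: str, exact: str,
                    prefix: str = "", suffix: str = "") -> dict:
    """Build a dictionary mirroring the Body/Target structure.

    `source` is the annotated document and `body` is the resource the
    text is linked to; both are plain URI strings chosen by the caller.
    """
    return {
        "@type": "oa:Annotation",
        "oa:hasBody": {"@id": body},
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": source},
            "oa:hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "oa:exact": exact,
                "oa:prefix": prefix,
                "oa:suffix": suffix,
            },
        },
    }
```

With a made-up sentence in which the same term occurs twice, `find_quote(sentence, "TEM-1")` returns the first occurrence, while supplying a prefix and suffix picks out a later one. This is why a quote selector carries context: the exact string alone is often ambiguous, whereas a position selector is unambiguous but breaks if the underlying text shifts.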

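The first curation use case (find all articles that mention the PDB accession number 3NSS) could be phrased as a SPARQL query along the following lines. This is a sketch under stated assumptions: the slides do not show the service's actual schema, so the query assumes annotations follow the Open Annotation vocabulary and that the mentioned accession number appears as the `oa:exact` text of a selector. The helper names `escape_literal`, `build_mention_query` and `build_request_url` are hypothetical. The code only builds the query and the request URL; it does not contact the endpoint.

```python
from urllib.parse import urlencode

OA_PREFIX = "PREFIX oa: <http://www.w3.org/ns/oa#>"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_mention_query(exact: str, limit: int = 100) -> str:
    """Return a SPARQL query listing articles whose annotations quote `exact`.

    Assumes (not confirmed by the slides) that each annotation has a
    target with oa:hasSource pointing at the article and a selector
    whose oa:exact holds the mentioned text.
    """
    return "\n".join([
        OA_PREFIX,
        "SELECT DISTINCT ?article WHERE {",
        "  ?annotation a oa:Annotation ;",
        "              oa:hasTarget ?target .",
        "  ?target oa:hasSource ?article ;",
        "          oa:hasSelector ?selector .",
        f'  ?selector oa:exact "{escape_literal(exact)}" .',
        "}",
        f"LIMIT {int(limit)}",
    ])


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request, per the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})
```

For example, `build_request_url("http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql", build_mention_query("3NSS"))` yields a URL that a SPARQL client could fetch, provided the endpoint is still available and the data is modeled as assumed. Matching on the annotation Body (the database record URI) rather than on the quoted text would likely be more robust, but the URI pattern used for bodies is not given in the slides.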