Biotea poster biolinks at ISMB 2013



Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data


Alexander Garcia, Casey Mclaughlin
Institute for Digital Information, Florida State University, Tallahassee

Leyla Garcia Castro
Departamento de Lenguajes y Sistemas Informáticos (Department of Computer Languages and Systems), Universitat Jaume I

Corresponding author:

"Scholarly data and documents are of most value when they are interconnected rather than independent." Christine L. Borgman

In a nutshell, Biotea:
• Is a semantic dataset for the full-text, open-access subset of PubMed Central.
• Makes extensive use of existing ontologies and semantic enrichment services.
• Supports the generation of self-describing, machine-readable scholarly documents.
• Comprises a flexible and adaptable set of tools for metadata enrichment and semantic processing of biomedical documents.
• Provides a semantically rich, highly interconnected dataset with self-describing content.

RDF4PMC, our workflow
1. Metadata & content [Diagram: NXML; Metadata; Content; RDFized article; Provenance]
2. Semantic content enrichment [Diagram: Annotation; Enriched content; RDFization]

[Figure: RDF4PMC and Bio2RDF]

Consuming the dataset, a first prototype
1. Retrieval: metadata + cloud of annotations
2. Enriched content: facts-based reading
3. Navigating the neighborhood
[Screenshots: Contextual reading; Graphical tools; Interactive zone]
• Search and retrieval based on human gene names: the term is resolved with GeneWiki, and the associated UniProt accession is used in the query.
• Enriched content based on annotations is displayed in the interactive zone.
• Graph-based retrieval for the term "catalase": only shared terms with more than 30 associated biological terms are included in the results.

Consuming the dataset, SPARQL and API

Query expressed in natural language, e.g.: retrieve the PubMed identifiers of those articles that have been semantically annotated with the biological entity CHEBI:60004. The semantic annotation comes from the occurrence of the term "mixture" in any paragraph of the retrieved articles.

SPARQL query:

    PREFIX bibo: <http://purl.org/ontology/bibo/>
    PREFIX ao: <http://purl.org/ao/core/>
    PREFIX aot: <http://purl.org/ao/types/>
    SELECT DISTINCT ?pmid WHERE {
      ?article a bibo:AcademicArticle ;
               bibo:pmid ?pmid .
      ?annotation a aot:ExactQualifier ;
                  ao:annotatesResource ?article ;
                  ao:hasTopic <> .
    }

Retrieval Service, e.g.:
• All terms that start with a specific string (for autocompletion)
• All topics related to a term
• All topics related to a vocabulary
• All vocabularies related to a term
• A list of terms and their related topics
• A list of topics and their related vocabularies
• A list of vocabularies and their prefixes
• RDF of articles that include a term
• Count of the RDF articles that include a term
• RDF of articles that include a vocabulary

AGC and CM have been funded by US DoD Grant MOMRP w81xwh-10-2-0181.
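The SPARQL query in the "Consuming the dataset, SPARQL and API" panel can also be sent from a script. The sketch below is a minimal, standard-library-only client, assuming the endpoint speaks the W3C SPARQL 1.1 Protocol (GET with a `query` parameter) and returns SPARQL 1.1 JSON results. The endpoint URL and the topic IRI for CHEBI:60004 are not given on the poster, so both are caller-supplied; the PREFIX namespaces (Bibliographic Ontology and Annotation Ontology core/types) are assumptions, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Poster query with assumed PREFIX declarations; {topic} is filled by the caller.
# Literal braces of the SPARQL group pattern are doubled for str.format.
QUERY_TEMPLATE = """PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX ao: <http://purl.org/ao/core/>
PREFIX aot: <http://purl.org/ao/types/>
SELECT DISTINCT ?pmid WHERE {{
  ?article a bibo:AcademicArticle ;
           bibo:pmid ?pmid .
  ?annotation a aot:ExactQualifier ;
              ao:annotatesResource ?article ;
              ao:hasTopic <{topic}> .
}}"""


def build_request_url(endpoint: str, topic_iri: str) -> str:
    """Encode the query as a SPARQL 1.1 Protocol GET request URL."""
    query = QUERY_TEMPLATE.format(topic=topic_iri)
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_pmids(results_json: str) -> list[str]:
    """Extract the ?pmid values from a SPARQL 1.1 JSON results document."""
    data = json.loads(results_json)
    return [b["pmid"]["value"] for b in data["results"]["bindings"] if "pmid" in b]


def fetch_pmids(endpoint: str, topic_iri: str) -> list[str]:
    """Send the query over HTTP and return the PubMed identifiers (needs network)."""
    request = urllib.request.Request(
        build_request_url(endpoint, topic_iri),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_pmids(response.read().decode("utf-8"))
```

Building the URL and parsing the results are kept separate from the network call so each can be checked offline; `fetch_pmids(endpoint, topic_iri)` only returns real PMIDs once a working endpoint and the full CHEBI:60004 IRI are supplied.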
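The poster explains that the CHEBI:60004 annotation comes from the occurrence of the term "mixture" in a paragraph. As an illustration of that idea only, the sketch below uses plain dictionary-based exact matching (case-insensitive, whole words) over an article's paragraphs; it is not the Biotea annotation pipeline, which relies on external semantic enrichment services. The `Annotation` record, the function names and the one-entry dictionary are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    paragraph: int  # index of the paragraph that contains the match
    term: str       # surface form as it appears in the text
    topic: str      # concept identifier, e.g. "CHEBI:60004"


def annotate(paragraphs: list[str], dictionary: dict[str, str]) -> list[Annotation]:
    """Exact, case-insensitive, whole-word dictionary matching over paragraphs."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in dictionary
    }
    found = []
    for index, text in enumerate(paragraphs):
        for term, pattern in patterns.items():
            for match in pattern.finditer(text):
                found.append(Annotation(index, match.group(0), dictionary[term]))
    return found


def article_topics(annotations: list[Annotation]) -> set[str]:
    """Distinct topics for an article: what a query on ao:hasTopic would match."""
    return {a.topic for a in annotations}
```

Whole-word matching means "admixtures" does not produce an annotation, while "Mixture" at the start of a sentence does; a real annotator would also handle synonyms and inflected forms, which exact matching deliberately ignores.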
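The Retrieval Service lists "all terms that start with a specific string (for autocompletion)". One common way to answer such prefix queries is binary search over a sorted term list. The sketch below assumes an in-memory list of terms and case-insensitive matching; `TermIndex` and `starting_with` are hypothetical names, not the actual Biotea API.

```python
import bisect


class TermIndex:
    """Sorted, case-insensitive term list supporting prefix (autocompletion) lookups."""

    def __init__(self, terms: list[str]) -> None:
        # (folded key, original term) pairs, sorted by key; the set removes exact duplicates.
        self._entries = sorted({(t.casefold(), t) for t in terms})
        self._keys = [key for key, _ in self._entries]

    def starting_with(self, prefix: str, limit: int = 10) -> list[str]:
        """Return up to `limit` terms whose folded form starts with `prefix`."""
        folded = prefix.casefold()
        # All keys sharing the prefix form a contiguous run starting here.
        start = bisect.bisect_left(self._keys, folded)
        results = []
        for i in range(start, len(self._entries)):
            key, original = self._entries[i]
            if not key.startswith(folded) or len(results) >= limit:
                break
            results.append(original)
        return results


index = TermIndex(["catalase", "Catalase activity", "cation", "mixture", "cat"])
print(index.starting_with("cata"))
```

Because sorted keys that share a prefix are contiguous, each lookup costs one binary search plus the number of results returned, which suits keystroke-by-keystroke autocompletion.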