Semantic Trilogy Bio2RDF tutorial



The Bio2RDF project aims to transform silos of life science data into a globally distributed network of linked data for biological knowledge discovery. Bio2RDF creates and provides machine-understandable descriptions of biological entities using the RDF/RDFS/OWL Semantic Web languages. Using both syntactic and semantic data integration techniques, Bio2RDF seamlessly integrates diverse biological data and enables powerful new SPARQL-based services across its globally distributed knowledge bases. The project has released 28 public databases in RDF format, all available on the internet through SPARQL endpoints or by fetching dereferenceable URIs.
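As a concrete illustration of how these dereferenceable URIs work, Bio2RDF identifies each entity with a URI of the form http://bio2rdf.org/&lt;namespace&gt;:&lt;identifier&gt;. The short Python sketch below (the `geneid` namespace and the specific identifier are used only as an example) builds such a URI and the SPARQL DESCRIBE query that fetching it over the web would answer:

```python
def bio2rdf_uri(namespace, identifier):
    """Build a dereferenceable Bio2RDF URI of the form
    http://bio2rdf.org/<namespace>:<identifier>."""
    return "http://bio2rdf.org/%s:%s" % (namespace, identifier)

# Example entity: an NCBI Gene record (the identifier is illustrative)
uri = bio2rdf_uri("geneid", "3458")

# Dereferencing the URI is equivalent to asking a Bio2RDF SPARQL
# endpoint to DESCRIBE the entity, i.e. return every triple about it
query = "DESCRIBE <%s>" % uri
```

Submitting that query string to any of the project's public SPARQL endpoints returns the entity's RDF description.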
Now that major data providers such as NCBO, UniProt, KEGG, PDB and EBI also expose their data as Linked Data, we need a framework to ease the building of mashup applications, and designing a workflow is a well-known approach to do so. This tutorial proposes using an open source professional ETL tool, Talend, to help with the RDFization of existing data and to automate triple fetching in order to populate a mashup in the OpenLink Virtuoso triplestore.
How can we build a specific database to answer a very specialized question? How can we build a mashup by fetching linked data from the web? How can we merge our own lab results with the publicly available knowledge of the semantic web? These are the questions we answer in the tutorial by proposing tools and methods to the participants. You will learn how to install and administer the Virtuoso triplestore; then we will show you how to load RDF triples directly from the web, or from your own data after converting it to RDF using an open source professional ETL tool, Talend. Now that the Life Sciences semantic web is a reality, we need to make it answer our questions.



  1. How to produce and consume Linked Data the Bio2RDF way (using the Virtuoso triplestore and Talend ETL) — François Belleau, Arnaud Droit, Centre de recherche du CHUQ, Laval University, Québec, Canada
  2. Download stuff... ● Virtuoso triplestore: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSDownload ● Talend software ● Bio2RDF Talend jobs
  3. Program ● Presentation of the Bio2RDF project and other public RDF data providers such as NCBO, UniProt and KEGG ○ 15 minutes ● Virtuoso triplestore installation and administration ○ 30 minutes ● Talend Open Studio installation and basic introduction ○ 30 minutes ● Hands-on part of the tutorial ○ 90 minutes
  4. Virtuoso triplestore installation and administration (30 min.) ● Basic server configuration ● Installing the facet browser ● Loading RDF into the triplestore ● Submitting SPARQL queries
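As a reference for the loading step, Virtuoso's bulk loader is driven from its isql console. A typical session looks like the sketch below; the directory path and graph URI are placeholders, and the directory must also be listed under DirsAllowed in virtuoso.ini for the load to be permitted:

```sql
-- Register every N-Triples file in the directory for loading
-- into the named graph (paths and graph URI are examples)
ld_dir('/data/bio2rdf', '*.nt', 'http://bio2rdf.org/graph/example');

-- Run the loader over the registered files
rdf_loader_run();

-- Persist the loaded triples to disk
checkpoint;
```

The DB.DBA.load_list table can then be queried to verify that each registered file loaded without error.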
  5. Talend Open Studio installation and basic introduction (30 min.) ● Concepts of Job and Component ● Java compilation and exporting packages ● How to access and transform data from a SQL database or in XML, JSON or text format ● How to access the web and consume SOAP services
  6. Hands-on part of the tutorial (90 minutes) ● Learning basic Talend techniques ● Fetching data from the web ● Creating triples in N-Triples format ● Parsing an XML document ● Accessing the Virtuoso triplestore via the JDBC API
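The triple-creation step above amounts to emitting one line per statement in N-Triples syntax: subject and predicate as angle-bracketed URIs, the object as either a URI or a quoted literal, and a terminating dot. A minimal sketch of such a serializer (the example URIs are illustrative, not part of the tutorial material):

```python
def escape_literal(text):
    """Escape backslashes, quotes and newlines per N-Triples rules."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))

def triple(subject, predicate, obj, is_literal=False):
    """Serialize one statement as a single N-Triples line."""
    if is_literal:
        o = '"%s"' % escape_literal(obj)
    else:
        o = "<%s>" % obj
    return "<%s> <%s> %s .\n" % (subject, predicate, o)

# Example: a title triple for a Bio2RDF entity (identifier illustrative)
line = triple("http://bio2rdf.org/geneid:3458",
              "http://purl.org/dc/terms/title",
              "interferon gamma", is_literal=True)
```

Lines produced this way can be appended to a .nt file and loaded into Virtuoso with its bulk loader, or streamed over JDBC.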
  7. Would you contribute? The project's goal is to build Talend components for the Semantic Web.