The Bio2RDF project aims to transform silos of life science data into a globally distributed network of linked data for biological knowledge discovery. Bio2RDF creates and provides machine-understandable descriptions of biological entities using the RDF/RDFS/OWL Semantic Web languages. Using both syntactic and semantic data integration techniques, Bio2RDF seamlessly integrates diverse biological data and enables powerful new SPARQL-based services across its globally distributed knowledge bases. The project has released 28 public databases in RDF format, each available on the Web through a SPARQL endpoint or by dereferencing its URIs.
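As a minimal sketch of what querying one of these SPARQL endpoints looks like, the snippet below builds a SPARQL protocol GET URL in Python using only the standard library. The endpoint URL and the gene identifier are illustrative assumptions; actual Bio2RDF endpoint addresses vary by dataset.

```python
from urllib.parse import urlencode

# Illustrative Bio2RDF-style endpoint; real endpoint URLs vary by dataset.
ENDPOINT = "http://bio2rdf.org/sparql"

# A simple SPARQL query listing a few triples about one entity
# (the gene identifier here is a placeholder example).
query = """
SELECT ?p ?o
WHERE { <http://bio2rdf.org/geneid:4846> ?p ?o }
LIMIT 10
"""

def build_request_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol query, asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

url = build_request_url(ENDPOINT, query)
print(url)
```

Fetching this URL with any HTTP client returns the query results; the same pattern works against any SPARQL 1.0 endpoint.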
Now that major data providers such as NCBO, UniProt, KEGG, PDB and EBI also expose their data as Linked Data, we need a framework to ease the building of mashup applications, and designing a workflow is a well-known approach to doing so. This tutorial proposes using an open source professional ETL tool, Talend, to help with the RDFization of existing data and to automate triple fetching in order to populate a mashup in the OpenLink Virtuoso triplestore.
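RDFization is, at its core, a mapping from tabular records to triples. Talend expresses such mappings graphically, but the logic can be sketched in a few lines of Python: the base URI, column names, and predicate URIs below are illustrative assumptions, not a Bio2RDF or Talend convention.

```python
import csv
import io

# Illustrative base namespace for a lab's own data (an assumption,
# not a real vocabulary).
BASE = "http://example.org/mylab/"

def row_to_ntriples(row):
    """Map one CSV row to RDF triples in N-Triples syntax."""
    s = f"<{BASE}sample/{row['sample_id']}>"
    yield f'{s} <{BASE}vocab/gene> "{row["gene"]}" .'
    yield f'{s} <{BASE}vocab/expression> "{row["expression"]}" .'

# Example input: one sample with a gene name and an expression value.
data = io.StringIO("sample_id,gene,expression\nS1,TP53,2.4\n")
triples = [t for row in csv.DictReader(data) for t in row_to_ntriples(row)]
for t in triples:
    print(t)
```

The resulting N-Triples file can be bulk-loaded into Virtuoso, after which the lab data is queryable alongside the public Bio2RDF graphs.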
How can we build a specific database to answer a very specialized question? How can we build a mashup by fetching linked data from the web? How can we merge our own lab results with the publicly available knowledge from the semantic web? Those are the questions we answer in this tutorial by proposing tools and methods to the participants. You will learn how to install and administer the Virtuoso triplestore; then we will show you how to load RDF triples directly from the web, or from your own data once you have converted it to RDF using an open source professional ETL tool, Talend. Now that the Life Sciences Semantic Web is a reality, we need to make it answer our questions.
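Loading triples "directly from the web" relies on dereferenceable URIs: an HTTP client asks for an RDF serialization via content negotiation. A minimal sketch, assuming an illustrative Bio2RDF URI (a real client would also follow redirects and parse the returned RDF before inserting it into Virtuoso):

```python
from urllib.request import Request

def rdf_request(uri):
    """Build an HTTP request that asks the server for RDF/XML
    via content negotiation, rather than an HTML page."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})

# Placeholder entity URI; dereferencing it would return its RDF description.
req = rdf_request("http://bio2rdf.org/geneid:4846")
print(req.get_header("Accept"))
```

The same negotiated fetch is what an ETL job automates at scale when populating a mashup graph.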