Linked Data and Semantic Web - EUDAT Summer School (Yann Le Franc, e-Science Data Factory)

www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Introduction to Linked Data
and Semantic Web
Yann Le Franc, PhD
This work is licensed under the Creative
Commons CC-BY 4.0 licence.
Attribution: EUDAT – www.eudat.eu
Version 2017-1

How to cope with an expanding
universe of scientific data?
“The Hitchhiker’s guide to the Semantic Web Galaxy”

EUDAT Summer School, 3-7 July 2017, Crete
Introduction: a bit of context
The general principles of Linked Data and standards
Application: data annotations with B2NOTE
Outline

Problem: the volume of scientific data is
expanding

?
Challenge: Aggregating multi-dimensional
data from multiple data sources

?
Similar problem and challenge in Neuroscience

Multiple species
Multi-scale data: connectivity, genes, molecules, electrical activity, functional
Data aggregation
Data analysis
Modeling

Data are enclosed in information silos: distinct APIs, data published within HTML pages or
in unstructured form
2710 databases related to neuroscience (Neuroscience Information
Framework)
How can we make these data resources interoperable and
link them together?
The current situation: distributed data
resources in a large variety of formats
(Web APIs, HTML pages)

https://fr.wikipedia.org/wiki/Tim_Berners-Lee
A global problem
The World Wide Web is a global document space
Documents are interconnected with links
Data is hidden in HTML pages: easy to use for humans but
not for machines
Large diversity of Web APIs
Impossible to access and interlink data
Need for semantics to transform the global document
space into a global data space

A solution for Life Science, the Universe
and Everything

What is Linked Data?
Tim Berners-Lee (2006) - Design Issues
Use URIs as names for things
Use HTTP URIs so that people can look up those
names (dereferenceable)
When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
Include links to other URIs, so that they can
discover more things
https://www.w3.org/DesignIssues/LinkedData.html

Use URIs instead of URNs (Uniform Resource Names) and DOIs
Example
Real Person
http://www.esciencedatafactory.com/people/yann_le_franc
Description RDF (for machines)
http://www.esciencedatafactory.com/people/yann_le_franc.rdf
Description HTML (for humans)
http://www.esciencedatafactory.com/people/yann_le_franc.html
Separate the URI representing the real object or concept from its description
Name things with URIs
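The separation between a thing and its descriptions can be sketched in a few lines of Python. This is only an illustration: the helper `description_urls` is hypothetical, and the `.rdf` / `.html` suffix convention is simply the one used in the example URLs on this slide.

```python
# URI naming the real person (taken from the slide).
THING = "http://www.esciencedatafactory.com/people/yann_le_franc"


def description_urls(thing_uri: str) -> dict:
    """Return one document URL per representation describing the thing.

    Hypothetical helper: the suffix convention is an assumption based on
    the example URLs shown on the slide.
    """
    return {
        "application/rdf+xml": thing_uri + ".rdf",  # description for machines
        "text/html": thing_uri + ".html",           # description for humans
    }


docs = description_urls(THING)
print(docs["application/rdf+xml"])
print(docs["text/html"])
```

The URI of the person never appears among the document URLs: the thing and the documents about it keep distinct identifiers.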

Make use of HTTP content negotiation
Two technical solutions for designing the URIs:
1 – Use content negotiation with a 303 See Other redirect
2 – Use hash URIs
https://www.w3.org/TR/cooluris/
Make URIs dereferenceable
https://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html

1 – Use content negotiation with a 303 See Other redirect
Exchange between client and server:

Step 1: GET URI (client request)
GET /people/yann_le_franc HTTP/1.1
Host: esciencedatafactory.com
Accept: text/html, application/rdf+xml

Step 2: 303 See URI2 (server answer)
HTTP/1.1 303 See Other
Location: http://www.esciencedatafactory.com/people/yann_le_franc.rdf
Vary: Accept

Step 3: GET URI2 (client request)
GET /people/yann_le_franc.rdf HTTP/1.1

Step 4: Content of URI2 (server answer)
HTTP/1.1 200 OK
Content-Type: application/rdf+xml
…

Requires 4 HTTP messages (2 request/response round trips) per item
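The server side of the 303 exchange can be sketched as a pure Python function. This is a minimal sketch under stated assumptions, not a production implementation: the function names `parse_accept` and `see_other`, the extension table, and the rule that ties between equally preferred media types are broken by their order in the header are all choices made for this example.

```python
# Assumed mapping from media type to document suffix (illustrative only).
EXTENSIONS = {"application/rdf+xml": ".rdf", "text/html": ".html"}


def parse_accept(header: str) -> list:
    """Return the media types of an Accept header, highest q-value first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        ranked.append((-q, position, parts[0]))
    return [media for _, _, media in sorted(ranked)]


def see_other(thing_uri: str, accept: str):
    """Build the (status, headers) pair answering a GET on the thing URI."""
    for media in parse_accept(accept):
        if media in EXTENSIONS:
            headers = {"Location": thing_uri + EXTENSIONS[media], "Vary": "Accept"}
            return "303 See Other", headers
    return "406 Not Acceptable", {}


status, headers = see_other(
    "http://www.esciencedatafactory.com/people/yann_le_franc",
    "application/rdf+xml, text/html;q=0.5",
)
print(status, headers["Location"])
```

A client that prefers RDF is sent to the `.rdf` document, and it then needs a second request to fetch it, which is the cost noted on the slide.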

2 – Use hash URIs
Document on the server: http://www.esciencedatafactory.com/people (list of people)
• http://www.esciencedatafactory.com/people#yann_le_franc
• http://www.esciencedatafactory.com/people#john_doe

Step 1: GET URI (client request)
GET /people HTTP/1.1
Accept: application/rdf+xml

Step 2: Content of URI (server answer)
HTTP/1.1 200 OK
The whole list

Step 3: Cache (client side)
Get the whole file and then look into the file to find the items with the hash
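The hash URI approach can be illustrated with the Python standard library. In this sketch the content of the people document is made-up stand-in data and no network call is performed; the point is only that the fragment is split off on the client, the document is fetched and cached once, and each item is then found locally.

```python
from urllib.parse import urldefrag

uri = "http://www.esciencedatafactory.com/people#yann_le_franc"

# Only document_url is sent over HTTP; the fragment stays on the client.
document_url, fragment = urldefrag(uri)

# Stand-in for the parsed content of the retrieved document (made-up data).
people_document = {
    "yann_le_franc": {"name": "Yann Le Franc"},
    "john_doe": {"name": "John Doe"},
}

# One retrieval fills the cache; later lookups need no further request.
cache = {document_url: people_document}
item = cache[document_url][fragment]
print(document_url, item)
```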

The RDF Data Model
RDF triple: (subject, predicate, object)
Resource A (URI) → Relation (URI) → Resource B (URI)
Example: My website (http://www.example.com/index.html) → Created by → Me (http://myprofile/name)
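A triple can be represented directly as a small Python structure. In this sketch the `Triple` class and `to_ntriples` helper are hypothetical names, and the Dublin Core `creator` property is an assumed stand-in for the "Created by" relation, whose actual URI the slide does not give.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# dcterms:creator is an assumption standing in for "Created by".
triple = Triple(
    "http://www.example.com/index.html",
    "http://purl.org/dc/terms/creator",
    "http://myprofile/name",
)


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(triple))
```

Because the object of one triple can be the subject of another, a set of such triples forms a labeled directed graph.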

Labeled directed graph
From the W3C RDF 1.1 Primer: https://www.w3.org/TR/rdf11-primer/
RDF in action

RDF serializations
RDF/XML
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://www.esciencedatafactory.com/people/yann_le_franc">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name>Yann Le Franc</foaf:name>
  </rdf:Description>
</rdf:RDF>
Turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.esciencedatafactory.com/people/yann_le_franc>
  rdf:type foaf:Person ;
  foaf:name "Yann Le Franc" .

RDF serializations
RDFa
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8"/>
<title>Profile page for Yann Le Franc</title>
</head>
<body>
<div about="http://www.esciencedatafactory.com/people/yann_le_franc" typeof="foaf:Person">
<span property="foaf:name">Yann Le Franc</span>
</div>
</body>
</html>

Publishing RDF
Subject                    Predicate          Object
Alice                      is a friend of     Bob
Bob                        is interested in   The Mona Lisa
Bob                        is a               Person
Bob                        is born on         14 July 1990
The Mona Lisa              was created by     Leonardo da Vinci
La Joconde in Washington   is about           The Mona Lisa
Triple store
SPARQL endpoint
SPARQL queries
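The idea behind querying a triple store can be sketched with a tiny in-memory pattern matcher. This is not SPARQL and not how a real triple store is implemented; it only mimics a basic triple pattern in which unknown positions act as variables. The `match` function is a hypothetical helper, and the triples are the slide's example written as plain strings instead of URIs.

```python
# The slide's example statements, as plain strings for readability.
TRIPLES = [
    ("Alice", "is a friend of", "Bob"),
    ("Bob", "is interested in", "The Mona Lisa"),
    ("Bob", "is a", "Person"),
    ("Bob", "is born on", "14 July 1990"),
    ("The Mona Lisa", "was created by", "Leonardo da Vinci"),
    ("La Joconde in Washington", "is about", "The Mona Lisa"),
]


def match(pattern, triples=TRIPLES):
    """Yield the triples matching a (subject, predicate, object) pattern.

    None in a position acts as a wildcard, like a variable in a query.
    """
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# Everything stated about Bob as a subject.
about_bob = list(match(("Bob", None, None)))
# Every statement whose object is the Mona Lisa.
mona_lisa_links = list(match((None, None, "The Mona Lisa")))
print(about_bob)
print(mona_lisa_links)
```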

RDF Triple store Graph database
M. Junghanns and A. Petermann, “Management and Analysis of Big Graph Data: Current Systems and Open Challenges,” …
(eds: S Sakr, 2017.
B. Haslhofer, E. Momeni Roochi, B. Schandl, and S. Zander, “Europeana RDF Store Report,” Mar. 2011.
Z. Kaoudi and G. Weikum, RDF in the clouds: a survey In The VLDB Journal. 2014.
Technologies to publish RDF

Resource 1: http://www.incf.org/images/newsroom/le-franc
Resource 2:
http://m.c.lnkd.licdn.com/mpr/mpr/shrink_200_200/p/2/000/22d/056/2bdc24c.jpg
Last Name: Le Franc
<last_name>Le Franc</last_name>
Family Name: Le Franc
<family_name>Le Franc</family_name>
Do we need anything else?

<last_name>Le Franc</last_name> and <family_name>Le Franc</family_name>: synonym/equivalent?
WE NEED COMMON VOCABULARIES TO SHARE THE SAME SEMANTICS
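The benefit of a shared vocabulary can be sketched as follows. The mapping table and the `normalize` helper are hypothetical, and choosing the FOAF `familyName` property as the common term is an assumption made for this illustration.

```python
# Shared term chosen for this example (assumption): FOAF familyName.
FOAF_FAMILY_NAME = "http://xmlns.com/foaf/0.1/familyName"

# Assumed mapping from each source's local field name to the shared term.
FIELD_TO_TERM = {
    "last_name": FOAF_FAMILY_NAME,
    "family_name": FOAF_FAMILY_NAME,
}


def normalize(record: dict) -> dict:
    """Re-key a record with shared vocabulary terms; unknown fields are kept as-is."""
    return {FIELD_TO_TERM.get(key, key): value for key, value in record.items()}


resource_1 = normalize({"last_name": "Le Franc"})
resource_2 = normalize({"family_name": "Le Franc"})
print(resource_1 == resource_2)
```

Once both sources use the same term, a machine can tell that the two fields mean the same thing without guessing from the field names.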

Do we really need vocabularies?
Yes, if you are interested in:
Sharing data with others
Data aggregation from multiple sources
Not if you are a lone scientist in your ivory tower

“In computer science and information science, an ontology formally represents
knowledge as a set of concepts within a domain, using a shared vocabulary to
denote the types, properties and interrelationships of the concepts” - Wikipedia
You need to create a controlled vocabulary, also called an ontology, that can be
used as a common “standardized” vocabulary to annotate your resources
W3C semantic web standards:
RDF Schema
OWL (Web Ontology Language)
SKOS (Simple Knowledge Organization System)
What is an ontology?
How do you encode this in practice?
How can we make it better?

Class
What is an ontology in practice?

Class
Unique identifier
Label
Human-readable definition
Other metadata
(creator, version, date,…)

Superclass
Unique identifier
Label
Other metadata
Subclass
Unique identifier
Label
is_a: subsumption relation
Macaca mulatta is an animal

Person
Unique identifier
Label
Other metadata
Yann Le Franc
Unique identifier
Label
is_a: subsumption relation

Superclass
Subclass
is_a: subsumption relation
Superclass 2
has_a: associative relation

Person
Yann Le Franc
is_a: subsumption relation
Name
has_a: associative relation
Relations between concepts are based on first-order logic
Use reasoners/classifiers - machine learning algorithms
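The kind of inference a reasoner draws from is_a relations can be sketched with a very small example. This is a toy illustration, not an OWL reasoner: the hierarchy is made up (the intermediate "Primate" class is an assumption), and the function names are hypothetical.

```python
# Hypothetical mini-hierarchy: each term points to its direct superclass.
IS_A = {"Macaca mulatta": "Primate", "Primate": "Animal"}


def ancestors(term, is_a=IS_A):
    """Return every superclass reachable by following is_a links upward."""
    found = []
    while term in is_a:
        term = is_a[term]
        if term in found:  # guard against accidental cycles
            break
        found.append(term)
    return found


def is_subsumed_by(term, candidate):
    """True if candidate is a direct or inferred superclass of term."""
    return candidate in ancestors(term)


print(ancestors("Macaca mulatta"))
print(is_subsumed_by("Macaca mulatta", "Animal"))
```

The statement "Macaca mulatta is an animal" is never stored explicitly here; it follows from chaining the two is_a links.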

Structuring RDF: RDF Schema and OWL

Structuring RDF: SKOS

http://microformats.org/wiki/Main_Page
Microformats and Schema.org

http://schema.org/

Examples of vocabularies
FOAF – Friend Of A Friend
DCAT (Data Catalog Vocabulary)
PROV (Provenance vocabulary)
Web Annotation
Music Ontology
SIOC (Semantically-Interlinked Online Communities)

By user:Marobi1 [CC0], via Wikimedia Commons
https://en.wikipedia.org/wiki/Semantic_Web_Stack
The semantic web stack

Limitation of a unique formal model: monolithic ontologies
Difficulty in reconciling different models
Lack of validation and quality testing for ontologies
Difficult to reach consensus on research topics
Slow integration of new concepts into existing ontologies
Hard to use for scientists
However, designing common terminologies is valuable and Mostly Harmless
?
Limits of the approach

Some major RDF resources
Google Knowledge Graph: https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html
Facebook graph: https://developers.facebook.com/docs/graph-api/overview/
Wikidata: https://www.wikidata.org/wiki/Wikidata:Main_Page
Freebase
DBpedia
https://datahub.io/dataset
EBI RDF store

Metadata
Different types of metadata to describe the context, the
content, the format and the history of the data
Metadata are generally frozen after publication of a data
record
Descriptive Metadata can be incomplete and/or biased
by the data publisher perspective

→ Annotations
How to add new metadata/information in a flexible way?

What do we mean by annotation?
By definition, an annotation is “a note added to a text,
book, drawing, etc., as a comment or an explanation”
(from Merriam Webster).
In our context, it is an assertion we want to make about a
digital resource, e.g. a text file, an image, a recording or a
movie.

Semantic Annotation: General Principles

Web Annotation Data Model
Use the W3C Web Annotation Data Model
(https://www.w3.org/TR/annotation-model/)
Serialized in JSON-LD (https://www.w3.org/TR/json-ld/)
= JSON-based representation of RDF graphs
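A minimal annotation following the W3C Web Annotation Data Model can be built and serialized with the Python standard library. This sketch is illustrative only: the annotation identifier and the keyword are made up, the target is the B2SHARE record URL used elsewhere in these slides, and the result is not claimed to be the exact document B2NOTE produces.

```python
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",  # hypothetical identifier
    "type": "Annotation",
    "motivation": "tagging",
    "body": {
        "type": "TextualBody",
        "value": "neuroscience",  # made-up keyword
        "purpose": "tagging",
    },
    "target": "http://b2share.eudat.eu/record/30",
}

# JSON-LD is plain JSON; the @context is what maps the keys to RDF terms.
serialized = json.dumps(annotation, indent=2)
print(serialized)
```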

The annotation “use-cases”
Manual annotations of data elements: semantic
tagging and file linking
Semi-automatic annotations of data element content:
related to the LTER Data Pilot
Data curation: curation status tags
Create aggregated datasets from multi-scale or
multi-domain datasets.

B2NOTE
Crowdsourcing annotator
All annotations are public
Private annotations in the next release
Easy to use
Auto-completion with terms from domain-specific controlled vocabularies
Intuitive user interface
Easily create new datasets selected on the basis of annotations
Easy integration based on a widget/iframe approach
Integrate with EUDAT services
Integrate with community web UIs
Easy to deploy
Stores triples as JSON-LD in a MongoDB backend
Uses Django as CMS

B2NOTE architecture

B2NOTE Annotation Model
anno1 rdf:type oa:Annotation
anno1 oa:hasBody body1
anno1 oa:hasTarget "http://b2share.eudat.eu/record/30"
anno1 oa:motivatedBy oa:tagging
anno1 dcterms:creator person1
anno1 as:generator client1
anno1 dcterms:created "2017-01-17T09:51:02Z"
anno1 dcterms:issued "2017-01-17T09:51:02Z"
person1 rdf:type foaf:Person
person1 foaf:nick "pseudo"
client1 rdf:type as:Application
client1 foaf:name "B2Note v1.0"
client1 foaf:homepage "http://b2note.bsc.es"
body1 rdf:type oa:Composite (semantic tag)
body1 rdf:type oa:TextualBody (keyword and comment)

B2NOTE at work
Try it @ http://b2note.bsc.es
Login/Register, annotation interface, access to annotations

B2NOTE at work
Access semantic term
information
Search files using
annotations
Export annotations and
selected data for reuse

Test integration with B2SHARE
https://trng-b2share.eudat.eu/

The added-value of annotations
Enrich digital content with your own keywords
without modifying the data record
Structure data differently using annotations
Support data curation before and after publication
Create aggregated datasets from multi-scale or
multi-domain datasets.
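Selecting files by their annotations to assemble an aggregated dataset can be sketched as follows. This is a hypothetical illustration, not B2NOTE's implementation: the annotations are reduced to (target, tag) pairs, and all record URLs and tags are made up.

```python
from collections import defaultdict

# Made-up annotations reduced to (target, tag) pairs.
ANNOTATIONS = [
    ("http://example.org/record/1", "hippocampus"),
    ("http://example.org/record/2", "hippocampus"),
    ("http://example.org/record/2", "electrophysiology"),
    ("http://example.org/record/3", "cortex"),
]


def index_by_tag(annotations):
    """Group annotated targets by tag."""
    index = defaultdict(set)
    for target, tag in annotations:
        index[tag].add(target)
    return index


def select(index, *tags):
    """Return the sorted targets that carry every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*sets)) if sets else []


index = index_by_tag(ANNOTATIONS)
print(select(index, "hippocampus"))
print(select(index, "hippocampus", "electrophysiology"))
```

The data records themselves are untouched; the aggregated dataset is just the list of targets that the annotations point to.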

Additional Resources
EUDAT Webinar: Organise, retrieve and
aggregate data using annotations with
B2NOTE
