Make our Scientific Datasets Accessible and Interoperable on the Web
Franck MICHEL
I3S - UMR 7271, CNRS - Univ. Nice Sophia
RBDD2015, CNRS Database Network
Oct. 21st 2015
2Franck Michel, RBDD 2015
Make our datasets
accessible and
interoperable on the Web…
▪ Not only because of the H2020 requirements
▪ Linking data increases its value
• Mash up with related data
• Produce new knowledge
• Opportunity for new (unexpected) usage
▪ Citizens' demand for access to public data (scientific, government…)
▪ …
▪ Publication/interlinking of open datasets
• Publish heterogeneous data in a common format
• Using common vocabularies
▪ Driven by major initiatives, e.g.:
• Linking Open Data
• W3C Data Activity
• Open Data hosting services: OpenAIRE, Zenodo…
▪ As well as other domain-specific projects
• Bio2RDF, BioPortal
Towards a Web of Data
From a Web of Documents
...to a Web of Data
Linked Open Data Cloud: successive snapshots from May 2007, April 2008, Sept. 2008, March 2009, Sept. 2010, Sept. 2011 and Aug. 2014.
Linked Datasets as of Aug. 30th 2014. (c) R. Cyganiak and A. Jentzsch
Visible Web
Deep Web
Heterogeneous data models
(Figure: tabular (e.g. columns ID, NAME); directory (e.g. an LDAP entry o=fr, ou=cnrs, cn=Franck Michel, objectclass=user, userid=65286); graph; object-oriented; documents (XML native DBs, document stores…).)
Heterogeneous query capabilities
▪ RDBs, NewSQL: SQL
▪ XML native DBs: XPath/XQuery
▪ Graph: Neo4j (Cypher), AllegroGraph (SPARQL)
▪ Document: MongoDB, CouchDB (similar JavaScript/JSON-based query languages)
▪ Column: Cassandra (CQL), Hive (HQL)
▪ Key-value: Riak, DynamoDB (low-level API)
▪ …
Make our datasets
accessible and
interoperable on the Web…
© NeuroLOG project
To you, your data may mean this…
© MicroStep-Mis. 3D meteorological modelling.
© CERN. First proton-lead ion collisions recorded by ALICE
To others, your data may mean that…
The key is metadata
Finding, understanding, and reusing scientific datasets requires consistent, high-quality metadata
▪ Context: identification, authors, dates, license, version, reference articles
▪ Access: format, structure, location (download), query method
▪ Meaning:
• What does it represent? What concepts, entities, semantics?
▪ Interpretation: units (cm or inches, left/right)…
▪ Provenance:
• Acquired with what equipment? Parameters, protocols?
• Derived from what dataset? With what processing?
• Dataset-level or entity-level provenance
▪ Statistics
▪ Etc.
CSV on the Web*
▪ Help access and understand CSV tabular data available on the Web
• Recommendations for a metadata vocabulary for CSV data
• Access methods for CSV metadata
• Mapping mechanisms to transform CSV into various formats (e.g., RDF, JSON, or XML)
▪ Annotations on a table or group of tables, columns…
*https://www.w3.org/standards/techs/csv#w3c_all
CSV on the Web*
GET tree-ops.csv
Content-Type: text/csv
Link: <http://example.org/tree-ops.json>; rel="…"
GID, Street, Species, Trim Cycle, Inventory Date
1, Addison Av, Celtis australis, , 2010/10/18
2, Emerson St, Liquidambar styraciflua, , 2010/06/02
{ "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "url": "tree-ops.csv",
  "dc:title": "Tree Operations",
  "dc:license": { "@id": "http://opendefinition.org/licenses/cc-by/" },
  "dc:modified": { "@value": "2010-12-31", "@type": "xsd:date" },
  "tableSchema": {
    "columns": [{
      "name": "GID", "titles": ["GID", "Generic Identifier"],
      "dc:description": "...",
      "datatype": "string", "required": true
    }, {
      "name": "Street", "titles": ["Street", "On Street"],
      "dc:description": "The street that the tree is on.",
      "datatype": "string" }, ...
    ],
    "primaryKey": "GID", "aboutUrl": "#gid-{GID}" }}
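To make the mapping mechanism concrete, here is a minimal Python sketch of a CSVW-style conversion of the two rows above into N-Triples. It is a simplification, not a CSVW processor: it assumes the table lives at the hypothetical URL http://example.org/tree-ops.csv, uses only aboutUrl from the metadata, builds each predicate as the table URL plus "#" plus the percent-encoded column name (roughly what CSVW does when no propertyUrl is given), emits plain string literals without datatypes, and skips empty cells (CSVW's default null value is the empty string).

```python
import csv
import io
from urllib.parse import quote, urljoin

# Hypothetical absolute location of the table (the metadata only says "tree-ops.csv").
TABLE_URL = "http://example.org/tree-ops.csv"

CSV_TEXT = """GID, Street, Species, Trim Cycle, Inventory Date
1, Addison Av, Celtis australis, , 2010/10/18
2, Emerson St, Liquidambar styraciflua, , 2010/06/02
"""

# Only the part of the metadata document that this sketch actually uses.
METADATA = {"url": "tree-ops.csv", "tableSchema": {"aboutUrl": "#gid-{GID}"}}


def csv_to_ntriples(text, metadata, table_url):
    """Simplified CSVW-style conversion: one triple per non-empty cell."""
    about_template = metadata["tableSchema"]["aboutUrl"]
    triples = []
    # skipinitialspace drops the blank after each comma (a rough stand-in for CSVW trimming).
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        # Fill the URI template with the row's cells, then resolve it against the table URL.
        subject = urljoin(table_url, about_template.format(**row))
        for column, value in row.items():
            if value == "":
                continue  # empty cell treated as null: no triple
            predicate = table_url + "#" + quote(column)
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'<{subject}> <{predicate}> "{literal}" .')
    return triples


if __name__ == "__main__":
    for line in csv_to_ntriples(CSV_TEXT, METADATA, TABLE_URL):
        print(line)
```

Each row yields four triples here, because the Trim Cycle cells are empty.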
HCLS Profile*
▪ Health Care and Life Sciences
▪ Consensus among participating stakeholders on the description of datasets using RDF
▪ Data description, versioning, provenance, discovery, exchange, query, and retrieval
*http://www.w3.org/TR/hcls-dataset/
Used vocabularies:
RDF, RDFS, XSD
Citation Typing Ontology
Data Catalog
Dublin Core Metadata Types, Dublin Core Metadata Terms
Friend-of-a-Friend
Collection Description Frequency Vocabulary
Identifiers.org vocabulary
Lexvo.org - Lexical Vocabulary
Provenance Authoring and Versioning ontology (PAV)
PROV Ontology
Semanticscience Integrated Ontology (SIO)
Vocabulary of Interlinked Datasets (VoID)
Challenges of publishing Metadata and/or Data?
(Figure: metadata describe the data, and both are to be published on the Web. Open questions: Syntax? Shared meaning? Link to others? Raw data? Convert?)
▪ Have a common representation format: addresses structural heterogeneity
▪ Have common ways to describe the data: addresses semantic heterogeneity
• Vocabularies, ontologies, thesauri…
▪ Have common ways to query the data
Make our datasets
accessible and
interoperable on the Web…
Agenda
▪ The Web of Data and the Semantic Web
▪ Create, reuse and link vocabularies
▪ Populate the Web of Data
▪ Publish Linked Open Data on the Web
The Web of Data
And
the Semantic Web
Source: C. Faron Zucker[1], O. Corby[1]. Introduction au web de données et au web sémantique. Séminaire INRA Open Data Dec. 2014.
[1] INRIA Sophia Antipolis, CNRS, UNS.
Standards of the Semantic Web
Web of Data
RDF is a model based on triples, i.e. any fact consists of 3 components:
(subject, predicate, object)
The Resource Description Framework
websem.html is a text
websem.html has as author Fabien
websem.html has as author Olivier
websem.html has as author Catherine
websem.html has as subject Semantic Web
websem.html was written in 2011
The Resource Description Framework
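The six statements above already have the triple shape. As a minimal illustration (plain Python tuples, not an RDF library), they can be stored as (subject, predicate, object) tuples and queried with a pattern where None stands for "any value"; the short names are placeholders that the next slides replace with URIs.

```python
# The facts from the slide as (subject, predicate, object) tuples.
GRAPH = {
    ("websem.html", "type", "Text"),
    ("websem.html", "author", "Fabien"),
    ("websem.html", "author", "Olivier"),
    ("websem.html", "author", "Catherine"),
    ("websem.html", "subject", "Semantic Web"),
    ("websem.html", "date", "2011"),
}


def find_triples(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching the given terms, sorted; None matches anything."""
    pattern = (subject, predicate, obj)
    return sorted(
        t for t in graph
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


if __name__ == "__main__":
    authors = [o for _, _, o in find_triples(GRAPH, "websem.html", "author")]
    print(authors)  # ['Catherine', 'Fabien', 'Olivier']
```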
(Figure: the same facts drawn as a graph: websem.html is linked to Text by type, to 2011 by date, to SemanticWeb by subject, and to Fabien, Olivier and Catherine by author.)
The Resource Description Framework
(Figure: the same graph with URIs: http://ns.inria.fr/ex/websem.html is linked by rdf:type to dt:Text, by dc:date to 2011, by dc:subject to http://en.wikipedia.org/wiki/Semantic_Web, and by dc:creator to http://ns.inria.fr/catherine.faron, http://ns.inria.fr/olivier.corby and http://ns.inria.fr/fabien.gandon.)
The Resource Description Framework
N-Triples syntax
<http://inria.fr/ex/websem.html> <http://purl.org/dc/elements/1.1/creator> <http://ns.inria.fr/catherine.faron> .
<http://inria.fr/ex/websem.html> <http://purl.org/dc/elements/1.1/subject> "Semantic Web" .
Turtle syntax
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://inria.fr/ex/websem.html>
dc:creator <http://ns.inria.fr/catherine.faron> ;
dc:subject "Semantic Web" .
The Resource Description Framework
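The Turtle form is shorthand for the N-Triples form: a prefixed name such as dc:subject is the prefix IRI followed by the local name, and ";" repeats the subject. A minimal Python sketch of that expansion, using dc:creator and dc:subject (the Dublin Core elements for author and topic); it handles only the prefixed names, full IRIs and simple literals used here, not the full Turtle grammar.

```python
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand(term):
    """Turn a prefixed name into a full <IRI>; IRIs and literals are kept as-is."""
    if term.startswith("<") or term.startswith('"'):
        return term
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def to_ntriples(subject, predicate_objects):
    """Expand 'subject p1 o1 ; p2 o2 .' into one N-Triples line per (p, o) pair."""
    return [f"{expand(subject)} {expand(p)} {expand(o)} ." for p, o in predicate_objects]


if __name__ == "__main__":
    lines = to_ntriples(
        "<http://inria.fr/ex/websem.html>",
        [("dc:creator", "<http://ns.inria.fr/catherine.faron>"),
         ("dc:subject", '"Semantic Web"')],
    )
    print("\n".join(lines))
```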
RDF/XML syntax
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://inria.fr/ex/websem.html">
<dc:creator rdf:resource="http://ns.inria.fr/catherine.faron"/>
<dc:subject>Semantic Web</dc:subject>
</rdf:Description>
</rdf:RDF>
The Resource Description Framework
Linked Open Data
RDF Schema (RDFS)
Schemas define classes of resources and their properties, and organize them into hierarchies
(Figure: igeo:Commune rdfs:subClassOf igeo:TerritoireAdministratif; both are of rdf:type rdfs:Class; http://id.insee.fr/geo/commune/34301 has rdf:type igeo:Commune.)
RDF Schema
@prefix igeo: <http://rdf.insee.fr/def/geo#> .
(Figure: igeo:codeCommune rdfs:subPropertyOf igeo:codeINSEE; both are of rdf:type rdf:Property.)
RDF Schema
@prefix igeo: <http://rdf.insee.fr/def/geo#> .
(Figure: igeo:chefLieu has rdfs:domain igeo:PaysOuTerritoire and rdfs:range igeo:Commune; http://id.insee.fr/geo/departement/34 igeo:chefLieu http://id.insee.fr/geo/commune/34172 (Montpellier).)
RDF Schema
@prefix igeo: <http://rdf.insee.fr/def/geo#> .
SPARQL
Query RDF with SPARQL
SPARQL Protocol and RDF Query Language
SPARQL 1.1 (W3C Recommendation, 21 Mar. 2013)
▪ Query language (with a Turtle-like syntax for triple patterns)
• SPARQL 1.1 Query Language
• SPARQL 1.1 Update
▪ Representation of query results
• SPARQL Query Results formats: XML, CSV/TSV, JSON
▪ Protocols
• SPARQL 1.1 Protocol
• SPARQL 1.1 Graph Store HTTP Protocol
▪ Entailment
• SPARQL 1.1 Entailment Regimes
SPARQL: triple patterns
Turtle syntax with "?" to mark variables:
?x rdf:type ex:Person
Describe the patterns of triples we are looking for:
SELECT ?subject ?type
WHERE { ?subject rdf:type ?type }
Default pattern: conjunction of triple patterns:
SELECT ?x WHERE
{ ?x rdf:type ex:Person .
?x ex:name ?name . }
(Figure: the pattern as a graph: ?x is linked to ex:Person by rdf:type and to ?name by ex:name.)
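A conjunction of triple patterns (a basic graph pattern) is evaluated by matching one pattern at a time and keeping only the variable bindings that stay consistent. A minimal Python sketch of that idea over plain tuples; the data (ex:alice, ex:bob, ex:acme) is hypothetical, and a real engine would use indexes and a join-order optimizer rather than nested loops.

```python
def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend a binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, value) != value:
                return None  # variable already bound to another value
        elif p != value:
            return None  # constant term does not match
    return result


def evaluate_bgp(patterns, graph):
    """Evaluate a conjunction of triple patterns, one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data: ex:bob has no name, ex:acme is not a Person.
GRAPH = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:acme", "ex:name", "ACME"),
]
QUERY = [("?x", "rdf:type", "ex:Person"), ("?x", "ex:name", "?name")]

if __name__ == "__main__":
    print(evaluate_bgp(QUERY, GRAPH))  # [{'?x': 'ex:alice', '?name': 'Alice'}]
```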
SPARQL: namespace prefixes
Declare prefixes of used vocabularies:
PREFIX mit: <http://www.mit.edu#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?student
WHERE {
?student mit:registeredAt ?x .
?x foaf:homepage <http://www.mit.edu> .
}
(Figure: ?student is linked to ?x by mit:registeredAt, and ?x to http://www.mit.edu by foaf:homepage.)
Declare a base IRI for relative IRIs:
BASE <http://www.example.org/people#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?student
WHERE { ?student foaf:knows <#Ted> . }
Here <#Ted> resolves to <http://www.example.org/people#Ted>.
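Relative IRIs in a query are resolved against BASE with the RFC 3986 rules, which is easy to get wrong when the base ends with "#". Python's standard urllib.parse.urljoin follows the same resolution algorithm and is a quick way to check what a relative IRI expands to:

```python
from urllib.parse import urljoin

BASE = "http://www.example.org/people#"

# A plain relative path replaces the last path segment ("people") ...
print(urljoin(BASE, "Ted"))   # http://www.example.org/Ted
# ... whereas a fragment-only reference keeps the base document.
print(urljoin(BASE, "#Ted"))  # http://www.example.org/people#Ted
```

To obtain people#Ted, either write <#Ted> or use a base ending with "/" and a different naming scheme.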
SPARQL: language and typed literals
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x ?f WHERE {
?x foaf:name "Fabien"@fr ; foaf:knows ?f .
}
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?x WHERE {
?x foaf:name "Fabien"@fr ;
foaf:age "21"^^xsd:integer .
}
SPARQL: optional pattern
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
?person foaf:homepage <http://fabien.info> .
OPTIONAL { ?person foaf:name ?name . }
}
Variable ?name is potentially unbound.
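OPTIONAL behaves like a left outer join on solutions: every solution of the required part is kept, and it is extended with the optional part whenever a compatible solution exists. A minimal Python sketch over hand-written, hypothetical solution lists (it ignores how FILTERs inside OPTIONAL are scoped):

```python
def compatible(a, b):
    """Two solutions are compatible when their shared variables agree."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_join(required, optional):
    """OPTIONAL as a left outer join: no required solution is ever dropped."""
    result = []
    for r in required:
        extended = [{**r, **o} for o in optional if compatible(r, o)]
        result.extend(extended if extended else [r])
    return result


# Hypothetical solutions of  ?person foaf:homepage <http://fabien.info>
REQUIRED = [{"?person": "ex:p1"}, {"?person": "ex:p2"}]
# Hypothetical solutions of  ?person foaf:name ?name  (ex:p2 has no name)
OPTIONAL_PART = [{"?person": "ex:p1", "?name": "Fabien"}]

if __name__ == "__main__":
    print(left_join(REQUIRED, OPTIONAL_PART))
    # [{'?person': 'ex:p1', '?name': 'Fabien'}, {'?person': 'ex:p2'}]
```

The second solution keeps ?name unbound, which is exactly the situation described above.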
SPARQL: alternative patterns
Merge the results of two graph patterns:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
?person foaf:name ?name .
{ ?person foaf:homepage <http://fabien.info> . }
UNION
{ ?person foaf:homepage <http://bafien.org> . }
}
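UNION concatenates the solutions of its two branches (duplicates are kept, following SPARQL's bag semantics), and the result is then joined with the rest of the group, here ?person foaf:name ?name. A minimal Python sketch over hand-written, hypothetical solution lists:

```python
def compatible(a, b):
    """Shared variables must be bound to the same value."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """Inner join of two lists of solutions."""
    return [{**ls, **rs} for ls in left for rs in right if compatible(ls, rs)]


def union(left, right):
    """SPARQL UNION: bag union, i.e. plain concatenation."""
    return left + right


# Hypothetical solutions of each part of the query.
NAMES = [{"?person": "ex:p1", "?name": "Fabien"}, {"?person": "ex:p3", "?name": "Olivier"}]
BRANCH_1 = [{"?person": "ex:p1"}]  # foaf:homepage <http://fabien.info>
BRANCH_2 = [{"?person": "ex:p2"}]  # foaf:homepage <http://bafien.org>

if __name__ == "__main__":
    print(join(NAMES, union(BRANCH_1, BRANCH_2)))
    # [{'?person': 'ex:p1', '?name': 'Fabien'}]
```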
SPARQL filters
PREFIX ex: <http://inria.fr/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?name
WHERE {
?person a ex:Person ; ex:name ?name ; ex:age ?age .
FILTER (xsd:integer(?age) >= 18)
}
Other examples:
FILTER(?name IN ("fabien", "olivier", "catherine"))
FILTER(IF(langMatches(lang(?name), "FR"), ?age >= 21, true))
FILTER NOT EXISTS { ?x foaf:age -1 }
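A FILTER keeps only the solutions for which its expression evaluates to true; in SPARQL an evaluation error, such as casting a non-numeric string to xsd:integer, also eliminates the solution. A minimal Python sketch of the first filter above, over hypothetical solutions whose ages are still strings (Python's int() is only an approximation of the xsd:integer cast):

```python
def age_at_least_18(solution):
    """Mimic FILTER (xsd:integer(?age) >= 18) on one solution."""
    try:
        return int(solution["?age"]) >= 18
    except (KeyError, ValueError):
        # Unbound variable or failed cast: an error, which eliminates the solution.
        return False


# Hypothetical solutions of the WHERE pattern.
SOLUTIONS = [
    {"?person": "ex:p1", "?name": "Fabien", "?age": "25"},
    {"?person": "ex:p2", "?name": "Olivier", "?age": "17"},
    {"?person": "ex:p3", "?name": "Catherine", "?age": "unknown"},
]

kept = [s["?name"] for s in SOLUTIONS if age_at_least_18(s)]
print(kept)  # ['Fabien']
```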
SPARQL additional features
▪ Subtract results (MINUS)
WHERE { ?x a ex:Person MINUS { ?x a ex:John } }
▪ Inline values (VALUES)
?person foaf:name ?name .
VALUES ?name { "Peter" "Pietro" "Pedro" "Pierre" }
▪ Property paths
?x foaf:knows+ ?friend .
▪ Dataset selection (FROM)
SELECT ?student
FROM <http://www.mit.edu/data.rdf>
WHERE { ?student mit:registeredAt ?x . }
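The property path foaf:knows+ matches chains of one or more foaf:knows links. For a fixed start node this amounts to reachability in the graph, which a breadth-first search computes; a minimal Python sketch over hypothetical foaf:knows pairs (the visited set also guarantees termination on cycles, and each reachable node is returned once):

```python
from collections import deque


def knows_plus(edges, start):
    """Nodes reachable from start through one or more 'knows' edges."""
    neighbours = {}
    for s, o in edges:
        neighbours.setdefault(s, []).append(o)
    reached, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


# Hypothetical foaf:knows links, including a cycle back to ex:alice.
KNOWS = [
    ("ex:alice", "ex:bob"),
    ("ex:bob", "ex:carol"),
    ("ex:carol", "ex:alice"),
    ("ex:dave", "ex:alice"),
]

if __name__ == "__main__":
    print(sorted(knows_plus(KNOWS, "ex:alice")))  # ['ex:alice', 'ex:bob', 'ex:carol']
```

ex:alice appears in its own result only because the cycle gives a path of length 3; with foaf:knows* it would be included unconditionally.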
SPARQL XML results
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head><variable name="student"/></head>
<results>
<result>
<binding name="student">
<uri>http://www.mit.edu/data.rdf#ndieng</uri>
</binding>
</result>
<result>
<binding name="student">
<uri>http://www.mit.edu/data.rdf#jdoe</uri>
</binding>
</result>
</results>
</sparql>
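Because this result format is plain XML in the http://www.w3.org/2005/sparql-results# namespace, any XML parser can read it. A minimal sketch with Python's standard xml.etree.ElementTree, assuming the results document is already in a string; it only handles uri and literal bindings, not bnode.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

RESULTS_XML = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head><variable name="student"/></head>
<results>
<result><binding name="student"><uri>http://www.mit.edu/data.rdf#ndieng</uri></binding></result>
<result><binding name="student"><uri>http://www.mit.edu/data.rdf#jdoe</uri></binding></result>
</results>
</sparql>"""


def parse_results(text):
    """Return the selected variable names and one {variable: value} dict per result."""
    root = ET.fromstring(text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding.find("sr:uri", NS)
            if value is None:
                value = binding.find("sr:literal", NS)
            row[binding.get("name")] = value.text if value is not None else None
        rows.append(row)
    return variables, rows


if __name__ == "__main__":
    print(parse_results(RESULTS_XML))
```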
RDF/SPARQL @ French National Institute of Statistics (INSEE)
RDF/SPARQL @ French National Library (BnF)
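RDFS Entailment: infer new knowledge
PREFIX igeo: <http://rdf.insee.fr/def/geo#>
SELECT ?x
WHERE { ?x rdf:type igeo:TerritoireAdministratif }
(Figure: igeo:Commune rdfs:subClassOf igeo:TerritoireAdministratif, and ex:Sète rdf:type igeo:Commune. Under RDFS entailment, ex:Sète is returned although it is not explicitly typed as igeo:TerritoireAdministratif.)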
RDFS Entailment: infer new knowledge
PREFIX igeo: <http://rdf.insee.fr/def/geo#>
SELECT ?x ?code
WHERE { ?x igeo:codeINSEE ?code }
(Figure: igeo:codeCommune rdfs:subPropertyOf igeo:codeINSEE. Under RDFS entailment, values stated with igeo:codeCommune are also returned.)
RDFS Entailment: infer new knowledge
(Figure: igeo:chefLieu has rdfs:domain igeo:PaysOuTerritoire and rdfs:range igeo:Commune; the data state http://id.insee.fr/geo/departement/34 igeo:chefLieu http://id.insee.fr/geo/commune/34172.)
SELECT ?x WHERE { ?x rdf:type igeo:Commune }
SELECT ?x WHERE { ?x rdf:type igeo:PaysOuTerritoire }
Under RDFS entailment, the first query returns commune/34172 (from the range) and the second returns departement/34 (from the domain).
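The three inferences above correspond to the RDFS entailment rules rdfs9 (subClassOf), rdfs7 (subPropertyOf), rdfs2 (domain) and rdfs3 (range). A minimal forward-chaining sketch in Python that applies just these four rules until nothing new is derived; the other RDFS rules (e.g. transitivity of subClassOf, rdfs11) are left out, the domain/range declarations follow the reading of the figure, and the code value "34301" is hypothetical.

```python
def rdfs_closure(triples):
    """Apply rules rdfs2, rdfs3, rdfs7 and rdfs9 until nothing new is derived."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))  # rdfs2
                elif p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))  # rdfs3
                elif p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))  # rdfs7
                elif p2 == "rdfs:subClassOf" and p == "rdf:type" and o == s2:
                    new.add((s, "rdf:type", o2))  # rdfs9
        if new <= graph:
            return graph
        graph |= new


def instances(graph, cls):
    """Answer SELECT ?x WHERE { ?x rdf:type cls } over a set of triples."""
    return sorted(s for s, p, o in graph if p == "rdf:type" and o == cls)


DEP34 = "<http://id.insee.fr/geo/departement/34>"
COM34172 = "<http://id.insee.fr/geo/commune/34172>"
TRIPLES = [
    ("igeo:Commune", "rdfs:subClassOf", "igeo:TerritoireAdministratif"),
    ("ex:Sète", "rdf:type", "igeo:Commune"),
    ("igeo:codeCommune", "rdfs:subPropertyOf", "igeo:codeINSEE"),
    ("ex:Sète", "igeo:codeCommune", "34301"),  # hypothetical code value
    ("igeo:chefLieu", "rdfs:domain", "igeo:PaysOuTerritoire"),
    ("igeo:chefLieu", "rdfs:range", "igeo:Commune"),
    (DEP34, "igeo:chefLieu", COM34172),
]

if __name__ == "__main__":
    closed = rdfs_closure(TRIPLES)
    print(instances(closed, "igeo:TerritoireAdministratif"))
    print(instances(closed, "igeo:PaysOuTerritoire"))
```

Note that commune/34172 ends up in igeo:TerritoireAdministratif only after two rounds (rdfs3, then rdfs9), which is why the loop runs to a fixpoint.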
So far, so good…
Need for more? OWL in one slide…
(Figure: OWL constructs: class definition by enumeration, intersection, union, complement, restriction, cardinality (e.g. 1..1), equivalence and value restriction (e.g. [>=18]); disjoint classes; symmetric properties; disjoint properties; property cardinality (e.g. 1..1); negative property assertions on individuals; property chains; …)
Web of Data vs. Semantic Web
Web of Data: first step
in the deployment of
the Semantic Web
Make our datasets
accessible and
interoperable on the Web…
Definitions
▪ Taxonomy:
• Practice and science of classification
• Hierarchical categorization of controlled classes/terms
• Nested classes under broader categories
▪ Thesaurus:
• Networked collection of controlled vocabulary terms, grouped according to various types of relationships, e.g. similarity of meaning (synonyms, antonyms)
▪ Ontology:
• Formal semantic description of the taxonomy terms, properties and interrelationships between categories in a domain of discourse, to facilitate conceptual search and natural language queries
▪ Folksonomy:
• Collaborative/social tagging, social classification…
• Tag category schemes
• Not necessarily hierarchical categorization
Create, reuse and link vocabularies
Create my own vocabulary
▪ May seem easier: "I do whatever I want"
▪ Can be derived from an existing schema, e.g.:
• RDB: table -> class, column -> property, primary key -> resource URI
• Thesaurus -> list of classes or SKOS concepts
▪ But modeling implies choosing a point of view…
• E.g. biologist vs. geneticist, surgeon vs. anatomist, history…
• Domain experts must be involved
▪ Risk: creating an island of data
▪ How to link my vocabulary/dataset with other related ones?
Create, reuse and link vocabularies
Reuse existing vocabularies
▪ Where to look: vocabulary/ontology catalogs (see later)
▪ Difficult to find the appropriate description
• Partial coverage of the domain I'm dealing with
• E.g. geographical area
• Granularity: level of detail
• Too many (cumbersome), not enough (useless)
• Different points of view
▪ Frequently, a mixed approach is used
• Reuse + create
• Need for interlinking => alignment
Link ontologies (very basic)
(Figure: my vocabulary vs. a third-party vocabulary. websem.html has rdf:type ex:Book and ex:topic SemanticWeb; WSbook.html has rdf:type dc:Text and dc:subject "Web Sémantique". They are linked by ex:Book owl:equivalentClass dc:Text, ex:topic owl:equivalentProperty dc:subject, and websem.html owl:sameAs WSbook.html.)
Link ontologies (basics)
▪ Classes
• owl:equivalentClass, owl:disjointWith, rdfs:subClassOf
▪ Properties
• owl:equivalentProperty, owl:inverseOf, rdfs:subPropertyOf
▪ Individuals
• owl:sameAs, owl:differentFrom, owl:AllDifferent
▪ rdfs:seeAlso
• Indicates a resource that might provide additional information about the subject resource
▪ SKOS concepts
• skos:exactMatch: transitive
• skos:closeMatch, skos:relatedMatch
• skos:narrowMatch, skos:broadMatch
Link ontologies … a complex topic
▪ Discovery of matches between classes and properties
▪ Discovery of matches between individuals
▪ Named Entity Recognition, entity matching, text mining…
Ontology matching: "representing declaratively relations between heterogeneous models"
SKOS: Simple Knowledge Organization System
RDF-based standard to represent controlled vocabularies: glossaries, dictionaries, taxonomies, thesauri…
Bridges the gap between existing Knowledge Organization Systems (KOSs) and the Semantic Web and Linked Data
Definition and documentation of classification systems
▪ SKOS concepts
• skos:Concept
▪ Labels and classification codes
• skos:prefLabel, skos:altLabel, skos:notation…
▪ Documentation
• skos:definition, skos:changeNote, skos:editorialNote, skos:example, etc.
▪ SKOS concept schemes
• skos:ConceptScheme, skos:hasTopConcept, skos:topConceptOf
SKOS: Simple Knowledge Organization System
Semantic relations between concepts
▪ Hierarchy of collections of concepts
• skos:Collection, skos:OrderedCollection, skos:member…
▪ Semantic network and hierarchies of concepts
• skos:related
• skos:broader, skos:narrower
▪ Alignment of schemes
• skos:closeMatch, skos:exactMatch
• skos:relatedMatch, skos:broadMatch, skos:narrowMatch
Linked Open Vocabularies
▪ 522 curated vocabularies
▪ Quality requirements
• URI stability and availability
• Quality metadata and documentation
• Identifiable and trustable publication body
• Proper versioning policy
• …
"Vocabularies provide the semantic glue enabling Data to become meaningful Data."
http://lov.okfn.org/dataset/lov/
Linked Open Vocabularies
BBC Wildlife Ontology
UniProt: protein sequence and
functional information.
Other catalogs of vocabularies
General:
▪ Schemapedia (?) http://schemapedia.org
▪ schema.org
"Create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond".
Controlled set of curated vocabularies: cars, TV series, arts, administrations, diseases…
▪ DERI Vocabularies http://vocab.deri.ie/
URI space for RDFS vocabularies and OWL ontologies maintained at DERI. No search interface.
Domain-specific:
▪ NCBO BioPortal http://bioportal.bioontology.org/ontologies/
▪ TDWG - Biodiversity Information Standards http://www.tdwg.org/standards/
(TDWG: Taxonomic Databases Working Group)
And your favorite web search engine…
Practical use case: TAXREF
CD_NOM : Unique identifier of the scientific name
CD_SUP : Identifier of the upper taxon in the classification
CD_REF : Identifier of the reference taxon
RANG : taxonomical rank
How to translate this table into a thesaurus
exploitable as a semantic reference using
semantic web technologies?
TAXREF SKOS Modelling
(Figure: the taxon <http://inpn.mnhn.fr/taxref/v8/taxon/60878> is a skos:Concept, linked by skos:broader to its parent taxon, by nt:has_rank to a taxonomical rank (skos:Concept), by taxref:habitat to a habitat (skos:Concept) and by taxref:bioGeoStatusIn to a biogeographical status (skos:Concept); it has the vernacular name "Short-beaked common dolphin"@en (taxref:vernacularName). Its reference name <http://inpn.mnhn.fr/espece/cd_nom/60878> is a skosxl:Label attached with skosxl:prefLabel, with skosxl:literalForm "Delphinus delphis" and txn:authority "Linnaeus, 1758". The synonym <http://inpn.mnhn.fr/espece/cd_nom/60881> is a skosxl:Label attached with skosxl:altLabel, with skosxl:literalForm "Delphinus tropicalis" and txn:authority "Van Bree, 1971".)
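A minimal Python sketch of this row-to-SKOS translation for TAXREF-like rows, covering only labels and the skos:broader hierarchy. Assumptions, all for illustration: the URI patterns are copied from the figure (taxref/v8/taxon/{CD_REF} for taxa, espece/cd_nom/{CD_NOM} for names); a row whose CD_NOM equals its CD_REF is treated as the reference name (prefLabel) and any other row as a synonym (altLabel); LB_NOM is an assumed column holding the scientific name; and the CD_SUP value "00000" is a made-up placeholder.

```python
# URI patterns copied from the modelling figure.
TAXON_URI = "http://inpn.mnhn.fr/taxref/v8/taxon/{}"
NAME_URI = "http://inpn.mnhn.fr/espece/cd_nom/{}"


def row_to_triples(row):
    """Translate one TAXREF-like row into SKOS / SKOS-XL triples (simplified)."""
    taxon = f"<{TAXON_URI.format(row['CD_REF'])}>"
    name = f"<{NAME_URI.format(row['CD_NOM'])}>"
    # A name whose identifier is the reference identifier is the reference name.
    is_reference = row["CD_NOM"] == row["CD_REF"]
    triples = [
        (name, "rdf:type", "skosxl:Label"),
        (name, "skosxl:literalForm", f'"{row["LB_NOM"]}"'),
        (taxon, "skosxl:prefLabel" if is_reference else "skosxl:altLabel", name),
    ]
    if is_reference:
        # Taxon-level facts are emitted once, from the reference row.
        triples.append((taxon, "rdf:type", "skos:Concept"))
        triples.append((taxon, "skos:broader", f"<{TAXON_URI.format(row['CD_SUP'])}>"))
    return triples


# CD_NOM, CD_REF and names from the figure; LB_NOM is an assumed column name
# and CD_SUP "00000" is a hypothetical placeholder for the parent taxon.
REFERENCE = {"CD_NOM": "60878", "CD_REF": "60878", "CD_SUP": "00000", "LB_NOM": "Delphinus delphis"}
SYNONYM = {"CD_NOM": "60881", "CD_REF": "60878", "CD_SUP": "00000", "LB_NOM": "Delphinus tropicalis"}

if __name__ == "__main__":
    for triple in row_to_triples(REFERENCE) + row_to_triples(SYNONYM):
        print(*triple)
```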
TAXREF: Alignments with domain ontologies
▪ TaxonConcept Ontology: properties habitat and authority, taxonomic ranks
▪ NCBI Organismal Classification: property has_rank, taxonomic ranks
▪ GeoSpecies Knowledge Base: taxonomic ranks
▪ ENVO Environment Ontology: habitats
▪ Geonames: mainland France and overseas territories
▪ Darwin Core Terms: properties occurrenceStatus, locationID
▪ TDWG Occurrence Status Terms: biogeographical statuses
▪ World Geographical Scheme for Recording Plant Distributions
"Static" (hand-made) alignments: predicates, reference values (habitats, taxonomical ranks)
TAXREF: Alignments with domain ontologies
Alignment of taxa and names done in a second step
▪ Automated search for matches within other taxonomical references
• DBpedia
• NCBI Organismal Classification
• Agrovoc
• BnF
• Encyclopedia of Life
• Vertebrate Taxonomy Ontology
▪ Difficulties
• Spelling differences
• Disagreements: reference vs. synonym, taxonomical rank
• Choice of the linking property: owl:sameAs (individuals), owl:equivalentClass (classes), rdfs:seeAlso, skos:exactMatch, skos:closeMatch, skos:relatedMatch (concepts)...
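Spelling differences make exact string comparison miss candidate matches. As an illustration only (this is not the method used for TAXREF), Python's standard difflib.get_close_matches can propose approximate candidates after a light normalization. The candidate list and the misspelled query are hypothetical, and any proposed match would still need expert review before asserting skos:exactMatch or owl:sameAs.

```python
import difflib


def normalize(name):
    """Lower-case and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def candidate_matches(name, candidates, cutoff=0.8):
    """Return candidate names similar to name, best first."""
    by_norm = {normalize(c): c for c in candidates}
    close = difflib.get_close_matches(normalize(name), list(by_norm), n=3, cutoff=cutoff)
    return [by_norm[c] for c in close]


# Hypothetical names found in another taxonomical reference.
CANDIDATES = ["Delphinus delphis", "Delphinus capensis", "Tursiops truncatus"]

if __name__ == "__main__":
    print(candidate_matches("Delphinus  Delfis", CANDIDATES))  # ['Delphinus delphis']
```

The cutoff is a trade-off: lowering it finds more spelling variants but also more false candidates (here "Delphinus capensis" scores about 0.76).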
Populate
the
Web of Data
Many methods for many data sources
▪ HTML: RDFa, Microformats
▪ XML, XHTML
• XPath: RML
• XQuery: XSPARQL, SPARQL2XQuery
• XSLT: Gleaning Resource Descriptions from Dialects of Languages (GRDDL), Scissor-Lift
▪ CSV/TSV/spreadsheets: CSV on the Web (W3C WG)
▪ RDBs (next slides)
▪ NoSQL stores (MongoDB…): xR2RML (next slides)
▪ Integration frameworks
• DataLift, Asio Tool Suite, Talend (with Semantic Web plugin?)
<body vocab="http://schema.org/">
<div resource="/jrbdd2015" typeof="Event">
<h2 property="title">RBDD 2015</h2>
<p>Date: <span property="startDate">2015-10-20</span></p>
...
<p>Conduire et construire un plan de gestion des données.
<a property="url"
href="http://rbdd.cnrs.fr/spip.php?article179">More…</a>
</p>
</div>
</body>
@prefix sch: <http://schema.org/> .
<http://rbdd.cnrs.fr/jrbdd2015>
a sch:Event ;
sch:title "RBDD 2015" ;
sch:startDate "2015-10-20" ;
sch:url <http://rbdd.cnrs.fr/spip.php?article179> .
RDFa: RDF in HTML attributes
Page base URL: http://rbdd.cnrs.fr
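To show how the attributes turn into triples, here is a minimal Python sketch built on the standard html.parser module. It is a toy, not an RDFa processor: it supports only vocab, resource, typeof, property and href (no prefixes, content, datatype or the full RDFa chaining rules), assumes every element has an end tag and that a vocab is in scope, and resolves relative IRIs against the page base URL http://rbdd.cnrs.fr.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TinyRDFa(HTMLParser):
    """Toy extractor for vocab, resource, typeof, property and href only."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.context = [(None, base)]  # (vocab, current subject) for each open element
        self.literal = None            # [subject, predicate, text parts, depth] while reading text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        vocab, subject = self.context[-1]
        vocab = attrs.get("vocab", vocab)
        if "resource" in attrs:
            subject = urljoin(self.base, attrs["resource"])
        if "typeof" in attrs:
            self.triples.append((subject, "rdf:type", vocab + attrs["typeof"]))
        if "property" in attrs:
            predicate = vocab + attrs["property"]
            if "href" in attrs:
                # A link: the object is the resolved href IRI.
                self.triples.append((subject, predicate, urljoin(self.base, attrs["href"])))
            else:
                # Otherwise the object is the element's text, collected until its end tag.
                self.literal = [subject, predicate, [], len(self.context)]
        self.context.append((vocab, subject))

    def handle_data(self, data):
        if self.literal is not None:
            self.literal[2].append(data)

    def handle_endtag(self, tag):
        self.context.pop()
        if self.literal is not None and self.literal[3] == len(self.context):
            subject, predicate, parts, _ = self.literal
            self.triples.append((subject, predicate, "".join(parts).strip()))
            self.literal = None


PAGE = """<body vocab="http://schema.org/">
<div resource="/jrbdd2015" typeof="Event">
<h2 property="title">RBDD 2015</h2>
<p>Date: <span property="startDate">2015-10-20</span></p>
<p>Conduire et construire un plan de gestion des données.
<a property="url" href="http://rbdd.cnrs.fr/spip.php?article179">More…</a>
</p>
</div>
</body>"""

if __name__ == "__main__":
    parser = TinyRDFa("http://rbdd.cnrs.fr")
    parser.feed(PAGE)
    parser.close()
    for triple in parser.triples:
        print(triple)
```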
Translation of RDBs to RDF (RDB2RDF)
▪ Various initial motivations
• Web of Data, Linked Data
• Ontology-Based Data Access (OBDA)
• Ontology learning
• Schema mapping…
▪ Historical products: D2RQ, Virtuoso…
▪ R2RML
• W3C recommendation (2012), mapping language, several implementations
▪ Several methods: direct mapping vs. domain-specific mapping
Direct Mapping of RDB to RDF
Table: PEOPLE
ID | FNAME | ADDR (FK to ADDRESS/ID)
7 | Catherine | 18
8 | Olivier | 22
… | … | …
<PEOPLE/ID=7> rdf:type <PEOPLE> .
<PEOPLE/ID=7> <PEOPLE#FNAME> "Catherine" .
<PEOPLE/ID=7> <PEOPLE#ref-ADDR> <ADDRESS/ID=18> .
<PEOPLE/ID=8> rdf:type <PEOPLE> .
<PEOPLE/ID=8> <PEOPLE#FNAME> "Olivier" .
<PEOPLE/ID=8> <PEOPLE#ref-ADDR> <ADDRESS/ID=22> .
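A minimal Python sketch of the direct mapping for a single table, under simplifying assumptions: a single-column primary key, IRIs kept relative to an implicit base as above, numbers written as bare Turtle integers, and no percent-encoding of values. Following the W3C Direct Mapping, it emits a literal triple for every non-null column (including ID and ADDR, which the slide leaves out) plus a ref- triple for each foreign key.

```python
def direct_mapping(table, key, rows, foreign_keys):
    """Simplified W3C-style direct mapping of one table.

    foreign_keys maps a column name to the (table, column) it references.
    """
    triples = []
    for row in rows:
        subject = f"<{table}/{key}={row[key]}>"
        triples.append(f"{subject} rdf:type <{table}> .")
        for column, value in row.items():
            if value is None:
                continue  # NULL values produce no triple
            literal = str(value) if isinstance(value, int) else f'"{value}"'
            triples.append(f"{subject} <{table}#{column}> {literal} .")
            if column in foreign_keys:
                ref_table, ref_column = foreign_keys[column]
                triples.append(f"{subject} <{table}#ref-{column}> <{ref_table}/{ref_column}={value}> .")
    return triples


ROWS = [
    {"ID": 7, "FNAME": "Catherine", "ADDR": 18},
    {"ID": 8, "FNAME": "Olivier", "ADDR": 22},
]

if __name__ == "__main__":
    for t in direct_mapping("PEOPLE", "ID", ROWS, {"ADDR": ("ADDRESS", "ID")}):
        print(t)
```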
Customized Mapping of RDB to RDF
<#TriplesMap1>
rr:logicalTable [ rr:tableName "PEOPLE" ];
rr:subjectMap [
rr:template "http://i3s.wimmics.org/staff/{ID}";
rr:class ex:Teacher;
];
rr:predicateObjectMap [
rr:predicate foaf:name;
rr:objectMap [ rr:column "FNAME" ];
].
<http://i3s.wimmics.org/staff/7> rdf:type ex:Teacher.
<http://i3s.wimmics.org/staff/7> foaf:name "Catherine".
<http://i3s.wimmics.org/staff/8> rdf:type ex:Teacher.
<http://i3s.wimmics.org/staff/8> foaf:name "Olivier".
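What this triples map does can be mimicked in a few lines: rr:template is filled with column values to build the subject IRI, rr:class adds an rdf:type triple, and the predicate-object map with rr:column adds a literal. A minimal Python sketch for this mapping only, with the mapping and rows hard-coded (no SQL, no parsing of R2RML, and no IRI-safe escaping of template values, which R2RML requires):

```python
import re

# The parts of <#TriplesMap1> used here, rewritten as plain Python values.
TEMPLATE = "http://i3s.wimmics.org/staff/{ID}"   # rr:template
CLASS = "ex:Teacher"                              # rr:class
PREDICATE_COLUMNS = [("foaf:name", "FNAME")]      # rr:predicate / rr:column pairs

# Rows of the logical table PEOPLE (values from the slides).
ROWS = [{"ID": 7, "FNAME": "Catherine"}, {"ID": 8, "FNAME": "Olivier"}]


def fill_template(template, row):
    """Replace each {COLUMN} of an rr:template with the row's value (no escaping)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def apply_triples_map(rows):
    """Generate the rr:class and predicate-object triples for each row."""
    triples = []
    for row in rows:
        subject = f"<{fill_template(TEMPLATE, row)}>"
        triples.append(f"{subject} rdf:type {CLASS} .")
        for predicate, column in PREDICATE_COLUMNS:
            triples.append(f'{subject} {predicate} "{row[column]}" .')
    return triples


if __name__ == "__main__":
    print("\n".join(apply_triples_map(ROWS)))
```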
xR2RML: Mapping of heterogeneous DBs to RDF
▪ Uniform language to describe mappings from the most common types of databases to RDF
▪ Extends R2RML and RML
▪ Features:
• Allows any declarative query language
• Allows any syntax to reference data elements from query results (column name, attribute name, JSONPath, XPath...)
• Generates RDF lists and containers (bag, sequence, alternative)
• Supports mixed content, e.g. an XML value in a relational column
▪ Implementation for MongoDB
• Data materialization
• Query rewriting
xR2RML mapping example
<#TriplesMap>
xrr:logicalSource [
xrr:query "db.studies.find({ studyid: { $exists: true } })" ;
] ;
rr:subjectMap [
rr:template "http://example.org/study#{$.studyid}" ;
rr:class ex:Study
] ;
rr:predicateObjectMap [
rr:predicate ex:involves ;
rr:objectMap [
xrr:reference "$.centres.*.name" ] ;
] .
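The key point of this mapping is that a JSONPath expression such as $.centres.*.name can return several values, each producing one ex:involves triple. A minimal Python sketch with a hand-written evaluator for only the "$", ".field" and ".*" steps (a real implementation would rely on a complete JSONPath library); the study documents are hypothetical, and object values are emitted as literals, assuming xrr:reference defaults to literals as rr:column does in R2RML.

```python
def eval_path(doc, path):
    """Evaluate a tiny JSONPath subset: '$' followed by '.field' or '.*' steps."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    values = [doc]
    for step in path[1:].split(".")[1:]:
        next_values = []
        for value in values:
            if step == "*":
                # Wildcard: every element of an array, or every value of an object.
                if isinstance(value, list):
                    next_values.extend(value)
                elif isinstance(value, dict):
                    next_values.extend(value.values())
            elif isinstance(value, dict) and step in value:
                next_values.append(value[step])
        values = next_values
    return values


# Hypothetical documents of the MongoDB "studies" collection.
STUDIES = [
    {"studyid": "s1", "centres": [{"name": "Nice"}, {"name": "Paris"}]},
    {"studyid": "s2", "centres": []},
    {"title": "draft without studyid"},
]


def map_studies(docs):
    """Apply the subject map and the predicate-object map to each document."""
    triples = []
    for doc in docs:
        ids = eval_path(doc, "$.studyid")
        if not ids:
            continue  # the query { studyid: { $exists: true } } would not return it
        subject = f"<http://example.org/study#{ids[0]}>"
        triples.append(f"{subject} rdf:type ex:Study .")
        # One ex:involves triple per value returned by the JSONPath reference.
        for name in eval_path(doc, "$.centres.*.name"):
            triples.append(f'{subject} ex:involves "{name}" .')
    return triples


if __name__ == "__main__":
    print("\n".join(map_studies(STUDIES)))
```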
xR2RML in practice: data materialization
(Figure: Morph-xR2RML uses an xR2RML mapping description, which refers to domain ontologies, to query JSON documents with MongoDB queries (MongoQL) and materialize the corresponding RDF data.)
xR2RML in practice: query rewriting
(Figure: Morph-xR2RML receives a SPARQL query and, using the xR2RML mapping description, rewrites it into a MongoDB query (MongoQL) evaluated against the JSON documents.)
Publish
Linked Open Data
on the Web
Linked Data rules
1. Use URIs as names for things
2. Use HTTP URIs so that people
can look up those names
3. When someone looks up a URI, provide useful
information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover
more things
Using URIs to look up information resources
▪ Dereferencing the URI returns a representation of the document
• Either the URI is a URL that links directly to the representation
• Or content negotiation selects an appropriate representation
Source: http://www.w3.org/TR/cooluris/
GET /people/cv_alice HTTP/1.1
Host: www.example.com
Accept: text/html, application/xhtml+xml
Accept-Language: en, de
HTTP/1.1 200 OK
Content-Type: text/html
Content-Language: en
Content-Location: http://www.example.com/cv_alice.en.html
<html ...>
...
Using URIs to look up information resources
HTTP content negotiation
GET /people/cv_alice HTTP/1.1
Host: www.example.com
Accept: text/html, application/xhtml+xml
Accept-Language: en, de
HTTP/1.1 302 Found
Location: http://www.example.com/cv_alice.en.html
GET /cv_alice.en.html HTTP/1.1
Host: www.example.com
...
HTTP/1.1 200 OK
Content-Type: text/html
Content-Language: en
<html ...>
...
Using URIs to look up information resources
HTTP content negotiation
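On the server side, content negotiation boils down to choosing, among the available representations, the one the client ranks highest in its Accept header. A minimal Python sketch of that choice: it handles q-values, type/* and */* ranges (the most specific matching range decides the q-value), but ignores other media-type parameters and Accept-Language; the list of available representations is hypothetical.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs; q defaults to 1."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        if not fields[0]:
            continue  # skip empty items such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranges.append((fields[0], q))
    return ranges


def matches(media_range, media_type):
    """Does a range such as */*, text/* or text/html cover media_type?"""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def specificity(media_range):
    """Exact types beat type/* ranges, which beat */*."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def negotiate(accept, available):
    """Pick the available media type the client prefers, or None (HTTP 406)."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:  # earlier entries win ties (server preference)
        matching = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        q = max(matching)[1] if matching else 0.0
        if q > best_q:
            best, best_q = media_type, q
    return best


AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]

if __name__ == "__main__":
    print(negotiate("text/html, application/xhtml+xml", AVAILABLE))  # text/html
```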
Using URIs to look up real-world objects
▪ One option: use hash URIs for non-document resources
Source: http://www.w3.org/TR/cooluris/
Using URIs to look up real-world objects
▪ Use an HTTP 303 redirect to an information resource
• 303 See Other: the requested resource is not a regular Web document. There is no suitable representation of the resource itself, but the server can point to a document that provides information about it
Source: http://www.w3.org/TR/cooluris/
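The 303 pattern can be sketched as a tiny request handler that only computes the status line and headers (no server is started). Assumptions, for illustration only: real-world things are identified under a hypothetical /id/ path, their descriptions live under /doc/ (HTML) and /data/ (Turtle), and the choice between the two is a crude substring test on the Accept header, where a real server would reuse proper content negotiation.

```python
def respond(path, accept="text/html"):
    """Return (status, headers) for a GET on path, following the 303 pattern."""
    if path.startswith("/id/"):
        # A real-world thing has no representation: redirect to a document about it.
        name = path[len("/id/"):]
        target = f"/data/{name}" if "text/turtle" in accept else f"/doc/{name}"
        return "303 See Other", {"Location": target}
    if path.startswith("/doc/"):
        return "200 OK", {"Content-Type": "text/html"}
    if path.startswith("/data/"):
        return "200 OK", {"Content-Type": "text/turtle"}
    return "404 Not Found", {}


if __name__ == "__main__":
    print(respond("/id/alice", accept="text/turtle"))
    # ('303 See Other', {'Location': '/data/alice'})
```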
Reference documentation
▪ Cool URIs for the Semantic Web. W3C Interest Group Note, 03 December 2008. http://www.w3.org/TR/cooluris/
▪ Dereferencing HTTP URIs. Draft TAG Finding, 04 October 2007. http://www.w3.org/2001/tag/doc/httpRange-14/HttpRange-14.html
Data curation
▪ Main idea: better to publish less data, but useful data
• Choose an appropriate model
• Choose appropriate vocabularies
• Include high-quality metadata and provenance information
• Deal with privacy issues
• Interlink
▪ Time-consuming activity => significant cost
▪ Needs skilled scientists who know the data, the software…
• Under-valued, no reward: data must become citable like any scientific publication => Data Paper
Thank you!
No need for Black Chambers: Testing TLS in the E-Mail Ecosystem at Large (hac...
 
#4.Caso empresarial - CUPONATIC.
#4.Caso empresarial - CUPONATIC.#4.Caso empresarial - CUPONATIC.
#4.Caso empresarial - CUPONATIC.
 
Instituto Coincidir: Entrevista en la revista Palabra. (marzo 2013)
Instituto Coincidir: Entrevista en la revista Palabra. (marzo 2013)Instituto Coincidir: Entrevista en la revista Palabra. (marzo 2013)
Instituto Coincidir: Entrevista en la revista Palabra. (marzo 2013)
 
Risk study of vibrations in buses, report Gothenburg_1
Risk study of vibrations in buses, report Gothenburg_1Risk study of vibrations in buses, report Gothenburg_1
Risk study of vibrations in buses, report Gothenburg_1
 

Similar to Make our Scientific Datasets Accessible and Interoperable on the Web

Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
emmanuel_jamin
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
Carole Goble
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1
manujam
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
Lucy McKenna
 

Similar to Make our Scientific Datasets Accessible and Interoperable on the Web (20)

Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open Data
 
Datalift: A Catalyser for the Web of Data - Francois Scharffe
Datalift: A Catalyser for the Web of Data - Francois ScharffeDatalift: A Catalyser for the Web of Data - Francois Scharffe
Datalift: A Catalyser for the Web of Data - Francois Scharffe
 
Open Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaicsOpen Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaics
 
EuropeanaTech 2018: A distributed network of digital heritage information
EuropeanaTech 2018: A distributed network of digital heritage informationEuropeanaTech 2018: A distributed network of digital heritage information
EuropeanaTech 2018: A distributed network of digital heritage information
 
CLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage informationCLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage information
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 
A distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaA distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL India
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1
 
Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
 
Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...
Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...
Libraries in the Big Data Era: Strategies and Challenges in Archiving and Sha...
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
 
A distributed network of digital heritage information by Enno Meijers - Europ...
A distributed network of digital heritage information by Enno Meijers - Europ...A distributed network of digital heritage information by Enno Meijers - Europ...
A distributed network of digital heritage information by Enno Meijers - Europ...
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics Amsterdam
 
20110728 datalift-rpi-troy
20110728 datalift-rpi-troy20110728 datalift-rpi-troy
20110728 datalift-rpi-troy
 

More from Franck Michel

A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
Franck Michel
 
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked DataSPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
Franck Michel
 
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Franck Michel
 

More from Franck Michel (16)

ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
 
Bioschemas: Marking up biodiversity websites to improve data discovery and we...
Bioschemas: Marking up biodiversity websites to improve data discovery and we...Bioschemas: Marking up biodiversity websites to improve data discovery and we...
Bioschemas: Marking up biodiversity websites to improve data discovery and we...
 
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
 
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
 
Describe and Publish data sets on the web: vocabularies, catalogues, data por...
Describe and Publish data sets on the web: vocabularies, catalogues, data por...Describe and Publish data sets on the web: vocabularies, catalogues, data por...
Describe and Publish data sets on the web: vocabularies, catalogues, data por...
 
Knowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked dataKnowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked data
 
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
 
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future OpportunitiesModelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
 
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
 
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked DataSPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
 
Integrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of DataIntegrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of Data
 
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
 
A Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQLA Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQL
 
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
 
Translation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RMLTranslation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RML
 
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
 

Recently uploaded

MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...
Annibale Panichella
 
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a Technosignature
Sérgio Sacani
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptx
GOWTHAMIM22
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discs
Sérgio Sacani
 
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Sérgio Sacani
 

Recently uploaded (20)

GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday LifeGBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
 
MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...MIP Award presentation at the IEEE International Conference on Software Analy...
MIP Award presentation at the IEEE International Conference on Software Analy...
 
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptxBiochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
Biochemistry and Biomolecules - Science - 9th Grade by Slidesgo.pptx
 
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
 
The Scientific names of some important families of Industrial plants .pdf
The Scientific names of some important families of Industrial plants .pdfThe Scientific names of some important families of Industrial plants .pdf
The Scientific names of some important families of Industrial plants .pdf
 
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptxSaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
 
RACEMIzATION AND ISOMERISATION completed.pptx
RACEMIzATION AND ISOMERISATION completed.pptxRACEMIzATION AND ISOMERISATION completed.pptx
RACEMIzATION AND ISOMERISATION completed.pptx
 
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
 
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
 
In-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptxIn-pond Race way systems for Aquaculture (IPRS).pptx
In-pond Race way systems for Aquaculture (IPRS).pptx
 
Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a Technosignature
 
mixotrophy in cyanobacteria: a dual nutritional strategy
mixotrophy in cyanobacteria: a dual nutritional strategymixotrophy in cyanobacteria: a dual nutritional strategy
mixotrophy in cyanobacteria: a dual nutritional strategy
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptx
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discs
 
-case selection and treatment planing.pptx
-case selection and treatment planing.pptx-case selection and treatment planing.pptx
-case selection and treatment planing.pptx
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdf
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdf
 
B lymphocytes, Receptors, Maturation and Activation
B lymphocytes, Receptors, Maturation and ActivationB lymphocytes, Receptors, Maturation and Activation
B lymphocytes, Receptors, Maturation and Activation
 
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
Exomoons & Exorings with the Habitable Worlds Observatory I: On the Detection...
 

Make our Scientific Datasets Accessible and Interoperable on the Web

  • 1. Make our Scientific Datasets Accessible and Interoperable on the Web
    Franck MICHEL, I3S - UMR 7271, CNRS - Univ. Nice Sophia
    RBDD2015, CNRS Database Network, Oct. 21st 2015
  • 2. Make our datasets accessible and interoperable on the Web…
    ▪ Not only because it is an H2020 requirement
    ▪ Linking data increases its value
      • Mash up with related data
      • Produce new knowledge
      • Open opportunities for new (unexpected) uses
    ▪ Citizens demand access to public data (scientific, government…)
    ▪ …
    Franck Michel, RBDD 2015
  • 3. Towards a Web of Data: from a Web of Documents... to a Web of Data
    ▪ Publication/interlinking of open datasets
      • Publish heterogeneous data in a common format
      • Using common vocabularies
    ▪ Driven by major initiatives, e.g.:
      • Linking Open Data
      • W3C Data Activity
      • Open Data hosting services... OpenAIRE, Zenodo...
    ▪ As well as other domain-specific projects
      • Bio2RDF, BioPortal
  • 4. Linked Open Data Cloud: May 2007, April 2008, Sept. 2008, March 2009, Sept. 2010, Sept. 2011, Aug. 2014
    Linked Datasets as of Aug. 30th 2014. (c) R. Cyganiak and A. Jentzsch
  • 6. Visible Web / Deep Web
  • 7. Heterogeneous data models
    ▪ Tabular (ID, NAME)
    ▪ Directory (o=fr, ou=cnrs, cn=Franck Michel, objectclass=user, userid=65286)
    ▪ Graph
    ▪ Object-Oriented
    ▪ Documents: XML native DBs, document stores…
  • 8. Heterogeneous query capabilities
    ▪ RDBs, NewSQL => SQL
    ▪ XML native DBs => XPath/XQuery
    ▪ Graph: Neo4j => Cypher, AllegroGraph => SPARQL
    ▪ Document: MongoDB, CouchDB => similar JS/JSON-based query languages
    ▪ Column: Cassandra => CQL, Hive => HQL
    ▪ Key-value: Riak, DynamoDB => low-level API
    ▪ …
  • 9. Make our datasets accessible and interoperable on the Web…
  • 10. To you, your data may mean this… (© NeuroLOG project)
  • 11. To you, your data may mean this… (© NeuroLOG project; © MicroStep-Mis, 3D meteorological modelling)
  • 12. To you, your data may mean this… (© NeuroLOG project; © MicroStep-Mis, 3D meteorological modelling; © CERN, first proton-lead ion collisions recorded by ALICE)
  • 13. To others, your data may mean that…
  • 14. The key is metadata
    Finding, understanding, and reusing scientific datasets requires consistent, high-quality metadata:
    ▪ Context: identification, authors, dates, license, version, reference articles
    ▪ Access: format, structure, location (download), query method
    ▪ Meaning:
      • What does it represent? What concepts, entities, semantics?
    ▪ Interpretation: units (cm or inches, left/right)…
    ▪ Provenance:
      • Acquired with what equipment? Parameters, protocols?
      • Derived from what dataset? With what processing?
      • Dataset-level or entity-level provenance
    ▪ Statistics
    ▪ Etc.
  • 15. CSV on the Web*
    ▪ Help access and understand CSV tabular data available on the Web
      • Recommendations for a metadata vocabulary for CSV data
      • Access methods for CSV metadata
      • Mapping mechanisms for transforming CSV into various formats (e.g., RDF, JSON, or XML)
    ▪ Annotations on a table or group of tables, columns…
    *https://www.w3.org/standards/techs/csv#w3c_all
  • 16. CSV on the Web*
    GET tree-ops.csv
    Content-Type: text/csv
    Link: <http://example.org/tree-ops.json>; rel="…"

    GID, Street, Species, Trim Cycle, Inventory Date
    1, Addison Av, Celtis australis, 2010/10/18
    2, Emerson St, Liquidambar styraciflua, 2010/06/02

    {
      "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
      "url": "tree-ops.csv",
      "dc:title": "Tree Operations",
      "dc:license": {"@id": "http://opendefinition.org/licenses/cc-by/"},
      "dc:modified": {"@value": "2010-12-31", "@type": "xsd:date"},
      "tableSchema": {
        "columns": [{
          "name": "GID",
          "titles": ["GID", "Generic Identifier"],
          "dc:description": "...",
          "datatype": "string",
          "required": true
        }, {
          "name": "Street",
          "titles": "On Street",
          "dc:description": "The street that the tree is on.",
          "datatype": "string"
        }, ...
        ],
        "primaryKey": "GID",
        "aboutUrl": "#gid-{GID}"
      }
    }
    *https://www.w3.org/standards/techs/csv#w3c_all
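The following sketch shows roughly what a CSVW-aware consumer does with such metadata. It turns each CSV row into a record keyed by the column `name` and checks `required` cells. It also expands the `aboutUrl` template against the table `url`. It assumes a deliberately simplified subset: columns are matched by position, and Python's `str.format` stands in for the RFC 6570 URI templates that CSVW actually uses. The trimmed `CSV_TEXT` and `METADATA` values are hypothetical, and this is not a conforming CSVW processor.

```python
import csv
import io
import json

# Hypothetical, trimmed-down version of tree-ops.csv (three columns only).
CSV_TEXT = """GID,On Street,Species
1,Addison Av,Celtis australis
2,Emerson St,Liquidambar styraciflua
"""

# Simplified metadata: only the keys this sketch understands.
METADATA = {
    "url": "tree-ops.csv",
    "tableSchema": {
        "columns": [
            {"name": "GID", "titles": ["GID", "Generic Identifier"], "required": True},
            {"name": "Street", "titles": "On Street"},
            {"name": "Species", "titles": "Species"},
        ],
        "aboutUrl": "#gid-{GID}",
    },
}


def csv_to_records(csv_text, metadata):
    """Turn each CSV row into a dict keyed by column 'name', plus an '@id'."""
    schema = metadata["tableSchema"]
    columns = schema["columns"]
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row; cells are matched to columns by position
    records = []
    for row in reader:
        record = {}
        for column, cell in zip(columns, row):
            if column.get("required") and cell == "":
                raise ValueError(f"missing required value for {column['name']}")
            record[column["name"]] = cell
        # Expand the aboutUrl template and prefix it with the table URL.
        record["@id"] = metadata["url"] + schema["aboutUrl"].format(**record)
        records.append(record)
    return records


if __name__ == "__main__":
    print(json.dumps(csv_to_records(CSV_TEXT, METADATA), indent=2))
```

Each row gets an identifier such as `tree-ops.csv#gid-1`, which is what makes the rows linkable from other datasets.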
  • 17. HCLS Profile*
    ▪ Health Care and the Life Sciences
    ▪ Consensus among participating stakeholders on the description of datasets using RDF
    ▪ Data description, versioning, provenance, discovery, exchange, query, and retrieval
    Vocabularies used: RDF, RDFS, XSD; Citation Typing Ontology; Data Catalog; Dublin Core Metadata Types, Dublin Core Metadata Terms; Friend-of-a-Friend; Collection Description Frequency Vocabulary; Identifiers.org vocabulary; Lexvo.org - Lexical Vocabulary; Provenance Authoring and Versioning ontology (PAV); PROV Ontology; Semanticscience Integrated Ontology (SIO); Vocabulary of Interlinked Datasets (VoID)
    *http://www.w3.org/TR/hcls-dataset/
  • 18. Challenges of publishing metadata and/or data
    ▪ Data: raw data? Convert?
    ▪ Metadata: describe the data
    ▪ Web: syntax? Shared meaning? Link to others?
  • 19. Make our datasets accessible and interoperable on the Web…
    ▪ Have a common representation format (addresses structural heterogeneity)
    ▪ Have common ways to describe the data (addresses semantic heterogeneity)
      • Vocabularies, ontologies, thesauri…
    ▪ Have common ways to query the data
  • 20. Agenda
    ▪ The Web of Data and the Semantic Web
    ▪ Create, reuse and link vocabularies
    ▪ Populate the Web of Data
    ▪ Publish Linked Open Data on the Web
  • 21. The Web of Data and the Semantic Web
    Source: C. Faron Zucker[1], O. Corby[1]. Introduction au web de données et au web sémantique (Introduction to the Web of Data and the Semantic Web). INRA Open Data seminar, Dec. 2014.
    [1] INRIA Sophia Antipolis, CNRS, UNS.
  • 22. Standards of the Semantic Web
  • 27. Standards of the Semantic Web: Web of Data
  • 28. The Resource Description Framework
    RDF is a model based on triples, i.e. any fact consists of 3 components: (subject, predicate, object)
  • 29. The Resource Description Framework
    websem.html is a text
    websem.html has author Fabien
    websem.html has author Olivier
    websem.html has author Catherine
    websem.html has subject Semantic Web
    websem.html was written in 2011
  • 30. The Resource Description Framework
    [Graph: websem.html --type--> Text; --date--> 2011; --subject--> Semantic Web; --author--> Catherine, Olivier, Fabien]
  • 31. The Resource Description Framework
    [Graph with URIs: http://ns.inria.fr/ex/websem.html
      --rdf:type--> dt:Text
      --dc:date--> 2011
      --dc:subject--> http://en.wikipedia.org/wiki/Semantic_Web
      --dc:author--> http://ns.inria.fr/catherine.faron, http://ns.inria.fr/olivier.corby, http://ns.inria.fr/fabien.gandon]
  • 32. The Resource Description Framework
    N-Triples syntax:
    <http://inria.fr/ex/websem.html> <http://purl.org/dc/elements/1.1/author> <http://ns.inria.fr/catherine.faron> .
    <http://inria.fr/ex/websem.html> <http://purl.org/dc/elements/1.1/theme> "Semantic Web" .
    Turtle syntax:
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    <http://inria.fr/ex/websem.html> dc:author <http://ns.inria.fr/catherine.faron> ;
        dc:theme "Semantic Web" .
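N-Triples is simple enough that a line-based reader fits in a few lines. It splits each statement into subject, predicate and object terms. The regular expressions below are a hypothetical simplification: they accept IRIs, blank nodes and literals (with an optional language tag or datatype), but they ignore most escaping rules. A real application would rely on an RDF library instead.

```python
import re

IRI = r'<[^>]*>'
BNODE = r'_:\S+'
# A quoted string, optionally followed by @lang or ^^<datatype>.
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'

# subject: IRI or blank node; predicate: IRI; object: IRI, blank node or literal.
TRIPLE_RE = re.compile(
    rf'^({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.$'
)

DATA = """<http://inria.fr/ex/websem.html> <http://purl.org/dc/elements/1.1/author> <http://ns.inria.fr/catherine.faron> .
<http://inria.fr/ex/websem.html> <http://purl.org/dc/elements/1.1/theme> "Semantic Web" .
"""


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) term strings."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"not a valid N-Triples line: {line!r}")
        triples.append(match.groups())
    return triples


if __name__ == "__main__":
    for triple in parse_ntriples(DATA):
        print(triple)
```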
  • 33. The Resource Description Framework
    RDF/XML syntax:
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://inria.fr/ex/websem.html">
        <dc:author rdf:resource="http://ns.inria.fr/catherine.faron"/>
        <dc:theme>Semantic Web</dc:theme>
      </rdf:Description>
    </rdf:RDF>
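The RDF/XML and N-Triples forms above carry the same two triples. The sketch below makes that concrete by reading the XML with Python's `xml.etree.ElementTree` and printing N-Triples lines. It handles only the flat `rdf:Description` pattern used in this example: `rdf:about` gives the subject, and each child element's namespace plus local name gives the predicate IRI. The object comes from `rdf:resource` or from the element text. Full RDF/XML (nested descriptions, `rdf:parseType`, and so on) is out of scope.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://inria.fr/ex/websem.html">
    <dc:author rdf:resource="http://ns.inria.fr/catherine.faron"/>
    <dc:theme>Semantic Web</dc:theme>
  </rdf:Description>
</rdf:RDF>"""


def rdfxml_to_ntriples(xml_text):
    """Convert the simple rdf:Description pattern into N-Triples lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like '{namespace}local'; the predicate IRI is their concatenation.
            namespace, local = prop.tag[1:].split("}", 1)
            predicate = namespace + local
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                obj = f"<{resource}>"  # the object is a resource (IRI)
            else:
                text = (prop.text or "").replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{text}"'  # the object is a plain literal
            lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


if __name__ == "__main__":
    print("\n".join(rdfxml_to_ntriples(RDF_XML)))
```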
  • 34. Linked Open Data
  • 35. RDFS: RDF Schema
    Schemas define classes of resources and their properties, and organize them into hierarchies.
  • 36. RDF Schema
    @prefix igeo: <http://rdf.insee.fr/def/geo#> .
    igeo:Commune rdfs:subClassOf igeo:TerritoireAdministratif .
    igeo:Commune rdf:type rdfs:Class .
    igeo:TerritoireAdministratif rdf:type rdfs:Class .
    <http://id.insee.fr/geo/commune/34301> rdf:type igeo:Commune .
  • 37. RDF Schema
    @prefix igeo: <http://rdf.insee.fr/def/geo#> .
    igeo:codeCommune rdfs:subPropertyOf igeo:codeINSEE .
    igeo:codeCommune rdf:type rdf:Property .
    igeo:codeINSEE rdf:type rdf:Property .
  • 38. RDF Schema
    @prefix igeo: <http://rdf.insee.fr/def/geo#> .
    igeo:chefLieu rdfs:domain igeo:PaysOuTerritoire ;
        rdfs:range igeo:Commune .
    <http://id.insee.fr/geo/departement/34> igeo:chefLieu <http://id.insee.fr/geo/commune/34172> .  # Montpellier
    [In the figure, rdf:type arrows link departement/34 to the domain class igeo:PaysOuTerritoire and commune/34172 (Montpellier) to the range class igeo:Commune.]
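`rdfs:domain`, `rdfs:range` and `rdfs:subPropertyOf` are not constraints: they let a reasoner derive new triples. The sketch below applies the corresponding RDFS entailment rules over a set of string triples until nothing new appears. These are rdfs2 (domain), rdfs3 (range) and rdfs7 (subPropertyOf). The `dep:` and `com:` prefixes are hypothetical shorthands for the INSEE department and commune URIs. The `igeo:codeCommune` value for Montpellier is illustrative data, not taken from the slides.

```python
# Hypothetical shorthands: dep: = http://id.insee.fr/geo/departement/,
# com: = http://id.insee.fr/geo/commune/
DATA = {
    ("igeo:chefLieu", "rdfs:domain", "igeo:PaysOuTerritoire"),
    ("igeo:chefLieu", "rdfs:range", "igeo:Commune"),
    ("igeo:codeCommune", "rdfs:subPropertyOf", "igeo:codeINSEE"),
    ("dep:34", "igeo:chefLieu", "com:34172"),      # Montpellier
    ("com:34172", "igeo:codeCommune", '"34172"'),  # illustrative value
}


def rdfs_closure(triples):
    """Apply rdfs2 (domain), rdfs3 (range) and rdfs7 (subPropertyOf) to a fixpoint."""
    graph = set(triples)
    while True:
        inferred = set()
        for s, p, o in graph:
            for schema_s, schema_p, schema_o in graph:
                if schema_s != p:
                    continue  # only schema triples describing predicate p are relevant
                if schema_p == "rdfs:domain":
                    inferred.add((s, "rdf:type", schema_o))
                elif schema_p == "rdfs:range":
                    inferred.add((o, "rdf:type", schema_o))
                elif schema_p == "rdfs:subPropertyOf":
                    inferred.add((s, schema_o, o))
        if inferred <= graph:  # nothing new: the closure is complete
            return graph
        graph |= inferred


if __name__ == "__main__":
    for triple in sorted(rdfs_closure(DATA) - DATA):
        print(triple)
```

The nested loop is quadratic, which is fine for a handful of triples. Real triple stores index their data and apply these rules far more efficiently.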
  • 39. SPARQL: query RDF with SPARQL (SPARQL Protocol and RDF Query Language)
  • 40. SPARQL 1.1, W3C Recommendation, 21 Mar. 2013
    ▪ Query language (using a Turtle-like syntax)
      • SPARQL 1.1 Query Language
      • SPARQL 1.1 Update
    ▪ Representation of query results
      • SPARQL Query Results formats: XML, CSV/TSV, JSON
    ▪ Protocols
      • SPARQL 1.1 Protocol
      • SPARQL 1.1 Graph Store HTTP Protocol
    ▪ Entailments
      • SPARQL 1.1 Entailment Regimes
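With the SPARQL 1.1 Protocol, a query can be sent with HTTP GET by URL-encoding it in a `query` parameter. The client then chooses a result format through the `Accept` header. The sketch below only builds such a request with Python's `urllib` and does not send it. The endpoint `https://example.org/sparql` is hypothetical. The media types are those registered for the SPARQL JSON, XML, CSV and TSV result formats.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

RESULT_MEDIA_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_sparql_get(endpoint, query, result_format="json"):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    url = endpoint + "?" + urlencode({"query": query}, quote_via=quote)
    return Request(url, headers={"Accept": RESULT_MEDIA_TYPES[result_format]})


if __name__ == "__main__":
    req = build_sparql_get("https://example.org/sparql",  # hypothetical endpoint
                           "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
    print(req.get_method(), req.full_url)
    print("Accept:", req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would actually send it. That step is left out so the sketch runs without a network connection.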
  • 41. SPARQL: triple patterns
    Turtle syntax with "?" to mark variables:
      ?x rdf:type ex:Person
    Describe patterns of triples that we look for:
      SELECT ?subject ?type WHERE { ?subject rdf:type ?type }
    Default pattern: conjunction of triple patterns:
      SELECT ?x WHERE {
        ?x rdf:type ex:Person .
        ?x ex:name ?name .
      }
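A conjunction of triple patterns (a basic graph pattern) can be evaluated by matching one pattern at a time. Each solution found so far is extended with variable bindings that agree with the ones already made. The sketch below does this over a hypothetical in-memory list of string triples, where terms starting with `?` are variables. It illustrates the semantics only and is not how a SPARQL engine optimizes joins.

```python
# Hypothetical data; literals are kept as quoted strings.
TRIPLES = [
    ("ex:fabien", "rdf:type", "ex:Person"),
    ("ex:fabien", "ex:name", '"Fabien"'),
    ("ex:olivier", "rdf:type", "ex:Person"),
    ("ex:doc1", "rdf:type", "ex:Document"),
]


def is_variable(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for pattern_term, data_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # A variable that is already bound must keep the same value.
            if extended.get(pattern_term, data_term) != data_term:
                return None
            extended[pattern_term] = data_term
        elif pattern_term != data_term:
            return None
    return extended


def evaluate_bgp(patterns, triples):
    """Evaluate a conjunction of triple patterns, one pattern at a time."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def project(solutions, variables):
    """SELECT: keep only the requested variables."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]


if __name__ == "__main__":
    query = [("?x", "rdf:type", "ex:Person"), ("?x", "ex:name", "?name")]
    print(project(evaluate_bgp(query, TRIPLES), ["?x"]))
```

`ex:olivier` matches the first pattern but is dropped by the second, because no `ex:name` triple exists for it.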
  • 42. SPARQL: namespace prefixes
    Declare prefixes of used vocabularies:
      PREFIX mit: <http://www.mit.edu#>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?student WHERE {
        ?student mit:registeredAt ?x .
        ?x foaf:homepage <http://www.mit.edu> .
      }
    Declare a base IRI against which relative IRIs are resolved:
      BASE <http://www.example.org/people#>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?student WHERE { ?student foaf:knows <#Ted> . }
    (With this base, <#Ted> resolves to http://www.example.org/people#Ted, whereas <Ted> would resolve to http://www.example.org/Ted.)
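Prefixed names and relative IRIs are both shorthand for full IRIs. A prefixed name is expanded by plain concatenation, while a relative IRI is resolved against the base following RFC 3986. The sketch below uses a hypothetical `expand` helper for the first case and Python's `urllib.parse.urljoin`, which implements essentially the same resolution algorithm, for the second.

```python
from urllib.parse import urljoin

PREFIXES = {
    "mit": "http://www.mit.edu#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name, prefixes):
    """Expand 'prefix:local' by concatenating the namespace IRI and the local part."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


def resolve(relative_iri, base):
    """Resolve a relative IRI reference against a base (RFC 3986 style)."""
    return urljoin(base, relative_iri)


if __name__ == "__main__":
    base = "http://www.example.org/people#"
    print(expand("foaf:knows", PREFIXES))
    print(resolve("#Ted", base))  # fragment reference: keeps /people
    print(resolve("Ted", base))   # path reference: replaces the last path segment
```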
  • 43. SPARQL: language tags and typed literals
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?x ?f WHERE {
        ?x foaf:name "Fabien"@fr ;
           foaf:knows ?f .
      }

      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      SELECT ?x WHERE {
        ?x foaf:name "Fabien"@fr ;
           foaf:age "21"^^xsd:integer .
      }
  • 44. SPARQL: optional pattern PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?person ?name WHERE { ?person foaf:homepage <http://fabien.info> . OPTIONAL { ?person foaf:name ?name . } } Variable ?name is potentially unbound.
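OPTIONAL behaves like a left outer join: every solution of the required part is kept, and it is extended with the optional bindings when compatible ones exist; otherwise ?name stays unbound. A minimal Python sketch, assuming solutions are represented as dictionaries from variable names to values; the data are hypothetical, and FILTERs inside OPTIONAL are not handled.

```python
from typing import Dict, List

Solution = Dict[str, str]

def left_join(required: List[Solution], optional: List[Solution]) -> List[Solution]:
    """SPARQL-style OPTIONAL: keep every required solution, extended when possible."""
    result: List[Solution] = []
    for left in required:
        compatible = [
            {**left, **right}
            for right in optional
            # two solutions are compatible if they agree on every shared variable
            if all(left[v] == right[v] for v in left.keys() & right.keys())
        ]
        result.extend(compatible or [left])  # no match: keep `left` with ?name unbound
    return result

# Hypothetical solutions of a required pattern and of the optional { ?person foaf:name ?name }
required = [{"?person": "ex:fabien"}, {"?person": "ex:olivier"}]
optional = [{"?person": "ex:fabien", "?name": "Fabien"}]
print(left_join(required, optional))
# [{'?person': 'ex:fabien', '?name': 'Fabien'}, {'?person': 'ex:olivier'}]
```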
  • 45. SPARQL alternative pattern Merge the results of two graph patterns: PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?person ?name WHERE { ?person foaf:name ?name . { ?person foaf:homepage <http://fabien.info> . } UNION { ?person foaf:homepage <http://bafien.org> . } }
  • 46. SPARQL filters PREFIX ex: <http://inria.fr/schema#> SELECT ?person ?name WHERE { ?person rdf:type ex:Person ; ex:name ?name ; ex:age ?age . FILTER (xsd:integer(?age) >= 18) } Other examples: FILTER(?name IN ("fabien", "olivier", "catherine")) FILTER(IF(langMatches(lang(?name), "FR"), ?age >= 21, true)) FILTER NOT EXISTS { ?x foaf:age -1 }
  • 47. SPARQL: additional features ▪ Subtract results: WHERE { ?x a ex:Person MINUS { ?x a ex:John } } ▪ Bind values: ?person foaf:name ?name . VALUES ?name { "Peter" "Pietro" "Pedro" "Pierre" } ▪ Property paths: ?x foaf:knows+ ?friend . ▪ Dataset selection with FROM (placed between SELECT and WHERE): SELECT ?student FROM <http://www.mit.edu/data.rdf> WHERE { ?student mit:registeredAt ?x . }
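The property path `?x foaf:knows+ ?friend` matches every node reachable through one or more foaf:knows edges. A minimal Python sketch of that semantics as a breadth-first traversal; the graph and the function name `one_or_more` are hypothetical, and cycles are handled by remembering the nodes already reached.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

def one_or_more(graph: Iterable[Tuple[str, str, str]], start: str, predicate: str) -> Set[str]:
    """Nodes reachable from `start` through at least one `predicate` edge (path p+)."""
    edges: Dict[str, Set[str]] = {}
    for s, p, o in graph:
        if p == predicate:
            edges.setdefault(s, set()).add(o)
    reached: Set[str] = set()
    queue = deque(edges.get(start, ()))  # paths of length 1
    while queue:
        node = queue.popleft()
        if node not in reached:  # visit each node once, even with cycles
            reached.add(node)
            queue.extend(edges.get(node, ()))
    return reached

graph = [
    ("ex:ann", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carl"),
    ("ex:carl", "foaf:knows", "ex:ann"),  # cycle back to ann
    ("ex:dan", "foaf:knows", "ex:ann"),   # dan is not reachable from ann
]
print(sorted(one_or_more(graph, "ex:ann", "foaf:knows")))
# ['ex:ann', 'ex:bob', 'ex:carl']
```

ex:ann appears in its own result because the cycle gives a path of length 3 back to it; with `foaf:knows*` (zero or more) the start node would be included unconditionally.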
  • 48. SPARQL XML results <?xml version="1.0"?> <sparql xmlns="http://www.w3.org/2005/sparql-results#"> <head><variable name="student"/></head> <results> <result> <binding name="student"> <uri>http://www.mit.edu/data.rdf#ndieng</uri> </binding> </result> <result> <binding name="student"> <uri>http://www.mit.edu/data.rdf#jdoe</uri> </binding> </result> </results> </sparql>
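A client can read this result format with any XML parser, as long as it uses the results namespace http://www.w3.org/2005/sparql-results#. A minimal sketch with Python's standard `xml.etree.ElementTree`; the function name `parse_sparql_xml` and the (kind, value) representation of terms are illustrative choices, not a standard API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

def parse_sparql_xml(document: str) -> Tuple[List[str], List[Dict[str, Tuple[str, str]]]]:
    """Return the variable names and one {variable: (kind, value)} dict per result."""
    root = ET.fromstring(document)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            term = binding[0]                  # the single child: <uri>, <literal> or <bnode>
            kind = term.tag.split("}")[1]      # drop the "{namespace}" prefix of the tag
            row[binding.get("name")] = (kind, term.text or "")
        rows.append(row)
    return variables, rows

RESULTS = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="student"/></head>
  <results>
    <result><binding name="student"><uri>http://www.mit.edu/data.rdf#ndieng</uri></binding></result>
    <result><binding name="student"><uri>http://www.mit.edu/data.rdf#jdoe</uri></binding></result>
  </results>
</sparql>"""

variables, rows = parse_sparql_xml(RESULTS)
print(variables, rows)
```

Unbound variables (for instance from an OPTIONAL) simply have no <binding> element, so they are absent from the corresponding dict.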
  • 49. RDF/SPARQL @ French National Institute of Statistics (INSEE)
  • 50-51. RDF/SPARQL @ French National Library (BnF)
  • 52. RDFS entailment: infer new knowledge. Data: igeo:Commune rdfs:subClassOf igeo:TerritoireAdministratif ; ex:Sète rdf:type igeo:Commune . Query: PREFIX igeo: <http://rdf.insee.fr/def/geo#> SELECT ?x WHERE { ?x rdf:type igeo:TerritoireAdministratif } returns ex:Sète, although that triple is never stated explicitly.
  • 53. RDFS entailment: infer new knowledge. Schema: igeo:codeCommune rdfs:subPropertyOf igeo:codeINSEE . Query: PREFIX igeo: <http://rdf.insee.fr/def/geo#> SELECT ?x ?code WHERE { ?x igeo:codeINSEE ?code } also returns the values of igeo:codeCommune.
  • 54. RDFS entailment: infer new knowledge. Schema: igeo:chefLieu rdfs:domain igeo:PaysOuTerritoire ; rdfs:range igeo:Commune . Data: <http://id.insee.fr/geo/departement/34> igeo:chefLieu <http://id.insee.fr/geo/commune/34172> . SELECT ?x WHERE { ?x rdf:type igeo:Commune } returns commune/34172 (inferred from the range), and SELECT ?x WHERE { ?x rdf:type igeo:PaysOuTerritoire } returns departement/34 (inferred from the domain).
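The three entailment examples can be reproduced by forward chaining a few RDFS rules until nothing new is derived: rdfs2 (domain), rdfs3 (range), rdfs7 (subPropertyOf) and rdfs9 (subClassOf). A minimal Python sketch, not a complete RDFS reasoner (other rules, e.g. transitivity of rdfs:subClassOf itself, are omitted). The schema triples come from the slides; the codeCommune value "34172" is an illustrative triple that the slides do not show.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def pairs(triples: Set[Triple], prop: str) -> Set[Tuple[str, str]]:
    """(subject, object) pairs of all triples using predicate `prop`."""
    return {(s, o) for s, p, o in triples if p == prop}

def rdfs_closure(graph: Set[Triple]) -> Set[Triple]:
    triples = set(graph)
    while True:
        sub_prop = pairs(triples, "rdfs:subPropertyOf")
        sub_class = pairs(triples, "rdfs:subClassOf")
        domains = pairs(triples, "rdfs:domain")
        ranges = pairs(triples, "rdfs:range")
        new: Set[Triple] = set()
        for s, p, o in triples:
            new |= {(s, sup, o) for sub, sup in sub_prop if p == sub}       # rdfs7
            new |= {(s, "rdf:type", c) for prop, c in domains if p == prop}  # rdfs2
            new |= {(o, "rdf:type", c) for prop, c in ranges if p == prop}   # rdfs3
            if p == "rdf:type":
                new |= {(s, "rdf:type", sup) for sub, sup in sub_class if o == sub}  # rdfs9
        if new <= triples:  # fixpoint: nothing new was derived
            return triples
        triples |= new

COMMUNE = "<http://id.insee.fr/geo/commune/34172>"
DEPT = "<http://id.insee.fr/geo/departement/34>"
graph = {
    ("igeo:Commune", "rdfs:subClassOf", "igeo:TerritoireAdministratif"),
    ("ex:Sète", "rdf:type", "igeo:Commune"),
    ("igeo:codeCommune", "rdfs:subPropertyOf", "igeo:codeINSEE"),
    (COMMUNE, "igeo:codeCommune", '"34172"'),  # illustrative triple
    ("igeo:chefLieu", "rdfs:domain", "igeo:PaysOuTerritoire"),
    ("igeo:chefLieu", "rdfs:range", "igeo:Commune"),
    (DEPT, "igeo:chefLieu", COMMUNE),
}
closure = rdfs_closure(graph)
for triple in sorted(closure - graph):  # print only the inferred triples
    print(*triple)
```

Commune 34172 becomes an igeo:TerritoireAdministratif in two steps (range, then subclass), which is why the loop runs until a fixpoint rather than once.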
  • 55. So far, so good…
  • 56. Need more? OWL in one slide: class definition by enumeration, intersection, union, complement, equivalence, restriction, value restriction (e.g. [>=18]) or cardinality (e.g. 1..1); class disjointness; symmetric properties; property disjointness; property cardinality (1..1); negative property assertions on individuals; property chains; …
  • 57. Web of Data vs. Semantic Web Web of Data: first step in the deployment of the Semantic Web
  • 58. Make our datasets accessible and interoperable on the Web…
  • 59. Definitions ▪ Taxonomy: • Practice and science of classification • Hierarchical categorization of controlled classes/terms • Nested classes under broader categories ▪ Thesaurus: • Networked collection of controlled vocabulary terms, grouped according to various types of relationship, e.g. relations of meaning (synonyms, antonyms) ▪ Ontology: • Formal semantic description of the taxonomy terms, properties and interrelationships between categories in a domain of discourse, to facilitate conceptual search and natural language queries ▪ Folksonomy: • Collaborative/social tagging, social classification… • Tag category schemes • Not necessarily hierarchical categorization
  • 60. Create, reuse and link vocabularies: create my own vocabulary ▪ May seem easier: “I do whatever I want” ▪ Can be derived from an existing schema, e.g.: • RDB: table -> class, column -> property, primary key -> resource URI • Thesaurus -> list of classes or SKOS concepts ▪ But modeling implies choosing a point of view… • E.g. biologist vs. geneticist, surgeon vs. anatomist, history… • Domain experts must be involved ▪ Risk: creating an island of data ▪ How to link my vocabulary/dataset with other related ones?
  • 61. Create, reuse and link vocabularies: reuse existing vocabularies ▪ Where to look: catalogs of vocabularies/ontologies (see later) ▪ Difficulty finding the appropriate description • Partial coverage of the domain I’m dealing with, e.g. a geographical area • Granularity (level of detail): too many details (cumbersome), not enough (useless) • Different points of view ▪ Frequently, a mixed approach is used • Reuse + create • Need for interlinking => alignment
  • 62. Link ontologies (very basic example) ▪ My vocabulary: websem.html rdf:type ex:Book ; ex:topic SemanticWeb ▪ Third-party vocabulary: WSbook.html rdf:type dc:Text ; dc:subject Web Sémantique ▪ Links: ex:Book owl:equivalentClass dc:Text ; ex:topic owl:equivalentProperty dc:subject ; websem.html owl:sameAs WSbook.html
  • 63. Link ontologies (basics) ▪ Classes • owl:equivalentClass, owl:disjointWith, rdfs:subClassOf ▪ Properties • owl:equivalentProperty, owl:inverseOf, rdfs:subPropertyOf ▪ Individuals • owl:sameAs, owl:differentFrom, owl:AllDifferent ▪ rdfs:seeAlso • Indicates a resource that might provide additional information about the subject resource ▪ SKOS concepts • skos:exactMatch (transitive) • skos:closeMatch, skos:relatedMatch • skos:narrowMatch, skos:broadMatch
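Because owl:sameAs is symmetric and transitive, a set of sameAs links partitions resources into groups of co-referent IRIs; skos:exactMatch, which SKOS declares symmetric and transitive, can be grouped the same way. A minimal union-find sketch in Python: the websem.html/WSbook.html link is the one from the previous slide, the ex:A to ex:E links are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

def same_as_groups(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group resources connected by owl:sameAs links (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

links = [
    ("websem.html", "WSbook.html"),  # link from the slide
    ("ex:A", "ex:B"),                # hypothetical links
    ("ex:C", "ex:B"),
    ("ex:D", "ex:E"),
]
for group in same_as_groups(links):
    print(sorted(group))
```

ex:A and ex:C end up in the same group although no link connects them directly, which is exactly the inference a reasoner draws from transitivity.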
  • 64. Link ontologies … a complex topic. Ontology matching: “representing declaratively relations between heterogeneous models” ▪ Discovery of matches between classes and properties ▪ Discovery of matches between individuals ▪ Named entity recognition, entity matching, text mining…
  • 65. SKOS: Simple Knowledge Organization System. RDF-based standard to represent controlled vocabularies: glossaries, dictionaries, taxonomies, thesauri… Bridges the gap between existing knowledge organization systems (KOSs) and the Semantic Web and Linked Data. Definition and documentation of classification systems: ▪ SKOS concepts • skos:Concept ▪ Labels and classification codes • skos:prefLabel, skos:altLabel, skos:notation… ▪ Documentation • skos:definition, skos:changeNote, skos:editorialNote, skos:example, etc. ▪ SKOS concept schemes • skos:ConceptScheme, skos:hasTopConcept, skos:topConceptOf
  • 66. SKOS: Simple Knowledge Organization System. Semantic relations between concepts ▪ Hierarchy of collections of concepts • skos:Collection, skos:OrderedCollection, skos:member… ▪ Semantic network and hierarchies of concepts • skos:related • skos:broader, skos:narrower ▪ Alignment of schemes • skos:closeMatch, skos:exactMatch • skos:relatedMatch, skos:broadMatch, skos:narrowMatch
  • 67. Linked Open Vocabularies (http://lov.okfn.org/dataset/lov/): “Vocabularies provide the semantic glue enabling Data to become meaningful Data.” ▪ 522 curated vocabularies ▪ Quality requirements • URI stability and availability • Quality metadata and documentation • Identifiable and trustable publication body • Proper versioning policy • …
  • 68-69. Linked Open Vocabularies: examples, e.g. the BBC Wildlife Ontology and UniProt (protein sequence and functional information).
  • 70. Other catalogs of vocabularies ▪ General: • Schemapedia (?) http://schemapedia.org • schema.org: “Create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond”; controlled set of curated vocabularies: cars, TV series, arts, administrations, diseases… • DERI Vocabularies http://vocab.deri.ie/: URI space for RDFS vocabularies and OWL ontologies maintained at DERI; no search interface ▪ Domain-specific: • NCBO BioPortal http://bioportal.bioontology.org/ontologies/ • TDWG (Taxonomic Databases Working Group), Biodiversity Information Standards http://www.tdwg.org/standards/ ▪ And your favorite web search engine…
  • 71-72. Practical use case: TAXREF. Table columns: CD_NOM: unique identifier of the scientific name; CD_SUP: identifier of the upper taxon in the classification; CD_REF: identifier of the reference taxon; RANG: taxonomical rank. How to translate this table into a thesaurus exploitable as a semantic reference using Semantic Web technologies?
  • 73. TAXREF SKOS modelling ▪ Taxon: skos:Concept, linked to its upper taxon with skos:broader, to its taxonomical rank (skos:Concept) with nt:has_rank, to habitats (skos:Concept) with taxref:habitat and to biogeographical statuses (skos:Concept) with taxref:bioGeoStatusIn ▪ Reference name: skosxl:Label, attached to the taxon with skosxl:prefLabel ▪ Synonym: skosxl:Label, attached to the taxon with skosxl:altLabel ▪ Example: taxon <http://inpn.mnhn.fr/taxref/v8/taxon/60878>; reference name <http://inpn.mnhn.fr/espece/cd_nom/60878>, skosxl:literalForm “Delphinus delphis”, txn:authority “Linnaeus, 1758”; synonym <http://inpn.mnhn.fr/espece/cd_nom/60881>, skosxl:literalForm “Delphinus tropicalis”, txn:authority “Van Bree, 1971”; taxref:vernacularName "Short-beaked common dolphin"@en
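The translation of TAXREF rows into SKOS can be sketched as a row-by-row function: a row whose CD_NOM equals its CD_REF is a reference name and becomes the preferred label of the taxon concept; any other row is a synonym attached with skosxl:altLabel. This is a hypothetical Python sketch, not the actual TAXREF conversion: the NAME column, the placeholder CD_SUP value 99999, the rank code "ES" and the example.org namespaces (standing in for nt:has_rank and the rank concepts) are all assumptions; only the taxon and name IRI patterns come from the slide.

```python
from typing import Dict, List, Optional, Tuple

TAXON = "http://inpn.mnhn.fr/taxref/v8/taxon/"  # taxon IRIs, as on the slide
NAME = "http://inpn.mnhn.fr/espece/cd_nom/"     # scientific-name IRIs, as on the slide
RANK = "http://example.org/rank/"               # hypothetical namespace for ranks
HAS_RANK = "http://example.org/has_rank"        # hypothetical stand-in for nt:has_rank
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Row = Dict[str, Optional[str]]

def row_to_triples(row: Row) -> List[Tuple[str, str, str]]:
    name = f"<{NAME}{row['CD_NOM']}>"
    taxon = f"<{TAXON}{row['CD_REF']}>"
    triples = [
        (name, f"<{RDF_TYPE}>", f"<{SKOSXL}Label>"),
        (name, f"<{SKOSXL}literalForm>", f'"{row["NAME"]}"'),
    ]
    if row["CD_NOM"] == row["CD_REF"]:
        # Reference name: preferred label of the taxon concept, which carries rank and parent.
        triples += [
            (taxon, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
            (taxon, f"<{SKOSXL}prefLabel>", name),
            (taxon, f"<{HAS_RANK}>", f"<{RANK}{row['RANG']}>"),
        ]
        if row["CD_SUP"]:
            triples.append((taxon, f"<{SKOS}broader>", f"<{TAXON}{row['CD_SUP']}>"))
    else:
        # Synonym: alternative label of the reference taxon.
        triples.append((taxon, f"<{SKOSXL}altLabel>", name))
    return triples

rows: List[Row] = [
    {"CD_NOM": "60878", "CD_REF": "60878", "CD_SUP": "99999",  # CD_SUP: placeholder value
     "RANG": "ES", "NAME": "Delphinus delphis"},               # RANG: assumed rank code
    {"CD_NOM": "60881", "CD_REF": "60878", "CD_SUP": None,
     "RANG": "ES", "NAME": "Delphinus tropicalis"},
]
for row in rows:
    for s, p, o in row_to_triples(row):
        print(s, p, o, ".")  # N-Triples-like output
```

Literal escaping, authorities, habitats and vernacular names are left out to keep the sketch short.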
  • 74. TAXREF: alignments with domain ontologies. “Static” (hand-made) alignments of predicates and reference values (habitats, taxonomical ranks): ▪ TaxonConcept Ontology: properties habitat and authority, taxonomic ranks ▪ NCBI Organismal Classification: property has_rank, taxonomic ranks ▪ GeoSpecies Knowledge Base: taxonomic ranks ▪ ENVO Environment Ontology: habitats ▪ Geonames: mainland France and overseas territories ▪ Darwin Core Terms: properties occurrenceStatus, locationID ▪ TDWG Occurrence Status Terms: biogeographical statuses ▪ World Geographical Scheme for Recording Plant Distributions
  • 75. TAXREF: alignments with domain ontologies. Alignment of taxa and names is done in a second step ▪ Automated search for matches within other taxonomical references • DBpedia • NCBI Organismal Classification • Agrovoc • BnF • Encyclopedia of Life • Vertebrate Taxonomy Ontology ▪ Difficulties • Spelling differences • Disagreements: reference name vs. synonym, taxonomical rank • Choice of the alignment property: owl:sameAs (individuals), owl:equivalentClass (classes), rdfs:seeAlso, skos:exactMatch, skos:closeMatch, skos:relatedMatch (concepts)...
  • 76. Populate the Web of Data
  • 77. Many methods for many data sources ▪ HTML: RDFa, Microformats ▪ XML, XHTML • XPath: RML • XQuery: XSPARQL, SPARQL2XQuery • XSLT: Gleaning Resource Descriptions from Dialects of Languages (GRDDL), Scissor-Lift ▪ CSV/TSV/Spreadsheets: CSV on the Web (W3C WG) ▪ RDBs (next slides) ▪ NoSQL stores (MongoDB…): xR2RML (next slides) ▪ Integration frameworks • DataLift, Asio Tool Suite, Talend (with Semantic Web plugin?)
  • 78. RDFa: RDF in HTML attributes (example from http://rbdd.cnrs.fr). HTML: <body vocab="http://schema.org/"> <div resource="/jrbdd2015" typeof="Event"> <h2 property="title">RBDD 2015</h2> <p>Date: <span property="startDate">2015-10-20</span></p> ... <p>Conduire et construire un plan de gestion des données. <a property="url" href="http://rbdd.cnrs.fr/spip.php?article179">More…</a> </p> </div> </body> Extracted RDF (Turtle): prefix sch: <http://schema.org/> <http://rbdd.cnrs.fr/jrbdd2015> rdf:type sch:Event ; sch:title "RBDD 2015" ; sch:startDate "2015-10-20" ; sch:url <http://rbdd.cnrs.fr/spip.php?article179> .
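How an RDFa processor turns those attributes into triples can be sketched with Python's standard `html.parser`. This toy handles only @vocab, @resource, @typeof and @property (object taken from @href when present, otherwise from the element text); it is not a conformant RDFa 1.1 processor (no @prefix, blank nodes, @content, datatypes or nested elements sharing a tag name), and the class name `MiniRDFa` is hypothetical. The base IRI http://rbdd.cnrs.fr/ is assumed from the page address on the slide.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

class MiniRDFa(HTMLParser):
    """Toy subset of RDFa Lite: @vocab, @resource, @typeof, @property."""

    def __init__(self, base: str) -> None:
        super().__init__()
        self.base = base
        self.vocab = ""
        self.subjects: List[Tuple[str, str]] = []  # stack of (tag, subject IRI)
        self.pending: Optional[Tuple[str, str, str]] = None  # (subject, predicate, tag)
        self.text: List[str] = []
        self.triples: List[Tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "resource" in a:  # new subject, resolved against the page IRI
            subject = urljoin(self.base, a["resource"])
            self.subjects.append((tag, subject))
            if "typeof" in a:
                self.triples.append((subject, "rdf:type", self.vocab + a["typeof"]))
        if "property" in a and self.subjects:
            subject, predicate = self.subjects[-1][1], self.vocab + a["property"]
            if "href" in a:  # the link target is the object
                self.triples.append((subject, predicate, urljoin(self.base, a["href"])))
            else:  # the element text is the object: collect it until the end tag
                self.pending, self.text = (subject, predicate, tag), []

    def handle_data(self, data):
        if self.pending:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending and self.pending[2] == tag:
            subject, predicate, _ = self.pending
            self.triples.append((subject, predicate, "".join(self.text).strip()))
            self.pending = None
        elif self.subjects and self.subjects[-1][0] == tag:
            self.subjects.pop()

html = """<body vocab="http://schema.org/">
<div resource="/jrbdd2015" typeof="Event">
  <h2 property="title">RBDD 2015</h2>
  <p>Date: <span property="startDate">2015-10-20</span></p>
  <p>Conduire et construire un plan de gestion des données.
     <a property="url" href="http://rbdd.cnrs.fr/spip.php?article179">More…</a></p>
</div>
</body>"""

parser = MiniRDFa(base="http://rbdd.cnrs.fr/")
parser.feed(html)
parser.close()
for triple in parser.triples:
    print(*triple)
```

The four printed triples correspond to the Turtle shown on the slide, with full IRIs in place of the sch: prefix.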
  • 79. Translation of RDBs to RDF (RDB2RDF) ▪ Various initial motivations • Web of Data, Linked Data • OBDA (Ontology-Based Data Access) • Ontology learning • Schema mapping… ▪ Historical products: D2RQ, Virtuoso… ▪ R2RML • 2012 W3C Recommendation, mapping language, several implementations ▪ Several methods: direct mapping vs. domain-specific mapping
  • 80. Direct Mapping of RDB to RDF. Table PEOPLE with columns ID, FNAME, ADDR (foreign key to ADDRESS/ID); rows: (7, Catherine, 18), (8, Olivier, 22), … Generated triples: <PEOPLE/ID=7> rdf:type <PEOPLE> . <PEOPLE/ID=7> <PEOPLE#FNAME> "Catherine" . <PEOPLE/ID=7> <PEOPLE#ADDR> <ADDRESS/ID=18> . <PEOPLE/ID=8> rdf:type <PEOPLE> . <PEOPLE/ID=8> <PEOPLE#FNAME> "Olivier" . <PEOPLE/ID=8> <PEOPLE#ADDR> <ADDRESS/ID=22> .
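A direct mapping needs no mapping file: IRIs are built from the table name, primary key and column names. A minimal Python sketch that reproduces the slide's simplified notation. Note that the W3C Direct Mapping Recommendation differs in details (for instance it names reference properties with a "#ref-" prefix and also emits literal triples for key columns), so this is an illustration of the idea, not an implementation of the standard; the function name and the `fks` parameter are hypothetical.

```python
from typing import Dict, List, Optional

def direct_mapping(table: str, pk: str, rows: List[Dict[str, Optional[object]]],
                   fks: Dict[str, str]) -> List[str]:
    """Simplified direct mapping in the slide's notation. `fks` maps a foreign-key
    column to the referenced table, assumed to be keyed by an ID column."""
    triples = []
    for row in rows:
        subject = f"<{table}/{pk}={row[pk]}>"          # row IRI from the primary key
        triples.append(f"{subject} rdf:type <{table}> .")  # class IRI from the table name
        for column, value in row.items():
            if column == pk or value is None:
                continue  # key already in the IRI; NULLs yield no triple
            if column in fks:  # foreign key: link to the referenced row
                triples.append(f"{subject} <{table}#{column}> <{fks[column]}/ID={value}> .")
            else:              # plain column: literal value
                triples.append(f'{subject} <{table}#{column}> "{value}" .')
    return triples

people = [
    {"ID": 7, "FNAME": "Catherine", "ADDR": 18},
    {"ID": 8, "FNAME": "Olivier", "ADDR": 22},
]
for line in direct_mapping("PEOPLE", "ID", people, fks={"ADDR": "ADDRESS"}):
    print(line)
```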
  • 81. Customized mapping of RDB to RDF (R2RML). Mapping: <#TriplesMap1> rr:logicalTable [ rr:tableName "PEOPLE" ]; rr:subjectMap [ rr:template "http://i3s.wimmics.org/staff/{ID}"; rr:class ex:Teacher; ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [ rr:column "FNAME" ]; ]. Generated triples: <http://i3s.wimmics.org/staff/7> rdf:type ex:Teacher. <http://i3s.wimmics.org/staff/7> foaf:name "Catherine". <http://i3s.wimmics.org/staff/8> rdf:type ex:Teacher. <http://i3s.wimmics.org/staff/8> foaf:name "Olivier".
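The core of an R2RML subject map is template expansion: each {COLUMN} in rr:template is replaced by the column value, percent-encoded so that the result is a valid IRI, and no term is produced when a referenced column is NULL. A minimal Python sketch applying the slide's TriplesMap1; it uses `urllib.parse.quote`, which also encodes non-ASCII characters that R2RML's IRI-safe rule would keep, and ignores escaped braces, so treat it as an approximation. Function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Optional
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")

def expand_template(template: str, row: Dict[str, object]) -> Optional[str]:
    """Replace each {COLUMN} with the percent-encoded value; None if a value is NULL."""
    values = {}
    for column in PLACEHOLDER.findall(template):
        if row.get(column) is None:
            return None  # R2RML: a NULL in the template means no term is generated
        values[column] = quote(str(row[column]), safe="")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)

def triples_map_1(rows: Iterable[Dict[str, object]]) -> Iterator[str]:
    """The slide's TriplesMap1: subject from {ID}, class ex:Teacher, foaf:name from FNAME."""
    for row in rows:
        iri = expand_template("http://i3s.wimmics.org/staff/{ID}", row)
        if iri is None:
            continue
        yield f"<{iri}> rdf:type ex:Teacher ."
        if row.get("FNAME") is not None:
            yield f'<{iri}> foaf:name "{row["FNAME"]}" .'

rows = [{"ID": 7, "FNAME": "Catherine"}, {"ID": 8, "FNAME": "Olivier"}]
for line in triples_map_1(rows):
    print(line)
print(expand_template("http://example.org/{NAME}", {"NAME": "Jean Dupont"}))
# http://example.org/Jean%20Dupont
```

The percent-encoding matters as soon as key values contain spaces or reserved characters such as "/" or "#", which would otherwise change the structure of the generated IRI.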
  • 82. xR2RML: mapping of heterogeneous DBs to RDF ▪ Uniform language to describe mappings from the most common types of databases to RDF ▪ Extends R2RML and RML ▪ Features: • Allows any declarative query language • Allows any syntax to reference data elements in query results (column name, attribute name, JSONPath, XPath...) • Generates RDF lists and containers (bag, sequence, alternative) • Supports mixed content, e.g. an XML value in a relational column ▪ Implementation for MongoDB • Data materialization • Query rewriting
  • 83. xR2RML mapping example <#TriplesMap> xrr:logicalSource [ xrr:query "db.studies.find({ studyid: { $exists: true } })" ]; rr:subjectMap [ rr:template "http://example.org/study#{$.studyid}"; rr:class ex:Study ]; rr:predicateObjectMap [ rr:predicate ex:involves; rr:objectMap [ xrr:reference "$.centres.*.name" ] ].
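In this mapping, the subject and object values are taken from each MongoDB document with JSONPath expressions. A minimal Python sketch of that step, assuming a tiny JSONPath subset ("$" followed by ".key" or ".*" steps) and, as an assumption about the default behavior, one triple per value returned by `$.centres.*.name`. The document, the function names and the use of `json.dumps` to quote literals are illustrative; this is not how Morph-xR2RML is implemented.

```python
import json
from typing import Any, List, Tuple

def json_path(doc: Any, path: str) -> List[Any]:
    """Tiny JSONPath subset: '$' followed by '.key' or '.*' steps."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [doc]
    for step in path[1:].split(".")[1:]:
        next_nodes: List[Any] = []
        for node in nodes:
            if step == "*":  # wildcard: all array items or all object values
                if isinstance(node, list):
                    next_nodes.extend(node)
                elif isinstance(node, dict):
                    next_nodes.extend(node.values())
            elif isinstance(node, dict) and step in node:
                next_nodes.append(node[step])
        nodes = next_nodes
    return nodes

def map_study(doc: dict) -> List[Tuple[str, str, str]]:
    ids = json_path(doc, "$.studyid")
    if not ids:
        return []  # mirrors the query filter { studyid: { $exists: true } }
    subject = f"<http://example.org/study#{ids[0]}>"
    triples = [(subject, "rdf:type", "ex:Study")]
    for name in json_path(doc, "$.centres.*.name"):  # one triple per value (assumption)
        triples.append((subject, "ex:involves", json.dumps(name)))
    return triples

study = {"studyid": "s1", "centres": [{"name": "Nice"}, {"name": "Lyon"}]}  # hypothetical document
for triple in map_study(study):
    print(*triple)
```

Since xR2RML can also build RDF lists and containers, a mapping could instead gather the centre names into a single rdf:List or rdf:Bag object.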
  • 84. xR2RML in practice: data materialization. Morph-xR2RML reads the xR2RML mapping description (which refers to domain ontologies), runs the MongoDB queries (MongoQL) it contains against the JSON documents, and materializes the resulting RDF graph.
  • 85. xR2RML in practice: query rewriting. Morph-xR2RML rewrites an incoming SPARQL query into MongoDB queries (MongoQL), using the xR2RML mapping description, and evaluates them against the JSON documents.
  • 86. Publish Linked Open Data on the Web
  • 87. Linked Data rules 1. Use URIs as names for things 2. Use HTTP URIs so that people can look up those names 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) 4. Include links to other URIs, so that they can discover more things
  • 88. Using URIs to look up information resources ▪ Dereferencing the URI returns a representation of the document • Either a direct link to the representation: the URI is a URL • Or content negotiation links to an appropriate representation. Source: http://www.w3.org/TR/cooluris/
  • 89. Using URIs to look up information resources: HTTP content negotiation. Request: GET /people/cv_alice HTTP/1.1 Host: www.example.com Accept: text/html, application/xhtml+xml Accept-Language: en, de Response: HTTP/1.1 200 OK Content-Type: text/html Content-Language: en Content-Location: http://www.example.com/cv_alice.en.html <html ...> ...
  • 90. Using URIs to look up information resources: HTTP content negotiation with redirection. Request: GET /people/cv_alice HTTP/1.1 Host: www.example.com Accept: text/html, application/xhtml+xml Accept-Language: en, de Response: HTTP/1.1 302 Found Location: http://www.example.com/cv_alice.en.html Follow-up request: GET /cv_alice.en.html HTTP/1.1 Host: www.example.com ... Response: HTTP/1.1 200 OK Content-Type: text/html Content-Language: en <html ...> ...
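On the server side, content negotiation boils down to parsing the Accept header into media ranges with quality values and picking the available representation with the highest quality, where the most specific matching range decides (text/html beats text/* and */*). A minimal Python sketch; the HTML location comes from the slide, the Turtle representation is hypothetical, and Accept-Language as well as media-type parameters other than q are ignored.

```python
from typing import Dict, List, Optional, Tuple

def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if parts[0]:
            prefs.append((parts[0].lower(), q))
    return prefs

def specificity(media_range: str, media_type: str) -> int:
    """2 = exact match, 1 = type/*, 0 = */*, -1 = no match."""
    if media_range == media_type:
        return 2
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*") and media_type.split("/")[0] == media_range[:-2]:
        return 1
    return -1

def negotiate(accept: str, available: Dict[str, str]) -> Optional[str]:
    """Location of the best representation, or None (the server would answer 406)."""
    prefs = parse_accept(accept)
    best_location, best_q = None, 0.0
    for media_type, location in available.items():
        matching = [(specificity(r, media_type), q) for r, q in prefs
                    if specificity(r, media_type) >= 0]
        q = max(matching)[1] if matching else 0.0  # most specific range wins
        if q > best_q:
            best_location, best_q = location, q
    return best_location

available = {
    "text/html": "http://www.example.com/cv_alice.en.html",  # from the slide
    "text/turtle": "http://www.example.com/cv_alice.ttl",    # hypothetical
}
print(negotiate("text/html, application/xhtml+xml", available))
print(negotiate("text/turtle, text/*;q=0.5", available))
```

The chosen location can then be returned either directly with Content-Location (200) or through a redirection (302), the two variants shown on the slides.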
  • 91. Using URIs to look up real-world objects ▪ One option: use hash URIs for non-document resources. Source: http://www.w3.org/TR/cooluris/
  • 92. Using URIs to look up real-world objects ▪ Use HTTP 303 redirection to an information resource • 303 See Other: the requested resource is not a regular Web document; there is no suitable representation of the resource itself, but the server can provide information about it. Source: http://www.w3.org/TR/cooluris/
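The two options for real-world objects can be sketched in a few lines of Python. For 303 URIs, the server answers a request for the thing with "303 See Other" and a Location pointing to an HTML or RDF document about it; for hash URIs, the client strips the fragment and fetches the document part directly. The /resource/, /page/ and /data/ URL layout and the `respond` function are hypothetical, and the Accept test is deliberately naive (a substring check, not full content negotiation).

```python
from typing import Dict, Tuple
from urllib.parse import urldefrag

RDF_TYPES = ("text/turtle", "application/rdf+xml")

def respond(path: str, accept: str) -> Tuple[int, Dict[str, str]]:
    """Hypothetical layout: /resource/{name} identifies a real-world thing,
    /page/{name} (HTML) and /data/{name} (RDF) are documents about it."""
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        wants_rdf = any(t in accept for t in RDF_TYPES)
        # 303 See Other: the thing itself has no representation, so point to a document
        return 303, {"Location": f"/data/{name}" if wants_rdf else f"/page/{name}"}
    if path.startswith("/data/"):
        return 200, {"Content-Type": "text/turtle"}
    if path.startswith("/page/"):
        return 200, {"Content-Type": "text/html"}
    return 404, {}

# Hash URI: the fragment is never sent to the server; the client
# dereferences the document part, which describes #alice.
document, fragment = urldefrag("http://www.example.com/about#alice")
print(respond("/resource/alice", "text/html"), document, fragment)
```

Hash URIs save the extra round trip of the 303 redirection, at the price of serving one document that describes every thing sharing the same base.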
  • 93. Reference documentation ▪ Cool URIs for the Semantic Web. W3C Interest Group Note, 03 December 2008. http://www.w3.org/TR/cooluris/ ▪ Dereferencing HTTP URIs. Draft TAG Finding, 04 October 2007. http://www.w3.org/2001/tag/doc/httpRange-14/HttpRange-14.html
  • 94. Data curation ▪ Main idea: better to publish less data, but useful data • Choose an appropriate modeling • Choose appropriate vocabularies • Include high-quality metadata and provenance information • Deal with privacy issues • Interlink ▪ Time-consuming activity => significant cost ▪ Needs skilled scientists who know the data, the software… • Under-valued, no reward: data must become citable like any scientific publication => Data Paper
  • 95. Thank you!