This document discusses the generation of linked data platforms (LDPs) in highly decentralized information ecosystems. It presents a model for automating the generation of LDPs that considers data heterogeneity, hosting constraints, and reusability of LDP designs. The model includes an LDP generation workflow, a design language called LDP-DL to describe LDP designs, and an LDP generation toolkit to implement the workflow. The goal is to facilitate data exploitation for consumers in decentralized environments.
An Approach for the Incremental Export of Relational Databases into RDF Graphs - Nikolaos Konstantinou
Several approaches have been proposed in the literature for offering RDF views over databases. In addition to these, a variety of tools exist that allow exporting database contents into RDF graphs. The approaches in the latter category have often been shown to perform better than those in the former. However, when database contents are exported into RDF, it is not always optimal or even necessary to export, or dump as this procedure is often called, the whole database contents every time. This paper investigates the problem of incremental generation and storage of the RDF graph that results from exporting relational database contents. To express the mappings that associate tuples from the source database with triples in the resulting RDF graph, an implementation of the R2RML standard is put to the test. Next, a methodology is proposed and described that enables incremental generation and storage of the RDF graph originating from the source relational database contents. The performance of this methodology is assessed through an extensive set of measurements. The paper concludes with a discussion of the authors' most important findings.
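The core of the incremental idea can be sketched in a few lines: rather than re-dumping the whole database on every run, compare the freshly generated triple set with the previously stored one and apply only the difference. This is a minimal sketch in plain Python, using tuples to stand in for triples; a real implementation would use an RDF library and R2RML mappings, and the `ex:`/`foaf:` identifiers below are illustrative.

```python
# Sketch of incremental RDF export: compute the delta between the
# previously stored graph and the newly exported one, then apply
# only that delta instead of rewriting the whole dump.

def incremental_update(previous, current):
    """Return (added, removed): the triples to insert into and delete
    from the stored graph so that `previous` becomes `current`."""
    added = current - previous
    removed = previous - current
    return added, removed

# Example: one value changed in the database between two export runs.
old = {
    ("ex:person1", "foaf:name", "Alice"),
    ("ex:person2", "foaf:name", "Bob"),
}
new = {
    ("ex:person1", "foaf:name", "Alice"),
    ("ex:person2", "foaf:name", "Robert"),  # updated tuple
}

added, removed = incremental_update(old, new)
print(added)    # triples to insert into the stored graph
print(removed)  # triples to delete from the stored graph
```

Only the two triples touching `ex:person2` need to be written, however large the rest of the export is.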
Apache Marmotta is a linked data platform that provides a linked data server, SPARQL server, and development environment for building linked data applications. It uses modular components including a triplestore backend, SPARQL endpoint, LDCache for remote data access, and an optional reasoner. Marmotta is implemented as a Java web application and uses services, dependency injection, and REST APIs.
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016 - Sergio Fernández
Sergio Fernández gave a presentation on geospatial querying in Apache Marmotta. He explained that Marmotta is an open platform for linked data that allows publishing and building applications on linked data. It includes features like a read-write linked data server and SPARQL querying. He discussed how GeoSPARQL allows representing and querying geospatial data on the semantic web by defining a vocabulary and SPARQL extension. Marmotta implements GeoSPARQL by materializing geospatial data and supports topological relations and functions through PostGIS. He demonstrated example GeoSPARQL queries on municipalities in Madrid, rivers bordering Austria, and mountain bike routes crossing cities.
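A query of the kind demonstrated in the talk (e.g. "municipalities in Madrid") can be sketched as follows. The `geo:` and `geof:` namespaces and the `geof:sfWithin` topological function are standard GeoSPARQL; the polygon coordinates and the data-side properties are illustrative, and the query would be sent to a Marmotta SPARQL endpoint in practice.

```python
# Build a GeoSPARQL query selecting features whose geometry lies
# within a given WKT region. Namespaces are the standard GeoSPARQL
# ones; the example polygon is an illustrative bounding region.

GEO = "http://www.opengis.net/ont/geosparql#"
GEOF = "http://www.opengis.net/def/function/geosparql/"

def within_query(region_wkt: str) -> str:
    """Return a SPARQL query using the geof:sfWithin relation."""
    return f"""
PREFIX geo:  <{GEO}>
PREFIX geof: <{GEOF}>
SELECT ?feature WHERE {{
  ?feature geo:hasGeometry/geo:asWKT ?wkt .
  FILTER (geof:sfWithin(?wkt, "{region_wkt}"^^geo:wktLiteral))
}}"""

# Illustrative region roughly around Madrid.
q = within_query("POLYGON((-3.9 40.3, -3.5 40.3, -3.5 40.6, -3.9 40.6, -3.9 40.3))")
print(q)
```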
This presentation gives details on technologies and approaches towards exploiting Linked Data by building LD applications. In particular, it gives an overview of popular existing applications and introduces the main technologies that support implementation and development. Furthermore, it illustrates how data exposed through common Web APIs can be integrated with Linked Data in order to create mashups.
Semantic Media Management with Apache Marmotta - Thomas Kurz
Thomas Kurz gives a presentation on semantic media management using Apache Marmotta. He plans to create a new Marmotta module that supports storing images, annotating image fragments, and retrieving images and fragments based on annotations. This will make use of linked data platform, media fragment URIs, open annotation model, and SPARQL-MM. The goal is to create a Marmotta module and webapp that extends LDP for image fragments and provides a UI for image annotation and retrieval.
The document discusses a webinar presented by LOD2 on creating knowledge from interlinked data. It describes LOD2 as an EU-funded project involving leading linked open data organizations. The webinar agenda includes discussing SIREn, a plugin for Elasticsearch that allows indexing and searching of JSON documents. It provides an overview of Elasticsearch and describes how to install SIREn, create an index, index documents, and perform searches on nested JSON data.
The document discusses the AudioMD metadata scheme created by the Library of Congress to describe technical qualities of digital audio objects. It defines AudioMD, provides examples of its use, and describes its importance in understanding audio files. The scheme captures administrative, technical, and preservation metadata in a structured XML format. It has evolved through versions 1.0 and 2.0. Additionally, the document outlines the BIBFRAME initiative led by the Library of Congress to transform bibliographic standards to a linked data model and make library catalog records more accessible online.
This document discusses linked data life cycles, including modeling, publishing, discovery, integration, and use cases. It describes key concepts like dataspaces, DSSPs, linked data principles, and the linked open data cloud. Challenges with linked data include schema mapping, write-enablement, authentication, and dataset dynamics as data sources change over time.
Slides from the presentation I gave at Linköping University about web stream processing. I discuss two problems: (i) exchanging data streams on the web, and (ii) combining streams with contextual quasi-static data on the web.
Enabling access to Linked Media with SPARQL-MM - Thomas Kurz
The amount of audio, video and image data on the web is growing immensely, which leads to data management problems due to the opaque nature of multimedia. Interlinking semantic concepts and media data, with the aim of bridging the gap between the document web and the Web of Data, has therefore become common practice and is known as Linked Media. However, the value of connecting media to its semantic metadata is limited by the lack of access methods specialized for media assets and fragments, as well as by the variety of description models in use. With SPARQL-MM we extend SPARQL, the standard query language for the Semantic Web, with media-specific concepts and functions to unify access to Linked Media. In this paper we describe the motivation for SPARQL-MM, present the state of the art in Linked Media description formats and multimedia query languages, and outline the specification and implementation of the SPARQL-MM function set.
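The kind of fragment-aware query SPARQL-MM enables can be sketched like this. The `oa:` namespace is the standard Web Annotation one; the `mm:spatialOverlaps` function name follows the SPARQL-MM publications, but the exact `mm:` namespace IRI used here is an assumption for illustration.

```python
# Sketch of a SPARQL-MM style query: find pairs of annotations whose
# media-fragment targets spatially overlap. The mm: namespace IRI
# below is assumed; oa: is the standard Web Annotation vocabulary.

OA = "http://www.w3.org/ns/oa#"
MM = "http://linkedmultimedia.org/sparql-mm/functions#"  # assumed IRI

def overlapping_annotations_query() -> str:
    """Return a SPARQL query using a SPARQL-MM spatial relation."""
    return f"""
PREFIX oa: <{OA}>
PREFIX mm: <{MM}>
SELECT ?a1 ?a2 WHERE {{
  ?a1 oa:hasTarget ?t1 .
  ?a2 oa:hasTarget ?t2 .
  FILTER (mm:spatialOverlaps(?t1, ?t2) && ?a1 != ?a2)
}}"""

print(overlapping_annotations_query())
```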
(http://lod2.eu/BlogPost/webinar-series) In this webinar Michael Martin presents CubeViz, a faceted browser for statistical data that uses the RDF Data Cube vocabulary, the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options that users can select.
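The shape of the data CubeViz browses is worth making concrete. Below is a minimal sketch of one `qb:Observation`; the `qb:` namespace is the standard RDF Data Cube vocabulary, while the `ex:` dataset, dimension and measure IRIs are made up for illustration.

```python
# Serialize a single RDF Data Cube observation as a Turtle snippet.
# qb: is the standard Data Cube vocabulary; ex: IRIs are illustrative.

QB = "http://purl.org/linked-data/cube#"

def observation_turtle(obs_id: str, year: int, value: float) -> str:
    """Return Turtle for one qb:Observation in a hypothetical
    population cube, with one dimension (year) and one measure."""
    return f"""@prefix qb: <{QB}> .
@prefix ex: <http://example.org/stats#> .

ex:{obs_id} a qb:Observation ;
    qb:dataSet ex:populationCube ;
    ex:refYear {year} ;
    ex:population {value} .
"""

print(observation_turtle("obs1", 2013, 3215000))
```

CubeViz's facets then correspond to the dimensions (`ex:refYear` here), and the charted values to the measures.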
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!
Aggregation of cultural heritage datasets through the Web of Data - Nuno Freire
The existence of many digital libraries, maintained by different organizations, brings challenges to the discoverability of cultural heritage (CH) resources. Metadata aggregation is an approach where centralized efforts like Europeana facilitate their discoverability by collecting the resources' metadata. Nowadays, CH institutions are increasingly applying technologies designed for wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary and linked data (LD) as potential technologies for innovating CH metadata aggregation. We present the results of an analysis using the case of the Europeana network of aggregators and data providers as a basis. We have conducted a survey of the available linked data technology, and we defined a solution, which we have put into practice in a pilot implementation within the Europeana network. In this pilot, the National Library of The Netherlands fulfils the role of data provider, with the Dutch Digital Heritage Network, as the national aggregator, supporting the provision of several datasets from the national library to Europeana. The metadata is published using LD practices, with Schema.org as the main vocabulary. The national library also implements all the necessary semantic web mechanisms, defined in our solution, for making the datasets discoverable and harvestable by Europeana. Our proposal involves the use of vocabularies for the description of datasets and their distributions, namely DCAT, VoID and Schema.org. Europeana implements the LD harvester side of the solution and applies it to harvest the Schema.org data from the national library.
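The dataset-level description that makes a collection discoverable and harvestable can be sketched with DCAT, one of the vocabularies named above. This is a minimal sketch: the dataset IRI, title and dump URL are hypothetical, and a real description would also carry VoID and Schema.org statements.

```python
# Produce a minimal DCAT description (as a Turtle string) of a dataset
# and one downloadable distribution, the kind of metadata a harvester
# such as Europeana's would discover and follow.

def dcat_description(dataset_iri: str, dump_url: str) -> str:
    """Return Turtle describing a dcat:Dataset with one distribution."""
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<{dataset_iri}> a dcat:Dataset ;
    dct:title "Example heritage dataset" ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <{dump_url}> ;
        dct:format "application/n-triples"
    ] .
"""

print(dcat_description("http://example.org/datasets/heritage",
                       "http://example.org/dumps/heritage.nt"))
```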
ckan 2.0 Introduction (20140618 updated) - Chengjen Lee
This document provides an overview and agenda for a presentation on CKAN 2.0, an open-source data management system. The presentation covers topics such as features for publishing and finding datasets, storing and managing data, customizing and extending CKAN, and how CKAN supports open data principles. It also provides examples of CKAN in use by government open data portals and discusses issues such as language support and extensions. Harvester extensions are introduced for harvesting metadata and datasets from remote CKAN instances and other data sources.
The document discusses the Linked Data Platform (LDP), which provides best practices and a simple approach for a read-write Linked Data architecture based on HTTP access to web resources described using RDF. LDP defines two types of resources - those whose state is represented in RDF (LDP-RS) and those using other formats (LDP-NR). It also defines different types of containers (LDP-BC, LDP-DC, LDP-IC) that organize contained resources and support creation, modification, and enumeration of members. LDP aims to clarify and extend existing Linked Data principles for standardized access, update, creation and deletion of resources from servers exposing their data as Linked Data.
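The read-write interaction model can be made concrete with a small sketch of resource creation: a client POSTs RDF to a container, suggesting a name via the `Slug` header, per the LDP specification. The container URL and the Turtle body are hypothetical; the request is assembled but not sent.

```python
# Sketch of creating a member resource in an LDP Basic Container.
# The headers follow the LDP spec: Slug suggests the new resource's
# name, text/turtle marks the body as an RDF source (LDP-RS).

def create_member_request(container_url: str, slug: str, turtle: str):
    """Assemble the pieces of an LDP create request; a real client
    would send these with any HTTP library."""
    headers = {
        "Content-Type": "text/turtle",
        "Slug": slug,
        # Advertise the interaction model of the resource being created.
        "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
    }
    return ("POST", container_url, headers, turtle.encode("utf-8"))

method, url, headers, body = create_member_request(
    "http://example.org/containers/people/",   # hypothetical LDP-BC
    "alice",
    "<> a <http://xmlns.com/foaf/0.1/Person> .",
)
print(method, url, headers)
```

On success the server answers `201 Created` with a `Location` header naming the new resource, which the container then enumerates among its members.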
This document summarizes a presentation on recent developments in cataloging standards and practices, including RDA, Bibframe, and linked data. The presentation discusses how standards like RDA and FRBR are moving cataloging towards a more entity-centric model based on semantic web principles. It also outlines proposals to encode library metadata as linked open data using the Resource Description Framework (RDF) to represent bibliographic records as sets of semantic triples and link them to external datasets. The goal is to transform library data into a true "Web of data" rather than just making it available on the traditional document-based web.
The need of Interoperability in Office and GIS formats - Markus Neteler
Free GIS and Interoperability: The need of Interoperability in Office and GIS formats
GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione
[GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]
Slides for a talk given at IWMW 1998, held at the University of Newcastle, 15-17 September 1998.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-sep1998/materials/
In this webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE and also DL-Learner, a machine learning tool that solves supervised learning tasks and supports knowledge engineers in constructing knowledge. These two neighboring tools in the LOD2 Stack serve classification and the subsequent quality analysis of Linked Data.
This document discusses standards and interoperability in geographic information systems (GIS). It emphasizes that standards are important for sharing data between government departments and making location-based data accessible to citizens. It outlines some relevant technical standards like OGC, ISO, and OpenLS. The document also discusses challenges around reading, displaying, and editing spatial data from different sources and solutions like spatial databases and web services. Finally, it provides details on how standards will be implemented for a GIS project in Madinah, Saudi Arabia, including the use of Oracle, Envinsa, web services, and OGC standards.
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present release 3.0 of the LOD2 stack, which contains updates to:
*) Virtuoso 7 [OpenLink]: the original row store of the Virtuoso 6 universal server has been replaced by a column store, significantly increasing the performance of SPARQL queries; the store is now up to three times as fast as the previous major version.
*) Linked Open Data Manager Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming linked data through the use of its many extensions. It also supports operations for extracting RDF from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that lets the user annotate a text with DBpedia concepts via a remote DBpedia Spotlight instance.
*) sparqlify [ULEI]: a scalable SPARQL-to-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary number of metadata fields.
*) CubeViz [ULEI]: CubeViz visualizes the RDF Data Cube representation of statistical data. It supports the more advanced Data Cube features, such as slices, and also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly in the LOD2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!
20160922 Materials Data Facility TMS Webinar - Ben Blaiszik
Fall 2016 TMS Webinar on Data Curation Tools. Slides for the Materials Data Facility presentation on data services (publish and discover) as described by Ben Blaiszik. See http://www.materialsdatafacility.org for more information.
UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz. It has been mainly developed by Charles University in Prague as a student project called ODCleanStore (version 2). It is based on the experience SWC obtained with the LOD Management Suite (LODMS) used in WP7, and with ODCleanStore (version 1), developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next release of the LOD2 stack, UnifiedViews will replace LODMS as the ETL tool, and it has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
This document discusses techniques for discovering structured information from web sites. It presents three main contributions:
1. A method to extract structured data in the form of web lists that are split across multiple web pages, called logical lists.
2. An approach for automatically extracting sitemaps from web sites.
3. A technique for clustering web pages based on intra-page and extra-page features.
The document discusses open data and the CKAN open data catalog. It provides an overview of CKAN, including its data model and API. It also discusses open data initiatives like data.gov.uk and how CKAN is used to power open data portals around the world.
CKAN is an open-source data management solution for open data. It provides a platform for publishing and exposing metadata through an API and front-end interface. Major governments and communities use CKAN to organize large numbers of datasets. While it has advantages like organizing data in a structured way and providing APIs, its data model does not work for all use cases and there are no strict guidelines for dataset publishing. Extensions allow additional functionality and it can be deployed in various ways.
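The API mentioned above is CKAN's Action API, and using it can be sketched in a few lines. `package_search` is a standard CKAN action; the portal URL below is a placeholder, and a real client would GET the resulting URL and read the JSON `result` field.

```python
# Sketch of calling CKAN's Action API to search the catalog's
# dataset metadata. package_search is a standard CKAN action;
# the portal URL is a placeholder.
from urllib.parse import urlencode

def package_search_url(portal: str, query: str, rows: int = 10) -> str:
    """Build the URL for a CKAN package_search call."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal}/api/3/action/package_search?{params}"

url = package_search_url("https://demo.ckan.org", "transport")
print(url)
# A real client would now GET this URL; the JSON response carries
# the matching datasets under result["results"].
```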
This document provides an overview of relevant approaches for accessing open data programmatically and data-as-a-service (DaaS) solutions. It discusses common data access methods like web APIs, OData, and SPARQL and describes several DaaS platforms that simplify publishing and consuming open data. It also outlines requirements for a proposed open DaaS platform called DaPaaS that aims to address challenges in open data management and application development.
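Of the access methods listed above, OData is the most URL-convention-driven, so a small sketch helps. `$filter`, `$select` and `$top` are standard OData system query options; the service root and entity set below are hypothetical.

```python
# Sketch of composing an OData query URL using standard system query
# options. The service root and entity set are hypothetical.
from urllib.parse import quote

def odata_query(service_root: str, entity_set: str,
                filter_expr: str, select: str, top: int) -> str:
    """Compose an OData URL with $filter, $select and $top options."""
    return (f"{service_root}/{entity_set}"
            f"?$filter={quote(filter_expr)}"
            f"&$select={quote(select)}"
            f"&$top={top}")

url = odata_query("https://example.org/odata", "Datasets",
                  "Category eq 'transport'", "Name,Modified", 5)
print(url)
```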
The document discusses the AudioMD metadata scheme created by the Library of Congress to describe technical qualities of digital audio objects. It defines AudioMD, provides examples of its use, and describes its importance in understanding audio files. The scheme captures administrative, technical, and preservation metadata in a structured XML format. It has evolved through versions 1.0 and 2.0. Additionally, the document outlines the BIBFRAME initiative led by the Library of Congress to transform bibliographic standards to a linked data model and make library catalog records more accessible online.
This document discusses linked data life cycles, including modeling, publishing, discovery, integration, and use cases. It describes key concepts like dataspaces, DSSPs, linked data principles, and the linked open data cloud. Challenges with linked data include schema mapping, write-enablement, authentication, and dataset dynamics as data sources change over time.
The presentation I gave at Linköping University about web stream processing. I discuss two problems: (i) exchanging data streams on the web, and (ii) combining streams and contextual quasi-static data on the web
Enabling access to Linked Media with SPARQL-MMThomas Kurz
The amount of audio, video and image data on the web is immensely growing, which leads to data management problems based on the hidden character of multimedia. Therefore the interlinking of semantic concepts and media data with the aim to bridge the gap between the document web and the Web of Data has become a common practice and is known as Linked Media. However, the value of connecting media to its semantic meta data is limited due to lacking access methods specialized for media assets and fragments as well as to the variety of used description models. With SPARQL-MM we extend SPARQL, the standard query language for the Semantic Web with media specific concepts and functions to unify the access to Linked Media. In this paper we describe the motivation for SPARQL-MM, present the State of the Art of Linked Media description formats and Multimedia query languages, and outline the specification and implementation of the SPARQL-MM function set.
(http://lod2.eu/BlogPost/webinar-series) In this Webinar Michael Martin presents CubeViz - a facetted browser for statistical data utilizing the RDF Data Cube vocabulary which is the state-of-the-art in representing statistical data in RDF. This vocabulary is compatible with SDMX and increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz is generating a facetted browsing widget that can be used to filter interactively observations to be visualized in charts. Based on the selected structure, CubeViz offer beneficiary chart types and options which can be selected by users.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
Aggregation of cultural heritage datasets through the Web of DataNuno Freire
The existence of many digital libraries, maintained by different organizations, brings challenges to the discoverability of cultural heritage (CH) resources. Metadata aggregation is an approach where centralized efforts like Europeana facilitate their discoverability by collecting the resource’s metadata. Nowadays, CH institutions are increasingly applying technologies designed for the wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary and linked data (LD) as potential technologies for innovating CH metadata aggregation. We present the results of an analysis using the case of the Europeana network of aggregators and data providers as basis. We have conducted a survey of the available linked data technology, and we defined a solution, which we have put into practice in a pilot implementation within the Europeana network. In this pilot, the National Library of The Netherlands fulfils the role of data provider, with the Dutch Digital Heritage Network, as national aggregator, supporting the provision of several datasets from the national library to Europeana. The metadata is published using LD practices, having Schema.org as the main vocabulary. The national library also implements all the necessary semantic web mechanisms, defined in our solution, for making the datasets discoverable and harvestable by Europeana. Our proposal involves the use of vocabularies for description of datasets, and their distributions, namely DCAT, VoID and Schema.org. Europeana implements the LD harvester side of the solution and applies it to harvest the Schema.org data from the national library.
ckan 2.0 Introduction (20140618 updated)Chengjen Lee
This document provides an overview and agenda for a presentation on CKAN 2.0, an open-source data management system. The presentation covers topics such as features for publishing and finding datasets, storing and managing data, customizing and extending CKAN, and how CKAN supports open data principles. It also provides examples of CKAN in use by government open data portals and discusses issues such as language support and extensions. Harvester extensions are introduced for harvesting metadata and datasets from remote CKAN instances and other data sources.
The document discusses the Linked Data Platform (LDP), which provides best practices and a simple approach for a read-write Linked Data architecture based on HTTP access to web resources described using RDF. LDP defines two types of resources - those whose state is represented in RDF (LDP-RS) and those using other formats (LDP-NR). It also defines different types of containers (LDP-BC, LDP-DC, LDP-IC) that organize contained resources and support creation, modification, and enumeration of members. LDP aims to clarify and extend existing Linked Data principles for standardized access, update, creation and deletion of resources from servers exposing their data as Linked Data.
This document summarizes a presentation on recent developments in cataloging standards and practices, including RDA, Bibframe, and linked data. The presentation discusses how standards like RDA and FRBR are moving cataloging towards a more entity-centric model based on semantic web principles. It also outlines proposals to encode library metadata as linked open data using the Resource Description Framework (RDF) to represent bibliographic records as sets of semantic triples and link them to external datasets. The goal is to transform library data into a true "Web of data" rather than just making it available on the traditional document-based web.
The need of Interoperability in Office and GIS formatsMarkus Neteler
Free GIS and Interoperability: The need of Interoperability in Office and GIS formats
GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione
[GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]
Slides for talk given at IWMW 1998 held at the University of Newcastle on 15-17 September 1998.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-sep1998/materials/
In this Webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE and also the DL-Learner , a machine learning tool to solve supervised learnings tasks and support knowledge engineers in constructing knowledge. Those two beneighbored tools in the LOD2 Stack are for classification and the following quality analysis of Linked Data.
This document discusses standards and interoperability in geographic information systems (GIS). It emphasizes that standards are important for sharing data between government departments and making location-based data accessible to citizens. It outlines some relevant technical standards like OGC, ISO, and OpenLS. The document also discusses challenges around reading, displaying, and editing spatial data from different sources and solutions like spatial databases and web services. Finally, it provides details on how standards will be implemented for a GIS project in Madinah, Saudi Arabia, including the use of Oracle, Envinsa, web services, and OGC standards.
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present the release 3.0 of the LOD2 stack, which contains updates to
*) Virtuoso 7 [Openlink]: the original row store of the Virtuoso 6 universal server has now been replaced by a column store, increasing the performance of SPARQL queries significantly, the store is now up to three times as fast as the previous major version.
Linked Open Data Manager Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming linked data through the use of its many extensions. It also allows operations for extracting rdf from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that allows the user to use a remote DBpedia spotlight instance to annotate a text with DBpedia concepts.
*) sparqlify [ULEI]: a scalable SPARQL-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary amount of metadata fields.
*) CubeViz [ULEI]: CubeViz allows visualization of the Data Cube linked data representation of statistical data. It has support for the more advanced DataCube features, such as slices. It also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly into the lod2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
20160922 Materials Data Facility TMS WebinarBen Blaiszik
Fall 2016 TMS Webinar on Data Curation Tools. Slides for the Materials Data Facility presentation on data services (publish and discover) as described by Ben Blaiszik. See http://www.materialsdatafacility.org for more information.
UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz (Semantica.cz). It has been mainly developed by Charles University in Prague as a student project called ODCleanStore (version 2). It is based on the experience SWC obtained with the LOD Management Suite (LODMS) used in WP7 and ODCleansStore (version 1) developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next stack release of the LOD2 stack, UnifiedViews will replace LODMS as an ETL tool in the stack and the tool has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
This document discusses techniques for discovering structured information from web sites. It presents three main contributions:
1. A method to extract structured data in the form of web lists that are split across multiple web pages, called logical lists.
2. An approach for automatically extracting sitemaps from web sites.
3. A technique for clustering web pages based on intra-page and extra-page features.
The document discusses open data and the CKAN open data catalog. It provides an overview of CKAN, including its data model and API. It also discusses open data initiatives like data.gov.uk and how CKAN is used to power open data portals around the world.
CKAN is an open-source data management solution for open data. It provides a platform for publishing and exposing metadata through an API and front-end interface. Major governments and communities use CKAN to organize large numbers of datasets. While it has advantages like organizing data in a structured way and providing APIs, its data model does not work for all use cases and there are no strict guidelines for dataset publishing. Extensions allow additional functionality and it can be deployed in various ways.
This document provides an overview of relevant approaches for accessing open data programmatically and data-as-a-service (DaaS) solutions. It discusses common data access methods like web APIs, OData, and SPARQL and describes several DaaS platforms that simplify publishing and consuming open data. It also outlines requirements for a proposed open DaaS platform called DaPaaS that aims to address challenges in open data management and application development.
Beyond SPARQL: linked data, software, services and applications. Keynote at D... (John Domingue)
This document discusses efforts to leverage semantics and Linked Data to support interoperability, discovery, and linking of computing system components and paradigms. It describes how services, software projects, and cloud resources can have machine-readable descriptions to allow them to be discoverable, reusable, interoperable, and linkable. Several European projects aim to apply these principles by developing semantic models and ontologies for services, software forges, and cloud offerings. Overall, the use of Linked Data across services, software, and clouds could significantly improve interoperability between current and emerging computing system paradigms.
This document presents LDP-DL, a language for defining the design of Linked Data Platforms (LDPs). LDP-DL allows describing what resources an LDP contains, how they are organized into containers, and the content of each resource. An LDP-DL model can be interpreted to automatically generate the described LDP. The implementation generates LDPs from LDP-DL designs and heterogeneous data sources. Experiments show LDP-DL supports generating multiple LDPs from a single design, applying one design across data sources, and loose coupling between designs and generated LDPs.
Dec'2013 webinar from the EUCLID project on managing large volumes of Linked Data
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
This presentation addresses the main issues of Linked Data and scalability. In particular, it gives details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of the solutions can be transferred to the context of Linked Data.
This document summarizes a webinar about Open Services for Lifecycle Collaboration (OSLC) and data integration. It introduces the presenter Axel Reichwein and his company Koneksys, which helps organizations create data integration solutions. It discusses challenges of distributed engineering data from different sources and the benefits of data integration. Key concepts discussed include using URLs, HTTP, and RDF to create a web of linked data. OSLC standards provide APIs to access and link data from different sources. This allows building mashup applications to search, visualize, and link engineering information across distributed systems.
This document discusses Linked Open Data and how to publish open government data. It explains that publishing data in open, machine-readable formats and linking it to other external data sources increases its value. It provides examples of published open government data and outlines best practices for making data open through licensing, standard formats like CSV and XML, using URIs as identifiers, and linking to related external data. The key benefits outlined are empowering others to build upon the data and improving transparency, competition and innovation.
ESWC SS 2012 - Wednesday Tutorial, Barry Norton: Building (Production) Semanti... (eswcsummerschool)
Ontotext is a leading semantic technology company that has developed OWLIM, a family of semantic repositories for storing and querying RDF and OWL data. OWLIM can handle large datasets, perform reasoning, and supports features like full text search, notifications, and geo-spatial querying. It has been used successfully in large-scale production systems like the BBC's World Cup website to power semantic search and dynamic content delivery using semantic web technologies.
Cloud-based Linked Data Management for Self-service Application Development (Peter Haase)
Peter Haase and Michael Schmidt of fluid Operations AG presented on developing applications using linked open data. They discussed the increasing amount of linked open data available and challenges in building applications that integrate data from different sources and domains. Their Information Workbench platform aims to address these challenges by allowing users to discover, integrate, and customize applications using linked data in a no-code environment. Key components of the platform include virtualized integration of data sources and the vision of accessing linked data as a cloud-based data service.
A set of slides that provides a high-level overview of the W3C Linked Data Platform specification presented at the 4th Linked Data in Architecture and Construction Workshop.
For more detailed and technical version of the presentation, please refer to
http://www.slideshare.net/nandana/learning-w3c-linked-data-platform-with-examples
LDAC 2016 programme
http://smartcity.linkeddata.es/LDAC2016/#programme
Open Data management is still neither trivial nor sustainable. The COMSODE results are here to bring automation to the publication and management of Open Data in public institutions and companies. The presentation includes the Open Data Ready standard proposal, three use cases, and an invitation for Horizon 2020 projects 2016.
1. Institut Mines-Télécom
Generation of Linked Data Platforms in Highly Decentralized Information Ecosystem
Mohammad Noorani BAKERALLY
Institut Henri Fayol, EMSE
Connected Intelligence, Laboratoire Hubert Curien, UMR CNRS 5516
PhD Thesis Defense, December 20, 2018
4. Highly Decentralized Information Ecosystem
[Diagram: actors of the ecosystem (developers, data consumers, data publishers, data providers) and the artifacts they own (web services, data sources, data portals)]
A highly decentralized information ecosystem is an information ecosystem consisting of information systems managed by actors that are self-governed, with little to no coordination between them, e.g. the open data context, the Web, organizational information ecosystems.
5. Problems
■ Data heterogeneity levels:
• Syntax
• Semantics
• Access
■ Hosting constraints preventing the hosting of data in third-party software environments, e.g.:
─ Data sources bound by license restrictions
─ Real-time data sources
[Diagram: the highly decentralized information ecosystem from slide 4]
6. Aim
■ Facilitate data exploitation for data consumers in highly decentralized information ecosystems
[Diagram: the highly decentralized information ecosystem from slide 4]
7. Aim
■ Facilitate data exploitation for data consumers in highly decentralized information ecosystems, through the publication of interoperable data and semantics by data publishers
[Diagram: the highly decentralized information ecosystem from slide 4]
8. Requirements for Data Interoperability (open standards)
■ Syntax
• Uniform identification mechanism to refer to resources
• Flexibility wrt the description of resources having varying structures
■ Semantics
• Ontology languages to make semantics explicit
• Semantics in syntax to make data self-described and portable
■ Access
• High-level protocols to hide the heterogeneity of platforms
• Uniform data access to facilitate data exploitation
9. Outline
■ Semantic Web
■ Linked Data Platform Generation Model
■ Linked Data Platform Generation Toolkit
■ Evaluation
■ Conclusion & Perspectives
11. Semantic Web wrt Data Syntax & Semantics
■ Data syntax: RDF [CWL14]
• 😃 Uniform identification mechanism: Uniform Resource Identifiers (URIs)
• 😃 Flexibility: schema-less
■ Data semantics: RDFS [BG14] and OWL [W3C12]
• 😃 Ontology languages: RDFS and OWL are ontology languages
• 😃 Semantics in syntax: RDFS and OWL can be serialized in RDF
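The schema-less flexibility can be illustrated with a toy triple set, a minimal sketch in plain Python rather than an RDF library; the example.org URIs and property names are invented for the example. Two resources coexist in one graph with different sets of properties, and both are referred to uniformly by URIs.

```python
# Illustrative sketch: triples as (subject, predicate, object) tuples,
# with full URIs as the uniform identification mechanism.
EX = "http://example.org/"  # invented namespace for the example

graph = {
    # A bus station described with three properties...
    (EX + "busStation", EX + "type", EX + "BusStation"),
    (EX + "busStation", EX + "label", "Central Station"),
    (EX + "busStation", EX + "capacity", 12),
    # ...and a parking lot with a different shape: no schema forces
    # both resources to share the same set of properties.
    (EX + "parking", EX + "type", EX + "Parking"),
    (EX + "parking", EX + "openingHours", "24/7"),
}

def describe(graph, subject):
    """Return all (predicate, object) pairs for one resource."""
    return {(p, o) for (s, p, o) in graph if s == subject}
```

Here `describe(graph, EX + "busStation")` yields three property/value pairs and `describe(graph, EX + "parking")` two, with no schema change needed to mix the two shapes.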
12. Semantic Web for Data Access
■ SPARQL [Gro13]: standard query language for RDF
• 😃 High-level protocol: the SPARQL 1.1 Protocol
• 😃 Uniform data access: formal syntax and semantics
■ But SPARQL only addresses querying (data consumers), not publishing data (data publishers)
[Diagram: Model, View, Controller; XQuery, SQL and SPARQL shown as model-layer query languages]
13. Semantic Web for Data Access
■ Linked Data principles [BL06]: provide RESTful access to data in RDF
• High-level protocol: operates on top of HTTP
• Uniform data access: descriptions use a set of standards (RDF, Turtle, etc.), but some choices are left open (e.g. the default RDF serialization)
■ Linked Data Platform 1.0 [SAM15c]: standardizes RESTful access to data in RDF
• 😃 High-level protocol: standardizes interaction on top of HTTP
• 😃 Uniform data access: provides a domain model and an interaction model
14. Linked Data Platform 1.0
■ Domain model
• Defines the different types of LDP resources
• Used to describe resources on LDPs
■ Interaction model
• Well-defined HTTP methods for CRUD operations on LDP resources
Type hierarchy of the domain model:
LDP Resource
├─ LDP RDF Source
│   └─ LDP Container
│       ├─ LDP Basic Container
│       ├─ LDP Direct Container
│       └─ LDP Indirect Container
└─ LDP Non-RDF Source
Semantic Web
LDP Standard: Linked Data Platform 1.0
LDPs: data platforms implementing the LDP Standard
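The domain model's type hierarchy and the HTTP-based interaction model can be sketched together. This is an illustrative Python sketch, not an implementation of the LDP specification: the class names mirror the spec's types, but the storage and dispatch details are assumptions made for the example.

```python
# Domain model: the LDP 1.0 type hierarchy as Python classes.
class LDPResource: ...
class LDPRDFSource(LDPResource): ...
class LDPNonRDFSource(LDPResource): ...

class LDPContainer(LDPRDFSource):
    def __init__(self):
        self.members = []          # would surface as ldp:contains triples

class LDPBasicContainer(LDPContainer): ...
class LDPDirectContainer(LDPContainer): ...
class LDPIndirectContainer(LDPContainer): ...

# Interaction model: CRUD operations mapped to HTTP methods (simplified).
def handle(method, container, payload=None):
    """Toy dispatch: POST creates a member, GET reads, DELETE clears."""
    if method == "POST":
        container.members.append(payload)
        return 201                 # Created
    if method == "GET":
        return 200                 # OK (would serialize the container's graph)
    if method == "DELETE":
        container.members.clear()
        return 204                 # No Content
    return 405                     # Method Not Allowed

c = LDPBasicContainer()
handle("POST", c, LDPRDFSource())  # the container now has one member
```

Note how the hierarchy encodes that every container is itself an RDF source, which is exactly what the tree above states.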
15. Satisfaction of the Requirements for Data Interoperability (Semantic Web, open standards)
■ RDF for data syntax: uniform identification mechanism; flexibility
■ RDFS/OWL for data semantics: ontology languages; semantics in syntax
■ LDP Standard for data access: high-level protocols; uniform data access
16. LDP Related Work
■ Usage of LDP
• Linked Data Platform as a novel approach for Enterprise Application Integration [MGG13]
• Music SOFA: An architecture for semantically informed recomposition of Digital Music Objects [DDR18]
• ECA2LD: Generating Linked Data from Entity-Component-Attribute runtimes [TRM18]
• Linking the Web of Things: LDP-CoAP Mapping [LIG+16]
■ Custom generation of LDPs
• Morph-LDP: An R2RML-based Linked Data Platform implementation [MPC+14]
• A Linked Data Platform adapter for the Bugzilla issue tracker [MGG14]
■ LDP implementations
• LDP resource management systems: generic LDP servers
• LDP frameworks: tools for developing LDP servers
17. LDP Implementations
■ LDP resource management systems:
• Generic LDP servers for storing, retrieving and manipulating LDP resources through HTTP methods
• e.g. OpenLink Virtuoso Server, Apache Marmotta, Fedora Commons
■ LDP frameworks:
• APIs facilitating the manual development of LDPs
• e.g. LDP4j [EGMGC14], Eclipse Lyo
[Diagram: RDF data sources feed an LDP resource generator, which produces LDP resources]
18. Generation of LDPs
Three phases, each with its problems:
● Design: define the data design, i.e. how data is organized according to the domain model
○ Problem: the definition is manual
● Implementation: encode the data design in the LDP resource generator
○ Problem: tight coupling between design and implementation, hindering the maintainability and reusability of the design
● Deployment: deploy the LDP server and the data
○ Problems: heterogeneity (no support for non-RDF data sources); hosting constraints
19. State of the Art: Synthesis
■ The problems wrt data exploitation in highly decentralized information ecosystems are data heterogeneity and hosting constraints
■ Semantic Web standards (RDF, RDFS/OWL, LDP) satisfy the requirements for data interoperability
■ But generating LDPs from existing RDF data sources is a complex task:
• No support for non-RDF data sources
• No support for hosting constraints
• Manual development produces tight coupling between data design and implementation, strongly limiting the reusability and maintainability of LDP designs
20. Objective
■ Automate the generation of LDPs in highly decentralized information ecosystems using Semantic Web technologies, while considering the following constraints:
• Data heterogeneity
• Hosting constraints
• LDP design reusability
28. LDP Dataset (LDP Generation Workflow)
■ An LDP dataset consists of:
• A set of container structures (n, g, M), where:
─ n is the IRI of the container
─ g is its RDF graph
─ M is a set of IRIs representing the members of container n
• A set of named graphs (n, g), where:
─ n is the IRI of the non-container
─ g is its RDF graph
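The definition above transcribes almost directly into code. This is a minimal sketch: the Python types and the example IRIs are illustrative, not part of the thesis' formalization.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ContainerStructure:
    n: str          # IRI of the container
    g: frozenset    # its RDF graph, as a set of triples
    M: frozenset    # IRIs of the members of container n

@dataclass(frozen=True)
class NamedGraph:
    n: str          # IRI of the non-container
    g: frozenset    # its RDF graph

@dataclass
class LDPDataset:
    containers: set = field(default_factory=set)      # container structures
    non_containers: set = field(default_factory=set)  # named graphs

    def resources(self):
        """All resource IRIs present in the dataset."""
        return {c.n for c in self.containers} | {r.n for r in self.non_containers}

# Toy dataset mirroring the paris-catalog example used later in the deck:
ds = LDPDataset()
ds.containers.add(ContainerStructure(
    "http://ex.org/ldp/paris-catalog",
    frozenset(),
    frozenset({"http://ex.org/ldp/parking"})))
ds.non_containers.add(NamedGraph("http://ex.org/ldp/parking", frozenset()))
```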
31. LDP-DL: Overview (LDP Generation Workflow)
Starting point: a data source. Data design questions:
■ What are the LDP resources wrt the resources from the data source?
■ What is the structure of containers/non-containers?
■ What is the content of containers/non-containers?
32. LDP-DL: Overview
The workflow produces an LDP dataset from the data source (same design questions as above).
33. LDP-DL: Overview
Example LDP resource in the LDP dataset (Turtle):
dex:paris-catalog a ldp:BasicContainer ;
    foaf:primaryTopic ex:paris-catalog ;
    ldp:contains dex:parking , dex:busStation .
ex:paris-catalog a dcat:Catalog ;
    dcat:keyword "paris" , "dataset" ;
    …….
34. LDP-DL: Overview
An LDP design language describes LDP resources:
■ their IRIs
■ their organization in containers
■ their content (RDF graph)
■ the members of containers
36-37. LDP-DL: Overview
In the example above, ex:paris-catalog (the foaf:primaryTopic) is the related resource, and the triples describing it form the RDF graph of the LDP resource.
LDP-DL: Syntax
■ ResourceMap:
• Related resources identified by a Query Pattern
• RDF graph of LDP resources described by a Construct Query
■ NonContainerMap: describes non-containers
■ ContainerMap: describes containers and their members (containers or non-containers)
■ DataSource describes:
• RDF sources using their IRIs
• Non-RDF sources using:
─ IRIs of data sources
─ IRIs of lifting rules
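Putting the four constructs together, a design document could look roughly like the following sketch. Note that the `ldpdl:` prefix and all property names here are illustrative assumptions, not the normative LDP-DL vocabulary.

```turtle
# Illustrative sketch only: ldpdl: terms below are assumed, not normative
@prefix ldpdl: <http://example.org/ldpdl#> .
@prefix ex:    <http://example.org/> .

ex:catalogMap a ldpdl:ContainerMap ;
    # ResourceMap: the query pattern selects related resources,
    # the construct query builds the RDF graph of each LDP resource
    ldpdl:resourceMap [
        a ldpdl:ResourceMap ;
        ldpdl:queryPattern   "?rr a dcat:Catalog ." ;
        ldpdl:constructQuery "CONSTRUCT { ... } WHERE { ... }"
    ] ;
    # members of the container are described by a NonContainerMap
    ldpdl:nonContainerMap ex:datasetMap ;
    # DataSource: here an RDF source identified by its IRI
    ldpdl:dataSource [
        a ldpdl:DataSource ;
        ldpdl:location <http://example.org/catalog-data>
    ] .
```

A non-RDF source would additionally reference a lifting rule IRI instead of pointing directly at RDF data.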
LDP-DL Formal Semantics
■ Given an interpretation I and a design document d, we define the LDP dataset that we call the evaluation of d w.r.t. I
■ An LDP dataset D is valid w.r.t. a design document d iff there exists an interpretation I such that I ⊧ d and D is the evaluation of d w.r.t. I
■ We provide an algorithm that generates LDP datasets that are provably valid w.r.t. their input design documents
Handling Hosting Constraints
■ A dynamic LDP dataset stores instructions to generate the graphs of LDP resources
■ Using a dynamic LDP dataset:
• Generate the LDP dataset at deployment time
• Generate the graphs of LDP resources at query time
■ Deals with the dynamicity of data sources and with hosting constraints
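The contrast between static and dynamic entries can be sketched as follows. The `ex:dataSource`, `ex:liftingRule`, and `ex:graphQuery` terms are assumptions made for illustration, not the actual LDP-DL vocabulary.

```turtle
# Sketch only: ex: property names are assumed for illustration
@prefix ex:  <http://example.org/> .
@prefix dex: <http://data.example.org/> .

# A static entry would materialize the full RDF graph of dex:parking
# at deployment time.

# A dynamic entry instead stores only the instructions needed to
# (re)generate that graph, so it is produced at query time from the
# live source:
dex:parking
    ex:dataSource  <http://example.org/parking-live.json> ;
    ex:liftingRule <http://example.org/parking-rule.rqg> ;
    ex:graphQuery  "CONSTRUCT { ... } WHERE { ... }" .
```

This is what allows the same design to serve real-time sources without re-dumping the whole dataset on every change.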
LDP Generation Toolkit
*Lefrançois, Maxime, Antoine Zimmermann, and Noorani Bakerally.
"A SPARQL extension for generating RDF from heterogeneous
formats." European Semantic Web Conference. Springer, Cham, 2017.
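For non-RDF sources, the toolkit relies on the SPARQL extension cited above (SPARQL-Generate) for lifting rules. A sketch of such a rule follows; the source IRI and JSON structure are hypothetical.

```sparql
# Sketch of a lifting rule in the style of SPARQL-Generate;
# the source IRI and JSON layout are hypothetical.
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

GENERATE {
  <http://example.org/paris-catalog> dcat:keyword ?keyword .
}
SOURCE <http://example.org/catalog.json> AS ?source
ITERATOR iter:JSONPath(?source, "$.keywords[*]") AS ?keyword
```

The ITERATOR clause binds one `?keyword` per element of the JSON array, yielding one `dcat:keyword` triple each.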
Evaluation
■ Objective: Automate the generation of LDPs in highly decentralized information ecosystems by using Semantic Web technologies, considering the following constraints:
• Data Heterogeneity
• Hosting Constraints
• LDP Design Reusability
■ Evaluation criteria are derived from objective
Evaluation: Experiment Settings
■ 8 design documents
■ 28 data sources
• RDF data sources:
─ Open data catalogs from 21 data portals
─ BBC wildlife dataset
─ LodPaddle
• Heterogeneous data sources (JSON, CSV)
• Real-time data sources (JSON, CSV)
■ GitHub: https://github.com/noorbakerally/LDPDatasetExamples
■ Performance test done using a simple design document and
different data sources having a maximum of 1 million triples
• Performance is approximately linear
Evaluation: LDP Design Reusability
■ Domain Design Reusability Experiment: Same design document and varying data sources structured with the same ontology
■ Generic Design Reusability Experiment: Same design document and varying data sources structured with different ontologies
Outline
■ Semantic Web
■ LDP Generation Model
• LDP Generation Workflow
• LDP Design Language
■ LDP Generation Toolkit
■ Evaluation
■ Conclusion & Perspectives
Conclusion: Context
■ Definition of highly decentralized information ecosystems
• Identification of problems w.r.t. data exploitation
• Identification of requirements for data interoperability
■ Semantic Web standards as foundations to facilitate data publication
■ Data exploitation may be facilitated by providing tools to data publishers rather than only to data consumers
Conclusion: Summary of Contributions
■ LDP Generation Workflow
• LDP Design Language with:
─ A formal syntax to write LDP design documents
─ A formal semantics to properly interpret LDP design documents
• LDP Dataset
■ LDP Generation Toolkit: implementation of the LDP Generation Workflow
■ Evaluation of the LDP Generation Toolkit w.r.t. data heterogeneity, hosting constraints, and LDP design reusability
Conclusion: Limitations
■ Partial coverage of the LDP standard (e.g. Direct and Indirect Containers are not considered)
■ Limited handling of hosting constraints
■ Manual writing of LDP design documents
■ Manual writing of lifting rules
Perspectives
■ Enrich design aspects in LDP-DL Model
• Consider Direct & Indirect containers
• Provide deployment constructs to describe aspects such as:
─ Access rights
─ Paging
■ Generate Linked Data following the Data on the Web Best Practices [LBC17]
■ Provide LDP Generation methodology
■ Evaluate with real users of LDP
References
[BG14] Dan Brickley and Ramanathan V. Guha. RDF Schema 1.1. W3C
Recommendation, World Wide Web Consortium (W3C), February 25 2014.
[BL06] Tim Berners-Lee. Linked Data-Design Issues, 2006.
[CWL14] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation, World Wide Web Consortium (W3C), February 25 2014.
[DDR18] De Roure, David, et al. "Music sofa: An architecture for semantically informed
recomposition of digital music objects." Proceedings of the 1st International Workshop
on Semantic Applications for Audio and Music. ACM, 2018.
[FR07] R. B. France and B. Rumpe. Model-driven development of complex software: A
research roadmap. In FOSE, 2007.
[Gro13] W3C SPARQL Working Group. SPARQL 1.1 Overview. W3C Recommendation,
World Wide Web Consortium (W3C), March 21 2013.
[LIG+16] Loseto, Giuseppe, et al. "Linking the web of things: LDP-CoAP mapping."
Procedia Computer Science 83 (2016): 1182-1187.
[MGG13] Mihindukulasooriya, Nandana, Raúl García-Castro, and Miguel Esteban
Gutiérrez. "Linked Data Platform as a novel approach for Enterprise Application
Integration." COLD. 2013.
[MGG14] Mihindukulasooriya, Nandana Sampath, Miguel Esteban Gutiérrez, and Raul
García Castro. "A Linked Data Platform adapter for the Bugzilla issue tracker." (2014):
89-92.
[MPC+14] Mihindukulasooriya, Nandana, et al. "morph-LDP: an R2RML-based linked
data platform implementation." European Semantic Web Conference. Springer, Cham,
2014.
[SAM15c] Steve Speicher, John Arwe, and Ashok Malhotra. Linked Data Platform 1.0.
Technical report, World Wide Web Consortium (W3C), February 26 2015.
[SVB+06] T. Stahl, M. Volter, J. Bettin, A. Haase, and S. Helsen. Model-driven software
development: technology, engineering, management. Pitman, 2006.
[TRM18] Spieldenner, T., Schubotz, R., & Guldner, M. (2018, June). ECA2LD:
Generating Linked Data from Entity-Component-Attribute runtimes. In 2018 Global
Internet of Things Summit (GIoTS) (pp. 1-4). IEEE.
[W3C12] W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), December 11 2012.
LDP-DL Semantics
(𝞀: related resource, 𝜈: new LDP resource)
1. Evaluation of the query pattern qp returns the bindings {𝞀 ← ex:paris-catalog} and {𝞀 ← ex:toulouse-catalog}
2. For each binding, a new LDP resource is created
3. Consider {𝞀 ← ex:paris-catalog}
4. The new resource (𝜈) is dex:paris-catalog
5. To generate the graph of dex:paris-catalog, the construct query cq is evaluated on the source with the bindings {𝞀 ← ex:paris-catalog} and {𝜈 ← dex:paris-catalog}
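The walkthrough above assumes a query pattern qp and a construct query cq roughly like the following sketch. The variable names (?rr standing for 𝞀, ?new for 𝜈) and the DCAT shape of the source are assumptions for illustration.

```sparql
# Hypothetical qp: each solution binds ?rr (𝞀) to one related resource,
# here yielding ex:paris-catalog and ex:toulouse-catalog
SELECT ?rr WHERE { ?rr a dcat:Catalog . }

# Hypothetical cq: builds the RDF graph of the new LDP resource ?new (𝜈);
# it is evaluated with ?rr and ?new already bound by the workflow
CONSTRUCT {
  ?new a ldp:BasicContainer ;
       foaf:primaryTopic ?rr .
  ?rr dcat:keyword ?kw .
} WHERE {
  OPTIONAL { ?rr dcat:keyword ?kw }
}
```

Evaluating cq once per binding produces exactly the per-resource graphs shown in the earlier dex:paris-catalog example.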
■ Consider the evaluation of :dataset to generate the members of dex:paris-catalog
■ The members of dex:paris-catalog describe the dcat:Datasets of ex:paris-catalog (the related resource)
■ The evaluation of qp is done with the binding {π₁ ← ex:paris-catalog}