Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
F. Michel
Université Côte d’Azur, CNRS, Inria, I3S laboratory
Défi MASTODONS - Les Big Data en recherche, 13 June 2019
Heterogeneous Data Aggregation and Querying
at Web Scale
Using Semantic Alignment Techniques
More data sources => more data integration opportunities
Hortus Sanitatis.
First Natural History encyclopaedia, 1485.
Data integration example in Digital Humanities
Archaeological excavation
Conservation biology*
*http://www.lynxeds.com/hmw/plate/family-delphinidae-ocean-dolphins
First natural history encyclopaedia, 1485.
Knowledge formalization
Controlled vocabularies,
taxonomies,
domain ontologies…
fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE
(federation of distributed data and knowledge in biomedical imaging)
Annual scientific workshops: 2012, 2013, 2014, 2015
Issues:
High heterogeneity
Increasing amount/number of sources
Need for cross-factor analysis
Sensitive (privacy, access policies)
Methods:
Knowledge formalization
Semantic alignment
Mediation towards common formats
Distributed querying
How to enable RDF-based integration
of heterogeneous data sources?
RDF-based Data Integration
Two ways to query heterogeneous data sources with SPARQL:
• Graph materialization (ETL-like)
• Virtual graph (query rewriting)
Many methods for many types of data sources
• XML: AstroGrid-D, SPARQL2XQuery, XSPARQL
• CSV/TSV/Spreadsheets: XLWrap, Linked CSV, CSVW, RML
• Relational databases: D2RQ, R2O, Ultrawrap, Triplify, SM; R2RML: Morph-RDB, ontop, Virtuoso
• Multiple formats: RML, TARQL, Apache Any23, DataLift, SPARQL-Generate
• HTML: RDFa, Microformats, JSON-LD
• JSON: TARQL, JSON-LD, RML
• NoSQL: xR2RML (MongoDB), ontop (MongoDB), [Mugnier et al., 2016] (key-value stores)
• Web APIs: SPARQL Micro-services, Linked REST APIs
M.L. Mugnier, M.C. Rousset, and F. Ulliana. “Ontology-Mediated Queries for NOSQL Databases.” In Proc. AAAI. 2016.
Agenda
xR2RML: Generic translation of
heterogeneous data sources into RDF
SPARQL micro-services:
Bridging Web APIs and the Web of Data
Applications in the biodiversity domain
The generic translation of
heterogeneous data sources into RDF
requires a generic mapping description.
TEACHERS
ID | FNAME     | TEACHES
7  | Catherine | Semantic Web
8  | Philippe  | Software Engineering
…  | …         | …
RDF produced by the mapping description:
http://example.org/teacher/7 foaf:name "Catherine"
http://example.org/teacher/7 ex:teaches https://www.wikidata.org/entity/Q54837
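As a rough illustration of what such a mapping description has to express, the sketch below turns the TEACHERS rows into triples in plain Python. It is not xR2RML: the full URIs behind the `foaf:` and `ex:` prefixes and the `COURSE_URIS` lookup table are assumptions made for the example.

```python
from urllib.parse import quote

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
EX_TEACHES = "http://example.org/teaches"  # assumed expansion of ex:teaches

# Hypothetical lookup from a course label to an entity URI (only one is shown on the slide)
COURSE_URIS = {"Semantic Web": "https://www.wikidata.org/entity/Q54837"}


def row_to_triples(row):
    """Translate one TEACHERS row into (subject, predicate, object) triples."""
    subject = "http://example.org/teacher/" + quote(str(row["ID"]))
    triples = [(subject, FOAF_NAME, row["FNAME"])]
    course = COURSE_URIS.get(row["TEACHES"])
    if course is not None:  # skip courses with no known URI
        triples.append((subject, EX_TEACHES, course))
    return triples


rows = [
    {"ID": 7, "FNAME": "Catherine", "TEACHES": "Semantic Web"},
    {"ID": 8, "FNAME": "Philippe", "TEACHES": "Software Engineering"},
]
triples = [t for row in rows for t in row_to_triples(row)]
```

A declarative mapping language replaces the hard-coded template and column names with a description that a generic engine can interpret.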
The xR2RML mapping language
Uniform description of mappings from
most common types of DB to RDF
Extends R2RML, the W3C recommendation
for RDBs, and RML
Rich iteration model to accommodate
nested, hierarchical documents
Flexibility:
• Allow any query language
• Allow any syntax to reference data elements
from query results
http://i3s.unice.fr/~fmichel/xr2rml_specification_v5.html
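To give a feel for why nested, hierarchical documents call for a richer iteration model, here is a minimal Python sketch. It is only an approximation under stated assumptions: the tiny path evaluator handles just `$.key` and `key[*]` steps, and the sample document (including the second course title) is invented for illustration. The actual syntax and semantics are defined in the specification linked above.

```python
def evaluate(doc, path):
    """Evaluate a tiny JSONPath-like expression such as '$.courses[*].title'."""
    values = [doc]
    for step in path.lstrip("$").strip(".").split("."):
        if not step:
            continue
        many = step.endswith("[*]")          # '[*]' iterates over array members
        key = step[:-3] if many else step
        next_values = []
        for value in values:
            if isinstance(value, dict) and key in value:
                child = value[key]
                if many and isinstance(child, list):
                    next_values.extend(child)
                else:
                    next_values.append(child)
        values = next_values
    return values


# Illustrative document; the second course is made up for the example
doc = {
    "id": 7,
    "name": "Catherine",
    "courses": [{"title": "Semantic Web"}, {"title": "Knowledge Engineering"}],
}

subject = "http://example.org/teacher/{}".format(evaluate(doc, "$.id")[0])
# One triple per element of the nested array
triples = [(subject, "ex:teaches", title)
           for title in evaluate(doc, "$.courses[*].title")]
```

A flat row yields one value per reference, whereas a reference into a nested array yields several, which is why the mapping has to state how to iterate.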
How to query a data source with SPARQL
using such a mapping description?
SPARQL rewriting techniques for SQL and XQuery
Semantics-preserving 1-to-1 rewriting
Closely coupled with the capabilities of the target query language:
support of joins, unions, nested queries, filtering, string functions, etc.
Optimization:
Enforced on the target query,
or delegated to the DB query-processing engine
SQL: Bizer & Cyganiak, 2006; Unbehauen et al., 2013a; Priyatna et al., 2014; Rodríguez-Muro & Rezk, 2015
XQuery: Bikakis et al., 2015
Optimization: Unbehauen et al., 2013b; Rodríguez-Muro & Rezk, 2015; Elliott et al., 2009; Sequeda & Miranker, 2013
How much of the SPARQL rewriting process can be
done in a DB-agnostic yet optimized manner?
Abstract Query Language (AQL)
Carries enough information to be translated into “any” database query language.
Early optimizations
Self-Join Elimination, Self-Union Elimination, Filter propagation
SPARQL query + xR2RML mappings => Abstract query => Concrete DB query
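The two-stage pipeline can be pictured with a toy Python sketch. The slides do not give the syntax of the abstract query language, so the `AtomicQuery` structure, the mapping table and the function names below are all hypothetical. Only the overall shape (triple pattern, then abstract query, then concrete query) and the self-union elimination step follow the slide. The concrete step emits a MongoDB-style filter document using the `$eq` operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicQuery:
    """Hypothetical abstract query: a source plus equality conditions."""
    source: str            # e.g. a collection or table name
    conditions: frozenset  # set of (field, value) pairs


def translate_pattern(pattern, mappings):
    """Translate one triple pattern (s, p, o) into an abstract atomic query."""
    _, predicate, obj = pattern
    source, field = mappings[predicate]
    # A variable object adds no condition; a constant becomes an equality filter
    conditions = frozenset() if obj.startswith("?") else frozenset({(field, obj)})
    return AtomicQuery(source, conditions)


def eliminate_self_unions(queries):
    """Drop duplicate members of a union, keeping first occurrences in order."""
    return list(dict.fromkeys(queries))


def to_mongo_filter(query):
    """Concrete step: render the abstract query as a MongoDB filter document."""
    return {field: {"$eq": value} for field, value in sorted(query.conditions)}


# Assumed mapping: foaf:name is produced from field 'fname' of collection 'teachers'
mappings = {"foaf:name": ("teachers", "fname")}
union = [
    translate_pattern(("?t", "foaf:name", "Catherine"), mappings),
    translate_pattern(("?x", "foaf:name", "Catherine"), mappings),
]
abstract = eliminate_self_unions(union)
mongo_filters = [to_mongo_filter(q) for q in abstract]
```

Because the optimization runs on the abstract form, it does not depend on the target database; only `to_mongo_filter` would change for another query language.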
Application to MongoDB
AQL-to-MongoDB rewriting is challenging:
• Expressiveness gap between SPARQL, AQL and MongoDB:
joins not supported, nested queries hardly supported, limited filter expressions
• Semantic ambiguity
Filling the gap between the two worlds is not straightforward
Yet, NoSQL DBs are a huge, quickly increasing source of data.
Potential for RDF-based data integration and publication in the Web of Data.
Semantic Web vs. NoSQL
Semantic Web              | NoSQL
highly connected graphs   | isolated documents, joins hardly supported
rich query expressiveness | low expressiveness
reasoning                 | _
?                         | high throughput, high availability
_                         | horizontal elasticity
The generic approach (graph materialization or SPARQL query rewriting)
is suitable when there is direct access to the data source.
What if we access the data source via an API?
Agenda
xR2RML: Generic translation of
heterogeneous data sources into RDF
SPARQL micro-services:
Bridging Web APIs and the Web of Data
Applications in the biodiversity domain
Web APIs: APIs all over the web
21,700+ Web APIs are registered on ProgrammableWeb.com (Jun. 2019)
Limitations:
• Standard formats (e.g. JSON, XML)
but proprietary vocabularies
• Documented in web pages but
not machine-processable,
no explicit semantics
• Internal resource identifiers,
no hyperlinks to resources
• Partial view over the database by
means of predefined services
The SPARQL Micro-Service Architecture
Lightweight method to query a Web API with SPARQL
SPARQL client <=> SPARQL micro-service <=> Web API:
(1) SPARQL query
(2) Web API query
(3) Web API response
(4) SPARQL response
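The four steps can be sketched in a few lines of Python. This is a simplified stand-in, not the authors' implementation: the Web API call is replaced by a canned JSON response (`fake_web_api`, hypothetical), the JSON field names are assumptions, and the "SPARQL query" is reduced to a single triple pattern. The sample values are borrowed from the deck's photo example.

```python
import json


def fake_web_api(tag):
    """Stand-in for steps (2) and (3); a real service would send an HTTP request.

    The tag argument is ignored here and the response is canned.
    """
    return json.dumps(
        {"photos": [{"id": "53735656", "title": "Brooklyn Bridge sunset"}]}
    )


def api_response_to_triples(payload):
    """Translate the API's JSON response into RDF-like triples."""
    triples = []
    for photo in json.loads(payload)["photos"]:
        uri = "http://example.org/photo/" + photo["id"]
        triples.append((uri, "schema:name", photo["title"]))
    return triples


def match(pattern, triples):
    """Evaluate one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def micro_service(pattern, tag):
    """Step (1) arrives as `pattern`; the return value is step (4)."""
    triples = api_response_to_triples(fake_web_api(tag))  # steps (2) and (3)
    return match(pattern, triples)


bindings = micro_service(("?photo", "schema:name", "?name"), "bridge")
```

The client only ever sees SPARQL-style variable bindings; the API's proprietary JSON vocabulary stays inside the service.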
Bridging Web APIs and the Web of Data
Assign dereferenceable URIs to Web API resources
Expose in the Web of Data resources locked in a silo
Example exposed by a SPARQL µ-service: http://example.org/photo/53735656,
with schema:name "Brooklyn Bridge sunset" and a schema:contentUrl property
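One way to picture dereferenceable URIs is a round trip between the API's internal identifier and a public URI. The Python sketch below makes several assumptions: the `API_RECORDS` dictionary stands in for the Web API, and the content URL in it is a made-up placeholder, since the slide does not show the actual value.

```python
import re

URI_PATTERN = re.compile(r"^http://example\.org/photo/(\d+)$")

# Hypothetical stand-in for the Web API's internal records; the 'url' value is invented
API_RECORDS = {
    "53735656": {
        "title": "Brooklyn Bridge sunset",
        "url": "https://photos.example.org/53735656.jpg",
    }
}


def mint_uri(internal_id):
    """Build the public URI for a resource known by its internal identifier."""
    return "http://example.org/photo/" + internal_id


def dereference(uri):
    """Look up a URI and return a description of the resource as triples."""
    found = URI_PATTERN.match(uri)
    if found is None or found.group(1) not in API_RECORDS:
        return []  # unknown resource: nothing to describe
    record = API_RECORDS[found.group(1)]
    return [
        (uri, "schema:name", record["title"]),
        (uri, "schema:contentUrl", record["url"]),
    ]


description = dereference(mint_uri("53735656"))
```

In a deployed service the lookup would be an HTTP request on the URI answered with an RDF description, which turns the internal identifier into a hyperlink that other datasets can reference.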
Agenda
xR2RML: Generic translation of
heterogeneous data sources into RDF
SPARQL micro-services:
Bridging Web APIs and the Web of Data
Applications in the biodiversity domain
Use case
TAXREF
French TAXonomic REFerence for fauna, flora, fungus
maintained by the Muséum National d’Histoire Naturelle.
570,000+ scientific names, 260,000+ taxa
Covers mainland France and overseas territories
Available as a Web site, a Web service and a downloadable text file
Biodiversity studies (e.g. impact of global warming
on species distributions) require mashing up data
from multiple stakeholders
How to make biodiversity data FAIR?
TAXREF-LD
Linking Open Data cloud diagram, 2019. J.P. McCrae, A. Abele,
P. Buitelaar, R. Cyganiak, A. Jentzsch, V. Andryushechkin and J.
Debattista. http://lod-cloud.net/
http://taxref.mnhn.fr/sparql
Several steps involved…
• Modelling of taxonomic
information as Linked Data
• Write and enact xR2RML
mappings
(JSON => MongoDB => RDF)
• Publish on the Web of Data
[Architecture diagram: Web app. (SPARQL => HTML), SPARQL micro-services, TAXREF-LD, NCBI, TaxonConcept, Agrovoc]
SPARQL micro-services to compare TAXREF
information with 7 biodiversity sources:
• FishBase
• Global Biodiversity Information Facility
• World Register of Marine Species
• Pan-European Species directories Infrastructure
• Index Fungorum
• Tropicos
• Sandre – Service d’Administration National des
Données et Référentiels de l’Eau
http://sms.i3s.unice.fr/demo-sms?param=Delphinapterus+leucas
Take-aways
More data sources => new data
integration scenarios
Need for explicit, machine-processable
data semantics
The SW provides tools to do that
RDF, SPARQL, ontologies…
Various methods to translate
heterogeneous data sources to RDF
Mapping language-based
Wrapper-based
More research needed to:
• Allow automatic discovery of data sources,
e.g. data portals, search engines…
• Automatic generation of federated queries
• Automate semantic alignment of data
sources represented in RDF
These techniques are a way to achieve
Open Data, Open Science, FAIRness
Related publications
Generic translation to RDF
Michel F., Djimenou L., Faron-Zucker C. & Montagnat J. (2015). Translation of Relational
and Non-Relational Databases into RDF with xR2RML. In Proceedings of WebIST, pp.
443–454. Lisbon, Portugal.
Michel F., Faron-Zucker C. & Montagnat J. (2016). A Generic Mapping-Based Query
Translation from SPARQL to Various Target Database Query Languages. In Proceedings of
WebIST vol. 2, pp. 147–158. Rome, Italy.
Michel F., Faron-Zucker C. & Montagnat J. (2016). A Mapping-based Method to Query
MongoDB Documents with SPARQL. In Proceedings of DEXA vol. 9828, LNCS, pp. 52–67.
Porto, Portugal.
Michel F., Faron-Zucker C. & Montagnat J. (2018). Bridging the Semantic Web and NoSQL
Worlds: Generic SPARQL Query Translation and Application to MongoDB. Transactions
on Large-Scale Data- and Knowledge-Centered Systems (LNCS 11360):125–165.
Biodiversity
Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. (2017). A Model to Represent
Nomenclatural and Taxonomic Information as Linked Data. Application to the French
Taxonomic Register, TAXREF. In Proceedings of the ISWC2017 workshop on Semantics for
Biodiversity (S4BioDiv) vol. 1933. Vienna, Austria.
Michel F., Faron-Zucker C., Tercerie S. & Gargominy O. (2018). Modelling Biodiversity Linked
Data: Pragmatism May Narrow Future Opportunities. In Biodiversity Information Science
and Standards, TDWG 2018 Proceedings vol. 2, p. e26235. Dunedin, New Zealand.
SPARQL micro-services
Michel F., Faron-Zucker C. & Gandon F. (2018). SPARQL Micro-Services: Lightweight
Integration of Web APIs and Linked Data. In Proceedings of the Linked Data on the Web
Workshop (LDOW2018). Lyon, France.
Michel F., Faron-Zucker C., Gargominy O. & Gandon F. (2018). Integration of Web APIs and Linked
Data Using SPARQL Micro-Services—Application to Biodiversity Use Cases. Information
9(12):310.
Michel F., Faron-Zucker C., Corby O. & Gandon F. (2019). Enabling Automatic Discovery and Querying
of Web APIs at Web Scale using Linked Data Standards. In Companion Proceedings of the
2019 World Wide Web Conference (WWW ’19 Companion). San Francisco, CA, USA.

Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
 

Recently uploaded

Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!University of Hertfordshire
 
Film Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfFilm Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfPharmatech-rx
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Ansari Aashif Raza Mohd Imtiyaz
 
GBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interactionGBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interactionAreesha Ahmad
 
Lubrication System in forced feed system
Lubrication System in forced feed systemLubrication System in forced feed system
Lubrication System in forced feed systemADB online India
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxGOWTHAMIM22
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfmarcuskenyatta275
 
Plasma proteins_ Dr.Muralinath_Dr.c. kalyan
Plasma proteins_ Dr.Muralinath_Dr.c. kalyanPlasma proteins_ Dr.Muralinath_Dr.c. kalyan
Plasma proteins_ Dr.Muralinath_Dr.c. kalyanmuralinath2
 
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptxPlasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptxmuralinath2
 
-case selection and treatment planing.pptx
-case selection and treatment planing.pptx-case selection and treatment planing.pptx
-case selection and treatment planing.pptxmohamedturki866
 
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...Sérgio Sacani
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfpablovgd
 
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
SCHISTOSOMA HEAMATOBIUM life cycle  .pdfSCHISTOSOMA HEAMATOBIUM life cycle  .pdf
SCHISTOSOMA HEAMATOBIUM life cycle .pdfDebdattaGhosh6
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyanmuralinath2
 
NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.syedmuneemqadri
 
Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesjyothisaisri
 
Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureSérgio Sacani
 
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Sérgio Sacani
 
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)Areesha Ahmad
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsSérgio Sacani
 

Recently uploaded (20)

Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!Quantifying Artificial Intelligence and What Comes Next!
Quantifying Artificial Intelligence and What Comes Next!
 
Film Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfFilm Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdf
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
 
GBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interactionGBSN - Microbiology (Unit 6) Human and Microbial interaction
GBSN - Microbiology (Unit 6) Human and Microbial interaction
 
Lubrication System in forced feed system
Lubrication System in forced feed systemLubrication System in forced feed system
Lubrication System in forced feed system
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptx
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdf
 
Plasma proteins_ Dr.Muralinath_Dr.c. kalyan
Plasma proteins_ Dr.Muralinath_Dr.c. kalyanPlasma proteins_ Dr.Muralinath_Dr.c. kalyan
Plasma proteins_ Dr.Muralinath_Dr.c. kalyan
 
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptxPlasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
Plasmapheresis - Dr. E. Muralinath - Kalyan . C.pptx
 
-case selection and treatment planing.pptx
-case selection and treatment planing.pptx-case selection and treatment planing.pptx
-case selection and treatment planing.pptx
 
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
Emergent ribozyme behaviors in oxychlorine brines indicate a unique niche for...
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdf
 
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
SCHISTOSOMA HEAMATOBIUM life cycle  .pdfSCHISTOSOMA HEAMATOBIUM life cycle  .pdf
SCHISTOSOMA HEAMATOBIUM life cycle .pdf
 
Erythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C KalyanErythropoiesis- Dr.E. Muralinath-C Kalyan
Erythropoiesis- Dr.E. Muralinath-C Kalyan
 
NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.
 
Tuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notesTuberculosis (TB)-Notes.pdf microbiology notes
Tuberculosis (TB)-Notes.pdf microbiology notes
 
Detectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a TechnosignatureDetectability of Solar Panels as a Technosignature
Detectability of Solar Panels as a Technosignature
 
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
Constraints on Neutrino Natal Kicks from Black-Hole Binary VFTS 243
 
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)GBSN -  Microbiology Lab (Microbiology Lab Safety Procedures)
GBSN - Microbiology Lab (Microbiology Lab Safety Procedures)
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discs
 

Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic alignment Technics

  • 1. Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France. Défi MASTODONS - Les Big Data en recherche, 13 June 2019. Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic alignment Technics
  • 2. More data sources → more data integration opportunities
  • 3. Hortus Sanitatis. First natural history encyclopaedia, 1485.
  • 4. Data integration example in Digital Humanities: archaeological excavation; conservation biology*; Hortus Sanitatis, 1485. *http://www.lynxeds.com/hmw/plate/family-delphinidae-ocean-dolphins
  • 5. Data integration example in Digital Humanities: archaeological excavation; conservation biology*; first natural history encyclopaedia, 1485. Knowledge formalization: controlled vocabularies, taxonomies, domain ontologies… *http://www.lynxeds.com/hmw/plate/family-delphinidae-ocean-dolphins
  • 6. fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE (federation of distributed data and knowledge in biomedical imaging). Scientific annual workshops 2012, 2013, 2014, 2015. Issues: high heterogeneity; increasing amount/number of sources; need for cross-factor analysis; sensitive data (privacy, access policies). Methods: knowledge formalization; semantic alignment; mediation towards common formats; distributed querying.
  • 7. How to enable RDF-based integration of heterogeneous data sources?
  • 8. RDF-based Data Integration. Two approaches over heterogeneous data sources: graph materialization (ETL-like), or a virtual graph with query rewriting. In both cases the graph is queried with SPARQL.
  • 9. Many methods for many types of data sources. XML: AstroGrid-D, SPARQL2XQuery, XSPARQL. CSV/TSV/Spreadsheets: XLWrap, Linked CSV, CSVW, RML. Relational databases: D2RQ, R2O, Ultrawrap, Triplify, SM; R2RML: Morph-RDB, ontop, Virtuoso. Multiple formats: RML, TARQL, Apache Any23, DataLift, SPARQL-Generate. HTML: RDFa, Microformats, JSON-LD. JSON: TARQL, JSON-LD, RML. NoSQL: xR2RML (MongoDB), ontop (MongoDB), [Mugnier et al., 2016] (key-value stores). Web APIs: SPARQL Micro-services, Linked REST APIs. Reference: M.L. Mugnier, M.C. Rousset, and F. Ulliana. “Ontology-Mediated Queries for NOSQL Databases.” In Proc. AAAI. 2016.
  • 10. Agenda. xR2RML: generic translation of heterogeneous data sources into RDF. SPARQL micro-services: bridging Web APIs and the Web of Data. Applications in the biodiversity domain.
  • 11. Agenda. xR2RML: generic translation of heterogeneous data sources into RDF. SPARQL micro-services: bridging Web APIs and the Web of Data. Applications in the biodiversity domain.
  • 12. The generic translation of heterogeneous data sources into RDF requires a generic mapping description.
  • 13. TEACHERS table (columns ID, FNAME, TEACHES): (7, Catherine, Semantic Web); (8, Philippe, Software Engineering); … A mapping description turns the first row into the RDF resource http://example.org/teacher/7, with foaf:name "Catherine" and ex:teaches https://www.wikidata.org/entity/Q54837.
  • 14. The xR2RML mapping language. Uniform description of mappings from most common types of DB to RDF. Extends R2RML, the W3C recommendation for RDBs, and RML. Rich iteration model to accommodate nested, hierarchical documents. Flexibility: • allows any query language • allows any syntax to reference data elements from query results. http://i3s.unice.fr/~fmichel/xr2rml_specification_v5.html
  • 15. How to query a data source with SPARQL using such a mapping description?
  • 16. SPARQL rewriting techniques for SQL and XQuery. Semantics-preserving 1-to-1 rewriting. Closely coupled with the capabilities of the target query language: support of joins, unions, nested queries, filtering, string functions, etc. Optimization: enforced on the target query, or delegated to the DB query-processing engine. SQL: Bizer & Cyganiak, 2006; Unbehauen et al., 2013a; Priyatna et al., 2014; Rodríguez-Muro & Rezk, 2015. XQuery: Bikakis et al., 2015. Optimization: Unbehauen et al., 2013b; Rodríguez-Muro & Rezk, 2015; Elliott et al., 2009; Sequeda & Miranker, 2013.
  • 17. How much of the SPARQL rewriting process can be done in a DB-agnostic yet optimized manner?
  • 18. Abstract Query Language (AQL). Carries enough information for translation towards “any” DB query language. Early optimizations: self-join elimination, self-union elimination, filter propagation. Pipeline: SPARQL query + xR2RML mappings → abstract query → concrete DB query.
  • 19. Application to MongoDB. AQL-to-MongoDB rewriting is challenging: • Expressiveness gap between SPARQL, AQL and MongoDB: joins not supported, nested queries hardly supported, limited filter expressions • Semantic ambiguity
  • 20. Semantic Web vs. NoSQL. Semantic Web: highly connected graphs; rich query expressiveness; reasoning. NoSQL: isolated documents, joins hardly supported; low expressiveness; no reasoning; high throughput, high availability (open question on the Semantic Web side); horizontal elasticity (no Semantic Web counterpart). Filling the gap between the two worlds is not straightforward. Yet NoSQL DBs are a huge, quickly increasing source of data, with potential for RDF-based data integration and publication in the Web of Data.
  • 21. The generic approach (graph materialization or query rewriting, queried with SPARQL) is suitable when there is direct access to the data source. What if we access the data source via an API?
  • 22. Agenda. xR2RML: generic translation of heterogeneous data sources into RDF. SPARQL micro-services: bridging Web APIs and the Web of Data. Applications in the biodiversity domain.
  • 23. Web APIs: APIs all over the web. 21,700+ Web APIs are registered on ProgrammableWeb.com (Jun. 2019). Limitations: • Standard formats (e.g. JSON, XML) but proprietary vocabularies • Documented in web pages but not machine-processable, no explicit semantics • Internal resource identifiers, no hyperlinks to resources • Partial view over the database by means of predefined services
  • 24. The SPARQL Micro-Service Architecture. Lightweight method to query a Web API with SPARQL. Flow between SPARQL client, SPARQL micro-service and Web API: (1) SPARQL query, (2) Web API query, (3) Web API response, (4) SPARQL response.
  • 25. Bridging Web APIs and the Web of Data. Assign dereferenceable URIs to Web API resources, e.g. http://example.org/photo/53735656 with schema:name "Brooklyn Bridge sunset" and a schema:contentUrl. A SPARQL µ-service unlocks resources locked in a silo and exposes them in the Web of Data.
  • 26. Agenda. xR2RML: generic translation of heterogeneous data sources into RDF. SPARQL micro-services: bridging Web APIs and the Web of Data. Applications in the biodiversity domain.
  • 27. Use case: TAXREF. French TAXonomic REFerence for fauna, flora and fungi, maintained by the Muséum National d’Histoire Naturelle. 570,000+ scientific names, 260,000+ taxa. Covers mainland France and overseas territories. Available as a Web site, a Web service and a downloadable text file.
  • 28. Biodiversity studies (e.g. impact of global warming on species distributions) require mashing up data from multiple stakeholders. How to make biodiversity data FAIR?
  • 29. TAXREF-LD. http://taxref.mnhn.fr/sparql Several steps involved: • Modelling of taxonomic information as Linked Data • Writing and enacting xR2RML mappings (JSON → MongoDB → RDF) • Publishing on the Web of Data. Figure: Linking Open Data cloud diagram, 2019. J.P. McCrae, A. Abele, P. Buitelaar, R. Cyganiak, A. Jentzsch, V. Andryushechkin and J. Debattista. http://lod-cloud.net/
  • 30. Web app. (SPARQL → HTML). Sources: SPARQL micro-services, TAXREF-LD, NCBI, TaxonConcept, Agrovoc.
  • 31. SPARQL micro-services to compare TAXREF information with 7 biodiversity sources: • FishBase • Global Biodiversity Information Facility • World Register of Marine Species • Pan-European Species directories Infrastructure • Index Fungorum • Tropicos • Sandre - Service d’Administration Nationale des Données et Référentiels de l’Eau
  • 32. http://sms.i3s.unice.fr/demo-sms?param=Delphinapterus+leucas
  • 33. Take-aways. More data sources => new data integration scenarios. Need for explicit, machine-processable data semantics; the Semantic Web provides tools for that: RDF, SPARQL, ontologies… Various methods to translate heterogeneous data sources to RDF: mapping language-based, wrapper-based. More research needed to: • Allow automatic discovery of data sources, e.g. data portals, search engines… • Automatically generate federated queries • Automate semantic alignment of data sources represented in RDF. These techniques are a way to achieve Open Data, Open Science, FAIRness.
  • 34. Related publications.
    Generic translation to RDF:
    Michel F., Djimenou L., Faron-Zucker C. & Montagnat J. (2015). Translation of Relational and Non-Relational Databases into RDF with xR2RML. In Proceedings of WebIST, pp. 443–454. Lisbon, Portugal.
    Michel F., Faron-Zucker C. & Montagnat J. (2016). A Generic Mapping-Based Query Translation from SPARQL to Various Target Database Query Languages. In Proceedings of WebIST vol. 2, pp. 147–158. Rome, Italy.
    Michel F., Faron-Zucker C. & Montagnat J. (2016). A Mapping-based Method to Query MongoDB Documents with SPARQL. In Proceedings of DEXA vol. 9828, LNCS, pp. 52–67. Porto, Portugal.
    Michel F., Faron-Zucker C. & Montagnat J. (2018). Bridging the Semantic Web and NoSQL Worlds: Generic SPARQL Query Translation and Application to MongoDB. Transactions on Large-Scale Data- and Knowledge-Centered Systems (LNCS 11360):125–165.
    Biodiversity:
    Michel F., Gargominy O., Tercerie S. & Faron-Zucker C. (2017). A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. Application to the French Taxonomic Register, TAXREF. In Proceedings of the ISWC2017 workshop on Semantics for Biodiversity (S4BioDiv) vol. 1933. Vienna, Austria.
    Michel F., Faron-Zucker C., Tercerie S. & Gargominy O. (2018). Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities. In Biodiversity Information Science and Standards, TDWG 2018 Proceedings vol. 2, p. e26235. Dunedin, New Zealand.
    SPARQL micro-services:
    Michel F., Faron-Zucker C. & Gandon F. (2018). SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data. In Proceedings of the Linked Data on the Web Workshop (LDOW2018). Lyon, France.
    Michel F., Faron-Zucker C., Gargominy O. & Gandon F. (2018). Integration of Web APIs and Linked Data Using SPARQL Micro-Services - Application to Biodiversity Use Cases. Information 9(12):310.
    Michel F., Faron-Zucker C., Corby O. & Gandon F. (2019). Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Linked Data Standards. In Companion Proceedings of the 2019 World Wide Web Conference (WWW ’19 Companion). San Francisco, CA, USA.
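
Slide 13 shows a row of the TEACHERS table becoming RDF through a mapping description. The idea can be sketched in a few lines of Python. This is only an illustration, not xR2RML syntax: the `MAPPING` dictionary, the `row_to_triples` helper, the expansion of the `ex:` prefix to `http://example.org/` and the `COURSE_URIS` lookup table are all assumptions made for the example.

```python
# Minimal sketch of a mapping description applied to table rows.
# NOT xR2RML syntax: the dictionary layout below is hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # assumed expansion of the ex: prefix

# Hypothetical lookup from course label to an external URI (only the one on the slide).
COURSE_URIS = {"Semantic Web": "https://www.wikidata.org/entity/Q54837"}

MAPPING = {
    "subject_template": "http://example.org/teacher/{ID}",
    "literal_properties": [(FOAF + "name", "FNAME")],
    "uri_properties": [(EX + "teaches", "TEACHES", COURSE_URIS)],
}


def row_to_triples(row, mapping):
    """Return (subject, predicate, object) triples for one table row."""
    subject = mapping["subject_template"].format(**row)
    triples = [(subject, pred, str(row[col]))
               for pred, col in mapping["literal_properties"]]
    for pred, col, lookup in mapping["uri_properties"]:
        # Emit the triple only when the value can be aligned with a known URI.
        if row[col] in lookup:
            triples.append((subject, pred, lookup[row[col]]))
    return triples


rows = [
    {"ID": 7, "FNAME": "Catherine", "TEACHES": "Semantic Web"},
    {"ID": 8, "FNAME": "Philippe", "TEACHES": "Software Engineering"},
]
triples = [t for row in rows for t in row_to_triples(row, MAPPING)]
```

The point of the design is that the mapping is data, not code: the same `row_to_triples` function would serve another table if only `MAPPING` changed.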
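
Slides 18 and 19 describe translating an abstract query into a concrete database query, with MongoDB as the target. As a rough illustration of the last step only, the sketch below turns a list of simple conditions into a MongoDB filter document. The `(field, operator, value)` condition format and the `to_mongo_filter` function are hypothetical and do not reproduce AQL or any existing implementation; only the MongoDB operators `$eq`, `$ne`, `$lt`, `$gt` and `$and` are real.

```python
# Sketch: translate simple abstract conditions into a MongoDB filter document.
# The (field, operator, value) condition format is an assumption, not AQL.
OPERATORS = {"=": "$eq", "!=": "$ne", "<": "$lt", ">": "$gt"}


def to_mongo_filter(conditions):
    """Build a MongoDB filter (a plain dict) from a conjunction of conditions."""
    clauses = [{field: {OPERATORS[op]: value}} for field, op, value in conditions]
    if not clauses:
        return {}          # an empty filter matches every document
    if len(clauses) == 1:
        return clauses[0]  # no need to wrap a single clause in $and
    return {"$and": clauses}


single = to_mongo_filter([("FNAME", "=", "Catherine")])
combined = to_mongo_filter([("FNAME", "=", "Catherine"), ("ID", ">", 5)])
```

The resulting dictionaries could be passed to a driver call such as `collection.find(...)`. Joins between documents have no such direct counterpart, which is the expressiveness gap slide 19 points to.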
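
The flow of slides 24 and 25 (Web API query, Web API response, translation to RDF with dereferenceable URIs) can be mimicked without any network access. In the sketch below, `fake_photo_api` stands in for a real Web API, and the JSON field names and the image URL are invented for the example. Parsing and evaluating the client's SPARQL query, steps (1) and (4), is deliberately left out.

```python
# Sketch of a SPARQL micro-service's inner steps, with a stubbed Web API.
SCHEMA = "http://schema.org/"


def fake_photo_api(text):
    """Stand-in for steps (2)-(3): a Web API returning JSON with internal ids."""
    photos = [{"id": "53735656",
               "title": "Brooklyn Bridge sunset",
               "file": "http://example.org/img/53735656.jpg"}]  # hypothetical URL
    return [p for p in photos if text.lower() in p["title"].lower()]


def photo_to_triples(photo):
    """Give the API resource a URI and describe it with schema.org terms."""
    uri = f"http://example.org/photo/{photo['id']}"
    return [(uri, SCHEMA + "name", photo["title"]),
            (uri, SCHEMA + "contentUrl", photo["file"])]


def micro_service(text):
    """Query the API, then translate the response into RDF-like triples."""
    api_response = fake_photo_api(text)
    # A real service would evaluate the SPARQL query against these triples.
    return [t for photo in api_response for t in photo_to_triples(photo)]


photo_triples = micro_service("brooklyn")
```

The internal identifier `53735656` only becomes linkable once it is embedded in a URI, which is what allows other datasets to refer to the photo.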