These are the slides of a 40-minute presentation I gave at the CNRS Software Development days (JDEV 2017) in Marseille, France, on July 5th, 2017.
Here is the Webcast, in French: https://webcast.in2p3.fr/videos-integrer_des_sources_de_donnees_heterogenes_dans_le_web_de_donnees
What is the fuss about triple stores? Will triple stores eventually replace relational databases? This talk looks at the big picture, explains the technology and tries to look at the road ahead.
Overview of how data on the Web of Data can be consumed (first and foremost Linked Data) and implications for the development of usage mining approaches.
References:
Elbedweihy, K., Mazumdar, S., Cano, A. E., Wrigley, S. N., & Ciravegna, F. (2011). Identifying Information Needs by Modelling Collective Query Patterns. COLD, 782.
Elbedweihy, K., Wrigley, S. N., & Ciravegna, F. (2012). Improving Semantic Search Using Query Log Analysis. Interacting with Linked Data (ILD 2012), 61.
Raghuveer, A. (2012). Characterizing machine agent behavior through SPARQL query mining. In Proceedings of the International Workshop on Usage Analysis and the Web of Data, Lyon, France.
Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. (2011). An empirical study of real-world SPARQL queries. arXiv preprint arXiv:1103.5043.
Hartig, O., Bizer, C., & Freytag, J. C. (2009). Executing SPARQL queries over the web of linked data (pp. 293-309). Springer Berlin Heidelberg.
Verborgh, R., Hartig, O., De Meester, B., Haesendonck, G., De Vocht, L., Vander Sande, M., ... & Van de Walle, R. (2014). Querying datasets on the web with high availability. In The Semantic Web–ISWC 2014 (pp. 180-196). Springer International Publishing.
Verborgh, R., Vander Sande, M., Colpaert, P., Coppens, S., Mannens, E., & Van de Walle, R. (2014, April). Web-Scale Querying through Linked Data Fragments. In LDOW.
Luczak-Rösch, M., & Bischoff, M. (2011). Statistical analysis of web of data usage. In Joint Workshop on Knowledge Evolution and Ontology Dynamics (EvoDyn2011), CEUR WS.
Luczak-Rösch, M. (2014). Usage-dependent maintenance of structured Web data sets (Doctoral dissertation, Freie Universität Berlin, Germany), http://edocs.fu-berlin.de/diss/receive/FUDISS_thesis_000000096138.
Talk delivered at YOW! Developer Conferences in Melbourne, Brisbane and Sydney, Australia, 1-9 December 2016.
Abstract: Governments collect a lot of data. Data on air quality, toxic chemicals, laws and regulations, public health, and the census are intended to be widely distributed. Some data is not for public consumption. This talk focuses on open government data — the information that is meant to be made available for the benefit of policy makers, researchers, scientists, industry, community organisers, journalists and members of civil society.
We’ll cover the evolution of Linked Data, which is now being used by Google, Apple, IBM Watson, federal governments worldwide, non-profits including CSIRO and OpenPHACTS, and thousands of others worldwide.
Next we’ll delve into the evolution of the U.S. Environmental Protection Agency’s Open Data service that we implemented using Linked Data and an Open Source Data Platform. Highlights include how we connected to hundreds of billions of open data facts in the world’s largest, open chemical molecules database PubChem and DBpedia.
WHO SHOULD ATTEND
Data scientists, software engineers, data analysts, DBAs, technical leaders and anyone interested in utilising linked data and open government data.
This presentation addresses the main issues of Linked Data and scalability. In particular, it gives details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of the solutions can be transferred to the context of Linked Data.
One-day workshop on Linked Data and the Semantic Web (Victor de Boer)
As taught at UNIMAS, July 2019. Based on a three-day summer school by Knud Hinnerk Moeller and Victor de Boer. Includes hands-on exercises using SWI-Prolog ClioPatria.
Resource Description Framework (RDF) has entered the metadata scene for libraries in a major way over the last few years. While the promise of its Linked Data capabilities is exciting, the realities of changing data models, encoding practices, and even ontologies can put a check on that excitement. This session will explore these issues and discuss when this is worth doing and how to go about doing it.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ... (Stuart Chalk)
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
Make our Scientific Datasets Accessible and Interoperable on the Web (Franck Michel)
The presentation investigates the challenges that we must face to share scientific datasets on the Web following the Linked Open Data principles. We present the standards of the Semantic Web and investigate how they can help address those challenges. We give tips as to how to choose vocabularies to describe data and metadata, link datasets to other related datasets by making appropriate alignments, translate existing data sources to RDF and publish it on the Web as linked data.
The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the "Functional Requirements" family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.
This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.
These slides were presented as part of a W3C tutorial at the CSHALS 2010 conference (http://www.iscb.org/cshals2010). The slides are adapted from a longer introduction to the Semantic Web available at http://www.slideshare.net/LeeFeigenbaum/semantic-web-landscape-2009 .
A PDF version of the slides is available at http://thefigtrees.net/lee/sw/cshals/cshals-w3c-semantic-web-tutorial.pdf .
Presentation at ELAG 2011, European Library Automation Group Conference, Prague, Czech Republic. 25th May 2011
http://elag2011.techlib.cz/en/815-lifting-the-lid-on-linked-data/
Structured Dynamics provides 'ontology-driven applications'. Our product stack is geared to enable the semantic enterprise. The products are premised on preserving and leveraging existing information assets in an incremental, low-risk way. SD's products span from converters to authoring environments to Web services middleware and to eventual ontologies and user interfaces and applications.
Similar to Integrating Heterogeneous Data Sources in the Web of Data
Unleash the Potential of your Website! 180,000 webpages from the French NHM m... (Franck Michel)
Slides of a presentation I gave at the TDWG 2020 conference.
Paper: https://doi.org/10.3897/biss.4.59046
Video: https://www.youtube.com/watch?v=KiAgTWpEkHE
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data ... (Franck Michel)
Presentation of an article published at the 2nd International Workshop on Semantics for Biodiversity (S4Biodiv 2017), co-located with ISWC2017.
Article: https://hal.archives-ouvertes.fr/hal-01617708
Taxonomic registers are key tools to help us comprehend the diversity of nature. Publishing such registers in the Web of Data, following the standards and best practices of Linked Open Data (LOD), is a way of integrating multiple data sources into a world-scale, biological knowledge base. In this paper, we present an on-going work aimed at the publication of TAXREF, the French national taxonomic register, on the Web of Data. Far beyond the mere translation of the TAXREF database into LOD standards, we show that the key point of this endeavor is the design of a model capable of capturing the two coexisting yet distinct realities underlying taxonomic registers, namely the nomenclature (the rules for naming biological entities) and the taxonomy (the description and characterization of these biological entities). We first analyze different modelling choices made to represent some international taxonomic registers as LOD, and we underline the issues that arise from these differences. Then, we propose a model aimed to tackle these issues. This model separates nomenclature from taxonomy, it is flexible enough to accommodate the ever-changing scientific consensus on taxonomy, and it adheres to the philosophy underpinning the Semantic Web standards. Finally, using the example of TAXREF, we show that the model enables interlinking with third-party LOD data sets, whether they represent nomenclatural or taxonomic information.
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data (Franck Michel)
Presentation of an article published at the 11th workshop on Linked Data on the Web (LDOW2018), co-located with the Web Conference 2018.
Article: https://hal.archives-ouvertes.fr/hal-01722792
We hypothesize that harnessing the Semantic Web standards to enable automatic combination of Linked Data and non-RDF Web APIs data could trigger novel cross-fertilization scenarios.
To achieve this goal, we define the SPARQL Micro-Service architecture. A SPARQL micro-service is a lightweight, task-specific SPARQL endpoint that provides access to a small, resource-centric, virtual graph, while dynamically assigning dereferenceable URIs to Web API resources that do not have URIs beforehand. The graph is delineated by the Web API service being wrapped, the arguments passed to this service, and the restricted types of RDF triples that this SPARQL micro-service is designed to spawn. In this context, we argue that full SPARQL expressiveness can be supported efficiently without jeopardizing server availability. Ultimately, we believe that an ecosystem of SPARQL micro-services could emerge from independent service providers, enabling Linked Data-based applications to glean pieces of data from a wealth of distributed, scalable and reliable services. We describe an experimentation where we dynamically augment biodiversity-related Linked Data with data from Flickr, MusicBrainz and the Macauley scientific media library.
Construction d’un référentiel taxonomique commun pour des études sur l’histoi... (Franck Michel)
SemWeb.Pro conference, 21 November 2016.
One of the missions of the Muséum National d’Histoire Naturelle (MNHN) is to establish a synthesis of French biodiversity and natural heritage. In this context, the MNHN is in charge of developing TAXREF, a taxonomic register for fauna, flora and fungi. This single register lists and organizes the scientific names of all living beings recorded in the French territories, mainland and overseas, and is the cornerstone of the Système d’Information sur la Nature et les Paysages (SINP). It is used by many public and private actors as well as civil society (local authorities, curators, architecture firms, teachers, citizens, etc.). Moreover, TAXREF is aligned with other international taxonomic or nomenclatural registers.
The Zoomathia research project aims to study the history of zoological knowledge through Antiquity and the Middle Ages. To this end, it plans to use Semantic Web technologies to integrate heterogeneous data sources, ranging from medieval encyclopedias to modern biology data, through archaeological excavation reports and iconographic resources. This work necessarily involves selecting and/or defining vocabularies that can serve as taxonomic, cultural, geographic, chronological, etc. references. To make the integrated data interoperable on the Web, these vocabularies must be the object of a consensus and be linked to other authoritative related vocabularies. Since TAXREF is the result of a broad scientific consensus, and is already used for the integration of modern biology data and archaeological data, it was selected to build a thesaurus supporting the integration of the data considered in the Zoomathia project.
In this presentation, I will first recall the motivations outlined above, then describe the modelling of a thesaurus expressed in SKOS (Simple Knowledge Organization System) to produce a version of TAXREF usable with Semantic Web technologies. In particular, I will address the question of the links between this "TAXREF-SKOS" and other existing thesauri and ontologies. Finally, I will describe the method used to produce the result in RDF and to expose it on the Web of Data as persistent, dereferenceable URIs, and I will give a short demonstration by browsing the URIs as Linked Data and running SPARQL queries. In conclusion, I will stress that building a SKOS thesaurus is only a step, an "enabler", meant to encourage data producers already using TAXREF, as well as application designers, to adopt these technologies and rely on TAXREF-SKOS.
A Mapping-based Method to Query MongoDB Documents with SPARQL (Franck Michel)
Accessing legacy data as virtual RDF stores is a key issue in the building of the Web of Data. In recent years, the MongoDB database has become a popular actor in the NoSQL market, making it a significant potential contributor to the Web of Linked Data. Therefore, in this talk we present an article published at the DEXA 2016 conference. It addresses the question of how to access arbitrary MongoDB documents with SPARQL.
We propose a two-step method to (i) translate a SPARQL query into a pivot abstract query under MongoDB-to-RDF mappings represented in the xR2RML language, then (ii) translate the pivot query into a concrete MongoDB query.
We elaborate on the discrepancy between the expressiveness of SPARQL and the MongoDB query language, and we show that we can always come up with a rewriting that produces all correct answers.
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa... (Franck Michel)
Presentation of a collective article we submitted at the First Semantic Web for Scientific History workshop (SW4SH), co-located with ESWC 2015.
Link to the article: https://hal.archives-ouvertes.fr/hal-01146638v1
3. Franck MICHEL
Example: study history of zoological knowledge
Archaeological excavation
Conservation biology*
*http://www.lynxeds.com/hmw/plate/family-delphinidae-ocean-dolphins
First Natural History Encyclopedia, 1485.
4.
Example: study history of zoological knowledge
Archaeological excavation
Conservation biology*
*http://www.lynxeds.com/hmw/plate/family-delphinidae-ocean-dolphins
Knowledge formalizations:
controlled vocabularies, taxonomies, domain ontologies, …
5.
LOD Cloud: 10K datasets, 150B Statements
Linking Open Data cloud diagram 2017. A. Abele, J.P. McCrae, P. Buitelaar, A. Jentzsch and R. Cyganiak. http://lod-cloud.net/
On the Web
In RDF
Under open licences
Interlinked
6.
Publishing legacy data in RDF raises tricky questions
Metadata
Data
Vocabularies?
Create links?
Raw data?
Translate into RDF?
7.
Agenda
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
8.
Agenda
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
9.
Data Sources Have Heterogeneous Data Models
Relational DB
Graph
Object-Oriented
Native XML DBs
Documents
10.
Agenda
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
11.
Agenda
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
12.
RDFa: RDF in HTML attributes
https://www.w3.org/TR/rdfa-core/
Example from http://devlog.cnrs.fr/:

<body vocab="http://schema.org/">
  <div resource="/jdev2017" typeof="Event">
    <h2 property="title">JDEV 2017</h2>
    <p>Date: <span property="startDate">2017-07-04</span></p>
    ...
    <p>T2 - Ingénierie et web des données.
      <a property="url" href="http://devlog.cnrs.fr/jdev2017/t2">More…</a>
    </p>
  </div>
</body>

Extracted RDF:
@prefix sch: <http://schema.org/> .
<http://devlog.cnrs.fr/jdev2017>
  rdf:type sch:Event ;
  sch:title "JDEV 2017" ;
  sch:startDate "2017-07-04" ;
  sch:url <http://devlog.cnrs.fr/jdev2017/t2> .
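To see where such triples come from, here is a minimal, hand-rolled scan of the RDFa attributes using only Python's standard library. This is a sketch, not a conforming RDFa processor (no @vocab resolution, no CURIE handling, no subject chaining); the class and variable names are mine.

```python
from html.parser import HTMLParser

class RDFaScan(HTMLParser):
    """Naive RDFa attribute scan: resource/typeof set the subject,
    property captures either an href or the element's text content."""
    def __init__(self):
        super().__init__()
        self.subject = None
        self.pending = None   # property name awaiting its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "resource" in a:
            self.subject = a["resource"]
            if "typeof" in a:
                self.triples.append((self.subject, "rdf:type", a["typeof"]))
        if "property" in a:
            if "href" in a:
                # e.g. <a property="url" href="...">
                self.triples.append((self.subject, a["property"], a["href"]))
            else:
                self.pending = a["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

html = '''<div resource="/jdev2017" typeof="Event">
<h2 property="title">JDEV 2017</h2>
<p>Date: <span property="startDate">2017-07-04</span></p>
</div>'''

scanner = RDFaScan()
scanner.feed(html)
for t in scanner.triples:
    print(t)
```

A real application would instead use an RDFa-aware library, but the principle is the same: the triples live in the markup's attributes.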
13.
Agenda
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
15.
Agenda
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
16.
Direct Mapping of an RDB to RDF
https://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/

Table: PEOPLE
ID  FNAME      WROTE (FK BOOK/ID)
7   Catherine  22
8   Olivier    22
…   …          …

<PEOPLE/ID=7> rdf:type <PEOPLE> .
<PEOPLE/ID=7> <PEOPLE#FNAME> "Catherine" .
<PEOPLE/ID=7> <PEOPLE#WROTE> <BOOK/ID=22> .
<PEOPLE/ID=8> rdf:type <PEOPLE> .
<PEOPLE/ID=8> <PEOPLE#FNAME> "Olivier" .
<PEOPLE/ID=8> <PEOPLE#WROTE> <BOOK/ID=22> .
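The Direct Mapping idea illustrated above can be sketched in a few lines of Python: each row becomes a typed subject, each column a predicate, and foreign keys become resource IRIs. The `direct_mapping` function and its arguments are hypothetical names; a real engine follows the W3C recommendation's IRI-encoding rules, which this sketch simplifies.

```python
def direct_mapping(table, key, rows, fks=None):
    """Produce Direct-Mapping-style triples from table rows.
    fks maps a column name to (referenced_table, referenced_key)."""
    fks = fks or {}
    triples = []
    for row in rows:
        subject = f"<{table}/{key}={row[key]}>"
        # Each row is typed with the table name
        triples.append((subject, "rdf:type", f"<{table}>"))
        for col, value in row.items():
            if col == key:
                continue
            if col in fks:
                # Foreign key: point to the referenced row's IRI
                ref_table, ref_key = fks[col]
                obj = f"<{ref_table}/{ref_key}={value}>"
            else:
                obj = f'"{value}"'
            triples.append((subject, f"<{table}#{col}>", obj))
    return triples

rows = [{"ID": 7, "FNAME": "Catherine", "WROTE": 22},
        {"ID": 8, "FNAME": "Olivier", "WROTE": 22}]
triples = direct_mapping("PEOPLE", "ID", rows, fks={"WROTE": ("BOOK", "ID")})
for s, p, o in triples:
    print(s, p, o, ".")
```

Running this on the PEOPLE rows above reproduces the six triples shown on the slide.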
17.
Custom Mapping of an RDB to RDF

Table: PEOPLE
ID  FNAME      WROTE (FK BOOK/ID)
7   Catherine  22
8   Olivier    22
…   …          …

Using existing vocabularies:
<http://unice.fr/staff/7> rdf:type ex:Teacher .
<http://unice.fr/staff/7> foaf:name "Catherine" .
<http://unice.fr/staff/7> dc:contributor <http://unice.fr/book/22> .
<http://unice.fr/staff/8> rdf:type ex:Teacher .
<http://unice.fr/staff/8> foaf:name "Olivier" .
<http://unice.fr/staff/8> dc:contributor <http://unice.fr/book/22> .
18.
Custom Mapping of a RDB to RDF with R2RML
<#MapPeople>
rr:logicalTable [ rr:tableName "PEOPLE" ];
rr:subjectMap [
rr:template "http://unice.fr/staff/{ID}";
rr:class ex:Teacher;
];
rr:predicateObjectMap [
rr:predicate foaf:name;
rr:objectMap [ rr:column "FNAME" ];
].
<http://unice.fr/staff/7> rdf:type ex:Teacher.
<http://unice.fr/staff/7> foaf:name "Catherine".
<http://unice.fr/staff/8> rdf:type ex:Teacher.
<http://unice.fr/staff/8> foaf:name "Olivier".
http://www.w3.org/TR/r2rml/
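A rough idea of what an R2RML engine does with such a mapping: expand the rr:template with each row's values to build the subject IRI, then emit one triple per predicate-object map. The `apply_mapping` helper below is purely illustrative, not part of any R2RML library.

```python
import re

def apply_mapping(rows, subject_template, rdf_class, pred_obj_maps):
    """Mimic an R2RML triples map: substitute {COLUMN} placeholders in the
    subject template, type the subject, then emit predicate-object triples."""
    triples = []
    for row in rows:
        # Expand e.g. "http://unice.fr/staff/{ID}" with the row's ID value
        iri = re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]),
                     subject_template)
        triples.append((f"<{iri}>", "rdf:type", rdf_class))
        for predicate, column in pred_obj_maps:
            triples.append((f"<{iri}>", predicate, f'"{row[column]}"'))
    return triples

rows = [{"ID": 7, "FNAME": "Catherine"}, {"ID": 8, "FNAME": "Olivier"}]
triples = apply_mapping(rows, "http://unice.fr/staff/{ID}", "ex:Teacher",
                        [("foaf:name", "FNAME")])
for s, p, o in triples:
    print(s, p, o, ".")
```

This reproduces the four triples on the slide; a real engine additionally handles term types, datatypes, joins between triples maps, and IRI-safe escaping.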
19.
Agenda
Describe the translation of heterogeneous data into RDF
HTML data with RDFa
CSV data
Relational data
NoSQL data
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
21.
Many methods for many types of data sources
XML: AstroGrid-D, SPARQL2XQuery, XSPARQL
CSV/TSV/Spreadsheets: XLWrap, Linked CSV, CSVW, RML
Relational Databases: D2RQ, R2O, Ultrawrap, Triplify, SM; R2RML implementations: Morph-RDB, ontop, Virtuoso
HTML: RDFa, Microformats
JSON: TARQL, JSON-LD, RML
NoSQL: xR2RML (MongoDB), ontop (MongoDB), [Mugnier et al., 2016]
Multiple formats: RML, TARQL, Apache Any23, DataLift, SPARQL-Generate

M.L. Mugnier, M.C. Rousset, and F. Ulliana. "Ontology-Mediated Queries for NOSQL Databases." In Proc. AAAI, 2016.
22.
Agenda
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
23.
Direct mapping: create my own vocabulary
Can be derived from an existing schema
May seem easier: "I do whatever I want"
But no added semantics; I still need to link my vocabulary with other ones
24.
Custom Mapping: reuse existing vocabularies
Large variety of existing vocabularies
But may be difficult to find the appropriate one
Partial coverage of the domain
Granularity: too high (cumbersome), too low (useless)
Different points of view
Frequently, a mixed approach is used
25.
Agenda
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
26.
Two approaches to translate existing data sources into RDF
Graph Materialization (ETL-like)
Virtual Graph: query rewriting
In both cases, the resulting graph is queried with SPARQL
27.
A large variety of approaches
The tools can be positioned along two axes: Graph Materialization vs. Query Rewriting, and Direct Mapping vs. Custom Mapping. Tools include: RDFa, XSPARQL, SPARQL2XQuery, AstroGrid-D, XLWrap, Linked CSV, CSVW, TARQL, DataLift, SPARQL-Generate, RML, JSON-LD, Virtuoso, R2RML, D2RQ, R2O, Ultrawrap, ontop, xR2RML, Any23, Triplify.
28.
Agenda
Describe the translation of heterogeneous data into RDF
Choose vocabularies to represent RDF data
Access the RDF data produced
The key importance of metadata
31.
Metadata are the key to enable dataset reuse
Context: identification, authors, dates, license, version, reference articles
Access: format, structure, location (download), query method
Meaning: what do the data represent? What concepts, entities, semantics?
Interpretation: units (cm or inches, left/right)…
Provenance: acquired with which equipment, parameters, protocols? Derived from which dataset? With which processing? Dataset-level or entity-level provenance
Statistics: number of triples per property or class, links to other datasets…
…
32.
Several works normalize metadata descriptions and how to use metadata:
CSVW: CSV on the Web
DCAT: Data Catalog Vocabulary
DCAT extensions, application profiles
W3C Dataset Exchange Working Group
VoID: Vocabulary of Interlinked Datasets
HCLS: Health Care & Life Sciences Dataset Profile
…
35.
HCLS: Health Care & Life Sciences Dataset Profile
Consensus among stakeholders on the description of datasets using RDF
*http://www.w3.org/TR/hcls-dataset/
Used vocabularies:
RDF, RDFS, XSD
Citation Typing Ontology
Data Catalog (DCAT)
Dublin Core Metadata Types, Dublin Core Metadata Terms
Friend-of-a-Friend (FOAF)
Collection Description Frequency Vocabulary
Identifiers.org vocabulary
Lexvo.org - Lexical Vocabulary
Provenance Authoring and Versioning ontology (PAV)
PROV Ontology
Semanticscience Integrated Ontology (SIO)
Vocabulary of Interlinked Datasets (VoID)
36.
Using a JSON-LD profile to translate JSON into RDF
https://www.w3.org/TR/json-ld/

Input JSON:
{ "id": 106,
  "firstname": "John",
  "emails": [ "john@foo.com", "john@example.org" ]
}

JSON-LD context:
{ "@context": {
    "id": "@id",
    "@base": "http://example.org/member/",
    "emails": "http://xmlns.com/foaf/0.1/mbox"
  }
}

Resulting RDF:
<http://example.org/member/106> foaf:mbox "john@foo.com" .
<http://example.org/member/106> foaf:mbox "john@example.org" .
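The effect of this context can be imitated by hand: "id" is aliased to @id (resolved against @base) and "emails" is mapped to foaf:mbox. The snippet below is a minimal sketch for this one flat document, not a JSON-LD processor; a real application would use a JSON-LD library implementing the full expansion algorithm.

```python
doc = {"id": 106, "firstname": "John",
       "emails": ["john@foo.com", "john@example.org"]}
context = {"id": "@id",
           "@base": "http://example.org/member/",
           "emails": "http://xmlns.com/foaf/0.1/mbox"}

# "id" is aliased to @id, so its value names the subject, resolved against @base
subject = context["@base"] + str(doc["id"])
# the "emails" term is mapped to the foaf:mbox IRI; one triple per array value
predicate = context["emails"]
triples = [(f"<{subject}>", f"<{predicate}>", f'"{email}"')
           for email in doc["emails"]]
for t in triples:
    print(" ".join(t), ".")
```

Note that "firstname" has no term definition in the context, so it is simply dropped, just as a JSON-LD processor ignores unmapped keys during expansion.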
37.
Translation of RDBs to RDF
Various initial motivations:
• Web of Data, Linked Data,
• OBDA,
• Ontology learning,
• Schema mapping…
Historical products: D2RQ, Virtuoso…
R2RML mapping language:
• 2012 W3C recommendation
• Several implementations
Several methods: direct mapping vs. domain-specific
38.
All You Need is LOV
Linked Open Vocabularies
http://lov.okfn.org/dataset/lov/
522 curated vocabularies
Quality requirements:
• URI stability and availability,
• Quality metadata and documentation,
• Identifiable and trustable publication body,
• Proper versioning policy,
• …
"Vocabularies provide the semantic glue enabling Data to become meaningful Data."
39.
Linked Data rules
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
This morning Manuel introduced Linked Data, the Web of Data, URIs, etc.; Olivier described RDF and SPARQL in more detail.
Now: how to feed this Web of Data with data that is not in RDF to start with.
We are all witnessing the multiplication of data sources available on the Web. Witnesses, and actors.
Social networks, collaborative wikis, scientific databases, crowd-sourced information.
The availability of these data sources spurs new ideas and opportunities for data integration,
but also comes with new challenges: we need to capture and share the semantics of data sources.
Transition: So to make sense of them, we need…
Semantic Web technologies bring answers to these challenges.
RDF is increasingly used as the pivot format for integrating heterogeneous data sources.
Transition: this is precisely the purpose of the LOD…
Result of an RDF-based data integration process.
It starts with translating existing data sources into RDF.
INTERACT: Who has already had to convert data from one format to another for integration?
Who has already had to perform this kind of transformation?
With which tools?
RDFa defines new HTML attributes (typeof, resource, property) to add metadata from which RDF can be generated.
Problem: this is an invasive approach.
Non-invasive, unlike RDFa.
Create an ad-hoc ontology:
Class = "Table"
Subject is "Table/ID"
Property is "Table#column"
Object is the value of the column
Point-of-view issue:
- biologist vs. taxonomist: does "species" really mean the same thing to both?
- surgeon vs. anatomist: a surgeon will unhesitatingly give the delimitation of a brain area, whereas the anatomist will cautiously designate the center of the area but not its boundaries…
(…)
Translate each individual data source into RDF using an appropriate vocabulary = mediation.
In practice, this mediation follows two common approaches…
Enable the repeatability of experiments.
INTERACT: can you give me some examples of metadata?
VoID and DCAT to be described.
(…) library metadata formats provided by the Library of Congress, BnF, DNB, etc.
(…) published and used by (…) large media corporations (BBC), national administrations (INSEE), EC, universities and research projects,
(…) published by individuals and put on the community table, in the tradition and spirit of the open, collaborative Web.”
Back to the fundamentals of Linked Data, the tablets of the law!
We have seen:
- how to describe our data with metadata,
- the Semantic Web technologies that can help us publish data and metadata,
- how to create or reuse the vocabularies that constitute the semantic reference for our data,
- and finally, how to convert existing data into RDF.
Now, what are the best practices to turn them into open, linked data…