This document discusses the generation of linked data platforms (LDPs) in highly decentralized information ecosystems. It presents a model for automating the generation of LDPs that considers data heterogeneity, hosting constraints, and reusability of LDP designs. The model includes an LDP generation workflow, a design language called LDP-DL to describe LDP designs, and an LDP generation toolkit to implement the workflow. The goal is to facilitate data exploitation for consumers in decentralized environments.
An Approach for the Incremental Export of Relational Databases into RDF Graphs - Nikolaos Konstantinou
Several approaches have been proposed in the literature for offering RDF views over databases. In addition to these, a variety of tools exist that allow exporting database contents into RDF graphs. The approaches in the latter category have often been shown to perform better than those in the former. However, when database contents are exported into RDF, it is not always optimal or even necessary to export, or dump as this procedure is often called, the whole database contents every time. This paper investigates the problem of incremental generation and storage of the RDF graph that results from exporting relational database contents. To express the mappings that associate tuples from the source database with triples in the resulting RDF graph, an implementation of the R2RML standard is put to the test. Next, a methodology is proposed and described that enables incremental generation and storage of the RDF graph originating from the source relational database contents. The performance of this methodology is assessed through an extensive set of measurements. The paper concludes with a discussion of the authors' most important findings.
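The core of the incremental idea can be sketched in a few lines: rather than re-dumping the whole database on every run, compare the freshly generated triple set with the previously stored one and apply only the difference. This is a minimal sketch in plain Python, using tuples to stand in for triples; a real implementation would use an RDF library and R2RML mappings, and the `ex:`/`foaf:` identifiers below are illustrative.

```python
# Sketch of incremental RDF export: compute the delta between the
# previously stored graph and the newly exported one, then apply
# only that delta instead of rewriting the whole dump.

def incremental_update(previous, current):
    """Return (added, removed): the triples to insert into and delete
    from the stored graph so that `previous` becomes `current`."""
    added = current - previous
    removed = previous - current
    return added, removed

# Example: one value changed in the database between two export runs.
old = {
    ("ex:person1", "foaf:name", "Alice"),
    ("ex:person2", "foaf:name", "Bob"),
}
new = {
    ("ex:person1", "foaf:name", "Alice"),
    ("ex:person2", "foaf:name", "Robert"),  # updated tuple
}

added, removed = incremental_update(old, new)
print(added)    # triples to insert into the stored graph
print(removed)  # triples to delete from the stored graph
```

Only the two triples touching `ex:person2` need to be written, however large the rest of the export is.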
Apache Marmotta is a linked data platform that provides a linked data server, SPARQL server, and development environment for building linked data applications. It uses modular components including a triplestore backend, SPARQL endpoint, LDCache for remote data access, and an optional reasoner. Marmotta is implemented as a Java web application and uses services, dependency injection, and REST APIs.
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016 - Sergio Fernández
Sergio Fernández gave a presentation on geospatial querying in Apache Marmotta. He explained that Marmotta is an open platform for linked data that allows publishing and building applications on linked data. It includes features like a read-write linked data server and SPARQL querying. He discussed how GeoSPARQL allows representing and querying geospatial data on the semantic web by defining a vocabulary and SPARQL extension. Marmotta implements GeoSPARQL by materializing geospatial data and supports topological relations and functions through PostGIS. He demonstrated example GeoSPARQL queries on municipalities in Madrid, rivers bordering Austria, and mountain bike routes crossing cities.
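A query of the kind demonstrated in the talk (e.g. "municipalities in Madrid") can be sketched as follows. The `geo:` and `geof:` namespaces and the `geof:sfWithin` topological function are standard GeoSPARQL; the polygon coordinates and the data-side properties are illustrative, and the query would be sent to a Marmotta SPARQL endpoint in practice.

```python
# Build a GeoSPARQL query selecting features whose geometry lies
# within a given WKT region. Namespaces are the standard GeoSPARQL
# ones; the example polygon is an illustrative bounding region.

GEO = "http://www.opengis.net/ont/geosparql#"
GEOF = "http://www.opengis.net/def/function/geosparql/"

def within_query(region_wkt: str) -> str:
    """Return a SPARQL query using the geof:sfWithin relation."""
    return f"""
PREFIX geo:  <{GEO}>
PREFIX geof: <{GEOF}>
SELECT ?feature WHERE {{
  ?feature geo:hasGeometry/geo:asWKT ?wkt .
  FILTER (geof:sfWithin(?wkt, "{region_wkt}"^^geo:wktLiteral))
}}"""

# Illustrative region roughly around Madrid.
q = within_query("POLYGON((-3.9 40.3, -3.5 40.3, -3.5 40.6, -3.9 40.6, -3.9 40.3))")
print(q)
```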
This presentation gives details on technologies and approaches towards exploiting Linked Data by building LD applications. In particular, it gives an overview of popular existing applications and introduces the main technologies that support implementation and development. Furthermore, it illustrates how data exposed through common Web APIs can be integrated with Linked Data in order to create mashups.
Semantic Media Management with Apache Marmotta - Thomas Kurz
Thomas Kurz gives a presentation on semantic media management using Apache Marmotta. He plans to create a new Marmotta module that supports storing images, annotating image fragments, and retrieving images and fragments based on annotations. This will make use of linked data platform, media fragment URIs, open annotation model, and SPARQL-MM. The goal is to create a Marmotta module and webapp that extends LDP for image fragments and provides a UI for image annotation and retrieval.
The document discusses a webinar presented by LOD2 on creating knowledge from interlinked data. It describes LOD2 as an EU-funded project involving leading linked open data organizations. The webinar agenda includes discussing SIREn, a plugin for Elasticsearch that allows indexing and searching of JSON documents. It provides an overview of Elasticsearch and describes how to install SIREn, create an index, index documents, and perform searches on nested JSON data.
The document discusses the AudioMD metadata scheme created by the Library of Congress to describe technical qualities of digital audio objects. It defines AudioMD, provides examples of its use, and describes its importance in understanding audio files. The scheme captures administrative, technical, and preservation metadata in a structured XML format. It has evolved through versions 1.0 and 2.0. Additionally, the document outlines the BIBFRAME initiative led by the Library of Congress to transform bibliographic standards to a linked data model and make library catalog records more accessible online.
This document discusses linked data life cycles, including modeling, publishing, discovery, integration, and use cases. It describes key concepts like dataspaces, DSSPs, linked data principles, and the linked open data cloud. Challenges with linked data include schema mapping, write-enablement, authentication, and dataset dynamics as data sources change over time.
Slides from the presentation I gave at Linköping University about web stream processing. I discuss two problems: (i) exchanging data streams on the web, and (ii) combining streams with contextual quasi-static data on the web.
Enabling access to Linked Media with SPARQL-MM - Thomas Kurz
The amount of audio, video and image data on the web is growing immensely, which leads to data management problems due to the opaque nature of multimedia. Interlinking semantic concepts and media data, with the aim of bridging the gap between the document web and the Web of Data, has therefore become common practice and is known as Linked Media. However, the value of connecting media to its semantic metadata is limited by the lack of access methods specialized for media assets and fragments, as well as by the variety of description models in use. With SPARQL-MM we extend SPARQL, the standard query language for the Semantic Web, with media-specific concepts and functions to unify access to Linked Media. In this paper we describe the motivation for SPARQL-MM, present the state of the art in Linked Media description formats and multimedia query languages, and outline the specification and implementation of the SPARQL-MM function set.
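The kind of fragment-aware query SPARQL-MM enables can be sketched like this. The `oa:` namespace is the standard Web Annotation one; the `mm:spatialOverlaps` function name follows the SPARQL-MM publications, but the exact `mm:` namespace IRI used here is an assumption for illustration.

```python
# Sketch of a SPARQL-MM style query: find pairs of annotations whose
# media-fragment targets spatially overlap. The mm: namespace IRI
# below is assumed; oa: is the standard Web Annotation vocabulary.

OA = "http://www.w3.org/ns/oa#"
MM = "http://linkedmultimedia.org/sparql-mm/functions#"  # assumed IRI

def overlapping_annotations_query() -> str:
    """Return a SPARQL query using a SPARQL-MM spatial relation."""
    return f"""
PREFIX oa: <{OA}>
PREFIX mm: <{MM}>
SELECT ?a1 ?a2 WHERE {{
  ?a1 oa:hasTarget ?t1 .
  ?a2 oa:hasTarget ?t2 .
  FILTER (mm:spatialOverlaps(?t1, ?t2) && ?a1 != ?a2)
}}"""

print(overlapping_annotations_query())
```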
(http://lod2.eu/BlogPost/webinar-series) In this webinar Michael Martin presents CubeViz, a faceted browser for statistical data that uses the RDF Data Cube vocabulary, the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options that users can select.
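The shape of the data CubeViz browses is worth making concrete. Below is a minimal sketch of one `qb:Observation`; the `qb:` namespace is the standard RDF Data Cube vocabulary, while the `ex:` dataset, dimension and measure IRIs are made up for illustration.

```python
# Serialize a single RDF Data Cube observation as a Turtle snippet.
# qb: is the standard Data Cube vocabulary; ex: IRIs are illustrative.

QB = "http://purl.org/linked-data/cube#"

def observation_turtle(obs_id: str, year: int, value: float) -> str:
    """Return Turtle for one qb:Observation in a hypothetical
    population cube, with one dimension (year) and one measure."""
    return f"""@prefix qb: <{QB}> .
@prefix ex: <http://example.org/stats#> .

ex:{obs_id} a qb:Observation ;
    qb:dataSet ex:populationCube ;
    ex:refYear {year} ;
    ex:population {value} .
"""

print(observation_turtle("obs1", 2013, 3215000))
```

CubeViz's facets then correspond to the dimensions (`ex:refYear` here), and the charted values to the measures.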
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!
Aggregation of cultural heritage datasets through the Web of Data - Nuno Freire
The existence of many digital libraries, maintained by different organizations, brings challenges to the discoverability of cultural heritage (CH) resources. Metadata aggregation is an approach where centralized efforts like Europeana facilitate their discoverability by collecting the resources' metadata. Nowadays, CH institutions are increasingly applying technologies designed for wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary and linked data (LD) as potential technologies for innovating CH metadata aggregation. We present the results of an analysis using the case of the Europeana network of aggregators and data providers as a basis. We have conducted a survey of the available linked data technology, and we defined a solution, which we have put into practice in a pilot implementation within the Europeana network. In this pilot, the National Library of The Netherlands fulfils the role of data provider, with the Dutch Digital Heritage Network, as the national aggregator, supporting the provision of several datasets from the national library to Europeana. The metadata is published using LD practices, with Schema.org as the main vocabulary. The national library also implements all the necessary semantic web mechanisms, defined in our solution, for making the datasets discoverable and harvestable by Europeana. Our proposal involves the use of vocabularies for the description of datasets and their distributions, namely DCAT, VoID and Schema.org. Europeana implements the LD harvester side of the solution and applies it to harvest the Schema.org data from the national library.
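The dataset-level description that makes a collection discoverable and harvestable can be sketched with DCAT, one of the vocabularies named above. This is a minimal sketch: the dataset IRI, title and dump URL are hypothetical, and a real description would also carry VoID and Schema.org statements.

```python
# Produce a minimal DCAT description (as a Turtle string) of a dataset
# and one downloadable distribution, the kind of metadata a harvester
# such as Europeana's would discover and follow.

def dcat_description(dataset_iri: str, dump_url: str) -> str:
    """Return Turtle describing a dcat:Dataset with one distribution."""
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<{dataset_iri}> a dcat:Dataset ;
    dct:title "Example heritage dataset" ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <{dump_url}> ;
        dct:format "application/n-triples"
    ] .
"""

print(dcat_description("http://example.org/datasets/heritage",
                       "http://example.org/dumps/heritage.nt"))
```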
ckan 2.0 Introduction (20140618 updated) - Chengjen Lee
This document provides an overview and agenda for a presentation on CKAN 2.0, an open-source data management system. The presentation covers topics such as features for publishing and finding datasets, storing and managing data, customizing and extending CKAN, and how CKAN supports open data principles. It also provides examples of CKAN in use by government open data portals and discusses issues such as language support and extensions. Harvester extensions are introduced for harvesting metadata and datasets from remote CKAN instances and other data sources.
The document discusses the Linked Data Platform (LDP), which provides best practices and a simple approach for a read-write Linked Data architecture based on HTTP access to web resources described using RDF. LDP defines two types of resources - those whose state is represented in RDF (LDP-RS) and those using other formats (LDP-NR). It also defines different types of containers (LDP-BC, LDP-DC, LDP-IC) that organize contained resources and support creation, modification, and enumeration of members. LDP aims to clarify and extend existing Linked Data principles for standardized access, update, creation and deletion of resources from servers exposing their data as Linked Data.
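The read-write interaction model can be made concrete with a small sketch of resource creation: a client POSTs RDF to a container, suggesting a name via the `Slug` header, per the LDP specification. The container URL and the Turtle body are hypothetical; the request is assembled but not sent.

```python
# Sketch of creating a member resource in an LDP Basic Container.
# The headers follow the LDP spec: Slug suggests the new resource's
# name, text/turtle marks the body as an RDF source (LDP-RS).

def create_member_request(container_url: str, slug: str, turtle: str):
    """Assemble the pieces of an LDP create request; a real client
    would send these with any HTTP library."""
    headers = {
        "Content-Type": "text/turtle",
        "Slug": slug,
        # Advertise the interaction model of the resource being created.
        "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
    }
    return ("POST", container_url, headers, turtle.encode("utf-8"))

method, url, headers, body = create_member_request(
    "http://example.org/containers/people/",   # hypothetical LDP-BC
    "alice",
    "<> a <http://xmlns.com/foaf/0.1/Person> .",
)
print(method, url, headers)
```

On success the server answers `201 Created` with a `Location` header naming the new resource, which the container then enumerates among its members.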
This document summarizes a presentation on recent developments in cataloging standards and practices, including RDA, Bibframe, and linked data. The presentation discusses how standards like RDA and FRBR are moving cataloging towards a more entity-centric model based on semantic web principles. It also outlines proposals to encode library metadata as linked open data using the Resource Description Framework (RDF) to represent bibliographic records as sets of semantic triples and link them to external datasets. The goal is to transform library data into a true "Web of data" rather than just making it available on the traditional document-based web.
The need of Interoperability in Office and GIS formats - Markus Neteler
Free GIS and Interoperability: The need of Interoperability in Office and GIS formats
GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione
[GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]
Slides for a talk given at IWMW 1998, held at the University of Newcastle, 15-17 September 1998.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-sep1998/materials/
In this webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE and also DL-Learner, a machine learning tool that solves supervised learning tasks and supports knowledge engineers in constructing knowledge. These two neighboring tools in the LOD2 Stack serve classification and the subsequent quality analysis of Linked Data.
This document discusses standards and interoperability in geographic information systems (GIS). It emphasizes that standards are important for sharing data between government departments and making location-based data accessible to citizens. It outlines some relevant technical standards like OGC, ISO, and OpenLS. The document also discusses challenges around reading, displaying, and editing spatial data from different sources and solutions like spatial databases and web services. Finally, it provides details on how standards will be implemented for a GIS project in Madinah, Saudi Arabia, including the use of Oracle, Envinsa, web services, and OGC standards.
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present release 3.0 of the LOD2 stack, which contains updates to:
*) Virtuoso 7 [OpenLink]: the original row store of the Virtuoso 6 universal server has been replaced by a column store, significantly increasing the performance of SPARQL queries; the store is now up to three times as fast as the previous major version.
*) Linked Open Data Manager Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming linked data through the use of its many extensions. It also supports operations for extracting RDF from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that lets the user annotate a text with DBpedia concepts via a remote DBpedia Spotlight instance.
*) sparqlify [ULEI]: a scalable SPARQL-to-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary number of metadata fields.
*) CubeViz [ULEI]: CubeViz visualizes the RDF Data Cube representation of statistical data. It supports the more advanced Data Cube features, such as slices, and also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly in the LOD2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!
20160922 Materials Data Facility TMS Webinar - Ben Blaiszik
Fall 2016 TMS Webinar on Data Curation Tools. Slides for the Materials Data Facility presentation on data services (publish and discover) as described by Ben Blaiszik. See http://www.materialsdatafacility.org for more information.
UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz. It has been mainly developed by Charles University in Prague as a student project called ODCleanStore (version 2). It is based on the experience SWC obtained with the LOD Management Suite (LODMS) used in WP7, and with ODCleanStore (version 1), developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next release of the LOD2 stack, UnifiedViews will replace LODMS as the ETL tool, and it has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
This document discusses techniques for discovering structured information from web sites. It presents three main contributions:
1. A method to extract structured data in the form of web lists that are split across multiple web pages, called logical lists.
2. An approach for automatically extracting sitemaps from web sites.
3. A technique for clustering web pages based on intra-page and extra-page features.
The document discusses open data and the CKAN open data catalog. It provides an overview of CKAN, including its data model and API. It also discusses open data initiatives like data.gov.uk and how CKAN is used to power open data portals around the world.
CKAN is an open-source data management solution for open data. It provides a platform for publishing and exposing metadata through an API and front-end interface. Major governments and communities use CKAN to organize large numbers of datasets. While it has advantages like organizing data in a structured way and providing APIs, its data model does not work for all use cases and there are no strict guidelines for dataset publishing. Extensions allow additional functionality and it can be deployed in various ways.
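The API mentioned above is CKAN's Action API, and using it can be sketched in a few lines. `package_search` is a standard CKAN action; the portal URL below is a placeholder, and a real client would GET the resulting URL and read the JSON `result` field.

```python
# Sketch of calling CKAN's Action API to search the catalog's
# dataset metadata. package_search is a standard CKAN action;
# the portal URL is a placeholder.
from urllib.parse import urlencode

def package_search_url(portal: str, query: str, rows: int = 10) -> str:
    """Build the URL for a CKAN package_search call."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal}/api/3/action/package_search?{params}"

url = package_search_url("https://demo.ckan.org", "transport")
print(url)
# A real client would now GET this URL; the JSON response carries
# the matching datasets under result["results"].
```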
This document provides an overview of relevant approaches for accessing open data programmatically and data-as-a-service (DaaS) solutions. It discusses common data access methods like web APIs, OData, and SPARQL and describes several DaaS platforms that simplify publishing and consuming open data. It also outlines requirements for a proposed open DaaS platform called DaPaaS that aims to address challenges in open data management and application development.
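Of the access methods listed above, OData is the most URL-convention-driven, so a small sketch helps. `$filter`, `$select` and `$top` are standard OData system query options; the service root and entity set below are hypothetical.

```python
# Sketch of composing an OData query URL using standard system query
# options. The service root and entity set are hypothetical.
from urllib.parse import quote

def odata_query(service_root: str, entity_set: str,
                filter_expr: str, select: str, top: int) -> str:
    """Compose an OData URL with $filter, $select and $top options."""
    return (f"{service_root}/{entity_set}"
            f"?$filter={quote(filter_expr)}"
            f"&$select={quote(select)}"
            f"&$top={top}")

url = odata_query("https://example.org/odata", "Datasets",
                  "Category eq 'transport'", "Name,Modified", 5)
print(url)
```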
The document discusses the AudioMD metadata scheme created by the Library of Congress to describe technical qualities of digital audio objects. It defines AudioMD, provides examples of its use, and describes its importance in understanding audio files. The scheme captures administrative, technical, and preservation metadata in a structured XML format. It has evolved through versions 1.0 and 2.0. Additionally, the document outlines the BIBFRAME initiative led by the Library of Congress to transform bibliographic standards to a linked data model and make library catalog records more accessible online.
This document discusses linked data life cycles, including modeling, publishing, discovery, integration, and use cases. It describes key concepts like dataspaces, DSSPs, linked data principles, and the linked open data cloud. Challenges with linked data include schema mapping, write-enablement, authentication, and dataset dynamics as data sources change over time.
The presentation I gave at Linköping University about web stream processing. I discuss two problems: (i) exchanging data streams on the web, and (ii) combining streams and contextual quasi-static data on the web
Enabling access to Linked Media with SPARQL-MMThomas Kurz
The amount of audio, video and image data on the web is immensely growing, which leads to data management problems based on the hidden character of multimedia. Therefore the interlinking of semantic concepts and media data with the aim to bridge the gap between the document web and the Web of Data has become a common practice and is known as Linked Media. However, the value of connecting media to its semantic meta data is limited due to lacking access methods specialized for media assets and fragments as well as to the variety of used description models. With SPARQL-MM we extend SPARQL, the standard query language for the Semantic Web with media specific concepts and functions to unify the access to Linked Media. In this paper we describe the motivation for SPARQL-MM, present the State of the Art of Linked Media description formats and Multimedia query languages, and outline the specification and implementation of the SPARQL-MM function set.
(http://lod2.eu/BlogPost/webinar-series) In this Webinar Michael Martin presents CubeViz - a facetted browser for statistical data utilizing the RDF Data Cube vocabulary which is the state-of-the-art in representing statistical data in RDF. This vocabulary is compatible with SDMX and increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz is generating a facetted browsing widget that can be used to filter interactively observations to be visualized in charts. Based on the selected structure, CubeViz offer beneficiary chart types and options which can be selected by users.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
Aggregation of cultural heritage datasets through the Web of DataNuno Freire
The existence of many digital libraries, maintained by different organizations, brings challenges to the discoverability of cultural heritage (CH) resources. Metadata aggregation is an approach where centralized efforts like Europeana facilitate their discoverability by collecting the resource’s metadata. Nowadays, CH institutions are increasingly applying technologies designed for the wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary and linked data (LD) as potential technologies for innovating CH metadata aggregation. We present the results of an analysis using the case of the Europeana network of aggregators and data providers as basis. We have conducted a survey of the available linked data technology, and we defined a solution, which we have put into practice in a pilot implementation within the Europeana network. In this pilot, the National Library of The Netherlands fulfils the role of data provider, with the Dutch Digital Heritage Network, as national aggregator, supporting the provision of several datasets from the national library to Europeana. The metadata is published using LD practices, having Schema.org as the main vocabulary. The national library also implements all the necessary semantic web mechanisms, defined in our solution, for making the datasets discoverable and harvestable by Europeana. Our proposal involves the use of vocabularies for description of datasets, and their distributions, namely DCAT, VoID and Schema.org. Europeana implements the LD harvester side of the solution and applies it to harvest the Schema.org data from the national library.
ckan 2.0 Introduction (20140618 updated)Chengjen Lee
This document provides an overview and agenda for a presentation on CKAN 2.0, an open-source data management system. The presentation covers topics such as features for publishing and finding datasets, storing and managing data, customizing and extending CKAN, and how CKAN supports open data principles. It also provides examples of CKAN in use by government open data portals and discusses issues such as language support and extensions. Harvester extensions are introduced for harvesting metadata and datasets from remote CKAN instances and other data sources.
The document discusses the Linked Data Platform (LDP), which provides best practices and a simple approach for a read-write Linked Data architecture based on HTTP access to web resources described using RDF. LDP defines two types of resources - those whose state is represented in RDF (LDP-RS) and those using other formats (LDP-NR). It also defines different types of containers (LDP-BC, LDP-DC, LDP-IC) that organize contained resources and support creation, modification, and enumeration of members. LDP aims to clarify and extend existing Linked Data principles for standardized access, update, creation and deletion of resources from servers exposing their data as Linked Data.
This document summarizes a presentation on recent developments in cataloging standards and practices, including RDA, Bibframe, and linked data. The presentation discusses how standards like RDA and FRBR are moving cataloging towards a more entity-centric model based on semantic web principles. It also outlines proposals to encode library metadata as linked open data using the Resource Description Framework (RDF) to represent bibliographic records as sets of semantic triples and link them to external datasets. The goal is to transform library data into a true "Web of data" rather than just making it available on the traditional document-based web.
The need of Interoperability in Office and GIS formatsMarkus Neteler
Free GIS and Interoperability: The need of Interoperability in Office and GIS formats
GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione
[GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]
Slides for talk given at IWMW 1998 held at the University of Newcastle on 15-17 September 1998.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-sep1998/materials/
In this Webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE and also the DL-Learner , a machine learning tool to solve supervised learnings tasks and support knowledge engineers in constructing knowledge. Those two beneighbored tools in the LOD2 Stack are for classification and the following quality analysis of Linked Data.
This document discusses standards and interoperability in geographic information systems (GIS). It emphasizes that standards are important for sharing data between government departments and making location-based data accessible to citizens. It outlines some relevant technical standards like OGC, ISO, and OpenLS. The document also discusses challenges around reading, displaying, and editing spatial data from different sources and solutions like spatial databases and web services. Finally, it provides details on how standards will be implemented for a GIS project in Madinah, Saudi Arabia, including the use of Oracle, Envinsa, web services, and OGC standards.
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present the release 3.0 of the LOD2 stack, which contains updates to
*) Virtuoso 7 [Openlink]: the original row store of the Virtuoso 6 universal server has now been replaced by a column store, increasing the performance of SPARQL queries significantly, the store is now up to three times as fast as the previous major version.
Linked Open Data Manager Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming linked data through the use of its many extensions. It also allows operations for extracting rdf from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that allows the user to use a remote DBpedia spotlight instance to annotate a text with DBpedia concepts.
*) sparqlify [ULEI]: a scalable SPARQL-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary amount of metadata fields.
*) CubeViz [ULEI]: CubeViz allows visualization of the Data Cube linked data representation of statistical data. It has support for the more advanced DataCube features, such as slices. It also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly into the lod2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
20160922 Materials Data Facility TMS WebinarBen Blaiszik
Fall 2016 TMS Webinar on Data Curation Tools. Slides for the Materials Data Facility presentation on data services (publish and discover) as described by Ben Blaiszik. See http://www.materialsdatafacility.org for more information.
UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz (Semantica.cz). It has been mainly developed by Charles University in Prague as a student project called ODCleanStore (version 2). It is based on the experience SWC obtained with the LOD Management Suite (LODMS) used in WP7 and ODCleansStore (version 1) developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next stack release of the LOD2 stack, UnifiedViews will replace LODMS as an ETL tool in the stack and the tool has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
This document discusses techniques for discovering structured information from web sites. It presents three main contributions:
1. A method to extract structured data in the form of web lists that are split across multiple web pages, called logical lists.
2. An approach for automatically extracting sitemaps from web sites.
3. A technique for clustering web pages based on intra-page and extra-page features.
The document discusses open data and the CKAN open data catalog. It provides an overview of CKAN, including its data model and API. It also discusses open data initiatives like data.gov.uk and how CKAN is used to power open data portals around the world.
CKAN is an open-source data management solution for open data. It provides a platform for publishing and exposing metadata through an API and front-end interface. Major governments and communities use CKAN to organize large numbers of datasets. While it has advantages like organizing data in a structured way and providing APIs, its data model does not work for all use cases and there are no strict guidelines for dataset publishing. Extensions allow additional functionality and it can be deployed in various ways.
This document provides an overview of relevant approaches for accessing open data programmatically and data-as-a-service (DaaS) solutions. It discusses common data access methods like web APIs, OData, and SPARQL and describes several DaaS platforms that simplify publishing and consuming open data. It also outlines requirements for a proposed open DaaS platform called DaPaaS that aims to address challenges in open data management and application development.
Beyond SPARQL: linked data, software, services and applications. Keynote at D... (John Domingue)
This document discusses efforts to leverage semantics and Linked Data to support interoperability, discovery, and linking of computing system components and paradigms. It describes how services, software projects, and cloud resources can have machine-readable descriptions to allow them to be discoverable, reusable, interoperable, and linkable. Several European projects aim to apply these principles by developing semantic models and ontologies for services, software forges, and cloud offerings. Overall, the use of Linked Data across services, software, and clouds could significantly improve interoperability between current and emerging computing system paradigms.
This document presents LDP-DL, a language for defining the design of Linked Data Platforms (LDPs). LDP-DL allows describing what resources an LDP contains, how they are organized into containers, and the content of each resource. An LDP-DL model can be interpreted to automatically generate the described LDP. The implementation generates LDPs from LDP-DL designs and heterogeneous data sources. Experiments show LDP-DL supports generating multiple LDPs from a single design, applying one design across data sources, and loose coupling between designs and generated LDPs.
Dec'2013 webinar from the EUCLID project on managing large volumes of Linked Data
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
This presentation addresses the main issues of Linked Data and scalability. In particular, it gives details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of the solutions can be transferred to the context of Linked Data.
This document summarizes a webinar about Open Services for Lifecycle Collaboration (OSLC) and data integration. It introduces the presenter Axel Reichwein and his company Koneksys, which helps organizations create data integration solutions. It discusses challenges of distributed engineering data from different sources and the benefits of data integration. Key concepts discussed include using URLs, HTTP, and RDF to create a web of linked data. OSLC standards provide APIs to access and link data from different sources. This allows building mashup applications to search, visualize, and link engineering information across distributed systems.
This document discusses Linked Open Data and how to publish open government data. It explains that publishing data in open, machine-readable formats and linking it to other external data sources increases its value. It provides examples of published open government data and outlines best practices for making data open through licensing, standard formats like CSV and XML, using URIs as identifiers, and linking to related external data. The key benefits outlined are empowering others to build upon the data and improving transparency, competition and innovation.
ESWC SS 2012 - Wednesday Tutorial, Barry Norton: Building (Production) Semanti... (eswcsummerschool)
Ontotext is a leading semantic technology company that has developed OWLIM, a family of semantic repositories for storing and querying RDF and OWL data. OWLIM can handle large datasets, perform reasoning, and supports features like full text search, notifications, and geo-spatial querying. It has been used successfully in large-scale production systems like the BBC's World Cup website to power semantic search and dynamic content delivery using semantic web technologies.
Cloud-based Linked Data Management for Self-service Application Development (Peter Haase)
Peter Haase and Michael Schmidt of fluid Operations AG presented on developing applications using linked open data. They discussed the increasing amount of linked open data available and challenges in building applications that integrate data from different sources and domains. Their Information Workbench platform aims to address these challenges by allowing users to discover, integrate, and customize applications using linked data in a no-code environment. Key components of the platform include virtualized integration of data sources and the vision of accessing linked data as a cloud-based data service.
A set of slides that provides a high-level overview of the W3C Linked Data Platform specification presented at the 4th Linked Data in Architecture and Construction Workshop.
For more detailed and technical version of the presentation, please refer to
http://www.slideshare.net/nandana/learning-w3c-linked-data-platform-with-examples
LDAC 2016 programme
http://smartcity.linkeddata.es/LDAC2016/#programme
Open Data management is still neither trivial nor sustainable. The COMSODE results are here to bring automation to the publication and management of Open Data in public institutions and companies. The presentation includes the Open Data Ready standard proposal, three use cases, and an invitation for Horizon 2020 projects 2016.
1. Institut Mines-Télécom
Generation of Linked Data Platforms in Highly Decentralized Information Ecosystem
Mohammad Noorani BAKERALLY
Institut Henri Fayol, EMSE
Connected Intelligence, Laboratoire Hubert Curien, UMR CNRS 5516
PhD Thesis Defense, December 20, 2018
4. Highly Decentralized Information Ecosystem
[Diagram: actors of the ecosystem (developers, data consumers, data publishers, data providers) and the artifacts they own (web services, data sources, data portals)]
A highly decentralized information ecosystem is an information ecosystem consisting of information systems managed by actors that are self-governed, with little to no coordination between them, e.g. the open data context, the Web, organizational information ecosystems.
5. Problems
■ Data heterogeneity levels:
• Syntax
• Semantics
• Access
■ Hosting constraints preventing the hosting of data in third-party software environments, e.g.:
─ Data sources bound by license restrictions
─ Real-time data sources
[Diagram: the highly decentralized information ecosystem from slide 4]
6. Aim
■ Facilitate data exploitation for data consumers in highly decentralized information ecosystems
[Diagram: the highly decentralized information ecosystem from slide 4]
7. Aim
■ Facilitate data exploitation for data consumers in highly decentralized information ecosystems, through the publication of interoperable data and semantics by data publishers
[Diagram: the highly decentralized information ecosystem from slide 4]
8. Requirements for Data Interoperability (open standards)
■ Syntax
• Uniform identification mechanism to refer to resources
• Flexibility wrt the description of resources having varying structures
■ Semantics
• Ontology languages to make semantics explicit
• Semantics in syntax to make data self-described and portable
■ Access
• High-level protocols to hide the heterogeneity of platforms
• Uniform data access to facilitate data exploitation
9. Outline
■ Semantic Web
■ Linked Data Platform Generation Model
■ Linked Data Platform Generation Toolkit
■ Evaluation
■ Conclusion & Perspectives
11. Semantic Web wrt Data Syntax & Semantics
■ Data syntax: RDF [CWL14]
• 😃 Uniform identification mechanism: Uniform Resource Identifiers (URIs)
• 😃 Flexibility: schema-less
■ Data semantics: RDFS [BG14] and OWL [W3C12]
• 😃 Ontology languages: RDFS and OWL are ontology languages
• 😃 Semantics in syntax: RDFS and OWL can be serialized in RDF
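The schema-less flexibility can be illustrated with a toy triple set, a minimal sketch in plain Python rather than an RDF library; the example.org URIs and property names are invented for the example. Two resources coexist in one graph with different sets of properties, and both are referred to uniformly by URIs.

```python
# Illustrative sketch: triples as (subject, predicate, object) tuples,
# with full URIs as the uniform identification mechanism.
EX = "http://example.org/"  # invented namespace for the example

graph = {
    # A bus station described with three properties...
    (EX + "busStation", EX + "type", EX + "BusStation"),
    (EX + "busStation", EX + "label", "Central Station"),
    (EX + "busStation", EX + "capacity", 12),
    # ...and a parking lot with a different shape: no schema forces
    # both resources to share the same set of properties.
    (EX + "parking", EX + "type", EX + "Parking"),
    (EX + "parking", EX + "openingHours", "24/7"),
}

def describe(graph, subject):
    """Return all (predicate, object) pairs for one resource."""
    return {(p, o) for (s, p, o) in graph if s == subject}
```

Here `describe(graph, EX + "busStation")` yields three property/value pairs and `describe(graph, EX + "parking")` two, with no schema change needed to mix the two shapes.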
12. Semantic Web for Data Access
■ SPARQL [Gro13]: standard query language for RDF
• 😃 High-level protocol: the SPARQL 1.1 Protocol
• 😃 Uniform data access: formal syntax and semantics
■ But SPARQL only addresses querying (data consumers), not publishing data (data publishers)
[Diagram: Model, View, Controller; XQuery, SQL and SPARQL shown as model-layer query languages]
13. Semantic Web for Data Access
■ Linked Data principles [BL06]: provide RESTful access to data in RDF
• High-level protocol: operates on top of HTTP
• Uniform data access: descriptions use a set of standards (RDF, Turtle, etc.), but some choices are left open (e.g. the default RDF serialization)
■ Linked Data Platform 1.0 [SAM15c]: standardizes RESTful access to data in RDF
• 😃 High-level protocol: standardizes interaction on top of HTTP
• 😃 Uniform data access: provides a domain model and an interaction model
14. Linked Data Platform 1.0
■ Domain model
• Defines the different types of LDP resources
• Used to describe resources on LDPs
■ Interaction model
• Well-defined HTTP methods for CRUD operations on LDP resources
Type hierarchy of the domain model:
LDP Resource
├─ LDP RDF Source
│   └─ LDP Container
│       ├─ LDP Basic Container
│       ├─ LDP Direct Container
│       └─ LDP Indirect Container
└─ LDP Non-RDF Source
Semantic Web
LDP Standard: Linked Data Platform 1.0
LDPs: data platforms implementing the LDP Standard
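The domain model's type hierarchy and the HTTP-based interaction model can be sketched together. This is an illustrative Python sketch, not an implementation of the LDP specification: the class names mirror the spec's types, but the storage and dispatch details are assumptions made for the example.

```python
# Domain model: the LDP 1.0 type hierarchy as Python classes.
class LDPResource: ...
class LDPRDFSource(LDPResource): ...
class LDPNonRDFSource(LDPResource): ...

class LDPContainer(LDPRDFSource):
    def __init__(self):
        self.members = []          # would surface as ldp:contains triples

class LDPBasicContainer(LDPContainer): ...
class LDPDirectContainer(LDPContainer): ...
class LDPIndirectContainer(LDPContainer): ...

# Interaction model: CRUD operations mapped to HTTP methods (simplified).
def handle(method, container, payload=None):
    """Toy dispatch: POST creates a member, GET reads, DELETE clears."""
    if method == "POST":
        container.members.append(payload)
        return 201                 # Created
    if method == "GET":
        return 200                 # OK (would serialize the container's graph)
    if method == "DELETE":
        container.members.clear()
        return 204                 # No Content
    return 405                     # Method Not Allowed

c = LDPBasicContainer()
handle("POST", c, LDPRDFSource())  # the container now has one member
```

Note how the hierarchy encodes that every container is itself an RDF source, which is exactly what the tree above states.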
15. Satisfaction of the Requirements for Data Interoperability (Semantic Web, open standards)
■ RDF for data syntax: uniform identification mechanism; flexibility
■ RDFS/OWL for data semantics: ontology languages; semantics in syntax
■ LDP Standard for data access: high-level protocols; uniform data access
16. LDP Related Work
■ Usage of LDP
• Linked Data Platform as a novel approach for Enterprise Application Integration [MGG13]
• Music SOFA: An architecture for semantically informed recomposition of Digital Music Objects [DDR18]
• ECA2LD: Generating Linked Data from Entity-Component-Attribute runtimes [TRM18]
• Linking the Web of Things: LDP-CoAP Mapping [LIG+16]
■ Custom generation of LDPs
• Morph-LDP: An R2RML-based Linked Data Platform implementation [MPC+14]
• A Linked Data Platform adapter for the Bugzilla issue tracker [MGG14]
■ LDP implementations
• LDP resource management systems: generic LDP servers
• LDP frameworks: tools for developing LDP servers
17. LDP Implementations
■ LDP resource management systems:
• Generic LDP servers for storing, retrieving and manipulating LDP resources through HTTP methods
• e.g. OpenLink Virtuoso Server, Apache Marmotta, Fedora Commons
■ LDP frameworks:
• APIs facilitating the manual development of LDPs
• e.g. LDP4j [EGMGC14], Eclipse Lyo
[Diagram: RDF data sources feed an LDP resource generator, which produces LDP resources]
18. Generation of LDPs
Three phases, each with its problems:
● Design: define the data design, i.e. how data is organized according to the domain model
○ Problem: the definition is manual
● Implementation: encode the data design in the LDP resource generator
○ Problem: tight coupling between design and implementation, hindering the maintainability and reusability of the design
● Deployment: deploy the LDP server and the data
○ Problems: heterogeneity (no support for non-RDF data sources); hosting constraints
19. State of the Art: Synthesis
■ The problems wrt data exploitation in highly decentralized information ecosystems are data heterogeneity and hosting constraints
■ Semantic Web standards (RDF, RDFS/OWL, LDP) satisfy the requirements for data interoperability
■ But generating LDPs from existing RDF data sources is a complex task:
• No support for non-RDF data sources
• No support for hosting constraints
• Manual development produces tight coupling between data design and implementation, strongly limiting the reusability and maintainability of LDP designs
20. Objective
■ Automate the generation of LDPs in highly decentralized information ecosystems using Semantic Web technologies, while considering the following constraints:
• Data heterogeneity
• Hosting constraints
• LDP design reusability
28. LDP Dataset (LDP Generation Workflow)
■ An LDP dataset consists of:
• A set of container structures (n, g, M), where:
─ n is the IRI of the container
─ g is its RDF graph
─ M is a set of IRIs representing the members of container n
• A set of named graphs (n, g), where:
─ n is the IRI of the non-container
─ g is its RDF graph
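The definition above transcribes almost directly into code. This is a minimal sketch: the Python types and the example IRIs are illustrative, not part of the thesis' formalization.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ContainerStructure:
    n: str          # IRI of the container
    g: frozenset    # its RDF graph, as a set of triples
    M: frozenset    # IRIs of the members of container n

@dataclass(frozen=True)
class NamedGraph:
    n: str          # IRI of the non-container
    g: frozenset    # its RDF graph

@dataclass
class LDPDataset:
    containers: set = field(default_factory=set)      # container structures
    non_containers: set = field(default_factory=set)  # named graphs

    def resources(self):
        """All resource IRIs present in the dataset."""
        return {c.n for c in self.containers} | {r.n for r in self.non_containers}

# Toy dataset mirroring the paris-catalog example used later in the deck:
ds = LDPDataset()
ds.containers.add(ContainerStructure(
    "http://ex.org/ldp/paris-catalog",
    frozenset(),
    frozenset({"http://ex.org/ldp/parking"})))
ds.non_containers.add(NamedGraph("http://ex.org/ldp/parking", frozenset()))
```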
31. LDP-DL: Overview (LDP Generation Workflow)
Starting point: a data source. Data design questions:
■ What are the LDP resources wrt the resources from the data source?
■ What is the structure of containers/non-containers?
■ What is the content of containers/non-containers?
32. LDP-DL: Overview
The workflow produces an LDP dataset from the data source (same design questions as above).
33. LDP-DL: Overview
Example LDP resource in the LDP dataset (Turtle):
dex:paris-catalog a ldp:BasicContainer ;
    foaf:primaryTopic ex:paris-catalog ;
    ldp:contains dex:parking , dex:busStation .
ex:paris-catalog a dcat:Catalog ;
    dcat:keyword "paris" , "dataset" ;
    …….
34. LDP-DL: Overview
An LDP design language describes LDP resources:
■ their IRIs
■ their organization in containers
■ their content (RDF graph)
■ the members of containers
36-37. LDP-DL: Overview
In the example above, ex:paris-catalog (the foaf:primaryTopic) is the related resource, and the triples describing it form the RDF graph of the LDP resource.
LDP-DL: Syntax
■ ResourceMap:
• Related resources identified by a Query Pattern
• RDF graph of LDP resources described by a Construct Query
■ NonContainerMap: describes non-containers
■ ContainerMap: describes containers and their members (containers or non-containers)
■ DataSource describes:
• RDF sources using their IRIs
• Non-RDF sources using:
─ IRIs of data sources
─ IRIs of lifting rules
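Putting the four constructs together, a design document could look roughly like the following sketch. Note that the `ldpdl:` prefix and all property names here are illustrative assumptions, not the normative LDP-DL vocabulary.

```turtle
# Illustrative sketch only: ldpdl: terms below are assumed, not normative
@prefix ldpdl: <http://example.org/ldpdl#> .
@prefix ex:    <http://example.org/> .

ex:catalogMap a ldpdl:ContainerMap ;
    # ResourceMap: the query pattern selects related resources,
    # the construct query builds the RDF graph of each LDP resource
    ldpdl:resourceMap [
        a ldpdl:ResourceMap ;
        ldpdl:queryPattern   "?rr a dcat:Catalog ." ;
        ldpdl:constructQuery "CONSTRUCT { ... } WHERE { ... }"
    ] ;
    # members of the container are described by a NonContainerMap
    ldpdl:nonContainerMap ex:datasetMap ;
    # DataSource: here an RDF source identified by its IRI
    ldpdl:dataSource [
        a ldpdl:DataSource ;
        ldpdl:location <http://example.org/catalog-data>
    ] .
```

A non-RDF source would additionally reference a lifting rule IRI instead of pointing directly at RDF data.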
LDP-DL Formal Semantics
■ Given an interpretation I and a design document d, we define the LDP dataset that we call the evaluation of d w.r.t. I
■ An LDP dataset D is valid w.r.t. a design document d iff there exists an interpretation I such that I ⊧ d and D is the evaluation of d w.r.t. I
■ We provide an algorithm that generates LDP datasets that are provably valid w.r.t. their input design documents
Handling Hosting Constraints
■ A dynamic LDP dataset stores instructions to generate the graphs of LDP resources
■ Using a dynamic LDP dataset:
• Generate the LDP dataset at deployment time
• Generate the graphs of LDP resources at query time
■ Deals with the dynamicity of data sources and with hosting constraints
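The contrast between static and dynamic entries can be sketched as follows. The `ex:dataSource`, `ex:liftingRule`, and `ex:graphQuery` terms are assumptions made for illustration, not the actual LDP-DL vocabulary.

```turtle
# Sketch only: ex: property names are assumed for illustration
@prefix ex:  <http://example.org/> .
@prefix dex: <http://data.example.org/> .

# A static entry would materialize the full RDF graph of dex:parking
# at deployment time.

# A dynamic entry instead stores only the instructions needed to
# (re)generate that graph, so it is produced at query time from the
# live source:
dex:parking
    ex:dataSource  <http://example.org/parking-live.json> ;
    ex:liftingRule <http://example.org/parking-rule.rqg> ;
    ex:graphQuery  "CONSTRUCT { ... } WHERE { ... }" .
```

This is what allows the same design to serve real-time sources without re-dumping the whole dataset on every change.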
LDP Generation Toolkit
*Lefrançois, Maxime, Antoine Zimmermann, and Noorani Bakerally.
"A SPARQL extension for generating RDF from heterogeneous
formats." European Semantic Web Conference. Springer, Cham, 2017.
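For non-RDF sources, the toolkit relies on the SPARQL extension cited above (SPARQL-Generate) for lifting rules. A sketch of such a rule follows; the source IRI and JSON structure are hypothetical.

```sparql
# Sketch of a lifting rule in the style of SPARQL-Generate;
# the source IRI and JSON layout are hypothetical.
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

GENERATE {
  <http://example.org/paris-catalog> dcat:keyword ?keyword .
}
SOURCE <http://example.org/catalog.json> AS ?source
ITERATOR iter:JSONPath(?source, "$.keywords[*]") AS ?keyword
```

The ITERATOR clause binds one `?keyword` per element of the JSON array, yielding one `dcat:keyword` triple each.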
Evaluation
■ Objective: Automate the generation of LDPs in highly decentralized information ecosystems by using Semantic Web technologies, considering the following constraints:
• Data Heterogeneity
• Hosting Constraints
• LDP Design Reusability
■ Evaluation criteria are derived from objective
Evaluation: Experiment Settings
■ 8 design documents
■ 28 data sources
• RDF data sources:
─ Open data catalogs from 21 data portals
─ BBC wildlife dataset
─ LodPaddle
• Heterogeneous data sources (JSON, CSV)
• Real-time data sources (JSON, CSV)
■ GitHub: https://github.com/noorbakerally/LDPDatasetExamples
■ Performance test done using a simple design document and
different data sources having a maximum of 1 million triples
• Performance is approximately linear
Evaluation: LDP Design Reusability
■ Domain Design Reusability Experiment: Same design document and varying data sources structured with the same ontology
■ Generic Design Reusability Experiment: Same design document and varying data sources structured with different ontologies
Outline
■ Semantic Web
■ LDP Generation Model
• LDP Generation Workflow
• LDP Design Language
■ LDP Generation Toolkit
■ Evaluation
■ Conclusion & Perspectives
Conclusion: Context
■ Definition of highly decentralized information ecosystems
• Identification of problems w.r.t. data exploitation
• Identification of requirements for data interoperability
■ Semantic Web standards as foundations to facilitate data publication
■ Data exploitation may be facilitated by providing tools to data publishers rather than only to data consumers
Conclusion: Summary of Contributions
■ LDP Generation Workflow
• LDP Design Language with:
─ A formal syntax to write LDP design documents
─ A formal semantics to properly interpret LDP design documents
• LDP Dataset
■ LDP Generation Toolkit: implementation of the LDP Generation Workflow
■ Evaluation of the LDP Generation Toolkit w.r.t. data heterogeneity, hosting constraints, and LDP design reusability
Conclusion: Limitations
■ Partial coverage of the LDP standard (e.g. Direct and Indirect Containers are not considered)
■ Limited handling of hosting constraints
■ Manual writing of LDP design documents
■ Manual writing of lifting rules
Perspectives
■ Enrich design aspects in LDP-DL Model
• Consider Direct & Indirect containers
• Provide deployment constructs to describe aspects such as:
─ Access rights
─ Paging
■ Generate Linked Data following the Data on the Web Best Practices [LBC17]
■ Provide LDP Generation methodology
■ Evaluate with real users of LDP
References
[BG14] Dan Brickley and Ramanathan V. Guha. RDF Schema 1.1. W3C
Recommendation, World Wide Web Consortium (W3C), February 25 2014.
[BL06] Tim Berners-Lee. Linked Data-Design Issues, 2006.
[CWL14] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation, World Wide Web Consortium (W3C), February 25 2014.
[DDR18] De Roure, David, et al. "Music sofa: An architecture for semantically informed
recomposition of digital music objects." Proceedings of the 1st International Workshop
on Semantic Applications for Audio and Music. ACM, 2018.
[FR07] R. B. France and B. Rumpe. Model-driven development of complex software: A
research roadmap. In FOSE, 2007.
[Gro13] W3C SPARQL Working Group. SPARQL 1.1 Overview. W3C Recommendation,
World Wide Web Consortium (W3C), March 21 2013.
[LIG+16] Loseto, Giuseppe, et al. "Linking the web of things: LDP-CoAP mapping."
Procedia Computer Science 83 (2016): 1182-1187.
[MGG13] Mihindukulasooriya, Nandana, Raúl García-Castro, and Miguel Esteban
Gutiérrez. "Linked Data Platform as a novel approach for Enterprise Application
Integration." COLD. 2013.
[MGG14] Mihindukulasooriya, Nandana Sampath, Miguel Esteban Gutiérrez, and Raul
García Castro. "A Linked Data Platform adapter for the Bugzilla issue tracker." (2014):
89-92.
[MPC+14] Mihindukulasooriya, Nandana, et al. "morph-LDP: an R2RML-based linked
data platform implementation." European Semantic Web Conference. Springer, Cham,
2014.
[SAM15c] Steve Speicher, John Arwe, and Ashok Malhotra. Linked Data Platform 1.0.
Technical report, World Wide Web Consortium (W3C), February 26 2015.
[SVB+06] T. Stahl, M. Volter, J. Bettin, A. Haase, and S. Helsen. Model-driven software
development: technology, engineering, management. Pitman, 2006.
[TRM18] Spieldenner, T., Schubotz, R., & Guldner, M. (2018, June). ECA2LD:
Generating Linked Data from Entity-Component-Attribute runtimes. In 2018 Global
Internet of Things Summit (GIoTS) (pp. 1-4). IEEE.
[W3C12] W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), December 11 2012.
LDP-DL Semantics
(𝞀: related resource, 𝜈: new LDP resource)
1. Evaluation of the query pattern qp returns the bindings {𝞀 ← ex:paris-catalog} and {𝞀 ← ex:toulouse-catalog}
2. For each binding, a new LDP resource is created
3. Consider {𝞀 ← ex:paris-catalog}
4. The new resource (𝜈) is dex:paris-catalog
5. To generate the graph of dex:paris-catalog, the construct query cq is evaluated on the source with the bindings {𝞀 ← ex:paris-catalog} and {𝜈 ← dex:paris-catalog}
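The walkthrough above assumes a query pattern qp and a construct query cq roughly like the following sketch. The variable names (?rr standing for 𝞀, ?new for 𝜈) and the DCAT shape of the source are assumptions for illustration.

```sparql
# Hypothetical qp: each solution binds ?rr (𝞀) to one related resource,
# here yielding ex:paris-catalog and ex:toulouse-catalog
SELECT ?rr WHERE { ?rr a dcat:Catalog . }

# Hypothetical cq: builds the RDF graph of the new LDP resource ?new (𝜈);
# it is evaluated with ?rr and ?new already bound by the workflow
CONSTRUCT {
  ?new a ldp:BasicContainer ;
       foaf:primaryTopic ?rr .
  ?rr dcat:keyword ?kw .
} WHERE {
  OPTIONAL { ?rr dcat:keyword ?kw }
}
```

Evaluating cq once per binding produces exactly the per-resource graphs shown in the earlier dex:paris-catalog example.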
■ Consider the evaluation of :dataset to generate the members of dex:paris-catalog
■ The members of dex:paris-catalog describe the dcat:Datasets of ex:paris-catalog (the related resource)
■ The evaluation of qp is done with the binding {π₁ ← ex:paris-catalog}