The Linked Data and Services presentation was delivered by Andreas Harth (KIT) and Barry Norton (KIT) at the PlanetData project Kick-off Meeting on October 11, 2010 in Palma de Mallorca, Spain.
The document discusses semantic search and how it can improve on traditional keyword-based search. It describes how semantic search can extend and refine search queries using ontologies and semantic metadata. This allows for more precise and complete search results. Semantic search also enables cross-referencing related information, exploratory search through semantic navigation, and reasoning over semantic data to infer implicit facts.
The document discusses exploratory semantic search using linked open data. It describes how a user could browse related entities in a knowledge graph starting from a book, following links to the author, other authors influenced by or influencing the first author, and their notable works. This allows the user to serendipitously discover related information without having to formulate a precise search query. The document also provides examples of exploring topics like space flights and accidents. Finally, it mentions exploratory search tools that augment video search using linked open data.
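As a loose illustration of this style of link-following, here is a small Python sketch against DBpedia's public SPARQL endpoint; the dbo:author and dbo:influencedBy properties and the starting book are illustrative choices, not taken from the document.

```python
# A minimal sketch of hop-by-hop exploration over DBpedia. The property
# URIs and the starting resource are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setReturnFormat(JSON)

def related(subject_uri, predicate):
    """Follow one link type outward from an entity and return the targets."""
    endpoint.setQuery(f"""
        SELECT ?o WHERE {{ <{subject_uri}> <{predicate}> ?o }} LIMIT 10
    """)
    results = endpoint.query().convert()
    return [b["o"]["value"] for b in results["results"]["bindings"]]

# Start from a book, hop to its author, then to authors who influenced them.
book = "http://dbpedia.org/resource/The_Old_Man_and_the_Sea"
for author in related(book, "http://dbpedia.org/ontology/author"):
    print("author:", author)
    for influence in related(author, "http://dbpedia.org/ontology/influencedBy"):
        print("  influenced by:", influence)
```

Each hop is just another query over the same graph, which is what makes this kind of serendipitous browsing cheap compared with formulating one precise query up front.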
This document discusses applications of linked data and semantic web technologies. It describes the linked open data cloud and prominent datasets like DBpedia. It provides statistics about the size and connectivity of linked open data. It also discusses ontologies, browsers, and search engines that facilitate working with linked data. Finally, it outlines the components needed to build linked data driven web applications and access linked data through SPARQL endpoints and libraries.
Talk delivered at the YOW! Developer Conferences in Melbourne, Brisbane and Sydney, Australia, on 1-9 December 2016.
Abstract: Governments collect a lot of data. Data on air quality, toxic chemicals, laws and regulations, public health, and the census are intended to be widely distributed. Some data is not for public consumption. This talk focuses on open government data — the information that is meant to be made available for the benefit of policy makers, researchers, scientists, industry, community organisers, journalists and members of civil society.
We’ll cover the evolution of Linked Data, which is now being used by Google, Apple, IBM Watson, federal governments worldwide, non-profits including CSIRO and OpenPHACTS, and thousands of others worldwide.
Next we’ll delve into the evolution of the U.S. Environmental Protection Agency’s Open Data service that we implemented using Linked Data and an Open Source Data Platform. Highlights include how we connected to hundreds of billions of open data facts in the world’s largest, open chemical molecules database PubChem and DBpedia.
WHO SHOULD ATTEND
Data scientists, software engineers, data analysts, DBAs, technical leaders and anyone interested in utilising linked data and open government data.
This document discusses linked data and its applications in the Web of Data. It describes the four principles of linked data: (1) using URIs to identify things, (2) using HTTP URIs so that these things can be looked up, (3) returning useful RDF information about each URI, and (4) including links to other URIs. Following these principles leads to interlinking data across the web and creating a Web of Data. The document outlines the increasing growth of the Web of Data since its inception in 2007.
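To make the principles concrete, here is a minimal Python sketch of principles (2)–(4): dereference an HTTP URI, request RDF via content negotiation, and inspect the returned triples. It uses the requests and rdflib libraries; the example URI is illustrative.

```python
# A minimal sketch of linked data principles (2) and (3): look up an HTTP
# URI and ask, via content negotiation, for RDF rather than HTML.
import requests
from rdflib import Graph

uri = "http://dbpedia.org/resource/Berlin"  # illustrative URI
response = requests.get(uri, headers={"Accept": "text/turtle"},
                        allow_redirects=True)

# Parse the returned RDF; triples whose objects are other HTTP URIs are
# exactly the outgoing links of principle (4).
g = Graph()
g.parse(data=response.text, format="turtle")
for s, p, o in list(g)[:10]:
    print(s, p, o)
```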
Slides of my talk at OSLCfest in Stockholm Nov 6, 2019
Video recording of the talk is available here:
https://www.facebook.com/oslcfest/videos/2261640397437958/
This document summarizes Sebastian Hellmann's PhD thesis on integrating natural language processing (NLP) data, tools, and applications with RDF and OWL. The thesis proposes creating datasets in RDF to facilitate data integration and linking. It describes converting Wiktionary and the Wortschatz corpus to RDF to create a linguistic linked data web. Standardized formats like POWLA are discussed for representing corpora on the web. The thesis also covers knowledge acquisition from resources like the Tiger Corpus Navigator and ontology learning from text using techniques like LExO.
This document discusses ontology design and development. It describes the ontology development process, which includes pre-development, development, and post-development activities. Development activities involve specification, conceptualization, formalization, and implementation. The document also outlines methodologies for ontology design, which guide the construction of consistent ontologies through management, development-oriented, and support activities. These activities work together to efficiently develop complex ontologies.
The document describes the aDORe Federation Architecture, which was developed to address challenges of scale in digital repositories. The key aspects are:
1) It is a 3-tier architecture that federates distributed digital object repositories to provide unified access to content.
2) The first tier consists of surrogate and sometimes datastream repositories that store metadata about digital objects and bitstreams.
3) The architecture leverages URIs to identify digital objects, surrogates, repositories and interfaces to allow federated access across repositories.
This document discusses verifying the integrity constraints of the Portuguese WordNet (OpenWordnet-PT) against the ontology for encoding wordnets. It was the first attempt to check correctness and improve the linguistic data by correcting errors found. Various types of errors were discovered, including datatype errors, domain and range errors, and structural errors. Explanations provided by reasoning tools helped identify and fix issues, improving the overall quality and accuracy of the OpenWordnet-PT resource.
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl... (Herbert Van de Sompel)
The document discusses the OAI-ORE Interoperability Framework in the context of current scholarly communication. It describes how OAI-ORE was funded and lists the editors. It then discusses how the current scholarly system is like a scanned paper system and outlines some technical trends emerging, including augmenting scholarship with machine-readable content, integrating datasets into the scholarly record, and exposing scholarly processes.
The document discusses creating Linked Open Data (LOD) microthesauri from the Art & Architecture Thesaurus (AAT). It defines a microthesaurus as a designated subset of a thesaurus that can function independently. The document provides an overview of creating an AAT-based LOD dataset for a digital art and architecture collection. It also demonstrates how to extract concept URIs and labels from the AAT thesaurus structure using SPARQL queries to build microthesauri.
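In that spirit, here is a hedged sketch of extracting concept URIs and English labels below one AAT concept via SPARQL. The Getty endpoint URL, the root concept ID, and the pure-SKOS modelling are assumptions for illustration, not the document's actual queries.

```python
# A hedged sketch of building a microthesaurus candidate: every concept
# below one AAT root, with its English label. Endpoint and modelling
# details are assumptions; adjust to the actual Getty service.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://vocab.getty.edu/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX aat:  <http://vocab.getty.edu/aat/>
    SELECT ?concept ?label WHERE {
        ?concept skos:broader+ aat:300264092 ;   # hypothetical root concept
                 skos:prefLabel ?label .
        FILTER (lang(?label) = "en")
    } LIMIT 50
""")
for b in sparql.query().convert()["results"]["bindings"]:
    print(b["concept"]["value"], "-", b["label"]["value"])
```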
Open library data and embrace the world library linked data (皓仁 柯)
The document discusses linked open data and library linked data. It provides an overview of semantic web and linked data principles. It then describes the linked data life cycle and implementation process. The document presents three case studies: 1) Linked data in the National Central Library of Taiwan including datasets and metadata terms. 2) Movie linked data. 3) Linked OPAC. It concludes by discussing the future of linked open data.
TranSMART: How open source software revolutionizes drug discovery through cro... (keesvb)
Presentation about the use of open source software in pharmaceutical companies at Global Discovery & Development Innovation Summit (GDDIS) in Princeton, NY, fall 2013.
- FactForge is a semantic data service that provides access to a large collection of heterogeneous linked open data through inference and a reference ontology.
- It allows exploration of inferred knowledge through SPARQL queries, an RDF search, and relationship browsing.
- Challenges include cleaning input data, detecting contradictions, consistency checking, and curating and upgrading the methodology. FactForge has been used to generate linked data from unstructured sources and integrate metadata.
The bX project: Federating and Mining Usage Logs from Linking Servers (Herbert Van de Sompel)
The document describes the bX project which aims to federate and mine usage log data from linking servers. It discusses analyzing local usage data, moving towards sharing federated usage data across institutions, and collaborating on the bX project. The goal is to mine the federated usage data to develop novel methods for evaluating scholarly resources based on usage patterns.
The document discusses linked data and services. It describes the linked data principles of using URIs to name things and including links between URIs. It then discusses querying linked data from multiple sources using either a materialization or distributed query processing approach. It proposes the concept of linked data services that adhere to REST principles and linked data principles by describing their input and output using RDF graph patterns. Integrating linked data services with linked open data could enable querying across both interconnected datasets and services.
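The following Python sketch illustrates the invocation style such services imply: a parameterized HTTP GET that returns RDF, so results can be merged directly with other linked data. The service URL and parameters are hypothetical.

```python
# A minimal sketch of the linked data services idea: a service is invoked
# by dereferencing a parameterized HTTP URI and returns RDF that can be
# merged with other linked data. The service URL below is hypothetical.
import requests
from rdflib import Graph

def call_lids(base, **params):
    """Call a linked data service; because its output is RDF, the result
    of a call can be added straight into a local graph."""
    response = requests.get(base, params=params,
                            headers={"Accept": "text/turtle"})
    g = Graph()
    g.parse(data=response.text, format="turtle")
    return g

# Hypothetical geocoding service that links a point to nearby entities.
result = call_lids("http://example.org/lids/findNearby", lat=49.0, lon=8.4)
print(len(result), "triples returned")
```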
The document discusses searching for answers to keyword queries in linked data. It presents the problem of keyword query routing, which aims to identify a valid set of data sources that can produce non-empty answers to a keyword query. It proposes using keyword-element relationship graphs at the element, schema, and data source levels to model relationships between keywords and data elements or sources. Experiments on a chunk of the Billion Triple Challenge dataset indicate that considering relationships between elements within a maximum path length performs better than considering only direct relationships, and identifies valid plans for multi-source queries.
This document provides a summary of a talk given by Tope Omitola on using linked data for world sense-making. The talk discussed EnAKTing, a project focused on building ontologies from large-scale user participation and querying linked data. It also covered publishing and consuming public sector datasets as linked data, including challenges around data integration, normalization and alignment. The talk concluded with a discussion of linked data services and applications developed by the project to enhance findability, search, and visualization of linked data.
The document introduces the concept of Linked Data and discusses how it can be used to publish structured data on the web by connecting data from different sources. It explains the principles of Linked Data, including using HTTP URIs to identify things, providing useful information when URIs are dereferenced, and including links to other URIs to enable discovery of related data. Examples of existing Linked Data datasets and applications that consume Linked Data are also presented.
The document discusses the concepts of the semantic web and linked data. It explains that the semantic web aims to convert the web into a single database that can be understood by machines through linking data using URIs, RDF, and other standards. It provides examples of projects like DBpedia and the Linking Open Data cloud that publish open government and other data as linked data. The document outlines some of the technologies and best practices for publishing and connecting data as linked data.
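A small sketch of that idea in Python with rdflib: local triples reuse a shared vocabulary (FOAF) and link to an existing DBpedia URI, which is what stitches isolated datasets into one queryable web. The example.org names are placeholders.

```python
# Local triples that reuse well-known URIs (FOAF, DBpedia) so they link
# into the wider web of data. Names under example.org are placeholders.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import FOAF, RDF

g = Graph()
ex = Namespace("http://example.org/")

g.add((ex.alice, RDF.type, FOAF.Person))
g.add((ex.alice, FOAF.name, Literal("Alice")))
# Linking to a DBpedia URI makes the local data part of the global graph.
g.add((ex.alice, FOAF.based_near, URIRef("http://dbpedia.org/resource/Berlin")))

print(g.serialize(format="turtle"))
```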
morning session talk at the second Keystone Training School "Keyword search in Big Linked Data" held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
The document summarizes an open genomic data project called OpenFlyData that links and integrates gene expression data from multiple sources using semantic web technologies. It describes how RDF and SPARQL are used to query linked data from sources like FlyBase, BDGP and FlyTED. It also discusses applications built on top of the linked data as well as performance and challenges of the system.
Linked Data, the Semantic Web, and You discusses key concepts related to Linked Data and the Semantic Web. It defines Linked Data as a set of best practices for publishing and connecting structured data on the web using URIs, HTTP, RDF, and other standards. It also explains semantic web technologies like RDF, ontologies, SKOS, and SPARQL that enable representing and querying structured data on the web. Finally, it discusses how libraries are applying these concepts through projects like BIBFRAME, FAST, library linked data platforms, and the LD4L project to represent bibliographic data as linked open data.
Linked Open Data projects aim to extend the web of documents to a web of linked data by adding semantics through standards like RDF and ontologies. The Linked Open Data cloud has grown significantly since 2007 and contains billions of RDF triples and links between data sources. Projects like LOD2 build on this by developing technologies and linking more open datasets to enable new applications. For Linked Data to achieve its full potential, openness and allowing free access and reuse is important, though it does mean losing some control over data usage.
The document discusses querying live linked data from millions of diverse data sources on the web. It presents different approaches for source selection when querying over dynamic linked data, including using indexes, data summaries, and direct execution. Evaluation of the approaches shows that combining querying of static RDF stores and the live web through source selection dynamics can improve query time and return fresher results.
Linked Data Driven Data Virtualization for Web-scale Integration (rumito)
- Linked data and data virtualization can help address challenges of growing data heterogeneity, complexity, and need for agility by providing a common data model and identifiers.
- Linked data uses RDF to represent information as graphs of triples connected by URIs, allowing different data sources to be integrated and queried together.
- As more data is published using common vocabularies and linking to existing URIs, it increases opportunities for discovery, integration and novel ways to extract value from diverse data sources.
The Linked Data Research Centre (LiDRC) is a new effort within DERI to advance linked data research and development. The LiDRC operates across existing units and has 11 DERI researchers working with 9 international peers. Its research themes include publishing, discovery, application domains, and streamed linked data. The LiDRC contributes linked data infrastructure, provides tools and libraries, and participates in standards activities. It is calling for input on a technical report about linked data applications.
Information Extraction and Linked Data Cloud (Dhaval Thakker)
The document discusses Press Association's semantic technology project which aims to generate a knowledge base using information extraction and the Linked Data Cloud. It outlines Press Association's operations and workflow, and how semantic technologies can be used to develop taxonomies, annotate images, and extract entities from captions into an ontology-based knowledge base. The knowledge base can then be populated and interlinked with external datasets from the Linked Data Cloud like DBpedia to provide a comprehensive, semantically-structured source of information.
Make our Scientific Datasets Accessible and Interoperable on the Web (Franck Michel)
The presentation investigates the challenges that we must face to share scientific datasets on the Web following the Linked Open Data principles. We present the standards of the Semantic Web and investigate how they can help address those challenges. We give tips as to how to choose vocabularies to describe data and metadata, link datasets to other related datasets by making appropriate alignments, translate existing data sources to RDF and publish it on the Web as linked data.
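As a sketch of the "translate existing data sources to RDF" tip, the following Python fragment turns CSV rows into RDF described with Dublin Core terms; the file name, column names, and base URI are assumptions for illustration.

```python
# Each CSV row becomes a resource described with a reused vocabulary
# (Dublin Core here); file, columns, and base URI are placeholders.
import csv
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS

BASE = Namespace("http://example.org/dataset/")
g = Graph()

with open("observations.csv", newline="") as f:  # hypothetical input file
    for row in csv.DictReader(f):
        subject = URIRef(BASE + row["id"])
        g.add((subject, DCTERMS.title, Literal(row["title"])))
        g.add((subject, DCTERMS.date, Literal(row["date"])))

g.serialize(destination="observations.ttl", format="turtle")
```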
Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take full advantage of the data without a priori statistical information about its internal structure and coverage. In fact, there are already a number of tools which offer such statistics, providing basic information about RDF datasets and vocabularies. However, those usually show severe deficiencies in terms of performance once the dataset size grows beyond the capabilities of a single machine. In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines. More specifically, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark. The preliminary results show that our distributed approach improves upon a previous centralized approach we compare against and provides approximately linear horizontal scale-up. The set of criteria is extensible beyond the 32 default criteria; the component is integrated into the larger SANSA framework and employed in at least four major usage scenarios beyond the SANSA community.
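A toy, non-SANSA illustration of the approach: computing a few of the simpler criteria over an N-Triples file with Apache Spark's Python API. The naive triple parser is a simplification for the sketch.

```python
# Compute simple dataset criteria (triple count, distinct predicates,
# distinct subjects) over an N-Triples file with Spark. Illustrative
# sketch only, not the SANSA implementation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdf-stats").getOrCreate()

def parse_triple(line):
    # Naive N-Triples split; real parsers must handle literals with spaces.
    s, p, o = line.strip().rstrip(" .").split(" ", 2)
    return s, p, o

triples = spark.sparkContext.textFile("dataset.nt").map(parse_triple).cache()

print("triples:            ", triples.count())
print("distinct predicates:", triples.map(lambda t: t[1]).distinct().count())
print("distinct subjects:  ", triples.map(lambda t: t[0]).distinct().count())
```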
Efficient source selection for SPARQL endpoint federation (Muhammad Saleem)
Muhammad Saleem defended his PhD thesis on efficient source selection for SPARQL endpoint query federation. The thesis addressed five main research questions: (1) how to perform join-aware source selection while ensuring complete result sets, (2) how to perform duplicate-aware source selection, (3) how to perform policy-aware source selection, (4) how to perform data distribution-aware source selection, and (5) how to design comprehensive benchmarks for federated SPARQL queries and triple stores. The thesis proposed four source selection algorithms (HIBISCUS, DAW, SAFE, TopFed) and two benchmarking systems (LargeRDFBench, FEASIBLE) to address the identified research questions.
This document discusses linked data and its use for publishing and connecting environmental data on the web. It describes how linked data allows data to work like web pages by using URIs and standards like RDF to connect related information. The document provides an overview of linked data basics including its underlying structure using triples, standards for formatting and sharing data, and techniques for querying linked data using SPARQL similar to SQL. It also discusses ongoing work by the EPA and other organizations to publish environmental and geospatial data as linked open data.
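To show the SQL analogy the document draws, here is a hedged SPARQL example from Python; the endpoint and the ex: properties are placeholders, not the actual EPA vocabulary.

```python
# A SPARQL SELECT reads much like SQL, but matches graph patterns instead
# of table rows. Endpoint and property names here are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/epa/sparql")  # hypothetical
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX ex: <http://example.org/epa/terms/>
    SELECT ?facility ?pollutant ?amount WHERE {
        ?facility ex:released ?release .
        ?release  ex:pollutant ?pollutant ;
                  ex:amount    ?amount .
    }
    ORDER BY DESC(?amount)
    LIMIT 10
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["facility"]["value"], row["pollutant"]["value"],
          row["amount"]["value"])
```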
This document describes a Contextualized Knowledge Repository (CKR) framework that allows for representing and reasoning with contextual knowledge on the Semantic Web. The CKR extends the description logic SROIQ-RL to include defeasible axioms in the global context. Defeasible axioms can be overridden by local contexts, allowing exceptions. The CKR is composed of two layers - a global context containing metadata and defeasible axioms, and local contexts containing object knowledge with references. An interpretation of a CKR maps local contexts to description logic interpretations over the object vocabulary, respecting references between contexts.
The document describes a Contextualized Knowledge Repository (CKR) framework for representing and reasoning with contextual knowledge on the Semantic Web. It discusses the need to make context explicit in the Semantic Web in order to represent knowledge that holds in specific contextual spaces like time, location, or topic. The CKR is presented as a formalism based on description logics that defines contexts as first-class objects and allows associating knowledge with contexts. It describes a prototype CKR implementation and outlines how a CKR could be used to represent open data about the Trentino region with contextual metadata.
This document discusses leveraging crowdsourcing techniques and consistency constraints to optimize the reconciliation of schema matching networks. It proposes:
1) Defining consistency constraints within schema matching networks and designing validation questions for crowdsourced workers.
2) Using consistency constraints to reduce reconciliation error rates and the monetary cost of asking additional validation questions.
3) Modeling a crowdsourcing process for schema matching networks that aims to minimize cost while maximizing accuracy through the application of consistency constraints.
This document discusses privacy-preserving schema reuse. It introduces the challenges of defining privacy constraints, generating an anonymized schema from multiple schemas while satisfying privacy constraints, defining a utility function for anonymized schemas, and solving the optimization problem of finding the anonymized schema with the highest utility that satisfies all privacy constraints. Experimental results demonstrate the trade-off between privacy enforcement and utility loss. The solution presents an approach for generating anonymized schemas from multiple schemas in a privacy-preserving manner.
Authors: Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1),
Avigdor Gal (3), and Matthias Weidlich (4)
1 École Polytechnique Fédérale de Lausanne
2 Université de Rennes 1
3 Technion – Israel Institute of Technology
4 Imperial College London
This document summarizes a demo of using SPARQLstream and Morphstreams to visualize transport data from Madrid's public transport company (EMT) in a tablet application. Static EMT data like bus stop locations are extracted and mapped to RDF, while live bus waiting time data streams are transformed and queried in real-time. This allows a Map4RDF iOS app to retrieve bus stop information and lookup estimated arrival times using SPARQL and SPARQLstream queries. The demo illustrates how standards like SSN and R2RML can integrate static and streaming sensor data for web-based applications.
The document discusses the need for a W3C community group on RDF stream processing. It notes there is currently heterogeneity in RDF stream models, query languages, implementations, and operational semantics. The speaker proposes creating a W3C community group to better understand these differences, requirements, and potentially develop recommendations. The group's mission would be to define common models for producing, transmitting, and continuously querying RDF streams. The presentation provides examples of use cases and outlines a template for describing them to collect more cases to understand requirements.
by Irene Celino, Simone Contessa, Marta Corubolo, Daniele Dell’Aglio, Emanuele Della Valle, Stefano Fumeo and Thorsten Krüger
CEFRIEL – Politecnico di Milano – SIEMENS
This document describes SciQL, a language that bridges the gap between science and relational database management systems (DBMS). SciQL allows for the seamless integration of relational and array paradigms within DBMSs. It defines arrays and tables as first-class citizens and supports named dimensions, flexible structure-based grouping, and the distinction between arrays and tables. SciQL aims to lower the barrier for scientists to use DBMSs for array-based data while revealing new optimization opportunities for databases.
by G. Larkou, J. Metochi, G. Chatzimilioudis and D. Zeinalipour-Yazti
Presented at: 1st IEEE International Workshop on Mobile Data Management Mining and Computing on Social Networks, collocated with IEEE MDM'13
This document summarizes research on implementing defeasible logic, a non-monotonic reasoning method, in a distributed manner using the MapReduce framework. Defeasible logic allows commonsense reasoning over low-quality data and has low computational complexity. However, existing implementations did not scale to huge datasets. The researchers developed a multi-argument MapReduce implementation of defeasible logic that distributes the reasoning process. Experimental evaluation on large datasets showed this approach provides scalable defeasible reasoning over distributed data. Future work will address challenges with non-stratified rulesets and test the approach on additional real-world applications and knowledge representation methods.
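A single-machine toy showing the map/reduce shape the paper distributes: map emits candidate defeasible conclusions per individual, reduce resolves conflicts by rule priority. The bird/penguin ruleset is the standard textbook example, not the paper's data.

```python
# Toy defeasible reasoning in map/reduce shape. Higher-priority rules
# defeat lower-priority ones when their conclusions conflict.
from collections import defaultdict

facts = [("tweety", "bird"), ("pingu", "bird"), ("pingu", "penguin")]

# (premise, conclusion, priority): higher priority defeats lower.
rules = [("bird", "flies", 1), ("penguin", "not_flies", 2)]

def map_phase(fact):
    individual, predicate = fact
    for premise, conclusion, priority in rules:
        if predicate == premise:
            yield individual, (conclusion, priority)

def reduce_phase(candidates):
    # Keep only the highest-priority conclusion per individual.
    return max(candidates, key=lambda c: c[1])[0]

grouped = defaultdict(list)
for fact in facts:
    for key, value in map_phase(fact):
        grouped[key].append(value)

for individual, candidates in grouped.items():
    print(individual, "->", reduce_phase(candidates))
# tweety -> flies, pingu -> not_flies
```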
This document discusses data and knowledge evolution on the semantic web. It begins by explaining the limitations of the current web in representing semantic content and introduces the semantic web as a way to give data well-defined meaning. It then discusses how ontologies and datasets are used to describe semantic data and how datasets are dynamic and change over time. It also introduces linked open data as a way to interconnect datasets and the challenges this presents. Finally, it outlines the scope of the talk, which is to survey research areas related to managing dynamic linked datasets, including remote change management, repair, and data/knowledge evolution.
This document discusses evolving workflow provenance information in the presence of custom inference rules. It presents three inference rules for provenance data, including that actors are associated with all subactivities if one activity, objects and their parts are used together, and information objects are present where physical objects carrying them are. It examines handling updates to provenance knowledge bases using these rules either by deleting all inferred facts or only as needed, and considers complexity of different approaches.
This document discusses access control for RDF graphs using abstract models. It presents an abstract access control model defined using abstract tokens and operators to model the computation of access labels for inferred RDF triples. The model supports dynamic datasets and policies. Experiments show that annotation time increases with the number of implied triples, while evaluation time increases linearly with the total number of triples. The abstract model approach allows different concrete access control policies to be applied to the same dataset.
Here are a few ways SciQL could help with this seismology use case:
1. The mseed array allows storing and querying the large seismic data in an efficient columnar format.
2. Window-based aggregation with dimensional grouping enables filtering signals by station/LTA ratios over time windows.
3. Views and queries on dimensional groups facilitate removing false positives by comparing signals across nearby stations over time.
4. Further window-based grouping and UDFs can extract signal windows for additional heuristic analysis.
By integrating the array and relational models, SciQL provides a declarative way to analyze large multidimensional scientific datasets like seismic signals interactively.
This talk was given by FORTH, Greece, at the European Data Forum (EDF) 2012, which took place on June 6-7, 2012 in Copenhagen (Denmark) at the Copenhagen Business School (CBS).
Abstract:
Given the increasing amount of sensitive RDF data available on the Web, it becomes increasingly critical to guarantee secure access to this content. Access control is complicated when RDFS inference rules and other dependencies between access permissions of triples need to be considered; this is necessary, e.g., when we want to associate the access permissions of inferred triples with the ones that implied it. In this paper we advocate the use of abstract provenance models that are defined by means of abstract tokens operators to support fine grained access control for RDF graphs. The access label of a triple is a complex expression that encodes how said label was produced (i.e., the triples that contributed to its computation). This feature allows us to know exactly the effects of any possible change, thereby avoiding a complete recomputation of the labels when a change occurs. In addition, the same application can choose to enforce different access control policies or, different applications can enforce different policies on the same data, avoiding the recomputation of the label of a triple. Preliminary experiments have shown the applicability and benefits of our approach.
This talk was given at the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR 2012), held in Rome, Italy, June 10-14, 2012, by Ilias Tahmazidis (FORTH).
Abstract:
We are witnessing an explosion of available data from the Web, government authorities, scientific databases, sensors and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling the vast amounts of data for these applications. In this paper, we consider nonmonotonic reasoning, which has traditionally focused on rich knowledge structures. In particular, we consider defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge data sets. Our experimental results demonstrate that defeasible reasoning over billions of facts is performant, and has the potential to scale to trillions of facts.
The presentation was delivered during the 1st International Conference on Health Information Science (HIS 2012) on April 9th, 2012 in Beijing, China.
Abstract:
In cytomics, bookkeeping of the data generated during lab experiments is crucial. The current approach in cytomics is to conduct High-Throughput Screening (HTS) experiments so that cells can be tested under many different experimental conditions. Given the large number of different conditions and the readout of the conditions through images, it is clear that the HTS approach requires a proper data management system to reduce the time needed for experiments and the chance of man-made errors. As different types of data exist, the experimental conditions need to be linked to the images produced by the HTS experiments with their metadata and the results of further analysis. Moreover, HTS experiments never stand by themselves: as more experiments are lined up, the amount of data and computations needed to analyze these increases rapidly. To that end, cytomic experiments call for automated and systematic solutions that provide convenient and robust features for scientists to manage and analyze their data. In this paper, we propose a platform for managing and analyzing HTS images resulting from cytomics screens, taking the automated HTS workflow as a starting point. This platform seamlessly integrates the whole HTS workflow into a single system. The platform relies on a modern relational database system to store user data and process user requests, while providing a convenient web interface to end-users. By implementing this platform, the overall workload of HTS experiments, from experiment design to data analysis, is reduced significantly. Additionally, the platform provides the potential for data integration to accomplish genotype-to-phenotype modeling studies.
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
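For a flavour of what such an implementation looks like from Python, here is a compact sketch of a $vectorSearch aggregation; the connection string, database, collection, index name, and the dummy query vector are placeholders.

```python
# A compact sketch of an Atlas vector search call. The $vectorSearch stage
# shape follows the Atlas documentation; all names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
collection = client["shop"]["products"]  # hypothetical names

# Replace with a real embedding of the query text from your model.
query_vector = [0.1] * 1536

results = collection.aggregate([
    {"$vectorSearch": {
        "index": "product_embeddings",   # hypothetical index name
        "path": "embedding",
        "queryVector": query_vector,
        "numCandidates": 100,
        "limit": 5,
    }},
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
])
for doc in results:
    print(doc["name"], doc["score"])
```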
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, as well as RubyGems and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
OpenID AuthZEN Interop Read Out - Authorization (David Brossard)
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Webinar: Designing a schema for a Data Warehouse (Federico Razzoli)
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, including databases of any type that back the applications used by the company, data files exported by some applications, and APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which requires gathering information about the business processes that need to be analysed in the first place. These processes must be translated into so-called star schemas, that is, denormalised schemas in which each table represents either a dimension or facts.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
Monitoring and Managing Anomaly Detection on OpenShift (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
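As a notebook-style taste of the modelling step described above, here is a short, self-contained sketch using scikit-learn's Isolation Forest on synthetic sensor data; the presentation's actual notebooks may differ.

```python
# An Isolation Forest trained on normal sensor readings and used to flag
# anomalous ones. Data here is synthetic for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))   # training data
model = IsolationForest(contamination=0.01, random_state=42).fit(normal)

# Score new readings: -1 marks an anomaly, 1 marks normal behaviour.
new_readings = np.vstack([rng.normal(size=(5, 3)),
                          np.array([[8.0, -7.5, 9.1]])])  # one obvious outlier
print(model.predict(new_readings))   # e.g. [ 1  1  1  1  1 -1]
```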
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics will be covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes and functional/test users
- Real-world examples and best practices you can apply immediately
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT stylesheets and schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating, explaining, or refactoring code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Skybuffer SAM4U tool for SAP license adoption (Tatiana Kojar)
Manage and optimize your license adoption and consumption with SAM4U, a free SAP software asset management tool for customers.
SAM4U delivers a detailed and well-structured overview of license inventory and usage through a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring a fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes introduced by CCS TSI 2023 at the largest Czech conference on railway communications and signalling systems, held at the Clarion Hotel Olomouc from 7 to 9 November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
1. Linked Data and Services
Andreas Harth and Barry Norton
Institute AIFB
KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association www.kit.edu
2. Outline
! Motivation
! Linked Data Principles
! Query Processing over Linked Data
! Linked Data Services (LIDS) and Linked Open Services (LOS)
! Conclusion
3. Motivation
! Semantic Web/Linked Data technologies are well-suited for data integration
[Figure: a common data format and access protocol enables data integration and interactive data exploration]
4. Linked Data Principles*
1. Use URIs to name things; not only documents, but also people, locations, concepts, etc.
2. To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs
3. When someone looks up a URI, provide useful information; by 'useful' we usually mean structured data in RDF
4. Include links to other URIs, allowing agents (machines and humans) to discover more things
(*) http://www.w3.org/DesignIssues/LinkedData.html
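The principles translate directly into code. Below is a minimal sketch, assuming the Python `requests` library, of principles 2 and 3: looking up a name over HTTP and asking for structured data via content negotiation. The example URI appears on the next slide and may no longer resolve.

# Minimal sketch of principles 2 and 3: look up an HTTP URI and request RDF.
import requests

def dereference(uri: str) -> str:
    """Fetch a description of the thing named by `uri`, preferring RDF."""
    resp = requests.get(uri, headers={"Accept": "application/rdf+xml"})
    resp.raise_for_status()
    return resp.text

print(dereference("http://www.polleres.net/foaf.rdf")[:200])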
5. Correspondence between thing-URI and source-URI
[Figure: a user agent performs an HTTP GET on the thing-URI http://www.polleres.net/foaf.rdf#me; the fragment is stripped from the request, and the web server returns RDF for the source document http://www.polleres.net/foaf.rdf]
6. Correspondence between thing-URI and source-URI
[Figure: a user agent performs an HTTP GET on the thing-URI http://dbpedia.org/resource/Gordon_Brown; the server answers with HTTP 303, redirecting RDF clients to http://dbpedia.org/data/Gordon_Brown and HTML clients to http://dbpedia.org/page/Gordon_Brown; a second GET retrieves the data]
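A hedged sketch of the 303 dance in the figure, again assuming `requests` and that DBpedia still serves these URIs this way: the library follows redirects automatically, and `resp.history` exposes the intermediate 303 response.

# Sketch: observe the 303 redirect from the thing-URI to the data document.
import requests

resp = requests.get("http://dbpedia.org/resource/Gordon_Brown",
                    headers={"Accept": "application/rdf+xml"})
for hop in resp.history:                      # the intermediate 303 response(s)
    print(hop.status_code, "->", hop.headers.get("Location"))
print("final document:", resp.url)            # e.g. .../data/Gordon_Brown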
8. Queries over Linked Data
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
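A minimal sketch of evaluating the first query with rdflib (an assumption; any SPARQL engine would do). The document behind the slide's `an:f` prefix is also an assumption here, so treat the URI as a placeholder. Note that friends' names typically live in the friends' own documents, which is exactly the link-traversal problem the following slides address.

# Sketch: evaluate the FOAF query over a single fetched document.
import rdflib

g = rdflib.Graph()
g.parse("http://harth.org/andreas/foaf.rdf")  # assumed source for an:f#ah
for f, n in g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?f ?n WHERE {
        <http://harth.org/andreas/foaf#ah> foaf:knows ?f .
        ?f foaf:name ?n .
    }"""):
    print(f, n)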
9. Querying Data Across Sources
! Data warehousing or materialisation-based approaches (MAT): crawl → index → serve
! Distributed query processing approaches (DQP): a mediator evaluates SELECT * FROM … by joining data from remote sources R and S
10. DQP on Linked Data
[Figure: classic DQP evaluates SELECT * FROM … by joining relations R and S accessed via ODBC; DQP on Linked Data evaluates SELECT ?s WHERE … by decomposing it into triple patterns (TP) answered via HTTP GET]
11. Query Processing Overview
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
[Figure: the query is decomposed into the triple patterns (an:f#ah foaf:knows ?f) and (?f foaf:name ?n); for each pattern, source selection picks candidate sources, which are fetched via HTTP GET as RDF; the joined bindings yield ?f = http://danbri.org/foaf.rdf#danbri, ?n = "Dan Brickley"]
12. Barry
13. Problem: Source Selection for Triple Patterns
! (?s ?p ?o)
! (#s ?p ?o)
! (?s #p ?o)
! (?s ?p #o)
! (#s #p ?o)
! (#s ?p #o)
! (?s #p #o)
! (#s #p #o)
! Given a triple pattern, which sources can contribute bindings for it?
14. Schema-Level Indices [Stuckenschmidt et al. 2004]
! Keep an index of properties and/or classes contained in sources
! (?s #p ?o), (?s rdf:type #o)
! Covers only queries containing schema-level elements
! Commonly used properties potentially select far too many sources
SELECT ?f ?n WHERE {
  an:f#ah foaf:knows ?f.
  ?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
  dblppub:HoganHP08 dc:creator ?a1.
  ?x1 owl:sameAs ?a1.
  ?x2 foaf:knows ?x1.
}
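A toy sketch of such a schema-level index: a map from property URI to the sources that use it. The example data is made up; it illustrates why a common property like foaf:knows selects almost every FOAF document on the Web.

# Toy schema-level index: property URI -> sources known to use it.
from collections import defaultdict

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
index = defaultdict(set)

def register(source, *properties):
    for p in properties:
        index[p].add(source)

def select_sources(prop):
    """All sources that might contribute bindings for (?s #p ?o)."""
    return index.get(prop, set())

register("http://example.org/alice.rdf", FOAF_KNOWS)
register("http://example.org/bob.rdf", FOAF_KNOWS)
print(select_sources(FOAF_KNOWS))  # every FOAF file matches: over-selection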
15. Direct Lookup (DL) [Hartig et al. 2009]
! Exploits the correspondence between thing-URI and source-URI
! Linked Data sources (aka RDF files) typically return triples whose subject corresponds to the source
! Sometimes sources also return triples whose object corresponds to the source
! (#s ?p ?o), (#s #p ?o), (#s #p #o)
! (?s ?p #o), (?s #p #o)
! Incomplete wrt. patterns, but also wrt. URI reuse across sources
! Limited parallelism; unclear how to schedule lookups
SELECT ?f ?n WHERE {
  an:f#ah foaf:knows ?f.
  ?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
  dblppub:HoganHP08 dc:creator ?a1.
  ?x1 owl:sameAs ?a1.
  ?x2 foaf:knows ?x1.
}
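A minimal sketch of Direct Lookup source selection, under the assumption stated above that a document describes the URIs minted within it: for a pattern with a bound subject or object, the only candidate source is the document behind that URI, with the fragment stripped.

# Sketch of Direct Lookup: bound subject/object -> dereference that document.
def dl_sources(s, p, o):
    """Candidate source URIs for a triple pattern; None marks a variable."""
    candidates = set()
    for term in (s, o):          # predicate URIs are not dereferenced here
        if term is not None:
            candidates.add(term.split("#")[0])  # strip fragment -> document URI
    return candidates

# (#s ?p ?o): look up the subject's document only
print(dl_sources("http://www.polleres.net/foaf.rdf#me", None, None))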
16. Approximate Data Summaries
! Combined description of schema-level and instance-level elements
! Use approximation to reduce index size (incurs false positives)
! Possible to use entire query for source selection
! Parallel lookups since sources can be determined for the entire query
! (?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o)
! and combinations of triple patterns
SELECT ?f ?n WHERE {
  an:f#ah foaf:knows ?f.
  ?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
  dblppub:HoganHP08 dc:creator ?a1.
  ?x1 owl:sameAs ?a1.
  ?x2 foaf:knows ?x1.
}
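A toy sketch in the spirit of these summaries: hash every term of a source into a small bit set, Bloom-filter style, trading index size for false positives. The actual data summaries behind the slides are more sophisticated; this only illustrates the approximation idea.

# Toy approximate summary: small bit set per source, with false positives.
HASH_BITS = 64

def summarise(triples):
    bits = 0
    for triple in triples:
        for term in triple:
            bits |= 1 << (hash(term) % HASH_BITS)
    return bits

def may_contain(summary, term):
    """False means definitely absent; True may be a false positive."""
    return bool(summary & (1 << (hash(term) % HASH_BITS)))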
17. Implementation
! Deploy wrappers "in the cloud"
! Google App Engine: hosting of Java and Python webapps on Google's cloud infrastructure
! Limited amount of processing time (6 hrs/day)
! Single-threaded applications
! Suited for deploying wrappers
! e.g. http://twitter2foaf.appspot.com/ converts Twitter user data to RDF
18. Linking Open Data Cloud 2007
19. Linking Open Data Cloud 2008
20. Linking Open Data Cloud 2009
21. Linking Open Data Cloud 2010
22. Geonames Services
23. Geonames Services
24. Geonames Services
{"weatherObservation":
{"clouds":"broken clouds",
"weatherCondition":"drizzle",
"observation":"LESO 251300Z 03007KT
340V040 CAVOK 23/15 Q1010",
"windDirection":30,
"ICAO":"LESO", ...
26. Linked Open Service Principles
REST Principles
1. Application state and functionality are divided into resources
2. Every resource is uniquely addressable
3. All resources share a uniform interface:
a) A constrained set of well-defined operations
b) A constrained set of content types
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using
the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
Linked Open Service Principles
1. Describe services as LOD prosumers with input and output
descriptions as SPARQL graph patterns
2. Communicate RDF by RESTful content negotiation
3. The output should make explicit its relation with the input
27. LOS Weather Service
28. LOS Geo Resources
29. Resource-Based Linked Open Services
[Figure: Linked Data resource — GET with Accept: text/html is answered by 303 REDIRECT /page; GET with Accept: application/rdf+xml (or text/n3) is answered by 303 REDIRECT /data. Linked Service — GET /weather with Accept: application/rdf+xml (or text/n3) is answered directly by 200 <rdf:Description>]
30. Interlinking Data with Data from Services?
31. Data Services
! Given input, provide output
! Input and output are related in a service-specific way
! Do not change the state of the world
[Figure: the service defines a relation between input and output]
! E.g. the GeoNames findNearbyWikipedia service
! Input: lat/lon
! Output: places
! Relation: output places are near the input point
32. Linked Data Services
! We’d like to integrate data services with Linked Data
1. LIDS need to adhere to Linked Data principles
! We’d like to use data services in software programs
2. LIDS need machine-readable descriptions of input and output
! Compared to the naïve approach (assigning a URI to the service output):
! The relationship between input and output is explicitly described
! Dynamicity is supported
33. 1. Data Services as Linked Data
! Input is given as a URI composed of service endpoint, parameters, and a local input identifier:
http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#point
! Resolving the URI yields RDF relating the input point to the output:
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix : <http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#> .
:point
  foaf:based_near dbp:Palo_Alto%2C_California ;
  foaf:based_near dbp:Packard%27s_garage .
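A sketch of calling this LIDS and reading the returned RDF, assuming rdflib; the endpoint from the slide may no longer be online.

# Sketch: resolve the geowrap input URI and list nearby places.
import rdflib

uri = "http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152"
FOAF = rdflib.Namespace("http://xmlns.com/foaf/0.1/")

g = rdflib.Graph()
g.parse(uri)                                   # resolve the input URI
point = rdflib.URIRef(uri + "#point")          # the input entity
for place in g.objects(point, FOAF.based_near):
    print(place)                               # nearby DBpedia resources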
34. 2. LIDS Descriptions
! LIDS are characterised by
! Endpoint URI ep, which is the base for all input entities
! Local identifier i of input entity
! List of parameters Xi
! Basic graph pattern Ti describing conditions on parameters
! Basic graph pattern To describing minimum output data
! Example:
ep = <http://geowrap.openlids.org/findNearbyWikipedia>
i = point
Xi = {?lat, ?lng}
Ti = ?point a Point . ?point geo:lat ?lat .
?point geo:long ?lng
To = ?point foaf:based_near ?feature
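The five components lend themselves to a plain record. A minimal sketch, filled with the example values from this slide (the class name is hypothetical):

# Sketch: a LIDS description as a plain record of the five components above.
from dataclasses import dataclass

@dataclass
class LidsDescription:
    ep: str         # endpoint URI, base for all input entities
    i: str          # local identifier of the input entity
    Xi: list        # parameter variables
    Ti: str         # basic graph pattern constraining the parameters
    To: str         # basic graph pattern describing the minimum output

geowrap = LidsDescription(
    ep="http://geowrap.openlids.org/findNearbyWikipedia",
    i="point",
    Xi=["lat", "lng"],
    Ti="?point a Point . ?point geo:lat ?lat . ?point geo:long ?lng",
    To="?point foaf:based_near ?feature",
)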
35. Interlink LIDS and Linked Data
! Generate service URIs with input bindings from evaluating: SELECT Xi WHERE { Ti }
! Link each matched binding for i to its service URI via sameAs (see the sketch below)
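A hedged sketch of this interlinking step, reusing the hypothetical LidsDescription record above: evaluate the pattern over a local graph with rdflib, mint one service URI per binding, and connect it to the matched entity with owl:sameAs. It assumes Ti uses full IRIs or prefixes known to the query engine.

# Sketch: mint service URIs from bindings and add sameAs links.
import rdflib
from urllib.parse import urlencode

OWL = rdflib.Namespace("http://www.w3.org/2002/07/owl#")

def interlink(g, d):
    query = f"SELECT ?{d.i} ?" + " ?".join(d.Xi) + f" WHERE {{ {d.Ti} }}"
    for row in g.query(query):
        params = urlencode({x: str(row[x]) for x in d.Xi})
        service_uri = rdflib.URIRef(f"{d.ep}?{params}#{d.i}")
        g.add((row[d.i], OWL.sameAs, service_uri))  # new link into the LIDS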
36. Scale-Up Experiment: Link BTC to GeoNames
! 3 billion triples from the Billion Triple Challenge (BTC) 2010 data set
! Annotated with the LIDS wrapper of the GeoNames findNearby service
! Annotation time: < 12 hours on a laptop!
! ~12 hours for uncompressing the data set, cleaning results, and gathering statistics
! Original BTC data: 74 different domains linked to GeoNames URIs
! The interlinking process added 891 new domains now linked to the LIDS geowrap
! In total, 2,448,160 new links were added
37. Query Answering using LIDS and Linked Data
! Query execution resolves URIs
! => enlarges the data set
! LIDS are interlinked
! The query is executed again on the new data set
! Repeat until a round yields no new links and no new data
! Combine results (a sketch follows below)
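A minimal sketch of that loop, again assuming rdflib: run the query, dereference URIs from the bindings to enlarge the data set, interlink LIDS, and repeat until a round adds no new triples. A round bound avoids crawling indefinitely.

# Sketch of the fixpoint loop for query answering over LIDS and Linked Data.
import rdflib

def answer(g, query, max_rounds=5):
    fetched = set()
    for _ in range(max_rounds):
        before = len(g)
        for row in g.query(query):
            for term in row:
                if not isinstance(term, rdflib.URIRef):
                    continue
                doc = str(term).split("#")[0]
                if doc not in fetched:
                    fetched.add(doc)
                    try:
                        g.parse(doc)        # resolve URI -> enlarge data set
                    except Exception:
                        pass                # unreachable or non-RDF source
        # interlink(g, geowrap)             # LIDS step from the earlier sketch
        if len(g) == before:                # no new data and no new links
            break
    return list(g.query(query))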
38. Experiment: Query Answering
! Input: a list of 562 (potential) universities from the Facebook Graph API
! Output: Facebook fan counts and DBpedia student numbers for 104 universities
PREFIX u: <http://openlids.org/universities.rdf#>
SELECT ?n ?f ?s WHERE {
  u:list foaf:topic ?u .
  ?u foaf:name ?n .
  ?u og:fan_count ?f .
  ?u d:numberOfStudents ?s
}
39. Linked Services and PlanetData
! Several areas seem likely to produce services:
! Stream (incl. sensor) resources (latest values)
! Any others exposing dynamic resources
! Dynamic computations, incl. on-the-fly quality assessments
! Other areas seem likely to consider service technologies and move towards more service-like HTTP interactions
! Access control (OpenID, OAuth, etc.)
! Finally, the remaining areas could serve to complement LIDS/LOS alignment
! Provenance