The slides of my talk at the INSIGHT Centre for Data Analytics (NUI Galway), where I presented TripleWave (http://streamreasoning.github.io/TripleWave/), an open-source framework for creating and publishing streams of RDF data.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
High performance Serverless Java on AWS- GoTo Amsterdam 2024Vadym Kazulkin
Java is for many years one of the most popular programming languages, but it used to have hard times in the Serverless community. Java is known for its high cold start times and high memory footprint, comparing to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption, cold start times for Java Serverless development on AWS including GraalVM (Native Image) and AWS own offering SnapStart based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions trying out various deployment package sizes, Lambda memory settings, Java compilation options and HTTP (a)synchronous clients and measure their impact on cold and warm start times.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
TripleWave: a step towards RDF Stream Processing on the Web
1. Department of Informatics
TripleWave: a step towards
RDF Stream Processing on the
Web
Daniele Dell’Aglio
dellaglio@ifi.uzh.ch http://dellaglio.org @dandellaglio
Galway, 16.12.2016
2. An (incomplete) overview of the RSP/SR research
[Figure: map of Europe (base map from Wikipedia) locating RSP/SR research efforts, including IMaRS, STARQL, DynamiTE, TrOWL, StreamRule, StarCITY, SPARKWAVE and LARS]
Daniele Dell'Aglio - TripleWave 2/28
3. Connecting RSPs on the Web
[Figure: RSP engines to connect on the Web: Morph Streams, C-SPARQL, ETALIS, TrOWL, StreamRule, CQELS]
How far are we?
• Working prototypes/systems
• Formal models for RSPs and reasoning
• Minimal agreements are still missing: standards, serializations, interfaces
Daniele Dell'Aglio - TripleWave 3/28
4. Looking for minimal agreements: the RSP
Community group
[Word cloud: tons of research work (papers, PhD theses, datasets, prototypes, benchmarks) on many topics (RDF streams, stream reasoning, complex event processing, stream query processing, stream compression, Semantic Sensor Web)]
W3C RSP Community Group
http://www.w3.org/community/rsp
An effort to discuss, standardize, combine, formalize and evangelize our work on RDF stream processing
Daniele Dell'Aglio - TripleWave 4/28
6. But...
The W3C RSP group set some foundations and requirements, but:
• Standard protocols and exchange mechanisms for
RDF streams are missing.
• We need generic and flexible solutions for making RDF
streams available and exchangeable on the Web.
Daniele Dell'Aglio - TripleWave 6/28
7. TripleWave
TripleWave is an open-source
framework for creating and publishing
RDF streams over the Web.
[Diagram: TripleWave and the questions it answers: what is an RDF stream? which input? how?]
Daniele Dell'Aglio - TripleWave 7/28
8. TripleWave’s RDF streams
TripleWave should exploit and be compatible with
existing standards and recommendations
• The data model should be compatible with the abstract
model defined by the W3C RSP CG
• The output format should be compatible with RDF
Daniele Dell'Aglio - TripleWave 8/28
9. TripleWave serialization format
In TripleWave, an RDF stream is an (infinite) ordered
sequence of time-annotated data items (RDF graphs)…
... serialized in JSON-LD
[{ "@graph": {
"@id": "http://.../G1",
{"@id": "http://.../a",
"http://.../isIn": {"@id":"http://.../rRoom"}}
},{ "@id": "http://.../G1",
"generatedAt":"2016-16-12T00:01:00"
}
},
{ "@graph": {
"@id": "http://.../G2",
{"@id": "http://.../b",
"http://.../isIn": {"@id":"http://.../rRoom"}}
},{ "@id": "http://.../G2",
"generatedAt":" 2016-16-12T00:03:00"
}
},…
[Diagram: stream S on timeline t, with G1 = {:a :isIn :rRoom} at t=1, G2 = {:b :isIn :bRoom} at t=3 and G3 = {:c :talksIn :rRoom, :d :talksIn :bRoom} at t=5]
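A consumer reading such an item back just has to separate the payload graph from its time annotation. A minimal sketch in Node.js, assuming the two-part structure above (not TripleWave's actual parsing code):

// Minimal sketch: split one serialized stream item into its
// payload (the named graph) and its time annotation.
function parseItem(item) {
  const nodes = item['@graph'];
  const payload = nodes.find((n) => '@graph' in n);
  const annotation = nodes.find((n) => 'generatedAt' in n);
  return { graph: payload['@graph'], time: annotation['generatedAt'] };
}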
Daniele Dell'Aglio - TripleWave 9/28
11. Spreading the RDF streams
TripleWave must be able to provide the stream to
RDF Stream Processing engines (query
processors and reasoners) through the Web.
• HTTP
• HTTP chunked transfer
• WebSockets
• MQTT (upcoming)
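As an illustration, a consumer could read the WebSocket channel with a few lines of Node.js. This is a minimal sketch, not TripleWave's client API; the endpoint URL is hypothetical and the ws npm package is assumed to be installed:

// Minimal sketch of a WebSocket consumer for a TripleWave stream.
const WebSocket = require('ws');

const ws = new WebSocket('ws://example.org/triplewave/stream'); // hypothetical URL

ws.on('message', (data) => {
  // each message carries one time-annotated RDF graph in JSON-LD
  const item = JSON.parse(data.toString());
  console.log('received stream item:', item);
});

ws.on('error', (err) => console.error('stream error:', err));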
Daniele Dell'Aglio - TripleWave 11/28
12. TripleWave Stream Descriptor
TripleWave must provide information about how
to access the stream
• TripleWave exposes an RDF description of the RDF
stream
• RDF Stream Descriptor (sGraph)
• It contains:
• The identifier of the stream
• Data item samples (see next slide)
• A description of the schema
• The location of the stream endpoint (e.g. WebSocket URL)
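A client would therefore bootstrap from the sGraph and only then open the stream. A hypothetical sketch (the descriptor URL and the streamEndpoint property name are illustrative assumptions, not the exact sGraph vocabulary):

// Hypothetical sketch: fetch the sGraph and read the endpoint location.
const http = require('http');

http.get('http://example.org/triplewave/sgraph', (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    const sgraph = JSON.parse(body);                    // JSON-LD stream descriptor
    console.log('stream id:', sgraph['@id']);
    console.log('endpoint:', sgraph['streamEndpoint']); // e.g. a WebSocket URL
  });
});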
Daniele Dell'Aglio - TripleWave 12/28
16. Feeding TripleWave
TripleWave should support a variety of data
sources.
• RDF dumps with temporal information
• RDF with temporal information exposed through
SPARQL endpoints
• Streams available on the Web
Daniele Dell'Aglio - TripleWave 15/28
17. From RDF to RDF streams
Converts RDF stored in files/SPARQL endpoints
• Containing some time information
… into an RDF stream
• continuous flow of RDF data
• ordered according to the original timestamps
• the time between two items is preserved (see the sketch below)
Use Cases
• Evaluation, testing and benchmarking
• Simulation systems
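A replay loop that preserves the original inter-item gaps can be sketched in a few lines of Node.js; the items below are illustrative, while TripleWave reads them from files or SPARQL endpoints:

// Minimal sketch: replay timestamped items as a stream, keeping
// the original time between consecutive items.
const items = [
  { t: 1000, graph: '{:a :isIn :rRoom}' },
  { t: 3000, graph: '{:b :isIn :bRoom}' },
  { t: 5000, graph: '{:c :talksIn :rRoom}' },
];

items.sort((x, y) => x.t - y.t);          // order by original timestamp
const start = items[0].t;
for (const item of items) {
  // schedule each item with the same delay it had in the source data
  setTimeout(() => console.log('emit', item.graph), item.t - start);
}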
Daniele Dell'Aglio - TripleWave 16/28
18. From Web stream to RDF stream
Consumes an existing Web stream…
• through connectors
… and converts it into an RDF Stream
• Each data item is lifted to RDF
Use Cases
• Querying and reasoning
• Data integration
[Diagram: Web Service → Connector → TW Core, with the connector consuming the Web Service API]
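In spirit, a connector is just an adapter between the source API and the core. A hypothetical polling connector (the URL and callback names are illustrative, not TripleWave's connector interface):

// Hypothetical sketch of a polling connector: fetch JSON items from
// a Web API and hand each one over for lifting to RDF.
const https = require('https');

function poll(apiUrl, onItem) {
  https.get(apiUrl, (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => onItem(JSON.parse(body)));
  });
}

// every 5 seconds, fetch the latest item and pass it on
setInterval(() => {
  poll('https://example.org/api/latest', (item) => {
    console.log('to be lifted to RDF:', item);
  });
}, 5000);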
Daniele Dell'Aglio - TripleWave 17/28
19. From Web stream to RDF stream
Conversion is done through R2RML
• Mappings to convert each data item into RDF
Example: map a field
Input:
{
"userUrl": "foo"
}
Mapping:
rr:predicateObjectMap [
rr:predicate schema:agent;
rr:objectMap [ rr:column "userUrl" ] ];
Output:
{
"https://schema.org/agent": { "@id": "foo" }
}
Daniele Dell'Aglio - TripleWave 18/28
20. From Web stream to RDF stream
Conversion is done through R2RML
• Mappings to convert each data item into RDF
Example: map a field with a template
Input:
{
"time": "value"
}
Mapping:
rr:subjectMap [
rr:template "something {time}" ];
Output:
{
"@id": "something value"
}
Daniele Dell'Aglio - TripleWave 19/28
21. From Web stream to RDF stream
Conversion is done through R2RML
• Mappings to convert each data item into RDF
Example: add a new constant field
Mapping:
rr:predicateObjectMap
[ rr:predicate rdf:type; rr:objectMap
[ rr:constant schema:UpdateAction ]];
Output:
{
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":
{"@id": "https://schema.org/UpdateAction"}
}
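Putting the three constructs together, the lifting step is essentially a function from a JSON item to a JSON-LD node. A toy version, assuming (for illustration only) that the mappings are given as plain JavaScript descriptors rather than parsed R2RML:

// Toy lifting function mimicking rr:template (fill placeholders),
// rr:column (copy a field) and rr:constant (fixed value).
function lift(item, mapping) {
  const node = {
    '@id': mapping.template.replace(/\{(\w+)\}/g, (_, key) => item[key]),
  };
  for (const [predicate, rule] of Object.entries(mapping.predicates)) {
    node[predicate] = rule.column
      ? { '@id': item[rule.column] }  // rr:column
      : { '@id': rule.constant };     // rr:constant
  }
  return node;
}

console.log(lift(
  { userUrl: 'foo', time: '42' },
  { template: 'http://example.org/event/{time}',
    predicates: {
      'https://schema.org/agent': { column: 'userUrl' },
      'http://www.w3.org/1999/02/22-rdf-syntax-ns#type':
        { constant: 'https://schema.org/UpdateAction' },
    } }));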
Daniele Dell'Aglio - TripleWave 20/28
23. Implementing TripleWave
TripleWave is a NodeJS Web Application
• NodeJS is a JavaScript runtime built on Chrome's V8 JavaScript
engine.
TripleWave is open source
• Released under the Apache 2.0 licence
• Source code available at:
https://github.com/streamreasoning/TripleWave
Daniele Dell'Aglio - TripleWave 22/28
26. Consuming TripleWave RDF Stream - Push
The TripleWave stream can be consumed via push by
extending the RSP service framework [1]
[1] https://github.com/streamreasoning/rsp-services
[Diagram: the consumer registers the stream, the query and the observers with the RSP service; the RSP service connects to the RDF stream descriptor and then to the RDF stream endpoint exposed by TripleWave; it declares the stream, the query and the observers to C-SPARQL and injects the stream]
Daniele Dell'Aglio - TripleWave 24/28
27. Showcases
Three demos have been deployed to show the capabilities
of the system.
Wikipedia changes stream conversion.
http://131.175.141.249/TripleWave-transform/sgraph
Endless replay of the Linked Sensor Data dataset as a stream.
http://131.175.141.249/TripleWave-endless/sgraph
Endless replay of the LDBC social graph dataset as a stream.
http://131.175.141.249/TripleWave-ldbc/sgraph
Daniele Dell'Aglio - TripleWave 25/28
28. Find more...
• Andrea Mauri, Jean-Paul Calbimonte, Daniele Dell’Aglio, Marco
Balduini, Marco Brambilla, Emanuele Della Valle, Karl Aberer:
TripleWave: Spreading RDF Streams on the Web. Resource Paper
at International Semantic Web Conference 2016.
• Andrea Mauri, Jean-Paul Calbimonte, Daniele Dell’Aglio, Marco
Balduini, Emanuele Della Valle, Karl Aberer: Where Are the RDF
Streams?: On Deploying RDF Streams on the Web of Data with
TripleWave. Poster at International Semantic Web Conference
2015.
• Special thanks to Jean-Paul Calbimonte and Andrea Mauri for
providing parts of today's slides
Daniele Dell'Aglio - TripleWave 26/28
29. Conclusions
RDF streams are gaining momentum
• Several active research groups
• Prototypes, methods and applications
TripleWave shows that it is possible to exchange RDF
streams over the Web
• It uses standard technologies
• It feeds C-SPARQL (and soon CQELS)
There is potentially huge value in putting together the
results we are obtaining
Daniele Dell'Aglio - TripleWave 27/28
30. Thank you! Questions?
TripleWave: a step towards RDF Stream
Processing on the Web
http://streamreasoning.github.io/TripleWave
Daniele Dell’Aglio
dellaglio@ifi.uzh.ch
http://dellaglio.org
@dandellaglio
Daniele Dell'Aglio - TripleWave 28/28