Parallel Datalog Reasoning in RDFox (presentation, DBOnto)
Abstract:
We present a novel approach to parallel materialisation (i.e.,
fixpoint computation) of datalog programs in centralised,
main-memory, multi-core RDF systems. Our approach comprises an algorithm that evenly distributes the workload to cores, and an RDF indexing data structure that supports efficient, ‘mostly’ lock-free parallel updates. Our empirical evaluation shows that our approach parallelises computation very well: with 16 physical cores, materialisation can be up to 13.9 times faster than with just one core.
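The fixpoint computation the abstract refers to can be illustrated with a toy example. This is not RDFox's parallel algorithm, just a minimal single-threaded semi-naive sketch: each round joins only the previous round's new facts (the "delta") against an indexed base relation, and materialisation stops when a round derives nothing new.

```python
# Toy semi-naive materialisation of a single datalog rule:
#     reachable(x, z) :- reachable(x, y), edge(y, z)
# Each round joins only the facts derived in the previous round (the
# "delta") against the edge index, so every derivation is tried once;
# the fixpoint is reached when a round derives nothing new.
def materialise(edges):
    facts = set(edges)   # reachable starts as a copy of edge
    delta = set(edges)   # facts that are new in the current round
    by_src = {}
    for y, z in edges:   # index edge by its first component
        by_src.setdefault(y, set()).add(z)
    while delta:
        new = set()
        for x, y in delta:
            for z in by_src.get(y, ()):
                if (x, z) not in facts:
                    new.add((x, z))
        facts |= new
        delta = new
    return facts

# A chain 1 -> 2 -> 3 -> 4 materialises to all six reachable pairs.
closure = materialise({(1, 2), (2, 3), (3, 4)})
```

The parallelism described in the abstract comes from partitioning this work across threads over a mostly lock-free index; the sketch shows only the sequential fixpoint.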
An Approach for the Incremental Export of Relational Databases into RDF Graphs (Nikolaos Konstantinou)
Several approaches have been proposed in the literature for offering RDF views over databases. In addition, a variety of tools exist for exporting database contents into RDF graphs. Approaches in the latter category have often been shown to perform better than those in the former. However, when database contents are exported into RDF, it is not always optimal, or even necessary, to export (or "dump", as this procedure is often called) the whole database contents every time. This paper investigates the problem of incrementally generating and storing the RDF graph that results from exporting relational database contents. To express the mappings that associate tuples in the source database with triples in the resulting RDF graph, an implementation of the R2RML standard is tested. Next, a methodology is proposed that enables incremental generation and storage of the RDF graph originating from the source relational database contents. The performance of this methodology is assessed through an extensive set of measurements, and the paper concludes with a discussion of the authors' most important findings.
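As a toy illustration of the incremental idea (not the paper's R2RML-based methodology), treating both the previously stored graph and the freshly mapped output as sets of triples reduces an incremental dump to two set differences:

```python
# Hypothetical sketch of incremental RDF dumping: instead of re-exporting
# the whole database, compare the freshly mapped triples against the
# previously stored graph and apply only the difference.
def incremental_update(stored, fresh):
    """Both arguments are sets of (subject, predicate, object) triples."""
    to_add = fresh - stored       # triples for rows that are new or changed
    to_remove = stored - fresh    # triples whose source rows disappeared
    updated = (stored - to_remove) | to_add
    return updated, to_add, to_remove

# Example: one row's value changed between exports.
stored = {("ex:1", "ex:name", "Alice"), ("ex:2", "ex:name", "Bob")}
fresh = {("ex:1", "ex:name", "Alice"), ("ex:2", "ex:name", "Robert")}
graph, added, removed = incremental_update(stored, fresh)
```

Only `added` and `removed` need to be written to the triple store, which is the source of the savings the paper measures.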
Presented at JIST2015, Yichang, China
Prototype: http://rc.lodac.nii.ac.jp/rdf4u/
Video: https://www.youtube.com/watch?v=z3roA9-Cp8g
Abstract: The Semantic Web and Linked Open Data (LOD) are powerful technologies for knowledge management, and explicit knowledge is expected to be published in RDF (Resource Description Framework); yet ordinary users keep their distance from RDF because of the technical skills it requires. Since a concept map or node-link diagram can enhance learning from beginner to advanced level, RDF graph visualization is a suitable tool for familiarizing users with Semantic Web technology. However, an RDF graph generated from a whole query result is not suitable for reading, because it is highly connected, like a hairball, and poorly organized. To make a graph that presents knowledge more readable, this research introduces an approach that sparsifies a graph by combining three main functions: graph simplification, triple ranking, and property selection. These functions are largely based on interpreting RDF data as knowledge units, together with statistical analysis, in order to deliver an easily readable graph to users. A prototype demonstrates the suitability and feasibility of the approach: the simple, flexible graph visualization is easy to read and leaves a strong impression on users, and the tool helps inspire users to recognize the advantages of linked data in knowledge management.
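As a toy illustration only (the abstract does not spell out its ranking measure), a statistical triple ranking could score each triple by the degree of its endpoint nodes and keep the top-k triples to sparsify the graph:

```python
# Hypothetical degree-based triple ranking: score a triple by the degree
# of its subject and object, then keep the k highest-scoring triples.
# This is a stand-in for the statistical ranking the abstract mentions.
from collections import Counter

def sparsify(triples, k):
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    # Stable sort: ties keep their original order.
    ranked = sorted(triples, key=lambda t: degree[t[0]] + degree[t[2]],
                    reverse=True)
    return ranked[:k]

# "h" is a hub, so triples touching it rank first.
top2 = sparsify([("h", "p", "a"), ("h", "p", "b"),
                 ("h", "p", "c"), ("a", "p", "b")], 2)
```

The prototype combines such a ranking with simplification and property selection; this sketch shows only the ranking step.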
TripleWave: a step towards RDF Stream Processing on the Web (Daniele Dell'Aglio)
The slides of my talk at INSIGHT Centre for Data Analytics (in NUI Galway) where I presented TripleWave (http://streamreasoning.github.io/TripleWave/), an open-source framework to create and publish streams of RDF data.
Apache Hivemall is a scalable machine learning library for Apache Hive, Apache Spark, and Apache Pig.
Hivemall provides a number of machine learning functionalities across classification, regression, ensemble learning, and feature engineering through UDFs/UDAFs/UDTFs of Hive.
We made the first Apache release (v0.5.0-incubating) on March 5, 2018, and the project plans to release v0.5.2 in Q2 2018.
We will first give a quick walk-through of the features, usage, what's new in v0.5.0, and the future roadmap of Apache Hivemall. Next, we will introduce Hivemall on Apache Spark in depth, covering DataFrame integration and Spark 2.3 support.
Analysis of Air Pollution in Nova Scotia (presentation, Carlo Carandang)
This presentation is an analysis of air pollution in Nova Scotia. We detail how we obtain the dataset, how we clean it, how we process and analyze it, and then we visualize the results of the analysis.
Accelerating NLP with Dask and Saturn Cloud (Sujit Pal)
Slides for a talk delivered at the NY NLP Meetup. Abstract -- Python has a great ecosystem of tools for natural language processing (NLP) pipelines, but challenges arise when data sizes and computational complexity grow. Best case, a pipeline is left to run overnight or even over several days; worst case, certain analyses or computations are simply not possible. Dask is a Python-native parallel processing tool that enables Python users to easily scale their code across a cluster of machines. This talk presents an example of an NLP entity-extraction pipeline that uses SciSpaCy with Dask for parallelization. The pipeline extracts named entities from the CORD-19 dataset, using trained models from the SciSpaCy project, and makes them available for downstream tasks as structured Parquet files. The pipeline was built and executed on Saturn Cloud, a platform that makes it easy to launch and manage Dask clusters. The talk presents an introduction to Dask and explains how users can easily accelerate Python and NLP code across clusters of machines.
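The data-parallel shape of such a pipeline, a function mapped over document partitions, can be sketched with the standard library alone; here `concurrent.futures` stands in for a Dask cluster, and `extract_entities` is a trivial stand-in for a real SciSpaCy model call:

```python
# Sketch of the pipeline's data-parallel shape (not the talk's actual
# code): map an entity-extraction function over documents in parallel.
from concurrent.futures import ThreadPoolExecutor

def extract_entities(doc):
    # Stand-in for a SciSpaCy model: treat capitalised tokens as entities.
    return [w for w in doc.split() if w[:1].isupper()]

def run_pipeline(docs, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_entities, docs))

ents = run_pipeline(["Dask scales Python code", "CORD-19 has many papers"])
```

With Dask the same `map` runs across a cluster's worker processes instead of local threads, which is what makes the overnight-scale runs described in the talk tractable.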
Full version of http://www.slideshare.net/valexiev1/gvp-lodcidocshort. The same is available at http://vladimiralexiev.github.io/pres/20140905-CIDOC-GVP/index.html
CIDOC Congress, Dresden, Germany
2014-09-05: International Terminology Working Group: full version.
2014-09-09: Getty special session: short version
Linked geospatial data has recently received attention, as researchers and practitioners have started tapping the wealth of geospatial information available on the Web. Incomplete geospatial information, although appearing often in the applications captured by such datasets, is not represented and queried properly due to the lack of appropriate data models and query languages. We discuss our recent work on the model RDFi, an extension of RDF with the ability to represent property values that exist, but are unknown or partially known, using constraints, and an extension of the query language SPARQL with qualitative and quantitative geospatial querying capabilities. We demonstrate the usefulness of RDFi in geospatial Semantic Web applications by giving examples and comparing the modeling capabilities of RDFi with the ones of related Semantic Web systems.
In this paper, we propose the problem of implementing an efficient query processing system for incomplete temporal and geospatial information in RDFi as a challenge to the SSTD community.
DGraph: Introduction to Basics & Quick Start w/ Ratel (Knoldus Inc.)
The presentation introduces you to DGraph and explains its data types, indexes, edges, facets, and the types of mutation using RDF triples or JSON. It also takes you through the GQL+/- functions, filters, connectives, reverse edges, facets, and complex graph queries for DGraph using GQL+/-.
The aim of the EU FP7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC for short, pronounced "lark"), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform and describes the data model used within LarKC.
... or how to query an RDF graph with 28 billion triples on a standard laptop
These slides correspond to my talk at the Stanford Center for Biomedical Informatics on 25th April 2018.
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ... (DBOnto)
Franz et al. 2012. Reconciling Succeeding Classifications, ESA 2012 (taxonbytes)
Presentation on reconciling taxonomic concepts using the Euler approach, given at the 2012 Annual Meeting of Entomological Society of America, Knoxville, TN.
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge (Rathachai Chawuthai)
I used these slides for an introductory lecture (90 min) in a seminar on SPARQL. The slide set introduces the semantics of the RDF query language SPARQL.
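The heart of SPARQL's semantics is basic graph pattern (BGP) matching: a solution is a mapping from variables to RDF terms under which every triple pattern becomes a triple of the graph. A toy evaluator over an in-memory set of triples (a sketch of the semantics, not a conformant implementation):

```python
# Toy BGP matcher: enumerate all variable bindings under which every
# triple pattern in `patterns` is a triple of `graph`.
def match(patterns, graph, binding=None):
    binding = binding or {}
    if not patterns:
        yield binding          # all patterns satisfied: emit the solution
        return
    (s, p, o), rest = patterns[0], patterns[1:]
    for triple in graph:
        b = dict(binding)      # fresh copy so failed attempts don't leak
        if all(_unify(term, value, b)
               for term, value in zip((s, p, o), triple)):
            yield from match(rest, graph, b)

def _unify(term, value, binding):
    if term.startswith("?"):              # variable: bind or check
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value                  # constant: must match exactly

graph = {(":alice", ":knows", ":bob"), (":bob", ":knows", ":carol")}
# SELECT * WHERE { ?x :knows ?y . ?y :knows ?z }
rows = list(match([("?x", ":knows", "?y"), ("?y", ":knows", "?z")], graph))
```

Real engines evaluate the same semantics with indexes and join reordering rather than this nested enumeration.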
CIDOC Congress, Dresden, Germany
2014-09-05: International Terminology Working Group: full version (http://vladimiralexiev.github.io/pres/20140905-CIDOC-GVP/index.html)
2014-09-09: Getty special session: short version (http://VladimirAlexiev.github.io/pres/20140905-CIDOC-GVP/GVP-LOD-CIDOC-short.pdf)
Wi2015 - Clustering of Linked Open Data - the LODeX tool (Laura Po)
Presentation of the tool LODeX (http://www.dbgroup.unimore.it/lodex2/testCluster) at the 2015 IEEE/WIC/ACM International Conference on Web Intelligence, Singapore, December 6-8, 2015
Similar to Parallel and Incremental Materialisation of RDF/Datalog in RDFox
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ... (LDBC council)
Juan Sequeda, co-founder of Capsenta, gave an interesting talk on how we can integrate data using graphs and semantics (semantic data virtualization). As Mr. Sequeda said, the idea is to integrate data without needing to move it around. Juan started off his presentation by talking about the huge gap that exists between IT departments, the guardians of the data, and business development departments, which try to extract insights from the data.
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph... (LDBC council)
During the 8th TUC Meeting, held at Oracle's facilities in Redwood City, California, Zhe Wu, Software Architect at Oracle Spatial and Graph, explained how his team is trying to bridge the RDF graph and property graph data models.
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine (LDBC council)
Yinglong started his talk with an introduction to his new position at Huawei, what the company is doing, and more specifically how it is involved in big data research and graphs. He also explained that his research center is currently working on big data analytics and management from four sides: natural language processing, graph analytics, machine learning, and deep learning.
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a... (LDBC council)
George Fletcher, Associate Professor at the Eindhoven University of Technology, presented gMark, an open-source framework for generating synthetic graph instances and workloads. The main focus of gMark has been to tailor different graph data management scenarios, often driven by query workloads, such as multi-query optimization, workload-driven graph database physical design, and mapping discovery and query rewriting in data integration systems.
8th TUC Meeting – Marcus Paradies (SAP), Social Network Benchmark (LDBC council)
Marcus Paradies, software developer at SAP, extended the talk Arnau Prats gave about the SNB, in this case covering the Business Intelligence workload. In contrast with the 17+4 queries of the Interactive workload, the Business Intelligence (BI) workload consists of 24 queries that can be seen as OLAP-style, against the OLTP style of the Interactive one. The BI workload focuses on analytic queries that touch the whole graph.
Sergey Edunov, software engineer at Facebook, gave a great talk on how and why his company generates large-scale social graphs. The underlying reasons to start such an ambitious project are capacity planning, to make sure that their system will be able to handle a graph that keeps growing year after year, and fair evaluation of their system against the ones implemented by other companies.
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr... (LDBC council)
Weining Qian, professor at East China Normal University, presented his talk on the statistical characteristics of real-life knowledge graphs during the 8th TUC Meeting, held at Oracle's facilities in Redwood City, California.
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status (LDBC council)
Peter Boncz, research scientist at the Centrum Wiskunde & Informatica in the Netherlands, gave an update on the Graph Query Language Task Force after its first year. This Task Force was created to address an issue detected during the benchmark meetings: all workloads are specified in English text because there is no common graph query language.
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg... (LDBC council)
Lijun Chang, DECRA Fellow at the University of New South Wales, talked about how to make subgraph matching more efficient by postponing Cartesian products.
Jerven Bolleman, Lead Software Developer at the Swiss-Prot Group, explained why they offer a free SPARQL and RDF endpoint for the world to use and why it is hard to optimize.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyper-personalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
1. PARALLEL AND INCREMENTAL MATERIALISATION OF
RDF/DATALOG IN RDFOX
Boris Motik
University of Oxford
March 20, 2015
2. TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 0/13
3. Introduction
4. Introduction
RDFOX SUMMARY
RDFox: a new RDF store and reasoner
http://www.cs.ox.ac.uk/isg/tools/RDFox/
Features:
RAM-based storage of RDF data
Currently centralised, but a distributed system is in the works
Datalog reasoning via materialisation
Can handle arbitrary (recursive) datalog rules, not just OWL 2 RL
Very effective parallelisation
Efficient reasoning with owl:sameAs via rewriting
Known and widely-used technique, but correctness not trivial
The Backward-Forward (B/F) incremental maintenance algorithm
Considerably improves on DRed
Compatible with rewriting of owl:sameAs
SPARQL query answering
Most of SPARQL 1.0 and some of SPARQL 1.1
5. Parallel Materialisation of Datalog
6. Parallel Materialisation of Datalog
MAIN CHALLENGES
1 Assign workload to threads evenly
Rules are generally not independent due to recursion
Static assignment of rule instances can become unbalanced due to data skew
⇒ Dynamic assignment with low overhead needed
2 Efficiently interleave . . .
. . . querying (during evaluation of rule bodies)
. . . updates (during updates of derived facts)
3 Provide indexes for efficient rule body evaluation
Crucial for elimination of duplicate triples ⇒ ensures termination
Usually sorted (and clustered) to allow for merge joins
Hash indexes can also be used
Individual (i.e., not bulk) index updates are inefficient
B. Motik, Y. Nenov, R. Piro, I. Horrocks, and D. Olteanu. Parallel Materialisation of Datalog Programs in
Centralised, Main-Memory RDF Systems. AAAI 2014, pages 129–137
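Sorted, clustered indexes make the joins in rule bodies answerable by merge joins: two key-sorted lists are scanned in lockstep rather than with nested loops. A minimal sketch of such a join over sorted (key, value) lists (my own illustrative Python, not RDFox code):

```python
def merge_join(left, right):
    """Join two lists sorted by key; yields (key, left_value, right_value)."""
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = left[i][0], right[j][0]
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            j0 = j                       # start of the run of equal keys
            while j < len(right) and right[j][0] == kl:
                yield (kl, left[i][1], right[j][1])
                j += 1
            i += 1
            if i < len(left) and left[i][0] == kl:
                j = j0                   # rewind for the next left tuple
```

Both inputs are consumed in a single pass, which is why sorted indexes pay off for rule evaluation.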
7. Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery:
8.–20. Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM (animation steps, collapsed)
The slides step through the fact list R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g),
applying A(x) ∧ R(x, y) → A(y) to one fact at a time:
R(a,b), R(a,c): subquery A(a) finds nothing among the previous facts
R(b,d), R(b,e): subquery A(b) finds nothing yet
A(a): subquery R(a,y) matches R(a,b) and R(a,c) ⇒ A(b) and A(c) are added
R(c,f), R(c,g): subquery A(c) finds nothing, since A(c) occurs later in the list
A(b): subquery R(b,y) matches R(b,d) and R(b,e) ⇒ A(d) and A(e) are added
A(c): subquery R(c,y) matches R(c,f) and R(c,g) ⇒ A(f) and A(g) are added
A(d)–A(g): subqueries R(d,y)–R(g,y) return no results ⇒ the computation terminates
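The fact-at-a-time loop shown on these slides can be written out as follows (my own illustrative Python for the single rule A(x) ∧ R(x, y) → A(y); RDFox itself is written in C++ and handles arbitrary datalog rules):

```python
def materialise(facts):
    """Fact-at-a-time materialisation of A(x) ∧ R(x, y) → A(y)."""
    table = list(facts)          # derived facts are appended at the end
    seen = set(table)            # duplicate elimination ensures termination
    i = 0
    while i < len(table):
        fact, prev = table[i], table[:i]
        if fact[0] == "A":       # fact matches body atom A(x): subquery R(x, y)
            derived = [("A", t[2]) for t in prev if t[0] == "R" and t[1] == fact[1]]
        else:                    # fact R(x, y) matches R(x, y): subquery A(x)
            derived = [("A", fact[2])] if ("A", fact[1]) in prev else []
        for f in derived:
            if f not in seen:    # never derive the same fact twice
                seen.add(f)
                table.append(f)
        i += 1
    return table
```

Evaluating each subquery only against *previous* facts is what prevents a rule instance from being derived twice: each instance fires exactly when its latest fact is processed.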
21. Parallel Materialisation of Datalog
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of facts,
which in practice ensures that threads are equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
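The dynamic partitioning idea can be sketched as follows (my own illustrative Python; worker threads claim the next unprocessed fact as their unit of work, so no static division of rule instances is needed; a single lock stands in for RDFox's lock-free table, which this sketch does not attempt to reproduce):

```python
import threading

def rule(fact, prev):
    # A(x) ∧ R(x, y) → A(y): match 'fact' against each body atom in turn
    if fact[0] == "A":                           # subquery R(x, y)
        return [("A", t[2]) for t in prev if t[0] == "R" and t[1] == fact[1]]
    if ("A", fact[1]) in prev:                   # fact is R(x, y): subquery A(x)
        return [("A", fact[2])]
    return []

def parallel_materialise(facts, rule, n_threads=4):
    table = list(facts)
    seen = set(table)
    lock = threading.Lock()
    state = {"next": 0, "active": 0}

    def worker():
        while True:
            with lock:
                if state["next"] < len(table):
                    i = state["next"]            # dynamically claim a fact
                    state["next"] += 1
                    state["active"] += 1
                    fact, prev = table[i], table[:i]
                elif state["active"] > 0:
                    continue                     # someone may still add facts
                else:
                    return                       # no pending work: terminate
            derived = rule(fact, prev)           # the parallel part: no lock
            with lock:
                for f in derived:
                    if f not in seen:            # duplicate elimination
                        seen.add(f)
                        table.append(f)
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

The fixpoint is unique, so the resulting set of facts is deterministic even though the interleaving of workers is not.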
22. Parallel Materialisation of Datalog
SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY
The algorithm critically depends on:
matching atoms t1, t2, t3, where each ti is a constant or a variable
continuous concurrent updates
Our RDF storage data structure:
Hash-based indexes ⇒ naturally parallel data structure
‘Mostly’ lock-free: at least one thread makes progress most of the time
compare: if a thread acquires a lock and dies, other threads are blocked
main benefit: performance is less susceptible to scheduling decisions
Main technical challenge: reduce thread interference
When A writes to a location cached by B, the cache of B is invalidated
Our updates ensure that threads (typically) write to different locations
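The access patterns such a structure must support can be sketched with hash-based indexes, one per bound-position pattern (my own illustrative Python; RDFox's actual data structure is a lock-free C++ design, which this single-threaded sketch does not attempt to reproduce):

```python
from collections import defaultdict

class TripleIndex:
    """Hash-indexed triple table answering atoms with constants/wildcards."""

    def __init__(self):
        self.triples = set()               # for duplicate elimination
        self.by_s = defaultdict(list)      # pattern (s, ?, ?)
        self.by_p = defaultdict(list)      # pattern (?, p, ?)
        self.by_o = defaultdict(list)      # pattern (?, ?, o)
        self.by_sp = defaultdict(list)     # pattern (s, p, ?)
        self.by_po = defaultdict(list)     # pattern (?, p, o)

    def add(self, s, p, o):
        """Insert a triple; returns False for duplicates (ensures termination)."""
        if (s, p, o) in self.triples:
            return False
        self.triples.add((s, p, o))
        self.by_s[s].append((s, p, o))
        self.by_p[p].append((s, p, o))
        self.by_o[o].append((s, p, o))
        self.by_sp[s, p].append((s, p, o))
        self.by_po[p, o].append((s, p, o))
        return True

    def match(self, s=None, p=None, o=None):
        """Answer an atom: None means variable, anything else is a constant."""
        if s is not None and p is not None:
            cands = self.by_sp[s, p]
        elif p is not None and o is not None:
            cands = self.by_po[p, o]
        elif s is not None:
            cands = self.by_s[s]
        elif p is not None:
            cands = self.by_p[p]
        elif o is not None:
            cands = self.by_o[o]
        else:
            cands = list(self.triples)
        return [t for t in cands
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]
```

Hash lookups parallelise naturally because unrelated keys hash to different buckets, which is the property the slide's "naturally parallel data structure" refers to.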
23. Parallel Materialisation of Datalog
EVALUATION I: PARALLELISATION SPEEDUP
RDFox: an RDF store developed at Oxford University
http://www.cs.ox.ac.uk/isg/tools/RDFox/
[Two speedup plots: x-axis threads (8–32), y-axis speedup (2–20).
Left plot: ClarosL, ClarosLE, DBpediaL, DBpediaLE, LUBML-1K, LUBMU-1K.
Right plot: UOBML-1K, UOBMU-10, LUBMLE-1K, LUBML-5K, LUBMLE-5K, LUBMU-5K.]
Small concurrency overhead; parallelisation pays off already with two threads
Speedup continues to increase after we exhaust all physical cores
⇒ hyperthreading and parallelism can compensate for CPU cache misses
24. Parallel Materialisation of Datalog
EVALUATION II: ORACLE’S SPARC T5
Machine specification:
4 TB of RAM
128 physical cores that support 1024 virtual cores via hyperthreading
Name        Triples                 Time (s)                Speedup   Inference rate
            Initial    Resulting    1 thread   1024 thr.              (triples/s)
ClarosLE    18.8 M     533.7 M      7484       74           101       7.2 M
LUBML-1K    133.6 M    182.4 M      511        10           51        4.9 M
LUBMLE-1K   133.6 M    332.6 M      5267       37           142       5.4 M
LUBML-140k: 8 G triples, materialised to 10.9 G triples
20 threads: 2000 s, inference rate 1.45 M triples/s
128 threads: 599 s, inference rate 4.84 M triples/s
Materialised dataset used about half of RAM (2 TB)
Thanks to Jay Banerjee, Brian Whitney, Hassan Chafi, and Zhe Wu @ Oracle
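The quoted LUBML-140k inference rates follow directly from the triple counts: derived triples divided by wall-clock time. A quick check:

```python
# rate = (materialised triples - initial triples) / wall-clock seconds
derived = 10.9e9 - 8.0e9      # 2.9 G triples derived during materialisation

rate_20 = derived / 2000      # 20 threads, 2000 s  -> ~1.45 M triples/s
rate_128 = derived / 599      # 128 threads, 599 s  -> ~4.84 M triples/s
```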
25. Handling owl:sameAs via Rewriting
26. Handling owl:sameAs via Rewriting
HANDLING owl:sameAs VIA REWRITING
Rewriting: replace all equal constants with one representative
Well-known approach; used in GraphDB, Oracle, WebPIE, . . .
Much more efficient than direct materialisation
Open question I: Effective parallelisation
Lock-free maintenance of representatives
Care needed to ensure correctness and nonrepetition of derivations
Open question II: Query evaluation
Could expand rewritten data before query evaluation, but that is inefficient
Better: evaluate queries on rewritten data and expand the answer
Such a straightforward approach is incorrect:
Result cardinalities might be wrong
The presence of FILTER and BIND can make results incorrect
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Handling owl:sameAs via Rewriting. AAAI 2015
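The core of rewriting is maintaining a map from each constant to its representative; a union-find structure is a common way to do this (my own illustrative Python; the slide's contribution is maintaining representatives lock-free and in parallel without repeating derivations, which this single-threaded sketch does not attempt):

```python
class Representatives:
    """Union-find over constants: equal constants share one representative."""

    def __init__(self):
        self.parent = {}

    def find(self, c):
        """Return the representative of c, compressing paths as we go."""
        self.parent.setdefault(c, c)
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]
            c = self.parent[c]
        return c

    def merge(self, a, b):
        """Record 'a owl:sameAs b' by merging their equivalence classes."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def rewrite(triples, same_as_pairs):
    """Replace every equal constant with its representative."""
    reps = Representatives()
    for a, b in same_as_pairs:
        reps.merge(a, b)
    return {tuple(reps.find(t) for t in triple) for triple in triples}
```

Note how the rewritten set can shrink: two triples that differ only in equal constants collapse into one, which is exactly why answer cardinalities must be restored carefully at query time.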
27. Handling owl:sameAs via Rewriting
EVALUATION OF REWRITING
[Fragment of a table of triple and derivation counts under AX and REW; the legible reduction factors are 3.2x, 3.2x, 9.9x, and 3.8x]
Table 3: Materialisation Times with Axiomatisation (AX) and Rewriting (REW)
Claros:
Threads   AX sec    spd    REW sec   spd    AX/REW
1         2042.9    1.0      65.8    1.0    31.1
2          969.7    2.1      35.2    1.9    27.6
4          462.0    4.4      18.1    3.6    25.5
8          237.2    8.6       9.9    6.7    24.1
12         184.9   11.1       7.9    8.3    23.3
16         153.4   13.3       6.9    9.6    22.3
DBpedia:
Threads   AX sec    spd    REW sec   spd    AX/REW
1          219.8    1.0      31.7    1.0     6.9
2          114.6    1.9      17.6    1.8     6.5
4           66.3    3.3      10.7    3.0     6.2
8           36.1    6.1       5.2    6.0     6.9
12          31.9    6.9       4.1    7.7     7.7
16          27.5    8.0       3.6    8.8     7.7
OpenCyc:
Threads   AX sec    spd    REW sec   spd    AX/REW
1         2093.7    1.0     119.9    1.0    17.5
2         1326.5    1.6      78.3    1.5    16.9
4          692.6    3.0      40.5    3.0    17.1
8          351.3    6.0      23.0    5.2    15.2
12         291.8    7.2      56.2    2.1     5.5
16         254.0    8.2      52.3    2.3     4.9
UniProt:
Threads   AX sec    spd    REW sec   spd    AX/REW
1          370.6    1.0     143.4    1.0     2.6
2          232.3    1.6      86.7    1.7     2.7
4          129.2    2.9      46.5    3.1     2.8
8           74.7    5.0      25.1    5.7     3.0
12          61.0    6.1      19.9    7.2     3.1
16          61.9    6.0      17.1    8.4     3.6
UOBM:
Threads   AX sec    spd    REW sec   spd    AX/REW
1         2696.7    1.0    1152.7    1.0     2.3
2         1524.6    1.8     599.6    1.9     2.5
4          813.3    3.3     318.3    3.6     2.6
8          439.9    6.1     177.7    6.5     2.5
12         348.9    7.7     152.7    7.6     2.3
16         314.4    8.6     137.9    8.4     2.3
[...] mode takes less than ten seconds, these results are difficult to measure and are susceptible to skew.
Our results confirm that rewriting can significantly reduce materialisation times. RDFox was consistently faster in the REW mode than in the AX mode even on UniProt, where the reduction in the number of triples is negligible. This is due to the reduction in the number of derivations, mainly involving rules (π1)–(π5), which is still significant on UniProt. In all cases, the speedup of rewriting is typically much larger than [...] connected by the :hasSameHomeTownWith property. This property is also symmetric and transitive so, for each pair of connected resources, the number of times each triple is derived by the transitivity rule is quadratic in the number of connected resources. This leads to a large number of duplicate derivations that do not involve equality. Thus, although it is helpful, rewriting does not reduce the number of derivations in the same way as, for example, on Claros, which explains the relatively modest speedup of REW over AX.
Speedup is bigger than the reduction in the number of triples
The number of derivations is the determining factor
Contrary to popular belief!
28. Incremental Materialisation Maintenance
29.–33. Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE (animation steps, collapsed)
Delete/Rederive (DRed): the state-of-the-art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x)
C0(x) ← B(x)
Ci(x) ← Ci−1(x) for 1 ≤ i ≤ n
C0(x) ← Cn(x)
Explicit facts: A(a), B(a)
Materialising the initial facts derives C0(a), C1(a), . . . , Cn(a)
Delete A(a) using DRed:
1 Delete all facts with a derivation from A(a), using the ‘delta’ rules
C0(x)D ← A(x)D
C0(x)D ← B(x)D
Ci(x)D ← Ci−1(x)D for 1 ≤ i ≤ n
C0(x)D ← Cn(x)D
⇒ C0(a), C1(a), . . . , Cn(a) are all (over)deleted
2 Rederive facts that have an alternative derivation
C0(x) ← C0(x)D ∧ A(x)
C0(x) ← C0(x)D ∧ B(x)
Ci(x) ← Ci(x)D ∧ Ci−1(x) for 1 ≤ i ≤ n
C0(x) ← C0(x)D ∧ Cn(x)
⇒ since B(a) is untouched, C0(a), C1(a), . . . , Cn(a) are all rederived
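The two DRed phases on this example can be sketched as follows (my own illustrative Python with the slide's rules hard-coded; facts are (predicate, constant) pairs, and since each rule body here has a single atom, overdeletion is simply the forward closure of the removed facts):

```python
def step(facts, n):
    """One round of C0 ← A, C0 ← B, Ci ← Ci−1 (1 ≤ i ≤ n), C0 ← Cn."""
    out = set()
    for (p, x) in facts:
        if p in ("A", "B"):
            out.add(("C0", x))
        elif p.startswith("C"):
            i = int(p[1:])
            if i < n:
                out.add(("C%d" % (i + 1), x))
            if i == n:
                out.add(("C0", x))
    return out

def materialise_cn(explicit, n):
    """Compute the full materialisation of the explicit facts."""
    facts = set(explicit)
    while True:
        new = step(facts, n) - facts
        if not new:
            return facts
        facts |= new

def dred(explicit, facts, removed, n):
    """Delete `removed` from the materialisation `facts` via DRed."""
    # Phase 1: overdelete every fact with some derivation from a removed fact
    deleted = set(removed)
    while True:
        new = (step(deleted, n) & facts) - deleted
        if not new:
            break
        deleted |= new
    # Phase 2: rederive deleted facts that still have an alternative derivation
    kept = (facts - deleted) | (set(explicit) - set(removed))
    while True:
        new = step(kept, n) & deleted
        if not new:
            break
        kept |= new
        deleted -= new
    return kept
```

On this example DRed first wipes out the whole C0(a), . . . , Cn(a) chain and then rebuilds it from B(a), which is exactly the wasted work B/F avoids.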
34.–42. Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM (animation steps, collapsed)
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
Facts: A(a), B(a), C0(a), C1(a), . . . , Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a)
5 B(a) is explicit so it is derivable
6 So C0(a) is derivable too
7 Stop propagation and terminate
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015
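The backward search of B/F can be sketched in the same toy setting (my own illustrative Python; the real algorithm interleaves backward checks with forward propagation and caches results, while this sketch just asks, per fact, whether a non-cyclic proof from surviving explicit facts still exists):

```python
def supporters(fact, n):
    """Bodies of the rules that can derive `fact` (single-atom bodies here)."""
    p, x = fact
    if p == "C0":
        return [("A", x), ("B", x), ("C%d" % n, x)]   # C0←A, C0←B, C0←Cn
    if p.startswith("C"):
        return [("C%d" % (int(p[1:]) - 1), x)]        # Ci←Ci−1
    return []                                          # A, B are only explicit

def bf_delete(explicit, facts, removed, n):
    """Delete `removed`, keeping every fact that is still derivable."""
    def derivable(fact, visited):
        if fact in visited:               # a cyclic support is not a proof
            return False
        if fact in explicit and fact not in removed:
            return True                   # surviving explicit fact: proved
        return any(s in facts and derivable(s, visited | {fact})
                   for s in supporters(fact, n))
    return {f for f in facts
            if f not in removed and derivable(f, frozenset())}
```

As on the slides, the check for C0(a) immediately finds the alternative proof via B(a), so the C1(a), . . . , Cn(a) chain is never deleted in the first place.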
44. Conclusion
45. Conclusion
RESEARCH DIRECTIONS
Add a data/query/reasoning distribution layer:
Initial results very promising
Implementation in progress
Future work:
Investigate potential for data compression
Improve join cardinality estimation
Improve query planning