Querying Distributed RDF Graphs: The Effects of Partitioning Poster

•

0 likes•583 views

Abstract: Web-scale RDF datasets are increasingly processed using distributed RDF data stores built on top of a cluster of shared-nothing servers. Such systems critically rely on their data partitioning scheme and query answering scheme, the goal of which is to facilitate correct and ecient query processing. Existing data partitioning schemes are commonly based on hashing or graph partitioning techniques. The latter techniques split a dataset in a way that minimises the number of connections between the resulting subsets, thus reducing the need for communication between servers; however, to facilitate ecient query answering, considerable duplication of data at the intersection between subsets is often needed. Building upon the known graph partitioning approaches, in this paper we present a novel data partitioning scheme that employs minimal duplication and keeps track of the connections between partition elements; moreover, we propose a query answering scheme that uses this additional information to correctly answer all queries. We show experimentally that, on certain well-known RDF benchmarks, our data partitioning scheme often allows more answers to be retrieved without distributed computation than the known schemes, and we show that our query answering scheme can eciently answer many queries.

Technology

Querying Distributed RDF Graphs: The Effects of Partitioning
Anthony Potter Boris Motik Ian Horrocks
Challenges
Create a distributed, cloud-based DBMS for large
RDF graphs. The two main challenges are:
• How to partition data across a cluster
• How to answer queries over a cluster
LUBM: Percentage of Local Answers
System Q2 Q8 Q9 Q11 Q12 Qc
RDFox 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
SHAPE 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
Hash 0.44% 4.96% 0.23% 5.80% 0.00% 0.04%
SP2B: Percentage of Local Answers
System Q4 Q5 Q6 Q7 Q8
RDFox 95.95% 73.00% 99.90% 92.41% 91.45%
SHAPE 95.23% 9.72% 100.00% 41.97% 73.72%
Hash 0.01% 0.77% 0.25% 0.08% 0.26%
Storage Overhead
RDFox SHAPE Hash
LUBM 3.60% 84.23% 0.00%
SP2B 0.60% 38.63% 0.00%
RDF Graph Partitioning
Evaluation
Wildcard *
Conclusions
Our approach:
• Greater percentage of local
answers compared to
competitors
• Minimal storage overhead
• Recognises local answers
Future Work
• Develop efficient query
answering scheme with
wildcards
• Implement distributed
DBMS on top of RDFox
Local Answer
A local answer can be
evaluated fully on a
single machine:
• No network
communication
• Fast evaluation Subject Hashing Partition element 1 Graph-based
Partition element 2
RDFox: Our approach
SHAPE: Semantic hash partitioning
with n-hop duplication
Hash: Subject hashing
Aims
• Maximise number of
local answers to
common queries
• Recognise local
answers
• Minimise storage
overhead
• Many approaches to partitioning:
§ Subject hashing
§ Graph-based
§ Semantic hashing
• Partitioning scheme affects the
number of local answers
• We propose a novel graph-based
partitioning scheme:
§ Minimal storage overhead
§ Common (star) queries fully
local
Introduction of a new
wildcard resource:
• Represents all external
resources
• Tracks connections
between partition
elements
Used in our novel query
answering scheme:
• Reduces the number of
intermediate answers
• Recognises local
answers

Viewers also liked

Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...DBOnto

SemFacet paperDBOnto

ROSeAnn PresentationDBOnto

SemFacet PosterDBOnto

Aggregating Semantic Annotators PaperDBOnto

PDQ PosterDBOnto

PAGOdA paperDBOnto

PAGOdA PresentationDBOnto

Sem facet paperDBOnto

Optique - posterDBOnto

DIADEM: domain-centric intelligent automated data extraction methodology Pres...DBOnto

Welcome by Ian HorrocksDBOnto

Parallel Datalog Reasoning in RDFox PresentationDBOnto

Query Distributed RDF Graphs: The Effects of Partitioning PaperDBOnto

Viewers also liked (14)

Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...

SemFacet paper

ROSeAnn Presentation

SemFacet Poster

Aggregating Semantic Annotators Paper

PDQ Poster

PAGOdA paper

PAGOdA Presentation

Sem facet paper

Optique - poster

DIADEM: domain-centric intelligent automated data extraction methodology Pres...

Welcome by Ian Horrocks

Parallel Datalog Reasoning in RDFox Presentation

Query Distributed RDF Graphs: The Effects of Partitioning Paper

Recently uploaded

"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays

SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays

Sample pptx for embedding into website for demoHarshalMandlekar2

Time Series Foundation Models - current state and future directionsNathaniel Shimoni

Rise of the Machines: Known As Drones...Rick Flair

Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan

A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3

Training state-of-the-art general text embeddingZilliz

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro

Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos

What is Artificial Intelligence?????????blackmambaettijean

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech

"ML in Production",Oleksandr BaganFwdays

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

DevEX - reference for building teams, processes, and platformsSergiu Bodiu

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko

SIP trunking in Janus @ Kamailio World 2024

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack

Sample pptx for embedding into website for demo

Time Series Foundation Models - current state and future directions

Rise of the Machines: Known As Drones...

Generative AI for Technical Writer or Information Developers

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx

Training state-of-the-art general text embedding

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

Unraveling Multimodality with Large Language Models.pdf

Ensuring Technical Readiness For Copilot in Microsoft 365

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

What is Artificial Intelligence?????????

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

The Ultimate Guide to Choosing WordPress Pros and Cons

"ML in Production",Oleksandr Bagan

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024

DevEX - reference for building teams, processes, and platforms

Querying Distributed RDF Graphs: The Effects of Partitioning Poster

1. Querying Distributed RDF Graphs: The Effects of Partitioning Anthony Potter Boris Motik Ian Horrocks Challenges Create a distributed, cloud-based DBMS for large RDF graphs. The two main challenges are: • How to partition data across a cluster • How to answer queries over a cluster LUBM: Percentage of Local Answers System Q2 Q8 Q9 Q11 Q12 Qc RDFox 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% SHAPE 100.00% 100.00% 100.00% 100.00% 100.00% 100.00% Hash 0.44% 4.96% 0.23% 5.80% 0.00% 0.04% SP2B: Percentage of Local Answers System Q4 Q5 Q6 Q7 Q8 RDFox 95.95% 73.00% 99.90% 92.41% 91.45% SHAPE 95.23% 9.72% 100.00% 41.97% 73.72% Hash 0.01% 0.77% 0.25% 0.08% 0.26% Storage Overhead RDFox SHAPE Hash LUBM 3.60% 84.23% 0.00% SP2B 0.60% 38.63% 0.00% RDF Graph Partitioning Evaluation Wildcard * Conclusions Our approach: • Greater percentage of local answers compared to competitors • Minimal storage overhead • Recognises local answers Future Work • Develop efficient query answering scheme with wildcards • Implement distributed DBMS on top of RDFox Local Answer A local answer can be evaluated fully on a single machine: • No network communication • Fast evaluation Subject Hashing Partition element 1 Graph-based Partition element 2 RDFox: Our approach SHAPE: Semantic hash partitioning with n-hop duplication Hash: Subject hashing Aims • Maximise number of local answers to common queries • Recognise local answers • Minimise storage overhead • Many approaches to partitioning: § Subject hashing § Graph-based § Semantic hashing • Partitioning scheme affects the number of local answers • We propose a novel graph-based partitioning scheme: § Minimal storage overhead § Common (star) queries fully local Introduction of a new wildcard resource: • Represents all external resources • Tracks connections between partition elements Used in our novel query answering scheme: • Reduces the number of intermediate answers • Recognises local answers

Querying Distributed RDF Graphs: The Effects of Partitioning Poster

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (14)

Recently uploaded

Recently uploaded (20)

Querying Distributed RDF Graphs: The Effects of Partitioning Poster